[Feature Store] Block dataframe as source when running remotely #1517

Merged
Hedingber merged 8 commits into mlrun:development on Dec 7, 2021


Contributor

@katyakats commented Nov 24, 2021

@@ -386,6 +394,10 @@ def ingest(
             "featureset.spec.engine must be set to 'spark' to ingest with spark"
         )
     if featureset.spec.engine == "spark":
+        if isinstance(source, pd.DataFrame):
Contributor

Contributor Author

@katyakats Nov 28, 2021

With a DataFrame as the source it would fail (and not with a clear error).

@@ -393,7 +393,7 @@ def ingest(
         raise mlrun.errors.MLRunInvalidArgumentError(
             "featureset.spec.engine must be set to 'spark' to ingest with spark"
         )
-    if featureset.spec.engine == "spark":
+    if featureset.spec.engine == "spark" and run_config is not None:
Contributor

But now it'll just fall through to a non-Spark ingest.
The run_config condition has to be added to the if isinstance(...) check on the next line instead.
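A minimal sketch of the reviewer's point, reduced to control flow. It treats a non-None run_config as a remote run, as the line under review does; the two function names, the InvalidArgumentError class, and the "spark"/"non-spark" labels are hypothetical stand-ins, not mlrun's actual ingest code.

```python
import pandas as pd


class InvalidArgumentError(ValueError):
    """Hypothetical stand-in for mlrun.errors.MLRunInvalidArgumentError."""


def choose_ingest_path_reviewed(source, engine, run_config=None):
    # Placement under review: run_config is folded into the engine check,
    # so a local Spark ingest (run_config is None) skips the Spark branch.
    if engine == "spark" and run_config is not None:
        if isinstance(source, pd.DataFrame):
            raise InvalidArgumentError("DataFrame source is illegal when running remotely")
        return "spark"
    return "non-spark"


def choose_ingest_path_suggested(source, engine, run_config=None):
    # Suggested placement: a Spark feature set always takes the Spark branch;
    # only a DataFrame combined with a remote run_config is rejected.
    if engine == "spark":
        if isinstance(source, pd.DataFrame) and run_config is not None:
            raise InvalidArgumentError("DataFrame source is illegal when running remotely")
        return "spark"
    return "non-spark"
```

With a local Spark feature set (run_config=None), the reviewed placement quietly returns "non-spark", while the suggested one stays on the Spark path and only rejects the DataFrame for a remote run.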

@katyakats katyakats closed this Nov 30, 2021
@katyakats katyakats reopened this Nov 30, 2021
@katyakats katyakats closed this Dec 2, 2021
@katyakats katyakats reopened this Dec 2, 2021
@@ -311,6 +315,10 @@ def ingest(

     if mlrun_context:
         # extract ingestion parameters from mlrun context
+        if isinstance(source, pd.DataFrame):
+            raise mlrun.errors.MLRunInvalidArgumentError(
+                "DataFrame source is illegal when running ingest with mlrun context"
Contributor

"with mlrun context" won't mean anything to a user; explain it in their terms.

Contributor Author

mlrun_context is a parameter that the user passes to the ingest function.
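One way the message could be phrased in user terms, as a hedged sketch: it assumes (as the diff does) that mlrun_context is set when ingest runs inside a job rather than in the caller's own process, so a DataFrame held in the caller's memory is presumably not available to it. The helper name, error class, and message wording are hypothetical, not what was merged.

```python
import pandas as pd


class InvalidArgumentError(ValueError):
    """Hypothetical stand-in for mlrun.errors.MLRunInvalidArgumentError."""


def check_source_for_job(source, mlrun_context=None):
    """Reject an in-memory DataFrame when ingest runs as a job (hypothetical helper)."""
    # Only the combination "running as a job" + "in-memory DataFrame" is rejected;
    # file or table sources, and local calls without mlrun_context, pass through.
    if mlrun_context is not None and isinstance(source, pd.DataFrame):
        raise InvalidArgumentError(
            "A pandas DataFrame cannot be used as the source when ingest runs as a "
            "remote job; save the data to a file or table and pass that as the source"
        )
```

The message names the situation the user is in (a remote job) and what to do about it, instead of the internal parameter name.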

Comment on lines 86 to 95
try:
    df = fs.ingest(
        stocks_set,
        stocks,
        infer_options=fs.InferOptions.default(),
        run_config=fs.RunConfig(local=True),
    )
    assert False
except mlrun.errors.MLRunInvalidArgumentError:
    pass
Contributor

The way to assert that something fails is with pytest.raises(...); we use it a lot, search the code base.
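For reference, a sketch of the same check written with pytest.raises. To keep it runnable without mlrun, fake_ingest and InvalidArgumentError are hypothetical stand-ins for fs.ingest and mlrun.errors.MLRunInvalidArgumentError; in the real test the body of the with block would be the fs.ingest(...) call itself.

```python
import pandas as pd
import pytest


class InvalidArgumentError(ValueError):
    """Hypothetical stand-in for mlrun.errors.MLRunInvalidArgumentError."""


def fake_ingest(featureset, source, run_config=None):
    # Hypothetical stand-in for fs.ingest: reject DataFrame sources on remote runs.
    if run_config is not None and isinstance(source, pd.DataFrame):
        raise InvalidArgumentError("DataFrame source is illegal when running remotely")
    return source


def test_dataframe_source_is_rejected():
    stocks = pd.DataFrame({"ticker": ["MSFT"], "name": ["Microsoft"]})
    # pytest.raises fails the test if nothing is raised;
    # match= additionally checks the message with re.search.
    with pytest.raises(InvalidArgumentError, match="DataFrame source"):
        fake_ingest("stocks", stocks, run_config={"local": True})
```

Compared with try / assert False / except / pass, the expected failure is stated in one line and the error message can be checked as well.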

    "measurements", entities=[fs.Entity("name")], engine="spark",
)

try:
Contributor

same

@katyakats katyakats closed this Dec 6, 2021
@katyakats katyakats reopened this Dec 6, 2021
@katyakats katyakats closed this Dec 7, 2021
@katyakats katyakats reopened this Dec 7, 2021
@Hedingber Hedingber changed the title [Feature store] - block dataframe as source when running remotely [Feature Store] Block dataframe as source when running remotely Dec 7, 2021
@Hedingber Hedingber merged commit ad3580a into mlrun:development Dec 7, 2021

3 participants