-
Notifications
You must be signed in to change notification settings - Fork 239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature Store] Block dataframe as source when running remotely #1517
Conversation
mlrun/feature_store/api.py
Outdated
@@ -386,6 +394,10 @@ def ingest( | |||
"featureset.spec.engine must be set to 'spark' to ingest with spark" | |||
) | |||
if featureset.spec.engine == "spark": | |||
if isinstance(source, pd.DataFrame): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? What about simple ingest with spark?
https://docs.mlrun.org/en/latest/feature-store/transformations.html#using-spark-execution-engine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with a df as a source it would fail (and not with a clear error)
mlrun/feature_store/api.py
Outdated
@@ -393,7 +393,7 @@ def ingest( | |||
raise mlrun.errors.MLRunInvalidArgumentError( | |||
"featureset.spec.engine must be set to 'spark' to ingest with spark" | |||
) | |||
if featureset.spec.engine == "spark": | |||
if featureset.spec.engine == "spark" and run_config is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But now it'll just continue to a non-spark ingest.
It has to be added in the if isinstance in the next line.
mlrun/feature_store/api.py
Outdated
@@ -311,6 +315,10 @@ def ingest( | |||
|
|||
if mlrun_context: | |||
# extract ingestion parameters from mlrun context | |||
if isinstance(source, pd.DataFrame): | |||
raise mlrun.errors.MLRunInvalidArgumentError( | |||
"DataFrame source is illegal when running ingest with mlrun context" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"with mlrun context" won't mean anything to a user, explain it in their words
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mlrun context is a parameter that the user passes to ingest function
try: | ||
df = fs.ingest( | ||
stocks_set, | ||
stocks, | ||
infer_options=fs.InferOptions.default(), | ||
run_config=fs.RunConfig(local=True), | ||
) | ||
assert False | ||
except mlrun.errors.MLRunInvalidArgumentError: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way to assert something is failed is with pytest.raises(...
we use it a lot, search the code base
"measurements", entities=[fs.Entity("name")], engine="spark", | ||
) | ||
|
||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
https://jira.iguazeng.com/browse/ML-1346