Clear Feathr UDF state and configuration template in work directory #557
The first issue
In the existing code, we didn't remove generated_feathr_pyspark_metadata when building the features. This is problematic in a few end-user scenarios. In particular, if UDFs are defined and later removed, this file is not cleared, so the code will still think there are UDFs, which will either yield wrong results or exit incorrectly.
This PR makes sure that we remove generated_feathr_pyspark_metadata and a few UDF files every time users build features.
The second issue (#559)
This issue isn't very obvious. Sometimes, after running get_offline_features and then running the materialize_features API, the materialize_features API will not succeed, and in many cases there are no values in the online store (such as Redis). This only happens when using Databricks.
It occurs when the Databricks configuration is not a string (i.e. end users use a dict to provide all the required configurations). In that case, the following line in the code is the problem:
submission_params = self.config_template
Since self.config_template is a dict, submission_params is actually a reference to self.config_template rather than a copy of it. Later in the code, submission_params is modified, and the modified values are carried over across jobs. Different jobs therefore share the same state, which causes unexpected behavior.
Other issues
This PR also fixes a few OS compatibility issues (when parsing paths, the code always assumed Linux-style paths, which isn't always true) and fixes a few typos.
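
As an illustration of the second issue and of the path handling mentioned above, here is a minimal sketch. It is not the Feathr code: the JobLauncher class, its method names, the dict keys, and the file paths are hypothetical, and using copy.deepcopy is only one plausible way to get an independent copy (this description does not say which approach the PR takes).

```python
import copy
from pathlib import PurePosixPath, PureWindowsPath


class JobLauncher:
    """Hypothetical stand-in for a launcher that holds a dict config template."""

    def __init__(self, config_template):
        self.config_template = config_template

    def params_by_reference(self, job_name):
        # Plain assignment binds a second name to the SAME dict object,
        # so every change below also changes self.config_template.
        submission_params = self.config_template
        submission_params["run_name"] = job_name
        submission_params["libraries"].append(job_name + ".jar")
        return submission_params

    def params_by_copy(self, job_name):
        # deepcopy also duplicates nested lists and dicts,
        # so the changes below stay local to this job.
        submission_params = copy.deepcopy(self.config_template)
        submission_params["run_name"] = job_name
        submission_params["libraries"].append(job_name + ".jar")
        return submission_params


# Two jobs submitted through the aliasing version share state.
shared = JobLauncher({"run_name": "", "libraries": []})
shared.params_by_reference("get_offline_features")
leaked = shared.params_by_reference("materialize_features")

# Two jobs submitted through the copying version stay independent.
isolated = JobLauncher({"run_name": "", "libraries": []})
isolated.params_by_copy("get_offline_features")
clean = isolated.params_by_copy("materialize_features")

# Path handling: splitting on "/" assumes Linux-style separators.
windows_path = r"C:\work\udf_module.py"
naive_name = windows_path.split("/")[-1]            # no "/" present, whole string returned
portable_name = PureWindowsPath(windows_path).name  # file name only
posix_name = PurePosixPath("/work/udf_module.py").name
```

In the aliasing case, the second job's parameters still contain the library added by the first job, and the template itself has been overwritten. In the copying case, the template is unchanged after both jobs. In real code, pathlib.Path or os.path would pick the separator rules of the running platform; the pure path classes are used here only so the sketch behaves the same on any OS.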