ENH: several small fixes and updates #12

Merged
merged 5 commits into from
Apr 20, 2024

@mbaak mbaak commented Apr 19, 2024

  • Add functions to set the format and storage options of Spark DataFrames when calling the Spark name-matching save. Example usage: nm_obj.write().format('parquet').options(**options_dict).save(path)
  • Get rid of the future-deprecation warning raised by calling np.array_split on a pandas Series.
  • Remove any extra whitespace from processed names for n-char indexing.
  • Pass through the correct-match column, if present, in SparkAggregator; useful for accuracy testing.
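
To illustrate why the whitespace cleanup matters for n-char indexing, here is a minimal sketch using only the standard library. The helper names `normalize_whitespace` and `char_ngrams` are hypothetical and are not taken from this PR; the actual preprocessing in the library may differ.

```python
import re


def normalize_whitespace(name: str) -> str:
    """Collapse runs of whitespace to a single space and trim both ends.

    Hypothetical helper for illustration; not the library's implementation.
    """
    return re.sub(r"\s+", " ", name).strip()


def char_ngrams(name: str, n: int = 2) -> set:
    """Return the set of character n-grams of a name (hypothetical helper)."""
    return {name[i:i + n] for i in range(len(name) - n + 1)}


raw = "  acme   holding  bv "
clean = normalize_whitespace(raw)

# N-grams that exist only because of the stray whitespace in the raw name.
extra = char_ngrams(raw) - char_ngrams(clean)
print(repr(clean))
print(sorted(extra))
```

Without the cleanup, n-grams made up of padding (such as a double space) end up in the index and can dilute similarity scores between otherwise identical names.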

…rames

Functions to set format and storage options of spark dataframes when calling spark namematching save.
Example usage:
   nm_obj.write().format('parquet').options(**options_dict).save(path)
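
The chained call above could be supported by a small builder object, sketched below in plain Python. This is an assumption about the pattern only: `SketchWriter`, its `save_fn` callback, and the default format of 'parquet' are hypothetical and do not come from this PR. In the real code the final step would presumably hand the collected format and options to Spark's DataFrameWriter.

```python
from typing import Any, Callable, Dict


class SketchWriter:
    """Hypothetical chainable writer; not the library's actual class."""

    def __init__(self, save_fn: Callable[[str, str, Dict[str, Any]], None]) -> None:
        self._save_fn = save_fn            # performs the actual write
        self._format = "parquet"           # assumed default, for illustration
        self._options: Dict[str, Any] = {}

    def format(self, source: str) -> "SketchWriter":
        """Record the output format and return self so calls can be chained."""
        self._format = source
        return self

    def options(self, **options: Any) -> "SketchWriter":
        """Merge storage options and return self so calls can be chained."""
        self._options.update(options)
        return self

    def save(self, path: str) -> None:
        """Pass the path, format, and a copy of the options to the save function."""
        self._save_fn(path, self._format, dict(self._options))


# Record the calls instead of writing to disk, so the sketch runs anywhere.
calls = []
writer = SketchWriter(lambda path, fmt, opts: calls.append((path, fmt, opts)))

options_dict = {"compression": "snappy"}
writer.format("parquet").options(**options_dict).save("/tmp/nm_model")
print(calls)
```

Returning `self` from `format` and `options` is what makes the one-line chain possible; only `save` triggers the write.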

mbaak commented Apr 19, 2024

Solves:
#11
#9

For Spark, add the missing pass-through column, i.e. the correct-match column.
@mbaak mbaak changed the title from "ENH: added functions to set format and storage options of spark dfs" to "ENH: several small fixes and updates" Apr 20, 2024
@mbaak mbaak requested a review from sbrugman April 20, 2024 11:03
@mbaak mbaak merged commit 86948b9 into main Apr 20, 2024
4 checks passed
@mbaak mbaak deleted the format_file_option branch April 20, 2024 22:17