fix: Set divisions for proc_cols directly from original dataset #2187

Merged: jeffreyftang merged 5 commits into master from divisions on Jun 23, 2022

jeffreyftang (Collaborator) commented Jun 23, 2022

Depending on how the dataframe is originally read, it may or may not have known divisions. When it does, the "manual join" in DaskEngine fails for certain feature types (image, audio), because the proc_cols have no known divisions to match those of the dataset.

This PR simply sets the divisions of each proc_col directly from the originating dataset.
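
As a rough illustration of the failure mode and the fix described above, here is a toy model in plain Python. It does not use Dask and is not the code from this PR: `PartitionedColumn` and `manual_join` are hypothetical names invented for this sketch, and the real join in DaskEngine may check divisions differently. It only assumes that the join refuses to pair partitions when the two sides disagree on divisions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PartitionedColumn:
    """Toy stand-in for a partitioned column (hypothetical, not a Dask class)."""

    partitions: List[List[int]]
    # Index boundaries of the partitions; None entries mean "unknown".
    divisions: Tuple[Optional[int], ...]

    @property
    def known_divisions(self) -> bool:
        return all(d is not None for d in self.divisions)


def manual_join(
    dataset: PartitionedColumn, proc_col: PartitionedColumn
) -> List[List[Tuple[int, int]]]:
    """Pair partitions positionally, but only if both sides agree on divisions."""
    if dataset.divisions != proc_col.divisions:
        raise ValueError("divisions do not match")
    return [
        list(zip(left, right))
        for left, right in zip(dataset.partitions, proc_col.partitions)
    ]


# The originating dataset was read with known divisions.
dataset = PartitionedColumn(partitions=[[1, 2], [3, 4]], divisions=(0, 2, 3))

# A processed column with the same partition layout, but its divisions were lost.
proc_col = PartitionedColumn(
    partitions=[[10, 20], [30, 40]], divisions=(None, None, None)
)

try:
    manual_join(dataset, proc_col)
    join_failed = False
except ValueError:
    join_failed = True

# The approach described in this PR: take divisions from the originating dataset.
proc_col.divisions = dataset.divisions
joined = manual_join(dataset, proc_col)
```

In this sketch the first `manual_join` call raises because only one side has known divisions, and the second succeeds once the divisions are copied over. Copying is only safe because the processed column is assumed to keep the same partition layout as the dataset it was derived from.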

tgaddair (Collaborator) left a comment


Please add a test that repros the issue without this fix.

github-actions bot commented Jun 23, 2022

Unit Test Results

6 files ±0, 6 suites ±0, duration 2h 4m 32s (-21m 31s)
2 839 tests +7: 2 805 passed +7, 34 skipped ±0, 0 failed ±0
8 517 runs +21: 8 411 passed +21, 106 skipped ±0, 0 failed ±0

Results for commit 481f352. ± Comparison against base commit 038dbc5.

♻️ This comment has been updated with latest results.

@jeffreyftang jeffreyftang merged commit cf9060e into master Jun 23, 2022
@jeffreyftang jeffreyftang deleted the divisions branch June 23, 2022 18:12