Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deterministic group_shuffle_split #1839

Merged
merged 14 commits into from
Feb 28, 2024
Merged

Conversation

nilsleh
Copy link
Collaborator

@nilsleh nilsleh commented Jan 31, 2024

Sets are unordered and therefore, repeated calls were yielding different train and val sizes for Cyclone dataset.

@github-actions github-actions bot added the datamodules PyTorch Lightning datamodules label Jan 31, 2024
@adamjstewart adamjstewart added this to the 0.5.2 milestone Jan 31, 2024
@adamjstewart
Copy link
Collaborator

Let's find a way to test this so the same bug doesn't happen again.

@isaaccorley
Copy link
Collaborator

@nilsleh Would love to get this in the 0.5.2 release!

@github-actions github-actions bot added the testing Continuous integration testing label Feb 26, 2024
@nilsleh
Copy link
Collaborator Author

nilsleh commented Feb 26, 2024

@nilsleh Would love to get this in the 0.5.2 release!

Thanks for the reminder :)

@nilsleh
Copy link
Collaborator Author

nilsleh commented Feb 26, 2024

@adamjstewart and @isaaccorley not sure how to simulate repeated calls to the function after restarting script/kernel, so I thought separate processes might be a way to go, but actually not sure

@adamjstewart
Copy link
Collaborator

@adamjstewart and @isaaccorley not sure how to simulate repeated calls to the function after restarting script/kernel, so I thought separate processes might be a way to go, but actually not sure

This feels like overkill. Depending on the size of our fake dataset, can we just run the test once, print the order, then hardcode that in the test code? As long as it is always the same, it's deterministic.

@adamjstewart adamjstewart changed the title Fix deterministc group_shuffle_split Fix deterministic group_shuffle_split Feb 26, 2024
adamjstewart
adamjstewart previously approved these changes Feb 28, 2024
Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better, thanks!

tests/datamodules/test_utils.py Outdated Show resolved Hide resolved
@adamjstewart adamjstewart enabled auto-merge (squash) February 28, 2024 09:48
@adamjstewart adamjstewart merged commit 54b4ddc into microsoft:main Feb 28, 2024
24 checks passed
@nilsleh nilsleh deleted the group_split branch February 28, 2024 10:06
isaaccorley pushed a commit that referenced this pull request Mar 2, 2024
* order sets

* suggestion

* add unit test

* fix

* updated test

* fix

* indices from file

* test util update

* path

* no file

* no file

* comment

* i cannot spell
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datamodules PyTorch Lightning datamodules testing Continuous integration testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants