-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use percent slices for splits #883
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Thanks.
Codecov Report
@@ Coverage Diff @@
## master #883 +/- ##
==========================================
- Coverage 95.65% 95.59% -0.07%
==========================================
Files 21 21
Lines 2256 2247 -9
==========================================
- Hits 2158 2148 -10
- Misses 98 99 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff! Thanks!
bliss/simulator/simulated_dataset.py
Outdated
for idx in self.val_split_file_idxs: | ||
filename = f"{self.file_prefix}_{idx}.pt" | ||
self.valid += self.read_file(f"{self.cached_data_path}/{filename}") | ||
def pct_to_idx(self, x, length): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: can we just have this be percent_to_idx
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
filename = f"{self.file_prefix}_{idx}.pt" | ||
self.test += self.read_file(f"{self.cached_data_path}/{filename}") | ||
def parse_slices(self, splits: str, length: int): | ||
slices = [slice(0, 0) for _ in range(3)] # default to empty slice for each split |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we default to 100% for train?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only time I can see that being needed is if a user passes in an empty string for splits
, which imo should be considered bad input. In that case I think failing is the right result, instead of silently setting it to 100%.
bliss/simulator/simulated_dataset.py
Outdated
def parse_slices(self, splits: str, length: int): | ||
slices = [slice(0, 0) for _ in range(3)] # default to empty slice for each split | ||
for i, data_split in enumerate(splits.split("/")): | ||
# map "start_pct:stop_pct" to slice(start_idx, stop_idx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: let's use _percent
here (for readability)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Fixes #880. Parsing full slices ("0:80/80:90/90:100") was easier than relative values ("80/10/10") so I just went with this for now, since it allows for more flexibility anyway. Not sure if we need more sophisticated input validation - let me know if you think I should add it.