Use percent slices for splits #883

aakashdp6548 · 2023-07-18T20:35:55Z

Fixes #880. Parsing full slices ("0:80/80:90/90:100") was easier than relative values ("80/10/10") so I just went with this for now, since it allows for more flexibility anyway. Not sure if we need more sophisticated input validation - let me know if you think I should add it.
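For reference, the relative form can be turned into the full-slice form by accumulating the percentages. The snippet below is only a sketch of that equivalence; `relative_to_slices` is a hypothetical helper and is not part of this PR.

```python
from itertools import accumulate


def relative_to_slices(relative: str) -> str:
    """Convert relative percentages such as "80/10/10" into full slices.

    Hypothetical helper for illustration only; not part of the PR.
    """
    sizes = [int(part) for part in relative.split("/")]
    stops = list(accumulate(sizes))  # cumulative end of each split
    starts = [0] + stops[:-1]        # each split starts where the previous one ends
    return "/".join(f"{start}:{stop}" for start, stop in zip(starts, stops))


print(relative_to_slices("80/10/10"))  # 0:80/80:90/90:100
```

The full-slice form can also express things the relative form cannot, such as overlapping or non-contiguous splits.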

jeff-regier

Nice! Thanks.

codecov · 2023-07-19T13:27:11Z

Codecov Report

Merging #883 (8c73c60) into master (355f5db) will decrease coverage by 0.07%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #883      +/-   ##
==========================================
- Coverage   95.65%   95.59%   -0.07%     
==========================================
  Files          21       21              
  Lines        2256     2247       -9     
==========================================
- Hits         2158     2148      -10     
- Misses         98       99       +1

Flag	Coverage Δ
unittests	`95.59% <100.00%> (-0.07%)`	⬇️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
bliss/api.py	`87.17% <100.00%> (-0.14%)`	⬇️
bliss/simulator/simulated_dataset.py	`91.20% <100.00%> (-1.23%)`	⬇️

zhixiangteoh

Good stuff! Thanks!

zhixiangteoh · 2023-07-19T13:36:17Z

bliss/simulator/simulated_dataset.py

-        for idx in self.val_split_file_idxs:
-            filename = f"{self.file_prefix}_{idx}.pt"
-            self.valid += self.read_file(f"{self.cached_data_path}/{filename}")
+    def pct_to_idx(self, x, length):


Nit: can we just have this be percent_to_idx?

zhixiangteoh · 2023-07-19T13:54:08Z

bliss/simulator/simulated_dataset.py

-            filename = f"{self.file_prefix}_{idx}.pt"
-            self.test += self.read_file(f"{self.cached_data_path}/{filename}")
+    def parse_slices(self, splits: str, length: int):
+        slices = [slice(0, 0) for _ in range(3)]  # default to empty slice for each split


Should we default to 100% for train?

aakashdp6548

The only time I can see that being needed is if a user passes in an empty string for splits, which IMO should be considered bad input. In that case I think failing is the right result, instead of silently setting train to 100%.
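If stricter validation is wanted later, the "fail on bad input" behavior could look roughly like the sketch below. `check_splits` is a hypothetical helper, and the specific checks (non-empty string, `start:stop` pairs, bounds of 0 to 100) are assumptions rather than what the PR implements.

```python
def check_splits(splits: str) -> None:
    """Raise ValueError for an empty or malformed splits string.

    Hypothetical helper; the checks below are assumptions, not the PR's code.
    """
    if not splits.strip():
        # Fail loudly instead of silently treating "" as 100% train.
        raise ValueError("splits must not be empty, e.g. '0:80/80:90/90:100'")
    for part in splits.split("/"):
        try:
            start, stop = (int(value) for value in part.split(":"))
        except ValueError as err:
            raise ValueError(f"expected 'start:stop' percentages, got {part!r}") from err
        if not 0 <= start <= stop <= 100:
            raise ValueError(f"percent range out of bounds: {part!r}")


check_splits("0:80/80:90/90:100")  # valid input passes silently
```

With this check, both an empty string and the relative form "80/10/10" raise `ValueError` instead of producing surprising splits.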

zhixiangteoh · 2023-07-19T13:54:47Z

bliss/simulator/simulated_dataset.py

+    def parse_slices(self, splits: str, length: int):
+        slices = [slice(0, 0) for _ in range(3)]  # default to empty slice for each split
+        for i, data_split in enumerate(splits.split("/")):
+            # map "start_pct:stop_pct" to slice(start_idx, stop_idx)


Nit: let's use _percent here (for readability)
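To make the mapping in the diff concrete, here is a standalone sketch of how "start_percent:stop_percent" strings could become slices. The function bodies are assumptions: the diff only shows the signatures and the first lines, the originals are methods on the dataset class, and the truncating rounding in `percent_to_idx` is a guess.

```python
from typing import List


def percent_to_idx(percent: float, length: int) -> int:
    # Assumption: truncate to an integer index; the PR may round differently.
    return int(percent * length / 100)


def parse_slices(splits: str, length: int) -> List[slice]:
    # Default to an empty slice for each of train/valid/test, as in the diff.
    slices = [slice(0, 0) for _ in range(3)]
    for i, data_split in enumerate(splits.split("/")):
        # Map "start_percent:stop_percent" to slice(start_idx, stop_idx).
        start_percent, stop_percent = (float(v) for v in data_split.split(":"))
        slices[i] = slice(
            percent_to_idx(start_percent, length),
            percent_to_idx(stop_percent, length),
        )
    return slices


file_idxs = list(range(50))  # stand-in for 50 cached data files
train, valid, test = parse_slices("0:80/80:90/90:100", len(file_idxs))
print(len(file_idxs[train]), len(file_idxs[valid]), len(file_idxs[test]))  # 40 5 5
```

Note that this sketch assumes at most three splits; a fourth "/"-separated entry would raise an `IndexError`.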

use percent slices for splits

d24d71e

aakashdp6548 requested review from jeff-regier and zhixiangteoh July 18, 2023 20:35

aakashdp6548 marked this pull request as ready for review July 18, 2023 20:36

jeff-regier approved these changes Jul 18, 2023

Update api.test_train_on_cached_data to use splits

3b608a3

zhixiangteoh approved these changes Jul 19, 2023

Change pct -> percent

8c73c60

aakashdp6548 merged commit 05dfd35 into master Jul 19, 2023
3 checks passed

aakashdp6548 deleted the data-splits branch July 19, 2023 14:21
