create_splits_seq val_num and test_num #11

Closed
sebastianffx opened this issue Aug 11, 2020 · 2 comments

@sebastianffx

First, thanks for making such nice work public!
This is more of a question than an issue: what exactly should val_num and test_num be in the create_splits_seq file?
Are they indexes that fix particular WSIs as val and test, or are they IDs?
Since they are required, I've just set them as val_num, test_num = (1,2),(3,4), and the partitions seem to be created OK.
Thanks!

@fedshyvana
Collaborator

fedshyvana commented Aug 11, 2020

Hi Sebastian, val_num and test_num are supposed to be tuples (or arrays) of integers giving the number of WSIs (or cases) you want to sample into the validation set and test set respectively, for each class you have. E.g. if you have 3 classes, val_num = (8, 5, 20) means you want to draw 8 WSIs (or cases) from the 1st class into the validation set, 5 from the 2nd class and 20 from the 3rd class.
I say WSIs or cases because if you set patient_strat = True in the dataset constructor, the splits are created at a patient/case-level instead of the WSI-level, and all WSIs from the same case are drawn together if that case is drawn (if you have multiple WSIs for that case). If patient_strat = False, then each WSI is treated as a single case.
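To make the per-class counts concrete, here is a minimal sketch of that kind of sampling. It is not the repository's actual implementation: the helper draw_val_test, the class sizes and the seed are all made up for illustration, and only NumPy is assumed.

```python
import numpy as np

def draw_val_test(cls_ids, val_num, test_num, seed=0):
    """Hypothetical helper (not part of the repo): for each class c, draw
    val_num[c] cases into validation and test_num[c] cases into test;
    everything left over goes to training.

    cls_ids: list of index arrays, one array of case indices per class.
    """
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for c, ids in enumerate(cls_ids):
        ids = np.asarray(ids)
        n_val, n_test = int(val_num[c]), int(test_num[c])
        if n_val + n_test > len(ids):
            raise ValueError(f"class {c}: requested more cases than available")
        shuffled = rng.permutation(ids)
        val.extend(shuffled[:n_val].tolist())
        test.extend(shuffled[n_val:n_val + n_test].tolist())
        train.extend(shuffled[n_val + n_test:].tolist())
    return sorted(train), sorted(val), sorted(test)

# Made-up example: 3 classes with 40, 25 and 100 cases.
cls_ids = [np.arange(0, 40), np.arange(40, 65), np.arange(65, 165)]
train_ids, val_ids, test_ids = draw_val_test(
    cls_ids, val_num=(8, 5, 20), test_num=(8, 5, 20)
)
print(len(train_ids), len(val_ids), len(test_ids))  # 99 33 33
```

Read the same way, val_num, test_num = (1,2),(3,4) on a 2-class problem asks for 1 case from the 1st class and 2 from the 2nd class in validation, and 3 and 4 respectively in test.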
You can also calculate the number of cases to draw automatically if you want, let's say, 10% of cases in the validation set and 10% of cases in the test set. This can be done with, for example:

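 # Number of cases in each class (one entry of patient_cls_ids per class)
 num_slides_cls = np.array([len(cls_ids) for cls_ids in dataset.patient_cls_ids])
 # Draw 10% of each class, rounded down, into validation and into test
 val_num = np.floor(num_slides_cls * 0.1).astype(int)
 test_num = np.floor(num_slides_cls * 0.1).astype(int)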

Place this after the dataset construction and before the if __name__ == '__main__' body of the code.
You should also check the partitions after they're created, in the splits folder. There should also be a description file, letting you know how many cases from each class were sampled into each split.
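One thing worth checking with the 10% rule is what the rounding down does to small classes. A minimal sketch with made-up class sizes (cases_per_class stands in for the per-class counts and is not a name from the repo):

```python
import numpy as np

# Made-up per-class case counts, standing in for the lengths of patient_cls_ids.
cases_per_class = np.array([87, 9, 240])

val_num = np.floor(cases_per_class * 0.1).astype(int)
test_num = np.floor(cases_per_class * 0.1).astype(int)
train_num = cases_per_class - val_num - test_num

print(val_num.tolist())    # [8, 0, 24]
print(train_num.tolist())  # [71, 9, 192]

# Classes with fewer than 10 cases round down to 0 and would be
# absent from the validation and test sets.
empty_classes = np.flatnonzero(val_num == 0).tolist()
print(empty_classes)       # [1]
```

If a class ends up with 0, you may prefer to set its entry in val_num and test_num by hand rather than rely on the fraction.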

@sebastianffx
Author

Perfect, this is clear now, thanks!
