feat: Add interface for multi-instance splitting (closes #126) #127
Hi Jannis, thanks so much for the quick, excellent and elegant solution! I think it makes lots of sense to replace And regarding the other ToDos, (1) it would be great to add an example in the doc and I will also duplicate it in the website; (2) unit tests would be great! After you are done, I will also do some testing on the other dataloaders that use the
Thanks Kexin for the quick feedback. I think the PR is ready for review!
Thank you @jannisborn! Looking great! I will review it this week and will keep you posted!
Hey @jannisborn, thanks again for the commit! I went through the code and ran several tests. They all worked perfectly! Merging it now.
Also, just added an example on the website! https://tdcommons.ai/functions/data_split/
Addressing #126 (support for multi-instance splitting).
This is a draft PR, please give some feedback @kexinhuang12345
Done:
Following your suggestion, I implemented a method called `create_fold_setting_cold_multi` that splits on multiple instances. I tried to follow the package conventions in coding style. The code logic is to iterate over all entities on which the split should be done (e.g., `['Drug_ID', 'Cell Line_ID']`) and randomly assign entity instances to test. The test set is created by selecting samples where all entity instances belong to test. This is then repeated for validation.
The proposed method is strictly more powerful than the current `create_fold_setting_cold`. Indeed, if a single entity is passed for splitting, the result is always identical. I verified this for multiple configurations.
Therefore, I would suggest replacing the body of the current `create_fold_setting_cold` with that of the new `create_fold_setting_cold_multi`. This would be perfectly backwards compatible. The only difference would be that for the positional `entity` argument, instead of a `str` you could also pass a list of strings. The beauty of this solution is that it would avoid duplicate code snippets and require minimal updates in the existing codebase. Let me know what you think.
ToDo:
`create_fold_setting_cold` is superseded by `create_fold_setting_cold_multi`
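
For reviewers, the splitting logic described under "Done" can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code in this PR: the function name `cold_split_multi`, its signature, the use of plain lists of dicts instead of a DataFrame, and the choice to drop rows that share any held-out value (so that the split stays cold) are all assumptions made for the example.

```python
import random

def cold_split_multi(rows, entities, frac=(0.7, 0.1, 0.2), seed=42):
    """Hypothetical sketch of a cold split over one or more entity columns."""
    # Accept a single column name as well as a list of names,
    # mirroring the backwards-compatible interface proposed above.
    if isinstance(entities, str):
        entities = [entities]
    _, val_frac, test_frac = frac
    rng = random.Random(seed)

    def hold_out(pool, fraction):
        # For every entity column, randomly pick a share of its unique values.
        chosen = {}
        for entity in entities:
            unique = sorted({row[entity] for row in pool})
            chosen[entity] = set(rng.sample(unique, round(len(unique) * fraction)))
        # A row is held out only if ALL of its entity values were picked.
        held = [r for r in pool if all(r[e] in chosen[e] for e in entities)]
        # Assumption: rows sharing ANY picked value are removed from the remainder.
        rest = [r for r in pool if not any(r[e] in chosen[e] for e in entities)]
        return held, rest

    test, train_val = hold_out(rows, test_frac)
    # Rescale the validation fraction to the rows that are left after the test split.
    valid, train = hold_out(train_val, val_frac / (1 - test_frac))
    return {"train": train, "valid": valid, "test": test}

# Toy data: every combination of 10 drugs and 10 cell lines.
rows = [{"Drug_ID": f"D{i}", "Cell Line_ID": f"C{j}"}
        for i in range(10) for j in range(10)]
split = cold_split_multi(rows, ["Drug_ID", "Cell Line_ID"])
print({name: len(part) for name, part in split.items()})
```

On this toy grid, only rows whose drug and cell line both fall into the same partition are kept, so the three partitions together contain fewer rows than the input. That is the price of a split that is cold on several entities at once, and it is worth keeping in mind when choosing the fractions.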