Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support strict split for drug synergy prediction task #136

Closed
ZOE-V opened this issue Jan 20, 2022 · 1 comment
Closed

Support strict split for drug synergy prediction task #136

ZOE-V opened this issue Jan 20, 2022 · 1 comment
Assignees
Labels
bug Something isn't working

Comments

@ZOE-V
Copy link
Contributor

ZOE-V commented Jan 20, 2022

Describe the bug
The current splitting strategies we can use on the available datasets for drug synergy prediction tasks are limited to:

  • the classic random split
  • combination split, such that test samples contain drug pairs unseen to the model after training
  • cold split which can only split by instances of one modality, but not on multiple modalities, such that test samples contain either drugs or cell lines unseen to the model after training
    Hence, OncoPolyPharmacology and DrugComb datasets cannot be split such that test samples contain drugs and cell lines that are both unseen to the model after training.

Additional information
This issue was already addressed for the drug response prediction task by @jannisborn: [https://github.com//issues/126].

Idea of solution
A functionality like for the drug response prediction task. For example:
split = synergy_data.get_split(method = 'cold_split', column_names = ['Drug1_ID', 'Drug2_ID', 'Cell_Line_ID'])
(Which currently yields to AttributeError: Please select from random_split, or cold_split, if cold split. please specify the column name!)
I believe that the issue is related to the conditions on line 96 in https://github.com/mims-harvard/TDC/blob/main/tdc/multi_pred/multi_pred_dataset.py. I suggest to replace lines 96-97-98 by something like:

#cast to list if needed
if isinstance(column_name, str):
    column_name = [column_name]
    
#test whether all columns are present in df
if (column_name is not None) and all([x in df.columns.values for x in column_names]): 
    if method == 'cold_split':		
	    return create_fold_setting_cold(df, seed, frac, column_name)

Do you believe this would be a valuable contribution to the package?

Thank you for your help.

Bests,

Zoe

@kexinhuang12345
Copy link
Collaborator

Hi Zoe! I think this would be great! I would suggest editing to something Jannis made to the bi_pred_dataset.py class:

elif method == 'cold_split':
if (
column_name is None or
not all(list(map(lambda x: x in df.columns.values, column_name)))
):
raise AttributeError(
"For cold_split, please provide one or multiple column names "
"that are contained in the dataframe."
)
return create_fold_setting_cold(df, seed, frac, column_name)

Feel free to make a PR! Thanks in advance!

@kexinhuang12345 kexinhuang12345 added the bug Something isn't working label Jan 21, 2022
ZOE-V added a commit to ZOE-V/TDC that referenced this issue Jan 21, 2022
Editing to suggest a way to solve issue raised in mims-harvard#136.
@kexinhuang12345 kexinhuang12345 self-assigned this Jan 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants