Describe the bug
The Davis dataset is expected to contain a single affinity value for each drug-target pair. However, the TDC version contains duplicated drug-target pairs with different affinity values.
To Reproduce
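from tdc.multi_pred import DTI

# Load the DAVIS drug-target interaction dataset from TDC
data = DTI('DAVIS', path='./data/TDC')
df = data.get_data()
# Drop the SMILES and protein-sequence columns to keep the output readable
df = df.drop(columns=['Drug', 'Target'])
# Select every row for drug 25243800 and target RET(V804M)
df = df[(df['Drug_ID'] == 25243800) & (df['Target_ID'] == 'RET(V804M)')]
print(df)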
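To check the whole dataset rather than one pair, a small helper can list every drug-target pair that occurs more than once. This is a minimal sketch: `find_conflicting_pairs` is a hypothetical helper (not part of TDC), and the toy frame reuses the four rows reported in this issue plus one made-up unique pair (`Drug_ID` 1, `EXAMPLE_KINASE`) so it runs without downloading the data.

```python
import pandas as pd


def find_conflicting_pairs(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return rows whose (Drug_ID, Target_ID) pair occurs more than once."""
    # keep=False flags every member of a duplicated group, not only the later repeats
    mask = df.duplicated(subset=["Drug_ID", "Target_ID"], keep=False)
    return df[mask].sort_values(["Drug_ID", "Target_ID"])


# Toy frame: the four rows reported in this issue plus one made-up unique pair
toy = pd.DataFrame(
    {
        "Drug_ID": [25243800, 25243800, 25243800, 25243800, 1],
        "Target_ID": ["RET(V804M)"] * 4 + ["EXAMPLE_KINASE"],
        "Y": [4.8, 4.0, 350.0, 340.0, 10000.0],
    },
    index=[18196, 18197, 18198, 18199, 0],
)

conflicts = find_conflicting_pairs(toy)
print(conflicts)
# Number of distinct affinity values per duplicated pair
print(conflicts.groupby(["Drug_ID", "Target_ID"])["Y"].nunique())
```

On the real data, the same call would be `find_conflicting_pairs(data.get_data())`.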
Expected behavior
Running the code above prints the output below: four different Y values are labeled for the same pair, drug 25243800 and target RET(V804M).
       Drug_ID   Target_ID      Y
18196  25243800  RET(V804M)    4.8
18197  25243800  RET(V804M)    4.0
18198  25243800  RET(V804M)  350.0
18199  25243800  RET(V804M)  340.0
Environment:
- TDC version: 0.3.0
- davis.tab version on Dataverse: 2021-01-09 (UNF:6:x6TTv0Um70rEZT/eL8eCtA==)
Additional context
Compared with the raw data from the Davis et al. paper, the four affinity values shown above appear to belong to targets RET, RET(M918T), RET(V804L), and RET(V804M), respectively. It seems the target IDs of all four rows were overwritten with RET(V804M).
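The suggested reassignment can be checked on the four affected rows. This is a hypothetical sketch, assuming (as suggested above) that rows 18196 to 18199 appear in the raw-data order RET, RET(M918T), RET(V804L), RET(V804M); it only shows that relabeling restores unique pairs and is not a confirmed fix for the TDC file.

```python
import pandas as pd

# The four rows reported above, all currently labeled RET(V804M)
rows = pd.DataFrame(
    {
        "Drug_ID": [25243800] * 4,
        "Target_ID": ["RET(V804M)"] * 4,
        "Y": [4.8, 4.0, 350.0, 340.0],
    },
    index=[18196, 18197, 18198, 18199],
)

# Assumption: target order taken from the comparison with the Davis et al. raw data
proposed_targets = ["RET", "RET(M918T)", "RET(V804L)", "RET(V804M)"]

fixed = rows.copy()
fixed.loc[[18196, 18197, 18198, 18199], "Target_ID"] = proposed_targets

# After relabeling, no (Drug_ID, Target_ID) pair should be duplicated
print(fixed)
print(fixed.duplicated(subset=["Drug_ID", "Target_ID"]).any())
```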