Same drug-target pair has different affinities in Davis #98

@luoyunan

Describe the bug
The Davis dataset is expected to contain a unique affinity value for each drug-target pair. However, the TDC version contains duplicated drug-target pairs with different affinity values.

To Reproduce

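from tdc.multi_pred import DTI

# Load the Davis drug-target interaction dataset
data = DTI('DAVIS', path='./data/TDC')
df = data.get_data()
# Drop the SMILES and protein-sequence columns to keep the output readable
df = df.drop(columns=['Drug', 'Target'])
# Select all rows for a single drug-target pair
df = df[(df['Drug_ID'] == 25243800) & (df['Target_ID'] == 'RET(V804M)')]
print(df)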

Expected behavior
Each drug-target pair should have a single affinity value. Instead, the code above prints the output below, in which four different Y values are assigned to drug 25243800 and target RET(V804M).

        Drug_ID   Target_ID      Y
18196  25243800  RET(V804M)    4.8
18197  25243800  RET(V804M)    4.0
18198  25243800  RET(V804M)  350.0
18199  25243800  RET(V804M)  340.0
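
To see whether other pairs are affected, the same check can be generalized to the whole table. The following is a minimal sketch, not part of TDC: the helper name `find_conflicting_pairs` is hypothetical, and the toy frame only reuses the four rows printed above plus one made-up row (`HYPOTHETICAL_TARGET`) so that the example runs without downloading the dataset.

```python
import pandas as pd


def find_conflicting_pairs(df, keys=("Drug_ID", "Target_ID"), value="Y"):
    """Return the rows whose (drug, target) pair has more than one distinct value.

    Hypothetical helper for illustration; it is not a TDC function.
    """
    # Count the distinct affinity values within each (drug, target) group
    n_unique = df.groupby(list(keys))[value].transform("nunique")
    return df[n_unique > 1]


# Toy data: the four conflicting rows from the output above,
# plus one made-up row that has a unique affinity value.
toy = pd.DataFrame(
    {
        "Drug_ID": [25243800, 25243800, 25243800, 25243800, 1],
        "Target_ID": ["RET(V804M)"] * 4 + ["HYPOTHETICAL_TARGET"],
        "Y": [4.8, 4.0, 350.0, 340.0, 10.0],
    }
)

conflicts = find_conflicting_pairs(toy)
print(conflicts)
```

On the real data, passing the frame returned by `data.get_data()` should list every affected pair, assuming the column names are the same as in the output above.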

Environment:

  • TDC version: 0.3.0
  • davis.tab version on dataverse: 2021-01-09 (UNF:6:x6TTv0Um70rEZT/eL8eCtA==)

Additional context
Compared with the raw data of the Davis et al. paper, the four affinity values shown above appear to belong to targets RET, RET(M918T), RET(V804L), and RET(V804M), respectively. It seems that the target IDs of these rows were all overwritten with RET(V804M).
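
One way to probe the overwrite hypothesis is to count, for a given drug, how many rows each RET variant has. The sketch below is an assumption-laden illustration: the helper `target_row_counts` is hypothetical, and the toy frame contains only the four rows reported above, so it says nothing about what the full dataset holds for the other variants.

```python
import pandas as pd

# RET variants named in this issue
RET_VARIANTS = ["RET", "RET(M918T)", "RET(V804L)", "RET(V804M)"]


def target_row_counts(df, drug_id, targets):
    """Count rows per target for one drug (hypothetical helper, not a TDC function)."""
    sub = df[df["Drug_ID"] == drug_id]
    counts = sub["Target_ID"].value_counts()
    # Targets that never occur for this drug get a count of 0
    return {t: int(counts.get(t, 0)) for t in targets}


# Toy data: only the four rows shown in the output above
toy = pd.DataFrame(
    {
        "Drug_ID": [25243800] * 4,
        "Target_ID": ["RET(V804M)"] * 4,
        "Y": [4.8, 4.0, 350.0, 340.0],
    }
)

row_counts = target_row_counts(toy, 25243800, RET_VARIANTS)
print(row_counts)
```

If the real data shows four rows for RET(V804M) and none for the other three variants, that would be consistent with the target IDs having been overwritten.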

Labels

bug (Something isn't working)
