🚧 Add `drug_interactions` #12

hayesall · 2021-11-30T19:35:18Z

This modifies the ddi data on the starling page to pass the linting checks.

Here's a sample of the original facts:

Target("3-hydroxy-3-methylglutaryl-coenzyme_A_reductase","Pravastatin").
Target("Gamma-aminobutyric_acid_type_B_receptor_subunit_1","Baclofen").
Target("Gamma-aminobutyric_acid_type_B_receptor_subunit_2","Baclofen").
Target("Synaptic_vesicular_amine_transporter","Amphetamine").
Target("Sodium-dependent_dopamine_transporter","Amphetamine").
Target("Cocaine-_and_amphetamine-regulated_transcript_protein","Amphetamine").
Target("Trace_amine-associated_receptor_1","Amphetamine").

and here is what it looks like following the changes:

target(3_hydroxy_3_methylglutaryl_coenzyme_a_reductase,pravastatin).
target(gamma_aminobutyric_acid_type_b_receptor_subunit_1,baclofen).
target(gamma_aminobutyric_acid_type_b_receptor_subunit_2,baclofen).
target(synaptic_vesicular_amine_transporter,amphetamine).
target(sodium_dependent_dopamine_transporter,amphetamine).
target(cocaine__and_amphetamine_regulated_transcript_protein,amphetamine).
target(trace_amine_associated_receptor_1,amphetamine).

This was done by replacing - with underscores _, removing " and /, and converting everything to lowercase:

corrected = [name.replace("-", "_").replace('"', "").replace("/", "").lower() for name in names]
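To make the effect on a whole fact concrete, here is a minimal sketch that wraps the same replacements in two helpers. The names `normalize_name` and `normalize_fact` are hypothetical and are not part of the script in this PR; the sketch also assumes each fact has the simple `Head(arg,arg).` shape shown in the samples.

```python
def normalize_name(name: str) -> str:
    """Apply the same replacements as the list comprehension above."""
    return name.replace("-", "_").replace('"', "").replace("/", "").lower()


def normalize_fact(line: str) -> str:
    """Rewrite one fact such as Target("A-b","C"). into its cleaned form.

    Assumes a single predicate with comma-separated arguments and no
    commas or parentheses inside the argument names.
    """
    head, tail = line.split("(", 1)
    args, _ = tail.rsplit(")", 1)
    cleaned = [normalize_name(arg) for arg in args.split(",")]
    return f"{head.lower()}({','.join(cleaned)})."


print(normalize_fact('Target("Trace_amine-associated_receptor_1","Amphetamine").'))
```

Run on the last sample fact, this should reproduce the corresponding cleaned line shown earlier.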

As a precaution, I recorded each "corrected" value in a dictionary and stopped with an error if the same corrected key was mapped to two different values in the original set:

{
 "3_hydroxy_3_methylglutaryl_coenzyme_a_reductase": [
  "\"3-hydroxy-3-methylglutaryl-coenzyme_A_reductase\""
 ],
 "pravastatin": [
  "\"Pravastatin\""
 ],
 "gamma_aminobutyric_acid_type_b_receptor_subunit_1": [
  "\"Gamma-aminobutyric_acid_type_B_receptor_subunit_1\""
 ],
 "baclofen": [
  "\"Baclofen\""
 ]
}

I was worried there might be cases like Warfarin/other and Warfarin_other that would get mapped into the same bucket, but this did not occur, so the structures should be equivalent up to renaming.
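The collision check can be sketched on its own as follows. This is an illustration only, not the code from this PR: `find_collisions` is a hypothetical helper, and the input names are made up to force a collision (they are not taken from the dataset).

```python
from collections import defaultdict


def normalize_name(name: str) -> str:
    # Same replacements as the cleaning step in this PR.
    return name.replace("-", "_").replace('"', "").replace("/", "").lower()


def find_collisions(names):
    """Return {corrected: originals} for corrected names with more than one original."""
    mapping = defaultdict(set)
    for old in names:
        mapping[normalize_name(old)].add(old)
    return {new: olds for new, olds in mapping.items() if len(olds) > 1}


# Hypothetical inputs: the first two differ only by "-" versus "_".
hypothetical = ['"Warfarin-other"', '"Warfarin_other"', '"Baclofen"']
collisions = find_collisions(hypothetical)
print(collisions)
```

An empty result means every corrected name traces back to exactly one original name, which is the property the renaming relies on.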

The code I used to do this is copied below, but it isn't interesting enough to commit to the repository:

Python script to clean DDI data to pass linter

from collections import defaultdict


def load_file(filename):
    with open(filename, "r") as fh:
        return fh.read().splitlines()

def split_into_parts(input_line):
    head, tail = input_line.split("(")
    first, _ = tail.split(")")
    names = first.split(",")

    correct_head = head.lower()
    corrected = [name.replace("-", "_").replace('"', "").replace("/", "").lower() for name in names]

    return names, correct_head, corrected

if __name__ == "__main__":

    mapping = defaultdict(set)
    output = []

    for line in load_file("drug_interactions/train/train_facts.txt"):

        values, correct_head, corrected = split_into_parts(line)

        # Assert that a "new" key doesn't map to two "old" keys.
        for old, new in zip(values, corrected):
            mapping[new].add(old)

            if len(mapping[new]) > 1:
                print("Encountered duplicate")
                print(mapping[new])
                exit(2)

        result = f"{correct_head}({','.join(corrected)})."
        output.append(result)

    with open("../ddi2/ddi2/train/train_facts.txt", "w") as fh:
        for line in output:
            fh.write(line + "\n")

hayesall added 4 commits August 27, 2021 17:25

🚧 Add drug_interactions

051af07

🚚 Move background.txt for ddi

9563e39

♻️ Rework ddi data to pass data linting

f5a658c

📝 Add dataset notes for DDI

4e7ef34

hayesall merged commit e6e41ec into main Nov 30, 2021

hayesall deleted the ddi branch November 30, 2021 21:00
