## DGIdb

You can download the data from [https://www.dgidb.org/downloads](https://www.dgidb.org/downloads).

In the era of clinical sequencing and personalized medicine, investigators are frequently presented with lists of mutated or otherwise altered genes implicated in disease for a specific patient or cohort. Numerous resources exist to help form hypotheses about how such genomic events might be targeted therapeutically. However, using these resources typically involves tedious manual review of literature, clinical trial records, and knowledge bases. Few tools collect and curate these resources and provide a simple interface for searching a list of genes against the existing compendia of known or potential drug-gene interactions. The Drug-Gene Interaction Database (DGIdb) attempts to address this challenge. Using a combination of **expert curation** and **text-mining**, drug-gene interactions have been mined from DrugBank, PharmGKB, ChEMBL, Drug Target Commons, and others. Genes are also categorized as potentially druggable according to membership in selected pathways, molecular functions, and gene families from the Gene Ontology, the Human Protein Atlas, IDG, the "druggable genome" lists from Hopkins and Groom (2002) and Russ and Lampel (2005), and others. Drug and gene grouping is provided by the VICC Gene and Therapy Normalizer services. DGIdb contains over 10,000 genes and 20,000 drugs involved in nearly 70,000 drug-gene interactions or belonging to one of 43 potentially druggable gene categories.

### Download the data

In [1]:
!wget https://www.dgidb.org/data/latest/interactions.tsv -O interactions.tsv
!wget https://www.dgidb.org/data/latest/genes.tsv -O genes.tsv
!wget https://www.dgidb.org/data/latest/drugs.tsv -O drugs.tsv
!wget https://www.dgidb.org/data/latest/categories.tsv -O categories.tsv

--2024-10-02 22:49:55--  https://www.dgidb.org/data/latest/interactions.tsv
Resolving www.dgidb.org (www.dgidb.org)... 52.36.252.244
Connecting to www.dgidb.org (www.dgidb.org)|52.36.252.244|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12178745 (12M) [application/octet-stream]
Saving to: ‘interactions.tsv’


2024-10-02 22:50:15 (604 KB/s) - ‘interactions.tsv’ saved [12178745/12178745]

--2024-10-02 22:50:16--  https://www.dgidb.org/data/latest/genes.tsv
Resolving www.dgidb.org (www.dgidb.org)... 52.36.252.244
Connecting to www.dgidb.org (www.dgidb.org)|52.36.252.244|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4356295 (4.2M) [application/octet-stream]
Saving to: ‘genes.tsv’


2024-10-02 22:50:22 (758 KB/s) - ‘genes.tsv’ saved [4356295/4356295]

--2024-10-02 22:50:22--  https://www.dgidb.org/data/latest/drugs.tsv
Resolving www.dgidb.org (www.dgidb.org)... 52.36.252.244
Connecting to www.dgidb.org (www.dgidb.org)|52.36.252.244|:44
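If `wget` is not available (for example on Windows or in a minimal container), the same four files can be fetched from Python. This is a minimal sketch using only the standard library; it assumes the `https://www.dgidb.org/data/latest/<table>.tsv` URLs used above stay valid, and `dgidb_urls` / `download_dgidb` are hypothetical helper names, not part of any DGIdb tool.

```python
import os
import urllib.request

# URL pattern and table names taken from the wget commands above.
BASE_URL = "https://www.dgidb.org/data/latest"
TABLES = ("interactions", "genes", "drugs", "categories")


def dgidb_urls(base_url=BASE_URL, tables=TABLES):
    """Return (url, local_filename) pairs for the DGIdb TSV tables."""
    return [(f"{base_url}/{name}.tsv", f"{name}.tsv") for name in tables]


def download_dgidb(dest_dir=".", overwrite=False):
    """Download every table into dest_dir, skipping files already present."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for url, filename in dgidb_urls():
        path = os.path.join(dest_dir, filename)
        if overwrite or not os.path.exists(path):
            urllib.request.urlretrieve(url, path)
        paths.append(path)
    return paths
```

Calling `download_dgidb()` from the notebook directory leaves the files where the later `pd.read_csv("interactions.tsv", ...)` cell expects them; skipping existing files avoids re-downloading on every run.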

### Reformat the data files into the BioMedGPS format

The knowledge-graph file format is described in the [BioMedGPS data documentation](https://open-prophetdb.github.io/biomedgps-data/graph_data_index/#knowledge-graph-file).

**Examples:**

| relation_type                  | resource | source_id | source_type | target_id   | target_type | source_name                    | target_name |
|--------------------------------|----------|-----------|-------------|-------------|-------------|--------------------------------|-------------|
| DGIDB::INHIBITOR::Gene:Compound| DGIDB    | ENTREZ:4311 | Gene        | MESH:D015244| Compound    | membrane metalloendopeptidase  | Thiorphan   |
| DGIDB::INHIBITOR::Gene:Compound| DGIDB    | ENTREZ:4311 | Gene        | MESH:C097292| Compound    | membrane metalloendopeptidase  | aladotrilat |

**NOTE:**
> You can learn more about the relation types on the [DGIdb website](https://www.dgidb.org/about#interaction-types).
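Judging from the example rows, a `relation_type` value packs four fields into one string: `RESOURCE::RELATION::SourceType:TargetType`. The helpers below are a hypothetical sketch (not part of BioMedGPS) for splitting and rebuilding such strings, which is handy for checking that the entity types encoded in a relation agree with the `source_type` and `target_type` columns of the same row.

```python
from typing import NamedTuple


class RelationType(NamedTuple):
    resource: str
    relation: str
    source_type: str
    target_type: str


def parse_relation_type(value: str) -> RelationType:
    """Split 'RESOURCE::RELATION::SourceType:TargetType' into its four fields."""
    resource, relation, entity_pair = value.split("::")
    source_type, target_type = entity_pair.split(":")
    return RelationType(resource, relation, source_type, target_type)


def build_relation_type(rel: RelationType) -> str:
    """Inverse of parse_relation_type."""
    return f"{rel.resource}::{rel.relation}::{rel.source_type}:{rel.target_type}"


def types_match(value: str, source_type: str, target_type: str) -> bool:
    """True if the entity types encoded in the relation match the row's types."""
    rel = parse_relation_type(value)
    return (rel.source_type, rel.target_type) == (source_type, target_type)
```

For example, `types_match("DGIDB::INHIBITOR::Gene:Compound", "Gene", "Compound")` is `True`, while a relation encoded as `Compound:Gene` on a Gene-to-Compound row returns `False`.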


In [None]:
import os
import os.path as osp
import subprocess


def format_dgidb(filename):
    """Run graph-builder on a formatted DGIdb relation file.

    The project root is taken to be two levels above the current working
    directory, i.e. this notebook is expected to run from
    <project_root>/relations/dgidb.
    """
    root_dir = osp.dirname(osp.dirname(os.getcwd()))
    print(f"Project root directory: {root_dir}")

    database = "customdb"
    relations_path = osp.join(root_dir, "relations", "dgidb", filename)
    output_dir = osp.join(root_dir, "formatted_relations", "dgidb")
    entities_path = osp.join(root_dir, "entities.tsv")
    log_file = osp.join(output_dir, "log.txt")

    # Create the output directory up front so the log file can be written there.
    os.makedirs(output_dir, exist_ok=True)

    command = [
        "graph-builder",
        "--database", database,
        "-d", relations_path,
        "-o", output_dir,
        "-f", entities_path,
        "-n", "20",
        "--download",
        "--skip",
        "-l", log_file,
        "--debug",
    ]

    print("Executing command:", " ".join(command))

    # Raise exceptions instead of calling exit(), so a failure shows up as a
    # normal traceback rather than terminating the interpreter or kernel.
    try:
        subprocess.run(command, check=True)
    except FileNotFoundError as e:
        raise RuntimeError(
            "'graph-builder' command not found. "
            "Make sure it is installed and available in the PATH."
        ) from e
    except subprocess.CalledProcessError as e:
        # Output is not captured (e.output is None), so graph-builder's own
        # messages have already been printed above this error.
        raise RuntimeError(
            f"graph-builder failed with return code {e.returncode}"
        ) from e
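`format_dgidb` only discovers a missing `graph-builder` binary or a missing input file once the subprocess is launched. A small pre-flight check, sketched below with the hypothetical helper `missing_prerequisites`, reports those problems first. It rebuilds the relation-file path the same way `format_dgidb` does; it deliberately does not check `entities.tsv`, since this notebook does not say whether `graph-builder` expects that file to exist beforehand.

```python
import os.path as osp
import shutil


def missing_prerequisites(root_dir, filename, executable="graph-builder"):
    """List problems that would make format_dgidb fail before any work is done.

    Mirrors the relation-file path built in format_dgidb:
    <root_dir>/relations/dgidb/<filename>.
    """
    problems = []
    # shutil.which returns None when the executable is not found on PATH.
    if shutil.which(executable) is None:
        problems.append(f"executable not found on PATH: {executable}")
    relations_path = osp.join(root_dir, "relations", "dgidb", filename)
    if not osp.isfile(relations_path):
        problems.append(f"relation file not found: {relations_path}")
    return problems
```

In the notebook, `missing_prerequisites(osp.dirname(osp.dirname(os.getcwd())), "formatted_dgidb.tsv")` should return an empty list before `format_dgidb` is called.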

In [1]:
import pandas as pd

df = pd.read_csv("interactions.tsv", sep="\t")
df.head()

| | gene_claim_name | gene_concept_id | gene_name | interaction_source_db_name | interaction_source_db_version | interaction_type | interaction_score | drug_claim_name | drug_concept_id | drug_name | approved | immunotherapy | anti_neoplastic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CYP2D6 | hgnc:2625 | CYP2D6 | DTC | 9/2/20 | | 0.017709 | RACLOPRIDE | ncit:C152139 | RACLOPRIDE | False | False | False |
| 1 | PPARG | hgnc:9236 | PPARG | DTC | 9/2/20 | | 0.840123 | KALOPANAX-SAPONIN F | chembl:CHEMBL1833984 | CHEMBL:CHEMBL1833984 | False | False | False |
| 2 | ATAD5 | hgnc:25752 | ATAD5 | DTC | 9/2/20 | | 0.177992 | RO-5-3335 | chembl:CHEMBL91609 | CHEMBL:CHEMBL91609 | False | False | False |
| 3 | RGS4 | hgnc:10000 | RGS4 | DTC | 9/2/20 | | 0.034319 | 3,4-DICHLOROISOCOUMARIN | drugbank:DB04459 | 3,4-DICHLOROISOCOUMARIN | False | False | False |
| 4 | MAPK1 | hgnc:6871 | MAPK1 | DTC | 9/2/20 | | 0.050007 | WITHAFERIN A | iuphar.ligand:13097 | WITHAFERIN A | False | False | False |
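DGIdb's main use case is looking up a list of genes against known interactions. With the table loaded, a simple version of that lookup is a filter on `gene_name`. This is a minimal sketch: `lookup_genes` and its `min_score` cut-off are illustrative, and it relies only on the columns visible in the preview above.

```python
import pandas as pd


def lookup_genes(interactions: pd.DataFrame, genes, min_score=None):
    """Return interactions whose gene_name is in `genes` (case-insensitive).

    Optionally keeps only rows with interaction_score >= min_score.
    """
    wanted = {g.upper() for g in genes}
    # Missing gene names stay NaN after .str.upper() and never match.
    hits = interactions[interactions["gene_name"].str.upper().isin(wanted)]
    if min_score is not None:
        hits = hits[hits["interaction_score"] >= min_score]
    cols = ["gene_name", "drug_name", "interaction_type", "interaction_score"]
    # Group by gene, highest-scoring interactions first.
    return hits[cols].sort_values(
        ["gene_name", "interaction_score"], ascending=[True, False]
    )
```

For example, `lookup_genes(df, ["MAPK1", "PPARG"], min_score=0.1)` returns the matching rows for those two genes, strongest scores first.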


In [2]:
df["interaction_type"].unique()

array([nan, 'agonist', 'inhibitor', 'activator', 'blocker',
       'immunotherapy', 'antibody', 'modulator', 'negative modulator',
       'positive modulator', 'potentiator', 'cleavage', 'inverse agonist',
       'other/unknown', 'binder', 'vaccine', 'antisense oligonucleotide'],
      dtype=object)
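The `nan` entry above corresponds to interactions without a recorded type (all five DTC rows in the preview are like this). Before deciding how to map each type, it helps to count rows per type, including the missing ones. A short sketch; the `<missing>` label is an arbitrary placeholder chosen here.

```python
import pandas as pd


def interaction_type_counts(interactions: pd.DataFrame) -> pd.Series:
    """Number of rows per interaction_type; missing types are labelled '<missing>'."""
    return interactions["interaction_type"].fillna("<missing>").value_counts()
```

Running `interaction_type_counts(df)` shows how many rows each mapping decision in the next cell affects.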

In [3]:
formatted_df = pd.DataFrame()
formatted_df["source_name"] = df["gene_claim_name"]
formatted_df["source_type"] = "Gene"
formatted_df["source_id"] = df["gene_concept_id"]
formatted_df["target_name"] = df["drug_name"]
formatted_df["target_type"] = "Compound"
formatted_df["target_id"] = df["drug_concept_id"]

# Map DGIdb interaction types to BioMedGPS relation types.
# Types without a BioMedGPS equivalent map to None, which becomes NaN and is
# removed by the relation_type.notna() filter in a later cell. An empty string
# would pass that filter and end up in the output file.
relation_type_map = {
    # Missing interaction types are NaN, not the string "nan", so this key never
    # matches and those rows are dropped as well. Use
    # df["interaction_type"].fillna("nan") before mapping to keep them as OTHER.
    "nan": "DGIDB::OTHER::Gene:Compound",
    "agonist": "DGIDB::AGONIST::Gene:Compound",
    "inhibitor": "DGIDB::INHIBITOR::Gene:Compound",
    "activator": "DGIDB::ACTIVATOR::Gene:Compound",
    "blocker": "DGIDB::BLOCKER::Gene:Compound",
    "immunotherapy": None,
    "antibody": "DGIDB::ANTIBODY::Gene:Compound",
    "modulator": "DGIDB::MODULATOR::Gene:Compound",
    "negative modulator": "DGIDB::MODULATOR::Gene:Compound",
    "positive modulator": "DGIDB::POSITIVE ALLOSTERIC MODULATOR::Gene:Compound",
    "potentiator": "DGIDB::AGONIST::Gene:Compound",
    "cleavage": "CTD::increases^cleavage::Compound:Gene",
    "inverse agonist": "DGIDB::ANTAGONIST::Gene:Compound",
    "other/unknown": "DGIDB::OTHER::Gene:Compound",
    "binder": "DGIDB::BINDER::Gene:Compound",
    "vaccine": None,
    "antisense oligonucleotide": None,
}
formatted_df["relation_type"] = df["interaction_type"].str.lower().map(relation_type_map)
formatted_df.head()

| | source_name | source_type | source_id | target_name | target_type | target_id | relation_type |
|---|---|---|---|---|---|---|---|
| 0 | CYP2D6 | Gene | hgnc:2625 | RACLOPRIDE | Compound | ncit:C152139 | |
| 1 | PPARG | Gene | hgnc:9236 | CHEMBL:CHEMBL1833984 | Compound | chembl:CHEMBL1833984 | |
| 2 | ATAD5 | Gene | hgnc:25752 | CHEMBL:CHEMBL91609 | Compound | chembl:CHEMBL91609 | |
| 3 | RGS4 | Gene | hgnc:10000 | 3,4-DICHLOROISOCOUMARIN | Compound | drugbank:DB04459 | |
| 4 | MAPK1 | Gene | hgnc:6871 | WITHAFERIN A | Compound | iuphar.ligand:13097 | |
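Two pandas behaviors shape the result of the mapping cell. First, `Series.str.lower()` leaves missing values as NaN, and `Series.map` with a dict looks values up as-is, so NaN never matches the string key `"nan"`; that is why `relation_type` is empty in the preview above. Second, `.notna()` treats an empty string as a valid value. The toy example below demonstrates both and shows one way to handle them (`fillna` before mapping, `mask` after). Whether unlabeled interactions should be kept as `OTHER` is a modeling choice the notebook itself does not settle.

```python
import numpy as np
import pandas as pd

types = pd.Series([np.nan, "Inhibitor", "vaccine"])
mapping = {
    "nan": "DGIDB::OTHER::Gene:Compound",
    "inhibitor": "DGIDB::INHIBITOR::Gene:Compound",
    "vaccine": "",
}

# NaN stays NaN after .str.lower(), and the dict lookup never matches it
# against the string key "nan".
direct = types.str.lower().map(mapping)

# Filling missing values with the string "nan" first routes them to that key.
filled = types.fillna("nan").str.lower().map(mapping)

# "" is not NA, so .notna() would keep it; mask() turns it into NaN.
cleaned = filled.mask(filled == "")
```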


In [4]:
formatted_df["target_id"].str.split(":").str[0].unique()

array(['ncit', 'chembl', 'drugbank', 'iuphar.ligand', 'rxcui', 'hemonc',
       'wikidata', nan, 'drugsatfda.nda', 'chemidplus'], dtype=object)

In [5]:
formatted_df["source_id"] = formatted_df["source_id"].str.replace("ncbigene:", "ENTREZ:")
formatted_df["source_id"] = formatted_df["source_id"].str.replace("hgnc:", "HGNC:")
formatted_df["target_id"] = formatted_df["target_id"].str.replace("ncit:", "NCIT:")
formatted_df["target_id"] = formatted_df["target_id"].str.replace("chembl:", "CHEMBL:")
formatted_df["target_id"] = formatted_df["target_id"].str.replace("rxcui:", "RXCUI:")
formatted_df["target_id"] = formatted_df["target_id"].str.replace("drugbank:", "DrugBank:")
formatted_df = formatted_df[formatted_df["source_id"].notna()]
formatted_df = formatted_df[formatted_df["target_id"].notna()]
formatted_df = formatted_df[formatted_df["relation_type"].notna()]
formatted_df["resource"] = "DGIDB"

formatted_df

| | source_name | source_type | source_id | target_name | target_type | target_id | relation_type | resource |
|---|---|---|---|---|---|---|---|---|
| 8 | NCBIGENE:318 | Gene | HGNC:8049 | ETORPHINE | Compound | NCIT:C80578 | DGIDB::AGONIST::Gene:Compound | DGIDB |
| 9 | NCBIGENE:1838 | Gene | HGNC:3058 | COMPOUND 8E [PMID: 24432909] | Compound | iuphar.ligand:8137 | DGIDB::INHIBITOR::Gene:Compound | DGIDB |
| 10 | NCBIGENE:2159 | Gene | HGNC:3528 | GDC-0339 | Compound | iuphar.ligand:12708 | DGIDB::INHIBITOR::Gene:Compound | DGIDB |
| 11 | NCBIGENE:749 | Gene | HGNC:10485 | RYANODINE | Compound | iuphar.ligand:4303 | DGIDB::ACTIVATOR::Gene:Compound | DGIDB |
| 22 | NCBIGENE:358 | Gene | HGNC:633 | [125I]TYR11-SRIF-14 | Compound | iuphar.ligand:2060 | DGIDB::AGONIST::Gene:Compound | DGIDB |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 98206 | NCBIGENE:1503 | Gene | HGNC:2519 | GSK269962A | Compound | iuphar.ligand:8037 | DGIDB::INHIBITOR::Gene:Compound | DGIDB |
| 98207 | NCBIGENE:1504 | Gene | HGNC:2521 | COMPOUND 8 [PMID: 25898023] | Compound | iuphar.ligand:10498 | DGIDB::INHIBITOR::Gene:Compound | DGIDB |
| 98213 | NCBIGENE:737 | Gene | HGNC:20573 | CARBENOXOLONE | Compound | NCIT:C63669 | DGIDB::INHIBITOR::Gene:Compound | DGIDB |
| 98223 | NCBIGENE:2184 | Gene | HGNC:3579 | GW5074 | Compound | iuphar.ligand:8072 | DGIDB::INHIBITOR::Gene:Compound | DGIDB |
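The chained `str.replace` calls rewrite a substring wherever it occurs in an identifier. An alternative that touches only the prefix (the text before the first `:`) is sketched below. `PREFIX_MAP` repeats the renames from the cell above, and prefixes not listed (for example `iuphar.ligand`) are left unchanged, as in the original cell; the helper name `normalize_curie` is hypothetical.

```python
import pandas as pd

# Same renames as the str.replace cell above; any other prefix is kept as-is.
PREFIX_MAP = {
    "ncbigene": "ENTREZ",
    "hgnc": "HGNC",
    "ncit": "NCIT",
    "chembl": "CHEMBL",
    "rxcui": "RXCUI",
    "drugbank": "DrugBank",
}


def normalize_curie(value, prefix_map=PREFIX_MAP):
    """Rename only the prefix (text before the first ':') of one identifier."""
    if not isinstance(value, str) or ":" not in value:
        return value  # NaN/None and prefix-less values pass through unchanged
    prefix, local_id = value.split(":", 1)
    return f"{prefix_map.get(prefix, prefix)}:{local_id}"


example = pd.Series(["hgnc:2625", "iuphar.ligand:13097", None]).map(normalize_curie)
```

Applied to the real data, this would be `formatted_df["source_id"].map(normalize_curie)` and the same for `target_id`, replacing the six `str.replace` lines with one pass per column.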


In [6]:
formatted_df.to_csv("formatted_dgidb.tsv", sep="\t", index=False)
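Before handing the file to `graph-builder`, a quick sanity check can catch empty identifiers or relation types and duplicated rows. This sketch assumes only the column names from the BioMedGPS example table; the checks are illustrative, not the format's official validation rules, and `check_relations` / `check_relation_file` are hypothetical names.

```python
import pandas as pd

# Column names from the BioMedGPS example table above.
REQUIRED_COLUMNS = [
    "relation_type", "resource", "source_id", "source_type",
    "target_id", "target_type", "source_name", "target_name",
]


def check_relations(rel: pd.DataFrame) -> list:
    """Return human-readable problems found in a relation table (empty if none)."""
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in rel.columns]
    for col in ("relation_type", "source_id", "target_id"):
        if col in rel.columns:
            # Treat NaN and whitespace-only strings as empty.
            n_empty = int((rel[col].fillna("").astype(str).str.strip() == "").sum())
            if n_empty:
                problems.append(f"{col} is empty in {n_empty} row(s)")
    n_dup = int(rel.duplicated().sum())
    if n_dup:
        problems.append(f"{n_dup} duplicated row(s)")
    return problems


def check_relation_file(path) -> list:
    # keep_default_na=False keeps empty cells as "" instead of NaN.
    rel = pd.read_csv(path, sep="\t", dtype=str, keep_default_na=False)
    return check_relations(rel)
```

Running `check_relation_file("formatted_dgidb.tsv")` right after the export should return an empty list if the filtering cells did their job.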

In [None]:
format_dgidb("formatted_dgidb.tsv")