# FibROAD
A manually curated resource for integrating multi-omics-level evidence in fibrosis research.

### Step 1: Reformat the data files to the BioMedGPS format

Go to the `Gene_Disease` and `Gene_Drug` subfolders and reformat all data files in each of them.

Each subfolder contains a `main.ipynb` notebook. Open it and run the code to convert the data files to the BioMedGPS format.
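Before merging, it may help to confirm that each formatted file has the columns the later steps rely on. The sketch below checks only the header row of a TSV file. The column list is copied from the merged table in Step 2, so treating it as the required BioMedGPS relation schema is an assumption, and `missing_columns` is a hypothetical helper, not part of BioMedGPS.

```python
import csv
import io

# Column names copied from the merged table in Step 2; treating them as the
# required BioMedGPS relation columns is an assumption of this sketch.
EXPECTED_COLUMNS = [
    "source_name", "source_type", "source_id",
    "target_name", "target_type", "target_id",
    "relation_type", "resource", "pmid",
]


def missing_columns(handle):
    """Return the expected columns absent from the header of a TSV file handle."""
    header = next(csv.reader(handle, delimiter="\t"), [])
    return [c for c in EXPECTED_COLUMNS if c not in header]


# In-memory stand-ins for formatted files; for a real file, pass
# open(path, newline="", encoding="utf-8") instead.
good = io.StringIO("\t".join(EXPECTED_COLUMNS) + "\n")
bad = io.StringIO("source_name\ttarget_name\n")

print(missing_columns(good))  # []
print(missing_columns(bad))
```

An empty list means the header contains every expected column; extra columns are ignored by this check.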

### Step 2: Merge all formatted data files into one file

In [53]:
import os
import pandas as pd

files = [
    "./Gene_Disease/formatted_fibroad_Disease.tsv",
    "./Gene_Drug/formatted_fibroad_drug.tsv",
]

frames = []
for file in files:
    if not os.path.exists(file):
        raise FileNotFoundError(f"File {file} does not exist")

    d = pd.read_csv(file, sep="\t")
    # Drop every column whose name starts with "ttd_".
    d = d.drop(columns=[c for c in d.columns if c.startswith("ttd_")])
    frames.append(d)

# Concatenate once at the end instead of growing a DataFrame inside the loop.
merged = pd.concat(frames, ignore_index=True)

merged.to_csv("formatted_fibroad.tsv", sep="\t", index=False)

In [54]:
merged

| | source_name | source_type | source_id | target_name | target_type | target_id | relation_type | resource | pmid |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ABCB4 | Gene | ENTREZ:5244 | fibrotic liver disease | Disease | MONDO:0100430 | GNBR::D::Gene:Disease | FIBROAD | 31071368.0 |
| 1 | ABCB4 | Gene | ENTREZ:5244 | fibrosis of bile duct | Disease | MONDO:0041959 | GNBR::D::Gene:Disease | FIBROAD | 16223543.0 |
| 2 | ABCB4 | Gene | ENTREZ:5244 | fibrotic liver disease | Disease | MONDO:0100430 | GNBR::D::Gene:Disease | FIBROAD | 33340584.0 |
| 3 | ABCB4 | Gene | ENTREZ:5244 | fibrotic liver disease | Disease | MONDO:0100430 | GNBR::D::Gene:Disease | FIBROAD | 21868490.0 |
| 4 | ACE | Gene | ENTREZ:1636 | endomyocardial fibrosis | Disease | MONDO:0006746 | GNBR::G::Gene:Disease | FIBROAD | 11425779.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1303 | PARP1 | Gene | ENTREZ:142 | Nicotinamide | Compound | DrugBank:DB02701 | DGIDB::BINDER::Gene:Compound | FIBROAD | |
| 1304 | PARP1 | Gene | ENTREZ:142 | Olaparib | Compound | DrugBank:DB09074 | DGIDB::INHIBITOR::Gene:Compound | FIBROAD | |
| 1305 | PARP1 | Gene | ENTREZ:142 | Rucaparib | Compound | DrugBank:DB12332 | DGIDB::ANTAGONIST::Gene:Compound | FIBROAD | |
| 1306 | PARP1 | Gene | ENTREZ:142 | Niraparib | Compound | DrugBank:DB11793 | DGIDB::ANTAGONIST::Gene:Compound | FIBROAD | |
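The merged table can contain the same source, target, and relation type several times with different PMIDs, as with the ABCB4 rows above. The following is a minimal sketch of counting the distinct PMIDs behind each triple. The `relations` frame is a small stand-in built from a few of the rows shown above, and `n_pmids` is a column name chosen for this illustration.

```python
import pandas as pd

# Stand-in for `merged`, using a few rows from the output above.
relations = pd.DataFrame(
    {
        "source_id": ["ENTREZ:5244", "ENTREZ:5244", "ENTREZ:5244", "ENTREZ:142"],
        "target_id": ["MONDO:0100430", "MONDO:0041959", "MONDO:0100430", "DrugBank:DB09074"],
        "relation_type": [
            "GNBR::D::Gene:Disease",
            "GNBR::D::Gene:Disease",
            "GNBR::D::Gene:Disease",
            "DGIDB::INHIBITOR::Gene:Compound",
        ],
        "pmid": [31071368.0, 16223543.0, 33340584.0, None],
    }
)

# Count distinct PMIDs per (source, target, relation type).
# nunique() ignores missing values, so rows without a PMID count as 0.
evidence = (
    relations.groupby(["source_id", "target_id", "relation_type"])["pmid"]
    .nunique()
    .reset_index(name="n_pmids")
)
print(evidence)
```

Triples with `n_pmids` equal to 0 have no literature reference in the table, which is the case for the gene-compound rows shown above.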


In [55]:
import pandas as pd

# We assume the entity file has already been generated and placed at ROOT_DIR/graph_data/entities.tsv,
# where ROOT_DIR is the root directory of the BioMedGPS data repository.
entity_file = "/Users/zhuzhixing/KG/biomedgps-data/graph_data/entities.tsv"

entity_df = pd.read_csv(entity_file, sep="\t", low_memory=False)

In [56]:
entity_df

| | id | label | name | description | resource | synonyms | pmids | taxid | xrefs |
|---|---|---|---|---|---|---|---|---|---|
| 0 | CLO:0000000 | CellLine | cell line cell culturing | a maintaining cell culture process that keeps ... | CLO | | | | |
| 1 | CLO:0000001 | CellLine | cell line cell | A cultured cell that is part of a cell line - ... | CLO | | | | |
| 2 | CLO:0000002 | CellLine | suspension cell line culturing | suspension cell line culturing is a cell line ... | CLO | | | | |
| 3 | CLO:0000003 | CellLine | adherent cell line culturing | adherent cell line culturing is a cell line cu... | CLO | | | | |
| 4 | CLO:0000004 | CellLine | cell line cell modification | a material processing that modifies an existin... | CLO | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 936596 | WikiPathways:WP88 | Pathway | Toll-like receptor signaling | Toll-like receptors (TLRs) are a class of prot... | WikiPathways | | | 10090.0 | |
| 936597 | WikiPathways:WP89 | Pathway | FAS pathway and stress induction of HSP regula... | This pathway describes the Fas induced apoptos... | WikiPathways | | | 10116.0 | |
| 936598 | WikiPathways:WP93 | Pathway | IL-4 signaling pathway | | WikiPathways | | | 10090.0 | |
| 936599 | WikiPathways:WP94 | Pathway | Hepatocyte growth factor receptor signaling | Signaling pathway of the Hepatocyte Growth Fac... | WikiPathways | | | 10116.0 | |


In [57]:
entity_df[entity_df["id"] == "MONDO:0100430"]

| | id | label | name | description | resource | synonyms | pmids | taxid | xrefs |
|---|---|---|---|---|---|---|---|---|---|
| 739181 | MONDO:0100430 | Disease | fibrotic liver disease | A liver disease characterized by the presence ... | Mondo | hepatic fibrosis (disease)\|liver fibrosis (dis... | | | MONDO:0100430 |
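The lookup above checks a single ID by hand. A minimal sketch of running the same check for every ID at once follows. The frames `relations` and `entities` are small stand-ins for `merged` and `entity_df`, and `MONDO:9999999` is a made-up ID included only to show what a miss looks like.

```python
import pandas as pd

# Stand-ins for `merged` and `entity_df`. "MONDO:9999999" is a made-up ID.
relations = pd.DataFrame(
    {
        "source_id": ["ENTREZ:5244", "ENTREZ:5244"],
        "target_id": ["MONDO:0100430", "MONDO:9999999"],
    }
)
entities = pd.DataFrame({"id": ["ENTREZ:5244", "MONDO:0100430"]})

known_ids = set(entities["id"])

# Every ID used on either side of a relation, minus the known ones.
used_ids = set(relations["source_id"]) | set(relations["target_id"])
missing_ids = sorted(used_ids - known_ids)
print(missing_ids)  # ['MONDO:9999999']

# Rows that reference at least one unknown entity.
unknown_rows = relations[
    ~relations["source_id"].isin(known_ids) | ~relations["target_id"].isin(known_ids)
]
print(len(unknown_rows))  # 1
```

With the real data, replacing the stand-ins by `merged` and `entity_df` should list the IDs that are absent from `entities.tsv`. How `graph-builder` treats such rows is not covered here.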


In [60]:
merged["relation_type"].unique()

array(['GNBR::D::Gene:Disease', 'GNBR::G::Gene:Disease',
       'GNBR::J::Gene:Disease', 'DGIDB::INHIBITOR::Gene:Compound',
       'DGIDB::AGONIST::Gene:Compound', 'DGIDB::MODULATOR::Gene:Compound',
       'DGIDB::ANTAGONIST::Gene:Compound', 'DGIDB::BINDER::Gene:Compound',
       'DGIDB::ANTIBODY::Gene:Compound', 'DGIDB::OTHER::Gene:Compound',
       'DGIDB::ACTIVATOR::Gene:Compound',
       'CTD::decreases^expression::Compound:Gene',
       'DGIDB::PARTIAL AGONIST::Gene:Compound',
       'CTD::increases^cleavage::Compound:Gene',
       'PrimeKG::target::Gene:Compound'], dtype=object)
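The values above appear to follow the pattern `RESOURCE::LABEL::SourceType:TargetType`. This is inferred from the list itself, not from a specification. Under that assumption, the sketch below splits a relation type into its parts and checks whether the entity types it names agree with a row's `source_type` and `target_type`. Both helper functions are hypothetical.

```python
def parse_relation_type(relation_type):
    """Split 'RESOURCE::LABEL::SourceType:TargetType' into its four parts."""
    resource, label, pair = relation_type.split("::")
    source_type, target_type = pair.split(":")
    return resource, label, source_type, target_type


def direction_matches(relation_type, source_type, target_type):
    """True if the types named in relation_type match the row's own types."""
    _, _, rel_source, rel_target = parse_relation_type(relation_type)
    return (rel_source, rel_target) == (source_type, target_type)


print(parse_relation_type("DGIDB::PARTIAL AGONIST::Gene:Compound"))
# ('DGIDB', 'PARTIAL AGONIST', 'Gene', 'Compound')

# A Compound:Gene relation on a row whose source is a Gene does not match.
print(direction_matches("CTD::decreases^expression::Compound:Gene", "Gene", "Compound"))
# False
```

Applied row by row, a check like this would flag relations whose declared direction differs from the row's columns. Whether such rows need to be swapped before running `graph-builder` is left open.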

In [61]:
import os
import os.path as osp
import subprocess


def format_fibroad(filename):
    # The project root is two levels above the current working directory,
    # e.g. ROOT_DIR/relations/FibROAD when the notebook runs from that folder.
    root_dir = osp.dirname(osp.dirname(os.getcwd()))
    print(f"Project root directory: {root_dir}")

    database = "customdb"
    relations_path = osp.join(
        root_dir,
        "relations",
        "FibROAD",
        filename,
    )
    output_dir = osp.join(
        root_dir, "formatted_relations", "FibROAD"
    )
    entities_path = osp.join(root_dir, "entities.tsv")
    log_file = osp.join(output_dir, "log.txt")
    relation_types_file = osp.join(root_dir, "relation_types.tsv")

    command = [
        "graph-builder",
        "--database",
        database,
        "-d",
        relations_path,
        "-o",
        output_dir,
        "-f",
        entities_path,
        "-n",
        "20",
        "--download",
        "--skip",
        "-l",
        log_file,
        "--debug",
        "--relation-type-dict-fpath",
        relation_types_file,
    ]

    print("Executing command:", " ".join(command))

    try:
        subprocess.run(command, check=True)
    except FileNotFoundError:
        print(
            "Error: 'graph-builder' command not found. Make sure it is installed and available in the PATH."
        )
        raise
    except subprocess.CalledProcessError as e:
        # Output is not captured here, so graph-builder's own messages are printed directly.
        print(f"Error: Command execution failed with return code {e.returncode}")
        raise
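
The function above only discovers a missing `graph-builder` executable when it tries to run it. A minimal sketch of checking for it beforehand with the standard library's `shutil.which` follows. `find_executable` is a hypothetical helper name.

```python
import shutil


def find_executable(name="graph-builder"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


path = find_executable()
if path is None:
    print("graph-builder not found on PATH; install it before calling format_fibroad().")
else:
    print(f"graph-builder found at {path}")
```

This only confirms that a command with that name is on the `PATH`. It does not check the installed version or its options.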

In [62]:
format_fibroad("formatted_fibroad.tsv")

Project root directory: /Users/zhuzhixing/KG/biomedgps-data/graph_data
Executing command: graph-builder --database customdb -d /Users/zhuzhixing/KG/biomedgps-data/graph_data/relations/FibROAD/formatted_fibroad.tsv -o /Users/zhuzhixing/KG/biomedgps-data/graph_data/formatted_relations/FibROAD -f /Users/zhuzhixing/KG/biomedgps-data/graph_data/entities.tsv -n 20 --download --skip -l /Users/zhuzhixing/KG/biomedgps-data/graph_data/formatted_relations/FibROAD/log.txt --debug --relation-type-dict-fpath /Users/zhuzhixing/KG/biomedgps-data/graph_data/relation_types.tsv


2025-04-02 11:29:54 - cli:171 - INFO - Run jobs with (output_dir: /Users/zhuzhixing/KG/biomedgps-data/graph_data/formatted_relations/FibROAD, db file/directory: /Users/zhuzhixing/KG/biomedgps-data/graph_data/relations/FibROAD/formatted_fibroad.tsv, databases: ('customdb',), download: True, skip: True)
2025-04-02 11:29:57 - base_parser:229 - INFO - Using allow_ignore_checking_errors=all to ignore the checking errors.
2025-04-02 11:29:57 - customdb_parser:104 - INFO - Get 1308 relations
2025-04-02 11:29:57 - base_parser:478 - INFO - Found 1308 relations.
2025-04-02 11:29:57 - base_parser:789 - INFO - Found entity id map file, skip to generate it. If you want to regenerate it, please delete the file: /Users/zhuzhixing/KG/biomedgps-data/graph_data/formatted_relations/FibROAD/customdb.entity_id_map.json
2025-04-02 11:29:57 - base_parser:480 - INFO - Found 233 entity ids in entity id map.
2025-04-02 11:29:57 - base_parser:494 - INFO - The number of relations before dropna: 1308
2025-04-02 11