# Databases download

This notebook builds and runs the terminal commands needed to download most of the databases. \
After setting the data path, execute the code cells to download and process the files. \
Commands are provided for both Unix and Windows systems.

### Disclaimer
- The ChEMBL database cannot be fetched with a single command: the `.csv` file must first be generated on the [ChEMBL website](https://www.ebi.ac.uk/chembl/web_components/explore/activities/STATE_ID:GnsHH1n7OPU1JAjGvKLgvQ==). More information is provided in the respective section below.
- The DrugBank database requires an academic license to download. Please refer to the [DrugBank website](https://go.drugbank.com/releases/latest) for further instructions.

In [1]:
from datetime import datetime
import os
import subprocess
import pandas as pd
import collections
import xml.etree.ElementTree as ET

Data paths

In [2]:
# We recommend downloading the databases in the data/ folder
data = "data" 
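
The Unix and Windows sections of this notebook differ mainly in path separators and shell commands. For the parts written in Python, a path built with `pathlib` works on both systems. The helper below is a minimal sketch; the name `database_path` is hypothetical and is not used elsewhere in the notebook.

```python
from pathlib import Path

def database_path(data_dir, filename):
    """Return data_dir/filename using the separator of the current OS."""
    return Path(data_dir) / filename

data = "data"
# Prints data/BindingDB.tsv on Unix and data\BindingDB.tsv on Windows
print(database_path(data, "BindingDB.tsv"))
```

Pandas accepts `Path` objects wherever it accepts a file name, so the result can be passed directly to `pd.read_csv` or `DataFrame.to_csv`.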

# Unix systems

In [None]:
commands = f"ls {data}"

# Execute command
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output
print('stdout:', result.stdout)
print('stderr:', result.stderr)
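
Every download cell repeats the same run-and-print pattern. As a sketch, it could be wrapped in one helper that also reports a non-zero exit code, which makes a failed download easier to spot. The name `run_command` is an assumption introduced here, not part of the notebook.

```python
import subprocess
import sys

def run_command(command, shell=True):
    """Run a command, print both output streams, and report a failing exit code."""
    result = subprocess.run(command, shell=shell, capture_output=True, text=True)
    print('stdout:', result.stdout)
    print('stderr:', result.stderr)
    if result.returncode != 0:
        print(f'Command failed with exit code {result.returncode}')
    return result

# Harmless demonstration: ask the current Python interpreter to print a message.
# A list of arguments is passed, so the shell is not needed here.
demo = run_command([sys.executable, "-c", "print('ok')"], shell=False)
```

With such a helper, a cell would reduce to `run_command(commands)` for the shell command strings used throughout this notebook.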

## BindingDB

BindingDB usually updates its database every month, and the download link includes the year and the month. \
If the link returns a 404 error, manually change the link to the previous month.

In [None]:
m = datetime.now().strftime('%m')
y = datetime.now().strftime('%Y') 

# wget's -O option already sets the full output path, so the -P directory prefix is not needed
commands = f"wget https://www.bindingdb.org/bind/downloads/BindingDB_All_{y}{m}_tsv.zip -O {data}/BindingDB.tsv.zip && \
            unzip {data}/BindingDB.tsv.zip -d {data}/ && rm {data}/BindingDB.tsv.zip && \
            mv {data}/BindingDB*.tsv {data}/BindingDB.tsv"

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output and standard error
print('stdout:', result.stdout)
print('stderr:', result.stderr)
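
The fallback to the previous month can also be computed rather than edited by hand. The sketch below lists candidate links, newest first, using the URL pattern from the cell above. The function name `candidate_urls` and the `months_back` parameter are assumptions made for illustration, and the code only builds the links without checking that they exist.

```python
from datetime import date

BASE_URL = "https://www.bindingdb.org/bind/downloads/BindingDB_All_{y}{m:02d}_tsv.zip"

def candidate_urls(today, months_back=2):
    """Return download URLs for the current month and the preceding ones, newest first."""
    urls = []
    year, month = today.year, today.month
    for _ in range(months_back + 1):
        urls.append(BASE_URL.format(y=year, m=month))
        month -= 1
        if month == 0:  # wrap from January to December of the previous year
            year, month = year - 1, 12
    return urls

for url in candidate_urls(date.today()):
    print(url)
```

Each candidate can then be tried in turn with the download command until one succeeds.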

## ChEMBL

The ChEMBL database requires the generation of the `.csv` file from the [ChEMBL website](https://www.ebi.ac.uk/chembl/web_components/explore/activities/). On the page, select the *Homo sapiens* target organism in the filtering section on the right. Then click the CSV download button to start the file generation. Once the generation is complete, press the download button and save the file as `ChEMBL.zip` in the data folder, which is the name the unzip command in this section expects. The images below show the step-by-step procedure to download the file.

![Filtering section button for Homo sapiens](images/ChEMBL-filtering.png)

![CSV button to generate annotations file](images/ChEMBL-csv.png)

![Button to download annotations file](images/ChEMBL-download.png)


The downloaded `.zip` file contains multiple CSV files that need to be merged.

Unzip file

In [None]:
commands = f"mkdir -p {data}/chembl && \
            unzip {data}/ChEMBL.zip -d {data}/chembl && \
            rm {data}/ChEMBL.zip"

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output
print('stdout:', result.stdout)
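
The extraction step does not have to go through the shell: Python's standard `zipfile` module behaves the same on Unix and Windows. The sketch below is one possible alternative to the command above. The helper name `extract_zip` is hypothetical, and the demonstration runs on a throwaway archive rather than on real ChEMBL data.

```python
import tempfile
import zipfile
from pathlib import Path

def extract_zip(zip_path, target_dir):
    """Extract every member of zip_path into target_dir and return the member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()

# Self-contained demonstration on a temporary archive with two small CSV files.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("part_1.csv", "a,b\n1,2\n")
        archive.writestr("part_2.csv", "a,b\n3,4\n")
    extracted = extract_zip(demo_zip, Path(tmp) / "chembl")
    extracted_files = sorted(p.name for p in (Path(tmp) / "chembl").iterdir())
    print(extracted_files)
```

For the ChEMBL archive, the call would be `extract_zip(f"{data}/ChEMBL.zip", f"{data}/chembl")`.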

Merge files with the following code

In [18]:
def merge_csv_files(folder_path, output_file):
    """Concatenate every CSV file in folder_path into a single CSV at output_file."""
    # Sort the file names so the row order of the merged file is reproducible
    csv_files = sorted(file for file in os.listdir(folder_path) if file.endswith('.csv'))

    if len(csv_files) == 0:
        print("No CSV files found in the folder.")
        return

    # Read all parts first and concatenate once, instead of copying a growing DataFrame in a loop
    frames = [pd.read_csv(os.path.join(folder_path, file)) for file in csv_files]
    merged_df = pd.concat(frames, ignore_index=True)

    merged_df.to_csv(output_file, index=False)

folder_path = f'{data}/chembl'
output_file = f'{data}/ChEMBL.csv'
merge_csv_files(folder_path, output_file)


Remove downloaded data

In [None]:
commands = f"rm -r {data}/chembl/"

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output
print('stdout:', result.stdout)
print('stderr:', result.stderr)

## CTD

In [None]:
commands = f"wget https://ctdbase.org/reports/CTD_chem_gene_ixns.csv.gz -O {data}/CTD.csv.gz"

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output
print('stdout:', result.stdout)

# Extract csv using pandas
ctd = pd.read_csv(f"{data}/CTD.csv.gz", compression='gzip', on_bad_lines='skip', skiprows=28)
ctd.to_csv(f"{data}/CTD.csv", sep=',')

Remove downloaded data

In [None]:
commands = f"rm {data}/CTD.csv.gz"

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output
print('stdout:', result.stdout)

## DrugBank

The DrugBank database requires an academic license to download. Please refer to the [DrugBank website](https://go.drugbank.com/releases/latest) for further instructions.
Once the academic license has been issued to your account, download the complete database from the [DrugBank download page](https://go.drugbank.com/releases/latest), or use the commands below. \
Then, unzip the file to obtain the `.xml` file.
This file can be converted to `.csv` using the code provided below.

![DrugBank complete database download](images/Drugbank.png)

Download the zip file using your account credentials

In [None]:
email = 'YOURUSERNAME'
password = 'YOURPASSWORD'

commands = f"wget --user={email} --password={password} https://go.drugbank.com/releases/latest/downloads/all-full-database -O {data}/drugbank_all_full_database.xml.zip"

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output
print('stdout:', result.stdout)
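
Typing the password into a notebook cell makes it easy to save or share it by accident. One alternative, sketched below, reads the credentials from environment variables and passes the arguments to `wget` as a list, so no shell string has to be assembled. The variable names `DRUGBANK_EMAIL` and `DRUGBANK_PASSWORD` and the function `drugbank_wget_args` are assumptions made for this example.

```python
import os

DRUGBANK_URL = "https://go.drugbank.com/releases/latest/downloads/all-full-database"

def drugbank_wget_args(data_dir, environ=os.environ):
    """Build a wget argument list from credentials stored in environment variables."""
    # DRUGBANK_EMAIL and DRUGBANK_PASSWORD are hypothetical variable names.
    email = environ.get("DRUGBANK_EMAIL")
    password = environ.get("DRUGBANK_PASSWORD")
    if not email or not password:
        raise RuntimeError("Set DRUGBANK_EMAIL and DRUGBANK_PASSWORD before downloading.")
    output = os.path.join(data_dir, "drugbank_all_full_database.xml.zip")
    return ["wget", f"--user={email}", f"--password={password}", DRUGBANK_URL, "-O", output]

# Demonstration with placeholder credentials; nothing is downloaded here.
args = drugbank_wget_args(
    "data", {"DRUGBANK_EMAIL": "user@example.org", "DRUGBANK_PASSWORD": "secret"}
)
print(args[0], args[-1])
```

The resulting list can be executed with `subprocess.run(args, capture_output=True, text=True)`, without `shell=True`.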

Unzip the file

In [None]:
commands = f"unzip {data}/drugbank_all_full_database.xml.zip -d {data}/ && \
            rm {data}/drugbank_all_full_database.xml.zip"
# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output
print('stdout:', result.stdout)

In [None]:
def collapse_list_values(row):
    for key, value in row.items():
        if isinstance(value, list):
            row[key] = '|'.join(value)
    return row

def xml2csv(file_path, output_file):
    tree = ET.parse(file_path)
    root = tree.getroot()

    # ElementTree qualifies every tag with the DrugBank XML namespace, written in braces
    ns = '{http://www.drugbank.ca}'
    # Path to the calculated InChIKey of a drug; the namespace is filled in with .format()
    inchikey_template = "{ns}calculated-properties/{ns}property[{ns}kind='InChIKey']/{ns}value"

    rows = list()
    for i, drug in enumerate(root):
        row = collections.OrderedDict()
        assert drug.tag == ns + 'drug'
        row['type'] = drug.get('type')
        row['drugbank-id'] = drug.findtext(ns + "drugbank-id[@primary='true']")
        row['name'] = drug.findtext(ns + "name")
        row['description'] = drug.findtext(ns + "description")
        row['InChIKey'] = drug.findtext(inchikey_template.format(ns = ns))
        rows.append(row)

    rows = list(map(collapse_list_values, rows))

    columns = ['drugbank-id', 'name', 'type', 'InChIKey', 'description']
    drugbank_df = pd.DataFrame.from_dict(rows)[columns]

    protein_rows = list()
    for i, drug in enumerate(root):
        drugbank_id = drug.findtext(ns + "drugbank-id[@primary='true']")
        for category in ['target', 'enzyme', 'carrier', 'transporter']:
            proteins = drug.findall('{ns}{cat}s/{ns}{cat}'.format(ns=ns, cat=category))
            for protein in proteins:
                row = {'drugbank-id': drugbank_id, 'protein_type': category}
                row['protein_name'] = protein.findtext('{}name'.format(ns))
                row['organism'] = protein.findtext('{}organism'.format(ns))
                actions = protein.findall('{ns}actions/{ns}action'.format(ns=ns))
                row['actions'] = '|'.join(action.text for action in actions)
                uniprot_ids = [polypep.text for polypep in protein.findall(
                    "{ns}polypeptide/{ns}external-identifiers/{ns}external-identifier[{ns}resource='UniProtKB']/{ns}identifier".format(ns=ns))]            
                if len(uniprot_ids) == 1:
                    row['uniprot_id'] = uniprot_ids[0]
                hgnc_ids = [polypep.text for polypep in protein.findall(
                    "{ns}polypeptide/{ns}external-identifiers/{ns}external-identifier[{ns}resource='HUGO Gene Nomenclature Committee (HGNC)']/{ns}identifier".format(ns=ns))]            
                if len(hgnc_ids) == 1:
                    row['HGNC'] = hgnc_ids[0]
                protein_rows.append(row)

    protein_df = pd.DataFrame.from_dict(protein_rows)

    drugbank = pd.merge(drugbank_df, protein_df, on='drugbank-id', how='left')

    drugbank.to_csv(output_file, sep=',', index=False)

file_path = f'{data}/full database.xml'
output_file = f'{data}/DB.csv'
xml2csv(file_path, output_file)
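
The parser above relies on namespace-qualified ElementTree paths, which can be hard to read at first. The sketch below applies the same three kinds of query (attribute predicate, plain child tag, child-text predicate) to a tiny XML snippet. The snippet is invented for illustration and only mimics the structure that `xml2csv` expects; its identifiers are placeholders, not DrugBank data.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.drugbank.ca}"

# Invented sample document with the same namespace and nesting as the queries in xml2csv
SAMPLE_XML = """<drugbank xmlns="http://www.drugbank.ca">
  <drug type="small molecule">
    <drugbank-id primary="true">DB_EXAMPLE</drugbank-id>
    <drugbank-id>ALT_EXAMPLE</drugbank-id>
    <name>example compound</name>
    <calculated-properties>
      <property><kind>InChIKey</kind><value>EXAMPLE-KEY</value></property>
    </calculated-properties>
  </drug>
</drugbank>"""

root = ET.fromstring(SAMPLE_XML)
drug = root[0]

# Attribute predicate: keep only the drugbank-id element marked as primary
primary_id = drug.findtext(NS + "drugbank-id[@primary='true']")
# Plain child lookup: the tag name must carry the namespace prefix
name = drug.findtext(NS + "name")
# Child-text predicate: pick the property whose <kind> child reads 'InChIKey'
inchikey = drug.findtext(
    f"{NS}calculated-properties/{NS}property[{NS}kind='InChIKey']/{NS}value"
)
print(primary_id, name, inchikey)
```

Without the `{http://www.drugbank.ca}` prefix, these lookups would return `None`, because ElementTree stores each tag together with its namespace.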


Remove downloaded data

In [None]:
commands = f'rm "{data}/full database.xml"'

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output
print('stdout:', result.stdout)

## DrugCentral

In [None]:
commands = f"wget https://unmtid-dbs.net/download/DrugCentral/2021_09_01/drug.target.interaction.tsv.gz -O {data}/DC.tsv.gz && \
            gzip -d {data}/DC.tsv.gz && \
            wget https://unmtid-dbs.net/download/DrugCentral/2021_09_01/structures.smiles.tsv -O {data}/DC_comps.tsv"

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output
print('stdout:', result.stdout)

Merge the two files to add SMILES and InChI information for compounds

In [20]:
DC = pd.read_csv(f'{data}/DC.tsv', sep='\t')
DC_comps = pd.read_csv(f'{data}/DC_comps.tsv', sep='\t')

DC_comps = DC_comps[['ID', 'SMILES', 'InChI', 'InChIKey']].rename(columns={'ID':'STRUCT_ID'})

DC = pd.merge(DC, DC_comps, on='STRUCT_ID', how='left')

DC.to_csv(f'{data}/DrugCentral.csv', index=False)

Remove downloaded data

In [None]:
commands = f"rm {data}/DC.tsv && rm {data}/DC_comps.tsv"

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output
print('stdout:', result.stdout)

## DTC

In [None]:
commands = f"wget --no-check-certificate https://drugtargetcommons.fimm.fi/static/Excell_files/DTC_data.csv -O {data}/DTC.csv"

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output
print('stdout:', result.stdout)

## STITCH

In [None]:
commands = f"wget http://stitch.embl.de/download/protein_chemical.links.detailed.v5.0/9606.protein_chemical.links.detailed.v5.0.tsv.gz -O {data}/STITCH.tsv.gz && \
            gzip -d {data}/STITCH.tsv.gz"

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output
print('stdout:', result.stdout)

# Windows systems

In [None]:
commands = f"dir {data}"

# Execute command
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output
print('stdout:', result.stdout)

## BindingDB

BindingDB usually updates its database every month, and the download link includes the year and the month. \
If the link returns a 404 error, manually change the link to the previous month.

In [11]:
m = datetime.now().strftime('%m')
y = datetime.now().strftime('%Y') 

commands = f"curl -o {data}\\BindingDB.tsv.zip https://www.bindingdb.org/bind/downloads/BindingDB_All_{y}{m}_tsv.zip && \
            tar -xf {data}\\BindingDB.tsv.zip -C {data}\\ && del {data}\\BindingDB.tsv.zip && \
                ren {data}\\BindingDB*.tsv BindingDB.tsv"

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output and standard error
print('stdout:', result.stdout)
print('stderr:', result.stderr)

stdout: 
stderr: 


## ChEMBL

The ChEMBL database requires the generation of the `.csv` file from the [ChEMBL website](https://www.ebi.ac.uk/chembl/web_components/explore/activities/). On the page, select the *Homo sapiens* target organism in the filtering section on the right. Then click the CSV download button to start the file generation. Once the generation is complete, press the download button and **save the file into the data folder**. The images below show the step-by-step procedure to download the file.

![Filtering section button for Homo sapiens](images/ChEMBL-filtering.png)

![CSV button to generate annotations file](images/ChEMBL-csv.png)

![Button to download annotations file](images/ChEMBL-download.png)


The downloaded `.zip` file contains multiple CSV files that need to be merged.

Unzip file

In [18]:
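files = os.listdir(f"{data}\\")

# The downloaded ChEMBL file name starts with "DOWNLOAD"
zipped = None
for file in files:
    if file.startswith('DOWNLOAD') and file.endswith('.zip'):
        zipped = file
        break

# Stop with a clear message instead of a NameError when the archive is missing
if zipped is None:
    raise FileNotFoundError(f"No ChEMBL 'DOWNLOAD*.zip' file found in {data}")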

commands = f'mkdir {data}\\chembl && \
            tar -xf {data}\\{zipped} -C {data}\\chembl && \
            del "{data}\\{zipped}"'

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output and standard error
print('stdout:', result.stdout)
print('stderr:', result.stderr)

stdout: 
stderr: 


Merge files with the following code

In [None]:
def merge_csv_files(folder_path, output_file):
    """Concatenate every CSV file in folder_path into a single CSV at output_file."""
    # Sort the file names so the row order of the merged file is reproducible
    csv_files = sorted(file for file in os.listdir(folder_path) if file.endswith('.csv'))

    if len(csv_files) == 0:
        print("No CSV files found in the folder.")
        return

    # Read all parts first (skipping malformed lines) and concatenate once
    frames = [pd.read_csv(os.path.join(folder_path, file), on_bad_lines='skip') for file in csv_files]
    merged_df = pd.concat(frames, ignore_index=True)

    merged_df.to_csv(output_file, index=False)

folder_path = f'{data}\\chembl'
output_file = f'{data}\\ChEMBL.csv'
merge_csv_files(folder_path, output_file)


Remove downloaded data

In [6]:
commands = f"rmdir /s /q {data}\\chembl"

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output and standard error
print('stdout:', result.stdout)
print('stderr:', result.stderr)

stdout: 
stderr: 


## CTD

In [14]:
commands = f"curl -L -o {data}\\CTD.csv.gz https://ctdbase.org/reports/CTD_chem_gene_ixns.csv.gz"

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output and standard error
print('stdout:', result.stdout)
print('stderr:', result.stderr)

# Extract csv using pandas
ctd = pd.read_csv(f"{data}\\CTD.csv.gz", compression='gzip', on_bad_lines='skip', skiprows=28)
ctd.to_csv(f"{data}\\CTD.csv", sep=',')

stdout: 
stderr: 


Remove downloaded data

In [None]:
commands = f"del {data}\\CTD.csv.gz"

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output
print('stdout:', result.stdout)

## DrugBank

The DrugBank database requires an academic license to download. Please refer to the [DrugBank website](https://go.drugbank.com/releases/latest) for further instructions.
Once the academic license has been issued to your account, download the complete database from the [DrugBank download page](https://go.drugbank.com/releases/latest), or use the commands below. \
Then, unzip the file to obtain the `.xml` file.
This file can be converted to `.csv` using the code provided below.

![DrugBank complete database download](images/Drugbank.png)

Download the zip file using your account credentials

In [17]:
email = 'YOURUSERNAME'
password = 'YOURPASSWORD'

commands = f'curl -L --user {email}:{password} -o {data}\\drugbank_all_full_database.xml.zip https://go.drugbank.com/releases/latest/downloads/all-full-database'

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output and standard error
print('stdout:', result.stdout)
print('stderr:', result.stderr)


Unzip the file

In [18]:
commands = f"tar -xf {data}\\drugbank_all_full_database.xml.zip -C {data}\\ && \
            del {data}\\drugbank_all_full_database.xml.zip"
# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output and standard error
print('stdout:', result.stdout)
print('stderr:', result.stderr)

stdout: 
stderr: 


In [19]:
def collapse_list_values(row):
    for key, value in row.items():
        if isinstance(value, list):
            row[key] = '|'.join(value)
    return row

def xml2csv(file_path, output_file):
    tree = ET.parse(file_path)
    root = tree.getroot()

    # ElementTree qualifies every tag with the DrugBank XML namespace, written in braces
    ns = '{http://www.drugbank.ca}'
    # Path to the calculated InChIKey of a drug; the namespace is filled in with .format()
    inchikey_template = "{ns}calculated-properties/{ns}property[{ns}kind='InChIKey']/{ns}value"

    rows = list()
    for i, drug in enumerate(root):
        row = collections.OrderedDict()
        assert drug.tag == ns + 'drug'
        row['type'] = drug.get('type')
        row['drugbank-id'] = drug.findtext(ns + "drugbank-id[@primary='true']")
        row['name'] = drug.findtext(ns + "name")
        row['description'] = drug.findtext(ns + "description")
        row['InChIKey'] = drug.findtext(inchikey_template.format(ns = ns))
        rows.append(row)

    rows = list(map(collapse_list_values, rows))

    columns = ['drugbank-id', 'name', 'type', 'InChIKey', 'description']
    drugbank_df = pd.DataFrame.from_dict(rows)[columns]

    protein_rows = list()
    for i, drug in enumerate(root):
        drugbank_id = drug.findtext(ns + "drugbank-id[@primary='true']")
        for category in ['target', 'enzyme', 'carrier', 'transporter']:
            proteins = drug.findall('{ns}{cat}s/{ns}{cat}'.format(ns=ns, cat=category))
            for protein in proteins:
                row = {'drugbank-id': drugbank_id, 'protein_type': category}
                row['protein_name'] = protein.findtext('{}name'.format(ns))
                row['organism'] = protein.findtext('{}organism'.format(ns))
                actions = protein.findall('{ns}actions/{ns}action'.format(ns=ns))
                row['actions'] = '|'.join(action.text for action in actions)
                uniprot_ids = [polypep.text for polypep in protein.findall(
                    "{ns}polypeptide/{ns}external-identifiers/{ns}external-identifier[{ns}resource='UniProtKB']/{ns}identifier".format(ns=ns))]            
                if len(uniprot_ids) == 1:
                    row['uniprot_id'] = uniprot_ids[0]
                hgnc_ids = [polypep.text for polypep in protein.findall(
                    "{ns}polypeptide/{ns}external-identifiers/{ns}external-identifier[{ns}resource='HUGO Gene Nomenclature Committee (HGNC)']/{ns}identifier".format(ns=ns))]            
                if len(hgnc_ids) == 1:
                    row['HGNC'] = hgnc_ids[0]
                protein_rows.append(row)

    protein_df = pd.DataFrame.from_dict(protein_rows)

    drugbank = pd.merge(drugbank_df, protein_df, on='drugbank-id', how='left')

    drugbank.to_csv(output_file, sep=',', index=False)

file_path = f'{data}\\full database.xml'
output_file = f'{data}\\DB.csv'
xml2csv(file_path, output_file)


Remove downloaded data

In [20]:
commands = f'del "{data}\\full database.xml"'

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output and standard error
print('stdout:', result.stdout)
print('stderr:', result.stderr)

stdout: 
stderr: 


## DrugCentral

In [21]:
commands = f"curl -o {data}\\DC.tsv.gz https://unmtid-dbs.net/download/DrugCentral/2021_09_01/drug.target.interaction.tsv.gz && \
            curl -o {data}\\DC_comps.tsv https://unmtid-dbs.net/download/DrugCentral/2021_09_01/structures.smiles.tsv"

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output and standard error
print('stdout:', result.stdout)
print('stderr:', result.stderr)




Merge the two files to add SMILES and InChI information for compounds

In [26]:
DC = pd.read_csv(f'{data}\\DC.tsv.gz', sep='\t', compression='gzip', on_bad_lines='skip')
DC_comps = pd.read_csv(f'{data}\\DC_comps.tsv', sep='\t')

DC_comps = DC_comps[['ID', 'SMILES', 'InChI', 'InChIKey']].rename(columns={'ID':'STRUCT_ID'})

DC = pd.merge(DC, DC_comps, on='STRUCT_ID', how='left')

DC.to_csv(f'{data}\\DrugCentral.csv', index=False)

Remove downloaded data

In [27]:
commands = f"del {data}\\DC.tsv.gz && del {data}\\DC_comps.tsv"

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output and standard error
print('stdout:', result.stdout)
print('stderr:', result.stderr)

stdout: 
stderr: 


## DTC

In [3]:
commands = f"curl -o {data}\\DTC.csv https://drugtargetcommons.fimm.fi/static/Excell_files/DTC_data.csv"

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output and standard error
print('stdout:', result.stdout)
print('stderr:', result.stderr)


## STITCH

In [None]:
commands = f"curl -o {data}\\STITCH.tsv.gz http://stitch.embl.de/download/protein_chemical.links.detailed.v5.0/9606.protein_chemical.links.detailed.v5.0.tsv.gz"

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output
print('stdout:', result.stdout)

# Extract csv using pandas
stitch = pd.read_csv(f"{data}\\STITCH.tsv.gz", sep='\t', compression='gzip', on_bad_lines='skip')
stitch.to_csv(f"{data}\\STITCH.tsv", sep='\t')
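
Reading the archive with pandas loads the whole table into memory just to write it back out. If only decompression is needed, the standard `gzip` and `shutil` modules can stream the file in chunks instead. The sketch below shows this alternative on a small temporary file; the helper name `gunzip` and the sample rows are invented for the demonstration.

```python
import gzip
import shutil
import tempfile
from pathlib import Path

def gunzip(src, dst):
    """Decompress src (.gz) to dst in chunks, without loading the whole file into memory."""
    with gzip.open(src, "rb") as compressed, open(dst, "wb") as plain:
        shutil.copyfileobj(compressed, plain)

# Self-contained demonstration on a temporary gzip file with invented content.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "demo.tsv.gz"
    dst = Path(tmp) / "demo.tsv"
    with gzip.open(src, "wb") as handle:
        handle.write(b"chemical\tprotein\tscore\nc1\tp1\t900\n")
    gunzip(src, dst)
    restored = dst.read_bytes()
    print(restored.decode())
```

Unlike the pandas round trip, this copies the bytes unchanged, so no index column is added and no lines are skipped.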

Remove downloaded data

In [7]:
commands = f"del {data}\\STITCH.tsv.gz"

# Execute commands
result = subprocess.run(commands, shell=True, capture_output=True, text=True)

# Print the standard output and standard error
print('stdout:', result.stdout)
print('stderr:', result.stderr)

stdout: 
stderr: 
