### Drug Repurposing Data Processing

**Objective:** This notebook filters drug repurposing results by type and score, then retrieves the PubChem CID and SMILES notation for each retained compound so that the structures can be analyzed further and submitted to SwissADME.

---

### Step 1: Filtering Drug Repurposing Results

- **Input:** `export.csv`

- **Output:** Four distinct CSV files:
  - `filtered_cp_above_90.csv`: Rows whose `Type` is `cp` and whose `Score` exceeds 90.
  - `filtered_kd_above_90.csv`: Rows whose `Type` is `kd` and whose `Score` exceeds 90.
  - `filtered_oe_above_90.csv`: Rows whose `Type` is `oe` and whose `Score` exceeds 90.
  - `filtered_cc_above_90.csv`: Rows whose `Type` is `cc` and whose `Score` exceeds 90.

In [14]:
import pandas as pd
from IPython.display import display

# Step 1: Filtering Drug Repurposing Results
# Input: export.csv
# Output: Four separate CSV files for types 'cp', 'kd', 'oe', 'cc' with scores above 90

data = pd.read_csv('export.csv')
print("Contents of export.csv:")
display(data)

Contents of export.csv:


Unnamed: 0,Rank,Score,Type,ID,Name,Description
0,1,99.98,oe,ccsbBroad304_01966,RUVBL1,ATPases / AAA-type
1,2,99.98,kd,CGS001-8848,TSC22D1,-
2,3,99.96,oe,ccsbBroad304_00841,IKBKB,IKK family
3,4,99.94,kd,CGS001-1196,CLK2,CDC-like kinases
4,5,99.88,cp,BRD-A02333338,cyclopamine,Smoothened receptor antagonist
...,...,...,...,...,...,...
8023,8024,-99.95,kd,CGS001-64786,TBC1D15,-
8024,8025,-99.96,oe,ccsbBroad304_07172,ZNF195,"Zinc fingers, C2H2-type"
8025,8026,-99.96,cp,BRD-K17306061,aprepitant,Tachykinin antagonist
8026,8027,-99.98,kd,CGS001-9326,ZNHIT3,"Zinc fingers, HIT-type"


In [15]:
# Filtering rows based on Type and Score > 90
filtered_cp = data[(data['Type'] == 'cp') & (data['Score'] > 90)]
filtered_kd = data[(data['Type'] == 'kd') & (data['Score'] > 90)]
filtered_oe = data[(data['Type'] == 'oe') & (data['Score'] > 90)]
filtered_cc = data[(data['Type'] == 'cc') & (data['Score'] > 90)]

In [16]:
# Saving filtered data to new CSV files
filtered_cp.to_csv('filtered_cp_above_90.csv', index=False)
filtered_kd.to_csv('filtered_kd_above_90.csv', index=False)
filtered_oe.to_csv('filtered_oe_above_90.csv', index=False)
filtered_cc.to_csv('filtered_cc_above_90.csv', index=False)

In [17]:
# Load and display CSV files
filtered_cp = pd.read_csv('filtered_cp_above_90.csv')
filtered_kd = pd.read_csv('filtered_kd_above_90.csv')
filtered_oe = pd.read_csv('filtered_oe_above_90.csv')
filtered_cc = pd.read_csv('filtered_cc_above_90.csv')

In [13]:
# Display the contents of each DataFrame
print("Contents of filtered_cp_above_90.csv:")
display(filtered_cp)

print("Contents of filtered_kd_above_90.csv:")
display(filtered_kd)

print("Contents of filtered_oe_above_90.csv:")
display(filtered_oe)

print("Contents of filtered_cc_above_90.csv:")
display(filtered_cc)

Contents of filtered_cp_above_90.csv:


Unnamed: 0,Rank,Score,Type,ID,Name,Description
0,5,99.88,cp,BRD-A02333338,cyclopamine,Smoothened receptor antagonist
1,20,99.25,cp,BRD-K90543092,levonorgestrel,Estrogen receptor agonist
2,21,99.21,cp,BRD-K59456551,methotrexate,Dihydrofolate reductase inhibitor
3,24,99.11,cp,BRD-K12994359,valdecoxib,Cyclooxygenase inhibitor
4,32,98.78,cp,BRD-K11663430,pyroxamide,HDAC inhibitor
5,33,98.77,cp,BRD-K32311154,nifekalant,Potassium channel blocker
6,51,97.91,cp,BRD-K33226500,indinavir,HIV protease inhibitor
7,59,97.39,cp,BRD-K41731458,triclosan,Enoyl-[acyl-carrier-protein] reductase [NADH] ...
8,61,97.24,cp,BRD-K81709173,halcinonide,Glucocorticoid receptor agonist
9,63,97.05,cp,BRD-A85587465,bemesetron,Serotonin receptor antagonist


Contents of filtered_kd_above_90.csv:


Unnamed: 0,Rank,Score,Type,ID,Name,Description
0,2,99.98,kd,CGS001-8848,TSC22D1,-
1,4,99.94,kd,CGS001-1196,CLK2,CDC-like kinases
2,8,99.81,kd,CGS001-79109,MAPKAP1,-
3,11,99.74,kd,CGS001-29105,C16ORF80,-
4,14,99.70,kd,CGS001-10253,SPRY2,-
...,...,...,...,...,...,...
57,139,92.21,kd,CGS001-9443,MED7,-
58,150,91.38,kd,CGS001-2058,EPRS,Aminoacyl tRNA synthetases / Class I
59,152,91.23,kd,CGS001-3660,IRF2,-
60,155,91.11,kd,CGS001-3460,IFNGR2,Interferon receptor family


Contents of filtered_oe_above_90.csv:


Unnamed: 0,Rank,Score,Type,ID,Name,Description
0,1,99.98,oe,ccsbBroad304_01966,RUVBL1,ATPases / AAA-type
1,3,99.96,oe,ccsbBroad304_00841,IKBKB,IKK family
2,6,99.82,oe,ccsbBroad304_07136,WNT9A,Wingless-type MMTV integration sites
3,7,99.82,oe,ccsbBroad304_00101,RHOC,-
4,9,99.79,oe,ccsbBroad304_01623,SUPT4H1,-
5,10,99.77,oe,ccsbBroad304_00122,ATP1B1,ATPases / P-type
6,12,99.72,oe,ccsbBroad304_06642,NIT1,-
7,13,99.71,oe,ccsbBroad304_07147,ZNF8,"Zinc fingers, C2H2-type"
8,16,99.39,oe,ccsbBroad304_02322,ZNF263,"Zinc fingers, C2H2-type"
9,23,99.16,oe,ccsbBroad304_00580,FLT1,Type IV RTKs: VEGF (vascular endothelial growt...


Contents of filtered_cc_above_90.csv:


Unnamed: 0,Rank,Score,Type,ID,Name,Description
0,46,98.21,cc,,Aldehyde dehydrogenases LOF,-
1,80,96.15,cc,,Aminoacyl tRNA synthetases class II LOF,-
2,86,95.64,cc,,Mediator complex LOF,-


### Step 2: Generating Files for SwissADME

- **Input:** `filtered_cp_above_90.csv`

- **Output:**
  - `compounds.csv`: A CSV file that includes compounds along with their names, PubChem CIDs, and SMILES notations.
  - `molecules_for_adme.txt`: A text file containing SMILES notations formatted specifically for input into SwissADME.
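
As a rough sketch of the text format described above, the helper below builds one line per molecule: the SMILES string first, then a space, then the name. The function name `to_swissadme_lines` and the choice to replace spaces inside names with underscores are assumptions made for this illustration and are not part of the original workflow.

```python
def to_swissadme_lines(records):
    """Build 'SMILES name' lines from (name, cid, smiles) records.

    Hypothetical helper: spaces inside names are replaced with underscores
    (an assumption) so each line contains exactly one separating space.
    """
    lines = []
    for name, _cid, smiles in records:
        if not smiles:
            continue  # skip compounds without a structure
        safe_name = "_".join(str(name).split())
        lines.append(f"{smiles} {safe_name}")
    return lines


records = [
    ("triclosan", 5564, "C1=CC(=C(C=C1Cl)O)OC2=C(C=C(C=C2)Cl)Cl"),
    ("example compound", 0, "CCO"),   # made-up entry to show name handling
    ("no structure", 1, None),        # made-up entry that is skipped
]
swissadme_lines = to_swissadme_lines(records)
print("\n".join(swissadme_lines))
```

Keeping the formatting in one function makes it easier to adjust if names in the data turn out to contain characters that the target tool does not accept.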

In [6]:
import requests
import csv
import time

# Step 2: Generating Files for SwissADME
# Input: filtered_cp_above_90.csv
# Output: 'compounds.csv' and 'molecules_for_adme.txt'

df = pd.read_csv('filtered_cp_above_90.csv')
filtered_df = df[df['Score'] > 90]

# Extracting drug names list
drug_names = filtered_df['Name'].tolist()

from urllib.parse import quote

def get_cid(name):
    """Fetch the first PubChem CID for the given compound name, or None."""
    # Percent-encode the name so spaces and reserved characters are safe in the URL path
    encoded = quote(str(name), safe='')
    url = f'https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/{encoded}/cids/JSON'
    response = requests.get(url, timeout=30)  # timeout so a stalled request cannot hang the loop
    if response.status_code == 200:
        data = response.json()
        cids = data.get('IdentifierList', {}).get('CID', [])
        return cids[0] if cids else None
    return None

def get_smiles(cid):
    """Fetch the ConnectivitySMILES string for the given CID, or None."""
    url = f'https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/{cid}/property/ConnectivitySMILES/JSON'
    response = requests.get(url, timeout=30)
    if response.status_code == 200:
        data = response.json()
        props = data.get('PropertyTable', {}).get('Properties', [])
        # Return the property only if PubChem included it in the first record
        return props[0].get('ConnectivitySMILES') if props else None
    return None

output = []
output_for_adme = []

for name in drug_names:
    cid = get_cid(name)
    if cid:
        smiles = get_smiles(cid)
        if smiles:
            output.append([name, cid, smiles])
            output_for_adme.append(f"{smiles} {name}")
            print(f"✅ {name}: CID={cid}, SMILES found -> {smiles}")
        else:
            print(f"❌ {name}: SMILES not found for CID {cid}")
    else:
        print(f"❌ {name}: CID not found")
    time.sleep(0.2)  # Prevent request rate limits

# Save CSV file with compounds having SMILES
with open('compounds.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Compound Name', 'CID', 'SMILES Notation'])
    writer.writerows(output)

print("\nCSV file 'compounds.csv' created successfully with compounds that have SMILES.")

# Save text file formatted for SwissADME input
with open('molecules_for_adme.txt', 'w', encoding='utf-8') as f:
    for line in output_for_adme:
        f.write(line + "\n")

print("Text file 'molecules_for_adme.txt' created successfully for SwissADME input.")

✅ cyclopamine: CID=442972, SMILES found -> CC1CC2C(C(C3(O2)CCC4C5CC=C6CC(CCC6(C5CC4=C3C)C)O)C)NC1
✅ levonorgestrel: CID=13109, SMILES found -> CCC12CCC3C(C1CCC2(C#C)O)CCC4=CC(=O)CCC34
✅ methotrexate: CID=126941, SMILES found -> CN(CC1=CN=C2C(=N1)C(=NC(=N2)N)N)C3=CC=C(C=C3)C(=O)NC(CCC(=O)O)C(=O)O
✅ valdecoxib: CID=119607, SMILES found -> CC1=C(C(=NO1)C2=CC=CC=C2)C3=CC=C(C=C3)S(=O)(=O)N
✅ pyroxamide: CID=4996, SMILES found -> C1=CC(=CN=C1)NC(=O)CCCCCCC(=O)NO
✅ nifekalant: CID=4486, SMILES found -> CN1C(=CC(=O)N(C1=O)C)NCCN(CCCC2=CC=C(C=C2)[N+](=O)[O-])CCO
✅ indinavir: CID=5362440, SMILES found -> CC(C)(C)NC(=O)C1CN(CCN1CC(CC(CC2=CC=CC=C2)C(=O)NC3C(CC4=CC=CC=C34)O)O)CC5=CN=CC=C5
✅ triclosan: CID=5564, SMILES found -> C1=CC(=C(C=C1Cl)O)OC2=C(C=C(C=C2)Cl)Cl
✅ halcinonide: CID=443943, SMILES found -> CC1(OC2CC3C4CCC5=CC(=O)CCC5(C4(C(CC3(C2(O1)C(=O)CCl)C)O)F)C)C
✅ bemesetron: CID=671690, SMILES found -> CN1C2CCC1CC(C2)OC(=O)C3=CC(=CC(=C3)Cl)Cl
✅ profenamine: CID=3290, SMILES found -> CCN(CC)C(C)CN1C2=CC=CC=C2SC3=CC=CC=C31

In [18]:
# Load and display CSV files
compounds = pd.read_csv('compounds.csv')

# Display the contents of each DataFrame
print("Contents of compounds.csv:")
display(compounds)

Contents of compounds.csv:


Unnamed: 0,Compound Name,CID,SMILES Notation
0,cyclopamine,442972,CC1CC2C(C(C3(O2)CCC4C5CC=C6CC(CCC6(C5CC4=C3C)C...
1,levonorgestrel,13109,CCC12CCC3C(C1CCC2(C#C)O)CCC4=CC(=O)CCC34
2,methotrexate,126941,CN(CC1=CN=C2C(=N1)C(=NC(=N2)N)N)C3=CC=C(C=C3)C...
3,valdecoxib,119607,CC1=C(C(=NO1)C2=CC=CC=C2)C3=CC=C(C=C3)S(=O)(=O)N
4,pyroxamide,4996,C1=CC(=CN=C1)NC(=O)CCCCCCC(=O)NO
5,nifekalant,4486,CN1C(=CC(=O)N(C1=O)C)NCCN(CCCC2=CC=C(C=C2)[N+]...
6,indinavir,5362440,CC(C)(C)NC(=O)C1CN(CCN1CC(CC(CC2=CC=CC=C2)C(=O...
7,triclosan,5564,C1=CC(=C(C=C1Cl)O)OC2=C(C=C(C=C2)Cl)Cl
8,halcinonide,443943,CC1(OC2CC3C4CCC5=CC(=O)CCC5(C4(C(CC3(C2(O1)C(=...
9,bemesetron,671690,CN1C2CCC1CC(C2)OC(=O)C3=CC(=CC(=C3)Cl)Cl


In [19]:
# Load and display the text file
with open('molecules_for_adme.txt', 'r', encoding='utf-8') as file:  # match the encoding used when writing
    molecules_for_adme_content = file.read()

print("Contents of molecules_for_adme.txt:")
print(molecules_for_adme_content)

Contents of molecules_for_adme.txt:
CC1CC2C(C(C3(O2)CCC4C5CC=C6CC(CCC6(C5CC4=C3C)C)O)C)NC1 cyclopamine
CCC12CCC3C(C1CCC2(C#C)O)CCC4=CC(=O)CCC34 levonorgestrel
CN(CC1=CN=C2C(=N1)C(=NC(=N2)N)N)C3=CC=C(C=C3)C(=O)NC(CCC(=O)O)C(=O)O methotrexate
CC1=C(C(=NO1)C2=CC=CC=C2)C3=CC=C(C=C3)S(=O)(=O)N valdecoxib
C1=CC(=CN=C1)NC(=O)CCCCCCC(=O)NO pyroxamide
CN1C(=CC(=O)N(C1=O)C)NCCN(CCCC2=CC=C(C=C2)[N+](=O)[O-])CCO nifekalant
CC(C)(C)NC(=O)C1CN(CCN1CC(CC(CC2=CC=CC=C2)C(=O)NC3C(CC4=CC=CC=C34)O)O)CC5=CN=CC=C5 indinavir
C1=CC(=C(C=C1Cl)O)OC2=C(C=C(C=C2)Cl)Cl triclosan
CC1(OC2CC3C4CCC5=CC(=O)CCC5(C4(C(CC3(C2(O1)C(=O)CCl)C)O)F)C)C halcinonide
CN1C2CCC1CC(C2)OC(=O)C3=CC(=CC(=C3)Cl)Cl bemesetron
CCN(CC)C(C)CN1C2=CC=CC=C2SC3=CC=CC=C31 profenamine
C1C(C2=C(C=C(C=C2Cl)Cl)NC1C(=O)O)NC(=O)NC3=CC=CC=C3 L-689560
CC1=C(C=C(C=C1)C(=O)NC2=CC(=CC(=C2)NC(=O)C=C)C(F)(F)F)NC(=O)C3=CC=NO3 QL-XI-92
CCOC(=O)C12CC1C(=NO)C3=CC=CC=C3O2 CPCCOEt
CC1=C(C(=CC=C1)Cl)NC(=O)C2=CN=C(S2)NC3=CC(=NC(=N3)C)N4CCN(CC4)CCO dasatinib
CN(C1CCC

### Step-by-Step Explanation

#### Step 1: Filtering Data
1. **Read the Input File:** Load the data from `export.csv`.
2. **Filter Rows:** Keep the rows whose `Type` is `cp`, `kd`, `oe`, or `cc` and whose `Score` is above 90.
3. **Save Filtered Data:** Write each subset to its own CSV file:
   - `filtered_cp_above_90.csv` for type `cp`.
   - `filtered_kd_above_90.csv` for type `kd`.
   - `filtered_oe_above_90.csv` for type `oe`.
   - `filtered_cc_above_90.csv` for type `cc`.
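
The filtering steps above can also be written as a single loop over the types. The following is a minimal sketch on a small made-up DataFrame; the function name `split_by_type` and the sample rows are illustrative assumptions, and the `to_csv` call is left as a comment so the sketch writes no files.

```python
import pandas as pd

# Small made-up sample with the columns the filter relies on
sample = pd.DataFrame({
    "Score": [99.88, 99.98, 85.0, 98.21, -99.96],
    "Type": ["cp", "kd", "cp", "cc", "cp"],
    "Name": ["cyclopamine", "TSC22D1", "low_scorer",
             "Mediator complex LOF", "aprepitant"],
})


def split_by_type(data, types=("cp", "kd", "oe", "cc"), threshold=90):
    """Return {type: rows of that type with Score > threshold}."""
    high = data[data["Score"] > threshold]
    return {t: high[high["Type"] == t] for t in types}


subsets = split_by_type(sample)
for type_name, subset in subsets.items():
    # Writing each subset would mirror the four separate to_csv calls:
    # subset.to_csv(f"filtered_{type_name}_above_90.csv", index=False)
    print(type_name, len(subset))
```

A type with no qualifying rows (here `oe`) still yields an empty DataFrame, so the corresponding file would contain only the header row.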

#### Step 2: Fetching CID and SMILES
1. **Read the Filtered File:** Load the filtered data from `filtered_cp_above_90.csv`.
2. **Extract Drug Names:** Create a list of drug names from the filtered data.
3. **Fetch CID and SMILES:**
   - Use the PubChem API to obtain the CID (unique identifier) for each drug name.
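   - Use the CID to retrieve the SMILES string (a text representation of the chemical structure) from PubChem. The code requests the `ConnectivitySMILES` property, which encodes atom connectivity without stereochemistry.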
4. **Save Output Files:**
   - `compounds.csv`: Contains drug name, CID, and SMILES notation.
   - `molecules_for_adme.txt`: Contains SMILES formatted for input into SwissADME.
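
To make the lookup steps above easier to test without network access, the URL construction and the response parsing can be separated from the HTTP call. The sketch below assumes the same response shapes that the notebook code reads (`IdentifierList`/`CID` and `PropertyTable`/`Properties`); the helper names and the hand-written payloads are hypothetical.

```python
from urllib.parse import quote

PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound"


def cid_url(name):
    """Build the name-to-CID lookup URL, percent-encoding the name."""
    return f"{PUG_BASE}/name/{quote(name, safe='')}/cids/JSON"


def first_cid(payload):
    """Return the first CID in a parsed response, or None if absent."""
    cids = payload.get("IdentifierList", {}).get("CID", [])
    return cids[0] if cids else None


def first_smiles(payload, key="ConnectivitySMILES"):
    """Return the requested SMILES property of the first record, or None."""
    props = payload.get("PropertyTable", {}).get("Properties", [])
    return props[0].get(key) if props else None


# Hand-written payloads shaped like the responses the notebook code expects
cid_payload = {"IdentifierList": {"CID": [5564]}}
smiles_payload = {
    "PropertyTable": {
        "Properties": [
            {"CID": 5564,
             "ConnectivitySMILES": "C1=CC(=C(C=C1Cl)O)OC2=C(C=C(C=C2)Cl)Cl"}
        ]
    }
}

print(cid_url("some name"))
print(first_cid(cid_payload), first_smiles(smiles_payload))
```

With this split, the network call reduces to fetching `cid_url(name)` and passing the decoded JSON to `first_cid`, and the parsing logic can be checked against saved responses.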

#### Imported Libraries
- **pandas:** For reading and processing CSV files.
- **requests:** For making HTTP requests to the PubChem API.
- **csv:** For writing CSV files.
- **time:** For adding delays between API requests to avoid rate limiting.
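
The notebook loop makes two requests per compound and then sleeps once. One possible alternative, sketched here under assumptions, is to enforce a minimum gap before every request. The `Throttle` class is a hypothetical helper, and the 0.2-second interval simply reuses the delay from the loop.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without real delays
        self._sleep = sleep
        self._last = None      # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(min_interval=0.2)
for _ in range(3):
    throttle.wait()  # call immediately before each HTTP request
```

Because the delay is measured from the previous call, time already spent waiting on a slow response counts toward the interval, so the loop does not sleep longer than necessary.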

### Conclusion

The workflow runs in two steps. The first step filters `export.csv` by `Type` and by `Score` above 90, writing one CSV file per type so that each subset can be analyzed separately.

The second step queries the PubChem API for the CID and SMILES notation of each compound in `filtered_cp_above_90.csv`. These structures are the input that SwissADME needs to assess the pharmacokinetic properties of the compounds.

Together, the two steps turn a raw list of drug repurposing results into a set of compound structures that is ready for further analysis in SwissADME.