*This notebook uses a virtual environment (venv) created with uv.*
- https://docs.astral.sh/uv/guides/integration/jupyter/#using-jupyter-from-vs-code

In [1]:
import pandas as pd

In [2]:
data = pd.read_csv("CYP3A4_strong_substrates")
data

Unnamed: 0,generic_drug_name,cyp_strength_of_evidence,drug_class,common_adverse_effects^^,less_common_adverse_effects^,first_ref,second_ref,date_checked
0,carbamazepine,strong,antiepileptics,"constipation^^, leucopenia^^, dizziness^^, som...","eosinophilia^, thrombocytopenia^, neutropenia^...",drugs.com,nzf,211024.0
1,eliglustat,strong,metabolic_agents,"diarrhea^^, oropharyngeal_pain^^, arthralgia^^...","rash^, flatulence^, dyspepsia^, gastroesophage...",drugs.com,emc,151124.0
2,flibanserin,strong,CNS_agents,"dizziness^^, somnolence^^","sedation^, fatigue^, vertigo^, accidental_inju...",drugs.com,Drugs@FDA,161124.0
3,imatinib,strong,tyrosine_kinase_inhibitor,"rash^^, diarrhea^^, abdominal_pain^^, constipa...","flushing^, pruritus^, face_edema^, dry skin^, ...",drugs.com,nzf,181124.0
4,ibrutinib,strong,tyrosine_kinase_inhibitor,"hypertension^^, atrial_fibrillation^^, sinus_t...","atrial_flutter^, cardiac_failure(pm)^, ventric...",drugs.com,nzf,191124.0
5,neratinib,strong,tyrosine_kinase_inhibitor,"diarrhea^^, abdominal_pain^^, stomatitis^^, dy...","abdominal_distention^, dry_mouth^, nail_disord...",drugs.com,nzf,201124.0
6,esomeprazole,strong,proton_pump_inhibitors,"headache^^, flatulence^^","dizziness^, somnolence^, taste_disturbance/per...",drugs.com,emc,161124.0
7,omeprazole,strong,proton_pump_inhibitors,"fever^^, otitis_media^^, respiratory_system_re...","accidental_injury^, asthenia^, pain(pm), fatig...",drugs.com,nzf,181124.0
8,ivacaftor,strong,CFTR_potentiator,"rash^^, oropharyngeal_pain^^, abdominal_pain^^...","acne^, increased_hepatic_enzymes^, increased_b...",drugs.com,nzf,201124.0
9,naloxegol,strong,peripheral_opioid_receptor_antagonists,abdominal pain^^,"possible_opioid_withdrawal_syndrome^, diarrhea...",drugs.com,emc,211124.0


In [3]:
# Drop the columns not needed for the ADR embedding trial
df = data.drop([
    "cyp_strength_of_evidence", 
    "drug_class", 
    "less_common_adverse_effects^", 
    "first_ref", 
    "second_ref", 
    "date_checked"
    ], axis=1)
df

Unnamed: 0,generic_drug_name,common_adverse_effects^^
0,carbamazepine,"constipation^^, leucopenia^^, dizziness^^, som..."
1,eliglustat,"diarrhea^^, oropharyngeal_pain^^, arthralgia^^..."
2,flibanserin,"dizziness^^, somnolence^^"
3,imatinib,"rash^^, diarrhea^^, abdominal_pain^^, constipa..."
4,ibrutinib,"hypertension^^, atrial_fibrillation^^, sinus_t..."
5,neratinib,"diarrhea^^, abdominal_pain^^, stomatitis^^, dy..."
6,esomeprazole,"headache^^, flatulence^^"
7,omeprazole,"fever^^, otitis_media^^, respiratory_system_re..."
8,ivacaftor,"rash^^, oropharyngeal_pain^^, abdominal_pain^^..."
9,naloxegol,abdominal pain^^


In [None]:
# To consider: save the ADRs for each drug as lists of sentences (i.e. write a function that does this for every drug)
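One possible shape for such a function is sketched below. The helper names `split_adrs` and `adrs_by_drug` are hypothetical, and the sketch assumes ADR terms are always separated by commas, as in the table above.

```python
def split_adrs(adr_string):
    """Split a comma-separated ADR string into a list of ADR terms."""
    # Strip whitespace around each term and skip empty pieces
    return [term.strip() for term in adr_string.split(",") if term.strip()]


def adrs_by_drug(drug_names, adr_strings):
    """Map each drug name to its list of ADR terms."""
    return {name: split_adrs(adrs) for name, adrs in zip(drug_names, adr_strings)}


# Two rows copied from the table above (flibanserin and esomeprazole)
names = ["flibanserin", "esomeprazole"]
adrs = ["dizziness^^, somnolence^^", "headache^^, flatulence^^"]

adr_lists = adrs_by_drug(names, adrs)
print(adr_lists["flibanserin"])  # ['dizziness^^', 'somnolence^^']
```

With the DataFrame, something like `adrs_by_drug(df["generic_drug_name"], df["common_adverse_effects^^"])` should give one list per drug, assuming neither column has missing values.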

In [None]:
## Example: trial of generating tensors from the ADRs of one drug, e.g. terfenadine

import torch
import torch.nn as nn
from collections import Counter

torch.manual_seed(1)

sentence = "dizziness^^, syncopal_episodes^^, palpitations^, ventricular_arrhythmias^^, cardiac_arrest^^, cardiac_death^^, headaches^"

words = sentence.split(', ')
words

['dizziness^^',
 'syncopal_episodes^^',
 'palpitations^',
 'ventricular_arrhythmias^^',
 'cardiac_arrest^^',
 'cardiac_death^^',
 'headaches^']

In [6]:
# Count how often each ADR term occurs (Counter is a dict subclass)
vocab = Counter(words) 
vocab

Counter({'dizziness^^': 1,
         'syncopal_episodes^^': 1,
         'palpitations^': 1,
         'ventricular_arrhythmias^^': 1,
         'cardiac_arrest^^': 1,
         'cardiac_death^^': 1,
         'headaches^': 1})

In [7]:
# Sorting a Counter returns a sorted list of its keys (the unique ADR terms)
vocab = sorted(vocab)
vocab

['cardiac_arrest^^',
 'cardiac_death^^',
 'dizziness^^',
 'headaches^',
 'palpitations^',
 'syncopal_episodes^^',
 'ventricular_arrhythmias^^']

In [8]:
vocab_size = len(vocab)
vocab_size

7

In [9]:

# create a word to index dictionary from the vocab
word2idx = {word: ind for ind, word in enumerate(vocab)} 

encoded_sentences = [word2idx[word] for word in words]
encoded_sentences

[2, 5, 4, 6, 0, 1, 3]
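The same encoding could be extended to several drugs at once. The sketch below is only an illustration: the `<pad>` token, the choice of index 0 for it, and the helper names `build_vocab` and `encode_and_pad` are all assumptions, not part of the original notebook. It builds one shared vocabulary and right-pads shorter ADR lists so every drug has the same number of indices.

```python
PAD = "<pad>"  # hypothetical padding token, not part of the original data


def build_vocab(adr_lists):
    """Build a word-to-index dict over all drugs, reserving index 0 for PAD."""
    terms = sorted({term for adr_list in adr_lists for term in adr_list})
    return {word: idx for idx, word in enumerate([PAD] + terms)}


def encode_and_pad(adr_lists, word2idx):
    """Encode each ADR list as indices, right-padded to the longest list."""
    max_len = max(len(adr_list) for adr_list in adr_lists)
    pad_idx = word2idx[PAD]
    return [
        [word2idx[term] for term in adr_list] + [pad_idx] * (max_len - len(adr_list))
        for adr_list in adr_lists
    ]


adr_lists = [
    ["dizziness^^", "somnolence^^"],   # flibanserin
    ["headache^^", "flatulence^^"],    # esomeprazole
    ["abdominal pain^^"],              # naloxegol
]

word2idx = build_vocab(adr_lists)
encoded = encode_and_pad(adr_lists, word2idx)
print(encoded)  # [[2, 5], [4, 3], [1, 0]]
```

Reserving a separate index for padding means no real ADR term has to share its index with the padding entry.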

In [10]:
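# Set the embedding dimension (embedding_dim); what size is typical is still an open question
e_dim = 5

# Initialise an Embedding layer from Torch
# Note: padding_idx=3 is the index of 'headaches^' in word2idx, so that term maps to
# an all-zero vector that is not updated during training; a dedicated padding token
# with its own reserved index would avoid this
emb = nn.Embedding(vocab_size, e_dim, padding_idx=3)
word_vectors = emb(torch.tensor(encoded_sentences, dtype=torch.long))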
word_vectors

tensor([[-0.7773, -0.2515, -0.2223,  1.6871,  0.2284],
        [-0.7765,  2.0242, -0.0288,  2.3571, -1.0373],
        [ 0.1991,  0.0457,  0.1530, -0.4757, -1.8821],
        [ 1.5748, -0.6298,  2.4070,  0.2786,  0.2468],
        [-1.5256, -0.7502, -0.6540, -1.6095, -0.1002],
        [-0.6092, -0.9798, -1.6091, -0.7121,  0.3037],
        [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],
       grad_fn=<EmbeddingBackward0>)
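The all-zero last row in the output above belongs to 'headaches^', because its index (3) was passed as `padding_idx`. A minimal sketch of an alternative, assuming index 0 is reserved for a padding entry and using an illustrative vocabulary size, is shown below. It also averages the embeddings of the real ADR terms into one vector per drug, which is just one possible pooling choice.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)

PAD_IDX = 0      # assumed: index 0 is reserved for padding
vocab_size = 6   # 5 ADR terms + 1 padding entry (illustrative)
e_dim = 5

emb = nn.Embedding(vocab_size, e_dim, padding_idx=PAD_IDX)

# Two ADR lists of different lengths, right-padded with PAD_IDX
batch = torch.tensor([[2, 5], [1, 0]], dtype=torch.long)
vectors = emb(batch)                      # shape: (2, 2, 5)

# Average only over real (non-padding) positions to get one vector per drug
mask = (batch != PAD_IDX).unsqueeze(-1)   # shape: (2, 2, 1)
drug_vectors = (vectors * mask).sum(dim=1) / mask.sum(dim=1)
print(drug_vectors.shape)  # torch.Size([2, 5])
```

Here only the padding position maps to a zero vector, so every real ADR term keeps a trainable embedding.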

In [11]:
## Some initial data manipulations (may not be needed)

# Change data types for ADRs
# df = df.astype({"generic_drug_name": "string", "common_adverse_effects^^": "string"})
# df.dtypes

# Expand the common ADR column into one column per ADR
# (split on ", " because the default splits on whitespace)
# adr = df["common_adverse_effects^^"].str.split(", ", expand=True)

# Alternative: split into lists, then explode into one row per ADR
# data["common_adverse_effects^^"].str.split(", ").explode()

# Merge dfs
# df = df.join(adr)

# df = df.drop(["common_adverse_effects^^"], axis=1)

# df.stack(future_stack=True)

Structure-adverse drug reaction relationships: **ADRs <-> (dense vectors) <-> 2D drug structures**
Structure-activity relationships: drug activities <-> 2D drug structures

1. trial generating tensors of ADRs for one drug
- https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html

2. building a NN model either to classify drugs in an ADR dataset (e.g. identify drugs in different therapeutic classes) or to predict the ADRs of drugs (regression); whether to use classification or regression is still to be decided
- to infer possible drug vs. ADR relationships

3. 2D drug structures part (much further down the line) - graph neural networks (GNN): molecules as undirected graphs of nodes (atoms) and edges (bonds), where the order in which the atoms and bonds are listed doesn't matter (i.e. they don't need to be in a particular order or sequence)
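
As a rough illustration of the graph idea in point 3, the sketch below stores the heavy atoms of ethanol as a plain list of atoms and a list of bonds, then checks that listing the atoms in a different order describes the same molecule. The helper `neighbour_elements` is hypothetical and this is not a GNN, only the data structure one would be built on.

```python
# Heavy atoms of ethanol (C-C-O) as an undirected graph; hydrogens omitted
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]


def neighbour_elements(atoms, bonds):
    """For each atom, return (its element, sorted elements of its neighbours)."""
    neighbours = [[] for _ in atoms]
    for i, j in bonds:
        # Undirected: record the bond in both directions
        neighbours[i].append(atoms[j])
        neighbours[j].append(atoms[i])
    return [(atoms[i], tuple(sorted(n))) for i, n in enumerate(neighbours)]


# Same molecule with atoms listed in a different order (O-C-C)
atoms_reordered = ["O", "C", "C"]
bonds_reordered = [(1, 0), (2, 1)]

original = sorted(neighbour_elements(atoms, bonds))
reordered = sorted(neighbour_elements(atoms_reordered, bonds_reordered))
print(original == reordered)  # True
```

Which atoms are bonded still matters; only the numbering of atoms and the order of the bond list are arbitrary.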