# **MediSure**


OR

# **SafeRx**

$$PROPOSAL$$



### **Objective:**  

The primary objective of this project is to develop a predictive model that evaluates the safety of medications for individual users by analyzing their medical profiles, current medicines, and allergies. The model will provide recommendations for both chemical and herbal alternatives to prescribed medicines, while cross-checking potential allergic reactions between medications. Additionally, it will offer guidance on foods that should be avoided while taking specific medications, ensuring users make informed decisions on medicine safety, alternatives, and dietary considerations.


$$ ~~~~~~~ $$



### **Tools:**

- **Pandas** for data cleaning and manipulation.

- **Scikit-learn** for building and training predictive models (Logistic Regression, Random Forest).

- **XGBoost** for efficient and scalable gradient boosting.

- **NLP libraries** (spaCy/Transformers) for processing unstructured data related to medical profiles and symptoms.

- **Flask/Django** to build the web interface and REST API for user interaction.

- **PostgreSQL/MongoDB** for storing user profiles, medicine information, and interactions.




$$ ~~~~~~~ $$




### **Expected Outcomes:**

- A user-friendly application that allows individuals to input their medical profiles and receive immediate feedback on medication safety.
- Recommendations for both chemical and herbal alternative medicines, along with safe food options based on allergies and current medications.
- A comprehensive report on potential side effects and safety considerations, offering users insights into their medications and health.


____

# EDA

In [1]:
import pandas as pd
import numpy as np

import os

base_path = '/Users/raynasser/code/raynasser/MediSure/data/raw_data/FooDru'


# 1

In [2]:

df = pd.read_csv(os.path.join(base_path, 'cmap_foodrugs.csv'), index_col= 'Unnamed: 0')

df

Unnamed: 0,node_id,cmap_node_id,tau
0,482,4408,53.0753
1,482,22960,-38.0148
2,482,37226,-28.8623
3,482,64330,44.9318
4,482,476,-40.0986
...,...,...,...
26365249,10,60894,-87.0948
26365250,10,4295,-28.9109
26365251,10,47295,-78.3002
26365252,10,49360,-36.3977


In [3]:
# df.info()

print(f'\n\n{df.columns}\n\n')

'{:,}'.format(df.shape[0])



Index(['node_id', 'cmap_node_id', 'tau'], dtype='object')




'26,365,254'

In [4]:
import os

[os.listdir("../data/raw_data")]

[['FooDru',
  'medicine_dataset.csv',
  '.DS_Store',
  'pharma_safe',
  'PubChemLite_31Oct2020_exposomics.csv',
  'Allergen_Status_of_Food_Products.csv',
  'drug_relation_dfe.csv',
  'drugs_allergic_interaction',
  'Medicine.csv']]

In [5]:
df.dtypes.unique()

array([dtype('int64'), dtype('float64')], dtype=object)

In [6]:
df.dtypes.value_counts()


int64      2
float64    1
Name: count, dtype: int64

In [7]:
df.describe()#include=['O'])

Unnamed: 0,node_id,cmap_node_id,tau
count,26365250.0,26365250.0,26365250.0
mean,264.3881,35505.59,-0.5751061
std,155.9492,20468.94,60.91786
min,3.0,0.0,-100.0
25%,140.0,17813.0,-56.7284
50%,245.0,35508.0,-4.48822
75%,379.0,53280.0,55.5919
max,672.0,70894.0,100.0


In [8]:
'{:,}'.format(df.duplicated().sum())

'1,396,277'

In [9]:
pd.DataFrame(df.isna().sum())

Unnamed: 0,0
node_id,0
cmap_node_id,0
tau,0


# 2

### Analysis


In [10]:

df2 = pd.read_csv('/Users/raynasser/Downloads/DSproj/FooDru/cmap.csv')

df2

Unnamed: 0,sample_id,study_id,treatment,origin_type,origin_name,GSM,time_point,concentration
0,0,0,cranberry extract,cell type,endothelial cells,129143,6 hours,-
1,1,0,grapeseed extract,cell type,endothelial cells,129152,6 hours,-
2,2,0,control,cell type,endothelial cells,129146,6 hours,-
3,3,0,red wine extract,cell type,endothelial cells,129149,6 hours,-
4,4,0,cranberry extract,cell type,endothelial cells,129147,6 hours,-
...,...,...,...,...,...,...,...,...
995,1021,38,blackberry digested extract + H202,cell line,SK-N-MC,2695047,24 hours,0.5 µg GAE + 300µM
996,1022,38,rubus vagabundus digested extract + H202,cell line,SK-N-MC,2695050,24 hours,0.5 µg GAE + 300µM
997,1023,38,blackberry digested extract,cell line,SK-N-MC,2695036,24 hours,0.5 µg GAE/mL
998,1024,38,control,cell line,SK-N-MC,2695033,24 hours,0


In [11]:
# df.info()

df2.columns

Index(['sample_id', 'study_id', 'treatment', 'origin_type', 'origin_name',
       'GSM', 'time_point', 'concentration'],
      dtype='object')

In [12]:
df2.dtypes.unique()

array([dtype('int64'), dtype('O')], dtype=object)

In [13]:
df2.dtypes.value_counts()

object    5
int64     3
Name: count, dtype: int64

In [14]:
df2.describe(include=['O'])

Unnamed: 0,treatment,origin_type,origin_name,time_point,concentration
count,812,972,972,812,812
unique,60,4,42,25,48
top,control,tissue,Periferal blood mononuclear cells,24 hours,0
freq,270,482,160,236,271


In [15]:
df2.duplicated().sum()

np.int64(0)

In [16]:
pd.DataFrame(df2.isna().sum())

Unnamed: 0,0
sample_id,0
study_id,0
treatment,188
origin_type,28
origin_name,28
GSM,0
time_point,188
concentration,188


# 3


In [17]:

df3 = pd.read_csv('/Users/raynasser/Downloads/DSproj/FooDru/misc_sample.csv', index_col= 'Unnamed: 0')

df3

Unnamed: 0,sample_id,attribute_name,attribute_value
0,0,Characteristics,cell culture experiment
1,1,Characteristics,cultured cells
2,2,Characteristics,cultured cells
3,3,Characteristics,cultured cells
4,4,Characteristics,cultured cells
...,...,...,...
6841,7733,Color,Cy3
6842,7734,Color,Cy5
6843,7735,Color,Cy3
6844,7736,Color,Cy5


In [18]:
# df.info()

df3.columns

Index(['sample_id', 'attribute_name', 'attribute_value'], dtype='object')

In [19]:
df3.dtypes.unique()

array([dtype('int64'), dtype('O')], dtype=object)

In [20]:
df3.dtypes.value_counts()

object    2
int64     1
Name: count, dtype: int64

In [21]:
df3.describe(include=['O'])

Unnamed: 0,attribute_name,attribute_value
count,6846,6846
unique,48,691
top,Id,Cy3
freq,1115,556


In [22]:
df3.duplicated().sum()

np.int64(0)

In [23]:
pd.DataFrame(df3.isna().sum())

Unnamed: 0,0
sample_id,0
attribute_name,0
attribute_value,0


#  4

In [24]:

df4 = pd.read_csv('/Users/raynasser/Downloads/DSproj/FooDru/misc_study.csv', index_col= 'Unnamed: 0')

df4

Unnamed: 0,study_id,attribute_name,attribute_value
0,2,gener,Female
1,2,bmi,Lean
2,2,Source name,Milk Fat Globule
3,3,cell line,CCD-18Co
4,3,cell type,Normal human colon fibroblasts cells
...,...,...,...
187,136,cell type,epithelial
188,143,Source name,cultured 3D oesophageal epithelium
189,151,Characteristics,GM5659 human primary skin fibroblasts
190,158,cell line,CUTLL1


In [25]:
# df.info()

df4.columns

Index(['study_id', 'attribute_name', 'attribute_value'], dtype='object')

In [26]:
df4.dtypes.unique()

array([dtype('int64'), dtype('O')], dtype=object)

In [27]:
df4.dtypes.value_counts()

object    2
int64     1
Name: count, dtype: int64

In [28]:
df4.describe(include=['O'])

Unnamed: 0,attribute_name,attribute_value
count,192,192
unique,50,149
top,cell line,female
freq,43,7


In [29]:
df4.duplicated().sum()

np.int64(0)

In [30]:
pd.DataFrame(df4.isna().sum())

Unnamed: 0,0
study_id,0
attribute_name,0
attribute_value,0


# 5

In [31]:

df5 = pd.read_csv('/Users/raynasser/Downloads/DSproj/FooDru/nodes.csv', index_col= 'Unnamed: 0')

df5

Unnamed: 0,node_id,sample_id
0,2,1136
1,2,1137
2,2,1138
3,2,1139
4,2,1140
...,...,...
2612,736,7734
2613,736,7736
2614,738,7763
2615,738,7769


In [32]:
# df.info()

df5.columns

Index(['node_id', 'sample_id'], dtype='object')

In [33]:
df5.dtypes.unique()

array([dtype('int64')], dtype=object)

In [34]:
df5.dtypes.value_counts()

int64    2
Name: count, dtype: int64

In [35]:
df5.describe()

Unnamed: 0,node_id,sample_id
count,2617.0,2617.0
mean,278.90256,2632.898739
std,196.89052,1940.828582
min,2.0,0.0
25%,131.0,1242.0
50%,262.0,2273.0
75%,422.0,3371.0
max,738.0,7771.0


In [36]:
df5.duplicated().sum()

np.int64(0)

In [37]:
pd.DataFrame(df5.isna().sum())

Unnamed: 0,0
node_id,0
sample_id,0


# 6

In [38]:

df6 = pd.read_csv('/Users/raynasser/Downloads/DSproj/FooDru/sample.csv', index_col= 'Unnamed: 0')

df6

Unnamed: 0,sample_id,study_id,treatment,origin_type,origin_name,GSM,time_point,concentration
0,0,0,cranberry extract,cell type,endothelial cells,129143,6 hours,-
1,1,0,grapeseed extract,cell type,endothelial cells,129152,6 hours,-
2,2,0,control,cell type,endothelial cells,129146,6 hours,-
3,3,0,red wine extract,cell type,endothelial cells,129149,6 hours,-
4,4,0,cranberry extract,cell type,endothelial cells,129147,6 hours,-
...,...,...,...,...,...,...,...,...
4554,7774,260,Control,cell line,HL-60,15637,-,-
4555,7775,260,Control,cell line,HL-60,15453,-,-
4556,7776,260,Control,cell line,HL-60,15634,-,-
4557,7777,260,Control,cell line,HL-60,15575,-,-


In [39]:
# df.info()

df6.columns

Index(['sample_id', 'study_id', 'treatment', 'origin_type', 'origin_name',
       'GSM', 'time_point', 'concentration'],
      dtype='object')

In [40]:
df6.dtypes.unique()

array([dtype('int64'), dtype('O')], dtype=object)

In [41]:
df6.dtypes.value_counts()

object    5
int64     3
Name: count, dtype: int64

In [42]:
df6.describe(include=['O'])

Unnamed: 0,treatment,origin_type,origin_name,time_point,concentration
count,4371,4531,4531,4371,4371
unique,298,8,153,84,136
top,control,tissue,PBMC,-,0
freq,1164,1856,396,730,1337


In [43]:
df6.duplicated().sum()

np.int64(0)

In [44]:
pd.DataFrame(df6.isna().sum())

Unnamed: 0,0
sample_id,0
study_id,0
treatment,188
origin_type,28
origin_name,28
GSM,0
time_point,188
concentration,188


In [45]:
df_comp = df.dtypes.value_counts().rename('df#1')
df2_comp = df2.dtypes.value_counts().rename('df#2')
df3_comp = df3.dtypes.value_counts().rename('df#3')
df4_comp = df4.dtypes.value_counts().rename('df#4')
df5_comp = df5.dtypes.value_counts().rename('df#5')
df6_comp = df6.dtypes.value_counts().rename('df#6')



dtype_comparison = pd.concat([df_comp, df2_comp, df3_comp, df4_comp, df5_comp, df6_comp], axis=1)



dtype_comparison = dtype_comparison.fillna(0).astype(int)



dtype_comparison


Unnamed: 0,df#1,df#2,df#3,df#4,df#5,df#6
int64,2,3,1,1,2,3
float64,1,0,0,0,0,0
object,0,5,2,2,0,5


In [46]:
df_comp = df.shape
df2_comp = df2.shape
df3_comp = df3.shape
df4_comp = df4.shape
df5_comp = df5.shape
df6_comp = df6.shape



shape_comparison = pd.DataFrame({
    'DataFrame': ['df', 'df2', 'df3', 'df4', 'df5', 'df6'],
    'Rows': [df_comp[0], df2_comp[0], df3_comp[0], df4_comp[0], df5_comp[0], df6_comp[0]],
    'Columns': [df_comp[1], df2_comp[1], df3_comp[1], df4_comp[1], df5_comp[1], df6_comp[1]]
})


total_rows = shape_comparison['Rows'].sum()
total_columns = shape_comparison['Columns'].sum()




shape_comparison = shape_comparison.append({
    'DataFrame': 'Total',
    'Rows': total_rows,
    'Columns': total_columns
}, ignore_index=True)



shape_comparison

AttributeError: 'DataFrame' object has no attribute 'append'

In [47]:
# pipe =


df7 = pd.read_csv('/Users/raynasser/Downloads/DSproj/FooDru/study.csv', index_col='Unnamed: 0')
df7

df8 = pd.read_csv('/Users/raynasser/Downloads/DSproj/FooDru/TM_interactions.csv', index_col='Unnamed: 0')
df8

df9 = pd.read_csv('/Users/raynasser/Downloads/DSproj/FooDru/topTable.csv', index_col='Unnamed: 0')
df9

df10 = pd.read_csv('/Users/raynasser/Downloads/DSproj/FooDru/texts.csv', index_col='Unnamed: 0')
df10

  df8 = pd.read_csv('/Users/raynasser/Downloads/DSproj/FooDru/TM_interactions.csv', index_col='Unnamed: 0')


Unnamed: 0,texts_ID,document,link,citation
0,0,Moderate Food Interaction||GENERALLY AVOID: G...,https://www.drugs.com/food-interactions/abemac...,
1,2,Major Food Interaction||GENERALLY AVOID: Alco...,https://www.drugs.com/food-interactions/fentan...,
2,7,Moderate Food Interaction||MONITOR: Coadminis...,https://www.drugs.com/food-interactions/paclit...,
3,11,Moderate Drug Interaction||GENERALLY AVOID: E...,https://www.drugs.com/food-interactions/codein...,
4,13,Moderate Drug Interaction||GENERALLY AVOID: T...,https://www.drugs.com/food-interactions/dextro...,
...,...,...,...,...
8346,8441,The Use of a Continuous Glucose Monitoring Sys...,https://clinicaltrials.gov/ct2/show/NCT04385862,
8347,8442,Glycemic Optimization On Discharge From the Em...,https://clinicaltrials.gov/ct2/show/NCT05197829,
8348,8443,PANDA: PKU Amino Acid Evaluation Phenylketonur...,https://clinicaltrials.gov/ct2/show/NCT04086511,
8349,8444,SNAP: Study Nutrients in Adult PKU Phenylketon...,https://clinicaltrials.gov/ct2/show/NCT03858101,


In [48]:
df8[df8["texts_ID"]==8441]

Unnamed: 0,TM_interactions_ID,texts_ID,start_index,end_index,food,drug
15017,15060.0,8441.0,406.0,540.0,insulin,d-glucose


In [49]:
df8.food.map(lambda x:  type(x))

0          <class 'str'>
1          <class 'str'>
2          <class 'str'>
3          <class 'str'>
4          <class 'str'>
               ...      
1108424    <class 'str'>
1108425    <class 'str'>
1108426    <class 'str'>
1108427    <class 'str'>
1108428    <class 'str'>
Name: food, Length: 1108433, dtype: object

In [50]:
from pandas.api.types import is_numeric_dtype

s_ = df8.food.map(lambda x: 0 if type(x)==type(4.4) else 25)

In [51]:
df8['weird'] = s_

In [52]:
df8[df8.weird==0]

Unnamed: 0,TM_interactions_ID,texts_ID,start_index,end_index,food,drug,weird
6017,6017.0,2738.0,1587.0,1731.0,,atypical antipsychotics,0
143465,144080.0,73218.0,859.0,1103.0,,fentanyl,0
143466,144081.0,73218.0,859.0,1103.0,,nalbuphine,0
179086,179888.0,89993.0,917.0,974.0,,deoxyadenosine,0
189411,190261.0,93827.0,917.0,974.0,,deoxyadenosine,0
210692,211542.0,101837.0,147.0,668.0,,MBH-ME,0
260508,261358.0,120647.0,1041.0,1108.0,,serotonin,0
303719,304569.0,136823.0,1350.0,1599.0,,calcium,0
311516,312366.0,139756.0,651.0,915.0,,amiloride-induced,0
311517,312367.0,139756.0,651.0,915.0,,amiloride,0


In [53]:
s_[s_==25]

0          25
1          25
2          25
3          25
4          25
           ..
1108424    25
1108425    25
1108426    25
1108427    25
1108428    25
Name: food, Length: 1108387, dtype: int64

In [54]:
df8.food.map(lambda).unique()

SyntaxError: invalid syntax (933624930.py, line 1)

In [56]:
!pip install transformers torch pytesseract

Collecting transformers
  Downloading transformers-4.45.2-py3-none-any.whl.metadata (44 kB)
Collecting torch
  Downloading torch-2.4.1-cp310-none-macosx_11_0_arm64.whl.metadata (26 kB)
Collecting pytesseract
  Downloading pytesseract-0.3.13-py3-none-any.whl.metadata (11 kB)
Collecting filelock (from transformers)
  Downloading filelock-3.16.1-py3-none-any.whl.metadata (2.9 kB)
Collecting huggingface-hub<1.0,>=0.23.2 (from transformers)
  Using cached huggingface_hub-0.25.2-py3-none-any.whl.metadata (13 kB)
Collecting regex!=2019.12.17 (from transformers)
  Using cached regex-2024.9.11-cp310-cp310-macosx_11_0_arm64.whl.metadata (40 kB)
Collecting safetensors>=0.4.1 (from transformers)
  Downloading safetensors-0.4.5-cp310-cp310-macosx_11_0_arm64.whl.metadata (3.8 kB)
Collecting tokenizers<0.21,>=0.20 (from transformers)
  Downloading tokenizers-0.20.1-cp310-cp310-macosx_11_0_arm64.whl.metadata (6.7 kB)
Collecting tqdm>=4.27 (from transformers)
  Using cached tqdm-4.66.5-py3-none-any.whl

In [59]:
from transformers import pipeline

# Load the zero-shot classification model
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Text to classify
# text = "This is a description of a new medication for treating high blood pressure."

# # Labels to classify into
# labels = ["food", "drug"]

# # Classify the text
# result = classifier(text, candidate_labels=labels)

# print(result)


KeyboardInterrupt: 

In [63]:
pipe = pipeline('question-answering', model= 'deepset/roberta-base-squad2')


res9 = pipe(question = 'Can you tell me if this is a food?',context="insuline")
res9

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


{'score': 0.005783424712717533, 'start': 0, 'end': 8, 'answer': 'insuline'}

In [70]:
import requests
import json

def is_food(text):

    # Define the API key and endpoint
    API_KEY = os.environ.get("API_KEY")
    API_URL = os.environ.get("API_URL")

    # Set up headers for the API request
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {API_KEY}',
    }

    # Define the prompt or conversation for the model
    data = {
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "system", "content": "You are ChatGPT, a helpful assistant."},
            {"role": "user", "content": f"Can you tell me if '{text}' is a food or a drug? only answer by food or drug"}
        ]
    }

    # Send the API request
    response = requests.post(API_URL, headers=headers, data=json.dumps(data))

    # Check if the request was successful
    if response.status_code == 200:
        result = response.json()
        print("Response from ChatGPT:", result['choices'][0]['message']['content'])
        return result['choices'][0]['message']['content'].lower()
    else:
        print(f"Error: {response.status_code}")
        print(response.text)
        return ''


In [149]:
!pip install openai

Collecting openai
  Downloading openai-1.51.2-py3-none-any.whl.metadata (24 kB)
Collecting distro<2,>=1.7.0 (from openai)
  Using cached distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)
Collecting jiter<1,>=0.4.0 (from openai)
  Downloading jiter-0.6.1-cp310-cp310-macosx_11_0_arm64.whl.metadata (5.2 kB)
Collecting pydantic<3,>=1.9.0 (from openai)
  Using cached pydantic-2.9.2-py3-none-any.whl.metadata (149 kB)
Collecting annotated-types>=0.6.0 (from pydantic<3,>=1.9.0->openai)
  Using cached annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
Collecting pydantic-core==2.23.4 (from pydantic<3,>=1.9.0->openai)
  Using cached pydantic_core-2.23.4-cp310-cp310-macosx_11_0_arm64.whl.metadata (6.6 kB)
Downloading openai-1.51.2-py3-none-any.whl (383 kB)
Using cached distro-1.9.0-py3-none-any.whl (20 kB)
Downloading jiter-0.6.1-cp310-cp310-macosx_11_0_arm64.whl (301 kB)
Using cached pydantic-2.9.2-py3-none-any.whl (434 kB)
Using cached pydantic_core-2.23.4-cp310-cp310-macosx_11_0_arm64.whl (

In [234]:
import openai

def tau_score(text):

    # Define the API key and endpoint
    API_KEY = os.environ.get("API_KEY")


    # Set up headers for the API request
    client = openai.OpenAI(api_key=API_KEY)


    # run = client.beta.threads.runs.create(
    # thread_id="thread_1LsGTC2QCPTf1AU0S1fVMvZW",
    # assistant_id="asst_fjluyXIaT4Q7tNQ3nJrybazU",
    # additional_instructions=text
    # #extra_query=text
    # )


    completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "assistant", "content": """You are tasked with elevating interaction between food and drugs fron a summary from a study document.
Analyze and identify the food and the drug in the text and the kind of relationship between the two of them(improving health, neutral, degrading health).
Return a tau score that will reflect the power of the interaction between the 2 elements and also if it's negative or positive
return the result through a json object, The response must be in the following JSON format:
            {
                "food": "",
                "drug": "",
                "relationship": "",
                "tau": 0
            }
            Return only the JSON object. """} ,
        {
            "role": "user",
            "content": text
        }
    ],
        response_format= { "type": "json_object" }
)

    return json.loads(completion.choices[0].message.content)

In [235]:
comp_ = tau_score("High levels of potassium can cause hyperkalemia in some patients who are taking angiotensin inhibitors.")

In [236]:
comp_["tau"]

-0.7

In [215]:
comp_

ChatCompletion(id='chatcmpl-AIvkQWQK2PGxDEkQxyuoXAZNgHdhv', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='{\n  "food": "potassium",\n  "drug": "angiotensin inhibitors",\n  "relationship": "degrading health",\n  "tau": -0.7\n}', refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1729075250, model='gpt-4o-2024-08-06', object='chat.completion', service_tier=None, system_fingerprint='fp_a20a4ee344', usage=CompletionUsage(completion_tokens=38, prompt_tokens=131, total_tokens=169, completion_tokens_details=CompletionTokensDetails(audio_tokens=None, reasoning_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=None, cached_tokens=0)))

In [217]:
json.loads(comp_.choices[0].message.content)

{'food': 'potassium',
 'drug': 'angiotensin inhibitors',
 'relationship': 'degrading health',
 'tau': -0.7}

In [190]:
content_ = comp_.choices[0].message.content

In [197]:
start_index = content_.find('{')  # Add 1 to exclude the start_char
end_index = content_.find("}")+1
json.load(comp_.choices[0].message.content[start_index:end_index].replace("""\n""",""))

AttributeError: 'str' object has no attribute 'read'

In [169]:
completion = openai.ChatCompletion.create(
    model="gpt-4-0613",
    messages=[{"role": "user", "content": "Tell me about matches that took place in Italy between 1980 up until the end of the 20th century"}],
    functions=[
    {
        "name": "get_matches",
        "description": "Return the rows in a DataFrame about women's football games which satisfy the criteria",
        "parameters": {
            "type": "object",
            "properties": {
                "country": {
                    "type": "string",
                    "description": "The name of the country the matches took place e.g. France or China",
                },
                "start_year": {
                    "type": "number",
                    "description": "The year to begin filtering from e.g. 1956",
                },
                "end_year": {
                    "type": "number",
                    "description": "The year to end filtering on e.g. 2005"}
            },
            "required": ["location", "start_year", "end_year"],
        },
    }
],
function_call="auto",
)

APIRemovedInV1: 

You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface. 

Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`

A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742


In [76]:
df8.food.map(lambda x: x.lower)

AttributeError: 'float' object has no attribute 'lower'

In [74]:
pd.Series(df8.food..unique())

0                                     Grapefruit
1                                   carbohydrate
2                                     grapefruit
3                                        Ethanol
4                                        ethanol
                          ...                   
55858                                        CFC
55859                                    TPP+C10
55860        D-[1-(3)H]glucosamine hydrochloride
55861                           heparan sulphate
55862    6-hydroxy-1-methylindole-3-acetonitrile
Length: 55863, dtype: object

In [71]:
food_or_drug = df8.food.map(is_food)

Response from ChatGPT: Food
Response from ChatGPT: food
Response from ChatGPT: food
Response from ChatGPT: Food
Response from ChatGPT: food
Response from ChatGPT: food
Response from ChatGPT: Food
Response from ChatGPT: Food
Response from ChatGPT: food
Response from ChatGPT: Ethanol is categorized as a drug.
Response from ChatGPT: Drug
Response from ChatGPT: Drug
Response from ChatGPT: Ethanol is a drug.
Response from ChatGPT: Food
Response from ChatGPT: Food
Response from ChatGPT: Food
Response from ChatGPT: Food
Response from ChatGPT: drug
Response from ChatGPT: Drug
Response from ChatGPT: Food
Response from ChatGPT: food
Response from ChatGPT: Food
Response from ChatGPT: Food
Response from ChatGPT: drug
Response from ChatGPT: Food
Response from ChatGPT: Food
Response from ChatGPT: food
Response from ChatGPT: food
Response from ChatGPT: food.
Response from ChatGPT: Drug
Response from ChatGPT: food
Response from ChatGPT: food
Response from ChatGPT: Sodium is a food.
Response from ChatG

KeyboardInterrupt: 

In [80]:
df8.dropna(subset=['food'])["food"].map(str.lower).unique()[:5]

array(['grapefruit', 'carbohydrate', 'ethanol', 'calcium', 'iron'],
      dtype=object)

In [81]:
is_f__ =  pd.Series(df8.dropna(subset=['food'])["food"].map(str.lower).unique()[:5]).map(is_food)

Response from ChatGPT: Food
Response from ChatGPT: food
Response from ChatGPT: drug
Response from ChatGPT: Food
Response from ChatGPT: food


In [82]:
is_f__

0    food
1    food
2    drug
3    food
4    food
dtype: object

In [86]:
df10.document.map(lambda x:"interaction" in x.lower()).sum()/len(df10)

np.float64(0.2630822655969345)

In [91]:
df8

Unnamed: 0,TM_interactions_ID,texts_ID,start_index,end_index,food,drug,weird
0,0.0,0.0,0.0,116.0,Grapefruit,abemaciclib,25
1,1.0,0.0,1943.0,2220.0,carbohydrate,abemaciclib,25
2,2.0,0.0,2289.0,2392.0,grapefruit,abemaciclib,25
3,3.0,2.0,385.0,641.0,grapefruit,fentanyl,25
4,4.0,2.0,643.0,846.0,grapefruit,fentanyl,25
...,...,...,...,...,...,...,...
1108424,1109274.0,439431.0,1565.0,1740.0,l-methionine,putrescine,25
1108425,1109275.0,439432.0,154.0,310.0,water,hydrogen,25
1108426,1109276.0,439432.0,154.0,310.0,oxygen,hydrogen,25
1108427,1109277.0,439432.0,1009.0,1179.0,iron,SOx,25


In [111]:
summ_ =df10.merge(df8,on="texts_ID").tail(50).apply(lambda x: x["document"][int(x["start_index"]):int(x["end_index"])], axis=1).map(summarize)

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Your max_length is set to 62, but your input_length is only 53. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=26)
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Your max_length is set

In [115]:
summ_

14976    [{'summary_text': 'Two genes that control the ...
14977    [{'summary_text': ' The University of Bristol ...
14978    [{'summary_text': ' The fourth dose of a vacci...
14979    [{'summary_text': ' The number of patients wit...
14980    [{'summary_text': 'Research into cocaine addic...
14981    [{'summary_text': 'Progesterone has been used ...
14982    [{'summary_text': 'Progesterone has been used ...
14983    [{'summary_text': 'Progesterone has been used ...
14984    [{'summary_text': 'Research is under way at th...
14985    [{'summary_text': ' The first drug to treat br...
14986    [{'summary_text': ' The first drug to treat br...
14987    [{'summary_text': ' A study has shown that a l...
14988    [{'summary_text': ' Atorvastatin, a cholestero...
14989    [{'summary_text': 'Scientists have developed a...
14990    [{'summary_text': ' BTXA is an anti-cancer dru...
14991    [{'summary_text': ' The use of zinc supplement...
14992    [{'summary_text': ' The use of zinc supplement.

In [116]:
for i in range(14976,15025):
    print(summ_[i][0]["summary_text"])

Two genes that control the amount of vitamin K E.
 The University of Bristol is conducting a study into the effects of epidural blood patch on headache.
 The fourth dose of a vaccine against the Omicron virus has been shown to slow the immune response.
 The number of patients with acute heart failure (ADHF) has increased significantly in recent years.
Research into cocaine addiction has been published in the journal Addiction.
Progesterone has been used to treat the effects of cocaine on women during the menstrual cycle.
Progesterone has been used to treat the effects of cocaine on women during the menstrual cycle.
Progesterone has been used to treat the effects of cocaine on women during the menstrual cycle.
Research is under way at the University of Aberdeen in the United States.
 The first drug to treat breast cancer has been approved by the US Food and Drug Administration.
 The first drug to treat breast cancer has been approved by the US Food and Drug Administration.
 A study has 

In [119]:
summ_ =df10.merge(df8,on="texts_ID")#.tail(50).apply(lambda x: x["document"][int(x["start_index"]):int(x["end_index"])], axis=1).map(summarize)

In [121]:
df_inter = df10.merge(df8,on="texts_ID").drop(columns=["link","citation","weird"])
df_inter.shq

In [125]:
df_inter.shape

(15026, 7)

In [130]:
def lower(text):
    try:
        return text.lower()
    except:
        return text

In [132]:
df_inter["food"] = df_inter.food.map(lower)
df_inter["drug"] = df_inter.drug.map(lower)


In [1]:
df_inter.iloc[0]

NameError: name 'df_inter' is not defined

In [142]:
#get text

df_inter["clean_text"] = df_inter.apply(lambda x: x["document"][int(x["start_index"]):int(x["end_index"])], axis=1)

In [143]:
df_combo =   df_inter.groupby(by="texts_ID").agg({"food":lambda x: x.mode()[0],"drug":lambda x: x.mode()[0],"clean_text":lambda x: '. '.join(x)}).reset_index()

In [146]:
summ_ = df_combo.head(50).clean_text.map(summarize)

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Your max_length is set to 62, but your input_length is only 23. Since this is a summarization task, where outputs shorter th

In [148]:
for i in range(14):
    print(summ_[i][0]["summary_text"])

 A high-calorie meal may increase the concentration of the drug abemaciclib, according to research published in the British Journal of Oncology.
 The World Health Organization (WHO) warns that the consumption of grapefruit juice during treatment with the opioid fentanyl may increase the risk of an adverse reaction to the drug.
 AUC (AUC) for docetaxel, a drug used to treat leukaemia, has been reduced to 35% after a trial with grapefruit juice.
 Ethanol can be used as a treatment for painkiller overdoses.
Dextromethorphan and ethanol can be used as a treatment for alcohol addiction.
Iron salts may reduce the absorption of tetracyclines when they are combined with other drugs such as doxycycline.
 A small amount of iron can be used as a treatment for iron poisoning.
 A single dose of the drug acalabrutinib can be administered with a high-calorie meal, a study in the journal Thorax suggests.
High levels of potassium can cause hyperkalemia in some patients who are taking angiotensin inhibi

In [224]:
df_combo_50 = df_combo.head(50)

In [226]:
df_combo_50["summarize"] = df_combo_50.clean_text.map(summarize)


Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Your max_length is set to 62, but your input_length is only 23. Since this is a summarization task, where outputs shorter th

BadRequestError: Error code: 400 - {'error': {'message': "Missing required parameter: 'messages[1].content[0].type'.", 'type': 'invalid_request_error', 'param': 'messages[1].content[0].type', 'code': 'missing_required_parameter'}}

In [229]:
df_combo_50

Unnamed: 0,texts_ID,food,drug,clean_text,summarize
0,0,grapefruit,abemaciclib,Moderate Food Interaction||GENERALLY AVOID: G...,A high-calorie meal may increase the concentr...
1,2,grapefruit,fentanyl,GENERALLY AVOID: Consumption of grapefruit ju...,The World Health Organization (WHO) warns tha...
2,7,grapefruit,paclitaxel,Moderate Food Interaction||MONITOR: Coadminis...,"AUC (AUC) for docetaxel, a drug used to treat..."
3,11,ethanol,opioid analgesics,Moderate Drug Interaction||GENERALLY AVOID: E...,Ethanol can be used as a treatment for painki...
4,13,ethanol,dextromethorphan,Moderate Drug Interaction||GENERALLY AVOID: T...,Dextromethorphan and ethanol can be used as a ...
5,14,calcium,tetracycline,The calcium content of these foods forms nonab...,"Scientists are warning that certain foods, suc..."
6,15,iron,doxycycline,Moderate Drug Interaction||GENERALLY AVOID: T...,Iron salts may reduce the absorption of tetrac...
7,18,iron,ascorbic acid,Some studies suggest administration of iron wi...,A small amount of iron can be used as a treat...
8,19,grapefruit,acalabrutinib,Major Food Interaction||GENERALLY AVOID: Grap...,A single dose of the drug acalabrutinib can b...
9,20,potassium,ace inhibitors,Moderate Food Interaction||GENERALLY AVOID: M...,High levels of potassium can cause hyperkalemi...


In [237]:
df_combo_50["tau"] = df_combo_50.summarize.map(tau_score)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_combo_50["tau"] = df_combo_50.summarize.map(tau_score)


In [238]:
df_combo_50

Unnamed: 0,texts_ID,food,drug,clean_text,summarize,tau
0,0,grapefruit,abemaciclib,Moderate Food Interaction||GENERALLY AVOID: G...,A high-calorie meal may increase the concentr...,"{'food': 'high-calorie meal', 'drug': 'abemaci..."
1,2,grapefruit,fentanyl,GENERALLY AVOID: Consumption of grapefruit ju...,The World Health Organization (WHO) warns tha...,"{'food': 'grapefruit juice', 'drug': 'fentanyl..."
2,7,grapefruit,paclitaxel,Moderate Food Interaction||MONITOR: Coadminis...,"AUC (AUC) for docetaxel, a drug used to treat...","{'food': 'grapefruit juice', 'drug': 'docetaxe..."
3,11,ethanol,opioid analgesics,Moderate Drug Interaction||GENERALLY AVOID: E...,Ethanol can be used as a treatment for painki...,"{'food': 'Ethanol', 'drug': 'Painkillers', 're..."
4,13,ethanol,dextromethorphan,Moderate Drug Interaction||GENERALLY AVOID: T...,Dextromethorphan and ethanol can be used as a ...,"{'food': 'ethanol', 'drug': 'dextromethorphan'..."
5,14,calcium,tetracycline,The calcium content of these foods forms nonab...,"Scientists are warning that certain foods, suc...","{'food': 'bananas', 'drug': '', 'relationship'..."
6,15,iron,doxycycline,Moderate Drug Interaction||GENERALLY AVOID: T...,Iron salts may reduce the absorption of tetrac...,"{'food': '', 'drug': 'tetracyclines', 'relatio..."
7,18,iron,ascorbic acid,Some studies suggest administration of iron wi...,A small amount of iron can be used as a treat...,"{'food': 'iron', 'drug': 'iron', 'relationship..."
8,19,grapefruit,acalabrutinib,Major Food Interaction||GENERALLY AVOID: Grap...,A single dose of the drug acalabrutinib can b...,"{'food': 'high-calorie meal', 'drug': 'acalabr..."
9,20,potassium,ace inhibitors,Moderate Food Interaction||GENERALLY AVOID: M...,High levels of potassium can cause hyperkalemi...,"{'food': 'potassium', 'drug': 'angiotensin inh..."


In [243]:
df_combo_50["tau_score"] = df_combo_50["tau"].map(lambda x:x["tau"])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_combo_50["tau_score"] = df_combo_50["tau"].map(lambda x:x["tau"])


In [None]:
for index, row in df.iterrows():
    print(f"Row {index}: Food={row['food']}, Drug={row['drug']}, Relationship={row['relationship']}, Tau={row['tau']}")

In [248]:
for index,row in df_combo_50[["summarize","tau_score"]].iterrows():
    print(f"""TAU Score:{row["tau_score"]} Description :{row["summarize"]} """)

TAU Score:0.4 Description : A high-calorie meal may increase the concentration of the drug abemaciclib, according to research published in the British Journal of Oncology. 
TAU Score:-0.75 Description : The World Health Organization (WHO) warns that the consumption of grapefruit juice during treatment with the opioid fentanyl may increase the risk of an adverse reaction to the drug. 
TAU Score:-0.35 Description : AUC (AUC) for docetaxel, a drug used to treat leukaemia, has been reduced to 35% after a trial with grapefruit juice. 
TAU Score:0.7 Description : Ethanol can be used as a treatment for painkiller overdoses. 
TAU Score:0.7 Description :Dextromethorphan and ethanol can be used as a treatment for alcohol addiction. 
TAU Score:-0.7 Description :Iron salts may reduce the absorption of tetracyclines when they are combined with other drugs such as doxycycline. 
TAU Score:0.5 Description : A small amount of iron can be used as a treatment for iron poisoning. 
TAU Score:0.0 Descriptio

In [120]:
df10.merge(df8,on="texts_ID")#.tail(50).apply(lambda x: x["document"][int(x["start_index"]):int(x["end_index"])], axis=1).map(summarize)

Unnamed: 0,texts_ID,document,link,citation,TM_interactions_ID,start_index,end_index,food,drug,weird
0,0,Moderate Food Interaction||GENERALLY AVOID: G...,https://www.drugs.com/food-interactions/abemac...,,0.0,0.0,116.0,Grapefruit,abemaciclib,25
1,0,Moderate Food Interaction||GENERALLY AVOID: G...,https://www.drugs.com/food-interactions/abemac...,,1.0,1943.0,2220.0,carbohydrate,abemaciclib,25
2,0,Moderate Food Interaction||GENERALLY AVOID: G...,https://www.drugs.com/food-interactions/abemac...,,2.0,2289.0,2392.0,grapefruit,abemaciclib,25
3,2,Major Food Interaction||GENERALLY AVOID: Alco...,https://www.drugs.com/food-interactions/fentan...,,3.0,385.0,641.0,grapefruit,fentanyl,25
4,2,Major Food Interaction||GENERALLY AVOID: Alco...,https://www.drugs.com/food-interactions/fentan...,,4.0,643.0,846.0,grapefruit,fentanyl,25
...,...,...,...,...,...,...,...,...,...,...
15021,8443,PANDA: PKU Amino Acid Evaluation Phenylketonur...,https://clinicaltrials.gov/ct2/show/NCT04086511,,15064.0,0.0,268.0,phenylalanine,Phenylalanine,25
15022,8443,PANDA: PKU Amino Acid Evaluation Phenylketonur...,https://clinicaltrials.gov/ct2/show/NCT04086511,,15065.0,269.0,434.0,diet,Phenylketonuria,25
15023,8444,SNAP: Study Nutrients in Adult PKU Phenylketon...,https://clinicaltrials.gov/ct2/show/NCT03858101,,15066.0,0.0,270.0,phenylalanine,Phenylalanine,25
15024,8444,SNAP: Study Nutrients in Adult PKU Phenylketon...,https://clinicaltrials.gov/ct2/show/NCT03858101,,15067.0,271.0,436.0,diet,Phenylketonuria,25


In [99]:
def summarize(text):
    pipe = pipeline('summarization', model= 'sshleifer/distilbart-xsum-12-6')


    return pipe(text)


In [33]:
str(df10[df10.texts_ID==8441].document.values)[406:540]

'. Indeed, the many capillary blood glucose measurements required to adjust the IV insulin doses involve direct contact with the patien'

In [88]:
def clasic(text):

    pipe = pipeline('question-answering', model= 'deepset/roberta-base-squad2')


    res9 = pipe(question = 'Decribe the interaction between food and the drug here?',context=text)
    return res9

In [90]:
clasic(df10.document[0])

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


{'score': 0.149311363697052,
 'start': 1887,
 'end': 1940,
 'answer': 'modest effects on the pharmacokinetics of abemaciclib'}

In [134]:
df7#.iloc[33]

Unnamed: 0,study_id,contributor,entrez_id,accession,gpl,title,abstract,study_type,publication_date,pubmed_id
0,0,"Mark,R,Pothecary;Delphine,M,Lees;Noorafza,Q,Kh...",200005556,5556,570,Treatment of cultured human aortic endothelial...,Human aortic endothelial cells were grown in c...,One color array,Public on Feb 17 2007,
1,1,"Laura,M,Beaver;Alex,,Buchanan;Emily,,Ho",200048812,48812,11154,Genome-wide transcriptome analysis of the diet...,Epidemiological studies provide strong evidenc...,HTS,Public on Jul 11 2014,25044704.0
2,2,"Mahmoud,A,Mohammad;Agneta,L,Sunehag;Morey,W,Ha...",200051874,51874,6104,Gene Expression in the Human Mammary Epitheliu...,Mammary gland (MG) de novo lipogenesis contrib...,One color array,Public on May 19 2014,24496312.0
3,3,"María-Teresa,G,Conesa;Jesús,G,Cantalejo;Juan,A...",200015322,15322,570,Comparative transcriptomic profiling of human ...,We used microarrays to investigate gene expres...,One color array,Public on Mar 21 2009,19728713.0
4,4,"Michelle,S,Liberio;Martin,C,Sadowski;Rohan,A,D...",200074212,74212,16604,The ascidian natural product eusynstyelamide B...,As part of an anti-cancer natural product drug...,One color array,Public on Oct 21 2015,26733491.0
...,...,...,...,...,...,...,...,...,...,...
145,251,"Steve,,Rodriguez;Clemens,B,Hug;Sarah,A,Boswell...",2000164788,164788,18573,Machine Learning Identifies Candidates for Dru...,Clinical trials of novel therapeutics for Alzh...,HTS,Public on Jan 14 2021,33589615.0
146,253,"Birandra,K,Sinha",2000138442,138442,4133,Effects of ascorbic acid on topotecan-induced ...,To investigate mechanisms and differential gen...,One color array,Public on Jul 08 2020,32765594.0
147,255,"Yingxiu,,Peng;Yanfeng,,Xu",2000201455,201455,20795,Effects of corosolic acid on gene expression o...,The purpose of this study was to investigate g...,HTS,Public on Sep 15 2022,36038536.0
148,258,"D,,Ballesteros-Vivas;G,,Alvarez-Rivera;C,,León...",2000133686,133686,13607,Foodomics study on the anti-proliferative acti...,Passiflora mollissima commonly known as “banan...,Two channel array,Public on May 24 2022,32156385.0


In [135]:
df8.iloc[33]

TM_interactions_ID             33.0
texts_ID                       21.0
start_index                   872.0
end_index                    1140.0
food                           salt
drug                  ACE inhibitor
Name: 33, dtype: object

In [136]:
df8.food.nunique()

55862

In [137]:
df3.iloc[33]

sample_id                         58
attribute_name                  diet
attribute_value    high carbohydrate
Name: 33, dtype: object

In [138]:
df.iloc[33]

node_id           482.0000
cmap_node_id    25914.0000
tau               -16.9001
Name: 33, dtype: float64

In [139]:
# for i in range(10):
#     print(f'\n{i}\n')
#     display(df.iloc[33])


dfs = [df, df2, df3, df4, df5, df6, df7, df8, df9, df10]

for i, current_df in enumerate(dfs):
    print(f'\n\nDataFrame {i+1}\n')
    display(current_df.iloc[33])



DataFrame 1



node_id           482.0000
cmap_node_id    25914.0000
tau               -16.9001
Name: 33, dtype: float64



DataFrame 2



sample_id                                      33
study_id                                        1
treatment                                 control
origin_type                             cell type
origin_name      Normal prostate epithelial cells
GSM                                       1185125
time_point                                6 hours
concentration                                   0
Name: 33, dtype: object



DataFrame 3



sample_id                         58
attribute_name                  diet
attribute_value    high carbohydrate
Name: 33, dtype: object



DataFrame 4



study_id                  23
attribute_name     cell line
attribute_value     Huh7.5.1
Name: 33, dtype: object



DataFrame 5



node_id         2
sample_id    1169
Name: 33, dtype: int64



DataFrame 6



sample_id                                      33
study_id                                        1
treatment                                 control
origin_type                             cell type
origin_name      Normal prostate epithelial cells
GSM                                       1185125
time_point                                6 hours
concentration                                   0
Name: 33, dtype: object



DataFrame 7



study_id                                                           36
contributor         Mikhail,G,Dozmorov;Qing,,Yang;Weijuan,,Wu;Jona...
entrez_id                                                   200053171
accession                                                       53171
gpl                                                              6883
title               Differential actions of selective frankincense...
abstract            Background: Frankincense (Ru Xiang) and sandal...
study_type                                            One color array
publication_date                                Public on Jun 10 2014
pubmed_id                                                  25006348.0
Name: 33, dtype: object



DataFrame 8



TM_interactions_ID             33.0
texts_ID                       21.0
start_index                   872.0
end_index                    1140.0
food                           salt
drug                  ACE inhibitor
Name: 33, dtype: object



DataFrame 9



entrez_Id              10036
logFC                3.95283
AveExpr             -2.25039
moderated_t          3.70526
P_value             0.022834
adjusted_P_value     0.64697
B                   -4.58562
node_id                  127
geneSet                   UP
top150                    UP
Name: 33, dtype: object



DataFrame 10



texts_ID                                                   56
document    Moderate Food Interaction||GENERALLY AVOID:  A...
link        https://www.drugs.com/food-interactions/tadala...
citation                                                  NaN
Name: 33, dtype: object

In [140]:
# df8[(df8['food'] == 'salt') & (df8['drug'] == 'ACE inhibitor')]

df8[df8['food'] == 'coffee']
df8[df8['drug'] == 'vitamine']

Unnamed: 0,TM_interactions_ID,texts_ID,start_index,end_index,food,drug
949040,949890.0,379521.0,0.0,130.0,Berberine,vitamine


In [141]:
# drugs = pd.DataFrame(df8.drug)
# drugs[3]

# display(df8['drug'].unique())

# Get unique values
unique_drugs = pd.DataFrame(df8['drug'].unique(), columns=['drug'])
unique_drugs


# unique_drugs.to_csv('unique_foods.csv', index=True)


Unnamed: 0,drug
0,abemaciclib
1,fentanyl
2,paclitaxel
3,docetaxel
4,opioid analgesics
...,...
179445,D-[1-(3)H]glucosamine hydrochloride
179446,eNOS activator
179447,Kynurenine-3-monooxygenase
179448,Ro618048


In [142]:


unique_foods = pd.DataFrame(df8['food'].unique(), columns=['food'])
unique_foods

Unnamed: 0,food
0,Grapefruit
1,carbohydrate
2,grapefruit
3,Ethanol
4,ethanol
...,...
55858,CFC
55859,TPP+C10
55860,D-[1-(3)H]glucosamine hydrochloride
55861,heparan sulphate


In [143]:
df7.columns.unique()

Index(['study_id', 'contributor', 'entrez_id', 'accession', 'gpl', 'title',
       'abstract', 'study_type', 'publication_date', 'pubmed_id'],
      dtype='object')

In [144]:
df8

Unnamed: 0,TM_interactions_ID,texts_ID,start_index,end_index,food,drug
0,0.0,0.0,0.0,116.0,Grapefruit,abemaciclib
1,1.0,0.0,1943.0,2220.0,carbohydrate,abemaciclib
2,2.0,0.0,2289.0,2392.0,grapefruit,abemaciclib
3,3.0,2.0,385.0,641.0,grapefruit,fentanyl
4,4.0,2.0,643.0,846.0,grapefruit,fentanyl
...,...,...,...,...,...,...
1108424,1109274.0,439431.0,1565.0,1740.0,l-methionine,putrescine
1108425,1109275.0,439432.0,154.0,310.0,water,hydrogen
1108426,1109276.0,439432.0,154.0,310.0,oxygen,hydrogen
1108427,1109277.0,439432.0,1009.0,1179.0,iron,SOx


In [145]:
df9

Unnamed: 0,entrez_Id,logFC,AveExpr,moderated_t,P_value,adjusted_P_value,B,node_id,geneSet,top150
0,100093630,3.122490,-2.346420,3.016380,0.042166,0.646970,-4.58691,127,,UP
1,100128553,3.817080,-0.885943,3.679910,0.023325,0.646970,-4.58566,127,,UP
2,100129198,4.064070,-2.393640,2.930100,0.066737,0.646970,-4.59030,127,,UP
3,100129424,5.484730,-0.733114,3.954370,0.032983,0.646970,-4.58938,127,,UP
4,100129502,6.221530,-2.078580,4.485580,0.024114,0.646970,-4.58909,127,,UP
...,...,...,...,...,...,...,...,...,...,...
2799417,9766,-0.001559,3.231450,-0.123885,0.901633,0.999973,-6.45261,3,,
2799418,9775,-0.003250,3.565260,-0.319026,0.750315,0.999973,-6.41007,3,,
2799419,9794,-0.004357,3.382320,-0.337476,0.736404,0.999973,-6.40411,3,,
2799420,9956,-0.149142,1.978230,-0.337482,0.736400,0.999973,-6.40411,3,,


In [146]:
df10.document[21][872:1140]

'm, and/or underlying cardiovascular disorders such as coronary insufficiency, cardiac arrhythmias, or hypertension.  The recommended dosages should not be exceeded.'

In [147]:
df10.document[21][393:601]

' is mediated by beta- 1 receptors and thus less likely to occur with beta-2-selective agents such as albuterol.  However, beta-2-selectivity is not absolute and can be lost with larger doses.  High dosages of'

In [22]:
df8[df8.food == 'coffee']
df8[(df8['food'] == 'coffee') & (df8['drug'] == 'calcium')]


df8[(df8['food'] == 'salt') & (df8['drug'] == 'ACE inhibitor')]


Unnamed: 0,TM_interactions_ID,texts_ID,start_index,end_index,food,drug
31,31.0,21.0,393.0,601.0,salt,ACE inhibitor
33,33.0,21.0,872.0,1140.0,salt,ACE inhibitor


In [149]:
df8[df8.drug == 'vitamine']

Unnamed: 0,TM_interactions_ID,texts_ID,start_index,end_index,food,drug
949040,949890.0,379521.0,0.0,130.0,Berberine,vitamine


In [150]:
drugs = pd.DataFrame(df8.drug)
drugs

Unnamed: 0,drug
0,abemaciclib
1,abemaciclib
2,abemaciclib
3,fentanyl
4,fentanyl
...,...
1108424,putrescine
1108425,hydrogen
1108426,hydrogen
1108427,SOx


In [151]:
drugs.duplicated().sum()

928983

In [152]:

'{:,}'.format(drugs.shape[0])

'1,108,433'

In [153]:
drugs.drop_duplicates(inplace=True)

In [67]:
'{:,}'.format(drugs.shape[0])

'179,450'

In [160]:
drugs =drugs.reset_index().drop(columns='index').reset_index().rename(columns={"index":"ID"})

In [165]:
drugs

Unnamed: 0,ID,drug
0,0,abemaciclib
1,1,fentanyl
2,2,paclitaxel
3,3,docetaxel
4,4,opioid analgesics
...,...,...
179445,179445,D-[1-(3)H]glucosamine hydrochloride
179446,179446,eNOS activator
179447,179447,Kynurenine-3-monooxygenase
179448,179448,Ro618048


In [80]:
drugs['ID'] = range(1, len(drugs) + 1)

In [87]:
drugs.set_index("ID").reset_index()

Unnamed: 0,ID,drug
0,1,abemaciclib
1,2,fentanyl
2,3,paclitaxel
3,4,docetaxel
4,5,opioid analgesics
...,...,...
179445,179446,D-[1-(3)H]glucosamine hydrochloride
179446,179447,eNOS activator
179447,179448,Kynurenine-3-monooxygenase
179448,179449,Ro618048


In [69]:
drugs.set_index('ID', inplace=False)
drugs

Unnamed: 0,drug,ID
0,abemaciclib,1
3,fentanyl,2
6,paclitaxel,3
7,docetaxel,4
9,opioid analgesics,5
...,...,...
1108398,D-[1-(3)H]glucosamine hydrochloride,179446
1108402,eNOS activator,179447
1108407,Kynurenine-3-monooxygenase,179448
1108408,Ro618048,179449


In [163]:
drugs.to_csv('/Users/raynasser/Downloads/DSproj/FooDru/unique_drugs.csv')

In [166]:
uni_drug = pd.read_csv('/Users/raynasser/Downloads/DSproj/FooDru/unique_drugs.csv',index_col="ID")
uni_drug

Unnamed: 0_level_0,Unnamed: 0,drug
ID,Unnamed: 1_level_1,Unnamed: 2_level_1
0,0,abemaciclib
1,1,fentanyl
2,2,paclitaxel
3,3,docetaxel
4,4,opioid analgesics
...,...,...
179445,179445,D-[1-(3)H]glucosamine hydrochloride
179446,179446,eNOS activator
179447,179447,Kynurenine-3-monooxygenase
179448,179448,Ro618048


In [72]:
uni_drug.shape

(179450, 2)

In [73]:
uni_drug.columns

Index(['Unnamed: 0', 'drug'], dtype='object')

In [74]:
foods = pd.DataFrame(df8.food)
foods

foods.duplicated().sum()

'{:,}'.format(foods.shape[0])


'1,108,433'

In [75]:
foods.drop_duplicates(inplace=True)

'{:,}'.format(foods.shape[0])


'55,863'

In [76]:
foods['ID'] = range(1, len(foods) + 1)

# foods.set_index('ID', inplace=True)
foods


Unnamed: 0,food,ID
0,Grapefruit,1
1,carbohydrate,2
2,grapefruit,3
9,Ethanol,4
10,ethanol,5
...,...,...
1108349,CFC,55859
1108368,TPP+C10,55860
1108397,D-[1-(3)H]glucosamine hydrochloride,55861
1108403,heparan sulphate,55862


In [77]:
foods.to_csv('/Users/raynasser/Downloads/DSproj/FooDru/unique_foods.csv')

In [78]:
uni_food = pd.read_csv('/Users/raynasser/Downloads/DSproj/FooDru/unique_foods.csv', index_col='ID')
uni_food

Unnamed: 0_level_0,Unnamed: 0,food
ID,Unnamed: 1_level_1,Unnamed: 2_level_1
1,0,Grapefruit
2,1,carbohydrate
3,2,grapefruit
4,9,Ethanol
5,10,ethanol
...,...,...
55859,1108349,CFC
55860,1108368,TPP+C10
55861,1108397,D-[1-(3)H]glucosamine hydrochloride
55862,1108403,heparan sulphate
