### Medication Information Search Engine

### This project is divided into 4 parts

1. Part 1. Data Pre-processing

2. Part 2. Information Retrieval Model Development

3. Part 3. API Development and Deployment

4. Documentation


### Table of Contents


The table of contents for this project is as follows:

1. [The problem statement](#1)
2. [Import libraries](#2)
3. [Import dataset](#3)
4. [Exploratory data analysis](#4)
    -   [View dimensions of dataset](#4.1)
    -   [View data info](#4.2)
    -   [View number of unique values across data](#4.3)
    -   [Summary statistics](#4.4)
    -   [View % of missing values](#4.5)
    -   [Handling missing values](#4.6)
5. [Modelling using LLM](#5)
    -   [Explaining the flow of the system](#5.1)
    -   [Import document LLM libraries](#5.2)
    -   [Load DataFrame with a DataFrameLoader](#5.3)
    -   [Initialize the OpenAI LLM model](#5.4)
    -   [Create a vector store to use as index](#5.5)
    -   [Create a prompt template](#5.6)
    -   [Test the medical information extractor](#5.7)
    -   [Implement an LLM agent to extract info](#5.8)
    -   [Test the medical information extractor with an agent](#5.9)
6. [Create a FastAPI endpoint for our model](#6)
    -   [Test model endpoint](#6.1)
7. [Bonus part](#7.0)
    -   [Using CSV_AGENT](#7.1)



## Part 1. Data Pre-processing

### 1. The problem statement <a class="anchor" id="1"></a>

In the fast-paced pharmaceutical industry, a leading company faces a critical challenge: developing an advanced drug information retrieval system. This system aims to provide healthcare professionals with quick access to comprehensive information about drugs, including side effects, drug classes, activity levels, pregnancy categories, and more. As a highly skilled Machine Learning Engineer, your task is to build this system within a strict 1-week timeline.

### 2. Import libraries <a class="anchor" id="2"></a>

In [1]:
# required libraries
import pandas as pd
from langchain.document_loaders import DataFrameLoader

# pd.set_option('display.max_rows', None)  # Display all rows
# pd.set_option('display.max_columns', None)  # Display all columns

import warnings
warnings.filterwarnings("ignore")

### 3. Import dataset <a class="anchor" id="3"></a>


The next step is to import the dataset.

In [2]:
df = pd.read_csv("Drugs Master List.csv")
df.head()

Unnamed: 0,drug_name,medical_condition,side_effects,generic_name,drug_classes,brand_names,activity,rx_otc,pregnancy_category,csa,alcohol,related_drugs,medical_condition_description,rating,no_of_reviews,drug_link,medical_condition_url
0,doxycycline,Acne,"(hives, difficult breathing, swelling in your ...",doxycycline,"Miscellaneous antimalarials, Tetracyclines","Acticlate, Adoxa CK, Adoxa Pak, Adoxa TT, Alod...",87%,Rx,D,N,X,amoxicillin: https://www.drugs.com/amoxicillin...,Acne Other names: Acne Vulgaris; Blackheads; B...,6.8,760.0,https://www.drugs.com/doxycycline.html,https://www.drugs.com/condition/acne.html
1,spironolactone,Acne,hives ; difficulty breathing; swelling of your...,spironolactone,"Aldosterone receptor antagonists, Potassium-sp...","Aldactone, CaroSpir",82%,Rx,C,N,X,amlodipine: https://www.drugs.com/amlodipine.h...,Acne Other names: Acne Vulgaris; Blackheads; B...,7.2,449.0,https://www.drugs.com/spironolactone.html,https://www.drugs.com/condition/acne.html
2,minocycline,Acne,"skin rash, fever, swollen glands, flu-like sym...",minocycline,Tetracyclines,"Dynacin, Minocin, Minolira, Solodyn, Ximino, V...",48%,Rx,D,N,,amoxicillin: https://www.drugs.com/amoxicillin...,Acne Other names: Acne Vulgaris; Blackheads; B...,5.7,482.0,https://www.drugs.com/minocycline.html,https://www.drugs.com/condition/acne.html
3,Accutane,Acne,problems with your vision or hearing; muscle o...,isotretinoin (oral),"Miscellaneous antineoplastics, Miscellaneous u...",,41%,Rx,X,N,X,doxycycline: https://www.drugs.com/doxycycline...,Acne Other names: Acne Vulgaris; Blackheads; B...,7.9,623.0,https://www.drugs.com/accutane.html,https://www.drugs.com/condition/acne.html
4,clindamycin,Acne,hives ; difficult breathing; swelling of your ...,clindamycin topical,"Topical acne agents, Vaginal anti-infectives","Cleocin T, Clindacin ETZ, Clindacin P, Clindag...",39%,Rx,B,N,,doxycycline: https://www.drugs.com/doxycycline...,Acne Other names: Acne Vulgaris; Blackheads; B...,7.4,146.0,https://www.drugs.com/mtm/clindamycin-topical....,https://www.drugs.com/condition/acne.html


### Exploratory data analysis <a class="anchor" id="4"></a>

Exploratory data analysis (EDA) helps us understand the dataset's size, column types, cardinality, and missing values before we build the retrieval system.


#### View dimensions of dataset <a class="anchor" id="4.1"></a>


In [3]:
df.shape

(999, 17)

#### View data info  <a class="anchor" id="4.2"></a>

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 999 entries, 0 to 998
Data columns (total 17 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   drug_name                      999 non-null    object 
 1   medical_condition              999 non-null    object 
 2   side_effects                   971 non-null    object 
 3   generic_name                   987 non-null    object 
 4   drug_classes                   978 non-null    object 
 5   brand_names                    613 non-null    object 
 6   activity                       999 non-null    object 
 7   rx_otc                         999 non-null    object 
 8   pregnancy_category             927 non-null    object 
 9   csa                            999 non-null    object 
 10  alcohol                        504 non-null    object 
 11  related_drugs                  491 non-null    object 
 12  medical_condition_description  999 non-null    obj

#### View number of unique values across data  <a class="anchor" id="4.3"></a>

In [5]:
df.nunique()

drug_name                        997
medical_condition                 13
side_effects                     962
generic_name                     463
drug_classes                     117
brand_names                      543
activity                          81
rx_otc                             3
pregnancy_category                 5
csa                                6
alcohol                            1
related_drugs                    190
medical_condition_description     13
rating                            70
no_of_reviews                    168
drug_link                        999
medical_condition_url             13
dtype: int64

#### Summary statistics <a class="anchor" id="4.4"></a>

By default, `describe()` summarizes only the numeric columns, here **rating** and **no_of_reviews**.

In [6]:
df.describe()

| | rating | no_of_reviews |
|---|---|---|
| count | 507.0 | 507.0 |
| mean | 6.978107 | 89.796844 |
| std | 2.187752 | 185.003942 |
| min | 0.0 | 1.0 |
| 25% | 5.8 | 2.0 |
| 50% | 7.2 | 11.0 |
| 75% | 8.5 | 74.5 |
| max | 10.0 | 1471.0 |


#### View % of Missing Values <a class="anchor" id="4.5"></a>

In [7]:
def missing_values_table(df, print_info = False):
    """
    Returns a table and percentage of Missing values in a data set.
    """
    # Total missing values
    mis_val = df.isnull().sum()

    # Percentage of missing values
    mis_val_percent = 100 * df.isnull().sum() / len(df)

    # Make a table with the results
    mis_val_table = pd.concat([mis_val, mis_val_percent], axis=1)

    # Rename the columns
    mis_val_table_ren_columns = mis_val_table.rename(
    columns = {0 : 'Missing Values', 1 : '% of Total Values'})

    # Sort the table by percentage of missing descending
    mis_val_table_ren_columns = mis_val_table_ren_columns[
        mis_val_table_ren_columns.iloc[:,1] != 0].sort_values(
    '% of Total Values', ascending=False).round(1)

    if print_info:
        # Print some summary information
        print ("Your selected dataframe has " + str(df.shape[1]) + " columns.\n"      
            "There are " + str(mis_val_table_ren_columns.shape[0]) +
              " columns that have missing values.")

    # Return the dataframe with missing information
    return mis_val_table_ren_columns

In [8]:
missing_values = missing_values_table(df)
missing_values

| Column | Missing Values | % of Total Values |
|---|---|---|
| related_drugs | 508 | 50.9 |
| alcohol | 495 | 49.5 |
| rating | 492 | 49.2 |
| no_of_reviews | 492 | 49.2 |
| brand_names | 386 | 38.6 |
| pregnancy_category | 72 | 7.2 |
| side_effects | 28 | 2.8 |
| drug_classes | 21 | 2.1 |
| generic_name | 12 | 1.2 |


#### Handling Missing Values <a class="anchor" id="4.6"></a>

We fill in missing data one column at a time, using what the other columns tell us about each row.

* Rows with the same **drug_name** are likely to share values such as **related_drugs**, **brand_names**, **side_effects**, **generic_name**, **rating**, and **drug_classes**, so we can fill a missing value in one row from another row with the same **drug_name**.

In [9]:
df['drug_name'].value_counts()

drug_name
minocycline       2
erythromycin      2
doxycycline       1
beclomethasone    1
Zithromax         1
                 ..
Edurant           1
Lexiva            1
nevirapine        1
Retrovir          1
Histafed          1
Name: count, Length: 997, dtype: int64

Let's take **minocycline** and **erythromycin**, the only drug names that appear twice, as an example.

In [10]:
filtered_df = df[df['drug_name'].isin(['minocycline', 'erythromycin'])]
sorted_df = filtered_df.sort_values(by='drug_name', ascending=False)
sorted_df


Unnamed: 0,drug_name,medical_condition,side_effects,generic_name,drug_classes,brand_names,activity,rx_otc,pregnancy_category,csa,alcohol,related_drugs,medical_condition_description,rating,no_of_reviews,drug_link,medical_condition_url
2,minocycline,Acne,"skin rash, fever, swollen glands, flu-like sym...",minocycline,Tetracyclines,"Dynacin, Minocin, Minolira, Solodyn, Ximino, V...",48%,Rx,D,N,,amoxicillin: https://www.drugs.com/amoxicillin...,Acne Other names: Acne Vulgaris; Blackheads; B...,5.7,482.0,https://www.drugs.com/minocycline.html,https://www.drugs.com/condition/acne.html
97,minocycline,Acne,"hives , itching, severe rash; swollen glands, ...",minocycline (mucous membrane powder),Mouth and throat products,Arestin,1%,Rx,D,N,,,Acne Other names: Acne Vulgaris; Blackheads; B...,9.0,1.0,https://www.drugs.com/mtm/minocycline-mucous-m...,https://www.drugs.com/condition/acne.html
39,erythromycin,Acne,hives ; difficult breathing; swelling of your ...,erythromycin topical,"Topical acne agents, Topical antibiotics","Emcin Clear, Erygel, Theramycin Z, A/T/S, Eryc...",5%,Rx,B,N,,,Acne Other names: Acne Vulgaris; Blackheads; B...,10.0,1.0,https://www.drugs.com/mtm/erythromycin-topical...,https://www.drugs.com/condition/acne.html
668,erythromycin,Bronchitis,"severe stomach pain, diarrhea that is watery o...",erythromycin (oral/injection),Macrolides,"E.E.S. Granules, Ery-Tab, Erythrocin Lactobion...",9%,Rx,B,N,X,amoxicillin: https://www.drugs.com/amoxicillin...,Bronchitis Bronchitis is a type of infection t...,7.5,4.0,https://www.drugs.com/erythromycin.html,https://www.drugs.com/condition/bronchitis.html


The two rows of each drug share some values (for example, the same **pregnancy_category**), and in each pair one row is missing **related_drugs** while the other has it. We can use such sibling rows to fill in missing values.

In [12]:
# columns to fill from other rows with the same drug_name
column_names = ['side_effects', 'generic_name', 'drug_classes', 'brand_names', 'pregnancy_category', 'related_drugs']

In [13]:
df[column_names] = df.groupby('drug_name', group_keys=False)[column_names].apply(lambda x: x.ffill().bfill())
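To make the `groupby(...).apply(lambda x: x.ffill().bfill())` step concrete, here is a minimal pure-Python sketch of the same idea on a list of dicts. It assumes `None` marks a missing value; the helper name `fill_within_groups` and the `"x"`/`"y"` values are hypothetical placeholders, not data from the CSV.

```python
from typing import Any, Optional


def fill_within_groups(rows: list[dict], key: str, columns: list[str]) -> list[dict]:
    """Forward-fill, then backward-fill, `columns` among rows that share the same `key`.

    None marks a missing value; row order is preserved and the input is not modified.
    """
    filled = [dict(row) for row in rows]  # shallow copies, so the caller's rows stay untouched
    groups: dict[Any, list[int]] = {}
    for i, row in enumerate(filled):
        groups.setdefault(row[key], []).append(i)  # row positions per key, in original order

    for positions in groups.values():
        for col in columns:
            last: Optional[Any] = None
            for i in positions:  # forward pass: copy the last seen value into later gaps
                if filled[i][col] is None:
                    filled[i][col] = last
                else:
                    last = filled[i][col]
            last = None
            for i in reversed(positions):  # backward pass: fill gaps before the first value
                if filled[i][col] is None:
                    filled[i][col] = last
                else:
                    last = filled[i][col]
    return filled


# Toy rows with placeholder values: two rows per drug, plus one drug that appears once
rows = [
    {"drug_name": "minocycline", "related_drugs": "x"},
    {"drug_name": "erythromycin", "related_drugs": None},
    {"drug_name": "minocycline", "related_drugs": None},
    {"drug_name": "erythromycin", "related_drugs": "y"},
    {"drug_name": "doxycycline", "related_drugs": None},
]
print([r["related_drugs"] for r in fill_within_groups(rows, "drug_name", ["related_drugs"])])
# ['x', 'y', 'x', 'y', None]
```

A drug that appears only once has no sibling row to copy from, so its gap stays empty. This is why the fill can only help the few duplicated drug names in this dataset.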

In [14]:
# Check missing-value percentages again
missing_values = missing_values_table(df)
missing_values

| Column | Missing Values | % of Total Values |
|---|---|---|
| related_drugs | 506 | 50.7 |
| alcohol | 495 | 49.5 |
| rating | 492 | 49.2 |
| no_of_reviews | 492 | 49.2 |
| brand_names | 386 | 38.6 |
| pregnancy_category | 72 | 7.2 |
| side_effects | 28 | 2.8 |
| drug_classes | 21 | 2.1 |
| generic_name | 12 | 1.2 |


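Missing values in **related_drugs** dropped from 508 (50.9%) to 506 (50.7%). The other columns are unchanged, since only two drug names (**minocycline** and **erythromycin**) appear more than once.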

* Drop columns that have only one unique value.

In [15]:
# count the number of unique (non-null) values in each column
unique_counts = df.nunique()

# get the column names with only one unique value
to_drop = unique_counts[unique_counts == 1].index

# drop the columns
final_df = df.drop(to_drop, axis=1)

In [16]:
final_df.columns

Index(['drug_name', 'medical_condition', 'side_effects', 'generic_name',
       'drug_classes', 'brand_names', 'activity', 'rx_otc',
       'pregnancy_category', 'csa', 'related_drugs',
       'medical_condition_description', 'rating', 'no_of_reviews', 'drug_link',
       'medical_condition_url'],
      dtype='object')

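The **alcohol** column has been dropped because it has only one unique value (`X`). Note that `nunique()` ignores missing values by default, so the rows where **alcohol** is NaN (49.5%) are not counted as a second value.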
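The dropping rule can be sketched without pandas as follows. It mirrors `df.nunique() == 1` under the assumption that `None` and float NaN both count as missing and are skipped; the helper names and the toy rows are hypothetical.

```python
import math
from typing import Any


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing, as pandas does."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def single_valued_columns(rows: list[dict]) -> list[str]:
    """Columns with exactly one distinct non-missing value (like df.nunique() == 1).

    Missing values are not counted, so a column holding only "X" and NaN is reported.
    """
    if not rows:
        return []
    result = []
    for col in rows[0]:
        distinct = {row[col] for row in rows if not is_missing(row[col])}
        if len(distinct) == 1:
            result.append(col)
    return result


# Toy rows: "alcohol" mixes "X" with NaN, "other" has two distinct values
rows = [
    {"alcohol": "X", "other": "a"},
    {"alcohol": float("nan"), "other": "b"},
    {"alcohol": "X", "other": "a"},
]
print(single_valued_columns(rows))  # ['alcohol']
```

If the presence or absence of `X` should be kept as a signal, `df.nunique(dropna=False)` counts NaN as its own value and would report 2 for **alcohol**, so the column would not be dropped.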

## Part 2. Information Retrieval Model Development

### Modelling Using Large Language Model <a class="anchor" id="5"></a>

We don't need to apply classic text preprocessing: the OpenAI embedding model works directly on raw text (it tokenizes the input itself), and the resulting embeddings are stored in a Chroma vector store.

For this reason, we skip tokenization, stop-word removal, stemming, training our own word embeddings, and building a Transformer-based network for information retrieval.

#### Explaining the flow of the system <a class="anchor" id="5.1"></a>

![Retrieval pipeline](https://raw.githubusercontent.com/okoliechykwuka/portfolio/master/pipeline.png)


Below are the steps required to extract information from the medical records based on the input prompt.

1. Split the content into chunks of data.

2. Compute an embedding (a vector representation) for each chunk. The embeddings capture the semantic meaning of the text.

3. Store the embeddings in a vector store.

4. When a user asks a question, compute the embedding of the question and perform a similarity search against the vector store.

5. Pass the most relevant chunks found in the vector store, together with the question, to the generative LLM.

6. The LLM generates a response for the user.
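The embed, store, search, and prompt steps above can be illustrated with a toy, dependency-free sketch. It assumes word-count vectors as a stand-in for OpenAI embeddings (they capture word overlap, not real semantics) and an in-memory list as a stand-in for Chroma; each string is treated as one chunk. All names (`embed`, `ToyVectorStore`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for OpenAIEmbeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory list of (chunk, vector) pairs; a stand-in for Chroma."""

    def __init__(self, chunks: list[str]):
        # embed every chunk once and keep the vector next to its text
        self.items = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, query: str, k: int = 2) -> list[str]:
        # embed the question and rank stored chunks by similarity to it
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(context: list[str], question: str) -> str:
    """Stuff the retrieved chunks and the question into one prompt for the LLM."""
    return "\n\n".join(context) + f"\n\nQuestion: {question}"


store = ToyVectorStore([
    "drug_name: doxycycline\ndrug_classes: Miscellaneous antimalarials, Tetracyclines",
    "drug_name: spironolactone\ndrug_classes: Aldosterone receptor antagonists",
    "drug_name: clindamycin\ndrug_classes: Topical acne agents, Vaginal anti-infectives",
])
context = store.search("What are the side effects of doxycycline?", k=1)
print(build_prompt(context, "What are the side effects of doxycycline?"))
```

In the notebook, `OpenAIEmbeddings`, `Chroma`, and `RetrievalQA` play these roles; the toy version only shows how the pieces connect.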


#### Import Document LLM Libraries <a class="anchor" id="5.2"></a>

In [17]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
# Import things that are needed generically
from langchain.prompts import PromptTemplate
from langchain.agents import initialize_agent, Tool

import os
# Load API keys (e.g. OPENAI_API_KEY) from a local .env file
from dotenv import load_dotenv
load_dotenv()

os.environ["LANGCHAIN_TRACING"] = "true"

#### Load DataFrame with a DataFrameLoader <a class="anchor" id="5.3"></a>

In [18]:
loader = DataFrameLoader(final_df, page_content_column="drug_name")
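With `page_content_column="drug_name"`, each document's `page_content` is only the drug name; the other columns go into `metadata`. Embeddings are computed from `page_content`, and, by default, the "stuff" chain inserts only `page_content` into `{context}`, so the LLM does not see fields such as **rating** or **side_effects**. This may explain why the sample responses report ratings of 6.4 or 7.1 for doxycycline while the dataset row shows 6.8. One possible fix is to serialize each row into a single text column and load that instead. The sketch below assumes a hypothetical helper `row_to_text` and a new `content` column.

```python
import math
from typing import Any


def row_to_text(row: dict[str, Any]) -> str:
    """Serialize one record as 'column: value' lines, skipping missing values.

    Hypothetical helper: the result is meant to be used as page_content, so both the
    embedding and the LLM prompt include every populated field of the row.
    """
    lines = []
    for column, value in row.items():
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue  # leave out NaN/None instead of writing "nan" into the text
        lines.append(f"{column}: {value}")
    return "\n".join(lines)


print(row_to_text({"drug_name": "doxycycline", "rating": 6.8, "brand_names": float("nan")}))

# Possible use with the notebook's DataFrame (not run here):
# final_df["content"] = [row_to_text(r) for r in final_df.to_dict("records")]
# loader = DataFrameLoader(final_df, page_content_column="content")
```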

#### Initialize the OpenAI LLM model <a class="anchor" id="5.4"></a>

In [19]:
# from langchain.chat_models import ChatOpenAI

llm = OpenAI(
    model_name="gpt-3.5-turbo",
    temperature=0.5,
    max_tokens=512,
)

#### Create a vector store to use as index <a class="anchor" id="5.5"></a>

In [30]:
# If this step fails with an authentication error, generate a new OpenAI API key:
# https://openai.com/blog/openai-api

In [20]:
documents = loader.load()
# split the documents into chunks of at most 2000 characters with 50 characters of overlap
text_splitter = CharacterTextSplitter(chunk_size=2000, chunk_overlap=50, separator="")
texts = text_splitter.split_documents(documents)
# select which embeddings we want to use
embeddings = OpenAIEmbeddings()

# create the vector store to use as the index
medical_store = Chroma.from_documents(texts, embeddings, collection_name="support")
# retrieve the 2 most similar chunks for each query
retriever = medical_store.as_retriever(search_type="similarity", search_kwargs={"k": 2})
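`chunk_size` and `chunk_overlap` control how text is cut before embedding: chunks are at most `chunk_size` characters, and neighbouring chunks share up to `chunk_overlap` characters so that text at a boundary is not lost. The sketch below is a simplified fixed-window approximation for illustration only; it is not LangChain's exact merging algorithm, and `split_with_overlap` is a hypothetical name.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters; each window repeats the last
    chunk_overlap characters of the previous one. Simplified stand-in for a splitter."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

With only a drug name as page content, every document is far shorter than 2000 characters, so the splitter leaves each one as a single chunk; the settings start to matter once full rows are embedded.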

#### Create a prompt template <a class="anchor" id="5.6"></a>

A prompt template instructs the LLM on how to process our queries and what response to return.

In our case, we pass in a question (with `query` and `focus` fields) and the language model returns a JSON response.


**Question**

    {"query": ["I want information on doxycycline"],
     "focus": ["side_effects", "drug_classes"]}

**Response**

    {
      "results": {
        "drug_name": "Doxycycline",
        "generic_name": "Doxycycline",
        "side_effects": ["Nausea", "Vomiting", "Diarrhea", "Rash", "Photosensitivity"],
        "drug_classes": ["Tetracycline Antibiotic"],
        "brand_names": ["Adoxa", "Doryx", "Monodox", "Oracea", "Vibramycin"],
        "activity": "Unknown",
        "related_drugs": ["Minocycline", "Tetracycline", "Azithromycin"],
        "rating": 7.1,
        "no_of_reviews": 998,
        "drug_link": "https://www.drugs.com/doxycycline.html",
        "CSA": "N"
      }
    }

In [21]:
support_template = \
   """As a Medical Information Retrieval bot, your goal is to provide accurate and helpful information about medical records. 
       Always answer using only the context provided below; do not make up answers yourself. If the requested information is unavailable, use "unknown" as the value.

        Below is the response format. Note: This is just a format for your response and not the actual response.
        Only use keys that are relevant to the question asked. The maximum number of keys should be 8, and the minimum number of keys should be 5.


                {{
    
                  "results": {{
                    "drug_name": "Paracetamol",
                    "generic_name": "Acetaminophen",
                    "side_effects": ["Headache", "Nausea", "Stomach pain"],
                    "drug_classes": ["Analgesic", "Antipyretic"],
                    "brand_names": ["Acticlate", "Adoxa CK", "Adoxa Pak", "Adoxa TT", "Alodox"],
                    "activity": "87%",
                    "related_drugs": ["Amoxicillin", "Prednisone", "Albuterol"],
                    "rating": 6.8,
                    "no_of_reviews": 760,
                    "drug_link": "https://www.drugs.com/doxycycline.html",
                    "CSA": "N"
                          }}
                  }}


            {context}

            Question: {question}

                    """
SUPPORT_PROMPT = PromptTemplate.from_template(template=support_template)
support_qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs={"prompt": SUPPORT_PROMPT},
)
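`PromptTemplate.from_template` uses `str.format`-style placeholders by default, which is why the literal JSON braces in `support_template` are doubled (`{{`, `}}`) while `{context}` and `{question}` stay single. The sketch below shows the same escaping with plain `str.format` on a shortened, hypothetical template.

```python
# Shortened, hypothetical template: doubled braces are literal JSON,
# single braces are the variables that get filled in.
template = (
    'Respond in this format: {{"results": {{"drug_name": "..."}}}}\n\n'
    "{context}\n\n"
    "Question: {question}"
)

rendered = template.format(
    context="drug_name: doxycycline",
    question='{"query": ["I want information on doxycycline"]}',
)
print(rendered)
```

Only the template is parsed for placeholders; the substituted values are inserted verbatim, so a question that is itself a JSON string (like `yes` below) needs no escaping.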


In [22]:
yes = '{"query": ["I want information on doxycycline"],"focus":["side_effects", "drug_classes"]}'

#### Test the Medical information extractor <a class="anchor" id="5.7"></a>

In [23]:
print(support_qa.run(yes))



{
    
    "results": {
        "drug_name": "Doxycycline",
        "generic_name": "Doxycycline",
        "side_effects": ["Nausea", "Vomiting", "Diarrhea", "Loss of appetite", "Photosensitivity", "Headache"],
        "drug_classes": ["Tetracycline Antibiotic"],
        "brand_names": ["Acticlate", "Adoxa", "Doryx", "Monodox", "Oracea", "Vibramycin"],
        "activity": "Unknown",
        "related_drugs": ["Minocycline", "Tetracycline", "Azithromycin"],
        "rating": 6.4,
        "no_of_reviews": 537,
        "drug_link": "https://www.drugs.com/doxycycline.html",
        "CSA": "N"
    }
}
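The prompt asks for 5 to 8 keys inside `results`, yet the output above contains 11, and the agent run below observes JSON with a trailing comma, which `json.loads` rejects. A small validator could check each reply before it reaches a caller. This is a sketch; `parse_results` and its limits are hypothetical and simply mirror the wording of the prompt.

```python
import json
from typing import Any


def parse_results(reply: str, min_keys: int = 5, max_keys: int = 8) -> dict[str, Any]:
    """Parse an LLM reply and check it against the prompt's contract.

    Raises ValueError if the reply is not valid JSON, has no "results" object,
    or the number of keys in "results" is outside [min_keys, max_keys].
    """
    try:
        payload = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    results = payload.get("results") if isinstance(payload, dict) else None
    if not isinstance(results, dict):
        raise ValueError('reply has no "results" object')
    if not min_keys <= len(results) <= max_keys:
        raise ValueError(f"expected {min_keys}-{max_keys} keys, got {len(results)}")
    return results
```

A caller could catch `ValueError` and retry the chain or return an error response instead of passing malformed output downstream.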


#### Implement a LLM Agent to extract info <a class="anchor" id="5.8"></a>

We now have a prompt and a vector store for support responses. Next, we wrap the `support_qa` chain as a tool and let an agent decide when to call it.

In [26]:
# the zero-shot-react-description agent will use the "description" string to select tools
tools = [
    Tool(
        name = "support",
        func=support_qa.run,
        description="""
        useful for when a user is interested in various medical information.
        The user is not asking for debugging help, only for medical information rendered in JSON format.
        Input should be a fully formed question.
        """
    ),
]

In [27]:
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)
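A zero-shot ReAct agent works in a loop: the LLM writes `Thought`, `Action`, and `Action Input` lines, the executor calls the named tool, appends its result as an `Observation`, and repeats until the LLM writes `Final Answer:`. The simplified parser below illustrates the parsing step on text shaped like the log in the next cell; it is a hypothetical sketch (`parse_step`, `ACTION_RE`), not LangChain's actual output parser.

```python
import re
from typing import Tuple

# "Action: <tool>" followed on the next line by "Action Input: <input>"
ACTION_RE = re.compile(r"Action\s*:\s*(.*?)\s*\nAction\s*Input\s*:\s*(.*)", re.DOTALL)


def parse_step(llm_output: str) -> Tuple[str, str]:
    """Return ("final", answer) or (tool_name, tool_input) for one ReAct step."""
    if "Final Answer:" in llm_output:
        return "final", llm_output.split("Final Answer:")[-1].strip()
    match = ACTION_RE.search(llm_output)
    if match is None:
        raise ValueError(f"could not parse LLM output: {llm_output!r}")
    tool = match.group(1).strip()
    tool_input = match.group(2).strip().strip('"')  # drop surrounding quotes
    return tool, tool_input


step = parse_step(
    "The user is requesting information on the drug doxycycline.\n"
    "Action: support\n"
    'Action Input: "What are the side effects and drug classes of doxycycline?"'
)
print(step)  # ('support', 'What are the side effects and drug classes of doxycycline?')
```

Because the tool is chosen by matching the `Action` name, the tool's `description` is the main signal the LLM uses to decide whether to call `support`.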

#### Test the Medical information extractor with an Agent <a class="anchor" id="5.9"></a>

In [28]:
result = agent.run(yes)


> Entering new AgentExecutor chain...
The user is requesting information on the drug doxycycline, specifically related to side effects and drug classes.
Action: support
Action Input: "What are the side effects and drug classes of doxycycline?"
Observation: {

  "results": {
    "drug_name": "Doxycycline",
    "side_effects": ["Nausea", "Vomiting", "Diarrhea", "Rash", "Photosensitivity"],
    "drug_classes": ["Tetracycline Antibiotic"],
  }
}

Note: The provided information is not exhaustive and may not include all possible side effects or drug classes. Please consult a healthcare professional for more detailed information.
Thought:The user now has the requested information on doxycycline's side effects and drug classes.
Final Answer: The side effects of doxycycline are nausea, vomiting, diarrhea, rash, and photosensitivity, and its drug class is tetracycline antibiotic.

> Finished chain.

In [29]:
result

'The side effects of doxycycline are nausea, vomiting, diarrhea, rash, and photosensitivity, and its drug class is tetracycline antibiotic.'

## Part 3. API Development and Deployment

### Create a FastAPI Endpoint for our Model <a class="anchor" id="6"></a>

In [29]:
from fastapi import FastAPI
from pydantic import BaseModel
from typing import List, Optional
from fastapi.responses import JSONResponse
import json
app = FastAPI()

class Question(BaseModel):
    query: Optional[List[str]] = None
    focus: Optional[List[str]] = None
        
@app.get("/")
def root():
    return {"message": "EndPoint is working perfectly!"}

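@app.post("/question/")
def create_question(question: Question):
    # Serialize the request body (query and focus) to a JSON string and
    # pass it to the RetrievalQA chain (support_qa), not the agent
    result = support_qa.run(question.json())
    try:
        parsed = json.loads(result)
    except json.JSONDecodeError:
        # The LLM reply was not valid JSON: return an error instead of an unhandled 500
        return JSONResponse(status_code=502, content={"error": "Model returned invalid JSON", "raw": result})
    print(parsed)
    return parsed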


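#### Test the model endpoint <a class="anchor" id="6.1"></a>

Once the server in the next cell is running, open http://127.0.0.1:8000/docs#/default and send a request to the `POST /question/` endpoint to extract information from the document.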

**Request Body**

    {"query": ["I want information on doxycycline"],
     "focus": ["side_effects", "drug_classes"]}

**Response Body**

    {
      "results": {
        "drug_name": "Doxycycline",
        "generic_name": "Doxycycline",
        "side_effects": ["Nausea", "Vomiting", "Diarrhea", "Rash", "Photosensitivity"],
        "drug_classes": ["Tetracycline Antibiotic"],
        "brand_names": ["Adoxa", "Doryx", "Monodox", "Oracea", "Vibramycin"],
        "activity": "Unknown",
        "related_drugs": ["Minocycline", "Tetracycline", "Azithromycin"],
        "rating": 7.1,
        "no_of_reviews": 998,
        "drug_link": "https://www.drugs.com/doxycycline.html",
        "CSA": "N"
      }
    }
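The endpoint can also be called from a script instead of the Swagger UI. The sketch below builds the same request with the standard library's `urllib`; it assumes the server address used in this notebook, and `build_question_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_question_request(query: list[str], focus: list[str],
                           base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build a POST request for /question/ with a JSON body matching the Question model."""
    body = json.dumps({"query": query, "focus": focus}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/question/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_question_request(["I want information on doxycycline"], ["side_effects", "drug_classes"])

# With the server running, the request could be sent like this (not executed here):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```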


In [31]:
import nest_asyncio
import uvicorn

nest_asyncio.apply()

#Open http://127.0.0.1:8000/docs#/default to extract information from document
uvicorn.run(app, host="0.0.0.0", port=8000)


INFO:     Started server process [21568]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)


INFO:     127.0.0.1:63205 - "GET /docs HTTP/1.1" 200 OK
INFO:     127.0.0.1:63205 - "GET /openapi.json HTTP/1.1" 200 OK




{'results': {'drug_name': 'Doxycycline', 'generic_name': 'Doxycycline', 'side_effects': ['Nausea', 'Vomiting', 'Diarrhea', 'Rash', 'Photosensitivity'], 'drug_classes': ['Tetracycline Antibiotic'], 'brand_names': ['Adoxa', 'Doryx', 'Monodox', 'Oracea', 'Vibramycin'], 'activity': 'Unknown', 'related_drugs': ['Minocycline', 'Tetracycline', 'Azithromycin'], 'rating': 7.1, 'no_of_reviews': 998, 'drug_link': 'https://www.drugs.com/doxycycline.html', 'CSA': 'N'}}
INFO:     127.0.0.1:63234 - "POST /question/ HTTP/1.1" 200 OK


INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [21568]


In [34]:
yes = """{"query": ["I want information on Accutane"],"focus":["side_effects", "drug_classes"]}, \
          Below is the response format. Note: This is just a format for your response and not the actual response.
          Only use keys that are relevant to the question asked. The maximum number of keys should be 8, and the minimum number of keys should be 5. 
          If the requested information is unavailable, use "unknown" as the value.
                    {"results": {
            "drug_name": "Paracetamol",
            "generic_name": "Acetaminophen",
            "side_effects": ["Headache", "Nausea", "Stomach pain"],
            "drug_classes": ["Analgesic", "Antipyretic"],
            "brand_names": ["Acticlate", "Adoxa CK", "Adoxa Pak", "Adoxa TT"],
            "activity": "87%",
            "related_drugs": ["Amoxicillin", "Prednisone", "Albuterol"],
            "rating": 6.8,
            "no_of_reviews": 760,
            "drug_link": "https://www.drugs.com/doxycycline.html",
            "CSA": "N"
                  }}
          """

#### Bonus Part <a class="anchor" id="7.0"></a>


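The `create_csv_agent` API offers another way to extract information from our medical file. It loads the CSV into a pandas DataFrame and has the LLM write and execute pandas code against it (so it runs model-generated Python), which means answers are computed from the data rather than from retrieved text snippets.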
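For simple lookups, a deterministic filter gives a baseline to compare the agent's answers against. The sketch below uses the standard `csv` module and assumes the file has a `drug_name` header column; `lookup_drug` is a hypothetical helper, and the pandas code the agent actually writes may differ.

```python
import csv
from typing import Iterable


def lookup_drug(lines: Iterable[str], drug_name: str, fields: list[str]) -> list[dict[str, str]]:
    """Return drug_name plus the requested fields for every row whose drug_name matches.

    Matching is case-insensitive; missing fields come back as empty strings.
    """
    target = drug_name.strip().lower()
    return [
        {field: row.get(field) or "" for field in ["drug_name", *fields]}
        for row in csv.DictReader(lines)
        if (row.get("drug_name") or "").strip().lower() == target
    ]


# Possible use on the real file (not run here):
# with open("Drugs Master List.csv", newline="") as f:
#     print(lookup_drug(f, "minocycline", ["generic_name", "rating"]))
```

Since **minocycline** appears in two rows, a lookup like this returns both, which is a useful check on whether an agent's answer merged or dropped one of them.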


#### Using CSV_AGENT <a class="anchor" id="7.1"></a>


In [32]:
from langchain.agents import create_csv_agent
from langchain.llms import OpenAI

In [33]:
agent = create_csv_agent(llm, 'Drugs Master List.csv', verbose=True)

 

In [36]:
agent.run(yes)