# **Using Chroma vector store and GPT3.5 Turbo as LLM**
Run LangChainRAG/Embedding-OpenAI-Chroma.ipynb first to embed our medical documents. This notebook only applies GPT3.5 Turbo as the LLM on top of that vector store.

In [49]:
# bleu_score is not a PyPI package, so it is dropped here; BLEU can be loaded through the `evaluate` package instead.
# langchain-openai and langchainhub are added because later cells import langchain_openai and langchain.hub.
!pip -q install langchain langchain-openai langchainhub openai chromadb sentence_transformers evaluate rouge_score bert_score

Collecting bert_score
  Downloading bert_score-0.3.13-py3-none-any.whl (61 kB)
     ---------------------------------------- 61.1/61.1 kB 3.2 MB/s eta 0:00:00
Installing collected packages: bert_score
Successfully installed bert_score-0.3.13


In [25]:
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
from langchain.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings  # needed below when loading the Chroma store

## **OpenAI Authentication**
We use OpenAI's GPT3.5 Turbo. Make sure your OpenAI account has available credit, and create a personal secret key at https://platform.openai.com/api-keys.

In [4]:
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

········
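
When the notebook is re-run, it can be convenient to prompt for the key only if it is not already set. The helper below is a minimal sketch of that idea; `ensure_api_key` is a hypothetical name that is not part of LangChain or the OpenAI SDK.

```python
import getpass
import os


def ensure_api_key(var_name="OPENAI_API_KEY"):
    """Prompt for the key only when the environment variable is unset or empty.

    Hypothetical helper for illustration; the notebook itself calls getpass directly.
    """
    if not os.environ.get(var_name):
        os.environ[var_name] = getpass.getpass(f"{var_name}: ")
    return os.environ[var_name]
```

Calling `ensure_api_key()` at the top of the notebook would then replace the unconditional `getpass.getpass()` assignment above.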


## **Load Chroma**
We first load the Chroma vector database.

In [26]:
import os

persist_directory = "./Chroma/chroma_openai"
# Check that the persisted vector store from the embedding notebook exists
if not os.path.exists(persist_directory):
    print("Please run LangChainRAG/Embedding-OpenAI-Chroma.ipynb first; no Chroma vector store was found.")
else:
    print(f"Directory '{persist_directory}' exists, perfect!")


Directory './Chroma/chroma_openai' exists, perfect!


In [23]:
query = "What is the proposed optical biosensor based on?"
# Load the persisted store; the embedding function should match the one used to build it
db3 = Chroma(persist_directory=persist_directory, embedding_function=OpenAIEmbeddings())
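
Conceptually, a vector store answers a query by ranking stored chunks by how close their embeddings are to the query embedding. The toy sketch below illustrates that idea with hand-written 3-dimensional vectors and cosine similarity. It is an assumption-laden illustration only: the vectors, chunk labels, and the `top_k` helper are made up, and Chroma's actual index and distance metric may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k chunk texts whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Made-up embeddings standing in for real OpenAI embedding vectors
toy_store = [
    ("optical biosensor chunk", [0.9, 0.1, 0.0]),
    ("drug excipient chunk", [0.1, 0.9, 0.1]),
    ("unrelated chunk", [0.0, 0.1, 0.9]),
]
toy_query = [0.8, 0.2, 0.0]

print(top_k(toy_query, toy_store, k=2))
```

In the notebook, `db3` performs the equivalent lookup over the persisted embeddings, so no manual scoring is needed.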

## **Initialize GPT 3.5 Turbo and prompt query**

In [20]:
from langchain import hub
from langchain_openai import ChatOpenAI

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough


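retriever = db3.as_retriever()  # print(dir(db3)) lists all available methods and attributes
prompt = hub.pull("rlm/rag-prompt")  # RAG prompt template pulled from the LangChain Hub
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)  # temperature=0 for more repeatable answers

def format_docs(docs):
    # Join the retrieved chunks into one context string, separated by blank lines
    return "\n\n".join(doc.page_content for doc in docs)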


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
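
To make the first step of the chain easier to follow, the sketch below mimics what the dictionary stage produces before the prompt is filled in: the question is passed through unchanged, and the retrieved chunks are joined into one context string. `FakeDoc`, `fake_retriever`, and `build_prompt_inputs` are hypothetical stand-ins written for this illustration; they do not call Chroma or OpenAI.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Stand-in for a retrieved document; only page_content is modeled."""
    page_content: str


def format_docs(docs):
    # Same joining logic as in the notebook cell above
    return "\n\n".join(doc.page_content for doc in docs)


def fake_retriever(question):
    # Hypothetical retriever that ignores the question and returns fixed chunks
    return [
        FakeDoc("Chunk A about excipients."),
        FakeDoc("Chunk B about machine learning."),
    ]


def build_prompt_inputs(question):
    """Mimic the dict stage: context from the retriever, question passed through."""
    return {"context": format_docs(fake_retriever(question)), "question": question}


inputs = build_prompt_inputs("Why does excipient compatibility matter?")
print(inputs["context"])
```

In the real chain, this dictionary is handed to `prompt`, whose output goes to `llm`, and `StrOutputParser()` then extracts the answer text.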


In [21]:
rag_chain.invoke("Why is the compatibility of drugs with excipients important in pharmaceutical formulations? And how can machine learning aid exactly?")

'The compatibility of drugs with excipients is crucial in pharmaceutical formulations to ensure stability, efficacy, and safety of the medication. Machine learning can aid in drug formulation development by predicting pharmaceutical formulations, optimizing drug delivery, and accelerating drug discovery through accurate predictions and informed decision-making. Deep learning, specifically convolutional neural networks, can excel in image analysis for biomarker identification and drug formulation optimization.'