## Demo #1: PEPFAR Documentation RAG

This demo shows how to chat with documents using Retrieval Augmented Generation (RAG). It follows the LangChain RAG tutorial: https://python.langchain.com/v0.2/docs/tutorials/rag/.

### Single document chat

Import libraries...

In [33]:
import os

from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from PyPDF2 import PdfReader
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

from dotenv import load_dotenv

load_dotenv()

True

Get secrets from your .env file...

In [10]:
# load_dotenv() already copied the .env values into os.environ; fail early if one is missing.
for name in ("AZURE_OPENAI_VERSION", "AZURE_OPENAI_DEPLOYMENT", "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_KEY"):
    if not os.environ.get(name):
        raise KeyError(f"{name} is not set; add it to your .env file")

# The .env file calls the key AZURE_OPENAI_KEY; the code below expects AZURE_OPENAI_API_KEY.
os.environ["AZURE_OPENAI_API_KEY"] = os.environ["AZURE_OPENAI_KEY"]

Import a pdf document...

In [20]:
pdf = open("resources/PEPFAR-2023-Country-and-Regional-Operational-Plan.pdf", 'rb')

In [22]:
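def get_pdf_text(pdf_file):
    """Return the concatenated text of every page in an open PDF file."""
    text = ""  # accumulates the extracted text
    pdf_reader = PdfReader(pdf_file)  # parse the file passed in, not the global
    for page in pdf_reader.pages:  # loop through the pages
        text += page.extract_text() or ""  # a page with no extractable text adds nothing
    return text

raw_text = get_pdf_text(pdf)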

Chunk your document...

In [24]:
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=1000,  # maximum chunk length, in characters
    chunk_overlap=200,  # characters shared between consecutive chunks
    length_function=len
)
chunked_text = text_splitter.split_text(raw_text)
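
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal pure-Python sketch of overlapping character windows. It is an illustration only: `chunk_text` is a hypothetical helper, and it is simpler than what `CharacterTextSplitter` does, which splits on the separator first and then merges the pieces.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap their neighbours."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
    return chunks


sample = "abcdefghij" * 3  # 30 characters of made-up text
demo_chunks = chunk_text(sample, chunk_size=12, chunk_overlap=4)
print(demo_chunks)
```

Each chunk repeats the last `chunk_overlap` characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.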

Create and store embeddings into a vectorstore...

In [25]:
embeddings = AzureOpenAIEmbeddings()
vectorstore = FAISS.from_texts(texts=chunked_text, embedding=embeddings)
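
The idea behind a vectorstore can be sketched without any external service: turn each chunk into a vector, then rank chunks by similarity to the query vector. The toy below assumes a bag-of-words "embedding" and cosine similarity; it is not how Azure OpenAI embeddings or FAISS work internally, and `toy_embed`, `toy_search`, and the sample strings are all invented for illustration.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_search(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    query_vec = toy_embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, toy_embed(chunk)),
        reverse=True,
    )
    return ranked[:k]


# Made-up sample chunks, not text from the PEPFAR document.
toy_chunks = [
    "antiretroviral therapy improves treatment outcomes",
    "supply chains need modern distribution models",
    "community partners support prevention programs",
]
top_hits = toy_search("modern supply chains", toy_chunks, k=1)
print(top_hits)
```

A real embedding model places semantically related text close together even when no words are shared, which word counts cannot do.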

Set up your LLM...

In [26]:
azure_llm = AzureChatOpenAI(
    api_version=os.environ["AZURE_OPENAI_VERSION"],
    azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"]
)

Implement retrieval from vectorstore...

In [31]:
retriever = vectorstore.as_retriever()

In [None]:
# Define template for prompts
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

In [None]:
prompt = ChatPromptTemplate.from_template(template)

In [35]:
chain = (
    # The key must be "question" to match the {question} placeholder in the template.
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | azure_llm
    | StrOutputParser()
)
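
A chain of this shape pipes four steps together: retrieve context, fill the prompt, call the model, and keep only the reply text. The sketch below spells those steps out as plain functions. `fake_retriever` and `fake_llm` are hypothetical stand-ins so the example runs offline, and joining the chunks with blank lines is a choice made here rather than something the chain does.

```python
PROMPT_TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""


def fake_retriever(question: str) -> list[str]:
    """Hypothetical stand-in for a vectorstore retriever."""
    return ["chunk one", "chunk two"]


def fake_llm(prompt_text: str) -> dict:
    """Hypothetical stand-in for the chat model; reports the prompt length."""
    return {"content": f"(model reply to a {len(prompt_text)}-character prompt)"}


def build_prompt(question: str, chunks: list[str]) -> str:
    """Fill the template with the retrieved chunks and the user's question."""
    return PROMPT_TEMPLATE.format(context="\n\n".join(chunks), question=question)


def run_chain(question: str) -> str:
    # Step 1: the same input goes to the retriever and is also passed through unchanged.
    inputs = {"context": fake_retriever(question), "question": question}
    # Step 2: the prompt template is filled from that dictionary.
    prompt_text = build_prompt(inputs["question"], inputs["context"])
    # Step 3: the model receives the filled prompt.
    message = fake_llm(prompt_text)
    # Step 4: only the text of the reply is kept, as an output parser would do.
    return message["content"]


print(run_chain("What is this doc about?"))
```

The dictionary keys have to match the placeholder names in the template, otherwise the prompt step cannot be filled.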

Ask!

In [36]:
chain.invoke("What is this doc about?")

'This document is about various aspects of public health, specifically focusing on HIV/AIDS treatment and prevention. It discusses person-centered care, antiretroviral therapy, the importance of aligning health plans with national priorities and strategies to end HIV/AIDS as a public health threat by 2030. It also covers the implementation of innovative distribution models to modernize supply chains, optimization of diagnostic networks, and the integration of quality assurance practices into site and program management. The document appears to be a scholarly or professional report with multiple cited sources.'

In [37]:
chain.invoke("What are the priorities for 2023?")

'The priorities for 2023, as mentioned in the document, include sustainability and partnerships. PEPFAR will begin to assess and explore opportunities to sustain DREAMS’s aims and interventions for the long term while working closely with local partners in government, civil society, communities, the private sector, and adolescent girls and young women. They also aim to partner with multilateral, foundation, and private sector donors to provide economic and educational opportunities, and incorporate evidence-based interventions into local structures. Moreover, addressing the need for new partnerships, behavioral/social science gaps, and enablers such as data and community engagement is also a priority.'

In [38]:
chain.invoke("What are PEPFAR countries asked to do?")

"PEPFAR-supported countries are asked to develop data governance policies and arrangements for managing and sharing PEPFAR-supported staffing data with governments. They are also encouraged to support regional and national preparedness capacity to rapidly mobilize the frontline health workforce in pandemic responses and to maintain essential services. Additionally, these countries are asked to leverage PEPFAR's workforce during pandemic responses to maintain high-quality essential healthcare services. They should also track and use HRH (Human Resources for Health) data, strengthen their public health institutions, and promote public health security and responsiveness. Furthermore, they need to have robust national surveillance for HIV/AIDS and other public health threats. They are also asked to strengthen their government health workforce investments and improve the alignment of PEPFAR investments with their HRH staffing and other public health system priorities. Lastly, they are asked

Showing citations...

This LangChain how-to explains how to add citations to the results: https://python.langchain.com/v0.2/docs/how_to/qa_citations/#setup
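
One simple way to think about citations, sketched here under stated assumptions and not taken from the linked how-to: number each retrieved chunk in the context, ask the model to cite those numbers, then map the numbers in the answer back to the chunks. The helper names, the sample chunks, and the sample answer are all hypothetical.

```python
import re


def format_numbered_context(chunks: list[str]) -> str:
    """Prefix each chunk with [1], [2], ... so the model can refer to it."""
    return "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))


def extract_citations(answer: str, chunks: list[str]) -> dict[int, str]:
    """Map every valid [n] marker found in the answer to the chunk it points at."""
    cited = {}
    for match in re.findall(r"\[(\d+)\]", answer):
        index = int(match)
        if 1 <= index <= len(chunks):  # ignore numbers that match no chunk
            cited[index] = chunks[index - 1]
    return cited


example_chunks = ["first retrieved chunk", "second retrieved chunk"]
example_answer = "The plan stresses partnerships [2]."  # invented model output
citations = extract_citations(example_answer, example_chunks)
print(format_numbered_context(example_chunks))
print(citations)
```

This relies on the model following the citation instruction, so out-of-range markers are dropped instead of trusted.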

### Multidocument chat