In [1]:
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
# Alternative: the dedicated langchain-chroma integration package
# import chromadb
# from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever
from tqdm.notebook import tqdm

In [2]:
local_path = "../pdf/BILLS-119hr1eh.pdf"

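if local_path:
    # Parse the whole PDF; the loader's default "single" mode returns one Document
    loader = UnstructuredPDFLoader(file_path=local_path)
    data = loader.load()
else:
    # Stop here instead of failing later with a NameError on `data`
    raise ValueError("Upload a PDF file for processing.")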

In [3]:
len(data[0].page_content)

1136279

In [4]:
# Split the document into chunks of up to 7,500 characters, overlapping by 100 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=7500, chunk_overlap=100)
chunks = text_splitter.split_documents(data)
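As a rough check on how many chunks these settings produce, a plain sliding window of 7,500 characters with 100 characters of overlap advances 7,400 characters per step. The helpers below are hypothetical simplifications, not LangChain code: `RecursiveCharacterTextSplitter` also tries to break on paragraph, line, and word separators, so its chunks can be shorter and its count slightly higher (the embedding progress bar in the next cell reports 155).

```python
import math


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows that overlap by chunk_overlap characters.

    Hypothetical stand-in for RecursiveCharacterTextSplitter: it ignores
    separators and splits strictly by character count.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


def expected_chunk_count(n_chars: int, chunk_size: int, chunk_overlap: int) -> int:
    """Closed-form number of windows produced by sliding_window_chunks."""
    if n_chars == 0:
        return 0
    if n_chars <= chunk_size:
        return 1
    step = chunk_size - chunk_overlap
    return math.ceil((n_chars - chunk_size) / step) + 1


n = 1136279  # len(data[0].page_content) from the earlier cell
print(expected_chunk_count(n, 7500, 100))  # 154
```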

In [5]:
# Embed each chunk with nomic-embed-text (served by Ollama) and store the vectors
# in a Chroma collection (in memory, since no persist_directory is given)
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    collection_name="local-rag",
)

OllamaEmbeddings: 100%|█████████████████████| 155/155 [00:47<00:00,  3.25it/s]
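Conceptually, `vector_db.as_retriever()` embeds the query with the same model and returns the nearest stored chunks (LangChain's vector-store retrievers return 4 by default). The sketch below performs that lookup by brute force with cosine similarity over made-up 3-dimensional vectors; the chunk ids and vectors are hypothetical, real embeddings have hundreds of dimensions, and Chroma itself uses an approximate HNSW index whose default distance is L2 rather than cosine.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 4) -> list[str]:
    """Return the ids of the k stored vectors most similar to query_vec (exhaustive scan)."""
    ranked = sorted(
        store.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical chunk embeddings
store = {
    "medicaid_work_requirements": [0.9, 0.1, 0.0],
    "pbm_reporting": [0.1, 0.9, 0.1],
    "pbm_audit_rights": [0.2, 0.8, 0.3],
    "tax_provisions": [0.0, 0.1, 0.9],
}
query = [0.1, 0.9, 0.1]  # pretend embedding of a question about pharmacy benefit managers
print(top_k(query, store, k=2))  # ['pbm_reporting', 'pbm_audit_rights']
```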


In [6]:
local_llm = "llama3.1"
llm = ChatOllama(model=local_llm)

QUERY_PROMPT = PromptTemplate(
    input_variables = ["question"],
    template="""You are an AI language model assistant. Your task is to generate five different versions of the given user question to retrieve relevant documents from a vector database. By generating multiple perspectives on the user question, your goal is to help the user overcome some of the limitations of distance-based similarity search. Provide these alternative questions separated by newlines.
    Original question: {question} """
)


# Have the LLM rephrase the question, retrieve for each variant, and merge the unique results
retriever = MultiQueryRetriever.from_llm(vector_db.as_retriever(), llm, prompt=QUERY_PROMPT)

# RAG Prompt
template = """Answer the question based ONLY on the following context: 
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

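In this chain the retriever's output, a list of `Document` objects, is passed straight into `{context}`, so the prompt receives the string form of that list, including each document's metadata and `page_content=` wrapper. A common alternative is to pipe the retriever through a small formatting function, e.g. `{"context": retriever | format_docs, "question": RunnablePassthrough()}`. The stdlib-only sketch below mimics that formatting step so you can see what the model would receive; `Doc` is a hypothetical stand-in for LangChain's `Document`, and the chunk texts are placeholders.

```python
from dataclasses import dataclass, field

TEMPLATE = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs: list[Doc]) -> str:
    """Keep only the chunk text, separated by blank lines, and drop the metadata."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt(docs: list[Doc], question: str) -> str:
    # Mirrors the {"context": ..., "question": ...} | prompt step of the chain
    return TEMPLATE.format(context=format_docs(docs), question=question)


docs = [
    Doc("first chunk text", {"source": "../pdf/BILLS-119hr1eh.pdf"}),
    Doc("second chunk text", {"source": "../pdf/BILLS-119hr1eh.pdf"}),
]
print(build_prompt(docs, "Describe the primary objectives of this Act."))
```

Without `format_docs`, `str(docs)` would carry the `metadata={...}` text of every chunk into the prompt, spending context on repeated file paths.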


In [7]:
q = "Describe the primary objectives of this Act."
response = chain.invoke(q)

print(response)

OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:01<00:00,  1.11s/it]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 49.81it/s]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 39.15it/s]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 47.59it/s]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 56.58it/s]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 51.82it/s]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 42.62it/s]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 50.31it/s]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 50.76it/s]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 40.79it/s]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 41.03it/s]

The primary objectives of this Act are not explicitly stated in the provided text. However, based on the content, it appears to be a comprehensive bill addressing various aspects of healthcare policy and pharmacy benefit management. The Act aims to regulate pharmacy benefit managers, including requiring them to provide annual reports, implementing audit rights for PDP sponsors, and enforcing compliance with certain requirements.

Some potential objectives of this Act could include:

1. Improving transparency in pharmacy benefit management
2. Enhancing accountability among pharmacy benefit managers
3. Protecting patients' access to prescription medications
4. Reducing costs associated with pharmacy benefit management
5. Promoting fair business practices within the industry

Please note that these are speculative objectives and may not be an exhaustive list of the Act's primary goals.
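The log for this question shows more embedding calls than the five rephrasings the query prompt asks for. As we understand it, recent versions of `MultiQueryRetriever` split the LLM's reply on newlines, treat each non-empty line as a separate query, retrieve for each one, and return the de-duplicated union of the results. If the model adds a preamble or extra lines, those are searched too, which is one plausible explanation for the extra calls. The sketch below reproduces that parse, retrieve, and merge logic in plain Python; `fake_llm_reply`, `index`, and `fake_search` are hypothetical stand-ins, not LangChain APIs.

```python
def parse_queries(llm_reply: str) -> list[str]:
    """Split the LLM reply into one query per non-empty line."""
    return [line.strip() for line in llm_reply.strip().split("\n") if line.strip()]


def multi_query_retrieve(llm_reply: str, search) -> list[str]:
    """Run search() once per generated query; return the unique hits in first-seen order."""
    unique = []
    for query in parse_queries(llm_reply):
        for doc in search(query):
            if doc not in unique:
                unique.append(doc)
    return unique


# Hypothetical LLM reply: a preamble line, a blank line, and five rephrasings
fake_llm_reply = """Here are five alternative versions of the question:
What are the main goals of this Act?
What does this legislation aim to achieve?

What purposes does the bill serve?
Which problems is this Act intended to address?
What outcomes does the Act seek?"""

# Hypothetical keyword index standing in for the vector search
index = {"goals": ["chunk_12", "chunk_40"], "achieve": ["chunk_12", "chunk_97"]}


def fake_search(query: str) -> list[str]:
    hits = []
    for keyword, doc_ids in index.items():
        if keyword in query.lower():
            hits.extend(doc_ids)
    return hits


print(len(parse_queries(fake_llm_reply)))  # 6 searches, preamble included
print(multi_query_retrieve(fake_llm_reply, fake_search))  # ['chunk_12', 'chunk_40', 'chunk_97']
```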


In [8]:
q = "What is the most controversial objective within this Act?"
response = chain.invoke(q)

print(response)

OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:01<00:00,  1.27s/it]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 44.87it/s]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 46.65it/s]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 34.38it/s]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 30.48it/s]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 40.45it/s]


Based on the provided text, it appears that the most controversial objective within this Act may be the "PROHIBITING WAIVER OF COMMUNITY ENGAGEMENT REQUIREMENTS" section (subsection 44141(10)). This section restricts states from waiving certain community engagement requirements for individuals receiving Medicaid benefits. The controversy surrounding this section is likely due to its potential impact on individual freedoms and autonomy, as well as the possibility that it may be overly restrictive or burdensome.

However, without more context or information about the specific provisions of the Act and their intended effects, it's difficult to say for certain which objective is the most controversial. Other sections, such as those related to cost-sharing requirements or government efficiency grants, may also be contentious depending on one's perspective.
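Both answers hedge about missing context, so the size of the stuffed prompt is worth a rough estimate. With up to 4 chunks per query, five generated queries, and chunks of up to 7,500 characters, the worst case is 20 chunks and 150,000 characters. The sketch converts characters to tokens with the common rule of thumb of about 4 characters per token for English text, an approximation rather than llama3.1's actual tokenizer. Ollama's default context window has historically been only a few thousand tokens, and input beyond the configured window may be truncated, so raising it (for example via `ChatOllama`'s `num_ctx` parameter) or retrieving fewer, smaller chunks may be worth testing.

```python
def worst_case_context(k: int, n_queries: int, chunk_chars: int, chars_per_token: float = 4.0):
    """Upper bound on the stuffed context if every retrieved chunk is unique and full-length.

    chars_per_token is a rough heuristic (assumption), not a real tokenizer.
    """
    max_docs = k * n_queries
    max_chars = max_docs * chunk_chars
    approx_tokens = max_chars / chars_per_token
    return max_docs, max_chars, approx_tokens


print(worst_case_context(k=4, n_queries=5, chunk_chars=7500))  # (20, 150000, 37500.0)
print(1136279 / 4.0)  # whole bill: 284069.75 approximate tokens
```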


In [9]:
q = "Resulting from this Act, which groups would be harmed most?"
response = chain.invoke(q)

print(response)

OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:01<00:00,  1.10s/it]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 50.42it/s]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 47.88it/s]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 35.65it/s]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 30.86it/s]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 37.72it/s]


This is a very complex and multi-faceted Act, but I'll try to identify the groups that might be harmed the most based on its provisions.

**Medicaid Expansion**

* The groups that could be harmed are those who might not benefit from Medicaid expansion or might lose some of their current benefits:
	+ Individuals with higher incomes who currently receive Medicaid benefits (if they exceed income limits, they might no longer qualify)
	+ Those who have private health insurance and don't need Medicaid
	+ Businesses or organizations that provide healthcare services to those who would no longer be covered by Medicaid

**Work Requirements**

* Low-income individuals and families with dependents:
	+ The work requirements could make it more difficult for them to access healthcare, especially if they don't have stable employment or transportation.
	+ Those who are elderly, disabled, or have caregiving responsibilities might find it challenging to meet the work requirements.

**Changes to Medicaid*

In [10]:
q = "Resulting from this Act, which groups would benefit most?"
response = chain.invoke(q)

print(response)

OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:01<00:00,  1.19s/it]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 54.15it/s]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 46.56it/s]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 51.17it/s]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 39.16it/s]
OllamaEmbeddings: 100%|█████████████████████████| 1/1 [00:00<00:00, 46.92it/s]
