# LangChain: Q&A over Documents

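An example might be a tool that lets you query a product catalog for items of interest. This notebook applies the same idea to a CSV of technology news articles, using an LLM to answer questions from the articles that are retrieved.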

In [None]:
#pip install --upgrade langchain langchain-community langchain-openai langchain-huggingface chromadb

In [5]:
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

In [29]:
import concurrent.futures

import pandas as pd
from IPython.display import display, Markdown
from langchain.chains import RetrievalQA
from langchain.indexes import VectorstoreIndexCreator
from langchain_community.document_loaders import CSVLoader
from langchain_community.vectorstores import Chroma, DocArrayInMemorySearch
from langchain_openai import ChatOpenAI

In [7]:
## data contains news articles related to technology

In [8]:
df = pd.read_csv('data/tech_news_articles.csv')
df['blurp'].to_csv('./data/vectorstore.csv',index=False)
pd.read_csv('data/vectorstore.csv')

Unnamed: 0,blurp
0,John Hopfield and Geoffrey Hinton developed ar...
1,[Removed]
2,The award should not feed the AI-hype cycle.
3,Two scientists have been awarded\r\n the Nobel...
4,"OpenAI is now valued at $157 billion, a close ..."
...,...
478,Garrett expands its innovation footprint with ...
479,Actuators play a vital role in improving vehic...
480,"Tariffs be damned, Chinese EV brand ZEEKR cont..."
481,"GUANGZHOU, China, Oct. 9, 2024 /PRNewswire/ --..."
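The `Unnamed: 0` column in the output above is what pandas produces when a CSV was written together with its index: the index is saved as a column with an empty header, and `read_csv` names that header `Unnamed: 0`. Since the cell passes `index=False`, this output appears to come from an earlier run. A minimal sketch of the round trip, using a made-up Series as a stand-in for `df['blurp']`:

```python
import io

import pandas as pd

# Made-up stand-in for df['blurp']
blurbs = pd.Series(["First blurb.", "Second blurb."], name="blurp")

# Written WITH the index: the index becomes a column with an empty header,
# which read_csv labels "Unnamed: 0"
with_index = pd.read_csv(io.StringIO(blurbs.to_csv()))
print(list(with_index.columns))  # ['Unnamed: 0', 'blurp']

# Written WITHOUT the index: only the blurp column survives the round trip
without_index = pd.read_csv(io.StringIO(blurbs.to_csv(index=False)))
print(list(without_index.columns))  # ['blurp']

# A file that already contains the index can be read back with index_col=0
restored = pd.read_csv(io.StringIO(blurbs.to_csv()), index_col=0)
print(list(restored.columns))  # ['blurp']
```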


In [34]:
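# Blurb-only CSV written above. The full CSV (commented out) also keeps title,
# date, url and category in each document; the documents[0] output below came from it.
file = 'data/vectorstore.csv'
# file = 'data/tech_news_articles.csv'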
loader = CSVLoader(file_path=file)
documents = loader.load()

In [35]:
documents[0]

Document(metadata={'source': 'data/tech_news_articles.csv', 'row': 0}, page_content=': 0\ntitle: Nobel Prize Goes to ‘Godfathers of AI’ Who Now Fear Their Work Is Growing Too Powerful\nimgurl: https://gizmodo.com/app/uploads/2024/10/nobel-hinton-hopfield-ai.jpg\ndate: 08/10/2024\nblurp: John Hopfield and Geoffrey Hinton developed artificial neural networks that laid the foundation for modern recommendation systems and generative AI.\nurl: https://gizmodo.com/nobel-prize-goes-to-godfathers-of-ai-who-now-fear-their-work-is-growing-too-powerful-2000509098\ntext: Two AI researchers, John Hopfield and Geoffrey Hinton, received the Nobel Prize in physics on Monday for their work building artificial neural networks that can memorize information and recognize pat… [+3174 chars]\ncategory: artificial intelligence\nsource: gizmodo.com')
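In the output above, `page_content` holds one `column: value` line per CSV column, and the first line is `: 0` because that CSV starts with an unnamed index column. The sketch below approximates this row-to-text formatting with the standard `csv` module. `rows_to_page_contents` is a hypothetical helper, not a LangChain function, and it ignores CSVLoader options such as `metadata_columns`:

```python
import csv
import io


def rows_to_page_contents(csv_text: str) -> list[str]:
    """Hypothetical helper approximating CSVLoader's default formatting:
    each row becomes one "column: value" line per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        "\n".join(f"{key.strip()}: {value.strip()}" for key, value in row.items())
        for row in reader
    ]


# Made-up CSV whose first column is an unnamed index, like the file behind documents[0]
csv_with_index = ",title,blurp\n0,Nobel Prize goes to AI pioneers,Neural networks.\n"
print(rows_to_page_contents(csv_with_index)[0])
# : 0
# title: Nobel Prize goes to AI pioneers
# blurp: Neural networks.

# Without the index column, each document carries only the real fields
csv_without_index = "title,blurp\nNobel Prize goes to AI pioneers,Neural networks.\n"
print(rows_to_page_contents(csv_without_index)[0])
```

Writing CSVs with `index=False` keeps the meaningless `: 0` line out of the text that gets embedded.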

In [19]:
from langchain_huggingface import HuggingFaceEmbeddings
# define embeddings
embeddings = HuggingFaceEmbeddings(
    model_name="thenlper/gte-small")

In [20]:
query = 'what is the most popular language?'

In [12]:
#pip install docarray

In [21]:
index = VectorstoreIndexCreator(
    embedding=embeddings,
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])
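`DocArrayInMemorySearch` keeps every embedding in memory and ranks documents by vector similarity to the query embedding (cosine similarity is its default metric). As a conceptual sketch only, not DocArray's actual implementation, brute-force top-k retrieval looks like this; the toy 3-dimensional vectors are made up and stand in for real `gte-small` embeddings:

```python
import numpy as np


def top_k_cosine(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 4) -> list[int]:
    """Return indices of the k rows of doc_vecs most cosine-similar to query_vec."""
    # Normalize so that a dot product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    # argsort is ascending; reverse it for highest similarity first, keep k
    return np.argsort(scores)[::-1][:k].tolist()


# Made-up embeddings: rows 0 and 2 point roughly the same way as the query
doc_vecs = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.8, 0.3, 0.1],
    [0.0, 0.0, 1.0],
])
query_vec = np.array([1.0, 0.2, 0.0])

print(top_k_cosine(query_vec, doc_vecs, k=2))  # [0, 2]
```

The default `k=4` mirrors the four results LangChain's similarity search returns by default. Brute force is fine for a few hundred articles; larger collections usually call for an approximate index.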



In [22]:
llm_replacement_model = ChatOpenAI(temperature=0, model='gpt-4o-mini')

response = index.query(query, llm_replacement_model)



In [23]:
# Answers using the documents most similar to the query; if they don't contain the answer, the model is told to say it doesn't know

In [24]:
display(Markdown(response))

I don't know.

In [51]:
query = "which article is most relevant to AI?"

In [52]:
response = index.query(query, llm_replacement_model)

In [53]:
display(Markdown(response))

The first blurp about two artificial intelligence pioneers being awarded the Nobel Prize for their work in machine learning is the most relevant to AI.

## Using the Chroma Vector Database

In [30]:
# Place the vector DB under /tmp; any writable directory works.
# Re-running this cell adds the documents again, so delete the directory first
# to start from an empty collection.
persist_directory = "/tmp/chromadb"

# Create the collection from the first document; documents[1:] are added in batches below
vectordb = Chroma.from_documents(documents=documents[:1], embedding=embeddings,
                                 persist_directory=persist_directory)

# Recent chromadb versions persist automatically, so persist() is not needed
# vectordb._collection.count()


def add_to_chroma_database(batch):
    vectordb.add_documents(documents=batch)


def batch_process(documents_arr, batch_size):
    # Sequential alternative: insert documents[1:] one batch at a time
    for i in range(1, len(documents_arr), batch_size):
        add_to_chroma_database(documents_arr[i:i + batch_size])


def form_batch(documents_arr, batch_size):
    # Split documents[1:] into lists of at most batch_size documents
    return [documents_arr[i:i + batch_size]
            for i in range(1, len(documents_arr), batch_size)]


batch_size = 100
# batch_process(documents, batch_size)

data_list = form_batch(documents, batch_size)

# Insert the batches from several threads, which may speed up ingestion.
# list() waits for every batch and re-raises any exception from a worker;
# without it, failed inserts would go unnoticed.
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    list(executor.map(add_to_chroma_database, data_list))
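The batches start at index 1 because document 0 was already used to create the collection. The sketch below checks that this slicing covers every remaining row exactly once, and shows why the results of `executor.map` should be consumed: an exception raised in a worker only surfaces when its result is retrieved. `fake_add` is a hypothetical stand-in for `add_to_chroma_database` that fails on purpose, and plain integers stand in for documents:

```python
import concurrent.futures


def form_batch(items, batch_size):
    # Same slicing as the notebook: skip item 0, then chunk the rest
    return [items[i:i + batch_size] for i in range(1, len(items), batch_size)]


items = list(range(482))  # 482 rows, as in the CSV preview (indices 0 to 481)
batches = form_batch(items, 100)
print([len(b) for b in batches])  # [100, 100, 100, 100, 81]


def fake_add(batch):
    # Hypothetical insert function that fails on the batch containing row 300
    if 300 in batch:
        raise ValueError("insert failed")
    return len(batch)


error = None
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    try:
        # list() consumes the results; without it this error would be dropped silently
        list(executor.map(fake_add, batches))
    except ValueError as exc:
        error = exc
print("surfaced:", error)  # surfaced: insert failed
```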

In [39]:
results = vectordb.similarity_search(query)  # top 4 matches by default
for doc in results:
    print(doc.page_content)

blurp: Deploying AI applications to the cloud is a crucial step in enhancing their accessibility, usability, and real-world impact. By transitioning AI apps from a local environment to the cloud, developers can ensure that their applications are easily accessible to…
blurp: Hello everyone,
I’m exploring ways to optimize [cloud storage][1] solutions using Wolfram Language and would love to hear your insights and experiences.
I’ve been working with large datasets and am particularly interested in:
1.Data Compression: Are there …
blurp: The article highlights the critical need for robust cloud security amidst emerging threats like APTs, quantum computing risks, and ransomware-as-a-service. It details advancements like Zero Trust Architecture, AI and ML integration, Secure Access Service Edge…
blurp: These cloud security statistics paint a worrying picture for businesses worldwide. Nearly one in two companies have reported security breaches, a statistic all the more disturbing consideri

## Retrieval QA

In [46]:
db = DocArrayInMemorySearch.from_documents(
    documents, 
    embeddings
)

In [47]:
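# By default the retriever returns the 4 most similar documents, so the chain
# below can only summarize those; pass search_kwargs={"k": 10} to retrieve more
retriever = db.as_retriever()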

In [54]:
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm_replacement_model, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)
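With `chain_type="stuff"`, the chain puts every retrieved document into a single prompt together with the question. The sketch below imitates that with plain string formatting. `TEMPLATE` paraphrases the kind of default QA prompt LangChain uses and `build_stuff_prompt` is a hypothetical helper; both are illustrative stand-ins, not LangChain's actual code. An instruction like the one in the template would explain the `I don't know.` answer to the first query, since `index.query` appears to use the same kind of chain:

```python
# Illustrative approximation of a "stuff" QA prompt
TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def build_stuff_prompt(page_contents: list[str], question: str) -> str:
    """Hypothetical helper: join every retrieved document into one context block."""
    context = "\n\n".join(page_contents)
    return TEMPLATE.format(context=context, question=question)


# Two blurbs from the dataset standing in for retrieved documents
retrieved = [
    "blurp: The award should not feed the AI-hype cycle.",
    "blurp: John Hopfield and Geoffrey Hinton developed artificial neural networks.",
]
prompt = build_stuff_prompt(retrieved, "which article is most relevant to AI?")
print(prompt)
```

Because everything goes into one prompt, stuffing only works while the retrieved text fits in the model's context window, which is one reason the retriever returns a handful of documents rather than the whole collection.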

In [55]:
query = "Please list all your articles with the topic of AI in a table in markdown and summarize each one"

In [56]:
response = qa_stuff.invoke({"query": query})["result"]



> Entering new RetrievalQA chain...

> Finished chain.


In [57]:
display(Markdown(response))

Here is a table in markdown format listing the articles related to AI along with their summaries:

| Title                                                                 | Date       | Summary                                                                                                           | URL                                                                                       |
|-----------------------------------------------------------------------|------------|-------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------|
| How to Use AI to Answer Questions                                      | 04/10/2024 | This article discusses how to effectively ask AI questions to get the best results, covering various types of inquiries. | [Read more](https://www.cnet.com/tech/services-and-software/how-to-use-ai-to-answer-questions/) |
| Why artificial intelligence and clean energy need each other          | 08/10/2024 | The article explores the interdependence of AI and clean energy, emphasizing the need for enhanced electricity infrastructure to support AI's power demands. | [Read more](https://www.technologyreview.com/2024/10/08/1105165/why-artificial-intelligence-and-clean-energy-need-each-other/) |
| Corporate Governance Jumps Into The Artificial Intelligence Discussion | 07/10/2024 | This piece highlights new guidance from the National Association of Corporate Directors on the role of board governance in overseeing AI and technology use. | [Read more](https://www.forbes.com/sites/michaelperegrine/2024/10/07/corporate-governance-jumps-into-the-artificial-intelligence-discussion/) |
| How to effectively manage AI projects in 12 steps                     | 09/10/2024 | The article outlines 12 steps to successfully manage AI projects, aiming to help organizations achieve better results and deliver business value. | [Read more](https://www.techtarget.com/searchenterpriseai/tip/How-to-effectively-manage-AI-projects) |
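The table above is written by the LLM, so its wording and columns can vary between runs, and it covers only the four documents the retriever returned by default. When the fields are already present in `page_content`, as with the full `tech_news_articles.csv`, a table can also be built deterministically. A minimal sketch, assuming the `column: value` layout shown in `documents[0]`; `parse_page_content` and `to_markdown_table` are hypothetical helpers, and the second document is made up:

```python
def parse_page_content(text: str, columns: set[str]) -> dict[str, str]:
    """Hypothetical helper: rebuild {column: value} from CSVLoader-style text.
    Lines that don't start with a known column name are treated as
    continuations of the previous value (blurbs can span several lines)."""
    fields: dict[str, str] = {}
    current = None
    for line in text.split("\n"):
        key, sep, value = line.partition(": ")
        if sep and key in columns:
            current = key
            fields[key] = value
        elif current is not None:
            fields[current] += " " + line.strip()
    return fields


def to_markdown_table(rows: list[dict[str, str]], headers: list[str]) -> str:
    """Render rows as a markdown table, escaping pipes inside values."""
    lines = ["| " + " | ".join(headers) + " |",
             "|" + "|".join("---" for _ in headers) + "|"]
    for row in rows:
        cells = [row.get(h, "").replace("|", "\\|") for h in headers]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


pages = [
    ": 0\ntitle: Nobel Prize Goes to 'Godfathers of AI'\ndate: 08/10/2024\n"
    "blurp: John Hopfield and Geoffrey Hinton developed artificial neural networks.\n"
    "category: artificial intelligence",
    # Made-up document with a blurb that spans two lines
    ": 1\ntitle: Example EV article\ndate: 09/10/2024\n"
    "blurp: First line of a blurb\nsecond line of the same blurb\n"
    "category: electric vehicles",
]
cols = {"title", "date", "blurp", "category"}
rows = [parse_page_content(p, cols) for p in pages]

# Select AI articles by the category column instead of asking the LLM
ai_rows = [r for r in rows if r.get("category") == "artificial intelligence"]
table = to_markdown_table(ai_rows, ["title", "date", "blurp"])
print(table)
```

The LLM remains useful for the summaries; this only makes the list of articles and its columns repeatable.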