## Exploring LangChain - Part III

The purpose of this notebook is to explore the fundamentals of LangChain.

Key topics covered: `UnstructuredURLLoader`, `FAISS`, and `RetrievalQAWithSourcesChain`.

**NOTE: This is just a practice notebook that acts as a precursor to the main project**

In [39]:
import os
import langchain
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain.chains import RetrievalQAWithSourcesChain
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Document loaders and vector stores now live in the langchain_community package
from langchain_community.document_loaders import UnstructuredURLLoader
from langchain_community.vectorstores import FAISS

In [None]:
# I have removed my key: set the OPENAI_API_KEY environment variable instead of hard-coding it
key = os.getenv("OPENAI_API_KEY")

In [12]:
# Initialize the LLM: temperature=0.7 allows some variety in wording, max_tokens=500 caps the answer length
llm = OpenAI(temperature=0.7, api_key=key, max_tokens=500)

In [15]:
loader = UnstructuredURLLoader(urls=[
    "https://www.ft.com/content/95745636-2d21-46aa-b0f1-6bda1c0fdd0b?accessToken=zwAAAYblEFF3kdOVdFY2LSFGqtOw8WvaHA_dCwE.MEYCIQCKqVGoyEh2jPvo574Ns5jiUzEVBHMrg2m8wfbjaLwupwIhANpYFrgjSfID76yCJIJPJEhzWtetNi5MsOMiYl_gyjaH&segmentId=8bab5fbd-4508-93c4-7ded-a9e1428c7053",
    "https://www.ft.com/content/632411eb-c3fa-4351-a3b6-b0e30bdc0ef7?accessToken=zwAAAYWwEkESkc9jJBHrw_pDUdOjtrDjC9wO9wE.MEQCIC7-tLEPkOYVG427tYIVBtANt60iz-FWXCfBDHwEb0G0AiB5JouSRl1fivzejChmdq5TnvVdNmiibHtVbJUCviVHxA&segmentId=501d7750-774f-dc19-66bb-320ebfb582d1",
])

data = loader.load()

In [31]:
data[0].page_content

'Accessibility helpSkip to navigationSkip to contentSkip to footer\n\nSign In\n\nSubscribe\n\nOpen side navigation menuOpen search bar\n\nSubscribeSign In\n\nMenuSearch\n\nHome\n\nWorld\n\nUS\n\nCompanies\n\nTech\n\nMarkets\n\nClimate\n\nOpinion\n\nLex\n\nWork & Careers\n\nLife & Arts\n\nHTSI\n\nFinancial Times\n\nSubscribeSign In\n\nFT Financial Literacy and Inclusion Campaign\n\nCurrently reading:\n\nPersonal inflation calculator: what is your inflation rate?\n\nPeru makes teachers comfortable about money before teaching the kids\n\nBanks must play fair as mortgage rates rise\n\nWhen trading crypto becomes an addiction\n\nMyBnk calls for 30 hours a year of financial education in UK schools\n\n‘I’m 22 and I earn more than my parents’\n\nCharity warns of UK maths gender gap\n\nSee all 112 stories\n\nPersonal Finance\n\nManage your delivery channels here\n\nPersonal inflation calculator: what is your inflation rate?\n\nAs the cost of living continues to rise, find out how much inflation

In [34]:
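# Split on newlines first, then on sentence ends, then on spaces;
# chunks are at most 500 characters and consecutive chunks share up to 100 characters
text_splitter = RecursiveCharacterTextSplitter(separators=["\n", ".", " "],
                                               chunk_size=500,
                                               chunk_overlap=100)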

docs = text_splitter.split_documents(data)
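`RecursiveCharacterTextSplitter` tries the separators in order: it splits on `"\n"` first, merges the pieces back into chunks of at most `chunk_size` characters, and only re-splits a piece with `"."` (and then `" "`) if that piece is still too long. Below is a simplified sketch of that idea in plain Python, using a hypothetical `recursive_split` helper. Unlike LangChain's implementation, it ignores `chunk_overlap` and drops any separator that falls on a chunk boundary.

```python
def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Hypothetical helper: split on the first separator, merge pieces back up to
    chunk_size, and re-split any piece that is still too long with the next separator."""
    if len(text) <= chunk_size:
        return [text]
    if not separators:
        return [text]  # nothing left to split on: keep the oversized piece
    sep, remaining = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep growing the current chunk
            continue
        if current:
            chunks.append(current)  # current chunk is full, emit it
        if len(piece) > chunk_size:
            # piece alone is too long: recurse with the finer separators
            chunks.extend(recursive_split(piece, remaining, chunk_size))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks


text = "CPI tracks prices.\nIt uses a basket of goods and services bought by consumers."
for chunk in recursive_split(text, ["\n", ".", " "], chunk_size=30):
    print(repr(chunk))
```

The first line fits in one chunk after the newline split; the second line is too long, so it falls through to the finer separators and ends up split on spaces.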


In [35]:
#77 chunks created
len(docs)

77

In [41]:
# Create the embeddings object (it calls the OpenAI embeddings API)
embeddings = OpenAIEmbeddings(api_key=key)

# FAISS.from_documents takes two params: the document chunks and the embeddings object.
# It embeds every chunk and builds an in-memory FAISS index; in a real application
# this would usually be stored in a persistent vector database
vectorindex_openai = FAISS.from_documents(docs, embeddings)
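By default, LangChain's FAISS wrapper builds an exact ("flat") L2 index, and `as_retriever()` later returns the 4 nearest chunks for each query. Conceptually, that is brute-force nearest-neighbour search. The sketch below shows the idea in plain Python; the `flat_search` helper and the 3-dimensional toy vectors are made up and stand in for FAISS and real OpenAI embeddings.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_search(index: list[list[float]], query: list[float], k: int = 4) -> list[tuple[int, float]]:
    """Hypothetical exact k-NN search: score every stored vector, keep the k closest."""
    scored = [(i, squared_l2(vec, query)) for i, vec in enumerate(index)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy "embeddings" for three chunks (real embeddings have far more dimensions)
index = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
print(flat_search(index, [0.9, 0.0, 0.0], k=2))  # chunk 1 first, then chunk 0
```

Exact search compares the query against every vector. That is fine for the 77 chunks here, and it is why larger collections typically switch to approximate FAISS index types.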

In [62]:
# save_local writes a folder (not a single file) containing index.faiss and index.pkl
file_path = "vector_index"
vectorindex_openai.save_local(file_path)


In [63]:
# Load the FAISS index
vectorIndex = FAISS.load_local(file_path, embeddings,
                              allow_dangerous_deserialization=True)

Without `allow_dangerous_deserialization=True`, `FAISS.load_local` raises an error. It restores the document store from a pickle file (`index.pkl`), and LangChain refuses to unpickle it by default because unpickling untrusted data can execute arbitrary code.

Setting `allow_dangerous_deserialization=True` opts in explicitly. Only do this for files you created yourself or whose source you trust.
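To see why this safeguard exists, here is a minimal, deliberately harmless demonstration. `pickle.loads` calls whatever callable an object's `__reduce__` names. The class name `Untrusted` is made up, and a hostile file could name something far worse than `eval`.

```python
import pickle


class Untrusted:
    """Hypothetical object whose pickled form tells the loader to call eval()."""

    def __reduce__(self):
        # Pickle stores "call eval with this argument" instead of the object's state
        return (eval, ("sum(range(10))",))


blob = pickle.dumps(Untrusted())
# Merely loading the bytes runs eval(...); its return value replaces the object
result = pickle.loads(blob)
print(result)
```

`index.pkl` is an ordinary pickle, so loading a tampered copy could run code in the same way.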

The vector index is now loaded into memory and holds the embedded chunks of both articles.

### RetrievalQAWithSourcesChain
The code creates a **Retrieval-Based Question-Answering Chain with Sources** using a pre-trained language model (LLM) and a retriever.

- **`llm`**: The language model (e.g., OpenAI GPT) that generates answers based on the context retrieved.
- **`retriever`**: The retriever fetches relevant chunks of information from the FAISS vector index (`vectorIndex`) for answering questions.

#### Key Functionality:
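- The chain uses the retriever to fetch the chunks most similar to the question from the vector database.
- The LLM reads the retrieved chunks and generates an answer.
- The output also includes a `sources` field listing the documents (here, the article URLs) the answer was drawn from, so it can be checked.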
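Without LangChain, the retrieve, answer, and cite loop looks roughly like the sketch below. `Doc`, `qa_with_sources`, and the prompt wording are hypothetical, and the retriever and LLM are passed in as plain callables. This sketch stuffs all chunks into one prompt, whereas the chain built in this notebook uses a map-reduce variant. The source handling roughly mirrors `RetrievalQAWithSourcesChain`: the model is asked to end with a `SOURCES:` line, which is split off into the `sources` key.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def qa_with_sources(question: str,
                    retrieve: Callable[[str], list[Doc]],
                    llm: Callable[[str], str]) -> dict:
    docs = retrieve(question)  # 1. fetch relevant chunks
    context = "\n\n".join(
        f"Content: {d.page_content}\nSource: {d.metadata.get('source', '')}" for d in docs
    )
    prompt = (f"{context}\n\nAnswer the question and end with a line 'SOURCES: ...'.\n"
              f"QUESTION: {question}\nFINAL ANSWER:")
    raw = llm(prompt)  # 2. generate an answer that cites its sources
    # 3. split the trailing SOURCES line off the answer text
    parts = re.split(r"SOURCES?:", raw, maxsplit=1, flags=re.IGNORECASE)
    answer = parts[0].strip()
    sources = parts[1].strip().splitlines()[0] if len(parts) > 1 and parts[1].strip() else ""
    return {"answer": answer, "sources": sources}
```

Keeping the source metadata next to each chunk in the prompt is what lets the model name a URL at all. The `sources` value is only as reliable as the model's citation.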


In [68]:
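# The retriever wraps the FAISS index: given a question, it returns the most similar chunks
# from_llm builds a map-reduce chain: the LLM reads each retrieved chunk separately (map),
# then combines the per-chunk results into one answer (reduce)
chain = RetrievalQAWithSourcesChain.from_llm(llm=llm, retriever=vectorIndex.as_retriever())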
print(chain)
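The map-reduce strategy can be sketched as follows. The `map_reduce_qa` function is hypothetical and takes the LLM as a plain callable. Each retrieved chunk gets its own LLM call (map), and one final call combines the per-chunk extracts (reduce), so k chunks cost k + 1 calls. LangChain's actual prompts and combine step differ in detail.

```python
from typing import Callable


def map_reduce_qa(question: str, chunks: list[str], llm: Callable[[str], str]) -> str:
    # Map: one LLM call per retrieved chunk, each seeing only that chunk
    extracts = [
        llm(f"Return any text relevant to the question.\nQUESTION: {question}\n\n{chunk}")
        for chunk in chunks
    ]
    # Reduce: a single final call that sees all the per-chunk extracts together
    combined = "\n\n".join(extracts)
    return llm(f"Using these extracts, answer the question.\nQUESTION: {question}\n\n{combined}")
```

The trade-off is more API calls in exchange for keeping each prompt small. This helps when the retrieved chunks together would not fit in one context window.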



### Running a QA Chain with Debugging Enabled
This code demonstrates how to query a **Retrieval-Based QA Chain** while enabling debugging to inspect underlying processes.

#### Steps:
1. **Set the Query**: The question `"What is CPI?"` is passed as input.
2. **Enable Debugging**: `langchain.debug = True` enables detailed logs to observe how the chain processes the query.
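3. **Invoke the Chain**: `.invoke()` runs the QA chain. `return_only_outputs=True` returns only the output keys (`answer`, `sources`) without echoing the input. `.invoke()` replaces the older `__call__` style (`chain({...})`), which is deprecated.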



In [71]:
query = "What is CPI?"

langchain.debug = True  # log every step the chain runs, to see what's going on underneath

# Run the chain and return the outputs
chain.invoke({"question": query}, return_only_outputs=True)


[32;1m[1;3m[chain/start][0m [1m[chain:RetrievalQAWithSourcesChain] Entering Chain run with input:
[0m{
  "question": "What is CPI?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RetrievalQAWithSourcesChain > chain:MapReduceDocumentsChain] Entering Chain run with input:
[0m[inputs]
[32;1m[1;3m[chain/start][0m [1m[chain:RetrievalQAWithSourcesChain > chain:MapReduceDocumentsChain > chain:LLMChain] Entering Chain run with input:
[0m{
  "input_list": [
    {
      "context": "These inflation rates — most often measured by the consumer price index, or CPI — are calculated based on a total “basket” of goods and services bought by all consumers in a national economy. They do not, however, necessarily represent the impact that inflation has on you.\n\nFT FLIC\n\nDonate to the Financial Literacy & Inclusion Campaign here",
      "question": "What is CPI?"
    },
    {
      "context": "We have chosen CPI measures for this calculator that take into account the service costs of home owner

{'answer': ' CPI stands for Consumer Price Index.\n',
 'sources': 'https://www.ft.com/content/95745636-2d21-46aa-b0f1-6bda1c0fdd0b?accessToken=zwAAAYblEFF3kdOVdFY2LSFGqtOw8WvaHA_dCwE.MEYCIQCKqVGoyEh2jPvo574Ns5jiUzEVBHMrg2m8wfbjaLwupwIhANpYFrgjSfID76yCJIJPJEhzWtetNi5MsOMiYl_gyjaH&segmentId=8bab5fbd-4508-93c4-7ded-a9e1428c7053'}