# Generative Artificial Intelligence using LangChain

---

Get started with LangChain
---

## 📑 Contents

1. LLM Model
2. Prompt Template
3. Chain Concept
4. Sequential Chain
5. LLM based Mini Project
6. Agents
7. Memory

# Project - News Research Tool



## 1. Imports


In [60]:
import os
import streamlit as st
import pickle
import time
import langchain
from dotenv import load_dotenv
from langchain_openai import OpenAI
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.chains.qa_with_sources.loading import load_qa_with_sources_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredURLLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS 


In [5]:
load_dotenv()     # To load the OpenAI API Key saved in .env file

True

## 2. Loading the blog URLs with UnstructuredURLLoader

LangChain's `UnstructuredURLLoader` uses the `unstructured` Python library internally to load the content from URLs.

In [26]:
# Initialize LLM with required params

llm = OpenAI(temperature=0.9, max_tokens=500)

loaders = UnstructuredURLLoader(urls=[
    "https://www.moneycontrol.com/news/business/markets/wall-street-rises-as-tesla-soars-on-ai-optimism-11351111.html",
    "https://www.moneycontrol.com/news/business/tata-motors-launches-punch-icng-price-starts-at-rs-7-1-lakh-11098751.html"
])

data = loaders.load()
len(data)

2

In [27]:
data[0].page_content



## 3. Text Splitting - Splitting the text from blogs into specific chunks

LLMs have token limits, so large text must be split into smaller chunks that each fit under the limit. LangChain provides several text splitter classes for this. Note that `chunk_size` and `chunk_overlap` are measured in characters by default, not tokens.
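To make the roles of chunk size and overlap concrete, here is a minimal sketch of fixed-window splitting in plain Python. It is a simplified illustration only: `split_with_overlap` is a hypothetical helper, and it does not reproduce how `RecursiveCharacterTextSplitter` picks split points (that class prefers to break on separators such as paragraph and line breaks).

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbours share chunk_overlap characters.
    Hypothetical helper for illustration, not part of LangChain.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# Synthetic 2500-character input, used only to show the window arithmetic
sample = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(sample, chunk_size=1000, chunk_overlap=200)
print(len(chunks), [len(c) for c in chunks])  # 3 [1000, 1000, 900]
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, at the cost of embedding some text twice.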

In [28]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)

docs = text_splitter.split_documents(data)
len(docs)

18

In [31]:
docs[9]

Document(metadata={'source': 'https://www.moneycontrol.com/news/business/markets/wall-street-rises-as-tesla-soars-on-ai-optimism-11351111.html'}, page_content='Mutual Funds\n\nHome MC 30 Top Ranked Funds ETFs Mutual Fund Screener\n\nTools\n\nIncome Tax Calculator EMI Calculator Retirement Planning Gratuity Calculator Petrol Price in India Diesel Price in India\n\nCommunity\n\nStock Markets\n\nNetwork 18 Sites\n\nNews18 Firstpost CNBC TV18 News18 Hindi Cricketnext Overdrive\n\nQuick Links\n\nAbout Us Contact Us Advisory Alert Advertise with Us SupportDisclaimer Privacy Policy Cookie Policy Terms & Conditions Financial Terms (Glossary) Sitemap Investors\n\nDownload MC Apps:\n\n\n\n\n\n\n\nCopyright © Network18 Media & Investments Limited. All rights reserved. Reproduction of news articles, photos, videos or any other content in whole or in part in any form or medium without express written permission of moneycontrol.com is prohibited.\n\nYou got 30 Day’s Trial of\n\nMoneycontrol Pro\n\nM

## 4. Embeddings

In [65]:
embeddings = OpenAIEmbeddings()

vectorindex_openai = FAISS.from_documents(docs, embeddings)

In [66]:
# Save the vector index to local storage
save_path = 'C:/Users/shahe/OneDrive/Documents/GitHub/ai_portfolio/Generative_AI'

vectorindex_openai.save_local(save_path)

In [67]:
# Load the vector index from local storage

vectorindex = FAISS.load_local(save_path, embeddings, allow_dangerous_deserialization=True)

## 5. Retrieval

In [70]:
chain = RetrievalQAWithSourcesChain.from_llm(llm=llm, retriever=vectorindex.as_retriever())
chain



The retriever selects the most relevant chunks (4 in this case). The chain then asks the LLM the query against each chunk separately and collects the responses (the map step). Finally, it combines all four responses into a summary and generates the final answer from that summary (the reduce step).
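The map and reduce steps can be sketched in plain Python as follows. This is only an illustration of the control flow under stated assumptions: `map_reduce_answer` and `toy_llm` are hypothetical names, `toy_llm` is a keyword-matching stand-in rather than a real model, the prompt wording is invented, and the sample chunks are short illustrative snippets. The prompts LangChain actually sends differ.

```python
from typing import Callable, List

calls: List[str] = []  # records every prompt sent to the stand-in model


def toy_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: returns the first line that mentions 'Tiago'."""
    calls.append(prompt)
    # Separate the context from the trailing question so the question itself is not matched
    body, _, _question = prompt.rpartition("\nQuestion: ")
    hits = [line for line in body.splitlines() if "Tiago" in line]
    return hits[0] if hits else "no relevant text"


def map_reduce_answer(question: str, chunks: List[str], ask: Callable[[str], str]) -> str:
    # Map step: ask the question against each retrieved chunk independently
    partial_answers = [ask(f"Context:\n{chunk}\nQuestion: {question}") for chunk in chunks]
    # Reduce step: ask once more over the combined per-chunk answers
    combined = "\n".join(partial_answers)
    return ask(f"Partial answers:\n{combined}\nQuestion: {question}")


# Illustrative snippets standing in for the 4 retrieved chunks
retrieved = [
    "Wall Street rose as Tesla gained on AI optimism.",
    "The Tiago iCNG is priced between Rs 6.55 lakh and Rs 8.1 lakh.",
    "The Tigor iCNG comes at a price range of Rs 7.8 lakh to Rs 8.95 lakh.",
    "Tata Motors launched the Punch iCNG.",
]

answer = map_reduce_answer("what is the price of Tiago icng", retrieved, toy_llm)
print(answer)
print(len(calls))  # 4 map calls + 1 reduce call = 5
```

One consequence of this design is cost: every query triggers one model call per retrieved chunk plus a final combining call, which is what the debug trace makes visible.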

In [71]:
query = 'what is the price of Tiago icng'

langchain.debug = True

chain({'question': query}, return_only_outputs=True)



[32;1m[1;3m[chain/start][0m [1m[chain:RetrievalQAWithSourcesChain] Entering Chain run with input:
[0m{
  "question": "what is the price of Tiago icng"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RetrievalQAWithSourcesChain > chain:MapReduceDocumentsChain] Entering Chain run with input:
[0m[inputs]
[32;1m[1;3m[chain/start][0m [1m[chain:RetrievalQAWithSourcesChain > chain:MapReduceDocumentsChain > chain:LLMChain] Entering Chain run with input:
[0m{
  "input_list": [
    {
      "context": "The company also said it has also introduced the twin-cylinder technology on its Tiago and Tigor models.\n\nThe Tiago iCNG is priced between Rs 6.55 lakh and Rs 8.1 lakh, while the Tigor iCNG comes at a price range of Rs 7.8 lakh to Rs 8.95 lakh.\n\nTata Motors Passenger Vehicles Ltd Head-Marketing, Vinay Pant said these introductions put together will make the company's CNG line up \"appealing, holistic, and stronger than ever\".\n\nPTI\n\nfirst published: Aug 4, 2023 02:17 pm\n\nDiscover t

{'answer': ' The price of the Tiago iCNG is between Rs 6.55 lakh and Rs 8.1 lakh.\n',
 'sources': 'https://www.moneycontrol.com/news/business/tata-motors-launches-punch-icng-price-starts-at-rs-7-1-lakh-11098751.html'}

In [10]:

code="""
import os
import streamlit as st
import pickle
import time
from langchain_openai import OpenAI
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredURLLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

from dotenv import load_dotenv
load_dotenv()

st.title('News Research Tool 📈')
st.sidebar.title('News Article URLs')

save_path = 'C:/Users/shahe/OneDrive/Documents/GitHub/ai_portfolio/Generative_AI'
urls = []
embeddings = OpenAIEmbeddings()     # Create embeddings instance
llm = OpenAI(temperature=0.9, max_tokens=500)

for i in range(3):
    url = st.sidebar.text_input(f'URL {i+1}')
    urls.append(url)

process_url_clicked = st.sidebar.button('Process URLs')

main_placeholder = st.empty()

if process_url_clicked:
    # Load Data
    # Skip URL fields that were left blank in the sidebar
    loader = UnstructuredURLLoader(urls=[u for u in urls if u.strip()])
    main_placeholder.text("Data Loading...Started...✅✅✅")
    data = loader.load()

    # Split Data
    text_splitter = RecursiveCharacterTextSplitter(separators=["\\n\\n", "\\n", ".", ","], chunk_size=1000)

    main_placeholder.text("Text Splitter...Started...✅✅✅")
    docs = text_splitter.split_documents(data)

    # Create Embeddings
    vectorstore_openai = FAISS.from_documents(docs, embedding=embeddings)
    main_placeholder.text("Embedding Vector Started Building...✅✅✅")
    time.sleep(2)

    # Save the FAISS vector index to a local folder
    vectorstore_openai.save_local(save_path)

query = main_placeholder.text_input('Question: ')
process_query_clicked = st.button('Process Query')

if process_query_clicked:
    if os.path.exists(save_path):
        vectorstore = FAISS.load_local(save_path, embeddings, allow_dangerous_deserialization=True)
        chain = RetrievalQAWithSourcesChain.from_llm(llm=llm, retriever=vectorstore.as_retriever())
        result = chain({'question': query}, return_only_outputs=True)   # returns {'answer': str, 'sources': str}
        st.header('Answer')
        st.write(result['answer'])

        # Display sources, if available
        sources = result.get("sources", "")
        if sources:
            st.subheader("Sources:")
            sources_list = sources.split("\\n")  # Split the sources by newline
            for source in sources_list:
                st.write(source)
        
"""

with open("news_research_tool.py", "w") as f:
    f.write(code)

# Now run this .py file in terminal using `streamlit run news_research_tool.py` 