
# Import Groq API Key

In [1]:
pip install -r requirements.txt

Collecting langchain==0.2.10 (from -r requirements.txt (line 1))
  Downloading langchain-0.2.10-py3-none-any.whl.metadata (6.9 kB)
Collecting langchain-openai==0.1.17 (from -r requirements.txt (line 2))
  Downloading langchain_openai-0.1.17-py3-none-any.whl.metadata (2.5 kB)
Collecting python-dotenv==1.0.1 (from -r requirements.txt (line 3))
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Collecting langchain-groq==0.1.6 (from -r requirements.txt (line 4))
  Downloading langchain_groq-0.1.6-py3-none-any.whl.metadata (2.8 kB)
Collecting langchain-community==0.2.10 (from -r requirements.txt (line 5))
  Downloading langchain_community-0.2.10-py3-none-any.whl.metadata (2.7 kB)
Collecting bs4==0.0.2 (from -r requirements.txt (line 6))
  Downloading bs4-0.0.2-py2.py3-none-any.whl.metadata (411 bytes)
Collecting unstructured==0.15.0 (from -r requirements.txt (line 7))
  Downloading unstructured-0.15.0-py3-none-any.whl.metadata (29 kB)
Collecting pypdf==4.3.1 (from -r requi

In [2]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())
groq_api_key = os.environ["GROQ_API_KEY"]
openai_api_key = os.environ["OPENAI_API_KEY"]
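
`os.environ["GROQ_API_KEY"]` raises a bare `KeyError` when the variable is missing, which can be confusing in a fresh Colab session. A minimal sketch of a friendlier lookup, assuming a hypothetical helper named `require_env` (it is not part of `python-dotenv` or LangChain) and a placeholder variable name used only for the demo:

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value or raise a descriptive error.

    `require_env` is a hypothetical helper for illustration only.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; add it to your .env file."
        )
    return value


# Demo with a placeholder value so the snippet runs without real credentials.
os.environ["DEMO_API_KEY"] = "placeholder-value"
demo_key = require_env("DEMO_API_KEY")
print(demo_key)
```

In the notebook you would call it as `require_env("GROQ_API_KEY")` after `load_dotenv(...)`.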

# Import LLAMA3

In [3]:
from langchain_groq import ChatGroq

llamaChatModel = ChatGroq(
    model="llama3-70b-8192"
)

# Import Mistral

In [4]:
mistralChatModel = ChatGroq(
    model="mixtral-8x7b-32768"
)

# Prompt Generation

In [5]:
messages = [
    ("system", "You are a historian and an expert on the Kennedy family."),
    ("human", "How many members of the family died tragically?"),
]

In [6]:
llamaResponse = llamaChatModel.invoke(messages)
print(llamaResponse.content)

What a sorrowful topic. The Kennedy family has indeed been plagued by a series of tragic events, which have become an integral part of their storied history. Here are the members of the family who met with untimely and tragic ends:

1. Joseph P. Kennedy Jr. (1915-1944): The eldest son of Joseph P. Kennedy Sr. and Rose Kennedy, Joe Jr. was a naval aviator who died in a plane explosion during a secret mission in World War II.
2. Kathleen "Kick" Kennedy (1920-1948): The fourth child and second daughter of the Kennedy family, Kick died in a plane crash in France at the age of 28.
3. John F. Kennedy (1917-1963): The 35th President of the United States, JFK was assassinated in Dallas, Texas, on November 22, 1963, at the age of 46.
4. Robert F. Kennedy (1925-1968): The U.S. Attorney General and Senator from New York, RFK was assassinated in Los Angeles, California, on June 5, 1968, at the age of 42, during his presidential campaign.
5. David Kennedy (1955-1984): The son of Robert F. Kennedy, 

In [None]:
mistralResponse = mistralChatModel.invoke(messages)
print(mistralResponse.content)

The Kennedy family has certainly experienced more than its fair share of tragedy. Here are some of the most notable deaths:

1. Joseph P. Kennedy Sr.'s (father of JFK) oldest son, Joseph P. Kennedy Jr., died in 1944 during World War II when the plane he was piloting exploded mid-air.

2. John F. Kennedy (JFK), the 35th President of the United States, was assassinated in Dallas, Texas, in 1963.

3. Robert F. Kennedy (RFK), JFK's brother and Attorney General under his presidency, was assassinated in Los Angeles, California, in 1968.

4. Senator Ted Kennedy, JFK's younger brother, was involved in a plane crash in 1964 that killed one of his aides. Ted Kennedy himself survived, but suffered lasting back injuries.

5. Kathleen Cavendish, Marchioness of Hartington (Kick Kennedy), JFK's sister, died in a plane crash in 1948.

6. Michael LeMoyne Kennedy, Robert Kennedy's son, died in a skiing accident in 1997.

7. John F. Kennedy Jr., JFK's son, died in a plane crash in 1999.

These are some o

In [None]:
from langchain_core.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template(
    "Tell me a {adjective} story about {topic}"
)

llmModelPrompt = prompt_template.format(
    adjective="curious",
    topic="Tesla"
)

res = llamaChatModel.invoke(llmModelPrompt)
print(res.content)

Here's one:

**The Curious Case of Tesla's Lost Papers**

In the early 1900s, Nikola Tesla, the famous inventor and electrical engineer, was working on a top-secret project in his laboratory in New York City. The project was so revolutionary that Tesla believed it had the potential to change the course of human history.

The project was known as "Teleautomaton" or "Teleautomatics," and it involved the development of a system that could remotely control machines and vehicles using radio signals. Tesla envisioned a future where machines could be operated from a distance, without the need for wires or human intervention.

Tesla worked tirelessly on the project, filling notebooks with equations, diagrams, and notes. He built a series of prototypes, including a remote-controlled boat that he demonstrated in 1898 to a stunned audience in New York City.

But then, disaster struck. In 1904, a fire broke out in Tesla's laboratory, destroying much of his life's work, including his notes and prot
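
Conceptually, `PromptTemplate.format` fills the `{adjective}` and `{topic}` placeholders much like Python's built-in `str.format`. The sketch below mimics that step with the standard library only; it is an illustration of the idea, not LangChain's implementation.

```python
from string import Formatter

template = "Tell me a {adjective} story about {topic}"

# Discover the placeholder names the template expects.
input_variables = sorted(
    field for _, field, _, _ in Formatter().parse(template) if field
)

# Fill the placeholders with the same values the notebook uses.
filled_prompt = template.format(adjective="curious", topic="Tesla")

print(input_variables)
print(filled_prompt)
```

The filled string is what ends up being sent to the model as a single human message.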

In [None]:
# chat completion model

from langchain_core.prompts import ChatPromptTemplate

chat_template = ChatPromptTemplate.from_messages(
    [
        ("system", "You are an {profession} expert on {topic}."),
        ("human", "Hello, Mr. {profession}, can you please answer a question?"),
        ("ai", "Sure!"),
        ("human", "{user_input}"),
    ]
)

messages = chat_template.format_messages(
    profession="Historian",
    topic="The Kennedy family",
    user_input="How many grandchildren did Joseph P. Kennedy have?"
)

response = llamaChatModel.invoke(messages)
print(response.content)

Joseph P. Kennedy, the patriarch of the Kennedy family, had a total of 29 grandchildren. His nine children, including John F. Kennedy, Robert F. Kennedy, Ted Kennedy, and others, had a large brood of kids, and Joe Sr. was known to be very close to all of them. In fact, he was often referred to as "Grandpa Joe" by his grandkids.


# Few Shot Prompting

In [None]:
from langchain_core.prompts import FewShotChatMessagePromptTemplate

examples = [
    {"input": "hi!", "output": "¡hola!"},
    {"input": "bye!", "output": "¡adiós!"},
]

example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

final_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are an English-Spanish translator."),
        few_shot_prompt,
        ("human", "{input}"),
    ]
)
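
To make the structure of `final_prompt` concrete, the sketch below builds the kind of message list a few-shot chat prompt is expected to expand into: the system message, one human/ai pair per example, then the real question. `build_few_shot_messages` is a hypothetical helper written with plain tuples, not a LangChain API.

```python
examples = [
    {"input": "hi!", "output": "¡hola!"},
    {"input": "bye!", "output": "¡adiós!"},
]


def build_few_shot_messages(system_text, examples, user_input):
    """Expand examples into alternating human/ai turns around the real question."""
    messages = [("system", system_text)]
    for example in examples:
        messages.append(("human", example["input"]))
        messages.append(("ai", example["output"]))
    messages.append(("human", user_input))
    return messages


few_shot_messages = build_few_shot_messages(
    "You are an English-Spanish translator.", examples, "How are you?"
)
for role, text in few_shot_messages:
    print(f"{role}: {text}")
```

The examples act as a short fake conversation that shows the model the expected answer style before it sees the actual input.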

# Build Chains

In [None]:
# final_prompt is reused from the Few Shot Prompting section above.
chain = final_prompt | llamaChatModel

res = chain.invoke({"input": "How are you?"})
print(res.content)

Estoy bien, ¿y tú?
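
The `|` operator composes steps left to right: the output of the prompt becomes the input of the model. A toy sketch of that idea using Python's `__or__` hook follows; the `Step` class and the upper-casing "model" are stand-ins invented for illustration and say nothing about how LangChain implements its runnables internally.

```python
class Step:
    """Wrap a function so steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Compose left to right: the output of self feeds into other.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Stand-ins for a prompt and a model; the "model" just upper-cases its input.
toy_prompt = Step(lambda inputs: f"Translate to Spanish: {inputs['input']}")
toy_model = Step(lambda prompt_text: prompt_text.upper())

toy_chain = toy_prompt | toy_model
toy_result = toy_chain.invoke({"input": "How are you?"})
print(toy_result)
```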


# Output Parsers

In [None]:
from langchain_core.prompts import PromptTemplate
from langchain.output_parsers.json import SimpleJsonOutputParser

json_prompt = PromptTemplate.from_template(
    "Return a JSON object with an `answer` key that answers the following question: {question}"
)

json_parser = SimpleJsonOutputParser()

json_chain = json_prompt | llamaChatModel | json_parser

In [None]:
json_parser.get_format_instructions()

'Return a JSON object.'

In [None]:
res = json_chain.invoke({"question": "What is the biggest country?"})
res

{'answer': 'Russia'}
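
Chat models sometimes wrap JSON in a Markdown code fence, so a JSON output parser has to do more than call `json.loads` on the raw reply. A rough sketch of that idea, assuming a hypothetical helper `parse_json_reply` (this is not the parser's actual code, and the sample reply is made up):

```python
import json
import re


def parse_json_reply(text: str) -> dict:
    """Extract a JSON object from a model reply, tolerating a Markdown code fence."""
    fenced = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", text, re.DOTALL)
    payload = fenced.group(1) if fenced else text
    return json.loads(payload.strip())


fence = "`" * 3  # built programmatically to avoid a literal fence in this example
raw_reply = f'{fence}json\n{{"answer": "Russia"}}\n{fence}'

parsed_fenced = parse_json_reply(raw_reply)
parsed_plain = parse_json_reply('{"answer": "Russia"}')
print(parsed_fenced, parsed_plain)
```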

# Pydantic to define a custom output format

In [None]:
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field

# Define a Pydantic Object with the desired output format.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

# Define the parser, pointing it at the Pydantic object
parser = JsonOutputParser(pydantic_object=Joke)

# Add the parser format instructions in the prompt definition.
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# Create a chain with the prompt and the parser
chain = prompt | llamaChatModel | parser

chain.invoke({"query": "Tell me a joke."})

{'setup': "Why don't scientists trust atoms?",
 'punchline': 'Because they make up everything.'}
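
The chain above returns a plain dictionary, as the output shows. If you want an object with checked fields afterwards, one option is to validate the dictionary yourself. The sketch below uses a standard-library dataclass; `JokeRecord` and `to_joke` are hypothetical names for illustration, not part of LangChain or Pydantic.

```python
from dataclasses import dataclass, fields


@dataclass
class JokeRecord:
    setup: str
    punchline: str


def to_joke(data: dict) -> JokeRecord:
    """Check that all expected keys are present and build a typed record."""
    expected = {f.name for f in fields(JokeRecord)}
    missing = expected - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return JokeRecord(**{name: str(data[name]) for name in expected})


joke = to_joke(
    {
        "setup": "Why don't scientists trust atoms?",
        "punchline": "Because they make up everything.",
    }
)
print(joke.punchline)
```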

# RAG

In [None]:
messages = [
    ("system", "You are a helpful assistant."),
    ("human", "tell me about dswithbappy"),
]
response = llamaChatModel.invoke(messages)
print(response.content)

I apologize, but I'm not familiar with the term "dswithbappy." It's possible that it's a misspelling or a word that's not widely used. Can you please provide more context or information about where you encountered this term so I can better assist you?

If you meant to ask about "DS with Bappy," I'd be happy to try and help you with that. However, I couldn't find any relevant information on this topic. Could you please clarify what "DS" and "Bappy" refer to in this context?

If you have any more questions or need help with something else, feel free to ask!


In [12]:
# Data Loader
# Txt data loading

from langchain_community.document_loaders import TextLoader

loader = TextLoader("be-good.txt")
loaded_data_txt = loader.load()

In [14]:
# CSV loader

from langchain_community.document_loaders import CSVLoader

loader = CSVLoader('Street_Tree_List.csv')

loaded_data_csv = loader.load()
# loaded_data_csv

In [15]:
# Pdf data

from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader('5pages.pdf')

loaded_data_pdf = loader.load_and_split()

In [16]:
# wikipedia data loader

from langchain_community.document_loaders import WikipediaLoader

loader = WikipediaLoader(query="Tesla", load_max_docs=1)

loaded_data_wiki = loader.load()[0].page_content

In [17]:
from langchain_core.prompts import ChatPromptTemplate

chat_template = ChatPromptTemplate.from_messages(
    [
        ("human", "Answer this {question}, here is some extra {context}"),
    ]
)

# Only the variables used in the template ({question} and {context}) are passed.
messages = chat_template.format_messages(
    question="Tell me about tesla",
    context=loaded_data_wiki
)

In [18]:
response = llamaChatModel.invoke(messages)
response.content

"Here's an overview of Tesla, Inc.:\n\n**Overview**\n\nTesla, Inc. is an American multinational automotive and clean energy company founded in 2003 by Martin Eberhard and Marc Tarpenning. The company is named after Nikola Tesla, a Serbian-American inventor and electrical engineer. Tesla is headquartered in Austin, Texas, and designs, manufactures, and sells battery electric vehicles (BEVs), stationary battery energy storage devices, solar panels, and solar shingles, as well as related products and services.\n\n**History**\n\n* 2003: Tesla Motors, Inc. was incorporated on July 1 by Martin Eberhard and Marc Tarpenning.\n* 2004: Elon Musk joined as the chairman of the board of directors and the largest shareholder of Tesla, with a $6.5 million investment.\n* 2005-2009: Tesla developed its first car model, the Roadster sports car.\n* 2008: Eberhard was asked to step down as CEO, and Musk took over as CEO.\n* 2012: Tesla began production of the Model S sedan.\n* 2015: Tesla introduced the M

# RAG Components

In [19]:
# 1. CharacterTextSplitter

from langchain_text_splitters import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    separator="\n\n",
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    is_separator_regex=False,
)

texts = text_splitter.create_documents([loaded_data_txt[0].page_content])

In [None]:
len(texts)

2

In [20]:
# 2. RecursiveCharacterTextSplitter

from langchain_text_splitters import RecursiveCharacterTextSplitter

recursive_splitter = RecursiveCharacterTextSplitter(
    chunk_size=26,
    chunk_overlap=4
)

text_rec = recursive_splitter.split_text(loaded_data_txt[0].page_content)

In [None]:
len(text_rec)

815

In [None]:
text_rec[:5]

['Be good',
 'April 2008(This essay is',
 'is derived from a talk at',
 'at the 2008 Startup',
 'School.)About a month']
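
The effect of `chunk_size` and `chunk_overlap` is easiest to see on a tiny string. The sketch below slices text into fixed-size windows whose neighbours share a few characters. It is a simplification: `RecursiveCharacterTextSplitter` additionally prefers to break at separators such as paragraphs, lines, and spaces, which is why the chunks above end at word boundaries. `chunk_with_overlap` is a hypothetical helper.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Slice text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
```

Overlap trades some duplicated text for a lower chance that a sentence relevant to a query is cut in half at a chunk boundary.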

# Embeddings

In [8]:
pip install sentence-transformers

Collecting sentence-transformers
  Downloading sentence_transformers-3.2.1-py3-none-any.whl.metadata (10 kB)
Downloading sentence_transformers-3.2.1-py3-none-any.whl (255 kB)
Installing collected packages: sentence-transformers
Successfully installed sentence-transformers-3.2.1


In [21]:
from langchain_community.embeddings import HuggingFaceEmbeddings

# Load a Sentence-Transformers model for embeddings
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Sample texts to embed; the same model is reused with the vector stores later on
chunks_of_text = [
    "Hi there!",
    "Hello!",
    "What's your name?",
    "Bond, James Bond",
    "Hello Bond!",
]

# Convert each text to an embedding vector
embeddings = [embedding_model.embed_query(text) for text in chunks_of_text]

print(embeddings)

[[-0.09151120483875275, 0.025147806853055954, 0.0020471338648349047, 0.0051811011508107185, -0.015277049504220486, -0.052715811878442764, 0.01655740849673748, 0.0423915721476078, -0.02921575866639614, 0.006825124844908714, -0.02768443338572979, 0.007252856623381376, -0.01055940706282854, -0.04191432148218155, 0.032309070229530334, 0.03700307384133339, -0.053152453154325485, -0.009931975975632668, -0.10471197962760925, -0.03906849026679993, 0.03552796319127083, 0.07590004801750183, -0.10407810658216476, 0.04460618272423744, -0.02690211869776249, -0.03199000656604767, -0.006866887677460909, 0.06416885554790497, -0.0711832046508789, -0.017583826556801796, 0.01255409512668848, 0.12845945358276367, -0.022762082517147064, 0.053721778094768524, 0.0291697196662426, 0.048878032714128494, -0.04694394767284393, -0.07912521064281464, 0.0451648086309433, 0.023789480328559875, -0.01188875176012516, 0.011806285940110683, -0.017342569306492805, 0.03931812196969986, 0.053550079464912415, -0.07429265975

In [10]:
from langchain_community.embeddings import HuggingFaceEmbeddings

# Load a BERT-based model from Hugging Face for comparison.
# Separate variable names keep the MiniLM model and embeddings above intact,
# since the following cells report results for that model (384 dimensions).
bert_embedding_model = HuggingFaceEmbeddings(model_name="distilbert-base-uncased")

chunks_of_text = [
    "Hi there!",
    "Hello!",
    "What's your name?",
    "Bond, James Bond",
    "Hello Bond!",
]

# Convert texts to embeddings
bert_embeddings = [bert_embedding_model.embed_query(text) for text in chunks_of_text]

print(bert_embeddings)

[[-0.018343335017561913, -0.1628587245941162, -0.06578601896762848, 0.023128028959035873, 0.00455157645046711, -0.48300719261169434, 0.2474226951599121, 0.6296902298927307, -0.04311739653348923, -0.09659039229154587, -0.18549716472625732, -0.4312531650066376, -0.26857858896255493, 0.11359043419361115, -0.2893787622451782, 0.36271682381629944, 0.01674966886639595, 0.040249209851026535, 0.007227499037981033, 0.14428582787513733, 0.5352672338485718, 0.10889463126659393, -0.21775025129318237, 0.22065210342407227, 0.649745523929596, -0.21230319142341614, -0.014112617820501328, -0.12194402515888214, -0.5462151169776917, -0.3744245171546936, 0.29739952087402344, -0.06133110076189041, -0.10202661901712418, 0.08745697140693665, 0.023268982768058777, -0.27904340624809265, -0.1366906464099884, 0.05550412833690643, -0.14184676110744476, 0.2325471043586731, -0.44396477937698364, -0.2067464143037796, 0.19729650020599365, 0.32864218950271606, 0.08319167047739029, -0.1143551915884018, -0.0043078269809

In [22]:
len(embeddings[0])

384

In [23]:
print(embeddings[0][:5])

[-0.09151120483875275, 0.025147806853055954, 0.0020471338648349047, 0.0051811011508107185, -0.015277049504220486]
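
Embedding vectors are useful because texts with similar meaning tend to point in similar directions. A common way to compare two vectors is cosine similarity. The sketch below uses the standard library and made-up three-dimensional vectors as stand-ins; the real vectors above have 384 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors, not real model output.
toy_hello = [1.0, 0.0, 1.0]
toy_hi = [0.9, 0.1, 0.8]
toy_name = [0.0, 1.0, 0.0]

sim_close = cosine_similarity(toy_hello, toy_hi)
sim_far = cosine_similarity(toy_hello, toy_name)
print(sim_close, sim_far)
```

With the notebook's data you could call `cosine_similarity(embeddings[0], embeddings[1])` to compare "Hi there!" with "Hello!".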


# Vector Stores (a.k.a. Vector Databases)

In [24]:
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

# Load the document and split it into chunks; embedding and indexing happen in the next cell.
loaded_document = TextLoader('state_of_the_union.txt').load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

chunks_of_text = text_splitter.split_documents(loaded_document)


In [26]:

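# Embed each chunk with the HuggingFace embedding model defined in the Embeddings section and index it in Chroma.
vector_db = Chroma.from_documents(chunks_of_text, embedding_model)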

In [27]:
vector_db

<langchain_chroma.vectorstores.Chroma at 0x7d635d44c580>

In [28]:
question = "What did the president say about the John Lewis Voting Rights Act?"

response = vector_db.similarity_search(question)

print(response[0].page_content)

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
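
Under the hood, a similarity search embeds the question and returns the stored chunks whose vectors are closest to it. The toy store below shows that flow with the standard library only. It uses word counts as a stand-in for real embeddings, and `toy_embed`, `cosine`, and `ToyVectorStore` are hypothetical names; this is not how Chroma is implemented.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Toy embedding: word counts. Real embeddings are dense vectors from a model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self, texts):
        # Store each text next to its vector, computed once at indexing time.
        self.entries = [(text, toy_embed(text)) for text in texts]

    def similarity_search(self, query, k=4):
        query_vec = toy_embed(query)
        ranked = sorted(
            self.entries, key=lambda entry: cosine(query_vec, entry[1]), reverse=True
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore(
    [
        "pass the john lewis voting rights act",
        "justice breyer thank you for your service",
        "we will lower the cost of prescription drugs",
    ]
)
top_hits = store.similarity_search("what about the voting rights act", k=1)
print(top_hits)
```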


# Vector Store as Retriever

In [30]:
pip install faiss-cpu

Collecting faiss-cpu
  Downloading faiss_cpu-1.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.4 kB)
Downloading faiss_cpu-1.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.5 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.9.0


In [31]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import CharacterTextSplitter

# Load and chunk the document, then index the chunks in FAISS
# using the embedding model defined earlier.
loaded_document = TextLoader("state_of_the_union.txt").load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

chunks_of_text = text_splitter.split_documents(loaded_document)

vector_db = FAISS.from_documents(chunks_of_text, embedding_model)

In [32]:
retriever = vector_db.as_retriever(search_kwargs={"k": 1})
response = retriever.invoke("what did he say about ketanji brown jackson?")
response

[Document(metadata={'source': 'state_of_the_union.txt'}, page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.')]
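
Retrieval is only the first half of RAG; the retrieved text still has to be placed in a prompt for the model. A minimal sketch of that last step, using plain strings in place of `Document` objects. The helper names `format_docs` and `build_rag_prompt` and the prompt wording are assumptions made for illustration.

```python
def format_docs(docs: list[str]) -> str:
    """Join retrieved chunks into one context block."""
    return "\n\n".join(docs)


def build_rag_prompt(question: str, docs: list[str]) -> str:
    """Place the retrieved context ahead of the user's question."""
    context = format_docs(docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# A shortened excerpt of the chunk returned by the retriever above.
retrieved = [
    "And I did that 4 days ago, when I nominated Circuit Court of Appeals "
    "Judge Ketanji Brown Jackson."
]
rag_prompt = build_rag_prompt(
    "what did he say about ketanji brown jackson?", retrieved
)
print(rag_prompt)
```

In the notebook, the strings would come from `doc.page_content` for each document in `response`, and the resulting prompt could be passed to `llamaChatModel.invoke`.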