# What is LangChain
LangChain is an open-source framework that lets developers connect large language models (LLMs) such as GPT-4 to external computation and data sources. It is available as a Python package and as a JavaScript/TypeScript package.

<img src="images/what_is_langchain.png" width=75%/>

## Why LangChain is needed
LangChain connects large language models to your own data sources, such as books, PDFs, or databases, so that users can query and interact with their own information. It supports building data-aware and agentic applications, with practical use cases that include personal assistance, learning, coding, data analysis, and connecting language models to company data for analytics. The framework's main value lies in four building blocks: LLM wrappers, prompt templates, chains, and agents.

### LangChain Pros and Cons

**Pros:**
- Freely available
- Open source
- Supports all major LLMs
- Variety of modules to perform common tasks

**Cons:**

- Limited support for languages other than Python
- Some people express security concerns over the handling of sensitive information.

**LangChain Pricing**

The LangChain framework is free to use and open source.

In this notebook, we use the `langchain` library to work with pre-trained language models. We cover the text and chat model wrappers, prompt templates, chains, embeddings with a vector store, agents, and custom models that run without an API key.

<a href="https://colab.research.google.com/github/miztiik/llm-bootcamp/blob/main/chapters/intro_to_langchain/intro_to_langchain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%%capture
# Comment the above line to see the installation logs

# Install the dependencies
!pip install -qU python-dotenv
!pip install -qU langchain
!pip install -qU langchain-openai

In [None]:
# Load environment variables
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())

## LangChain: Model I/O - Text Model Wrapper

Update your `OPENAI_API_KEY` in the `.env` file to use the OpenAI model. You can get the API key from the [OpenAI website](https://platform.openai.com/account/api-keys).
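
Before creating the model wrapper, it can help to confirm that the key was actually loaded. The following is a minimal sketch, assuming the key is exposed as the `OPENAI_API_KEY` environment variable once `load_dotenv()` has run; the helper name `has_api_key` is hypothetical and is not part of LangChain or python-dotenv.

```python
import os


def has_api_key(env, name="OPENAI_API_KEY"):
    """Return True if `name` is present in `env` and not blank."""
    return bool(env.get(name, "").strip())


# Check the real process environment (populated by load_dotenv earlier).
if not has_api_key(os.environ):
    print("OPENAI_API_KEY is not set; add it to your .env file.")
```

The helper takes the environment as an argument so it can be tried with a plain dictionary as well as with `os.environ`.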

In [None]:
# Run basic query with OpenAI wrapper
from langchain_openai import OpenAI

llm = OpenAI()

# To specify a particular model refer to the OpenAI documentation - https://platform.openai.com/docs/models
# Completions Model: https://platform.openai.com/docs/models/completions
# Chat Model: https://platform.openai.com/docs/models/completions


llm = OpenAI(model_name="gpt-3.5-turbo-instruct")

In [None]:
llm.invoke("What is the currency of India?")

In [None]:
txt_resp = llm.invoke("explain large language models in one sentence")
print(txt_resp, end="\n")

You can pass multiple text prompts to an OpenAI model via the `generate()` method. For example, the following script returns three outputs, one for each prompt.

In [None]:
multiple_txt_resp = llm.generate(
    [
        "What is the capital of India?",
        "Tell me a joke about AI",
        "Who won FIFA 2018",  # Change it to 2022, and see what happens
    ]
)

In [None]:
print(f"Total responses: {len(multiple_txt_resp.generations)}")

for i, resp in enumerate(multiple_txt_resp.generations):
    print(f"Response {i+1}: {resp[0].text}", end="\n")

In [None]:
multiple_txt_resp.generations[1][0].text
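
The indexing above works because `generations` is a list with one entry per prompt, and each entry is itself a list of candidate completions. The sketch below mimics that shape with plain Python objects so the indexing can be tried without an API call. `SimpleNamespace` is only a stand-in for LangChain's generation objects, and the texts are placeholders, not real model output.

```python
from types import SimpleNamespace

# Stand-in for `multiple_txt_resp.generations`: one inner list per prompt,
# each holding the candidate completions for that prompt.
generations = [
    [SimpleNamespace(text="<answer to prompt 1>")],
    [SimpleNamespace(text="<answer to prompt 2>")],
    [SimpleNamespace(text="<answer to prompt 3>")],
]

# First index selects the prompt, second index selects the candidate.
second_answer = generations[1][0].text
first_candidates = [candidates[0].text for candidates in generations]
print(second_answer)
```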

## LangChain: Model I/O - Chat Model Wrapper

In [None]:
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

In [None]:
# Chat Model: https://platform.openai.com/docs/models/completions

llm_chat = ChatOpenAI(model_name="gpt-3.5-turbo-0125", temperature=0.3)
messages = [
    SystemMessage(content="You are a football historian"),
    HumanMessage(
        content="Who won the player of the tournament in 1998 Fifa World Cup?"
    ),
]


chat_resp = llm_chat.invoke(messages)
chat_resp

#### Stream the output from the chat model to the console.

In [None]:
for chunk in llm_chat.stream(messages):
    print(chunk.content, end="", flush=True)

## LangChain: Prompt Templates

A prompt is the set of instructions or input that a user gives a language model to guide its response. It supplies the context the model needs to produce relevant, coherent output, such as answering a question, completing a sentence, or continuing a conversation.

This section covers two kinds of prompt templates:

- String prompt template
- Chat prompt templates

Source: https://python.langchain.com/docs/modules/model_io/prompts/quick_start
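
Conceptually, a string prompt template is a format string plus the list of variables it expects. The sketch below shows that idea with the Python standard library only. It is not LangChain's implementation, and the helper name `template_variables` is hypothetical.

```python
from string import Formatter


def template_variables(template):
    """Return the placeholder names in a format-string template, in order."""
    return [field for _, field, _, _ in Formatter().parse(template) if field]


template = "Tell me a {adjective} joke about {content}."
variables = template_variables(template)
prompt_text = template.format(adjective="funny", content="AI")
print(variables)
print(prompt_text)
```

Knowing the variable names up front is what lets a template check that every placeholder has a value before the prompt is sent to a model.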


### String Prompt Template

In [None]:
from langchain.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template(
    "Tell me a {adjective} joke about {content}."
)
prompt_template.format(adjective="funny", content="AI")

In [None]:
joke_resp = llm.invoke(
    prompt_template.format(adjective="sad", content="Tech Engineer")
)

print(joke_resp, end="\n")

In [None]:
template = """
You are an expert data scientist with an expertise in building deep learning models. 
Explain the concept of {concept} in a couple of lines.
"""

prompt = PromptTemplate(
    input_variables=["concept"],
    template=template,
    template_format="f-string",
    validate_template=True,
)

prompt

In [None]:
# Run LLM with PromptTemplate
prompt_template_resp = llm.invoke(prompt.format(concept="generative models"))
print(prompt_template_resp, end="\n")

In [None]:
# Try another query with PromptTemplate
prompt_template_resp = llm.invoke(
    prompt.format(concept="Large Language Models"))
print(prompt_template_resp, end="\n")

### Chat prompt composition

Source: 
- https://python.langchain.com/docs/modules/model_io/chat/quick_start#messages
- https://python.langchain.com/docs/modules/model_io/prompts/composition#chat-prompt-composition

In [None]:
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.prompts import HumanMessagePromptTemplate
from langchain_openai import ChatOpenAI

In [None]:
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain_core.messages import SystemMessage


chat_template = ChatPromptTemplate.from_messages(
    [
        # SystemMessage(content=("You are a {sports} historian.")),
        SystemMessagePromptTemplate.from_template(
            "You are a {sports} historian."),
        HumanMessagePromptTemplate.from_template("{text}"),
    ]
)

chat_prompt = chat_template.format_prompt(
    sports="Tennis", text="Who won the Australian Open in 2015"
).to_messages()

print(chat_prompt)

In [None]:
prompt_resp = llm_chat.invoke(chat_prompt)
print(prompt_resp.content, end="\n")

In [None]:
# Try another query with ChatPromptTemplate
print(
    llm_chat.invoke(
        chat_template.format_prompt(
            sports="Football", text="Who won the 2018 FIFA World Cup?"
        ).to_messages()
    )
)

## LangChain: Chains

Chains let you run multiple LangChain modules together. For example, a chain can combine a prompt and an LLM, so you do not have to format the prompt and then pass it to the model in two separate steps.

LangChain supports three main types of chains:

- Simple LLM Chain
- Sequential Chain
- Custom Chain
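
The idea of chaining a prompt and a model can be sketched in plain Python. The `Step` class below is a hypothetical stand-in, not a LangChain class: it overloads `|` so that the output of one step feeds the next. The "model" is a fake function that echoes its input, so nothing here calls an API.

```python
class Step:
    """A callable unit that can be piped into another step with `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # Compose: run self first, then pass its output to `other`.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Step 1 formats the prompt; step 2 is a fake model that echoes the prompt.
format_prompt = Step(lambda inputs: "Explain {concept} briefly.".format(**inputs))
fake_model = Step(lambda prompt: f"[model reply to: {prompt}]")

toy_chain = format_prompt | fake_model
toy_result = toy_chain.invoke({"concept": "autoencoders"})
print(toy_result)
```

Because the composed object is itself a `Step`, it can be piped again, which is how longer sequences can be built from simple parts.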

In [None]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [("system", "You are an expert technical advisor."), ("user", "{input}")]
)

chain = prompt | llm

In [None]:
chain.invoke(
    {"input": "Write a Python script to generate a list of prime numbers up to 100."}
)

In [None]:
# Import prompt and define PromptTemplate

from langchain_core.prompts import PromptTemplate

template = """
You are an expert data scientist with an expertise in building deep learning models. 
Explain the concept of {concept} in a couple of lines
"""

prompt = PromptTemplate(
    input_variables=["concept"],
    template=template,
)

In [None]:
# Run LLM with PromptTemplate

llm.invoke(prompt.format(concept="autoencoder"))

In [None]:
# Import LLMChain and define chain with language model and prompt as arguments.
# Note: LLMChain is a legacy API; the `prompt | llm` composition shown earlier is the newer style.

from langchain.chains import LLMChain

chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain only specifying the input variable.
print(chain.run("autoencoder"))

In [None]:
# Define a second prompt

second_prompt = PromptTemplate(
    input_variables=["ml_concept"],
    template="Take the concept description of {ml_concept} and explain it to me like I'm five in 500 words",
)
chain_two = LLMChain(llm=llm, prompt=second_prompt)

In [None]:
# Define a sequential chain using the two chains above: the second chain takes the output of the first chain as input

from langchain.chains import SimpleSequentialChain

overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)

# Run the chain specifying only the input variable for the first chain.
explanation = overall_chain.run("autoencoder")
print(explanation)

In [None]:
# Import utility for splitting up texts and split up the explanation given above into document chunks

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=0,
)

texts = text_splitter.create_documents([explanation])

In [None]:
# Individual text chunks can be accessed with "page_content"

texts[0].page_content
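
To see what `chunk_size` and `chunk_overlap` control, here is a simplified sketch that cuts a string into fixed-size character windows. It is only a stand-in under a strong assumption: `RecursiveCharacterTextSplitter` also tries to split on separators such as paragraph and line breaks, which the hypothetical `split_fixed` helper below does not do.

```python
def split_fixed(text, chunk_size, chunk_overlap=0):
    """Cut `text` into windows of at most `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij"
no_overlap = split_fixed(sample, chunk_size=4)
with_overlap = split_fixed(sample, chunk_size=4, chunk_overlap=2)
print(no_overlap)
print(with_overlap)
```

Overlap repeats the tail of one chunk at the head of the next, which can help when a sentence would otherwise be cut in half at a chunk boundary.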

In [None]:
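# Import and instantiate OpenAI embeddings

from langchain_openai import OpenAIEmbeddings

# The parameter is `model`; "text-embedding-ada-002" is the full name of the ada embedding model
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")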

In [None]:
# Turn the first text chunk into a vector with the embedding

query_result = embeddings.embed_query(texts[0].page_content)
print(query_result)

In [None]:
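# Import and initialize Pinecone client
# Note: pinecone.init() belongs to the pinecone-client v2 API; later client
# releases replaced it with a client class, so pin the package version accordingly.

import os
import pinecone
from langchain.vectorstores import Pinecone

pinecone.init(
    api_key=os.getenv("PINECONE_API_KEY"),
    environment=os.getenv("PINECONE_ENV"),
)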

In [None]:
# Upload vectors to Pinecone

index_name = "langchain-quickstart"
search = Pinecone.from_documents(texts, embeddings, index_name=index_name)

In [None]:
# Do a simple vector similarity search

query = "What is magical about an autoencoder?"
result = search.similarity_search(query)

print(result)
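
Under the hood, a vector similarity search ranks stored vectors by how close they are to the query vector. The sketch below uses cosine similarity over tiny hand-made vectors. The vectors, labels, and helper names are made up for illustration, and a real vector store such as Pinecone may use a different metric or an approximate index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, docs, k=1):
    """Return the `k` document labels most similar to `query`."""
    ranked = sorted(
        docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True
    )
    return ranked[:k]


# Toy 3-dimensional "embeddings"; real embeddings have far more dimensions.
docs = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.0, 1.0, 0.0],
    "chunk_c": [0.7, 0.7, 0.0],
}
best = top_k([0.9, 0.1, 0.0], docs, k=2)
print(best)
```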

In [None]:
# Import Python REPL tool and instantiate Python agent
# Assumption: in current releases these helpers live in the separate
# `langchain-experimental` package (pip install langchain-experimental).

from langchain_experimental.agents.agent_toolkits import create_python_agent
from langchain_experimental.tools import PythonREPLTool
from langchain_openai import OpenAI

agent_executor = create_python_agent(
    llm=OpenAI(temperature=0, max_tokens=1000),
    tool=PythonREPLTool(),
    verbose=True,
)

In [None]:
# Execute the Python agent

agent_executor.run(
    "Find the roots (zeros) of the quadratic function 3 * x**2 + 2*x - 1")

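The agent's answer can be checked by hand with the quadratic formula. For 3 * x**2 + 2*x - 1 the discriminant is 2**2 - 4 * 3 * (-1) = 16, so the roots are (-2 ± 4) / 6, that is -1 and 1/3. The sketch below does the same arithmetic; the helper name `quadratic_roots` is hypothetical, and it assumes a non-zero leading coefficient and real roots.

```python
import math


def quadratic_roots(a, b, c):
    """Return the two real roots of a*x**2 + b*x + c, smallest first."""
    if a == 0:
        raise ValueError("not a quadratic: a must be non-zero")
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("no real roots")
    sqrt_disc = math.sqrt(disc)
    roots = ((-b - sqrt_disc) / (2 * a), (-b + sqrt_disc) / (2 * a))
    return tuple(sorted(roots))


roots = quadratic_roots(3, 2, -1)
print(roots)
```
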
## LangChain: Custom Models

The CTransformers package implements various LLMs that run locally and can be used from LangChain, so you do not need an API key to access them.

In [None]:
%%capture

!pip install -qU CTransformers

In [None]:
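# Assumption: recent LangChain releases expose community integrations
# from the `langchain-community` package (pip install langchain-community).
from langchain_community.llms import CTransformers

llm = CTransformers(model="marella/gpt-2-ggml")
print(llm.invoke("I am flying to Lisbon on"))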

## Additional Reading

- [LangChain](https://python.langchain.com/docs/get_started/quickstart)
