# What is LangChain
LangChain is an open-source framework that lets developers integrate large language models such as GPT-4 with external computation and data sources. It is available as a Python package and as a JavaScript/TypeScript package.

## Why LangChain is needed
LangChain connects large language models to your own data sources, such as books, PDFs, or databases, so you can interact with that information dynamically. This makes it possible to build data-aware and agentic applications for use cases such as personal assistants, learning, coding, data analysis, and analytics over company data. The framework's main building blocks are LLM wrappers, prompt templates, chains, and agents, which together simplify integrating and interacting with language models.

**References**
- [LangChain](https://python.langchain.com/docs/get_started/quickstart)

In this notebook, we use the `langchain` library with OpenAI models to explore its core building blocks: model wrappers, prompt templates, chains, embeddings with a vector store, and agents.

<a href="https://colab.research.google.com/github/miztiik/llm-bootcamp/blob/main/chapters/what_is_langchain/getting_started_with_langchain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%%capture
# Comment the above line to see the installation logs

# Install the dependencies
!pip install -qU python-dotenv
!pip install -qU langchain
!pip install -qU langchain-openai
!pip install -qU langchain-pinecone langchain-experimental

In [2]:
# Load environment variables
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())

True

## LangChain: Models - Text Model Wrapper

Update your `OPENAI_API_KEY` in the `.env` file to use the OpenAI model. You can get the API key from the [OpenAI website](https://platform.openai.com/account/api-keys).
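A missing key is a common first stumbling block. As a small sketch (the `require_env` helper is hypothetical, not part of `python-dotenv` or LangChain), you can fail fast with a readable error once `load_dotenv()` has run:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message.

    Hypothetical helper: call it after load_dotenv() so values from .env are visible.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Usage (after load_dotenv()):
# openai_key = require_env("OPENAI_API_KEY")
```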

In [12]:
# Run basic query with OpenAI wrapper
from langchain_openai import OpenAI

llm = OpenAI()

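# To specify a particular model, refer to the OpenAI documentation - https://platform.openai.com/docs/models
# `OpenAI` wraps the completions endpoint, so choose a completions model such as gpt-3.5-turbo-instruct
# llm = OpenAI(model="gpt-3.5-turbo-instruct")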

In [13]:
llm.invoke("What is the currency of india")

'\n\nThe currency of India is the Indian Rupee (INR).'

In [60]:
txt_resp = llm.invoke("explain large language models in one sentence")

In [None]:
print(txt_resp, end="\n")

In [21]:
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

In [34]:
llm_chat = ChatOpenAI(temperature=0.3)

messages = [
    SystemMessage(content="You are an expert python programmer"),
    HumanMessage(content="Python method to square given number?"),
]

chat_resp = llm_chat.invoke(messages)
chat_resp

AIMessage(content="You can use the `**` operator to square a given number in Python. Here's an example:\n\n```python\ndef square_number(number):\n    return number ** 2\n\n# Example usage\nresult = square_number(5)\nprint(result)  # Output: 25\n```\n\nIn this example, the `square_number` function takes a number as input and returns the square of that number using the `**` operator.")

## LangChain: Prompt Templates

A prompt is the set of instructions and input given to a language model to guide its response. It supplies the context the model needs to produce relevant, coherent output, whether that means answering a question, completing a sentence, or holding a conversation. LangChain provides two kinds of prompt templates:

- String prompt templates
- Chat prompt templates

Source: https://python.langchain.com/docs/modules/model_io/prompts/quick_start
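Conceptually, a string template fills named placeholders to produce a single string, while a chat template fills placeholders inside a list of role-tagged messages. A plain-Python sketch of that difference (the helper names are hypothetical; LangChain's `PromptTemplate` and `ChatPromptTemplate` do more, such as validation and partial variables):

```python
def format_string_prompt(template: str, **variables: str) -> str:
    """String template: one template string in, one prompt string out."""
    return template.format(**variables)


def format_chat_prompt(messages: list[tuple[str, str]], **variables: str) -> list[tuple[str, str]]:
    """Chat template: fill placeholders in each (role, content) pair, keeping the roles."""
    return [(role, content.format(**variables)) for role, content in messages]


joke = format_string_prompt("Tell me a {adjective} joke about {content}.", adjective="funny", content="AI")
chat = format_chat_prompt(
    [("system", "You are an expert {domain} advisor."), ("user", "{question}")],
    domain="technical",
    question="What is a vector store?",
)
print(joke)
print(chat)
```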

### String Prompt Template


In [37]:
from langchain.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template(
    "Tell me a {adjective} joke about {content}."
)
prompt_template.format(adjective="funny", content="AI")

'Tell me a funny joke about AI.'

In [55]:
template = """
You are an expert data scientist with an expertise in building deep learning models. 
Explain the concept of {concept} in a couple of lines.
"""

prompt = PromptTemplate(
    input_variables=["concept"],
    template=template,
    template_format='f-string',
    validate_template=True
)

In [56]:
prompt

PromptTemplate(input_variables=['concept'], template='\nYou are an expert data scientist with an expertise in building deep learning models. \nExplain the concept of {concept} in a couple of lines.\n', validate_template=True)
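Setting `validate_template=True` asks LangChain to check the template against `input_variables` when the `PromptTemplate` is built. Roughly, the idea is that every placeholder in the template must be declared. A plain-Python sketch of such a check using the standard library's `string.Formatter` (the helper names are hypothetical and this is not LangChain's implementation):

```python
from string import Formatter


def template_placeholders(template: str) -> set[str]:
    """Collect the {placeholder} names used in an f-string-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def check_template(template: str, input_variables: list[str]) -> None:
    """Raise if the template uses a placeholder that was not declared."""
    missing = template_placeholders(template) - set(input_variables)
    if missing:
        raise ValueError(f"Template uses undeclared variables: {sorted(missing)}")


concept_template = "Explain the concept of {concept} in a couple of lines."
check_template(concept_template, ["concept"])  # passes silently
```

Calling `check_template(concept_template, ["topic"])` would raise, because `{concept}` is not declared.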

In [58]:
# Run LLM with PromptTemplate
prompt_template_resp = llm.invoke(prompt.format(concept="generative models"))
print(prompt_template_resp, end="\n")


Generative models are a type of deep learning model that can learn the underlying patterns and structures of a given dataset in order to generate new data that mimics the original data. This allows for the creation of new data samples that are similar to the original data, making generative models useful for tasks such as data augmentation and image generation.


In [57]:

prompt_template_resp = llm.invoke(
    prompt.format(concept="Large Language Models"))
print(prompt_template_resp, end="\n")


Large Language Models (LLMs) refer to deep learning models that are trained on a large amount of text data and can generate human-like text responses. These models use advanced techniques such as attention mechanisms and transformer architectures to understand the context of the given text and produce coherent and relevant responses. LLMs are often used in natural language processing tasks such as text completion, question-answering, and text summarization.


### Chat prompt composition

Source: 
- https://python.langchain.com/docs/modules/model_io/chat/quick_start#messages
- https://python.langchain.com/docs/modules/model_io/prompts/composition#chat-prompt-composition

In [None]:
# Import the message schema and ChatOpenAI to query chat models such as GPT-3.5-turbo or GPT-4

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI

In [None]:
llm_chat = ChatOpenAI(temperature=0.3)

messages = [
    SystemMessage(content="You are an expert Python programmer"),
    HumanMessage(content="Python method to square given number?"),
]

chat_resp = llm_chat.invoke(messages)

To get more readable output, print only the message's `content` attribute instead of the full `AIMessage` object:

In [31]:
print(chat_resp.content, end="\n")

You can use the `**` operator to square a given number in Python. Here's an example:

```python
def square_number(number):
    return number ** 2

# Example usage
result = square_number(5)
print(result)  # Output: 25
```

In this example, the `square_number` function takes a `number` as input and returns the square of that number using the `**` operator.


#### Stream the output from the chat model to the console.

In [33]:
for chunk in llm_chat.stream(messages):
    print(chunk.content, end="", flush=True)

To square a given number in Python, you can use the `**` operator or the `pow()` function. Here is an example using both approaches:

```python
def square_number(number):
    # Using the ** operator
    squared = number ** 2
    print(f"The square of {number} is {squared}")

    # Using the pow() function
    squared = pow(number, 2)
    print(f"The square of {number} is {squared}")

# Example usage
square_number(5)
```

Output:
```
The square of 5 is 25
The square of 5 is 25
```

In this example, the `square_number()` function takes a `number` parameter and calculates the square using both the `**` operator and the `pow()` function. The result is then printed to the console.
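`stream()` returns an iterator of message chunks rather than one finished message. The print-as-you-go pattern above can also collect the full reply at the same time. Here it is in plain Python, with a hypothetical `fake_stream` generator standing in for the model (with `llm_chat.stream(messages)` you would print and collect `chunk.content` instead):

```python
import time
from typing import Iterator


def fake_stream(text: str, delay: float = 0.0) -> Iterator[str]:
    """Hypothetical stand-in for a model stream: yields the reply word by word."""
    for i, word in enumerate(text.split(" ")):
        if delay:
            time.sleep(delay)  # simulate network latency between chunks
        yield word if i == 0 else " " + word


def print_and_collect(chunks: Iterator[str]) -> str:
    """Print each chunk as soon as it arrives and return the full text at the end."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()
    return "".join(parts)


full = print_and_collect(fake_stream("Use the ** operator to square a number.", delay=0.05))
```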

## LangChain: Chains

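A chain links components so that the output of one step becomes the input of the next. The simplest chain pipes a prompt template into a model with the `|` operator: the _prompt template_ turns raw user input into a better-structured prompt, and the model responds to it.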

In [10]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [("system", "You are an expert technical advisor."), ("user", "{input}")]
)

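# Pipe the prompt into the chat model (a chat model returns an AIMessage, as in the output below)
chain = prompt | llm_chat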

In [11]:
chain.invoke(
    {"input": "Write a Python script to generate a list of prime numbers up to 100."}
)

AIMessage(content="Sure, here's a Python script that generates a list of prime numbers up to 100:\n\n```python\ndef is_prime(n):\n    if n <= 1:\n        return False\n    for i in range(2, int(n**0.5) + 1):\n        if n % i == 0:\n            return False\n    return True\n\nprimes = []\nfor num in range(2, 101):\n    if is_prime(num):\n        primes.append(num)\n\nprint(primes)\n```\n\nThis script defines a function `is_prime()` which checks if a given number is prime. It iterates from 2 to the square root of the number (inclusive) and if any divisor is found, it returns `False`. Otherwise, it returns `True`.\n\nThen, it loops through numbers from 2 to 100 and checks if each number is prime using the `is_prime()` function. If a number is prime, it is appended to the `primes` list.\n\nFinally, the script prints the list of prime numbers.")
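Piping a prompt template into a model with `|` builds a `RunnableSequence`: the output of the prompt becomes the input of the model. The mechanics of the `|` operator can be sketched in plain Python with a hypothetical `Step` class that overloads `__or__`. This mimics the idea only; it is not LangChain's `Runnable` implementation, and `fake_model` stands in for a real LLM:

```python
from typing import Any, Callable


class Step:
    """Hypothetical minimal runnable: wraps a function and composes with `|`."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # The composed step feeds this step's output into the next step
        return Step(lambda value: other.invoke(self.invoke(value)))


sketch_prompt = Step(lambda inputs: f"You are an expert technical advisor.\nUser: {inputs['input']}")
fake_model = Step(lambda text: f"[model reply to {len(text)} characters of prompt]")

sketch_chain = sketch_prompt | fake_model
print(sketch_chain.invoke({"input": "List the prime numbers below 10."}))
```

Because `__or__` returns another `Step`, longer pipelines such as `a | b | c` compose the same way.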

In [None]:
# Import prompt and define PromptTemplate

from langchain_core.prompts import PromptTemplate

template = """
You are an expert data scientist with an expertise in building deep learning models. 
Explain the concept of {concept} in a couple of lines
"""

prompt = PromptTemplate(
    input_variables=["concept"],
    template=template,
)

In [None]:
# Run LLM with PromptTemplate

llm.invoke(prompt.format(concept="autoencoder"))

In [None]:
# Import LLMChain and define chain with language model and prompt as arguments.

from langchain.chains import LLMChain

chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain only specifying the input variable; LLMChain returns its answer under the "text" key.
print(chain.invoke("autoencoder")["text"])

In [None]:
# Define a second prompt

second_prompt = PromptTemplate(
    input_variables=["ml_concept"],
    template="Explain the following concept description to me like I'm five, in 500 words: {ml_concept}",
)
chain_two = LLMChain(llm=llm, prompt=second_prompt)

In [None]:
# Define a sequential chain using the two chains above: the second chain takes the output of the first chain as input

from langchain.chains import SimpleSequentialChain

overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)

# Run the chain specifying only the input variable for the first chain.
explanation = overall_chain.invoke("autoencoder")["output"]
print(explanation)
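`SimpleSequentialChain` passes the single string output of each chain as the single input of the next. The same data flow can be sketched as a left fold over a list of functions; the two step functions here are hypothetical stand-ins for the LLM calls:

```python
from functools import reduce
from typing import Callable


def run_sequential(steps: list[Callable[[str], str]], first_input: str, verbose: bool = False) -> str:
    """Feed the output of each step into the next, like SimpleSequentialChain."""

    def apply(value: str, step: Callable[[str], str]) -> str:
        result = step(value)
        if verbose:
            print(f"{step.__name__}: {result}")
        return result

    return reduce(apply, steps, first_input)


def describe(concept: str) -> str:
    # Stand-in for the first chain (technical description)
    return f"An {concept} compresses its input and learns to reconstruct it."


def explain_like_five(description: str) -> str:
    # Stand-in for the second chain (simplified explanation)
    return f"Simply put: {description.lower()}"


eli5 = run_sequential([describe, explain_like_five], "autoencoder", verbose=True)
```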

In [None]:
# Import utility for splitting up texts and split up the explanation given above into document chunks

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=0,
)

texts = text_splitter.create_documents([explanation])
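`RecursiveCharacterTextSplitter` tries a list of separators in order (by default paragraph breaks, line breaks, spaces, then individual characters) and only falls back to a finer separator when a piece is still longer than `chunk_size`. A simplified sketch of that idea, without overlap, custom length functions, or the library's exact merging rules (`recursive_split` is a hypothetical name):

```python
def recursive_split(text: str, chunk_size: int, separators=("\n\n", "\n", " ", "")) -> list[str]:
    """Simplified recursive character splitting (no chunk overlap)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    # Pick the first separator present in the text; "" always matches (split into characters)
    sep = next(s for s in separators if s == "" or s in text)
    finer = separators[separators.index(sep) + 1:]
    pieces = list(text) if sep == "" else text.split(sep)

    chunks, current = [], ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Flush the pending chunk, then split the oversized piece with finer separators
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
            continue
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # the piece still fits in the current chunk
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


demo_chunks = recursive_split("para one\n\npara two is a little longer", 12)
print(demo_chunks)
```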

In [None]:
# Individual text chunks can be accessed with "page_content"

texts[0].page_content

In [None]:
# Import and instantiate OpenAI embeddings

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

In [1]:
# Turn the first text chunk into a vector with the embedding

query_result = embeddings.embed_query(texts[0].page_content)
print(query_result)
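`embed_query` returns a list of floats (1,536 of them for `text-embedding-ada-002`) produced by a learned model. To show only the shape of the idea, text in and fixed-length vector out, here is a deliberately crude stand-in: a bag-of-words count vector over a small hypothetical vocabulary. It captures word overlap, not meaning, so treat it purely as an illustration:

```python
import re


def bag_of_words_vector(text: str, vocabulary: list[str]) -> list[float]:
    """Crude stand-in for an embedding: count how often each vocabulary word occurs."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [float(tokens.count(word)) for word in vocabulary]


toy_vocabulary = ["autoencoder", "compress", "data", "image", "reconstruct"]
vector = bag_of_words_vector("An autoencoder learns to compress data and reconstruct data.", toy_vocabulary)
print(vector)
```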

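In [None]:
# Import the Pinecone client and LangChain's Pinecone vector store
# (needs the `langchain-pinecone` package; pinecone-client v3+ replaced pinecone.init() with a client object)

import os
from pinecone import Pinecone as PineconeClient
from langchain_pinecone import PineconeVectorStore

pc = PineconeClient(api_key=os.getenv("PINECONE_API_KEY"))

In [None]:
# Upload vectors to Pinecone (the index must already exist)

index_name = "langchain-quickstart"
assert index_name in pc.list_indexes().names(), f"Create the '{index_name}' index first"
search = PineconeVectorStore.from_documents(texts, embeddings, index_name=index_name)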

In [None]:
# Do a simple vector similarity search

query = "What is magical about an autoencoder?"
result = search.similarity_search(query)

print(result)
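`similarity_search` embeds the query and returns the stored chunks whose vectors are closest to it. Pinecone scores vectors with the metric chosen when the index was created (cosine, dot product, or Euclidean). A minimal in-memory sketch of cosine-similarity top-k search, using hypothetical 3-dimensional toy vectors instead of real embeddings:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], store: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k stored texts whose vectors are most similar to the query vector."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy vectors; real embeddings have far more dimensions
toy_store = {
    "autoencoders compress and reconstruct data": [0.9, 0.1, 0.0],
    "pinecone is a managed vector database": [0.0, 0.2, 0.9],
    "encoders map inputs to a latent space": [0.8, 0.3, 0.1],
}
matches = top_k([1.0, 0.2, 0.0], toy_store, k=2)
print(matches)
```

A real vector database avoids scoring every stored vector by using approximate nearest-neighbour indexes; the linear scan here is only for clarity.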

In [None]:
# Import the Python REPL tool and instantiate a Python agent
# (these helpers live in the `langchain-experimental` package)

from langchain_experimental.agents.agent_toolkits import create_python_agent
from langchain_experimental.tools import PythonREPLTool
from langchain_openai import OpenAI

agent_executor = create_python_agent(
    llm=OpenAI(temperature=0, max_tokens=1000), tool=PythonREPLTool(), verbose=True
)
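An agent runs a loop: the model chooses a tool and an input, the tool runs, and its output (the observation) is fed back until the model produces a final answer. The sketch below imitates that loop with a scripted stand-in for the model and an `eval`-based stand-in for the Python REPL tool. `scripted_model`, `python_tool`, and `run_agent` are hypothetical names, and real agents parse the LLM's text output rather than receiving tuples:

```python
import math


def python_tool(expression: str) -> str:
    """Stand-in for a Python REPL tool: evaluates one expression. No sandboxing!"""
    return str(eval(expression, {"math": math}))


def scripted_model(history: list[tuple[str, str]]) -> tuple[str, ...]:
    """Hypothetical stand-in for the LLM: first requests a tool call, then answers."""
    if history[-1][0] == "question":
        # Quadratic formula for 3x^2 + 2x - 1 with a=3, b=2, c=-1
        code = "sorted((-2 + s * math.sqrt(2**2 - 4*3*(-1))) / (2*3) for s in (1, -1))"
        return ("action", "python_repl", code)
    # An observation arrived: report it as the final answer
    return ("final", history[-1][1])


def run_agent(model, tools, question: str, max_steps: int = 5) -> str:
    """Loop: ask the model, run the chosen tool, append the observation, repeat."""
    history = [("question", question)]
    for _ in range(max_steps):
        decision = model(history)
        if decision[0] == "final":
            return decision[1]
        _, tool_name, tool_input = decision
        history.append(("observation", tools[tool_name](tool_input)))
    raise RuntimeError("agent stopped without a final answer")


answer = run_agent(scripted_model, {"python_repl": python_tool}, "Find the roots of 3x^2 + 2x - 1")
print(answer)
```

Like this sketch's `eval`, `PythonREPLTool` executes model-generated code, so run such agents only in an isolated environment.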

In [None]:
# Execute the Python agent

agent_executor.invoke(
    {"input": "Find the roots (zeros) of the quadratic function 3 * x**2 + 2*x - 1"}
)
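For reference, the quadratic formula gives the roots the agent should report for $3x^2 + 2x - 1$, with $a = 3$, $b = 2$, $c = -1$:

$$
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
  = \frac{-2 \pm \sqrt{2^2 - 4 \cdot 3 \cdot (-1)}}{2 \cdot 3}
  = \frac{-2 \pm \sqrt{16}}{6}
  = \frac{-2 \pm 4}{6}
\quad\Rightarrow\quad x = \tfrac{1}{3} \ \text{or} \ x = -1
$$

Substituting back confirms both: $3 \cdot \tfrac{1}{9} + \tfrac{2}{3} - 1 = 0$ and $3 - 2 - 1 = 0$.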