# Harness the Power of [LangChain](https://python.langchain.com/en/latest/index.html)


LangChain is an open source framework for building applications based on large language models (LLMs). LLMs are deep-learning models pre-trained on large amounts of data that generate responses to user queries, for example by answering questions or generating text from text-based prompts. LangChain provides tools and abstractions to improve the customization, accuracy, and relevancy of the information the models generate. For example, developers can use LangChain components to build new prompt chains or customize existing templates. LangChain also includes components that let LLMs access new datasets without retraining.

In this notebook, you use LangChain with an LLM hosted on an Amazon SageMaker endpoint to build a basic question answering (Q&A) app, a retrieval augmented generation (RAG) Q&A app, a code generator, and a chatbot.

## Step 1. Deploy a text generation model.

In this step, you use the SageMaker Python SDK to deploy the Falcon-7B-Instruct model from SageMaker JumpStart for text generation. This permissively licensed ([Apache-2.0](https://jumpstart-cache-prod-us-east-2.s3.us-east-2.amazonaws.com/licenses/Apache-License/LICENSE-2.0.txt)) open source model builds on a base model trained on the [RefinedWeb dataset](https://huggingface.co/datasets/tiiuae/falcon-refinedweb).

In [None]:
from sagemaker.jumpstart.model import JumpStartModel

model_id = "huggingface-llm-falcon-7b-instruct-bf16"
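model = JumpStartModel(model_id=model_id, instance_type="ml.g5.2xlarge", model_version="4.6.0")
# deploy() returns a Predictor; print its endpoint name so you can use it in Step 5.
predictor = model.deploy(accept_eula=False)
print(predictor.endpoint_name)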

## Step 2. Install LangChain and other required Python modules.



In [None]:
import warnings
warnings.filterwarnings('ignore')

!pip install --upgrade pip --root-user-action=ignore
!pip install langchain==0.3.20 --quiet --root-user-action=ignore
!pip install --upgrade langchain_community==0.3.19 --quiet --root-user-action=ignore
!pip install faiss-cpu --quiet --root-user-action=ignore

Faiss is a library for efficient similarity search and clustering of dense vectors. LangChain uses it as an in-memory vector store for similarity search. To learn more, see the Faiss documentation at https://faiss.ai/index.html.
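To make the idea concrete, here is a dependency-free sketch of the exact (brute-force) k-nearest-neighbor search that a flat Faiss index such as `faiss.IndexFlatL2` performs: it ranks every stored vector by squared Euclidean distance to the query. The vectors are made-up toy data, and the helper name `knn_l2` is hypothetical; Faiss itself runs this search in optimized native code and also offers approximate indexes for larger collections.

```python
from typing import List, Tuple


def knn_l2(vectors: List[List[float]], query: List[float], k: int) -> List[Tuple[int, float]]:
    """Return (index, squared L2 distance) pairs for the k stored vectors closest to query."""
    scored = []
    for i, vec in enumerate(vectors):
        # Squared Euclidean distance, the metric a flat L2 index reports.
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((i, dist))
    # Smaller distance means more similar; sort() is stable, so ties keep insertion order.
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy 2-D "embeddings" (illustrative values only).
stored_vectors = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.2], [5.0, 5.0]]
print(knn_l2(stored_vectors, [1.0, 1.0], k=2))
```

Brute-force search compares the query with every vector, so its cost grows linearly with the size of the knowledge library; that is fine for the small FAQ dataset used here.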

## Step 3: Import the required Python modules and set up the Amazon SageMaker runtime client.

For more information about the Boto3 SageMakerRuntime client, see the documentation page at https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-runtime.html.

In [None]:
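import json
from typing import Any, Dict, List, Optional

import boto3

# Create a SageMaker runtime client. The LangChain wrappers used below create their
# own client from region_name, so this client is only needed to call endpoints directly.
session = boto3.Session()
sagemaker_runtime_client = session.client("sagemaker-runtime")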
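The `SagemakerEndpoint` wrapper used later ultimately calls the runtime client's `invoke_endpoint` operation. The following sketch makes that call directly, assuming the Falcon endpoint accepts a `{"inputs": ..., "parameters": ...}` JSON body and returns a list like `[{"generated_text": ...}]`, which is the format the content handlers in this notebook assume. The helper `invoke_llm` and the `StubRuntimeClient` class are hypothetical; the client is passed in so the function can be tried offline with the stub.

```python
import io
import json
from typing import Any, Dict


def invoke_llm(client: Any, endpoint_name: str, prompt: str, parameters: Dict[str, Any]) -> str:
    """Send one prompt to a SageMaker text-generation endpoint and return the generated text."""
    body = json.dumps({"inputs": prompt, "parameters": parameters}).encode("utf-8")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=body,
    )
    # response["Body"] is a streaming object; read() returns the raw response bytes.
    payload = json.loads(response["Body"].read().decode("utf-8"))
    return payload[0]["generated_text"]


class StubRuntimeClient:
    """Hypothetical offline stand-in for the runtime client; it echoes the prompt back."""

    def invoke_endpoint(self, **kwargs: Any) -> Dict[str, Any]:
        request = json.loads(kwargs["Body"])
        reply = [{"generated_text": "echo: " + request["inputs"]}]
        return {"Body": io.BytesIO(json.dumps(reply).encode("utf-8"))}


print(invoke_llm(StubRuntimeClient(), "demo-endpoint", "Hello", {"max_new_tokens": 20}))

# With a deployed endpoint, pass the real client instead of the stub:
# invoke_llm(sagemaker_runtime_client, "<your-endpoint-name>", "What is Amazon Bedrock?", {"max_new_tokens": 100})
```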

## Step 4: Use LangChain with an LLM hosted on a SageMaker endpoint to test a basic Q&A app. 

For more information about LangChain and SageMaker integration, see the documentation page at https://python.langchain.com/docs/integrations/llms/sagemaker.

This code prepares the pieces of a question answering chain that uses a SageMaker endpoint as the LLM provider: an example document, a prompt template for the LLM, and a custom content handler that converts prompts into the endpoint's JSON request format and extracts the generated text from its response.


In [None]:
import json
from typing import Dict

from langchain_core.documents import Document
from langchain_core.prompts import PromptTemplate
from langchain_community.llms import SagemakerEndpoint
from langchain_community.llms.sagemaker_endpoint import LLMContentHandler


example_doc_1 = """
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from 
leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon via a single 
API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, 
and responsible AI. Using Amazon Bedrock, you can easily experiment with and evaluate top FMs for your use case, 
privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation 
(RAG), and build agents that execute tasks using your enterprise systems and data sources. Since Amazon Bedrock 
is serverless, you don't have to manage any infrastructure, and you can securely integrate and deploy generative 
AI capabilities into your applications using the AWS services you are already familiar with.
"""

docs = [
    Document(
        page_content=example_doc_1,
    )
]

prompt_template = """Use the following pieces of context to answer the question at the end.

{context}

Question: {question}
Answer:"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

class ContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: Dict) -> bytes:
        # Wrap the prompt and generation settings in the JSON body the endpoint expects.
        input_str = json.dumps({"inputs": prompt, "parameters": model_kwargs})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        # The endpoint returns a list such as [{"generated_text": "..."}].
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json[0]["generated_text"]


content_handler = ContentHandler()
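The prompt template defined above fills its `{context}` and `{question}` placeholders in much the same way as Python's `str.format`. If you pass a list of `Document` objects as the context, the prompt contains each object's Python representation rather than just its text, so joining the `page_content` values first gives the LLM cleaner context. This sketch shows the difference with plain strings; `Doc` is a hypothetical minimal stand-in for LangChain's `Document`, and `join_contents` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical minimal stand-in for a LangChain Document."""
    page_content: str
    metadata: Dict = field(default_factory=dict)


def join_contents(documents: List[Doc], sep: str = "\n\n") -> str:
    """Concatenate the text of each document, separated by blank lines."""
    return sep.join(d.page_content for d in documents)


demo_template = """Use the following pieces of context to answer the question at the end.

{context}

Question: {question}
Answer:"""

demo_docs = [
    Doc("Amazon Bedrock is a fully managed service."),
    Doc("It offers foundation models through a single API."),
]
filled = demo_template.format(context=join_contents(demo_docs), question="What is Amazon Bedrock?")
print(filled)
# For contrast, formatting the list itself embeds each object's repr in the prompt.
print(demo_template.format(context=demo_docs, question="What is Amazon Bedrock?"))
```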


## Step 5: Test the Q&A chain with a sample question.

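This code initializes the LLM wrapper for your SageMaker endpoint and composes it with the prompt template into a chain (`prompt | llm`). The code then invokes the chain with the example document and a sample question and prints the result. Before you run it, replace the placeholder endpoint names with the name of the Falcon endpoint from Step 1 and the name of a JumpStart text embedding endpoint, which Step 6 uses.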


In [None]:
embedding_endpoint_name = "<replace-with-jumpstart-embedding-endpoint>"
instruct_endpoint_name = "<replace-with-falcon-instruct-endpoint>"

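# Falcon on SageMaker JumpStart is served by the Hugging Face LLM (TGI) container,
# which reads max_new_tokens rather than max_length.
parameters = {
    "max_new_tokens": 200,
    "top_k": 250,
    "top_p": 0.95,
    "do_sample": False,
    "temperature": 1,
}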

prompt = PROMPT
llm = SagemakerEndpoint(
    endpoint_name=instruct_endpoint_name,
    region_name='us-east-1',
    model_kwargs=parameters,
    content_handler=content_handler,
)
chain = prompt | llm

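question = "What is Amazon Bedrock?"
# Pass the document text (not the Document objects) so the prompt contains plain context.
context = "\n\n".join(doc.page_content for doc in docs)
result = chain.invoke({"context": context, "question": question})
print(result)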
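The expression `prompt | llm` in the cell above uses the LangChain Expression Language (LCEL): the `|` operator composes two runnables so that the output of the prompt (the formatted text) becomes the input of the LLM. The toy `Step` class below is hypothetical and only mimics that idea with Python's `__or__` method; it is not LangChain's implementation, and the pretend "model" simply uppercases its input.

```python
from typing import Any, Callable


class Step:
    """Hypothetical toy runnable: wraps a function and supports composition with |."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # Run self first, then feed its output into other.
        return Step(lambda value: other.invoke(self.invoke(value)))


toy_prompt = Step(lambda inputs: f"Context: {inputs['context']}\nQuestion: {inputs['question']}")
toy_llm = Step(lambda text: text.upper())  # pretend "model" that just uppercases its input

toy_chain = toy_prompt | toy_llm
print(toy_chain.invoke({"context": "Bedrock is serverless.", "question": "Is Bedrock serverless?"}))
```

LangChain's real runnables add batching, streaming, and async support on top of this basic composition.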

## Step 6: Use Retrieval Augmented Generation (RAG), with LangChain and SageMaker endpoints, to build a basic Q&A app.



This section uses document embeddings to fetch the most relevant documents from the document knowledge library and combines them with the prompt that is sent to the LLM.

To achieve this, you will:
1. Generate embeddings for each document in the knowledge library by using the SageMaker GPT-J-6B embedding model.
2. Identify the top K most relevant documents based on the user's query:
    - 2.1 Generate the embedding of the query by using the same embedding model.
    - 2.2 Search for the indexes of the top K most relevant documents in the embedding space by using an in-memory Faiss search.
    - 2.3 Use the indexes to retrieve the corresponding documents.
3. Combine the retrieved documents with the prompt and question, and send them to the SageMaker LLM.
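The three steps above can be sketched without any AWS resources. In this sketch, a hypothetical bag-of-words `toy_embed` function stands in for the GPT-J-6B embedding endpoint and cosine similarity ranks the documents, whereas the notebook itself uses the embedding endpoint and Faiss. The knowledge-library sentences are made up, and all helper names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def toy_embed(text: str, vocab: List[str]) -> List[float]:
    """Hypothetical stand-in embedding: word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: List[str], vocab: List[str], k: int) -> List[str]:
    # 1. Embed every document in the knowledge library.
    doc_vectors = [toy_embed(d, vocab) for d in documents]
    # 2.1 Embed the query with the same embedding function.
    query_vector = toy_embed(query, vocab)
    # 2.2 Rank document indexes by similarity to the query and keep the top K.
    ranked = sorted(range(len(documents)), key=lambda i: cosine(query_vector, doc_vectors[i]), reverse=True)
    # 2.3 Use the indexes to look up the corresponding documents.
    return [documents[i] for i in ranked[:k]]


def build_prompt(question: str, context_docs: List[str]) -> str:
    # 3. Combine the retrieved documents with the question.
    context = "\n\n".join(context_docs)
    return f"Use the following pieces of context to answer the question at the end.\n\n{context}\n\nQuestion: {question}\nAnswer:"


knowledge = [
    "sagemaker trains and deploys machine learning models",
    "bedrock offers foundation models through an api",
    "s3 stores objects in buckets",
]
vocabulary = sorted({w for d in knowledge for w in d.split()})
top_docs = retrieve("how do i deploy models with sagemaker", knowledge, vocabulary, k=1)
print(build_prompt("How do I deploy models with SageMaker?", top_docs))
```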

Note: The retrieved documents should be large enough to contain sufficient information to answer the question, but small enough to fit into the LLM prompt, which has a maximum sequence length of 1024 tokens.
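One way to respect that limit is to add retrieved documents in ranked order until an approximate token budget is reached. The sketch below assumes a whitespace word count as the token estimate, which is a rough, hypothetical approximation (the model's tokenizer usually produces more tokens than words), so in practice you would leave a safety margin below the limit. The function names are hypothetical.

```python
from typing import List


def approx_tokens(text: str) -> int:
    """Rough token estimate: whitespace-separated words (an assumption, not the model's tokenizer)."""
    return len(text.split())


def pack_context(ranked_docs: List[str], budget: int) -> List[str]:
    """Keep documents in ranked order while the running token estimate stays within budget."""
    selected: List[str] = []
    used = 0
    for doc in ranked_docs:
        cost = approx_tokens(doc)
        if used + cost > budget:
            break  # stop at the first document that would overflow, preserving rank order
        selected.append(doc)
        used += cost
    return selected


print(pack_context(["one two three", "four five", "six seven eight nine"], budget=6))
```

Stopping at the first overflow, rather than skipping ahead to smaller documents, keeps the most relevant documents at the top of the context.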

---
To build the basic Q&A app with LangChain, you must:

1. Wrap the SageMaker endpoint for the embedding model in `langchain_community.embeddings.SagemakerEndpointEmbeddings`, and wrap the endpoint for the LLM in `langchain_community.llms.SagemakerEndpoint`.

2. Prepare the dataset to build the knowledge database.

---

In [None]:
# Wrap the SageMaker endpoint for the embedding model into langchain_community.embeddings.SagemakerEndpointEmbeddings.
# This code defines a custom subclass of SagemakerEndpointEmbeddings to handle document embeddings by using a SageMaker endpoint.
# The code also defines a custom content handler for input and output formatting.

from langchain_community.embeddings import SagemakerEndpointEmbeddings 
from langchain_community.embeddings.sagemaker_endpoint import EmbeddingsContentHandler


class SagemakerEndpointEmbeddingsJumpStart(SagemakerEndpointEmbeddings):
    def embed_documents(self, texts: List[str], chunk_size: int = 5) -> List[List[float]]:
        """Compute doc embeddings using a SageMaker Inference Endpoint.

        Args:
            texts: The list of texts to embed.
            chunk_size: The maximum number of input texts that are grouped
                together in a single request to the endpoint.

        Returns:
            List of embeddings, one for each text.
        """
        results = []
        if not texts:
            return results  # Nothing to embed; also avoids a zero step in range().
        _chunk_size = min(chunk_size, len(texts))

        for i in range(0, len(texts), _chunk_size):
            response = self._embedding_func(texts[i : i + _chunk_size])
            results.extend(response)
        return results


class ContentHandler(EmbeddingsContentHandler):
    content_type = "application/json"
    accepts = "application/json"
 
    def transform_input(self, text_inputs: List[str], model_kwargs: dict) -> bytes:
        """
        Transforms the input into bytes that can be consumed by SageMaker endpoint.
        Args:
            text_inputs (list[str]): A list of input text strings to be processed.
            model_kwargs (Dict): Additional keyword arguments to be passed to the endpoint.
               Possible keys and their descriptions:
               - mode (str): Inference method. Valid modes are 'embedding', 'nn_corpus', and 'nn_train_data'.
               - corpus (str): Corpus for Nearest Neighbor. Required when mode is 'nn_corpus'.
               - top_k (int): Top K for Nearest Neighbor. Required when mode is 'nn_corpus'.
               - queries (list[str]): Queries for Nearest Neighbor. Required when mode is 'nn_corpus' or 'nn_train_data'.
        Returns:
            The transformed bytes input.
        """
        input_str = json.dumps(
            {
                "text_inputs": text_inputs,
                **model_kwargs
            }
        )
        return input_str.encode("utf-8")
 
    def transform_output(self, output: bytes) -> List[List[float]]:
        """
        Transforms the bytes output from the endpoint into a list of embeddings.
        Args:
            output: The bytes output from SageMaker endpoint.
        Returns:
            The transformed output - list of embeddings
        Note:
            The length of the outer list is the number of input strings.
            The length of the inner lists is the embedding dimension.
        """
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json["embedding"]


content_handler = ContentHandler()

embeddings = SagemakerEndpointEmbeddingsJumpStart(
    endpoint_name=embedding_endpoint_name,
    region_name='us-east-1',
    model_kwargs={"mode": "embedding"},
    content_handler=content_handler,
)
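`embed_documents` in the cell above sends texts to the endpoint in batches of `chunk_size`. The sketch below reproduces that batching loop with a hypothetical stub in place of the endpoint, so you can see how many requests are made and what JSON body each batch produces in the content handler's `{"text_inputs": [...], "mode": "embedding"}` format. The stub's two-number "embeddings" are placeholders, and `embed_in_batches` and `stub_endpoint` are hypothetical names.

```python
import json
from typing import Callable, List


def embed_in_batches(
    texts: List[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    chunk_size: int = 5,
) -> List[List[float]]:
    """Call embed_batch on consecutive slices of at most chunk_size texts and concatenate the results."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    results: List[List[float]] = []
    for start in range(0, len(texts), chunk_size):
        results.extend(embed_batch(texts[start : start + chunk_size]))
    return results


sent_bodies: List[str] = []


def stub_endpoint(batch: List[str]) -> List[List[float]]:
    """Hypothetical endpoint stand-in: records the request body, returns one placeholder vector per text."""
    sent_bodies.append(json.dumps({"text_inputs": batch, "mode": "embedding"}))
    return [[float(len(text)), 0.0] for text in batch]


demo_vectors = embed_in_batches([f"doc {i}" for i in range(12)], stub_endpoint, chunk_size=5)
print(len(sent_bodies), len(demo_vectors))
```

Twelve texts with `chunk_size=5` produce three requests (5, 5, and 2 texts). Larger batches mean fewer round trips, but each request must stay within the endpoint's payload and memory limits.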

In [None]:
# This code wraps the SageMaker endpoint for the LLM into a SagemakerEndpoint object from LangChain.
# The code also defines a custom content handler for input and output formatting.

parameters = {
    "max_new_tokens": 200,  # the TGI container reads max_new_tokens, not max_length
    "top_k": 250,
    "top_p": 0.95,
    "do_sample": False,
    "temperature": 1,
}


class ContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: Dict) -> bytes:
        # Same request format as in Step 4: generation settings go under "parameters".
        input_str = json.dumps({"inputs": prompt, "parameters": model_kwargs})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json[0]["generated_text"]


content_handler = ContentHandler()

sm_llm = SagemakerEndpoint(
    endpoint_name=instruct_endpoint_name,
    region_name='us-east-1',
    model_kwargs=parameters,
    content_handler=content_handler,
)

### Step 6.1: Preprocess data
Next, prepare the example data for the demonstration. The knowledge library comes from the Amazon SageMaker FAQs at https://aws.amazon.com/sagemaker/faqs/. The data is stored in a .csv file with two columns: Question and Answer. The Answer column provides the documents of the knowledge library, from which relevant documents are retrieved based on a query.

To build your own custom Q&A app, you can replace this example dataset with your own.

If your data is split across multiple files, the following code reads every file in the `rag_data/` folder that ends with .csv and concatenates them. Make sure each .csv file has the same two-column format and no header row, because the code reads every row as data.

In [None]:
import glob
import os
import pandas as pd

all_files = glob.glob(os.path.join("rag_data/", "*.csv"))

df_knowledge = pd.concat(
    (pd.read_csv(f, header=None, names=["Question", "Answer"]) for f in all_files),
    axis=0,
    ignore_index=True,
)
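For comparison, the following standard-library sketch performs the same concatenation as the pandas cell above: it reads every headerless two-column `.csv` file in a folder and keeps only the Answer column. The function name `load_answers` is hypothetical, and the two files in the usage example are made up. Like `header=None`, it treats every row as data; unlike the pandas version, it skips rows with fewer than two columns.

```python
import csv
import glob
import os
import tempfile
from typing import List


def load_answers(folder: str) -> List[str]:
    """Read every *.csv file in folder (Question, Answer columns, no header) and return the answers."""
    answers: List[str] = []
    # sorted() makes the document order deterministic; glob's order is not guaranteed.
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.reader(f):
                if len(row) >= 2:  # skip blank or malformed rows
                    answers.append(row[1])
    return answers


# Usage with two made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for name, rows in [
        ("part1.csv", [["What is X?", "X is a service."]]),
        ("part2.csv", [["What is Y?", "Y, in short, is a tool."]]),
    ]:
        with open(os.path.join(tmp, name), "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(rows)
    demo_answers = load_answers(tmp)

print(demo_answers)
```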

Drop the Question column, which is not used in this demonstration.

In [None]:
df_knowledge.drop(["Question"], axis=1, inplace=True)
df_knowledge.head(5)

### Step 6.2: Build the Q&A app with the processed data and the LLM

In [None]:
# Set up the necessary imports for building the Q&A app.
from langchain.indexes.vectorstore import VectorstoreIndexCreator
from langchain_community.document_loaders import DataFrameLoader
from langchain_community.vectorstores.faiss import FAISS
from langchain_core.prompts import PromptTemplate
from langchain_text_splitters import CharacterTextSplitter

Use LangChain's `DataFrameLoader` to turn each row of the DataFrame into a document, with the Answer column as the document text. LangChain also has built-in document loaders for many other formats, such as .txt, .html, and .pdf. For more information, see LangChain document loaders at https://python.langchain.com/docs/integrations/document_loaders/.

In [None]:
# Load the processed data into LangChain.
loader = DataFrameLoader(df_knowledge, page_content_column="Answer")
documents = loader.load()

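Build the Q&A app with LangChain.

With `VectorstoreIndexCreator`, you can carry out the RAG steps listed at the start of Step 6 (embed the documents, retrieve the most relevant ones, and pass them to the LLM with the question) in just a few lines of code. First, define the question.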

In [None]:
question = "What is Amazon SageMaker?"

In [None]:
index_creator = VectorstoreIndexCreator(
    vectorstore_cls=FAISS,
    embedding=embeddings,
    text_splitter=CharacterTextSplitter(chunk_size=300, chunk_overlap=0),
)
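`CharacterTextSplitter` splits text on a separator (by default a blank line, `"\n\n"`) and then merges the pieces into chunks of at most `chunk_size` characters, here 300 with no overlap. The function below is a simplified, hypothetical re-creation of that merge step, meant only to show how `chunk_size` shapes the documents that get embedded; LangChain's implementation also handles overlap and other separator options.

```python
from typing import List


def simple_split(text: str, chunk_size: int, separator: str = "\n\n") -> List[str]:
    """Split on separator, then greedily merge pieces while each chunk stays within chunk_size characters."""
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Either the piece fits, or it starts a new chunk on its own
            # (a single piece longer than chunk_size is kept whole).
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


demo_chunks = simple_split("aaaa\n\nbbbb\n\ncccc", chunk_size=10)
print(demo_chunks)
```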

In [None]:
index = index_creator.from_loaders([loader])

In [None]:
index.query(question=question, llm=sm_llm)

## Step 7. Customize the previous Q&A app with a different prompt.

You have seen how quickly you can use LangChain to create a question answering application with just a few lines of code. Next, break down the previous `VectorstoreIndexCreator` steps to see what's happening under the hood, and use a custom prompt instead of the default prompt that `VectorstoreIndexCreator` uses.

First, generate embeddings for each document in the knowledge library by using the SageMaker GPT-J-6B embedding model.

In [None]:
docsearch = FAISS.from_documents(documents, embeddings)

In [None]:
question

For that question, identify the top K most relevant documents, where K = 3 in this setup.

In [None]:
docs = docsearch.similarity_search(question, k=3)

Print the top three most relevant documents, as shown below.

In [None]:
for n, doc in enumerate(docs):
    print(f"{n}: {doc.page_content} \n")

Lastly, combine the retrieved documents with the prompt and question and send them to the SageMaker LLM.

Define a custom prompt, as shown below.

In [None]:
prompt_template = """Answer based on context:\n\n{context}\n\n{question}"""

PROMPT = PromptTemplate(template=prompt_template, input_variables=["context", "question"])

In [None]:
prompt = PROMPT
chain = prompt | llm

Send the top three most relevant documents and the question to the LLM to get an answer.

In [None]:
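# Join the retrieved documents' text to form the context for the prompt.
context = "\n\n".join(doc.page_content for doc in docs)
result = chain.invoke({"context": context, "question": question})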

Display the final answer from the LLM, and check that it is grounded in the three retrieved documents.

In [None]:
result

## Step 8: Apply additional use cases.

With LangChain and an LLM, you can create simple custom tools, such as code generators and chatbots.

### Step 8.1: Create a code generator with LangChain and an LLM.

The following code snippet outlines the setup for automated code generation using LangChain and an LLM.

- The code imports the necessary components from LangChain and uses the SageMaker LLM defined earlier.
- A prompt template is defined to guide the LLM in writing Python functions based on task descriptions.
- A `PromptTemplate` instance is created, configuring how the task description is inserted into the prompt.
- The prompt and the LLM are composed into a chain with the `|` operator, ready to generate code.
- Invoking the chain with a task description generates the code, turning a plain-language description into a Python function.
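LLM output often wraps code in Markdown fences or surrounds it with explanation, so it helps to extract the code and check that it parses before you use it. The helpers below are a hypothetical sketch: `extract_code` takes the first fenced block if there is one (otherwise the whole response), and `syntax_error` validates the result with Python's `ast.parse`, which checks syntax without running the code. Review generated code before you execute it.

```python
import ast
import re
from typing import Optional

FENCE = "`" * 3  # three backticks, built programmatically

# Matches an opening fence with an optional "python" tag, then captures up to the closing fence.
CODE_BLOCK = re.compile(re.escape(FENCE) + r"(?:python)?[ \t]*\n(.*?)" + re.escape(FENCE), re.DOTALL)


def extract_code(response: str) -> str:
    """Return the body of the first fenced code block, or the whole response if there is none."""
    match = CODE_BLOCK.search(response)
    return match.group(1).strip("\n") if match else response.strip()


def syntax_error(code: str) -> Optional[str]:
    """Return None if the code parses, otherwise a short description of the syntax error."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


sample_response = (
    "Here is the function:\n"
    + FENCE + "python\n"
    + "def sum_even(numbers):\n"
    + "    return sum(n for n in numbers if n % 2 == 0)\n"
    + FENCE
)
demo_code = extract_code(sample_response)
print(demo_code)
print(syntax_error(demo_code))  # None means the extracted code is valid Python syntax
```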

In [None]:
from langchain_core.prompts import PromptTemplate

# Define the SageMaker LLM
llm = sm_llm

# Define the prompt template
prompt_template = """
Write a Python function that {task_description}.
"""

# Create the prompt template instance
prompt = PromptTemplate(
    input_variables=["task_description"],
    template=prompt_template,
)

# Compose the prompt and the LLM into a chain
llm_chain = prompt | llm

# Generate code based on the task description
task_description = "takes a list of numbers and returns the sum of all even numbers"
output = llm_chain.invoke({"task_description": task_description})
print(output)

### Step 8.2: Create a chatbot with LangChain and an LLM.

This code snippet shows how to set up a conversation chain with memory, using LangChain and an LLM.

- The code imports `ConversationBufferMemory` and `ConversationChain` from LangChain and creates a `SagemakerEndpoint` LLM with generation parameters that favor focused, near-deterministic answers.
- The code creates a memory buffer that stores the context of the conversation, so past interactions can be recalled and used to inform the ongoing dialogue.
- The final step constructs the `ConversationChain`, which combines the LLM and the memory buffer.

With this setup, each response can draw on earlier turns of the dialogue, which makes the conversation more coherent and context-aware.
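Conceptually, buffer memory keeps the full transcript, and the conversation chain inserts it into the prompt before each new input. The sketch below is a simplified, hypothetical re-creation of that idea: `ToyBufferMemory` and `build_chat_prompt` are illustrative names, and LangChain's actual default prompt wording differs. It shows why the prompt grows with every turn, which matters given the endpoint's limited input length.

```python
from typing import List, Tuple


class ToyBufferMemory:
    """Hypothetical simplified buffer memory: keeps every turn and renders a transcript."""

    def __init__(self, system: str = ""):
        self.turns: List[Tuple[str, str]] = []
        if system:
            self.turns.append(("System", system))

    def add_turn(self, human: str, ai: str) -> None:
        self.turns.append(("Human", human))
        self.turns.append(("AI", ai))

    def render(self) -> str:
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)


def build_chat_prompt(memory: ToyBufferMemory, new_input: str) -> str:
    # Full history first, then the new human turn, then an open slot for the AI reply.
    return f"{memory.render()}\nHuman: {new_input}\nAI:"


demo_memory = ToyBufferMemory(system="You are a helpful professional assistant.")
demo_memory.add_turn("What is SageMaker?", "A managed ML service.")
print(build_chat_prompt(demo_memory, "Does it host LLMs?"))
```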

In [None]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_core.messages import SystemMessage

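# Near-greedy settings for focused answers. The TGI container reads max_new_tokens
# (not max_length) and requires temperature to be strictly positive, so temperature
# is omitted instead of being set to 0.
parameters = {
    "max_new_tokens": 200,
    "top_k": 10,
    "top_p": 0.01,
    "do_sample": False,
}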

sm_llm = SagemakerEndpoint(
    endpoint_name=instruct_endpoint_name,
    region_name='us-east-1',
    model_kwargs=parameters,
    content_handler=content_handler,
)

# Use the Falcon model hosted on the SageMaker endpoint as the LLM
llm = sm_llm

# Initialize the memory
memory = ConversationBufferMemory(memory_key="history", return_messages=False)

# Add a system message that is included in the conversation history
memory.chat_memory.add_message(SystemMessage(content="You are a helpful professional assistant. Respond to the question only."))

# Create the conversation chain with memory
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=False
)

In [None]:
def chat_with_ai():
    print("Hi! I'm an AI assistant. How can I help you today?")
    try:
        while True:
            human_input = input("Human: ").strip()
            if human_input.lower() == 'exit':
                print("Assistant: Goodbye!")
                break  # Exit the loop if the user types 'exit'
            
            # Process the input through the conversation chain; the reply is under the "response" key.
            response = conversation.invoke({"input": human_input})["response"]
            
            # Print the AI's response.
            print(f"Assistant: {response}")
    except KeyboardInterrupt:
        print("\nAssistant: Goodbye!")

if __name__ == "__main__":
    chat_with_ai()