# RAG (Retrieval Augmented Generation) and Chatbot

## Pre-Requisites

### Deploy the LLM (VARCO) in SageMaker
Make sure that you have run the notebook 1_deploy-varco_model_13_IST.ipynb.

In [None]:
%store -r endpoint_name

In [None]:
import os
os.environ["VARCO_ENDPOINT"]=endpoint_name
print(os.environ["VARCO_ENDPOINT"])

In [None]:
!apt update

In [None]:
!apt install wkhtmltopdf -y

### Install the libraries needed for this run
These are listed in requirements.txt. Alternatively, run the cells below to control exactly which libraries are installed.

In [None]:
!pip install --upgrade pip

In [None]:
!pip install langchain==0.0.161 --quiet

In [None]:
# !pip install chromadb==0.3.21 --quiet

In [None]:
!pip install langchain==0.0.161 boto3 html2text jinja2 --quiet

In [None]:
!pip install faiss-cpu==1.7.4 --quiet

In [None]:
!pip install pypdf==3.8.1 --quiet

In [None]:
!pip install transformers==4.24.0 --quiet

In [None]:
!pip install sentence_transformers==2.2.2 --quiet

In [None]:
!pip install pdfkit

In [None]:
import sentence_transformers 
sentence_transformers.__version__

In [None]:
print("all libraries installed")

### Import statements for our chain and indexers. We are not using any explicit agent here

In [None]:
#from aws_langchain.kendra_index_retriever import KendraIndexRetriever
import json
import os
import sys
import time
from typing import Any, Dict, List, Optional

import boto3
import sagemaker
from sagemaker import image_uris, model_uris, script_uris, hyperparameters
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.session import Session
from sagemaker.utils import name_from_base

from langchain import SagemakerEndpoint
from langchain.chains import ConversationalRetrievalChain
from langchain.embeddings import SagemakerEndpointEmbeddings
from langchain.llms.sagemaker_endpoint import ContentHandlerBase
from langchain.prompts import PromptTemplate

In [None]:
import sagemaker
import boto3
import jinja2
role = sagemaker.get_execution_role()  # execution role for the endpoint

### [Optional] Deploy a GPT-J embeddings model to generate the embeddings for the documents

This step is skipped in this notebook.

### Use HuggingFaceEmbeddings in the workshop setting
If you are in a workshop, use the code below. If you are using the GPT-J model to generate the embeddings, comment out the cell below.

In [None]:
from langchain.embeddings import HuggingFaceEmbeddings
from typing import Any, Dict, List, Optional
from pydantic import BaseModel, Extra, Field
from langchain.embeddings.base import Embeddings
import numpy as np

model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {'device': 'cpu'}


class CustomHFEmbeddings(HuggingFaceEmbeddings):
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Compute doc embeddings using a HuggingFace transformer model.

        Args:
            texts: The list of texts to embed.

        Returns:
            List of embeddings, one for each text.
        """
        texts = list(map(lambda x: x.replace("\n", " "), texts))
        embeddings = self.client.encode(texts, **self.encode_kwargs)
        # embeddings has shape (number of texts, embedding dimension)
        print(f"CustomHFEmbeddings::embed_documents::shape:returned -- > {embeddings.shape}:")
        
        return embeddings.tolist()

    def embed_query(self, text: str) -> List[float]:
        """Compute query embeddings using a HuggingFace transformer model.

        Args:
            text: The text to embed.

        Returns:
            Embeddings for the text.
        """
        text = text.replace("\n", " ")
        embedding = self.client.encode(text, **self.encode_kwargs)
        print(f"CustomHFEmbeddings::QUERY::shape:returned -- > {embedding.shape}:")
        return embedding.tolist()

hf_embeddings = CustomHFEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

### Test the VARCO model
Test the VARCO 13B IST model by asking it a sample question.

In [None]:
sagemaker_session = sagemaker.Session()
runtime = boto3.client("runtime.sagemaker")

model_name = endpoint_name
content_type = "application/json"

In [None]:
prompt = {"text":"Answer this question below, How can it help me?"}
print(f"Question being asked is -- > {prompt}:")

response = runtime.invoke_endpoint(
    EndpointName=model_name,
    ContentType=content_type,
    Accept="application/json",
    Body=json.dumps(prompt)
)

json.load(response["Body"])

## Section 2: Use LangChain

We will follow this pattern for the rest of the section:

<li>Exploring vector databases
<li>Basics of QA, exploring simple chains
<li>Basics of a chatbot
<li>Moving on to prompt templates
<li>Exploring chains

### Explore vector databases and create the embeddings

Use the SageMaker GPT-J embeddings model or an equivalent, such as the HuggingFace embeddings defined above.

#### Retrieve from file-based documents using embeddings

Run the cell below to preview the dataset.

In [None]:
import glob
import os
import pandas as pd

all_files = glob.glob(os.path.join("rag_data/", "*.csv"))

df_knowledge = pd.concat(
    (pd.read_csv(f, header=None, names=["Question", "Answer"]) for f in all_files),
    axis=0,
    ignore_index=True,
)

# Drop the Question column, keeping only the answers
df_answer = df_knowledge.drop(["Question"], axis=1)

print(df_knowledge.shape)
df_knowledge.head(2)

#### Convert HTML to PDF
Create the local temp.html file.

In [None]:
!pip install weasyprint

In [None]:
from bs4 import BeautifulSoup
import requests

# URL of the Korean-language FAQ page
url = 'https://aws.amazon.com/ko/sagemaker/faqs'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Add a style tag with desired font
font_tag = soup.new_tag('style')
font_tag.string = "@import url('https://fonts.googleapis.com/css2?family=Noto+Sans+KR&display=swap'); body { font-family: 'Noto Sans KR', sans-serif; }"
soup.head.append(font_tag)

# Save the modified HTML to a temporary file (UTF-8 so the Korean text is written correctly).
with open('rag_data/temp.html', 'w', encoding='utf-8') as f:
    f.write(str(soup))


Convert the temp.html file to Amazon_SageMaker_FAQs.pdf.

In [None]:
# Generate the PDF from the modified HTML.

from weasyprint import HTML

HTML('rag_data/temp.html').write_pdf('rag_data/Amazon_SageMaker_FAQs.pdf')



## <i>The logic for generating the Korean PDF needs more thought. For now, the SageMaker FAQ (Korean) PDF is created manually and uploaded to rag_data.</i>

In [None]:
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import Chroma, AtlasDB, FAISS
from langchain.text_splitter import CharacterTextSplitter
from langchain import PromptTemplate
from langchain.chains.question_answering import load_qa_chain
from langchain.document_loaders.csv_loader import CSVLoader

In [None]:
import time
import sagemaker, boto3, json
from sagemaker.session import Session
from sagemaker.model import Model
from sagemaker import image_uris, model_uris, script_uris, hyperparameters
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base
from typing import Any, Dict, List, Optional
from langchain.embeddings import SagemakerEndpointEmbeddings
from langchain.llms.sagemaker_endpoint import ContentHandlerBase

#### Create the embeddings for document search

In [None]:
from langchain.indexes import VectorstoreIndexCreator

#### Vector store indexer. 

This is what stores and matches the embeddings. This notebook showcases Chroma and FAISS; both stores are transient and held in memory. The VectorStore APIs are documented [here](https://python.langchain.com/en/harrison-docs-refactor-3-24/reference/modules/vectorstore.html).

We will use our own custom implementation of SageMaker embeddings, which needs a reference to the SageMaker endpoint that serves the model returning the embeddings. FAISS or Chroma stores these embeddings in memory and uses them whenever the user runs a query.
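
To make the "stores and matches" idea concrete, here is a minimal sketch of an in-memory vector store in plain Python. It is not how FAISS or Chroma are implemented: `ToyVectorStore` and `toy_embed` are hypothetical names, the keyword-count embedding is a stand-in for a real embedding model, and ranking by L2 distance is an assumption made for illustration.

```python
import math
from typing import Callable, List, Tuple


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (text, vector) pairs and ranks them by L2 distance."""

    def __init__(self, embed: Callable[[str], List[float]]):
        self.embed = embed
        self.items: List[Tuple[str, List[float]]] = []

    def add_texts(self, texts: List[str]) -> None:
        # Embed each text once, at indexing time.
        for text in texts:
            self.items.append((text, self.embed(text)))

    def similarity_search(self, query: str, k: int = 1) -> List[str]:
        # Embed the query and return the k texts whose vectors are closest to it.
        q = self.embed(query)
        ranked = sorted(self.items, key=lambda item: math.dist(q, item[1]))
        return [text for text, _ in ranked[:k]]


def toy_embed(text: str) -> List[float]:
    """Stand-in embedding: counts of a few keywords (a real model returns dense vectors)."""
    words = text.lower().split()
    return [float(words.count(w)) for w in ("sagemaker", "pricing", "training")]


store = ToyVectorStore(toy_embed)
store.add_texts([
    "SageMaker training jobs run on managed instances",
    "Pricing depends on instance type and usage",
])
top = store.similarity_search("how does pricing work", k=1)
print(top)
```

The real stores follow the same two-phase pattern: embed the documents once when indexing, then embed each query and look up its nearest neighbours.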

#### Use LangChain to leverage a SageMaker LLM 

Let's break down the above VectorstoreIndexCreator and see what's happening under the hood. Furthermore, we will see how to incorporate a customized prompt rather than the default prompt used by VectorstoreIndexCreator.

First, we generate embeddings for each document in the knowledge library with the SageMaker embedding model.
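
As a rough illustration of what a customized prompt involves, the sketch below fills a template with retrieved text and the user's question using plain string formatting. The template wording and the `build_prompt` helper are hypothetical; in LangChain the same role is played by a `PromptTemplate` with `context` and `question` input variables.

```python
from typing import List

# Hypothetical template; the wording is illustrative, not the notebook's actual prompt.
PROMPT_TEMPLATE = (
    "Use the following context to answer the question. "
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question: str, retrieved_chunks: List[str]) -> str:
    """Join the retrieved chunks and substitute them into the template."""
    context = "\n---\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt_text = build_prompt(
    "What is Amazon SageMaker?",
    ["SageMaker is a managed ML service.", "It supports training and hosting."],
)
print(prompt_text)
```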


In [None]:
from langchain.llms.sagemaker_endpoint import SagemakerEndpoint
from langchain.llms.sagemaker_endpoint import LLMContentHandler
import ast

"""
parameters = {
    "max_length": 200,
    "num_return_sequences": 1,
    "top_k": 250,
    "top_p": 0.95,
    "do_sample": False,
    "temperature": 1,
}
"""

parameters = {
    "request_output_len": 256,
    "repetition_penalty": 1.15,
    "temperature": 0.1,
    "top_k": 50,
    "top_p": 1.0
}


MAX_CHARACTER_TRUNCATION=10000 # at 20k it produced garbage results

class ContentHandlerSMLMI(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs={}) -> bytes:
        #input_str = json.dumps({"text_inputs": prompt, **model_kwargs})
        print(f"ContentHandlerSMLMI::LangChain:::LEN:input_str={len(prompt)}:: will truncate if > {MAX_CHARACTER_TRUNCATION}::")
        if len(prompt) > MAX_CHARACTER_TRUNCATION:
            prompt=prompt[:MAX_CHARACTER_TRUNCATION]
        input_str = json.dumps({"text_inputs": prompt, **model_kwargs})
        #print(f"ContentHandlerSMLMI::LangChain:::LEN:input_str={len(input_str)}::")
        return input_str.encode("utf-8")

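    def transform_output(self, output: bytes) -> str:
        # `output` is the streaming response body, so read and decode it before parsing.
        response_json_dict = json.loads(output.read().decode("utf-8"))
        print(f"ContentHandlerSMLMI::LangChain::output={response_json_dict}:")
        # Return the first generated text stored under the first key of the response.
        return response_json_dict[list(response_json_dict.keys())[0]][0]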


content_handler_sm_llm = ContentHandlerSMLMI()
session = boto3.Session()
boto3_sm_client = boto3.client(
    "sagemaker-runtime"
    # **boto3_kwargs
)
print(boto3_sm_client)


sm_llm = SagemakerEndpoint(
    client = boto3_sm_client,
    endpoint_name=os.environ["VARCO_ENDPOINT"],
    region_name='us-west-2',
    model_kwargs=parameters,
    content_handler=content_handler_sm_llm,
)

print(f"SageMaker LLM created at {sm_llm}::")
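
The content handler's round trip can be tried without calling an endpoint. The sketch below mirrors the two transforms as standalone functions and feeds them a fake response. The response shape (a dictionary whose first key maps to a list of generated strings) and the `result` key are assumptions for illustration, and `MAX_CHARS` is set far lower than the notebook's limit so the truncation is visible.

```python
import io
import json

MAX_CHARS = 20  # deliberately tiny for the demo; the notebook truncates at 10000


def transform_input(prompt: str, model_kwargs: dict) -> bytes:
    """Truncate the prompt, then serialize it together with the generation parameters."""
    body = {"text_inputs": prompt[:MAX_CHARS], **model_kwargs}
    return json.dumps(body).encode("utf-8")


def transform_output(output) -> str:
    """Read the streaming body and return the first item under the first key."""
    payload = json.loads(output.read().decode("utf-8"))
    first_key = next(iter(payload))
    return payload[first_key][0]


request_body = transform_input("x" * 50, {"temperature": 0.1})
sent = json.loads(request_body)

# Assumed response shape: {"<some key>": ["<generated text>", ...]}.
fake_response = io.BytesIO(json.dumps({"result": ["hello"]}).encode("utf-8"))
answer = transform_output(fake_response)
print(len(sent["text_inputs"]), answer)
```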

#### Load the data from our document source

We then feed the documents into the vector store to create the embeddings, using loaders like the ones described [here](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/directory_loader.html). First we will try the SageMaker FAQ PDF document and also the IRS PDF files.

We will create three loaders and three document sets after splitting them: the first for the Amazon FAQ, the second for some of the IRS PDFs, and the third for a random example. Text files use a separate loader (a text loader rather than a PDF loader).
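
Splitting is what turns a handful of loaded pages into many small chunks. The sketch below is a simplified approximation of character-based splitting with a chunk size of 300 and no overlap. It is not LangChain's actual implementation, and `split_text` is a hypothetical helper.

```python
from typing import List


def split_text(text: str, chunk_size: int = 300, separator: str = "\n\n") -> List[str]:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut.
    """
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # The piece still fits (or it starts a new chunk), so keep accumulating.
            current = candidate
        else:
            # Adding the piece would overflow: close the current chunk and start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "\n\n".join(["A" * 120, "B" * 120, "C" * 120, "D" * 50])
chunks = split_text(sample, chunk_size=300)
print([len(c) for c in chunks])
```

Smaller chunks give more precise retrieval hits but less context per hit, so the chunk size is worth tuning for each document set.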

In [None]:
from langchain.document_loaders import TextLoader
from langchain.document_loaders.csv_loader import CSVLoader

from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("rag_data/Amazon_SageMaker_FAQs.pdf")
documents_aws = loader.load() # -- gives 2 docs
documents_split = loader.load_and_split() # - gives 22 docs

vectorstore_faiss_aws = FAISS.from_documents(
    CharacterTextSplitter(chunk_size=300, chunk_overlap=0).split_documents(documents_aws), 
    hf_embeddings, 
    #k=1
    #**k_args
)

#### VectorStore as FAISS

You can read about the [FAISS](https://arxiv.org/pdf/1702.08734.pdf) in-memory vector store in the linked paper. For our example, the usage is the same whichever vector store is chosen.

Chroma

[Chroma](https://www.trychroma.com/) is a super simple vector search database. The core-API consists of just four functions, allowing users to build an in-memory document-vector store. By default Chroma uses the Hugging Face transformers library to vectorize documents.

Weaviate

[Weaviate](https://github.com/weaviate/weaviate) is a polished tool: it not only offers a GraphQL API with support for vector search, but also allows users to vectorize their content using Weaviate's built-in modules or custom modules.

In [None]:
%%time
import glob
import os

import pandas as pd
from langchain.chains.question_answering import load_qa_chain
from langchain.document_loaders import DirectoryLoader, PyPDFLoader, TextLoader
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain.vectorstores import Chroma, AtlasDB, FAISS

k_args = {"k": 1}
# Split the documents into chunks (what the index creator does internally),
# then build the FAISS vector store from those chunks.
vectorstore_faiss_aws = FAISS.from_documents(
    CharacterTextSplitter(chunk_size=300, chunk_overlap=0).split_documents(documents_aws), 
    hf_embeddings, 
    #k=1
    #**k_args
)

wrapper_store_faiss = VectorStoreIndexWrapper(vectorstore=vectorstore_faiss_aws)

#### First way of running the query: high-level abstraction

Use VectorStoreIndexWrapper, the object that VectorstoreIndexCreator returns. It wraps RetrievalQA and provides a high-level API for generating the response. We will explore the underlying APIs below.
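
Conceptually, the high-level call does three things: retrieve relevant text, place it in a prompt, and send that prompt to the LLM. The sketch below shows that flow with stubs. `answer_question`, `fake_retrieve` and `fake_llm` are hypothetical stand-ins for the wrapper, the FAISS retriever and the SageMaker endpoint, and the prompt wording is illustrative.

```python
from typing import Callable, List


def answer_question(
    question: str,
    retrieve: Callable[[str], List[str]],
    llm: Callable[[str], str],
) -> str:
    """Retrieve context, stuff it into one prompt, and hand the prompt to the LLM."""
    context = "\n".join(retrieve(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt)


seen_prompts: List[str] = []


def fake_retrieve(question: str) -> List[str]:
    # Stub retriever: always returns the same chunk.
    return ["Amazon SageMaker is a fully managed machine learning service."]


def fake_llm(prompt: str) -> str:
    # Stub LLM: records the prompt it was given and returns a fixed string.
    seen_prompts.append(prompt)
    return "stub answer"


result = answer_question("What is Amazon SageMaker?", fake_retrieve, fake_llm)
print(result)
```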

In [None]:
#query="Simplified method for business use of home deduction"
# Korean for "What is Amazon SageMaker?"
query="Amazon SageMaker 란 무엇인가"

In [None]:
wrapper_store_faiss.query(question=query, llm=sm_llm)