<a href="https://colab.research.google.com/github/jyoti-jha/openai-training/blob/main/Langchain_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LangChain


Installing Helper Packages

In [None]:
# Install LangChain with the OpenAI, community, and Chroma integrations, plus the PDF parser
!pip install -qU langchain langchain-openai langchain-community langchain-core langchain-chroma langchain-text-splitters pypdf



In [None]:
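import os
from getpass import getpass

from langchain_openai import AzureChatOpenAI

# Read the API key from the environment, or prompt for it, rather than hard-coding it.
api_key = os.environ.get("AZURE_OPENAI_API_KEY") or getpass("Azure OpenAI API key: ")
model = AzureChatOpenAI(
    azure_endpoint="https://eygroup02.openai.azure.com/",
    azure_deployment="Gropu2Lang",
    openai_api_version="2024-06-01",
    api_key=api_key,
)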

In [None]:
res = model.invoke("Hi! What day is today")
res.content

'Today is Wednesday.'

In [None]:
!ls

sample_data  Telco-Customer-Churn.csv  Telco_Customer_Churn_Prediction.pdf


Load and Prepare PDF

In [None]:
from langchain_community.document_loaders import PyPDFLoader

pdf_file = "Telco_Customer_Churn_Prediction.pdf"
loader = PyPDFLoader(pdf_file)
docs = loader.load()

In [None]:
print("len:",len(docs),"\n","content:",docs[0].page_content[0:100])

len: 9 
 content: Highlights in Science, Engineering and Technology  SDPIT  2024 
Volume 92 (2024)  
 
218 Telco Custo


### Configuring Azure OpenAI
Note that the embeddings model and the chat-completion model are served from different endpoints.

In [None]:
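import os
from getpass import getpass

AZURE_OPENAI_ENDPOINT = "https://eygroup02.openai.azure.com/"
AZURE_OPENAI_EMBEDDINGS_ENDPOINT = "https://eygroup02.openai.azure.com/openai/deployments/Group2-ada/embeddings?api-version=2023-05-15"
AZURE_OPENAI_DEPLOYMENT_NAME = "Gropu2Lang"
AZURE_OPENAI_API_VERSION = "2024-06-01"
# Never hard-code the key in a shared notebook; reuse the environment variable or prompt for it.
AZURE_OPENAI_API_KEY = os.environ.get("AZURE_OPENAI_API_KEY") or getpass("Azure OpenAI API key: ")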

In [None]:
import os
os.environ["AZURE_OPENAI_ENDPOINT"] = AZURE_OPENAI_ENDPOINT
os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"] = AZURE_OPENAI_DEPLOYMENT_NAME
os.environ["AZURE_OPENAI_API_VERSION"] = AZURE_OPENAI_API_VERSION
os.environ["AZURE_OPENAI_EMBEDDINGS_ENDPOINT"] = AZURE_OPENAI_EMBEDDINGS_ENDPOINT
os.environ["AZURE_OPENAI_API_KEY"] = AZURE_OPENAI_API_KEY

In [None]:
from langchain_openai import AzureChatOpenAI
model = AzureChatOpenAI(
    azure_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_deployment = os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
    openai_api_version = os.environ["AZURE_OPENAI_API_VERSION"]
)

### Preprocess PDF

In [None]:
from langchain_chroma import Chroma
from langchain_openai import AzureOpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

Split the text into chunks, create embeddings, and store them in the vector store (load, chunk, and index).

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
embeddings = AzureOpenAIEmbeddings(azure_endpoint=os.environ["AZURE_OPENAI_EMBEDDINGS_ENDPOINT"])
print(embeddings)
vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)

retriever = vectorstore.as_retriever()

client=<openai.resources.embeddings.Embeddings object at 0x7f1be932b190> async_client=<openai.resources.embeddings.AsyncEmbeddings object at 0x7f1be93505e0> model='text-embedding-ada-002' dimensions=None deployment=None openai_api_version='2023-05-15' openai_api_base=None openai_api_type='azure' openai_proxy='' embedding_ctx_length=8191 openai_api_key=SecretStr('**********') openai_organization=None allowed_special=None disallowed_special=None chunk_size=2048 max_retries=2 request_timeout=None headers=None tiktoken_enabled=True tiktoken_model_name=None show_progress_bar=False model_kwargs={} skip_empty=False default_headers=None default_query=None retry_min_seconds=4 retry_max_seconds=20 http_client=None http_async_client=None check_embedding_ctx_length=True azure_endpoint='https://eygroup02.openai.azure.com/openai/deployments/Group2-ada/embeddings?api-version=2023-05-15' azure_ad_token=None azure_ad_token_provider=None validate_base_url=True
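
To make the `chunk_size` and `chunk_overlap` settings concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line breaks, which this sketch ignores. The function name `sliding_window_chunks` is hypothetical and not part of LangChain.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


# Illustrative input: 2500 characters of repeating digits.
sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_window_chunks(sample, chunk_size=1000, chunk_overlap=200)
print([len(c) for c in chunks])
# The tail of one chunk is repeated at the head of the next.
print(chunks[0][-200:] == chunks[1][:200])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.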


### Create RAG Pipeline

Import the helper functions for the RAG pipeline

In [None]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

Prompt 1 -
```Contextualize the user's query based on chat history```

In [None]:
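context_system_prompt = (
    "Given a chat history and the latest user question, "
    "which might reference context in the chat history, "
    "formulate a standalone question that can be understood "
    "without the chat history. Do NOT answer the question; "
    "just reformulate it if needed and otherwise return it as is."
)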



Template to guide model behaviour.

This template lets the model retain context from the chat history.

In [None]:
messages = [
    ("system", context_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
]
prompt = ChatPromptTemplate.from_messages(messages)

Use ```create_history_aware_retriever``` to build a retriever that fetches relevant context from the user's document based on both the user's query and the conversation history.

In [None]:
context_retriever = create_history_aware_retriever(
    llm=model, retriever=retriever, prompt=prompt
)
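
Conceptually, a history-aware retriever branches on whether any chat history exists: with no history, the input goes straight to the retriever; with history, the model first rewrites the input into a standalone query. The sketch below illustrates only that control flow. `history_aware_retrieve`, `fake_rewrite`, and `fake_retrieve` are hypothetical stand-ins for the LLM and the vector store, not LangChain APIs.

```python
from typing import Callable, List, Sequence, Tuple

Message = Tuple[str, str]  # (role, text), a stand-in for LangChain message objects


def history_aware_retrieve(
    user_input: str,
    chat_history: Sequence[Message],
    rewrite_query: Callable[[str, Sequence[Message]], str],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """Retrieve chunks, rewriting the query first when there is prior history."""
    if not chat_history:
        # No history: the raw input is used as the search query.
        return retrieve(user_input)
    # With history: reformulate the input into a standalone query, then search.
    standalone_query = rewrite_query(user_input, chat_history)
    return retrieve(standalone_query)


def fake_rewrite(user_input: str, chat_history: Sequence[Message]) -> str:
    # Hypothetical stand-in for the LLM call: prepend the last human message.
    last_question = [text for role, text in chat_history if role == "human"][-1]
    return f"{last_question} {user_input}"


def fake_retrieve(query: str) -> List[str]:
    # Hypothetical stand-in for the vector store: echo the query it received.
    return [f"chunk matching: {query}"]


history = [("human", "Tell me about churn prediction."), ("ai", "It estimates who will leave.")]
print(history_aware_retrieve("Which models?", [], fake_rewrite, fake_retrieve))
print(history_aware_retrieve("Which models?", history, fake_rewrite, fake_retrieve))
```

The second call searches with a query that carries the earlier topic, which is why a follow-up such as "Which models?" can still retrieve relevant chunks.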


Prompt 2 - ```To generate the final answer using the relevant context.```

In [None]:
system_prompt = (
    "You are a helpful assistant and an expert in customer churn data analysis. "
    "Use the retrieved context below to answer questions about the two primary "
    "challenges, how to survive in the market, and strategies to retain customers."
    "\n\n"
    "{context}"
)

In [None]:
system_prompt

'You are a helpful assistant and an expert in customer churn data analysis. Use the retrieved context below to answer questions about the two primary challenges, how to survive in the market, and strategies to retain customers.\n\n{context}'

Template to guide model behaviour.

This template lets the model answer the user's query using the retrieved information.

In [None]:
conversation_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

Create a ```context_chain``` that "stuffs" the retrieved documents into the language model's prompt, along with the user's query, and returns a response.

In [None]:
context_chain = create_stuff_documents_chain(
    model,
    conversation_prompt
)
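
As a rough illustration of what "stuffing" means, the sketch below joins every retrieved chunk into one string and substitutes it for the `{context}` placeholder. It assumes a blank-line separator and plain chunk text. `stuff_documents`, `SYSTEM_TEMPLATE`, and the placeholder chunks are hypothetical and only mimic the idea, not the internals of `create_stuff_documents_chain`.

```python
from typing import List

# Illustrative template; the notebook's own system prompt would be used in practice.
SYSTEM_TEMPLATE = "You are a helpful assistant.\n\n{context}"


def stuff_documents(chunks: List[str], template: str = SYSTEM_TEMPLATE, separator: str = "\n\n") -> str:
    """Join all retrieved chunks and substitute them for the {context} placeholder."""
    context = separator.join(chunks)
    return template.format(context=context)


# Placeholder chunk texts, not real retrieval results.
retrieved = ["first retrieved chunk", "second retrieved chunk"]
print(stuff_documents(retrieved))
```

Because every chunk is placed in a single prompt, this approach only works while the combined text fits within the model's context window.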

Create the RAG chain, which serves two purposes:

1 - Uses the ```context_retriever``` to find relevant text chunks from the vector database (Chroma).

2 - Passes the retrieved context and the question to the language model to generate an answer.

In [None]:
rag_chain = create_retrieval_chain(context_retriever,context_chain)
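
The retrieval step can be pictured as a nearest-neighbour search over embedding vectors. The toy sketch below ranks chunks by cosine similarity to a query vector. It is an illustration under assumptions: the two-dimensional vectors are made up, `cosine_similarity` and `top_k` are hypothetical helpers, and the metric and index Chroma actually uses depend on how the collection is configured.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], indexed: List[Tuple[str, Sequence[float]]], k: int = 4) -> List[str]:
    """Return the texts of the k indexed chunks most similar to the query vector."""
    ranked = sorted(indexed, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Made-up 2-D "embeddings"; real embedding vectors have far more dimensions.
index = [("chunk A", [1.0, 0.0]), ("chunk B", [0.0, 1.0]), ("chunk C", [0.7, 0.7])]
print(top_k([1.0, 0.1], index, k=2))
```

A real vector store avoids scoring every chunk by using an approximate index, but the ranking idea is the same.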

### Pass, Store and Update Chat History

Import packages to enable dynamic chat history support

In [None]:
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

```RunnableWithMessageHistory``` wraps the RAG chain to maintain a history of the chat conversation.

```get_session_history``` is a helper function that creates a ```ChatMessageHistory``` object for a new ```session_id``` or retrieves the existing ```ChatMessageHistory``` for a known ```session_id```.

In [None]:
store = {}


def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)
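
The get-or-create pattern behind per-session history can be shown without LangChain. In this sketch plain lists stand in for `ChatMessageHistory`, and `session_store`, `get_history`, `chat_turn`, and `fake_answer` are hypothetical names. It shows that each `session_id` accumulates its own turns and that a turn only sees the history of its own session.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, text)
session_store: Dict[str, List[Message]] = {}


def get_history(session_id: str) -> List[Message]:
    """Return the history for session_id, creating an empty one on first use."""
    return session_store.setdefault(session_id, [])


def chat_turn(session_id: str, user_input: str, answer_fn: Callable[[str, List[Message]], str]) -> str:
    """Answer using the session's prior turns, then record the new exchange."""
    history = get_history(session_id)
    answer = answer_fn(user_input, history)  # sees only earlier turns of this session
    history.append(("human", user_input))
    history.append(("ai", answer))
    return answer


def fake_answer(user_input: str, history: List[Message]) -> str:
    # Hypothetical stand-in for the RAG chain: report which turn this is.
    return f"answer #{len(history) // 2 + 1}"


print(chat_turn("session-a", "first question", fake_answer))
print(chat_turn("session-a", "follow-up", fake_answer))
print(chat_turn("session-b", "first question", fake_answer))
print({sid: len(msgs) for sid, msgs in session_store.items()})
```

An in-memory dictionary like this is lost when the kernel restarts, so a persistent store would be needed for anything beyond a demo.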

### Invoking the conversational RAG chain
Present the answer, and configure a ```session_id``` so that each conversation keeps its own chat history

In [None]:
conversational_rag_chain.invoke(
    {"input": "Which gender are retained?"},
    config={
        "configurable": {"session_id": "myuniqueid001"}
    },
)["answer"]

'The given information does not provide any data or analysis related to gender and customer retention. Therefore, it is not possible to determine which gender is more likely to be retained based on the given information.'

In [None]:
store

{'myuniqueid001': InMemoryChatMessageHistory(messages=[HumanMessage(content='Which gender are retained?'), AIMessage(content='The given information does not provide any data or analysis related to gender and customer retention. Therefore, it is not possible to determine which gender is more likely to be retained based on the given information.')])}

In [None]:
conversational_rag_chain.invoke(
    {"input": "Where this informations are coming from?"},
    config={
        "configurable": {"session_id": "myuniqueid002"}
    },
)["answer"]

'The information provided is sourced from various academic papers and articles on customer churn analysis and prediction. The specific sources are mentioned in the citations provided with each piece of information.'

In [None]:
store


{'myuniqueid001': InMemoryChatMessageHistory(messages=[HumanMessage(content='Which gender are retained?'), AIMessage(content='The given information does not provide any data or analysis related to gender and customer retention. Therefore, it is not possible to determine which gender is more likely to be retained based on the given information.')]),
 'myuniqueid002': InMemoryChatMessageHistory(messages=[HumanMessage(content='Where this informations are coming from?'), AIMessage(content='The information provided is sourced from various academic papers and articles on customer churn analysis and prediction. The specific sources are mentioned in the citations provided with each piece of information.')])}

In [None]:
conversational_rag_chain.invoke(
    {"input": "Where model is used to predict the customer churn rate?"},
    config={
        "configurable": {"session_id": "myuniqueid002"}
    },
)["answer"]

'In the study mentioned, four different models were used to predict customer churn rate: Classification Tree Model, Random Forest Model, Support Vector Machine (SVM) Model, and Logistic Regression Model. These models were employed to estimate the target churn rate and compare their efficiency and performance in predicting churn within the telecom industry.'

In [None]:
store

{'myuniqueid001': InMemoryChatMessageHistory(messages=[HumanMessage(content='Which gender are retained?'), AIMessage(content='The given information does not provide any data or analysis related to gender and customer retention. Therefore, it is not possible to determine which gender is more likely to be retained based on the given information.'), HumanMessage(content='else'), AIMessage(content='Without specific data or analysis on gender and customer retention, it is difficult to make definitive statements about which gender is more likely to be retained. Retention rates can vary based on various factors such as individual preferences, satisfaction with services, pricing, and overall customer experience. It is important to conduct a comprehensive analysis that includes gender as a variable and factors in other relevant variables to determine any potential correlations between gender and customer retention.')]),
 'myuniqueid002': InMemoryChatMessageHistory(messages=[HumanMessage(content

In [None]:
while True:
    user_input = input("Enter 'q' to quit or anything else to continue: ")
    if user_input == 'q':
        break
    # Reuse session "myuniqueid001" so the chain sees the earlier turns.
    res = conversational_rag_chain.invoke(
        {"input": user_input},
        config={"configurable": {"session_id": "myuniqueid001"}},
    )["answer"]
    print(res)


Enter 'q' to quit or anything else to continue: else
Without specific data or analysis on gender and customer retention, it is difficult to make definitive statements about which gender is more likely to be retained. Retention rates can vary based on various factors such as individual preferences, satisfaction with services, pricing, and overall customer experience. It is important to conduct a comprehensive analysis that includes gender as a variable and factors in other relevant variables to determine any potential correlations between gender and customer retention.
Enter 'q' to quit or anything else to continue: q
