### Sample Gen AI app with LangChain

In [4]:
import os
from dotenv import load_dotenv
load_dotenv()

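# Generative AI API key
GENERATIVE_AI_API_KEY = os.getenv("GENERATIVE_AI_API_KEY")

# LangChain tracing: these settings are read from environment variables,
# so the tracing flag must be set in os.environ rather than as a plain Python variable.
# LANGCHAIN_API_KEY and LANGCHAIN_PROJECT are expected to come from the .env file loaded above.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
LANGCHAIN_API_KEY = os.getenv("LANGCHAIN_API_KEY")
LANGCHAIN_PROJECT = os.getenv("LANGCHAIN_PROJECT")

from langchain_google_genai import ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", google_api_key=GENERATIVE_AI_API_KEY)  # the `model` argument is required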
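
`os.getenv` returns `None` when a variable is missing, so a forgotten `.env` entry only surfaces later as an authentication error. A minimal sketch of a fail-fast check, assuming a hypothetical helper named `require_env` (not part of LangChain or python-dotenv) and a made-up `DEMO_KEY` variable used only for the demonstration:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper for illustration; not part of any library.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check your .env file"
        )
    return value


# DEMO_KEY is a made-up variable, set here only so the example runs on its own.
os.environ["DEMO_KEY"] = "abc123"
print(require_env("DEMO_KEY"))
```

In the cell above, the same helper could wrap the lookup of `GENERATIVE_AI_API_KEY` so that a missing key stops the notebook immediately.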

In [5]:
# Data ingestion: scrape the content of this web page
# https://docs.smith.langchain.com/tutorials/Administrators/manage_spend

from langchain_community.document_loaders import WebBaseLoader
load_website = WebBaseLoader("https://docs.smith.langchain.com/tutorials/Administrators/manage_spend")
load_website

<langchain_community.document_loaders.web_base.WebBaseLoader at 0x7f38942c48b0>

In [10]:
# load the data
load_docs = load_website.load()
load_docs

[Document(metadata={'source': 'https://docs.smith.langchain.com/tutorials/Administrators/manage_spend', 'title': 'Optimize tracing spend on LangSmith | 🦜️🛠️ LangSmith', 'description': 'Before diving into this content, it might be helpful to read the following:', 'language': 'en'}, page_content='\n\n\n\n\nOptimize tracing spend on LangSmith | 🦜️🛠️ LangSmith\n\n\n\n\n\n\n\nSkip to main contentGo to API DocsSearchRegionUSEUGo to AppQuick startTutorialsAdministratorsOptimize tracing spend on LangSmithDevelopersHow-to guidesConceptsReferencePricingSelf-hostingLangGraph CloudTutorialsAdministratorsOptimize tracing spend on LangSmithOn this pageOptimize tracing spend on LangSmithRecommended ReadingBefore diving into this content, it might be helpful to read the following:Data Retention Conceptual DocsUsage Limiting Conceptual DocsnoteSome of the features mentioned in this guide are not currently available in Enterprise plan due to its\ncustom nature of billing. If you are on Enterprise plan a

In [13]:
# Split the loaded data into chunks of at most `chunk_size` characters,
# since every LLM has a limited context window
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(load_docs)

documents

[Document(metadata={'source': 'https://docs.smith.langchain.com/tutorials/Administrators/manage_spend', 'title': 'Optimize tracing spend on LangSmith | 🦜️🛠️ LangSmith', 'description': 'Before diving into this content, it might be helpful to read the following:', 'language': 'en'}, page_content='Optimize tracing spend on LangSmith | 🦜️🛠️ LangSmith'),
 Document(metadata={'source': 'https://docs.smith.langchain.com/tutorials/Administrators/manage_spend', 'title': 'Optimize tracing spend on LangSmith | 🦜️🛠️ LangSmith', 'description': 'Before diving into this content, it might be helpful to read the following:', 'language': 'en'}, page_content='Skip to main contentGo to API DocsSearchRegionUSEUGo to AppQuick startTutorialsAdministratorsOptimize tracing spend on LangSmithDevelopersHow-to guidesConceptsReferencePricingSelf-hostingLangGraph CloudTutorialsAdministratorsOptimize tracing spend on LangSmithOn this pageOptimize tracing spend on LangSmithRecommended ReadingBefore diving into this co
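
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window sketch. It is an assumption-laden illustration, not the algorithm `RecursiveCharacterTextSplitter` uses (which also tries to split on separators such as paragraph and line breaks); `chunk_text` is a hypothetical helper.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows where neighbours share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


chunks = chunk_text("abcdefghijkl", chunk_size=5, chunk_overlap=2)
print(chunks)  # ['abcde', 'defgh', 'ghijk', 'jkl']
```

Each chunk repeats the last two characters of the previous one, which is the purpose of the overlap: text cut at a chunk boundary still appears intact in at least one chunk.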

In [15]:
# Embedding model used to convert the chunks into vector embeddings
from langchain_google_genai import GoogleGenerativeAIEmbeddings
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key=GENERATIVE_AI_API_KEY)

In [18]:
# Vector store: embed the chunks and index them in FAISS
from langchain_community.vectorstores import FAISS
vectorStoreDB = FAISS.from_documents(documents=documents, embedding=embeddings)

In [19]:
query = "LangSmith has two usage limits: total traces and extended"
result = vectorStoreDB.similarity_search(query=query)
result[0].page_content

'That\'s a cost reduction of nearly 75% per day!Optimization 2: limit usage\u200bIn the previous section, we managed data retention settings to optimize existing spend. In this section, we will\nuse usage limits to prevent future overspend.LangSmith has two usage limits: total traces and extended retention traces. These correspond to the two metrics we\'ve\nbeen tracking on our usage graph. We can use these in tandem to have granular control over spend.To set limits, we navigate back to Settings -> Usage and Billing -> Usage configuration. There is a table at the\nbottom of the page that lets you set usage limits per workspace. For each workspace, the two limits appear, along\nwith a cost estimate:Lets start by setting limits on our production usage, since that is where the majority of spend comes from.Setting a good total traces limit\u200bPicking the right "total traces" limit depends on the expected load of traces that you will send to LangSmith. You should'
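
`similarity_search` embeds the query and returns the chunks whose vectors lie closest to it. A minimal sketch of that nearest-neighbour lookup, assuming Euclidean distance and toy 2-D vectors; the metric and index type FAISS actually uses depend on how the store is configured, and `top_k` is a hypothetical helper rather than a LangChain or FAISS function:

```python
import math


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Return the indices of the k vectors closest to query_vec (Euclidean distance)."""
    scored = [(math.dist(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort()  # smallest distance first
    return [idx for _, idx in scored[:k]]


# Toy "embeddings": real ones have hundreds of dimensions.
doc_vecs = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
print(top_k([0.9, 1.0], doc_vecs, k=2))  # [1, 0]
```

A brute-force scan like this is linear in the number of chunks; a library such as FAISS exists to make the same lookup fast on large collections.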

In [20]:
# Retrieval Chain, Document Chain
from langchain.chains.combine_documents import create_stuff_documents_chain


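# Chat prompt template. It declares only {context}, so the "input" value
# passed at invoke time is not inserted into the prompt.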
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_template(
    """
    Answer the following question based only on the provided context:
    <context>
    {context}
    </context>
    """
)


document_chain = create_stuff_documents_chain(llm, prompt)
document_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template='\n    Answer the following question based only on the provided context:\n    <context>\n    {context}\n    </context>\n    '), additional_kwargs={})])
| ChatGoogleGenerativeAI(model='models/gemini-1.5-flash', google_api_key=SecretStr('**********'), client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x7f38942c47f0>, async_client=<google.ai.generativelanguage_v1beta.services.generative_service.async_client.GenerativeServiceAsyncClient object at 0x7f387c86c490>, default_metadata=())
| StrOutputParser(), kwargs={}, config={'run_nam
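
Conceptually, a "stuff" chain concatenates the page content of all documents and substitutes the result into the prompt's `{context}` slot. The sketch below imitates that with plain string formatting. It assumes a blank-line separator between documents, and `Doc`, `stuff_prompt`, and the `WITH_QUESTION` template are hypothetical names introduced for illustration, not the library's implementation.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str


def stuff_prompt(template: str, docs: list[Doc], **inputs: str) -> str:
    """Join all document texts and fill them into the template's {context} slot."""
    context = "\n\n".join(doc.page_content for doc in docs)
    # str.format silently ignores keyword arguments the template does not use.
    return template.format(context=context, **inputs)


# Same shape as the notebook's template: only {context} is declared.
CONTEXT_ONLY = (
    "Answer the following question based only on the provided context:\n"
    "<context>\n{context}\n</context>"
)
# Assumed variant that also includes the question.
WITH_QUESTION = CONTEXT_ONLY + "\nQuestion: {input}"

docs = [Doc("alpha"), Doc("beta")]
print(stuff_prompt(CONTEXT_ONLY, docs, input="Which letters?"))
print(stuff_prompt(WITH_QUESTION, docs, input="Which letters?"))
```

With the context-only template the question never reaches the model, so the model can only restate or summarize the context. Adding an `{input}` placeholder, as in the assumed variant, is one way to pass the question through.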

In [22]:
from langchain_core.documents import Document

document_chain.invoke(
    {
        "input": "LangSmith has two usage limits: total traces and extended",
        "context": [Document(page_content="LangSmith has two usage limits: total traces and extended retention traces. These correspond to the two metrics we've been tracking on our usage graph. We can use these in tandem to have granular control over spend.")]
    }
)

'LangSmith has two usage limits: total traces and extended retention traces. These limits can be used together to control spending. \n'

However, we want the documents to come from a retriever backed by the vector store we just set up, rather than being passed in by hand.
That way, the retriever dynamically selects the most relevant documents for a given question and passes them to the document chain.

In [23]:
# input --->>> retriever --->>> vectorstoredb
vectorStoreDB

<langchain_community.vectorstores.faiss.FAISS at 0x7f387645f460>

In [25]:
retriever = vectorStoreDB.as_retriever()
retriever

VectorStoreRetriever(tags=['FAISS', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7f387645f460>, search_kwargs={})

In [27]:
from langchain.chains import create_retrieval_chain
retrieval_chain = create_retrieval_chain(retriever, document_chain)
retrieval_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['FAISS', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7f387645f460>, search_kwargs={}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template='\n    Answer the following question based only on the provided context:\n    <context>\n    {context}\n    </context>\n    '), additional_kwa
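
The representation above shows two assignment steps: first `context` is filled by sending `x['input']` to the retriever, then `answer` is filled by running the document chain on the enriched input. A plain-Python sketch of that data flow, with a toy keyword retriever and a stub in place of the LLM (`make_retrieval_chain`, `toy_retrieve`, `toy_answer`, and `CHUNKS` are all hypothetical names for illustration):

```python
from typing import Callable


def make_retrieval_chain(
    retrieve: Callable[[str], list[str]],
    answer: Callable[[dict], str],
) -> Callable[[dict], dict]:
    """Compose a retriever and an answering step into one callable."""
    def run(inputs: dict) -> dict:
        # Step 1: look up context using only the "input" field.
        with_context = {**inputs, "context": retrieve(inputs["input"])}
        # Step 2: compute the answer from the input plus retrieved context.
        return {**with_context, "answer": answer(with_context)}
    return run


CHUNKS = [
    "LangSmith has two usage limits: total traces and extended retention traces.",
    "Data retention settings control how long traces are stored.",
]


def toy_retrieve(question: str) -> list[str]:
    """Return chunks sharing at least one word with the question (toy stand-in for a vector search)."""
    words = set(question.lower().split())
    return [chunk for chunk in CHUNKS if words & set(chunk.lower().split())]


def toy_answer(inputs: dict) -> str:
    """Stub in place of the LLM call."""
    return f"Answered from {len(inputs['context'])} chunk(s)."


chain = make_retrieval_chain(toy_retrieve, toy_answer)
result = chain({"input": "what are the two usage limits?"})
print(list(result))
```

The returned dictionary carries the original `input`, the retrieved `context`, and the `answer`, which is why the response can be indexed by those keys.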

In [33]:
# get response from the LLM
response = retrieval_chain.invoke(
    {"input":"what are the two usage limits of LangSmith?"}
)

In [34]:
response 

{'input': 'what are the two usage limits of LangSmith?',
 'context': [Document(metadata={'source': 'https://docs.smith.langchain.com/tutorials/Administrators/manage_spend', 'title': 'Optimize tracing spend on LangSmith | 🦜️🛠️ LangSmith', 'description': 'Before diving into this content, it might be helpful to read the following:', 'language': 'en'}, page_content='limit on usage for each workspace.While this works with our usage pattern, setting good dev and staging limits may vary depending on\nyour use case with LangSmith. For example, if you run evals as part of CI/CD in dev or staging, you may'),
  Document(metadata={'source': 'https://docs.smith.langchain.com/tutorials/Administrators/manage_spend', 'title': 'Optimize tracing spend on LangSmith | 🦜️🛠️ LangSmith', 'description': 'Before diving into this content, it might be helpful to read the following:', 'language': 'en'}, page_content='That\'s a cost reduction of nearly 75% per day!Optimization 2: limit usage\u200bIn the previous s

In [36]:
response['context']

[Document(metadata={'source': 'https://docs.smith.langchain.com/tutorials/Administrators/manage_spend', 'title': 'Optimize tracing spend on LangSmith | 🦜️🛠️ LangSmith', 'description': 'Before diving into this content, it might be helpful to read the following:', 'language': 'en'}, page_content='limit on usage for each workspace.While this works with our usage pattern, setting good dev and staging limits may vary depending on\nyour use case with LangSmith. For example, if you run evals as part of CI/CD in dev or staging, you may'),
 Document(metadata={'source': 'https://docs.smith.langchain.com/tutorials/Administrators/manage_spend', 'title': 'Optimize tracing spend on LangSmith | 🦜️🛠️ LangSmith', 'description': 'Before diving into this content, it might be helpful to read the following:', 'language': 'en'}, page_content='That\'s a cost reduction of nearly 75% per day!Optimization 2: limit usage\u200bIn the previous section, we managed data retention settings to optimize existing spend.

In [35]:
response['answer']

'The provided context discusses how to optimize tracing spend on LangSmith, a platform that likely deals with large language models (LLMs) and their associated data. The text highlights two key optimization strategies:\n\n1. **Manage Data Retention:** This involves adjusting data retention settings to minimize the amount of data stored, thereby reducing costs. It mentions changing retention defaults for new projects and keeping a percentage of traces for extended data retention.\n2. **Limit Usage:** This involves setting limits on the total number of traces and extended retention traces sent to LangSmith. These limits help prevent overspending by capping the amount of data processed. \n\nThe context emphasizes the importance of setting good limits for development and staging environments, as these may have varying usage patterns compared to production environments. It also mentions that some features related to cost optimization may not be available in the Enterprise plan due to its cu