#### Simple Gen AI APP Using Langchain

In [6]:
import os
from dotenv import load_dotenv
from IPython.display import display, Markdown

load_dotenv()

os.environ['OPENAI_API_KEY']=os.getenv("OPENAI_API_KEY")
## Langsmith Tracking
os.environ["LANGCHAIN_API_KEY"]=os.getenv("LANGCHAIN_API_KEY")
os.environ["LANGCHAIN_TRACING_V2"]="true"
os.environ["LANGCHAIN_PROJECT"]=os.getenv("LANGCHAIN_PROJECT")

In [4]:
## Data Ingestion--From the website we need to scrape the data
from langchain_community.document_loaders import WebBaseLoader

In [12]:
loader=WebBaseLoader("https://docs.smith.langchain.com/administration/tutorials/manage_spend")
loader

<langchain_community.document_loaders.web_base.WebBaseLoader at 0x10e06cd90>

In [13]:
docs=loader.load()
docs



[Document(metadata={'source': 'https://docs.smith.langchain.com/administration/tutorials/manage_spend', 'title': 'Optimize tracing spend on LangSmith | \uf8ffü¶úÔ∏è\uf8ffüõ†Ô∏è LangSmith', 'description': 'Before diving into this content, it might be helpful to read the following:', 'language': 'en'}, page_content='\n\n\n\n\nOptimize tracing spend on LangSmith | \uf8ffü¶úÔ∏è\uf8ffüõ†Ô∏è LangSmith\n\n\n\n\n\n\n\n\nSkip to main contentWe are growing and hiring for multiple roles for LangChain, LangGraph and LangSmith.  Join our team!API ReferenceRESTPythonJS/TSSearchRegionUSEUGo to AppGet StartedObservabilityEvaluationPrompt EngineeringDeployment (LangGraph Platform)AdministrationTutorialsOptimize tracing spend on LangSmithHow-to GuidesSetupConceptual GuideSelf-hostingPricingReferenceCloud architecture and scalabilityAuthz and AuthnAuthentication methodsdata_formatsEvaluationDataset transformationsRegions FAQsdk_referenceChangelogCloud architecture and scalabilityAuthz and AuthnAuthentica

In [63]:
### Load Data--> Docs-->Divide our Docuemnts into chunks dcouments-->text-->vectors-->Vector Embeddings--->Vector Store DB
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter=RecursiveCharacterTextSplitter(chunk_size=5000,chunk_overlap=30)
documents=text_splitter.split_documents(docs)

In [64]:
documents

[Document(metadata={'source': 'https://docs.smith.langchain.com/administration/tutorials/manage_spend', 'title': 'Optimize tracing spend on LangSmith | \uf8ffü¶úÔ∏è\uf8ffüõ†Ô∏è LangSmith', 'description': 'Before diving into this content, it might be helpful to read the following:', 'language': 'en'}, page_content='Optimize tracing spend on LangSmith | \uf8ffü¶úÔ∏è\uf8ffüõ†Ô∏è LangSmith\n\n\n\n\n\n\n\n\nSkip to main contentWe are growing and hiring for multiple roles for LangChain, LangGraph and LangSmith.  Join our team!API ReferenceRESTPythonJS/TSSearchRegionUSEUGo to AppGet StartedObservabilityEvaluationPrompt EngineeringDeployment (LangGraph Platform)AdministrationTutorialsOptimize tracing spend on LangSmithHow-to GuidesSetupConceptual GuideSelf-hostingPricingReferenceCloud architecture and scalabilityAuthz and AuthnAuthentication methodsdata_formatsEvaluationDataset transformationsRegions FAQsdk_referenceChangelogCloud architecture and scalabilityAuthz and AuthnAuthentication metho

In [65]:
# from langchain_openai import OpenAIEmbeddings
# embeddings=OpenAIEmbeddings()

from langchain_community.embeddings import OllamaEmbeddings
embeddings = OllamaEmbeddings(model="deepseek-r1:8b")

In [66]:
from langchain_community.vectorstores import FAISS
vectorstoredb=FAISS.from_documents(documents,embeddings)

In [67]:
vectorstoredb

<langchain_community.vectorstores.faiss.FAISS at 0x10fb6b910>

In [68]:
## Query From a vector db
query="LangSmith has two usage limits: total traces and extended retention traces. "
result=vectorstoredb.similarity_search(query)
display(Markdown(result[0].page_content))

Given that the number of total traces per day is equal to the number of extended retention traces per day, it's most likely the
case that this org is using extended data retention tracing everywhere. As such, we start by optimizing our retention settings.
Optimization 1: manage data retention‚Äã
LangSmith charges differently based on a trace's data retention (see our data retention conceptual docs),
where short-lived traces are an order of magnitude less expensive than ones that last for a long time. In this optimization, we will
show how to get optimal settings for data retention without sacrificing historical observability, and
show the effect it has on our bill.
Change org level retention defaults for new projects‚Äã
We navigate to the Usage configuration tab, and look at our organization level retention settings. Modifying this setting affects all new projects that are
created going forward in all workspaces in our org.
noteFor backwards compatibility, older organizations may have this defaulted to Extended. Organizations created after June 3rd
have this defaulted to Base.

Change project level retention defaults‚Äã
Our existing projects have not changed their data retention settings, so we can change these on the individual project pages.
We navigate to Projects -> <your project name>, click the data retention drop down, and modify it to base retention. As
with the organization level setting, this will only affect retention (and pricing) for traces going forward.

Keep around a percentage of traces for extended data retention‚Äã
We may not want all our traces to expire after 14 days if we care about historical debugging. As such, we can take advantage
of LangSmith's built in ability to do server side sampling for extended data retention.
Choosing the right percentage of runs to sample depends on your use case. We will arbitrarily pick 10% of runs here, but will
leave it to the user to find the right value that balances collecting rare events and cost constraints.
LangSmith automatically upgrades the data retention for any trace that matches a run rule in our automations product (see our run rules docs). On the
projects page, click Rules -> Add Rule, and configure the rule as follows:

Run rules match on runs rather than traces. Runs are single units of work within an LLM application's API handling. Traces
are end to end API calls (learn more about tracing concepts in LangSmith). This means a trace can
be thought of as a tree of runs making up an API call. When a run rule matches any run within a trace, the trace's full run tree
upgrades to be retained for 400 days.
Therefore, to make sure we have the proper sampling rate on traces, we take advantage of the
filtering functionality of run rules.
We add add a filter condition to only match the "root" run in the run tree. This is distinct per trace, so our 10% sampling
will upgrade 10% of traces, rather 10% of runs, which could correspond to more than 10% of traces. If desired, we can optionally add
any other filtering conditions required (e.g. specific tags/metadata attached to our traces) for more pointed data retention
extension. For the sake of this tutorial, we will stick with the simplest condition, and leave more advanced filtering as an
exercise to the user.
noteIf you want to keep a subset of traces for longer than 400 days for data collection purposes, you can create another run
rule that sends some runs to a dataset of your choosing. A dataset allows you to store the trace inputs and outputs (e.g., as a key-value dataset),
and will persist indefinitely, even after the trace gets deleted.
See results after 7 days‚Äã
While the total amount of traces per day stayed the same, the extended data retention traces was cut heavily. In the invoice, we can see thatwe've only spent about $900 in the last 7 days, as opposed to $2,000 in the previous 4.
That's a cost reduction of nearly 75% per day!

Optimization 2: limit usage‚Äã
In the previous section, we managed data retention settings to optimize existing spend. In this section, we will
use usage limits to prevent future overspend.
LangSmith has two usage limits: total traces and extended retention traces. These correspond to the two metrics we've
been tracking on our usage graph. We can use these in tandem to have granular control over spend.
To set limits, we navigate back to Settings -> Usage and Billing -> Usage configuration. There is a table at the
bottom of the page that lets you set usage limits per workspace. For each workspace, the two limits appear, along
with a cost estimate:

Lets start by setting limits on our production usage, since that is where the majority of spend comes from.
Setting a good total traces limit‚Äã
Picking the right "total traces" limit depends on the expected load of traces that you will send to LangSmith. You should
clearly think about your assumptions before setting a limit.
For example:

In [69]:
from langchain_community.llms import Ollama  # Replace ChatOpenAI with Ollama

llm = Ollama(
    model="deepseek-r1:8b",  # Replace with your model name (run `ollama list` to check)
    base_url="http://localhost:11434"  # Ollama's default local endpoint
)

In [70]:
## Retrieval Chain, Document chain

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

prompt=ChatPromptTemplate.from_template(
    """
Answer the following question based only on the provided context:
<context>
{context}
</context>


"""
)

document_chain=create_stuff_documents_chain(llm,prompt)
document_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template='\nAnswer the following question based only on the provided context:\n<context>\n{context}\n</context>\n\n\n'), additional_kwargs={})])
| Ollama(model='deepseek-r1:8b')
| StrOutputParser(), kwargs={}, config={'run_name': 'stuff_documents_chain'}, config_factories=[])

In [71]:
from langchain_core.documents import Document
document_chain.invoke({
    "input":"LangSmith has two usage limits: total traces and extended",
    "context":[Document(page_content="LangSmith has two usage limits: total traces and extended traces. These correspond to the two metrics we've been tracking on our usage graph. ")]
})

'<think>\nOkay, let\'s start by looking at the user\'s query. They want me to answer a question based solely on the given context about Langsmith having two usage limits: total traces and extended traces.\n\nFirst, I need to understand what exactly they\'re asking. The question isn\'t provided here, so my response would be incomplete without it. But since there\'s no specific question mentioned in the user\'s message, maybe the user forgot to include one? Wait, looking back at their query—it seems like they just gave the context and said "Answer the following question," but didn\'t state the actual question. That could be a mistake.\n\nHmm, as DeepSeek, I should probably point this out because without the question, there\'s nothing specific to answer with the given context. The user might need help clarifying their query or providing more details. But maybe they expect me to explain something about Langsmith\'s usage limits even without an explicit question?\n\nLet me break down the pr

However, we want the documents to first come from the retriever we just set up. That way, we can use the retriever to dynamically select the most relevant documents and pass those in for a given question.

In [24]:
### Input--->Retriever--->vectorstoredb

vectorstoredb

<langchain_community.vectorstores.faiss.FAISS at 0x23ef4d513f0>

In [72]:
retriever=vectorstoredb.as_retriever()
from langchain.chains import create_retrieval_chain
retrieval_chain=create_retrieval_chain(retriever,document_chain)


In [73]:
retrieval_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['FAISS', 'OllamaEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x10fb6b910>, search_kwargs={}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template='\nAnswer the following question based only on the provided context:\n<context>\n{context}\n</context>\n\n\n'), additional_kwargs={})])
            | Ollama(

In [74]:
## Get the response form the LLM
response=retrieval_chain.invoke({"input":"LangSmith has two usage limits: total traces and extended"})
response['answer']

"<think>\nOkay, let's tackle this question step by step. The user is asking how much they can save monthly on their LangSmith trace costs after implementing the two optimizations mentioned in the context.\n\nFirst, from the provided text, I understand that the current production workspace generates around 130,000 traces per day, and they expect to double this number in the near future. The initial calculation for the extended data retention limit was based on doubling the daily traces (from 130k to 260k) over a month with about 30 days, resulting in roughly 7.8 million traces monthly.\n\nThe user then wants to keep only 10% of these traces with extended data retention. So, taking 10% of the expected limit: 7.8 million times 0.1 equals 780,000. \n\nNow, looking at the cost impact section in the context, it mentions that by limiting to this 10%, they can cut their monthly spend from around $40k down to about $7.5k because extended data retention is expensive. The key point here seems to 

In [61]:

response

{'input': 'LangSmith has two usage limits: total traces and extended',
 'context': [Document(id='84c8eb2a-06e1-4d53-843c-b96ffd6b523d', metadata={'source': 'https://docs.smith.langchain.com/administration/tutorials/manage_spend', 'title': 'Optimize tracing spend on LangSmith | \uf8ffü¶úÔ∏è\uf8ffüõ†Ô∏è LangSmith', 'description': 'Before diving into this content, it might be helpful to read the following:', 'language': 'en'}, page_content='Optimize tracing spend on LangSmith | \uf8ffü¶úÔ∏è\uf8ffüõ†Ô∏è LangSmith'),
  Document(id='5b2050c7-e745-4daf-b4d3-4b98b00491c4', metadata={'source': 'https://docs.smith.langchain.com/administration/tutorials/manage_spend', 'title': 'Optimize tracing spend on LangSmith | \uf8ffü¶úÔ∏è\uf8ffüõ†Ô∏è LangSmith', 'description': 'Before diving into this content, it might be helpful to read the following:', 'language': 'en'}, page_content="Keep around a percentage of traces for extended data retention‚Äã\nWe may not want all our traces to expire after 14 days 

In [75]:
display(Markdown(response['answer'].split("</think>")[-1].strip()  ))

Based on the provided context:

You can save approximately **$7,500** per month in LangSmith trace costs after implementing Optimization 2 (usage limits) for your production workspace.

This estimate comes from:
1. The expected monthly limit of high-retention traces was calculated as `130,000 * 2 * 30 = 7,800,000` traces.
2. By limiting to only **10%** (`780,000`) of these traces for extended retention, you effectively cut the cost by a factor of ten compared to allowing all traces to use extended data retention.

The context states that extending data retention unnecessarily increases costs significantly because:
- Base tracing is cheaper than extended data retention.
- Even with doubling expected usage (`7.8M traces/month`), limiting extended retention traces reduces spend from ~$40k to ~$7.5k per month in the production environment alone.

**Answer:** You can save about **$7,500 monthly** by setting a 10% limit on high-retention traces for your production workspace.

In [76]:
response['context']

[Document(id='49c46b53-9952-40ea-9e20-3d28c3e28bba', metadata={'source': 'https://docs.smith.langchain.com/administration/tutorials/manage_spend', 'title': 'Optimize tracing spend on LangSmith | \uf8ffü¶úÔ∏è\uf8ffüõ†Ô∏è LangSmith', 'description': 'Before diving into this content, it might be helpful to read the following:', 'language': 'en'}, page_content='Current Load: Our gen AI application is called between 1.2-1.5 times per second, and each API request has a trace associated with it,\nmeaning we log around 100,000-130,000 traces per day\nExpected Growth in Load: We expect to double in size in the near future.\n\nFrom these assumptions, we can do a quick back-of-the-envelope calculation to get a good limit of:\nlimit = current_load_per_day * expected_growth * days/month      = 130,000 * 2 * 30      = 7,800,000 traces / month\nWe click on the edit icon on the right side of the table for our Prod row, and can enter this limit as follows:\n\nnoteWhen set without the extended data reten