In [24]:
## Simple gen AI app using LangChain
## Load data --> docs --> divide text into chunks --> text --> vector embeddings --> vectors --> vector store DB -->
import os
from dotenv import load_dotenv
# Data ingestion from a website
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.documents import Document
from langchain.chains import create_retrieval_chain

load_dotenv()

loader = WebBaseLoader("https://docs.smith.langchain.com/administration/tutorials/manage_spend")
documents = loader.load()

textSplitter = RecursiveCharacterTextSplitter(chunk_size=1000,chunk_overlap=100)
doctsInChunks = textSplitter.split_documents(documents)
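
# Illustration only: a minimal sketch of what chunk_size and chunk_overlap mean,
# assuming plain fixed-width windows. RecursiveCharacterTextSplitter splits on
# separators first, so its real chunk boundaries will differ.
# `sliding_chunks` is a hypothetical helper, not part of LangChain.
def sliding_chunks(text, chunk_size, chunk_overlap):
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

# Neighbouring chunks share chunk_overlap characters at their boundary.
demo_chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)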

open_api_key = os.getenv(key="OPEN_API_KEY2")

embeddings = OpenAIEmbeddings(model="text-embedding-3-large",api_key=open_api_key,dimensions=1024)

#Saving and querying from vector DB
vectorStoreDb = FAISS.from_documents(doctsInChunks,embeddings)
query = "LangSmith has two usage limits: total traces and extended retention traces"
result = vectorStoreDb.similarity_search(query)
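
# Illustration only: a toy version of what a similarity search does conceptually,
# assuming squared Euclidean (L2) distance between embedding vectors.
# `nearest_by_l2` and the 2-D vectors below are hypothetical stand-ins; the real
# embeddings above have 1024 dimensions and FAISS uses an optimized index.
def nearest_by_l2(query_vec, stored_vecs, k=4):
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    ranked = sorted(range(len(stored_vecs)), key=lambda i: sq_dist(query_vec, stored_vecs[i]))
    return ranked[:k]  # indices of the k closest stored vectors, nearest first

toy_vectors = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]
toy_hits = nearest_by_l2([1.0, 0.0], toy_vectors, k=2)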

#Retrieval/Document Chain
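# The prompt needs an {input} placeholder; otherwise the question never reaches the model.
prompt = ChatPromptTemplate.from_template(
    """
    Answer the following question based only on the provided context:
    <context>
    {context}
    </context>
    Question: {input}
    """
)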

llm = ChatOpenAI(model="gpt-4o",api_key=open_api_key)
document_chain = create_stuff_documents_chain(llm,prompt)
document_chain.invoke({
    "input": "LangSmith has two usage limits: total traces and extended",
    "context": [Document(page_content="LangSmith has two usage limits: total traces and extended traces. These correspond to the two metrics we've been tracking on our usage graph.")]
})

##Retriever -> Way of getting the data from vector DB.
retriever = vectorStoreDb.as_retriever()

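retrieval_chain = create_retrieval_chain(retriever, document_chain)

# Get the response from the LLM
response = retrieval_chain.invoke({"input": "LangSmith has two usage limits: total traces and extended retention traces"})
# The chain returns a dict with "input", "context", and "answer" keys; print only the answer text.
print(response["answer"])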


Based on the provided context, here are the key points related to setting limits on LangSmith production usage:

1. **Setting Production Limits**: 
   - The total traces limit should be based on the expected load of traces sent to LangSmith. 
   - Example: If the gen AI application is called 1.2-1.5 times per second, resulting in 100,000-130,000 traces per day, and expecting to double in size, a suitable monthly limit would be 7,800,000 traces (i.e., 130,000 traces/day * 2 * 30 days).

2. **Cost Management**:
   - By setting appropriate limits, the maximum cost can be reduced significantly (e.g., from ~$40k to ~$7.5k per month) by avoiding expensive data retention upgrades.

3. **Extended Data Retention**:
   - Extended data retention can affect other features if the limit is reached. It’s important to understand its functionality before using it.

4. **Dev/Staging Environment Limits**:
   - Dev and staging environments should have limits set at 10% of the production limit, though this