In [8]:
!pip list

Package                  Version
------------------------ ---------
aiohttp                  3.9.5
aiosignal                1.3.1
annotated-types          0.7.0
anyio                    4.4.0
appnope                  0.1.4
asttokens                2.4.1
async-timeout            4.0.3
attrs                    23.2.0
Brotli                   1.0.9
certifi                  2024.6.2
charset-normalizer       3.3.2
comm                     0.2.2
dataclasses-json         0.6.6
debugpy                  1.6.7
decorator                5.1.1
distro                   1.9.0
exceptiongroup           1.2.0
executing                2.0.1
frozenlist               1.4.0
greenlet                 3.0.1
h11                      0.14.0
httpcore                 1.0.5
httpx                    0.27.0
idna                     3.7
importlib_metadata       7.1.0
ipykernel                6.29.4
ipython                  8.25.0
jedi                     0.19.1
jsonpatch                1.33
jsonpointer              2.

In [None]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI()

In [2]:
llm.invoke("how can langsmith help with testing?")

AIMessage(content='Langsmith can help with testing by providing automated testing tools and frameworks that can be used to test code written in various programming languages. These tools can help developers perform unit tests, integration tests, and end-to-end tests to ensure that their code functions correctly and meets the requirements. Langsmith can also provide resources and guidance on best practices for testing, as well as support for continuous integration and deployment processes to streamline the testing workflow. Additionally, Langsmith can assist in setting up and maintaining testing environments and infrastructure to support the testing process.', response_metadata={'token_usage': {'completion_tokens': 106, 'prompt_tokens': 15, 'total_tokens': 121}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-39933902-c75e-42bf-aa6f-6f01211bcace-0', usage_metadata={'input_tokens': 15, 'output_tokens': 106, 'total_tokens': 12

In [4]:
from langchain_core.prompts import ChatPromptTemplate

# Build a prompt with a fixed system message and a user slot filled at invoke time.
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a world class technical documentation writer."),
    ("user", "{input}")
])

# Pipe the formatted prompt into the chat model.
chain = prompt | llm
chain.invoke({"input": "how can langsmith help with testing?"})

AIMessage(content="Langsmith can help with testing by providing a platform for automated testing of software applications. The tool allows users to create and run test cases, generate test data, and analyze test results. Langsmith's automation capabilities can save time and effort in testing, ensure more consistent and thorough testing coverage, and help identify issues early in the development process. Additionally, Langsmith can integrate with popular testing frameworks and tools, making it easier to incorporate automated testing into the software development workflow.", response_metadata={'token_usage': {'completion_tokens': 94, 'prompt_tokens': 28, 'total_tokens': 122}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-cbffe71a-7b08-477a-8f24-7b237ebf528f-0', usage_metadata={'input_tokens': 28, 'output_tokens': 94, 'total_tokens': 122})

The output of a ChatModel (and therefore, of this chain) is a message. However, it's often much more convenient to work with strings. Let's add a simple output parser to convert the chat message to a string.

In [5]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()
chain = prompt | llm | output_parser
chain.invoke({"input": "how can langsmith help with testing?"})


'Langsmith can help with testing in several ways:\n\n1. Automated Testing: Langsmith can be used to generate test data for automated testing. By creating realistic and diverse test data sets, Langsmith can help ensure comprehensive test coverage and identify potential edge cases.\n\n2. Performance Testing: Langsmith can be used to generate large volumes of data to simulate real-world scenarios for performance testing. By providing a variety of data types and structures, Langsmith can help identify performance bottlenecks and optimize system performance.\n\n3. Load Testing: Langsmith can generate synthetic load on a system by creating multiple instances of data sets. This can help evaluate system scalability, response times, and overall performance under heavy loads.\n\n4. Data Validation: Langsmith can be used to validate the accuracy and integrity of data by generating test data sets that cover a wide range of input variations. This can help identify data inconsistencies, errors, and 
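
The `|` operator in `prompt | llm | output_parser` composes steps so that each step's output feeds the next. The following is a minimal pure-Python sketch of that idea. It assumes nothing about LangChain's internals: `Step`, `fake_prompt`, `fake_llm`, and `to_text` are hypothetical stand-ins, and no model is called.

```python
class Step:
    """Toy stand-in for a composable pipeline step (not LangChain's Runnable)."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` yields a new step that runs a, then feeds its result to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stages mirroring prompt -> model -> parser.
fake_prompt = Step(lambda inputs: f"user: {inputs['input']}")
fake_llm = Step(lambda text: {"content": f"echo({text})"})  # message-like dict
to_text = Step(lambda message: message["content"])          # plays the parser role

toy_chain = fake_prompt | fake_llm | to_text
result = toy_chain.invoke({"input": "hi"})
print(result)
```

The last stage plays the role of the output parser: it unwraps the message-like object so the caller receives a plain string.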

Retrieval Chain

To properly answer the original question ("how can langsmith help with testing?"), we need to provide additional context to the LLM. We can do this via retrieval. Retrieval is useful when you have too much data to pass to the LLM directly. You can then use a retriever to fetch only the most relevant pieces and pass those in.

In this process, we will look up relevant documents from a Retriever and then pass them into the prompt. A Retriever can be backed by anything (a SQL table, the internet, etc.), but in this instance we will populate a vector store and use that as a retriever. For more information on vector stores, see the LangChain vector store documentation.
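
To make the idea of "fetching only the most relevant pieces" concrete, here is a small stdlib-only sketch of similarity-based retrieval. It is an illustration under stated assumptions, not what the vector store does internally: `embed` uses toy bag-of-words counts instead of learned embeddings, and `corpus` is a made-up example.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (real systems use learned embeddings)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, texts, k=1):
    """Return the k texts most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(texts, key=lambda t: cosine(query_vec, embed(t)), reverse=True)
    return ranked[:k]


# Hypothetical mini-corpus, for illustration only.
corpus = [
    "datasets and evaluations help with testing applications",
    "billing is handled per seat",
    "the weather is sunny today",
]
top = retrieve("help with testing", corpus, k=1)
print(top)
```

Only the retrieved text would then be placed in the prompt, which keeps the prompt small no matter how large the corpus grows.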

First, we need to load the data that we want to index. To do this, we will use the WebBaseLoader. This requires installing BeautifulSoup:

In [9]:
!pip install beautifulsoup4

Collecting beautifulsoup4
  Using cached beautifulsoup4-4.12.3-py3-none-any.whl.metadata (3.8 kB)
Collecting soupsieve>1.2 (from beautifulsoup4)
  Using cached soupsieve-2.5-py3-none-any.whl.metadata (4.7 kB)
Using cached beautifulsoup4-4.12.3-py3-none-any.whl (147 kB)
Using cached soupsieve-2.5-py3-none-any.whl (36 kB)
Installing collected packages: soupsieve, beautifulsoup4
Successfully installed beautifulsoup4-4.12.3 soupsieve-2.5


In [10]:
from langchain_community.document_loaders import WebBaseLoader

# Fetch the LangSmith user guide page and parse it into Document objects.
loader = WebBaseLoader("https://docs.smith.langchain.com/user_guide")

docs = loader.load()

In [11]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

In [13]:
!pip install faiss-cpu

Collecting faiss-cpu
  Using cached faiss_cpu-1.8.0-cp312-cp312-macosx_10_14_x86_64.whl.metadata (3.6 kB)
Using cached faiss_cpu-1.8.0-cp312-cp312-macosx_10_14_x86_64.whl (7.4 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.8.0


In [14]:
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split the loaded pages into chunks, then embed the chunks and index them in FAISS.
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
vector = FAISS.from_documents(documents, embeddings)
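
The cell above splits documents into chunks before indexing them. As a rough illustration of chunking, the sketch below cuts text into fixed-size character windows that overlap. This is deliberately simpler than `RecursiveCharacterTextSplitter`, which also tries to break on separators; `split_text` and its `chunk_size` and `overlap` values are hypothetical.

```python
def split_text(text, chunk_size=20, overlap=5):
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"  # 26 characters
chunks = split_text(sample, chunk_size=10, overlap=3)
print(chunks)
```

The overlap means that a sentence straddling a chunk boundary still appears intact in at least one chunk, which tends to help retrieval.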

In [15]:
from langchain.chains.combine_documents import create_stuff_documents_chain

prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")

document_chain = create_stuff_documents_chain(llm, prompt)

In [16]:
from langchain_core.documents import Document

document_chain.invoke({
    "input": "how can langsmith help with testing?",
    "context": [Document(page_content="langsmith can let you visualize test results")]
})

'Langsmith can help with testing by allowing you to visualize test results.'
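
Conceptually, "stuffing" means concatenating the document texts and substituting them into the `{context}` slot of the prompt before the model is called. The sketch below shows only that formatting step, using plain strings. `stuff_documents` is a hypothetical helper, and joining with a blank line is an assumption of this sketch rather than a statement about the library's implementation.

```python
def stuff_documents(template, question, documents):
    """Join document texts and substitute them into the prompt template."""
    context = "\n\n".join(documents)
    return template.format(context=context, input=question)


TEMPLATE = """Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}"""

stuffed = stuff_documents(
    TEMPLATE,
    "how can langsmith help with testing?",
    ["langsmith can let you visualize test results"],
)
print(stuffed)
```

Because every document is pasted into a single prompt, this approach only works while the combined text fits in the model's context window, which is why retrieval is used to select a few relevant chunks first.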

In [17]:
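from langchain.chains import create_retrieval_chain

# Expose the vector store as a retriever and run it in front of the document chain,
# so the retrieved documents fill the {context} slot of the prompt.
retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)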

In [18]:
response = retrieval_chain.invoke({"input": "how can langsmith help with testing?"})
print(response["answer"])

# LangSmith offers several features that can help with testing:...

LangSmith can help with testing by allowing developers to create datasets, run tests on LLM applications, upload test cases in bulk or create them on the fly, export test cases from application traces, run custom evaluations, compare results for different configurations, provide a playground environment for rapid iteration and experimentation, collect feedback from users, annotate traces, add runs to datasets, monitor key metrics over time, and perform automations on traces in near real-time.
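
The flow of the retrieval chain (look up documents for the question, then answer from them) can be sketched without any external services. Everything below is a hypothetical stand-in: `make_retrieval_chain`, `toy_retrieve`, and `toy_answer` are invented for illustration, and the `"context"` key alongside `"answer"` is an assumption of this sketch.

```python
def make_retrieval_chain(retrieve, answer):
    """Compose a retrieval step and an answering step into one callable."""

    def invoke(inputs):
        question = inputs["input"]
        context = retrieve(question)  # step 1: fetch relevant texts
        # step 2: answer from the fetched texts, keeping the inputs in the result
        return {**inputs, "context": context, "answer": answer(question, context)}

    return invoke


# Hypothetical components: a fixed lookup and an "answer" that just quotes its context.
def toy_retrieve(question):
    return ["langsmith can let you visualize test results"]


def toy_answer(question, context):
    return "Based on the context: " + " ".join(context)


toy_chain = make_retrieval_chain(toy_retrieve, toy_answer)
toy_response = toy_chain({"input": "how can langsmith help with testing?"})
print(toy_response["answer"])
```

Returning the retrieved context next to the answer makes it possible to check which passages the answer was grounded in.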
