# LangChain

LangChain is a framework for building applications on top of LLMs. Models, providers, and inference servers all use slightly different input/output formats, so LangChain wraps them in a set of unified APIs. The same, or nearly the same, code can then be used to build systems with different models, regardless of how they are hosted.

A set of packages implements the typical LangChain integrations. The following table lists the most common ones.

| Package Name | Description |
| :--- | :--- |
| `langchain-community` | A general package for a wide variety of community-contributed tools and integrations, including web search (DuckDuckGo, Tavily), Python REPL, and various database connectors. |
| `langchain-core` | The foundational package with the core tool abstractions and base classes. |
| `langchain-experimental` | A package for new and experimental tools, which may not yet be stable. |
| `langchain-tavily` | A dedicated package for the Tavily search tool. |
| `langchain-brave-search` | A dedicated package for the Brave Search tool. |
| `langchain-google-genai` | Includes tools for interacting with Google's Generative AI services. |
| `langchain-anthropic` | Includes tools for interacting with the Anthropic API. |
| `langchain-openai` | Contains integrations for OpenAI's models and services. |
| `langchain-mongodb` | A package for interacting with MongoDB. |
| `langchain-postgres` | A package for interacting with PostgreSQL. |
| `langchain-ollama` | A package with tools for querying models served by Ollama. |
| `langchain-huggingface` | A package with tools for interacting with models from Hugging Face. |
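
Before using an integration, it can help to check whether its package is installed. The following is a minimal sketch using only the standard library. The helper names `module_name` and `is_installed` are hypothetical, and the sketch assumes that the import name is the package name with hyphens replaced by underscores (as with `langchain-ollama` and `langchain_ollama`).

```python
from importlib.util import find_spec


def module_name(pip_name: str) -> str:
    # Assumption: "langchain-openai" is imported as "langchain_openai".
    return pip_name.replace("-", "_")


def is_installed(pip_name: str) -> bool:
    # find_spec returns None when a top-level module cannot be found.
    return find_spec(module_name(pip_name)) is not None


for pkg in ["langchain-core", "langchain-openai", "langchain-ollama"]:
    status = "installed" if is_installed(pkg) else "missing"
    print(f"{pkg}: {status}")
```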

## Setup

Typically, you must set the API key for the provider you want to use as an environment variable.
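
A missing key usually surfaces as an authentication error deep inside the client. The following sketch checks for the variable up front; `require_env` is a hypothetical helper, not part of LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; export it before creating the model."
        )
    return value


# Typical use before constructing a model:
# require_env("GOOGLE_API_KEY")
```

Calling the helper at the top of a notebook makes the failure explicit before any request is sent.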

---

The code in the following cell works only if the `GOOGLE_API_KEY` variable is set in your environment.

In [4]:
from langchain_google_genai import ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
ans = llm.invoke("Test request")
print(ans.content)

Please provide me with the details of your test request.  I need information such as:

* **What kind of test are you requesting?** (e.g., a unit test, an integration test, a performance test, a stress test, a usability test, a grammar test, a logic test, a factual accuracy test, etc.)
* **What is the subject of the test?** (e.g., a piece of code, a website, a document, a sentence, an argument, etc.)
* **What are the inputs or data for the test?** (If applicable)
* **What are the expected outputs or results?** (If applicable)
* **What are the acceptance criteria?** (How will you know if the test passed or failed?)

The more information you give me, the better I can assist you.


## Text splitters

LangChain contains the `text_splitter` module, which implements several approaches to splitting text into pieces. This is useful, for example, for chunking in a RAG pipeline.

The following table shows the available text splitters.

| Class Name | Description | Common Use Case |
| :--- | :--- | :--- |
| `CharacterTextSplitter` | Splits text based on a specified character (e.g., `\n`, ` `). | Simple, quick splitting where structural integrity is not a major concern. |
| `RecursiveCharacterTextSplitter` | The recommended default. Splits text based on a list of characters in a hierarchical order (e.g., `["\n\n", "\n", " "]`) to maintain logical chunks. | General-purpose text, such as articles, essays, and unstructured documents. |
| `TokenTextSplitter` | Splits text based on the number of tokens, using a specific tokenizer (e.g., `tiktoken` for OpenAI models). | Preparing text to fit within a specific LLM's context window. |
| `HTMLHeaderTextSplitter` | Splits HTML documents based on specified header tags (`h1`, `h2`, etc.). | Processing HTML content where you want to preserve sections defined by headers. |
| `MarkdownTextSplitter` | Splits Markdown documents based on Markdown syntax, such as headers and code blocks. | Processing Markdown files while keeping logical sections together. |
| `SentenceTransformersTokenTextSplitter` | Splits text using a tokenizer from the `sentence-transformers` library, based on a token count. | Working with models from the `sentence-transformers` library. |
| `NLTKTextSplitter` | Splits text into sentences using the `NLTK` library's sentence tokenizer. | Splitting a document into individual sentences for fine-grained processing. |
| `SpacyTextSplitter` | Splits text into sentences using the `spaCy` library. | Similar to NLTK, but leverages `spaCy` for sentence boundary detection, which can be more robust for some languages. |
| `SemanticChunker` | A more advanced splitter that uses an embedding model to identify semantic breakpoints (topic shifts) in the text. | Creating semantically coherent chunks for more effective retrieval-augmented generation. |
| `Language-specific Code Splitters` | A family of splitters for various programming languages (e.g., `PythonCodeTextSplitter`, `JavaScriptCodeTextSplitter`). | Processing code files to keep functions, classes, and other logical blocks intact. |

---

After initializing the splitter, use the `split_text` method to split the given text. The following cell demonstrates the application of the recursive text splitter to a given text.

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

data = """M r. and Mrs. Dursley, of number four, Privet Drive, were proud to say
that they were perfectly normal, thank you very much. They were the last people
you’d expect to be involved in anything strange or mysterious, because they just
didn’t hold with such nonsense.

Mr. Dursley was the director of a firm called Grunnings, which made
drills. He was a big, beefy man with hardly any neck, although he did have a
very large mustache. Mrs. Dursley was thin and blonde and had nearly twice the
usual amount of neck, which came in very useful as she spent so much of her
time craning over garden fences, spying on the neighbors. The Dursleys had a
small son called Dudley and in their opinion there was no finer boy anywhere.

The Dursleys had everything they wanted, but they also had a secret, and
their greatest fear was that somebody would discover it. They didn’t think they
could bear it if anyone found out about the Potters. Mrs. Potter was Mrs.
Dursley’s sister, but they hadn’t met for several years; in fact, Mrs. Dursley
pretended she didn’t have a sister, because her sister and her good-for-nothing
husband were as unDursleyish as it was possible to be. The Dursleys shuddered
to think what the neighbors would say if the Potters arrived in the street. The
Dursleys knew that the Potters had a small son, too, but they had never even
seen him. This boy was another good reason for keeping the Potters away; they
didn’t want Dudley mixing with a child like that."""

out = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=20
).split_text(data)

for t in out:
    print(t, end="\n\n")

M r. and Mrs. Dursley, of number four, Privet Drive, were proud to say
that they were perfectly normal, thank you very much. They were the last people
you’d expect to be involved in anything strange or mysterious, because they just
didn’t hold with such nonsense.

Mr. Dursley was the director of a firm called Grunnings, which made
drills. He was a big, beefy man with hardly any neck, although he did have a
very large mustache. Mrs. Dursley was thin and blonde and had nearly twice the
usual amount of neck, which came in very useful as she spent so much of her
time craning over garden fences, spying on the neighbors. The Dursleys had a
small son called Dudley and in their opinion there was no finer boy anywhere.

The Dursleys had everything they wanted, but they also had a secret, and
their greatest fear was that somebody would discover it. They didn’t think they
could bear it if anyone found out about the Potters. Mrs. Potter was Mrs.
Dursley’s sister, but they hadn’t met for several ye
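
To make the idea behind recursive splitting more concrete, here is a simplified pure-Python sketch. It is not LangChain's implementation: the function name `recursive_split` is hypothetical, `chunk_overlap` is ignored, and empty pieces are dropped. It only illustrates trying coarse separators first and falling back to finer ones when a piece is still too long.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ")):
    """Simplified illustration of recursive splitting (no overlap support)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # Nothing left to split on: fall back to fixed-width slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    pieces = []
    for part in text.split(sep):
        # A part that is still too long is split again with finer separators.
        pieces.extend(recursive_split(part, chunk_size, finer))

    # Greedily merge neighbouring pieces while they fit into chunk_size.
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "aaa bbb\nccc\n\nddd eee"
chunks = recursive_split(sample, chunk_size=7)
print(chunks)  # ['aaa bbb', 'ccc', 'ddd eee']
```

The paragraph break is tried first, then the line break, and a space would be used only if a line were still longer than `chunk_size`.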

## Embeddings

LangChain provides interfaces for interacting with embedding models. The core class is `langchain_core.embeddings.Embeddings`; see the [API reference](https://api.python.langchain.com/en/latest/embeddings/langchain_core.embeddings.Embeddings.html).

The following table lists classes that implement this interface for different embedding models.

| Class Name | Package |
| :--- | :--- |
| **`Embeddings`** | `langchain_core.embeddings` |
| **`OpenAIEmbeddings`** | `langchain_openai` |
| **`AzureOpenAIEmbeddings`** | `langchain_openai` |
| **`HuggingFaceEmbeddings`** | `langchain_community.embeddings.huggingface` |
| **`GoogleGenerativeAIEmbeddings`** | `langchain_google_genai` |
| **`VertexAIEmbeddings`** | `langchain_google_vertexai` |
| **`CohereEmbeddings`** | `langchain_cohere` |
| **`OllamaEmbeddings`** | `langchain_ollama` |
| **`VoyageAIEmbeddings`** | `langchain_voyageai` |
| **`JinaEmbeddings`** | `langchain_community.embeddings.jina` |
| **`FakeEmbeddings`** | `langchain_core.embeddings.fake` |

---

Consider the following example, which uses `OllamaEmbeddings`. Ollama serves the model, so it must be running and reachable.

In [None]:
from langchain_ollama import OllamaEmbeddings
embedder = OllamaEmbeddings(model="all-minilm")

Use the `embed_documents` method to obtain the embeddings for a given list of strings.

In [23]:
embeddings = embedder.embed_documents(
    ["Test embeddings", "some more complex text"]
)
type(embeddings)

list

An embedding is provided for each of the given documents.

In [24]:
len(embeddings)

2

The dimensionality of the embeddings depends on the model used.

In [25]:
len(embeddings[0])

384
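
The shape of these results can be reproduced without a running model. The sketch below defines a hypothetical `ToyEmbedder` that offers the same two methods as an embeddings class, `embed_documents` and `embed_query`. It is only an illustration of the return shapes: the vectors are derived from a hash and carry no semantic meaning.

```python
import hashlib


class ToyEmbedder:
    """Hypothetical stand-in that mimics the shape of the embeddings interface."""

    def __init__(self, dim: int = 8):
        if not 1 <= dim <= 32:
            raise ValueError("dim must be between 1 and 32 (SHA-256 yields 32 bytes)")
        self.dim = dim

    def embed_query(self, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Scale the first `dim` bytes of the hash to floats in [0, 1].
        return [byte / 255 for byte in digest[: self.dim]]

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # One vector per input string, all with the same length.
        return [self.embed_query(text) for text in texts]


toy = ToyEmbedder(dim=8)
vectors = toy.embed_documents(["Test embeddings", "some more complex text"])
print(len(vectors), len(vectors[0]))  # 2 8
```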

## Vector stores

LangChain integrates with various vector stores. The following table shows a few of them:

| Class name                            | Package                                                                     |
| ------------------------------------- | --------------------------------------------------------------------------- |
| `InMemoryVectorStore`                 | `langchain_core.vectorstores`                                               |
| `FAISS`                               | `langchain_community.vectorstores.faiss`                                    |
| `PGVector`                            | `langchain-postgres` (`langchain.vectorstores.pgvector`)                    |
| `ElasticsearchStore`                  | `langchain-elasticsearch` (`langchain.vectorstores.elasticsearch`)          |
| `AzureCosmosDBMongoVCoreVectorSearch` | `langchain-azure-ai` (`langchain.vectorstores.azure_cosmos_db_mongo_vcore`) |
| `AzureCosmosDBNoSqlVectorSearch`      | `langchain-azure-ai` (`langchain.vectorstores.azure_cosmos_db_no_sql`)      |
| `AzureSearch`                         | `langchain-azure-ai` (`langchain.vectorstores.azuresearch`)                 |
| `SQLServer_VectorStore`               | `langchain-sqlserver` (`langchain.vectorstores.sqlserver`)                  |

For more details, see the [vector stores](https://python.langchain.com/docs/integrations/vectorstores/) page of the official documentation.

---

Consider the simplest option, `InMemoryVectorStore`, for basic operations.

To initialize the corresponding object, you must first create the embedding object. In this case, we will use `OllamaEmbeddings`, so Ollama must be running locally.

In [4]:
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_ollama import OllamaEmbeddings
from langchain_core.documents.base import Document
vector_store = InMemoryVectorStore(OllamaEmbeddings(model="all-minilm"))

Use the `add_documents` method to add items to the vector store. This method takes a list of documents.

In [5]:
documents = [
    Document(s) for s in [
        "This is dog",
        "This is cat.",
        "My car was crased"
    ]
]

vector_store.add_documents(documents=documents)

['0ab9b5d0-8063-49fa-9b96-3bf11d710ef4',
 'ad1fa14b-0c3d-4238-bef2-aca75162542a',
 '84c794bf-106b-48e2-9221-fce9f50dfead']

The `similarity_search` method locates documents that are similar to the provided text. The following cells show the outputs for a few examples selected to make the results easy to interpret.

In [6]:
vector_store.similarity_search("This is cow")

[Document(id='0ab9b5d0-8063-49fa-9b96-3bf11d710ef4', metadata={}, page_content='This is dog'),
 Document(id='ad1fa14b-0c3d-4238-bef2-aca75162542a', metadata={}, page_content='This is cat.'),
 Document(id='84c794bf-106b-48e2-9221-fce9f50dfead', metadata={}, page_content='My car was crased')]

In [7]:
vector_store.similarity_search("Accidents sometimes happens")

[Document(id='84c794bf-106b-48e2-9221-fce9f50dfead', metadata={}, page_content='My car was crased'),
 Document(id='0ab9b5d0-8063-49fa-9b96-3bf11d710ef4', metadata={}, page_content='This is dog'),
 Document(id='ad1fa14b-0c3d-4238-bef2-aca75162542a', metadata={}, page_content='This is cat.')]
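
Conceptually, a similarity search over an in-memory store can be pictured as a brute-force ranking of stored vectors by their similarity to the query vector. The sketch below illustrates this with cosine similarity. It is an illustration under that assumption rather than the library's code: the helpers `cosine_similarity` and `top_k` are hypothetical, and the two-dimensional vectors are made up.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=2):
    # Rank every stored (label, vector) pair by similarity to the query.
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [label for label, _ in ranked[:k]]


# Made-up vectors: the first axis stands for "animal", the second for "vehicle".
store = [
    ("dog", [0.9, 0.1]),
    ("cat", [0.8, 0.3]),
    ("car", [0.1, 0.95]),
]
result = top_k([0.2, 0.9], store, k=2)
print(result)  # ['car', 'cat']
```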

## LLMs

An LLM is a core component of any agent-based system. Each integration lives in its own package, so you need to know which one to install. The following table lists the classes that integrate different LLMs with LangChain, together with the package each one requires.

| Provider | Type | LangChain Class Name (Python) | Python Package |
|---|---|---|---|
| OpenAI | Commercial API | `langchain_openai.chat_models.ChatOpenAI`, `langchain_openai.llms.OpenAI` | `langchain-openai` |
| Google | Commercial API | `langchain_google_genai.chat_models.ChatGoogleGenerativeAI` | `langchain-google-genai` |
| Anthropic | Commercial API | `langchain_anthropic.chat_models.ChatAnthropic`, `langchain_anthropic.llms.AnthropicLLM` | `langchain-anthropic` |
| Mistral AI | Commercial API | `langchain_mistralai.chat_models.ChatMistralAI` | `langchain-mistralai` |
| Cohere | Commercial API | `langchain_cohere.chat_models.ChatCohere`, `langchain_cohere.llms.CohereLLM` | `langchain-cohere` |
| AWS | Cloud Platform | `langchain_aws.chat_models.ChatBedrock`, `langchain_aws.llms.BedrockLLM` | `langchain-aws` |
| Hugging Face | Community/Open-Source | `langchain_huggingface.llms.HuggingFaceHub`, `langchain_huggingface.llms.HuggingFacePipeline` | `langchain-huggingface` |
| Ollama | On-Premise/Local | `langchain_ollama.ChatOllama`, `langchain_ollama.OllamaLLM` | `langchain-ollama` |
| Llama.cpp | On-Premise/Local | `langchain_community.llms.LlamaCpp` | `llama-cpp-python` |
| Replicate | Commercial API | `langchain_replicate.llms.Replicate` | `langchain-replicate` |
| Fireworks AI | Commercial API | `langchain_fireworks.chat_models.ChatFireworks`, `langchain_fireworks.llms.FireworksLLM` | `langchain-fireworks` |
| Databricks | Cloud Platform | `databricks_langchain.llms.Databricks` | `databricks-langchain` |
| Azure OpenAI | Commercial API | `langchain_openai.chat_models.AzureChatOpenAI`, `langchain_openai.llms.AzureOpenAI` | `langchain-openai` |
| AI21 Labs | Commercial API | `langchain_ai21.llms.AI21LLM`, `langchain_ai21.chat_models.ChatAI21` | `langchain-ai21` |
| Aleph Alpha | Commercial API | `langchain_community.llms.AlephAlpha` | `langchain-aleph-alpha` |
| Groq | Commercial API | `langchain_groq.chat_models.ChatGroq` | `langchain-groq` |
| Together AI | Commercial API | `langchain_together.llms.TogetherLLM`, `langchain_together.chat_models.ChatTogether` | `langchain-together` |
| IBM | Cloud Platform | `langchain_community.chat_models.ChatWatsonx` | `langchain-ibm` |
| DeepInfra | Commercial API | `langchain_deepinfra.llms.DeepInfra` | `langchain-deepinfra` |
| Yandex | Commercial API | `langchain_community.llms.YandexGPT`, `langchain_community.llms.YandexGPTPredictor` | `langchain-yandex` |

---

Consider, for example, using Ollama in the LangChain framework. For the following examples to run, Ollama must be available on your local host.

In [6]:
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.1:latest")
ans = llm.invoke("What is the capital of France?")
print(ans.content)

The capital of France is Paris.
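
Because the chat model classes share the same `invoke` interface, switching providers can be confined to a single place. The following is a minimal sketch: `make_chat_model` is a hypothetical helper, and the model names are the ones used in the cells of this page.

```python
def make_chat_model(provider: str):
    """Return a chat model for the given provider name (hypothetical helper)."""
    # Imports are local so that only the selected integration must be installed.
    if provider == "ollama":
        from langchain_ollama import ChatOllama
        return ChatOllama(model="llama3.1:latest")
    if provider == "google":
        from langchain_google_genai import ChatGoogleGenerativeAI
        return ChatGoogleGenerativeAI(model="gemini-1.5-flash")
    raise ValueError(f"Unknown provider: {provider!r}")


# The calling code stays the same whichever provider is chosen:
# llm = make_chat_model("ollama")
# print(llm.invoke("What is the capital of France?").content)
```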


## Messages

There are several classes that represent different aspects of prompting with LangChain.

| Class Name    | Role           | General Description                                                                                                                                      |
|---------------|----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------|
| SystemMessage | System         | Provides instructions or context to "prime" the model's behavior. It sets the persona, tone, or rules for the entire conversation. Typically the first message in a list. |
| HumanMessage  | Human          | Represents the user's input. This is the message that a human sends to the model to ask a question or provide a command.                                  |
| AIMessage     | AI (Assistant) | Represents the response from the language model. This is the output you get after invoking a model. It can contain text, tool calls, or other data.       |
| ToolMessage   | Tool           | Represents the output or result of a tool function that was invoked by the AI. This is used to pass the outcome of a tool call back to the model for further processing. |

LangChain's primary design is to pass a list of message objects to the model, which returns an output of type `AIMessage`.

All LangChain messages are children of the `langchain_core.messages.BaseMessage` class. The following cell shows the relationship:

In [15]:
from langchain_core.messages import (
    HumanMessage,
    SystemMessage,
    AIMessage,
    ToolMessage,
    BaseMessage
)

(
    issubclass(HumanMessage, BaseMessage),
    issubclass(SystemMessage, BaseMessage),
    issubclass(AIMessage, BaseMessage),
    issubclass(ToolMessage, BaseMessage)
)

(True, True, True, True)
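
A conversation can be pictured as an ordered list of role and text pairs that are turned into message objects before the model is invoked. The sketch below assumes this picture; `conversation` and `to_messages` are hypothetical names, and the message classes are imported inside the function so that the plain data can be inspected even where LangChain is not installed.

```python
# A conversation written as plain (role, text) pairs.
conversation = [
    ("system", "You are a concise assistant."),
    ("human", "What is the capital of France?"),
]


def to_messages(pairs):
    """Convert (role, text) pairs into LangChain message objects."""
    from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

    classes = {"system": SystemMessage, "human": HumanMessage, "ai": AIMessage}
    return [classes[role](content=text) for role, text in pairs]


# With a chat model such as the `llm` from the previous section:
# reply = llm.invoke(to_messages(conversation))  # reply is an AIMessage
```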

## Prompts

In the LangChain paradigm, a prompt is a structured input for a model. It can include a system message, user input, or message history. The `langchain` package provides various tools for prompt templating. The following table lists the most popular templating classes and their descriptions.

| Class / Function                        | Description                                                                 |
|----------------------------------------|-----------------------------------------------------------------------------|
| **BasePromptTemplate**                 | Abstract base class for all prompt templates.                               |
| **StringPromptTemplate**               | Base class for string-based templates (like f-string).                      |
| **PromptTemplate**                     | Core template class for generating prompts with variables. Supports methods like `from_template`, `from_file`, `from_examples`, `format`, `invoke`, `ainvoke`, and batching. |
| **FewShotPromptTemplate**              | String-based prompt template with few-shot example support.                 |
| **FewShotPromptWithTemplates**         | String template variant with embedded few-shot examples.                    |
| **PipelinePromptTemplate**             | Combines multiple prompt templates into a pipeline.                         |
| **BaseChatPromptTemplate**             | Base class for chat-style prompt templates.                                 |
| **ChatPromptTemplate**                 | Template for chat models; build multi-role messages. Supports `from_messages` and dynamic placeholders. |
| **AgentScratchPadChatPromptTemplate**  | Specialized chat prompt for agent scratchpad patterns.                      |
| **AutoGPTPrompt**                      | Chat prompt variant used in AutoGPT-style workflows.                        |
| **BaseMessagePromptTemplate**          | Base for message-level prompt templates.                                    |
| **BaseStringMessagePromptTemplate**    | Base class for message templates using string patterns.                     |
| **ChatMessagePromptTemplate**          | Generates chat messages (with roles, e.g. system/human/AI) from template strings. |
| **HumanMessagePromptTemplate**         | Template specifically for human messages.                                   |
| **AIMessagePromptTemplate**            | Template specifically for AI messages.                                      |
| **SystemMessagePromptTemplate**        | Template specifically for system messages.                                  |
| **MessagesPlaceholder**                | Placeholder to inject dynamic message history into a chat template.         |

---

Consider the `PromptTemplate` class. You can use the `from_template` method to create a template. Placeholders are written in curly braces, such as `{here}`. The `format` method of the `PromptTemplate` class returns a string with all values substituted.

In [23]:
from langchain.prompts import PromptTemplate

ans = PromptTemplate.from_template("Your input is: {here}")
print(type(ans))
ans.format(here="Hello!")

<class 'langchain_core.prompts.prompt.PromptTemplate'>


'Your input is: Hello!'
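
For simple templates like this one, the substitution behaves much like Python's own `str.format`. The sketch below is an analogy using only the standard library, not the library's implementation: it extracts the placeholder names from a template and fills them in.

```python
from string import Formatter

template = "Your input is: {here}"

# Collect the placeholder names that appear in the template.
variables = [name for _, name, _, _ in Formatter().parse(template) if name]
print(variables)  # ['here']

# Fill the placeholders with concrete values.
filled = template.format(here="Hello!")
print(filled)  # Your input is: Hello!
```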