LangChain integrates with several open-weight LLMs (Large Language Models) that you can run without paying any API fees. Here are some options:

### **1. Local Open-Source LLMs (Run on Your Own Hardware)**
These models are free to use but require you to download and run them locally (or on a free cloud instance like Google Colab). LangChain integrates with many via `HuggingFacePipeline` or `Ollama`.

#### **Popular Free Models:**
- **Mistral 7B / Mistral 7B Instruct** (Small but powerful)
- **Llama 2 (7B, 13B, 70B)** (Meta’s open-weight model; free, but you must accept Meta’s license before downloading)
- **Zephyr 7B** (Fine-tuned Mistral for chat)
- **Gemma (2B/7B)** (Google’s lightweight open model)
- **Phi-2 (2.7B)** (Microsoft’s small but capable model)

#### **How to Use Them in LangChain:**
- Via **Ollama** (easy local setup):
  ```python
  from langchain_ollama import OllamaLLM  # same package the notebook cells below use
  llm = OllamaLLM(model="mistral")  # or "llama2", "zephyr", etc.
  ```
- Via **HuggingFace Pipeline** (a GPU is strongly recommended):
  ```python
  from langchain_community.llms import HuggingFacePipeline
  from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

  model_name = "mistralai/Mistral-7B-Instruct-v0.1"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(model_name)
  pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
  llm = HuggingFacePipeline(pipeline=pipe)
  ```

### **2. Free API-based LLMs**
Some hosted APIs offer limited free tiers, but truly free ones are rare. A few options:
- **Ollama API** (free, but you host the server yourself)
- **HuggingFace Inference API** (free for small models)
- **LocalAI** (self-hosted OpenAI-compatible API)

### **Best Choice?**
- If you have a decent GPU, run **Mistral 7B** or **Zephyr** locally via Ollama.
- If you need a free API, try **HuggingFace’s free tier** for small models.


In [17]:
from langchain_ollama import OllamaLLM,OllamaEmbeddings

In [3]:
llm = OllamaLLM(model="mistral")

In [4]:
# Generate text
response = llm.invoke("Tell me a joke about AI.")
print(response)

 Why don't we trust AI with our secrets? Because it keeps everything in the cloud! (Cloud storage joke)

Or this one: Why did the AI go to therapy? Because it had too many issues with its neural network! (Therapy joke)


# How to use a vectorstore as a retriever

> - A vector store retriever is a retriever that uses a vector store to retrieve documents. It is a lightweight wrapper around the vector store class to make it conform to the retriever interface. It uses the search methods implemented by a vector store, like similarity search and MMR, to query the texts in the vector store.
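To make the "lightweight wrapper" idea concrete, here is a minimal pure-Python sketch. `Doc`, `ToyVectorStore`, and `ToyRetriever` are hypothetical names invented for illustration, not LangChain classes, and the toy retriever takes a query vector directly, whereas a real one would embed the query text first.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    vector: list[float]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical stand-in for a vector store with one search method."""

    def __init__(self, docs):
        self.docs = docs

    def similarity_search(self, query_vector, k=4):
        # Rank stored documents by similarity to the query, best first
        ranked = sorted(
            self.docs, key=lambda d: cosine(query_vector, d.vector), reverse=True
        )
        return ranked[:k]

    def as_retriever(self, search_kwargs=None):
        return ToyRetriever(self, search_kwargs or {})


class ToyRetriever:
    """Thin wrapper: exposes invoke() and delegates to the store's search."""

    def __init__(self, store, search_kwargs):
        self.store = store
        self.search_kwargs = search_kwargs

    def invoke(self, query_vector):
        return self.store.similarity_search(query_vector, **self.search_kwargs)
```

The retriever holds no data of its own; it only forwards the query and any `search_kwargs` to the store, which is the pattern the rest of this notebook relies on.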

## Creating a retriever from a vectorstore

You can build a retriever from a vectorstore using its `.as_retriever` method. Let's walk through an example.

First we instantiate a vectorstore. We will use an in-memory FAISS vectorstore.

In [2]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaLLM,OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

In [7]:
loader = TextLoader("state_of_the_union.txt",encoding="utf-8")

In [15]:
documents = loader.load()

In [18]:
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

In [19]:
embeddings = OllamaEmbeddings(model="mistral")

In [20]:
vectorstore = FAISS.from_documents(texts, embeddings)

- We can then instantiate a retriever:

In [21]:
retriever = vectorstore.as_retriever()

- This creates a retriever (specifically a VectorStoreRetriever), which we can use in the usual way:

In [22]:
docs = retriever.invoke("what did the president say about ketanji brown jackson?")

In [24]:
len(docs)

4

In [25]:
docs[0]

Document(id='0e5a1922-9689-4efb-9020-d85914f23ec2', metadata={'source': 'state_of_the_union.txt'}, page_content='One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.')

## Maximum marginal relevance retrieval
By default, the vector store retriever uses similarity search. If the underlying vector store supports maximum marginal relevance search, you can specify that as the search type.

This effectively specifies what method on the underlying vectorstore is used (e.g., `similarity_search`, `max_marginal_relevance_search`, etc.).
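For intuition about what MMR does differently from plain similarity search, here is a small self-contained sketch of the selection rule. It is an illustration under simplifying assumptions (cosine similarity, tiny hand-written vectors, a hypothetical `mmr` helper), not the vector store's actual implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Return indices of k documents, trading relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Highest similarity to anything already picked (0 if nothing picked)
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[0.95, 0.1], [0.9, 0.15], [0.7, -0.7]]
print(mmr(query, docs, k=2, lambda_mult=0.5))  # diversity-aware pick
print(mmr(query, docs, k=2, lambda_mult=1.0))  # pure relevance ranking
```

With `lambda_mult=1.0` the redundancy term vanishes and the result is an ordinary similarity ranking; lower values favor documents that differ from those already selected.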

In [26]:
retriever = vectorstore.as_retriever(search_type="mmr")

In [27]:
docs = retriever.invoke("what did the president say about ketanji brown jackson?")

In [28]:
docs[0]

Document(id='0e5a1922-9689-4efb-9020-d85914f23ec2', metadata={'source': 'state_of_the_union.txt'}, page_content='One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.')

## Passing search parameters

We can pass parameters to the underlying vectorstore's search methods using `search_kwargs`.

### Similarity score threshold retrieval

For example, we can set a similarity score threshold and only return documents with a score above that threshold.

In [29]:
retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.5}
)

In [30]:
docs = retriever.invoke("what did the president say about ketanji brown jackson?")

No relevant docs were retrieved using the relevance score threshold 0.5
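The message above means that no chunk reached a relevance score of 0.5. The filtering step itself is simple; the sketch below uses made-up scores and a hypothetical `filter_by_threshold` helper to show why a threshold that is too strict for a given embedding model returns nothing.

```python
def filter_by_threshold(scored_docs, score_threshold):
    """Keep only documents whose relevance score meets the threshold."""
    return [doc for doc, score in scored_docs if score >= score_threshold]


# Hypothetical (document, relevance score) pairs, for illustration only
scored = [("chunk about the Supreme Court", 0.42), ("chunk about taxes", 0.31)]

print(filter_by_threshold(scored, 0.5))  # nothing clears the bar
print(filter_by_threshold(scored, 0.4))  # a looser threshold keeps one chunk
```

Because relevance scores depend on the embedding model and distance metric, a workable threshold usually has to be found by inspecting the scores your own store returns.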


### Specifying top k

We can also limit the number of documents `k` returned by the retriever.

In [31]:
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})

In [32]:
docs = retriever.invoke("what did the president say about ketanji brown jackson?")
len(docs)

1

In [33]:
docs[0]

Document(id='0e5a1922-9689-4efb-9020-d85914f23ec2', metadata={'source': 'state_of_the_union.txt'}, page_content='One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.')

___________________________________________

#### **Understanding the MultiQueryRetriever**  

When searching a vector database (a system that stores data as numerical vectors), the system compares how "close" your search query is to stored documents using mathematical distance measurements. However, small changes in how you phrase your question can lead to very different results. Additionally, if the way the data is converted into vectors (embeddings) doesn’t fully capture the meaning, the search might miss relevant information.  

To fix this, people often tweak their search prompts manually—but this can be time-consuming.  

#### **How the MultiQueryRetriever Helps**  

Instead of manually adjusting queries, the **MultiQueryRetriever** uses a large language model (LLM) to automatically generate **multiple versions** of your original question from different angles. For example:  
- If you ask, *"How does photosynthesis work?"*, the retriever might also generate:  
  - *"Explain the process of photosynthesis in plants."*  
  - *"What are the steps involved in converting sunlight into energy in plants?"*  

It then searches the database for each variation and combines the results, ensuring a broader and more accurate set of documents.  
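The search-and-combine step described above can be sketched in a few lines of plain Python. The names `unique_union` and `multi_query_retrieve`, the stub query generator, and the tiny index are all assumptions made for illustration; the real retriever calls an LLM and a vector store instead.

```python
def unique_union(result_lists):
    """Merge several result lists, keeping the first occurrence of each document."""
    seen = set()
    merged = []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


def multi_query_retrieve(question, generate_queries, search):
    """Run one search per generated query and return the de-duplicated union."""
    queries = generate_queries(question)
    return unique_union(search(q) for q in queries)


# Stubs standing in for the LLM and the vector store
def fake_generate(question):
    return ["photosynthesis steps", "how plants convert sunlight"]


fake_index = {
    "photosynthesis steps": ["doc-light-reactions", "doc-calvin-cycle"],
    "how plants convert sunlight": ["doc-calvin-cycle", "doc-chlorophyll"],
}

print(multi_query_retrieve("How does photosynthesis work?", fake_generate, fake_index.get))
```

The overlapping document appears once in the result, which is why the number of documents returned later in this notebook is smaller than queries times documents per query.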

#### **Why This Works Better**  
- Reduces dependency on a single phrasing of the question.  
- Captures related concepts that a basic search might miss.  
- Saves time compared to manual prompt engineering.  

#### **Example Setup**  
Let’s create a vector database using a blog post (like Lilian Weng’s *"LLM Powered Autonomous Agents"*) and apply the MultiQueryRetriever to improve search results.  

In [46]:
# Build a sample vectorDB
import os
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [38]:
# Set USER_AGENT to avoid warnings
os.environ["USER_AGENT"] = "LangChain-RAG-App/1.0"

In [55]:
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/",
    header_template={"User-Agent": os.environ["USER_AGENT"]}
)


In [56]:
data = loader.load()

In [57]:
# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)

In [58]:
vectordb = Chroma.from_documents(documents=splits, embedding=embeddings)

- Specify the LLM to use for query generation, and the retriever will do the rest

In [59]:
llm = OllamaLLM(model="mistral")

In [60]:
question = "What are the approaches to Task Decomposition?"
retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=vectordb.as_retriever(), llm=llm
)

In [61]:
# Set logging for the queries
import logging

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

In [62]:
unique_docs = retriever_from_llm.invoke(question)
len(unique_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. List the methods for breaking down complex tasks into smaller, manageable parts.', '2. Can you find documents detailing strategies for dividing a large task into multiple subtasks?', "3. I'm looking for information on techniques used to decompose a complex problem into smaller, solvable pieces."]


10

### **Customizing the MultiQueryRetriever's Prompt**  

The **MultiQueryRetriever** automatically generates different versions of your search query to improve retrieval results. By default, it uses a predefined prompt, but you can customize it for better control.  

#### **How Query Generation Works**  
- The retriever uses an **LLM** to create multiple variations of your input question.  
- These generated queries are logged at the **INFO** level (useful for debugging).  
- Each query retrieves documents, and the final result combines all unique matches.  

#### **Customizing the Prompt**  
If you want to change how queries are generated:  

1. **Create a Custom PromptTemplate**  
   - Define a `PromptTemplate` with an input variable (e.g., `{question}`) for the original query.  
   - Example:  
     ```python
     from langchain.prompts import PromptTemplate

     custom_prompt = PromptTemplate(
         input_variables=["question"],
         template="""Generate 3 different versions of the following question to improve retrieval:
         Original: {question}
         Variations:"""
     )
     ```

2. **Use an Output Parser**  
   - The LLM’s response must be split into a list of queries.  
   - Example parser (splits on newlines):  
     ```python
     def parse_queries(output: str) -> list[str]:
         # Keep one query per non-empty line, trimming surrounding whitespace
         return [q.strip() for q in output.split("\n") if q.strip()]

     # Or use a more advanced parser if needed
     ```

3. **Pass Both to the Retriever**  
   - Configure the `MultiQueryRetriever` with your custom prompt and parser.  
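Models often number their output ("1. ...", "2. ..."), and those prefixes end up inside the queries unless the parser removes them. A slightly more defensive parser might look like the sketch below; `parse_numbered_lines` is a hypothetical helper, and the set of prefixes it strips is an assumption about typical LLM output.

```python
import re


def parse_numbered_lines(output: str) -> list[str]:
    """Split LLM output into queries, dropping blank lines and list markers."""
    queries = []
    for line in output.splitlines():
        # Strip leading "1. ", "2) ", "- " or "* " markers if present
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s+", "", line).strip()
        if cleaned:
            queries.append(cleaned)
    return queries


print(parse_numbered_lines("1. What is task decomposition?\n\n2) How are tasks split?"))
```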

#### **Why Customize?**  
- **Control query style** (e.g., force technical vs. simple language).  
- **Adjust for domain-specific needs** (e.g., legal vs. casual searches).  
- **Debugging** (logged queries help fine-tune prompts).  

In [63]:
from typing import List

from langchain_core.output_parsers import BaseOutputParser
from langchain_core.prompts import PromptTemplate
from pydantic import BaseModel, Field



- **`BaseOutputParser`**: Base class for parsing structured output from LLMs.  
- **`PromptTemplate`**: Used to define reusable prompt structures with variables.  
- **`pydantic.BaseModel` & `Field`**: Used for data validation (though not directly used in this snippet).  

---
```python
class LineListOutputParser(BaseOutputParser[List[str]]):
    """Output parser for a list of lines."""
```
- **Purpose**: Defines a custom parser to split LLM output (text) into a list of strings.  
- **Inherits from `BaseOutputParser`**: Ensures compatibility with LangChain's parsing system.  

```python
    def parse(self, text: str) -> List[str]:
        lines = text.strip().split("\n")
        return list(filter(None, lines))  # Remove empty lines
```
- **`text.strip()`**: Removes leading/trailing whitespace.  
- **`split("\n")`**: Splits text by newlines into a list.  
- **`filter(None, lines)`**: Removes empty strings (e.g., from extra newlines).  

```python
output_parser = LineListOutputParser()
```
- **Instantiates the parser** for later use in the chain.  

---
```python
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five 
    different versions of the given user question to retrieve relevant documents from a vector 
    database. By generating multiple perspectives on the user question, your goal is to help
    the user overcome some of the limitations of the distance-based similarity search. 
    Provide these alternative questions separated by newlines.
    Original question: {question}""",
)
```
- **`input_variables=["question"]`**: Defines `{question}` as the dynamic variable in the prompt.  
- **`template`**: Instructs the LLM to generate 5 query variations, separated by newlines.  

---

```python
llm_chain = QUERY_PROMPT | llm | output_parser
```
- **`|` (Pipe Operator)**: Chains components together sequentially (LangChain's syntax).  
- **Flow**:  
  1. `QUERY_PROMPT` injects the user's `question` into the template.  
  2. `llm` generates text (5 query variations).  
  3. `output_parser` splits the LLM's response into a clean list of queries.  
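To see how a `|` chain can pass one step's output to the next, here is a toy sketch built on Python's `__or__` hook. The `Step` class and the canned "LLM" reply are assumptions for illustration; LangChain's own runnable machinery is considerably more involved.

```python
class Step:
    """Toy pipeline stage: wraps a function and composes with the | operator."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # (self | other) runs self first, then feeds its output to other
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.fn(value)


prompt = Step(lambda inputs: f"Rewrite this question two ways:\n{inputs['question']}")
fake_llm = Step(lambda text: "What is X?\n\nHow does X work?")  # canned reply
parser = Step(lambda text: [line for line in text.strip().split("\n") if line])

chain = prompt | fake_llm | parser
print(chain.invoke({"question": "Explain X"}))
```

Each stage only needs to agree with its neighbors on input and output types: a dict goes into the prompt, a string comes out of the model, and a list of strings comes out of the parser.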

---

### **Key Notes**
- **Output Parser**: Ensures the LLM's response is formatted as a list (e.g., for `MultiQueryRetriever`).  
- **Prompt Engineering**: The template explicitly guides the LLM to generate diverse queries.  
- **Temperature**: The LLM above is created with its default temperature; lowering it to 0 would make the generated queries more consistent.  

In [64]:
# Output parser will split the LLM result into a list of queries
class LineListOutputParser(BaseOutputParser[List[str]]):
    """Output parser for a list of lines."""

    def parse(self, text: str) -> List[str]:
        lines = text.strip().split("\n")
        return list(filter(None, lines))  # Remove empty lines


output_parser = LineListOutputParser()

QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five 
    different versions of the given user question to retrieve relevant documents from a vector 
    database. By generating multiple perspectives on the user question, your goal is to help
    the user overcome some of the limitations of the distance-based similarity search. 
    Provide these alternative questions separated by newlines.
    Original question: {question}""",
)

In [65]:
# Chain
llm_chain = QUERY_PROMPT | llm | output_parser

# Other inputs
question = "What are the approaches to Task Decomposition?"

In [74]:
# Run
retriever = MultiQueryRetriever(
    retriever=vectordb.as_retriever(),  # default search settings
    llm_chain=llm_chain,
    parser_key="lines",  # "lines" is the key (attribute name) of the parsed output
)
# To limit how many documents each generated query returns, pass search_kwargs
# to as_retriever, e.g. vectordb.as_retriever(search_kwargs={"k": 1}).

# Results
unique_docs = retriever.invoke("What does the course say about regression?")
len(unique_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. Can you find documents that explain the concepts covered in the course regarding regression?', "2. I'm looking for information about the topic of regression as discussed in this course.", '3. Could you provide me with materials that detail the aspects of regression taught in the given course?', "4. What are the details about regression mentioned in the course I'm currently taking?", "5. I need documents that delve into the regression part of the course I'm enrolled in."]


11

In [75]:
unique_docs

[Document(id='dc3e0c2e-b4e3-4f2a-ac4d-0c17c72f6577', metadata={'language': 'en', 'source': 'https://en.wikipedia.org/wiki/Large_language_model', 'title': 'Large language model - Wikipedia'}, page_content='to the LLM planner can even be the LaTeX code of a paper describing the environment.[62]'),
 Document(id='440bec54-f4fc-4a11-8716-354f4eed53e5', metadata={'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient h

# **Using Pydantic for Structured, Validated LLM Outputs**  

When working with LLMs, we often want structured outputs (e.g., JSON) instead of raw text. **Pydantic** makes this easy by enforcing type safety and validation.  

#### **How It Works**  
1. **Define a Pydantic Model**  
   - Create a class specifying the expected fields and types.  
   - Example:  
     ```python
     from pydantic import BaseModel

     class UserProfile(BaseModel):
         name: str
         age: int
         email: str | None = None  # Optional field
     ```

2. **Pass the Model to the LLM**  
   - LangChain (or the OpenAI SDK) can instruct the LLM to produce output matching the schema, for example through an output parser such as `PydanticOutputParser`.  

3. **Automatic Validation**  
   - Pydantic checks:  
     - Required fields (e.g., `name` can’t be missing).  
     - Data types (e.g., `age` must be an `int`).  
   - Raises `ValidationError` if the LLM’s output doesn’t match.  
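The two checks listed above can be imitated with the standard library alone, which helps show what validation catches. This is a rough stand-in, not Pydantic: `SCHEMA` and `validate_profile` are hypothetical, and it raises `ValueError` rather than Pydantic's `ValidationError`. Pydantic itself does more, such as type coercion and nested models.

```python
import json

# Required fields and their expected types (mirrors name/age from UserProfile)
SCHEMA = {"name": str, "age": int}


def validate_profile(raw_json: str) -> dict:
    """Parse JSON text and check required fields and their types."""
    data = json.loads(raw_json)
    for field, expected_type in SCHEMA.items():
        if field not in data:
            raise ValueError(f"missing required field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"{field} must be of type {expected_type.__name__}")
    return data


print(validate_profile('{"name": "Ada", "age": 36}'))

try:
    validate_profile('{"name": "Ada", "age": "thirty-six"}')
except ValueError as err:
    print("rejected:", err)
```

Failing fast at this point means a malformed LLM response is caught before it reaches downstream code.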

#### **Key Benefits**  
✅ **Structured Outputs** – No more parsing messy text.  
✅ **Validation** – Catches LLM hallucinations early.  
✅ **Integration** – Works with LangChain, OpenAI, and others.  

#### **Use Cases**  
- Extracting entities (e.g., dates, names).  
- APIs requiring strict input formats.  
- Building RAG systems with validated responses.  

In [83]:
from pydantic import BaseModel, Field
from typing import Optional,List
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser

In [105]:
class customResearch(BaseModel):
    title: str = Field(description="The title of the query")
    summary: str = Field(description="The summary of the answer to the query")
    keyword: List[str] = Field(
         description="Keywords from the raw response to the query; give at least 10"
    )

In [114]:
# Step 1: Create a prompt that forces JSON output
template = """Return a JSON object matching this schema:
{{
    "title": "string",
    "keyword": ["string"],
    "summary": "string"
}}

Generate a research topic about {input}. Give me at least 10 keywords and a long summary. Output ONLY valid JSON:"""
prompt = PromptTemplate.from_template(template)

In [115]:
# Step 2: Set up the parser
parser = PydanticOutputParser(pydantic_object=customResearch)

In [116]:
# Step 3: Chain everything together
chain = prompt | llm | parser

In [117]:
# Run it
result = chain.invoke({"input": "AI ethics"})
print(result)
# Prints the validated customResearch fields: title='...' summary='...' keyword=['...']

title='Exploring the Ethical Implications of Artificial Intelligence' summary='This research delves into the complex ethical landscape of Artificial Intelligence (AI). It examines the philosophical underpinnings of ethics, focusing on how these principles apply to AI autonomous decision-making. The study also investigates the impact of bias and fairness in AI systems and the need for diversity in data sets. Privacy concerns, transparency, accountability, and safety are critical areas addressed within this research. Furthermore, it considers the potential for effective human-AI collaboration and proposes regulatory frameworks to ensure ethical use of AI.' keyword=['Artificial Intelligence', 'Ethics', 'Moral Philosophy', 'Autonomous Decision-making', 'Bias and Fairness', 'Privacy', 'Transparency', 'Accountability', 'Safety and Security', 'Human-AI Collaboration', 'Regulation']
