# **🦜🔗 LangChain RoadMap**

![author](https://img.shields.io/badge/author-mohd--faizy-red)



### **🧩 What is LangChain?**

* Open-source framework (Python & JS/TS) for building LLM-powered applications such as `chatbots`, `RAG` systems, `agents`, and `pipelines`.


### **📅 Timeline & Major Releases**


| 📅 Date          | ⚙️ Release             | 📝 Details |
| ---------------- | ---------------------- | ---------- |
| **Oct 2022**     | **Project Launch**     | Harrison Chase launched LangChain as an open-source project, the initial release of what grew into a robust LLM application framework |
| **May 10, 2024** | **v0.2 Pre-Release**   | Introduced separation of `langchain` & `langchain-community`, versioned docs, a more mature agent framework (LangGraph), standardized tool calling, improved streaming, and over 30 partner packages |
| **May 20, 2024** | **v0.2 Full Release**  | Finalized the documentation refresh with a clear structure (tutorials, guides, API reference), an event streaming API, async support, and further LangGraph enhancements |
| **Sep 16, 2024** | **v0.3 (Python & JS)** | Major upgrade: full switch to Pydantic v2 (Pydantic v1 reached end-of-life in June 2024) and dropped Python 3.8 support (end-of-life Oct 2024); in the JS ecosystem, `@langchain/core` became a peer dependency, callbacks became non-blocking, and older loaders/entrypoints were deprecated |






## **⭕1. Fundamentals**

### **⭐a. What is LangChain?**

- **LangChain** is an open-source framework designed to simplify the development of **LLM-based applications**. It provides **modular components** and **end-to-end tools** that help developers build complex AI systems such as:

    -  🤖 `Chatbots`
    -  ❓ `Question-answering systems`
    -  🔍 `Retrieval-Augmented Generation (RAG) applications`
    -  🧠 `Autonomous AI agents`
    -  🧰 `Tool-using agents and more`

- 🔑 **Key Features:**

    1. ✅ **Supports all major LLMs** (`OpenAI`, `Anthropic`, `Gemini`, `Hugging Face`, etc.). It is a **model-agnostic framework**: the same application code can work with different large language model (LLM) providers.
    2. ⚙️ **Simplifies LLM-based app development** through reusable components like `Chains`, `Prompts`, `Memory`, and `Runnables`.
    3. 🔌 **Integrates with key tools** such as `vector stores`, `APIs`, and `databases`.
    4. 🆓 **Free and open-source**, with an active community and continuous updates.
    5. 🌐 **Supports all major GenAI use cases**, making it a versatile framework for developers.

### **⭐b. LangChain Components**

- **`Models`**: Interface for LLMs and chat models.
- **`Prompts`**: Templates for structured inputs to LLMs.
- **`Chains`**: Sequences of calls (models, tools).
- **`Indexes`**: Document loaders, text splitters, vector stores, and retrievers for structured data retrieval.
- **`Memory`**: Lets the LLM maintain the flow of a conversation by keeping the conversation history in context.
- **`Agents`**: Use LLMs as a reasoning engine to determine which actions to take.

### **⭐c. Models**

- LangChain supports multiple LLMs (`OpenAI`, `HuggingFace`, `Cohere`, etc.) via standard interfaces.
- You define which model to use and optionally provide `API` keys or configurations.





### **⭐d. Prompts**

- Prompts are templates or examples designed to guide LLM responses.  
- LangChain uses **PromptTemplates** to standardize and reuse prompts.  
- You can include variables and format them at runtime.
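
As a rough illustration of variables being filled in at runtime, here is a plain-Python stand-in for a prompt template. It uses only the standard library and is not LangChain's `PromptTemplate` class; the `SimplePromptTemplate` name and its methods are hypothetical.

```python
from string import Formatter


class SimplePromptTemplate:
    """Hypothetical stand-in for a prompt template: text with named placeholders."""

    def __init__(self, template: str):
        self.template = template
        # Collect the placeholder names, e.g. "{topic}" gives "topic".
        self.input_variables = sorted(
            {name for _, name, _, _ in Formatter().parse(template) if name}
        )

    def format(self, **values: str) -> str:
        # Fail early with a clear message if a variable is missing.
        missing = [v for v in self.input_variables if v not in values]
        if missing:
            raise KeyError(f"missing prompt variables: {missing}")
        return self.template.format(**values)


prompt = SimplePromptTemplate("Explain {topic} to a {audience} in two sentences.")
text = prompt.format(topic="vector stores", audience="beginner")
print(text)
```

Keeping the template separate from its values is what makes the same prompt reusable across many inputs.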



### **⭐e. Parsing Output**

Post-processing is often necessary to convert LLM output into structured formats. LangChain provides:

* **Regex parsers**  
* **Pydantic-based output parsers**  
* **Structured JSON parsers**
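
To make the idea concrete, the sketch below combines a regex with JSON parsing to turn free-form model text into a typed object. It uses only the standard library (a dataclass stands in for a Pydantic model), and the `Answer` type, `parse_json_answer` function, and sample output are all made up for illustration.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class Answer:
    """Target structure we want the raw LLM text converted into."""
    city: str
    population_millions: float


def parse_json_answer(raw: str) -> Answer:
    # Models often wrap JSON in extra prose, so grab the outermost {...} span.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    data = json.loads(match.group(0))
    # Building the dataclass raises KeyError if an expected key is absent.
    return Answer(city=str(data["city"]),
                  population_millions=float(data["population_millions"]))


# Hypothetical raw model output, hard-coded so the example runs offline.
raw_output = 'Sure! Here is the result:\n{"city": "Tokyo", "population_millions": 37.4}'
answer = parse_json_answer(raw_output)
print(answer)
```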



### **⭐f. Runnables & LCEL (LangChain Expression Language)**

In LangChain 0.1 and later, **Runnables** are the functional abstraction shared by all core LangChain components.  
LCEL lets you compose Runnables declaratively with the `|` (pipe) operator, which passes the output of one step to the next; `+` can be used to concatenate prompt templates.

```python
chain = prompt | llm | output_parser
```
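
To show what the pipe operator does conceptually, here is a minimal plain-Python sketch of `|` composition. The `Step` class is a hypothetical stand-in, not LangChain's Runnable implementation, and the "model" is a stub that makes no network call.

```python
class Step:
    """Hypothetical mini-Runnable: wraps a function and supports `|` composition."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stand-ins for a prompt, a model, and an output parser (no real LLM is called).
prompt = Step(lambda topic: f"Write one word about {topic}.")
fake_llm = Step(lambda text: f"  ECHO: {text}  ")
output_parser = Step(lambda text: text.strip())

chain = prompt | fake_llm | output_parser
result = chain.invoke("LCEL")
print(result)
```

Because every step exposes the same `invoke` method, any step can be swapped out without touching the rest of the pipeline.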



### **⭐g. Chains**

Chains are sequences of calls that produce a final result. LangChain provides:

* **Simple chains** (e.g., Prompt + LLM)  
* **Sequential chains**  
* **Custom chains** (your logic)

Useful for workflows like:

* Input → Prompt → LLM → Output  
* Multi-step reasoning tasks
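
A sequential chain can be pictured as steps that read from and write to a shared state. The sketch below assumes that model and uses plain Python; `run_sequential`, `outline`, and `draft` are hypothetical names, and each step returns a hard-coded string where a real chain would call an LLM.

```python
from typing import Callable

State = dict[str, str]


def run_sequential(steps: list[Callable[[State], State]], state: State) -> State:
    """Run each step in order; every step returns new keys merged into the state."""
    for step in steps:
        state = {**state, **step(state)}
    return state


# Each step is a stand-in for "prompt + LLM"; the outputs are hard-coded rules.
def outline(state: State) -> State:
    return {"outline": f"1. What is {state['topic']}? 2. Why use it?"}


def draft(state: State) -> State:
    return {"draft": f"Draft based on: {state['outline']}"}


final = run_sequential([outline, draft], {"topic": "RAG"})
print(final["draft"])
```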



### **⭐h. Memory**

Memory allows LangChain apps to remember past interactions. Useful in:

* Chatbots (session history)  
* Agents (tool-use history)  

Types include:

* **BufferMemory** (`ConversationBufferMemory`): Stores raw input/output  
* **SummaryMemory** (`ConversationSummaryMemory`): Stores summarized history  
* **TokenBufferMemory** (`ConversationTokenBufferMemory`): Limited by token count
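
The difference between a raw buffer and a token-limited buffer can be sketched in a few lines of plain Python. The classes below are hypothetical stand-ins rather than LangChain's memory classes, and they assume that whitespace-separated words approximate tokens.

```python
class BufferMemory:
    """Hypothetical buffer memory: keeps every (role, text) turn verbatim."""

    def __init__(self):
        self.turns: list[tuple[str, str]] = []

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))

    def as_context(self) -> str:
        return "\n".join(f"{role}: {text}" for role, text in self.turns)


class TokenBufferMemory(BufferMemory):
    """Keeps only the most recent turns that fit in a token budget."""

    def __init__(self, max_tokens: int):
        super().__init__()
        self.max_tokens = max_tokens

    def _size(self) -> int:
        # Assumption: whitespace-separated words approximate tokens.
        return sum(len(text.split()) for _, text in self.turns)

    def add(self, role: str, text: str) -> None:
        super().add(role, text)
        while self._size() > self.max_tokens and len(self.turns) > 1:
            self.turns.pop(0)  # drop the oldest turn first


memory = TokenBufferMemory(max_tokens=10)
memory.add("user", "My name is Ada")        # 4 words
memory.add("ai", "Nice to meet you Ada")    # 5 words, total 9
memory.add("user", "What is my name")       # 4 words, total 13: oldest turn dropped
print(memory.as_context())
```

A summary-style memory would replace the dropped turns with a condensed version instead of discarding them.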

## **⭕2. Retrieval-Augmented Generation**

### **⭐a. Document Loaders**

These ingest data from various sources:
- Files (`PDF`, `.txt`, `CSV`)  
- Web pages (via scraping)  
- `APIs` or `databases`  

LangChain supports many prebuilt loaders.
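
As a rough picture of what a loader produces, the sketch below reads a text file and a CSV file into simple document records. It relies only on the standard library; the `Document` dataclass and the `load_text` / `load_csv` helpers are hypothetical stand-ins, with field names chosen to resemble the usual "content plus metadata" shape.

```python
import csv
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Document:
    """Hypothetical document record: the text plus where it came from."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path: Path) -> list[Document]:
    # One document per file.
    return [Document(path.read_text(encoding="utf-8"), {"source": str(path)})]


def load_csv(path: Path) -> list[Document]:
    # One document per row, rendered as "column: value" lines.
    with path.open(newline="", encoding="utf-8") as handle:
        return [
            Document("\n".join(f"{k}: {v}" for k, v in row.items()),
                     {"source": str(path), "row": i})
            for i, row in enumerate(csv.DictReader(handle))
        ]


# Write two small sample files to a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    txt = Path(tmp) / "notes.txt"
    txt.write_text("LangChain loads documents.", encoding="utf-8")
    table = Path(tmp) / "faq.csv"
    table.write_text("question,answer\nWhat is RAG?,Retrieval plus generation\n",
                     encoding="utf-8")
    docs = load_text(txt) + load_csv(table)

print(len(docs), docs[1].page_content)
```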

### **⭐b. Text Splitters**

Large documents must be split into chunks to stay within model token limits. Splitters include:

- **`RecursiveCharacterTextSplitter`**: Preserves structure by splitting on larger separators (such as paragraphs) before smaller ones  
- **`TokenTextSplitter`**: Splits by model tokens

Custom splitting strategies are also supported.
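
A custom strategy can be very small. The sketch below splits on words with a configurable overlap so that neighbouring chunks share some context; `split_words` is a hypothetical helper, and sizes are measured in words here only to keep the example readable (real splitters usually count characters or tokens).

```python
def split_words(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunks of chunk_size words, repeating chunk_overlap
    words between consecutive chunks."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


chunks = split_words("one two three four five six seven", chunk_size=3, chunk_overlap=1)
print(chunks)
```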

### **⭐c. Embeddings**

Embeddings convert text chunks into vector representations for semantic similarity. Supported models:
-  `OpenAI` Embeddings  
-  `HuggingFace` Sentence Transformers  
-  `Cohere`, `Google`, etc.
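
To illustrate how vectors enable similarity comparison, the sketch below uses a toy word-count "embedding" and cosine similarity. This is only a stand-in: real embedding models produce dense learned vectors, and the `embed` function and vocabulary here are invented for the example.

```python
import math
from collections import Counter


def embed(text: str, vocabulary: list[str]) -> list[float]:
    """Toy embedding: one dimension per vocabulary word, value = word count."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


vocabulary = ["cat", "dog", "pet", "stock", "market"]
query = embed("my pet cat", vocabulary)
doc_a = embed("a cat is a pet", vocabulary)
doc_b = embed("the stock market fell", vocabulary)

# The pet-related document should score higher than the finance one.
print(cosine_similarity(query, doc_a), cosine_similarity(query, doc_b))
```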

### **⭐d. Vector Stores**

Storage solutions for embeddings. They support similarity search (k-NN). Popular vector stores:

-  `FAISS (in-memory)`
-  `Pinecone`
-  `Chroma`
-  `Weaviate`
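
Conceptually, a vector store pairs stored vectors with their text and answers nearest-neighbour queries. The sketch below is a brute-force, in-memory version written in plain Python; the `InMemoryVectorStore` class is hypothetical and the 2-D vectors are hand-made stand-ins for real embeddings. Production stores typically use approximate indexes to scale far beyond a linear scan.

```python
import math


class InMemoryVectorStore:
    """Hypothetical brute-force store: keeps (vector, text) pairs in a list."""

    def __init__(self):
        self._items: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], text: str) -> None:
        self._items.append((vector, text))

    def similarity_search(self, query: list[float], k: int = 2) -> list[str]:
        # Exact k-NN: score every stored vector, then keep the k best.
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.hypot(*a) * math.hypot(*b)
            return dot / norm if norm else 0.0

        ranked = sorted(self._items, key=lambda item: cosine(query, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


store = InMemoryVectorStore()
# Hand-made 2-D vectors stand in for real embeddings.
store.add([1.0, 0.0], "about cats")
store.add([0.9, 0.1], "about kittens")
store.add([0.0, 1.0], "about finance")

results = store.similarity_search([1.0, 0.0], k=2)
print(results)
```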



### **⭐e. Retrievers**

- Interfaces that return the relevant chunks for a query, typically from a vector store.  
- Can apply filters and score thresholds, and can be combined for hybrid search (e.g., `BM25` + `vectors`).
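
One common way to merge a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below assumes that technique for illustration; it is not claimed to be what any particular retriever does internally, and the document IDs and the constant `c = 60` are example values.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], c: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (c + rank)) over the lists it appears in,
    with rank starting at 1; a higher total means more relevant.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (c + rank)
    # Sort by score (descending), breaking ties by ID for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["doc_b", "doc_a", "doc_d"]   # e.g. from a BM25 search
vector_hits = ["doc_a", "doc_c", "doc_b"]    # e.g. from a vector search
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that appear in both lists rise to the top, which is the point of hybrid search.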



### **⭐f. Building a RAG Application**
    
Combines:

1. Document ingestion (loaders)  
2. Chunking (splitters)  
3. Embedding + storing (vector store)  
4. Query-time retrieval  
5. LLM answering using retrieved context  
    
Use the `RetrievalQA` chain for a quick start (it is a legacy API in recent releases), or LCEL for more control.
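
The query-time half of that flow (retrieve, assemble the prompt, call the model) can be sketched end to end in plain Python. Everything here is a stand-in: `retrieve` ranks by shared words instead of embeddings, and `fake_llm` echoes the context instead of calling a model.

```python
def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by how many question words they share (stand-in for vector search)."""
    q_words = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: len(q_words & set(c.lower().split())), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


def fake_llm(prompt: str) -> str:
    # Placeholder for a real model call: it echoes the first context line.
    first_context = prompt.splitlines()[1]
    return f"Based on the context: {first_context[2:]}"


chunks = [
    "LangChain is a framework for LLM applications.",
    "FAISS is a library for similarity search.",
    "Bananas are rich in potassium.",
]
question = "What is LangChain a framework for?"
context = retrieve(question, chunks, k=1)
answer = fake_llm(build_prompt(question, context))
print(answer)
```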

## **⭕3. Agents**

### **⭐a. Tools & Toolkits**
- Tools are functions an agent can call. A tool may be:
    - `Web search`  
    - `Calculator`  
    - `API call`
    - `Database query`

> Toolkits group related tools (e.g., a SQL toolkit includes a query executor + schema inspector).
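
In plain Python terms, a tool is a function plus the name and description an agent uses to choose it, and a toolkit is a named group of such tools. The `Tool` dataclass and the two example functions below are hypothetical and only illustrate that shape.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    """Hypothetical tool record: what an agent needs to pick and call a function."""
    name: str
    description: str
    func: Callable[[str], str]


def add_numbers(expression: str) -> str:
    # Expects input like "2 + 3"; only addition is supported in this sketch.
    left, right = expression.split("+")
    return str(float(left) + float(right))


def word_count(text: str) -> str:
    return str(len(text.split()))


# A "toolkit" here is simply a named group of related tools.
basic_toolkit = {
    tool.name: tool
    for tool in [
        Tool("calculator", "Adds two numbers written as 'a + b'.", add_numbers),
        Tool("word_count", "Counts the words in a piece of text.", word_count),
    ]
}

print(basic_toolkit["calculator"].func("2 + 3"))
```

The description matters as much as the function: it is the text a model reads when deciding which tool fits the task.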



### **⭐b. Tool Calling**
- Agents decide whether and when to use tools by reasoning (guided by prompts). LangChain supports:
    - OpenAI function calling (tool descriptions)  
    - JSON mode (structured calls)  

> Agents use intermediate steps (thoughts, actions, observations) to reach answers.
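
A structured tool call boils down to the model emitting a tool name and arguments, which the application then dispatches. The sketch below assumes a simple JSON shape with `tool` and `arguments` keys; that shape, the registry, and the sample output are made up for illustration and do not mirror any provider's exact format.

```python
import json

# Tool registry: name to plain Python function (hypothetical examples).
TOOLS = {
    "get_length": lambda text: len(text),
    "to_upper": lambda text: text.upper(),
}


def dispatch(model_output: str):
    """Run the tool named in a JSON tool call such as
    {"tool": "to_upper", "arguments": {"text": "hi"}}."""
    call = json.loads(model_output)
    name = call["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# Hard-coded stand-in for what a model might emit when asked to call a tool.
model_output = '{"tool": "to_upper", "arguments": {"text": "hi"}}'
observation = dispatch(model_output)
print(observation)
```

The returned value is the "observation" that gets fed back to the model for its next step.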



### **⭐c. Building an AI Agent**

**Steps:**
1. Define tools  
2. Create prompt to guide agent behavior  
3. Choose agent type (`zero-shot-react`, `function-calling`, etc.)  
4. Initialize with `initialize_agent(...)` (a legacy helper; recent releases favor LangGraph-based agents)

- **Agents can:**
    - Reason over multiple steps  
    - Maintain state (via memory)  
    - Decide which tool to use, and how
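
The multi-step behaviour described above is essentially a loop: ask the model what to do, run the chosen tool, record the observation, repeat. The sketch below shows that loop in plain Python with a scripted stand-in for the model; `scripted_model`, `run_agent`, and the decision dictionary format are hypothetical.

```python
def calculator(expression: str) -> str:
    left, right = expression.split("*")
    return str(int(left) * int(right))


TOOLS = {"calculator": calculator}


# Scripted "model": returns the next decision given the observations so far.
# A real agent would ask an LLM here; this stub keeps the loop runnable offline.
def scripted_model(question: str, observations: list[str]) -> dict:
    if not observations:
        return {"thought": "I need to multiply.", "action": "calculator", "input": "6 * 7"}
    return {"thought": "I have the result.",
            "final_answer": f"The answer is {observations[-1]}."}


def run_agent(question: str, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        decision = scripted_model(question, observations)
        if "final_answer" in decision:
            return decision["final_answer"]
        # Act: call the chosen tool, then record what it returned.
        observations.append(TOOLS[decision["action"]](decision["input"]))
    return "Stopped: step limit reached."


answer = run_agent("What is 6 times 7?")
print(answer)
```

The step limit is a simple safeguard so that a model that never produces a final answer cannot loop forever.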

# **🟢Example**

## **⭐PDF-based RAG (Retrieval-Augmented Generation) Pipeline Architecture**

![LC_00_01](https://raw.githubusercontent.com/mohd-faizy/GenAI-with-Langchain-and-Huggingface/refs/heads/main/_img/_oth_img/LC_00_01.png)

- **PDF Upload**: A PDF is uploaded to `AWS S3`.
- **Document Loading**: The file is fetched using a Doc Loader.
- **Text Splitting**: The PDF is split into individual pages or chunks.
- **Embedding Generation**: Each chunk is converted into embeddings.
- **Vector Database**: These embeddings are stored in a vector database.
- **User Query**: The user query is also converted into an embedding.
- **Semantic Search**: A semantic `similarity` search is performed to find relevant pages/chunks.
- **Context Assembly**: Relevant document chunks and the user query are combined into a `system prompt`.
- **LLM Processing**: This prompt is sent to an LLM `API` to generate the final answer.



## 🔍 **LLM Framework Comparison: `LlamaIndex` vs `Haystack` vs `LangChain`**

| Feature / Criteria                    | **LlamaIndex**                               | **Haystack**                                          | **LangChain**                                          |
| ------------------------------------- | -------------------------------------------- | ----------------------------------------------------- | ------------------------------------------------------ |
| **Primary Use Case**                  | Document indexing & RAG pipelines            | Search & QA pipelines (initially Elasticsearch-based) | Modular LLM app building (RAG, agents, tools)          |
| **Modular Components**                | ✅ Nodes, Indexes, Retrievers, Engines        | ✅ Pipelines, Nodes                                    | ✅ Chains, Runnables, Agents, Memory                    |
| **Agent Support**                     | ⚠️ Limited (experimental or via plugins)     | ⚠️ Limited / basic (beta-level agent support)         | ✅ Full support (tools, memory, multi-step)             |
| **Retrieval-Augmented Generation**    | ✅ Native & optimized                         | ✅ Well-supported                                      | ✅ Fully supported with LCEL & RAG chains               |
| **Data Ingestion**                    | ✅ Document loaders & structured data         | ✅ Strong file + database support                      | ✅ Rich set of loaders (via integrations)               |
| **Integration with Vector Stores**    | ✅ FAISS, Chroma, Weaviate, Pinecone, etc.    | ✅ FAISS, Milvus, Elasticsearch, etc.                  | ✅ Wide integration with popular stores                 |
| **Embeddings Support**                | ✅ OpenAI, HuggingFace, Cohere, etc.          | ✅ Wide support via transformers                       | ✅ Full support via plugins + LLM wrappers              |
| **Memory Management**                 | ⚠️ Basic context injection                   | ⚠️ Session-based history                              | ✅ Buffer, Summary, TokenBufferMemory types             |
| **Tool Use (Calculator, APIs, etc.)** | ❌ Not native (needs LangChain-style agents)  | ⚠️ Custom node needed                                 | ✅ Native tool use with agents                          |
| **Custom Logic Composition**          | ✅ High-level with Composability APIs         | ✅ With custom pipelines                               | ✅ LCEL (`\|`, `+`) + Pythonic chaining                 |
| **Ease of Use**                       | ✅ Simple RAG workflows                       | ⚠️ Moderate (YAML or Python)                          | ⚠️ Moderate to steep learning curve                    |
| **Community & Docs**                  | ✅ Growing, great tutorials                   | ✅ Strong docs, active GitHub                          | ✅ Large community, frequent updates                    |
| **Language Support**                  | 🟢 Python (main), TypeScript (LlamaIndex.TS) | 🟢 Python (core), some JS/TS & Java work              | 🟢 Python (main), JavaScript/TypeScript (LangChain.js) |
| **Production Readiness**              | ✅ Stable core with experimental agents       | ✅ Used in enterprise (deployed search apps)           | ✅ Production-ready (widely adopted)                    |
| **Best For**                          | Fast RAG setup with structured data          | Hybrid QA systems, search with transformers           | Advanced LLM workflows, agents, tool use               |






### **✅ Summary**

| Use Case                   | Best Framework |
| -------------------------- | -------------- |
| Simple, fast RAG setup     | **LlamaIndex** |
| Hybrid search pipelines    | **Haystack**   |
| Tool-using agents & chains | **LangChain**  |

![divider.png](https://raw.githubusercontent.com/mohd-faizy/GenAI-with-Langchain-and-Huggingface/refs/heads/main/_img/_langCompIMG/divider.png)