# **LangChain Roadmap**

# **📌RoadMap**






## **⭐1. Fundamentals**

- a. **What is LangChain?**

    - **LangChain** is an open-source framework designed to simplify the development of **LLM-based applications**. It provides **modular components** and **end-to-end tools** that help developers build complex AI systems such as:

        * 🤖 Chatbots  
        * ❓ Question-answering systems  
        * 🔍 Retrieval-Augmented Generation (RAG) applications  
        * 🧠 Autonomous AI agents  
        * 🧰 Tool-using agents and more  

- 🔑 **Key Features:**

    1. ✅ **Supports all major LLMs** (OpenAI, Anthropic, Gemini, Hugging Face, etc.)
        - **Model-agnostic framework**: LangChain works with multiple large language models (LLMs) such as `OpenAI`, `Anthropic`, `Hugging Face`, `Gemini`, and more.
    2. ⚙️ **Simplifies LLM-based app development** through reusable components like `Chains`, `Prompts`, `Memory`, and `Runnables`.
    3. 🔌 **Integrates with key tools** such as vector stores, APIs, and databases
    4. 🆓 **Free and open-source**, with active community and continuous updates
    5. 🌐 **Supports all major GenAI use cases**, making it a versatile framework for developers



## **⭐2. LangChain Components**



- a. **`Models`**: Interface for LLMs and chat models.
- b. **`Prompts`**: Templates for structured inputs to LLMs.
- c. **`Chains`**: Sequences of calls (models, tools).
- d. **`Indexes`**: Includes document loaders, text splitters, vector stores, and retrievers for structured data retrieval.
- e. **`Memory`**: Allows the LLM to maintain the flow of a conversation by keeping the entire conversation in context.
- f. **`Agents`**: Leverage LLMs as a reasoning engine to determine which actions to take.

## **⭐3. Models**

- LangChain supports multiple LLMs (OpenAI, HuggingFace, Cohere, etc.) via standard interfaces.
- You define which model to use and optionally provide API keys or configurations.
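
The idea of a standard interface can be sketched in plain Python. This is not LangChain's API: `TextModel`, `EchoModel`, `ShoutModel`, `PROVIDERS`, and `make_model` are hypothetical names, and the two "providers" are stubs that make no network calls. The point is only that calling code depends on one shared method.

```python
from typing import Protocol


class TextModel(Protocol):
    """Hypothetical common interface: every provider exposes invoke()."""

    def invoke(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in for one provider; simply echoes the prompt."""

    def invoke(self, prompt: str) -> str:
        return f"echo: {prompt}"


class ShoutModel:
    """Stand-in for a second provider; upper-cases the prompt."""

    def invoke(self, prompt: str) -> str:
        return prompt.upper()


# Hypothetical registry mapping a configuration string to a provider class.
PROVIDERS = {"echo": EchoModel, "shout": ShoutModel}


def make_model(name: str) -> TextModel:
    return PROVIDERS[name]()


def run(model: TextModel, prompt: str) -> str:
    # Calling code depends only on the shared interface, not on the provider.
    return model.invoke(prompt)
```

Swapping providers then only changes the string passed to `make_model`; the rest of the application stays the same.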





## **⭐4. Prompts**
- Prompts are templates or examples designed to guide LLM responses.  
- LangChain uses **PromptTemplates** to standardize and reuse prompts.  
- You can include variables and format them at runtime.
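
A minimal sketch of the template idea, using only the standard library. `SimplePromptTemplate` is a hypothetical stand-in written for illustration, not LangChain's `PromptTemplate` class; it shows variables being declared in the template text and filled in at runtime.

```python
from string import Formatter


class SimplePromptTemplate:
    """Toy template: named {placeholders} are filled in at call time."""

    def __init__(self, template: str) -> None:
        self.template = template
        # Collect placeholder names so missing variables fail early.
        self.input_variables = sorted(
            {name for _, name, _, _ in Formatter().parse(template) if name}
        )

    def format(self, **values: str) -> str:
        missing = [v for v in self.input_variables if v not in values]
        if missing:
            raise KeyError(f"missing variables: {missing}")
        return self.template.format(**values)


template = SimplePromptTemplate("Translate '{text}' into {language}.")
prompt = template.format(text="good morning", language="French")
```

The same template object can be reused with different values, which is what makes prompts standardized and reusable.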



## **⭐5. Parsing Output**

Post-processing is often necessary to convert LLM output into structured formats. LangChain provides:

* **Regex parsers**  
* **Pydantic-based output parsers**  
* **Structured JSON parsers**
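
To make the parsing step concrete, here is a standard-library sketch of a regex parser and a JSON parser with basic field validation. The function names, the `Person` dataclass, and the sample model outputs are all assumptions made for illustration; LangChain's own parser classes are not used here.

```python
import json
import re
from dataclasses import dataclass


def parse_rating(text: str) -> int:
    """Regex parser: pull the first 'N/10' rating out of free text."""
    match = re.search(r"(\d+)\s*/\s*10", text)
    if match is None:
        raise ValueError(f"no rating found in: {text!r}")
    return int(match.group(1))


@dataclass
class Person:
    name: str
    age: int


def parse_person(text: str) -> Person:
    """JSON parser: take the span from the first '{' to the last '}'."""
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found")
    data = json.loads(match.group(0))
    # Validate the fields before building the structured object.
    if not isinstance(data.get("name"), str) or not isinstance(data.get("age"), int):
        raise ValueError(f"unexpected fields: {data}")
    return Person(name=data["name"], age=data["age"])


rating = parse_rating("I would give this film 8/10 overall.")
person = parse_person('Sure! Here is the JSON: {"name": "Ada", "age": 36}')
```

Raising an error on malformed output, rather than returning a partial result, lets the caller decide whether to retry the model call.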



## **⭐6. Runnables & LCEL (LangChain Expression Language)**

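LangChain 0.1+ is built around **Runnables**, a functional abstraction over all LangChain components.  
LCEL lets you compose Runnables declaratively with the `|` (pipe) operator, so the output of one step becomes the input of the next; prompt templates can additionally be concatenated with `+`.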

```python
chain = prompt | llm | output_parser
```
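
How the `|` operator can build a pipeline is easiest to see in a toy version. The `Step` class below is a hypothetical stand-in for a Runnable, written in plain Python by overloading `__or__`; the `llm` stage is a fake that makes no API call.

```python
from typing import Any, Callable


class Step:
    """Toy stand-in for a Runnable: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stages mirroring prompt | llm | output_parser.
prompt = Step(lambda topic: f"Write one word about {topic}.")
llm = Step(lambda text: f"  ANSWER: {text}  ")  # fake model, no API call
output_parser = Step(lambda text: text.strip())

chain = prompt | llm | output_parser
result = chain.invoke("cats")
```

Because each `|` returns another `Step`, a composed chain can itself be used as a stage in a larger chain.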



## **⭐7. Chains**

Chains are sequences of calls that produce a final result. LangChain provides:

* **Simple chains** (e.g., Prompt + LLM)  
* **Sequential chains**  
* **Custom chains** (your logic)

Useful for workflows like:

* Input → Prompt → LLM → Output  
* Multi-step reasoning tasks
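
A sequential chain can be sketched as a list of steps that pass a shared dictionary along, each step reading earlier outputs and adding its own. Everything below (`fake_llm`, `summarize`, `translate`, `run_sequential`) is a hypothetical illustration in plain Python, not a LangChain class.

```python
from typing import Callable, Dict, List

State = Dict[str, str]
StepFn = Callable[[State], State]


def fake_llm(prompt: str) -> str:
    """Stand-in for a model call; returns a tagged copy of the prompt."""
    return f"<{prompt}>"


def summarize(state: State) -> State:
    # Step 1: reads "text", adds "summary".
    return {**state, "summary": fake_llm(f"Summarize: {state['text']}")}


def translate(state: State) -> State:
    # Step 2: reads the "summary" produced by step 1, adds "translation".
    return {**state, "translation": fake_llm(f"Translate: {state['summary']}")}


def run_sequential(steps: List[StepFn], state: State) -> State:
    """Run each step in order, threading the growing state through."""
    for step in steps:
        state = step(state)
    return state


final = run_sequential([summarize, translate], {"text": "LangChain notes"})
```

Keeping intermediate results in the state makes multi-step workflows easier to inspect and debug.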



## **⭐8. Memory**

Memory allows LangChain apps to remember past interactions. Useful in:

* Chatbots (session history)  
* Agents (tool-use history)  

Types include:

* **BufferMemory**: Stores raw input/output  
* **SummaryMemory**: Stores summarized history  
* **TokenBufferMemory**: Limited by token count
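
The difference between a raw buffer and a token-limited buffer can be sketched as follows. These are toy classes that borrow the names above for readability; they are not LangChain's implementations, and tokens are approximated by whitespace-separated words, which is an assumption made to keep the example dependency-free.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (user message, AI reply)


class BufferMemory:
    """Keeps every turn verbatim."""

    def __init__(self) -> None:
        self.turns: List[Turn] = []

    def save(self, user: str, ai: str) -> None:
        self.turns.append((user, ai))

    def load(self) -> str:
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)


class TokenBufferMemory(BufferMemory):
    """Drops the oldest turns once a (word-count) token budget is exceeded."""

    def __init__(self, max_tokens: int) -> None:
        super().__init__()
        self.max_tokens = max_tokens

    def _count(self) -> int:
        return sum(len(u.split()) + len(a.split()) for u, a in self.turns)

    def save(self, user: str, ai: str) -> None:
        super().save(user, ai)
        while self.turns and self._count() > self.max_tokens:
            self.turns.pop(0)  # evict the oldest turn first


memory = TokenBufferMemory(max_tokens=8)
memory.save("Hi, I am Sam", "Hello Sam")  # 6 words, within budget
memory.save("What is my name", "Sam")     # total 11 words, oldest turn evicted
history = memory.load()
```

The example also shows the trade-off: once the first turn is evicted, the model no longer sees the user's name, which is the gap a summary-based memory tries to close.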

## **⭐9. Retrieval-Augmented Generation**

- **RAG**

    - a. **Document Loaders**

        These ingest data from various sources:

        * Files (PDF, .txt, CSV)  
        * Web pages (via scraping)  
        * APIs or databases  

        LangChain supports many prebuilt loaders.

    - b. **Text Splitters**

        Large documents need splitting to avoid token limits. Splitters include:

        * **RecursiveCharacterTextSplitter**: Splits on a hierarchy of separators (paragraphs, then lines, then words) so related text stays together  
        * **TokenTextSplitter**: Splits by model tokens  

        Custom splitting strategies are also supported.

    - c. **Embeddings**

        Embeddings convert text chunks into vector representations for semantic similarity. Supported models:

        * OpenAI Embeddings  
        * HuggingFace Sentence Transformers  
        * Cohere, Google, etc.

    - d. **Vector Stores**

        Storage solutions for embeddings. They support similarity search (k-NN). Popular vector stores:

        * FAISS (in-memory)  
        * Pinecone  
        * Chroma  
        * Weaviate

    - e. **Retrievers**

        Interfaces to get relevant chunks from vector stores.  
        Can apply filters, scores, and combine retrievers for hybrid search (e.g., BM25 + vectors).

    - f. **Building a RAG Application**

        Combines:

        - 1. Document ingestion (loaders)  
        - 2. Chunking (splitters)  
        - 3. Embedding + storing (vector store)  
        - 4. Query-time retrieval  
        - 5. LLM answering using retrieved context  

        Use the `RetrievalQA` chain, or LCEL for more control.
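
The ingestion and retrieval steps above can be sketched end to end in plain Python. This is a toy under stated assumptions: `split_text` is a simple fixed-size splitter with overlap, `embed` is a bag-of-words counter rather than a real embedding model, and `InMemoryVectorStore` is a hypothetical class doing brute-force cosine search. None of these are LangChain APIs.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def split_text(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Fixed-size character chunks; consecutive chunks share `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text: str) -> Dict[str, int]:
    """Toy embedding: a sparse bag-of-words count vector (not a real model)."""
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Stores (chunk, vector) pairs and answers top-k similarity queries."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, Dict[str, int]]] = []

    def add(self, chunks: List[str]) -> None:
        for chunk in chunks:
            self.items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 2) -> List[Tuple[float, str]]:
        query_vec = embed(query)  # the query is embedded the same way
        scored = [(cosine(query_vec, vec), chunk) for chunk, vec in self.items]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


store = InMemoryVectorStore()
store.add([
    "Cats sleep most of the day.",
    "Paris is the capital of France.",
    "Python is a programming language.",
])
top = store.search("What is the capital of France?", k=1)
```

The retrieved chunks in `top` would then be placed into the prompt sent to the LLM. A real pipeline would swap in a learned embedding model and a dedicated vector store, but the data flow is the same.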



## **⭐10. Agents**

- a. **Tools & Toolkits**
    - Tools are functions an agent can call. A tool may be:
        - `Web search`  
        - `Calculator`  
        - `API call`
        - `Database query`

    > Toolkits group related tools (e.g., a SQL toolkit includes a query executor + schema inspector).

- b. **Tool Calling**
    - Agents decide whether and when to use tools by reasoning, guided by their prompts. LangChain supports:
        - OpenAI function calling (tool descriptions)  
        - JSON mode (structured calls)  
    
    > Agents use intermediate steps (thoughts, actions, observations) to reach answers.

- c. **Building an AI Agent**
    - Steps:
        - 1. Define tools  
        - 2. Create prompt to guide agent behavior  
        - 3. Choose agent type (`zero-shot-react`, `function-calling`, etc.)  
        - 4. Initialize with `initialize_agent(...)`

    - Agents can:
        - Reason over multiple steps  
        - Maintain state (via memory)  
        - Decide which tool to use, and how
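
The agent loop described above (decide on an action, call a tool, observe, repeat) can be sketched without any framework. In this toy, `fake_policy` is a hypothetical rule-based stand-in for the LLM's reasoning, and the tools, `TOOLS` registry, and `run_agent` are illustrative names; this is not `initialize_agent` or any other LangChain API.

```python
from typing import Callable, Dict, List, Tuple


# Step 1: define tools as plain functions keyed by name.
def calculator(expression: str) -> str:
    """Evaluates 'a + b' or 'a * b' for two integers."""
    left, op, right = expression.split()
    a, b = int(left), int(right)
    if op == "+":
        return str(a + b)
    if op == "*":
        return str(a * b)
    raise ValueError(f"unsupported operator: {op}")


def word_count(text: str) -> str:
    return str(len(text.split()))


TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": calculator,
    "word_count": word_count,
}

Step = Tuple[str, str, str]  # (tool name, tool input, observation)


def fake_policy(question: str, steps: List[Step]) -> Tuple[str, str]:
    """Stand-in for the LLM: returns ("final", answer) or (tool, input)."""
    if steps:
        # An observation is available, so answer with it.
        return "final", f"The answer is {steps[-1][2]}."
    if any(ch.isdigit() for ch in question):
        return "calculator", question.rstrip("?").split("is ")[-1]
    return "word_count", question


def run_agent(question: str, max_steps: int = 3) -> Tuple[str, List[Step]]:
    steps: List[Step] = []
    for _ in range(max_steps):
        action, payload = fake_policy(question, steps)
        if action == "final":
            return payload, steps
        observation = TOOLS[action](payload)  # call the chosen tool
        steps.append((action, payload, observation))
    return "Stopped: step limit reached.", steps


answer, trace = run_agent("What is 6 * 7?")
```

The `trace` list plays the role of the intermediate steps, and the `max_steps` cap is a common safeguard against an agent that never decides to finish.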

# **⭕Example:**



## **⭐PDF-based RAG (Retrieval-Augmented Generation) Pipeline Architecture**

![LC_00_01](https://raw.githubusercontent.com/mohd-faizy/GenAI-with-Langchain-and-Huggingface/refs/heads/main/_img/_oth_img/LC_00_01.png)

- **PDF Upload**: A PDF is uploaded to `AWS S3`.
- **Document Loading**: The file is fetched using a Doc Loader.
- **Text Splitting**: The PDF is split into individual pages or chunks.
- **Embedding Generation**: Each chunk is converted into embeddings.
- **Vector Database**: These embeddings are stored in a vector database.
- **User Query**: The user query is also converted into an embedding.
- **Semantic Search**: A semantic `similarity` search is performed to find relevant pages/chunks.
- **Context Assembly**: Relevant document chunks and the user query are combined into a `system prompt`.
- **LLM Processing**: This prompt is sent to an LLM `API` to generate the final answer.
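
The last two steps, context assembly and LLM processing, can be sketched as below. The prompt wording, `build_prompt`, `answer`, and `fake_llm` are assumptions for illustration; `fake_llm` stands in for the real LLM API call and simply echoes the question line.

```python
from typing import Callable, List


def build_prompt(question: str, chunks: List[str]) -> str:
    """Combine retrieved chunks and the user query into one prompt string."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def answer(question: str, chunks: List[str], llm: Callable[[str], str]) -> str:
    # `llm` is any callable mapping a prompt to text, e.g. an API wrapper.
    return llm(build_prompt(question, chunks))


def fake_llm(prompt: str) -> str:
    """Stand-in for the LLM API call; echoes the final line of the prompt."""
    return f"(stub answer to) {prompt.splitlines()[-1]}"


chunks = ["Invoices are due within 30 days.", "Late fees are 2% per month."]
prompt = build_prompt("When are invoices due?", chunks)
reply = answer("When are invoices due?", chunks, fake_llm)
```

Numbering the chunks makes it possible to ask the model to cite which passage supports its answer.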



## 🔍 **LLM Framework Comparison: `LlamaIndex` vs `Haystack` vs `LangChain`**

| Feature / Criteria                    | **LlamaIndex**                              | **Haystack**                                          | **LangChain**                                 |
| ------------------------------------- | ------------------------------------------- | ----------------------------------------------------- | --------------------------------------------- |
| **Primary Use Case**                  | Document indexing & RAG pipelines           | Search & QA pipelines (initially Elasticsearch-based) | Modular LLM app building (RAG, agents, tools) |
| **Modular Components**                | ✅ Nodes, Indexes, Retrievers, Engines       | ✅ Pipelines, Nodes                                    | ✅ Chains, Runnables, Agents, Memory           |
| **Agent Support**                     | ⚠️ Limited (experimental or via plugins)    | ⚠️ Limited / basic (beta-level agent support)         | ✅ Full support (tools, memory, multi-step)    |
| **Retrieval-Augmented Generation**    | ✅ Native & optimized                        | ✅ Well-supported                                      | ✅ Fully supported with LCEL & RAG chains      |
| **Data Ingestion**                    | ✅ Document loaders & structured data        | ✅ Strong file + database support                      | ✅ Rich set of loaders (via integrations)      |
| **Integration with Vector Stores**    | ✅ FAISS, Chroma, Weaviate, Pinecone, etc.   | ✅ FAISS, Milvus, Elasticsearch, etc.                  | ✅ Wide integration with popular stores        |
| **Embeddings Support**                | ✅ OpenAI, HuggingFace, Cohere, etc.         | ✅ Wide support via transformers                       | ✅ Full support via plugins + LLM wrappers     |
| **Memory Management**                 | ⚠️ Basic context injection                  | ⚠️ Session-based history                              | ✅ Buffer, Summary, TokenBufferMemory types    |
| **Tool Use (Calculator, APIs, etc.)** | ❌ Not native (needs LangChain-style agents) | ⚠️ Custom node needed                                 | ✅ Native tool use with agents                 |
| **Custom Logic Composition**          | ✅ High-level with Composability APIs        | ✅ With custom pipelines                               | ✅ LCEL (`\|`, `+`) + Pythonic chaining        |
| **Ease of Use**                       | ✅ Simple RAG workflows                      | ⚠️ Moderate (YAML or Python)                          | ⚠️ Moderate–steep learning curve              |
| **Community & Docs**                  | ✅ Growing, great tutorials                  | ✅ Strong docs, active GitHub                          | ✅ Large community, frequent updates           |
| **Language Support**                  | 🟡 Python-only                              | 🟢 Python (core), some JS/TS & Java work              | 🟢 Python (main), TypeScript JS SDK (beta)    |
| **Production Readiness**              | ✅ Stable core with experimental agents      | ✅ Used in enterprise (deployed search apps)           | ✅ Production-ready (widely adopted)           |
| **Best For**                          | Fast RAG setup with structured data         | Hybrid QA systems, search with transformers           | Advanced LLM workflows, agents, tool use      |






### **✅ Summary**

| Use Case                   | Best Framework |
| -------------------------- | -------------- |
| Simple, fast RAG setup     | **LlamaIndex** |
| Hybrid search pipelines    | **Haystack**   |
| Tool-using agents & chains | **LangChain**  |

![divider.png](https://raw.githubusercontent.com/mohd-faizy/GenAI-with-Langchain-and-Huggingface/refs/heads/main/_img/_langCompIMG/divider.png)