![](https://i.imgur.com/N0zUCi0.png)

# LangChain LLM Interfacing Guide

This notebook demonstrates how to connect to different Large Language Models (LLMs) using LangChain. We'll explore two main approaches:

1. **`init_chat_model`** - Universal model initializer (recommended for flexibility)
2. **Provider-specific interfaces** - Direct class imports like `ChatOpenAI`, `ChatGoogleGenerativeAI`, `ChatGroq`

## What You'll Learn

- How to use `init_chat_model` for unified LLM initialization
- How to use provider-specific chat interfaces
- Simple prompting with `.invoke()`
- Streaming responses

## Models Covered

- **OpenAI**: GPT-4.1-mini
- **Google**: Gemini 2.5 Flash
- **Groq**: OpenAI GPT OSS 120B (Open Source)


---

## Installation

First, install all required packages:

In [None]:
# Install all required packages
!pip install -q langchain==1.2.8
!pip install -q langchain-openai==1.1.7
!pip install -q langchain-google-genai==4.2.0
!pip install -q langchain-groq==1.1.2

---

## Setup Environment Variables

Set up your API keys securely. You'll need to obtain:

- **OpenAI API Key**: Get from [OpenAI Platform](https://platform.openai.com/api-keys)
- **Google API Key**: Get from [Google AI Studio](https://aistudio.google.com/app/apikey)
- **Groq API Key**: Get from [Groq Console](https://console.groq.com/keys)

In [None]:
import os
from getpass import getpass

# Set up OpenAI API key
os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API Key: ")

# Set up Google API key
os.environ["GOOGLE_API_KEY"] = getpass("Enter your Google API Key: ")

# Set up Groq API key
os.environ["GROQ_API_KEY"] = getpass("Enter your Groq API Key: ")

Enter your OpenAI API Key: ··········
Enter your Google API Key: ··········
Enter your Groq API Key: ··········
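
Re-running the cell above prompts for all three keys every time. A minimal sketch of a helper that only prompts when a variable is not already set; `ensure_api_key` and its `prompt_fn` parameter are hypothetical names for illustration, not part of LangChain.

```python
import os
from getpass import getpass


def ensure_api_key(var_name: str, prompt_fn=getpass) -> bool:
    """Prompt for var_name only if it is not already set.

    Returns True if a prompt was shown, False if the existing value was kept.
    """
    if os.environ.get(var_name):
        return False  # Already configured by the shell or an earlier run
    value = prompt_fn(f"Enter your {var_name}: ").strip()
    if not value:
        raise ValueError(f"{var_name} must not be empty")
    os.environ[var_name] = value
    return True


# Usage (uncomment in the notebook):
# for name in ("OPENAI_API_KEY", "GOOGLE_API_KEY", "GROQ_API_KEY"):
#     ensure_api_key(name)
```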


---

# Part 1: Using `init_chat_model`

The `init_chat_model` function is a **universal model initializer** introduced in LangChain. It provides a consistent interface to initialize any supported chat model without needing to remember different import paths or class names.

## Advantages of `init_chat_model`:

- ✅ **Single import** for all providers
- ✅ **Consistent API** across different LLMs
- ✅ **Easy model switching** - change one line to switch providers
- ✅ **Runtime configuration** - can make models configurable at runtime
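
The runtime-configuration bullet can be sketched as follows. This assumes the documented behavior that calling `init_chat_model` without a model name returns a model whose `model` and `model_provider` are selected per call through `config["configurable"]`. The helper names `runtime_config` and `ask` are hypothetical.

```python
def runtime_config(model: str, model_provider: str) -> dict:
    """Build the per-call config that selects a concrete model."""
    return {"configurable": {"model": model, "model_provider": model_provider}}


def ask(prompt: str, model: str, model_provider: str):
    """Send prompt to a model chosen at call time (needs the matching API key)."""
    # Imported lazily so the helpers above work without LangChain installed
    from langchain.chat_models import init_chat_model

    configurable_model = init_chat_model(temperature=0)  # no model fixed yet
    response = configurable_model.invoke(
        prompt, config=runtime_config(model, model_provider)
    )
    return response.content


# Usage (uncomment in the notebook):
# ask("Say hello.", "gpt-4.1-mini", "openai")
# ask("Say hello.", "gemini-2.5-flash", "google_genai")
```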

## Syntax

```python
from langchain.chat_models import init_chat_model

# Option 1: With explicit model_provider
model = init_chat_model(
    "model-name",
    model_provider="provider-name",
    temperature=0
)

# Option 2: Auto-infer provider (works for common models)
model = init_chat_model("gpt-4o", temperature=0)
```
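
Besides the two options above, `init_chat_model` also accepts a single `"provider:model"` string, which keeps the provider explicit without a second argument. The sketch below uses that form to switch between the three models in this notebook. `MODEL_SPECS`, `split_spec`, and `build_model` are illustrative names, and the provider keys `openai`, `google_genai`, and `groq` are assumed to match the integration packages installed earlier.

```python
MODEL_SPECS = {
    "openai": "openai:gpt-4.1-mini",
    "google": "google_genai:gemini-2.5-flash",
    "groq": "groq:openai/gpt-oss-120b",
}


def split_spec(spec: str) -> tuple:
    """Split 'provider:model' at the first colon only."""
    provider, _, model = spec.partition(":")
    return provider, model


def build_model(alias: str, **kwargs):
    """Create the chat model registered under alias (requires an API key)."""
    from langchain.chat_models import init_chat_model  # lazy import

    return init_chat_model(MODEL_SPECS[alias], **kwargs)


# Usage (uncomment in the notebook):
# model = build_model("groq", temperature=0)
```

Switching providers then means changing only the alias passed to `build_model`.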

## Example 1: OpenAI GPT-4.1-mini with `init_chat_model`

In [None]:
from langchain.chat_models import init_chat_model

# Initialize GPT-4.1-mini using init_chat_model
# We explicitly specify model_provider for clarity
gpt_mini = init_chat_model(
    "gpt-4.1-mini",
    model_provider="openai",  # Explicitly set provider
    temperature=0  # Minimize randomness (outputs can still vary between runs)
)

print("✅ GPT-4.1-mini initialized successfully using init_chat_model")

✅ GPT-4.1-mini initialized successfully using init_chat_model


### Simple Prompt with `.invoke()`

The `.invoke()` method sends a prompt to the LLM and returns the response.

In [None]:
# Simple prompt example
prompt = "Explain what LangChain is in 3 bullet points."

# Invoke the model
response = gpt_mini.invoke(prompt)

# Display the response
print("Prompt:", prompt)
print("\nResponse:")
print(response)

Prompt: Explain what LangChain is in 3 bullet points.

Response:
content='- LangChain is a framework designed to simplify the development of applications that use large language models (LLMs) by providing tools to manage prompts, chains, and integrations.  \n- It enables the creation of complex workflows by connecting LLMs with external data sources, APIs, and other computational resources.  \n- LangChain supports building applications like chatbots, question-answering systems, and document analysis tools with enhanced context management and memory capabilities.' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 88, 'prompt_tokens': 17, 'total_tokens': 105, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint'

In [None]:
response

AIMessage(content='- LangChain is a framework designed to simplify the development of applications that use large language models (LLMs) by providing tools to manage prompts, chains, and integrations.  \n- It enables the creation of complex workflows by connecting LLMs with external data sources, APIs, and other computational resources.  \n- LangChain supports building applications like chatbots, question-answering systems, and document analysis tools with enhanced context management and memory capabilities.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 88, 'prompt_tokens': 17, 'total_tokens': 105, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_75546bd1a7', 'id': 'chatcmpl-D9DDwDSRBmEaDhN

In [None]:
print(response.text)

- LangChain is a framework designed to simplify the development of applications that use large language models (LLMs) by providing tools to manage prompts, chains, and integrations.  
- It enables the creation of complex workflows by connecting LLMs with external data sources, APIs, and other computational resources.  
- LangChain supports building applications like chatbots, question-answering systems, and document analysis tools with enhanced context management and memory capabilities.
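
`response.content` is not always a plain string: for some providers and features it can be a list of content blocks, which is the case `.text` handles for you. A minimal sketch of that normalization, assuming text blocks are dicts shaped like `{"type": "text", "text": ...}`; `content_to_text` is a hypothetical helper, not a LangChain function.

```python
from typing import Union


def content_to_text(content: Union[str, list]) -> str:
    """Return only the textual part of a message's content."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
        # Non-text blocks (images, tool calls, ...) are skipped
    return "".join(parts)
```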


In [None]:
print(response.usage_metadata)

{'input_tokens': 17, 'output_tokens': 88, 'total_tokens': 105, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}
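
Since `usage_metadata` is a plain dictionary, token counts from several calls can be added up directly. A small sketch, assuming each entry carries the `input_tokens`, `output_tokens`, and `total_tokens` keys shown above; `total_usage` is a hypothetical helper.

```python
from collections import Counter


def total_usage(usages: list) -> dict:
    """Sum the top-level token counts across several usage_metadata dicts."""
    totals = Counter()
    for usage in usages:
        for key in ("input_tokens", "output_tokens", "total_tokens"):
            totals[key] += usage.get(key, 0)
    return dict(totals)


# Usage (uncomment in the notebook):
# total_usage([first_response.usage_metadata, second_response.usage_metadata])
```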


---

# Part 2: Using Provider-Specific Chat Interfaces

While `init_chat_model` is convenient, you can also use **provider-specific chat classes**. These give you more direct control and access to provider-specific features.

## When to use provider-specific interfaces:

- Need access to **provider-specific parameters**
- Want **type hints and autocomplete** for that provider
- Building a production app with a **fixed provider**
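
As a sketch of the first bullet: `seed` and `logprobs` are options that `ChatOpenAI` accepts as constructor arguments, and other providers' classes may not accept them. The names `COMMON_KWARGS`, `OPENAI_ONLY_KWARGS`, `openai_kwargs`, and `build_fixed_openai` are hypothetical, and the values are placeholders.

```python
# Settings that make sense for any provider
COMMON_KWARGS = {"temperature": 0, "max_retries": 2}

# Settings assumed to be specific to the OpenAI integration
OPENAI_ONLY_KWARGS = {"seed": 42, "logprobs": True}


def openai_kwargs(model: str = "gpt-4.1-mini") -> dict:
    """Merge shared and OpenAI-only settings into one argument dict."""
    return {"model": model, **COMMON_KWARGS, **OPENAI_ONLY_KWARGS}


def build_fixed_openai(model: str = "gpt-4.1-mini"):
    """Create a ChatOpenAI instance with a fixed provider (requires an API key)."""
    from langchain_openai import ChatOpenAI  # lazy import

    return ChatOpenAI(**openai_kwargs(model))
```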

## Example 2: OpenAI GPT-4.1-mini with `ChatOpenAI`

In [None]:
from langchain_openai import ChatOpenAI

# Initialize GPT-4.1-mini using the ChatOpenAI class
# This is the traditional approach - importing the provider-specific class
chat_openai = ChatOpenAI(
    model="gpt-4.1-mini",  # Specify the model name
    temperature=0,  # Controls randomness (0 = least random, higher = more varied)
    max_tokens=None,  # Maximum tokens in response (None = no limit)
    timeout=None,  # Request timeout in seconds
    max_retries=2,  # Number of retries on failure
)

print("✅ GPT-4.1-mini initialized successfully using ChatOpenAI")

✅ GPT-4.1-mini initialized successfully using ChatOpenAI
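
With `max_retries=2`, a failed request is attempted up to two more times before the error is raised. The loop below is a generic illustration of that idea with exponential backoff, not the client's actual implementation; `call_with_retries` and its parameters are hypothetical.

```python
import time


def call_with_retries(fn, max_retries: int = 2, base_delay: float = 0.5, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt and try again.

    Re-raises the last exception once max_retries retries are used up.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt >= max_retries:
                raise
            sleep(base_delay * (2 ** attempt))
            attempt += 1


# Usage (uncomment in the notebook):
# call_with_retries(lambda: chat_openai.invoke("ping"))
```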


### Simple Prompt with `.invoke()`

In [None]:
# Simple prompt example
prompt = "What are the three main benefits of using LLMs?"

# Invoke the model
response = chat_openai.invoke(prompt)

# Display the response
print("Prompt:", prompt)
print("\nResponse:")
print(response)

Prompt: What are the three main benefits of using LLMs?

Response:
content='The three main benefits of using Large Language Models (LLMs) are:\n\n1. **Natural Language Understanding and Generation:** LLMs can comprehend and produce human-like text, enabling more intuitive and effective communication between humans and machines.\n\n2. **Versatility Across Tasks:** They can perform a wide range of language-related tasks, such as translation, summarization, question answering, and content creation, without needing task-specific training.\n\n3. **Efficiency and Scalability:** LLMs can automate complex language tasks at scale, improving productivity and enabling applications that require processing large volumes of text quickly and accurately.' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 120, 'prompt_tokens': 19, 'total_tokens': 139, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rej

In [None]:
response

AIMessage(content='The three main benefits of using Large Language Models (LLMs) are:\n\n1. **Natural Language Understanding and Generation:** LLMs can comprehend and produce human-like text, enabling more intuitive and effective communication between humans and machines.\n\n2. **Versatility Across Tasks:** They can perform a wide range of language-related tasks, such as translation, summarization, question answering, and content creation, without needing task-specific training.\n\n3. **Efficiency and Scalability:** LLMs can automate complex language tasks at scale, improving productivity and enabling applications that require processing large volumes of text quickly and accurately.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 120, 'prompt_tokens': 19, 'total_tokens': 139, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': 

In [None]:
print(response.content)

The three main benefits of using Large Language Models (LLMs) are:

1. **Natural Language Understanding and Generation:** LLMs can comprehend and produce human-like text, enabling more intuitive and effective communication between humans and machines.

2. **Versatility Across Tasks:** They can perform a wide range of language-related tasks, such as translation, summarization, question answering, and content creation, without needing task-specific training.

3. **Efficiency and Scalability:** LLMs can automate complex language tasks at scale, improving productivity and enabling applications that require processing large volumes of text quickly and accurately.


In [None]:
from IPython.display import display, Markdown

display(Markdown(response.content))

The three main benefits of using Large Language Models (LLMs) are:

1. **Natural Language Understanding and Generation:** LLMs can comprehend and produce human-like text, enabling more intuitive and effective communication between humans and machines.

2. **Versatility Across Tasks:** They can perform a wide range of language-related tasks, such as translation, summarization, question answering, and content creation, without needing task-specific training.

3. **Efficiency and Scalability:** LLMs can automate complex language tasks at scale, improving productivity and enabling applications that require processing large volumes of text quickly and accurately.

---

## Example 3: Google Gemini 2.5 Flash with `ChatGoogleGenerativeAI`

Google's Gemini models are multimodal LLMs. Gemini 2.5 Flash offers fast inference with strong capabilities.

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI

# Initialize Gemini 2.5 Flash
# "gemini-2.5-flash" is the stable model ID; preview releases use dated IDs
# such as gemini-2.5-flash-preview-04-17 and may have different features
chat_gemini = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)

print("✅ Gemini 2.5 Flash initialized successfully")

✅ Gemini 2.5 Flash initialized successfully


### Simple Prompt with `.invoke()`

In [None]:
# Simple prompt example
prompt = "Explain the concept of prompt engineering in 3 bullets."

# Invoke the model
response = chat_gemini.invoke(prompt)

# Display the response
print("Prompt:", prompt)
print("\nResponse:")
print(response)

Prompt: Explain the concept of prompt engineering in 3 bullets.

Response:
content="Here are 3 bullets explaining prompt engineering:\n\n*   **Crafting Effective Inputs:** Prompt engineering is the discipline of designing, refining, and optimizing the text inputs (prompts) given to AI models (especially large language models) to elicit specific, accurate, and desired outputs.\n*   **Guiding AI Behavior:** It involves understanding how AI models process information and using various techniques, such as providing context, examples, constraints, or specific instructions, to steer the model's generation process towards a particular goal or format.\n*   **Maximizing AI Utility:** The ultimate aim is to enhance the performance, reliability, and relevance of AI systems by minimizing ambiguity, reducing errors, and ensuring the AI consistently delivers high-quality, useful responses for a given task or application." additional_kwargs={} response_metadata={'finish_reason': 'STOP', 'model_name

In [None]:
response

AIMessage(content="Here are 3 bullets explaining prompt engineering:\n\n*   **Crafting Effective Inputs:** Prompt engineering is the discipline of designing, refining, and optimizing the text inputs (prompts) given to AI models (especially large language models) to elicit specific, accurate, and desired outputs.\n*   **Guiding AI Behavior:** It involves understanding how AI models process information and using various techniques, such as providing context, examples, constraints, or specific instructions, to steer the model's generation process towards a particular goal or format.\n*   **Maximizing AI Utility:** The ultimate aim is to enhance the performance, reliability, and relevance of AI systems by minimizing ambiguity, reducing errors, and ensuring the AI consistently delivers high-quality, useful responses for a given task or application.", additional_kwargs={}, response_metadata={'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'model_provider': 

In [None]:
print(response.content)

Here are 3 bullets explaining prompt engineering:

*   **Crafting Effective Inputs:** Prompt engineering is the discipline of designing, refining, and optimizing the text inputs (prompts) given to AI models (especially large language models) to elicit specific, accurate, and desired outputs.
*   **Guiding AI Behavior:** It involves understanding how AI models process information and using various techniques, such as providing context, examples, constraints, or specific instructions, to steer the model's generation process towards a particular goal or format.
*   **Maximizing AI Utility:** The ultimate aim is to enhance the performance, reliability, and relevance of AI systems by minimizing ambiguity, reducing errors, and ensuring the AI consistently delivers high-quality, useful responses for a given task or application.


In [None]:
print(response.text)

Here are 3 bullets explaining prompt engineering:

*   **Crafting Effective Inputs:** Prompt engineering is the discipline of designing, refining, and optimizing the text inputs (prompts) given to AI models (especially large language models) to elicit specific, accurate, and desired outputs.
*   **Guiding AI Behavior:** It involves understanding how AI models process information and using various techniques, such as providing context, examples, constraints, or specific instructions, to steer the model's generation process towards a particular goal or format.
*   **Maximizing AI Utility:** The ultimate aim is to enhance the performance, reliability, and relevance of AI systems by minimizing ambiguity, reducing errors, and ensuring the AI consistently delivers high-quality, useful responses for a given task or application.


In [None]:
display(Markdown(response.text))

Here are 3 bullets explaining prompt engineering:

*   **Crafting Effective Inputs:** Prompt engineering is the discipline of designing, refining, and optimizing the text inputs (prompts) given to AI models (especially large language models) to elicit specific, accurate, and desired outputs.
*   **Guiding AI Behavior:** It involves understanding how AI models process information and using various techniques, such as providing context, examples, constraints, or specific instructions, to steer the model's generation process towards a particular goal or format.
*   **Maximizing AI Utility:** The ultimate aim is to enhance the performance, reliability, and relevance of AI systems by minimizing ambiguity, reducing errors, and ensuring the AI consistently delivers high-quality, useful responses for a given task or application.

---

## Example 4: Groq with OpenAI GPT OSS 120B using `ChatGroq`

Groq provides **ultra-fast inference** for open-source models. The OpenAI GPT OSS 120B model is a powerful open-source LLM available through Groq's LPU inference technology.

In [None]:
from langchain_groq import ChatGroq

# Initialize GPT OSS 120B on Groq
# Groq offers extremely fast inference speeds with their LPU architecture
chat_groq = ChatGroq(
    model="openai/gpt-oss-120b",  # OpenAI GPT OSS 120B LLM
    temperature=0,  # Minimize randomness (outputs can still vary between runs)
    max_tokens=None,
    timeout=None,
    max_retries=2,
)

print("✅ OpenAI GPT OSS 120B (Groq) initialized successfully")

✅ OpenAI GPT OSS 120B (Groq) initialized successfully


### Simple Prompt with `.invoke()`

In [None]:
# Simple prompt example
prompt = "What makes open-source LLMs valuable for developers?"

# Invoke the model
response = chat_groq.invoke(prompt)

# Display the response
print("Prompt:", prompt)
print("\nResponse:")
print(response)

Prompt: What makes open-source LLMs valuable for developers?

Response:
content='### Why Open-Source LLMs Are a Big Deal for Developers\n\n| Benefit | What It Means for You | Typical Use-Cases |\n|---------|----------------------|-------------------|\n| **Full Transparency** | You can inspect the model architecture, training data (or at least the data-pipeline), and inference code. No hidden “black-box” tricks. | Debugging unexpected outputs, compliance audits, academic research. |\n| **Customizability** | Fine-tune, prune, quantize, or even modify the tokenizer to fit a niche domain (legal, medical, gaming, etc.). | Building domain-specific assistants, code-completion tools, or chatbots that respect company terminology. |\n| **Cost Control** | No per-token API fees; you only pay for the compute you run (or even run on-premise on spare GPUs). | Scaling internal tools, batch processing large corpora, or running offline on edge devices. |\n| **Data Privacy & Securit

In [None]:
display(Markdown(response.content))

### Why Open-Source LLMs Are a Big Deal for Developers

| Benefit | What It Means for You | Typical Use-Cases |
|---------|----------------------|-------------------|
| **Full Transparency** | You can inspect the model architecture, training data (or at least the data-pipeline), and inference code. No hidden “black-box” tricks. | Debugging unexpected outputs, compliance audits, academic research. |
| **Customizability** | Fine-tune, prune, quantize, or even modify the tokenizer to fit a niche domain (legal, medical, gaming, etc.). | Building domain-specific assistants, code-completion tools, or chatbots that respect company terminology. |
| **Cost Control** | No per-token API fees; you only pay for the compute you run (or even run on-premise on spare GPUs). | Scaling internal tools, batch processing large corpora, or running offline on edge devices. |
| **Data Privacy & Security** | All inference happens inside your own environment, so sensitive prompts never leave your network. | Handling PHI, financial data, or any proprietary information. |
| **Rapid Innovation Cycle** | Community contributions (new layers, LoRA adapters, efficient kernels) appear weeks after a research paper is released. | Staying on the cutting edge without waiting for a commercial vendor to add the feature. |
| **Vendor Independence** | No lock-in to a single cloud provider; you can move the model between on-prem, private cloud, or any public cloud that supports the runtime. | Multi-cloud strategies, disaster-recovery, or cost-optimizing workloads. |
| **Ecosystem & Tooling** | Rich libraries (🤗 Transformers, vLLM, DeepSpeed, Axolotl, OpenAI-compatible APIs) make training, serving, and monitoring straightforward. | End-to-end pipelines: data ingestion → fine-tuning → deployment → observability. |
| **Community Support** | Forums, Discord/Slack channels, GitHub issues, and shared checkpoints accelerate problem-solving. | Getting help with GPU memory errors, prompt engineering, or model conversion. |
| **Licensing Flexibility** | Many models use permissive licenses (Apache 2.0, MIT, CC-BY-4.0) that allow commercial use, redistribution, or modification. | Embedding the model in a SaaS product, bundling with hardware, or releasing a derivative model. |
| **Educational Value** | Hands-on exposure to state-of-the-art architectures (Transformer variants, Mixture-of-Experts, Retrieval-Augmented Generation). | Training new hires, university courses, or personal skill development. |

---

#### Concrete Examples

| Open-Source Model | License | Typical Strengths | Example Projects |
|-------------------|---------|-------------------|------------------|
| **LLaMA-2** (Meta) | Meta-LLAMA-2-Community (non-commercial) / LLaMA-2-Chat (commercial) | Strong general-purpose performance, good instruction following. | Internal Q&A bots, code-review assistants. |
| **Mistral-7B** | Apache 2.0 | Efficient 7-B model with competitive zero-shot results. | Low-latency inference on a single GPU, edge deployments. |
| **Phi-3** (Microsoft) | MIT | Tiny (2.7-B) but high-quality, optimized for CPU/ONNX. | Desktop assistants, mobile apps, or serverless functions. |
| **Gemma** (Google) | Apache 2.0 | Balanced trade-off between size (2-B/7-B) and instruction following. | Rapid prototyping of chat interfaces. |
| **OpenChatKit** | MIT | End-to-end chat framework with UI, retrieval, and moderation tools. | Building a customer-support portal with minimal code. |

---

#### How to Get Started Quickly

1. **Pick a Model & Runtime**  
   ```bash
   # Example: pull Mistral-7B with vLLM (fast serving)
   pip install vllm
   vllm serve mistralai/Mistral-7B-Instruct-v0.1 --port 8000
   ```

2. **Wrap It in an OpenAI-compatible API (optional)**  
   ```bash
   # Using `vllm`'s built-in OpenAI-compatible endpoint
   curl http://localhost:8000/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{"model":"mistral","messages":[{"role":"user","content":"Explain quantum tunneling in 2 sentences"}]}'
   ```

3. **Fine-Tune (if needed)**  
   - Use **PEFT** (LoRA) for cheap adaptation: `pip install peft transformers accelerate`.  
   - Example script: `python finetune_lora.py --model mistralai/Mistral-7B-Instruct-v0.1 --train_file data.jsonl`.

4. **Deploy**  
   - **Docker**: `docker run -p 8000:8000 ghcr.io/vllm/vllm:latest --model mistralai/Mistral-7B-Instruct-v0.1`.  
   - **Kubernetes**: Helm chart `vllm` or `text-generation-inference` for autoscaling.

---

#### TL;DR

Open-source LLMs give developers **control, cost-efficiency, privacy, and a fast path to innovation**, all backed by a vibrant community and permissive licensing. They let you turn a generic language model into a **tailored, production-ready service** without being locked into a proprietary API. Whether you’re building a chatbot, a code-assistant, or a data-analysis pipeline, the open-source stack provides the building blocks to iterate quickly and own the entire stack.
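
Because all three chat classes in Part 2 share the `.invoke()` interface, the same prompt can be sent to each of them in a loop. A rough sketch that records wall-clock latency per model; `compare_models` is a hypothetical helper, and the timings include network latency, so they are only a coarse comparison.

```python
import time


def compare_models(models: dict, prompt: str) -> list:
    """Invoke every model with the same prompt and collect simple statistics."""
    results = []
    for name, model in models.items():
        start = time.perf_counter()
        response = model.invoke(prompt)
        elapsed = time.perf_counter() - start
        results.append(
            {
                "model": name,
                "seconds": round(elapsed, 2),
                "chars": len(str(response.content)),
            }
        )
    return results


# Usage with the objects created above (uncomment in the notebook):
# compare_models(
#     {"openai": chat_openai, "gemini": chat_gemini, "groq": chat_groq},
#     "Define an LLM in one sentence.",
# )
```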

---

# Part 3: Streaming Responses

Streaming allows you to receive the LLM's response incrementally, in small chunks of one or more tokens, as it is being generated, rather than waiting for the complete response. This is useful for:

- **Better UX** - Users see responses appearing in real-time
- **Long responses** - Start processing/displaying before completion
- **Interactive applications** - Chat interfaces, assistants, etc.

## Comparison: `.invoke()` vs `.stream()`

| Method | Behavior | Use Case |
|--------|----------|----------|
| `.invoke()` | Returns complete response at once | Simple Q&A, batch processing |
| `.stream()` | Yields response token-by-token | Chat UIs, real-time display |
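
The difference in the table is essentially the difference between a function that returns a finished string and a generator that yields pieces of it. A toy illustration with no LLM involved; `fake_invoke`, `fake_stream`, and `CHUNKS` are made-up stand-ins.

```python
from typing import Iterator

CHUNKS = ["Streaming ", "delivers ", "text ", "in ", "pieces."]


def fake_stream() -> Iterator[str]:
    """Yield the answer piece by piece, like .stream()."""
    for piece in CHUNKS:
        yield piece


def fake_invoke() -> str:
    """Return the whole answer at once, like .invoke()."""
    return "".join(fake_stream())


for piece in fake_stream():
    print(piece, end="", flush=True)  # shown as soon as each piece is available
print()
print(fake_invoke())  # shown only when everything is ready
```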

## Example 5: Regular `.invoke()` - Full Response at Once

In [None]:
# Using ChatOpenAI for this example
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="gpt-4.1-mini",
    temperature=0
)

prompt = "Write a short report on pros and cons of Generative AI with 3-5 bullets per topic."

print("Using .invoke() - Full response at once:")
print("=" * 50)

# Get full response
response = model.invoke(prompt)

# Print the complete response
print(response.content)
print("=" * 50)
print("✅ Complete response received")

Using .invoke() - Full response at once:
**Report on Pros and Cons of Generative AI**

**Pros:**
- **Creativity Enhancement:** Generative AI can produce original content such as text, images, music, and designs, aiding creative professionals and hobbyists.
- **Efficiency and Automation:** It automates content creation tasks, saving time and reducing human effort in areas like writing, coding, and design.
- **Personalization:** Enables highly personalized user experiences by generating tailored recommendations, messages, or products.
- **Innovation Catalyst:** Facilitates rapid prototyping and idea generation, accelerating innovation in various industries.
- **Accessibility:** Helps democratize content creation by providing tools that require less specialized skill or knowledge.

**Cons:**
- **Quality and Accuracy Issues:** Generated content may contain errors, biases, or misleading information, requiring careful review.
- **Ethical Concerns:** Risks include misuse for deepfakes, misinf

In [None]:
display(Markdown(response.content))

**Report on Pros and Cons of Generative AI**

**Pros:**
- **Creativity Enhancement:** Generative AI can produce original content such as text, images, music, and designs, aiding creative professionals and hobbyists.
- **Efficiency and Automation:** It automates content creation tasks, saving time and reducing human effort in areas like writing, coding, and design.
- **Personalization:** Enables highly personalized user experiences by generating tailored recommendations, messages, or products.
- **Innovation Catalyst:** Facilitates rapid prototyping and idea generation, accelerating innovation in various industries.
- **Accessibility:** Helps democratize content creation by providing tools that require less specialized skill or knowledge.

**Cons:**
- **Quality and Accuracy Issues:** Generated content may contain errors, biases, or misleading information, requiring careful review.
- **Ethical Concerns:** Risks include misuse for deepfakes, misinformation, plagiarism, and intellectual property violations.
- **Job Displacement:** Automation of creative and repetitive tasks may threaten certain jobs, leading to economic and social challenges.
- **Dependence and De-skilling:** Overreliance on AI tools might reduce human skills and critical thinking over time.
- **Resource Intensive:** Training and running generative AI models require significant computational power and energy, impacting sustainability.

## Example 6: Streaming with `.stream()` - Token-by-Token

When you call `.stream()`, it returns a **generator** that yields response chunks as they arrive.

In [None]:
# Same model and prompt as above
prompt = "Write a short report on pros and cons of Generative AI with 3-5 bullets per topic."

print("Using .stream() - Token-by-token streaming:")
print("=" * 50)

full_response = ""
# Stream the response
# Each chunk contains a small piece of the response
for chunk in model.stream(prompt):
    # Print each chunk as it arrives (without newline)
    print(chunk.content, end="", flush=True)
    full_response += chunk.content

print()  # Add newline at the end
print("=" * 50)
print("✅ Streaming complete")

Using .stream() - Token-by-token streaming:
**Report on Pros and Cons of Generative AI**

**Pros:**
- **Creativity Enhancement:** Generative AI can produce original content such as text, images, music, and code, aiding creative professionals and hobbyists.
- **Efficiency and Automation:** It automates repetitive tasks like content creation, data synthesis, and design, saving time and reducing human effort.
- **Personalization:** Enables highly personalized user experiences by generating tailored recommendations, responses, and products.
- **Innovation Catalyst:** Facilitates rapid prototyping and idea generation, accelerating innovation across industries.
- **Accessibility:** Helps democratize content creation by providing tools that require less specialized skill or knowledge.

**Cons:**
- **Quality and Accuracy Issues:** Generated content may contain errors, biases, or misleading information, requiring careful review.
- **Ethical Concerns:** Risks include misuse for deepfakes, misinf

In [None]:
display(Markdown(full_response))

**Report on Pros and Cons of Generative AI**

**Pros:**
- **Creativity Enhancement:** Generative AI can produce original content such as text, images, music, and code, aiding creative professionals and hobbyists.
- **Efficiency and Automation:** It automates repetitive tasks like content creation, data synthesis, and design, saving time and reducing human effort.
- **Personalization:** Enables highly personalized user experiences by generating tailored recommendations, responses, and products.
- **Innovation Catalyst:** Facilitates rapid prototyping and idea generation, accelerating innovation across industries.
- **Accessibility:** Helps democratize content creation by providing tools that require less specialized skill or knowledge.

**Cons:**
- **Quality and Accuracy Issues:** Generated content may contain errors, biases, or misleading information, requiring careful review.
- **Ethical Concerns:** Risks include misuse for deepfakes, misinformation, plagiarism, and intellectual property violations.
- **Job Displacement:** Automation of creative and routine tasks may threaten certain jobs, leading to economic and social challenges.
- **Dependence and De-skilling:** Overreliance on AI tools might reduce human skills and critical thinking over time.
- **Resource Intensive:** Training and running generative models demand significant computational power and energy, impacting sustainability.
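
One way to quantify the UX benefit of streaming is to measure how long the first chunk takes compared with the full response. A minimal sketch that works on any iterable of text pieces; `consume_with_timing` and its injectable `clock` parameter are hypothetical.

```python
import time
from typing import Iterable


def consume_with_timing(chunks: Iterable[str], clock=time.perf_counter):
    """Join all chunks and report time to first chunk and total time.

    Returns (text, seconds_to_first_chunk, total_seconds); the second value
    is None when the iterable yields nothing.
    """
    start = clock()
    first = None
    parts = []
    for piece in chunks:
        if first is None:
            first = clock() - start
        parts.append(piece)
    total = clock() - start
    return "".join(parts), first, total


# Usage (uncomment in the notebook):
# text, first, total = consume_with_timing(c.content for c in model.stream(prompt))
# print(f"first chunk after {first:.2f}s, complete after {total:.2f}s")
```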

In [None]:
from IPython.display import display, Markdown

prompt = "Write a short report on pros and cons of Generative AI with 3-5 bullets per topic."

full_response = ""
# display_id=True returns a handle that can be updated in place
markdown_display = display(Markdown("*Generating response...*"), display_id=True)

for chunk in model.stream(prompt):
    full_response += chunk.content
    # Update the same display object
    markdown_display.update(Markdown(full_response + " ▌"))  # ▌ is a cursor effect

# Remove cursor at the end
markdown_display.update(Markdown(full_response))
print("✅ Streaming complete")

**Report on Pros and Cons of Generative AI**

**Pros:**
- **Creativity Enhancement:** Generative AI can produce original content such as text, images, music, and designs, aiding creative professionals and hobbyists.
- **Efficiency and Automation:** It automates content creation, reducing time and effort required for tasks like writing, coding, and graphic design.
- **Personalization:** Enables highly personalized user experiences by generating tailored recommendations, messages, or products.
- **Innovation Support:** Facilitates rapid prototyping and idea generation in various fields including marketing, entertainment, and research.
- **Accessibility:** Helps individuals with disabilities by generating assistive content, such as text-to-speech or image descriptions.

**Cons:**
- **Quality and Accuracy Issues:** Generated content may contain errors, biases, or misleading information, requiring human oversight.
- **Ethical Concerns:** Risks include misuse for deepfakes, misinformation, plagiarism, and intellectual property violations.
- **Job Displacement:** Automation of creative and routine tasks may threaten certain jobs, leading to economic and social challenges.
- **Dependence and De-skilling:** Overreliance on AI tools might reduce human skills and critical thinking over time.
- **Resource Intensive:** Training and running generative AI models require significant computational power and energy, impacting sustainability.

✅ Streaming complete


In [None]:
display(Markdown(full_response))


### Understanding the Streaming Output

In the streaming example above:

- `model.stream(prompt)` returns a **generator**
- Each iteration yields a **chunk** (small piece of the response)
- `chunk.content` contains the text for that chunk
- When chunks are written with `print`, `end=""` stops `print` from appending a newline after each chunk
- `flush=True` forces each chunk to be displayed immediately instead of waiting in the output buffer
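
The points above can be tried without an API key. The sketch below is a minimal illustration, not LangChain code: `FakeChunk` and `fake_stream` are hypothetical stand-ins for the chunk objects and for `model.stream(prompt)`, and only the `.content` attribute is modeled. The loop itself has the same shape as the one used with a real model.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a streamed chunk; only `.content` is modeled."""
    content: str


def fake_stream(text: str, size: int = 8) -> Iterator[FakeChunk]:
    """Hypothetical stand-in for `model.stream(prompt)`: yields `text` in small pieces."""
    for start in range(0, len(text), size):
        yield FakeChunk(content=text[start:start + size])


def print_stream(chunks: Iterator[FakeChunk]) -> str:
    """Print each chunk as it arrives and return the accumulated text."""
    full_response = ""
    for chunk in chunks:
        # end="" keeps consecutive chunks on one line; flush=True writes them out immediately
        print(chunk.content, end="", flush=True)
        full_response += chunk.content
    print()  # end the line once the stream is exhausted
    return full_response


result = print_stream(fake_stream("Generative AI can draft text, images, and code."))
```

With a real model, `fake_stream(...)` would be replaced by `model.stream(prompt)`. Accumulating into `full_response` keeps the complete text available after the stream ends, for example to render it with `Markdown`.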

---

# Summary

## Key Takeaways

### 1. **Two Ways to Initialize Models**

```python
# Method 1: init_chat_model (flexible, universal)
from langchain.chat_models import init_chat_model
model = init_chat_model("gpt-4.1-mini", model_provider="openai")

# Method 2: Provider-specific (more control)
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-4.1-mini")
```

### 2. **Simple Prompting**

```python
# Get full response
response = model.invoke("Your prompt here")
print(response.content)
```

### 3. **Streaming Responses**

```python
# Stream the response chunk by chunk
for chunk in model.stream("Your prompt here"):
    print(chunk.content, end="", flush=True)
```



Now that you understand LLM connections, you can explore:

- **Prompt Templates** - Create reusable, parameterized prompts
- **Tool Use** - Give LLMs access to external tools/APIs
- **RAG (Retrieval Augmented Generation)** - Connect LLMs to your data
- **Agents** - Build autonomous AI systems

Happy building! 🚀