# 🔑 Setting Up Your API Key (IMPORTANT!)

Before running this notebook, you need to set up your Google Gemini API key securely.

## 📝 Step-by-Step Setup

### 1. Get Your API Key
Visit [Google AI Studio](https://aistudio.google.com/app/apikey) and create a free API key.

### 2. Create a `.env` File
In your project root directory (same folder as this notebook), create a file named `.env`:

```bash
# In terminal (Linux/Mac):
touch .env

# Or in Windows (PowerShell):
New-Item .env
```

### 3. Add Your API Key to `.env`
Open the `.env` file and add:

```
GOOGLE_API_KEY=your_actual_api_key_here
```

Replace `your_actual_api_key_here` with your real API key!
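What does `load_dotenv()` do with this file? Roughly, it reads each `KEY=VALUE` line and copies it into `os.environ`. The sketch below is a deliberately simplified stand-in written for this tutorial, not python-dotenv's actual code. `parse_env_text` and `load_env_text` are hypothetical names, and real `.env` features such as `export` prefixes, multi-line values, and variable expansion are not handled. With its default settings, `load_dotenv()` does not overwrite variables that are already set, and this sketch follows the same rule.

```python
import os

def parse_env_text(text):
    """Parse KEY=VALUE lines into a dict (simplified sketch, not python-dotenv)."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        if "=" not in line:
            continue  # not a KEY=VALUE line
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Drop one pair of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key] = value
    return values

def load_env_text(text, override=False):
    """Copy parsed values into os.environ, keeping existing ones unless override=True."""
    for key, value in parse_env_text(text).items():
        if override or key not in os.environ:
            os.environ[key] = value

sample = "# my secrets\nGOOGLE_API_KEY=your_actual_api_key_here\n"
print(parse_env_text(sample))  # {'GOOGLE_API_KEY': 'your_actual_api_key_here'}
```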

### 4. Verify `.env` is Protected
Make sure `.env` is in your `.gitignore` file (already done for you!) so it never gets pushed to GitHub.
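To double-check this yourself, here is a rough sketch that tests whether a root-level `.env` file would be ignored. It handles only simple cases (comments, `*` wildcards, a leading `/`, and `!` negation, with later lines winning) and is not a full implementation of git's matching rules; `env_is_ignored` is a hypothetical helper. For a definitive answer, `git check-ignore -v .env` asks git directly.

```python
from fnmatch import fnmatch

def env_is_ignored(gitignore_text, filename=".env"):
    """Rough check that a root-level file is ignored (simplified gitignore rules)."""
    ignored = False
    for raw_line in gitignore_text.splitlines():
        pattern = raw_line.strip()
        # Skip blank lines and comments
        if not pattern or pattern.startswith("#"):
            continue
        negate = pattern.startswith("!")
        if negate:
            pattern = pattern[1:]
        # A leading "/" anchors to the repo root; for a root-level file we can drop it
        pattern = pattern.lstrip("/")
        if fnmatch(filename, pattern):
            # Later matching lines override earlier ones, as in git
            ignored = not negate
    return ignored

print(env_is_ignored("__pycache__/\n.env\n"))  # True
```

In the notebook you would pass the real file's contents, for example `env_is_ignored(open(".gitignore").read())`.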

---

## 🔒 Security Best Practices

✅ **DO:**
- Keep your API key in `.env` file
- Add `.env` to `.gitignore`
- Use environment variables in code

❌ **DON'T:**
- Hard-code API keys in notebooks
- Commit `.env` to Git
- Share API keys publicly

---

## 💡 For Google Colab Users

If you're using Google Colab, you can use Colab's built-in secrets feature:

```python
from google.colab import userdata
google_api_key = userdata.get('GOOGLE_API_KEY')
```

Or temporarily set it in the notebook (for learning only):
```python
import os
os.environ['GOOGLE_API_KEY'] = 'your_key_here'  # Not recommended for production!
```
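If you switch between Colab and a local setup, a small helper can try Colab's secrets first and fall back to the environment variable loaded from `.env`. This is a sketch under our own assumptions: `get_google_api_key` is a hypothetical name, and it catches any error from `userdata.get()` (for example, a missing secret or notebook access not granted) rather than specific exception types.

```python
import os

def get_google_api_key(name="GOOGLE_API_KEY"):
    """Return the key from Colab secrets if available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        userdata = None  # not running in Colab
    if userdata is not None:
        try:
            return userdata.get(name)
        except Exception:
            # e.g. the secret doesn't exist or notebook access wasn't granted
            pass
    # Local fallback: value set by load_dotenv() or your shell
    return os.getenv(name)

google_api_key = get_google_api_key()
print("Key found!" if google_api_key else "No key found - check your .env file")
```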

---

Now run the cell below to load your API key and create your first LLM! 🚀


In [None]:
import os
from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI

# Load environment variables from .env file
load_dotenv()

# Get API key from environment variable
google_api_key = os.getenv("GOOGLE_API_KEY")

# Check if API key is loaded
if not google_api_key:
    raise ValueError(
        "❌ GOOGLE_API_KEY not found!\n"
        "Please create a .env file in the project root and add:\n"
        "GOOGLE_API_KEY=your_actual_api_key_here\n\n"
        "Get your API key at: https://aistudio.google.com/app/apikey"
    )

llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    api_key=google_api_key,
)
output = llm.invoke("Hello, world!")

# Understanding ChatGoogleGenerativeAI Parameters

When we create an AI chatbot using Google's Gemini model, we need to configure it with several parameters. Let's understand each one in simple terms:

---

## 🤖 Core Parameters

### 1. `model="gemini-2.0-flash"`
**What it is:** The specific AI model we want to use  
**Simple explanation:** Think of this like choosing which "brain" the AI should use. Different models have different capabilities and speeds.
- `gemini-2.0-flash` is a fast and efficient model from Google
- Other options: `gemini-2.5-pro`, `gemini-2.5-flash`, etc.

**Example:** Just like you might choose between a calculator or a computer for math - both work, but have different speeds and capabilities!

---

### 2. `temperature=0`
**What it is:** Controls how creative or random the AI's responses are  
**Range:** 0.0 to 2.0  
**Simple explanation:** 
- **`temperature=0`** → Most predictable: nearly the same answer every time (like a calculator)
- **`temperature=1`** → Balanced, some creativity
- **`temperature=2`** → Very creative, different answers each time (like a creative writer)

**Example:**
```python
# temperature=0: "Hello! How can I help you?"
# temperature=1: "Hello there! How may I assist you today?"
# temperature=2: "Greetings! What exciting question can I help you explore?"
```

**When to use what:**
- Use **0** for factual answers (math, science, coding)
- Use **0.7-1.0** for creative writing, brainstorming
- Use **1.5-2.0** for maximum creativity (stories, poems)
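Why does temperature have this effect? In the textbook picture, the model gives each candidate next word a score (a "logit"), divides the scores by the temperature, and turns them into probabilities with a softmax. Low temperatures sharpen the distribution toward the top choice; high temperatures flatten it. The sketch below uses made-up scores for three candidate words. It illustrates the concept only: it is not Gemini's exact sampling procedure, and the real API also exposes related settings such as top-p and top-k.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if temperature == 0:
        # Limit as temperature -> 0: all probability on the top-scoring option (greedy)
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next words
logits = [2.0, 1.0, 0.5]
for t in (0, 0.7, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 2) for p in probs]}")
```

Notice how the top word's probability shrinks as temperature rises, which is why higher settings produce more varied text.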

---

### 3. `max_tokens=None`
**What it is:** Maximum number of tokens (small pieces of words) the AI can generate  
**Simple explanation:** This limits how long the AI's response can be.
- `None` = No explicit limit (the model's own maximum output length still applies)
- `100` = Short response (~75 words)
- `1000` = Long response (~750 words)

**Why it matters:**
- Prevents extremely long responses
- Saves API costs (you pay per token)
- Controls response length

**Real-world analogy:** Like telling someone "explain in 100 words or less" vs "explain in detail"

---

### 4. `timeout=None`
**What it is:** How long to wait for the AI to respond (in seconds)  
**Simple explanation:** If the AI takes too long, stop waiting and show an error.
- `None` = No timeout set; your program may wait a very long time (not recommended for production)
- `30` = Wait maximum 30 seconds
- `60` = Wait maximum 1 minute

**Example scenario:** If your internet is slow or the AI service is busy, this prevents your program from hanging forever.
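The idea behind a timeout can be sketched with the standard library: start the work, wait at most N seconds for the result, and raise an error if it is not ready. This is a generic illustration, not how langchain-google-genai implements its `timeout`; `run_with_timeout` and `slow_request` are hypothetical names, and `slow_request` merely sleeps to stand in for a busy server.

```python
import concurrent.futures
import time

def run_with_timeout(fn, timeout_seconds):
    """Run fn() in a worker thread and stop waiting after timeout_seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        # Raises concurrent.futures.TimeoutError if fn() hasn't finished in time
        return future.result(timeout=timeout_seconds)
    finally:
        # Stop waiting; note the worker thread itself is not forcibly killed
        executor.shutdown(wait=False)

def slow_request():
    time.sleep(0.5)  # pretend the server is busy
    return "late reply"

try:
    run_with_timeout(slow_request, timeout_seconds=0.1)
except concurrent.futures.TimeoutError:
    print("Gave up after 0.1 seconds")
```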

---

### 5. `max_retries=2`
**What it is:** How many times to try again if something fails  
**Simple explanation:** If the request fails (network error, server busy), automatically try again.
- `0` = Don't retry, fail immediately
- `2` = Try 2 more times before giving up (total 3 attempts)
- `5` = Try 5 more times (total 6 attempts)

**Real-world example:**
```
1st attempt: ❌ Server busy
2nd attempt: ❌ Network timeout
3rd attempt: ✅ Success!
```

**Why it's useful:** Makes your code more reliable by handling temporary network issues.
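The retry behavior described above can be sketched as a loop: one initial attempt plus up to `max_retries` more, waiting a little longer before each retry. This is a generic illustration built on our own assumptions (exponential backoff, retrying only on `ConnectionError`); langchain-google-genai has its own retry logic, which may differ in which errors it retries and how long it waits. `call_with_retries` and `make_flaky_request` are hypothetical names, and the fake request fails a fixed number of times before succeeding.

```python
import time

def call_with_retries(fn, max_retries=2, base_delay=0.01):
    """Call fn(); on ConnectionError, retry up to max_retries more times."""
    for attempt in range(max_retries + 1):  # total attempts = max_retries + 1
        try:
            return fn()
        except ConnectionError as err:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            delay = base_delay * (2 ** attempt)  # exponential backoff
            print(f"Attempt {attempt + 1} failed ({err}); retrying in {delay:.2f}s")
            time.sleep(delay)

def make_flaky_request(failures_before_success):
    """Build a fake request that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def flaky_request():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise ConnectionError("server busy")
        return "Success!"

    return flaky_request

# Fails twice, then succeeds on the 3rd attempt (within max_retries=2)
print(call_with_retries(make_flaky_request(2), max_retries=2))
```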

---

### 6. `api_key=google_api_key`
**What it is:** Your personal password to access Google's AI service  
**Simple explanation:** Like a key to unlock the AI service - proves you have permission to use it.

**Important notes:**
- 🔒 Keep this SECRET! Never share it publicly
- 💰 Tracks your usage for billing
- 🎫 Get it from: [Google AI Studio](https://aistudio.google.com/app/apikey)

**Security tip:** In real projects, store this in environment variables, not directly in code!

---

## 📝 Quick Summary Table

| Parameter | Purpose | Common Values |
|-----------|---------|---------------|
| `model` | Which AI brain to use | `"gemini-2.0-flash"`, `"gemini-2.5-flash"` |
| `temperature` | Creativity level | `0` (factual) to `2` (creative) |
| `max_tokens` | Response length limit | `None`, `100`, `500`, `1000` |
| `timeout` | Wait time limit | `None`, `30`, `60` (seconds) |
| `max_retries` | Retry attempts | `0`, `2`, `5` |
| `api_key` | Your access key | Your secret key string |

---

## 🎯 Best Practices for Students

1. **Start simple:** Use the settings from this notebook (`temperature=0`, `max_retries=2`)
2. **Experiment:** Try different temperatures to see how responses change
3. **Set limits:** Use `max_tokens` to control costs when learning
4. **Be secure:** Never commit API keys to GitHub or share them
5. **Handle errors:** Use `max_retries` to make your code more robust
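To put the summary table into practice, here is a small hypothetical helper that checks your settings against the ranges used in this notebook before you build a model. The ranges come from the table (they are this notebook's guidance, not limits enforced by the library), and `check_llm_settings` is a made-up name for illustration.

```python
def check_llm_settings(temperature=0.0, max_tokens=None, timeout=None, max_retries=2):
    """Return warnings for settings outside this notebook's suggested ranges."""
    warnings = []
    if not 0.0 <= temperature <= 2.0:
        warnings.append(f"temperature={temperature} is outside 0.0-2.0")
    if max_tokens is None:
        warnings.append("max_tokens=None: no explicit cap on response length (or cost)")
    elif max_tokens <= 0:
        warnings.append("max_tokens must be a positive integer or None")
    if timeout is None:
        warnings.append("timeout=None: a slow request could hang your program")
    elif timeout <= 0:
        warnings.append("timeout must be a positive number of seconds")
    if max_retries < 0:
        warnings.append("max_retries cannot be negative")
    return warnings

# The settings from the first cell of this notebook
for warning in check_llm_settings(temperature=0, max_tokens=None, timeout=None, max_retries=2):
    print("⚠️", warning)
```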


# 🧪 Hands-On Experiments

Now let's see these parameters in action! Run each experiment below to see the differences.

---


## 🌡️ Temperature Experiments

### Experiment 1: Temperature = 0 (Factual & Deterministic)
**Use Case:** Math problems, factual questions, coding help

This should give the same (or nearly the same) answer every time - perfect for when you need consistency!


In [None]:
# Temperature = 0 - Very Factual
llm_temp_0 = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0,
    api_key=google_api_key
)

prompt = "Write a short greeting message for a new user visiting our website."
response_0 = llm_temp_0.invoke(prompt)
print("🔵 Temperature = 0 (Factual):")
print(response_0.content)
print("\n" + "="*50 + "\n")


### Experiment 2: Temperature = 0.7 (Balanced)
**Use Case:** General conversations, customer support, Q&A

This provides a good balance between consistency and variety.


In [None]:
# Temperature = 0.7 - Balanced
llm_temp_07 = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0.7,
    api_key=google_api_key
)

response_07 = llm_temp_07.invoke(prompt)
print("🟢 Temperature = 0.7 (Balanced):")
print(response_07.content)
print("\n" + "="*50 + "\n")


### Experiment 3: Temperature = 1.5 (Very Creative)
**Use Case:** Creative writing, brainstorming, story generation

This will give more creative and varied responses!


In [None]:
# Temperature = 1.5 - Very Creative
llm_temp_15 = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=1.5,
    api_key=google_api_key
)

response_15 = llm_temp_15.invoke(prompt)
print("🟡 Temperature = 1.5 (Creative):")
print(response_15.content)
print("\n" + "="*50 + "\n")


### Experiment 4: Temperature = 2.0 - Multiple Runs (Showing Randomness)
**Use Case:** Maximum creativity for artistic content, diverse idea generation

Run this cell multiple times to see how different the responses can be!


In [None]:
# Temperature = 2.0 - Maximum Creativity (try running this multiple times!)
llm_temp_20 = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=2.0,
    api_key=google_api_key
)

print("🔴 Temperature = 2.0 (Maximum Creativity):")
print("Running 3 times to show variety:\n")

for i in range(1, 4):
    response_20 = llm_temp_20.invoke(prompt)
    print(f"Attempt {i}:")
    print(response_20.content)
    print("\n" + "-"*50 + "\n")


---

## 📏 Max Tokens Experiments

Now let's see how `max_tokens` controls the length of responses!

**Note:** ~1 token ≈ 0.75 words, so 100 tokens ≈ 75 words


### Experiment 5: max_tokens = 50 (Very Short Response)
**Use Case:** Quick answers, notifications, short summaries

Perfect when you need brief, to-the-point responses!


In [None]:
# max_tokens = 50 - Very Short
llm_tokens_50 = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0.7,
    max_tokens=50,
    api_key=google_api_key
)

long_prompt = "Explain what artificial intelligence is and how it's used in everyday life."
response_50 = llm_tokens_50.invoke(long_prompt)

print("📝 Max Tokens = 50 (Very Short):")
print(response_50.content)
print(f"\n📊 Tokens used: {response_50.usage_metadata['output_tokens']}")
print("="*50 + "\n")


### Experiment 6: max_tokens = 200 (Medium Response)
**Use Case:** Explanations, descriptions, short articles

Good balance between detail and brevity.


In [None]:
# max_tokens = 200 - Medium Length
llm_tokens_200 = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0.7,
    max_tokens=200,
    api_key=google_api_key
)

response_200 = llm_tokens_200.invoke(long_prompt)

print("📄 Max Tokens = 200 (Medium):")
print(response_200.content)
print(f"\n📊 Tokens used: {response_200.usage_metadata['output_tokens']}")
print("="*50 + "\n")


### Experiment 7: max_tokens = None (No Limit)
**Use Case:** Detailed explanations, essays, comprehensive answers

Let the AI decide how long the response should be based on the question!


In [None]:
# max_tokens = None - No Limit
llm_tokens_none = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0.7,
    max_tokens=None,
    api_key=google_api_key
)

response_none = llm_tokens_none.invoke(long_prompt)

print("📚 Max Tokens = None (No Limit):")
print(response_none.content)
print(f"\n📊 Tokens used: {response_none.usage_metadata['output_tokens']}")
print("="*50 + "\n")


---

## üìä Compare All Max Token Settings Side-by-Side

Run this cell to see all three responses together and compare their lengths!


In [None]:
# Side-by-side comparison
print("="*60)
print("COMPARISON: Response Lengths with Different max_tokens")
print("="*60)

print(f"\n📝 50 tokens  → {response_50.usage_metadata['output_tokens']} tokens used")
print(f"📄 200 tokens → {response_200.usage_metadata['output_tokens']} tokens used")
print(f"📚 No limit   → {response_none.usage_metadata['output_tokens']} tokens used")

print("\n" + "="*60)
print("\n💡 Key Takeaway:")
print("Lower max_tokens = Shorter, faster, cheaper responses")
print("Higher max_tokens = Longer, more detailed, costlier responses")
print("="*60)


---

## 🎓 What You Learned from These Experiments

### Temperature Effects:
- **Temperature = 0** → (Nearly) the same output every time (close to deterministic)
- **Temperature = 0.7** → Slight variations (balanced)
- **Temperature = 1.5** → More creative variations
- **Temperature = 2.0** → Maximum creativity, very different each time

### Max Tokens Effects:
- **50 tokens** → Very brief, often cut off mid-sentence (~37 words)
- **200 tokens** → Good for short explanations (~150 words)
- **None (no limit)** → Full response based on what's needed (up to the model's own output cap)

### 💡 Pro Tips for Your Projects:
1. **For factual tasks** (homework, calculations): Use `temperature=0`
2. **For creative tasks** (stories, brainstorming): Use `temperature=0.7-2.0` (higher = more variety)
3. **To save costs**: Set appropriate `max_tokens` limits
4. **For production apps**: Always set `timeout` and `max_retries` for reliability

---

**🎬 Tutorial Tip:** Try modifying the prompts and running the experiments again to see how different inputs affect the outputs!


In [None]:
print(output.content)
print(output.text)


# Difference Between `output.content` vs `output.text`

## Key Differences

### `output.content`
- **Type**: Can be either `str` OR `list[str | dict]`
- **Contains**: The raw, complete content of the message as returned by the LLM
- **Use case**: When you need the full message structure, including multimodal content (text, images, tool calls, etc.)

### `output.text`
- **Type**: Returns a `str` (in langchain-core 1.x, a `TextAccessor` that behaves like a string; in older 0.3.x releases, `text` was a method, so you would call `output.text()` instead)
- **Contains**: Extracts and concatenates **only the text portions** from the content
- **Use case**: When you only need the text output, especially for multimodal responses

---

## When They're the Same
In simple text-only responses (like our "Hello, world!" example), both return identical values:
```python
output.content == output.text  # True for simple text responses
```

---

## When They're Different

### 1. **Multimodal Responses** (text + images)
```python
# content would be:
[
    {"type": "text", "text": "Here's the image you requested:"},
    {"type": "image_url", "url": "https://..."}
]

# text extracts only:
"Here's the image you requested:"
```

### 2. **Tool Calls / Function Calling**
```python
# content might be:
[
    {"type": "text", "text": "Let me check that for you"},
    {"type": "tool_use", "name": "search", "args": {...}}
]

# text extracts only:
"Let me check that for you"
```

### 3. **Multiple Text Blocks**
```python
# content could be:
[
    "First paragraph",
    {"type": "text", "text": "Second paragraph"},
    "Third paragraph"
]

# text joins them:
"First paragraphSecond paragraphThird paragraph"
```
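The extraction rule in these examples can be sketched in a few lines: keep plain strings and the `text` field of `{"type": "text"}` blocks, skip everything else, and join the pieces with no separator. This mirrors the behavior described in this section for illustration; it is not LangChain's source code, and `extract_text` is a hypothetical name.

```python
def extract_text(content):
    """Collect only the text parts of a message's content (sketch of the .text behavior)."""
    if isinstance(content, str):
        return content  # simple text-only response
    parts = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
        # Other blocks (images, tool calls, ...) are skipped
    return "".join(parts)

blocks = [
    "First paragraph",
    {"type": "text", "text": "Second paragraph"},
    "Third paragraph",
]
print(extract_text(blocks))  # First paragraphSecond paragraphThird paragraph
```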

---

## Recommendation
- ✅ Use **`.text`** for most cases - it's safe and always gives you a string
- ✅ Use **`.content`** when you need to access non-text elements (images, tool calls, structured data)
- ℹ️ In simple text-only responses, they're functionally identical


# 💰 Understanding Usage Metadata - Tracking API Costs

When you make a request to the AI, it returns `usage_metadata` that tells you exactly how many tokens were used. This is **super important** for understanding costs!

Let's examine what each field means:


## üìä Breaking Down the Usage Metadata

Here's what you'll see when you print `output.usage_metadata`:

```python
{
    'input_tokens': 4,
    'output_tokens': 11,
    'total_tokens': 15,
    'input_token_details': {'cache_read': 0}
}
```

Let's understand each field:

---

### 1. `input_tokens` (Number of tokens in your question)

**What it is:** The number of tokens in the message **you sent** to the AI

**Example:** When you send "Hello, world!" → it might be split into 4 tokens like this (the exact split depends on the model's tokenizer):
- Token 1: "Hello"
- Token 2: ","
- Token 3: " world"
- Token 4: "!"

**Why it matters:**
- Longer questions = More input tokens
- You pay for input tokens (though usually cheaper than output tokens)
- Every model has a maximum input size (context window): older models allowed around 8,000 or 32,000 tokens, while Gemini 2.0 Flash accepts roughly 1 million input tokens

**Think of it as:** The "question length" counter

---

### 2. `output_tokens` (Number of tokens in the AI's response)

**What it is:** The number of tokens in the response **the AI generated**

**Example:** AI responds: "Hello there! How can I help you today?" → 11 tokens

**Why it matters:**
- This is what you **pay the most** for (output tokens cost more than input)
- Longer responses = Higher costs
- This is what `max_tokens` parameter controls

**💡 Cost Tip:** If you want to save money, use `max_tokens` to limit the response length!

**Think of it as:** The "answer length" counter

---

### 3. `total_tokens` (Total tokens used)

**What it is:** Simply `input_tokens + output_tokens`

**Formula:** `total_tokens = input_tokens + output_tokens`

**Example:** 4 (input) + 11 (output) = 15 (total)

**Why it matters:**
- Quick way to see overall usage
- Some API pricing is based on total tokens
- Helps track your monthly usage limits

**Think of it as:** The "complete conversation size" counter
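To track usage across many calls (for example, against a daily or monthly budget), you can add up the `usage_metadata` dictionaries yourself. The sketch below assumes each dictionary has the `input_tokens` and `output_tokens` keys shown in this section; `UsageTracker` is a hypothetical helper, not part of LangChain. The two sample dictionaries are made up, so in the notebook you would pass `response.usage_metadata` instead.

```python
from dataclasses import dataclass

@dataclass
class UsageTracker:
    """Running totals of token usage across several requests."""
    requests: int = 0
    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, usage_metadata):
        """Add one response's usage_metadata dict to the totals."""
        self.requests += 1
        self.input_tokens += usage_metadata.get("input_tokens", 0)
        self.output_tokens += usage_metadata.get("output_tokens", 0)

    @property
    def total_tokens(self):
        # Same formula as above: total = input + output
        return self.input_tokens + self.output_tokens

tracker = UsageTracker()
for usage in [
    {"input_tokens": 4, "output_tokens": 11, "total_tokens": 15},
    {"input_tokens": 10, "output_tokens": 40, "total_tokens": 50},
]:
    tracker.add(usage)

print(tracker.requests, tracker.total_tokens)  # 2 65
```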

---

### 4. `input_token_details` (Advanced tracking)

**What it is:** Extra information about the input tokens

**Structure:**
```python
'input_token_details': {
    'cache_read': 0  # Tokens read from cache
}
```

#### `cache_read` (Cached tokens - Advanced Feature)

**What it is:** Number of tokens that were retrieved from the cache instead of being processed again

**How it works:**
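- When requests share a long identical beginning (for example, the same large document or instructions), Gemini can reuse already-processed input tokens from a cache (automatically on some newer models, or through context caching that you set up)
- Cached tokens are billed at a **discounted** rate
- Caching only applies above a minimum prompt size (on the order of a thousand tokens or more), so short prompts like ours will show `cache_read: 0`
- `cache_read: 50` would mean 50 of the input tokens were reused from cache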

**Example scenario:**
```python
# First request - no cache
output1 = llm.invoke("What is Python?")
# input_token_details: {'cache_read': 0}

# Same short prompt again - still below the minimum size for caching
output2 = llm.invoke("What is Python?")
# input_token_details: {'cache_read': 0}  # only long shared prefixes get cache hits
```

**Think of it as:** The "money saved by recycling" counter

---

## 💵 Real-World Pricing Example

Let's say Google charges (hypothetical rates):
- Input tokens: $0.01 per 1,000 tokens
- Output tokens: $0.03 per 1,000 tokens

For our example:
```python
{
    'input_tokens': 4,
    'output_tokens': 11,
    'total_tokens': 15
}
```

**Cost calculation:**
- Input cost: (4 / 1,000) × $0.01 = $0.00004
- Output cost: (11 / 1,000) × $0.03 = $0.00033
- **Total cost: $0.00037** (less than a penny!)

But it adds up over many requests:
- 1,000 requests: $0.37
- 10,000 requests: $3.70

**This is why tracking tokens matters!** 💰
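Here is the same arithmetic as a small reusable function, using the hypothetical rates from this example. Real Gemini prices depend on the model and change over time, so check Google's official pricing page before relying on any numbers. `estimate_cost` is a made-up helper, and the `usage` dictionary is this section's example; in the notebook you could pass `output.usage_metadata` instead.

```python
def estimate_cost(usage_metadata, input_price_per_1k, output_price_per_1k):
    """Estimate the dollar cost of one request from its token counts."""
    input_cost = usage_metadata["input_tokens"] / 1000 * input_price_per_1k
    output_cost = usage_metadata["output_tokens"] / 1000 * output_price_per_1k
    return input_cost + output_cost

# Example usage_metadata and the hypothetical rates from above
usage = {"input_tokens": 4, "output_tokens": 11, "total_tokens": 15}
per_request = estimate_cost(usage, input_price_per_1k=0.01, output_price_per_1k=0.03)

print(f"One request:     ${per_request:.5f}")
print(f"1,000 requests:  ${per_request * 1_000:.2f}")
print(f"10,000 requests: ${per_request * 10_000:.2f}")
```

With these inputs it reproduces the per-request, 1,000-request, and 10,000-request totals worked out by hand.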

---

## 📈 Practical Tips for Students

### 1. **Monitor Your Usage**
Always check `usage_metadata` to see how many tokens you're using:
```python
response = llm.invoke("Your question here")
print(f"Tokens used: {response.usage_metadata['total_tokens']}")
```

### 2. **Optimize for Cost**
- Use shorter, clearer prompts (reduces input tokens)
- Set `max_tokens` to limit output length
- Use `temperature=0` for consistent responses (temperature doesn't cap length; `max_tokens` does)

### 3. **Watch for Token Limits**
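- If your prompt exceeds the model's input limit, the request fails with an error
- If the response reaches the output limit (or your `max_tokens`), it is cut off rather than failing
- Example: with an 8K-token context window, `input_tokens + output_tokens ≤ 8,000`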

### 4. **Free Tier Management**
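Many AI services offer a free tier with usage limits:
- Google Gemini: the free tier limits requests per minute and per day; the exact numbers depend on the model and change over time, so check the official rate-limits page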
- Track your daily usage to stay within limits

---

## 🧮 Quick Token Rules of Thumb

| Text Length | Approximate Tokens |
|-------------|-------------------|
| 1 word | ~1-2 tokens |
| 1 sentence (10 words) | ~13-15 tokens |
| 1 paragraph (100 words) | ~130-150 tokens |
| 1 page (500 words) | ~650-750 tokens |

**Remember:** Tokens ≠ Words! Punctuation, spaces, and special characters count too!
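The rules of thumb in this table can be turned into a rough estimator. It only counts whitespace-separated words and applies the ~0.75 words-per-token ratio used in this notebook, so it will not match the exact count Gemini reports in `usage_metadata`; treat it as ballpark budgeting only. `estimate_tokens` and `estimate_words` are hypothetical helpers.

```python
import math

WORDS_PER_TOKEN = 0.75  # rough ratio used throughout this notebook

def estimate_tokens(text):
    """Ballpark token count for English text (not the model's real tokenizer)."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)

def estimate_words(tokens):
    """Ballpark number of words that fit in a token budget such as max_tokens."""
    return int(tokens * WORDS_PER_TOKEN)

sentence = "Can you explain what machine learning is in simple terms?"
print(estimate_tokens(sentence))  # 10 words -> 14 tokens (estimate)
print(estimate_words(200))        # max_tokens=200 -> 150 words (estimate)
```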

---

## 🎯 Try This Exercise!

Run the next cell to see the usage metadata for our "Hello, world!" example:


In [None]:
print(output.usage_metadata)

---

## 🔍 Let's Analyze This Output!

In a sample run, we saw the following (your exact numbers may differ):

1. **Input Tokens = 4**
   - Our prompt was: "Hello, world!"
   - This simple phrase = 4 tokens

2. **Output Tokens = 11**
   - AI's response: "Hello there! How can I help you today?"
   - This response = 11 tokens

3. **Total Tokens = 15**
   - Total usage: 4 + 11 = 15 tokens

4. **Cache Read = 0**
   - This was a fresh request (not cached)

### 💡 Key Insight:
Notice that the AI's response (11 tokens) was **almost 3x longer** than our question (4 tokens)! Output tokens are also priced higher *per token* than input tokens, mainly because the model generates them one at a time, while your whole prompt can be processed together in a single pass.

---

## 🧪 Experiment Idea:

Try running this with different prompts and see how token counts change:

```python
# Short prompt
short = llm.invoke("Hi")

# Medium prompt  
medium = llm.invoke("Can you explain what machine learning is?")

# Long prompt
long = llm.invoke("I need a detailed explanation of how neural networks work, including backpropagation, activation functions, and gradient descent.")

# Compare their usage
print(f"Short: {short.usage_metadata['total_tokens']} tokens")
print(f"Medium: {medium.usage_metadata['total_tokens']} tokens")
print(f"Long: {long.usage_metadata['total_tokens']} tokens")
```

**What do you think will happen?** 🤔
