# Chapter 7: Gemini API Integration

## Going Straight to the Source

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sanjaynegi309/agentic-ai-course/blob/main/notebooks/07-Gemini-API-Integration.ipynb)

Frameworks like LangChain and CrewAI are fantastic for productivity. But to truly master agentic AI, it's essential to understand how to interact with the core models directly. In this chapter, we'll peel back the layers of abstraction and work directly with the **Google Gemini API** to unlock its most powerful features.

## 🤔 Why Go Native?

Using a model's native API offers several advantages:

- **Direct Access:** Get immediate access to the latest features, models, and updates without waiting for a framework to support them.
- **Lower Overhead:** Skip the framework layer on each request, which can trim latency, although model and network time usually dominate.
- **Simplicity:** For straightforward tasks, a direct API call is often simpler and requires less code than building a full chain.
- **Advanced Features:** Directly control powerful features like streaming, function calling, and multimodal inputs.

In [None]:
# Step 1: Install and Setup
!pip install google-generativeai python-dotenv Pillow requests

import os
from dotenv import load_dotenv
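load_dotenv()  # Load variables from a local .env file, if one exists

# Fall back to Colab secrets when the key is still missing
if not os.environ.get('GEMINI_API_KEY'):
    try:
        from google.colab import userdata
        os.environ['GEMINI_API_KEY'] = userdata.get('GEMINI_API_KEY')
    except ImportError:
        print("Could not load GEMINI_API_KEY: no .env entry and not running in Colab.")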
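
The setup cell only prints a message when no key is found, so a missing key surfaces later as a `KeyError` when the key is read. A minimal sketch of a fail-fast check follows; `require_api_key` is a hypothetical helper name used for illustration, not part of any library.

```python
import os


def require_api_key(name: str = "GEMINI_API_KEY") -> str:
    """Return the named environment variable, or raise a descriptive error.

    Hypothetical helper for illustration; not part of the Gemini SDK.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to a .env file or to Colab secrets."
        )
    return value
```

Calling `require_api_key()` right after the setup cell stops the notebook at the point where the problem actually is.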

### Part 1: Your First Gemini API Call

Let's start with the most basic operation: generating text from a prompt.

In [None]:
# Step 2: Basic Text Generation
import google.generativeai as genai

genai.configure(api_key=os.environ['GEMINI_API_KEY'])

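# Choose the model. Model names change over time; if 'gemini-pro' is no longer
# served, genai.list_models() shows the names available to your key.
model = genai.GenerativeModel('gemini-pro')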

# Generate content
response = model.generate_content("Explain the concept of an AI agent in one sentence.")

print(response.text)

### Part 2: Real-Time Responses with Streaming

Waiting for the full response can make your application feel slow. With streaming, you receive the response in chunks as the model generates it, so users see output almost immediately.

In [None]:
# Step 3: Streaming Content
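response_stream = model.generate_content(
    "Write a short story about a robot who discovers music.",
    stream=True
)

print("--- Streaming Story ---")
for chunk in response_stream:
    # flush=True pushes each chunk to the screen as soon as it arrives
    print(chunk.text, end="", flush=True)
print()  # End the line after the final chunk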
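
Streamed output is often both shown live and kept for later use, such as logging or appending to a chat history. A minimal sketch of that pattern; `collect_stream` is a hypothetical helper that accepts any iterable of objects with a `.text` attribute, and the `SimpleNamespace` chunks are stand-ins for real streamed chunks so the example runs offline.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable, echo: bool = True) -> str:
    """Print each chunk as it arrives and return the joined text."""
    pieces = []
    for chunk in chunks:
        if echo:
            print(chunk.text, end="", flush=True)
        pieces.append(chunk.text)
    if echo:
        print()  # Finish the line once the stream ends
    return "".join(pieces)


# Stand-in chunks; with the real API, pass `response_stream` instead.
fake_stream = [SimpleNamespace(text=t) for t in ("The robot ", "heard ", "a song.")]
story = collect_stream(fake_stream)
```

Because the helper only relies on `.text`, the same function works for the stand-ins here and for the chunks yielded by a streaming call.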

### Part 3: Multimodality - The 'Vision' in 'Pro Vision'

This is where Gemini truly shines. We can send both images and text in a single prompt to the `gemini-pro-vision` model. Let's give it an image and ask a question about it.

In [None]:
# Step 4: Multimodal Input (Text and Image)
import requests
from PIL import Image
from io import BytesIO

# Fetch an image from a URL
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a8/Tour_Eiffel_Wikimedia_Commons.jpg/800px-Tour_Eiffel_Wikimedia_Commons.jpg"
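# Use a separate name so the earlier Gemini `response` is not overwritten
image_response = requests.get(image_url, timeout=30)
image_response.raise_for_status()  # Stop here if the download failed
img = Image.open(BytesIO(image_response.content))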

# Choose the vision model (check genai.list_models() if this name is no longer served)
vision_model = genai.GenerativeModel('gemini-pro-vision')

# Create the multimodal prompt
prompt_parts = [
    "What is this famous landmark and in what city is it located?",
    img
]

# Generate content
vision_response = vision_model.generate_content(prompt_parts)

print(vision_response.text)
display(img) # Display the image in the notebook
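
If the URL returns an error page instead of an image, `Image.open` fails with an error that does not mention the download. A minimal sketch of a pre-check on the downloaded bytes, using the well-known file signatures of common image formats; `sniff_image_format` is a hypothetical helper, not part of Pillow or the Gemini SDK.

```python
from typing import Optional


def sniff_image_format(data: bytes) -> Optional[str]:
    """Guess the image format from leading magic bytes, or return None."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # WebP files are RIFF containers with "WEBP" at byte offset 8
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None
```

Checking the downloaded bytes with `sniff_image_format(...)` before calling `Image.open` lets the notebook report a failed download directly.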

## ✅ When to Use the Native API vs. a Framework

| Scenario | Recommendation | Why? |
|---|---|---|
| Building a complex, multi-step agent with several tools. | **Framework (LangChain/LangGraph)** | Frameworks excel at managing state, orchestrating components, and abstracting complexity. |
| Need the absolute latest model feature (e.g., a new API parameter). | **Native API** | Frameworks may lag behind in implementing the newest, most niche features. |
| Creating a simple chatbot that requires fast, streaming responses. | **Native API** | A direct streaming call is lightweight and highly performant. |
| Prototyping a new agent idea quickly. | **Framework (CrewAI/LangChain)** | High-level abstractions get you up and running faster. |

In the next chapter, we'll explore a forward-looking standard for connecting agents to external tools and data sources: the **Model Context Protocol (MCP)**. We'll learn what it is and even build a simple local server to see it in action.