# Integrating Nemotron Nano 9B v2 with NVIDIA NIM

This notebook demonstrates how to integrate the Nemotron Nano 9B v2 model with NVIDIA NIM for production deployments.

## Prerequisites

- NVIDIA API key from [build.nvidia.com](https://build.nvidia.com/)
- Python 3.8+
- OpenAI Python library, version 1.0 or later (provides the `OpenAI` client class used below)

## Installation


In [None]:
# Install required packages, falling back to ensurepip and then conda
import sys
import subprocess

def install_package(package_name):
    """Install a package with pip, bootstrapping pip or falling back to conda if needed"""
    pip_install = [sys.executable, "-m", "pip", "install", package_name]
    try:
        # Try using pip module
        subprocess.check_call(pip_install)
        print(f"✅ Successfully installed {package_name} using pip")
        return
    except subprocess.CalledProcessError:
        # Raised for any non-zero exit, not only when pip is missing
        print("⚠️ pip install failed, attempting to bootstrap pip...")

    try:
        # Try ensurepip to bootstrap pip first
        subprocess.check_call([sys.executable, "-m", "ensurepip", "--default-pip"])
        subprocess.check_call([sys.executable, "-m", "pip", "install", "--upgrade", "pip"])
        subprocess.check_call(pip_install)
        print(f"✅ Successfully installed pip and {package_name}")
        return
    except subprocess.CalledProcessError:
        print("⚠️ pip bootstrap failed, trying conda...")

    try:
        # Try conda as fallback
        subprocess.check_call(["conda", "install", "-y", package_name])
        print(f"✅ Successfully installed {package_name} using conda")
    except (subprocess.CalledProcessError, FileNotFoundError):
        # FileNotFoundError means conda is not on PATH
        print(f"❌ Failed to install {package_name}")
        print(f"Please run manually: pip install {package_name}")
        raise

# Install openai
install_package("openai")


## Setup API Key

Get your API key from [build.nvidia.com](https://build.nvidia.com/) and set it as an environment variable.


In [None]:
import os
from openai import OpenAI

# Read the API key from the NVIDIA_API_KEY environment variable
NVIDIA_API_KEY = os.getenv("NVIDIA_API_KEY", "YOUR_API_KEY_HERE")
# For a quick local test you can replace the placeholder instead; do not commit real keys

# Validate API key
if NVIDIA_API_KEY == "YOUR_API_KEY_HERE" or not NVIDIA_API_KEY:
    raise ValueError(
        "❌ ERROR: Please set your NVIDIA API key!\n\n"
        "Steps:\n"
        "1. Go to https://build.nvidia.com/\n"
        "2. Sign in and generate an API key\n"
        "3. Set the NVIDIA_API_KEY environment variable "
        "(or replace 'YOUR_API_KEY_HERE' above)\n"
        "4. Re-run this cell\n\n"
        "Example: export NVIDIA_API_KEY='nvapi-xxxxxxxxxxxxx'"
    )

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=NVIDIA_API_KEY
)

print("✅ NVIDIA NIM client initialized!")
# Show only the last four characters so the key does not leak into saved output
print(f"🔑 API key: ...{NVIDIA_API_KEY[-4:]}")


## Example 1: Basic Chat Completion

Simple conversational interaction with Nemotron Nano 9B v2.


In [None]:
response = client.chat.completions.create(
    model="nvidia/nemotron-nano-9b-v2",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    temperature=0.7,
    max_tokens=512
)

print(response.choices[0].message.content)
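Besides the reply text, the response object carries metadata worth checking, such as whether the reply was cut off by `max_tokens`. The following is a minimal sketch, assuming the endpoint populates the OpenAI-compatible `finish_reason` and `usage` fields. `summarize_response` is a hypothetical helper name, and the `SimpleNamespace` object is only a stand-in so the block runs without an API call.

```python
from types import SimpleNamespace

def summarize_response(response):
    """Return the reply text plus basic metadata from a chat completion."""
    choice = response.choices[0]
    usage = getattr(response, "usage", None)
    return {
        "text": choice.message.content,
        # finish_reason == "length" means the reply stopped at max_tokens
        "truncated": choice.finish_reason == "length",
        "total_tokens": usage.total_tokens if usage else None,
    }

# Stand-in with the same attribute layout as a real response (illustration only)
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(
        message=SimpleNamespace(content="Quantum computers use qubits"),
        finish_reason="length",
    )],
    usage=SimpleNamespace(total_tokens=42),
)

summary = summarize_response(fake_response)
print(summary)
```

With a live client, pass the `response` from the cell above in place of `fake_response`; a `truncated` value of `True` suggests raising `max_tokens` or asking for a shorter answer.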


## Example 2: Streaming Responses

Stream tokens as they are generated for a better user experience.


In [None]:
response = client.chat.completions.create(
    model="nvidia/nemotron-nano-9b-v2",
    messages=[
        {"role": "user", "content": "Write a Python function to calculate Fibonacci numbers."}
    ],
    temperature=0.5,
    max_tokens=1024,
    stream=True
)

print("Streaming response:")
for chunk in response:
    # Skip chunks that carry no choices or an empty delta
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print("\n")
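Printing tokens as they arrive discards them, and applications often need the full text afterwards as well. The sketch below accumulates the streamed deltas into one string. `collect_stream` and `_fake_chunk` are hypothetical names, and the fake chunks only mimic the attribute layout of real stream chunks so the block runs offline.

```python
from types import SimpleNamespace

def collect_stream(chunks, on_token=None):
    """Concatenate streamed deltas into one string, optionally echoing each piece."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # chunk without choices, nothing to read
        piece = chunk.choices[0].delta.content
        if piece:
            parts.append(piece)
            if on_token is not None:
                on_token(piece)
    return "".join(parts)

def _fake_chunk(text):
    """Build a stand-in chunk with the same attribute layout as a stream chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

fake_stream = [
    _fake_chunk("def "),
    _fake_chunk(None),              # empty delta
    SimpleNamespace(choices=[]),    # chunk with no choices
    _fake_chunk("fib(n):"),
]

full_text = collect_stream(fake_stream)
print(full_text)
```

With a live stream, something like `collect_stream(response, on_token=lambda t: print(t, end="", flush=True))` would keep the incremental display while also returning the complete reply.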


## Example 3: Function Calling

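Describe a function with a JSON schema and let the model decide whether to call it. When it does, the response contains the function name and JSON-encoded arguments instead of text.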


In [None]:
import json

# Define available functions
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. San Francisco"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="nvidia/nemotron-nano-9b-v2",
    messages=[
        {"role": "user", "content": "What's the weather like in Tokyo?"}
    ],
    tools=tools,
    tool_choice="auto"
)

# Check if the model wants to call a function
message = response.choices[0].message
if message.tool_calls:
    print("Function call requested:")
    for tool_call in message.tool_calls:
        # Arguments arrive as a JSON-encoded string
        arguments = json.loads(tool_call.function.arguments)
        print(f"Function: {tool_call.function.name}")
        print(f"Arguments: {arguments}")
else:
    print(message.content)
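The example above stops once the model has requested a call. Your code still has to run the function and return the result as a `tool` message. The sketch below shows that dispatch step under stated assumptions: `get_weather` is a hypothetical stub that returns placeholder data, and `AVAILABLE_FUNCTIONS` and `run_tool_call` are illustrative names rather than part of any library.

```python
import json

def get_weather(location):
    """Hypothetical stub; a real implementation would query a weather service."""
    return {"location": location, "forecast": "unknown (stub)"}

# Map tool names from the `tools` schema to local Python callables
AVAILABLE_FUNCTIONS = {"get_weather": get_weather}

def run_tool_call(name, arguments_json, call_id):
    """Execute one requested tool call and build the `tool` message to send back."""
    if name not in AVAILABLE_FUNCTIONS:
        result = {"error": f"unknown function: {name}"}
    else:
        try:
            result = AVAILABLE_FUNCTIONS[name](**json.loads(arguments_json))
        except (json.JSONDecodeError, TypeError) as exc:
            # Malformed JSON, or arguments that do not match the signature
            result = {"error": str(exc)}
    return {"role": "tool", "tool_call_id": call_id, "content": json.dumps(result)}

tool_message = run_tool_call("get_weather", '{"location": "Tokyo"}', "call_0")
print(tool_message)
```

To complete the round trip, append the assistant message that carried `tool_calls`, followed by this tool message, to `messages` and call `client.chat.completions.create` again. That this endpoint accepts the follow-up in the same shape as the OpenAI API is an assumption to verify against the NIM documentation.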


## Example 4: Multi-turn Conversation

Build context over multiple exchanges.


In [None]:
conversation = [
    {"role": "system", "content": "You are a helpful coding tutor."},
    {"role": "user", "content": "I'm learning Python. What are decorators?"}
]

# First response
response = client.chat.completions.create(
    model="nvidia/nemotron-nano-9b-v2",
    messages=conversation,
    temperature=0.7,
    max_tokens=512
)

assistant_msg = response.choices[0].message.content
conversation.append({"role": "assistant", "content": assistant_msg})
print(f"Assistant: {assistant_msg}\n")

# Follow-up question
conversation.append({"role": "user", "content": "Can you show me a practical example?"})

response = client.chat.completions.create(
    model="nvidia/nemotron-nano-9b-v2",
    messages=conversation,
    temperature=0.7,
    max_tokens=512
)

print(f"Assistant: {response.choices[0].message.content}")
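Because every request resends the whole `conversation` list, long sessions grow until they exceed the model's context window. One simple mitigation is to keep the system prompt and only the most recent turns. The sketch below counts messages rather than tokens, which is a simplification, and `trim_history` is a hypothetical helper name.

```python
def trim_history(conversation, max_messages):
    """Keep system messages plus at most `max_messages` of the latest other messages."""
    system = [m for m in conversation if m["role"] == "system"]
    rest = [m for m in conversation if m["role"] != "system"]
    recent = rest[-max_messages:] if max_messages > 0 else []
    # Drop leading replies so the kept window starts with a user turn
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return system + recent

history = [
    {"role": "system", "content": "You are a helpful coding tutor."},
    {"role": "user", "content": "What are decorators?"},
    {"role": "assistant", "content": "Functions that wrap other functions."},
    {"role": "user", "content": "Can you show me a practical example?"},
]

trimmed = trim_history(history, max_messages=2)
print([m["role"] for m in trimmed])
```

Passing `trim_history(conversation, n)` as `messages` bounds the request size at the cost of forgetting older turns; a token-based budget or a running summary are alternatives when older context matters.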


## Production Integration Example

Wrap the hosted Nemotron Nano 9B v2 endpoint in your own API server:

### FastAPI Backend Example

```python
import json
import os
from typing import Any, Dict, List

from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI
from pydantic import BaseModel

app = FastAPI(title="Nemotron Nano API")

MODEL = "nvidia/nemotron-nano-9b-v2"

# Async client so upstream calls do not block the event loop
client = AsyncOpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.getenv("NVIDIA_API_KEY")
)

class ChatRequest(BaseModel):
    messages: List[Dict[str, Any]]
    stream: bool = True
    temperature: float = 0.7
    max_tokens: int = 1024

@app.post("/api/chat")
async def chat(request: ChatRequest):
    """Chat endpoint; streams server-sent events when request.stream is true"""
    try:
        response = await client.chat.completions.create(
            model=MODEL,
            messages=request.messages,
            temperature=request.temperature,
            max_tokens=request.max_tokens,
            stream=request.stream
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

    if not request.stream:
        return {"content": response.choices[0].message.content}

    async def generate():
        async for chunk in response:
            if chunk.choices and chunk.choices[0].delta.content:
                # JSON-encode so newlines in the text cannot break SSE framing
                payload = json.dumps({"content": chunk.choices[0].delta.content})
                yield f"data: {payload}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(generate(), media_type="text/event-stream")

@app.get("/health")
async def health():
    """Health check endpoint"""
    return {"status": "healthy", "model": MODEL}

# Run with: uvicorn main:app --reload
```
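A client of the streaming endpoint receives `text/event-stream` output, where each event is one or more `data:` lines followed by a blank line. The sketch below parses that framing from any iterable of decoded text lines and stops at the `[DONE]` sentinel the server emits. It covers only the `data` field, returns payloads as raw strings without decoding them, and `iter_sse_data` is a hypothetical helper name.

```python
def iter_sse_data(lines):
    """Yield the data payload of each server-sent event until a [DONE] sentinel."""
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event
            if data_lines:
                payload = "\n".join(data_lines)
                data_lines = []
                if payload == "[DONE]":
                    return
                yield payload
            continue
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format allows one optional space after the colon
            data_lines.append(value[1:] if value.startswith(" ") else value)

# Sample stream for illustration; events after [DONE] are never yielded
sample = [
    "data: Hello\n", "\n",
    "data:  world\n", "\n",
    "data: [DONE]\n", "\n",
    "data: ignored\n", "\n",
]

received = list(iter_sse_data(sample))
print(received)
```

In practice the lines would come from an HTTP client that exposes a streamed response body line by line, and each payload would then be decoded according to whatever encoding the server uses for its `data:` values.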

### Deployment Considerations

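1. **Rate Limiting**: Limit requests per client to prevent abuse and to stay within your upstream API quota
2. **Authentication**: Require an API key or token on your own endpoints; the example above accepts unauthenticated requests
3. **Caching**: Cache responses to identical requests to reduce costs; this works best with deterministic settings such as a low temperature
4. **Monitoring**: Add logging and metrics (Prometheus, DataDog)
5. **Load Balancing**: Use NGINX or cloud load balancers for scale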
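As an illustration of the rate limiting item, here is a minimal in-memory sliding-window limiter. It is a sketch under stated assumptions, not a production component: `SlidingWindowLimiter` is a hypothetical class, state lives in one process only, and the injectable `clock` exists so the behaviour can be demonstrated without sleeping.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)

    def allow(self, key):
        """Record a request for `key` and return whether it is within the limit."""
        now = self.clock()
        hits = self._hits[key]
        # Discard timestamps that have left the window
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True

# Demonstration with a fake clock: two requests allowed per 60 seconds
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=2, window=60, clock=lambda: fake_now[0])
results = [limiter.allow("client-a") for _ in range(3)]
fake_now[0] = 61.0  # advance past the window
results.append(limiter.allow("client-a"))
print(results)
```

In an API handler, a `False` result would typically map to an HTTP 429 response. With several worker processes or servers, the counters would need to live in a shared store instead of process memory.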

## Resources

- [NVIDIA NIM API Documentation](https://docs.nvidia.com/nim/)
- [OpenAI Python Library](https://github.com/openai/openai-python)
- [FastAPI Documentation](https://fastapi.tiangolo.com/)
- [NVIDIA AI Blueprints](https://github.com/NVIDIA-AI-Blueprints)

## Next Steps

- 📘 Try `Demo_Nemotron_Nano.ipynb` for interactive examples
- 🚀 Deploy your own API endpoint
- 🔧 Customize functions for your use case
- 📚 Explore other NVIDIA models on [build.nvidia.com](https://build.nvidia.com/)

---

**Happy Building with NVIDIA Nemotron!** 🚀
