<a href="https://colab.research.google.com/github/fabriciosantana/mcdia/blob/main/05-iag/nemotron_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NVIDIA Nemotron - Usage Examples

**Nemotron** is a family of NVIDIA models with a hybrid Mamba-Transformer MoE architecture.

This notebook shows how to use Nemotron through the NVIDIA NIM API.

## 1. Configure the API Key


In [2]:
import os

def get_nvidia_api_key():
    # Prefer the environment variable (this also works outside Colab).
    key = os.environ.get("NVIDIA_API_KEY")
    if not key:
        try:
            # Fall back to Colab secrets when running in Google Colab.
            from google.colab import userdata
            key = userdata.get('NVIDIA_API_KEY')
        except Exception:
            # Not running in Colab, or the secret is missing or inaccessible.
            key = None

    return key

NVIDIA_API_KEY = get_nvidia_api_key()

if NVIDIA_API_KEY:
    print(f"✅ NVIDIA_API_KEY loaded: {NVIDIA_API_KEY[:15]}...")
else:
    print("⚠️ NVIDIA_API_KEY not found. Set it in ~/.bashrc or manually:")
    print('os.environ["NVIDIA_API_KEY"] = "your_key_here"')

✅ NVIDIA_API_KEY loaded: nvapi-8tdO1NjXD...


## 2. Test the connection to the NVIDIA API

In [None]:
import requests

invoke_url = "https://integrate.api.nvidia.com/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {NVIDIA_API_KEY}",
    "Content-Type": "application/json"
}

payload = {
    "model": "nvidia/nemotron-3-nano-30b-a3b",
    "messages": [
        {"role": "user", "content": "Explain what Deep Learning is in 3 sentences."}
    ],
    "temperature": 0.7,
    "max_tokens": 200
}

# A timeout keeps the cell from hanging indefinitely if the endpoint does not answer.
response = requests.post(invoke_url, headers=headers, json=payload, timeout=60)

if response.status_code == 200:
    result = response.json()
    print("✅ Nemotron response:")
    print(result["choices"][0]["message"]["content"])
else:
    print(f"❌ Error {response.status_code}: {response.text}")
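
The cell above reads the answer from `result["choices"][0]["message"]["content"]`. As a minimal sketch, a hypothetical helper `extract_content` (a name made up for this notebook, not part of any NVIDIA or OpenAI library) can read that field defensively and return `None` when the body does not have the expected shape. The sample dictionaries are hand-written stand-ins, not real model output.

```python
def extract_content(result):
    """Return the first choice's message text, or None if the shape is unexpected.

    Hypothetical helper for this notebook; it only assumes the
    choices[0].message.content layout read in the cell above.
    """
    try:
        return result["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        return None


# Hand-written sample mirroring the fields used above (not real model output).
sample = {"choices": [{"message": {"role": "assistant", "content": "Hello"}}]}
print(extract_content(sample))

# A body without "choices" (for example, an error payload) yields None.
print(extract_content({"error": "unauthorized"}))
```

In the cell above, `extract_content(response.json())` could replace the direct indexing if a malformed body should not raise.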

## 3. Using the OpenAI library (compatible)

In [None]:
# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=NVIDIA_API_KEY
)

response = client.chat.completions.create(
    model="nvidia/nemotron-3-nano-30b-a3b",
    messages=[
        {"role": "user", "content": "What is the difference between an RNN and a Transformer?"}
    ],
    temperature=0.7,
    max_tokens=300
)

print(response.choices[0].message.content)

## 4. Streaming (real-time response)

In [None]:
print("🤖 Nemotron: ", end="")

for chunk in client.chat.completions.create(
    model="nvidia/nemotron-3-nano-30b-a3b",
    messages=[{"role": "user", "content": "List 3 practical applications of LLMs."}],
    temperature=0.7,
    max_tokens=200,
    stream=True
):
    # Some chunks may carry no choices or an empty delta; skip those.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
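
The loop above only prints each piece as it arrives. If the full text is also needed afterwards, the deltas can be accumulated. The sketch below assumes chunks expose `choices[0].delta.content` as in the loop above; `collect_stream` and `fake_chunk` are hypothetical names, and the fake chunks stand in for a real stream so the example runs offline.

```python
from types import SimpleNamespace


def collect_stream(chunks, echo=False):
    """Concatenate the text deltas from an iterable of streaming chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # chunk without choices: nothing to read
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
            if echo:
                print(text, end="", flush=True)
    return "".join(parts)


def fake_chunk(text):
    """Build an object shaped like a streaming chunk (illustration only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Offline stand-in for a real stream, including an empty delta and an empty chunk.
fake_stream = [fake_chunk("Hel"), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk("lo")]
full_text = collect_stream(fake_stream)
print(full_text)
```

With the real client, the same function could be given the iterator returned by `client.chat.completions.create(..., stream=True)`, with `echo=True` to keep the live printing.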

## 5. Chat with context (multiple messages)

In [None]:
messages = [
    {"role": "system", "content": "You are an AI expert. Be technical and concise."},
    {"role": "user", "content": "What is attention in transformers?"},
]

response = client.chat.completions.create(
    model="nvidia/nemotron-3-nano-30b-a3b",
    messages=messages,
    temperature=0.7
)

print(response.choices[0].message.content)
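
The cell above sends a single turn. To keep context across turns, each assistant reply is typically appended to the message list before the next user message is sent. The following is a minimal sketch of that bookkeeping; `ask` and `fake_send` are hypothetical names, and `fake_send` replaces the real API call so the example runs offline.

```python
def ask(messages, user_text, send):
    """Append a user turn, call send(messages), and record the assistant reply.

    `send` is any callable that takes the message list and returns the reply text.
    """
    messages.append({"role": "user", "content": user_text})
    reply = send(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply


def fake_send(messages):
    """Offline stand-in for the API call: echoes the last user message."""
    return f"(reply to: {messages[-1]['content']})"


history = [{"role": "system", "content": "You are an AI expert. Be technical and concise."}]
ask(history, "What is attention in transformers?", fake_send)
ask(history, "How does it differ from convolution?", fake_send)

# The history now alternates user and assistant turns after the system prompt.
print([m["role"] for m in history])
```

With the `client` from section 3, `send` could be a function that calls `client.chat.completions.create(model="nvidia/nemotron-3-nano-30b-a3b", messages=msgs)` and returns `response.choices[0].message.content`.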

## 6. Advanced parameters

In [None]:
response = client.chat.completions.create(
    model="nvidia/nemotron-3-nano-30b-a3b",
    messages=[{"role": "user", "content": "Generate a creative title for an article about AI."}],
    temperature=0.9,
    max_tokens=50,
    top_p=0.9
)

print(response.choices[0].message.content)
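
To give an intuition for `top_p`, the sketch below applies the commonly described nucleus filtering step to a toy distribution: keep the most likely tokens until their cumulative probability reaches `top_p`, then renormalize. This illustrates the general idea only; it assumes the server follows this common definition and is not NVIDIA's implementation. `nucleus_filter` and the toy probabilities are made up for the example.

```python
def nucleus_filter(probs, top_p):
    """Keep the most likely tokens whose cumulative probability reaches top_p,
    then renormalize the kept probabilities so they sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        total += p
        if total >= top_p:
            break
    return {token: p / total for token, p in kept}


# Toy next-token distribution (illustrative values, not model output).
toy = {"A": 0.5, "B": 0.3, "C": 0.15, "D": 0.05}
filtered = nucleus_filter(toy, top_p=0.9)

# "D" falls outside the 0.9 nucleus and is dropped.
print(sorted(filtered))
```

A lower `top_p` shrinks the candidate set and makes output more conservative, while `temperature` reshapes the probabilities before this cut is applied.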

## Available Nemotron Models

| Model | Size | Description |
|-------|------|-------------|
| `nvidia/nemotron-3-nano-30b-a3b` | 30B (3.5B active) | Hybrid MoE, reasoning |
| `nvidia/nemotron-4-340b-instruct` | 340B | Large model for complex tasks |

### Characteristics (Nemotron 3 Nano):
- **Architecture**: Mamba2-Transformer Hybrid MoE
- **Languages**: English, German, Spanish, French, Italian, Japanese
- **Capabilities**: Reasoning, tool calling, chat, code