# NIM Workshop Setup Guide

Welcome to the NVIDIA NIM Workshop! This notebook will help you set up everything needed for working with **Llama 3.1 8B Instruct**.

## 📋 Quick Start

1. **Run cell 1**: Set up your API keys
2. **Run cell 2**: Check prerequisites  
3. **Run cell 3**: Download Llama 3.1 8B model
4. **Run cell 4**: Verify setup

## 🤖 Model Information

**Llama 3.1 8B Instruct** (~15 GB)
- Standard NeMo checkpoint format (a single `.nemo` file), compatible with NeMo training scripts

## 🛠️ Prerequisites

- **NGC Account**: Free account at [ngc.nvidia.com](https://ngc.nvidia.com)
- **NGC API Key**: Generate at [ngc.nvidia.com/setup/api-key](https://ngc.nvidia.com/setup/api-key)
- **NVIDIA API Key**: For cloud NIMs from [build.nvidia.com](https://build.nvidia.com)
- **Docker**: For local NIM deployment

## Step 1: Set Up Your API Keys

You'll need two API keys for this workshop:

1. **NGC API Key** - To download the model
2. **NVIDIA API Key** - To use cloud-hosted NIMs

Run the cell below and enter your keys when prompted:


In [1]:
import os
import getpass

print("🔐 API Key Setup\n")

# Get NGC API Key
print("Enter your NGC API Key (for model downloads):")
print("Get one at: https://ngc.nvidia.com/setup/api-key")
ngc_key = getpass.getpass("NGC API Key: ")

# Get NVIDIA API Key for cloud NIMs
print("\nEnter your NVIDIA API Key (for cloud NIMs):")
print("Get one at: https://build.nvidia.com")
nvidia_key = getpass.getpass("NVIDIA API Key: ")

# Save to environment (the NGC CLI reads its key from NGC_CLI_API_KEY)
os.environ['NGC_API_KEY'] = ngc_key
os.environ['NGC_CLI_API_KEY'] = ngc_key
os.environ['NVIDIA_API_KEY'] = nvidia_key

# Save to .env file so later notebooks can reload the keys
with open('.env', 'w') as f:
    f.write(f"NGC_API_KEY={ngc_key}\n")
    f.write(f"NGC_CLI_API_KEY={ngc_key}\n")
    f.write(f"NVIDIA_API_KEY={nvidia_key}\n")
# The file holds secrets, so make it readable and writable by the owner only
os.chmod('.env', 0o600)

print("\n✅ API keys configured!")


🔐 API Key Setup

Enter your NGC API Key (for model downloads):
Get one at: https://ngc.nvidia.com/setup/api-key

Enter your NVIDIA API Key (for cloud NIMs):
Get one at: https://build.nvidia.com

✅ API keys configured!
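Later notebooks can pick the keys back up from the `.env` file written above. The sketch below assumes the simple `KEY=value` lines that cell writes and uses only the standard library; `load_env_file` is a hypothetical helper name, and the `python-dotenv` package is a common alternative if you already have it installed.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env", override: bool = False) -> dict:
    """Hypothetical helper: read KEY=value lines from a .env file into os.environ.

    Blank lines, '#' comments, and lines without '=' are skipped. Variables that
    are already set are left alone unless override=True. Returns every parsed pair.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if override or key not in os.environ:
            os.environ[key] = value
        loaded[key] = value
    return loaded
```

Calling `load_env_file()` at the top of a later notebook restores `NGC_API_KEY`, `NGC_CLI_API_KEY`, and `NVIDIA_API_KEY` without prompting again.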


## Step 2: Check Prerequisites

Let's verify all required tools are installed:


In [2]:
import os
import subprocess
import shutil

print("🔍 Checking prerequisites...\n")

# 1. Check Docker
try:
    docker_version = subprocess.check_output(['docker', '--version'], text=True).strip()
    print(f"✅ Docker: {docker_version}")
except (FileNotFoundError, subprocess.CalledProcessError):
    print("❌ Docker: Not installed - get it from https://docs.docker.com/get-docker/")

# 2. Check/Install NGC CLI
if os.path.exists('ngc-cli/ngc'):
    result = subprocess.run(['./ngc-cli/ngc', '--version'], capture_output=True, text=True)
    if result.returncode == 0:
        print(f"✅ NGC CLI: {result.stdout.strip()}")
    else:
        print("⚠️  NGC CLI found but not working")
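else:
    print("📥 Installing NGC CLI...")
    # Download, unpack, and make the CLI executable; stop at the first failing step
    install_cmd = ('wget -q https://ngc.nvidia.com/downloads/ngccli_linux.zip'
                   ' && unzip -q -o ngccli_linux.zip'
                   ' && chmod +x ngc-cli/ngc'
                   ' && rm ngccli_linux.zip')
    if os.system(install_cmd) == 0:
        print("✅ NGC CLI installed")
    else:
        print("❌ NGC CLI installation failed - check wget, unzip, and network access")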

# 3. Check GPU (optional)
try:
    gpu = subprocess.check_output(['nvidia-smi', '--query-gpu=name', '--format=csv,noheader'], text=True).strip()
    print(f"✅ GPU: {gpu}")
except (FileNotFoundError, subprocess.CalledProcessError):
    print("ℹ️  No GPU detected (you can still use cloud NIMs)")

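# 4. Check disk space in the current directory, where the model is downloaded (~15 GB needed)
free_gb = shutil.disk_usage(".").free // (2**30)
print(f"{'✅' if free_gb >= 15 else '⚠️ '} Disk space: {free_gb} GB free")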

print("\n" + "="*50)


🔍 Checking prerequisites...

✅ Docker: Docker version 27.3.1, build ce12230
✅ NGC CLI: NGC CLI 3.160.1
✅ GPU: NVIDIA A100-SXM4-40GB
✅ Disk space: 193 GB free
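The checks in the cell above all follow one pattern: run a command, and treat a missing executable or a non-zero exit as "not available". A minimal reusable sketch of that pattern, plus a disk-space check against the ~15 GB the model needs, is below. `tool_version` and `has_free_space` are hypothetical helper names, and the 15 GB default comes from this guide's model size estimate.

```python
import shutil
import subprocess
from typing import Optional, Sequence


def tool_version(cmd: Sequence[str]) -> Optional[str]:
    """Return the first line a command prints, or None if it is missing or fails."""
    try:
        result = subprocess.run(list(cmd), capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None  # executable not found / not executable, or it hung
    if result.returncode != 0:
        return None
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else "(no output)"


def has_free_space(path: str = ".", needed_gb: float = 15.0) -> bool:
    """True if the filesystem holding `path` has at least `needed_gb` GiB free."""
    return shutil.disk_usage(path).free / (2**30) >= needed_gb


# Usage: the same tools the prerequisite cell checks
for name, cmd in [("Docker", ["docker", "--version"]),
                  ("NGC CLI", ["./ngc-cli/ngc", "--version"]),
                  ("GPU", ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"])]:
    version = tool_version(cmd)
    print(f"{'✅' if version else '❌'} {name}: {version or 'not available'}")
print(f"{'✅' if has_free_space() else '⚠️ '} Disk space for the ~15 GB model")
```

Catching `OSError` rather than using a bare `except:` means typos and other genuine bugs still raise instead of being reported as "not installed".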



## Step 3: Download Llama 3.1 8B Model and NIM Container

This will download:
- **Llama 3.1 8B Instruct** (~15 GB) - The model for LoRA fine-tuning
- **NIM Docker Container** - For local deployment

⏱️ This takes 10-30 minutes, depending on your internet speed.

### 📝 Note about NGC CLI Output
The cell below captures the NGC CLI's detailed progress output and shows a simple spinner while the download runs. When the CLI exits successfully, the cell looks for:
- `Download status: Completed` - this means success!
- The download summary lines (total files downloaded and total transferred), which it prints for you
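The success check in the download cell boils down to scanning the NGC CLI's captured output for the lines listed above. Pulled out into a small function (a hypothetical `summarize_ngc_output`, which only looks for the exact strings this guide mentions), it can be tested on its own:

```python
from typing import List, Tuple


def summarize_ngc_output(stdout: str) -> Tuple[bool, List[str]]:
    """Return (completed, summary_lines) from captured NGC CLI output.

    `completed` is True if the text contains "Download status: Completed";
    `summary_lines` holds any "Total files downloaded:" / "Total transferred:" lines.
    """
    completed = "Download status: Completed" in stdout
    markers = ("Total files downloaded:", "Total transferred:")
    summary = [line.strip() for line in stdout.splitlines()
               if any(marker in line for marker in markers)]
    return completed, summary
```

As in the cell, treat the process exit code as the primary success signal and this text as a secondary confirmation, since the wording of CLI output can change between NGC CLI versions.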


In [3]:
import subprocess
import glob
import os
import time
import threading

print("📥 Downloading workshop assets...\n")

# Check prerequisites
if not os.environ.get('NGC_API_KEY'):
    print("❌ Please run the API key setup cell first")
else:
    # 1. Download Docker container FIRST (usually faster)
    print("📥 Step 1: Checking Docker container...")
    image = "nvcr.io/nim/meta/llama-3.1-8b-instruct:latest"
    
    # Simple check using docker images
    result = subprocess.run(f"docker images {image} --format '{{{{.Repository}}}}:{{{{.Tag}}}}'", 
                          shell=True, capture_output=True, text=True)
    
    if image in result.stdout:
        print("✅ Docker container already downloaded")
    else:
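        print("📥 Pulling NIM Docker container...")
        # Log in to nvcr.io: the username is the literal string "$oauthtoken" and the
        # NGC API key is passed on stdin so it never appears on a command line
        login = subprocess.run(
            ["docker", "login", "nvcr.io", "-u", "$oauthtoken", "--password-stdin"],
            input=os.environ['NGC_API_KEY'], capture_output=True, text=True)
        if login.returncode != 0:
            print(f"⚠️  Docker login failed: {login.stderr.strip()}")
        
        # Pull container
        result = os.system(f"docker pull {image}")
        if result == 0:
            print("✅ Docker container downloaded")
        else:
            print("⚠️  Container download failed - check Docker and NGC access")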
    
    # 2. Download Llama 3.1 8B model
    print("\n📥 Step 2: Checking Llama 3.1 8B model...")
    model_dir = "lora_tutorial/models/llama-3_1-8b-instruct"
    os.makedirs(model_dir, exist_ok=True)
    
    # Check if already downloaded
    nemo_files = glob.glob(f"{model_dir}/**/*.nemo", recursive=True)
    complete_model = None
    
    for nf in nemo_files:
        size_gb = os.path.getsize(nf) / (1024**3)
        if size_gb > 10:  # Full model should be ~15GB
            complete_model = nf
            break
    
    if complete_model:
        print("✅ Model already downloaded!")
        print(f"📂 Location: {complete_model}")
        print(f"💾 Size: {os.path.getsize(complete_model) / (1024**3):.1f} GB")
    else:
        print("📥 Downloading Llama 3.1 8B model (~15 GB)...")
        print("   This will take 10-30 minutes depending on your connection")
        print("   Download starting...\n")
        
        # Set environment variable and run download
        os.environ['NGC_CLI_API_KEY'] = os.environ['NGC_API_KEY']
        
        # Build download command
        cmd = f'cd {model_dir} && ../../../ngc-cli/ngc registry model download-version "nvidia/nemo/llama-3_1-8b-nemo:1.0" --org nvidia'
        
        # Run with subprocess to capture but simplify output
        # Function to show simple progress indicator
        download_complete = False
        def show_progress():
            symbols = ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"]
            i = 0
            while not download_complete:
                print(f"\r{symbols[i % len(symbols)]} Downloading... (this may take 10-30 minutes)", end="", flush=True)
                time.sleep(0.5)
                i += 1
        
        # Start progress indicator in background
        progress_thread = threading.Thread(target=show_progress)
        progress_thread.start()
        
        # Run download
        process = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
        stdout, stderr = process.communicate()
        result = process.returncode
        
        # Stop progress indicator
        download_complete = True
        progress_thread.join()
        print("\r" + " " * 60 + "\r", end="")  # Clear the progress line
        
        if result == 0:
            # Extract summary info from stdout
            if "Download status: Completed" in stdout:
                print("✅ Model download completed successfully!\n")
                # Try to extract useful info
                for line in stdout.split('\n'):
                    if "Total files downloaded:" in line or "Total transferred:" in line:
                        print(f"   {line.strip()}")
            else:
                print("✅ Model download complete!")
            
            # Find the downloaded file
            nemo_files = glob.glob(f"{model_dir}/**/*.nemo", recursive=True)
            if nemo_files:
                print(f"\n📂 Model location: {nemo_files[0]}")
                print(f"💾 Model size: {os.path.getsize(nemo_files[0]) / (1024**3):.1f} GB")
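        else:
            print("\n❌ Download failed. Please check:")
            print("   - Your NGC API key is valid")
            print("   - You have internet connectivity")
            print("   - You have enough disk space (need ~15GB)")
            # Show the tail of the NGC CLI's error output to help diagnose the failure
            if stderr.strip():
                print("\nNGC CLI error output:")
                print("\n".join(stderr.strip().splitlines()[-10:]))

print("\nℹ️  Download step finished - run the verification cell below to confirm everything is in place.")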

📥 Downloading workshop assets...

📥 Step 1: Checking Docker container...
📥 Pulling NIM Docker container...


latest: Pulling from nim/meta/llama-3.1-8b-instruct
8a49af5f9845: Pulling fs layer
5a7e430f96ec: Pulling fs layer
dc8e02c573d5: Pulling fs layer
f59d6eb2c245: Pulling fs layer
69a2b9c38e00: Pulling fs layer
4397f0d4f59f: Pulling fs layer
a86d164c64ba: Pulling fs layer
4e9f661082fc: Pulling fs layer
bec80a6b3a0e: Pulling fs layer
ff6c6c701902: Pulling fs layer
38ed7fbe6701: Pulling fs layer
bdb7462cbb97: Pulling fs layer
3f955f899066: Pulling fs layer
474075394bf2: Pulling fs layer
27a2be24bf5e: Pulling fs layer
fb00c809bb91: Pulling fs layer
22e12f30f4dd: Pulling fs layer
459dee64b7ed: Pulling fs layer
38c04d3213e1: Pulling fs layer
988fc097120c: Pulling fs layer
899a1f93590a: Pulling fs layer
4397f0d4f59f: Waiting
b1fa02a3ce23: Pulling fs layer
38ed7fbe6701: Waiting
ed2758f68444: Pulling fs layer
5cce4eb8ba0e: Pulling fs layer
a86d164c64ba: Waiting
a06c0a98fdc6: Pulling fs layer
bdb7462cbb97: Waiting
6a2923762427: Pulling fs layer
4f4fb700ef54: Pulling fs layer
4e9f661082fc: Waiting
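The download cell stops its spinner with a module-level `download_complete` flag. A `threading.Event` is the more conventional way to tell a background thread to stop, and wrapping it in a function keeps the spinner reusable. This is a minimal sketch; `run_with_spinner` is a hypothetical helper, not part of the NGC CLI.

```python
import itertools
import sys
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_spinner(task: Callable[[], T], message: str = "Working...",
                     interval: float = 0.5) -> T:
    """Run `task` in the calling thread while a background thread animates a spinner."""
    done = threading.Event()

    def spin() -> None:
        for symbol in itertools.cycle("⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏"):
            sys.stdout.write(f"\r{symbol} {message}")
            sys.stdout.flush()
            # wait() returns True as soon as done.set() is called, so the spinner stops promptly
            if done.wait(interval):
                break
        sys.stdout.write("\r" + " " * (len(message) + 2) + "\r")  # clear the spinner line
        sys.stdout.flush()

    spinner = threading.Thread(target=spin, daemon=True)
    spinner.start()
    try:
        return task()
    finally:
        # Runs even if task() raises, so the spinner thread never outlives the call
        done.set()
        spinner.join()
```

For the model download this would look like `run_with_spinner(lambda: subprocess.run(cmd, shell=True, capture_output=True, text=True), "Downloading...")`, where `cmd` is the NGC download command built in the cell.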

## Step 4: Verify Setup

Let's make sure everything is ready for the workshop:


In [4]:
import glob
import os
import subprocess

print("🔍 Verifying setup...\n")

# Quick checks - look for a .nemo file anywhere under the model directory, including subdirectories
nemo_files = glob.glob("lora_tutorial/models/llama-3_1-8b-instruct/**/*.nemo", recursive=True)
# Check if we have a complete model (>10GB)
model_downloaded = False
for nf in nemo_files:
    if os.path.getsize(nf) / (1024**3) > 10:
        model_downloaded = True
        break

checks = {
    "Model downloaded": model_downloaded,
    "Docker container": bool(subprocess.run(['docker', 'images', '-q', 'nvcr.io/nim/meta/llama-3.1-8b-instruct:latest'],
                                       capture_output=True, text=True).stdout.strip()),
    "NGC API Key": bool(os.environ.get('NGC_API_KEY')),
    "NVIDIA API Key": bool(os.environ.get('NVIDIA_API_KEY'))
}

# Print results
for item, status in checks.items():
    print(f"{'✅' if status else '❌'} {item}")

# Test cloud API connection
try:
    import requests
    headers = {"Authorization": f"Bearer {os.environ.get('NVIDIA_API_KEY', '')}"}
    response = requests.get("https://integrate.api.nvidia.com/v1/models", headers=headers, timeout=5)
    print(f"\n📡 Cloud API: {'✅ Connected' if response.status_code == 200 else f'⚠️  Status {response.status_code}'}")
except Exception as exc:  # requests not installed, network error, timeout, etc.
    print(f"\n📡 Cloud API: ⚠️  Could not test connection ({exc})")

# Summary
if all(checks.values()):
    print("\n🎉 All set! You're ready for the NIM workshop!")
    # Find the actual model file (look in subdirectories)
    complete_models = [nf for nf in nemo_files if os.path.getsize(nf) / (1024**3) > 10]
    if complete_models:
        print(f"\n📂 Model location: {complete_models[0]}")
        print(f"💾 Model size: {os.path.getsize(complete_models[0]) / (1024**3):.1f} GB")
    else:
        print("\n📂 Model location: Not found - please run the download cell")
    print("🐳 Container: nvcr.io/nim/meta/llama-3.1-8b-instruct:latest")
else:
    print("\n⚠️  Some components missing - please check above")
    
# Create data directory for later use
os.makedirs("lora_tutorial/data", exist_ok=True)


🔍 Verifying setup...

✅ Model downloaded
✅ Docker container
✅ NGC API Key
✅ NVIDIA API Key

📡 Cloud API: ✅ Connected

🎉 All set! You're ready for the NIM workshop!

📂 Model location: lora_tutorial/models/llama-3_1-8b-instruct/llama-3_1-8b-nemo_v1.0/llama3_1_8b.nemo
💾 Model size: 15.0 GB
🐳 Container: nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
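The cloud check in the verification cell only looks at the HTTP status. To also confirm that Llama 3.1 8B Instruct is listed, you can parse the response body. The sketch below assumes `/v1/models` returns the OpenAI-style `{"data": [{"id": ...}, ...]}` list and that the model ID is `meta/llama-3.1-8b-instruct`; verify both against the live response. It uses `urllib` from the standard library so it does not depend on `requests`.

```python
import json
import urllib.request
from typing import List

MODELS_URL = "https://integrate.api.nvidia.com/v1/models"


def model_ids_from_payload(payload: dict) -> List[str]:
    """Extract model IDs from an (assumed) OpenAI-style {"data": [{"id": ...}, ...]} payload."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def fetch_model_ids(api_key: str, url: str = MODELS_URL, timeout: float = 10.0) -> List[str]:
    """GET the model list with a Bearer token and return the model IDs.

    urllib raises urllib.error.HTTPError for non-2xx responses (e.g. a bad key).
    """
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return model_ids_from_payload(json.load(response))
```

Usage: `"meta/llama-3.1-8b-instruct" in fetch_model_ids(os.environ["NVIDIA_API_KEY"])`. Keeping the parsing in its own function lets you check it without network access.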


## 🎯 Next Steps

### Model Information
- **Model**: Llama 3.1 8B Instruct 
- **Format**: Standard NeMo checkpoint (`.nemo` file)
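- **Location**: `lora_tutorial/models/llama-3_1-8b-instruct/**/*.nemo` (the NGC CLI places it in a versioned subdirectory, e.g. `llama-3_1-8b-nemo_v1.0/llama3_1_8b.nemo`; the exact filename depends on the download)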
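Because the exact filename depends on the download, later notebooks can locate the checkpoint the same way the setup cells do: search recursively for `.nemo` files and keep one above a size threshold. A sketch with a hypothetical `find_nemo_checkpoint` helper; the 10 GB cutoff mirrors the cells above and is only a heuristic for "not a partial download".

```python
from pathlib import Path
from typing import Optional

MODEL_DIR = "lora_tutorial/models/llama-3_1-8b-instruct"


def find_nemo_checkpoint(root: str = MODEL_DIR,
                         min_bytes: int = 10 * 1024**3) -> Optional[Path]:
    """Return the largest .nemo file under `root` of at least `min_bytes`, else None."""
    root_path = Path(root)
    if not root_path.is_dir():
        return None
    candidates = [p for p in root_path.rglob("*.nemo")
                  if p.is_file() and p.stat().st_size >= min_bytes]
    return max(candidates, key=lambda p: p.stat().st_size, default=None)
```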

### What's Next?
1. **01_NIM_API_Tutorial.ipynb** - Learn to use cloud-hosted NIMs
2. **02_Local_NIM_Deployment.ipynb** - Deploy NIMs locally with Docker
3. **03_LoRA_Training.ipynb** - Fine-tune the model with LoRA
4. **04_Deploy_LoRA_with_NIM.ipynb** - Deploy your fine-tuned model

### Troubleshooting

**If download fails:**
- Verify your NGC API key is correct
- Check your internet connection
- Run the download cell again; it skips the container and the model if they are already fully downloaded

**Docker issues:**
- Make sure Docker daemon is running
- On Linux: `sudo systemctl start docker`
- Test with: `docker run hello-world`

**Model format:**
The Llama 3.1 model uses standard NeMo format:
- Single `.nemo` file containing all weights and configuration
- Compatible with NeMo training scripts without modifications
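To peek inside the single `.nemo` file without extracting ~15 GB, note that NeMo 1.x `.nemo` checkpoints are tar archives. Assuming that holds for this download, Python's `tarfile` can list the members; mode `"r:*"` handles both uncompressed and gzip-compressed archives (a compressed archive has to be decompressed as it is read, so listing is slower). Expect names such as `model_config.yaml` alongside weight files, but the exact layout depends on the NeMo version that produced the checkpoint. `list_nemo_members` is a hypothetical helper.

```python
import tarfile
from typing import List


def list_nemo_members(path: str, limit: int = 20) -> List[str]:
    """Return up to `limit` member names from a .nemo file, assuming it is a tar archive."""
    names = []
    with tarfile.open(path, mode="r:*") as archive:
        for member in archive:  # reads member headers; file contents are not extracted
            names.append(member.name)
            if len(names) >= limit:
                break
    return names
```

Usage: `list_nemo_members("lora_tutorial/models/llama-3_1-8b-instruct/llama-3_1-8b-nemo_v1.0/llama3_1_8b.nemo")`, using the model location printed by the verification cell.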

---

**Ready to start?** Open `01_NIM_API_Tutorial.ipynb` to begin the workshop.
