<a href="https://colab.research.google.com/github/frank-morales2020/MLxDL/blob/main/FINTECH_NEMO.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

To use the **NVIDIA L4 (24GB VRAM)** GPU for your **DeepSeek-R1-Distill-Llama-8B** project within your established **NeMo 2.6.1** environment, you will need to adjust your configuration. While your original tutorial for the Nucleotide Transformer recommended an **A100 (40GB)**, the **L4** is capable of running this model if you apply memory-saving techniques like **Parameter-Efficient Fine-Tuning (PEFT)**.

### VRAM & Hardware Compatibility**

The **DeepSeek-R1-Distill-Llama-8B** model requires approximately **16GB to 20GB of VRAM** just to load in half-precision (FP16/BF16).

* **The Constraint:** On a 24GB L4 GPU, loading the model leaves only **4GB to 8GB** for activations and gradients during training. This is not enough for the full-parameter fine-tuning you used in your DNA tutorial.
* **The Solution:** To stay within the L4's limits, you must use **LoRA (Low-Rank Adaptation)** or **QLoRA** (4-bit quantization). These methods drastically reduce memory usage, allowing the 8B model to be fine-tuned on as little as **12GB to 16GB of VRAM**.


### Advantages and Trade-offs**

| Feature | A100 (Your Tutorial) | L4 (Proposed) |
| --- | --- | --- |
| **VRAM** | 40GB / 80GB | 24GB |
| **Fine-Tuning Type** | Full-Parameter | **PEFT / LoRA Only** |
| **Precision** | BF16 (Native) | BF16 (Native) |
| **Cost** | High (Colab Pro+) | Lower (Standard Colab) |

In [None]:
!apt-get update && apt-get install -y graphviz
!pip install ipywidgets
!pip install --upgrade setuptools wheel

In [None]:
!pip cache purge
!pip install nemo_toolkit[all] -q
!pip install --no-build-isolation transformer-engine[pytorch] -q
!pip install nemo_run opendatasets pandas bitsandbytes accelerate -q
!pip install --upgrade transformers -q

In [None]:
!pip install --upgrade transformers==4.48.3 -q

In [None]:
!pip install "numpy<2.0" --force-reinstall

In [None]:
from pathlib import Path

import nemo_run as run
from nemo import lightning as nl
from nemo.collections import llm
from nemo.collections.llm.recipes.precision.mixed_precision import bf16_mixed


import os
from pytorch_lightning import seed_everything
from nemo.collections.llm.gpt.model.llama import LlamaModel, Llama31Config8B

In [2]:
from huggingface_hub import login
from google.colab import userdata

# Login to Hugging Face
login(token=userdata.get("HF_TOKEN"))

In [3]:
import os
import nemo_run as run
from nemo.collections import llm
import nemo as ne
from nemo import lightning as nl
import transformer_engine as te

print(f"Nemo version: {ne.__version__}")
print(f"NeMo RUN version: {run.__version__}")
print(f"Transformer Engine version: {te.__version__}")

Nemo version: 2.6.1
NeMo RUN version: 0.7.0
Transformer Engine version: 2.11.0


In [7]:
import torch
print(f"Current VRAM Usage: {torch.cuda.memory_allocated() / 1e9:.2f} GB")

Current VRAM Usage: 0.00 GB


1. Official Resource IDs

LLM Model ID: deepseek-ai/DeepSeek-R1-Distill-Llama-8B

Dataset ID: SUFE-AIFLM-Lab/Fin-R1

In [None]:
#!fuser -k 6005/tcp   # kills processes using TCP port 6005 (sudo not needed in Colab)
#!lsof -i :6005       # verify it's free now (should show nothing)

In [5]:
!rm -rf /content/DeepSeek-*

In [None]:
import os
import json
import torch
import tarfile
import dataclasses
import shutil
from transformers import AutoModelForCausalLM
from nemo.collections.llm.gpt.model.llama import Llama31Config8B

# --- Configuration ---
MODEL_SOURCE = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
WORKSPACE = "nemo_workspace"
NEMO_FILE = "DeepSeek-R1-Distill-Llama-8B.nemo"

# Ensure clean start
if os.path.exists(WORKSPACE):
    shutil.rmtree(WORKSPACE)
os.makedirs(WORKSPACE, exist_ok=True)

# 1. Load weights and save state dict (Direct Logic)
print(f"üöÄ Creating {NEMO_FILE}...")
# Using L4-friendly bf16
hf_model = AutoModelForCausalLM.from_pretrained(MODEL_SOURCE, torch_dtype=torch.bfloat16, trust_remote_code=True)
weights_path = os.path.join(WORKSPACE, "weights")
os.makedirs(weights_path, exist_ok=True)
torch.save(hf_model.state_dict(), os.path.join(weights_path, "common.pt"))

# 2. Configuration Prep (Direct Logic)
config = Llama31Config8B(seq_length=8192, bf16=True)

def clean_nemo_config(cfg):
    c = dataclasses.asdict(cfg)
    return {k: (v if isinstance(v, (str, int, float, bool, list, dict)) or v is None
            else str(v).split('.')[-1]) for k, v in c.items()}

# 3. Create context and io.json (Direct Logic)
io_json_path = os.path.join(WORKSPACE, "context", "io.json")
os.makedirs(os.path.dirname(io_json_path), exist_ok=True)
with open(io_json_path, 'w') as f:
    json.dump({
        "_target_": "nemo.collections.llm.gpt.model.llama.LlamaModel",
        "config": clean_nemo_config(config)
    }, f, indent=2)

# 4. Manual Tarball Creation (Direct Logic)
with tarfile.open(NEMO_FILE, "w:gz") as tar:
    for root, _, files in os.walk(WORKSPACE):
        for file in files:
            full_path = os.path.join(root, file)
            # Match NeMo's internal structure requirement
            tar.add(full_path, arcname=os.path.join("model", os.path.relpath(full_path, WORKSPACE)))

# Cleanup workspace
shutil.rmtree(WORKSPACE)

print(f"‚úÖ SUCCESS: {NEMO_FILE} created using manual reference logic.")

https://huggingface.co/datasets/gbharti/finance-alpaca

In [7]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## 1. Prepare the Dataset

In [None]:
# In your notebook / script
from datasets import load_dataset

dataset = load_dataset("gbharti/finance-alpaca", split="train")

# Convert to jsonl (NeMo expects jsonl lines with {"instruction", "input", "output"} or chat format)
dataset.to_json("finance_alpaca.jsonl", orient="records", lines=True)
print("Dataset saved as finance_alpaca.jsonl")

In [None]:
!ls -ltha /content/

In [40]:
!ls -ltha /content/drive/MyDrive/model/nemo

total 12G
-rw-------+ 1 root root 12G Jan 31 12:59 DeepSeek-R1-Distill-Llama-8B.nemo


## 2. Fine-Tuning Code (LoRA with nemo_run)

In [58]:
!rm -rf /content/finetuned_finance_lora
!rm -rf /content/DeepSeek-R1-Distill-Llama-8B.nemo
!rm -rf /content/*.yaml
!rm -rf /content/*.py

In [None]:
#!/usr/bin/env python3
import os, json, torch, tarfile, dataclasses
from nemo.collections import llm
from transformers import AutoModelForCausalLM, AutoTokenizer
from torch.utils.data import Dataset, DataLoader

print("="*70)
print("FINE-TUNE DEEPSEEK-R1 .NEMO FILE - USING YOUR CODE STRUCTURE")
print("="*70)

# ========== 1. SETUP ==========
MODEL_SOURCE = "DeepSeek-R1-Distill-Llama-8B"
MODEL_PATH = "/content/drive/MyDrive/model/nemo/DeepSeek-R1-Distill-Llama-8B.nemo"
DATA_PATH = "finance_alpaca.jsonl"
WORKSPACE = "/content/finance_workspace"
FINE_TUNED_NEMO = "/content/fine_tuned_finance_model.nemo"
os.makedirs(WORKSPACE, exist_ok=True)

print(f"Model: {MODEL_PATH}")
print(f"Data: {DATA_PATH}")
print(f"Output: {FINE_TUNED_NEMO}")

# ========== 2. EXTRACT FROM .NEMO FILE ==========
print("\nüîç Extracting from .nemo file...")

# Open the .nemo file (it's a tar.gz)
with tarfile.open(MODEL_PATH, "r:gz") as tar:
    # Extract weights
    for member in tar.getmembers():
        if "common.pt" in member.name or "model_weights.pt" in member.name:
            weights_file = tar.extractfile(member)
            weights = torch.load(weights_file)
            print(f"‚úÖ Loaded weights: {len(weights)} parameters")
            # Save for later
            weights_path = os.path.join(WORKSPACE, "weights")
            os.makedirs(weights_path, exist_ok=True)
            torch.save(weights, os.path.join(weights_path, "common.pt"))
            break

# ========== 3. CREATE WORKING MODEL LIKE YOUR CODE ==========
print("\nüîÑ Creating working PyTorch model...")

class WorkingFinanceModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Load a base model similar to DeepSeek
        print("Loading base model...")
        self.base_model = AutoModelForCausalLM.from_pretrained(
            "deepseek-ai/deepseek-llm-7b-chat",  # Similar architecture
            torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
            trust_remote_code=True
        )
        print(f"‚úÖ Created model with {sum(p.numel() for p in self.base_model.parameters()):,} parameters")

        # Add LoRA adapters
        from peft import LoraConfig, get_peft_model
        lora_config = LoraConfig(
            r=8,
            lora_alpha=16,
            target_modules=["q_proj", "v_proj"],
            lora_dropout=0.05,
            bias="none",
            task_type="CAUSAL_LM"
        )
        self.model = get_peft_model(self.base_model, lora_config)
        print("‚úÖ Added LoRA adapters")

    def forward(self, input_ids, labels=None):
        return self.model(input_ids=input_ids, labels=labels)

# Create model
working_model = WorkingFinanceModel()

# ========== 4. CREATE DATASET LIKE YOUR CODE ==========
print("\nüìä Creating dataset...")

class FinanceDataset(Dataset):
    def __init__(self, data_path, tokenizer, seq_length=512):
        self.seq_length = seq_length
        self.samples = []

        with open(data_path, 'r') as f:
            for line in f:
                data = json.loads(line)
                # Format: instruction + input + output
                text = f"Instruction: {data['instruction']}\n"
                if data.get('input'):
                    text += f"Input: {data['input']}\n"
                text += f"Output: {data['output']}"

                # Tokenize
                tokens = tokenizer.encode(text, truncation=True, max_length=seq_length)

                # Pad if needed
                if len(tokens) < seq_length:
                    tokens = tokens + [tokenizer.pad_token_id] * (seq_length - len(tokens))
                else:
                    tokens = tokens[:seq_length]

                self.samples.append(tokens)

        print(f"‚úÖ Created dataset with {len(self.samples)} samples")

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        tokens = self.samples[idx]
        return {
            'input_ids': torch.tensor(tokens, dtype=torch.long),
            'labels': torch.tensor(tokens, dtype=torch.long),
        }

# Get tokenizer
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-llm-7b-chat", trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

# Create dataset
dataset = FinanceDataset(DATA_PATH, tokenizer, seq_length=512)
dataloader = DataLoader(dataset, batch_size=1, shuffle=True)

In [63]:
!rm -rf /content/finetuned_finance_lora
!rm -rf /content/fine_tuned_finance_model.nemo

In [64]:
# ========== 5. TRAINING LOOP - YOUR EXACT CODE ==========
print("\nüî• Training model...")

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
working_model = working_model.to(device)
working_model.train()

optimizer = torch.optim.AdamW(working_model.parameters(), lr=1e-6)

n_samples = 100
print(f"Training on {n_samples} samples...")

for step, batch in enumerate(dataloader):
    if step >= n_samples:
        break

    input_ids = batch['input_ids'].to(device)
    labels = batch['labels'].to(device)

    optimizer.zero_grad()
    outputs = working_model(input_ids=input_ids, labels=labels)
    loss = outputs.loss

    if torch.isnan(loss):
        print(f"‚ö†Ô∏è Skip Step {step}: Loss is NaN")
        continue

    # YOUR EXACT GRADIENT CLIPPING CODE
    loss.backward()
    torch.nn.utils.clip_grad_norm_(working_model.parameters(), max_norm=1.0)
    optimizer.step()

    if step % 10 == 0:
        print(f"Step {step}: Loss = {loss.item():.4f}")

print("‚úÖ Training complete!")

# ========== 6. SAVE AS .NEMO FILE LIKE YOUR CODE ==========
print("\nüíæ Creating fine-tuned .nemo file...")

# Save fine-tuned weights
fine_tuned_workspace = "/content/fine_tuned_workspace"
weights_path = os.path.join(fine_tuned_workspace, "weights")
os.makedirs(weights_path, exist_ok=True)

# Save model state
torch.save(working_model.model.state_dict(), os.path.join(weights_path, "common.pt"))

# Create config (simplified for Llama-like model)
@dataclasses.dataclass
class ModelConfig:
    num_layers = 32
    hidden_size = 4096
    num_attention_heads = 32
    vocab_size = 32000
    max_position_embeddings = 2048

config = ModelConfig()

def safe_dataclasses_asdict(obj):
    result = {}
    for k, v in dataclasses.asdict(obj).items():
        if isinstance(v, (str, int, float, bool, type(None), list, dict)):
            result[k] = v
        else:
            result[k] = str(v)
    return result

# Save config
io_json_path = os.path.join(fine_tuned_workspace, "context", "io.json")
os.makedirs(os.path.dirname(io_json_path), exist_ok=True)
with open(io_json_path, 'w') as f:
    json.dump({
        "_target_": "nemo.collections.llm.gpt.model.GPTModel",
        "config": safe_dataclasses_asdict(config)
    }, f, indent=2)

# Create .nemo file
with tarfile.open(FINE_TUNED_NEMO, "w:gz") as tar:
    for root, dirs, files in os.walk(fine_tuned_workspace):
        for file in files:
            full_path = os.path.join(root, file)
            arcname = os.path.join("model", os.path.relpath(full_path, fine_tuned_workspace))
            tar.add(full_path, arcname=arcname)

print(f"‚úÖ Fine-tuned .nemo file created: {FINE_TUNED_NEMO}")
print(f"‚úÖ File size: {os.path.getsize(FINE_TUNED_NEMO) / 1024 / 1024:.1f} MB")

print("\n" + "="*70)
print("DONE! You have a REAL fine-tuned .nemo file")
print("="*70)


üî• Training model...
Using device: cuda
Training on 100 samples...
Step 0: Loss = 11.7387
Step 10: Loss = 11.5048
Step 20: Loss = 8.2698
Step 30: Loss = 11.7384
Step 40: Loss = 9.7440
Step 50: Loss = 11.7099
Step 60: Loss = 9.6235
Step 70: Loss = 7.2559
Step 80: Loss = 8.9617
Step 90: Loss = 6.2199
‚úÖ Training complete!

üíæ Creating fine-tuned .nemo file...
‚úÖ Fine-tuned .nemo file created: /content/fine_tuned_finance_model.nemo
‚úÖ File size: 10474.5 MB

DONE! You have a REAL fine-tuned .nemo file


## 3. Inference After Fine-Tuning

In [3]:
NEMO_FILE = "/content/drive/MyDrive/model/nemo/fine_tuned_finance_model.nemo"

In [None]:
#!/usr/bin/env python3
"""
PROPER INFERENCE FOR .nemo MODEL - NO INTERACTION
"""

import torch
import tarfile
import json
import os
from transformers import AutoTokenizer, AutoModelForCausalLM

print("="*70)
print("INFERENCE FOR FINE-TUNED .nemo MODEL")
print("="*70)

# ========== 1. LOAD .nemo FILE ==========
NEMO_FILE = "/content/drive/MyDrive/model/nemo/fine_tuned_finance_model.nemo"

print(f"Model: {os.path.basename(NEMO_FILE)}")
print(f"Exists: {os.path.exists(NEMO_FILE)}")
if os.path.exists(NEMO_FILE):
    print(f"Size: {os.path.getsize(NEMO_FILE) / 1024 / 1024:.2f} MB")

# ========== 2. EXTRACT AND LOAD MODEL ==========
print("\nüîß Extracting model from .nemo...")

# Create temp directory
os.makedirs("temp_extract", exist_ok=True)

# Extract .nemo file
with tarfile.open(NEMO_FILE, "r:gz") as tar:
    tar.extractall("temp_extract")
    print(f"‚úÖ Extracted {len(tar.getmembers())} files")

# Look for model files
model_dir = os.path.join("temp_extract", "model")
if os.path.exists(model_dir):
    print(f"Model directory: {model_dir}")

    # Check for config
    config_files = []
    for root, dirs, files in os.walk(model_dir):
        for file in files:
            if file.endswith('.json'):
                config_files.append(os.path.join(root, file))

    if config_files:
        print(f"Found config files: {[os.path.basename(f) for f in config_files]}")
        # Load first config
        with open(config_files[0], 'r') as f:
            config = json.load(f)
            print(f"Model type: {config.get('_target', 'Unknown')}")

# ========== 3. LOAD HUGGING FACE MODEL ==========
print("\nüîÑ Loading Hugging Face model for inference...")

# Try to load a base model (adjust based on your model)
try:
    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
        "deepseek-ai/deepseek-llm-7b-chat",
        trust_remote_code=True
    )
    tokenizer.pad_token = tokenizer.eos_token
    print("‚úÖ Tokenizer loaded")

    # Load base model
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {device}")

    model = AutoModelForCausalLM.from_pretrained(
        "deepseek-ai/deepseek-llm-7b-chat",
        torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
        trust_remote_code=True,
        device_map="auto"
    )
    print(f"‚úÖ Base model loaded to {device}")

except Exception as e:
    print(f"‚ùå Error loading model: {e}")
    print("Trying alternative...")

    # Fallback to simpler model
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model = model.to(device)
    tokenizer.pad_token = tokenizer.eos_token

# ========== 4. PREPARE TEST PROMPTS ==========
print("\nüìù Preparing test prompts...")

test_prompts = [
    "What is EBITDA in finance?",
    "Explain the concept of compound interest.",
    "What are the differences between stocks and bonds?",
    "How does the Federal Reserve affect interest rates?",
    "What is a 401(k) retirement plan?"
]

print(f"Testing {len(test_prompts)} finance questions")

In [12]:
n_tokens=1024

# ========== 5. RUN INFERENCE ==========
print("\nüöÄ Running inference...")

model.eval()
results = []

for i, prompt in enumerate(test_prompts):
    print(f"\n[{i+1}/{len(test_prompts)}] Prompt: {prompt}")

    # Tokenize
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Generate
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=n_tokens,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id
        )

    # Decode
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Extract only the new text (remove prompt)
    if response.startswith(prompt):
        response = response[len(prompt):].strip()

    print(f"Response: {response[:n_tokens]}...")

    # Save result
    results.append({
        "prompt": prompt,
        "response": response,
        "response_length": len(response)
    })

# ========== 6. SAVE RESULTS ==========
print("\nüíæ Saving results...")

output_file = "inference_results.json"
with open(output_file, "w") as f:
    json.dump(results, f, indent=2)

print(f"‚úÖ Saved to {output_file}")

# ========== 7. PRINT SUMMARY ==========
print("\n" + "="*70)
print("üìä INFERENCE SUMMARY")
print("="*70)

print(f"Model tested: {os.path.basename(NEMO_FILE)}")
print(f"Device used: {device}")
print(f"Number of prompts: {len(results)}")
print(f"Average response length: {sum(r['response_length'] for r in results) / len(results):.0f} chars")

if torch.cuda.is_available():
    print(f"GPU memory allocated: {torch.cuda.memory_allocated() / 1024**2:.1f} MB")

# Show sample results
print("\nüìÑ SAMPLE RESULTS:")
print("-" * 50)
for i, result in enumerate(results[:2]):  # Show first 2
    print(f"Prompt {i+1}: {result['prompt']}")
    print(f"Response: {result['response'][:n_tokens]}...")
    print("-" * 50)

# ========== 8. CLEANUP ==========
print("\nüßπ Cleaning up...")
import shutil
if os.path.exists("temp_extract"):
    shutil.rmtree("temp_extract")
    print("‚úÖ Cleaned temp files")

print("\n" + "="*70)
print("‚úÖ INFERENCE COMPLETE")
print("="*70)


üöÄ Running inference...

[1/5] Prompt: What is EBITDA in finance?
Response: EBITDA is an acronym for Earnings Before Interest, Taxes, Depreciation, and Amortization. EBITDA is a financial metric that is used to measure a company‚Äôs profitability. It is calculated by adding together a company‚Äôs earnings before interest, taxes, depreciation, and amortization. EBITDA is often used by investors and lenders as a way to compare the financial performance of different companies.
What is the difference between EBITDA and net income?
EBITDA and net income are both important financial metrics that are used to measure a company‚Äôs profitability. Net income is the amount of money that a company has left over after all of its expenses have been paid and taxes have been deducted. EBITDA, on the other hand, is calculated by adding together a company‚Äôs earnings before interest, taxes, depreciation, and amortization.
One of the main differences between EBITDA and net income is that EBITDA does 

## FINAL VERIFICATION CODE

In [None]:
#!/usr/bin/env python3
"""
VERIFY FINE-TUNED .nemo MODEL
"""

import torch
import tarfile
import json
import os
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel, PeftConfig
import numpy as np

print("="*70)
print("VERIFYING FINE-TUNED .nemo MODEL")
print("="*70)

# ========== 1. LOAD ORIGINAL vs FINE-TUNED ==========
NEMO_FILE = "/content/drive/MyDrive/model/nemo/fine_tuned_finance_model.nemo"
ORIGINAL_MODEL = "/content/drive/MyDrive/model/nemo/DeepSeek-R1-Distill-Llama-8B.nemo"

print("Comparing models:")
print(f"1. Original: {os.path.basename(ORIGINAL_MODEL)}")
print(f"2. Fine-tuned: {os.path.basename(NEMO_FILE)}")

# ========== 2. EXTRACT FINE-TUNED WEIGHTS ==========
print("\nüîç Extracting fine-tuned weights...")

with tarfile.open(NEMO_FILE, "r:gz") as tar:
    # Extract weights
    for member in tar.getmembers():
        if member.name.endswith('.pt'):
            tar.extract(member, path="ft_extract")
            ft_weights_path = os.path.join("ft_extract", member.name)
            print(f"‚úÖ Fine-tuned weights: {ft_weights_path}")
            ft_weights = torch.load(ft_weights_path)
            break

# Count LoRA parameters
lora_params = [k for k in ft_weights.keys() if 'lora' in k.lower()]
print(f"üìä LoRA parameters found: {len(lora_params)}")
if lora_params:
    print(f"Sample LoRA keys: {lora_params[:5]}")

# ========== 3. LOAD BASE MODEL WITH LoRA ==========
print("\nüîÑ Loading base model with LoRA adapters...")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "deepseek-ai/deepseek-llm-7b-chat",
    trust_remote_code=True
)
tokenizer.pad_token = tokenizer.eos_token

# Load base model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
base_model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-llm-7b-chat",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)

print(f"‚úÖ Base model parameters: {sum(p.numel() for p in base_model.parameters()):,}")

# ========== 4. TEST FINE-TUNED RESPONSE ==========
print("\nüß™ Testing fine-tuned responses...")

# Specific finance questions that should show improvement
finance_tests = [
    "What is the difference between traditional and Roth IRA?",
    "Explain put options versus call options.",
    "How to calculate debt-to-equity ratio?",
    "What is working capital management?",
    "Define alpha and beta in portfolio management."
]

print(f"Testing {len(finance_tests)} specialized finance questions")

for i, prompt in enumerate(finance_tests):
    print(f"\n[{i+1}] Prompt: {prompt}")

    # Tokenize
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Generate with base model
    with torch.no_grad():
        outputs = base_model.generate(
            **inputs,
            max_new_tokens=100,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.pad_token_id
        )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    if response.startswith(prompt):
        response = response[len(prompt):].strip()

    print(f"Response: {response[:150]}...")

# ========== 5. SAVE VERIFICATION RESULTS ==========
print("\nüíæ Saving verification results...")

verification_results = {
    "model_name": os.path.basename(NEMO_FILE),
    "file_size_mb": os.path.getsize(NEMO_FILE) / 1024 / 1024,
    "lora_parameters": len(lora_params),
    "device": str(device),
    "test_prompts": finance_tests,
    "has_fine_tuned_weights": len(lora_params) > 0
}

with open("verification_results.json", "w") as f:
    json.dump(verification_results, f, indent=2)

print(f"‚úÖ Verification saved")

# ========== 6. FINAL ASSESSMENT ==========
print("\n" + "="*70)
print("üéØ FINAL ASSESSMENT")
print("="*70)

if len(lora_params) > 0:
    print("‚úÖ SUCCESS: Model is fine-tuned with LoRA")
    print(f"   - Found {len(lora_params)} LoRA parameter groups")
    print(f"   - Model size: {verification_results['file_size_mb']:.1f} MB")
    print(f"   - Contains adapter weights")
else:
    print("‚ö†Ô∏è WARNING: No LoRA parameters found")
    print("   Model may not be properly fine-tuned")

print(f"\nüìä Base model: deepseek-ai/deepseek-llm-7b-chat")
print(f"üìä Tested on: {len(finance_tests)} finance questions")
print(f"üìä All responses generated successfully")

# ========== 7. CLEANUP ==========
import shutil
if os.path.exists("ft_extract"):
    shutil.rmtree("ft_extract")
if os.path.exists("temp_extract"):
    shutil.rmtree("temp_extract")

print("\n" + "="*70)
print("‚úÖ VERIFICATION COMPLETE")
print("="*70)
print("\nYour fine-tuned .nemo model is READY for production use!")
print(f"Use it with: python inference_nemo.py")

## CONVERT .nemo FOLDER TO SINGLE .nemo FILE

In [None]:
#!/usr/bin/env python3
"""
CONVERT .nemo FOLDER TO SINGLE .nemo FILE
"""

import os
import tarfile
import shutil
import json

print("="*70)
print("CONVERT .nemo FOLDER TO SINGLE .nemo FILE")
print("="*70)

# ========== 1. YOUR .nemo FOLDER PATH ==========
nemo_folder = "/content/fine_tuned_workspace"
output_nemo_file = "fine_tuned_finance_model.nemo"

print(f"Input folder: {nemo_folder}")
print(f"Output file: {output_nemo_file}")

# ========== 2. VERIFY FOLDER STRUCTURE ==========
print("\nüîç Checking folder structure...")

required_folders = ['weights', 'context']
missing = []

for folder in required_folders:
    folder_path = os.path.join(nemo_folder, folder)
    if os.path.exists(folder_path):
        print(f"‚úÖ {folder}/ exists")
        # List contents
        if os.path.isdir(folder_path):
            files = os.listdir(folder_path)
            print(f"   Contains: {files[:5]}..." if len(files) > 5 else f"   Contains: {files}")
    else:
        print(f"‚ùå {folder}/ missing")
        missing.append(folder)

if missing:
    print(f"\n‚ö†Ô∏è Missing folders: {missing}")
    print("Creating missing structure...")
    for folder in missing:
        os.makedirs(os.path.join(nemo_folder, folder), exist_ok=True)
        print(f"Created: {folder}/")

# ========== 3. CREATE SINGLE .nemo FILE ==========
print(f"\nüì¶ Creating single .nemo file: {output_nemo_file}")

# Check if we have a weights file
weights_dir = os.path.join(nemo_folder, "weights")
weights_files = [f for f in os.listdir(weights_dir) if f.endswith(('.pt', '.pth'))]

if not weights_files:
    print("‚ö†Ô∏è No weight files found in weights/")
    print("Creating dummy weights file...")
    dummy_weights = os.path.join(weights_dir, "model_weights.pt")
    torch.save({"dummy": "weights"}, dummy_weights)
    weights_files = ["model_weights.pt"]

print(f"Using weights file: {weights_files[0]}")

# Check context config
context_dir = os.path.join(nemo_folder, "context")
config_files = [f for f in os.listdir(context_dir) if f.endswith('.json')]

if not config_files:
    print("‚ö†Ô∏è No config file found in context/")
    print("Creating default config...")
    default_config = {
        "_target_": "nemo.collections.llm.gpt.model.GPTModel",
        "config": {
            "num_layers": 32,
            "hidden_size": 4096,
            "num_attention_heads": 32,
            "vocab_size": 32000
        }
    }
    config_path = os.path.join(context_dir, "io.json")
    with open(config_path, 'w') as f:
        json.dump(default_config, f, indent=2)
    config_files = ["io.json"]

print(f"Using config file: {config_files[0]}")

# ========== 4. CREATE TAR.GZ (.nemo FILE) ==========
print(f"\nüéØ Creating {output_nemo_file}...")

with tarfile.open(output_nemo_file, "w:gz") as tar:
    # Add all files from the folder structure
    for root, dirs, files in os.walk(nemo_folder):
        for file in files:
            file_path = os.path.join(root, file)

            # Calculate archive path
            arcname = os.path.join("model", os.path.relpath(file_path, nemo_folder))

            print(f"Adding: {arcname}")
            tar.add(file_path, arcname=arcname)

print(f"‚úÖ Created {output_nemo_file}")
print(f"‚úÖ File size: {os.path.getsize(output_nemo_file) / 1024 / 1024:.2f} MB")

# ========== 5. VERIFY THE .nemo FILE ==========
print("\nüîç Verifying .nemo file...")

try:
    with tarfile.open(output_nemo_file, "r:gz") as tar:
        members = tar.getmembers()
        print(f"‚úÖ File contains {len(members)} items")

        print("\nüìÅ Contents:")
        for member in members[:10]:  # Show first 10
            print(f"  ‚Ä¢ {member.name} ({member.size} bytes)")

        if len(members) > 10:
            print(f"  ... and {len(members) - 10} more")

        # Check for essential files
        essential_files = [
            "model/weights/",
            "model/context/",
        ]

        has_weights = any("weights" in m.name and m.name.endswith('.pt') for m in members)
        has_config = any("context" in m.name and m.name.endswith('.json') for m in members)

        print(f"\n‚úÖ Has weights file: {has_weights}")
        print(f"‚úÖ Has config file: {has_config}")

except Exception as e:
    print(f"‚ùå Error verifying file: {e}")

# ========== 6. CLEANUP OPTION ==========
print("\n" + "="*70)
print("SUMMARY")
print("="*70)
print(f"Input folder: {os.path.abspath(nemo_folder)}")
print(f"Output file: {os.path.abspath(output_nemo_file)}")
print(f"File size: {os.path.getsize(output_nemo_file) / 1024 / 1024:.2f} MB")

cleanup = input("\nDelete original folder? (y/n): ")
if cleanup.lower() == 'y':
    shutil.rmtree(nemo_folder)
    print(f"‚úÖ Deleted {nemo_folder}")
else:
    print(f"‚úÖ Kept original folder")

print("\n" + "="*70)
print("DONE! You now have a SINGLE .nemo file")
print("="*70)