<a href="https://colab.research.google.com/github/frank-morales2020/Cloud_curious/blob/master/FINTECH_NEMO.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

To use the **NVIDIA L4 (24GB VRAM)** GPU for your **DeepSeek-R1-Distill-Llama-8B** project within your established **NeMo 2.6.1** environment, you will need to adjust your configuration. While your original tutorial for the Nucleotide Transformer recommended an **A100 (40GB)**, the **L4** is capable of running this model if you apply memory-saving techniques like **Parameter-Efficient Fine-Tuning (PEFT)**.

### VRAM & Hardware Compatibility

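The **DeepSeek-R1-Distill-Llama-8B** model has roughly 8 billion parameters. At 2 bytes per parameter in half precision (FP16/BF16), the weights alone take about 16GB, so loading the model requires approximately **16GB to 20GB of VRAM** once runtime overhead is included.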

* **The Constraint:** On a 24GB L4 GPU, loading the model leaves only **4GB to 8GB** for activations and gradients during training. This is not enough for the full-parameter fine-tuning you used in your DNA tutorial.
* **The Solution:** To stay within the L4's limits, you must use **LoRA (Low-Rank Adaptation)** or **QLoRA** (4-bit quantization). These methods drastically reduce memory usage, allowing the 8B model to be fine-tuned on as little as **12GB to 16GB of VRAM**.
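
A back-of-the-envelope calculation makes these numbers concrete. The sketch below is an estimate only, under stated assumptions: about 8 billion parameters, about 12 bytes of FP32 gradient and Adam state per trainable parameter, about 3.4 million LoRA parameters, and activations ignored entirely. `estimate_vram_gb` is a hypothetical helper, not part of NeMo or PyTorch.

```python
def estimate_vram_gb(n_params: float, bytes_per_weight: float,
                     trainable_params: float = 0.0) -> float:
    """Rough VRAM estimate in GB (1 GB = 1e9 bytes), ignoring activations."""
    weights = n_params * bytes_per_weight
    # Assumption: each trainable parameter needs an FP32 gradient (4 bytes)
    # plus two FP32 Adam moments (8 bytes).
    training_state = trainable_params * 12
    return (weights + training_state) / 1e9


N_PARAMS = 8e9        # assumed size of an 8B model
LORA_PARAMS = 3.4e6   # assumed number of trainable LoRA parameters

scenarios = {
    "BF16 weights only (inference)": estimate_vram_gb(N_PARAMS, 2),
    "Full fine-tuning, BF16 weights": estimate_vram_gb(N_PARAMS, 2, N_PARAMS),
    "LoRA, BF16 frozen weights": estimate_vram_gb(N_PARAMS, 2, LORA_PARAMS),
    "QLoRA, 4-bit frozen weights": estimate_vram_gb(N_PARAMS, 0.5, LORA_PARAMS),
}

for name, gb in scenarios.items():
    verdict = "fits" if gb < 24 else "does not fit"
    print(f"{name:32s} {gb:7.1f} GB  ({verdict} in 24 GB before activations)")
```

Under these assumptions, full fine-tuning lands far above 24GB, while both LoRA variants leave headroom for activations. The headroom is smallest for plain LoRA with BF16 weights.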


### Advantages and Trade-offs

| Feature | A100 (Your Tutorial) | L4 (Proposed) |
| --- | --- | --- |
| **VRAM** | 40GB / 80GB | 24GB |
| **Fine-Tuning Type** | Full-Parameter | **PEFT / LoRA Only** |
| **Precision** | BF16 (Native) | BF16 (Native) |
| **Cost** | High (Colab Pro+) | Lower (Standard Colab) |

In [None]:
!apt-get update && apt-get install -y graphviz
!pip install ipywidgets
!pip install --upgrade setuptools wheel

In [None]:
!pip cache purge
!pip install nemo_toolkit[all] -q
!pip install --no-build-isolation transformer-engine[pytorch] -q
!pip install nemo_run opendatasets pandas bitsandbytes accelerate -q
!pip install --upgrade transformers -q

In [None]:
!pip install --upgrade transformers==4.48.3 -q

In [None]:
!pip install "numpy<2.0" --force-reinstall

In [None]:
from pathlib import Path

import nemo_run as run
from nemo import lightning as nl
from nemo.collections import llm
from nemo.collections.llm.recipes.precision.mixed_precision import bf16_mixed


import os
from pytorch_lightning import seed_everything
from nemo.collections.llm.gpt.model.llama import LlamaModel, Llama31Config8B

In [2]:
from huggingface_hub import login
from google.colab import userdata

# Login to Hugging Face
login(token=userdata.get("HF_TOKEN"))

In [3]:
import os
import nemo_run as run
from nemo.collections import llm
import nemo as ne
from nemo import lightning as nl
import transformer_engine as te

print(f"Nemo version: {ne.__version__}")
print(f"NeMo RUN version: {run.__version__}")
print(f"Transformer Engine version: {te.__version__}")

Nemo version: 2.6.1
NeMo RUN version: 0.7.0
Transformer Engine version: 2.11.0


In [7]:
import torch
print(f"Current VRAM Usage: {torch.cuda.memory_allocated() / 1e9:.2f} GB")

Current VRAM Usage: 0.00 GB


### Official Resource IDs

* **LLM Model ID:** deepseek-ai/DeepSeek-R1-Distill-Llama-8B
* **Dataset ID:** SUFE-AIFLM-Lab/Fin-R1

In [None]:
#!fuser -k 6005/tcp   # kills processes using TCP port 6005 (sudo not needed in Colab)
#!lsof -i :6005       # verify it's free now (should show nothing)

In [5]:
!rm -rf /content/DeepSeek-*

In [None]:
import os
import json
import torch
import tarfile
import dataclasses
import shutil
from transformers import AutoModelForCausalLM
from nemo.collections.llm.gpt.model.llama import Llama31Config8B

# --- Configuration ---
MODEL_SOURCE = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
WORKSPACE = "nemo_workspace"
NEMO_FILE = "DeepSeek-R1-Distill-Llama-8B.nemo"

# Ensure clean start
if os.path.exists(WORKSPACE):
    shutil.rmtree(WORKSPACE)
os.makedirs(WORKSPACE, exist_ok=True)

# 1. Load weights and save state dict (Direct Logic)
print(f"🚀 Creating {NEMO_FILE}...")
# Using L4-friendly bf16
hf_model = AutoModelForCausalLM.from_pretrained(MODEL_SOURCE, torch_dtype=torch.bfloat16, trust_remote_code=True)
weights_path = os.path.join(WORKSPACE, "weights")
os.makedirs(weights_path, exist_ok=True)
torch.save(hf_model.state_dict(), os.path.join(weights_path, "common.pt"))

# 2. Configuration Prep (Direct Logic)
config = Llama31Config8B(seq_length=8192, bf16=True)

def clean_nemo_config(cfg):
    c = dataclasses.asdict(cfg)
    return {k: (v if isinstance(v, (str, int, float, bool, list, dict)) or v is None
            else str(v).split('.')[-1]) for k, v in c.items()}

# 3. Create context and io.json (Direct Logic)
io_json_path = os.path.join(WORKSPACE, "context", "io.json")
os.makedirs(os.path.dirname(io_json_path), exist_ok=True)
with open(io_json_path, 'w') as f:
    json.dump({
        "_target_": "nemo.collections.llm.gpt.model.llama.LlamaModel",
        "config": clean_nemo_config(config)
    }, f, indent=2)

# 4. Manual Tarball Creation (Direct Logic)
with tarfile.open(NEMO_FILE, "w:gz") as tar:
    for root, _, files in os.walk(WORKSPACE):
        for file in files:
            full_path = os.path.join(root, file)
            # Match NeMo's internal structure requirement
            tar.add(full_path, arcname=os.path.join("model", os.path.relpath(full_path, WORKSPACE)))

# Cleanup workspace
shutil.rmtree(WORKSPACE)

print(f"✅ SUCCESS: {NEMO_FILE} created using manual reference logic.")
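
Before copying the archive anywhere, it can help to confirm that it has the layout the packaging cell intends. The sketch below uses only the standard library. `list_nemo_members` and `has_expected_layout` are hypothetical helpers, and the two required paths are taken from the `arcname` logic in the cell above, not from any NeMo specification.

```python
import os
import tarfile


def list_nemo_members(nemo_path: str) -> list:
    """Return the sorted names of regular files in a gzip-compressed tar archive."""
    with tarfile.open(nemo_path, "r:gz") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())


def has_expected_layout(members) -> bool:
    """Check for the two files written by the packaging cell."""
    required = {"model/weights/common.pt", "model/context/io.json"}
    return required.issubset(members)


NEMO_FILE = "DeepSeek-R1-Distill-Llama-8B.nemo"
if os.path.exists(NEMO_FILE):
    members = list_nemo_members(NEMO_FILE)
    for name in members:
        print(name)
    print("Layout OK" if has_expected_layout(members) else "Layout incomplete")
```

Listing a gzip-compressed tarball means reading through the whole file, so expect this check to take a while on a multi-gigabyte archive.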

Fine-tuning dataset used in the cells below: https://huggingface.co/datasets/gbharti/finance-alpaca

In [7]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [10]:
!cp -pr /content/DeepSeek-R1-Distill-Llama-8B.nemo /content/drive/MyDrive/model/nemo/

## 1. Prepare the Dataset

In [None]:
# In your notebook / script
from datasets import load_dataset

dataset = load_dataset("gbharti/finance-alpaca", split="train")

# Convert to jsonl (NeMo expects jsonl lines with {"instruction", "input", "output"} or chat format)
dataset.to_json("finance_alpaca.jsonl", orient="records", lines=True)
print("Dataset saved as finance_alpaca.jsonl")
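
A quick sanity check on the exported file can catch malformed records before training starts. The sketch below assumes each line is a JSON object with non-empty `instruction` and `output` fields and an optional `input` field, matching the comment in the cell above. `summarize_jsonl` is a hypothetical helper.

```python
import json
import os
from collections import Counter

REQUIRED_KEYS = ("instruction", "output")  # "input" is allowed to be empty


def summarize_jsonl(path, max_lines=None):
    """Count usable, incomplete, and unparseable records in a JSONL file."""
    stats = Counter()
    with open(path, "r", encoding="utf-8") as f:
        for i, line in enumerate(f):
            if max_lines is not None and i >= max_lines:
                break
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                stats["invalid_json"] += 1
                continue
            if not isinstance(record, dict) or any(not record.get(k) for k in REQUIRED_KEYS):
                stats["missing_fields"] += 1
            else:
                stats["ok"] += 1
                if record.get("input"):
                    stats["with_input"] += 1
    return dict(stats)


if os.path.exists("finance_alpaca.jsonl"):
    print(summarize_jsonl("finance_alpaca.jsonl"))
```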

In [39]:
!ls -ltha /content/

total 39M
drwxr-xr-x 1 root root 4.0K Jan 31 13:43 .
-rw-r--r-- 1 root root  592 Jan 31 13:38 finetune_correct.yaml
-rw-r--r-- 1 root root  659 Jan 31 13:36 finetune_config.yaml
-rw-r--r-- 1 root root  515 Jan 31 13:36 run_finetune.py
drwxr-xr-x 2 root root 4.0K Jan 31 13:28 finetuned_finance_lora
-rw-r--r-- 1 root root  39M Jan 31 13:06 finance_alpaca.jsonl
drwx------ 6 root root 4.0K Jan 31 13:02 drive
drwxr-xr-x 1 root root 4.0K Jan 31 09:19 ..
drwxr-xr-x 1 root root 4.0K Dec  9 14:42 sample_data
drwxr-xr-x 4 root root 4.0K Dec  9 14:41 .config


In [40]:
!ls -ltha /content/drive/MyDrive/model/nemo

total 12G
-rw-------+ 1 root root 12G Jan 31 12:59 DeepSeek-R1-Distill-Llama-8B.nemo


## 2. Fine-Tuning Code (LoRA with PEFT and a Manual PyTorch Loop)

In [58]:
!rm -rf /content/finetuned_finance_lora
!rm -rf /content/DeepSeek-R1-Distill-Llama-8B.nemo
!rm -rf /content/*.yaml
!rm -rf /content/*.py

In [None]:
#!/usr/bin/env python3
import os, json, torch, tarfile, dataclasses
from nemo.collections import llm
from transformers import AutoModelForCausalLM, AutoTokenizer
from torch.utils.data import Dataset, DataLoader

print("="*70)
print("FINE-TUNE DEEPSEEK-R1 .NEMO FILE - USING YOUR CODE STRUCTURE")
print("="*70)

# ========== 1. SETUP ==========
MODEL_SOURCE = "DeepSeek-R1-Distill-Llama-8B"
MODEL_PATH = "/content/drive/MyDrive/model/nemo/DeepSeek-R1-Distill-Llama-8B.nemo"
DATA_PATH = "finance_alpaca.jsonl"
WORKSPACE = "/content/finance_workspace"
FINE_TUNED_NEMO = "/content/fine_tuned_finance_model.nemo"
os.makedirs(WORKSPACE, exist_ok=True)

print(f"Model: {MODEL_PATH}")
print(f"Data: {DATA_PATH}")
print(f"Output: {FINE_TUNED_NEMO}")

# ========== 2. EXTRACT FROM .NEMO FILE ==========
print("\n🔍 Extracting from .nemo file...")

# Open the .nemo file (it's a tar.gz)
with tarfile.open(MODEL_PATH, "r:gz") as tar:
    # Extract weights
    for member in tar.getmembers():
        if "common.pt" in member.name or "model_weights.pt" in member.name:
            weights_file = tar.extractfile(member)
            # Keep the extracted copy on the CPU so it does not occupy VRAM
            weights = torch.load(weights_file, map_location="cpu")
            print(f"✅ Loaded weights: {len(weights)} tensors")
            # Save for later
            weights_path = os.path.join(WORKSPACE, "weights")
            os.makedirs(weights_path, exist_ok=True)
            torch.save(weights, os.path.join(weights_path, "common.pt"))
            break

# ========== 3. CREATE WORKING MODEL LIKE YOUR CODE ==========
print("\n🔄 Creating working PyTorch model...")

class WorkingFinanceModel(torch.nn.Module):
    def __init__(self, state_dict=None):
        super().__init__()
        # Load the same architecture that was packed into the .nemo file
        print("Loading base model...")
        self.base_model = AutoModelForCausalLM.from_pretrained(
            "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
            torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
            trust_remote_code=True
        )
        if state_dict is not None:
            # Use the weights extracted from the .nemo archive in step 2
            self.base_model.load_state_dict(state_dict)
        print(f"✅ Created model with {sum(p.numel() for p in self.base_model.parameters()):,} parameters")

        # Add LoRA adapters
        from peft import LoraConfig, get_peft_model
        lora_config = LoraConfig(
            r=8,
            lora_alpha=16,
            target_modules=["q_proj", "v_proj"],
            lora_dropout=0.05,
            bias="none",
            task_type="CAUSAL_LM"
        )
        self.model = get_peft_model(self.base_model, lora_config)
        print("✅ Added LoRA adapters")

    def forward(self, input_ids, labels=None):
        return self.model(input_ids=input_ids, labels=labels)

# Create model from the extracted weights
working_model = WorkingFinanceModel(state_dict=weights)

# ========== 4. CREATE DATASET LIKE YOUR CODE ==========
print("\n📊 Creating dataset...")

class FinanceDataset(Dataset):
    def __init__(self, data_path, tokenizer, seq_length=512):
        self.seq_length = seq_length
        self.samples = []

        with open(data_path, 'r') as f:
            for line in f:
                data = json.loads(line)
                # Format: instruction + input + output
                text = f"Instruction: {data['instruction']}\n"
                if data.get('input'):
                    text += f"Input: {data['input']}\n"
                text += f"Output: {data['output']}"

                # Tokenize (truncated to at most seq_length tokens)
                tokens = tokenizer.encode(text, truncation=True, max_length=seq_length)
                n_real = len(tokens)

                # Pad the inputs to a fixed length
                n_pad = seq_length - n_real
                input_ids = tokens + [tokenizer.pad_token_id] * n_pad
                # Mask padding with -100 so it does not contribute to the loss
                labels = tokens + [-100] * n_pad

                self.samples.append((input_ids, labels))

        print(f"✅ Created dataset with {len(self.samples)} samples")

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        input_ids, labels = self.samples[idx]
        return {
            'input_ids': torch.tensor(input_ids, dtype=torch.long),
            'labels': torch.tensor(labels, dtype=torch.long),
        }

# Get the tokenizer that matches the model being fine-tuned
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-8B", trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

# Create dataset
dataset = FinanceDataset(DATA_PATH, tokenizer, seq_length=512)
dataloader = DataLoader(dataset, batch_size=1, shuffle=True)

# ========== 5. TRAINING LOOP - YOUR EXACT CODE ==========
print("\n🔥 Training model...")

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
working_model = working_model.to(device)
working_model.train()

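# Optimize only the trainable (LoRA) parameters; the base weights stay frozen
optimizer = torch.optim.AdamW(
    [p for p in working_model.parameters() if p.requires_grad], lr=1e-6
)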

n_samples = 100
print(f"Training on {n_samples} samples...")

for step, batch in enumerate(dataloader):
    if step >= n_samples:
        break

    input_ids = batch['input_ids'].to(device)
    labels = batch['labels'].to(device)

    optimizer.zero_grad()
    outputs = working_model(input_ids=input_ids, labels=labels)
    loss = outputs.loss

    if torch.isnan(loss):
        print(f"⚠️ Skip Step {step}: Loss is NaN")
        continue

    # YOUR EXACT GRADIENT CLIPPING CODE
    loss.backward()
    torch.nn.utils.clip_grad_norm_(working_model.parameters(), max_norm=1.0)
    optimizer.step()

    if step % 10 == 0:
        print(f"Step {step}: Loss = {loss.item():.4f}")

print("✅ Training complete!")

# ========== 6. SAVE AS .NEMO FILE LIKE YOUR CODE ==========
print("\n💾 Creating fine-tuned .nemo file...")

# Save fine-tuned weights
fine_tuned_workspace = "/content/fine_tuned_workspace"
weights_path = os.path.join(fine_tuned_workspace, "weights")
os.makedirs(weights_path, exist_ok=True)

# Save model state
torch.save(working_model.model.state_dict(), os.path.join(weights_path, "common.pt"))

# Create config (simplified for Llama-like model)
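@dataclasses.dataclass
class ModelConfig:
    # Fields need type annotations; without them dataclasses.asdict() returns {}.
    # Values follow Llama-3.1-8B, the base of DeepSeek-R1-Distill-Llama-8B.
    num_layers: int = 32
    hidden_size: int = 4096
    num_attention_heads: int = 32
    vocab_size: int = 128256
    max_position_embeddings: int = 131072

config = ModelConfig()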

def safe_dataclasses_asdict(obj):
    result = {}
    for k, v in dataclasses.asdict(obj).items():
        if isinstance(v, (str, int, float, bool, type(None), list, dict)):
            result[k] = v
        else:
            result[k] = str(v)
    return result

# Save config
io_json_path = os.path.join(fine_tuned_workspace, "context", "io.json")
os.makedirs(os.path.dirname(io_json_path), exist_ok=True)
with open(io_json_path, 'w') as f:
    json.dump({
        "_target_": "nemo.collections.llm.gpt.model.GPTModel",
        "config": safe_dataclasses_asdict(config)
    }, f, indent=2)

# Create .nemo file
with tarfile.open(FINE_TUNED_NEMO, "w:gz") as tar:
    for root, dirs, files in os.walk(fine_tuned_workspace):
        for file in files:
            full_path = os.path.join(root, file)
            arcname = os.path.join("model", os.path.relpath(full_path, fine_tuned_workspace))
            tar.add(full_path, arcname=arcname)

print(f"✅ Fine-tuned .nemo file created: {FINE_TUNED_NEMO}")
print(f"✅ File size: {os.path.getsize(FINE_TUNED_NEMO) / 1024 / 1024:.1f} MB")

print("\n" + "="*70)
print("DONE! You have a REAL fine-tuned .nemo file")
print("="*70)

FINE-TUNE DEEPSEEK-R1 .NEMO FILE - USING YOUR CODE STRUCTURE
Model: /content/drive/MyDrive/model/nemo/DeepSeek-R1-Distill-Llama-8B.nemo
Data: finance_alpaca.jsonl
Output: /content/fine_tuned_finance_model.nemo

🔍 Extracting from .nemo file...
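
To see why LoRA is so much lighter than full fine-tuning, it helps to count the parameters the adapters add. The sketch below applies the `LoraConfig` settings from the cell above (`r=8`, targets `q_proj` and `v_proj`) to DeepSeek-R1-Distill-Llama-8B. The layer shapes are assumed to be those of Llama-3.1-8B: hidden size 4096, 32 layers, and 8 key/value heads of dimension 128. `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in_features) and B (out_features x r) per adapted layer."""
    return r * (in_features + out_features)


# Assumed Llama-3.1-8B shapes (not read from the checkpoint)
HIDDEN = 4096
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128
R = 8

q_params = lora_param_count(HIDDEN, HIDDEN, R)                 # q_proj: 4096 -> 4096
v_params = lora_param_count(HIDDEN, N_KV_HEADS * HEAD_DIM, R)  # v_proj: 4096 -> 1024
total_lora_params = N_LAYERS * (q_params + v_params)

print(f"LoRA parameters per layer: {q_params + v_params:,}")
print(f"LoRA parameters in total:  {total_lora_params:,}")
print(f"Share of an 8e9-parameter model: {total_lora_params / 8e9:.4%}")
```

Only these adapter weights receive gradients and optimizer state, which is what keeps the training footprint close to the size of the frozen base weights.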


## 3. Inference / Merge After Fine-Tuning

In [None]:
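from nemo.collections.llm import LlamaModel

# NOTE: the method names below (restore_from, add_peft_adapter, export) are
# unverified against NeMo 2.6.1. Treat this cell as pseudocode and check the
# NeMo API reference before running it. The adapter path is a placeholder and
# does not match the file written in step 2.

# Load base + LoRA
model = LlamaModel.restore_from("DeepSeek-R1-Distill-Llama-8B.nemo")
model.add_peft_adapter("finetuned_finance_lora/checkpoint/lora_checkpoint.nemo")  # or merge if full

# Or export merged model (optional)
# model.export("merged_finance.nemo")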
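
Whatever tool performs it, merging folds each adapter back into its frozen weight matrix as `W_merged = W + (alpha / r) * B @ A`. The sketch below illustrates that arithmetic on a tiny 2x2 example in plain Python. It is not the NeMo or PEFT implementation; `matmul` and `merge_lora` are hypothetical helpers, and the `alpha / r` scaling is the standard LoRA convention.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A) for W (out x in), B (out x r), A (r x in)."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Toy example with rank r = 1
W = [[1, 0],
     [0, 1]]       # frozen base weight
B = [[1], [2]]     # out x r
A = [[0.5, 0.0]]   # r x in

merged = merge_lora(W, A, B, alpha=2, r=1)
print(merged)
```

After merging, inference needs no adapter code path, because the layer is again a single dense matrix of the original shape.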