# Advanced Fine-Tuning of Mistral-7B for Mental Health Counseling

![Mistral LLM Architecture](https://media.licdn.com/dms/image/v2/D4E12AQEgdey3xglFOA/article-cover_image-shrink_720_1280/article-cover_image-shrink_720_1280/0/1702370058178?e=2147483647&v=beta&t=OeUF9dpRjZKMVvzIQ1ByoYK2i5nUep0qjidl9gn2nWo)

## Overview
This notebook fine-tunes the Mistral-7B-Instruct-v0.1 model on mental health counseling conversations. It combines 4-bit quantization with LoRA adapters to keep GPU memory requirements low, and it tracks resource usage and training metrics throughout the run.

### Core Technologies
- **Mistral-7B-Instruct-v0.1**: An open-source, instruction-tuned language model with about 7 billion parameters
- **LoRA Fine-tuning**: Parameter-efficient adaptation technique
- **4-bit Quantization**: Memory-efficient model deployment
- **Hugging Face Integration**: Seamless model distribution
- **Weights & Biases**: Comprehensive experiment tracking

### Prerequisites
- Python 3.10+
- GPU with CUDA support
- 24GB+ GPU memory recommended
- Hugging Face account
- Weights & Biases account

## Technical Implementation Details

### 1. Model Architecture
- **Base Model**: Mistral-7B-Instruct-v0.1
- **Quantization**: 4-bit NF4 format with double quantization
- **LoRA Configuration**: 
  - Rank: 16
  - Alpha: 32
  - Target modules: attention projections (query, key, value, output) and MLP projections (gate, up, down)
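
LoRA adds two low-rank matrices to each targeted projection, so a layer with `in_features` inputs and `out_features` outputs gains `r * (in_features + out_features)` trainable parameters. The sketch below counts them for rank 16. The layer shapes are assumptions taken from the published Mistral-7B configuration, and `lora_param_count` is a hypothetical helper, not part of PEFT.

```python
# Mistral-7B layer shapes (assumed from the published model config).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024        # 8 key/value heads x 128 head dim (grouped-query attention)
NUM_LAYERS = 32

# (in_features, out_features) for each projection in one transformer layer.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(target_modules, rank):
    """Count trainable LoRA parameters: rank * (in + out) per module, per layer."""
    per_layer = sum(rank * sum(PROJECTIONS[name]) for name in target_modules)
    return per_layer * NUM_LAYERS


all_modules = lora_param_count(list(PROJECTIONS), rank=16)
attention_qkv = lora_param_count(["q_proj", "k_proj", "v_proj"], rank=16)

print(f"all seven projections: {all_modules:,} parameters")
print(f"q/k/v only:            {attention_qkv:,} parameters")
print(f"adapter size in fp32:  {all_modules * 4 / 1e6:.0f} MB")
```

Under these assumptions, targeting all seven projections gives roughly 42 million trainable parameters, or about 168 MB in 32-bit floats. That is in line with the `adapter_model.safetensors` size reported when the adapter is uploaded later in the notebook.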

### 2. Training Configuration
- **Hardware Requirements**:
  - GPU: NVIDIA with 24GB+ VRAM
  - CPU: 8+ cores recommended
  - RAM: 32GB+ recommended

- **Hyperparameters**:
  - Learning rate: 2e-4
  - Batch size: 1 (effective 4 with gradient accumulation)
  - Training epochs: 3
  - Max sequence length: 256 tokens
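
As a quick sanity check on these hyperparameters, the number of optimizer steps can be worked out by hand. The sketch below assumes a single GPU, the 2,000-example subset, and the 10% validation split used later in the notebook; `training_steps` is a hypothetical helper.

```python
def training_steps(n_examples, test_fraction, per_device_batch,
                   grad_accum, epochs, n_devices=1):
    """Return (train size, effective batch size, total optimizer steps)."""
    n_test = round(n_examples * test_fraction)
    n_train = n_examples - n_test
    effective_batch = per_device_batch * grad_accum * n_devices
    # One optimizer step is taken per `effective_batch` examples.
    steps_per_epoch = max(n_train // effective_batch, 1)
    return n_train, effective_batch, steps_per_epoch * epochs


n_train, effective_batch, total_steps = training_steps(
    n_examples=2000, test_fraction=0.1,
    per_device_batch=1, grad_accum=4, epochs=3,
)
print(n_train, effective_batch, total_steps)
```

With these inputs the result is 1,800 training examples, an effective batch size of 4, and 1,350 optimizer steps, which matches the `train/global_step` value in the run summary at the end of the notebook.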

### 3. Resource Management
- **Memory Optimization**:
  - 4-bit quantization
  - Gradient checkpointing (optional; not enabled in the training arguments used below)
  - Efficient caching

- **Performance Monitoring**:
  - Real-time resource tracking
  - GPU memory utilization
  - Training metrics logging
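
To give a rough sense of what 4-bit quantization saves, the sketch below estimates the memory needed for the model weights alone. The parameter count (about 7.24 billion) and the per-parameter overheads (block size 64 for the first-level constants, 256 for double quantization, as described in the QLoRA paper) are assumptions. The estimate ignores activations, LoRA weights, optimizer state, and layers that stay in higher precision.

```python
N_PARAMS = 7.24e9  # approximate Mistral-7B parameter count (assumption)

# Extra bits per parameter spent on quantization constants.
OVERHEAD_PLAIN = 32 / 64                      # one fp32 constant per 64-weight block
OVERHEAD_DOUBLE = 8 / 64 + 32 / (64 * 256)    # 8-bit constants, themselves quantized


def weight_memory_gib(n_params, bits_per_param):
    """Memory for the weights only, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3


fp16_gib = weight_memory_gib(N_PARAMS, 16)
nf4_gib = weight_memory_gib(N_PARAMS, 4 + OVERHEAD_PLAIN)
nf4_dq_gib = weight_memory_gib(N_PARAMS, 4 + OVERHEAD_DOUBLE)

print(f"fp16 weights:              {fp16_gib:.2f} GiB")
print(f"NF4 weights:               {nf4_gib:.2f} GiB")
print(f"NF4 + double quantization: {nf4_dq_gib:.2f} GiB")
```

Under these assumptions the weights shrink from about 13.5 GiB in fp16 to about 3.5 GiB in NF4 with double quantization, which leaves most of a 24 GB GPU free for activations and optimizer state.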

### 4. Dataset Processing
- **Source**: Mental health counseling conversations dataset
- **Format**: Structured dialogue pairs (user-therapist)
- **Preprocessing**: 
  - Chat template formatting
  - Token length normalization
  - Quality filtering
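
The preprocessing steps above can be sketched without loading a tokenizer. The format string below approximates the Mistral instruct chat layout; the exact whitespace produced by `apply_chat_template` depends on the tokenizer version, so treat it as an illustration only. `format_mistral_chat` and `is_usable` are hypothetical helpers, and the two records are made-up examples using the dataset's `Context` and `Response` column names.

```python
BOS, EOS = "<s>", "</s>"


def format_mistral_chat(context, response):
    """Approximate the instruct layout: <s>[INST] user [/INST] assistant</s>."""
    return f"{BOS}[INST] {context.strip()} [/INST] {response.strip()}{EOS}"


def is_usable(example):
    """Minimal quality filter: keep pairs where both sides contain text."""
    context = (example.get("Context") or "").strip()
    response = (example.get("Response") or "").strip()
    return bool(context) and bool(response)


# Made-up records for illustration only.
examples = [
    {"Context": "I feel anxious before exams.",
     "Response": "That sounds stressful. What helps you calm down?"},
    {"Context": "   ", "Response": "This pair has no user message."},
]

formatted = [
    format_mistral_chat(e["Context"], e["Response"])
    for e in examples if is_usable(e)
]
print(formatted[0])
```

In the notebook itself, the tokenizer's own chat template does the formatting, which is the safer choice because it stays in sync with the model's special tokens.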

## Code Structure

### 1. Environment Setup
```python
%%capture
%pip install -U transformers datasets accelerate peft trl bitsandbytes wandb evaluate nvidia-ml-py3
```
- Installs required libraries for model fine-tuning, monitoring, and deployment.

In [None]:
%%capture
%pip install -U transformers 
%pip install -U datasets 
%pip install -U accelerate 
%pip install -U peft 
%pip install -U trl 
%pip install -U bitsandbytes 
%pip install -U wandb
%pip install evaluate
%pip install nvidia-ml-py3

### 2. Imports

In [None]:
import os
import gc
import psutil
import numpy as np
import torch
from datetime import datetime
import logging  # Standard Python logging
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import (
    LoraConfig,
    get_peft_model,
)
from datasets import load_dataset
from trl import SFTTrainer
from huggingface_hub import login
from kaggle_secrets import UserSecretsClient
import wandb
from torch.utils.data import Dataset
import json
from typing import Dict, List
import threading
import time

- Imports necessary libraries for model handling, dataset processing, resource monitoring, and logging.

### 3. Resource Monitoring

In [None]:
class ResourceMonitor:
    """Samples CPU, RAM, and GPU usage on a background thread."""

    def __init__(self, interval=1):
        self.interval = interval
        self.running = False
        self.stats = []
        self.thread = None

    def start(self):
        self.running = True
        # daemon=True so a forgotten stop() cannot keep the process alive
        self.thread = threading.Thread(target=self._monitor, daemon=True)
        self.thread.start()

    def stop(self):
        self.running = False
        if self.thread is not None:
            self.thread.join()

    def _gpu_utilization(self):
        # Returns GPU utilization in percent, or 0.0 when it cannot be read,
        # so that get_report() always receives numeric values.
        if not torch.cuda.is_available():
            return 0.0
        try:
            from pynvml import nvmlInit, nvmlDeviceGetHandleByIndex, nvmlDeviceGetUtilizationRates, nvmlShutdown
            nvmlInit()
            try:
                handle = nvmlDeviceGetHandleByIndex(0)
                return float(nvmlDeviceGetUtilizationRates(handle).gpu)
            finally:
                nvmlShutdown()
        except Exception:
            # Covers a missing pynvml install as well as NVML runtime errors
            return 0.0

    def _monitor(self):
        while self.running:
            gpu_memory = torch.cuda.memory_allocated() if torch.cuda.is_available() else 0

            self.stats.append({
                'timestamp': datetime.now().isoformat(),
                'cpu_percent': psutil.cpu_percent(),
                'ram_percent': psutil.virtual_memory().percent,
                'gpu_memory_gb': gpu_memory / (1024**3),
                'gpu_utilization': self._gpu_utilization()
            })
            time.sleep(self.interval)

    def get_report(self):
        if not self.stats:
            return "No monitoring data available"

        # One row per sample: cpu %, ram %, gpu memory (GB), gpu utilization %
        stats_array = np.array([(s['cpu_percent'], s['ram_percent'], s['gpu_memory_gb'], s['gpu_utilization'])
                                for s in self.stats])

        names = ['cpu_percent', 'ram_percent', 'gpu_memory_gb', 'gpu_utilization']
        return {
            name: {
                'mean': np.mean(stats_array[:, i]),
                'max': np.max(stats_array[:, i])
            }
            for i, name in enumerate(names)
        }

- Implements a thread-based monitor for tracking system resources during training.
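
One possible refinement of a thread-based monitor is to signal shutdown with a `threading.Event` instead of a boolean flag plus `time.sleep`, so that stopping does not have to wait out the current sleep interval. The sketch below uses only the standard library. `PeriodicSampler` and `sample_fn` are hypothetical names, and `sample_fn` stands in for whatever reading (CPU, RAM, GPU) is being collected.

```python
import threading
import time


class PeriodicSampler:
    """Calls sample_fn every `interval` seconds on a background thread."""

    def __init__(self, sample_fn, interval=1.0):
        self.sample_fn = sample_fn
        self.interval = interval
        self.samples = []
        self._stop_event = threading.Event()
        self._thread = None

    def _run(self):
        # wait() returns True as soon as the event is set, which ends the loop
        # immediately instead of finishing a full sleep interval.
        while not self._stop_event.wait(self.interval):
            self.samples.append(self.sample_fn())

    def __enter__(self):
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._stop_event.set()
        self._thread.join()
        return False  # do not swallow exceptions from the with-block


# Usage: sample a clock reading while some work runs.
with PeriodicSampler(time.monotonic, interval=0.01) as sampler:
    time.sleep(0.2)  # stand-in for the training call

print(f"collected {len(sampler.samples)} samples")
```

Wrapping the sampler in a context manager also guarantees the thread is stopped if the code inside the `with` block raises.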

### 4. Dataset Preparation

In [None]:
class CustomDataset(Dataset):
    def __init__(self, data: List[Dict], tokenizer, max_length: int = 256):
        self.data = data
        self.tokenizer = tokenizer
        self.max_length = max_length
    
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, idx):
        item = self.data[idx]
        
        # Format conversation using chat template
        conversation = [
            {"role": "user", "content": item["Context"]},
            {"role": "assistant", "content": item["Response"]}
        ]
        
        # Apply tokenizer's chat template
        text = self.tokenizer.apply_chat_template(conversation, tokenize=False)
        
        # Tokenize
        encodings = self.tokenizer(
            text,
            truncation=True,
            padding="max_length",
            max_length=self.max_length,
            return_tensors="pt"
        )
        
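        input_ids = encodings['input_ids'].squeeze(0)
        attention_mask = encodings['attention_mask'].squeeze(0)

        # Copy the inputs as labels and exclude padding positions from the
        # loss (-100 is the index ignored by the cross-entropy loss)
        labels = input_ids.clone()
        labels[attention_mask == 0] = -100

        return {
            'input_ids': input_ids,
            'attention_mask': attention_mask,
            'labels': labels
        }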


def setup_logging(log_file="training_log.txt"):
    logging.basicConfig(
        level=logging.INFO,  # Set the logging level to INFO
        format="%(asctime)s - %(levelname)s - %(message)s",
        handlers=[
            logging.FileHandler(log_file),  # Log to a file
            logging.StreamHandler()  # Also log to console
        ]
    )
    return logging.getLogger()  # Return the logger instance

### 5. Model Preparation

In [None]:
def prepare_model_and_tokenizer(base_model_path: str, device_map: str = "auto"):
    # Configure 4-bit quantization
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
    )
    
    # Load model with quantization
    model = AutoModelForCausalLM.from_pretrained(
        base_model_path,
        quantization_config=bnb_config,
        device_map=device_map,
        torch_dtype=torch.float16,
    )
    
    # Load and configure tokenizer
    tokenizer = AutoTokenizer.from_pretrained(base_model_path)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"
    
    return model, tokenizer

### 6. LoRA Configuration

In [None]:
def configure_lora(model, r=16, alpha=32):
    config = LoraConfig(
        r=r,
        lora_alpha=alpha,
        lora_dropout=0.01,
        bias="none",
        task_type="CAUSAL_LM",
        target_modules=[
            "q_proj",
            "k_proj",
            "v_proj",
            "o_proj",
            "gate_proj",
            "up_proj",
            "down_proj",
        ],
    )
    return get_peft_model(model, config)

| Component | Configuration |
|-----------|---------------|
| Quantization | NF4 with double quantization |
| LoRA Rank | 16 |
| Batch Size | 1 (Effective 4 via grad accumulation) |
| Learning Rate | 2e-4 |
| Context Length | 256 tokens |

### 7. Training and Evaluation

In [None]:
import json
import torch
import wandb
import gc
import psutil  # System resource monitoring
from huggingface_hub import login
from transformers import TrainingArguments

# Resource monitoring using psutil
cpu_usage = psutil.cpu_percent(interval=1)
memory_usage = psutil.virtual_memory().percent
disk_usage = psutil.disk_usage('/').percent
gpu_usage = "Not available"  # Placeholder, integrate nvidia-smi if needed

resource_stats = {
    "cpu": cpu_usage,
    "memory": memory_usage,
    "disk": disk_usage,
    "gpu": gpu_usage
}
print("Resource stats collected.")

# Authentication
user_secrets = UserSecretsClient()
login(token=user_secrets.get_secret("mental"))
wandb.login(key=user_secrets.get_secret("wandb"))
print("Authentication completed")

run = wandb.init(
    project='Advanced-Mistral-7B-Mental-Health',
    job_type="training",
    config={
        "model_name": "mistral-7b-instruct-v0.1",
        "dataset": "mental_health_counseling_conversations",
        "batch_size": 1,
        "learning_rate": 2e-4,
        "epochs": 3,
        "max_length": 256
    }
)



Resource stats collected.
Authentication completed


In [None]:
# Load model and tokenizer
model, tokenizer = prepare_model_and_tokenizer("/kaggle/input/mistral/pytorch/7b-instruct-v0.1-hf/1")
model = configure_lora(model)

# Load and preprocess dataset
dataset = load_dataset("Amod/mental_health_counseling_conversations", split="all")
dataset = dataset.shuffle(seed=42).select(range(min(2000, len(dataset))))
train_val_split = dataset.train_test_split(test_size=0.1)
train_dataset = CustomDataset(train_val_split["train"], tokenizer)
eval_dataset = CustomDataset(train_val_split["test"], tokenizer)
print("Dataset prepared.")

# Training arguments
training_args = TrainingArguments(
    output_dir="./mistral-7b-therapist-v2",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    evaluation_strategy="steps",  # renamed to eval_strategy in newer transformers releases
    eval_steps=100,
    save_strategy="steps",
    save_steps=200,
    save_total_limit=2,
    load_best_model_at_end=True,
    report_to="wandb",
    remove_unused_columns=False,
)

Generating train split:   0%|          | 0/3512 [00:00<?, ? examples/s]

Dataset prepared.




In [None]:
# Initialize Trainer
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)

# Train model
trainer.train()
print("Training completed.")



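| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 100 | 1.9952 | 2.017891 |
| 200 | 2.015 | 1.832786 |
| 300 | 1.8064 | 1.699195 |
| 400 | 1.6026 | 1.61004 |
| 500 | 0.8583 | 1.533294 |
| 600 | 0.8571 | 1.472184 |
| 700 | 0.7819 | 1.394812 |
| 800 | 0.7649 | 1.329779 |
| 900 | 0.728 | 1.236731 |
| 1000 | 0.2486 | 1.356432 |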


Could not locate the best model at ./mistral-7b-therapist-v2/checkpoint-900/pytorch_model.bin, if you are running a distributed training on multiple nodes, you should activate `--save_on_each_node`.


Training completed.


## Best Practices and Recommendations

### 1. Training Process
- Monitor training loss and validation metrics closely
- Use early stopping if validation loss plateaus
- Save checkpoints regularly
- Track resource utilization
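
To make the early-stopping recommendation concrete, the sketch below applies a simple patience rule to the validation losses logged during this notebook's training run. `early_stop_step` is a hypothetical helper and the patience of one evaluation is an arbitrary choice; in practice the `EarlyStoppingCallback` from `transformers` covers the same idea inside the trainer.

```python
def early_stop_step(history, patience=1):
    """history: list of (step, validation_loss) in training order.

    Returns (stop_step, best_step, best_loss). stop_step is None if the
    loss never failed to improve for `patience` consecutive evaluations.
    """
    best_step, best_loss = None, float("inf")
    evals_without_improvement = 0
    for step, loss in history:
        if loss < best_loss:
            best_step, best_loss = step, loss
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                return step, best_step, best_loss
    return None, best_step, best_loss


# Validation losses from the training log in this notebook.
VAL_LOSSES = [
    (100, 2.017891), (200, 1.832786), (300, 1.699195), (400, 1.61004),
    (500, 1.533294), (600, 1.472184), (700, 1.394812), (800, 1.329779),
    (900, 1.236731), (1000, 1.356432),
]

stop_step, best_step, best_loss = early_stop_step(VAL_LOSSES, patience=1)
print(stop_step, best_step, best_loss)
```

With a patience of one evaluation, this rule would stop at step 1000 and keep the checkpoint from step 900, where the validation loss was lowest. A larger patience tolerates noisier validation curves at the cost of extra training time.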

### 2. Model Deployment
- Test model throughput and latency
- Implement proper error handling
- Set up monitoring for production use
- Consider model versioning
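
A minimal way to test latency and throughput is to time repeated calls to the generation function. In the sketch below, `measure_latency` is a hypothetical helper and `fake_generate` is a stand-in for a real call such as a wrapper around `model.generate`. The numbers it produces say nothing about the fine-tuned model.

```python
import statistics
import time


def measure_latency(generate_fn, prompts, warmup=1):
    """Time generate_fn on each prompt after a few untimed warm-up calls."""
    for prompt in prompts[:warmup]:
        generate_fn(prompt)  # warm-up: caches, lazy initialization, etc.

    latencies = []
    for prompt in prompts:
        start = time.perf_counter()
        generate_fn(prompt)
        latencies.append(time.perf_counter() - start)

    total = sum(latencies)
    return {
        "mean_s": statistics.mean(latencies),
        "max_s": max(latencies),
        "requests_per_s": len(latencies) / total if total > 0 else float("inf"),
    }


def fake_generate(prompt):
    """Hypothetical stand-in for a model call."""
    time.sleep(0.01)
    return prompt[::-1]


report = measure_latency(fake_generate, ["How are you?", "I feel low.", "Hello"])
print(report)
```

For a real deployment it would also be worth recording latency percentiles and measuring with realistic prompt and response lengths, since generation time grows with the number of tokens produced.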

### 3. Ethical Considerations
- Ensure responsible AI practices
- Monitor for biased responses
- Implement content filtering
- Regular model evaluation

## Troubleshooting Guide

### Common Issues
1. **Out of Memory (OOM)**:
   - Reduce batch size
   - Enable gradient checkpointing
   - Shorten the maximum sequence length or lower the LoRA rank (the weights are already quantized to 4 bits)

2. **Training Instability**:
   - Adjust learning rate
   - Check for data quality issues
   - Monitor gradient norms

3. **Poor Performance**:
   - Validate dataset quality
   - Review hyperparameters
   - Check for overfitting
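
For the gradient-norm check under training instability, one simple heuristic is to flag any step whose norm is several times the recent median. The sketch below is an illustration under assumed settings: `find_grad_norm_spikes`, the window of 5, the factor of 3, and the sample values are all hypothetical.

```python
from statistics import median


def find_grad_norm_spikes(norms, window=5, factor=3.0):
    """Return indices where the norm exceeds factor x the median of the
    previous `window` values."""
    spikes = []
    for i in range(window, len(norms)):
        baseline = median(norms[i - window:i])
        if baseline > 0 and norms[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Made-up gradient norms with one obvious spike at index 5.
sample_norms = [1.0, 1.1, 0.9, 1.0, 1.2, 5.0, 1.1]
print(find_grad_norm_spikes(sample_norms))
```

Frequent spikes are a hint to lower the learning rate or tighten gradient clipping. The median is used rather than the mean so that one earlier spike does not inflate the baseline.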

## References
- [Mistral AI Documentation](https://docs.mistral.ai/)
- [LoRA Paper](https://arxiv.org/abs/2106.09685)
- [Quantization Techniques (LLM.int8() Paper)](https://arxiv.org/abs/2208.07339)
- [Mental Health Counseling Best Practices](https://www.who.int/mental_health/)

In [None]:
# Evaluate model
metrics = trainer.evaluate()
eval_performance = {
    "eval_loss": metrics.get("eval_loss"),
    "eval_accuracy": metrics.get("eval_accuracy", "Not available"),
    "eval_f1": metrics.get("eval_f1", "Not available")
}
print("Model evaluation completed.")
# Save and push model to Hugging Face
model_dir = "./mistral-7b-therapist-v2"
trainer.model.save_pretrained(model_dir)
tokenizer.save_pretrained(model_dir)
print("Model saved locally.")

Model evaluation completed.
Model saved locally.
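
Because the evaluation only reports a loss, it can help to convert it to perplexity, which is the exponential of the mean per-token cross-entropy. The sketch below uses the final `eval/loss` of 1.29708 from this notebook's run summary and assumes the trainer reports a natural-log, per-token average.

```python
import math


def perplexity(mean_cross_entropy):
    """Perplexity is exp of the mean per-token cross-entropy (natural log)."""
    return math.exp(mean_cross_entropy)


final_eval_loss = 1.29708  # eval/loss from the run summary
ppl = perplexity(final_eval_loss)
print(f"perplexity: {ppl:.2f}")
```

This comes to a perplexity of about 3.66. Perplexity is only comparable between runs that use the same tokenizer, evaluation split, and label masking.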


In [None]:
login(token=user_secrets.get_secret("mental"))
repo_name = "mistral-7b-therapist-v1"
trainer.model.push_to_hub(repo_name)
tokenizer.push_to_hub(repo_name)
print(f"Model and tokenizer pushed to Hugging Face under the repository: {repo_name}")

# Cleanup
wandb.finish()
gc.collect()
torch.cuda.empty_cache()
print("Training process completed")

README.md:   0%|          | 0.00/5.17k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/168M [00:00<?, ?B/s]

Model and tokenizer pushed to Hugging Face under the repository: mistral-7b-therapist-v1


Run history:

| Metric | Trend |
|--------|-------|
| eval/loss | █▆▅▄▄▃▂▂▁▂▂▂▂▂ |
| eval/runtime | ▅▅▅▄▄▆█▄▃▃▃▄▃▁ |
| eval/samples_per_second | ▁▅▁▅▅▁▁▅▅▅▅▅▅█ |
| eval/steps_per_second | ▁▁▁▁▁▁▁▁▁▁▁▁▁▁ |
| train/epoch | ▁▁▁▁▁▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇▇▇▇██ |
| train/global_step | ▁▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▆▆▇▇▇██ |
| train/grad_norm | ▁▁▁▁▁▂▂▂▂▂▃▂▂▂▂▃▂▃▃▄▄▄▃▃█▄▄▄▆▃▄▄▅▆▅█▃▃▄▁ |
| train/learning_rate | █████▇▇▇▇▇▇▆▆▆▆▆▆▆▆▆▅▄▄▄▄▄▄▃▃▃▂▂▂▂▂▁▁▁▁▁ |
| train/loss | █▇▇▇▇▇▇▆▇▆▆▅▅▆▅▃▄▃▃▃▃▃▃▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ |

Run summary:

| Metric | Final value |
|--------|-------------|
| eval/loss | 1.29708 |
| eval/runtime | 99.6254 |
| eval/samples_per_second | 2.008 |
| eval/steps_per_second | 0.251 |
| total_flos | 5.93265514512384e+16 |
| train/epoch | 3.0 |
| train/global_step | 1350.0 |
| train/grad_norm | 2.35212 |
| train/learning_rate | 0.0 |
| train/loss | 0.2211 |


Training process completed


- Saves the fine-tuned model and tokenizer locally and pushes them to Hugging Face's model hub.

## Conclusion
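This notebook walks through an end-to-end workflow for fine-tuning a large language model for a specialized application such as mental health counseling: 4-bit NF4 quantization, LoRA adapters, resource monitoring, experiment tracking, and publishing the adapter to the Hugging Face Hub. In the logged run, validation loss reached its minimum (1.2367) at step 900, and the final evaluation loss was 1.297 against a final training loss of 0.221. That gap suggests overfitting in the later part of training, so a shorter schedule or early stopping is worth testing.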