# Healthcare QLoRA Fine-tuning Tutorial

This notebook demonstrates how to fine-tune a language model with QLoRA (Quantized Low-Rank Adaptation) for healthcare question-answering applications.

## Setup and Installation

First, let's install the required dependencies:

In [None]:
# Install required packages (uncomment if needed)
# !pip install -r ../requirements.txt
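Before running the rest of the notebook, you can check which core packages are already importable. The sketch below uses only the standard library (`importlib.util.find_spec`). The package list is an assumption based on what a typical QLoRA stack needs; `../requirements.txt` is the authoritative list.

```python
import importlib.util

# Assumed core packages for a QLoRA workflow; check ../requirements.txt
# for the project's actual dependency list.
REQUIRED = ["torch", "transformers", "peft", "bitsandbytes", "datasets"]


def check_packages(names):
    """Return {package_name: bool} telling whether each top-level package is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_packages(REQUIRED).items():
        print(f"{name:15s} {'OK' if ok else 'MISSING'}")
```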

In [None]:
import sys
import os
from pathlib import Path

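# Add the project's src/ directory to the import path.
# Path().parent is just "." (the current directory), so use Path.cwd().parent
# to reach the repository root when running from a subdirectory such as notebooks/.
sys.path.append(str(Path.cwd().parent / "src"))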

# Import our modules
from azure_qlora_healthcare.data.processor import HealthcareDataProcessor
from azure_qlora_healthcare.training.qlora_trainer import QLoRATrainer
from azure_qlora_healthcare.utils.config import get_config
from azure_qlora_healthcare.utils.logger import get_logger
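If the notebook may be launched from different working directories, a more robust alternative to a fixed relative path is to walk up the directory tree until a `src/` folder appears. This is a hypothetical helper (`find_src_dir` is not part of the project), shown as a sketch:

```python
from pathlib import Path


def find_src_dir(start=None, marker="src"):
    """Walk up from `start` (default: current directory) and return the first `<ancestor>/src`."""
    here = Path(start or Path.cwd()).resolve()
    for candidate in (here, *here.parents):
        src = candidate / marker
        if src.is_dir():
            return src
    raise FileNotFoundError(f"No '{marker}' directory found above {here}")
```

Usage: `sys.path.insert(0, str(find_src_dir()))`. Using `insert(0, ...)` rather than `append` gives the local package priority over any installed copy of the same name.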

## 1. Data Preparation

Let's start by loading and preparing our healthcare dataset:

In [None]:
# Initialize data processor
data_processor = HealthcareDataProcessor()
logger = get_logger("tutorial")

# Load the bundled sample dataset (_create_sample_dataset is an underscore-prefixed, private helper)
print("Loading sample healthcare dataset...")
dataset = data_processor._create_sample_dataset()

print(f"Dataset splits: {list(dataset.keys())}")
print(f"Training examples: {len(dataset['train'])}")
print(f"Validation examples: {len(dataset['validation'])}")
print(f"Test examples: {len(dataset['test'])}")
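The dataset comes back with `train`, `validation`, and `test` splits, and each record has `input`, `output`, and `source` fields. If you bring your own records, a deterministic split can be sketched in plain Python as below. The 80/10/10 proportions and the `split_records` name are illustrative assumptions, not the processor's actual behavior. With the Hugging Face `datasets` library, `Dataset.train_test_split(test_size=..., seed=...)` serves the same purpose.

```python
import random


def split_records(records, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle records with a fixed seed and split them into train/validation/test lists."""
    items = list(records)
    random.Random(seed).shuffle(items)  # seeded RNG makes the split reproducible
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    return {
        "train": items[n_test + n_val:],
        "validation": items[n_test:n_test + n_val],
        "test": items[:n_test],
    }


# Hypothetical records using the same field names as the sample dataset
records = [{"input": f"q{i}", "output": f"a{i}", "source": "demo"} for i in range(100)]
splits = split_records(records)
print({name: len(rows) for name, rows in splits.items()})
```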

In [None]:
# Let's look at a sample example
sample = dataset['train'][0]
print("Sample Question:", sample['input'])
print("Sample Answer:", sample['output'])
print("Source:", sample['source'])

## 2. PHI Anonymization

Let's test the PHI (Protected Health Information) anonymization feature:

In [None]:
# Test PHI anonymization
sample_text = "Patient John Doe, DOB 01/15/1980, called at 555-123-4567 about his medication."
print("Original text:", sample_text)

anonymized_text = data_processor.anonymize_text(sample_text)
print("Anonymized text:", anonymized_text)
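To see what a rule-based masking step looks like, here is a minimal regex sketch for a few structured identifiers (dates, phone numbers, email addresses, SSNs). It is not the implementation behind `anonymize_text`, and it deliberately leaves the patient name untouched. Names usually require named-entity recognition, and HIPAA Safe Harbor de-identification covers 18 identifier types, far more than these patterns.

```python
import re

# Illustrative patterns only; production de-identification needs much broader coverage.
PHI_PATTERNS = [
    ("[DATE]", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("[PHONE]", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
    ("[EMAIL]", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),
    ("[SSN]", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
]


def mask_phi(text):
    """Replace each matched identifier with its placeholder token."""
    for token, pattern in PHI_PATTERNS:
        text = pattern.sub(token, text)
    return text


print(mask_phi("Patient John Doe, DOB 01/15/1980, called at 555-123-4567 about his medication."))
# → Patient John Doe, DOB [DATE], called at [PHONE] about his medication.
```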

## 3. Dataset Formatting

Format the dataset for QLoRA training:

In [None]:
# Format dataset for training
formatted_dataset = data_processor.format_for_training(dataset)

# Look at formatted example
formatted_sample = formatted_dataset['train'][0]
print("Formatted training text:")
print(formatted_sample['text'])
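Formatting for causal-LM fine-tuning generally means merging each question/answer pair into a single `text` string with fixed section markers, so the model learns where the answer begins. The template below is a hypothetical example; `format_for_training` may use different markers, so check the printed output above. The `eos_token` default is also an assumption; in practice you would pass `tokenizer.eos_token`.

```python
# Hypothetical instruction template; the project's format_for_training() may differ.
TEMPLATE = (
    "### Instruction:\nAnswer the following healthcare question.\n\n"
    "### Question:\n{input}\n\n"
    "### Answer:\n{output}"
)


def format_example(example, eos_token="</s>"):
    """Merge an {'input', 'output'} record into a single 'text' field for training."""
    text = TEMPLATE.format(input=example["input"], output=example["output"])
    return {"text": text + eos_token}  # EOS marks where generation should stop


print(format_example({"input": "What is hypertension?", "output": "High blood pressure."})["text"])
```

With a Hugging Face `Dataset`, `dataset.map(format_example)` adds the returned `text` column to every row.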

## 4. Model Configuration

Set up the QLoRA trainer with appropriate configuration:

In [None]:
# Load configuration
config = get_config()

# Show current configuration
print("Model configuration:")
print(f"Base model: {config.get('model.base_model_name')}")
print(f"Max length: {config.get('model.max_length')}")
print(f"Batch size: {config.get('model.batch_size')}")
print(f"Learning rate: {config.get('model.learning_rate')}")

print("\nQLoRA configuration:")
print(f"LoRA rank: {config.get('qlora.lora_r')}")
print(f"LoRA alpha: {config.get('qlora.lora_alpha')}")
print(f"LoRA dropout: {config.get('qlora.lora_dropout')}")
print(f"Target modules: {config.get('qlora.target_modules')}")
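The LoRA rank and alpha printed above determine how many parameters are trained and how strongly the adapter update is scaled. LoRA freezes a weight matrix $W$ of shape $d_{out} \times d_{in}$ and learns $B$ ($d_{out} \times r$) and $A$ ($r \times d_{in}$), giving an effective weight $W + \frac{\alpha}{r} BA$. The sketch below does this arithmetic for one layer; the 4096 × 4096 shape and `r=16, alpha=32` are example values, not necessarily the project's configuration.

```python
def lora_stats(d_in, d_out, r, alpha):
    """Parameter counts and scaling factor for one LoRA-adapted linear layer."""
    full = d_in * d_out              # frozen base weight W
    lora = r * (d_in + d_out)        # trainable A (r x d_in) plus B (d_out x r)
    return {"full": full, "lora": lora, "fraction": lora / full, "scaling": alpha / r}


# Example: a 4096 x 4096 attention projection with assumed r=16, alpha=32
print(lora_stats(4096, 4096, r=16, alpha=32))
# → {'full': 16777216, 'lora': 131072, 'fraction': 0.0078125, 'scaling': 2.0}
```

Doubling `r` doubles the trainable parameters. Keeping `alpha / r` fixed keeps the update magnitude roughly comparable when you change the rank.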

## 5. Model Training (Demo)

**Note**: Actual training requires a CUDA-capable GPU and can take a long time, depending on model size and dataset size. This section only shows the setup process.

In [None]:
# Initialize trainer
trainer = QLoRATrainer()

print("Trainer configuration:")
print(f"Model name: {trainer.config.model_name}")
print(f"Max length: {trainer.config.max_length}")
print(f"LoRA rank: {trainer.config.lora_r}")

# Note: Uncomment the following lines to actually train (requires GPU)
# trainer.setup_model_and_tokenizer()
# results = trainer.train(formatted_dataset, output_dir="./tutorial_output")
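Part of QLoRA's appeal is that the frozen base weights are stored in 4-bit precision, which makes single-GPU fine-tuning feasible. A back-of-the-envelope estimate for the weights alone is shown below. The 7B parameter count is a hypothetical example. Activations, LoRA parameters, optimizer state, and quantization constants all add to the real total.

```python
def weight_memory_gb(n_params, bits):
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits / 8 / 1e9


n = 7e9  # hypothetical 7B-parameter base model
for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("4-bit (QLoRA)", 4)]:
    print(f"{label:14s} ~{weight_memory_gb(n, bits):.1f} GB")
```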

## 6. Model Evaluation (Demo)

After training, you can evaluate the model:

In [None]:
# from azure_qlora_healthcare.evaluation.metrics import HealthcareEvaluator

# # Initialize evaluator
# evaluator = HealthcareEvaluator()

# # Run evaluation (requires trained model)
# # results = evaluator.evaluate_model(trainer, formatted_dataset['test'])
# # print("Evaluation results:", results['metrics'])

print("Evaluation setup ready. See scripts/evaluate.py for full evaluation.")
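The metrics computed by `HealthcareEvaluator` are not shown here. As an illustration of a common choice for short-answer QA, the sketch below implements SQuAD-style exact match and token-level F1 between a generated answer and a reference. These are swapped-in standard metrics, not necessarily the evaluator's.

```python
import re
from collections import Counter


def _tokens(text):
    """Lowercase word tokens with punctuation dropped."""
    return re.findall(r"\w+", text.lower())


def exact_match(prediction, reference):
    return _tokens(prediction) == _tokens(reference)


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall (bag-of-words overlap)."""
    pred, ref = _tokens(prediction), _tokens(reference)
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)


print(exact_match("High blood pressure.", "high blood pressure"))
print(round(token_f1("Take with food", "take with water"), 3))
```

Token overlap rewards lexical similarity only; clinically wrong answers can still score well, so these numbers complement rather than replace expert review.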

## 7. Bot Deployment (Demo)

After training, you can deploy the model as a healthcare bot:

In [None]:
# from azure_qlora_healthcare.deployment.bot_service import HealthcareBotManager

# # Initialize bot (requires trained model)
# # bot_manager = HealthcareBotManager(model_path="./tutorial_output")
# # bot = bot_manager.get_bot()

print("Bot deployment setup ready. See scripts/run_bot.py for bot service.")
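Whatever serves the bot, inference should reuse the prompt format seen during training and trim the model's output to the answer. The sketch below shows that wrapping logic with a pluggable `generate` callable, so it runs without a model. The template and the `answer`/`extract_answer` names are hypothetical; match the template to the one your formatted training text actually uses.

```python
from typing import Callable

# Hypothetical inference prompt; it must match the training template.
PROMPT = (
    "### Instruction:\nAnswer the following healthcare question.\n\n"
    "### Question:\n{question}\n\n"
    "### Answer:\n"
)


def extract_answer(completion: str) -> str:
    """Keep text after the answer marker and drop any new section the model starts."""
    after = completion.split("### Answer:", 1)[-1]
    return after.split("###", 1)[0].strip()


def answer(question: str, generate: Callable[[str], str]) -> str:
    return extract_answer(generate(PROMPT.format(question=question)))


# Stub generator that echoes the prompt and keeps going, as many LMs do
fake_generate = lambda p: p + " Rest and fluids.\n\n### Question:\nnext"
print(answer("How to treat a cold?", fake_generate))
```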

## 8. Azure ML Integration

For production training, use Azure ML:

In [None]:
# from azure_qlora_healthcare.utils.azure_ml import AzureMLManager

# # Azure ML setup (requires Azure credentials)
# # azure_ml = AzureMLManager()
# # job = azure_ml.submit_training_job("scripts/train.py")

print("Azure ML integration ready. Configure .env and run scripts/setup_azure.py")
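Before submitting a job, it helps to confirm that the credentials from `.env` are actually present. The sketch below uses a minimal `KEY=VALUE` parser from the standard library; in practice `python-dotenv`'s `load_dotenv()` is the usual choice. The variable names in `REQUIRED_VARS` are hypothetical; use the keys listed in `.env.example`.

```python
import os
from pathlib import Path

# Hypothetical names; replace with the keys from the project's .env.example.
REQUIRED_VARS = ["AZURE_SUBSCRIPTION_ID", "AZURE_RESOURCE_GROUP", "AZURE_ML_WORKSPACE"]


def load_env_file(path=".env"):
    """Minimal KEY=VALUE parser (no escaping rules); never overrides existing variables."""
    p = Path(path)
    if not p.exists():
        return
    for line in p.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def missing_vars(names=REQUIRED_VARS):
    """Return the required variables that are unset or empty."""
    return [n for n in names if not os.environ.get(n)]


load_env_file()
print("Missing Azure settings:", missing_vars() or "none")
```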

## Summary

This tutorial covered:

1. **Data Processing**: Loading and preparing healthcare Q&A data
2. **PHI Anonymization**: Protecting patient information
3. **QLoRA Setup**: Configuring efficient fine-tuning
4. **Training Pipeline**: Setting up model training
5. **Evaluation**: Assessing model performance
6. **Deployment**: Creating a healthcare bot
7. **Azure Integration**: Scaling with cloud resources

## Next Steps

1. **Install dependencies**: `pip install -r requirements.txt`
2. **Configure Azure**: Copy `.env.example` to `.env` and add your credentials
3. **Run training**: `python scripts/train.py --data-path data/examples/sample_healthcare_qa.csv`
4. **Evaluate model**: `python scripts/evaluate.py --model-path ./outputs/model`
5. **Deploy bot**: `python scripts/run_bot.py --model-path ./outputs/model`

For production use, ensure proper HIPAA compliance and data governance practices.