# Healthcare AI Multi-Task Projects - Colab Setup

This notebook sets up the environment and installs all necessary dependencies for the Healthcare AI Multi-Task Projects.

## Projects Included:
1. **Task 1**: ECG Arrhythmia Classification Using CNN
2. **Task 2**: Fine-tune Bio_ClinicalBERT for Clinical Note Classification
3. **Task 3**: LLaMA 3.1 Text Summarization

---

## 1. Enable GPU Runtime (Recommended)

For faster training, enable GPU runtime:
1. Go to **Runtime** → **Change runtime type**
2. Set **Hardware accelerator** to a GPU option (for example, **T4 GPU**)
3. Click **Save**

**Note**: A GPU runtime is recommended but not required. CPU training works but is slower, and fine-tuning the larger models (Task 2 and Task 3) may be impractical without a GPU.
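
Downstream code usually needs a single place that decides which device to use. The following is a minimal sketch, assuming PyTorch's standard `torch.cuda.is_available()` check; the helper name `pick_device` is hypothetical and is not part of the project notebooks.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper also works before installation
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

Models and tensors can then be moved with `.to(device)`, so the same code runs on either runtime type.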

In [None]:
# Check GPU availability
import torch

print("🖥️  GPU Status Check")
print("=" * 30)

if torch.cuda.is_available():
    print(f"✅ CUDA available: {torch.cuda.get_device_name(0)}")
    print(f"   CUDA version: {torch.version.cuda}")
    print(f"   GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
    print(f"   PyTorch version: {torch.__version__}")
else:
    print("⚠️  CUDA not available - training will be slower")
    print(f"   PyTorch version: {torch.__version__}")
    print("   Consider enabling GPU runtime for faster training")

## 2. Install Dependencies

In [None]:
# Install required packages
!pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!pip install -q transformers datasets accelerate evaluate
!pip install -q rouge-score sacrebleu
!pip install -q scikit-learn matplotlib seaborn plotly
!pip install -q nltk gdown kaggle
!pip install -q tqdm ipywidgets

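# pip failures do not stop the cell, so this only confirms the commands ran
print("✅ Install commands finished. Check the output above for errors.")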
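
Recording the installed versions makes results easier to reproduce later. The sketch below uses the standard-library `importlib.metadata` module; `installed_version` is a hypothetical helper name, and the package list is an assumed subset of the distributions named in the install cell above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names as passed to pip (assumed subset, adjust as needed)
packages = ["torch", "transformers", "datasets", "accelerate", "evaluate",
            "rouge-score", "sacrebleu", "scikit-learn", "nltk"]

for name in packages:
    print(f"{name:15s} {installed_version(name) or 'NOT INSTALLED'}")
```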

In [None]:
# Download NLTK data
import nltk

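print("📚 Downloading NLTK data...")
# nltk.download returns False when a resource could not be fetched
results = [nltk.download(resource) for resource in ('punkt', 'stopwords', 'punkt_tab')]

if all(results):
    print("✅ NLTK data downloaded successfully!")
else:
    print("❌ Some NLTK downloads failed. Check the messages above.")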

## 3. Verify Installation

In [None]:
# Test key imports
print("🔍 Testing key imports...")
print("=" * 30)

imports_to_test = [
    ('numpy', 'NumPy'),
    ('pandas', 'Pandas'),
    ('torch', 'PyTorch'),
    ('transformers', 'Transformers'),
    ('sklearn', 'Scikit-learn'),
    ('matplotlib', 'Matplotlib'),
    ('seaborn', 'Seaborn'),
    ('plotly', 'Plotly'),
    ('nltk', 'NLTK'),
    ('datasets', 'Datasets'),
    ('evaluate', 'Evaluate'),
    ('rouge_score', 'ROUGE Score'),
    ('sacrebleu', 'SacreBLEU')
]

import importlib

all_imports_ok = True
for module, name in imports_to_test:
    try:
        importlib.import_module(module)
        print(f"✅ {name}")
    except ImportError as e:
        print(f"❌ {name}: {e}")
        all_imports_ok = False

if all_imports_ok:
    print("\n🎉 All imports successful! Environment is ready.")
else:
    print("\n❌ Some imports failed. Please check the error messages above.")
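
Importing a large package such as `transformers` can take several seconds. When only presence matters, `importlib.util.find_spec` locates a top-level module without executing it. This is a hedged sketch; `missing_modules` is a hypothetical helper name, and the module list is an assumed subset of the imports tested above.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the names from module_names that cannot be located on the import path."""
    return [name for name in module_names if find_spec(name) is None]


# Top-level import names (assumed subset, adjust as needed)
required = ["numpy", "pandas", "torch", "transformers", "sklearn", "nltk"]
missing = missing_modules(required)
print("Missing modules:", missing if missing else "none")
```

A module that is found can still fail when actually imported (for example, because of a broken binary dependency), so this check complements the import test rather than replacing it.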

In [None]:
# Test GPU training capability
print("🧪 Testing GPU training capability...")
print("=" * 40)

import torch
import torch.nn as nn
import time

# Create a simple test model
class TestModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(100, 10)
    
    def forward(self, x):
        return self.linear(x)

model = TestModel()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

# Test on CPU
print("Testing CPU training...")
x_cpu = torch.randn(32, 100)
y_cpu = torch.randint(0, 10, (32,))

start_time = time.time()
for _ in range(10):
    optimizer.zero_grad()
    output = model(x_cpu)
    loss = criterion(output, y_cpu)
    loss.backward()
    optimizer.step()
cpu_time = time.time() - start_time
print(f"CPU training time: {cpu_time:.3f} seconds")

# Test on GPU if available
if torch.cuda.is_available():
    print("\nTesting GPU training...")
    model_gpu = TestModel().cuda()
    optimizer_gpu = torch.optim.Adam(model_gpu.parameters())
    
    x_gpu = torch.randn(32, 100).cuda()
    y_gpu = torch.randint(0, 10, (32,)).cuda()
    
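    # CUDA kernels run asynchronously, so synchronize before reading the clock
    torch.cuda.synchronize()
    start_time = time.time()
    for _ in range(10):
        optimizer_gpu.zero_grad()
        output = model_gpu(x_gpu)
        loss = criterion(output, y_gpu)
        loss.backward()
        optimizer_gpu.step()
    torch.cuda.synchronize()
    gpu_time = time.time() - start_time
    print(f"GPU training time: {gpu_time:.3f} seconds")
    # A model this small is dominated by launch overhead, so the ratio may be below 1
    print(f"Speedup: {cpu_time/gpu_time:.2f}x")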
else:
    print("\n⚠️  GPU not available - skipping GPU test")

print("\n✅ Training capability test completed!")
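
GPU work is queued asynchronously, so wall-clock timings are only meaningful if the device is synchronized before each clock read, and the first iterations often include one-off setup cost. The following is a minimal sketch of a timing helper under those assumptions. The names `time_steps`, `warmup`, and `sync` are hypothetical; in a GPU run, `sync` would be `torch.cuda.synchronize`.

```python
import time


def time_steps(step_fn, n_steps=10, warmup=2, sync=None):
    """Return the average seconds per call of step_fn, excluding warm-up calls.

    sync is an optional zero-argument callable (for example torch.cuda.synchronize)
    that blocks until queued device work has finished.
    """
    for _ in range(warmup):
        step_fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / n_steps


# CPU-only demonstration with a trivial workload
calls = []
avg = time_steps(lambda: calls.append(sum(range(1000))), n_steps=5, warmup=1)
print(f"Average step time: {avg * 1e6:.1f} µs over 5 steps")
```

Reporting a per-step average after warm-up tends to give a fairer CPU versus GPU comparison than timing a handful of cold iterations.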

## 4. Next Steps

Now that the environment is set up, you can run the three main project notebooks:

### 📊 Task 1: ECG Arrhythmia Classification
- **File**: `Task1_ECG_CNN.ipynb`
- **Description**: Classify arrhythmias from ECG signals using CNN
- **Key Features**: 1D CNN, class imbalance handling, comprehensive evaluation

### 🏥 Task 2: Bio_ClinicalBERT Fine-tuning
- **File**: `Task2_BioClinicalBERT.ipynb`
- **Description**: Fine-tune Bio_ClinicalBERT for clinical note classification
- **Key Features**: 22 medical categories, Hugging Face Transformers, class weights
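
Tasks 1 and 2 both mention handling class imbalance. One common scheme, shown here as an illustration rather than as the method the task notebooks use, weights each class by `n_samples / (n_classes * class_count)`, which is the formula scikit-learn applies for `class_weight="balanced"`. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Map each class to n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy labels: class 0 is three times as frequent as class 1
labels = [0, 0, 0, 0, 0, 0, 1, 1]
weights = balanced_class_weights(labels)
print(weights)  # rare class 1 gets weight 2.0, frequent class 0 gets about 0.67
```

Weights like these can be converted to a tensor in class-index order and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`.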

### 📝 Task 3: LLaMA 3.1 Text Summarization
- **File**: `Task3_LLaMA_Summarization.ipynb`
- **Description**: Fine-tune LLaMA 3.1 for abstractive summarization
- **Key Features**: CNN/DailyMail dataset, ROUGE/BLEU evaluation, sequence-to-sequence

### 🚀 Getting Started
1. **Open any of the three notebooks**
2. **Run all cells sequentially** (Runtime → Run all)
3. **Follow the markdown instructions** in each notebook
4. **Monitor training progress** and adjust parameters if needed

### 💡 Tips for Success
- **Enable GPU runtime** for faster training
- **Run cells in order** - don't skip ahead
- **Monitor memory usage** - restart runtime if needed
- **Save your work** - download notebooks and results
- **Check `README.md`** for detailed information

### 🔧 Troubleshooting
- **Out of memory**: Restart runtime and reduce batch size
- **Import errors**: Re-run the setup cells
- **Slow training**: Enable GPU runtime
- **Dataset issues**: Check the `dataset_links.md` file
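
The out-of-memory advice above can be partly automated by retrying with a smaller batch. This is a rough sketch, assuming the training step raises a `RuntimeError` whose message contains "out of memory" (PyTorch's CUDA out-of-memory error is a `RuntimeError` subclass). `find_batch_size` and `fake_step` are hypothetical names used only for illustration.

```python
def find_batch_size(try_step, start=64, minimum=1):
    """Halve the batch size until try_step(batch_size) runs without an OOM error."""
    batch_size = start
    while batch_size >= minimum:
        try:
            try_step(batch_size)
            return batch_size
        except RuntimeError as err:
            if "out of memory" not in str(err).lower():
                raise  # unrelated error: do not hide it
            batch_size //= 2
    raise RuntimeError(f"No batch size >= {minimum} fits in memory")


def fake_step(batch_size):
    """Stand-in for one training step; pretends batches above 16 do not fit."""
    if batch_size > 16:
        raise RuntimeError("CUDA out of memory (simulated)")


print(find_batch_size(fake_step, start=64))  # 16
```

In a real run, `try_step` would execute one forward and backward pass, and calling `torch.cuda.empty_cache()` between attempts may help release cached memory.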

---

**Ready to start? Open one of the task notebooks and begin your Healthcare AI journey! 🚀**