# Day 1 Practical Session: Environment Setup

Welcome to the practical session for Day 1! In this notebook, we'll set up our Python environment and ensure everything is working correctly for LLM applications.

## Learning Objectives
- Set up a Python environment suitable for LLM applications
- Install essential packages for working with LLMs
- Test the environment with basic imports
- Understand best practices for environment management

## 1. Environment Setup Overview

For this course, we'll be working with several key Python packages:

- **transformers**: HuggingFace's library for pre-trained models
- **torch**: PyTorch for deep learning
- **openai**: Official OpenAI API client
- **python-dotenv**: For managing environment variables
- **pandas**: For data manipulation
- **matplotlib/seaborn**: For visualization
- **jupyter**: For interactive notebooks

## 2. Installing Required Packages

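Install the required packages if you haven't already. The cell below defines a small `pip` helper and prints the equivalent one-line command. Uncomment the loop at the end of the cell to install from inside the notebook.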

In [None]:
# Install required packages
# Note: Run this cell only once, or if you need to update packages

import subprocess
import sys

def install_package(package):
    try:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
        print(f"✅ {package} installed successfully")
    except subprocess.CalledProcessError:
        print(f"❌ Failed to install {package}")

# Essential packages for LLM work
packages = [
    "torch>=2.0.0",
    "transformers>=4.30.0",
    "datasets>=2.12.0",
    "accelerate>=0.20.0",
    "python-dotenv>=1.0.0",
    "openai>=1.0.0",
    "pandas>=2.0.0",
    "matplotlib>=3.7.1",
    "seaborn>=0.12.0",
    "requests>=2.28.0"
]

# Uncomment the following lines to install packages
# for package in packages:
#     install_package(package)

print("\n📝 To install manually, run:")
# Quote each spec: an unquoted '>' would be treated as shell output redirection
print("pip install " + " ".join(f'"{p}"' for p in packages))
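Why the quotes matter: in a POSIX shell, the `>` in an unquoted `torch>=2.0.0` is parsed as output redirection, so the version constraint never reaches pip. A minimal sketch using the standard library's `shlex.join` (Python 3.8+) to build a safely quoted command. It assumes a POSIX-style shell such as bash or zsh; Windows `cmd` uses different quoting rules.

```python
import shlex

# Version specifiers contain '>', which a POSIX shell treats as redirection
packages = ["torch>=2.0.0", "transformers>=4.30.0", "python-dotenv>=1.0.0"]

# shlex.join quotes each argument only when it contains shell-unsafe characters
command = shlex.join(["pip", "install", *packages])
print(command)
# → pip install 'torch>=2.0.0' 'transformers>=4.30.0' 'python-dotenv>=1.0.0'
```

Because `shlex.split` reverses `shlex.join`, the quoted command round-trips back to the original argument list.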

## 3. Testing Your Environment

Let's test that all the essential packages are installed and working correctly:

In [None]:
# Test essential imports
import sys
import importlib
from importlib import metadata

print(f"Python version: {sys.version}\n")

# Map import name -> (display name, pip distribution name).
# The two names differ for python-dotenv: installed as "python-dotenv", imported as "dotenv".
packages_to_test = {
    'torch': ('PyTorch', 'torch'),
    'transformers': ('HuggingFace Transformers', 'transformers'),
    'openai': ('OpenAI API Client', 'openai'),
    'pandas': ('Pandas', 'pandas'),
    'matplotlib': ('Matplotlib', 'matplotlib'),
    'requests': ('Requests', 'requests'),
    'dotenv': ('Python-dotenv', 'python-dotenv'),
}

for module_name, (name, dist_name) in packages_to_test.items():
    try:
        module = importlib.import_module(module_name)
        # Prefer the module's __version__; fall back to the installed package metadata
        version = getattr(module, '__version__', None) or metadata.version(dist_name)
        print(f"✅ {name}: {version}")
    except ImportError:
        print(f"❌ {name}: Not installed")
    except Exception as e:
        print(f"⚠️ {name}: Error - {e}")
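The cell above reports whichever version each package has, but it does not check that version against the minimums in the install list. A minimal sketch of that check using only the standard library. Assumption: it compares only the leading numeric release segment, which simplifies PEP 440 (for example, `1.0.0rc1` is treated as `1.0.0`); `packaging.version.Version` implements the full rules if you need them.

```python
import re
from importlib import metadata

def version_tuple(version):
    """'2.1.0+cu121' -> (2, 1, 0). Keeps only the leading numeric release segment."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(p) for p in match.group(0).split(".")) if match else ()

def meets_minimum(installed, minimum):
    """True if installed >= minimum, padding with zeros so '2.0' equals '2.0.0'."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b

# Minimum versions taken from the install list (pip distribution name -> minimum)
MINIMUMS = {"torch": "2.0.0", "transformers": "4.30.0", "openai": "1.0.0",
            "pandas": "2.0.0", "python-dotenv": "1.0.0"}

for dist, minimum in MINIMUMS.items():
    try:
        installed = metadata.version(dist)
    except metadata.PackageNotFoundError:
        print(f"❌ {dist}: not installed (need >= {minimum})")
        continue
    status = "✅" if meets_minimum(installed, minimum) else "⚠️ too old"
    print(f"{status} {dist} {installed} (need >= {minimum})")
```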

## 4. Check GPU Availability (Optional)

While not required for this course, having GPU access can speed up model inference:

In [None]:
import torch

print("🔍 GPU Information:")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU count: {torch.cuda.device_count()}")
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
else:
    print("No GPU available - will use CPU (this is fine for most exercises)")

# Check MPS (Apple Silicon) availability
if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
    print("🍎 Apple MPS (Metal Performance Shaders) available")
else:
    print("🍎 Apple MPS not available")
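Later notebooks need a single device choice rather than a report. A minimal sketch that picks CUDA first, then Apple MPS, then CPU; the `pick_device` helper is hypothetical, and the priority order is an assumption that suits most single-machine setups. It falls back to CPU if PyTorch is not installed.

```python
def pick_device(cuda_available, mps_available):
    """Return a device string in priority order: CUDA GPU, Apple MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

try:
    import torch
    device = pick_device(
        torch.cuda.is_available(),
        hasattr(torch.backends, "mps") and torch.backends.mps.is_available(),
    )
except ImportError:
    device = "cpu"  # PyTorch not installed; CPU is the only option

print(f"Using device: {device}")
```

The resulting string can be passed to PyTorch's `.to(device)` on tensors and models. Keeping the selection logic in a pure function also makes it easy to test without a GPU.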

## 5. Create Requirements File

Let's create a requirements.txt file for easy environment reproduction:

In [None]:
# Create requirements.txt file
requirements_content = """# LLMs in Finance Course - Requirements
# Core ML and NLP packages
torch>=2.0.0
transformers>=4.30.0
datasets>=2.12.0
accelerate>=0.20.0
sentencepiece>=0.1.99
tokenizers>=0.13.0

# API clients
openai>=1.0.0
anthropic>=0.18.0

# Environment and configuration
python-dotenv>=1.0.0

# Data manipulation and analysis
pandas>=2.0.0
numpy>=1.24.0

# Visualization
matplotlib>=3.7.1
seaborn>=0.12.0
plotly>=5.14.0

# Utilities
requests>=2.28.0
tqdm>=4.65.0
scikit-learn>=1.2.2

# Jupyter and notebook support
jupyter>=1.0.0
ipywidgets>=8.0.0
"""

# Write to file
with open('requirements.txt', 'w') as f:
    f.write(requirements_content)

print("📄 Created requirements.txt file")
print("\nTo install all requirements in a new environment:")
print("pip install -r requirements.txt")
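To confirm the file was written as intended, or to reuse its package list elsewhere, you can read it back. A minimal sketch; the helper names are hypothetical, and the parser assumes only the simple `name>=version` lines used above. It does not handle pip options such as `-r` or `-e`.

```python
import re
from pathlib import Path

def parse_requirements(text):
    """Return requirement specs from requirements.txt content, skipping comments and blank lines."""
    specs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop full-line and inline comments
        if line:
            specs.append(line)
    return specs

def requirement_name(spec):
    """'torch>=2.0.0' -> 'torch' (cut at the first comparison, extras, or marker character)."""
    return re.split(r"[<>=!~\[;\s]", spec, maxsplit=1)[0]

req_path = Path("requirements.txt")
if req_path.exists():
    specs = parse_requirements(req_path.read_text(encoding="utf-8"))
    print(f"{len(specs)} requirements: {', '.join(requirement_name(s) for s in specs)}")
```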

## 6. Environment Validation Test

Let's run a comprehensive test to ensure everything is working:

In [None]:
# Comprehensive environment test
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from transformers import pipeline
import warnings
warnings.filterwarnings('ignore')

print("🧪 Running comprehensive environment test...\n")

# Test 1: Basic data manipulation
print("1️⃣ Testing data manipulation...")
df = pd.DataFrame({
    'text': ['This is positive', 'This is negative', 'This is neutral'],
    'sentiment': ['positive', 'negative', 'neutral']
})
print(f"   Created DataFrame with {len(df)} rows ✅")

# Test 2: Basic visualization
print("\n2️⃣ Testing visualization...")
fig, ax = plt.subplots(1, 1, figsize=(6, 4))
ax.bar(df['sentiment'], [1, 1, 1])
ax.set_title('Test Plot')
plt.close(fig)  # Close the figure without displaying it
print("   Created test plot ✅")

# Test 3: HuggingFace pipeline (downloads the model on first run)
print("\n3️⃣ Testing HuggingFace pipeline...")
try:
    # RoBERTa-base sentiment model (~125M parameters): small enough for a CPU smoke test
    classifier = pipeline(
        "sentiment-analysis",
        model="cardiffnlp/twitter-roberta-base-sentiment-latest",
    )
    # By default the pipeline returns only the top label and its score
    result = classifier("This is a test")
    print(f"   Pipeline test result: {result} ✅")
except Exception as e:
    print(f"   Pipeline test failed: {e} ❌")
    print("   This might be due to network issues or missing dependencies")

print("\n🎉 Environment test complete!")
print("\n📝 Next steps:")
print("   - Set up version control (Git)")
print("   - Create .env file for API keys")
print("   - Test API connections")
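The next steps above include creating a `.env` file for API keys. A minimal sketch of loading it with python-dotenv's `load_dotenv()` and checking that the needed keys are present without printing their values. The key names are assumptions: `OPENAI_API_KEY` is the variable the official `openai` client reads by default, while `DEEPSEEK_API_KEY` is a hypothetical name. Use whatever names your course materials specify.

```python
import os

# Assumed key names; adjust to the APIs you actually use
REQUIRED_KEYS = ["OPENAI_API_KEY", "DEEPSEEK_API_KEY"]

def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key)]

try:
    from dotenv import load_dotenv
    load_dotenv()  # finds a .env file and loads it; existing variables are not overridden by default
except ImportError:
    print("python-dotenv not installed; using existing environment variables only")

missing = missing_keys(REQUIRED_KEYS)
if missing:
    print(f"⚠️ Missing keys: {', '.join(missing)} (add them to .env)")
else:
    print("✅ All API keys found")
```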

## 7. Best Practices Summary

### Environment Management
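- Always use a virtual environment (e.g., `venv` or conda) for each project
- Keep `requirements.txt` up to date
- Pin exact versions (`package==x.y.z`, e.g. via `pip freeze > requirements.txt`) for reproducibility; the `>=` entries above only set minimum versions
- Document any special installation steps (e.g., CUDA-specific PyTorch builds)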
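The first bullet can be put into practice with the standard library's `venv` module, which is equivalent to running `python -m venv .venv` in a terminal. A minimal sketch; `create_env` is a hypothetical helper, and the `.venv` directory name is a common convention, not something the course prescribes. After creating a real environment, activate it with `source .venv/bin/activate` (macOS/Linux) or `.venv\Scripts\activate` (Windows) before running `pip install -r requirements.txt`.

```python
import os
import tempfile
import venv
from pathlib import Path

def create_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return the path to its Python interpreter."""
    env_dir = Path(env_dir)
    venv.create(env_dir, with_pip=with_pip)
    if os.name == "nt":  # Windows puts executables under Scripts\
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"  # macOS/Linux layout

# Demo in a throwaway directory; with_pip=False skips the slower pip bootstrap
with tempfile.TemporaryDirectory() as tmp:
    interpreter = create_env(Path(tmp) / ".venv", with_pip=False)
    print(f"Interpreter: {interpreter} (exists: {interpreter.exists()})")
```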

### Project Structure
```
llm-finance-project/
├── .env               # API keys (DO NOT COMMIT)
├── .gitignore         # Git ignore file
├── requirements.txt   # Package dependencies
├── README.md          # Project documentation
├── notebooks/         # Jupyter notebooks
├── src/               # Source code
└── data/              # Data files
```
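Keeping `.env` out of Git depends on `.gitignore` actually listing it. A minimal sketch that appends an entry only if an identical line is not already present; `ensure_gitignored` is a hypothetical helper. It assumes exact-line matching, so a broader pattern such as `*.env` will not be recognized as covering `.env`. The `.venv/` entry assumes the virtual environment lives inside the project folder. Note that `.gitignore` does not untrack files that were already committed.

```python
from pathlib import Path

def ensure_gitignored(entry, gitignore=".gitignore"):
    """Append `entry` to the .gitignore file unless it is already listed. Returns True if added."""
    path = Path(gitignore)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    if entry in (line.strip() for line in text.splitlines()):
        return False  # already ignored
    prefix = "\n" if text and not text.endswith("\n") else ""  # keep entries on separate lines
    with path.open("a", encoding="utf-8") as f:
        f.write(f"{prefix}{entry}\n")
    return True

# Keep API keys and the local virtual environment out of version control
for entry in [".env", ".venv/"]:
    added = ensure_gitignored(entry)
    print(f"{'Added' if added else 'Already present'}: {entry}")
```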

### Next Steps
1. Set up version control with Git
2. Create and configure `.env` file for API keys
3. Test API connections to OpenAI and DeepSeek
4. Explore HuggingFace models locally