# Install Requirements - OpenShift AI Workshop
## 📦 Setting Up Python Dependencies

---

**Module:** 1 - Environment Setup  
**Objective:** Install all required Python libraries for the AI e-commerce workshop  
**Estimated Time:** 10-15 minutes

---

This notebook will install all the necessary Python packages for:
- 🤖 Machine Learning (scikit-learn, onnx)
- 🧠 Large Language Models (langchain)
- 📊 Data Analysis (pandas, numpy, plotly)
- 🌐 Web Interfaces (gradio)
- 📈 Monitoring (prometheus-client)
- 🔧 Utilities (requests)

---

To install `skl2onnx` directly from its GitHub repository instead of PyPI, run:

pip install git+https://github.com/onnx/sklearn-onnx.git

## 🔍 Step 1: Check Current Environment

In [None]:
import sys
import subprocess
import platform
from datetime import datetime

print("🔍 Environment Information")
print("=" * 50)
print(f"📅 Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"🐍 Python Version: {sys.version}")
print(f"💻 Platform: {platform.platform()}")
print(f"📁 Python Path: {sys.executable}")
print(f"📦 Pip Version: {subprocess.check_output([sys.executable, '-m', 'pip', '--version']).decode().strip()}")
print("\n✅ Environment check complete!")
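
Step 5 of this notebook retries failed installs with `pip install --user`, which pip typically refuses inside a virtual environment. As a minimal sketch (the helper name `describe_python_environment` is hypothetical and not part of the workshop code), the standard library can report whether the kernel runs in a virtual environment and whether the per-user site directory is enabled:

```python
import site
import sys


def describe_python_environment():
    """Return basic facts about where pip is likely to install packages."""
    return {
        # In a venv, sys.prefix points at the venv and differs from base_prefix
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        # ENABLE_USER_SITE may be True, False, or None (disabled for security)
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
    }


info = describe_python_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

If `in_virtualenv` is true, the `--user` retry is unlikely to help and the other strategies are the ones that matter.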

## 📦 Step 2: Define Core Package Requirements

We'll focus on essential packages first, avoiding those that commonly cause conflicts:

In [None]:
# Core essential packages for the workshop
CORE_PACKAGES = [
    # Machine Learning Core
    "scikit-learn>=1.3.0",
    "numpy>=1.24.0",
    "pandas>=2.0.0",
    "scipy>=1.11.0",
    
    # Model Export
    "onnx>=1.14.0",
    "skl2onnx>=1.15.0",
    "onnxruntime>=1.16.0",
    
    # LLM - Basic
    "langchain>=0.1.0",
    "langchain-community>=0.0.13",  # Additional langchain features
    "openai>=1.3.0",
    
    # Visualization
    "matplotlib>=3.7.0",
    "plotly>=5.17.0",
    "seaborn>=0.12.0",
    
    # Web Interface
    "gradio>=4.0.0",
    "flask>=2.3.0",
    
    # Monitoring
    "prometheus-client>=0.19.0",
    "psutil>=5.9.0",
    
    # Utilities
    "requests>=2.31.0",
    "python-dotenv>=1.0.0",
    "pyyaml>=6.0.0",
    "tqdm>=4.66.0",
    
    # Development
    "ipywidgets>=8.0.0",
]

# Optional packages that may cause conflicts
OPTIONAL_PACKAGES = [
    "streamlit>=1.28.0",       # Tornado conflicts
    "transformers>=4.36.0",    # Heavy dependencies
    "torch>=2.1.0",            # Very large
    "openvino>=2023.2.0",      # Platform specific 
]

print("📋 Package Installation Plan")
print("=" * 50)
print(f"🎯 Core packages to install: {len(CORE_PACKAGES)}")
print(f"⚪ Optional packages: {len(OPTIONAL_PACKAGES)}")

print("\n🎯 Core Packages:")
for package in CORE_PACKAGES:
    print(f"  • {package}")

print("\n⚪ Optional Packages (install manually if needed):")
for package in OPTIONAL_PACKAGES:
    print(f"  • {package}")

## 🛠️ Step 3: Simple Installation Functions

In [None]:
def install_package(package_spec):
    """Install a package using pip - simple and reliable"""
    cmd = [sys.executable, "-m", "pip", "install", package_spec]
    
    try:
        # check=True raises CalledProcessError on a non-zero pip exit code
        subprocess.run(cmd, capture_output=True, text=True, check=True)
        return True, "Success"
    except subprocess.CalledProcessError as e:
        return False, e.stderr

import re

def get_package_name(package_spec):
    """Extract the distribution name from a spec such as 'pandas>=2.0.0'"""
    # Split at the first version-operator character (<, >, =, !, ~) so that
    # operators like '~=' and compound specs like 'pkg<2,>=1' are handled too
    return re.split(r"[<>=!~]", package_spec, maxsplit=1)[0].strip()

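# Distributions whose import name differs from their PyPI name
IMPORT_NAME_OVERRIDES = {
    "scikit-learn": "sklearn",
    "python-dotenv": "dotenv",
    "pyyaml": "yaml",
}

def check_package_importable(package_name):
    """Check if package can be imported (does not check the requested version)"""
    # Default rule: hyphens in the PyPI name become underscores in the module name
    import_name = IMPORT_NAME_OVERRIDES.get(package_name, package_name.replace('-', '_'))
    try:
        __import__(import_name)
        return True
    except Exception:
        # ImportError, or any error raised while the package initializes
        return False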

print("🔧 Installation functions ready!")
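
An import test answers "can this module be loaded?" but not "which version is installed?", and it depends on the import name rather than the PyPI name. One possible complement, sketched here with the standard-library `importlib.metadata` module (Python 3.8+), looks up the installed version by distribution name. `installed_version` is an illustrative helper, not part of the workshop code:

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Names here are PyPI distribution names, not import names
for name in ["pip", "scikit-learn", "pyyaml", "python-dotenv"]:
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

This only reports versions; comparing them against the `>=` constraints in `CORE_PACKAGES` would need an additional version parser.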

## 🚀 Step 4: Install Core Packages

Installing essential packages one by one with clear progress reporting:

In [None]:
print("🚀 Installing core packages...\n")

successful_installs = []
failed_installs = []
already_available = []

for i, package_spec in enumerate(CORE_PACKAGES, 1):
    package_name = get_package_name(package_spec)
    
    print(f"[{i:2d}/{len(CORE_PACKAGES)}] {package_name}...", end=" ")
    
    # Quick check if already importable
    if check_package_importable(package_name):
        print("✅ Already available")
        already_available.append(package_name)
        continue
    
    # Try to install
    success, message = install_package(package_spec)
    
    if success:
        print("✅ Installed")
        successful_installs.append(package_name)
    else:
        print("❌ Failed")
        failed_installs.append((package_name, message[:100]))

print("\n" + "=" * 50)
print("📊 Installation Summary")
print("=" * 50)
print(f"✅ Already available: {len(already_available)}")
print(f"✅ Successfully installed: {len(successful_installs)}")
print(f"❌ Failed installations: {len(failed_installs)}")
print(f"📊 Total ready: {len(already_available) + len(successful_installs)}/{len(CORE_PACKAGES)}")

## 🩺 Step 5: Handle Failed Installations

In [None]:
if failed_installs:
    print("🔧 Attempting to fix failed installations...\n")
    
    retry_success = []
    permanent_failures = []
    
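    # Recover the full requirement so retries keep the version constraint
    spec_by_name = {get_package_name(spec): spec for spec in CORE_PACKAGES}
    
    for package_name, error in failed_installs:
        print(f"🩺 Retrying {package_name}...")
        package_spec = spec_by_name.get(package_name, package_name)
        
        # Try different installation strategies
        strategies = [
            ["--no-cache-dir"],
            ["--user"],                          # usually rejected inside a virtual environment
            ["--force-reinstall", "--no-deps"],  # skips dependencies; last resort
        ]
        
        success = False
        for strategy in strategies:
            cmd = [sys.executable, "-m", "pip", "install"] + strategy + [package_spec]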
            try:
                subprocess.run(cmd, capture_output=True, text=True, check=True)
                print(f"  ✅ Fixed with {' '.join(strategy)}")
                retry_success.append(package_name)
                success = True
                break
            except subprocess.CalledProcessError:
                continue
        
        if not success:
            print(f"  ❌ Still failing: {package_name}")
            permanent_failures.append(package_name)
    
    print(f"\n🔧 Retry Results:")
    print(f"  ✅ Fixed: {len(retry_success)}")
    print(f"  ❌ Still failing: {len(permanent_failures)}")
    
    if permanent_failures:
        print(f"\n⚠️ Permanently failed packages: {', '.join(permanent_failures)}")
        print("   These packages can be installed manually later if needed.")

else:
    print("🎉 No failed installations to handle!")
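
Because the last retry strategy uses `--no-deps`, a package can install successfully while its dependencies are still missing. One hedged way to detect this is pip's built-in `pip check` command, which reports missing or incompatible dependencies. The wrapper `run_pip_check` below is a hypothetical helper, not part of the workshop code:

```python
import subprocess
import sys


def run_pip_check():
    """Run `pip check` for the current interpreter and return (ok, output)."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "check"],
        capture_output=True,
        text=True,
    )
    # pip check exits with a non-zero code when it finds broken requirements
    return result.returncode == 0, (result.stdout + result.stderr).strip()


ok, output = run_pip_check()
print("Dependencies consistent" if ok else "Dependency problems found")
print(output)
```

A non-zero result does not necessarily block the workshop, but the listed packages are the first place to look if later imports fail.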

## ✅ Step 6: Verify Core Functionality

In [None]:
print("🔍 Testing critical functionality...\n")

# Test critical imports
critical_tests = {
    "Machine Learning": [
        ("import sklearn", "scikit-learn"),
        ("import pandas as pd", "pandas"),
        ("import numpy as np", "numpy"),
    ],
    "Model Export": [
        ("import onnx", "ONNX"),
        ("import skl2onnx", "scikit-learn to ONNX"),
    ],
    "LLM Support": [
        ("import langchain", "LangChain"),
        ("import openai", "OpenAI client"),
    ],
    "Visualization": [
        ("import matplotlib.pyplot as plt", "matplotlib"),
        ("import plotly.graph_objects as go", "plotly"),
    ],
    "Web Interface": [
        ("import gradio as gr", "Gradio"),
        ("import flask", "Flask"),
    ],
    "Utilities": [
        ("import requests", "HTTP requests"),
        ("import yaml", "YAML processing"),
    ]
}

total_tests = sum(len(tests) for tests in critical_tests.values())
passed_tests = 0
failed_tests = []

for category, tests in critical_tests.items():
    print(f"🧪 {category}:")
    for test_code, description in tests:
        try:
            exec(test_code)
            print(f"  ✅ {description}")
            passed_tests += 1
        except Exception as e:
            print(f"  ❌ {description}: {str(e)[:50]}...")
            failed_tests.append(description)
    print()

print("=" * 50)
print(f"🧪 Test Results: {passed_tests}/{total_tests} passed")

if passed_tests >= total_tests * 0.8:  # 80% success rate
    print("🟢 EXCELLENT: Core functionality is ready!")
elif passed_tests >= total_tests * 0.6:  # 60% success rate
    print("🟡 GOOD: Most core functionality available, some imports failed")
else:
    print("🔴 NEEDS ATTENTION: Many core features are missing")

if failed_tests:
    print(f"\n⚠️ Failed imports: {', '.join(failed_tests)}")
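
The cell above runs each import statement through `exec`. An alternative sketch, assuming only the module name is needed, uses `importlib.import_module` and also reports the module's `__version__` attribute when one exists. `try_import` is an illustrative name, not part of the workshop code:

```python
import importlib


def try_import(module_name):
    """Import a module by name and return (ok, detail)."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or errors raised during module init
        return False, f"{type(exc).__name__}: {exc}"
    # Not every module defines __version__, so fall back to a placeholder
    return True, str(getattr(module, "__version__", "version unknown"))


for module_name in ["json", "sklearn", "yaml"]:
    ok, detail = try_import(module_name)
    print(f"{'OK  ' if ok else 'FAIL'} {module_name}: {detail}")
```

Passing module names as data avoids executing arbitrary strings and keeps the notebook namespace free of test imports.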

## 🧪 Step 7: Quick ML Pipeline Test

In [None]:
print("🤖 Testing ML pipeline functionality...\n")

try:
    # Test basic ML workflow
    import sklearn
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    import pandas as pd
    import numpy as np
    
    print("📊 Creating test dataset...")
    X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    print("🌲 Training Random Forest...")
    clf = RandomForestClassifier(n_estimators=10, random_state=42)
    clf.fit(X_train, y_train)
    
    score = clf.score(X_test, y_test)
    print(f"✅ Model accuracy: {score:.3f}")
    
    # Test ONNX export if available
    try:
        import onnx
        import skl2onnx
        from skl2onnx import convert_sklearn
        from skl2onnx.common.data_types import FloatTensorType
        
        print("🔄 Testing ONNX export...")
        initial_type = [('float_input', FloatTensorType([None, 10]))]
        onnx_model = convert_sklearn(clf, initial_types=initial_type)
        print("✅ ONNX export successful")
        
    except Exception as e:
        print(f"⚠️ ONNX export failed: {str(e)[:50]}...")
    
    print("\n🎉 ML pipeline test: SUCCESS")
    
except Exception as e:
    print(f"❌ ML pipeline test failed: {str(e)}")
    print("   This indicates core ML packages may not be properly installed.")

## 📊 Step 8: Environment Health Report

In [None]:
import json
from datetime import datetime

print("📋 Generating Environment Health Report...\n")

# System information
try:
    import psutil
    memory_gb = psutil.virtual_memory().total / (1024**3)
    available_gb = psutil.virtual_memory().available / (1024**3)
    cpu_count = psutil.cpu_count()
    system_info = {
        "memory_total_gb": round(memory_gb, 2),
        "memory_available_gb": round(available_gb, 2),
        "cpu_cores": cpu_count
    }
except ImportError:
    system_info = {"status": "psutil not available"}

# Create comprehensive report
report = {
    "timestamp": datetime.now().isoformat(),
    "python_version": f"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}",
    "platform": platform.platform(),
    "system_info": system_info,
    "installation_summary": {
        "total_packages": len(CORE_PACKAGES),
        "already_available": len(already_available),
        "newly_installed": len(successful_installs),
        "failed": len(failed_installs),
        "success_rate": f"{((len(already_available) + len(successful_installs)) / len(CORE_PACKAGES) * 100):.1f}%"
    },
    "functionality_tests": {
        "total_tests": total_tests,
        "passed_tests": passed_tests,
        "success_rate": f"{(passed_tests / total_tests * 100):.1f}%" if total_tests > 0 else "0%"
    },
    "ready_for_workshop": passed_tests >= total_tests * 0.8 and len(failed_installs) <= 2
}

# Save report
with open("workshop_environment_report.json", "w") as f:
    json.dump(report, f, indent=2)

# Display summary
print("🎯 WORKSHOP READINESS ASSESSMENT")
print("=" * 50)
print(f"📅 Assessment Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"🐍 Python: {report['python_version']}")

if 'memory_total_gb' in system_info:
    print(f"💾 Memory: {system_info['memory_available_gb']:.1f}GB available / {system_info['memory_total_gb']:.1f}GB total")
    print(f"🖥️ CPU: {system_info['cpu_cores']} cores")

print(f"\n📦 Package Installation: {report['installation_summary']['success_rate']}")
print(f"🧪 Functionality Tests: {report['functionality_tests']['success_rate']}")

if report['ready_for_workshop']:
    print("\n🟢 STATUS: READY FOR WORKSHOP! 🎉")
    print("✅ Your environment is properly configured")
    print("✅ Core ML functionality verified")
    print("✅ You can proceed to Module 2: Predictive Model")
else:
    print("\n🟡 STATUS: PARTIALLY READY")
    print("⚠️ Some features may be limited")
    print("💡 Consider installing missing packages manually")
    print("📝 Check the failed tests above for details")

print(f"\n📊 Full report saved to: workshop_environment_report.json")

# Optional packages reminder
print("\n" + "=" * 50)
print("💡 OPTIONAL PACKAGES")
print("=" * 50)
print("For advanced features, you can install these manually:")
for package in OPTIONAL_PACKAGES:
    print(f'  pip install "{package}"')

if report['ready_for_workshop']:
    print("\n🚀 Ready for Module 2: Predictive Model Development!")
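
The saved JSON file can be read back later, for example from another notebook, to confirm readiness without re-running the installation. The sketch below assumes the report layout written by the cell above (`ready_for_workshop` and `functionality_tests` keys); `summarize_report` is a hypothetical helper:

```python
import json
from pathlib import Path


def summarize_report(path):
    """Load a saved environment report and return a one-line summary."""
    report = json.loads(Path(path).read_text())
    status = "ready" if report.get("ready_for_workshop") else "not ready"
    tests = report.get("functionality_tests", {})
    passed = tests.get("passed_tests", "?")
    total = tests.get("total_tests", "?")
    return f"{status}: {passed}/{total} import tests passed"


report_path = Path("workshop_environment_report.json")
if report_path.exists():
    print(summarize_report(report_path))
else:
    print(f"{report_path} not found; run the report cell first")
```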

## 🎯 Next Steps

Your Python environment setup is complete! Here's what to do next:

### ✅ If Status is "READY FOR WORKSHOP" (Green):
1. **Continue to Module 2:** Start with predictive model development
2. **Run verification notebook:** `1-environment/verify_environment.ipynb`
3. **Download datasets:** `1-environment/download_datasets.ipynb`

### ⚠️ If Status is "PARTIALLY READY" (Yellow):
1. **Review failed packages:** Check which specific features are missing
2. **Install manually:** Use the pip commands shown above for missing packages
3. **Restart kernel:** After manual installations, restart and re-run verification
4. **Continue anyway:** Most workshop features should still work

### 🔧 Manual Installation Commands:
```bash
# If specific packages failed, try:
pip install --no-cache-dir package_name
pip install --user package_name
pip install --force-reinstall package_name

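# For optional advanced features (quote the specifier so the shell
# does not treat ">" as output redirection):
pip install "streamlit>=1.28.0"
pip install "transformers>=4.36.0"
pip install "torch>=2.1.0"
pip install "openvino>=2023.2.0"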
```
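
In a POSIX shell an unquoted `>` is a redirection operator, so a requirement such as `streamlit>=1.28.0` has to be quoted when it is typed or pasted into a terminal. As a small sketch, the standard-library `shlex.quote` function can generate copy-paste-safe commands from a list of requirement strings (the `specs` list here simply repeats the optional packages named in this notebook):

```python
import shlex

specs = ["streamlit>=1.28.0", "transformers>=4.36.0", "torch>=2.1.0"]

# shlex.quote wraps any string containing shell-special characters in single quotes
commands = [f"pip install {shlex.quote(spec)}" for spec in specs]
for command in commands:
    print(command)
```

Quoting is not needed when the specifier is passed as a list element to `subprocess.run`, as the installation cells do, because no shell is involved there.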

### 📚 What You Now Have:
- ✅ **Core ML:** scikit-learn, pandas, numpy
- ✅ **Model Export:** ONNX conversion capabilities
- ✅ **LLM Integration:** LangChain framework
- ✅ **Visualization:** matplotlib, plotly, seaborn
- ✅ **Web Interface:** Gradio for dashboards
- ✅ **Monitoring:** System metrics collection

---

**Ready for the next step?** 🚀  
**Next:** [Module 2 - Predictive Model Development](../02-predictive-model.md)

---

*Workshop Progress: Module 1 Complete ✅*