# Install Requirements - OpenShift AI Workshop
## 📦 Setting Up Python Dependencies

---

**Module:** 1 - Environment Setup  
**Objective:** Install all required Python libraries for the AI e-commerce workshop  
**Estimated Time:** 10-15 minutes

---

This notebook will install all the necessary Python packages for:
- 🤖 Machine Learning (scikit-learn, onnx)
- 🧠 Large Language Models (langchain, transformers)
- 📊 Data Analysis (pandas, numpy, plotly)
- 🌐 Web Interfaces (gradio, streamlit)
- 📈 Monitoring (prometheus-client)
- 🔧 Utilities (requests, redis)

---

## 🔍 Step 1: Check Current Environment

In [None]:
import sys
import subprocess
import platform
import pkg_resources
from datetime import datetime

print("üîç Environment Information")
print("=" * 50)
print(f"üìÖ Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"üêç Python Version: {sys.version}")
print(f"üíª Platform: {platform.platform()}")
print(f"üìÅ Python Path: {sys.executable}")
print(f"üì¶ Pip Version: {subprocess.check_output([sys.executable, '-m', 'pip', '--version']).decode().strip()}")
print("\n‚úÖ Environment check complete!")
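
Before installing anything, it helps to confirm whether the kernel runs inside an isolated virtual environment. Pip's `--user` flag, which Step 6 uses as a fallback, is usually rejected inside a virtualenv. The following is a minimal sketch. It assumes a minimum of Python 3.9, an illustrative threshold rather than a stated workshop requirement. `MIN_PYTHON`, `in_virtualenv`, and `describe_environment` are hypothetical names. Conda environments are not detected by this prefix comparison.

```python
import sys

# Hypothetical threshold for illustration; adjust to the workshop's real requirement
MIN_PYTHON = (3, 9)

def in_virtualenv():
    """True when running inside a venv/virtualenv (prefix differs from the base prefix)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

def describe_environment(min_python=MIN_PYTHON):
    """Summarize facts that affect how pip installs will behave."""
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        "in_virtualenv": in_virtualenv(),
        "prefix": sys.prefix,
    }

info = describe_environment()
print(f"Python >= {'.'.join(map(str, MIN_PYTHON))}: {info['python_ok']}")
print(f"Virtual environment: {info['in_virtualenv']}")
```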

## 📦 Step 2: Define Package Requirements

We'll organize packages by category for better understanding:

In [None]:
# Define package categories and versions
PACKAGE_CATEGORIES = {
    "🤖 Machine Learning Core": {
        "scikit-learn": ">=1.3.0",
        "numpy": ">=1.24.0",
        "pandas": ">=2.0.0",
        "scipy": ">=1.11.0"
    },
    
    "🔄 Model Export & Optimization": {
        "onnx": ">=1.14.0",
        "skl2onnx": ">=1.15.0",
        "onnxruntime": ">=1.16.0",
        "openvino": ">=2023.2.0"
    },
    
    "🧠 Large Language Models": {
        "langchain": ">=0.1.0",
        "langchain-community": ">=0.0.13",
        "transformers": ">=4.36.0",
        "torch": ">=2.1.0",
        "openai": ">=1.3.0"
    },
    
    "📊 Data Visualization": {
        "matplotlib": ">=3.7.0",
        "plotly": ">=5.17.0",
        "seaborn": ">=0.12.0",
        "ipywidgets": ">=8.0.0"
    },
    
    "🌐 Web Interfaces": {
        "gradio": ">=4.0.0",
        "streamlit": ">=1.28.0",
        "flask": ">=2.3.0",
        "fastapi": ">=0.104.0",
        "uvicorn": ">=0.24.0"
    },
    
    "📈 Monitoring & Metrics": {
        "prometheus-client": ">=0.19.0",
        "psutil": ">=5.9.0",
        "py-cpuinfo": ">=9.0.0"
    },
    
    "🔧 Utilities & Infrastructure": {
        "requests": ">=2.31.0",
        "redis": ">=5.0.0",
        "python-dotenv": ">=1.0.0",
        "pyyaml": ">=6.0.0",
        "jsonschema": ">=4.20.0",
        "tqdm": ">=4.66.0"
    },
    
    "🧪 Development & Testing": {
        "pytest": ">=7.4.0",
        "jupyter": ">=1.0.0",
        "ipykernel": ">=6.26.0",
        "nbformat": ">=5.9.0"
    }
}

# Flatten all packages into a single list
ALL_PACKAGES = {}
for category, packages in PACKAGE_CATEGORIES.items():
    ALL_PACKAGES.update(packages)

print("üìã Package Installation Plan")
print("=" * 50)
for category, packages in PACKAGE_CATEGORIES.items():
    print(f"\n{category}:")
    for package, version in packages.items():
        print(f"  • {package} {version}")

print(f"\nüìä Total packages to install: {len(ALL_PACKAGES)}")

print("\n⚠️ Note: Some of these packages commonly cause conflicts: streamlit, transformers,")
print("   torch, openvino, and anything that forces a pydantic upgrade.")
print("   If they fail below, Step 6 retries them, or you can install them manually afterwards.")
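
The same plan can also be written out as a `requirements.txt` file. That lets you reproduce the environment with a single `pip install -r requirements.txt`. The following is a minimal sketch. It uses a two-category excerpt of `PACKAGE_CATEGORIES` so it runs on its own. `plan` and `to_requirements` are hypothetical names.

```python
# Excerpt of the plan above, so this cell runs on its own
plan = {
    "Machine Learning Core": {"scikit-learn": ">=1.3.0", "numpy": ">=1.24.0"},
    "Utilities & Infrastructure": {"requests": ">=2.31.0", "tqdm": ">=4.66.0"},
}

def to_requirements(categories):
    """Render {category: {package: spec}} as requirements.txt text."""
    lines = []
    for category, packages in categories.items():
        lines.append(f"# {category}")          # pip ignores comment lines
        for package, spec in packages.items():
            lines.append(f"{package}{spec}")    # e.g. "numpy>=1.24.0"
    return "\n".join(lines) + "\n"

requirements_text = to_requirements(plan)
print(requirements_text)
# To save it: open("requirements.txt", "w", encoding="utf-8").write(requirements_text)
```

Passing the real `PACKAGE_CATEGORIES` instead of `plan` produces the full file. The emoji in the category names end up only in comment lines.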

## 🛠️ Step 3: Utility Functions for Installation

In [None]:
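from importlib import metadata  # replaces the deprecated pkg_resources API

def check_package_installed(package_name):
    """Return True if a distribution with this name is installed"""
    try:
        metadata.distribution(package_name)
        return True
    except metadata.PackageNotFoundError:
        return False

def get_installed_version(package_name):
    """Return the installed version string, or None if not installed"""
    # Unlike pkg_resources' working_set, this is not a snapshot taken at import
    # time, so packages installed earlier in this kernel session are found too
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None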

def install_package(package_name, version_spec=None, upgrade=False):
    """Install a package using pip"""
    package_spec = f"{package_name}{version_spec}" if version_spec else package_name
    cmd = [sys.executable, "-m", "pip", "install"]
    
    if upgrade:
        cmd.append("--upgrade")
    
    cmd.append(package_spec)
    
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return True, result.stdout
    except subprocess.CalledProcessError as e:
        return False, e.stderr

def check_installation_status():
    """Check which packages are already installed"""
    installed = {}
    missing = {}
    
    for package, version_spec in ALL_PACKAGES.items():
        if check_package_installed(package):
            installed[package] = get_installed_version(package)
        else:
            missing[package] = version_spec
    
    return installed, missing

print("üîß Utility functions loaded successfully!")

# Check for known problematic package versions
print("\nüîç Checking for known version conflicts...")
try:
    import pydantic
    pydantic_version = pydantic.__version__
    print(f"üì¶ Current pydantic version: {pydantic_version}")
    
    if pydantic_version.startswith('1.'):
        print("‚ö†Ô∏è Pydantic v1 detected - this may cause conflicts with modern packages")
        print("   Will upgrade to v2 during installation process")
except ImportError:
    print("üì¶ Pydantic not installed - will install latest version")
except Exception as e:
    print(f"‚ö†Ô∏è Could not check pydantic version: {e}")
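
The pydantic check above imports the package only to read its version. A lighter approach reads the version from the installed metadata instead. This avoids import side effects and works for any distribution name. The following is a sketch. `major_version` and `installed_major_version` are hypothetical helpers.

```python
from importlib import metadata

def major_version(version_string):
    """Return the leading integer of a version string such as '1.10.13'."""
    return int(version_string.split(".")[0])

def installed_major_version(dist_name):
    """Major version of an installed distribution, or None if it is absent."""
    try:
        return major_version(metadata.version(dist_name))
    except metadata.PackageNotFoundError:
        return None

pydantic_major = installed_major_version("pydantic")
if pydantic_major is None:
    print("pydantic is not installed")
elif pydantic_major < 2:
    print("pydantic v1 detected; newer packages may need v2")
else:
    print(f"pydantic v{pydantic_major} detected")
```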

## 📊 Step 4: Check Current Installation Status

In [None]:
print("üîç Checking current package installation status...\n")

installed_packages, missing_packages = check_installation_status()

print("‚úÖ Already Installed Packages:")
print("=" * 40)
if installed_packages:
    for package, version in sorted(installed_packages.items()):
        print(f"  ‚úì {package}: {version}")
else:
    print("  (No packages from our list are currently installed)")

print(f"\n‚ùå Missing Packages ({len(missing_packages)}):")
print("=" * 40)
if missing_packages:
    for package, version_spec in sorted(missing_packages.items()):
        print(f"  ‚úó {package} {version_spec}")
else:
    print("  üéâ All packages are already installed!")

print(f"\nüìà Installation Progress: {len(installed_packages)}/{len(ALL_PACKAGES)} packages ready")

## 🚀 Step 5: Install Missing Packages

We install missing packages one `pip` call at a time, grouped by category. This way a single failure does not stop the rest, and the output stays easy to follow:

In [None]:
# Install missing packages one at a time, grouped by category
installation_results = {}
failed_installations = []

print("🚀 Starting package installation...\n")

for category, packages in PACKAGE_CATEGORIES.items():
    print(f"\n{category}")
    print("=" * len(category))

    for package, version_spec in packages.items():
        if package in missing_packages:
            print(f"  Installing {package}...", end=" ")

            success, output = install_package(package, version_spec)

            if success:
                new_version = get_installed_version(package)
                print(f"✅ Success (v{new_version})")
                installation_results[package] = new_version
            else:
                print("❌ Failed")
                failed_installations.append((package, output))
        else:
            current_version = get_installed_version(package)
            print(f"  {package}: ✓ Already installed (v{current_version})")

print("\n" + "=" * 50)
print("📊 Installation Summary")
print("=" * 50)
print(f"✅ Successfully installed: {len(installation_results)} packages")
print(f"❌ Failed installations: {len(failed_installations)} packages")
print(f"✓ Already installed: {len(installed_packages)} packages")
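
Installing one package per `pip` call means each resolve sees only that package's constraints. A later call can therefore upgrade or downgrade something an earlier call installed, leaving only a dependency-conflict warning.

An alternative is to pass every missing requirement to a single `pip install`. The resolver then satisfies all of them together, or fails up front with a conflict report. The following is a sketch. `build_install_command` and `example_missing` are hypothetical names. The `--dry-run` flag (pip 22.2 and later) only previews the result without installing.

```python
import sys

def build_install_command(requirements, dry_run=False):
    """Build one pip command that installs all requirements in a single resolve."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if dry_run:
        cmd.append("--dry-run")   # pip >= 22.2: resolve and report without installing
    cmd.extend(requirements)
    return cmd

# Illustrative input; in this notebook it could be built from missing_packages:
#   [f"{pkg}{spec}" for pkg, spec in missing_packages.items()]
example_missing = {"plotly": ">=5.17.0", "tqdm": ">=4.66.0"}
cmd = build_install_command([f"{pkg}{spec}" for pkg, spec in example_missing.items()], dry_run=True)
print(" ".join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```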

## 🩺 Step 6: Handle Failed Installations

If any packages failed to install, we'll try alternative methods:

In [None]:
if failed_installations:
    print("🔧 Attempting to fix failed installations...\n")

    # Iterate over a copy: entries are removed from failed_installations inside
    # the loop, and mutating a list while iterating over it skips items
    for package, error_output in list(failed_installations):
        print(f"\n🩺 Diagnosing {package}:")
        print(f"Error: {error_output[:200]}..." if len(error_output) > 200 else f"Error: {error_output}")

        # Keep the version constraint from the plan when retrying
        package_spec = f"{package}{ALL_PACKAGES.get(package, '')}"

        # Try different installation methods
        retry_methods = [
            ("--no-cache-dir", "Using no cache"),
            ("--force-reinstall", "Force reinstall"),
            ("--user", "User installation"),
        ]

        for flag, description in retry_methods:
            print(f"  Trying: {description}...", end=" ")

            cmd = [sys.executable, "-m", "pip", "install", flag, package_spec]
            try:
                subprocess.run(cmd, capture_output=True, text=True, check=True)
                print("✅ Success!")
                failed_installations.remove((package, error_output))
                installation_results[package] = get_installed_version(package)
                break
            except subprocess.CalledProcessError:
                print("❌ Failed")
        else:
            print(f"  ⚠️ Could not install {package}. Manual intervention may be required.")

else:
    print("🎉 No failed installations to handle!")

## ✅ Step 7: Final Verification

Let's verify all packages are properly installed and importable:

In [None]:
print("üîç Final package verification...\n")

# Test critical imports
critical_tests = {
    "scikit-learn": "import sklearn; print(f'sklearn: {sklearn.__version__}')",
    "pandas": "import pandas as pd; print(f'pandas: {pd.__version__}')",
    "numpy": "import numpy as np; print(f'numpy: {np.__version__}')",
    "onnx": "import onnx; print(f'onnx: {onnx.__version__}')",
    "langchain": "import langchain; print(f'langchain: {langchain.__version__}')",
    "gradio": "import gradio as gr; print(f'gradio: {gr.__version__}')",
    "requests": "import requests; print(f'requests: {requests.__version__}')",
    "plotly": "import plotly; print(f'plotly: {plotly.__version__}')"
}

successful_imports = 0
failed_imports = []

print("üì¶ Testing Critical Package Imports:")
print("=" * 40)

for package, test_code in critical_tests.items():
    try:
        exec(test_code)
        successful_imports += 1
    except Exception as e:
        print(f"‚ùå {package}: Import failed - {str(e)}")
        failed_imports.append(package)

print(f"\n‚úÖ Successfully imported: {successful_imports}/{len(critical_tests)} critical packages")

if failed_imports:
    print(f"‚ùå Failed imports: {', '.join(failed_imports)}")
    print("‚ö†Ô∏è You may need to restart the kernel and try again.")
else:
    print("üéâ All critical packages imported successfully!")

## 🧪 Step 8: Environment Health Check

In [None]:
print("ü©∫ Comprehensive Environment Health Check")
print("=" * 50)

# Check system resources
try:
    import psutil
    memory = psutil.virtual_memory()
    print(f"üíæ Memory: {memory.available / (1024**3):.2f} GB available / {memory.total / (1024**3):.2f} GB total")
    print(f"üñ•Ô∏è CPU cores: {psutil.cpu_count()} cores")
    print(f"üìä CPU usage: {psutil.cpu_percent(interval=1):.1f}%")
except ImportError:
    print("‚ö†Ô∏è psutil not available for system monitoring")

# Check Python environment
print(f"\nüêç Python Environment:")
print(f"  Version: {sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}")
print(f"  Executable: {sys.executable}")

# Check pip package count
try:
    import pkg_resources
    installed_packages = [d.project_name for d in pkg_resources.working_set]
    print(f"  Total packages: {len(installed_packages)}")
except:
    print("  Could not count installed packages")

# Test ML capabilities
print(f"\nü§ñ ML Environment Test:")
try:
    import sklearn
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    
    # Quick ML test
    X, y = make_classification(n_samples=100, n_features=10, random_state=42)
    clf = RandomForestClassifier(n_estimators=10, random_state=42)
    clf.fit(X, y)
    score = clf.score(X, y)
    print(f"  ‚úÖ ML pipeline test: {score:.2f} accuracy")
except Exception as e:
    print(f"  ‚ùå ML test failed: {e}")

# Test ONNX capabilities
print(f"\nüîÑ ONNX Environment Test:")
try:
    import onnx
    import skl2onnx
    print(f"  ‚úÖ ONNX conversion ready")
except Exception as e:
    print(f"  ‚ùå ONNX test failed: {e}")

# Test web interface capabilities
print(f"\nüåê Web Interface Test:")
try:
    import gradio as gr
    print(f"  ‚úÖ Gradio ready for dashboard creation")
except Exception as e:
    print(f"  ‚ùå Gradio test failed: {e}")

print("\n" + "=" * 50)
print("üéØ Environment Status Summary")
print("=" * 50)

if successful_imports == len(critical_tests) and not failed_imports:
    print("üü¢ READY: Your environment is fully configured!")
    print("‚úÖ All required packages are installed and working")
    print("‚úÖ ML pipeline capabilities verified")
    print("‚úÖ Ready to proceed to Module 2: Predictive Model")
else:
    print("üü° PARTIAL: Some issues detected")
    print(f"‚ö†Ô∏è {len(failed_imports)} packages need attention")
    print("üìù Please review failed installations above")
    print("üîÑ Consider restarting kernel and running this notebook again")

## 📝 Step 9: Save Installation Report

In [None]:
import json
from datetime import datetime

# Create installation report
report = {
    "timestamp": datetime.now().isoformat(),
    "python_version": f"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}",
    "platform": platform.platform(),
    "total_packages_requested": len(ALL_PACKAGES),
    "packages_already_installed": len(installed_packages),
    "packages_newly_installed": len(installation_results),
    "failed_installations": len(failed_installations),
    "critical_imports_successful": successful_imports,
    "critical_imports_total": len(critical_tests),
    "environment_ready": successful_imports == len(critical_tests) and not failed_imports,
    "installed_packages": dict(sorted({**installed_packages, **installation_results}.items())),
    "failed_packages": [pkg for pkg, _ in failed_installations]
}

# Save report to file
report_file = "installation_report.json"
with open(report_file, 'w') as f:
    json.dump(report, f, indent=2)

print(f"üìä Installation report saved to: {report_file}")
print("\nüìã Quick Summary:")
print(f"  ‚Ä¢ Environment Ready: {'‚úÖ Yes' if report['environment_ready'] else '‚ùå No'}")
print(f"  ‚Ä¢ Total Packages: {report['total_packages_requested']}")
print(f"  ‚Ä¢ Successfully Installed: {report['packages_already_installed'] + report['packages_newly_installed']}")
print(f"  ‚Ä¢ Failed: {report['failed_installations']}")
print(f"  ‚Ä¢ Critical Imports: {report['critical_imports_successful']}/{report['critical_imports_total']}")
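
Because the report is plain JSON, two runs can be compared to see which versions changed, for example before and after a kernel restart. Each run overwrites `installation_report.json`, so copy or rename the earlier file first. The following is a minimal sketch. `diff_installed`, `before`, and `after` are hypothetical names, and the reports shown are illustrative data.

```python
import json

def diff_installed(old_report, new_report):
    """Compare the 'installed_packages' maps of two reports: {pkg: (old, new)}."""
    old = old_report.get("installed_packages", {})
    new = new_report.get("installed_packages", {})
    changes = {}
    for package in sorted(set(old) | set(new)):
        if old.get(package) != new.get(package):
            changes[package] = (old.get(package), new.get(package))  # None = absent
    return changes

# Illustrative reports; in practice load each with json.load(open(path))
before = {"installed_packages": {"numpy": "1.24.0", "pandas": "2.0.0"}}
after = {"installed_packages": {"numpy": "1.26.4", "pandas": "2.0.0", "plotly": "5.18.0"}}
print(json.dumps(diff_installed(before, after), indent=2))
```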

## 🎯 Next Steps

Your Python environment setup is complete! Here's what to do next:

### ✅ If Environment is Ready (Green Status):
1. **Continue to Module 2:** Start working with predictive models
2. **Run verification notebook:** `1-environment/verify_environment.ipynb`
3. **Download datasets:** `1-environment/download_datasets.ipynb`

### ⚠️ If Issues Detected (Yellow Status):
1. **Restart Jupyter kernel:** Kernel → Restart & Clear Output
2. **Re-run this notebook:** Some packages may need fresh imports
3. **Check system resources:** Ensure adequate memory and storage
4. **Contact instructor:** If persistent issues occur

### 📚 Environment Features Now Available:
- ✅ **Machine Learning:** scikit-learn, numpy, pandas
- ✅ **Model Export:** ONNX, OpenVINO support
- ✅ **LLM Integration:** LangChain, transformers
- ✅ **Web Interfaces:** Gradio for interactive dashboards
- ✅ **Monitoring:** Prometheus metrics collection
- ✅ **Data Visualization:** Plotly, matplotlib, seaborn

---

**Ready for the next module?** üöÄ  
**Next:** [Module 2 - Predictive Model Development](../02-predictive-model.md)

---

*Workshop Progress: Module 1 Complete ✅*