<a href="https://colab.research.google.com/github/jiuwong/sfu_AppliedAI_DataAnalytics/blob/main/5_2_model_building_simulation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://sfudial.ca/wp-content/uploads/SFU-DIAL-Logo.png" width=40%>&nbsp;&nbsp;&nbsp;&nbsp;<img src="https://www.sfu.ca/content/dam/sfu/images/brand_extension/SFU-Big-Data_Logo.png" width=40%>

# Lab 5.1: Model Building Simulation

Master AI-enhanced model building techniques using simulation environments. Learn to select, train, and evaluate models with AI support while developing critical thinking skills for model development decisions.

**Use the TODOs and prompt your AI like a teammate. Think critically, experiment often, and document your process.**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/git-steb/18fe403553123ece1b03b1d992273551/5_1_Model_Building_Simulation.ipynb)

## Lab Outline

- **Part 1:** Set up your environment and load the simulated dataset
- **Part 2:** Explore different model types with AI guidance
- **Part 3:** Train models with AI assistance
- **Part 4:** Evaluate and interpret model performance
- **Part 5:** Compare models and select the best approach
- **Part 6 (Deliverable):** Reflection on AI-assisted model building

## Getting Help from Your AI Assistant

**Why AI assistance matters:** AI tools can help you understand model selection criteria, suggest hyperparameters, and interpret results. They're particularly valuable for model building where experimentation and optimization are key.

**Effective AI Prompts for Model Building:**
- "Help me choose the right model type for this problem based on data characteristics and business requirements"
- "What hyperparameters should I tune for this model and what search strategy should I use?"
- "How can I improve this model's performance while maintaining interpretability?"
- "What evaluation metrics should I use for this business context and how do I explain them to stakeholders?"
- "Help me interpret these model results and identify potential issues or improvements"
- "What preprocessing steps would help this model and what's the business justification?"
- "How do I validate this model's performance and ensure it's ready for production?"

**Pro Tip:** Ask "what would a senior data scientist consider" and "how should I validate this model for production" to get more targeted assistance.

## Learning Objectives
- [ ] **Environment Setup**: Configure AI-enhanced model building environment with proper tools and libraries
- [ ] **Model Exploration**: Use AI guidance to select appropriate model types based on data characteristics and business requirements
- [ ] **Model Training**: Train multiple models with AI assistance, including hyperparameter tuning and validation
- [ ] **Model Evaluation**: Evaluate model performance using appropriate metrics and AI-powered interpretation
- [ ] **Model Comparison**: Compare different approaches and select the best model using AI-supported decision-making
- [ ] **Reflection**: Document insights and lessons learned from AI-assisted model building process

## Lab Structure

- **Part 1: Environment Setup** - Configure tools and understand data characteristics
- **Part 2: Model Exploration** - Use AI to select appropriate model types and approaches
- **Part 3: Model Training** - Train models with AI-guided hyperparameter tuning
- **Part 4: Model Evaluation** - Evaluate performance using AI-assisted interpretation
- **Part 5: Model Comparison** - Compare approaches and select the best model with AI support
- **Part 6: Reflection** - Document insights and lessons learned from the AI-assisted process

## Part 1: Environment Setup

**Goal**: Set up your AI-enhanced model building environment with the right tools and libraries.

### Step 1: Install Required Packages

In [None]:
# Install required packages for model building and AI assistance.
# The command is commented out for safety; remove the leading "# " to run it.
# !pip install --quiet pandas numpy scikit-learn matplotlib seaborn plotly xgboost lightgbm

### Step 2: Import Libraries

In [None]:
# Core data science libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Machine learning libraries
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, roc_auc_score
from sklearn.preprocessing import StandardScaler, LabelEncoder

# Advanced ML libraries (optional - install if needed)
# import xgboost as xgb
# import lightgbm as lgb

# Suppress warnings for cleaner output
import warnings
warnings.filterwarnings('ignore')

print("✅ Libraries imported successfully!")
print("📚 Ready for AI-assisted model building")

### Step 3: Load Your Dataset

In [None]:
# Load your dataset here - replace with your actual data source
# df = pd.read_csv('your_dataset.csv')
#
# For this lab, you can use any classification dataset, such as:
# - Credit card default prediction
# - Customer churn analysis
# - Medical diagnosis
# - Fraud detection

# Example with a sample dataset:
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=42)
df = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(20)])
df['target'] = y

print(f"✅ Dataset loaded: {df.shape[0]} samples, {df.shape[1]-1} features")
print(f"📊 Target distribution: {df['target'].value_counts().to_dict()}")
print(f"📈 Data shape: {df.shape}")
print(f"🔍 Missing values: {df.isnull().sum().sum()}")

## Part 2: Model Exploration with AI Guidance

**Goal**: Use AI assistance to explore different model types and select the most appropriate approach for your data and business requirements.

### Step 1: Data Understanding with AI

In [None]:
# TODO: Use AI to understand your dataset
# - Ask AI to analyze data characteristics and suggest appropriate model types
# - Get recommendations for feature engineering and preprocessing
# - Understand business context and success metrics
#
# **AI Prompts for Data Understanding:**
# - "Help me analyze this dataset and identify the best model types for this business problem"
# - "What preprocessing steps should I consider based on the data characteristics?"
# - "How do I determine the most important features for this type of problem?"
# - "What evaluation metrics should I use for this business context?"

### Step 2: Model Type Selection

In [None]:
# TODO: Select model types with AI guidance
# - Compare different model families (linear, tree-based, ensemble, etc.)
# - Consider interpretability vs performance trade-offs
# - Plan for model comparison and validation
#
# **AI Prompts for Model Selection:**
# - "Help me choose between linear models, tree-based models, and ensemble methods for this problem"
# - "What are the trade-offs between interpretability and performance for this business context?"
# - "How do I design a fair comparison between different model types?"
# - "What are the deployment considerations for each model type?"

## Part 3: Model Training with AI Assistance

**Goal**: Train multiple models using AI-guided hyperparameter tuning and validation strategies.

### Step 1: Data Preprocessing

In [None]:
# TODO: Preprocess your data with AI guidance
# - Handle missing values and outliers
# - Encode categorical variables
# - Scale numerical features
# - Split data for training and validation
#
# **AI Prompts for Data Preprocessing:**
# - "Help me handle missing values in this dataset - what strategies should I consider?"
# - "How do I encode categorical variables for this type of model?"
# - "What scaling approach should I use for these features?"
# - "How do I create a proper train/validation split for this problem?"

### Step 2: Model Training

In [None]:
# TODO: Train multiple models with AI assistance
# - Implement different model types (logistic regression, random forest, XGBoost, etc.)
# - Use AI to guide hyperparameter tuning
# - Implement cross-validation for robust evaluation
#
# **AI Prompts for Model Training:**
# - "Help me implement a logistic regression model with proper hyperparameter tuning"
# - "How do I train a random forest model and what parameters should I tune?"
# - "Help me implement XGBoost with appropriate hyperparameter search"
# - "How do I set up cross-validation for fair model comparison?"

## Part 4: Model Evaluation with AI Interpretation

**Goal**: Evaluate model performance using AI-assisted interpretation and business-relevant metrics.

### Step 1: Performance Evaluation

In [None]:
# TODO: Evaluate models with AI assistance
# - Calculate appropriate metrics for your business context
# - Use AI to interpret results and identify issues
# - Compare model performance across different approaches
#
# **AI Prompts for Model Evaluation:**
# - "Help me interpret these model results - what do they tell me about performance?"
# - "What evaluation metrics should I use for this business problem?"
# - "How do I identify if my model is overfitting or underfitting?"
# - "Help me compare these models and identify the best approach"

### Step 2: Model Interpretation

In [None]:
# TODO: Interpret model results with AI guidance
# - Analyze feature importance and model behavior
# - Identify potential biases or issues
# - Prepare results for stakeholder communication
#
# **AI Prompts for Model Interpretation:**
# - "Help me analyze feature importance and explain what drives model predictions"
# - "How do I identify potential biases in this model?"
# - "Help me create visualizations that explain model behavior to stakeholders"
# - "What are the limitations of this model and how should I communicate them?"

## Part 5: Model Comparison and Selection

**Goal**: Compare different approaches and select the best model using AI-supported decision-making.

In [None]:
# TODO: Compare models and select the best approach
# - Create a comprehensive comparison of all models
# - Use AI to help weigh trade-offs between different approaches
# - Document your selection rationale
#
# **AI Prompts for Model Comparison:**
# - "Help me create a comprehensive comparison of these models"
# - "What are the trade-offs between accuracy and interpretability for this business context?"
# - "How do I choose between these models considering deployment requirements?"
# - "Help me document the rationale for my model selection"

## Part 6: Reflection and Documentation

**Goal**: Document insights and lessons learned from the AI-assisted model building process.

In [None]:
# TODO: Reflect on your AI-assisted model building experience
# - Document key insights and lessons learned
# - Identify areas for improvement
# - Prepare summary for stakeholders
#
# **Reflection Questions:**
# - How did AI assistance change your model building approach?
# - What decisions were most challenging and why?
# - How would you apply these techniques to a different problem?
# - What would you do differently next time?
#
# **AI Prompts for Reflection:**
# - "Help me summarize the key insights from this model building process"
# - "What are the most important lessons I learned about AI-assisted model development?"
# - "How do I communicate these results effectively to business stakeholders?"
# - "What should I focus on improving for future model building projects?"

## Quality Checklist

- [ ] I used AI assistance throughout the model building process
- [ ] I trained and compared multiple model types
- [ ] I implemented proper validation and evaluation
- [ ] I interpreted results and identified key insights
- [ ] I documented my approach and rationale
- [ ] I reflected on lessons learned and areas for improvement