# Team Structure and Roles

## Recommended Team Roles (3-4 members)

**Data Scientist/Analyst** - Skylar

-   Lead exploratory data analysis
-   Feature engineering and selection
-   Statistical analysis and hypothesis testing
-   Data visualization and reporting

**ML Engineer** - Cynthia

-   Model development and optimization
-   Hyperparameter tuning and validation
-   Pipeline development and automation
-   Performance optimization

**Business Analyst** - Rebecca

-   PRD and interview analysis
-   Domain expertise development
-   Business requirements translation
-   Stakeholder communication

**Software Engineer** - Curtis

-   Production code development
-   Testing framework implementation
-   Deployment and integration
-   Code quality and documentation


# Technical Requirements

## Phase 1: Exploratory Data Analysis (Week 1-2: Oct 6 - Oct 19)

**Deliverables:**

1.  **Data Exploration Report** (Jupyter notebook) - Skylar

    -   Statistical summary of all input/output variables
    -   Distribution analysis and visualization
    -   Correlation analysis between inputs and outputs
    -   Outlier detection and analysis
    -   Missing data assessment

2.  **Business Logic Hypothesis** (Technical report) - Rebecca

    -   Analysis of PRD and interview transcripts
    -   Proposed business rules and logic patterns
    -   Feature importance hypotheses
    -   Potential non-linear relationships identification

3.  **Feature Engineering Strategy** - Cynthia and Curtis

    -   Derived features (e.g., cost per mile, cost per day)
    -   Interaction terms and polynomial features
    -   Domain-specific transformations
    -   Feature scaling and normalization approaches
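
The Data Exploration Report items (statistical summary, correlation, outliers, missing data) could start from something like the sketch below. It is only an illustration: the column names `days`, `miles`, and `reimbursement` and all values are made-up placeholders, and the IQR rule is one common outlier heuristic, not a project requirement.

```python
import numpy as np
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column summary statistics plus a missing-value count."""
    summary = df.describe().T
    summary["n_missing"] = df.isna().sum()
    return summary


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Tiny made-up table; the column names are hypothetical placeholders.
df = pd.DataFrame({
    "days": [1, 2, 3, 4, 5, 6],
    "miles": [50.0, 120.0, 200.0, 310.0, np.nan, 5000.0],
    "reimbursement": [150.0, 300.0, 450.0, 640.0, 700.0, 900.0],
})

print(summarize(df))                   # statistical summary + missing data
print(df.corr())                       # Pearson correlation between columns
print(df[iqr_outliers(df["miles"])])   # rows flagged as mileage outliers
```

In a notebook, the same helpers can be applied column by column and paired with histograms for the distribution analysis.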

## Week 1 (10/6 - 10/12): Project Setup and Initial Analysis

-   Team formation and role assignment - ALL
-   Repository setup - ALL
-   Data exploration - Skylar
-   Initial PRD and interview analysis - Rebecca
-   Preliminary data insights presentation - Skylar?, ALL?

## Week 2 (10/13 - 10/19): Feature Engineering and Baseline Models

-   Complete EDA (Exploratory Data Analysis) report - Skylar
-   Feature engineering implementation - Skylar
-   Baseline model development - Cynthia, Curtis
-   Business logic hypothesis document - Rebecca
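
As a rough sketch of the Week 2 work, the snippet below adds the two ratio features named in the Feature Engineering Strategy (cost per mile, cost per day) and compares a mean predictor with a plain linear regression. The column names `days`, `miles`, and `cost`, the synthetic data, and the choice to map a zero denominator to 0.0 are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error


def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add ratio features; a zero denominator yields 0.0 instead of inf."""
    out = df.copy()
    out["cost_per_mile"] = (out["cost"] / out["miles"].replace(0, np.nan)).fillna(0.0)
    out["cost_per_day"] = (out["cost"] / out["days"].replace(0, np.nan)).fillna(0.0)
    return out


# Synthetic stand-in data; the real inputs and target come from the project dataset.
rng = np.random.default_rng(0)
n = 200
raw = pd.DataFrame({
    "days": rng.integers(1, 11, size=n),
    "miles": rng.uniform(1, 1000, size=n),
    "cost": rng.uniform(0, 2000, size=n),
})
target = 100 * raw["days"] + 0.5 * raw["miles"] + 0.3 * raw["cost"] + rng.normal(0, 10, size=n)

features = add_derived_features(raw)
X_train, X_test = features.iloc[:150], features.iloc[150:]
y_train, y_test = target.iloc[:150], target.iloc[150:]

# Any later model should beat both of these baselines.
baselines = {"mean": DummyRegressor(strategy="mean"), "linear": LinearRegression()}
baseline_mae = {
    name: mean_absolute_error(y_test, model.fit(X_train, y_train).predict(X_test))
    for name, model in baselines.items()
}
print(baseline_mae)
```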




## Phase 2: Model Development (Week 3-5: Oct 20 - Nov 9)

**Required ML Approaches** (teams must implement at least 4, chosen from across the categories below) - ALL

1.  **Linear Regression Variants**

    -   Simple linear regression
    -   Ridge/Lasso regression with regularization
    -   Polynomial regression

2.  **Tree-Based Methods**

    -   Decision trees with interpretability analysis
    -   Random Forest with feature importance
    -   Gradient Boosting (XGBoost, LightGBM)

3.  **Advanced Techniques**

    -   Support Vector Regression
    -   Neural Networks (MLPs)
    -   Ensemble methods (stacking, voting)

4.  **Rule-Based Learning**

    -   Decision rule extraction
    -   Association rule mining
    -   Symbolic regression (optional bonus)
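
One way to cover four approaches across the categories is to put one candidate from each into a single comparison loop. The sketch below uses scikit-learn only, on synthetic data; the specific models, settings, and feature names `x0`..`x2` are illustrative assumptions, and the shallow tree with `export_text` is just one simple form of decision rule extraction.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic data with a linear part and one threshold effect.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(300, 3))
y = 50 * X[:, 0] + 5 * X[:, 1] + np.where(X[:, 2] > 5, 100, 0) + rng.normal(0, 5, 300)

candidates = {
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),                   # linear variant
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),   # tree-based
    "svr": make_pipeline(StandardScaler(), SVR(C=100.0)),                         # advanced technique
    "shallow_tree": DecisionTreeRegressor(max_depth=3, random_state=0),          # rule-based
}

# 5-fold cross-validated MAE for every candidate (lower is better).
scores = {}
for name, model in candidates.items():
    neg_mae = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    scores[name] = -neg_mae.mean()
print(scores)

# Readable if/else rules from the shallow tree.
rules = export_text(candidates["shallow_tree"].fit(X, y), feature_names=["x0", "x1", "x2"])
print(rules)
```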

**Model Evaluation Framework** - Cynthia, Curtis, and Skylar

-   Cross-validation strategies (time-series aware if applicable)
-   Multiple evaluation metrics (MAE, RMSE, accuracy within thresholds)
-   Overfitting detection and prevention
-   Model interpretability analysis
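
The three metrics listed above can be computed together in one helper. This is a minimal sketch: the function name `evaluate` and the threshold values 0.01 and 1.0 are assumptions chosen for illustration, not values prescribed by the project.

```python
import numpy as np


def evaluate(y_true, y_pred, thresholds=(0.01, 1.0)):
    """Return MAE, RMSE, and the share of predictions within each threshold."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_true - y_pred)
    report = {
        "mae": float(abs_err.mean()),
        "rmse": float(np.sqrt((abs_err ** 2).mean())),
    }
    for t in thresholds:
        # Fraction of cases whose absolute error is at most t.
        report[f"within_{t}"] = float((abs_err <= t).mean())
    return report


report = evaluate([100, 200, 300, 400], [100, 200.5, 303, 400])
print(report)
```

Reporting the same dictionary for training and validation folds makes an overfitting gap easy to spot.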

## Week 3-4 (10/20 - 11/2): Advanced Model Development

-   Implementation of required ML approaches - Cynthia
-   Model comparison and evaluation - Cynthia, Skylar?
-   Hyperparameter optimization - Cynthia
-   Mid-project progress presentation - Rebecca
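
For the hyperparameter optimization item, a small grid search is often enough to get started. The sketch below uses scikit-learn's `GradientBoostingRegressor` as a stand-in for XGBoost or LightGBM; the grid values and synthetic data are assumptions. Requesting training scores as well gives a quick overfitting check.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data with one non-linear term.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 40 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 3, 200)

param_grid = {
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [50, 100],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="neg_mean_absolute_error",
    return_train_score=True,  # keep training scores to compare against validation
)
search.fit(X, y)

best = search.best_index_
train_mae = -search.cv_results_["mean_train_score"][best]
val_mae = -search.cv_results_["mean_test_score"][best]
print(search.best_params_)
print(f"train MAE {train_mae:.2f} vs validation MAE {val_mae:.2f}")
```

A large gap between the two MAE values suggests the chosen configuration is fitting noise.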

## Week 5-6 (11/3 - 11/16): Model Refinement and Ensemble

-   Ensemble method development - Cynthia, Curtis
-   Model interpretability analysis - Skylar
-   Production code implementation - Curtis
-   Comprehensive testing - Curtis
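
Ensemble method development could begin with the two scikit-learn ensembles named in Phase 2, voting and stacking. The base models, the Ridge meta-learner, and the synthetic data below are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data with an interaction term that a linear model alone misses.
rng = np.random.default_rng(7)
X = rng.uniform(0, 10, size=(300, 3))
y = 30 * X[:, 0] + 4 * X[:, 1] * X[:, 2] + rng.normal(0, 5, 300)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

base_models = [
    ("lin", LinearRegression()),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
]
ensembles = {
    # Voting averages the base predictions.
    "voting": VotingRegressor(estimators=base_models),
    # Stacking fits a meta-model on out-of-fold base predictions.
    "stacking": StackingRegressor(estimators=base_models, final_estimator=Ridge(), cv=5),
}

ensemble_mae = {
    name: mean_absolute_error(y_test, model.fit(X_train, y_train).predict(X_test))
    for name, model in ensembles.items()
}
print(ensemble_mae)
```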



## Phase 3: System Integration (Week 6-7: Nov 10 - Nov 23)

**Implementation Requirements**:

1.  **Production-Ready Code** - Curtis

    -   Script must take exactly 3 parameters and output a single number
    -   Must run in under 5 seconds per test case
    -   Must work without external dependencies (no network calls, databases, etc.)
    -   Must include error handling and input validation

2.  **Model Pipeline** - Cynthia

    -   Feature preprocessing pipeline
    -   Model ensemble or selection logic
    -   Post-processing and rounding logic
    -   Comprehensive testing framework

3.  **Documentation** - Cynthia and Curtis

    -   Code documentation and comments
    -   Model architecture description
    -   Feature engineering rationale
    -   Deployment instructions
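
The production-code requirements (exactly 3 parameters, a single number as output, input validation, no external dependencies) might be met by a script shaped like the sketch below. The parameter names `days`, `miles`, and `cost`, the non-negativity check, the two-decimal rounding, and the placeholder formula in `predict` are all assumptions; the real model would replace `predict`.

```python
import math
import sys


def parse_inputs(argv):
    """Validate the raw argument list and convert it to three floats."""
    if len(argv) != 3:
        raise ValueError(f"expected exactly 3 arguments, got {len(argv)}")
    values = []
    for raw in argv:
        value = float(raw)  # raises ValueError on non-numeric text
        if not math.isfinite(value) or value < 0:
            raise ValueError(f"argument must be a finite, non-negative number: {raw!r}")
        values.append(value)
    return tuple(values)


def predict(days, miles, cost):
    """Placeholder model: the coefficients are invented for illustration."""
    return 100.0 * days + 0.5 * miles + 0.8 * cost


def main(argv):
    """Print one number on success; print an error and return 1 otherwise."""
    try:
        days, miles, cost = parse_inputs(argv)
    except ValueError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    print(f"{predict(days, miles, cost):.2f}")
    return 0


# In the deployed script, the entry point would be:
#     sys.exit(main(sys.argv[1:]))
exit_code = main(["3", "100", "50"])
```

Keeping `parse_inputs` and `predict` as separate functions lets the testing framework exercise validation and prediction without spawning a process.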

## Week 7 (11/17 - 11/23): Integration and Validation

-   Final model validation and testing - Cynthia, Curtis
-   Performance optimization - Cynthia, Curtis
-   Documentation completion - Rebecca
-   Practice presentation - Rebecca



## Phase 4: Business Communication (Week 8: Nov 24 - Nov 30)

**Final Deliverables**:

1.  **Technical Report** (15-20 pages) - ALL (Rebecca can compile much of it from evidence and input provided by everyone else)

    -   Executive summary for business stakeholders - Rebecca
    -   Methodology and approach description - ALL?
    -   Model performance analysis and comparison - Cynthia and Curtis
    -   Business insights and discovered patterns - Skylar and Rebecca
    -   Recommendations for system improvement - ALL

2.  **Business Presentation** (20 minutes + Q&A) - Rebecca

    -   Problem context and approach
    -   Key findings and model insights
    -   Explanation of legacy system behavior
    -   Recommendations for SomeName, LLC.

3.  **Code Repository** - ALL

    -   Complete, documented codebase
    -   Reproducible analysis notebooks (Quarto/RMarkdown)
    -   Model artifacts and evaluation results
    -   README with setup and usage instructions

## Week 8 (11/24 - 11/30): Final Presentation and Submission

-   Final business presentation - Rebecca
-   Technical report submission - 
-   Code repository finalization
-   Peer evaluation and reflection



# Weekly Milestones

## Week 1 (10/6 - 10/12): Project Setup and Initial Analysis

-   Team formation and role assignment
-   Repository setup and data exploration
-   Initial PRD and interview analysis
-   Preliminary data insights presentation

## Week 2 (10/13 - 10/19): Feature Engineering and Baseline Models

-   Complete EDA report
-   Feature engineering implementation
-   Baseline model development
-   Business logic hypothesis document

## Week 3-4 (10/20 - 11/2): Advanced Model Development

-   Implementation of required ML approaches
-   Model comparison and evaluation
-   Hyperparameter optimization
-   Mid-project progress presentation

## Week 5-6 (11/3 - 11/16): Model Refinement and Ensemble

-   Ensemble method development
-   Model interpretability analysis
-   Production code implementation
-   Comprehensive testing

## Week 7 (11/17 - 11/23): Integration and Validation

-   Final model validation and testing
-   Performance optimization
-   Documentation completion
-   Practice presentation

## Week 8 (11/24 - 11/30): Final Presentation and Submission

-   Final business presentation
-   Technical report submission
-   Code repository finalization
-   Peer evaluation and reflection

# Advanced Extensions (Bonus Points)

## Interpretable AI Challenge

-   Implement SHAP or LIME for model interpretability
-   Develop custom visualization for business rule explanation
-   Create decision tree surrogate models
-   Extract symbolic rules from ensemble models
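
A decision tree surrogate can be sketched without extra libraries: fit a shallow tree to the predictions of the ensemble, then measure how closely it tracks them. In the sketch below, the random forest "black box", the depth limit of 3, the feature names, and the synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic data with a threshold rule on the first feature.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(400, 3))
y = np.where(X[:, 0] > 5, 200.0, 50.0) + 10 * X[:, 1] + rng.normal(0, 2, 400)

black_box = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
box_pred = black_box.predict(X)

# The surrogate learns to imitate the ensemble's output, not the raw target.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, box_pred)

# Fidelity: how much of the ensemble's behavior the surrogate reproduces.
fidelity = r2_score(box_pred, surrogate.predict(X))
print(f"surrogate fidelity (R^2): {fidelity:.3f}")
print(export_text(surrogate, feature_names=["x0", "x1", "x2"]))
```

The printed rules are only trustworthy as an explanation when fidelity is high; a low value means the surrogate is too simple for the model it describes.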

## Time Series Analysis

-   Investigate temporal patterns in reimbursement policies
-   Implement time-aware cross-validation
-   Analyze policy changes over the 60-year period
-   Develop change-point detection algorithms
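
Time-aware cross-validation can be sketched with scikit-learn's `TimeSeriesSplit`, which always trains on earlier rows and validates on later ones. This assumes the records can be sorted chronologically, which the dataset may or may not allow; the data and the drift term below are synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic data; rows are assumed to be ordered oldest to newest.
rng = np.random.default_rng(3)
n = 120
X = rng.uniform(0, 10, size=(n, 3))
y = 20 * X[:, 0] + 0.5 * np.arange(n) + rng.normal(0, 2, n)  # slow drift over time

splitter = TimeSeriesSplit(n_splits=5)
fold_mae = []
fold_bounds = []
for train_idx, test_idx in splitter.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_mae.append(mean_absolute_error(y[test_idx], model.predict(X[test_idx])))
    # Record the last training row and the first validation row of each fold.
    fold_bounds.append((int(train_idx.max()), int(test_idx.min())))

print(fold_bounds)
print(fold_mae)
```

Because the model never sees the drift term directly, validation error here reflects how a policy change over time would hurt a model trained only on older records.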

## Automated Machine Learning

-   Implement automated feature selection
-   Develop automated model selection pipeline
-   Create automated hyperparameter optimization
-   Build ensemble model selection framework
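
Automated feature selection and hyperparameter search can share one pipeline, so the number of kept features is tuned like any other setting. The sketch below is one possible arrangement using `SelectKBest` with a Ridge model; the grid values and the synthetic data (two informative columns, three noise columns) are assumptions.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data: only the first two of five columns affect the target.
rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=(250, 5))
y = 10 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 1, 250)

pipe = Pipeline([
    ("select", SelectKBest(score_func=f_regression)),
    ("model", Ridge()),
])
grid = {"select__k": [1, 2, 3, 5], "model__alpha": [0.1, 1.0, 10.0]}

# Selection happens inside each CV fold, so validation rows never influence it.
auto_search = GridSearchCV(pipe, grid, cv=5, scoring="neg_mean_absolute_error")
auto_search.fit(X, y)

kept = auto_search.best_estimator_.named_steps["select"].get_support()
print(auto_search.best_params_)
print("kept feature mask:", kept)
```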

## Business Intelligence Dashboard

-   Create interactive dashboard for business users
-   Implement real-time prediction interface
-   Develop what-if scenario analysis tools
-   Build model performance monitoring system

# Resources and Support

## Technical Resources

-   Course lecture materials on supervised learning (ECU Canvas)
-   Scikit-learn, XGBoost, and TensorFlow documentation (online)
-   Jupyter notebook templates and examples
-   Sample code for model evaluation and interpretation

## Business Context

-   Travel and expense management industry resources
-   Corporate reimbursement policy examples
-   Business analysis and requirements gathering guides
-   Stakeholder communication best practices

## Collaboration Tools

-   GitHub repository templates
-   Slack workspace for team communication (or another tool such as Jira or Taiga)
-   Peer review and feedback sessions