# Challenge Activities

This notebook, written in Markdown format, describes all the steps and answers the proposed challenge activities.


### 1 - What steps would you take to solve this problem?

Each of the steps from 1 to 5 presented below will be applied and detailed in a corresponding *Jupyter Notebook* with the respective number.

Steps 6 to 8 are embedded in notebook 5.

**Step 1 - Data Understanding and Cleaning:**
- Load the datasets and understand their structure and contents.
    - This part of Step 1 addresses the multiple-choice questions of the challenge.
- Handle missing values, which are denoted by `na`.
- Generate new cleaned data files for easy access and use.
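A minimal sketch of the missing-value handling, assuming the data is loaded with pandas. The inline CSV, the column names, and the median imputation are illustrative assumptions, not necessarily what the notebook does.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real dataset;
# missing values appear as the literal string "na".
raw_csv = io.StringIO(
    "class,sensor_a,sensor_b\n"
    "neg,1.0,na\n"
    "pos,na,4.0\n"
    "neg,3.0,6.0\n"
)

# na_values tells pandas to parse the "na" marker as NaN instead of text.
df = pd.read_csv(raw_csv, na_values="na")

# Share of missing values per column, useful for deciding what to drop or impute.
missing_share = df.isna().mean()

# Assumed strategy: fill numeric gaps with the column median.
feature_cols = ["sensor_a", "sensor_b"]
df[feature_cols] = df[feature_cols].fillna(df[feature_cols].median())
```

Without `na_values="na"`, the affected columns would be read as text rather than numbers, so the marker is best declared at load time.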

**Step 2 - Exploratory Data Analysis (EDA):**

- Perform descriptive statistics to understand the distribution of data.
- Visualize data to identify patterns, correlations, and anomalies.

**Step 3 - Feature Engineering:**

- Normalize or standardize numerical features if necessary.
- Apply SMOTE (Synthetic Minority Over-sampling Technique) to balance the class distribution by generating synthetic samples for the minority class.
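To illustrate the idea behind SMOTE, here is a simplified sketch in plain Python: each synthetic sample is placed at a random position on the line segment between a minority-class point and one of its nearest minority-class neighbours. The function name `smote_sketch`, the sample points, and the parameter values are hypothetical. In practice a library implementation (for example `SMOTE` from `imbalanced-learn`) would be used, and it should be applied to the training split only.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding the point itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # The new point lies at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical, already standardized minority-class samples (two features each).
minority_samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_samples = smote_sketch(minority_samples, n_new=6)
```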

**Step 4 - Dimensionality Reduction:**

- Use PCA (Principal Component Analysis) and Factor Analysis as techniques to reduce the number of features while retaining most of the variance.
- Use feature selection methods to identify the most important features.
    - Recursive Feature Elimination (RFE)
    - Feature Importance from Tree-based Models (e.g., Random Forest)

**Step 5 - Model Training and Evaluation:**

- Split the data into training and test sets, ensuring the test set reflects the present year's data.
- Train multiple predictive models, such as:
    - KNN Classifier
    - Random Forest
    - Neural Networks
- Use techniques like cross-validation to tune hyperparameters and avoid overfitting.
- Evaluate the models using relevant metrics.
- Select the best-performing model based on evaluation results.

**Step 6 - Interpretability and Insights:**

- Use techniques like SHAP (SHapley Additive exPlanations) or LIME (Local Interpretable Model-agnostic Explanations) to interpret the model's predictions.
- Identify the main factors contributing to air system failures.

**Step 7 - Business Metrics and Recommendations:**

- Translate technical metrics to business metrics to demonstrate cost savings.
- Provide actionable recommendations based on model insights.

**Step 8 - Presentation:**

- Prepare a comprehensive report and presentation for the executive board, highlighting the potential cost savings and the main factors leading to air system failures.

### 2 - Which technical data science metric would you use to solve this challenge?

- **Accuracy:** To measure the proportion of correctly predicted maintenance needs.
- **Precision and Recall:** To balance the trade-off between avoiding false positives (precision) and avoiding false negatives (recall).
- **F1 Score:** To provide a single metric that balances precision and recall (especially the macro average).
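The following sketch shows how these metrics are derived from confusion-matrix counts. The counts are hypothetical and only serve to show how accuracy and the macro-averaged F1 score can differ on an imbalanced dataset.

```python
def per_class_scores(tp, fp, fn):
    """Precision, recall and F1 for one class from its confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts for an imbalanced test set: 1000 trucks, 50 with a defect.
tp, fp, fn, tn = 40, 20, 10, 930

accuracy = (tp + tn) / (tp + fp + fn + tn)

# Positive class: defect present. For the negative class the counts swap roles.
_, _, f1_pos = per_class_scores(tp, fp, fn)
_, _, f1_neg = per_class_scores(tn, fn, fp)

# The macro average weights both classes equally, regardless of their size.
macro_f1 = (f1_pos + f1_neg) / 2
```

With these counts the accuracy is 0.97 while the macro F1 is noticeably lower, because the weaker performance on the rare defect class counts as much as the majority class.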

### 3 - Which business metric would you use to solve the challenge?

- **Cost Savings:** The primary business metric will be the reduction in maintenance costs for the air system.

### 4 - How do technical metrics relate to the business metrics?

- Reducing false negatives (trucks with defects not identified) translates directly to cost savings by avoiding expensive corrective maintenance.
- Improving precision reduces unnecessary preventive maintenance costs, ensuring resources are allocated efficiently.
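A small sketch of this relationship: moving the decision threshold trades false positives against false negatives, and each kind of outcome carries a different cost. The unit costs, scores, and labels below are hypothetical placeholders; the real figures would come from the customer.

```python
# Hypothetical unit costs; the real values come from the customer's maintenance data.
COST_FALSE_POSITIVE = 10    # unnecessary inspection
COST_TRUE_POSITIVE = 25     # preventive repair
COST_FALSE_NEGATIVE = 500   # corrective repair after an undetected failure


def total_cost(scores, labels, threshold):
    """Maintenance cost when every truck with score >= threshold is sent to inspection."""
    cost = 0
    for score, has_defect in zip(scores, labels):
        flagged = score >= threshold
        if flagged and has_defect:
            cost += COST_TRUE_POSITIVE
        elif flagged and not has_defect:
            cost += COST_FALSE_POSITIVE
        elif has_defect:
            cost += COST_FALSE_NEGATIVE
    return cost


# Hypothetical model scores and ground truth (1 = air system defect).
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]

costs = {t: total_cost(scores, labels, t) for t in (0.25, 0.50, 0.75)}
best_threshold = min(costs, key=costs.get)
```

Under these assumed costs the lowest threshold is cheapest: a few extra inspections cost far less than a single missed failure.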

### 5 - What types of analyses would you like to perform on the customer database?

- If we had access to the specific date and time at which the data was collected:
    - Trend analysis of maintenance costs and occurrences.
    - Failure rate analysis over time.
- Correlation analysis between features and air system failures.
- Cost impact analysis of false negatives and false positives.

### 6 - What techniques would you use to reduce the dimensionality of the problem? 

- Principal Component Analysis (PCA)
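A minimal PCA sketch with scikit-learn, assuming standardized numeric features. The synthetic data and the 95% variance target are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 200 samples, 10 correlated features driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(200, 10))

# PCA is scale-sensitive, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# A float n_components keeps the smallest number of components whose
# cumulative explained variance reaches that fraction.
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X_scaled)

explained = pca.explained_variance_ratio_.sum()
```

Choosing the number of components by explained variance avoids hard-coding a dimension before looking at the data.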

### 7 - What techniques would you use to select variables for your predictive model?

- Recursive Feature Elimination (RFE)
- Feature Importance from Tree-based Models (e.g., Random Forest)
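Both techniques can be sketched with scikit-learn as follows. The synthetic dataset, the number of trees, and the choice of keeping three features are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Hypothetical imbalanced dataset: 8 features, only 3 of them informative.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0,
    weights=[0.9, 0.1], random_state=0,
)

forest = RandomForestClassifier(n_estimators=50, random_state=0)

# RFE refits the model repeatedly, dropping the weakest feature each round.
selector = RFE(forest, n_features_to_select=3, step=1).fit(X, y)
selected_idx = [i for i, keep in enumerate(selector.support_) if keep]

# Impurity-based importances from a single fit, sorted from most to least important.
forest.fit(X, y)
ranking = sorted(
    enumerate(forest.feature_importances_), key=lambda pair: pair[1], reverse=True
)
```

RFE is more expensive because it retrains the model once per eliminated feature, while the importance ranking needs a single fit.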

### 8 - What predictive models would you use or test for this problem? 

- **KNN Classifier:** Simple and effective for capturing complex relationships by considering the distance between data points.
- **Random Forest:** For capturing non-linear relationships and feature importance.
- **Gradient Boosting Machines (GBM):** For high-performance prediction. (I am working in a macOS environment, which limits my ability to install and run XGBoost, for example, in the available time.)
- **Neural Networks:** For capturing complex patterns in the data.

### 9 - How would you rate which of the trained models is the best?

- Use of technical metrics:
    - **Macro Average F1-Score**
    - **Confusion Matrix:** To visualize true positives, false positives, true negatives, and false negatives.
- Use of business metrics:
    - **Cost Analysis:** To quantify the cost savings achieved by the model in reducing maintenance expenses.

### 10 - How would you explain the result of your model? Is it possible to know which variables are most important?

It's really important to know which variables are most important, both from a technical side to develop better models and from a business side to correctly alert the company about the true root causes.

- **Feature Importance:** Use model-specific methods to identify which features contribute most to predictions.
- **SHAP Values:** Explain individual predictions by showing the impact of each feature.

### 11 - How would you assess the financial impact of the proposed model?

- **Calculate Savings:** Estimate cost savings by comparing predicted maintenance needs versus actual maintenance costs.
- **Scenario Analysis:** Model different scenarios to see potential cost impacts.
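A sketch of how savings could be estimated across scenarios, comparing the model against a baseline in which every defective truck fails and needs corrective maintenance. All counts and unit costs are hypothetical and would be replaced by the customer's figures.

```python
def model_cost(tp, fp, fn, costs):
    """Cost with the model: preventive repairs, needless inspections, missed failures."""
    return tp * costs["preventive"] + fp * costs["inspection"] + fn * costs["corrective"]


def baseline_cost(tp, fn, costs):
    """Cost without a model: every defective truck needs corrective maintenance."""
    return (tp + fn) * costs["corrective"]


# Hypothetical confusion-matrix counts for one year of operation.
tp, fp, fn = 40, 20, 10

# Hypothetical cost scenarios (currency units per truck).
scenarios = {
    "optimistic": {"inspection": 5, "preventive": 20, "corrective": 600},
    "expected": {"inspection": 10, "preventive": 25, "corrective": 500},
    "pessimistic": {"inspection": 30, "preventive": 60, "corrective": 300},
}

savings = {
    name: baseline_cost(tp, fn, costs) - model_cost(tp, fp, fn, costs)
    for name, costs in scenarios.items()
}
```

Reporting a range of savings rather than a single number makes the estimate's sensitivity to the cost assumptions visible to the executive board.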

### 12 - What techniques would you use to perform the hyperparameter optimization of the chosen model?

- **Grid Search:** Exhaustive search over a specified parameter grid.
- **Random Search:** Randomly samples parameter combinations.
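Both strategies are available in scikit-learn. The sketch below assumes a Random Forest as the chosen model; the parameter grid, the synthetic data, and the macro F1 scoring are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Hypothetical imbalanced dataset standing in for the cleaned training data.
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.9, 0.1], random_state=0
)

param_grid = {"n_estimators": [25, 50], "max_depth": [3, 5, None]}
model = RandomForestClassifier(random_state=0)

# Grid search: evaluates all 2 x 3 = 6 combinations with 3-fold cross-validation.
grid = GridSearchCV(model, param_grid, scoring="f1_macro", cv=3).fit(X, y)

# Random search: evaluates only n_iter randomly drawn combinations from the same space.
rand = RandomizedSearchCV(
    model, param_grid, n_iter=4, scoring="f1_macro", cv=3, random_state=0
).fit(X, y)
```

After fitting, `grid.best_params_` and `rand.best_params_` hold the selected settings. Random search scales better when the parameter space is large, at the price of possibly missing the best combination.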

### 13 - What risks or precautions would you present to the customer before putting this model into production?

- **Model Overfitting:** Ensure the model generalizes well to unseen data.
- **Data Quality:** Monitor and address potential issues with missing or erroneous data.
- **Interpretability:** Ensure the model's decisions can be understood by stakeholders.

### 14 - If your predictive model is approved, how would you put it into production?

This is one of the most critical steps of the project. For this reason, I would follow the steps below:

**Integration:**

- **Model Serving API:** To integrate the model with existing systems, we can build a Model Serving API in Python. A backbone of this structure is available on my GitHub repository for Flask and FastAPI. See the links below:

    - https://github.com/michelhilg/model-serving-flaskAPI
    - https://github.com/michelhilg/model-serving-fastAPI

    This backbone provides a foundation for serving the model and can be customized to meet the specific needs.

**Deployment:**

- **Docker Containers:** For deployment, we can use Docker containers to encapsulate the model and its dependencies. Docker containers ensure that the model runs consistently across different environments, making it easier to manage and scale.

- **Production Environment:** Deploy the Docker containers inside the client's production environment. This approach helps in maintaining compatibility with the existing infrastructure and facilitates seamless integration.

**Documentation:**

- **Model Documentation:** Document the model's assumptions, features, and usage comprehensively. This documentation will be crucial for ensuring that stakeholders understand how the model works and can effectively use it.

### 15 - If the model is in production, how would you monitor it?

- **Performance Tracking:** Regularly check the model's accuracy and other metrics.
- **Feedback Loop:** Collect feedback from users and adjust the model as needed.

For performance tracking, we can use tools such as:

- **Prometheus & Grafana:**

    - **Prometheus:** Collects and stores metrics from your application. It can monitor model performance metrics and alert you based on thresholds.

    - **Grafana:** Visualizes the metrics collected by Prometheus. It allows you to create dashboards and monitor the performance of your model in real time.

    - Both of them can also run inside Docker containers, which fits well with the deployment approach.

- **Custom Tracking:**

    - Build a simple web interface that runs directly within the client's production environment. This approach provides a straightforward and effective solution for monitoring the model, allowing for real-time tracking of performance metrics and easy access to monitoring data.

### 16 - If the model is in production, how would you know when to retrain it?

**1. Performance Drift:**

- **Performance Metrics Monitoring:** Regularly track performance metrics such as accuracy, precision, recall, or F1 score. Retrain the model if these metrics show significant degradation over time.

- **Drift Detection Tools:**

    - **Evidently AI:** Provides tools to detect and visualize performance drift, including data and concept drift.
    - **Deepchecks:** More focused on LLMs, but with a nice set of Python tools for comprehensive model monitoring and validation, especially for data drift.
    - **Custom Alerts:** Set up custom alerts in monitoring tools like Prometheus or Datadog to notify you when performance metrics fall below predefined thresholds.

    Once drift detection shows that the model should be retrained, we can also adopt periodic retraining methods:

- **Periodic Retraining:**

    - **MLflow:** Helps manage the machine learning lifecycle (experiment tracking and model versioning), which supports regular retraining as new data becomes available. The schedule itself would be driven by an external scheduler.
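The threshold-based check described under Performance Metrics Monitoring can be sketched in plain Python. The class name `RecallMonitor`, the window size, and the minimum recall are hypothetical; the sketch also assumes that true labels become available after maintenance outcomes are known.

```python
from collections import deque


class RecallMonitor:
    """Tracks recall over the most recent labelled predictions and flags degradation."""

    def __init__(self, window_size, min_recall):
        self.window = deque(maxlen=window_size)  # (predicted, actual) pairs
        self.min_recall = min_recall

    def record(self, predicted, actual):
        self.window.append((predicted, actual))

    def recall(self):
        # Predictions made for trucks that really had a defect.
        positives = [pred for pred, actual in self.window if actual == 1]
        if not positives:
            return None  # no defective trucks observed in the window yet
        return sum(positives) / len(positives)

    def needs_retraining(self):
        current = self.recall()
        return current is not None and current < self.min_recall


# Hypothetical window size and threshold; tune both to the customer's tolerance.
monitor = RecallMonitor(window_size=5, min_recall=0.8)
for predicted, actual in [(1, 1), (1, 1), (0, 0), (0, 1), (0, 1)]:
    monitor.record(predicted, actual)
```

Here two of the four defective trucks in the window were missed, so the recall is 0.5 and `monitor.needs_retraining()` returns `True`. The same pattern applies to precision or F1.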