# Bikeshare Submission

## Packages Used

In [None]:
import sys
sys.path.append('..')

import pandas as pd
import os
from src import DataLoader

You are provided hourly rental data spanning two years. For this competition, the training set is comprised of the first 19 days of each month, while the test set is the 20th to the end of the month. You must predict the total count of bikes rented during each hour covered by the test set, using only information available prior to the rental period.
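The split rule described above can be sketched with the standard library alone. This is a minimal illustration, not part of the competition tooling: `split_name` is a hypothetical helper, and January 2011 is chosen only as an example month. The real data may be missing some hours, so actual row counts can be lower.

```python
from datetime import datetime, timedelta

# Hypothetical helper: classify an hourly timestamp the way the
# competition description does (days 1-19 -> train, day 20 onward -> test).
def split_name(ts: datetime) -> str:
    return "train" if ts.day <= 19 else "test"

# One month of hourly timestamps (January 2011, for illustration only).
start = datetime(2011, 1, 1)
hours = [start + timedelta(hours=h) for h in range(31 * 24)]

train_hours = [ts for ts in hours if split_name(ts) == "train"]
test_hours = [ts for ts in hours if split_name(ts) == "test"]

# 19 days * 24 hours for training, the remaining 12 days * 24 hours for testing.
print(len(train_hours), len(test_hours))
```

Because every test hour follows the training hours of the same month, a model cannot rely on rentals observed later in that month.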

Data Fields
- datetime - hourly date + timestamp  
- season 
    - 1 = spring, 
    - 2 = summer, 
    - 3 = fall, 
    - 4 = winter 
- holiday - whether the day is considered a holiday
- workingday - whether the day is neither a weekend nor holiday
- weather 
    - 1: Clear, Few clouds, Partly cloudy
    - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- temp - temperature in Celsius
- atemp - "feels like" temperature in Celsius
- humidity - relative humidity
- windspeed - wind speed
- casual - number of non-registered user rentals initiated
- registered - number of registered user rentals initiated
- count - number of total rentals <- PREDICTION

# Training without Using Data Engineering

In [None]:
from autogluon.tabular import TabularPredictor

## Extracting Training and Testing Data

In [None]:
loader = DataLoader()

loader.load_feature_engineered("no_data_engineering")

# Load the raw train and test data without any changes
train_df, test_df = loader.get_train_test_data()

## - Training without New Features

In [None]:
predictor = TabularPredictor(
        label='count',
        path='autogluon',
        eval_metric='root_mean_squared_error',
    ).fit(
        train_df,
        time_limit=600,
        presets='best_quality'
    )

## - Showing the Leaderboard

In [None]:
try:
    display(predictor.leaderboard(silent=True))
except Exception:
    # Fall back to the predictor saved on disk, e.g. after a kernel restart
    predictor = TabularPredictor.load('autogluon')
    display(predictor.leaderboard(silent=True))

## - Predictions on Test Data

In [None]:
predictions = predictor.predict(test_df)
predictions.head()

In [None]:
# Identifying negative predictions
predictions.describe()

In [None]:
# Counting negative predictions
negative_prediction_count = (predictions < 0).sum()

print(f"There are {negative_prediction_count} negative predictions to set to zero.")

## - Setting up for Submission

In [None]:
# Setting negative predictions to zero
predictions[predictions < 0] = 0

predictions.describe()
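As an alternative to the boolean-mask assignment above, pandas offers `Series.clip`, which returns a new Series and leaves the original untouched. The sketch below uses made-up values rather than the notebook's real predictions.

```python
import pandas as pd

preds = pd.Series([-3.2, 0.0, 15.7, -0.1, 42.0])  # made-up predictions

# Count the negatives first, then clip them to zero.
negative_count = int((preds < 0).sum())
clipped = preds.clip(lower=0)  # returns a new Series; preds is unchanged

print(negative_count)
print(clipped.tolist())
```

Keeping the unclipped Series around can be handy when comparing how many negative values each model produces.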

In [None]:
submission_df = pd.read_csv('../data/sampleSubmission.csv')
display(submission_df.head())
submission_df.shape

In [None]:
submission_df['count'] = predictions
submission_df.to_csv('submission.csv', index=False)
display(submission_df.head())
submission_df.shape
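One detail worth keeping in mind for the assignment above: when a pandas Series is assigned to a DataFrame column, values are matched by index label, not by position. If `test_df` keeps the default integer index this makes no difference, but whether the project's `DataLoader` preserves it is an assumption. The toy example below, with made-up values, shows what happens when the indexes differ.

```python
import pandas as pd

# Toy stand-ins for the sample submission and the predictions (values are made up).
submission = pd.DataFrame({"datetime": ["a", "b", "c"], "count": [0, 0, 0]})
preds = pd.Series([10.0, 20.0, 30.0], index=[5, 6, 7])  # non-default index

# Label-based assignment: no index labels match, so every row becomes NaN.
aligned = submission.copy()
aligned["count"] = preds

# Positional assignment: values are copied in order, ignoring the index.
positional = submission.copy()
positional["count"] = preds.to_numpy()

print(aligned["count"].isna().all())
print(positional["count"].tolist())
```

Checking `submission_df['count'].isna().sum()` before writing the CSV is a cheap way to catch this kind of misalignment.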

## - Submitting Initial Predictions

In [None]:
# Getting the best model name
best_model = predictor.model_best
best_model

In [None]:
# Already submitted
# !kaggle competitions submit -c bike-sharing-demand -f submission.csv -m "first submission with {best_model}"

In [None]:
!kaggle competitions submissions -c bike-sharing-demand

# Training with Feature Engineering

In [None]:
import sys
sys.path.append('..')

from src import DataLoader

## - Extract Feature Engineered Data

In [None]:
loader = DataLoader()

loader.load_feature_engineered("no_data_engineering")
# Load the raw train and test data, then cast the season and weather columns to categorical
loader.set_as_category(columns=["season", "weather"])
train_df, test_df = loader.get_train_test_data()

In [None]:
train_df.dtypes

## - Re-training the model with categorical columns

In [None]:
predictor_new_features = TabularPredictor(
    label='count',
    path='autogluon-new-features',
    eval_metric='root_mean_squared_error',
).fit(
    train_df,
    time_limit=600,
    presets='best_quality'
)

## - Showing the Leaderboard

In [None]:
try:
    display(predictor_new_features.leaderboard(silent=True))
except Exception:
    # Fall back to the predictor saved on disk, e.g. after a kernel restart
    predictor_new_features = TabularPredictor.load('autogluon-new-features')
    display(predictor_new_features.leaderboard(silent=True))

## - Predictions on Test Data

In [None]:
predictions_new_features = predictor_new_features.predict(test_df)

## - Setting up for Submission

In [None]:
# Setting negative predictions to zero
predictions_new_features[predictions_new_features < 0] = 0

In [None]:
submission_new_features_df = pd.read_csv('../data/sampleSubmission.csv')
submission_new_features_df['count'] = predictions_new_features
submission_new_features_df.to_csv('submission-new-features.csv', index=False)

In [None]:
# Compare old and new submissions
submission_df = pd.read_csv('submission.csv')
submission_new_features_df = pd.read_csv('submission-new-features.csv')

submission_df.merge(submission_new_features_df, on='datetime', suffixes=('_old', '_new'))

## - Submitting New Predictions

In [None]:
best_model = predictor_new_features.model_best
best_model

In [None]:
# Already submitted
# !kaggle competitions submit -c bike-sharing-demand -f submission-new-features.csv -m "new features with {best_model}"

In [None]:
!kaggle competitions submissions -c bike-sharing-demand

# Training with Hyperparameter Tuning
The AutoGluon documentation linked below shows how to train a model with hyperparameter tuning.

Documentation: https://auto.gluon.ai/stable/api/autogluon.tabular.TabularPredictor.fit.html

In [None]:
import sys
sys.path.append('..')

from src import DataLoader

## - Transforming Features as Categories

In [None]:
# Converting to categorical for better performance
loader = DataLoader()

loader.load_feature_engineered(checkpoint_name='hyperparameter_tuning')
loader.set_as_category(columns=["season", "weather"])

train_df, test_df = loader.get_train_test_data()

train_df.dtypes

## - Creating a Validation Set for Local Scoring

In [None]:
from sklearn.model_selection import train_test_split

train_val_df, val_df = train_test_split(train_df, test_size=0.2, random_state=42)

print(f""" 
Train shape: {train_val_df.shape}
Validation shape: {val_df.shape}    
""")

train_val_df.head()

In [None]:
val_df.head()

In [None]:
test_df.head()

## - Re-train with One Hot Encoding & Hyperparameter Tuning

In [None]:
from autogluon.tabular.configs.hyperparameter_configs import get_hyperparameter_config
from autogluon.tabular import TabularPredictor
from autogluon.common import space

In [None]:
hyperparameters = {
    'NN_TORCH': {'num_epochs': 10, 'activation': 'relu', 'dropout_prob': space.Real(0.0, 0.5)},
    'GBM': {'num_boost_round': 1000, 'learning_rate': space.Real(0.01, 0.1, log=True)},
    'XGB': {'n_estimators': 1000, 'learning_rate': space.Real(0.01, 0.1, log=True)}
}


hyper_timeout = 1 * 60  # seconds
time_limit = 3 * 60
print(f"Hyperparameter optimization time: {hyper_timeout/60} minutes")
print(f"Time limit: {time_limit/60} minutes")

# Custom hyperparameter tuning configuration
hyperparameter_tune_kwargs = {
    'num_trials': 20,  # Number of trials to run
    'scheduler': 'local',  # Scheduler to use for parallel training
    'searcher': 'bayes',  # Searcher to use for hyperparameter optimization
    'time_out': hyper_timeout,  # Time limit in seconds for each call to the ML model
}

predictor_new_hpo = TabularPredictor(
    label='count',
    path='autogluon-new-hpo',
    eval_metric='root_mean_squared_error'    
)

predictor_new_hpo.fit(
    train_val_df,
    time_limit=time_limit,
    presets='best_quality',
    hyperparameters=hyperparameters,
    hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,
    num_cpus=6,
    num_gpus=1,
    num_stack_levels=3,
    verbosity=1
)
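The `log=True` search spaces above ask for learning rates to be explored on a logarithmic scale. The sketch below illustrates that idea with the standard library only. It is not AutoGluon's internal sampler, and `sample_log_uniform` is a hypothetical helper written for this example.

```python
import math
import random

def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(42)
samples = [sample_log_uniform(0.01, 0.1, rng) for _ in range(10_000)]

# Roughly half of the draws fall below the geometric mean sqrt(0.01 * 0.1),
# about 0.0316. A plain uniform draw on [0.01, 0.1] would put only about
# 24% of the values there.
geometric_mean = math.sqrt(0.01 * 0.1)
share_below = sum(s < geometric_mean for s in samples) / len(samples)
print(round(share_below, 2))
```

Sampling on a log scale spends as many trials between 0.01 and 0.03 as between 0.03 and 0.1, which suits parameters such as learning rates whose effect is multiplicative.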

In [None]:
predictor_new_hpo.leaderboard(silent=True)

In [None]:
performance = predictor_new_hpo.evaluate(val_df)
performance

In [None]:
predictions_new_hpo = predictor_new_hpo.predict(val_df)

In [None]:
# Replace negative predictions with zero
predictions_new_hpo[predictions_new_hpo < 0] = 0

In [None]:
# Calculating scores of predictions
from sklearn.metrics import mean_squared_log_error

mean_squared_log_error(val_df["count"], predictions_new_hpo)
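Kaggle scores this competition with the root mean squared logarithmic error (RMSLE), whereas `mean_squared_log_error` returns the squared value, so its square root is the number comparable to the public leaderboard. The sketch below spells out the metric with the standard library. The `rmsle` function and the sample values are illustrative only.

```python
import math

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error using log(1 + x)."""
    squared_log_errors = [
        (math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))

# Made-up values; zeros are valid because log1p(0) == 0.
y_true = [0, 10, 100]
y_pred = [0, 12, 90]
score = rmsle(y_true, y_pred)
print(round(score, 4))
```

With scikit-learn already imported, the equivalent is `np.sqrt(mean_squared_log_error(y_true, y_pred))`, assuming NumPy is imported as `np`.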

## - Loading the Best Model and Predicting on Test Data

In [None]:
best_model = predictor_new_hpo.model_best
print(f"The best model is {best_model}")

In [None]:
saved_predictor = TabularPredictor.load('autogluon-new-hpo')
saved_predictor.leaderboard(silent=True)

In [None]:
hyper_tunning_prediction_df = saved_predictor.predict(test_df)

# Replace negative predictions with zero
hyper_tunning_prediction_df[hyper_tunning_prediction_df < 0] = 0

## - Submitting Fine Tuned Predictions

In [None]:
# submission
submission_hyper_tunning_df = pd.read_csv('../data/sampleSubmission.csv')
submission_hyper_tunning_df['count'] = hyper_tunning_prediction_df
submission_hyper_tunning_df.to_csv('submission-hyper-tunning.csv', index=False)

!kaggle competitions submit -c bike-sharing-demand -f submission-hyper-tunning.csv -m "hyperparameter tuning with {best_model}"

In [None]:
!kaggle competitions submissions -c bike-sharing-demand

# Working: Training with Hyperparameter Tuning & Extra Features

In [None]:
import sys
sys.path.append('..')

from src import DataLoader

## - Loading the Data

In [None]:
loader = DataLoader()

In [None]:
loader.load_feature_engineered("extra_feature_engineering")
loader.set_as_category(columns=["season", "weather"])

train_df, test_df = loader.get_train_test_data()

## - Creating Validation Set for Local Scoring

In [None]:
from sklearn.model_selection import train_test_split

train_val_df, val_df = train_test_split(train_df, test_size=0.2, random_state=42)

print(f""" 
Train shape: {train_val_df.shape}
Validation shape: {val_df.shape}  
Test shape: {test_df.shape}  
""")

train_val_df.head()

## - Re-train with Extra Features & Hyperparameter Tuning

In [None]:
from autogluon.tabular.configs.hyperparameter_configs import get_hyperparameter_config
from autogluon.tabular import TabularPredictor
from autogluon.common import space

In [None]:
hyperparameters = {
    'NN_TORCH': {'num_epochs': 10, 'activation': 'relu', 'dropout_prob': space.Real(0.0, 0.5)},
    'GBM': {'num_boost_round': 1000, 'learning_rate': space.Real(0.01, 0.1, log=True)},
    'XGB': {'n_estimators': 1000, 'learning_rate': space.Real(0.01, 0.1, log=True)}
}


hyper_timeout = 1 * 60  # seconds
# time_limit = 15 * 60  # disabled: fit() below runs without a time limit
print(f"Hyperparameter optimization time: {hyper_timeout/60} minutes")
print("Time limit: none (time_limit is not passed to fit)")

# Custom hyperparameter tuning configuration
hyperparameter_tune_kwargs = {
    'num_trials': 20,  # Number of trials to run
    'scheduler': 'local',  # Scheduler to use for parallel training
    'searcher': 'bayes',  # Searcher to use for hyperparameter optimization
    'time_out': hyper_timeout,  # Time limit in seconds for each call to the ML model
}

predictor_extra_hpo = TabularPredictor(
    label='count',
    path='autogluon-extra-hpo',
    eval_metric='root_mean_squared_error'    
)

predictor_extra_hpo.fit(
    train_val_df,
    # time_limit=time_limit,
    presets='best_quality',
    hyperparameters=hyperparameters,
    hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,
    num_cpus=6,
    num_gpus=1,
    num_stack_levels=3,
    verbosity=1
)

In [None]:
predictor_extra_hpo.leaderboard(silent=True)

In [None]:
performance = predictor_extra_hpo.evaluate(val_df)
performance

In [None]:
predictions_extra_hpo = predictor_extra_hpo.predict(val_df)

In [None]:
# Replace negative predictions with zero
predictions_extra_hpo[predictions_extra_hpo < 0] = 0

In [None]:
# Calculating scores of predictions
from sklearn.metrics import mean_squared_log_error

mean_squared_log_error(val_df["count"], predictions_extra_hpo)

## - Loading the Best Model and Predicting on Test Data

In [None]:
best_model = predictor_extra_hpo.model_best
print(f"The best model is {best_model}")

In [None]:
saved_predictor = TabularPredictor.load('autogluon-extra-hpo')
saved_predictor.leaderboard(silent=True)

In [None]:
hyper_tunning_prediction_df = saved_predictor.predict(test_df)

# Replace negative predictions with zero
hyper_tunning_prediction_df[hyper_tunning_prediction_df < 0] = 0

## - Submitting Fine Tuned Predictions

In [None]:
import pandas as pd

In [None]:
# submission
submission_hyper_tunning_df = pd.read_csv('../data/sampleSubmission.csv')
submission_hyper_tunning_df['count'] = hyper_tunning_prediction_df
submission_hyper_tunning_df.to_csv('submission-hyper-tunning.csv', index=False)

!kaggle competitions submit -c bike-sharing-demand -f submission-hyper-tunning.csv -m "hyperparameter tuning with extra features {best_model}"

In [None]:
!kaggle competitions submissions -c bike-sharing-demand

# Including Custom Models on Hyperparameter Tuning

In [None]:
import sys
sys.path.append('..')

from src import DataLoader

## - Loading the data

In [None]:
loader = DataLoader()

In [None]:
loader.load_feature_engineered(checkpoint_name='hyperparameter_tuning')
loader.set_as_category(columns=["season", "weather"])

train_df, test_df = loader.get_train_test_data()

## - Creating Validation Set for Local Scoring

In [None]:
from sklearn.model_selection import train_test_split

train_val_df, val_df = train_test_split(train_df, test_size=0.2, random_state=42)

print(f""" 
Train shape: {train_val_df.shape}
Validation shape: {val_df.shape}  
Test shape: {test_df.shape}  
""")

train_val_df.head()

### TODO: Random Forest Regressor

TODO: Apply advanced ensemble techniques to improve the model performance.

Here is a list of ensemble techniques that you can use to improve the model performance:
- Stacking: Stacked Generalization
- Blending: Weighted Average
- Bagging: Bootstrap Aggregating
- Boosting: AdaBoost, Gradient Boosting, XGBoost, LightGBM, CatBoost

#### - Grid Search

In [None]:
# Import necessary libraries
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# Random Forest Regressor model
model = RandomForestRegressor(random_state=42)

# Enhanced parameter grid
param_grid = {
    'n_estimators': [100, 200, 300, 400, 500],
    'max_depth': [10, 20, 30, 40, 50],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}


# Set up GridSearchCV with cross-validation
grid_search = GridSearchCV(
    model,
    param_grid,
    cv=5,
    n_jobs=-1,
    verbose=2
)

# Fit the grid search with your training data
grid_search.fit(train_val_df.drop(columns=['count', 'date']), train_val_df['count'])

# Output the best parameters
print(f"Best parameters: {grid_search.best_params_}")

# Use the best model to make predictions on the validation/test set
best_model = grid_search.best_estimator_

# Prepare validation data for predictions
X_val = val_df.drop(columns=['count', 'date'])  # Features from validation data
y_val = val_df['count']  # True target values

# Predict on validation data using the best model
predictions = best_model.predict(X_val)

# Replace negative predictions with zero (if required for your use case)
predictions[predictions < 0] = 0

# Calculate Mean Squared Error (MSE)
mse = mean_squared_error(y_val, predictions)

# Calculate Mean Absolute Error (MAE)
mae = mean_absolute_error(y_val, predictions)

# Calculate R² score
r2 = r2_score(y_val, predictions)

# Print the evaluation metrics
print(f"Mean Squared Error (MSE): {mse}")
print(f"Mean Absolute Error (MAE): {mae}")
print(f"R² Score: {r2}")


#### - Random Search

In [None]:
# Import necessary libraries
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# Random Forest Regressor model
model = RandomForestRegressor(random_state=42)

# Enhanced parameter distribution (same as param_grid but for RandomizedSearchCV)
param_distributions = {
    'n_estimators': [100, 200, 300, 400, 500],
    'max_depth': [10, 20, 30, 40, 50],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Set up RandomizedSearchCV with cross-validation
random_search = RandomizedSearchCV(
    model,
    param_distributions,
    n_iter=50,  # Number of random combinations to try
    cv=5,  # 5-fold cross-validation
    n_jobs=-1,  # Use all available processors
    verbose=2,  # Verbosity level
    random_state=42  # Ensures reproducibility
)

# Fit the RandomizedSearchCV with your training data
random_search.fit(train_val_df.drop(columns=['count', 'date']), train_val_df['count'])

# Output the best parameters
print(f"Best parameters: {random_search.best_params_}")

# Use the best model to make predictions on the validation/test set
best_model = random_search.best_estimator_

# Prepare validation data for predictions
X_val = val_df.drop(columns=['count', 'date'])  # Features from validation data
y_val = val_df['count']  # True target values

# Predict on validation data using the best model
predictions = best_model.predict(X_val)

# Replace negative predictions with zero (if required for your use case)
predictions[predictions < 0] = 0

# Calculate Mean Squared Error (MSE)
mse = mean_squared_error(y_val, predictions)

# Calculate Mean Absolute Error (MAE)
mae = mean_absolute_error(y_val, predictions)

# Calculate R² score
r2 = r2_score(y_val, predictions)

# Print the evaluation metrics
print(f"Mean Squared Error (MSE): {mse}")
print(f"Mean Absolute Error (MAE): {mae}")
print(f"R² Score: {r2}")
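To get a feel for the cost of the two searches above, the number of model fits can be counted before running them. The sketch below reuses the same parameter grid and fold count. It only counts combinations and does not train anything.

```python
from itertools import product

param_grid = {
    "n_estimators": [100, 200, 300, 400, 500],
    "max_depth": [10, 20, 30, 40, 50],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}
cv_folds = 5
n_iter = 50  # number of sampled combinations in the random search

# Grid search tries every combination; random search tries n_iter of them.
combinations = list(product(*param_grid.values()))
grid_search_fits = len(combinations) * cv_folds
random_search_fits = n_iter * cv_folds

print(len(combinations), grid_search_fits, random_search_fits)
```

With the default `refit=True`, each search also trains one final model on the full training data, so the random search here needs a little over a fifth of the grid search's fits.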


## KNN Regressor (Abandoned)
> The KNN Regressor was abandoned because it performed poorly on this dataset compared to AutoGluon's `WeightedEnsemble_L2` model.

### - Grid Search

In [None]:
# Import necessary libraries
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_log_error, r2_score

# Define the KNN model
base_knn = KNeighborsRegressor()

# Wrap the KNN regressor inside a Bagging Regressor
model = BaggingRegressor(estimator=base_knn, n_estimators=50, n_jobs=-1, random_state=42)

# Define hyperparameter grid for the base estimator (KNN)
param_grid = {
    'estimator__n_neighbors': [3, 5, 7, 9, 11],
    'estimator__weights': ['uniform', 'distance'],
    'estimator__algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
    'estimator__leaf_size': [10, 20, 30],
    'estimator__p': [1, 2]  # Minkowski distance: p=1 (Manhattan), p=2 (Euclidean)
}

# Set up GridSearchCV with cross-validation
grid_search = GridSearchCV(
    model,
    param_grid,
    cv=10,
    n_jobs=-1,
    verbose=2
)

# Normalize the training data
scaler = StandardScaler()
train_val_df_scaled = scaler.fit_transform(train_val_df.drop(columns=['count', 'date']))

# Fit the GridSearchCV with scaled training data
grid_search.fit(train_val_df_scaled, train_val_df['count'])

# Output the best parameters from GridSearchCV
print(f"Best parameters: {grid_search.best_params_}")

# Scale the validation data using the same scaler
val_df_scaled = scaler.transform(val_df.drop(columns=['count', 'date']))

# Predict on the scaled validation data
predictions = grid_search.predict(val_df_scaled)

# Replace negative predictions with zero
predictions[predictions < 0] = 0

# mean_squared_log_error uses log(1 + x), so zero counts are valid and no offset is needed
msle = mean_squared_log_error(val_df["count"], predictions)
r2 = r2_score(val_df["count"], predictions)

# Print the evaluation metrics
print(f"Mean Squared Log Error: {msle}")
print(f"R2 Score: {r2}")


### - Random Search

In [None]:
# Import necessary libraries
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_log_error, r2_score

# Define the KNN model
base_knn = KNeighborsRegressor()

# Wrap the KNN regressor inside a Bagging Regressor
model = BaggingRegressor(estimator=base_knn, n_estimators=50, n_jobs=-1, random_state=42)

# Define hyperparameter distributions for the base estimator (KNN)
param_distributions = {
    'estimator__n_neighbors': [1, 3, 5, 7, 9, 11, 13, 15, 20, 25],
    'estimator__weights': ['uniform', 'distance'],
    'estimator__algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
    'estimator__leaf_size': [10, 20, 30, 40, 50],
    'estimator__p': [1, 2]  # Minkowski distance: p=1 (Manhattan), p=2 (Euclidean)
}

# Set up RandomizedSearchCV with cross-validation
random_search = RandomizedSearchCV(
    model,
    param_distributions,
    n_iter=50,
    cv=10,
    n_jobs=-1,
    verbose=2,
    random_state=42  # Ensures reproducibility, as in the Random Forest search
)

# Normalize the training data
scaler = StandardScaler()
train_val_df_scaled = scaler.fit_transform(train_val_df.drop(columns=['count', 'date']))

# Fit the RandomizedSearchCV with scaled training data
random_search.fit(train_val_df_scaled, train_val_df['count'])

# Output the best parameters from RandomizedSearchCV
print(f"Best parameters: {random_search.best_params_}")

# Scale the validation data using the same scaler
val_df_scaled = scaler.transform(val_df.drop(columns=['count', 'date']))

# Predict on the scaled validation data
predictions = random_search.predict(val_df_scaled)

# Replace negative predictions with zero
predictions[predictions < 0] = 0

# mean_squared_log_error uses log(1 + x), so zero counts are valid and no offset is needed
msle = mean_squared_log_error(val_df["count"], predictions)
r2 = r2_score(val_df["count"], predictions)

# Print the evaluation metrics
print(f"Mean Squared Log Error: {msle}")
print(f"R2 Score: {r2}")


## TODO: Neural Nets

## TODO: XGBoost

## Detailed Analysis of Custom Model Performance

# Rubric Validation


### Loading the Dataset

- **Download the Bike Sharing Demand data from Kaggle:**
  - <input type='checkbox' checked/> Student uses the Kaggle CLI with the Kaggle API token to download and unzip the Bike Sharing Demand dataset into Sagemaker Studio (or local development). 
  
- **Load all datasets from the Bike Sharing Demand competition into Pandas:**
  - <input type='checkbox' checked/> Student uses Pandas' `read_csv()` function to load the train, test, and sample submission files into DataFrames.
  - <input type='checkbox' checked/> Once loaded, the DataFrames can be viewed in the Jupyter notebook.

### Feature Creation and Data Analysis

- **Create a feature and add it to the train and test dataset:**
  - <input type='checkbox' checked> Student extracts data from one feature column to create a new feature column in both the train and test datasets.

- **Create a histogram of all features in the train dataset:**
  - <input type='checkbox' checked> Student creates a Matplotlib image showing histograms of each feature column in the train DataFrame.

- **Change the datatype of features in the train and test dataset:**
  - <input type='checkbox' checked> Student assigns categorical data types to feature columns that were initially typed as numeric values.

### Model Training With AutoGluon

- **Train a Tabular Prediction model on the training set:**
  - <input type='checkbox' checked> Student uses the `TabularPredictor` class from AutoGluon to create a predictor by calling `.fit()`.

- **Change the hyperparameters when training a Tabular Prediction model:**
  - <input type='checkbox' checked> Student provides additional arguments in the `TabularPredictor.fit()` function to adjust hyperparameters during training.

- **Make predictions with a trained model on a test dataset:**
  - <input type='checkbox' checked> Student uses the predictor created by fitting a model with `TabularPredictor` to predict new values from the test dataset.

### Compare Model Performance

- **Submit a prediction submission from a model to Kaggle for scoring:**
  - <input type='checkbox' checked> Student uses the Kaggle CLI to submit their predictions from the trained AutoGluon Tabular Predictor to Kaggle for public score submission.

- **Graph changes in their model evaluation metric after each model iteration:**
  - Student uses Matplotlib or Google Sheets/Excel to chart model performance metrics in a line chart.
  - The metric is derived from either `fit_summary()` or `leaderboard()` of the predictor.
  - Y-axis: metric number; X-axis: each model iteration.

- **Graph changes to their Kaggle competition score after each model iteration:**
  - Student uses Matplotlib or Google Sheets/Excel to chart changes in the competition score.
  - Y-axis: Kaggle score; X-axis: each model iteration.

### Competition Report

- **Identify which model from AutoGluon performed the best from fitting the train data to the Tabular Predictor:**
  - The report uses `fit_summary()` or `leaderboard()` to detail the results of the training run, indicating the best model as the first entry.

- **Show how doing EDA led to discoveries in the data that impacted model performance:**
  - The report discusses how adding additional features and changing hyperparameters directly improved the Kaggle score.

- **Explain why changes to hyperparameters affected the outcome of the model’s performance:**
  - The report contains a table outlining each hyperparameter used along with the corresponding Kaggle score for each iteration.
  - The report explains why specific changes to a hyperparameter affected the model's performance outcome.

# END OF NOTEBOOK