<p style="padding: 10px; border: 1px solid black;">
<img src="./../images/MLU-NEW-logo.png" alt="drawing" width="400"/> <br/>

# MLU Day One Machine Learning - Hands On

This hands-on notebook will let you practice the concepts you have learned in this course so far.
In the notebook, you will explore a database of books (books of different genres, from thousands of authors).
The goal is to predict book prices using book features.

__Business Problem:__ Books from a large database of books - different genres, thousands of authors, etc., cannot be listed for sale because they are missing one critical piece of information, the price. 

__ML Problem Description:__ Predict book prices using book features, such as genre, release date, ratings, number of reviews.  
> This is a __regression__ task (we have a book price column in our train dataset that we can use as labels). <br>

----


To generate book price predictions, you will be presented with two kinds of exercises throughout the notebook: __TASKS__ and __CHALLENGES__. <br/>


| <img style="float: center;" src="./../images/task_robot.png" alt="drawing" width="100"/>| <img style="float: center;" src="./../images/challenge_robot.png" alt="drawing" width="130"/>|
|:---    |   ---  |
| No coding needed for these tasks. <br /> Try to understand what is happening and run the associated cells and code. | These are challenges where you can practice your coding skills. <br /> Once done, uncomment the challenge answer and check your solution.| 

We are not trying to measure your coding skills, so every challenge comes with an answer that you can copy and paste into the challenge coding area.
    
**No matter how experienced and skilled you are with coding, you will be able to submit a solution!**


----

The notebook consists of 2 parts; please work top to bottom and don't skip sections as this could lead to error messages due to missing code.

### <a href="#1">Part I - Leaderboard Submission</a>
In the first part of the notebook you are going to learn how [__AutoGluon__](https://auto.gluon.ai/stable/index.html#) can solve the book price prediction problem.<br/>

You will learn how to build a simple and quick base model and then implement iterations of this model to improve it. To measure how well you are doing (and to see how the model improves) you have to submit your model's predictions to the [__Book Prices Prediction MLU Leaderboard__](https://mlu.corp.amazon.com/contests/redirect/7).

MLU Leaderboard will assess your prediction performance against other participants. Your submission to the leaderboard also __counts towards your course completion__. 

We ask you to make 2 submissions in Part I:<br/>
1. First a simple prediction trained with a smaller dataset (for a quick first submission).
2. Then another prediction trained with a full dataset, in order to submit an improved result.

Feel free to keep improving your model and make as many submissions as you like to MLU Leaderboard. 

### <a href="#2">Part II - Advanced AutoGluon (OPTIONAL)</a>
In the second part of the notebook you will find some advanced features of AutoGluon. You're welcome to use the insights you can gain from Part II to make an optional 3rd submission. However, a quick word of warning - AutoGluon is very powerful in its base form so you might not see much additional model improvement on MLU Leaderboard.

----
<br/>
<br/>

## <a name="1">Part I - Leaderboard Submission</a>
Let's solve the book price prediction problem using __AutoGluon__.

- Part I - 1. <a href="#p1-1">Importing AutoGluon</a>
- Part I - 2. <a href="#p1-2">Getting the Data</a>
- Part I - 3. <a href="#p1-3">Model Training with AutoGluon (small train dataset)</a>
- Part I - 4. <a href="#p1-4">AutoGluon Training Results</a>
- Part I - 5. <a href="#p1-5">Model Prediction with AutoGluon</a>
- Part I - 6. <a href="#p1-6">First MLU Leaderboard Submission (with small train data)</a>
- Part I - 7. <a href="#p1-7">Second MLU Leaderboard Submission (with full train data)</a>


### <font color='red'>Please make sure to run the below cells!</font> 

The two code cells below install AutoGluon and will allow you to print solutions for the code challenges. They will take approx. 2-4 minutes to complete.

In [None]:
%%capture
!pip install -q autogluon==0.8.2

In [None]:
# Import utility functions that provide answers to challenges
%load_ext autoreload
%aimport dayone_utils
import pandas as pd

### <a name="p1-1">Part I - 1. Importing AutoGluon</a>


Now we load the libraries needed to work with our Tabular dataset.

In [None]:
# Importing the newly installed AutoGluon code library
from autogluon.tabular import TabularPredictor, TabularDataset

### <a name="p1-2">Part I - 2. Getting the Data</a>

Let's get the data for our business problem.

>  <img style="float: left; padding-right: 20px" src="./../images/task_robot.png" alt="drawing" width="100" /> 
>  Run the cell below to load the train and test data. Then continue and take a look at the first samples of our train dataset. <br/> This is a very basic check when performing Data Exploration.

In [None]:
df_train = TabularDataset(data="../data/training.csv")
df_test = TabularDataset(data="../data/mlu-leaderboard-test.csv")

In [None]:
df_train.head()

### <a name="p1-3">Part I - 3. Model Training with AutoGluon (small train dataset)</a>

We can train a model using AutoGluon with only a single line of code.  All we need to do is to tell it which column from the dataset we are trying to predict, and what the dataset is.


### Sampling data
For this first training, we are going to randomly draw 1000 rows from our train dataset to speed up training.



> <img style="float: left; padding-right: 20px" src="./../images/task_robot.png" alt="drawing" width="100"/>  Run the cell below to prepare the dataset. <br/>
Here we randomly select 1000 rows of our dataset; AutoGluon will split them into train and validation datasets during training.
> 

<br/>

__NOTE__: The `random_state` parameter below makes the sampling repeatable, so the same rows are selected every time the code runs.

In [None]:
# Run this cell

# Sampling 1000
subsample_size = 1000  # subsample subset of data for faster demo, try setting this to much larger values
df_train_smaller = df_train.sample(n=subsample_size, random_state=0)

# Printing the first rows
df_train_smaller.head()
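
The effect of `random_state` can be checked on a tiny stand-in dataframe. The sketch below uses made-up values (not the book data) and only illustrates that two draws with the same seed return the same rows.

```python
import pandas as pd

# Toy stand-in for the training data (hypothetical values)
df = pd.DataFrame({"ID": range(10), "Price": [float(i) * 10 for i in range(10)]})

# Two draws with the same seed select exactly the same rows
sample_a = df.sample(n=4, random_state=0)
sample_b = df.sample(n=4, random_state=0)
print(sample_a.equals(sample_b))

# A different seed is free to select different rows
sample_c = df.sample(n=4, random_state=1)
print(sample_c["ID"].tolist())
```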

### Training a model with our small sample

> <img style="float: left; padding-right: 20px" src="./../images/task_robot.png" alt="drawing" width="100"/> 
For this first training we are going to use the smaller dataset, with 1000 rows sampled from our original train dataset, to speed up training.

__NOTE__: AutoGluon uses certain defaults; generally these are good but there is one exception: `eval_metric`.  By default, AutoGluon uses `'root_mean_squared_error'` as the evaluation metric for regression problems. However, MLU Leaderboard uses the `'mean_squared_error'` metric to measure submission quality, so we need to explicitly pass this metric to AutoGluon. For more information on these options, see sklearn [metrics](https://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics).
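
To make the difference between the two metrics concrete, here is a minimal pure-Python sketch. The prices are made up for illustration and are not taken from the book dataset. RMSE is the square root of MSE, so on the same data both metrics rank models in the same order, but the reported numbers differ.

```python
import math

# Hypothetical true and predicted prices, for illustration only
y_true = [100.0, 250.0, 400.0]
y_pred = [110.0, 230.0, 430.0]

# Mean squared error: average of the squared residuals
squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
mse = sum(squared_errors) / len(squared_errors)

# Root mean squared error: square root of MSE, in the same units as the label
rmse = math.sqrt(mse)

print(f"MSE:  {mse:.2f}")
print(f"RMSE: {rmse:.2f}")
```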


---
Let's use `TabularPredictor` to train the first version of our model.

__NOTE__: Training on this smaller dataset might still take approx. 3-4 minutes!

In [None]:
# Run this cell

smaller_predictor = TabularPredictor(
    label="Price", eval_metric="mean_squared_error"
).fit(train_data=df_train_smaller)

### Interpreting the Training Output
AutoGluon outputs a lot of information about what is happening.

<img style="float: left;" src="./../images/challenge_robot.png" alt="drawing" width="130"/> 
<br/><br/>
<br/>
<br/>

> After the training above finishes, examine the output and try to find the information below in the printed messages from AutoGluon. <br/>
1. What is the shape of your training dataset?
2. What kind of ML problem type does AutoGluon infer (classification, regression, ...)? Remember, you've never mentioned what kind of problem type it is; you only provided the label column.
3. What does AutoGluon suggest in case it inferred the wrong problem type?
4. Identify the kind of data preprocessing and feature engineering performed by AutoGluon.
5. Find the basic statistics about your label in the print statements from AutoGluon.
6. How many extra features were generated besides the originals in our dataset? What was the runtime for that?
7. What is the evaluation metric used?
8. What does AutoGluon suggest doing if it inferred the wrong metric?
9. What is the ratio between the train & validation datasets (try looking for `val` or `validation`)?
10. Identify where AutoGluon saved your predictor and how to load it - try to find this folder on your instance. 
11. Identify the folder where the models are saved - try to find this folder on your instance.

__Please, try hard to identify all information above before uncommenting the answer below.__ <br/>

################# LIST YOUR ANSWERS HERE #################
1. <br/>
2. <br/>
3. <br/>
4. <br/>
5. <br/>
6. <br/>
7. <br/>
8. <br/>
9. <br/>
10. <br/>
11. <br/>

In [None]:
# ## CHALLENGE ANSWER
#dayone_utils.answer_html("CH_FIT_INFO")

### <a name="p1-4">Part I - 4. AutoGluon Training Results</a>
Now let's take a look at all the information AutoGluon provides via its __leaderboard function__. <br/> 

__NOTE__: Don't confuse this with the MLU Leaderboard. The MLU Leaderboard is where you will make submissions with the predictions from your trained models; the AutoGluon leaderboard function is a summary of all models that AutoGluon trained.

<br/>

> <img style="float: left; padding-right: 20px" src="./../images/challenge_robot.png" alt="drawing" width="130"/> 
> Run the cell below and take a closer look at AutoGluon's leaderboard output. <br/>


__Which one is the best model?__

<br/>

In [None]:
# Run this cell

smaller_predictor.leaderboard(silent=True)

In [None]:
# ## CHALLENGE ANSWER
#dayone_utils.answer_html("CH_BEST")
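
When reading the leaderboard, keep in mind that AutoGluon reports validation scores in a higher-is-better convention, so an error metric such as mean squared error shows up with a negative sign in `score_val`. The sketch below mimics this with a toy dataframe; the model names and numbers are invented for illustration and only the column names `model`, `score_val` and `fit_time` mirror the real output.

```python
import pandas as pd

# Toy leaderboard with made-up numbers; a real one comes from predictor.leaderboard()
toy_leaderboard = pd.DataFrame(
    {
        "model": ["ModelA", "ModelB", "ModelC"],
        "score_val": [-520000.0, -310000.0, -450000.0],  # negated MSE
        "fit_time": [1.2, 8.5, 3.1],  # seconds
    }
)

# Higher score_val is better, so the best model has the score closest to zero
best_row = toy_leaderboard.loc[toy_leaderboard["score_val"].idxmax()]
print(best_row["model"], -best_row["score_val"])
```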

### <a name="p1-5">Part I - 5. Model Prediction with AutoGluon</a>
#### Now that your model is trained, let's use it to predict prices!

We should always run a final model performance assessment using data that was unseen by the model (the test data). Because test data is not used during training, it gives an unbiased assessment of how the model performs on new data. In our case, we will use the test data to make predictions and submit those to MLU Leaderboard in the next step.

> <img style="float: left; padding-right: 20px" src="./../images/task_robot.png" alt="drawing" width="100"/> 
> Run the cell below to show the test dataset that we will use for the MLU Leaderboard. 

In [None]:
# Run this cell

df_test.head()

> <img style="float: left; padding-right: 20px" src="./../images/challenge_robot.png" alt="drawing" width="130"/> Use this new dataset as input to the model you have just trained to predict Book Prices on it.

<br>

__TIP:__ Look up the __predict__ function in the AutoGluon Tasks documentation to see how to use it:

```
"""
Use trained models to produce predictions of `label` column values for new data.

Parameters
----------
data : `TabularDataset` or `pd.DataFrame`
    The data to make predictions for.

model : str (optional)
    The name of the model to get predictions from. Defaults to None, which uses the highest scoring model on the validation set.

as_pandas : bool, default = True
    Whether to return the output as a `pd.Series` (True) or `np.ndarray` (False).
    
transform_features : bool, default = True
    If True, preprocesses data before predicting with models.
    If False, skips global feature preprocessing.

decision_threshold : float, default = None
    The decision threshold used to convert prediction probabilities to predictions. 
    Only relevant for binary classification, otherwise ignored.

Returns
-------
Array of predictions, one corresponding to each row in given dataset. Either :class:`np.ndarray` or :class:`pd.Series` depending on `as_pandas` argument.
"""
```

Please try hard to write the prediction code yourself before uncommenting the answer below. You know, it is about Learn and Be Curious, right?

In [None]:
############## CODE HERE ####################


############## END OF CODE ####################

In [None]:
# ## CHALLENGE ANSWER
#dayone_utils.answer_html("CH_PRED")

### <a name="p1-6">Part I - 6. First MLU Leaderboard Submission (with small train data)</a>
#### Now you are ready for your first submission to our MLU Leaderboard!

> <img style="float: left; padding-right: 20px" src="./../images/task_robot.png" alt="drawing" width="100"/> 
> Run the cell below to save your prediction file in the format expected by the MLU Leaderboard.


__NOTE__: If you have __not used the trained model to make predictions on the test dataset__ in the previous section/cell, you will not have the `price_prediction` variable needed for the prediction submission file, and running the cell below __will raise an error__. Go back and use the __.predict()__ function on the test dataset to create `price_prediction`, as suggested by the answer provided in the *dayone_utils* file!

In [None]:
# Run this cell

# Define empty dataset with column headers ID & Price
df_submission = pd.DataFrame(columns=["ID", "Price"])
# Creating ID column from ID list
df_submission["ID"] = df_test["ID"].tolist()
# Creating label column from price prediction list
df_submission["Price"] = price_prediction
# saving your csv file for Leaderboard submission
df_submission.to_csv("./../data/predictions/Prediction_to_Leaderboard.csv", index=False)

#### Let's do a quick check to see if the file is ok!
> <img style="float: left; padding-right: 30px" src="./../images/task_robot.png" alt="drawing" width="100"/> 
> 1. Run the cell below to check if your submission file has the right IDs for the MLU Leaderboard.<br>
> 2. If the difference is zero you are good to go!

In [None]:
# Run the code below
print("Double-check submission file against the original test file")
sample_submission_df = pd.read_csv("./../data/mlu-leaderboard-test.csv", sep=",")
print(
    "Differences between project result IDs and sample submission IDs:",
    (sample_submission_df["ID"] != df_submission["ID"]).sum(),
)
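
The ID check compares the two `ID` columns position by position and counts the mismatches. The sketch below reproduces the idea on toy data; the IDs and prices are invented, and `toy_test`, `toy_prediction` and `toy_submission` are hypothetical stand-ins for `df_test`, `price_prediction` and `df_submission`.

```python
import pandas as pd

# Hypothetical test IDs and predictions
toy_test = pd.DataFrame({"ID": [101, 102, 103]})
toy_prediction = pd.Series([250.0, 399.5, 120.0])

# One row per test ID, in the original order
toy_submission = pd.DataFrame(
    {"ID": toy_test["ID"].tolist(), "Price": toy_prediction.tolist()}
)

# Element-wise comparison; the sum counts positions where the IDs disagree
n_diff = (toy_test["ID"] != toy_submission["ID"]).sum()
print(n_diff)

# Reversing the row order (and resetting the index) makes the check fail
reversed_submission = toy_submission.iloc[::-1].reset_index(drop=True)
n_diff_reversed = (toy_test["ID"] != reversed_submission["ID"]).sum()
print(n_diff_reversed)
```

A non-zero count therefore signals that rows are missing, duplicated or out of order relative to the test file.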

#### Downloading the Prediction File and Submitting
> <img style="float: left; padding-right: 20px" src="./../images/task_robot.png" alt="drawing" width="100"/> 
> 1. Download the file you just saved to your local machine. <br/>
> 2. Follow the instructions on the Leaderboard submission page.

Go [here](https://mlu.corp.amazon.com/contests/redirect/7) to submit your file.
You can find your submission file in the folder <code>data > predictions</code>.

### <a name="p1-7">Part I - 7. Second MLU Leaderboard Submission (with full train data)</a>

> <img style="float: left;" src="./../images/challenge_robot.png" alt="drawing" width="130" /> 
> Now that you made your first submission using the small training sample from your dataset, repeat the process using the full dataset. Create predictions for the test set, and submit again to see if your score gets better.<br>
If you don't know how to write the code for this, uncomment the challenge answer; copy and paste it in the section below.

__NOTE__: It should take around 12-15 minutes to run this training with our CPU. Just in case, use the `time_limit` parameter (in seconds) to limit the run time to 20 minutes.
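
Since `time_limit` is given in seconds, a 20 minute budget has to be converted first. A minimal sketch of that conversion follows; `fit_kwargs` is a hypothetical helper name, and the commented call only indicates where the value would go.

```python
# time_limit is expressed in seconds, so 20 minutes becomes 20 * 60
time_limit_minutes = 20
fit_kwargs = {"time_limit": time_limit_minutes * 60}
print(fit_kwargs)

# These keyword arguments would then be passed along, for example:
# TabularPredictor(label="Price", eval_metric="mean_squared_error").fit(
#     train_data=df_train, **fit_kwargs
# )
```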



In [None]:
############## CODE HERE ####################


############## END OF CODE ####################

In [None]:
# ## CHALLENGE ANSWER
#dayone_utils.answer_html("CH_FULL_PRED")

### Second MLU Leaderboard Submission with the Full Train Dataset

> <img style="float: left; padding-right: 20px" src="./../images/challenge_robot.png" alt="drawing" width="130"/> 
>1. Run the AutoGluon leaderboard function for the smaller training dataset in the first cell below.<br>2. Run the AutoGluon leaderboard function for the full training dataset in the second cell below.<br>3. Compare the performances.

__How can you explain the differences in `score_val` and `fit_time` columns?__
 


In [None]:
############## FIRST CODE HERE ####################


############## END OF CODE ####################

In [None]:
############## SECOND CODE HERE ###############


############## END OF CODE ####################

In [None]:
# ## CHALLENGE ANSWER
#dayone_utils.answer_html("CH_FULL_LEAD")

### Get the second submission for MLU Leaderboard ready

><img style="float: left; padding-right: 20px" src="./../images/challenge_robot.png" alt="drawing" width="130"/> 
> Write the code that creates the output file using the predictions from your second model.


In [None]:
############## CODE HERE ####################


############## END OF CODE ####################

In [None]:
# ## CHALLENGE ANSWER
#dayone_utils.answer_html("CH_FULL_SUBM")

#### Let's do a quick check to see if the file has the expected IDs
> <img style="float: left; padding-right: 20px" src="./../images/task_robot.png" alt="drawing" width="100"/> 
> 1. Run the cell below to check if your submission file has the right IDs for the MLU Leaderboard.<br>
> 2. If the difference is zero you are good to go!

In [None]:
# Run the code below
print("Double-check submission file against the original test file")
sample_submission_df = pd.read_csv("./../data/mlu-leaderboard-test.csv", sep=",")
print(
    "Differences between project result IDs and sample submission IDs:",
    (sample_submission_df["ID"] != df_full_submission["ID"]).sum(),
)

> <img style="float: left; padding-right: 20px" src="./../images/task_robot.png" alt="drawing" width="100"/> 
> Submit again to MLU leaderboard to improve your score. 

For the submission, use the same link as before: [here](https://mlu.corp.amazon.com/contests/redirect/7).<br>

___
#  <a name="2"> Part II - Advanced AutoGluon (OPTIONAL)</a>

Now that you have made your first Leaderboard submission, let's practice using some advanced features of AutoGluon. <br/>
- Part II - 1. <a href="#p2-1">Explainability: Feature Importance</a>
- Part II - 2. <a href="#p2-2">Data Preprocessing: Cleaning & Missing Values</a>
- Part II - 3. <a href="#p2-3">Final (optional) MLU Leaderboard Submission (with full engineered data)</a>
- Part II - 4. <a href="#p2-4">Before You Go (clean up model artifacts)</a>

### <a name="p2-1">Part II (optional) - 1. Explainability</a>

There are growing business needs and legislative regulations that require explanations of why a model made a certain decision.<br/>
To better understand our trained predictor, we can estimate the overall importance of each feature.

#### Feature Importance
A feature's importance score represents the performance drop that results when the model makes predictions on a perturbed copy of the dataset in which this feature's values have been randomly shuffled across rows. A feature score of 0.01 would indicate that the predictive performance dropped by 0.01 when the feature was randomly shuffled. The higher the score a feature has, the more important it is to the model's performance. If a feature has a negative score, this means that the feature is likely harmful to the final model, and a model trained without that feature would be expected to achieve a better predictive performance.
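
The shuffling idea can be sketched in a few lines of plain Python. This is an illustration of the general technique on made-up data with a hand-written stand-in model, not AutoGluon's implementation: the label depends only on `x1`, so shuffling `x1` should hurt the error while shuffling `x2` should not.

```python
import random

# Toy data: the label depends strongly on x1 and not at all on x2 (made-up values)
rng = random.Random(0)
rows = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]
labels = [3.0 * x1 for x1, _ in rows]


def model(x1, x2):
    """Stand-in for a trained model; it has learned y = 3 * x1 and ignores x2."""
    return 3.0 * x1


def mse(data, targets):
    """Mean squared error of the stand-in model on the given rows."""
    return sum((model(*row) - y) ** 2 for row, y in zip(data, targets)) / len(targets)


def permutation_importance(data, targets, column):
    """Increase in MSE after shuffling one feature column across rows."""
    baseline = mse(data, targets)
    shuffled_values = [row[column] for row in data]
    rng.shuffle(shuffled_values)
    perturbed = [
        tuple(shuffled_values[i] if j == column else value for j, value in enumerate(row))
        for i, row in enumerate(data)
    ]
    return mse(perturbed, targets) - baseline


imp_x1 = permutation_importance(rows, labels, 0)
imp_x2 = permutation_importance(rows, labels, 1)
print("x1 importance:", imp_x1)
print("x2 importance:", imp_x2)
```

Here the score is measured as an increase in error, which corresponds to the performance drop described above.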



> <img style="float: left;padding-right: 20px" src="./../images/task_robot.png" alt="drawing" width="100" align="left"/> 
> Run the code below to see the output of the AutoGluon feature importance function for the first model we have run, with only 1000 samples. <br/>

In [None]:
# Run the code below
smaller_predictor.feature_importance(df_train_smaller)

### <a name="p2-2">Part II (optional) - 2. Data Preprocessing</a>

With AutoGluon you don't have to worry about which model to choose; instead, you can focus on the data itself. 
In the book price case, a few columns are clearly poorly encoded, most importantly the ```Edition``` column. <br/>

### Data Cleaning

For this experiment, let's use our small dataset __df_train_smaller__ to make everything run a bit faster.

> <img style="float: left;padding-right: 20px" src="./../images/challenge_robot.png" alt="drawing" width="130"/> 
> Use the functions below to clean things up a bit and expand that data out.<br/>
For this experiment, our feature engineering tasks will be:<br/><br/>
1. Split the column ```Edition``` into three new ones: ```hard-paper```, ```year``` and ```month```.
2. Create two numerical features based on the features ```Reviews``` and ```Ratings```, named ```Reviews-n``` and ```Ratings-n``` respectively.
3. Drop the old columns from the dataset: ```Edition```,  ```Reviews``` and ```Ratings```. 

Please try hard to solve the challenge before uncommenting the answer below. Day One is about Learn and Be Curious, right?

In [None]:
# Run this cell

import re
import pandas as pd


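def first_num(in_val):
    """Return the number at the start of a string as a float."""
    num_string = in_val.split(" ")[0]
    # Keep digits and dots only (this drops thousands separators such as commas)
    digits = re.sub(r"[^0-9\.]", "", num_string)
    return float(digits)


def year_get(in_val):
    """Return the first four-digit number found in the string as an int, or None."""
    m = re.findall(r"\d{4}", in_val)
    if len(m) > 0:
        return int(m[0])
    else:
        return None


def month_get(in_val):
    """Return the first three-letter month abbreviation found, or the string "None"."""
    m = re.findall(r"Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec", in_val)
    if len(m) > 0:
        return m[0]
    else:
        return "None"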


# To drop features and save the new dataframe, you can use <name_of_df>.drop([<features_to_drop>], axis=1, inplace=True)
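
To see what the helper logic extracts, the sketch below applies the same regular expressions to a few example strings. The raw values are assumptions made up for illustration; check `df_train.head()` for the actual column formats.

```python
import re

# Hypothetical raw values; the real column formats may differ
edition = "Paperback, 10 Mar 2016"
reviews = "4.0 out of 5 stars"
ratings = "1,234 customer reviews"

# Same logic as first_num: take the first token and keep digits and dots
ratings_n = float(re.sub(r"[^0-9\.]", "", ratings.split(" ")[0]))
reviews_n = float(re.sub(r"[^0-9\.]", "", reviews.split(" ")[0]))

# Edition parts: text before the first comma, first 4-digit run, first month name
hard_paper = edition.split(",")[0]
year = int(re.findall(r"\d{4}", edition)[0])
month = re.findall(r"Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec", edition)[0]

print(hard_paper, year, month, reviews_n, ratings_n)
```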

In [None]:
############## CODE HERE ####################


############## END OF CODE ####################

In [None]:
# ## CHALLENGE ANSWER
#dayone_utils.answer_html("CH_FEAT_ENG")

><img style="float: left;padding-right: 20px" src="./../images/task_robot.png" alt="drawing" width="100"/> 
>Now print the dataset with the new features to see what they look like.

In [None]:
# Run this cell

train_data_feateng.head(2)

### Identifying Missing values
By doing the feature engineering above we introduced a new potential problem; we might now have some missing data.

> <img style="float: left;padding-right: 20px" src="./../images/challenge_robot.png" alt="drawing" width="130"/> 
> Try to identify the features that may have missing values and how many are missing. 

<br/>

__Are there any missing values?__

Please try hard to solve the challenge before uncommenting the answer below. <br/>
Day One is about Learn and Be Curious, right?

In [None]:
############## CODE HERE ####################


############## END OF CODE ####################

In [None]:
# ## CHALLENGE ANSWER
#dayone_utils.answer_html("CH_MISSING")
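
One detail worth keeping in mind when counting missing values: `year_get` returns `None` when no year is found, while `month_get` returns the string `"None"`. The sketch below, on made-up values, shows that pandas only counts the former as missing, so the string placeholder has to be counted separately.

```python
import pandas as pd

# Toy engineered columns (made-up values): None for a missing year,
# the string "None" for a missing month
toy = pd.DataFrame(
    {
        "year": [2016, None, 2011, None],
        "month": ["Mar", "None", "Jan", "Aug"],
    }
)

# isna() counts real missing values (None / NaN) per column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# The string placeholder is not a missing value, so it is counted explicitly
placeholder_months = (toy["month"] == "None").sum()
print(placeholder_months)
```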

> <img style="float: left; padding-right: 20px" src="./../images/challenge_robot.png" alt="drawing" width="130"/> 
> Let's train the model again with these new manually created features.



In [None]:
############## CODE HERE ####################


############## END OF CODE ####################

In [None]:
# ## CHALLENGE ANSWER
#dayone_utils.answer_html("CH_PRED_FEAT")

> <img style="float: left; padding-right: 20px" src="./../images/challenge_robot.png" alt="drawing" width="130"/> 
> Compare the AutoGluon leaderboard for the new feateng_predictor to smaller_predictor in the cells below. <br/>

**Are there any significant differences?**


In [None]:
############## FIRST CODE FROM THE ANSWER HERE ####################


############## END OF CODE ########################################

In [None]:
############## SECOND CODE FROM THE ANSWER HERE ####################


############## END OF CODE #########################################

In [None]:
# ## CHALLENGE ANSWER
#dayone_utils.answer_html("CH_LEAD_COMP")

> <img style="float: left; padding-right: 30px" src="./../images/challenge_robot.png" alt="drawing" width="130"/> 
1. Run the AutoGluon `feature_importance` function for the original smaller dataset in the first cell below.
2. Run the `feature_importance` function again for the feature engineered dataset in the second cell below.
3. Compare the results.

__Are there any significant differences?__


In [None]:
############## CODE FOR THE ORIGINAL DATASET FEATURE IMPORTANCE HERE ####################


############## END OF CODE ############################################################

In [None]:
############## CODE FOR THE FEATURE ENGINEERED DATASET FEATURE IMPORTANCE HERE  ####################


############## END OF CODE #########################################################################

In [None]:
# ## CHALLENGE ANSWER
#dayone_utils.answer_html("CH_FEAT_COMP")

### <a name="p2-3">Part II (optional) - 3. MLU Leaderboard Submission (with full engineered data)</a>
Let's create the full engineered dataset to train a final AutoGluon model & let's also allocate more time to really get the best results.

__NOTE__: As there are few columns in this dataset, we don't necessarily expect additional performance improvement.

> <img style="float: left; padding-right: 20px" src="./../images/task_robot.png" alt="drawing" width="100"/> 
> Now it is time to train your model using the __enhanced version__ of the dataset with AutoGluon.

For this experiment we will use a time limit of 30 min (`time_limit` in seconds below).

__NOTE__: 30 minutes may not be enough to get a better score than your previous submission. If you have time, try running for more than 30 minutes to improve your performance!

In [None]:
full_feateng = df_train.copy()

# CLEAN FEATURES
full_feateng["Reviews-n"] = full_feateng["Reviews"].apply(first_num)
full_feateng["Ratings-n"] = full_feateng["Ratings"].apply(first_num)
full_feateng["hard-paper"] = full_feateng["Edition"].apply(lambda x: x.split(",")[0])
full_feateng["year"] = full_feateng["Edition"].apply(year_get)
full_feateng["month"] = full_feateng["Edition"].apply(month_get)

# DROPPING ORIGINAL FEATURES
full_feateng.drop(["Edition", "Ratings", "Reviews"], axis=1, inplace=True)

In [None]:
enhanced_predictor = TabularPredictor(label="Price", eval_metric="mean_squared_error").fit(
    train_data=full_feateng, time_limit=30 * 60
)

### Time to make Your Final Submission to the MLU Leaderboard

> <img style="float: left;padding-right: 20px" src="./../images/challenge_robot.png" alt="drawing" width="130"/> 
> Now make a final prediction and submit this to MLU leaderboard.<br> Keep in mind that we used an engineered version of the dataset for training. We need to apply the same transformation to the test data before we can call `.predict()`:

In [None]:
test_data_feateng = df_test.copy()

# FOR TEST DATA
test_data_feateng["Reviews-n"] = test_data_feateng["Reviews"].apply(first_num)
test_data_feateng["Ratings-n"] = test_data_feateng["Ratings"].apply(first_num)
test_data_feateng["hard-paper"] = test_data_feateng["Edition"].apply(
    lambda x: x.split(",")[0]
)
test_data_feateng["year"] = test_data_feateng["Edition"].apply(year_get)
test_data_feateng["month"] = test_data_feateng["Edition"].apply(month_get)

# DROPPING ORIGINAL FEATURES
test_data_feateng.drop(["Edition", "Ratings", "Reviews"], axis=1, inplace=True)
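
Because the same steps have to be applied to both train and test data, one option is to wrap them in a single function so the two cannot drift apart. The sketch below shows that pattern on toy data; `engineer_features`, the `transforms` mapping and the simplified lambdas are hypothetical and stand in for the helper functions used above.

```python
import pandas as pd


def engineer_features(df, transforms):
    """Apply the same column transforms to any dataframe and drop the raw columns."""
    out = df.copy()
    for new_col, (source_col, func) in transforms.items():
        out[new_col] = out[source_col].apply(func)
    raw_cols = sorted({source_col for source_col, _ in transforms.values()})
    return out.drop(columns=raw_cols)


# Simplified stand-ins for the real helpers (illustration only)
transforms = {
    "hard-paper": ("Edition", lambda x: x.split(",")[0]),
    "Reviews-n": ("Reviews", lambda x: float(x.split(" ")[0])),
}

# Made-up rows; the test frame has no Price column
toy_train = pd.DataFrame(
    {
        "Edition": ["Paperback, 10 Mar 2016", "Hardcover, 1 Jan 2011"],
        "Reviews": ["4.0 out of 5 stars", "3.5 out of 5 stars"],
        "Price": [220.0, 500.0],
    }
)
toy_test = toy_train.drop(columns=["Price"])

train_fe = engineer_features(toy_train, transforms)
test_fe = engineer_features(toy_test, transforms)
print(list(train_fe.columns))
print(list(test_fe.columns))
```

With this structure, the feature columns of the train and test frames match by construction; only the label column differs.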

Add the code below to create predictions and the output file.

In [None]:
############## CODE HERE ####################


############## END OF CODE ####################

In [None]:
# ## CHALLENGE ANSWER
#dayone_utils.answer_html("CH_FINAL_SUBM")

#### Let's do a quick check to see if the file has the expected IDs
><img style="float: left; padding-right: 30px" src="./../images/task_robot.png" alt="drawing" width="100"/> 
> 1. Run the cell below to check if your submission file has the right IDs for the MLU Leaderboard.<br>
> 2. If the difference is zero you are good to go!

In [None]:
# Run the code below
print("Double-check submission file against the original test file")
sample_submission_df = pd.read_csv("./../data/mlu-leaderboard-test.csv", sep=",")
print(
    "Differences between project result IDs and sample submission IDs:",
    (sample_submission_df["ID"] != df_enhanced_submission["ID"]).sum(),
)

<p style="padding: 10px; border: 1px solid black;">
<img src="./../images/MLU-NEW-logo.png" alt="drawing" width="400"/> <br/>
    
## Congrats on Finishing this Hands On Activity!
In the next module, __Code Walkthrough and Advanced AutoGluon__, we are going to walk through your solutions and also show a notebook that implements an __end-to-end__ solution.

### <a name="p2-4">Part II (optional) - 4. Before You Go</a>
> <img style="float: left; padding-right: 20px" src="./../images/task_robot.png" alt="drawing" width="100"/> 
>After you are done with this Hands On, you can clean up all model artifacts by running the cell below.<br/>

__It's always a good practice to clean up everything when you are done.__

In [None]:
!rm -r AutogluonModels
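
If you prefer to stay in Python rather than use a shell command, the same cleanup can be sketched with the standard library. This assumes the artifacts live in the `AutogluonModels` folder removed above; adjust the path if your predictor was saved elsewhere.

```python
import shutil
from pathlib import Path

# Folder removed by the shell command above
artifacts_dir = Path("AutogluonModels")

# ignore_errors=True makes this a no-op when the folder does not exist
shutil.rmtree(artifacts_dir, ignore_errors=True)
print(artifacts_dir.exists())
```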