# Machine Learning Project Checklist

Resource: Aurélien Géron, *Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition*, O'Reilly Media, 2019


Use the following checklist as a guide when working on the course projects. It is largely based on *Hands-On Machine Learning* by Aurélien Géron, with adjustments in some places for the course project. The original checklist can be found in Appendix B (page 755) of the book.

* **Frame the Problem and Look at the Big Picture:**
    1. Define the objective of your project in your own words.
    2. Select a performance criterion for evaluating the task.
    3. Explain in a paragraph how you would solve the problem manually.
    4. List assumptions that you made so far.
    
    
* **Get the Data:**
    1. Create a new workspace for the course project. 
    2. Discuss with your colleagues whether you would like to use a version management tool.
        - Git: https://git-scm.com/
        - GitHub: https://github.com/
        - GitHub tutorial: https://guides.github.com/activities/hello-world/
    3. Get the data.
    4. Convert the data to a format you can easily manipulate (if required).
    5. Check the size and type of the data.
    6. Sample a test set, put it aside, and never look at it (no data snooping!).
    7. Sample a training set and a validation set.
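
Steps 6 and 7 can be sketched as two consecutive random splits. This is a minimal sketch, assuming scikit-learn and pandas are available; the toy `data` frame, its column names, the 60/20/20 proportions, and the stratification on `target` are illustrative assumptions, not requirements of the checklist.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy dataset standing in for the project data.
rng = np.random.default_rng(42)
data = pd.DataFrame({
    "feature_a": rng.normal(size=1000),
    "feature_b": rng.integers(0, 5, size=1000),
    "target": rng.integers(0, 2, size=1000),
})

# Step 6: hold out 20% as the test set and do not touch it again.
train_val, test = train_test_split(
    data, test_size=0.2, random_state=42, stratify=data["target"]
)

# Step 7: split the remainder into training (75%) and validation (25%),
# i.e. 60% and 20% of the full dataset.
train, val = train_test_split(
    train_val, test_size=0.25, random_state=42, stratify=train_val["target"]
)

print(len(train), len(val), len(test))
```

With these proportions the 1,000 toy rows end up as 600 training, 200 validation, and 200 test rows. Fixing `random_state` keeps the split reproducible, so the test set stays the same across runs.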
    
 
* **Explore the Data:**
    1. Create a copy of the training set for exploration (sample it down to a manageable size if necessary).
    2. (Create a Jupyter Notebook for your data exploration.)
    3. Study each attribute and its characteristics:
        - name
        - type
        - % of missing values
        - noisiness
        - usefulness for task
        - type of distribution
        - ...
    4. Identify the target attribute (supervised learning).
    5. Visualize the data.
    6. Study correlations between attributes.
    7. Identify the promising transformations you may want to apply.
    8. Document the results of your exploratory data analysis.
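
Parts of steps 3 and 6 can be automated with pandas. The following is a minimal sketch, assuming a pandas DataFrame; the `explore` frame, its columns, and the injected missing values are made up for illustration. Plots for step 5 (for example `explore.hist()`) would additionally need matplotlib and are left out here.

```python
import numpy as np
import pandas as pd

# Hypothetical exploration copy of the training set.
rng = np.random.default_rng(0)
explore = pd.DataFrame({
    "age": rng.integers(18, 80, size=200).astype(float),
    "income": rng.lognormal(mean=10, sigma=0.5, size=200),
    "city": rng.choice(["A", "B", "C"], size=200),
})
explore["target"] = 0.5 * explore["age"] + rng.normal(scale=5, size=200)
explore.loc[explore.index[:20], "age"] = np.nan  # inject 10% missing values

# Step 3: one row per attribute with its type, share of missing values,
# and number of distinct values.
summary = pd.DataFrame({
    "dtype": explore.dtypes.astype(str),
    "missing_pct": explore.isna().mean() * 100,
    "n_unique": explore.nunique(),
})
print(summary)

# Step 6: correlation of each numeric attribute with the target.
numeric = explore.select_dtypes("number")
correlations = numeric.corr()["target"].sort_values(ascending=False)
print(correlations)
```

A table like `summary` is also a convenient starting point for documenting the exploration results in step 8.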
    

* **Prepare the Data:**
    1. Write functions for all data transformations you apply.
    2. Clean the data.
        - remove outliers
        - fill in missing values
    3. Select important features (drop attributes that provide no useful information for the task).
    4. Use feature engineering.
        - decompose categorical features
        - add promising transformations for features
        - aggregate features into promising new features
    5. Use feature scaling (standardize or normalize features).
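
One way to package the cleaning, encoding, and scaling steps as reusable transformations is a scikit-learn pipeline. This is a sketch under stated assumptions: the `X_train` frame and its columns are hypothetical, median imputation stands in for "fill in missing values", and one-hot encoding is only one possible reading of "decompose categorical features".

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical training features with one numeric gap and one categorical column.
X_train = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 51.0, 46.0],
    "income": [30_000.0, 42_000.0, 38_000.0, 90_000.0, 61_000.0],
    "city": ["A", "B", "A", "C", "B"],
})

numeric_cols = ["age", "income"]
categorical_cols = ["city"]

# Numeric columns: fill missing values with the median, then standardize.
numeric_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: one-hot encode; categories unseen during fit are ignored later.
preprocess = ColumnTransformer([
    ("num", numeric_pipeline, numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Fit on the training set only, then reuse the fitted transformer elsewhere.
X_prepared = preprocess.fit_transform(X_train)
print(X_prepared.shape)
```

Fitting the transformer on the training set and calling only `preprocess.transform(...)` on the validation and test sets keeps statistics such as the median and the scaling parameters from leaking out of the held-out data.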

    
* **Select and Train Your Models:**
    1. Learn the basics about the models you've been assigned.
    2. Select a third model on your own and justify your choice with explicit criteria.
    3. Make a plan for training and evaluation of the models.
    4. Build your models (select meaningful values for the model parameters).
    5. Train your models with the training set.
    6. Measure and compare the performance of your models (use N-fold cross-validation).
    7. Analyze the most significant parameters for each model.
    8. Analyze the types of errors the models make.
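
The comparison in step 6 could look roughly like the sketch below. It assumes a classification task and scikit-learn; the synthetic data, the three placeholder models, the choice of 5 folds, and accuracy as the metric are all illustrative assumptions. Substitute the assigned models, your third model, and the performance criterion chosen when framing the problem.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the prepared training set.
X_train, y_train = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Placeholder candidates; replace with the assigned models and your third choice.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# 5-fold cross-validation on the training set only.
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the standard deviation next to the mean shows how much the score varies between folds, which helps judge whether a difference between two models is meaningful.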


* **Fine-Tune Your Models:**
    1. Fine-tune the hyperparameters using cross-validation.
        - treat your data transformation choices as hyperparameters, especially when you are unsure about them (e.g., whether to replace missing values with zeros or with the median value, or to just drop the rows).
        - use grid search and random search, then compare and evaluate the results.
    2. Compare the results of all three models after the fine-tuning process, using the best hyperparameters found for each.
    3. Select a final model and measure its performance on the test set to estimate the generalization error.
        - don't tweak your model after measuring the generalization error: you would just start overfitting the test set.
    4. Document the results of the examined models. Pay particular attention to the final model.
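
Step 1 can be sketched with scikit-learn's `GridSearchCV` and `RandomizedSearchCV`, treating the imputation strategy as a hyperparameter next to a model parameter. This is a minimal sketch under assumptions: the synthetic data, the logistic regression model, and the value ranges are placeholders. `SimpleImputer(strategy="constant")` fills numeric gaps with zeros by default; dropping rows is not covered, because a scikit-learn transformer cannot change the number of samples inside a pipeline.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic training data with about 5% of the values removed at random.
X_train, y_train = make_classification(n_samples=300, n_features=8, random_state=0)
rng = np.random.default_rng(0)
X_train[rng.random(X_train.shape) < 0.05] = np.nan

pipeline = Pipeline([
    ("impute", SimpleImputer()),
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Grid search: the imputation strategy is tuned alongside the model's C.
grid = GridSearchCV(
    pipeline,
    param_grid={
        "impute__strategy": ["mean", "median", "constant"],
        "model__C": [0.01, 0.1, 1.0, 10.0],
    },
    cv=5,
    scoring="accuracy",
)
grid.fit(X_train, y_train)

# Random search: sample C from a log-uniform distribution instead of a fixed grid.
random_search = RandomizedSearchCV(
    pipeline,
    param_distributions={
        "impute__strategy": ["mean", "median", "constant"],
        "model__C": loguniform(1e-3, 1e2),
    },
    n_iter=10,
    cv=5,
    scoring="accuracy",
    random_state=0,
)
random_search.fit(X_train, y_train)

print("grid:  ", grid.best_params_, round(grid.best_score_, 3))
print("random:", random_search.best_params_, round(random_search.best_score_, 3))
```

Both searches use only the training data with cross-validation. The test set comes into play once, for the final model selected in step 3.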
    
    
* **Present Your Solution:**
    1. Document what you have done.
        - write a summary of the results in a new, final section
        - explain why your solution achieves the task objective
    2. Create a nice presentation.
        - make sure to highlight the big picture first
        - make sure you explain the assigned model in detail to your fellow students
        - don't forget to present interesting points you noticed along the way
        - list your assumptions and your system's limitations
        - ensure your key findings are communicated
