# Machine Learning Project Checklist
-------------------------------------------------------------

## Main Steps in ML:
------------------------------------------

1. Look at the big picture
2. Get the data
3. Explore the data to gain insights
4. Prepare the data for ML algorithms
5. Select a model and train it
6. Fine-tune the model
7. Present the solution
8. Launch, monitor, and maintain the system

## Step 1: The Big Picture
-------------------------------------

1. Define the objective in business terms
2. How will your solution be used?
3. What are the current solutions, if any?
4. How should the problem be framed (supervised, unsupervised, etc.)?
5. How should performance be measured?
6. Is the performance measure aligned with the business objectives?
7. What is the minimum performance needed to reach business objectives?
8. What are comparable problems? Can you reuse experience or tools?
9. Is human expertise available?
10. How would you solve the problem manually?
11. List the assumptions made so far (by you or others)
12. Verify assumptions if possible
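For item 5, a regression task is commonly measured with the root mean squared error (RMSE) or, when large outliers should weigh less, the mean absolute error (MAE). A minimal NumPy sketch, assuming `y_true` and `y_pred` are equal-length arrays; the function names and numbers are illustrative:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: squaring penalizes large errors heavily."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error: less sensitive to outliers than RMSE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Illustrative values only
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 3.0, 10.0]
print(rmse(y_true, y_pred))
print(mae(y_true, y_pred))  # → 1.125
```

Whichever measure you pick, check it against item 6: it should reflect what the business actually cares about.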

## Step 3: Explore the Data
---------------------------------------

1. Create a copy of the data for exploration (sampling it down to a manageable size if necessary)
2. Create a Jupyter Notebook to keep a record of your data exploration
3. Study each attribute and its characteristics:
    * Name
    * Type (categorical, int/float, bounded/unbounded, text, structured, etc.)
    * % of missing values
    * Noisiness and type of noise (stochastic, outliers, rounding errors, etc.)
    * Possibly useful for the task?
    * Type of distribution (Gaussian, uniform, logarithmic, etc.)
4. For supervised learning tasks, identify the target attribute(s)
5. Visualize the data
6. Study the correlations between attributes
7. Study how you would solve the problem manually
8. Identify promising transformations you may want to apply
9. Identify extra data that would be useful
10. Document what you have learned
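Items 1 and 3 above can be started in pandas. The sketch below assumes the data is already loaded in a DataFrame; the toy columns, the 10,000-row sample cap, and the helper name `attribute_summary` are illustrative assumptions, not part of the checklist. It explores a (possibly sampled) copy and tabulates each attribute's type, share of missing values, and number of distinct values:

```python
import numpy as np
import pandas as pd

def attribute_summary(df: pd.DataFrame) -> pd.DataFrame:
    """One row per attribute: dtype, % of missing values, number of distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_pct": df.isna().mean() * 100,
        "n_unique": df.nunique(),  # NaN is not counted as a distinct value
    })

# Toy data standing in for the real dataset
data = pd.DataFrame({
    "rooms": [3, 4, np.nan, 2, 5],
    "price": [200.0, 310.0, 150.0, np.nan, 420.0],
    "city": ["A", "B", "A", "C", None],
})

# Explore a copy; for a large dataset this also samples it down
explore = data.sample(n=min(len(data), 10_000), random_state=42).copy()
print(attribute_summary(explore))
```

Noisiness, usefulness, and distribution shape still need a human look (e.g., histograms), so the table is a starting point for the notes in item 10 rather than a replacement for them.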

## Step 4: Prepare the Data
----------------------------------------

* Work on copies of the data (keep original dataset intact)
* Write functions for all the data transformations you apply. Reasons:
    - So you can easily prepare data the next time you get a fresh dataset
    - So you can apply these transformations to future projects
    - To clean and prepare the test set
    - To clean and prepare new data instances once your solution is live
    - To make it easy to treat your preparation choices as hyperparameters
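One way to follow this advice is to split each transformation into a *fit* step, which learns its parameters from the training set only, and an *apply* step, which is then reused unchanged on the test set and on new instances. A minimal sketch for filling missing values; the function names are hypothetical, and `strategy` plays the role of a preparation hyperparameter:

```python
import pandas as pd

def fit_fill_values(train: pd.DataFrame, strategy: str = "median") -> dict:
    """Learn one fill value per numerical column from the training data only."""
    numeric = train.select_dtypes(include="number")
    if strategy == "median":
        values = numeric.median()
    elif strategy == "mean":
        values = numeric.mean()
    elif strategy == "zero":
        values = pd.Series(0.0, index=numeric.columns)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return values.to_dict()

def apply_fill_values(df: pd.DataFrame, fill_values: dict) -> pd.DataFrame:
    """Return a filled copy; the input DataFrame is left untouched."""
    return df.fillna(value=fill_values)

train = pd.DataFrame({"rooms": [1.0, 2.0, None, 10.0]})
test = pd.DataFrame({"rooms": [None, 4.0]})

fill = fit_fill_values(train, strategy="median")  # median of 1, 2, 10 is 2.0
train_prepared = apply_fill_values(train, fill)
test_prepared = apply_fill_values(test, fill)     # reuses the *training* median
```

Because the strategy is just an argument, trying `"mean"` instead of `"median"` is a one-word change that can be searched over like any other hyperparameter.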

1. Data Cleaning:
    * Fix or remove outliers (optional)
    * Fill in missing values (e.g., with zero, the median, or the mean) or drop the affected rows or columns

    Check [Imputer](Imputer.ipynb)
2. Feature selection (optional):
    * Drop the attributes that provide no useful information about the task
3. Feature engineering, where appropriate:
    * Discretize continuous features
    * Decompose features (e.g., categorical, date/time, etc.)
    * Add promising transformations of features (e.g., log(x), sqrt(x), etc.)
    * Aggregate features into promising new features
4. Feature scaling: standardize or normalize features
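In scikit-learn, data cleaning, feature scaling, and one-hot encoding (one common way to decompose a categorical feature) are often chained into a single preprocessing object, so that exactly the same preparation is applied to the training set, the test set, and new instances. A sketch assuming scikit-learn is installed; the column names and toy data are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

num_cols = ["rooms", "income"]  # illustrative numerical attributes
cat_cols = ["city"]             # illustrative categorical attribute

num_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # standardize: zero mean, unit variance
])
cat_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),  # one column per category
])

preprocess = ColumnTransformer(
    [("num", num_pipeline, num_cols), ("cat", cat_pipeline, cat_cols)],
    sparse_threshold=0,  # always return a dense array
)

train = pd.DataFrame({
    "rooms": [3, 4, np.nan, 2],
    "income": [2.5, 3.0, 4.0, np.nan],
    "city": ["A", "B", "A", np.nan],
})
X_train = preprocess.fit_transform(train)  # learns medians, means/stds, categories
```

On the test set, call `preprocess.transform(test)` rather than `fit_transform`, so it reuses the statistics learned from the training data.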

Check [Web Data](Web%20Data.ipynb)

Check [Creating categorical fields](Creating%20categorical%20fields.ipynb)
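The feature-engineering ideas in Step 4 (discretizing a continuous feature, decomposing a date, adding a log transform, aggregating features) map onto a few pandas/NumPy calls. A sketch on hypothetical columns; the bin edges and the `rooms_per_household` ratio are illustrative choices, not recommendations:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "income": [1.2, 3.5, 6.8, 12.0],
    "total_rooms": [800, 1500, 2400, 5000],
    "households": [200, 300, 400, 1000],
    "sold_on": pd.to_datetime(["2021-01-15", "2021-06-30", "2022-03-01", "2022-12-24"]),
})

# Discretize a continuous feature into labeled bins (a new categorical field);
# intervals are right-closed: (0, 3], (3, 6], (6, inf]
df["income_cat"] = pd.cut(df["income"], bins=[0, 3, 6, np.inf], labels=["low", "mid", "high"])

# Decompose a date/time feature into parts a model can use
df["sold_year"] = df["sold_on"].dt.year
df["sold_month"] = df["sold_on"].dt.month
df["sold_dayofweek"] = df["sold_on"].dt.dayofweek  # Monday=0 ... Sunday=6

# Log transform of a right-skewed feature; log1p(x) = log(1 + x) also handles zeros
df["log_total_rooms"] = np.log1p(df["total_rooms"])

# Aggregate existing features into a potentially more informative one
df["rooms_per_household"] = df["total_rooms"] / df["households"]
```

Whether any of these new columns actually helps is an empirical question; check them against the target (e.g., with correlations) before keeping them.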

## Getting a feel for the data
--------------------------------------------

It is useful to start by looking at the following:

1. `df.info()`: shows the data structure (dtype and number of non-null values per column, memory usage)
2. `df['column'].value_counts()`: for a categorical attribute, lists its categories and how many rows fall into each
3. `df.describe()`: provides summary statistics (count, mean, std, min, quartiles, max) for each numerical feature
4. `df.hist()`: plots a histogram per numerical feature, a good way to see the shape of each distribution

Check [Getting a feel about the data](Getting%20a%20feel%20about%20the%20data.ipynb)

5. Looking for linear correlations

Check [Linear Correlations](Linear%20correlations.ipynb)

Check [Visualizing data Matplotlib](Visualizing%20data%20Matplotlib.ipynb)
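The five checks in this section, run on a small made-up DataFrame (the columns are hypothetical). `df.hist()` needs Matplotlib, so it is left commented out; `df.corr(numeric_only=True)` computes Pearson (linear) correlations between numerical columns, and the `numeric_only` argument assumes pandas 1.5 or later:

```python
import pandas as pd

df = pd.DataFrame({
    "rooms": [2, 3, 3, 4, 5],
    "price": [150.0, 200.0, 210.0, 260.0, 330.0],
    "city": ["A", "B", "A", "A", "C"],
})

df.info()                         # dtypes, non-null counts, memory usage
print(df["city"].value_counts())  # categories and how many rows fall into each
print(df.describe())              # count, mean, std, min, quartiles, max per numerical column
# df.hist(bins=50)                # one histogram per numerical column (requires Matplotlib)

# Linear (Pearson) correlations with the attribute of interest, strongest first
corr = df.corr(numeric_only=True)
print(corr["price"].sort_values(ascending=False))
```

A Pearson coefficient only captures linear relationships: a value near 0 does not rule out a strong nonlinear dependence, which is why plotting the data remains worthwhile.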

Check [More visualizations in Matplotlib](03_Visualization.ipynb)

## Create a test set
-----------------------------

It is important to set the test data aside from the very beginning, before exploring it, so that looking at the test set (data snooping) does not bias your choice of methodology.
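A minimal way to do this with NumPy is to shuffle the row positions once with a fixed seed and hold out a fixed fraction; scikit-learn's `train_test_split` provides the same behavior. The function name, the 20% ratio, and the seed below are illustrative assumptions:

```python
import numpy as np
import pandas as pd

def split_train_test(df: pd.DataFrame, test_ratio: float = 0.2, seed: int = 42):
    """Shuffle row positions with a fixed seed and hold out `test_ratio` of them."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(df))
    test_size = int(len(df) * test_ratio)
    test_idx, train_idx = shuffled[:test_size], shuffled[test_size:]
    return df.iloc[train_idx], df.iloc[test_idx]

data = pd.DataFrame({"x": range(100)})
train_set, test_set = split_train_test(data)
# Set test_set aside now; explore and prepare only train_set
```

The fixed seed makes the split reproducible, so the same rows land in the test set every time the notebook is re-run.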


1. [Stores Analysis](Stores%20Analysis.ipynb)