Notes on a universal blueprint that can be used to attack and solve any machine-learning problem.

# 1. Define the Problem and Assemble a Dataset

First define the problem:
- What will your input data be? What are you trying to predict?
- What type of problem are you facing? Is it binary classification? Multiclass classification? Scalar regression? Vector regression? Multiclass, multilabel classification? Something else? Identifying the problem type will guide your choice of model architecture, loss function, etc.

You can't move on until the above is complete. Be aware that you are making the following assumptions:
- Your outputs can be predicted given your inputs
- Your available data is sufficiently informative to learn the relationships between inputs and outputs

# 2. Choosing a measure of Success

Is it appropriate to measure:
- Accuracy?
- Precision and recall?
- Customer-retention rate?

The metric for success will guide the choice of a loss function: what your model will optimize. It should align with your higher-level goals.

- For balanced-classification problems, accuracy and area under the receiver operating characteristic curve (ROC AUC) are common metrics.
- For ranking problems or multilabel classification, mean average precision is a common choice.

It isn't uncommon to define your own metric to measure success.
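
To make the candidate metrics concrete, here is a minimal sketch in plain Python that computes accuracy, precision, and recall from lists of labels. The function names and the toy labels are illustrative assumptions, not part of any library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the class treated as positive."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # missed a real positive
    # Return 0.0 when a ratio is undefined (no predicted or no real positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical imbalanced labels: a model that always predicts 0
# still scores 0.8 accuracy while finding none of the positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0] * 10
print(accuracy(y_true, y_pred))
print(precision_recall(y_true, y_pred))
```

The toy example suggests why accuracy alone can mislead when classes are imbalanced, and why the metric should follow from the higher-level goal.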

# 3. Deciding on an Evaluation Protocol

Choose one of the three common evaluation procedures:
- Hold-out validation
- K-fold cross-validation
- Iterated K-fold validation
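
The first two procedures can be sketched as index splits in plain Python. This is an illustration under simple assumptions (samples are addressed by integer index and shuffled with a fixed seed); `hold_out_split` and `k_fold_splits` are hypothetical helper names.

```python
import random


def hold_out_split(n_samples, val_fraction=0.2, seed=0):
    """Shuffle the sample indices and set aside a fraction for validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = int(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]  # (train, validation)


def k_fold_splits(n_samples, k=4, seed=0):
    """Yield (train, validation) index lists, one pair per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_size = n_samples // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any remainder so every sample is validated once.
        stop = start + fold_size if fold < k - 1 else n_samples
        yield indices[:start] + indices[stop:], indices[start:stop]


for train_idx, val_idx in k_fold_splits(10, k=3):
    print(len(train_idx), len(val_idx))
```

Iterated K-fold validation would repeat `k_fold_splits` several times with a different `seed` each time and average the resulting scores.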

# 4. Preparing your Data

- Data should be formatted as tensors
- Values taken by these tensors should usually be scaled to small values
- If the data is heterogeneous, normalize it
- Do some feature engineering, especially for small-data problems
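
As one possible way to normalize heterogeneous features, the sketch below standardizes each feature to zero mean and unit standard deviation. It assumes the data is a list of equal-length rows of floats; `fit_standardizer` and `standardize` are hypothetical names. The statistics are computed on the training rows only and then reused for any other split.

```python
import math


def fit_standardizer(train_rows):
    """Compute the per-feature mean and standard deviation on training data."""
    n = len(train_rows)
    n_features = len(train_rows[0])
    means = [sum(row[j] for row in train_rows) / n for j in range(n_features)]
    stds = []
    for j in range(n_features):
        variance = sum((row[j] - means[j]) ** 2 for row in train_rows) / n
        # Fall back to 1.0 for constant features to avoid dividing by zero.
        stds.append(math.sqrt(variance) or 1.0)
    return means, stds


def standardize(rows, means, stds):
    """Apply previously fitted statistics to any split (train, validation, test)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical features on very different scales.
train = [[1.0, 100.0], [3.0, 300.0]]
means, stds = fit_standardizer(train)
print(standardize(train, means, stds))
```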

# 5. Developing a model that does better than a baseline

The goal at this stage is to achieve *statistical power*: to develop a small model that beats a dumb baseline. It isn't always possible to do this. If you can't beat a random baseline after trying multiple reasonable architectures, it may be that the answer to the question you're asking isn't present in the input data. In this case, it's back to the drawing board.
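
For a classification problem, one simple dumb baseline is to always predict the most frequent training label. The sketch below is an illustration only; `majority_baseline_accuracy` is a hypothetical helper and the labels are made up.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, val_labels):
    """Accuracy on val_labels of always predicting the most common training label."""
    majority_label = Counter(train_labels).most_common(1)[0][0]
    hits = sum(label == majority_label for label in val_labels)
    return majority_label, hits / len(val_labels)


# Hypothetical labels: a model is only interesting if it beats this score.
label, baseline = majority_baseline_accuracy([0, 0, 0, 1], [0, 1, 0, 0])
print(label, baseline)
```

A model whose validation accuracy does not exceed this number has not yet shown that it learned anything from the inputs.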

If things go well, you'll need to make 3 key choices to build a first working model:
- *Last-layer activation*: This establishes useful constraints on the network's output.
- *Loss function*: This should match the type of problem you're trying to solve.
- *Optimization configuration*: What optimizer will you use? What will its learning rate be?

This table can help you choose a last-layer activation and loss function for a few common problem types:

| Problem Type | Last-layer activation | Loss function |
| ------------ | --------------------- | ------------- |
| Binary classification | `sigmoid` | `binary_crossentropy` |
| Multiclass, single-label classification | `softmax` | `categorical_crossentropy` |
| Multiclass, multilabel classification | `sigmoid` | `binary_crossentropy` |
| Regression to arbitrary values | None | `mse` |
| Regression to values between 0 and 1 | `sigmoid` | `mse` or `binary_crossentropy` |
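
To illustrate what the activations and losses in the table compute, here is a plain-Python sketch of `sigmoid`, `softmax`, and the two cross-entropy losses for a single sample. These are simplified stand-ins written for illustration, not the implementations used by any deep-learning library; the `eps` clipping value is an assumption.

```python
import math


def sigmoid(x):
    """Squash a real number into (0, 1); written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Loss for one binary target (0 or 1) and one predicted probability."""
    p = min(max(y_prob, eps), 1.0 - eps)  # clip to keep log() finite
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """Loss for one one-hot target and one predicted probability vector."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


print(sigmoid(0.0))                 # a single probability
print(sum(softmax([1.0, 2.0, 3.0])))  # class probabilities sum to 1
```

The sketch shows the constraint each activation places on the output: `sigmoid` yields an independent probability per unit, which suits binary and multilabel problems, while `softmax` yields one distribution over mutually exclusive classes.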

# 6. Scaling Up: developing a model that overfits

Consider:
- Is your model sufficiently powerful?
- Does it have enough layers and parameters to properly model the problem?

To figure out where the border between underfitting and overfitting is, you must first cross it.

To figure out how big a model you'll need, you must develop a model that overfits.
- add layers
- make the layers bigger
- train for more epochs
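
One way to check whether a model has crossed the border is to compare per-epoch training and validation losses. The sketch below assumes you have recorded both as lists (for example from a training history); the helper names and loss values are hypothetical.

```python
def best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)


def has_started_overfitting(train_losses, val_losses):
    """True if validation loss bottomed out earlier while training loss kept falling."""
    best = best_epoch(val_losses)
    is_not_last = best < len(val_losses) - 1
    return is_not_last and train_losses[-1] < train_losses[best]


# Hypothetical loss curves: validation loss is lowest at epoch index 2,
# then rises while training loss keeps dropping.
train_losses = [0.90, 0.60, 0.40, 0.30, 0.20]
val_losses = [0.95, 0.70, 0.60, 0.65, 0.80]
print(best_epoch(val_losses), has_started_overfitting(train_losses, val_losses))
```

If the check never returns `True`, the model is likely still underfitting, and adding layers, widening them, or training longer is worth trying.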

# 7. Regularizing your model and tuning your hyperparameters

This step will take most of your time. You'll repeatedly modify your model, train it, evaluate it, and modify it again.

These are some things you should try at this stage:
- add dropout
- try different architectures: add or remove layers
- add L1 and/or L2 regularization
- try different hyperparameters
- iterate on feature engineering: add new features, or remove features that don't seem to be informative
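
As a rough illustration of two of the techniques listed above, the sketch below computes L1 and L2 weight penalties and applies inverted dropout to a list of activations. It is plain Python written for clarity, not how a deep-learning library implements these; the function names and the `factor` and `rate` values are assumptions.

```python
import random


def l1_penalty(weights, factor):
    """Extra loss proportional to the absolute values of the weights."""
    return factor * sum(abs(w) for w in weights)


def l2_penalty(weights, factor):
    """Extra loss proportional to the squared values of the weights."""
    return factor * sum(w * w for w in weights)


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate` and rescale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    # Dividing by `keep` preserves the expected sum of the activations.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


weights = [1.0, -2.0, 3.0]
print(l1_penalty(weights, 0.5), l2_penalty(weights, 0.5))
print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, rng=random.Random(0)))
```

Dropout is applied only during training; at evaluation time the activations pass through unchanged.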