# "A Journey Through Fastbook (AJTFB) - Chapter 2"
> "The second in a weekly-ish series where I revisit the fast.ai book, [\"Deep Learning for Coders with fastai & PyTorch\"](https://github.com/fastai/fastbook), and provide commentary on the bits that jumped out to me chapter by chapter.  So without further ado, let's go!"

- toc: false
- branch: master
- badges: true
- hide_binder_badge: true
- comments: true
- author: Wayde Gilliam
- categories: [fastai, fastbook]
- image: images/articles/fastbook.jpg
- hide: true
- search_exclude: true
- permalink: temp-posts/ajtfb-chapter-2

Other posts in this series:
[A Journey Through Fastbook (AJTFB) - Chapter 1](https://ohmeow.com/posts/2020/11/06/_11_06_ajtfb_chapter_1.html)

## Chapter 2

---
### Starting Your Project

#### Things to think about when deciding on project feasibility

> the most important consideration is data availability

If you don't have enough quality data and/or the means to create it ... good luck.

> Important: Consider that **data augmentation** can both alleviate the need for more manual labelling and protect you from problems with out-of-domain data (e.g., "when unexpected image types arise in the data when the model is being used in production") by synthetically creating more data likely to be seen that may not be in your dataset as is.

> Important: Start with a smaller ResNet (like 18 or 34) and move up as needed.



> iterate from end to end in your project: don't spend months fine-tuning your model, or polishing the perfect GUI, or labeling the perfect dataset.

In other words, fail early and fail often. If you don't, you're likely to uncover critical problems much later than you otherwise would have, and even worse, you're likely to not produce anything at all! In the world of deep learning there are a number of tools that, while helpful, can get you so bogged down that you never deploy something usable (e.g., experiment tracking tools, hyperparameter optimization libraries, etc...). Also, remember that getting something into production is a different task from winning a Kaggle competition, where the latter may require the use of some of those aforementioned tools and the ensembling of dozens of models. For production, something better than human is often good enough to get out there and then improve through refactoring.

---
## The Drivetrain Approach

![](https://raw.githubusercontent.com/fastai/fastbook/41a60e44d588139a03452f1907359fc2322f8d5f/images/drivetrain-approach.png)

### Steps 1-3

### Step 4





---
### Metrics

**Metrics** are human-understandable measures of model quality, whereas the **loss** is the machine's. Metrics are based on your validation set and are what you really care about, whereas the loss is "a measure of performance" that the training system can use to update weights automatically.

A good choice for loss is a function "that is easy for ***stochastic gradient descent (SGD)*** to use," whereas good choices for your metrics are functions that your business users will care about. Seldom are they the same because most metrics don't provide smooth gradients that SGD can use to update your model's weights.

Examples of common metrics:

**error rate** = "the proportion of images that were incorrectly identified."

**accuracy** = the proportion of images that were correctly identified (`1 - error rate`)
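
To make the two definitions concrete, here is a minimal plain-Python sketch. The function names and the toy labels are my own for illustration (they are not fastai's implementations); the point is just that accuracy and error rate are two views of the same count.

```python
def error_rate(preds, targets):
    """Proportion of predictions that do not match their target label."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not targets:
        raise ValueError("need at least one example")
    wrong = sum(p != t for p, t in zip(preds, targets))
    return wrong / len(targets)


def accuracy(preds, targets):
    """Proportion of predictions that match their target label."""
    return 1 - error_rate(preds, targets)


# Hypothetical validation-set labels and model predictions.
val_targets = ["cat", "dog", "dog", "cat", "dog"]
val_preds = ["cat", "dog", "cat", "cat", "dog"]

print(f"error rate: {error_rate(val_preds, val_targets):.2f}")
print(f"accuracy:   {accuracy(val_preds, val_targets):.2f}")
```

Neither function is differentiable in any useful way (a tiny weight change usually flips no predictions), which is why they make good metrics but poor losses.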

---
## Validation & Test Sets

### What is a validation set?

A ***validation set*** (also known as the "development set") does not include any data from the ***training set***. Its purpose is to gauge the generalization prowess of your model and also ensure you are neither overfitting nor underfitting.

> If [the model] makes an accurate prediction for a data item, that should be because it has learned characteristics
> of that kind of item, and not because the model has been shaped by *actually having seen that particular
> item*.

### Why do we need a validation set?

> [because] what we care about is how well our model works on *previously unseen images* ... the longer
> you train for, the better your accuracy will get on the training set ... as the model starts to memorize
> the training set rather than finding generalizable underlying patterns in the data = **overfitting**

![](https://raw.githubusercontent.com/fastai/fastbook/41a60e44d588139a03452f1907359fc2322f8d5f/images/att_00000.png)

***Overfitting*** happens when the model "remembers specific features of the input data, rather than generalizing well to data not seen during training."

> Important: Your models should always overfit before anything else. Overfitting is when your training loss gets better while your validation loss gets worse ... in other words, if your validation loss is improving, even if not to the extent of your training loss, you are *not* overfitting.
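
That rule of thumb can be written down as a small check over per-epoch losses. This is only a sketch under my own assumptions: the function name and the `window` parameter (how many consecutive epochs must show the pattern) are hypothetical, and real loss curves are noisy enough that you would still want to look at a plot.

```python
def is_overfitting(train_losses, valid_losses, window=2):
    """Return True if, over the last `window` epoch-to-epoch steps,
    training loss kept falling while validation loss kept rising."""
    if len(train_losses) != len(valid_losses):
        raise ValueError("need one training and one validation loss per epoch")
    if len(train_losses) < window + 1:
        return False  # not enough history to judge
    recent_train = train_losses[-(window + 1):]
    recent_valid = valid_losses[-(window + 1):]
    train_improving = all(b < a for a, b in zip(recent_train, recent_train[1:]))
    valid_worsening = all(b > a for a, b in zip(recent_valid, recent_valid[1:]))
    return train_improving and valid_worsening


# Validation loss still improving (just more slowly): not overfitting.
still_learning = is_overfitting([1.0, 0.7, 0.5, 0.4], [1.1, 0.9, 0.8, 0.75])

# Training loss falls while validation loss climbs: overfitting.
overfit = is_overfitting([1.0, 0.7, 0.5, 0.4], [1.1, 0.9, 0.95, 1.0])

print(still_learning, overfit)
```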

> Important: ALWAYS include a validation set.

> Important: ALWAYS measure your accuracy (or any metrics) on the validation set.

> Important: Set the `seed` parameter so that you "get the same validation set every time" so that "if we change our model and retrain it, we know any differences are due to the changes to the model, not due to having a different random validation set."
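
To see why a fixed seed matters, here is a minimal sketch of a seeded random split using only the standard library. The `random_split` helper and file names are made up for illustration; fastai's own splitters accept a `seed` argument, but this is not their implementation.

```python
import random


def random_split(items, valid_pct=0.2, seed=None):
    """Shuffle indices with a seeded RNG and return (train, valid) lists."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    idxs = list(range(len(items)))
    rng.shuffle(idxs)
    cut = int(valid_pct * len(items))
    valid = [items[i] for i in idxs[:cut]]
    train = [items[i] for i in idxs[cut:]]
    return train, valid


# Hypothetical file names standing in for a real dataset.
files = [f"img_{i:03d}.jpg" for i in range(10)]

train_a, valid_a = random_split(files, valid_pct=0.2, seed=42)
train_b, valid_b = random_split(files, valid_pct=0.2, seed=42)

# Same seed, same validation set, so two training runs are comparable.
print(valid_a == valid_b)
```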

The validation set also informs us how we may change the ***hyperparameters*** (e.g., model architecture, learning rates, data augmentation, etc...) to improve results. These parameters are NOT learned ... they are choices WE make that affect the learning of the model parameters.

### What is a test set?

A ***test set*** ensures that we aren't overfitting our hyperparameter choices; it is held back even from ourselves and used to evaluate the model at the very end.

> Important: If evaluating 3rd party solutions, know how to create a good test set and how to create a good baseline model.  Hold these out from the potential consultants and use them to fairly evaluate their work.

### How do you define good validation and test sets?

See pp.50-54 ...

> A key property of the validation and test sets is that they must be representative of the new data
> you will see in the future.

For time series, that means you'll likely want to make your validation set a continuous section with the latest dates.
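
A minimal sketch of that idea: sort by date and hold out the most recent slice instead of sampling at random. The `time_based_split` helper, the `valid_pct` parameter, and the toy sales records are all hypothetical.

```python
from datetime import date


def time_based_split(rows, date_key, valid_pct=0.2):
    """Sort rows chronologically and hold out the most recent `valid_pct`
    of them as one contiguous validation block."""
    ordered = sorted(rows, key=date_key)
    cut = len(ordered) - int(valid_pct * len(ordered))
    return ordered[:cut], ordered[cut:]


# Hypothetical daily sales records.
sales = [{"day": date(2020, 1, d), "units": 10 + d} for d in range(1, 11)]

train, valid = time_based_split(sales, date_key=lambda r: r["day"], valid_pct=0.2)

print("train ends:  ", train[-1]["day"])
print("valid starts:", valid[0]["day"])
```

Every validation date is later than every training date, which mimics how the model will actually be used: predicting the future from the past.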

You'll also want to make sure your model isn't learning particular ancillary features of particular things in your images (e.g., you want to see how your model performs on a person or boat it hasn't seen before ... see pp.53-54 for examples).
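
One way to test for that is to split by *group* (person, boat, etc.) rather than by individual image, so that no group appears on both sides. The sketch below is my own illustration with hypothetical names and data, not something taken from the book.

```python
import random


def group_split(items, group_of, valid_pct=0.2, seed=None):
    """Hold out whole groups so nothing in valid shares a group with train."""
    groups = sorted({group_of(it) for it in items})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_valid = max(1, int(valid_pct * len(groups)))  # always hold out at least one group
    valid_groups = set(groups[:n_valid])
    train = [it for it in items if group_of(it) not in valid_groups]
    valid = [it for it in items if group_of(it) in valid_groups]
    return train, valid


# Hypothetical (person, filename) pairs.
photos = [
    ("alice", "a1.jpg"), ("alice", "a2.jpg"),
    ("bob", "b1.jpg"), ("bob", "b2.jpg"),
    ("carol", "c1.jpg"), ("dave", "d1.jpg"), ("erin", "e1.jpg"),
]

train, valid = group_split(photos, group_of=lambda p: p[0], valid_pct=0.4, seed=0)

train_people = {person for person, _ in train}
valid_people = {person for person, _ in valid}
print(train_people.isdisjoint(valid_people))
```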






---
## Q & A & Best Practices

### What is a `Transform`?

> A **Transform** contains code that is applied automatically during training. There are two kinds ...
1. `item_tfms`: Applied to each item
2. `batch_tfms`: Applied to a *batch* of items at a time *using the GPU*
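
The difference between the two is easiest to see in a toy pipeline. The sketch below is purely conceptual: plain lists of floats stand in for images, everything runs on the CPU, and all of the function names are hypothetical (this is not how fastai implements transforms). It only shows *where* each kind of transform is called.

```python
def resize_item(item, size):
    """Item transform: crop or zero-pad ONE item to a fixed length."""
    return (item + [0.0] * size)[:size]


def scale_batch(batch, factor):
    """Batch transform: a single call processes EVERY item in the batch."""
    return [[x * factor for x in item] for item in batch]


def make_batch(items, item_tfm, batch_tfm):
    # Item transforms run once per item; here they make all items the
    # same size so they can be collated into a single batch.
    sized = [item_tfm(it) for it in items]
    # Batch transforms run once per batch on the collated items.
    return batch_tfm(sized)


# Hypothetical "images" of different sizes.
raw = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0, 7.0, 8.0]]

batch = make_batch(
    raw,
    item_tfm=lambda it: resize_item(it, 3),
    batch_tfm=lambda b: scale_batch(b, 0.5),
)
print(batch)
```

Resizing has to happen per item because items can't be stacked into a batch until they share a shape; once they do, the remaining work can be done on the whole batch at once, which is what makes it a good fit for the GPU.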

### Why do we make images 224x224 pixels?
> This is the standard size for historical reasons (old pretrained models require this size exactly) ...
> If you increase the size, you'll often get a model with better results since it will be able to focus
> on more details.

> Important: Train on progressively larger image sizes using the weights trained on smaller sizes as a kind of pretrained model.

### What is a ResNet & Why use it for computer vision tasks?

A **ResNet** is a model architecture that has proven to work well in CV tasks. Several variants exist with different numbers of layers, with the larger architectures taking longer to train and being more prone to overfitting, especially with smaller datasets.

> Important: Start with a smaller ResNet (like 18 or 34) and move up as needed.

> Important: If you have a lot of data, the bigger resnets will likely give you better results.

And what other things can you use image recognizers for besides image tasks? Sound, time series, malware classification ...

> ... a good rule of thumb for converting a dataset into an image representation: if the human eye can recognize categories from the images, then a deep learning model should be able to do so too.

See pp.36-39

### How can we see what our NN's are actually learning/doing?

See pp.33-36. Being able to inspect what your NN is doing (e.g., looking at the activations and the gradients) is one of the most important things you can learn as they are often the key to improving results.

> Important: Learn how to visualize and understand your activations and gradients!

### What is the difference between categorical and continuous datatypes?

***Categorical*** data "contains values that are one of a discrete set of choices" such as gender, occupation, day of week, etc...

***Continuous*** data is numerical data that represents a quantity such as age, salary, prices, etc...

> Important: For tasks that predict a continuous number, consider using `y_range` to constrain the network to predicting a value in the known range of valid values. (see p.47)
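
One common way to constrain an output to a range is to pass the raw prediction through a sigmoid and rescale it. My understanding is that this is the idea behind `y_range`, but treat the sketch below as an assumption-laden illustration in plain Python rather than fastai's actual code; the helper names are my own.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def squash_to_range(x, lo, hi):
    """Map any real-valued raw output into the range between lo and hi."""
    return lo + (hi - lo) * sigmoid(x)


# Hypothetical raw network outputs for a 0-to-5 star rating task.
raw_outputs = [-10.0, -1.0, 0.0, 1.0, 10.0]
ratings = [squash_to_range(x, 0.0, 5.0) for x in raw_outputs]

for raw, rating in zip(raw_outputs, ratings):
    print(f"{raw:6.1f} -> {rating:.3f}")
```

However extreme the raw output, the prediction can never fall outside the valid range, so the model doesn't have to spend capacity learning that a rating of 7 is impossible.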

---

## Resources

1. https://book.fast.ai - The book's website; it's updated regularly with new content and recommendations on everything from which GPUs to use to how to run things locally and in the cloud, etc...

2. https://course.fast.ai/datasets - A variety of slimmed down datasets you can use for various DL tasks that support "rapid prototyping and experimentation." 

3. https://huggingface.co/docs/datasets/ - Serves a similar purpose to the fastai datasets but for the NLP domain. Includes metrics and full/sub-set datasets that you can use to benchmark your results against the top guns of deep learning.


> Important: Start with a smaller dataset and scale up to full size to accelerate modeling!
