# "A Journey Through Fastbook (AJTFB) - Chapter 4"
> "The fourth in a weekly-ish series where I revisit the fast.ai book, [\"Deep Learning for Coders with fastai & PyTorch\"](https://github.com/fastai/fastbook), and provide commentary on the bits that jumped out to me chapter by chapter.  So without further ado, let's go!"

- toc: false
- branch: master
- badges: true
- hide_binder_badge: true
- comments: false
- author: Wayde Gilliam
- categories: [fastai, fastbook]
- image: images/articles/fastbook.jpg
- hide: true
- search_exclude: true
- permalink: temp-posts/ajtfb-chapter4

Other posts in this series:  
[A Journey Through Fastbook (AJTFB) - Chapter 1](https://ohmeow.com/posts/2020/11/06/_11_06_ajtfb_chapter_1.html)  
[A Journey Through Fastbook (AJTFB) - Chapter 2](https://ohmeow.com/posts/2020/11/16/ajtfb-chapter-2.html)  
[A Journey Through Fastbook (AJTFB) - Chapter 3](https://ohmeow.com/posts/2020/11/22/ajtfb-chapter-3.html)



In [None]:
!pip install "fastai>=2.3" --upgrade

import fastai
from fastai.vision.all import *
print(fastai.__version__ )

## Chapter 4

---
### How to visualize a grayscale image in pandas ...

In [None]:
mnist_path = untar_data(URLs.MNIST_SAMPLE); mnist_path.ls()

In [None]:
sample_3 = Image.open((mnist_path/'train/3').ls().sorted()[1])
sample_3

In [None]:
sample_3_t = tensor(sample_3)
df = pd.DataFrame(sample_3_t[4:15, 4:22])
df.style.set_properties(**{'font-size':'6pt'}).background_gradient('Greys')

---
### What is a baseline model and why do you want one?

> A simple model that you are confident should perform reasonably well. It should be
> simple to implement and easy to test

**Why do you want to start with a baseline model?**
> ... without starting with a sensible baseline, it is difficult to know whether your super-fancy models are any good

**How do you build/find one of these models?**

You can search online for folks who have trained models to solve a problem similar to yours and/or you can start with one of the high-level examples in the fastai docs and run it against your data.  There are a bunch covering core vision, text, tabular, and collaborative filtering tasks right [here](https://docs.fast.ai/tutorial.html).



---
### Tensors

**What is a "Tensor"?**

Like a numpy array, but with GPU support.  The data it contains must be of the ***same type*** and must conform in ***rectangular shape***.

> Important: "try to avoid as much as possible writing loops, and replace them by commands that work directly on arrays or tensors"

Let's take a look ..

In [None]:
threes = (mnist_path/'train/3').ls().sorted()
len(threes), threes[0]

In [None]:
all_threes = [ tensor(Image.open(fp)) for fp in threes ]
len(all_threes), all_threes[0].shape

In [None]:
stacked_threes = torch.stack(all_threes).float()/255
stacked_threes.shape

Important information about a tensor includes its `shape`, `rank`, and `type`:

In [None]:
# shape = the length of each axis
print('shape: ', stacked_threes.shape)

# rank = the total number of axes
print('rank: ', stacked_threes.ndim)

# type = the datatype of its contents
print('type: ', stacked_threes.dtype)

Check out pp.145-148 to learn about "broadcasting", a critical piece to understanding how you can and should manipulate tensors or numpy arrays!
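
To make the "no loops" advice and broadcasting a bit more concrete, here is a minimal sketch. The tiny `images` tensor is made-up stand-in data (not MNIST), and the variable names are my own; the point is only that subtracting a `(3, 3)` tensor from a `(4, 3, 3)` tensor works without a Python loop.

```python
import torch

# Stand-in for a stack of images: 4 "images" of 3x3 pixels (hypothetical data).
images = torch.arange(36, dtype=torch.float32).reshape(4, 3, 3)

# The "average image" has shape (3, 3).
mean_image = images.mean(dim=0)

# Loop version: compute each image's mean absolute difference one at a time.
loop_dists = torch.stack([(img - mean_image).abs().mean() for img in images])

# Broadcast version: (4, 3, 3) - (3, 3) treats mean_image as if it were
# repeated along the first axis, with no Python loop and no actual copy.
broadcast_dists = (images - mean_image).abs().mean(dim=(-2, -1))

print(broadcast_dists.shape)
print(torch.allclose(loop_dists, broadcast_dists))
```

Both versions give the same four distances; the broadcast version is the one that will scale to thousands of images.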

---
### Stochastic Gradient Descent - How to train a model



Here are the steps:

1. **INITIALIZE** the weights (i.e., set the parameters to random values)

2. For each image, **PREDICT** whether it is a 3 or 7
 
3. Based on the predictions, calculate how good the model is by calculating its **LOSS** (small is good)

4. Calculate the **GRADIENT**, ***"which measures for each weight how changing the weight would change the loss"***

5. **STEP**, change all the weights based on the gradient

6. Starting at step 2, **REPEAT**

7. **STOP** when you don't want to train any longer or the model is good enough
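
The seven steps above can be sketched as a short PyTorch loop. This is a toy illustration under my own assumptions, not the book's code: the data is a made-up line (`y = 3x + 2`) rather than images of 3s and 7s, and `model`, `mse`, the learning rate, and the number of passes are all arbitrary choices for the sketch.

```python
import torch

torch.manual_seed(42)

# Hypothetical toy data: y = 3x + 2, standing in for real images and labels.
x = torch.linspace(-1, 1, 50)
y = 3 * x + 2

# 1. INITIALIZE the parameters to random values
params = torch.randn(2).requires_grad_()

def model(x, params):
    w, b = params
    return w * x + b

def mse(preds, targets):
    return ((preds - targets) ** 2).mean()

lr = 0.1
losses = []
for epoch in range(200):            # 6. REPEAT; 7. STOP after a fixed number of passes
    preds = model(x, params)        # 2. PREDICT
    loss = mse(preds, y)            # 3. LOSS
    loss.backward()                 # 4. GRADIENT
    with torch.no_grad():
        params -= params.grad * lr  # 5. STEP
    params.grad.zero_()             # reset, since backward() accumulates gradients
    losses.append(loss.item())

print(params.detach(), losses[0], losses[-1])
```

After training, `params` should land close to the `3` and `2` used to generate the data, and the last loss should be far smaller than the first.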

Below, we'll delve deeper into steps 4 and 5, since how they work is likely the most foreign. We'll do this by digging a bit more into the sample code beginning on p.150 ...

### Step 4: Calculating the gradients

> Important: "the gradients ***tell us how much we have to change each weight*** to make our model better ... allows us to more quickly calculate whether our loss will go up or down when we make those adjustments"


> Important: "The gradients ***tell us only the slope of our function***; they don't tell us exactly how far to adjust the parameters. But they do give us some idea of how far" (large slope = bigger adjustments needed whereas a small slope suggests we are close to the optimal value)


"The ***derivative*** of a function tells you how much a change in its parameters will change its result"

Remember: We are calculating a gradient for *EVERY* weight so we know how to adjust it to make our model better (i.e., lower the LOSS)

`requires_grad` tells PyTorch "that we want to calculate gradients with respect to that variable at that value"

In [None]:
def plot_function(f, tx=None, ty=None, title=None, min=-2, max=2, figsize=(6,4)):
    x = torch.linspace(min,max)
    fig,ax = plt.subplots(figsize=figsize)
    ax.plot(x,f(x))
    if tx is not None: ax.set_xlabel(tx)
    if ty is not None: ax.set_ylabel(ty)
    if title is not None: ax.set_title(title)

Here we pretend that the function below is our **loss function**.  Running a number (our **weight**) through it produces a result, an **activation**; in this case, that result is our **loss** (which, again, is a value telling us how good or bad our model is; smaller = better).

In [None]:
xt = tensor(-1.5).requires_grad_(); xt

In [None]:
def f(x): return x**2
loss = f(xt)

plot_function(f, 'x', 'x**2')
plt.scatter(xt.detach().numpy(), loss.detach().numpy(), color='red')
print('Loss: ', loss.item())

So if our parameter is `-1.5` we get a loss = `2.25`. Since the direction of our slope is downward (negative), by changing its value to be a bit more positive, we get closer to achieving our goal of *minimizing our loss*

In [None]:
xt = tensor(-1.).requires_grad_(); xt

loss = f(xt)

plot_function(f, 'x', 'x**2')
plt.scatter(xt.detach().numpy(), loss.detach().numpy(), color='red')
print('Loss: ', loss.item())

And yes, our loss has improved!  If the direction of our slope were upwards (positive), we would conversely want `x` to be smaller.

***BUT*** now ... imagine having to figure all this out for a million parameters.  Obviously, we wouldn't want to try doing this manually as we did before, and thanks to PyTorch, we don't have to :)

Remember that by calling the `requires_grad_()` method, we have told PyTorch to keep track of how to compute the gradients based on the other calculations we perform, like running the value through our loss function above.  Let's see what that looks like.

In [None]:
xt = tensor(-1.).requires_grad_(); 
print(xt)

loss = f(xt)
print(loss)

That `grad_fn=<PowBackward0>` in the output is the gradient function PyTorch will use to calculate the gradients when needed.  When we need them, we call the `backward` method.

In [None]:
loss.backward()
print(xt.grad)

And the calculated gradient is exactly what we expected, given that the derivative of `x**2` is `2x` ... `2*-1 = -2`.

Again, the gradient tells us ***the slope of our function***.  Here we have a negative/downward slope, so at the very least we know that increasing `x` will get us closer to the minimum.
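
The same `backward` call scales to many parameters at once. Here is a minimal sketch with three made-up weights instead of one (the values are arbitrary, and summing the per-weight losses into a single number is my own choice so that there is one scalar to call `backward` on):

```python
import torch

# Three hypothetical "weights" instead of one.
weights = torch.tensor([-1.5, -1.0, 2.0], requires_grad=True)

# Same pretend loss as before, summed so that we end up with a single number;
# backward() needs a scalar to start from.
loss = (weights ** 2).sum()
loss.backward()

# One gradient per weight. The derivative of x**2 is 2x, so we expect 2 * weights.
print(weights.grad)   # tensor([-3., -2.,  4.])
```

Each weight gets its own slope: the two negative weights should be nudged upward and the positive one downward.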

The question is now, **How far do we move in that direction?**

---
### Step 5: Change all the weights based on the gradient using a "Learning Rate"

The **learning rate** (or LR) is a number (usually a small number like 1e-3 or 0.1) that we multiply the gradient by to get a better parameter value.  For a given parameter/weight `w`, the calculation looks like this:

` w -= w.grad * lr`

Notice that we subtract `w.grad * lr` because we want to move in the direction opposite to the gradient (i.e., downhill).

> Important: We do this inside a `with torch.no_grad()` block so that PyTorch doesn't track the weight update itself as an operation it needs to compute gradients for

In [None]:
lr = 0.01

with torch.no_grad():
    xt -= xt.grad * lr

    print('New value for xt: ', xt)
    print('New loss: ', f(xt))

You can see the loss get smaller which is exactly what we want!

The above operation is also called the **optimization step**

See pp.156-157 for examples of what using a too small or too large LR might look like when training.  This could help you troubleshoot things if yours looks wonky.
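
To get a feel for this without the book in front of you, here is a small sketch in plain Python on the same pretend loss, `x**2`. The `descend` helper and the three learning rates are my own picks for illustration, and the derivative `2x` is written by hand rather than computed by PyTorch.

```python
def descend(lr, start=-1.5, steps=10):
    """Run `steps` gradient descent updates on f(x) = x**2 and return every x visited."""
    x = start
    path = [x]
    for _ in range(steps):
        grad = 2 * x        # derivative of x**2, written by hand
        x -= grad * lr      # the optimization step
        path.append(x)
    return path

for lr in (0.01, 0.1, 1.1):
    path = descend(lr)
    print(f"lr={lr}: final x = {path[-1]:.4f}, final loss = {path[-1] ** 2:.4f}")
```

With the smallest LR, `x` barely moves in ten steps; with the middle one, it gets close to the minimum at `0`; with the largest, each step overshoots the minimum by more than the last and the loss grows instead of shrinking.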

---
### Measuring distances

See pp.141-142.  There are two main ways to measure distances.

**L1 norm** (or mean absolute difference): Take the mean of the absolute value of differences

` l1_loss = (tensor_a - tensor_b).abs().mean()`

**L2 norm** (or root mean squared error, RMSE): Take the square root of the mean of the squared differences. The squaring of differences makes everything positive and the square root undoes the squaring.

> Important: "... the latter will penalize bigger mistakes more heavily than the former (and be more lenient with small mistakes)"

` l2_loss = ((tensor_a - tensor_b) ** 2).mean().sqrt()`
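
Here is a quick sketch comparing both measures on two tiny made-up tensors (the values are mine, chosen so that one element is a single "big mistake"). It also checks the hand-written versions against PyTorch's built-in `F.l1_loss` and `F.mse_loss`.

```python
import torch
import torch.nn.functional as F

# Two small hypothetical tensors; the last element is one "big mistake".
tensor_a = torch.tensor([1.0, 2.0, 3.0, 4.0])
tensor_b = torch.tensor([1.0, 2.0, 3.0, 8.0])

# Written out by hand, following the definitions above.
l1_manual = (tensor_a - tensor_b).abs().mean()
l2_manual = ((tensor_a - tensor_b) ** 2).mean().sqrt()

# PyTorch's built-in equivalents: l1_loss is the mean absolute difference,
# and mse_loss is the mean squared error, so RMSE is its square root.
l1_builtin = F.l1_loss(tensor_a, tensor_b)
l2_builtin = F.mse_loss(tensor_a, tensor_b).sqrt()

print(l1_manual, l1_builtin)   # both 1.0
print(l2_manual, l2_builtin)   # both 2.0
```

The single error of 4 is averaged down to 1.0 by the L1 norm but still shows up as 2.0 under the L2 norm, which is the "penalizes bigger mistakes more heavily" behavior in action.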






---
### Disinformation

> It is not necessarily about getting someone to believe something false, but rather often used to sow disharmony and uncertainty, and to get people to give up on seeking the truth. Receiving conflicting accounts can lead people to assume that they can never know whom or what to trust.

Disinformation will unfortunately be one of the greatest legacies of President Trump. A step backwards for American society. A culture that will back you if you tell them what they want to hear, even if you're a compulsive liar and base your statements on "gut feel" rather than facts and logic.

> While most of us like to think of ourselves as independent-minded, in reality we evolved to be influenced by others in our in-group, and in opposition to those in our out-group. Online discussions can influence our viewpoints, or alter the range of what we consider acceptable viewpoints. Humans are social animals, and as social animals, we are extremely influenced by the people around us. Increasingly, radicalization occurs in online environments; so influence is coming from people in the virtual space of online forums and social networks.

The biggest takeaway here is that I am not as independently minded as I think I am. Knowing thyself is perhaps the best defense against being swallowed up by disinformation. Limiting social media is another.

> Disinformation through autogenerated text is a particularly significant issue

As an NLP guy, this one scares me since part of my work is to summarize text. Knowing this, the first step I've taken is to let all business owners know the risk of text generation algorithms producing text that is false and/or not necessarily reflective of the inputs, as in the case of abstractive summarization.  The second step I took was to introduce human beings into the process, with a workflow that has them review at least the summaries most likely to be wrong before reports go out.



---
### What to do???

> You must assume that any personal data that Facebook or Android keeps are data that governments around the world will try to get or that thieves will try to steal.

Data use and storage are things you need to think about.

I think these are good questions to ask/answer in any project to ensure good outcomes:

> * Whose interests, desires, skills, experiences, and values have we simply assumed rather than
> actually consulted?
> * Who are all the stakeholders who will be directly affected by our product? How have their interests
> been protected? How do we know what their interests really are - have we asked?
> * Who/which groups and individuals will be indirectly affected in significant ways?
> * Who might use this product that we didn't expect to use it, or for purposes we didn't initially
> intend?

See pp.119-120 for a bunch of good questions to put into your practice!

> When everybody on a team has similar backgrounds, they are likely to have similar blind spots around ethical tasks.

> ... first come up with a process, definition, set of questions etc., which is designed to resolve a problem. Then try to come up with an example in which the apparent solution results in a proposal that no one would consider acceptable. This can then lead to further refinement of the solution.

Thinking about all these things may lead one to analysis paralysis or, even worse, complete apathy.  We need to start with something and be okay with criticism and refactoring. Additionally, we need to be thoughtful even in spot-on criticism of others' systems. I don't think most folks try to make something racist or misogynistic or whatever, so instead of calling them a "Hitler" on Twitter when we see something that looks to us like fascism, maybe a phone call and one-on-one chat is the better and more productive move.


---

## Resources

1. https://book.fast.ai - The book's website; it's updated regularly with new content and recommendations on everything from which GPUs to use to how to run things locally and in the cloud, etc...

2. https://forums.fast.ai/c/data-ethics/47 - Forum subcategory for all things "data ethics".
