# 1. Prediction in One Dimension

1. What's wrong with the following class or custom module:<br>

```
import torch
import torch.nn as nn

torch.manual_seed(1)
class LR(nn.Module):
    def __init__(self, in_size, out_size):
        #super(LR, self).__init__()
        nn.Module.__init__(self)
        linear = nn.Linear(in_size, out_size)
        
    def forward(self, x):
        out = self.linear(x)
        return out
```

&#9744; "`super`" is not needed<br>
&#9744; "`nn.Module`" is not required<br>
&#9745; "`linear`" should be `self.linear`<br>
&#9744; The code will run fine


In [3]:
import torch
import torch.nn as nn

class LR(nn.Module):
    def __init__(self, in_size, out_size):
        #super(LR, self).__init__()
        nn.Module.__init__(self)
        linear = nn.Linear(in_size, out_size)
        
    def forward(self, x):
        out = self.linear(x)
        return out
    
# testing
model = LR(1, 1)
x = torch.tensor([[0.0]])
y_ = model(x)
print(y_)

AttributeError: 'LR' object has no attribute 'linear'
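
The error above goes away once the layer is stored as an attribute. The following is a minimal sketch of one possible fix, assuming that storing the layer on `self` is the only intended change: assigning `nn.Linear` to `self.linear` registers it as a submodule, so `forward` can find it and its weight and bias show up in `model.parameters()`.

```python
import torch
import torch.nn as nn

class LR(nn.Module):
    def __init__(self, in_size, out_size):
        super().__init__()
        # Assigning to self registers the layer as a submodule of LR
        self.linear = nn.Linear(in_size, out_size)

    def forward(self, x):
        return self.linear(x)

# testing
model = LR(1, 1)
x = torch.tensor([[0.0]])
y_ = model(x)
print(y_.shape)  # torch.Size([1, 1])
```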

2. Consider the following lines of code. How many parameters does the object `model` have?<br>
`from torch.nn import Linear`<br>
`model=Linear(in_features=1,out_features=1)`<br>

&#9744; 1<br>
&#9745; 2<br>
&#9744; 3<br>
&#9744; None of the above

In [4]:
from torch.nn import Linear 
model = Linear(in_features=1, out_features=1)
print(list(model.parameters()))

[Parameter containing:
tensor([[0.6337]], requires_grad=True), Parameter containing:
tensor([0.6783], requires_grad=True)]
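
As a small complementary sketch, the parameters can also be listed by name, which makes it easier to see that the two entries are the weight (slope) and the bias. The variable names `num_tensors` and `num_scalars` are illustrative only.

```python
from torch.nn import Linear

model = Linear(in_features=1, out_features=1)

# Each parameter tensor has a name; for Linear these are "weight" and "bias"
for name, p in model.named_parameters():
    print(name, tuple(p.shape))

num_tensors = len(list(model.parameters()))                # parameter tensors
num_scalars = sum(p.numel() for p in model.parameters())   # individual values
```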


# 2. Linear Regression Training

1. For linear regression, what is the assumption about the noise?<br>
&#9745; It is Gaussian noise<br>
&#9744; It is Laplacian noise<br>
&#9744; It has zero variance

2. We obtain model parameters via:<br>
&#9744; Testing<br>
&#9744; Prediction<br>
&#9745; Training

# 3. Loss

1. The loss is a function of your:<br>
&#9744; Data<br>
&#9744; Noise<br>
&#9745; Model Parameters<br>

2. The following is an example of:<br>
 <img src="images/quiz_loss.png" width="30%"/>

&#9744; "`w`" space<br>
&#9744; Data space<br>
&#9745; Parameter space

# 4. Gradient Descent

1. In gradient descent, what happens when you select a learning rate that is too large?<br>
&#9745; You may miss the minimum and your loss will start increasing<br>
&#9744; It may take too long to converge to a minimum<br>
&#9744; The loss function will become concave

2. How do you select an initial parameter value for the first iteration of gradient descent?<br>
&#9744; Set it to 100<br>
&#9744; It should always be 0<br>
&#9745; Randomly
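
To illustrate the learning-rate question, here is a minimal plain-Python sketch. It assumes a toy loss `l(w) = w**2` (not taken from the course material) and a hypothetical helper `gradient_descent`. With a small learning rate the loss shrinks at every step; with a rate that is too large each update overshoots the minimum and the loss grows.

```python
def gradient_descent(lr, w0=-10.0, steps=5):
    """Minimise the toy loss l(w) = w**2 and return the loss after each step."""
    w = w0
    losses = [w ** 2]
    for _ in range(steps):
        grad = 2 * w           # dl/dw for l(w) = w**2
        w = w - lr * grad      # gradient descent update
        losses.append(w ** 2)
    return losses

small = gradient_descent(lr=0.1)   # loss decreases every step
large = gradient_descent(lr=1.1)   # loss increases every step

print(small)
print(large)
```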

# 5. Cost

1. For linear regression, what is true about the cost? (select all that apply)<br>
&#9745; It is the sum or average of loss<br>
&#9745; It is a measure of how well your model can predict the data<br>
&#9745; It is a function of the slope and bias


2. True or false: gradient descent is sometimes referred to as batch gradient descent.<br>
&#9744; False<br>
&#9745; True 
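
The relationship between loss and cost can be sketched in a few lines of plain Python. The data and the function names `loss` and `cost` below are assumptions for illustration: the loss is computed per sample, and the cost averages it over the whole dataset, so it depends on the slope `w` and the bias `b`.

```python
def loss(w, b, x, y):
    """Squared error for a single sample."""
    return (w * x + b - y) ** 2

def cost(w, b, X, Y):
    """Average of the per-sample losses over the dataset."""
    return sum(loss(w, b, x, y) for x, y in zip(X, Y)) / len(X)

# Assumed toy data generated by y = 2x
X = [1.0, 2.0, 3.0]
Y = [2.0, 4.0, 6.0]

print(cost(2.0, 0.0, X, Y))  # 0.0, the line fits the data exactly
print(cost(1.0, 0.0, X, Y))  # (1 + 4 + 9) / 3, a worse fit gives a larger cost
```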

# 6. Training Parameters in PyTorch the Hard Way

1. Your loss is a function of `w`. What method will calculate or accumulate gradients of your loss?<br>
&#9744; w.grad<br>
&#9745; loss.backward()<br>
&#9744; loss.grad

2. Consider the derivative of the loss with respect to `w`, where `l` represents the value of the loss. How do you obtain the equivalent value in PyTorch?<br>

$\frac{dl(w=-10)}{dw}$

&#9744; loss<br>
&#9744; loss.backward()<br>
&#9744; loss.grad<br>
&#9745; w.grad
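
The two answers in this section can be checked with a short sketch. The single data point (`x = 1`, `y = -3`) and the squared-error loss are assumptions chosen to keep the arithmetic simple: `loss.backward()` computes the derivative, and `w.grad` holds its value at `w = -10`.

```python
import torch

w = torch.tensor(-10.0, requires_grad=True)
x = torch.tensor(1.0)    # assumed sample
y = torch.tensor(-3.0)   # assumed target

loss = (y - w * x) ** 2  # l(w) = (y - w*x)^2
loss.backward()          # computes dl/dw and stores it in w.grad

# dl/dw = -2 * x * (y - w*x) = -2 * 1 * 7 = -14
print(w.grad)  # tensor(-14.)
```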

# 7. Training with Slope and Bias

1. Your loss is a function of `w` and `b`. What method will calculate or accumulate gradients of your loss?<br>
&#9744; w.grad<br>
&#9745; loss.backward()<br>
&#9744; b.grad

2. The loss is a function of `w` and `b`. What is wrong with the following lines of code?

`w.data = w.data - lr*w.grad.data`<br>
`b.data = b.data - lr*b.grad.data`<br>
`loss.backward()`<br>

&#9744; `b.data` is not an attribute<br>
&#9744; `w.data` is not an attribute<br>
&#9745; You need to call `loss.backward()` before you have access to the gradient of `w` and `b`
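
A sketch of one update step in the correct order, using assumed toy data and starting values: the gradients only exist after `loss.backward()`, so the parameter updates must come afterwards. The gradients are then zeroed, because PyTorch accumulates them across calls.

```python
import torch

# Assumed toy data following y = 2x
X = torch.tensor([1.0, 2.0])
Y = torch.tensor([2.0, 4.0])

w = torch.tensor(0.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)
lr = 0.1

yhat = w * X + b
loss = torch.mean((yhat - Y) ** 2)

loss.backward()                      # 1. compute gradients first

w.data = w.data - lr * w.grad.data   # 2. then update the parameters
b.data = b.data - lr * b.grad.data

w.grad.data.zero_()                  # 3. clear gradients for the next iteration
b.grad.data.zero_()

print(w.item(), b.item())
```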

# 8. Stochastic Gradient Descent

1. How many samples at a time do you use for stochastic gradient descent?<br>
Answer: 1

2. What is correct about stochastic gradient descent? (select all that apply)<br>

&#9744; The loss must be linear<br>
&#9745; The loss may exhibit sudden increases<br>
&#9745; It's an approximation of batch gradient descent<br>
&#9744; It always works better than batch gradient descent 
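
A minimal plain-Python sketch of stochastic gradient descent, with assumed noisy data and a hypothetical helper `total_cost`. Each update uses a single sample, so it only approximates the gradient of the full cost; the cost recorded in `history` after each update is therefore not guaranteed to decrease monotonically.

```python
# Assumed noisy samples, roughly following y = 3x
X = [-2.0, -1.0, 0.0, 1.0, 2.0]
Y = [-6.5, -1.0, 0.2, 4.5, 5.0]

def total_cost(w, b):
    """Mean squared error over the whole dataset."""
    return sum((w * x + b - y) ** 2 for x, y in zip(X, Y)) / len(X)

w, b, lr = 0.0, 0.0, 0.1
history = []

for epoch in range(3):
    for x, y in zip(X, Y):           # one sample per update
        error = w * x + b - y
        w -= lr * 2 * error * x      # gradient of this sample's loss w.r.t. w
        b -= lr * 2 * error          # gradient of this sample's loss w.r.t. b
        history.append(total_cost(w, b))

print(history)
```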

# 9. Mini-Batch Gradient Descent

1. You have 100 samples of data and your batch size is 25. How many iterations will it take to go through 1 epoch?<br>
Answer: 4

2. Consider the dataset class `Data()`. How would you create a data loader object `trainloader` with a batch size of 3?

&#9744; `data_set=Data(batch_size=3)`<br>
`trainloader=DataLoader(dataset=data_set)`<br><br>
&#9745; `data_set=Data()`<br>
`trainloader=DataLoader(dataset=data_set,batch_size=3)`<br><br>
&#9744; `trainloader=Data(batch_size=3)`
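
Both questions in this section can be checked with a short sketch. The `Data` class below is a hypothetical stand-in (the original class is not shown here) holding 100 samples of an assumed line `y = 2x + 1`; `loader_25` is an illustrative name.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class Data(Dataset):
    """Hypothetical dataset: 100 samples of y = 2x + 1 (assumed)."""
    def __init__(self, n_samples=100):
        self.x = torch.linspace(-3, 3, n_samples).view(-1, 1)
        self.y = 2 * self.x + 1
        self.len = n_samples

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return self.len

data_set = Data()

# The batch size is an argument of the DataLoader, not of the dataset
trainloader = DataLoader(dataset=data_set, batch_size=3)
x_batch, y_batch = next(iter(trainloader))
print(x_batch.shape)   # torch.Size([3, 1])

# 100 samples with a batch size of 25 gives 4 iterations per epoch
loader_25 = DataLoader(dataset=data_set, batch_size=25)
print(len(loader_25))  # 4
```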

# 10. PyTorch Way

1. What does the following line of code do?<br>
`optimizer.step()`<br>

&#9744; Clears the gradient<br>
&#9744; Computes the gradient of the loss with respect to all the learnable parameters<br>
&#9744; Makes a prediction<br>
&#9745; Makes an update to its parameters

2. What's missing from the following code?<br>

`yhat=model(x)`<br>
`loss=criterion(yhat,y)`<br>
`loss.backward()`<br>
`optimizer.step()`<br>
 
&#9744; The prediction<br>
&#9744; The calculation of the loss<br>
&#9745; Clearing the gradient<br>
&#9744; The backward pass

3. What's wrong with the following lines of code?<br>

`optimizer = optim.SGD(model.parameters(), lr = 0.01)`<br>
`model=linear_regression(1,1)`
 
&#9745; The model object has not been created. As such, the argument that specifies what Tensors should be optimized does not exist<br>
&#9744; There is no loss function<br>
&#9744; You have to clear the gradient
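
The three answers above fit together in one training loop. This is a minimal sketch with assumed synthetic data, using `nn.Linear` in place of the `linear_regression` class referenced in the question: the model is created before the optimizer, and the gradient is cleared with `optimizer.zero_grad()` before every backward pass.

```python
import torch
import torch.nn as nn
from torch import optim

torch.manual_seed(1)

# Assumed synthetic data on the line y = x - 1
x = torch.arange(-3, 3, 0.1).view(-1, 1)
y = 1 * x - 1

model = nn.Linear(1, 1)                              # create the model first
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)   # then hand its parameters to the optimizer

costs = []
for epoch in range(20):
    yhat = model(x)              # make a prediction
    loss = criterion(yhat, y)    # calculate the loss
    optimizer.zero_grad()        # clear gradients left over from the previous iteration
    loss.backward()              # compute gradients of the loss w.r.t. the parameters
    optimizer.step()             # update the parameters
    costs.append(loss.item())

print(costs[0], costs[-1])
```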

# 11. Training and Validation

1. Training data is used to train the model; validation data is used to obtain what?<br>

&#9745; Hyperparameters<br>
&#9744; A test of how well the model performs in the real world<br>
&#9744; The reduced model variance

2. For linear regression, what are the hyperparameters? (select all that apply)<br>

&#9745; Batch size<br>
&#9744; Slope<br>
&#9744; Bias<br>
&#9745; Learning rate
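
A sketch of how validation data might be used to pick a hyperparameter, in plain Python. The data, the candidate learning rates, and the helpers `train` and `cost` are all assumptions for illustration: the slope and bias are learned on the training data for each candidate, and the learning rate with the lowest validation cost is kept.

```python
def cost(w, b, X, Y):
    """Mean squared error of the line w*x + b on the given data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(X, Y)) / len(X)

def train(X, Y, lr, epochs=20):
    """Batch gradient descent on the training data; returns the learned (w, b)."""
    w, b = 0.0, 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(X, Y)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(X, Y)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Assumed noisy samples, roughly following y = 3x + 1
X_train = [-2.0, -1.0, 0.0, 1.0, 2.0]
Y_train = [-5.1, -1.9, 1.2, 3.8, 7.1]
X_val = [-1.5, 0.5, 1.5]
Y_val = [-3.4, 2.6, 5.4]

candidates = [0.0001, 0.01, 0.1]   # hyperparameter values to compare
val_costs = {}
for lr in candidates:
    w, b = train(X_train, Y_train, lr)        # parameters come from training
    val_costs[lr] = cost(w, b, X_val, Y_val)  # hyperparameter is judged on validation data

best_lr = min(val_costs, key=val_costs.get)
print(best_lr, val_costs)
```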

# 12. Early Stopping

1. What does the following line of code do?<br>

`torch.save(model.state_dict(), 'best_model.pt')`

&#9745; Saves the parameters of the model object so that you can use them later<br>
&#9744; Saves your data<br>
&#9744; Loads your model parameters<br>
&#9744; Saves a PyTorch tensor as a .csv

2. Early stopping uses the following to determine when to save the model parameters:<br>
    
&#9744; Cost on the test data<br>
&#9744; Number of iterations<br>
&#9745; Cost on the validation data
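
The two ideas in this section can be combined in a short sketch. The synthetic data, the learning rate, and the temporary checkpoint location are assumptions: after each epoch the cost on the validation data is computed, and the model's `state_dict` is saved whenever that cost improves. The saved parameters can later be restored with `load_state_dict`.

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed synthetic data: noisy samples of y = -3x + 1
x_train = torch.arange(-3, 3, 0.1).view(-1, 1)
y_train = -3 * x_train + 1 + 0.1 * torch.randn(x_train.size())
x_val = torch.arange(-3, 3, 0.25).view(-1, 1)
y_val = -3 * x_val + 1 + 0.1 * torch.randn(x_val.size())

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

# Written to a temporary directory here to avoid cluttering the working directory
checkpoint_path = os.path.join(tempfile.mkdtemp(), "best_model.pt")
best_val_cost = float("inf")

for epoch in range(50):
    optimizer.zero_grad()
    loss = criterion(model(x_train), y_train)
    loss.backward()
    optimizer.step()

    with torch.no_grad():
        val_cost = criterion(model(x_val), y_val).item()

    # Save the parameters only when the validation cost improves
    if val_cost < best_val_cost:
        best_val_cost = val_cost
        torch.save(model.state_dict(), checkpoint_path)

# Restore the best parameters into a fresh model
best_model = nn.Linear(1, 1)
best_model.load_state_dict(torch.load(checkpoint_path))
```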
