---
title-block-banner: false
---

# Questions - Chapter 2 
::: {.callout-warning}
## Back to blog post
[fastai book chapter 2](../from_model_to_production_post.ipynb)
:::

::: {.callout-note}
## Links
- Source: [Fastbook Chapter 2 questionnaire solutions (wiki)](https://forums.fast.ai/t/fastbook-chapter-2-questionnaire-solutions-wiki/66392)
:::


<b>
Where do text models currently have major deficiency?
</b>

Currently, deep learning is not good at generating factually correct responses. We don't have a reliable way to, for instance, combine a knowledge base of medical information with a deep learning model for generating medically correct natural language responses. This can be very dangerous, as a layperson may not be able to evaluate the factual accuracy of the generated text.

<b>
What are possible negative societal implications of text generation models?
</b>

One societal concern is that context-appropriate, highly compelling responses on social media could be used at massive scale to spread disinformation, create unrest, and encourage conflict.

Models can also reinforce biases (such as gender or racial bias) present in the training data, creating a vicious cycle of biased outputs.

<b>
In situations where a model might make mistakes, and those mistakes could be harmful, what is a good alternative to automating a process?
</b>

One alternative is to use an entirely manual process, with your deep learning model running in parallel but not being used directly to drive any actions. The humans involved in the manual process should look at the deep learning output and check whether it makes sense.

<b>
What kind of tabular data is deep learning particularly good at?
</b>

Deep learning is great for analyzing time series and tabular data. It is particularly good at tabular data that includes columns containing natural language and high-cardinality categorical columns.

<b>
What's a key downside of directly using a deep learning model for recommendation systems?
</b>

Almost all machine learning approaches have the downside that they tell you only which products a particular user might like, rather than which recommendations would be helpful to that user. For example, if a user is already familiar with other books by the same author, recommending those books isn't helpful, even though the user bought one of the author's books. The same goes for recommending products the user has already purchased.

<b>
What are the steps of the Drivetrain Approach?
</b>

1. Define an objective
2. Understand the levers: what inputs can you control?
3. What data can you collect?
4. Model the levers in order to understand how they affect the objective.

The basic idea is to start by considering your objective, then think about what actions you can take to meet that objective and what data you have (or can acquire) that can help, and then build a model that you can use to determine the best actions to take to get the best results in terms of your objective.

<b>
Create an image recognition model using data you curate, and deploy it on the web.
</b>

Todo. Watch [Lesson 2](https://www.youtube.com/watch?v=BvHmRx14HQ8&t=4485s) for help.

<b>
What is DataLoaders?
</b>

- A DataLoader doesn't care about preparing data. It expects the data to be ready to go and only cares about how to load it (e.g. in parallel or in a single process) and how to feed it to the model in batches (i.e. the batch size).
- A DataLoaders is a thin wrapper for more than one DataLoader, typically one for training and one for validation.
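
The distinction can be illustrated with a minimal pure-Python sketch. `SimpleDataLoader` and `SimpleDataLoaders` are hypothetical names made up for this example and are not fastai's classes; the real ones also handle shuffling, parallel workers and collating items into tensors.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence


@dataclass
class SimpleDataLoader:
    """Hypothetical stand-in: feeds ready-made items to a model in batches."""
    items: Sequence
    batch_size: int

    def __iter__(self) -> Iterator[List]:
        # Step through the items batch_size at a time; the last batch may be smaller.
        for start in range(0, len(self.items), self.batch_size):
            yield list(self.items[start:start + self.batch_size])


@dataclass
class SimpleDataLoaders:
    """Hypothetical stand-in: a thin wrapper around a training and a validation loader."""
    train: SimpleDataLoader
    valid: SimpleDataLoader


dls = SimpleDataLoaders(
    train=SimpleDataLoader(items=list(range(8)), batch_size=3),
    valid=SimpleDataLoader(items=list(range(8, 10)), batch_size=3),
)
train_batches = list(dls.train)
valid_batches = list(dls.valid)
```

Here the wrapper adds no logic of its own: it only keeps the two loaders together so they can be passed around as one object.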

<b>
What four things do we need to tell fastai to create DataLoaders?
</b>

- what kinds of data we are working with
- how to get the list of items
- how to label these items
- how to create the validation set

<b>
What does the splitter parameter to DataBlock do?
</b>

The splitter parameter tells the fastai DataBlock how to split the data into a training and a validation set. For example, to create a random split with 80% training and 20% validation data, you could use splitter=RandomSplitter(valid_pct=0.2, seed=42). Fixing the random seed to 42 makes the split the same on every run.

<b>
How do we ensure a random split always gives the same validation set?
</b>

Computers normally don't generate truly random outputs. Instead, they use a pseudo-random number generator, which takes a random seed as input. With the same seed, we get the same results on every run. So by fixing the random seed, we can generate a random split that always gives the same training and validation set.
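
The effect of a fixed seed can be sketched with Python's standard library. The helper `random_split` below is a hypothetical illustration of the idea, not fastai's RandomSplitter implementation.

```python
import random
from typing import List, Tuple


def random_split(n_items: int, valid_pct: float = 0.2, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Return (train_idxs, valid_idxs) for n_items, reproducible for a given seed."""
    rng = random.Random(seed)   # private generator, seeded explicitly
    idxs = list(range(n_items))
    rng.shuffle(idxs)           # same seed -> same permutation
    n_valid = int(n_items * valid_pct)
    return idxs[n_valid:], idxs[:n_valid]


# Two independent calls with the same seed produce the same split.
train_a, valid_a = random_split(10)
train_b, valid_b = random_split(10)
```

Using a private `random.Random(seed)` instance instead of the global generator keeps the split reproducible even if other code draws random numbers in between.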

<b>
What letters are often used to signify the independent and dependent variables?
</b>

- x is independent 
    - An independent variable is the variable you manipulate or vary in an experimental study to explore its effects. It’s called “independent” because it’s not influenced by any other variables in the study
- y is dependent
    - A dependent variable is the variable that changes as a result of the independent variable manipulation. It’s the outcome you’re interested in measuring, and it “depends” on your independent variable.

<b>
What is the difference between crop, pad, and squish Resize() approaches? When might you choose one over the other?
</b>

The default of the fastai resize function is crop.

- Crop --> Resizes the image to fill a square of the requested size, using the full width or height. This can lose important details, for example key features that are cut out of the image.

- Pad --> Adds black pixels to reach the requested size. A padded image has a lot of empty space, which is wasted computation for our model and lowers the effective resolution of the part of the image we actually use.

- Squish --> Stretches or squishes the image to the requested size. This can distort the image into an unnatural shape, so the model learns that things look different from how they actually are.

The right resize method depends on the underlying images. If the relevant features are spread all over the image, cropping can cut out important ones. This loses information, so squishing or padding may be more useful.

Another, often better, method is RandomResizedCrop, which crops a randomly selected region of the image. Every epoch, the model sees a different part of each image and learns accordingly.
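
The trade-offs between the three methods come down to simple arithmetic. The following sketch uses a hypothetical helper `resize_plan` (not part of fastai) that only computes scale factors and the number of output pixels cropped or padded along the affected side; it does not touch any image data.

```python
from typing import Dict


def resize_plan(width: int, height: int, size: int, method: str) -> Dict[str, float]:
    """Describe how a width x height image is fitted into a size x size square."""
    if method == "crop":
        # Scale so the shorter side fills the square, then cut off the overflow.
        scale = size / min(width, height)
        return {"scale_x": scale, "scale_y": scale,
                "cropped_px": max(width, height) * scale - size, "padded_px": 0.0}
    if method == "pad":
        # Scale so the longer side fits the square, then fill the rest with padding.
        scale = size / max(width, height)
        return {"scale_x": scale, "scale_y": scale,
                "cropped_px": 0.0, "padded_px": size - min(width, height) * scale}
    if method == "squish":
        # Scale each axis independently; nothing is lost, but the aspect ratio changes.
        return {"scale_x": size / width, "scale_y": size / height,
                "cropped_px": 0.0, "padded_px": 0.0}
    raise ValueError(f"unknown method: {method}")


# A 400x200 landscape image resized to 100x100 with each method.
crop = resize_plan(400, 200, 100, "crop")
pad = resize_plan(400, 200, 100, "pad")
squish = resize_plan(400, 200, 100, "squish")
```

For this 400x200 example, crop throws away 100 of the 200 scaled pixels of width, pad fills 50 of the 100 output rows with padding, and squish scales the width by 0.25 but the height by 0.5.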

<b>
What is data augmentation? Why is it needed?
</b>

Data augmentation refers to creating random variations of our input data, such that they appear different but do not change the meaning of the data. Examples of common data augmentation techniques for images are rotation, flipping, perspective warping, brightness changes, and contrast changes. Data augmentation is useful for the model to better understand the basic concept of what an object is and how the objects of interest are represented in images. Therefore, data augmentation allows machine learning models to generalize better. This is especially important when it can be slow and expensive to label data.
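
As a rough illustration, the sketch below applies a random horizontal flip and a random brightness shift to a tiny grayscale image stored as nested lists. The `augment` function and the pixel values are made up for this example; in fastai the equivalent work is done by its own transforms on real image tensors.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values in 0..255


def augment(img: Image, rng: random.Random) -> Image:
    """Return a randomly flipped and brightness-shifted copy of img."""
    out = [row[:] for row in img]                 # copy so the original stays untouched
    if rng.random() < 0.5:
        out = [row[::-1] for row in out]          # horizontal flip
    shift = rng.randint(-20, 20)                  # brightness change
    return [[min(255, max(0, p + shift)) for p in row] for row in out]


rng = random.Random(0)
original = [[10, 50, 90], [20, 60, 100]]
# Each call yields a different variant of the same image; its label would stay the same.
variants = [augment(original, rng) for _ in range(4)]
```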

<b>
Provide an example of where the bear classification model might work poorly in production, due to structural or style differences in the training data.
</b>

- There is no bear in the image and the model has no output option for not_a_bear.
- Nighttime images are passed into the model.
- The images vary in resolution, for example low-resolution images.
- The bear is very far from or very near to the camera.
- The model is heavily biased towards one type of feature (e.g. color).

<b>
What is the difference between item_tfms and batch_tfms?
</b>

- item_tfms are transformations applied to a single data sample x on the CPU. Resize() is a common item transform because the images in a mini-batch fed to a CNN must all have the same dimensions. Assuming the images are RGB with 3 channels, using Resize() as item_tfms makes sure the images all have the same width and height.

- batch_tfms are applied to batched data samples (i.e. individual samples that have been collated into a mini-batch) on the GPU. They are faster and more efficient than item_tfms. A good example is the set of transforms provided by aug_transforms(), which contains several batch-level augmentations that help many models.
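
The order of the two steps can be sketched in plain Python. The functions `item_tfm` and `batch_tfm` below are hypothetical stand-ins that work on lists of numbers; in fastai the batch step would operate on GPU tensors instead.

```python
from typing import List


def item_tfm(sample: List[float], size: int) -> List[float]:
    """Per-item step: crop or zero-pad one sample to a fixed length."""
    return (sample + [0.0] * size)[:size]


def batch_tfm(batch: List[List[float]], scale: float) -> List[List[float]]:
    """Per-batch step: apply the same operation to every sample of the collated batch."""
    return [[x * scale for x in row] for row in batch]


samples = [[1.0, 2.0, 3.0, 4.0], [5.0], [6.0, 7.0]]   # items of different sizes
# Item transforms run first, so that all items can be collated into one batch.
batch = [item_tfm(s, size=2) for s in samples]
# Batch transforms then run once on the whole batch.
batch = batch_tfm(batch, scale=0.5)
```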

<b>
What is a confusion matrix?
</b>

In machine learning, and specifically in statistical classification, a confusion matrix, also known as an error matrix, is a specific table layout that visualizes the performance of an algorithm, typically a supervised learning one; in unsupervised learning it is usually called a matching matrix.

Each row of the matrix represents the instances in an actual class while each column represents the instances in a predicted class, or vice versa. The name stems from the fact that it makes it easy to see whether the system is confusing two classes.
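
A confusion matrix is straightforward to compute by hand. The sketch below assumes rows are actual classes and columns are predicted classes; the labels and predictions are made up for illustration.

```python
from typing import Dict, List, Sequence


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str], classes: List[str]) -> List[List[int]]:
    """Count predictions per (actual, predicted) pair; rows are actual, columns predicted."""
    index: Dict[str, int] = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


classes = ["black", "grizzly", "teddy"]
actual = ["black", "black", "grizzly", "grizzly", "teddy", "teddy"]
predicted = ["black", "grizzly", "grizzly", "grizzly", "teddy", "teddy"]
cm = confusion_matrix(actual, predicted, classes)
# cm == [[1, 1, 0],
#        [0, 2, 0],
#        [0, 0, 2]]
```

Correct predictions sit on the diagonal. The single off-diagonal entry in the first row shows one black bear that was mistaken for a grizzly.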

<b>
What does export save?
</b>

A model consists of two parts: the architecture and the trained parameters. The easiest way to save a model is to save both of these. In fastai you use the export method for this. It also saves the definition of how to create your DataLoaders. This is important because otherwise you would have to redefine how to transform your data in order to use your model in production.

<b>
What is it called when we use a model for getting predictions, instead of training?
</b>

When we use a model for getting predictions, instead of training, we call it inference.

<b>
What are IPython widgets?
</b>

IPython widgets are GUI components that bring together JavaScript and Python functionality in a web browser, and can be created and used within a Jupyter notebook.
One example of these interactive GUI components is an upload button, which can be created with the Python function widgets.FileUpload().

<b>
When might you want to use CPU for deployment? When might GPU be better?
</b>

GPUs are best for doing identical work in parallel. If you will be analyzing single pieces of data at a time (like a single image or a single sentence), then CPUs may be more cost-effective, especially since there is more market competition for CPU servers than for GPU servers. GPUs could be used if you collect user requests into a batch and perform inference on the whole batch at once. This may require users to wait for model predictions. Additionally, GPU inference brings other complexities, such as memory management and queuing of the batches.

<b>
What are the downsides of deploying your app to a server, instead of to a client (or edge) device such as a phone or PC?
</b>

One downside of using a server instead of a client is that you need a network connection, and there will be some latency each time the model is called. Another is that some users may be concerned about uploading sensitive data to your remote server. Managing the complexity and scaling the server can create additional overhead too, whereas if your model runs on the edge devices, each user is bringing their own compute resources, which makes it easier to scale with a growing number of users.

<b>
What are 3 examples of problems that could occur when rolling out a bear warning system in practice?
</b>

Examples of problems that could occur in practice:

- Handling night-time images
- Dealing with low-resolution images (e.g. some smartphone images)
- The model returning predictions too slowly to be useful

<b>
What is "out of domain data"?
</b>

Out-of-domain data is data that differs significantly in structure or style from the data used to train the model. For instance, if there were no black-and-white images in the training data, the model may do poorly on black-and-white images.

<b>
What is "domain shift"?
</b>

This is when the type of data changes gradually over time. For example, an insurance company is using a deep learning model as part of their pricing algorithm, but over time their customers will be different, with the original training data not being representative of current data, and the deep learning model being applied on effectively out-of-domain data.

<b>
What are the 3 steps in the deployment process?
</b>

1. Manual process
    - Run model in parallel
    - Humans check all predictions
2. Limited scope deployment
    - Careful human supervision
    - Time or geography limited
3. Gradual expansion
    - Good reporting systems needed
    - Consider what could go wrong