# Questionnaire Chapter 2: From Model to Production

- [Fastbook Chapter 2 Questionnaire Forum](https://forums.fast.ai/t/fastbook-chapter-2-questionnaire-solutions-wiki/66392)

1. Provide an example of where the bear classification model might work poorly in production, due to structural or style differences in the training data.

> 1. Working with video data instead of images.
> 2. Handling nighttime images.
> 3. Low-resolution camera images.
> 4. Ensuring results are returned fast enough to be useful in practice.
> 5. Recognizing bears in positions rarely seen in photos.

2. Where do text models currently have a major deficiency?

> Text models can generate context-appropriate text (such as replies, or text imitating an author's style). However, they still struggle to produce *correct* responses. Even when given factual information (such as a knowledge base), it is still hard to generate responses that use this information to be factually correct, even though the text can seem very compelling. This can be very dangerous, as a layperson may not be able to evaluate the factual accuracy of the generated text.

3. What are possible negative societal implications of text generation models?

> Spreading disinformation on a massive scale ("fake news") and encouraging conflict.

4. In situations where a model might make mistakes, and those mistakes could be harmful, what is a good alternative to automating a process?

> Have a human being review the model's predictions to evaluate the results and determine the best next step.
>
> For example, a machine learning model for identifying strokes in CT scans can flag high-priority cases for expedited review, while all other cases are still sent to radiologists as usual. Models can also augment a medical professional's abilities, reducing risk while still improving the efficiency of the workflow; for example, deep learning models can provide useful measurements for radiologists or pathologists.


5. What kind of tabular data is deep learning particularly good at?

> Deep learning has been making great improvements on time series data. Moreover, deep learning is very useful for increasing the variety of columns that can be included in tabular data, such as columns containing natural language (book titles, reviews, etc.) and high-cardinality categorical columns (i.e., columns with a large number of discrete values, such as zip code or product ID).
>
> However, deep learning models generally take longer to train than simpler models like random forests or gradient boosting machines, although this is changing thanks to new libraries and improved GPUs.

6. What's a key downside of directly using a deep learning model for recommendation systems?

> Deep learning models only tell us which products a particular user might like, rather than which recommendations would actually be helpful for that user. Many such recommendations might not be helpful, for instance if the user is already familiar with the products.

7. What are the steps of the Drivetrain Approach?

> The Drivetrain Approach is a framework for making sure that your modeling work is useful in practice. It consists of the following steps:
> 1. __*Defined objective:*__ what outcome am I trying to achieve? (e.g., Google's objective is to "show the most relevant search results")
> 2. __*Levers:*__ what inputs can we control? (e.g., in Google's case, the ranking of the search results)
> 3. __*Data:*__ what data can we collect? (e.g., Google realized that it could collect information about which pages linked to which other pages, and used this for ranking)
> 4. __*Model:*__ how do the levers influence the objective? (the objective, levers, available data, and additional data we will collect determine the models we can build)

8. How do the steps of the Drivetrain Approach map to a recommendation system?

> The __*objective*__ of a recommendation engine is to drive sales by surprising and delighting the customer with recommendations of items they would not have purchased without the recommendation. The __*lever*__ is the ranking of the recommendations. New __*data*__ must be collected to generate recommendations that will __*cause new sales*__. We can build two __*models*__ for purchase probabilities, conditional on seeing or not seeing a recommendation (A/B testing). The difference between these two probabilities is a utility function for a given recommendation to a customer.
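
The utility described above can be sketched in a few lines of Python. This is an illustration under assumptions, not fastai code: it assumes we already have A/B-test counts per item (purchases among customers who were and were not shown the recommendation), and the `ABCounts` class, the helper functions, and the numbers are all hypothetical. Items are ranked by the difference between the two purchase rates rather than by the raw purchase rate.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ABCounts:
    """Hypothetical A/B-test counts for one item."""
    shown_buys: int    # purchases among customers who saw the recommendation
    shown_total: int   # customers who saw the recommendation
    hidden_buys: int   # purchases among customers who did not see it
    hidden_total: int  # customers who did not see it


def recommendation_utility(c: ABCounts) -> float:
    """Estimated P(buy | recommended) - P(buy | not recommended)."""
    return c.shown_buys / c.shown_total - c.hidden_buys / c.hidden_total


def rank_by_utility(counts: Dict[str, ABCounts]) -> List[str]:
    """Order items by how much recommending them changes purchases (highest first)."""
    return sorted(counts, key=lambda item: recommendation_utility(counts[item]), reverse=True)


# Made-up numbers: the popular book sells almost as well without a recommendation,
# while the niche book mostly sells only when it is recommended.
counts = {
    "popular_book": ABCounts(shown_buys=300, shown_total=1000, hidden_buys=290, hidden_total=1000),
    "niche_book": ABCounts(shown_buys=80, shown_total=1000, hidden_buys=10, hidden_total=1000),
}
print(rank_by_utility(counts))  # ['niche_book', 'popular_book']
```

Ranking by raw purchase rate would put `popular_book` first (30% vs. 8%), but that book sells almost as well without a recommendation; ranking by the difference puts `niche_book` first, which is what the objective asks for.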

9. Create an image recognition model using data you curate, and deploy it on the web.

> To be done by the reader. 

10. What is `DataLoaders`?

> It's a class that stores whatever `DataLoader` objects you pass to it and makes them available as `train` and `valid`.
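
To make the idea concrete, here is a stdlib-only toy version of such a container. The names `ToyDataLoaders` and `batches` are hypothetical, and fastai's real `DataLoaders` does considerably more; this sketch only keeps the loaders it is given and exposes the first two as `train` and `valid`.

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence, bs: int) -> Iterator[list]:
    """Yield consecutive mini-batches of size bs (the last one may be smaller)."""
    for start in range(0, len(items), bs):
        yield list(items[start:start + bs])


class ToyDataLoaders:
    """Toy version of the DataLoaders idea: store whatever loaders are passed in
    and expose the first two as `train` and `valid`."""

    def __init__(self, *loaders: List[list]):
        self.loaders = loaders

    def __getitem__(self, i: int) -> List[list]:
        return self.loaders[i]

    @property
    def train(self) -> List[list]:
        return self[0]

    @property
    def valid(self) -> List[list]:
        return self[1]


# Each "loader" here is just a precomputed list of mini-batches.
dls = ToyDataLoaders(list(batches(range(10), bs=4)), list(batches(range(10, 13), bs=4)))
print(dls.train)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(dls.valid)  # [[10, 11, 12]]
```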

11. What four things do we need to tell fastai to create `DataLoaders`?

> 1. What kinds of data we are working with
> 2. How to get the list of items
> 3. How to label these items
> 4. How to create the validation set
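
In fastai, these four answers correspond to the `blocks`, `get_items`, `get_y`, and `splitter` arguments of `DataBlock`. The plain-Python sketch below only imitates that structure and is not fastai's API (`build_datasets`, `parent_folder_label`, and `every_nth_splitter` are hypothetical names): items are file paths, labels come from the parent folder name (similar to what fastai's `parent_label` does), and the kinds of data are implicit (path strings in, category strings out).

```python
from pathlib import PurePosixPath
from typing import Callable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, validation indices)


def parent_folder_label(path: str) -> str:
    """How to label an item: use the name of the folder that contains it."""
    return PurePosixPath(path).parent.name


def every_nth_splitter(n: int) -> Callable[[Sequence], Split]:
    """How to create the validation set: a hypothetical splitter that sends
    every n-th item to validation."""
    def split(items: Sequence) -> Split:
        valid = [i for i in range(len(items)) if i % n == 0]
        train = [i for i in range(len(items)) if i % n != 0]
        return train, valid
    return split


def build_datasets(
    get_items: Callable[[object], List[str]],
    get_y: Callable[[str], str],
    splitter: Callable[[Sequence], Split],
    source: object,
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Toy DataBlock-like pipeline. The kinds of data are implicit here:
    inputs are image paths (str) and targets are category names (str)."""
    items = get_items(source)               # how to get the list of items
    train_idx, valid_idx = splitter(items)  # how to create the validation set
    train = [(items[i], get_y(items[i])) for i in train_idx]  # how to label items
    valid = [(items[i], get_y(items[i])) for i in valid_idx]
    return train, valid


files = ["bears/grizzly/1.jpg", "bears/black/2.jpg", "bears/teddy/3.jpg", "bears/grizzly/4.jpg"]
train, valid = build_datasets(list, parent_folder_label, every_nth_splitter(2), files)
print(valid)  # [('bears/grizzly/1.jpg', 'grizzly'), ('bears/teddy/3.jpg', 'teddy')]
```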

12. What does the `splitter` parameter to `DataBlock` do?

> It splits the dataset into subsets, usually train and validation sets. For example, to randomly split the data, you can use fastai’s predefined `RandomSplitter` class, providing it with the proportion of the data used for validation.

13. How do we ensure a random split always gives the same validation set?

> We set a fixed `seed` value for the pseudo-random number generator. The generator then produces the same sequence of "random" numbers on every run, so the random split, and therefore the validation set, is the same every time.
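
In fastai, both ideas appear in one call such as `RandomSplitter(valid_pct=0.2, seed=42)`. The stdlib sketch below is not fastai code (`random_split` is a hypothetical name): it shuffles item indices with its own seeded generator and holds out a fraction for validation, so calling it twice with the same seed returns the same split.

```python
import random
from typing import List, Optional, Tuple


def random_split(n_items: int, valid_pct: float = 0.2,
                 seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Shuffle item indices and hold out valid_pct of them for validation.

    A private random.Random(seed) generator produces the same shuffle for the
    same seed, so the validation set is identical on every run. With seed=None
    the generator is seeded from the OS, so the split changes between runs.
    """
    rng = random.Random(seed)
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    cut = int(valid_pct * n_items)
    return idxs[cut:], idxs[:cut]  # (train indices, validation indices)


train_a, valid_a = random_split(100, valid_pct=0.2, seed=42)
train_b, valid_b = random_split(100, valid_pct=0.2, seed=42)
print(valid_a == valid_b)  # True: same seed, same validation set
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the split reproducible even if other code also draws random numbers.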

14. What letters are often used to signify the independent and dependent variables?

> __x__ is independent (data used to make predictions).
> 
> __y__ is dependent (target).

15. What's the difference between the crop, pad, and squish resize approaches? When might you choose one over the others?

> default resized (cropped) grizzly image:
>
> <img src="https://miro.medium.com/max/1400/1*nOYSIcIZhLraYrjBJj_zcQ.png" width=400 height=400 />
>
> `crop` is the default `Resize()` method, and it crops the images to fit a square shape of the size requested, using the full width or height. This can result in losing some important details. For instance, if we were trying to recognize the breed of dog or cat, we may end up cropping out a key part of the body or the face necessary to distinguish between similar breeds.
>
> `pad` is an alternative `Resize()` method, which pads the matrix of the image's pixels with zeros (which show as black when viewing the images). If we pad the images, then we have a whole lot of empty space, which is just wasted computation for our model and results in a lower effective resolution for the part of the image we actually use.
>
> padded grizzly image:
>
> <img src="https://miro.medium.com/max/1400/1*1DJgD5V0cNo5_HcOZZ4jAg.png" width=400 height=400 />
>
> `squish` is another alternative `Resize()` method, which can either squish or stretch the image. This can cause the image to take on an unrealistic shape, leading to a model that learns that things look different from how they actually are, which we would expect to result in lower accuracy.
>
> squished grizzly image:
> 
> <img src="https://miro.medium.com/max/1400/1*eHzBqwEVX8aLMogg8D9oHQ.png" width=400 height=400 />
>
> Which resizing method to use therefore depends on the underlying problem and dataset. For example, if the important features span the whole image, so that cropping would lose information, squishing or padding may be more useful.
>
> A better alternative in many cases is `RandomResizedCrop`, which crops a randomly selected region of the image and resizes it to the target size. As a result, the model sees a different part of each image every epoch and learns accordingly.
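
The three methods differ mainly in geometry, which a small calculation makes visible. The sketch below is a simplified model written for illustration, not fastai's implementation, and all function names are hypothetical: for a `w` by `h` image and a square target `size`, it reports the scale factors and how many pixels are cropped or padded. `random_resized_crop_box` imitates the idea behind `RandomResizedCrop` by choosing a random square region covering a random fraction of the image area (the real transform also varies the aspect ratio).

```python
import math
import random
from typing import Dict, Optional, Tuple


def crop_plan(w: int, h: int, size: int) -> Dict[str, Tuple]:
    """Crop: scale so the SHORTER side equals `size`, then cut off the excess of the longer side."""
    scale = size / min(w, h)
    new_w, new_h = round(w * scale), round(h * scale)
    return {"scale": (scale, scale), "cropped_px": (new_w - size, new_h - size), "padded_px": (0, 0)}


def pad_plan(w: int, h: int, size: int) -> Dict[str, Tuple]:
    """Pad: scale so the LONGER side equals `size`, then fill the shorter side with zeros."""
    scale = size / max(w, h)
    new_w, new_h = round(w * scale), round(h * scale)
    return {"scale": (scale, scale), "cropped_px": (0, 0), "padded_px": (size - new_w, size - new_h)}


def squish_plan(w: int, h: int, size: int) -> Dict[str, Tuple]:
    """Squish: scale each axis independently; nothing is cut or padded, but the aspect ratio changes."""
    return {"scale": (size / w, size / h), "cropped_px": (0, 0), "padded_px": (0, 0)}


def random_resized_crop_box(w: int, h: int, min_scale: float = 0.3,
                            rng: Optional[random.Random] = None) -> Tuple[int, int, int, int]:
    """Pick a random square region aiming to cover at least `min_scale` of the image
    area (capped by the shorter side). Simplified: the real transform also varies
    the aspect ratio. The box (left, top, width, height) would then be resized to
    the target size."""
    rng = rng or random.Random()
    area_frac = rng.uniform(min_scale, 1.0)
    side = min(int(math.sqrt(area_frac * w * h)), w, h)
    left = rng.randint(0, w - side)
    top = rng.randint(0, h - side)
    return left, top, side, side


# A 400x300 (landscape) photo resized to 128x128:
print(crop_plan(400, 300, 128)["cropped_px"])  # (43, 0): 43 columns are cut off
print(pad_plan(400, 300, 128)["padded_px"])    # (0, 32): 32 rows of black padding
print(squish_plan(400, 300, 128)["scale"])     # different x and y scale factors
```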

16. What is data augmentation? Why is it needed?

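> __*Data augmentation*__ refers to creating random variations of the input data, such that they appear different but do not change the meaning of the data. Common techniques include rotation, flipping, perspective warping, brightness changes, and contrast changes.
>
> It is needed because it effectively increases the variety of training examples the model sees, which helps it learn to recognize objects regardless of orientation, lighting, and similar variations, and so makes it generalize better to new images.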
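
A minimal sketch of the idea using only the standard library: the image is a nested list of grayscale pixel values, and the functions are hypothetical stand-ins for real augmentation transforms (only a horizontal flip and a brightness change are shown). The pixels change, but the label passed through `augment` does not.

```python
import random
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of 0..255 pixel values


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def adjust_brightness(img: Image, delta: int) -> Image:
    """Add `delta` to every pixel, clamped to the valid 0..255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]


def augment(img: Image, label: str, rng: random.Random) -> Tuple[Image, str]:
    """Randomly flip and brighten/darken the image; the label is returned unchanged."""
    if rng.random() < 0.5:
        img = hflip(img)
    img = adjust_brightness(img, rng.randint(-40, 40))
    return img, label


rng = random.Random(0)
original = [[10, 200], [30, 250]]
for _ in range(3):
    print(augment(original, "grizzly", rng))  # different pixels each time, same label
```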

17. What is the difference between `item_tfms` and `batch_tfms`?

> In deep learning, we usually feed multiple images to the model at a time, instead of a single image (this is called a *mini-batch*).
>
> `item_tfms` are transformations applied to each individual data sample on the CPU. `Resize()` is a common item transform because all input images in a mini-batch fed to a CNN must have the same dimensions. Assuming the images are RGB with 3 channels, `Resize()` as an `item_tfms` makes sure they also all have the same width and height.
>
> `item_tfms=Resize(128)`
>
> `batch_tfms` are applied to batched data samples (i.e., individual samples that have been collated into a mini-batch) on the GPU. Because they process the whole batch at once on the GPU, they are generally faster and more efficient than `item_tfms`. A good example is the set of transforms returned by `aug_transforms()`, which includes several batch-level augmentations that help many models.
>
> `batch_tfms=aug_transforms()`
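
The order in which the two kinds of transforms run can be sketched in plain Python. The function names are hypothetical, nested lists stand in for image tensors, and a deterministic flip stands in for the random augmentations in `aug_transforms()`: the item transform resizes each sample separately so they can be collated into one mini-batch, and the batch transform then runs once over the whole batch.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values


def resize_nearest(img: Image, size: int) -> Image:
    """Item transform: nearest-neighbour resize of ONE image to size x size."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)] for r in range(size)]


def collate(images: List[Image]) -> List[Image]:
    """Stack images into a mini-batch; this only works if they all have the same shape."""
    shapes = {(len(im), len(im[0])) for im in images}
    if len(shapes) != 1:
        raise ValueError(f"cannot collate images of different shapes: {shapes}")
    return list(images)


def batch_hflip(batch: List[Image]) -> List[Image]:
    """Batch transform: applied once to the whole mini-batch."""
    return [[row[::-1] for row in img] for img in batch]


def make_batch(raw: List[Image], size: int) -> List[Image]:
    items = [resize_nearest(im, size) for im in raw]  # like item_tfms: per sample
    batch = collate(items)                            # build the mini-batch
    return batch_hflip(batch)                         # like batch_tfms: whole batch


a = [[1, 2], [3, 4]]                           # a 2x2 image
b = [[5, 6, 7], [8, 9, 10], [11, 12, 13]]      # a 3x3 image
print(make_batch([a, b], size=2))              # [[[2, 1], [4, 3]], [[6, 5], [9, 8]]]
```

Calling `collate([a, b])` directly, without the per-item resize, raises a `ValueError`, which is exactly why `Resize()` belongs in `item_tfms`.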

18. What is a confusion matrix?

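> A __*confusion matrix*__ is a table that summarizes the performance of a classification model. Each row corresponds to an actual class and each column to a predicted class, so each cell counts how many items of a given actual class were predicted as a given class. Correct predictions lie on the diagonal, while the off-diagonal cells show exactly which classes the model confuses with each other.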
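
In fastai, `ClassificationInterpretation.from_learner(learn)` followed by `plot_confusion_matrix()` draws this table. The counting itself is simple, as this stdlib sketch with made-up labels and predictions shows (the function names are hypothetical):

```python
from collections import Counter
from typing import List, Sequence


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str],
                     classes: Sequence[str]) -> List[List[int]]:
    """Rows = actual class, columns = predicted class, cell = number of items."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


def accuracy(cm: List[List[int]]) -> float:
    """Correct predictions lie on the diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(map(sum, cm))


classes = ["black", "grizzly", "teddy"]
# Made-up labels and predictions for five images.
actual = ["black", "black", "grizzly", "grizzly", "teddy"]
predicted = ["black", "grizzly", "grizzly", "grizzly", "teddy"]
cm = confusion_matrix(actual, predicted, classes)
for name, row in zip(classes, cm):
    print(f"{name:>8}", row)
print(accuracy(cm))  # 0.8
```

The `1` in the first row, second column says that one black bear was predicted to be a grizzly.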

19. What does `export` save?

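> It saves both the __*architecture*__ and the trained __*parameters*__ of the model to a file (`export.pkl` by default), together with the definition of how the `DataLoaders` were created, so that the same transforms can be applied to new data. This file can later be loaded (with `load_learner`) and used to make predictions.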
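
fastai writes this file with Python's pickling machinery, and `load_learner` reads it back. The following stdlib analogy is not fastai code (`ToyModel`, `export`, and `load` are hypothetical): it shows why the preprocessing definition is saved together with the parameters, since the restored object must transform new inputs exactly as during training before predicting.

```python
import pickle
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class ToyModel:
    weights: List[float]  # the trained parameters
    scale: float          # preprocessing setting captured at training time

    def preprocess(self, x: List[float]) -> List[float]:
        return [v / self.scale for v in x]

    def predict(self, x: List[float]) -> float:
        # Inference must apply the same preprocessing that was used in training.
        return sum(w * v for w, v in zip(self.weights, self.preprocess(x)))


def export(model: ToyModel, path: Path) -> None:
    """Save parameters and preprocessing settings together in one file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load(path: Path) -> ToyModel:
    """Only load files you trust: unpickling can execute arbitrary code."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "export.pkl"
    export(ToyModel(weights=[0.5, 2.0], scale=10.0), path)
    restored = load(path)
print(restored.predict([10.0, 20.0]))  # 4.5
```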

20. What is it called when we use a model for getting predictions, instead of training?

> __*Inference*__

21. What are IPython widgets?

> __*IPython widgets*__ are GUI components that bring together JavaScript and Python functionality in a web browser, and can be created and used in a Jupyter notebook.

22. When might you want to use CPU for deployment? When might GPU be better?

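> __*CPU*__ is usually sufficient, and often more cost-effective, when the model only needs to process one item at a time, such as classifying a single uploaded image. A CPU is therefore often the best fit for deploying the model behind a web application.
>
> __*GPU*__ is only useful when doing lots of identical work in parallel, such as the large computations involved in training a deep learning model. A GPU is therefore best suited for training; for inference, it only pays off if there is enough traffic to batch many requests together and process them at once.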

23. What are the downsides of deploying your app to a server, instead of to a client (or edge) device such as a phone or PC?

> The application will require a network connection, and there will be extra network latency when submitting inputs and returning results. Additionally, sending private data to a network server can raise privacy and security concerns.
>
> On the flip side, deploying a model to a server makes it easier to iterate on and roll out new versions of the model. This is because you as a developer have full control over the server environment and only need to update it once, rather than having to make sure that every endpoint (phone, PC) upgrades its version individually.

24. What are three examples of problems that could occur when rolling out a bear warning system in practice?

> The model we trained will likely perform poorly when:
> 1. Handling night-time images
> 2. Dealing with low-resolution images (ex: some smartphone images)
> 3. The model returns predictions too slowly to be useful

25. What is "out-of-domain data"?

> It's data that the model sees in production that is very different from what it saw during training. For example, an object detector trained exclusively on outdoor daytime photos is given a photo taken at night.
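
One crude way to catch some out-of-domain inputs is to record a summary statistic of the training images and flag production inputs that fall far outside its range. The sketch below is an assumption-laden illustration, not a standard method: it uses mean brightness (which would catch the night-time example), a z-score threshold of 3, and hypothetical names throughout. Real out-of-domain detection is much harder than this.

```python
import statistics
from typing import List, Sequence

Image = List[List[int]]  # grayscale image as rows of 0..255 pixel values


def mean_brightness(img: Image) -> float:
    return statistics.fmean(p for row in img for p in row)


class BrightnessMonitor:
    """Flag inputs whose mean brightness is unusual compared with the training images.

    The statistic and the z-score threshold are illustrative choices only.
    """

    def __init__(self, train_images: Sequence[Image], z_threshold: float = 3.0):
        values = [mean_brightness(im) for im in train_images]  # needs at least 2 images
        self.mu = statistics.fmean(values)
        self.sigma = max(statistics.stdev(values), 1e-6)  # guard against zero spread
        self.z_threshold = z_threshold

    def is_out_of_domain(self, img: Image) -> bool:
        z = abs(mean_brightness(img) - self.mu) / self.sigma
        return z > self.z_threshold


# Made-up "daytime" training images: uniform 2x2 images of brightness 120..150.
day_images = [[[v, v], [v, v]] for v in (120, 130, 140, 150)]
monitor = BrightnessMonitor(day_images)
print(monitor.is_out_of_domain([[20, 20], [20, 20]]))      # True: very dark, night-time-like
print(monitor.is_out_of_domain([[138, 138], [138, 138]]))  # False
```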

26. What is "domain shift"?

> It's when the type of data that the model sees changes over time. For example, an insurance company may use a deep learning model as part of its pricing algorithm, but over time the kinds of customers it attracts may change, so the original training data is no longer representative of current data, and the model is effectively being applied to out-of-domain data.
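
Domain shift builds up gradually, so it is usually caught by monitoring rather than by a one-off check. One commonly used drift measure is the population stability index (PSI), which compares how a feature (for example, customer age group) is distributed across bins in the training data versus recent production data. The sketch below assumes the feature is already binned, and the proportions are made up; the rule of thumb that a PSI above roughly 0.25 signals a significant shift is a convention, not a hard threshold.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Population stability index between two binned distributions.

    expected: bin proportions in the training data (summing to 1)
    actual:   bin proportions in recent production data (summing to 1)
    """
    if len(expected) != len(actual):
        raise ValueError("both distributions must use the same bins")
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) and division by zero for empty bins
        total += (a - e) * math.log(a / e)
    return total


train_dist = [0.4, 0.3, 0.2, 0.1]   # e.g. share of customers per age group at training time
recent_dist = [0.1, 0.2, 0.3, 0.4]  # made-up shares in recent production data
print(round(psi(train_dist, recent_dist), 2))  # 0.91
```

Each term is non-negative, because `a - e` and `log(a / e)` always have the same sign, so identical distributions give a PSI of 0 and larger values mean a larger shift.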

27. What are the three steps in the deployment process?

> 1. Manual process: the model is run in parallel and is not directly driving any actions, with humans still checking the model's outputs.
> 2. Limited scope deployment: the model's scope is limited and carefully supervised, for example a geographically and time-constrained trial of the model.
> 3. Gradual expansion: the model's scope is gradually increased, while good reporting systems are put in place to check for any significant changes in the actions taken compared to the manual process (i.e., the model should make decisions similar to the humans', unless it is already expected to do better).
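
During the manual and limited-scope phases, the reporting described above can start as simply as logging what the model would have done next to what the human actually did, and tracking how often they agree. The `Decision` record, the helper names, and the 90% alert threshold below are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Decision:
    case_id: str
    human_action: str  # what the human actually did
    model_action: str  # what the model would have done


def agreement_rate(decisions: List[Decision]) -> float:
    """Fraction of logged cases where the model matched the human."""
    if not decisions:
        raise ValueError("no decisions logged yet")
    return sum(d.human_action == d.model_action for d in decisions) / len(decisions)


def disagreements(decisions: List[Decision]) -> List[Decision]:
    """Cases worth a closer look: the model and the human chose differently."""
    return [d for d in decisions if d.human_action != d.model_action]


def should_alert(decisions: List[Decision], threshold: float = 0.9) -> bool:
    """Raise a flag when agreement drops below the (hypothetical) threshold."""
    return agreement_rate(decisions) < threshold


log = [
    Decision("1", "alert", "alert"),
    Decision("2", "ignore", "alert"),
    Decision("3", "ignore", "ignore"),
    Decision("4", "alert", "alert"),
]
print(agreement_rate(log))                      # 0.75
print([d.case_id for d in disagreements(log)])  # ['2']
```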
