# Questionnaire Chapter 2: From Model to Production

- [Fastbook Chapter 2 Questionnaire Forum](https://forums.fast.ai/t/fastbook-chapter-2-questionnaire-solutions-wiki/66392)

1. Provide an example of where the bear classification model might work poorly in production, due to structural or style differences in the training data.

> 1. Working with video data instead of images.
> 2. Handling nighttime images.
> 3. Low-resolution camera images.
> 4. Ensuring results are returned fast enough to be useful in practice.
> 5. Recognizing bears in positions rarely seen in photos.

2. Where do text models currently have a major deficiency?

> Text models can generate context-appropriate text (such as replies, or text that imitates an author's style). However, they still struggle to produce factually correct responses. Even when given factual information (such as a knowledge base), it is hard to get a model to use that information to generate correct responses, although the text it produces can seem very compelling. This can be dangerous, because a layperson may not be able to evaluate the factual accuracy of the generated text.

3. What are possible negative societal implications of text generation models?

> Spreading disinformation on a massive scale ("fake news") and encouraging conflict.

4. In situations where a model might make mistakes, and those mistakes could be harmful, what is a good alternative to automating a process?

> Have a human review the model's predictions, evaluate the results, and decide on the best next step.
>
> For example, a machine learning model for identifying strokes in CT scans can flag high-priority cases for expedited review, while all other cases are still sent to radiologists as usual. Models can also augment a medical professional's abilities, reducing risk while still improving the efficiency of the workflow; for instance, deep learning models can provide useful measurements for radiologists or pathologists.
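
The stroke-triage workflow above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the `Scan` record, its `stroke_prob` field, and the `0.8` cut-off are all hypothetical. The key design point is that the model never removes a case from the queue; it only changes the order in which humans review them.

```python
from dataclasses import dataclass


@dataclass
class Scan:
    scan_id: str
    stroke_prob: float  # hypothetical field: model's predicted probability of a stroke


def triage(scans, threshold=0.8):
    """Order scans for radiologist review.

    Every scan still goes to a human; the model only decides which ones
    are looked at first. `threshold` is a hypothetical cut-off.
    """
    urgent = [s for s in scans if s.stroke_prob >= threshold]
    routine = [s for s in scans if s.stroke_prob < threshold]
    # Within the urgent group, review the most likely strokes first
    urgent.sort(key=lambda s: s.stroke_prob, reverse=True)
    return urgent + routine


queue = triage([Scan("a", 0.1), Scan("b", 0.95), Scan("c", 0.85)])
print([s.scan_id for s in queue])  # ['b', 'c', 'a']
```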


5. What kind of tabular data is deep learning particularly good at?

> Deep learning has been making great improvements on time series data. Moreover, deep learning is very useful for increasing the variety of columns that can be included in tabular data, such as columns containing natural language (book titles, reviews, etc.) and high-cardinality categorical columns (i.e., columns that contain a large number of discrete values, such as zip code or product ID).
>
> However, deep learning models generally take longer to train than simpler models like random forests or gradient boosting machines, although this is changing thanks to new libraries and improved GPUs.
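
To make "high-cardinality" concrete, here is a minimal sketch that counts the distinct values in each column of a small table. The toy rows and the `max_levels` cut-off are made up for illustration; in a real dataset a zip-code column can have thousands of levels.

```python
def cardinality(rows, columns):
    """Count the distinct values in each column of a list-of-dicts table."""
    return {col: len({row[col] for row in rows}) for col in columns}


def high_cardinality_columns(rows, columns, max_levels=3):
    """Return the columns whose number of distinct values exceeds max_levels.

    `max_levels` is a hypothetical threshold chosen for this toy example.
    """
    counts = cardinality(rows, columns)
    return [col for col in columns if counts[col] > max_levels]


# Made-up example rows
rows = [
    {"zip_code": "94103", "is_member": True},
    {"zip_code": "10001", "is_member": False},
    {"zip_code": "60601", "is_member": True},
    {"zip_code": "73301", "is_member": False},
    {"zip_code": "02108", "is_member": True},
]
print(cardinality(rows, ["zip_code", "is_member"]))  # {'zip_code': 5, 'is_member': 2}
print(high_cardinality_columns(rows, ["zip_code", "is_member"]))  # ['zip_code']
```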

6. What's a key downside of directly using a deep learning model for recommendation systems?

> A deep learning model only tells you which products a particular user might like, rather than which recommendations would actually be helpful to that user. Many recommendations are not helpful; for instance, recommending products the user already knows about adds little value.

7. What are the steps of the Drivetrain Approach?

> The Drivetrain approach ensures that your modeling work is useful in practice. It considers the following:
> 1. __*Defined objective:*__ what outcome am I trying to achieve? (e.g., Google's objective is to "show the most relevant search results")
> 2. __*Levers:*__ what inputs can we control? (e.g., in Google's case that was the rankings of the search results)
> 3. __*Data:*__ what data can we collect? (e.g., Google realized that it could collect information about which pages linked to which other pages, and used this information for ranking)
> 4. __*Model:*__ how do the levers influence the objective? (the objective, levers, available data, and any additional data we will collect together determine the models we can build)

8. How do the steps of the Drivetrain Approach map to a recommendation system?

> The __*objective*__ of a recommendation engine is to drive additional sales by surprising and delighting customers with recommendations of items they would not have purchased without the recommendation. The __*lever*__ is the ranking of the recommendations. New __*data*__ must be collected to generate recommendations that will __*cause new sales*__, which requires running randomized experiments (e.g., A/B tests) across a wide range of recommendations and customers. We can then build two __*models*__ for purchase probabilities, conditional on seeing or not seeing a recommendation. The difference between these two probabilities is a utility function for a given recommendation to a customer.
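
The utility function described above can be sketched directly. As a simplifying assumption, this sketch replaces the two purchase-probability models with plain empirical purchase rates from a hypothetical randomized experiment, and all counts are made up. It shows why an item that customers buy anyway has zero utility as a recommendation.

```python
def purchase_rate(purchases, customers):
    """Fraction of customers in a group who bought the item."""
    return purchases / customers


def recommendation_utility(bought_shown, n_shown, bought_not_shown, n_not_shown):
    """Estimate P(buy | shown recommendation) - P(buy | not shown).

    Counts are assumed to come from a randomized experiment in which
    customers were randomly assigned to see the recommendation or not.
    """
    return purchase_rate(bought_shown, n_shown) - purchase_rate(bought_not_shown, n_not_shown)


# Hypothetical experiment results for two different items
print(round(recommendation_utility(120, 1000, 100, 1000), 4))  # 0.02: the recommendation causes extra sales
print(recommendation_utility(300, 1000, 300, 1000))  # 0.0: customers buy it anyway
```

A real system would replace the two rates with models conditioned on customer and item features, but the quantity being estimated is the same difference.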

9. Create an image recognition model using data you curate, and deploy it on the web.

> To be done by the reader. 

10. What is `DataLoaders`?

> It's a class that stores whatever `DataLoader` objects you pass to it and makes them available as `train` and `valid`.
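
The idea behind `DataLoaders` fits in a few lines. Below is a simplified, plain-Python sketch of that idea; `SimpleDataLoaders` is a hypothetical name, this is not fastai's actual implementation, and plain lists stand in for real `DataLoader` objects.

```python
class SimpleDataLoaders:
    """Store any number of loaders; expose the first two as train/valid.

    A hypothetical, simplified stand-in for fastai's DataLoaders.
    """

    def __init__(self, *loaders):
        self.loaders = loaders

    def __getitem__(self, i):
        return self.loaders[i]

    @property
    def train(self):
        return self.loaders[0]

    @property
    def valid(self):
        return self.loaders[1]


# Plain lists stand in for real DataLoader objects here
dls = SimpleDataLoaders(["train batch 1", "train batch 2"], ["valid batch 1"])
print(dls.train)     # ['train batch 1', 'train batch 2']
print(dls.valid[0])  # valid batch 1
```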

11. What four things do we need to tell fastai to create `DataLoaders`?

> 1. What kinds of data we are working with
> 2. How to get the list of items
> 3. How to label these items
> 4. How to create the validation set
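
In fastai, these four pieces correspond to the `blocks`, `get_items`, `get_y`, and `splitter` arguments to `DataBlock`. The plain-Python sketch below wires up the same four decisions without fastai. All function names and file paths are hypothetical, and the splitter here assigns items by their top-level `train/` or `valid/` folder rather than at random.

```python
from pathlib import PurePosixPath

# 1. Kinds of data: inputs are image file paths, targets are category names.
#    (This sketch only records paths; it never opens the images.)


def get_items():
    """2. How to get the list of items (hypothetical paths)."""
    return [PurePosixPath(p) for p in [
        "train/grizzly/001.jpg",
        "train/black/001.jpg",
        "train/teddy/001.jpg",
        "valid/grizzly/002.jpg",
        "valid/teddy/002.jpg",
    ]]


def get_label(path):
    """3. How to label an item: use the name of its parent folder."""
    return path.parent.name


def split_by_top_folder(items):
    """4. How to create the validation set: items under 'valid/' go there."""
    train = [i for i, p in enumerate(items) if p.parts[0] == "train"]
    valid = [i for i, p in enumerate(items) if p.parts[0] == "valid"]
    return train, valid


def build_datasets(get_items, get_label, splitter):
    """Combine the pieces into (item, label) pairs for training and validation."""
    items = get_items()
    pairs = [(item, get_label(item)) for item in items]
    train_idx, valid_idx = splitter(items)
    return [pairs[i] for i in train_idx], [pairs[i] for i in valid_idx]


train_ds, valid_ds = build_datasets(get_items, get_label, split_by_top_folder)
print([label for _, label in train_ds])  # ['grizzly', 'black', 'teddy']
print([label for _, label in valid_ds])  # ['grizzly', 'teddy']
```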

12. What does the `splitter` parameter to `DataBlock` do?

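> It tells fastai how to split the items into a training set and a validation set. For example, `RandomSplitter(valid_pct=0.2, seed=42)` randomly sets aside 20% of the data for validation.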
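
What a random splitter does can be sketched in plain Python; fastai's `RandomSplitter(valid_pct=0.2, seed=42)` follows the same idea, though this is not its actual code and `random_split` is a hypothetical name. Shuffling with a private random generator seeded by `seed` produces the same permutation on every run, so the validation set is reproducible.

```python
import random


def random_split(n_items, valid_pct=0.2, seed=None):
    """Return (train_indices, valid_indices) for a random split.

    Passing the same `seed` always shuffles the indices the same way,
    so the validation set is identical across runs.
    """
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)  # private RNG: leaves global random state untouched
    cut = int(valid_pct * n_items)
    return sorted(idxs[cut:]), sorted(idxs[:cut])


train_a, valid_a = random_split(10, seed=42)
train_b, valid_b = random_split(10, seed=42)
print(valid_a == valid_b)  # True
print(len(valid_a))        # 2
```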

13. How do we ensure a random split always gives the same validation set?
14. What letters are often used to signify the independent and dependent variables?
15. What's the difference between the crop, pad, and squish resize approaches? When might you choose one over the others?
16. What is data augmentation? Why is it needed?
17. What is the difference between `item_tfms` and `batch_tfms`?
18. What is a confusion matrix?
19. What does `export` save?

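> It saves both the __*architecture*__ and the trained __*parameters*__ of the model to a file (`export.pkl` by default), along with the definition of how the `DataLoaders` were created. The file can later be loaded with `load_learner` and used to make predictions, without having to redefine how to transform the data.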
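
Conceptually, exporting means serializing one object that bundles preprocessing, architecture, and parameters. The sketch below is a plain-Python analogy using the standard `pickle` module, not fastai's implementation; `TinyModel` is a hypothetical toy whose "architecture" is a linear function.

```python
import pickle


class TinyModel:
    """Hypothetical toy model bundling preprocessing, architecture, and parameters."""

    def __init__(self, weight, bias, scale):
        self.weight = weight  # learned parameter
        self.bias = bias      # learned parameter
        self.scale = scale    # preprocessing setting applied to raw inputs

    def predict(self, x):
        x = x / self.scale                   # same preprocessing as during training
        return self.weight * x + self.bias   # the "architecture": a linear function


model = TinyModel(weight=2.0, bias=1.0, scale=10.0)
blob = pickle.dumps(model)      # "export": serialize the whole object
restored = pickle.loads(blob)   # later, e.g. in a deployed app
print(restored.predict(5.0))    # 2.0
```

One caveat carries over from pickling in general: the class definitions (and any custom functions) must be importable in the process that loads the file.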

20. What is it called when we use a model for getting predictions, instead of training?

> __*Inference*__

21. What are IPython widgets?
22. When might you want to use CPU for deployment? When might GPU be better?
23. What are the downsides of deploying your app to a server, instead of to a client (or edge) device such as a phone or PC?
24. What are three examples of problems that could occur when rolling out a bear warning system in practice?
25. What is "out-of-domain data"?
26. What is "domain shift"?
27. What are the three steps in the deployment process?