8.2.2 Computer vision

[M] For neural networks that work with images, such as VGG-19 and InceptionNet, you often see a visualization of what type of features each filter captures. How are these visualizations created?

Hint: check out this Distill post on Feature Visualization.
Filter size.
1. [M] How are your model’s accuracy and computational efficiency affected when you decrease or increase its filter size?
2. [E] How do you choose the ideal filter size?
[M] Convolutional layers are also known as “locally connected.” Explain what it means.
[M] When we use CNNs for text data, what would the number of channels be for the first conv layer?
[E] What is the role of zero padding?
[E] Why do we need upsampling? How do you do it?
[M] What does a 1x1 convolutional layer do?
Pooling.
1. [E] What happens when you use max-pooling instead of average pooling?
2. [E] When should we use one instead of the other?
3. [E] What happens when pooling is removed completely?
4. [M] What happens if we replace a 2 x 2 max pool layer with a conv layer of stride 2?
[M] When we replace a normal convolutional layer with a depthwise separable convolutional layer, the number of parameters can go down. How does this happen? Give an example to illustrate this.
[M] Can you use a base model trained on ImageNet (image size 256 x 256) for an object classification task on images of size 320 x 360? How?
[H] How can a fully-connected layer be converted to a convolutional layer?
[H] What are the pros and cons of FFT-based convolution and Winograd-based convolution?

Hint: Read Fast Algorithms for Convolutional Neural Networks (Andrew Lavin and Scott Gray, 2015).
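
The depthwise separable convolution question above asks for an illustrative example. One way to start is to count weights for a single layer. The sketch below is only a rough illustration: the layer sizes (3 x 3 kernel, 64 input channels, 128 output channels) and the helper names are assumptions chosen for the example, and biases are ignored.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
standard = conv_params(3, 64, 128)                   # 9 * 64 * 128 = 73728
separable = depthwise_separable_params(3, 64, 128)   # 576 + 8192 = 8768

print(standard, separable, round(standard / separable, 2))
```

For these assumed sizes the separable layer uses roughly 8.4 times fewer weights. A full answer would still need to explain why splitting spatial filtering from channel mixing produces this saving.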
