8.2.2 Computer vision

  1. [M] For neural networks that work with images, such as VGG-19 and InceptionNet, you often see visualizations of the type of features each filter captures. How are these visualizations created?

    Hint: check out this Distill post on Feature Visualization.

  2. Filter size.

    1. [M] How are your model’s accuracy and computational efficiency affected when you decrease or increase its filter size?
    2. [E] How do you choose the ideal filter size?
  3. [M] Convolutional layers are also known as “locally connected.” Explain what this means.

  4. [M] When we use CNNs for text data, what would the number of channels be for the first conv layer?

  5. [E] What is the role of zero padding?

  6. [E] Why do we need upsampling? How is it done?

  7. [M] What does a 1x1 convolutional layer do?

  8. Pooling.

    1. [E] What happens when you use max-pooling instead of average pooling?
    2. [E] When should we use one instead of the other?
    3. [E] What happens when pooling is removed completely?
    4. [M] What happens if we replace a 2 x 2 max pool layer with a conv layer of stride 2?
  9. [M] When we replace a normal convolutional layer with a depthwise separable convolutional layer, the number of parameters can go down. How does this happen? Give an example to illustrate this.

  10. [M] Can you use a base model trained on ImageNet (image size 256 x 256) for an object classification task on images of size 320 x 360? How?

  11. [H] How can a fully-connected layer be converted to a convolutional layer?

  12. [H] What are the pros and cons of FFT-based convolution and Winograd-based convolution?

    Hint: Read Fast Algorithms for Convolutional Neural Networks (Andrew Lavin and Scott Gray, 2015)
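
Question 9 asks for an example. One possible illustration is the sketch below, which counts the weights in a standard convolution and in a depthwise separable one (a per-channel k x k depthwise convolution followed by a 1 x 1 pointwise convolution). The function names and the layer sizes (3 x 3 kernel, 64 input channels, 128 output channels) are assumptions chosen for illustration, and biases are left out by default.

```python
def conv_params(k, c_in, c_out, bias=False):
    """Weights in a standard k x k convolution (illustrative helper)."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k, c_in, c_out, bias=False):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)   # one k x k filter per input channel
    pointwise = c_in * c_out + (c_out if bias else 0)  # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, not taken from any particular model.
k, c_in, c_out = 3, 64, 128
standard = conv_params(k, c_in, c_out)                  # 3*3*64*128 = 73728
separable = depthwise_separable_params(k, c_in, c_out)  # 576 + 8192 = 8768
ratio = separable / standard                            # equals 1/c_out + 1/k**2

print(standard, separable, round(ratio, 4))
```

Without biases the ratio works out to 1/c_out + 1/k², so for a 3 x 3 kernel and a wide layer the separable version needs roughly one ninth of the weights.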