
implemented bottleneck separable convolutions #855

Closed

Conversation

shreydesai
Contributor

Summary: Creates bottleneck layers for separable convolutions. Downsampling, convolving, and then upsampling significantly cuts the number of parameters with minimal loss in performance. This diff is a variant of the traditional bottleneck method: instead of upsampling directly in the pointwise convolution, we split the pointwise step into two convolutions, where the first downsamples into a (sufficiently small) low dimension and the second upsamples into the target (higher) dimension.

Differential Revision: D16563566
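
For illustration, a minimal PyTorch sketch of the layer this describes; the class and argument names (`BottleneckSeparableConv1d`, `bottleneck_dim`) are assumptions for the example, not the identifiers used in the actual diff.

```python
import torch
import torch.nn as nn


class BottleneckSeparableConv1d(nn.Module):
    """Depthwise convolution followed by a two-step (bottleneck) pointwise
    projection: in_channels -> bottleneck_dim -> out_channels."""

    def __init__(self, in_channels, out_channels, kernel_size, bottleneck_dim):
        super().__init__()
        # Depthwise: one spatial filter per input channel (groups=in_channels).
        self.depthwise = nn.Conv1d(
            in_channels,
            in_channels,
            kernel_size,
            padding=kernel_size // 2,
            groups=in_channels,
        )
        # Pointwise step 1: downsample the channel space to the bottleneck.
        self.pointwise_down = nn.Conv1d(in_channels, bottleneck_dim, 1)
        # Pointwise step 2: upsample back to the target channel space.
        self.pointwise_up = nn.Conv1d(bottleneck_dim, out_channels, 1)

    def forward(self, x):
        # x: (batch, in_channels, sequence_length)
        return self.pointwise_up(self.pointwise_down(self.depthwise(x)))


# E.g. 256 channels squeezed through a 64-dim bottleneck, as in the
# example below in the thread.
conv = BottleneckSeparableConv1d(256, 256, kernel_size=3, bottleneck_dim=64)
out = conv(torch.randn(2, 256, 10))  # -> (2, 256, 10)
```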

@facebook-github-bot added the CLA Signed label on Jul 30, 2019
shreydesai added a commit to shreydesai/pytext that referenced this pull request Aug 2, 2019
Summary:
Pull Request resolved: facebookresearch#855

Example: Given an input with 256 channels, the depthwise convolution filters it spatially, one filter per channel, keeping 256 channels. Then, instead of projecting the channel space back to 256 with a single pointwise convolution, the projection is split into two pieces: the first pointwise convolution projects the channel space down to 64, and the second projects it back up to 256, the intended target dimension.

Differential Revision: D16563566

fbshipit-source-id: e742abce052380a5f1174a4180276fdbbacc5b41
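
To make the savings in that example concrete, a quick count of the pointwise weights for the 256 -> 64 -> 256 split (bias terms ignored):

```python
# Pointwise weight counts for 256 channels (biases ignored).
direct = 256 * 256                # single 256 -> 256 projection: 65,536 weights
bottleneck = 256 * 64 + 64 * 256  # 256 -> 64 -> 256 split:       32,768 weights
print(direct, bottleneck)
```

In general, with matching input/output width C and bottleneck width b, the split uses 2bC pointwise weights instead of C^2, so it saves parameters whenever b < C/2; the depthwise filters are unchanged either way.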
shreydesai added a commit to shreydesai/pytext that referenced this pull request Aug 2, 2019
Differential Revision: D16563566

fbshipit-source-id: bff549728803062045a8540b069791e96c7944f1
shreydesai added a commit to shreydesai/pytext that referenced this pull request Aug 2, 2019
Reviewed By: geof90

Differential Revision: D16563566

fbshipit-source-id: dd44d8eebea00dbb33130febd6fc7bda9d735aa7
Summary:
Pull Request resolved: facebookresearch#855

Reviewed By: geof90

Differential Revision: D16563566

fbshipit-source-id: 1cf310315739802aa5ea6a34efb4c8ab771a3c63
@facebook-github-bot
Contributor

This pull request has been merged in 352b8be.
