❓ Questions and Help
Hello, all!
I am running a 3D CNN on a TPU v3-8, and the computation does not seem to be well optimized.
In short, most of the computation time appears to be wasted on excessive padding in my first convolution.
Background Information
- PyTorch-XLA version: 1.9 (installed from the prebuilt Docker image gcr.io/tpu-pytorch/xla:r1.9)
- PyTorch version: 1.9.0a0+git1a7c23c
- TPU version: v3-8 (software version: pytorch-1.9)
- I have profiled the computation with TensorBoard, following the official profiling guide (see the sketch after this list).
- The profile shows unusual padding annotations, which are a possible source of low utilization, as introduced here.
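For context, this is roughly how I capture the profile, following the guide; the port number and log directory below are placeholders I picked, not values prescribed anywhere:

```python
import torch_xla.debug.profiler as xp

# In the training script, before the training loop starts:
server = xp.start_server(9012)

# Then, from a separate process while training is running, capture a trace:
# xp.trace('localhost:9012', '/tmp/profile_logs', duration_ms=30000)
# and point TensorBoard at /tmp/profile_logs to view the op_profile page.
```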
Observation
Below is a screenshot of the TensorBoard profiling result (op_profile page).
- Note that the BATCH / FEATURE dimensions are padded, and the time wasted on this padding is 27% of the total.
Below is the PyTorch definition of the very first convolution:
```python
self.in_ch = 64
self.inc = nn.Sequential(
    nn.Conv3d(n_channels, self.in_ch, kernel_size=7, padding=3),  # 7x7x7 conv, padding 3
    nn.BatchNorm3d(64),
    nn.ReLU(inplace=True),
)
```
- The convolution takes a 12x1x96x96x96 (BCTHW) input, applies a 7x7x7 3D convolution with 64 output channels and padding of 3. A standalone sketch of this layer follows below.
- The overall TPU FLOPS utilization is 13%, and memory bandwidth utilization is 21% (reported at the top of the op_profile page).
- The maximum batch size I can fit is 18, but utilization is still very low.
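In case it helps reproduce the issue, here is a minimal standalone sketch of just this first layer with the input shape above; the surrounding model is omitted, and `n_channels = 1` is taken from the single-channel input:

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm

device = xm.xla_device()

n_channels = 1  # single-channel volumes, matching the 12x1x96x96x96 input
in_ch = 64

inc = nn.Sequential(
    nn.Conv3d(n_channels, in_ch, kernel_size=7, padding=3),
    nn.BatchNorm3d(in_ch),
    nn.ReLU(inplace=True),
).to(device)

x = torch.randn(12, 1, 96, 96, 96, device=device)  # BCTHW input from above
out = inc(x)
xm.mark_step()  # flush the lazy graph so the convolution actually executes
print(out.shape)  # expected: torch.Size([12, 64, 96, 96, 96])
```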
Question
- Am I interpreting the result correctly? It seems there is a lot of room to optimize.
- Is there any best-practice for optimizing this low utilization issue?
- Is 3D convolution fully optimized in PyTorch-XLA? (I assume 2D convolution must already be well optimized.)
Please excuse my ignorance; I am just getting started with PyTorch-XLA / TPU.
Any help or suggestions would be appreciated. Thank you in advance!