Multiple issues training a conv model with torchinductor #93670

@anj-s

I ran a simple E2E example with a conv model + MNIST data and these are the errors I ran into:

  1. TensorBoard profiling does not work.
    --> It works with nvfuser and without the torchdynamo annotation.
  2. Setting TORCHINDUCTOR_TRACE=1 raises an error.
    --> It has trouble writing the pre-fusion IR graph for some reason. I did turn on graph writing manually in the config file.
  3. The max_pool2d_with_indices and max_pool2d_with_indices_backward lowerings throw an assertion error at len(strides) == 2.
    --> We expect strides to be None or a value, but it can also be an empty list, which we don't check for.

See paste for the full repro, stack trace, and other debugging info. If you don't have access to the repo, here is the code
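
Separately from the repro in the paste, here is a minimal sketch of the kind of check item 3 seems to be missing. It is not the actual inductor lowering code: `normalize_pool_stride` is a hypothetical helper name, and the sketch assumes that an empty stride list should be treated like None, i.e. fall back to the kernel size (which is how the eager max_pool2d default behaves).

```python
from typing import List, Optional, Sequence


def normalize_pool_stride(
    kernel_size: Sequence[int], stride: Optional[Sequence[int]]
) -> List[int]:
    """Hypothetical helper (not part of torchinductor).

    Treats both None and an empty list as "stride not given" and falls
    back to kernel_size, so a later `len(stride) == 2` check can pass.
    """
    assert len(kernel_size) == 2, "sketch only handles 2D pooling"
    if stride is None or len(stride) == 0:
        # Assumption: a missing stride defaults to the kernel size.
        return list(kernel_size)
    assert len(stride) == 2, "expected a 2-element stride"
    return list(stride)


if __name__ == "__main__":
    print(normalize_pool_stride([2, 2], []))      # empty list case from item 3
    print(normalize_pool_stride([2, 2], None))    # None case
    print(normalize_pool_stride([3, 3], [1, 1]))  # explicit stride is kept
```

With a normalization step like this in front of the assertion, the empty-list case would no longer trip `len(strides) == 2`; whether the real fix belongs in the lowering or earlier in the graph is left open here.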

cc @ezyang @soumith @msaroufim @wconstab @ngimel @bdhirsh @voznesenskym
