
[compiled autograd][cppnode] eliminate recompiles on ctx->saved_data #130170

@xmfan

Description

🚀 The feature, motivation and pitch

User-defined autograd functions may use ctx->saved_data to pass non-Tensor activations (e.g. int, float, bool) to the backward. These are passed as IValues, and compiled autograd currently specializes on their concrete values, so any change in a saved value causes a recompile.
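
For concreteness, here is a minimal sketch of the pattern (ScaleGrad and its scale argument are made up for illustration; they are not from FBGEMM or PyTorch):

    #include <torch/torch.h>

    using torch::autograd::AutogradContext;
    using torch::autograd::Function;
    using torch::autograd::variable_list;

    // Hypothetical function: scales the gradient by a runtime double kept in saved_data.
    struct ScaleGrad : public Function<ScaleGrad> {
      static torch::Tensor forward(AutogradContext* ctx, torch::Tensor x, double scale) {
        ctx->saved_data["scale"] = scale;  // stored as an IValue, not a Tensor
        return x * scale;
      }

      static variable_list backward(AutogradContext* ctx, variable_list grad_output) {
        // the concrete value read here is what compiled autograd specializes on today
        double scale = ctx->saved_data["scale"].toDouble();
        return {grad_output[0] * scale, torch::Tensor()};  // no grad for the non-Tensor input
      }
    };

Under compiled autograd, switching from ScaleGrad::apply(x, 2.0) to ScaleGrad::apply(x, 3.0) triggers a recompile of the backward, since 2.0 was specialized into the first graph.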

Workarounds that exist today:

  • move all ctx->saved_data scalar conversions into a custom op to hide them from the compiler
  • rewrite the C++ autograd function to only use Tensors (a sketch of this follows the list)
  • (if you're just trying to get perf numbers while this is unsupported) build from source and comment out
    args.collect(ctx_.saved_data);
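
As a sketch of the second workaround applied to the hypothetical ScaleGrad above, the scalar can be saved as a 0-dim tensor so that backward never reads an IValue scalar:

    // Tensor-only variant of the hypothetical ScaleGrad
    struct ScaleGradTensorOnly : public torch::autograd::Function<ScaleGradTensorOnly> {
      static torch::Tensor forward(torch::autograd::AutogradContext* ctx,
                                   torch::Tensor x, double scale) {
        // keep the scalar as a 0-dim tensor instead of an IValue in saved_data
        ctx->save_for_backward({torch::scalar_tensor(scale, x.options())});
        return x * scale;
      }

      static torch::autograd::variable_list backward(torch::autograd::AutogradContext* ctx,
                                                      torch::autograd::variable_list grad_output) {
        auto saved = ctx->get_saved_variables();
        // multiply by the saved tensor rather than a concrete C++ double
        return {grad_output[0] * saved[0], torch::Tensor()};
      }
    };

The value now flows through the graph as tensor data, so there is no scalar for compiled autograd to specialize on.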

Supporting dynamism for non-Tensors requires a bit of work because:

  • The user-defined backward will generally call conversion methods, e.g. FBGEMM's PermutePooledEmbsFunction calls ctx->saved_data["allow_duplicates"].toBool();, which returns a fresh value extracted from IValue's payload union:
    union TriviallyCopyablePayload {
      TriviallyCopyablePayload() : as_int(0) {}
      int64_t as_int;
      double as_double;
      bool as_bool;
      // Invariant: never nullptr; null state is represented as
      // c10::UndefinedTensorImpl::singleton() for consistency of
      // representation with Tensor.
      c10::intrusive_ptr_target* as_intrusive_ptr;
      struct {
        c10::DeviceType type;
        DeviceIndex index;
      } as_device;
    } u;
    This is a problem for tracing a dynamic graph: we want all scalars to use their symbolic equivalents, but the conversion API returns a new object that compiled autograd cannot swap.
  • Dynamo has a few different codepaths that specialize on scalar values, but these scalars lifted from saved activations change frequently, and specializing on them causes recompiles

Proposed fixes:

  1. A derived class that makes the scalars underlying IValue swappable and overrides the conversion methods to return symbolic variables tracked by compiled autograd (see the sketch after this list)
  2. Dynamo + dynamic support for keeping SymFloat/SymBool/(Sym?)String as lifted inputs
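
To make the intent of fix 1 concrete, here is a rough, entirely hypothetical sketch of what "swappable" could look like; it is not an existing or proposed PyTorch API, and since IValue's conversion methods are not virtual, a real version would need to hook or wrap them in compiled autograd rather than literally override them:

    #include <memory>
    #include <utility>
    #include <ATen/core/ivalue.h>

    // Hypothetical indirection: the scalar lives behind a shared slot that
    // compiled autograd could update (e.g. with a SymInt/SymFloat-carrying
    // IValue), so a recorded graph re-reads the value instead of baking in
    // the constant it saw at trace time.
    struct SwappableScalarSlot {
      std::shared_ptr<c10::IValue> slot;

      explicit SwappableScalarSlot(c10::IValue v)
          : slot(std::make_shared<c10::IValue>(std::move(v))) {}

      // conversion methods read through the indirection
      bool toBool() const { return slot->toBool(); }
      double toDouble() const { return slot->toDouble(); }
      int64_t toInt() const { return slot->toInt(); }

      // compiled autograd would swap the payload here between calls
      void swap(c10::IValue v) { *slot = std::move(v); }
    };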

Alternatives

A possible but annoying alternative is to ask users to rewrite their autograd functions to only use Tensors, and to move their Tensor-to-scalar conversions into custom ops.

Additional context

No response

cc @ezyang @anijain2305 @chauhang @penguinwu
