Resolves #77 when merged.
This is an initial draft, but it motivates two discussions: host callbacks and reusable model blocks.
In order to perform side-effecting operations inside hooks, Nx needs to support host callbacks. Once it does, hooks will likely always be implemented as host callbacks, and the implementation here will need to evolve when that support lands; however, I don't see the public API changing much once host callbacks are introduced.
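To make the intent concrete, here is a rough sketch of what attaching a purely side-effecting hook could look like from the caller's side. The `Axon.attach_hook/3` name, the `:on` option, and the layer shapes are assumptions for illustration, not the finalized API in this draft:

```elixir
# Sketch only: names, options, and shapes are illustrative.
model =
  Axon.input({nil, 784})
  |> Axon.dense(128, activation: :relu)
  # A side-effecting hook: inspect the layer's forward output on the host.
  # With host callbacks in Nx, the IO would run outside the compiled
  # computation while the layer's output flows through unchanged.
  |> Axon.attach_hook(&IO.inspect(&1, label: "dense output"), on: :forward)
  |> Axon.dense(10, activation: :softmax)
```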
As for model blocks: PyTorch has two kinds of hooks, tensor hooks and module hooks. In Axon we're more focused on module-style hooks for now. PyTorch module hooks have the signature `hook(module, input, output) :: None`; the equivalent in Axon would be `hook(axon_struct, input, output) :: nil`.
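Rendered as an Elixir typespec, purely for illustration (the module and type names are assumptions, not part of this PR):

```elixir
defmodule Axon.Hooks do
  # Illustrative sketch of a module-style hook: it receives the layer
  # struct, the layer input, and the layer output, and returns nil,
  # i.e. it is only allowed to perform side effects.
  @type module_hook :: (Axon.t(), Nx.Tensor.t(), Nx.Tensor.t() -> nil)
end
```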
For both forward and backward hooks, even if they are only meant to be side-effecting operations, I'd prefer that they always return an output, so that the output after a layer is essentially what's shown in this PR.

One issue is that we're currently unable to support hooks registered on large blocks of a model. For example, if I have a block of convs built from some input `x` (see the sketch below) and I try to register a hook on the block, I only have access to the input to the last conv in the block, not `x`. This motivates introducing a module-like abstraction which allows you to treat a group of layers as a single block. This is essentially the same issue we have in #59 as well.
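A minimal sketch of such a block, with layer sizes and options that are illustrative only:

```elixir
# Illustrative conv block: a hook registered on the block as a whole can
# currently only see the input to the last conv, not the original `x`.
x = Axon.input({nil, 3, 32, 32})

conv_block =
  x
  |> Axon.conv(32, kernel_size: {3, 3}, activation: :relu)
  |> Axon.conv(64, kernel_size: {3, 3}, activation: :relu)
  |> Axon.conv(64, kernel_size: {3, 3}, activation: :relu)
```

With a block abstraction, attaching a hook to `conv_block` would ideally expose `x` as the block's input rather than only the last conv's.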