TroyGarden
Contributor

Summary:

context

  • Postproc support in TorchRec's train pipeline has quite a few limitations
  • This change adds a clearer warning message to help with debugging

symptoms

  • unable to run input_dist in the "-1" batch with the SparseDistTrainPipeline, a.k.a. the SDD (Sparse Data Dist) pipeline
  • warning in log: Module '{node.target}' will NOT be pipelined, due to input modifications

typical issues

  • root cause: the input KJT is modified, or is passed through a module/function that potentially modifies it
  • pipeline_postproc is not enabled
  • check the log for the error message: fx node {child_node.name, child_node.op, child_node.target} can't be handled correctly for postproc module
  • the postproc module has trainable weights (not supported)
  • a postproc function modifies the input KJT
  • two postproc modules depend on a specific execution order
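
Two of the issues above (trainable weights, in-place modification of the input) can be checked before the module ever reaches the pipeline. The following is a minimal sketch, not part of TorchRec: `find_postproc_issues` and the three toy modules are hypothetical names, and a plain `torch.Tensor` stands in for the input KJT.

```python
from typing import List

import torch
from torch import nn


def find_postproc_issues(module: nn.Module, sample: torch.Tensor) -> List[str]:
    """Hypothetical pre-flight check; a plain tensor stands in for the KJT."""
    issues: List[str] = []
    # Trainable weights: any parameter that requires gradients.
    if any(p.requires_grad for p in module.parameters()):
        issues.append("trainable weights")
    # In-place modification: compare the input before and after the call.
    before = sample.clone()
    with torch.no_grad():
        module(sample)
    if not torch.equal(before, sample):
        issues.append("modifies input in place")
    return issues


class InPlaceScale(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x.mul_(2)  # mutates its input


class LearnedShift(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.bias = nn.Parameter(torch.zeros(1))  # trainable weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.bias


class PureScale(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2  # returns a new tensor, input untouched


in_place_issues = find_postproc_issues(InPlaceScale(), torch.ones(3))
learned_issues = find_postproc_issues(LearnedShift(), torch.ones(3))
pure_issues = find_postproc_issues(PureScale(), torch.ones(3))
```

The check only covers what a single sample call can reveal; it does not replace the pipeline's own warning and error messages.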

workaround

  • make the postproc function an nn.Module
  • put order-dependent functions/modules under the same nn.Module to preserve the order.
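
As an illustration of both workarounds, here is a minimal sketch under stated assumptions: `clamp_ids`, `shift_ids`, and `ClampThenShift` are hypothetical names, and a plain `torch.Tensor` stands in for the KJT values. Two order-dependent functions are wrapped in a single nn.Module that has no trainable weights and does not modify its input.

```python
import torch
from torch import nn


def clamp_ids(values: torch.Tensor, max_id: int) -> torch.Tensor:
    # Out-of-place: returns a new tensor, the input is left untouched.
    return values.clamp(max=max_id)


def shift_ids(values: torch.Tensor, offset: int) -> torch.Tensor:
    # Out-of-place addition; must run after clamp_ids in this example.
    return values + offset


class ClampThenShift(nn.Module):
    """Hypothetical postproc module that fixes the order of its two steps."""

    def __init__(self, max_id: int, offset: int) -> None:
        super().__init__()
        # Plain Python attributes, so the module has no trainable weights.
        self.max_id = max_id
        self.offset = offset

    def forward(self, values: torch.Tensor) -> torch.Tensor:
        clamped = clamp_ids(values, self.max_id)  # step 1
        return shift_ids(clamped, self.offset)    # step 2


postproc = ClampThenShift(max_id=9, offset=100)
ids = torch.tensor([3, 12, 7])
out = postproc(ids)  # tensor([103, 109, 107]); ids is unchanged
```

Because both steps live in one `forward`, their relative order is fixed by the module itself rather than by how the surrounding model happens to be traced.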

Differential Revision: D82591429

@meta-cla bot added the CLA Signed label on Sep 16, 2025
@facebook-github-bot
Contributor

@TroyGarden has exported this pull request. If you are a Meta employee, you can view the originating diff in D82591429.

@TroyGarden TroyGarden deleted the export-D82591429 branch September 19, 2025 22:03
Labels
CLA Signed, fb-exported, meta-exported