
fix: Cache XLNet relative_positional_encoding to avoid CPU computation#44762

Closed
BillionClaw wants to merge 1 commit into huggingface:main from BillionClaw:clawoss/fix/xlnet-relative-positional-encoding-device

Conversation


@BillionClaw BillionClaw commented Mar 16, 2026

XLNet's relative_positional_encoding method creates its intermediate tensors on CPU on every forward pass because the torch.arange calls omitted the device parameter. This causes unnecessary CPU-to-GPU transfers when running on CUDA.

Added device=self.device to all four torch.arange calls in the method.

Fixes #44737
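A minimal sketch of the pattern this PR fixes, using a simplified stand-in for XLNet's relative_positional_encoding (function name and shapes here are illustrative, not the exact upstream code): without device=, torch.arange allocates on CPU and the result must be copied to the GPU each forward pass; passing device= creates the tensors in place.

```python
import torch

def relative_positional_encoding_sketch(qlen, klen, d_model, device):
    # Simplified stand-in for XLNet's relative_positional_encoding.
    # Before the fix: torch.arange(...) defaulted to CPU, so the resulting
    # positional embedding had to be transferred to the GPU every forward pass.
    # After the fix: device= is passed so tensors are created on the target
    # device from the start.
    freq_seq = torch.arange(0, d_model, 2.0, dtype=torch.float, device=device)
    inv_freq = 1.0 / (10000 ** (freq_seq / d_model))
    # Relative positions from klen down to -qlen (exclusive), on-device.
    pos_seq = torch.arange(klen, -qlen, -1.0, dtype=torch.float, device=device)
    sinusoid = torch.einsum("i,d->id", pos_seq, inv_freq)
    # Concatenate sin and cos halves to form the positional embedding.
    pos_emb = torch.cat([torch.sin(sinusoid), torch.cos(sinusoid)], dim=-1)
    return pos_emb

emb = relative_positional_encoding_sketch(4, 4, 8, torch.device("cpu"))
print(emb.shape)  # torch.Size([8, 8])
```

With device=torch.device("cuda"), every intermediate tensor lives on the GPU and no host-to-device copy is needed.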

…al_encoding

The relative_positional_encoding method was creating tensors on CPU every
forward pass because torch.arange was not using the device parameter.
This caused unnecessary CPU-GPU transfers when running on CUDA.

Fixes huggingface#44737
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: xlnet



Development

Successfully merging this pull request may close these issues.

XLNet: relative_positional_encoding computes on CPU every forward pass (missing device= in torch.arange)
