ERNIE-Image RoPE implementation: Shape mismatch between freqs_cis and chunking? #13546

@Rauyolo

I was looking into the RoPE implementation for ERNIE-Image and noticed something interesting that I wanted to ask about.

It looks like the freqs_cis tensor uses an interleaved format (e.g., repeating pairs like [1.0, 1.0, 0.707, 0.707...]), but the actual rotation logic uses Megatron-style blocked chunking. Here is the snippet for context:

# Apply RoPE: same rotate_half logic as Megatron _apply_rotary_pos_emb_bshd (rotary_interleaved=False)
# x_in: [B, S, heads, head_dim], freqs_cis: [B, S, 1, head_dim] with angles [θ0,θ0,θ1,θ1,...]
def apply_rotary_emb(x_in: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    rot_dim = freqs_cis.shape[-1]
    x, x_pass = x_in[..., :rot_dim], x_in[..., rot_dim:]
    cos_ = torch.cos(freqs_cis).to(x.dtype)
    sin_ = torch.sin(freqs_cis).to(x.dtype)
    # Non-interleaved rotate_half: [-x2, x1]
    x1, x2 = x.chunk(2, dim=-1)
    x_rotated = torch.cat((-x2, x1), dim=-1)
    return torch.cat((x * cos_ + x_rotated * sin_, x_pass), dim=-1)

If I am understanding the code correctly, applying interleaved frequencies to the blocked [-x2, x1] layout means the two halves of a coordinate pair are rotated by different angles. For example, output element 0 combines x[0] with -x[D/2] using theta_0, but output element D/2 combines x[D/2] with x[0] using theta_{D/4}.
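To make the concern concrete, here is a minimal sketch that mirrors the rotate_half arithmetic of apply_rotary_emb on a single vector, using plain Python lists instead of torch so it runs without dependencies. The helper names, the head_dim of 8, and the angle values are all hypothetical and chosen only for illustration. It checks whether each (x[i], x[i + D/2]) pair keeps its norm, which a true 2D rotation would guarantee. It says nothing about which layout the released weights were trained with.

```python
import math

def apply_rotary_emb_1d(x, angles):
    """Plain-Python mirror of the rotate_half logic for one vector (hypothetical helper)."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    rotated = [-v for v in x2] + x1  # non-interleaved rotate_half: [-x2, x1]
    return [xi * math.cos(a) + ri * math.sin(a)
            for xi, ri, a in zip(x, rotated, angles)]

def pair_norms(v):
    """Norm of each (v[i], v[i + D/2]) pair, i.e. the pairs that chunk(2) couples."""
    half = len(v) // 2
    return [math.hypot(v[i], v[i + half]) for i in range(half)]

# Illustrative values only: head_dim = 8, so there are 4 distinct angles.
thetas = [0.1, 0.7, 1.3, 1.9]
interleaved = [t for t in thetas for _ in range(2)]  # [θ0, θ0, θ1, θ1, ...]
blocked = thetas + thetas                            # [θ0, θ1, ..., θ0, θ1, ...]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
norms_in = pair_norms(x)
norms_interleaved = pair_norms(apply_rotary_emb_1d(x, interleaved))
norms_blocked = pair_norms(apply_rotary_emb_1d(x, blocked))

print("input      :", norms_in)
print("interleaved:", norms_interleaved)
print("blocked    :", norms_blocked)
```

With the blocked layout both halves of a pair see the same angle, so the pair norms should match the input. With the interleaved layout, index 0 sees θ0 while index D/2 sees θ_{D/4}, so the operation is no longer a pure rotation of that pair and the norms differ.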

To match the x.chunk(2, dim=-1) logic, shouldn't freqs_cis be formatted in blocks like [theta_0, theta_1, ..., theta_0, theta_1, ...] instead of being interleaved?
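For reference, this is a small sketch of the layout change I mean, again in plain Python. interleaved_to_blocked is a hypothetical helper, not something from the repository, and it assumes the last dimension really is made of duplicated neighbours [θ0, θ0, θ1, θ1, ...].

```python
def interleaved_to_blocked(angles):
    """Rearrange [θ0, θ0, θ1, θ1, ...] into [θ0, θ1, ..., θ0, θ1, ...].

    Hypothetical helper; assumes every angle appears as an adjacent duplicate.
    """
    unique = angles[0::2]   # keep one copy of each angle: [θ0, θ1, ...]
    return unique + unique  # repeat the whole block so both halves of chunk(2) match

example = [0.1, 0.1, 0.7, 0.7, 1.3, 1.3, 1.9, 1.9]
converted = interleaved_to_blocked(example)
print(converted)
```

This only illustrates the alignment in question. It is not a proposed fix, since the current weights appear to depend on the existing layout.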

Is this asymmetric rotation a known behavior carried over from training, or am I missing something about how this specific implementation is supposed to work mathematically? I tried changing the alignment so that the frequencies match the blocks, and, as you would expect if the weights were trained with the current layout, the model's output turned into garbage.
