I am working on object tracking and noticed that the volume calculation function takes inputs of shape (B, Fea_dim). Is Fea_dim the dimension of the feature used for the classification output, i.e. the class token? In tracking tasks our features have a different shape, (B, N, Fea_dim), with one feature vector per token. Is the GRAM method you propose also effective for such multi-token tasks?
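
To make the shapes concrete, here is a minimal NumPy sketch of the two options I can think of. It is not the repository's function: `gram_volume` is a hypothetical helper that I wrote for illustration, assuming the volume is the square root of the determinant of the Gram matrix of L2-normalized vectors, and the mean pooling and the two-modality setup are also my own assumptions.

```python
import numpy as np

def gram_volume(vectors):
    """Hypothetical helper (not the repo's API).

    vectors: array of shape (..., k, D) holding k feature vectors per sample.
    Returns the volume of the parallelotope they span, shape (...).
    """
    # L2-normalize each vector so the Gram matrix holds cosine similarities.
    norms = np.linalg.norm(vectors, axis=-1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)
    # Gram matrix of shape (..., k, k).
    gram = unit @ np.swapaxes(unit, -1, -2)
    # Volume = sqrt(det(G)); the clip guards against tiny negative determinants.
    return np.sqrt(np.clip(np.linalg.det(gram), 0.0, None))

rng = np.random.default_rng(0)
B, N, D = 4, 16, 32
feat_a = rng.standard_normal((B, N, D))  # tokens from one modality
feat_b = rng.standard_normal((B, N, D))  # tokens from another modality

# Option 1: pool the N tokens down to (B, D), then one volume per sample.
pooled = np.stack([feat_a.mean(axis=1), feat_b.mean(axis=1)], axis=1)  # (B, 2, D)
vol_pooled = gram_volume(pooled)  # shape (B,)

# Option 2: keep the tokens and compute one volume per token position.
per_token = np.stack([feat_a, feat_b], axis=2)  # (B, N, 2, D)
vol_tokens = gram_volume(per_token)  # shape (B, N)
```

Option 1 matches the (B, Fea_dim) input directly but discards spatial information. Option 2 keeps it, but I am not sure whether the method was designed or evaluated for that case.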