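Thank you for your reply. I went back over the code for the token semantic module, and there are still a few things I don't understand that I'd like to ask you about.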
    B, N, C = x.shape
    SF = frames_num // (2 ** (len(self.depths) - 1)) // self.patch_stride[0]
    ST = frames_num // (2 ** (len(self.depths) - 1)) // self.patch_stride[1]
    x = x.permute(0,2,1).contiguous().reshape(B, C, SF, ST)
    B, C, F, T = x.shape
    # group 2D CNN
    c_freq_bin = F // self.freq_ratio
    x = x.reshape(B, C, F // c_freq_bin, c_freq_bin, T)
    x = x.permute(0,1,3,2,4).contiguous().reshape(B, C, c_freq_bin, -1)
    x = self.tscam_conv(x)
    x = torch.flatten(x, 2) # B, C, T
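1. In the group 2D CNN part, the feature map has already been reshaped into the (SF, ST) layout. What is the purpose of the further shape transformations here?
2. After self.tscam_conv, the feature map has shape (B, Class, T'). Does T' have any physical meaning?
3. After the operations above, I see that the code upsamples x to produce fpx as framewise_output and uses it for localization (to determine the start and end times?). Why is fpx used for localization, and what is the physical meaning of the 1024 in fpx (B, 1024, 527)?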
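For question 1, it may help to trace the index arithmetic of the reshape/permute/reshape on a tiny example. The sketch below is not taken from the repository and is not the maintainers' answer: `regroup_freq_into_time` and `grouped_shape` are hypothetical helpers written in plain Python, and the configuration values passed at the end (`frames_num=256`, four stages, `patch_stride=(4, 4)`, `freq_ratio=4`) are assumptions chosen only for illustration.

```python
def regroup_freq_into_time(grid, freq_ratio):
    """Hypothetical pure-Python equivalent, for one (batch, channel) slice, of
        x.reshape(F // c, c, T).permute(1, 0, 2).reshape(c, -1)
    where c = F // freq_ratio, as in the snippet quoted above.
    """
    F = len(grid)                    # rows of the 2D feature map
    c_freq_bin = F // freq_ratio     # rows kept after regrouping
    n_groups = F // c_freq_bin       # number of row groups
    out = []
    for i in range(c_freq_bin):
        row = []
        for g in range(n_groups):
            # Row i of group g is appended after row i of group g - 1.
            row.extend(grid[g * c_freq_bin + i])
        out.append(row)
    return out


def grouped_shape(frames_num, num_stages, patch_stride, freq_ratio):
    """Shape arithmetic copied from the quoted snippet; inputs are assumed values."""
    SF = frames_num // (2 ** (num_stages - 1)) // patch_stride[0]
    ST = frames_num // (2 ** (num_stages - 1)) // patch_stride[1]
    c_freq_bin = SF // freq_ratio
    return SF, ST, (c_freq_bin, (SF // c_freq_bin) * ST)


# Toy 4 x 2 map with distinct values so the movement of each entry is visible.
toy = [[0, 1], [2, 3], [4, 5], [6, 7]]
regrouped = regroup_freq_into_time(toy, freq_ratio=2)
print(regrouped)  # [[0, 1, 4, 5], [2, 3, 6, 7]]

# Assumed configuration, for illustration only.
print(grouped_shape(256, 4, (4, 4), 4))  # (8, 8, (2, 32))
```

In the toy case, rows 0-1 and rows 2-3 end up side by side, so the last axis of the regrouped map holds the row groups back to back and is longer than the original T. Under the assumed configuration, the input to self.tscam_conv would have spatial shape (2, 32) per channel; what those 32 positions mean physically is still the question for the maintainers.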
I hope you can find the time to clear up my confusion.
Originally posted by @dong-0412 in #19 (comment)