
I've started testing, looking forward to my test results #3

Open
Hanzc989 opened this issue Apr 23, 2024 · 3 comments

Hanzc989 commented Apr 23, 2024

We've started testing with our own SR dataset: 580k high-quality 720x720 images, with a very good data distribution :) Trust me :)~
It's up and running and training has started, but training is really slow: the MSE model needs 27 days :( and then the GAN will presumably need another 27.

27 days, on 4x A100.
To preserve quality, I made a few changes to the options file:

gt_size: 384
....
network_g:
  type: DRCT
  upscale: 2
  in_chans: 3
  img_size: 64
  window_size: 16
  compress_ratio: 3
  squeeze_factor: 30
  conv_scale: 0.01
  overlap_ratio: 0.5
  img_range: 1.
  depths: [6, 6, 6, 6, 6, 6]
  embed_dim: 180
  num_heads: [6, 6, 6, 6, 6, 6]
  mlp_ratio: 2
  upsampler: 'pixelshuffle'
  resi_connection: '1conv'
....

Will this affect the final quality?
Also, why is it so slow? :(
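A rough back-of-envelope check of why a larger crop slows training (assuming, hypothetically, a stock baseline of gt_size: 256; the repo's actual default may differ):

```python
# Back-of-envelope: how much more work does a 384px GT crop cost than a
# (hypothetical) 256px baseline, at 2x upscale?

def lr_tokens(gt_size: int, upscale: int) -> int:
    """Tokens the network processes per crop: the LR patch side is gt_size/upscale."""
    lr = gt_size // upscale
    return lr * lr

base = lr_tokens(256, 2)   # 128 * 128 = 16384
mine = lr_tokens(384, 2)   # 192 * 192 = 36864
print(f"{mine} vs {base} tokens, ~{mine / base:.2f}x work per iteration")
# Window attention cost is roughly linear in token count for a fixed
# window_size, so each step is about 2.25x more expensive before counting
# the extra activation memory (which may also force a smaller batch per GPU).
```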

ming053l (Owner) commented

Thanks! Sorry for the late reply.
I'm the only one in my group maintaining this repository, and I currently have only two V100s available, so updates will be fairly slow; I apologize for that.
Thank you as well for helping us train Real_DRCT_GAN; it looks like it may take quite a lot of your time. Thanks again.
---
The options you changed should be fine. This work is in fact built on HAT's environment: take the HAT official repository's Real_HAT_GAN and swap the HAT part for DRCT, and it will run.
---
A while ago I tried fusing DRCT with Mamba (replacing the STL with SS2D), but perhaps because SS2D's recurrent nature interacts badly with the dense-residual connections, i.e. the feature maps must all be held in memory, it is hard to train. Even though inference can be faster than HAT's, this training difficulty is the potential problem observed so far.
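A rough estimate of the activation-memory pressure described above, assuming fp16 feature maps of shape (embed_dim, H, W) that must all stay resident for the dense concatenations; the block count of 6 follows the depths in the config above, and everything else is a back-of-envelope assumption, not a measurement of DRCT itself:

```python
# Back-of-envelope activation memory for one sample, assuming every
# block's feature map in a dense-residual group must stay resident so
# that later blocks can concatenate it.

def dense_activation_mib(h, w, embed_dim=180, n_blocks=6, bytes_per=2):
    """MiB held per sample per group: n_blocks fp16 maps of (embed_dim, h, w)."""
    one_map = h * w * embed_dim * bytes_per  # bytes for a single feature map
    return one_map * n_blocks / 2**20

# A 384px GT crop at 2x upscale gives a 192x192 LR feature grid:
print(f"{dense_activation_mib(192, 192):.1f} MiB per sample per group")
```

Multiplied across groups and batch size, this is why dense connections strain memory even when the per-block compute is modest.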

wangxinchao-bit commented

My sense is that the original images are still on the large side; with big images, I/O loading is slow. I used to load the full-size originals directly, and that took 7 days on 4x 4090. After cropping the images to 480x480 (with the image count increasing correspondingly), a single 4090 needed only two days for the same model. You might check whether this is the cause.
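A minimal sketch of the pre-cropping idea above, computing only the tile boxes; the actual reading and saving (e.g. with Pillow's Image.open / crop / save) is left as comments, and the 480 tile size simply follows the suggestion in the comment:

```python
# Sketch: pre-crop large training images into fixed-size tiles so the
# dataloader decodes small files instead of full-resolution originals.

def tile_boxes(width, height, tile=480, overlap=0):
    """Return (left, top, right, bottom) crop boxes covering the image.

    Trailing margins smaller than `tile` are dropped in this sketch;
    add overlap (or pad) if full coverage is needed.
    """
    step = tile - overlap
    boxes = []
    for top in range(0, max(height - tile, 0) + 1, step):
        for left in range(0, max(width - tile, 0) + 1, step):
            boxes.append((left, top, left + tile, top + tile))
    return boxes

# For each source image you would then do something like:
#   img = Image.open(path)
#   for i, box in enumerate(tile_boxes(*img.size)):
#       img.crop(box).save(out_dir / f"{path.stem}_{i}.png")

print(len(tile_boxes(720, 720)))               # 1 (no overlap: margins dropped)
print(len(tile_boxes(720, 720, overlap=240)))  # 4 (2x2 overlapping tiles)
```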

FlotingDream commented


I think that, as in the paper's rank-1 solution, which "replaced all the Hybrid Attention Blocks (HAB) of HAT with SSFormer Blocks", DRCT could likewise replace the W-MSA in the STL with SS2D, or use SS2D + CA. Might that give a new SOTA?

From https://arxiv.org/abs/2404.09790, the description of the XiaomiMM solution:
The solution proposed by the XiaomiMM is illustrated in Fig. 1. The characteristic of Mamba [49] is its ability to model long-range dependencies of long sequences, which is likely due to its parametric approach that enables Mamba to store information of long sequences. However, Mamba is an autoregressive model, which typically has unidirectionality, such as good temporal properties and causal sequence modeling. Compared to the Transformer [69], it cannot model the relationships between sequence elements. The Transformer has shown strong advantages in various tasks, but it is not good at handling long sequence information. The characteristics of Mamba and Transformer are highly complementary, for which the authors designed the SSFormer (State Space Transformer) block. The Super-Resolution (SR) task [70] is indeed a pixel-intensive task because it aims to recover high-resolution (HR) details from low-resolution (LR) images. In this process, the model needs to perform dense calculations at each pixel point to predict and generate new pixel points in higher resolution images, so modeling the contextual relationship of pixel points in the super-resolution task is more important. Based on this, the authors introduced the SSFormer Block into the super-resolution task and built the MambaSR model. The network structure of MambaSR is based on HAT [11], and the authors replaced all the Hybrid Attention Blocks (HAB) of HAT with SSFormer Blocks, achieving the best performance.
