Feature Mismatch for Replication #30

Closed
kravrolens opened this issue Jun 13, 2023 · 4 comments

Comments

@kravrolens commented Jun 13, 2023

Hi, I have some questions about the network structure of CFW.

As shown in Figure 2 of your paper and in your code, my understanding is that you concatenate features from two layers of the encoder and the decoder. The enc_fea (intermediate feature) sizes are:

# enc_fea[3]: torch.Size([1, 512, 16, 16])
# enc_fea[2]: torch.Size([1, 512, 32, 32])  reserved 1
# enc_fea[1]: torch.Size([1, 256, 64, 64])  reserved 0
# enc_fea[0]: torch.Size([1, 128, 128, 128]) 

You choose enc_fea[2] and enc_fea[1]. However, the dec_fea sizes are:

# h: torch.Size([1, 512, 64, 64])
# h: torch.Size([1, 512, 128, 128])
# h: torch.Size([1, 256, 256, 256])
# h: torch.Size([1, 128, 512, 512])

It seems that enc_fea and dec_fea can't be concatenated. Thanks for your help in advance!
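
For reference, here is a minimal sketch, using the shapes printed above, of why a channel-wise concatenation fails here: torch.cat along dim=1 requires every other dimension to match.

```python
import torch

# Shapes taken from the printouts above.
enc_fea_2 = torch.randn(1, 512, 32, 32)    # encoder feature at 32x32
dec_h     = torch.randn(1, 512, 128, 128)  # decoder feature at 128x128

# Channel-wise concatenation needs matching batch and spatial dims,
# so this raises a RuntimeError at these shapes.
try:
    fused = torch.cat([enc_fea_2, dec_h], dim=1)
except RuntimeError as err:
    print(err)
```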

@IceClear (Owner)

Hi, your understanding is correct: I use those two layers' features.
For your question, you just need to print the intermediate features to check their shapes. The code runs successfully, so there should be no bugs.
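
If it helps, one generic way to print those shapes without editing the model code is a PyTorch forward hook; the Sequential model below is just a stand-in, not the actual StableSR modules.

```python
import torch
import torch.nn as nn

def print_shape_hook(name):
    # Forward hook that prints the output shape of a module.
    def hook(module, inputs, output):
        if torch.is_tensor(output):
            print(f"{name}: {tuple(output.shape)}")
    return hook

# Stand-in model; in practice, iterate over the VQGAN encoder/decoder
# modules of interest instead.
model = nn.Sequential(
    nn.Conv2d(3, 128, 3, padding=1),
    nn.Conv2d(128, 256, 3, stride=2, padding=1),
)
for name, module in model.named_modules():
    if isinstance(module, nn.Conv2d):
        module.register_forward_hook(print_shape_hook(name))

_ = model(torch.randn(1, 3, 512, 512))
# 0: (1, 128, 512, 512)
# 1: (1, 256, 256, 256)
```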

@kravrolens (Author) commented Jun 13, 2023

As I showed above, enc_fea and dec_fea don't seem to have features of the same size. Since the encoder's input is the LQ image while the decoder's output is the HQ image (4x larger), there is a 4x gap between the feature sizes. Could you elaborate on how this is handled? Thanks!

By the way, the constructed latent input (./CFW_trainingdata/latents/xxx.npy) has size (1, 4, 64, 64) before training CFW. Is that right?

@kravrolens (Author) commented Jun 13, 2023

I found my problem. The constructed latent input (./CFW_trainingdata/latents/xxx.npy) should have size (1, 4, 64, 64), and the LQ image should have size (1, 3, 512, 512).
Sorry to bother you, and thank you!
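
To spell out the arithmetic: assuming the f=8 autoencoder halves the resolution at each of its first three encoder levels, a (1, 3, 512, 512) LQ input puts the intermediate encoder features at 512/256/128/64 spatial resolution, so enc_fea[1] and enc_fea[2] now line up with the decoder features h at 256x256 and 128x128. A minimal check:

```python
import torch

# With the LQ image at 512x512, enc_fea[1] sits at 256x256 with
# 256 channels, matching the decoder feature h at 256x256, so the
# channel-wise concatenation now works.
enc_fea_1 = torch.randn(1, 256, 256, 256)
h         = torch.randn(1, 256, 256, 256)
fused = torch.cat([enc_fea_1, h], dim=1)
print(fused.shape)  # torch.Size([1, 512, 256, 256])
```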

@xyIsHere commented Jul 6, 2023

> As I showed above, enc_fea and dec_fea don't seem to have features of the same size. Since the encoder's input is the LQ image while the decoder's output is the HQ image (4x larger), there is a 4x gap between the feature sizes. Could you elaborate on how this is handled? Thanks!
>
> By the way, the constructed latent input (./CFW_trainingdata/latents/xxx.npy) has size (1, 4, 64, 64) before training CFW. Is that right?

Hi @kravrolens, I'm also working on the replication right now. From your issue, it sounds like you have already succeeded at the first fine-tuning stage; I'm personally stuck there. The problem is that my fine-tuned model does not show very good generative ability. I tested the provided StableSR model by setting dec_w to 0.0 and checked its results, and they look much better than mine (as shown in issue #36). Did you run into a similar problem? Thank you so much!
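
For what it's worth, here is a minimal sketch of why dec_w = 0.0 isolates the decoder: in the paper's CFW formulation the fused feature is the decoder feature plus a weighted learned residual, F_out = F_dec + w * C(F_enc, F_dec), so a zero weight bypasses the fine-tuned module entirely. The single conv below is only a stand-in for the actual CFW block.

```python
import torch
import torch.nn as nn

class CFWSketch(nn.Module):
    """Stand-in for a CFW block: F_out = F_dec + w * C(F_enc, F_dec)."""
    def __init__(self, channels):
        super().__init__()
        # A single conv stands in for the learned fusion module C.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, enc_feat, dec_feat, w=0.5):
        residual = self.fuse(torch.cat([enc_feat, dec_feat], dim=1))
        # With w = 0.0 the output is exactly the decoder feature,
        # so the fine-tuned CFW weights are bypassed.
        return dec_feat + w * residual

cfw = CFWSketch(256)
enc = torch.randn(1, 256, 64, 64)
dec = torch.randn(1, 256, 64, 64)
print(torch.equal(cfw(enc, dec, w=0.0), dec))  # True
```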
