Feature Mismatch for Replication #30

Closed
kravrolens opened this issue Jun 13, 2023 · 4 comments

Comments

@kravrolens commented Jun 13, 2023

Hi, I have some questions about the network structure of CFW.

As shown in Figure 2 of your paper and in your code, my understanding is that you concatenate features from two layers of the encoder and the decoder. The enc_fea (intermediate feature) sizes are:

# enc_fea[3]: torch.Size([1, 512, 16, 16])
# enc_fea[2]: torch.Size([1, 512, 32, 32])  reserved 1
# enc_fea[1]: torch.Size([1, 256, 64, 64])  reserved 0
# enc_fea[0]: torch.Size([1, 128, 128, 128]) 

You choose enc_fea[2] and enc_fea[1]. However, the dec_fea sizes are:

# h: torch.Size([1, 512, 64, 64])
# h: torch.Size([1, 512, 128, 128])
# h: torch.Size([1, 256, 256, 256])
# h: torch.Size([1, 128, 512, 512])

It seems that enc_fea and dec_fea can't be concatenated. Thanks for your help in advance!
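
For reference, here is a minimal sketch, using the shapes printed above, of why a channel-wise concatenation fails here: torch.cat along dim=1 requires every other dimension to match.

```python
import torch

# Shapes taken from the printouts above.
enc_fea_2 = torch.randn(1, 512, 32, 32)    # encoder feature at 32x32
dec_h     = torch.randn(1, 512, 128, 128)  # decoder feature at 128x128

# Channel-wise concatenation needs matching batch and spatial dims,
# so this raises a RuntimeError at these shapes.
try:
    fused = torch.cat([enc_fea_2, dec_h], dim=1)
except RuntimeError as err:
    print(err)
```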

@IceClear (Owner)

Hi, your understanding is correct: I use those two layers' features.
For your question, you just need to print the intermediate features to check their shapes. The code runs successfully, so there should be no bugs.
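
If it helps, one generic way to print those shapes without editing the model code is a PyTorch forward hook; the Sequential model below is just a stand-in, not the actual StableSR modules.

```python
import torch
import torch.nn as nn

def print_shape_hook(name):
    # Forward hook that prints the output shape of a module.
    def hook(module, inputs, output):
        if torch.is_tensor(output):
            print(f"{name}: {tuple(output.shape)}")
    return hook

# Stand-in model; in practice, iterate over the VQGAN encoder/decoder
# modules of interest instead.
model = nn.Sequential(
    nn.Conv2d(3, 128, 3, padding=1),
    nn.Conv2d(128, 256, 3, stride=2, padding=1),
)
for name, module in model.named_modules():
    if isinstance(module, nn.Conv2d):
        module.register_forward_hook(print_shape_hook(name))

_ = model(torch.randn(1, 3, 512, 512))
# 0: (1, 128, 512, 512)
# 1: (1, 256, 256, 256)
```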

@kravrolens (Author) commented Jun 13, 2023

As I showed above, enc_fea and dec_fea don't seem to have features of the same size. Since the encoder's input is the LQ image while the decoder's output is the HQ image (4x larger), there is a 4x gap between the feature sizes. Could you elaborate on how this is handled? Thanks!

By the way, the constructed latent input (./CFW_trainingdata/latents/xxx.npy) has size (1, 4, 64, 64) before training CFW. Is that right?

@kravrolens (Author) commented Jun 13, 2023

I found my problem. The constructed latent input (./CFW_trainingdata/latents/xxx.npy) should have size (1, 4, 64, 64), and the LQ image should have size (1, 3, 512, 512).
Sorry to bother you, and thank you!
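
To spell out the arithmetic: assuming the f=8 autoencoder halves the resolution at each of its first three encoder levels, a (1, 3, 512, 512) LQ input puts the intermediate encoder features at 512/256/128/64 spatial resolution, so enc_fea[1] and enc_fea[2] now line up with the decoder features h at 256x256 and 128x128. A minimal check:

```python
import torch

# With the LQ image at 512x512, enc_fea[1] sits at 256x256 with
# 256 channels, matching the decoder feature h at 256x256, so the
# channel-wise concatenation now works.
enc_fea_1 = torch.randn(1, 256, 256, 256)
h         = torch.randn(1, 256, 256, 256)
fused = torch.cat([enc_fea_1, h], dim=1)
print(fused.shape)  # torch.Size([1, 512, 256, 256])
```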

@xyIsHere commented Jul 6, 2023

> As I showed above, enc_fea and dec_fea don't seem to have features of the same size. Since the encoder's input is the LQ image while the decoder's output is the HQ image (4x larger), there is a 4x gap between the feature sizes. Could you elaborate on how this is handled? Thanks!
>
> By the way, the constructed latent input (./CFW_trainingdata/latents/xxx.npy) has size (1, 4, 64, 64) before training CFW. Is that right?

Hi @kravrolens, I'm also working on the replication right now. From your issue, it sounds like you have already succeeded at the first fine-tuning stage; I'm personally stuck there. The problem is that my fine-tuned model does not show very good generative ability. I tested the provided StableSR model by setting dec_w to 0.0 and checked its results, and they look much better than mine (as shown in issue #36). Did you run into a similar problem? Thank you so much!
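
For what it's worth, here is a minimal sketch of why dec_w = 0.0 isolates the decoder: in the paper's CFW formulation the fused feature is the decoder feature plus a weighted learned residual, F_out = F_dec + w * C(F_enc, F_dec), so a zero weight bypasses the fine-tuned module entirely. The single conv below is only a stand-in for the actual CFW block.

```python
import torch
import torch.nn as nn

class CFWSketch(nn.Module):
    """Stand-in for a CFW block: F_out = F_dec + w * C(F_enc, F_dec)."""
    def __init__(self, channels):
        super().__init__()
        # A single conv stands in for the learned fusion module C.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, enc_feat, dec_feat, w=0.5):
        residual = self.fuse(torch.cat([enc_feat, dec_feat], dim=1))
        # With w = 0.0 the output is exactly the decoder feature,
        # so the fine-tuned CFW weights are bypassed.
        return dec_feat + w * residual

cfw = CFWSketch(256)
enc = torch.randn(1, 256, 64, 64)
dec = torch.randn(1, 256, 64, 64)
print(torch.equal(cfw(enc, dec, w=0.0), dec))  # True
```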
