Issue learning latent encoding for new faces #21

Closed
njordsir opened this issue Oct 12, 2019 · 9 comments

Comments

@njordsir

njordsir commented Oct 12, 2019

I am trying to derive latent encodings for custom faces, as done in https://github.com/Puzer/stylegan-encoder.

Here are the details after porting the same pipeline to PyTorch:

# imports used below: torch, torch.nn as nn, torch.nn.functional as F,
# torchvision.models as models, torch.optim as optim, numpy as np, PIL.Image, tqdm.tqdm
from models.stylegan_generator import StyleGANGenerator
device = torch.device("cuda")

#load the pre-trained synthesis network
m_synth = StyleGANGenerator("stylegan_ffhq").model.synthesis.to(device).eval()

#process the output of the synthesis module
class PostProcAfterSynth(nn.Module):
    def __init__(self):
        super(PostProcAfterSynth, self).__init__()
    def forward(self, gen_img):
        #remap to [0,1]
        return (gen_img+1)/2
    
post_proc_layer = PostProcAfterSynth()

#preprocess the generated image before feeding into perceptual model    
class PreProcBeforePerception(nn.Module):
    def __init__(self, img_size):
        super(PreProcBeforePerception, self).__init__()
        self.img_size = img_size
        self.mean = torch.tensor([0.485, 0.456, 0.406], device=device).view(-1, 1, 1)
        self.std = torch.tensor([0.229, 0.224, 0.225], device=device).view(-1, 1, 1)
    def forward(self, gen_img):
        #resize input image
        gen_img = F.adaptive_avg_pool2d(gen_img, self.img_size)
        #normalize
        gen_img = (gen_img - self.mean) / self.std
        return gen_img
    
pre_proc_layer = PreProcBeforePerception(img_size=256)

#use pre-trained vgg model for feature extraction
m_vgg = models.vgg16(pretrained=True).features[:16].to(device).eval()

#set up the model
model = nn.Sequential(m_synth)
model.add_module(str(1), post_proc_layer)
model.add_module(str(2), pre_proc_layer)
model.add_module(str(3), m_vgg)

for param in model.parameters():
    param.requires_grad_(False)

print(m_vgg)

Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace)
  (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (6): ReLU(inplace)
  (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (8): ReLU(inplace)
  (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (11): ReLU(inplace)
  (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (13): ReLU(inplace)
  (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (15): ReLU(inplace)
)

Following Puzer, I select the [conv->conv->pool->conv->conv->pool->conv->conv->conv] section of the VGG network (the features[:16] slice above) for feature extraction.

Pre-computing the features for the reference image:

ref_img_path = "."
ref_img = np.array(Image.open(ref_img_path))
ref_img = ref_img.astype(np.float32)/255.
ref_img = np.array([np.transpose(ref_img, (2,0,1))])
ref_img = torch.tensor(ref_img, device=device)
ref_img = pre_proc_layer(ref_img)
ref_img_features = m_vgg(ref_img).detach()

Optimization:

trainable_latent = torch.randn((1,18,512), device=device).requires_grad_(True)
loss_func = torch.nn.MSELoss()

optimizer = optim.SGD([trainable_latent], lr=0.5)

losses = []
for i in tqdm(range(1000)):
    optimizer.zero_grad()
    gen_img_features = model(trainable_latent)
    loss = loss_func(gen_img_features, ref_img_features)
    losses.append(loss.item())
    loss.backward()
    optimizer.step()

The latent encoding and the subsequently generated images are of poor quality. The results are nowhere near as crisp as those from Puzer's encoder.

What I have tried:

  1. Learning the latent in Z space instead of WP+
  2. A variety of optimizer, learning-rate, and iteration-count combinations
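
One thing I have not yet tried is initializing the W+ latent from an estimate of the mean W instead of torch.randn, since a standard normal sample is unlikely to lie near the learned W distribution. A rough sketch of what I mean, assuming the wrapper also exposes the mapping network as model.mapping (I have not verified that attribute name or its output shape):

# sketch: estimate the mean W by pushing random z vectors through the mapping
# network, then tile it across the 18 style layers as the optimization start
with torch.no_grad():
    mapping = StyleGANGenerator("stylegan_ffhq").model.mapping.to(device).eval()  # assumed attribute
    z = torch.randn(10000, 512, device=device)
    w = mapping(z)                           # output shape depends on the wrapper
    w_mean = w.reshape(-1, 512).mean(dim=0)  # (512,)
trainable_latent = w_mean.view(1, 1, 512).repeat(1, 18, 1).clone().requires_grad_(True)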

What could be wrong:

  1. There might be issues with my pipeline above (I am new to PyTorch).
  2. There might be some difference between the pre-trained VGG networks for PyTorch and Keras that I have failed to take into account (see the preprocessing sketch below).
  3. The perceptual model is not complex enough (though it clearly works for Puzer).
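
For point 2 specifically, my understanding of the two preprocessing conventions is sketched below. The Keras constants are what I believe keras.applications.vgg16.preprocess_input uses in its default caffe mode; I have not yet checked them against Puzer's perceptual model, so treat them as an assumption:

# torchvision VGG16: RGB image scaled to [0, 1], then ImageNet mean/std normalization
def preprocess_torchvision(img_uint8_rgb):
    x = img_uint8_rgb.astype(np.float32) / 255.
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    return (x - mean) / std

# Keras VGG16 (caffe-style preprocess_input, to my understanding):
# RGB -> BGR, per-channel mean subtraction in the [0, 255] range, no std scaling
def preprocess_keras_caffe(img_uint8_rgb):
    x = img_uint8_rgb.astype(np.float32)[..., ::-1]
    return x - np.array([103.939, 116.779, 123.68], dtype=np.float32)

Since my pipeline uses torchvision preprocessing and torchvision weights end to end it should at least be self-consistent, but the extracted features (and hence the optimization target) will differ from what Puzer's Keras model sees.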

Any help with the above would be much appreciated.

@ShenYujun
Collaborator

You can try extracting VGG features from a fixed input image with both stylegan-encoder and your own PyTorch version, to check whether the two pipelines give the same output.

Also, does the loss descend normally during the optimization?
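
A minimal sketch of that check on the PyTorch side is below; the file names are only illustrative, and the Keras-side features would be dumped from stylegan-encoder in the same way (note that Keras tensors are NHWC while PyTorch uses NCHW):

# run a fixed image through the PyTorch VGG slice and save the features
img = np.array(Image.open("test_face.png").convert("RGB"), dtype=np.float32) / 255.
img = torch.tensor(img.transpose(2, 0, 1)[None], device=device)
feats_pt = m_vgg(pre_proc_layer(img)).detach().cpu().numpy()
np.save("vgg_feats_pytorch.npy", feats_pt)

# dump the features of the corresponding layer from the Keras pipeline for the
# same image, then compare magnitudes (exact equality is not expected, since the
# two VGG16 checkpoints are trained differently)
feats_keras = np.load("vgg_feats_keras.npy").transpose(0, 3, 1, 2)  # NHWC -> NCHW
print(np.abs(feats_pt - feats_keras).max(), np.abs(feats_pt).mean(), np.abs(feats_keras).mean())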

@njordsir
Author

Original: [image]

Learnt and generated with stylegan-encoder: [image]

Learnt and generated with the code above: [image]

The loss does decrease but stabilizes early. The comparison above is with the SGD optimizer and a learning rate of 1. Other optimizers and learning rates give similar or worse results.

Maybe this has something to do with differences between the PyTorch and TensorFlow/Keras optimizer implementations, and it is just a matter of finding the right hyperparameters, but I have had no luck so far.

@ShenYujun
Collaborator

The loss values in the top and bottom figures are clearly different. Can you test whether the TensorFlow and PyTorch VGG models give the same response to the same image? I suggest this test as the first step of debugging.

@ShenYujun
Collaborator

We will support the inversion function in a future version soon. Closing this issue for now.

@Voyz

Voyz commented Jan 16, 2020

Hi @ShenYujun - is there any indication as to when the inversion function will be made public? We await it with anticipation!

@ShenYujun
Collaborator

@Voyz Yes, the code will be public for sure. For now, we still have some work in submission, but a more powerful GAN-related toolkit is coming soon!!

@Voyz

Voyz commented Jan 24, 2020

@ShenYujun That's absolutely wonderful news, thanks! Out of interest, would you be able to give an approximate release date?

@ShenYujun
Collaborator

@Voyz We may release the code in March. Thanks for your interest and patience.

@Voyz

Voyz commented Jan 25, 2020

@ShenYujun Thank you, appreciate the reply. We truly admire your work, massive kudos for what you've achieved so far! Looking forward to seeing more!
