Issue learning latent encoding for new faces #21

Closed
njordsir opened this issue Oct 12, 2019 · 9 comments

Comments

@njordsir

njordsir commented Oct 12, 2019

I am trying to derive latent encodings for custom faces, as done in https://github.com/Puzer/stylegan-encoder.

Here are the details after porting the same pipeline to PyTorch:

# imports used below: torch, torch.nn as nn, torch.nn.functional as F,
# torchvision.models as models, torch.optim as optim, numpy as np, PIL.Image, tqdm.tqdm
from models.stylegan_generator import StyleGANGenerator
device = torch.device("cuda")

#load the pre-trained synthesis network
m_synth = StyleGANGenerator("stylegan_ffhq").model.synthesis.to(device).eval()

#process the output of the synthesis module
class PostProcAfterSynth(nn.Module):
    def __init__(self):
        super(PostProcAfterSynth, self).__init__()
    def forward(self, gen_img):
        #remap to [0,1]
        return (gen_img+1)/2
    
post_proc_layer = PostProcAfterSynth()

#preprocess the generated image before feeding into perceptual model    
class PreProcBeforePerception(nn.Module):
    def __init__(self, img_size):
        super(PreProcBeforePerception, self).__init__()
        self.img_size = img_size
        self.mean = torch.tensor([0.485, 0.456, 0.406], device=device).view(-1, 1, 1)
        self.std = torch.tensor([0.229, 0.224, 0.225], device=device).view(-1, 1, 1)
    def forward(self, gen_img):
        #resize input image
        gen_img = F.adaptive_avg_pool2d(gen_img, self.img_size)
        #normalize
        gen_img = (gen_img - self.mean) / self.std
        return gen_img
    
pre_proc_layer = PreProcBeforePerception(img_size=256)

#use pre-trained vgg model for feature extraction
m_vgg = models.vgg16(pretrained=True).features[:16].to(device).eval()

#set up the model
model = nn.Sequential(m_synth)
model.add_module(str(1), post_proc_layer)
model.add_module(str(2), pre_proc_layer)
model.add_module(str(3), m_vgg)

for param in model.parameters():
    param.requires_grad_(False)

print(m_vgg)

Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace)
  (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (6): ReLU(inplace)
  (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (8): ReLU(inplace)
  (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (11): ReLU(inplace)
  (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (13): ReLU(inplace)
  (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (15): ReLU(inplace)
)

Following Puzer, I select the [conv->conv->pool->conv->conv->pool->conv->conv->conv] section of the VGG network (the features[:16] slice above) for feature extraction.

Pre-computing the features for the reference image:

ref_img_path = "."
ref_img = np.array(Image.open(ref_img_path))
ref_img = ref_img.astype(np.float32)/255.
ref_img = np.array([np.transpose(ref_img, (2,0,1))])
ref_img = torch.tensor(ref_img, device=device)
ref_img = pre_proc_layer(ref_img)
ref_img_features = m_vgg(ref_img).detach()

Optimization:

trainable_latent = torch.randn((1,18,512), device=device).requires_grad_(True)
loss_func = torch.nn.MSELoss()

optimizer = optim.SGD([trainable_latent], lr=0.5)

losses = []
for i in tqdm(range(1000)):
    optimizer.zero_grad()
    gen_img_features = model(trainable_latent)
    loss = loss_func(gen_img_features, ref_img_features)
    losses.append(loss.item())
    loss.backward()
    optimizer.step()

The latent encoding and the subsequently generated images are of poor quality. The results are nowhere near as crisp as those from Puzer's encoder.

What I have tried:

  1. Learning the latent in Z space instead of WP+
  2. A variety of optimizer, learning-rate, and iteration-count combinations
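
One thing I have not yet tried is initializing the W+ latent from an estimate of the mean W instead of torch.randn, since a standard normal sample is unlikely to lie near the learned W distribution. A rough sketch of what I mean, assuming the wrapper also exposes the mapping network as model.mapping (I have not verified that attribute name or its output shape):

# sketch: estimate the mean W by pushing random z vectors through the mapping
# network, then tile it across the 18 style layers as the optimization start
with torch.no_grad():
    mapping = StyleGANGenerator("stylegan_ffhq").model.mapping.to(device).eval()  # assumed attribute
    z = torch.randn(10000, 512, device=device)
    w = mapping(z)                           # output shape depends on the wrapper
    w_mean = w.reshape(-1, 512).mean(dim=0)  # (512,)
trainable_latent = w_mean.view(1, 1, 512).repeat(1, 18, 1).clone().requires_grad_(True)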

What could be wrong:

  1. There might be issues with my pipeline above (I am new to PyTorch).
  2. There might be some difference between the pre-trained VGG networks for PyTorch and Keras that I have failed to take into account (see the preprocessing sketch below).
  3. The perceptual model is not complex enough (though it clearly works for Puzer).
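
For point 2 specifically, my understanding of the two preprocessing conventions is sketched below. The Keras constants are what I believe keras.applications.vgg16.preprocess_input uses in its default caffe mode; I have not yet checked them against Puzer's perceptual model, so treat them as an assumption:

# torchvision VGG16: RGB image scaled to [0, 1], then ImageNet mean/std normalization
def preprocess_torchvision(img_uint8_rgb):
    x = img_uint8_rgb.astype(np.float32) / 255.
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    return (x - mean) / std

# Keras VGG16 (caffe-style preprocess_input, to my understanding):
# RGB -> BGR, per-channel mean subtraction in the [0, 255] range, no std scaling
def preprocess_keras_caffe(img_uint8_rgb):
    x = img_uint8_rgb.astype(np.float32)[..., ::-1]
    return x - np.array([103.939, 116.779, 123.68], dtype=np.float32)

Since my pipeline uses torchvision preprocessing and torchvision weights end to end it should at least be self-consistent, but the extracted features (and hence the optimization target) will differ from what Puzer's Keras model sees.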

Any help with the above would be much appreciated.

@ShenYujun
Collaborator

You can try extracting VGG features from a fixed input image with both stylegan-encoder and your own PyTorch version, to check whether the two pipelines give the same output.

Also, does the loss descend normally during the optimization?
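
A minimal sketch of that check on the PyTorch side is below; the file names are only illustrative, and the Keras-side features would be dumped from stylegan-encoder in the same way (note that Keras tensors are NHWC while PyTorch uses NCHW):

# run a fixed image through the PyTorch VGG slice and save the features
img = np.array(Image.open("test_face.png").convert("RGB"), dtype=np.float32) / 255.
img = torch.tensor(img.transpose(2, 0, 1)[None], device=device)
feats_pt = m_vgg(pre_proc_layer(img)).detach().cpu().numpy()
np.save("vgg_feats_pytorch.npy", feats_pt)

# dump the features of the corresponding layer from the Keras pipeline for the
# same image, then compare magnitudes (exact equality is not expected, since the
# two VGG16 checkpoints are trained differently)
feats_keras = np.load("vgg_feats_keras.npy").transpose(0, 3, 1, 2)  # NHWC -> NCHW
print(np.abs(feats_pt - feats_keras).max(), np.abs(feats_pt).mean(), np.abs(feats_keras).mean())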

@njordsir
Author

Original: [image]

Learnt and generated with stylegan-encoder: [image]

Learnt and generated with the code above: [image]

The loss does decrease but stabilizes early. The comparison above is with the SGD optimizer and a learning rate of 1. Other optimizers and learning rates give similar or worse results.

Maybe this has something to do with differences between the PyTorch and TensorFlow/Keras optimizer implementations, and it is just a matter of finding the right hyperparameters, but I have had no luck so far.

@ShenYujun
Collaborator

The loss values in the top and bottom figures are clearly different. Can you test whether the TensorFlow and PyTorch VGG models give the same response to the same image? I suggest this test as the first step of debugging.

@ShenYujun
Collaborator

We will support the inversion function in a future version soon. Closing this issue for now.

@Voyz

Voyz commented Jan 16, 2020

Hi @ShenYujun - is there any indication as to when the inversion function will be made public? We await it with anticipation!

@ShenYujun
Collaborator

@Voyz Yes, the code will be public for sure. For now, we still have some work in submission, but a more powerful GAN-related toolkit is coming soon!!

@Voyz

Voyz commented Jan 24, 2020

@ShenYujun That's absolutely wonderful news, thanks! Out of interest, would you be able to give an approximate release date?

@ShenYujun
Collaborator

@Voyz We may release the code in March. Thanks for your interest and patience.

@Voyz

Voyz commented Jan 25, 2020

@ShenYujun Thank you, appreciate the reply. We truly admire your work, massive kudos for what you've achieved so far! Looking forward to seeing more!
