SRGAN - Bypass check_array_lengths(X, Y, W) in training.py for different input and output batch sizes #3940

Closed
titu1994 opened this issue Oct 2, 2016 · 20 comments


@titu1994
Contributor

titu1994 commented Oct 2, 2016

I am trying to implement the SRGAN model from the paper Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. Since this uses both GAN and VGG perceptual losses, I am using modified ActivityRegularizers to incorporate the various losses. I am using the latest Theano and Keras, and the model is for Theano only (I'm using Windows, no TensorFlow yet).

The full model architecture and the implementation of the loss are here : SRGAN regularizers gist

A simplified view of the entire model is:
[image: srgan-simple-architecture]

In this model, I am passing 96 x 96 blurred low resolution (LR) images as input to the SR-ResNet network, and 384 x 384 high resolution (HR) images as inputs 2 and 3 to the Discriminator and VGG networks, alongside the outputs of the SR-ResNet.

The issue is that since the original inputs to the SR-ResNet have a batch size of 8, Keras assumes the output batch size should also be 8. However, since the original high resolution images are concatenated (along the batch axis) into both the discriminator and VGG networks, the output batch size becomes 16.

The reason for adding the original HR input images as Input 2 and Input 3 is that the ContentVGGRegularizer needs to compare the gram matrix of the HR inputs to the gram matrix of the generated outputs G(LR) from the generative model. Also, to train the discriminator network, we require the original images in addition to the generated ones D(G(LR)).
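
As a rough illustration of why the HR images ride along in the same batch (this is not the exact gist code; the helper name is hypothetical): because G(LR) and the true HR images are concatenated along the batch axis before the VGG layers, a regularizer only has to split the VGG activations in half and compare the two halves.

from keras import backend as K

def split_batch_content_loss(vgg_activations, weight=1.0):
    # First half of the batch axis holds VGG features of G(LR); the second
    # half holds features of the true HR images (they were merged earlier
    # in the graph with concat_axis=0).
    half = vgg_activations.shape[0] // 2
    generated_features = vgg_activations[:half]
    reference_features = vgg_activations[half:]
    # A simple MSE between the halves serves as the perceptual (content) term.
    return weight * K.mean(K.square(generated_features - reference_features))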

The error is fairly simple and understandable:

Traceback (most recent call last):
  File "D:/Users/Yue/PycharmProjects/Super Resolution using Generative Adversarial Networks/models.py", line 431, in <module>
    srgan_network.pre_train_network(coco_path, nb_epochs=20)
  File "D:/Users/Yue/PycharmProjects/Super Resolution using Generative Adversarial Networks/models.py", line 310, in pre_train_network
    self._train_network(image_dir, nb_epochs=nb_epochs)
  File "D:/Users/Yue/PycharmProjects/Super Resolution using Generative Adversarial Networks/models.py", line 365, in _train_network
    batch_size=self.batch_size // 2,nb_epoch=1, verbose=0)
  File "D:\Users\Yue\Anaconda3\lib\site-packages\keras-1.1.0-py3.4.egg\keras\engine\training.py", line 1034, in fit
    batch_size=batch_size)
  File "D:\Users\Yue\Anaconda3\lib\site-packages\keras-1.1.0-py3.4.egg\keras\engine\training.py", line 973, in _standardize_user_data
    check_array_lengths(x, y, sample_weights)
  File "D:\Users\Yue\Anaconda3\lib\site-packages\keras-1.1.0-py3.4.egg\keras\engine\training.py", line 177, in check_array_lengths
    str(list(set_y)[0]) + ' target samples.')
Exception: Input arrays should have the same number of samples as target arrays. Found 8 input samples and 16 target samples.

My question is: is there any way to train this network without manually creating the train function, getting the updates, adding the regularizers, and then calling that train function myself?

Or is there some other way to train such networks without appending the original inputs? I have not yet found a way to mask a portion of the input batch, which could potentially solve this problem (by passing both the blurred and the HR images to the SR-ResNet input); however, this poses another challenge: LR images are 96 x 96, while HR images are 384 x 384.

Any solutions? I can only think of one right now, which is to duplicate most of the _make_train_function and _fit_loop code to bypass this check.
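
For reference, here is roughly what that hand-rolled train function would look like in Keras 1.x (a sketch only, assuming the model is already compiled so that model.total_loss carries the regularizer terms; this is not the exact internal code):

from keras import backend as K

def make_train_function(model):
    # Mirrors what Model._make_train_function builds, minus the
    # input/target length check performed by fit().
    inputs = model.inputs + model.targets + model.sample_weights + [K.learning_phase()]
    updates = model.optimizer.get_updates(model.trainable_weights,
                                          model.constraints,
                                          model.total_loss)
    return K.function(inputs, [model.total_loss], updates=updates)

The batches then have to be assembled by hand, e.g. something like train_fn([x_lr, x_hr, dummy_y, dummy_w, 1]), so no length standardization is ever applied.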

@titu1994
Contributor Author

titu1994 commented Oct 3, 2016

So I finally got the full model to work by duplicating most of the code in _make_train_function, fit and _fit_loop (and commenting out the check_array_lengths function). It compiles fully and can be trained without errors; however, the loss function must be a single dummy loss which accepts (y_true, y_pred) and returns a single Keras variable with value = 0. So binary_crossentropy cannot be applied to the discriminator output and must be replaced by the AdversarialLossRegularizer.

Still, training proceeds properly even though the check that raised "Input arrays should have the same number of samples as target arrays. Found 8 input samples and 16 target samples." has been bypassed.

However, when I begin pre-training the generator network with just the generator and the VGG loss, the generated output is extremely pixelated, as the two images below show. In fact, the generator and VGG model combined do not even trigger the check_array_lengths error, since I am freezing the VGG weights and training only via the ContentVGGRegularizer and TVRegularizer (using the dummy loss variable when compiling, so that training is driven entirely by the regularizers). The generator is trained on the MS COCO dataset (a random sample of 50k images) instead of the 50k-image ImageNet validation set, since I don't have the validation set.

The following two images are downscaled from their original size to 32x32 for input to the generator model and 128x128 for validation against the generated results. The generator is pre-trained with only the VGG 2,2 loss as described in the paper, and upscales the 32x32 input to 128x128. I am using 32x32 as the input size instead of 96x96 as my GPU cannot handle such a large model combined with such a large input (for 96x96, the full model contains nearly 303 million parameters).

I can tell that the models are learning something, since the dots are far more pronounced in the first ~1000 validation images; as the loss drops steadily, the output images become clearer and contain fewer RGB dots. However, even at the end of 50k iterations, the dots do not disappear for either the pre-training network or the full SR + GAN + VGG model.

At Iteration 23300 (Using SR + VGG model, for pre-training) :
[image: iteration_23300]

At Iteration 50000 (Using full SR + GAN + VGG model, for final training) :
[image: iteration_50000]

As can be seen, the RGB dots form a grid pattern throughout the upscaled image. The same dots appear when the last two Deconvolution layers are replaced with Upscale + Convolution, although such a network produces very poor images. Perhaps the error lies in trying to upscale the original image 4x without providing the network any information about what data should fill the upsampled space. I'll have to look into the sub-pixel convolution layer mentioned in the paper Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network to see if this can upscale the image properly without producing these RGB grids.

The following images are using UpSampling + Convolution layers to replace the Deconvolution layers + stride 2.

At Iteration 21000 (Using SR + VGG model, for pre-training),
[image: iteration_21000]

At Iteration 45000 (Using full SR + GAN + VGG model, for final training),
[image: iteration_45000]

Since the original problem is resolved simply by commenting out one crucial check, the issue is more or less solved. I understand the need to assert that the input batch size matches the output batch size, but if the model is carefully constructed then the user can still train it correctly. I would appreciate a better way to do this than such a hackish bypass of internal Keras code.

The code to define the generative + VGG model is simple enough, if anyone wishes to test.

# Imports for the code below (Keras 1.x API, Theano dim ordering).
# TVRegularizer, ContentVGGRegularizer, Normalize and THEANO_WEIGHTS_PATH_NO_TOP
# come from the SRGAN regularizers gist linked above.
import h5py

from keras import backend as K
from keras.layers import Input, Activation, merge
from keras.layers import Convolution2D, Deconvolution2D, MaxPooling2D
from keras.layers import BatchNormalization
from keras.models import Model
from keras.optimizers import Adam
from keras.utils.data_utils import get_file

class GenerativeNetwork:

    def __init__(self, img_width=32, img_height=32, batch_size=16, small_model=False,
                  tv_weight=2e-8):
        self.img_width = img_width
        self.img_height = img_height
        self.batch_size = batch_size
        self.small_model = small_model

        self.tv_weight = tv_weight

        self.mode = 0

    def create_sr_model(self, ip):

        x = Convolution2D(64, 3, 3, activation='relu', border_mode='same', name='sr_res_conv1')(ip)

        nb_residual = 5 if self.small_model else 15

        for i in range(nb_residual):
            x = self._residual_block(x, i + 1)

        x = Deconvolution2D(64, 3, 3, activation='relu', border_mode='same', subsample=(2, 2), name='sr_res_deconv1',
                            output_shape=(self.batch_size, 64, self.img_width * 2, self.img_height * 2))(x)

        x = Deconvolution2D(64, 3, 3, activation='relu', border_mode='same', subsample=(2, 2), name='sr_res_deconv2',
                            output_shape=(self.batch_size, 64, self.img_width * 4, self.img_height * 4))(x)

        tv_regularizer = TVRegularizer(img_width=self.img_width * 4, img_height=self.img_height * 4,
                                       weight=self.tv_weight)
        x = Convolution2D(3, 3, 3, activation="linear", border_mode='same', activity_regularizer=tv_regularizer,
                          name='sr_res_conv_final')(x)

        return x

    def _residual_block(self, ip, id):
        init = ip

        x = Convolution2D(64, 3, 3, activation='linear', border_mode='same',
                          name='sr_res_conv_' + str(id) + '_1')(ip)
        x = BatchNormalization(axis=1, mode=self.mode, name="sr_res_batchnorm_" + str(id) + "_1")(x)
        x = Activation('relu', name="sr_res_activation_" + str(id) + "_1")(x)

        x = Convolution2D(64, 3, 3, activation='linear', border_mode='same',
                          name='sr_res_conv_' + str(id) + '_2')(x)
        x = BatchNormalization(axis=1, mode=self.mode, name="sr_res_batchnorm_" + str(id) + "_2")(x)

        m = merge([x, init], mode='sum', name="sr_res_merge_" + str(id))
        m = Activation('relu', name="sr_res_activation_" + str(id))(m)

        return m

class VGGNetwork:
    '''
    Helper class to load VGG and its weights to the generator model
    '''

    def __init__(self, img_width=128, img_height=128, vgg_weight=1.0):
        self.img_height = img_height
        self.img_width = img_width
        self.vgg_weight = vgg_weight

        self.vgg_layers = None

    def append_vgg_network(self, x_in, true_X_input, pre_train=False):

        # Append the initial inputs to the outputs of the SRResNet
        x = merge([x_in, true_X_input], mode='concat', concat_axis=0)

        # Normalize the inputs via custom VGG Normalization layer
        x = Normalize(name="normalize_vgg")(x)

        vgg_regularizer = ContentVGGRegularizer(weight=self.vgg_weight)

        # Begin adding the VGG layers
        x = Convolution2D(64, 3, 3, activation='relu', name='vgg_conv1_1', border_mode='same')(x)
        x = Convolution2D(64, 3, 3, activation='relu', name='vgg_conv1_2', border_mode='same')(x)
        x = MaxPooling2D(name='vgg_maxpool1')(x)

        x = Convolution2D(128, 3, 3, activation='relu', name='vgg_conv2_1', border_mode='same')(x)

        if pre_train:
            x = Convolution2D(128, 3, 3, activation='relu', name='vgg_conv2_2', border_mode='same',
                              activity_regularizer=vgg_regularizer)(x)
        else:
            x = Convolution2D(128, 3, 3, activation='relu', name='vgg_conv2_2', border_mode='same')(x)
        x = MaxPooling2D(name='vgg_maxpool2')(x)

        x = Convolution2D(256, 3, 3, activation='relu', name='vgg_conv3_1', border_mode='same')(x)
        x = Convolution2D(256, 3, 3, activation='relu', name='vgg_conv3_2', border_mode='same')(x)
        x = Convolution2D(256, 3, 3, activation='relu', name='vgg_conv3_3', border_mode='same')(x)
        x = MaxPooling2D(name='vgg_maxpool3')(x)

        x = Convolution2D(512, 3, 3, activation='relu', name='vgg_conv4_1', border_mode='same')(x)
        x = Convolution2D(512, 3, 3, activation='relu', name='vgg_conv4_2', border_mode='same')(x)
        x = Convolution2D(512, 3, 3, activation='relu', name='vgg_conv4_3', border_mode='same')(x)
        x = MaxPooling2D(name='vgg_maxpool4')(x)

        x = Convolution2D(512, 3, 3, activation='relu', name='vgg_conv5_1', border_mode='same')(x)
        x = Convolution2D(512, 3, 3, activation='relu', name='vgg_conv5_2', border_mode='same')(x)

        if not pre_train:
            x = Convolution2D(512, 3, 3, activation='relu', name='vgg_conv5_3', border_mode='same',
                          activity_regularizer=vgg_regularizer)(x)
        else:
            x = Convolution2D(512, 3, 3, activation='relu', name='vgg_conv5_3', border_mode='same')(x)
        x = MaxPooling2D(name='vgg_maxpool5')(x)

        return x

    def load_vgg_weight(self, model):
        # Loading VGG 16 weights
        weights_name = "vgg16_weights_th_dim_ordering_th_kernels_notop.h5"
        weights_path = THEANO_WEIGHTS_PATH_NO_TOP

        weights = get_file(weights_name, weights_path, cache_subdir='models')
        f = h5py.File(weights, mode='r')

        layer_names = [name for name in f.attrs['layer_names']]

        if self.vgg_layers is None:
            self.vgg_layers = [layer for layer in model.layers
                               if 'vgg_' in layer.name]

        for i, layer in enumerate(self.vgg_layers):
            g = f[layer_names[i]]
            weights = [g[name] for name in g.attrs['weight_names']]
            layer.set_weights(weights)
        print('VGG Model weights loaded.')

        # Freeze all VGG layers
        for layer in self.vgg_layers:
            layer.trainable = False

        return model

# Code to create the combined model
img_width = img_height = 32  # LR input size used in this experiment

dummy_output = K.variable(0.0)

def dummy_loss(y_true, y_pred):
    return dummy_output

ip = Input(shape=(3, img_width, img_height), name='x_generator')
ip_vgg = Input(shape=(3, img_width * 4, img_height * 4), name='x_vgg')  # Actual X images

generative_network = GenerativeNetwork()
vgg_network = VGGNetwork()

sr_output = generative_network.create_sr_model(ip)
generative_model_ = Model(ip, sr_output)

vgg_output = vgg_network.append_vgg_network(sr_output, ip_vgg, pre_train=True)

pretrain_model = Model(input=[ip, ip_vgg], output=vgg_output)

vgg_network.load_vgg_weight(pretrain_model)

pretrain_optimizer = Adam(lr=1e-4)
pretrain_model.compile(pretrain_optimizer, dummy_loss)

EDIT: Didn't mean to close the issue. I am still looking for better ways to train such models without this hackish workaround of Keras internal code.

@titu1994 titu1994 closed this as completed Oct 3, 2016
@titu1994 titu1994 reopened this Oct 3, 2016
@titu1994 titu1994 changed the title Bypass check_array_lengths(X, Y, W) in training.py for different input and output batch sizes SRGAN - Bypass check_array_lengths(X, Y, W) in training.py for different input and output batch sizes Oct 3, 2016
@titu1994
Contributor Author

titu1994 commented Oct 5, 2016

I found and fixed a bug in my normalization code. The images are still pixelated in grids; however, it is no longer an RGB grid but instead a pale grid of 4 colors. Sample images are below.

Iteration 25000 (SR + VGG)
[image: iteration-25000]

Iteration 46500 (SR + VGG)
[image: iteration-46500]

I believe this image distortion is mainly due to the network not having any idea of what it should fill into the upscaled space. To test this idea, I changed the architecture to become more like an auto encoder, providing blurred and distorted (affine-transformed) inputs of size 128x128 and passing high resolution images to Input 3 (the VGG network) for pre-training. The results are significantly better, as can be seen below. The model is trained using only the VGG 2,2 regularizer loss, by passing a dummy loss function to the optimizer.

Iteration 25000 (SR + VGG)
[image: iteration-25000-ae]

Iteration 49000 (SR + VGG)
[image: iteration-49000-ae]

This suggests that the error is no longer in my normalization layer, but in the SRGAN model itself. In hindsight, I am just wondering how the Twitter team managed to avoid this pixel grid problem without passing any information about what the upscaled output should look like. They may have used a custom sub-pixel convolution layer, which upsamples the previous layer using convolutional filter information; that is what I will try to implement next.

I am facing another challenge in training the Discriminator model. Since I can't use binary crossentropy to train it (Keras reports an input shape mismatch: the input batch size is 16 but the output on that axis is 32, due to the merging of the high resolution inputs (Input 2 in the above images)), I had to resort to pre-training the discriminator model separately and then carefully loading its weights into the SRGAN model.

This is not ideal, as GANs must alternate updates between discriminator and generator, and due to the input mismatch error, I cannot continue training the discriminator network when training the full SRGAN model (SR + GAN + VGG).

@titu1994
Contributor Author

titu1994 commented Oct 7, 2016

It seems that 4x upscaling causes patches of the same color to repeat in a grid formation throughout the upscaled image space. The grid below still repeats after nearly 500k iterations, and the VGG 2,2 loss does not go down further.

Iteration 500,000 (image patch magnified 32 times to clearly see the repeating pattern)
[image: capture]

It seems we need a way to add a Model to another Model, just as we can add two Sequential models to a third Sequential model to train a GAN. All GAN examples in Keras are forced to use a Sequential generator and a Sequential discriminator, as we don't (as of yet) have a way to merge multiple Models together into a single Model.

@LLCF

LLCF commented Oct 30, 2016

Hi, @titu1994
I trained the SRGAN and I also got images like this.
[images: epoch_1_iteration_44032_num_31_generated, epoch_1_iteration_44032_num_31_real]

I don't understand this sentence:

To test this idea, I changed the architecture to become more like an auto encoder, providing blurred and distorted (affine-transformed) inputs of size 128x128 and passing high resolution images to Input 3 (the VGG network) for pre-training.

Could you explain it in detail?

Thanks a lot.

@titu1994
Contributor Author

titu1994 commented Oct 30, 2016

@LLCF The auto encoder model is not in the paper, which is why I didn't expand on that point. It was merely a test of whether the model implementation and learning with the VGG loss alone were correct.

In any case, the auto encoder style model accepts a 128x128 image, downscales it 4x to 32x32, has multiple residual blocks and then upscales the image to 128x128 again, using deep skip connections to speed up training.

The full architecture is shown below (it's huge, so please zoom in to see the sizes):
[image: srgan]
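
For those asking, a very rough sketch of that auto encoder style generator (Keras 1.x API, Theano dim ordering; the filter counts, block count and names here are illustrative, not the exact values from my repository):

from keras.layers import Input, Convolution2D, UpSampling2D, merge
from keras.models import Model

def build_autoencoder_sr(img_size=128):
    ip = Input(shape=(3, img_size, img_size))

    # Downscale 4x with two stride-2 convolutions (128 -> 64 -> 32).
    d1 = Convolution2D(64, 3, 3, activation='relu', border_mode='same', subsample=(2, 2))(ip)
    d2 = Convolution2D(64, 3, 3, activation='relu', border_mode='same', subsample=(2, 2))(d1)

    # Residual-style blocks at the 32x32 resolution.
    x = d2
    for i in range(5):
        r = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(x)
        r = Convolution2D(64, 3, 3, activation='linear', border_mode='same')(r)
        x = merge([x, r], mode='sum')

    # Upscale back to 128x128, with skip connections from the encoder.
    x = UpSampling2D()(x)
    x = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(x)
    x = merge([x, d1], mode='sum')

    x = UpSampling2D()(x)
    x = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(x)

    out = Convolution2D(3, 3, 3, activation='linear', border_mode='same')(x)
    return Model(ip, out)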

@keeper121

Hi, @titu1994

Maybe this link can be useful for the pixel grid problem: http://distill.pub/2016/deconv-checkerboard/.

Some questions: do you train the net with the total generator loss as in the paper (VGG loss + adversarial loss + TV loss)? How much does each part influence the total loss? How do you balance the GAN game (the discriminator vs. generator contest)?

@titu1994
Contributor Author

@keeper121 Thanks for the link. I did check it out before, but was wondering how to implement linear upsampling in Keras, or whether it's already implemented in the current UpSampling2D layer.

Sadly, I have not been able to train the full network yet. It keeps giving a Python recursion error. If it could run, it would use the parameters from the paper.

@Neltherion

@titu1994
Hi... I just wanted to find out whether you have made any progress since discovering that deconvolutions result in blurry images?
It seems this paper hasn't been implemented in Keras by anyone else, and it'd be really nice if you could finish it completely...

@titu1994
Contributor Author

titu1994 commented Dec 5, 2016

@Neltherion Well, I was able to implement the SubPixelConvolution layer in Keras (at least for Theano; I haven't tried the TensorFlow implementation yet). That did result in slightly fewer 'grid' deformations in the images.

However, there seems to be a simple solution posted in the above link, which is to use nearest neighbour resize convolution, though I don't know how to implement that in Keras. I don't know whether the UpSampling2D layer in Keras is equivalent to nearest neighbour resize convolution, since the results I got from it were completely unusable.

As to what I was able to complete: the generator and discriminator are now pre-trainable, in the sense that they can be trained separately. However, merging the Discriminator, Generator and VGG models into one causes a Theano gradient recursion error. I did try setting the Python recursion limit to 20k and above, but my laptop doesn't have enough RAM to handle that. I haven't been able to move past it, since I am currently preparing for my Masters course at UIC.
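
(The recursion-limit workaround itself is just the standard one; it simply was not feasible on my hardware:)

import sys

# Theano's graph traversal while compiling very deep merged models can exceed
# Python's default recursion limit; raising it trades memory for stack depth.
sys.setrecursionlimit(20000)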

I plan to work on it and fix it when I have a little more time, but for now I am using the Theano implementation of the paper, Neural Enhance.

@Neltherion

Thanks for the quick reply... I hope you find the time to finish this; I learned some useful things from your implementation... Godspeed...

@titu1994
Contributor Author

titu1994 commented Dec 11, 2016

@Neltherion With a little help, I was able to get the full SRGAN (Generator + Discriminator + VGG) network to train. However, I am getting very erratic results from it:

I ran the code for several thousand iterations. Notice the iteration number and the losses -

Around the 3670 mark
Iter : 3670 / 80000 | Improvement : 0.00 percent | Time required : 0.19 seconds | Discriminator Loss : 0.002 | Generative Loss : 2289353.000

Sudden hiccup in training at the 3754 mark :
Iter : 3754 / 80000 | Improvement : -530237.69 percent | Time required : 0.19 seconds | Discriminator Loss : 7.973 | Generative Loss : 455776.750

Another hiccup at the 3762 mark :
Iter : 3762 / 80000 | Improvement : -101.09 percent | Time required : 0.19 seconds | Discriminator Loss : 16.032 | Generative Loss : 2113836.500

Sudden improvement at the 4147 mark:
Iter : 4147 / 80000 | Improvement : 50.27 percent | Time required : 0.19 seconds | Discriminator Loss : 7.973 | Generative Loss : 2728298.500

And then a constant discriminator loss for the next 4000 iterations :
Iter : 8317 / 80000 | Improvement : 0.00 percent | Time required : 0.19 seconds | Discriminator Loss : 7.973 | Generative Loss : 2577019.500

I think this is basically an implementation issue. I have seen this happen when the discriminator overpowers the generator, and vice versa. Or it may be an incorrect implementation of how the weights are updated. Please update your copy to the latest commit (71), as it fixed the training bug.

I hope to have some more time to fix these bugs. In any case, the original issue was resolved a long time ago. The fact that the entire SRGAN model trains once the check is bypassed means there is no point in keeping the issue open.

@Neltherion

Actually, I have seen this 7.97x number before when training GANs... I think it's just as you said: it happens when a tug of war starts between the Discriminator and the Generator and one manages to overcome the other... Please update this thread if you find any useful info on how to overcome this problem...

Here's the output of a GAN I recently implemented and the overpowering can be seen (just like yours):
[image: capture]

@titu1994
Contributor Author

I will update as soon as I find some useful solution. Probably I will take a good look at the Theano implementation by jcjohnson to see how he overcame this issue.

I have barely a few weeks before my move, so I will try to find some time in between to solve this. I don't like keeping bugs around for long either.

@Neltherion

@titu1994 By the way, could you please elaborate on how you replaced Deconvolution2D layers with something that wouldn't result in artifacts (as you mentioned above) in Keras?
Aside from the artifacts you mentioned, Deconvolution2D seems to have some other problems in Keras/Theano, one of which is that you have to use the same number of Deconvolution2D filters if you want to use them sequentially (as you yourself mentioned here).

@titu1994
Contributor Author

I replaced the Deconvolution layers with SubPixelConvolution layers, which I have implemented in Theano. It works, and shape inference is a definite plus, but while the results are better than with the Deconvolution layers, I can still see grid-like structures.

I will be studying how to implement resize convolution layers, as mentioned in the blog article, which may fix the artifacts altogether.

For now, SubPixelConvolution layers are a good enough substitute for Deconvolution layers with stride 2 or more. The boundary artifacts are only visible when attempting an upsampling task such as SRGAN. There are no such defects when training an auto encoder style JPEG artifact remover or sharpening network (which I tried, to verify that the SubPixelConvolution layer was working correctly).
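
For anyone curious, the core rearrangement of a sub-pixel (depth-to-space) upscale for Theano dim ordering can be sketched as below. This is only the pixel-shuffle step, not my full SubPixelConvolution layer, and the Lambda wrapper is just one assumed way to plug it into a Keras 1.x graph (it assumes fixed, known spatial dimensions):

import theano.tensor as T
from keras.layers import Lambda

def depth_to_space_th(x, scale=2):
    # x has shape (batch, channels * scale**2, height, width) in 'th' ordering.
    shp = x.shape
    b, k, h, w = shp[0], shp[1], shp[2], shp[3]
    out_channels = k // (scale * scale)
    x = T.reshape(x, (b, out_channels, scale, scale, h, w))
    # Interleave the sub-pixel positions into the spatial axes:
    # (b, c, r1, r2, h, w) -> (b, c, h, r1, w, r2)
    x = x.dimshuffle(0, 1, 4, 2, 5, 3)
    return T.reshape(x, (b, out_channels, h * scale, w * scale))

def subpixel_upscale(scale=2):
    # A preceding Convolution2D must supply channels * scale**2 feature maps.
    return Lambda(lambda x: depth_to_space_th(x, scale),
                  output_shape=lambda s: (s[0], s[1] // (scale ** 2),
                                          s[2] * scale, s[3] * scale))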

@titu1994
Contributor Author

titu1994 commented Dec 24, 2016

@Neltherion I have pushed a few commits to correct the discriminator training.

Changes were :

  • Use soft labels [(0.0 - 0.3) for 0 and (0.7 - 1.2) for 1] instead of binary [0-1] classifications
  • Randomly give wrong classifications (invert the labels) (probability = 10%)
  • Scale results from [0, 255] to [-1, +1] (earlier it was [0, 255] to [0, 1])
  • Increase the adversarial loss weight to 1 instead of 0.001
  • Increase the batch size for training the discriminator to 16 instead of 1. This was the major change which corrected the discriminator training.

All of these changes now allow the discriminator to learn properly, with a loss value between 3 and 6 (due to the adversarial loss regularizer) and an accuracy between 0.7 and 0.88 while training the discriminator.
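
A minimal NumPy sketch of the label and scaling tricks listed above (the ranges are the ones from the list; the function names are just for illustration):

import numpy as np

def smooth_labels(batch_size, real, flip_prob=0.10):
    # Soft labels: roughly 0.0-0.3 for fake samples, 0.7-1.2 for real samples.
    if real:
        labels = np.random.uniform(0.7, 1.2, size=(batch_size, 1))
    else:
        labels = np.random.uniform(0.0, 0.3, size=(batch_size, 1))
    # Occasionally invert the labels (one way to add label noise, probability = 10%).
    if np.random.rand() < flip_prob:
        labels = 1.0 - labels
    return labels

def scale_images(x):
    # Scale images from [0, 255] to [-1, +1] instead of [0, 1].
    return x / 127.5 - 1.0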

Also, I have switched to using the Keras UpSampling2D layer instead of the SubPixelConvolution layer. UpSampling2D followed by a convolution is equivalent to the nearest neighbour resize convolution mentioned in http://distill.pub/2016/deconv-checkerboard/, and I can see that the results are significantly better than before, but still not very good.
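
Concretely, the resize-convolution replacement for a stride-2 Deconvolution2D looks roughly like this in Keras 1.x (nearest-neighbour upsample followed by an ordinary stride-1 'same' convolution):

from keras.layers import UpSampling2D, Convolution2D

def resize_conv_block(x, nb_filter=64, scale=2, name='sr_resize_conv'):
    # UpSampling2D repeats rows/columns (nearest neighbour), then a normal
    # convolution mixes the repeated pixels; replaces Deconvolution2D with stride 2.
    x = UpSampling2D(size=(scale, scale), name=name + '_up')(x)
    x = Convolution2D(nb_filter, 3, 3, activation='relu', border_mode='same',
                      name=name + '_conv')(x)
    return x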

I haven't had time to check the training of the full model. Will do so in the coming weeks.

@Neltherion

@titu1994
Great! I love how committed you are to making this work... Did you by any chance have a look at this article from NIPS 2016 about training GANs? Or did you come up with the modifications all by yourself? Either way, I'll be using those modifications in my future projects...

By the way, why do you think the batch size affected the training? Was it perhaps stuck in some local optimum because of size-1 batches? Some papers claim they had to reduce the batch size for the GAN to work, while others (such as yourself) say they've seen improvements when increasing the batch size...

Anyway, thanks for the update!

@titu1994
Contributor Author

titu1994 commented Dec 24, 2016

@Neltherion Ah, that link is useful. I didn't know about it, but I did watch many of the NIPS 2016 videos on Facebook and several slide decks which discussed how GAN training methods can be changed to improve performance.

I believe the soft label trick was from a paper I read, Improved Techniques for Training GANs. Flipping labels was mentioned in a slide about GANs at NIPS 2016. Now that I have read the post, I have to read up on how to modify the loss function to maximize log D. I simply ported the Theano code from Neural Enhance for the adversarial loss. I will have to test how the new loss function helps training. I did try minibatch discrimination in another project, but found it harmed the results more than it helped (similar to what the post says).

Normalizing to [-1, 1] makes sense, since we are using a tanh activation to squash the activations to [-1, 1] and then denormalizing to [0, 255] in the denormalization layer for an image upsampling task. I never thought of that as a worthwhile insight :P

I set up the training of the Discriminator so that each batch has either all positive or all negative samples. A simple batching trick, I suppose, but it's nice to know it is a formal method to improve performance.

The LeakyReLU trick was in the original SRGAN paper, so I knew it was tested and important. I rarely use MaxPooling anymore; convolution with stride 2 is much more efficient in my opinion. The post suggests using deconvolutions with stride 2 and SubPixelConvolution upscaling, but they cause grid effects, so I guess that advice doesn't apply to image upscaling tasks.

Thankfully, the new upsampling function works properly, but it sometimes produces completely solid colors for every image and the generator gets stuck there. That is why I have provided a pretrained SRGAN model, trained on 80k images with the new upsampling function, to avoid a bad initialization.

I think batch size plays a huge role in training GANs. I simply could not get any GAN I implemented to work with a batch size < 8. Maybe it is the design of the discriminator, or some error in my code. In practice, I simply train the discriminator with a batch size of 16, and then train the full model with any batch size (even 1).

This may be because with batch size = 1, Batch Normalization acts like Instance Normalization. That may cause the discriminator to get stuck in local minima and always produce similar loss values (7.093~8.192).

@mnill

mnill commented Jan 11, 2017

@titu1994 Can you please explain your implementation of resize-conv?
Is it just UpSampling2D followed by a conv2d with 3x3 filters and stride 1?

@dwari

dwari commented Jan 27, 2017

When I run the code I get the following error: "TypeError: standardize_input_data() got an unexpected keyword argument 'check_batch_dim'". The traceback is:

  File "models.py", line 795, in
    srgan_network.train_full_model(coco_path, nb_images=1000, nb_epochs=2)
  File "models.py", line 463, in train_full_model
    self._train_model(image_dir, nb_images, nb_epochs)
  File "models.py", line 698, in _train_model
    batch_size=self.batch_size, nb_epoch=1, verbose=0)
  File "keras_ops.py", line 129, in fit
    batch_size=batch_size)
  File "keras_ops.py", line 42, in _standardize_user_data
    exception_prefix='model input')

Can anyone please help in resolving this issue?
