What I'm trying to do is to generate a bounding box (Xmin, Ymin, Xmax, Ymax) rather than an image with a GAN, as in the following image.
input |
output |
Let's build our model in the following order.
In a GAN, the first net is called the Generator G(z) and the second the Discriminator D(x).
The Generator produces data, and the Discriminator tries to tell the real data apart from the fake data the Generator produced. The Discriminator outputs a scalar in [0, 1] that represents the probability that its input is real data.
The objective function:

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]
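The objective above can be sketched as two loss functions: a minimal numpy sketch (not a full training loop), where the discriminator's outputs on real and fake batches are given as probabilities in (0, 1). The function names are my own, not from any library.

```python
import numpy as np

def d_loss(d_real, d_fake, eps=1e-8):
    """D maximizes log D(x) + log(1 - D(G(z))); we minimize the negative."""
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def g_loss(d_fake, eps=1e-8):
    """Non-saturating generator loss: minimize -log D(G(z))."""
    return -np.mean(np.log(d_fake + eps))

d_real = np.array([0.9, 0.8])   # D is fairly confident real data is real
d_fake = np.array([0.1, 0.2])   # D is fairly confident fakes are fake
print(d_loss(d_real, d_fake), g_loss(d_fake))
```

When D classifies both batches well, `d_loss` is small and `g_loss` is large, which is exactly the pressure that drives G to improve.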
GANs are hard to train, so DCGAN stabilizes training with some architectural constraints.
Architecture guidelines for stable Deep Convolutional GANs
- Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
- Use batchnorm in both the generator and the discriminator
- Remove fully connected hidden layers for deeper architectures. Just use average pooling at the end.
- Use ReLU activation in generator for all layers except for the output, which uses Tanh.
- Use LeakyReLU activation in the discriminator for all layers.
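The guidelines above pair ReLU in the generator with LeakyReLU in the discriminator; a minimal sketch of the difference (the 0.2 slope is the value commonly used in DCGAN discriminators, an assumption here):

```python
import numpy as np

def relu(x):
    # Zeroes out negative activations entirely
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.2):
    # Keeps a small gradient (alpha * x) for negative inputs,
    # which helps gradients flow through the discriminator
    return np.where(x >= 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.0])
print(relu(x))        # [0.  0.  0.  1.]
print(leaky_relu(x))  # [-0.4 -0.1  0.   1. ]
```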
Use a DCGAN to generate images:
The Architecture:
G: fc --> reshape --> conv_transpose (deconv) + bn + relu (4 layers) --> tanh
D: conv + bn + leaky_relu (4 layers) --> reshape --> fc (output size 1) --> sigmoid
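To make the generator pipeline concrete, here is a shape walk-through of the layer stack above. The base size (4x4), channel counts, and 64x64 output are assumptions for illustration; each stride-2 conv_transpose doubles the spatial size.

```python
import numpy as np  # only used to match the document's other sketches

def generator_shapes(z_dim=100, base=4, channels=(512, 256, 128, 64, 3)):
    """Trace tensor shapes through fc -> reshape -> 4 conv_transpose -> tanh."""
    shapes = [("z", (z_dim,)), ("fc+reshape", (base, base, channels[0]))]
    h = base
    for c_next in channels[1:]:
        h *= 2  # a stride-2 conv_transpose doubles H and W
        shapes.append(("conv_transpose+bn+relu", (h, h, c_next)))
    shapes.append(("tanh", (h, h, channels[-1])))
    return shapes

for name, shape in generator_shapes():
    print(name, shape)
```

The discriminator runs the same shapes in reverse, with stride-2 convolutions halving H and W at each layer until the final fc produces a single logit.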
Results:
180000 iterations |
190000 iterations |
Apply GAN with an added condition (supervised)
We want to condition both networks on some vector y; the easiest way to do it is to feed y into both networks. Hence, our generator and discriminator are now G(z, y) and D(x, y) respectively.
Now, the objective function is given by:

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x|y)] + E_{z∼p_z(z)}[log(1 − D(G(z|y)))]
Use a Conditional GAN to generate new data samples with a given label on the MNIST dataset.
The architecture is:
G: concat(z, y) --> reshape --> conv_transpose (deconv) + bn + relu (4 layers) --> sigmoid (or tanh)
D: conv_concat(x, y) --> conv + bn + leaky_relu (4 layers) --> reshape --> fc (output size 1) --> sigmoid
The conditional variable y is a collection of one-hot vectors.
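The conv_concat step in D can be sketched as follows: the one-hot vector y is tiled into H x W maps and concatenated to x along the channel axis, so every spatial location sees the label. The function name mirrors the architecture notation above and is not a library API.

```python
import numpy as np

def conv_concat(x, y):
    """x: (N, H, W, C) feature maps; y: (N, K) one-hot labels.
    Returns (N, H, W, C + K) with y broadcast over all spatial positions."""
    n, h, w, _ = x.shape
    k = y.shape[1]
    y_maps = np.tile(y.reshape(n, 1, 1, k), (1, h, w, 1))
    return np.concatenate([x, y_maps], axis=3)

x = np.zeros((2, 28, 28, 1))
y = np.eye(10)[[7, 3]]            # labels 7 and 3 as one-hot vectors
print(conv_concat(x, y).shape)    # (2, 28, 28, 11)
```

For G, the plain `concat(z, y)` is simpler still: both are flat vectors, so they are joined along the feature axis before the fc layer.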
Results:
constraint Y = 7 |
constraint Y = 3 |
Because what I'm trying to do is to generate a bounding box (Xmin, Ymin, Xmax, Ymax) rather than an image, x no longer means an image but the bounding box, and y now means the image.
Meanwhile, the generator should not use conv_transpose (deconv), because the output of G is a bounding box, not an image.
So the architecture is:
G: conv_concat(z, y) --> conv + bn + leaky_relu (4 layers) --> reshape --> fc --> sigmoid (or tanh)
D: conv_concat(x, y) --> conv + bn + leaky_relu (4 layers) --> reshape --> fc (output size 1) --> sigmoid
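The key difference from the image-generating G is the output head: after the conv stack, flatten and map to 4 sigmoid outputs, giving (Xmin, Ymin, Xmax, Ymax) normalized to [0, 1]. A minimal sketch with random (untrained) weights; the feature size 128 is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def bbox_head(features, w, b):
    """features: (N, F) flattened conv features -> (N, 4) boxes in [0, 1]."""
    return sigmoid(features @ w + b)

feats = rng.normal(size=(2, 128))          # stand-in for flattened conv output
w = rng.normal(scale=0.1, size=(128, 4))   # would be learned in a real model
b = np.zeros(4)
boxes = bbox_head(feats, w, b)
print(boxes.shape)  # (2, 4)
```

Since sigmoid bounds each coordinate to (0, 1), the predicted box always lies inside the (normalized) image, but nothing in this head enforces Xmin < Xmax; that has to come from training.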
Use a Conditional DCGAN to generate bounding boxes conditioned on images (y):
Results:
60000 iterations |
60000 iterations |
Analysis:
It is hard for the GAN to learn the relationship between the location of the bounding box and the image features when the image is used as the condition,
so I did not get the results I expected.
REF:
- Agustinus Kristiadi's Blog
- https://github.com/YadiraF/GAN