This repository has been archived by the owner on Jan 10, 2023. It is now read-only.

Infogan huge (12g+) model #27

Closed
CrackerHax opened this issue Jun 6, 2019 · 3 comments

Comments


CrackerHax commented Jun 6, 2019

No matter my inputs, InfoGAN produces a huge model (12 GB+), causing the TPU to close its socket. Changing settings (batch size, image size, number of samples, etc.) did not seem to help.

Samples used were 256x256 RGB images with 4 labels.

From google bucket:
model.ckpt-0.data-00000-of-00001 | 12.02 GB

Log:

I0606 12:01:05.467655 140082657560448 estimator.py:1111] Calling model_fn.

I0606 12:01:05.468298 140082657560448 datasets.py:210] Running with 1 hosts, modifying dataset seed for host 0 to 547.

I0606 12:01:05.468435 140082657560448 datasets.py:311] train_input_fn(): params={'batch_size': 16, 'context': <tensorflow.contrib.tpu.python.tpu.tpu_context.TPUContext object at 0x7f6752a6f5c0>} seed=547

I0606 12:01:05.512352 140082657560448 modular_gan.py:396] _preprocess_fn(): images=Tensor("arg0:0", shape=(256, 256, 3), dtype=float32, device=/job:worker/task:0/device:CPU:0), labels=Tensor("arg1:0", shape=(4,), dtype=int32, device=/job:worker/task:0/device:CPU:0), seed=547

I0606 12:01:05.526810 140082657560448 tpu_random.py:71] Passing random offset: Tensor("Cast:0", shape=(), dtype=int32, device=/job:worker/task:0/device:CPU:0) with data ({'images': <tf.Tensor 'arg1:0' shape=(256, 256, 3) dtype=float32>, 'z': <tf.Tensor 'arg2:0' shape=(14,) dtype=float32>}, <tf.Tensor 'arg3:0' shape=(4,) dtype=int32>).

I0606 12:01:05.617208 140082657560448 modular_gan.py:529] model_fn(): features={'images': <tf.Tensor 'InfeedQueue/dequeue:1' shape=(2, 256, 256, 3) dtype=float32>, 'z': <tf.Tensor 'InfeedQueue/dequeue:2' shape=(2, 14) dtype=float32>, '_RANDOM_OFFSET': <tf.Tensor 'InfeedQueue/dequeue:0' shape=(2,) dtype=int32>}, labels=Tensor("InfeedQueue/dequeue:3", shape=(2, 4), dtype=int32, device=/device:TPU_REPLICATED_CORE:0),mode=train, params={'batch_size': 2, 'use_tpu': True, 'context': <tensorflow.contrib.tpu.python.tpu.tpu_context.TPUContext object at 0x7f67521429e8>}

W0606 12:01:05.617559 140082657560448 modular_gan.py:537] Graph will be unrolled.

...

Generator variables:
+--------------------------+-----------------+-------------+---------+
| Name                     | Shape           | Size        | Type    |
+--------------------------+-----------------+-------------+---------+
| generator/g_fc1/kernel:0 |      (14, 1024) |      14,336 | float32 |
| generator/g_fc1/bias:0   |         (1024,) |       1,024 | float32 |
| generator/g_bn1/gamma:0  |         (1024,) |       1,024 | float32 |
| generator/g_bn1/beta:0   |         (1024,) |       1,024 | float32 |
| generator/g_fc2/kernel:0 |  (1024, 524288) | 536,870,912 | float32 |
| generator/g_fc2/bias:0   |       (524288,) |     524,288 | float32 |
| generator/g_bn2/gamma:0  |       (524288,) |     524,288 | float32 |
| generator/g_bn2/beta:0   |       (524288,) |     524,288 | float32 |
| generator/g_dc3/kernel:0 | (4, 4, 64, 128) |     131,072 | float32 |
| generator/g_dc3/bias:0   |           (64,) |          64 | float32 |
| generator/g_bn3/gamma:0  |           (64,) |          64 | float32 |
| generator/g_bn3/beta:0   |           (64,) |          64 | float32 |
| generator/g_dc4/kernel:0 |   (4, 4, 3, 64) |       3,072 | float32 |
| generator/g_dc4/bias:0   |            (3,) |           3 | float32 |
+--------------------------+-----------------+-------------+---------+
Total: 538,595,523
I0606 12:01:08.306834 140082657560448 utils.py:174] 
Discriminator variables:
+--------------------------------+-----------------+-------------+---------+
| Name                           | Shape           | Size        | Type    |
+--------------------------------+-----------------+-------------+---------+
| discriminator/d_conv1/kernel:0 |   (4, 4, 3, 64) |       3,072 | float32 |
| discriminator/d_conv1/bias:0   |           (64,) |          64 | float32 |
| discriminator/d_conv2/kernel:0 | (4, 4, 64, 128) |     131,072 | float32 |
| discriminator/d_conv2/bias:0   |          (128,) |         128 | float32 |
| discriminator/d_fc3/kernel:0   |  (524288, 1024) | 536,870,912 | float32 |
| discriminator/d_fc3/bias:0     |         (1024,) |       1,024 | float32 |
| discriminator/d_fc4/kernel:0   |       (1024, 1) |       1,024 | float32 |
| discriminator/d_fc4/bias:0     |            (1,) |           1 | float32 |
+--------------------------------+-----------------+-------------+---------+
Total: 537,007,297

...

I0606 12:09:19.918299 140082657560448 tpu_estimator.py:504] Init TPU system

I0606 12:09:27.281899 140082657560448 tpu_estimator.py:510] Initialized TPU in 7 seconds

I0606 12:09:27.785771 140081668916992 tpu_estimator.py:463] Starting infeed thread controller.

I0606 12:09:27.786308 140081652131584 tpu_estimator.py:482] Starting outfeed thread controller.

I0606 12:09:27.928345 140082657560448 tpu_estimator.py:536] Enqueue next (1000) batch(es) of data to infeed.

I0606 12:09:27.928739 140082657560448 tpu_estimator.py:540] Dequeue next (1000) batch(es) of data from outfeed.

I0606 12:09:34.653408 140081652131584 error_handling.py:70] Error recorded from outfeed: Socket closed
CrackerHax changed the title from "Infogan huge model" to "Infogan huge (12g+) model" on Jun 6, 2019
Marvin182 (Contributor) commented

This is somewhat working as intended. The InfoGAN architecture contains fully connected layers whose size depends on the image size.
The reference code uses this formula to determine the number of output nodes of the layer:
128 * (h / 4) * (w / 4)
where h and w are the height and width of the image (256 in your case). This results in huge fully connected layers in G and D (536M parameters each).
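To see where those numbers come from, here is a small arithmetic sketch (plain Python, not code from the repo) of the layer width formula and the resulting g_fc2 kernel size; the function name is made up for illustration:

```python
# Width of the fully connected layer per the reference formula
# 128 * (h / 4) * (w / 4); the multiplier (128) is the knob to reduce.

def fc_output_nodes(h, w, multiplier=128):
    """Output width of g_fc2 (and input width of d_fc3)."""
    return multiplier * (h // 4) * (w // 4)

h = w = 256
width = fc_output_nodes(h, w)      # 128 * 64 * 64 = 524288
kernel_params = 1024 * width       # g_fc2 kernel shape (1024, 524288)
print(width, kernel_params)        # 524288 536870912
```

This matches the 536,870,912-parameter g_fc2 and d_fc3 kernels in the variable tables above; the count grows quadratically with image resolution.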

If you want to apply the InfoGAN architecture, you might want to reduce the multiplier (128). The corresponding lines are:

net = linear(net, 128 * (h // 4) * (w // 4), scope="g_fc2")
and
net = conv2d(net, 128, 4, 4, 2, 2, name="d_conv2", use_sn=use_sn)


CrackerHax commented Jun 11, 2019

This line too:
net = tf.reshape(net, [bs, h // 4, w // 4, 128])

Thanks for that. I had to lower it to 32 to keep the TPUs from shutting down, and the checkpoint files are now 3 GB rather than 12 GB.
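Those checkpoint sizes are consistent with a back-of-the-envelope estimate. This is a sketch, not repo code: it assumes float32 variables and that the optimizer keeps two Adam slot variables (m, v) per weight, which makes the checkpoint roughly three times the raw parameter bytes. The parameter totals are summed from the variable tables in the log; the multiplier-32 total is a rough estimate.

```python
# Estimated checkpoint size, assuming float32 weights and an Adam-style
# optimizer storing two slot variables per weight (both assumptions).

def ckpt_gb(total_params, slots=2, bytes_per_param=4):
    """Estimated checkpoint size in GB: weights plus optimizer slots."""
    return total_params * bytes_per_param * (1 + slots) / 1e9

# Multiplier 128: ~538.6M generator + ~537.0M discriminator parameters.
print(f"{ckpt_gb(538_595_523 + 537_007_297):.1f} GB")  # 12.9 GB (observed: 12.02 GB)

# Multiplier 32: the dominant FC kernels shrink 4x to ~134M each,
# so roughly 270M parameters in total.
print(f"{ckpt_gb(270_000_000):.1f} GB")  # 3.2 GB (observed: ~3 GB)
```

The small gap versus the observed 12.02 GB is plausible slack (e.g. some variables without optimizer slots), but the scaling with the multiplier lines up with the 12 GB → 3 GB drop reported above.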

Marvin182 (Contributor) commented

I recommend using an architecture better suited for higher resolutions, e.g. BigGAN.
