This repository has been archived by the owner on Jan 10, 2023. It is now read-only.

Infogan huge (12g+) model #27

Closed
CrackerHax opened this issue Jun 6, 2019 · 3 comments

Comments


CrackerHax commented Jun 6, 2019

No matter my inputs, InfoGAN produces a huge model (12 GB+), causing the TPU to close its socket. Changing settings (batch size, image size, number of samples, etc.) did not seem to help.

Samples used were 256x256 RGB images with 4 labels.

From google bucket:
model.ckpt-0.data-00000-of-00001 | 12.02 GB

Log:

I0606 12:01:05.467655 140082657560448 estimator.py:1111] Calling model_fn.

I0606 12:01:05.468298 140082657560448 datasets.py:210] Running with 1 hosts, modifying dataset seed for host 0 to 547.

I0606 12:01:05.468435 140082657560448 datasets.py:311] train_input_fn(): params={'batch_size': 16, 'context': <tensorflow.contrib.tpu.python.tpu.tpu_context.TPUContext object at 0x7f6752a6f5c0>} seed=547

I0606 12:01:05.512352 140082657560448 modular_gan.py:396] _preprocess_fn(): images=Tensor("arg0:0", shape=(256, 256, 3), dtype=float32, device=/job:worker/task:0/device:CPU:0), labels=Tensor("arg1:0", shape=(4,), dtype=int32, device=/job:worker/task:0/device:CPU:0), seed=547

I0606 12:01:05.526810 140082657560448 tpu_random.py:71] Passing random offset: Tensor("Cast:0", shape=(), dtype=int32, device=/job:worker/task:0/device:CPU:0) with data ({'images': <tf.Tensor 'arg1:0' shape=(256, 256, 3) dtype=float32>, 'z': <tf.Tensor 'arg2:0' shape=(14,) dtype=float32>}, <tf.Tensor 'arg3:0' shape=(4,) dtype=int32>).

I0606 12:01:05.617208 140082657560448 modular_gan.py:529] model_fn(): features={'images': <tf.Tensor 'InfeedQueue/dequeue:1' shape=(2, 256, 256, 3) dtype=float32>, 'z': <tf.Tensor 'InfeedQueue/dequeue:2' shape=(2, 14) dtype=float32>, '_RANDOM_OFFSET': <tf.Tensor 'InfeedQueue/dequeue:0' shape=(2,) dtype=int32>}, labels=Tensor("InfeedQueue/dequeue:3", shape=(2, 4), dtype=int32, device=/device:TPU_REPLICATED_CORE:0),mode=train, params={'batch_size': 2, 'use_tpu': True, 'context': <tensorflow.contrib.tpu.python.tpu.tpu_context.TPUContext object at 0x7f67521429e8>}

W0606 12:01:05.617559 140082657560448 modular_gan.py:537] Graph will be unrolled.

...

Generator variables:
+--------------------------+-----------------+-------------+---------+
| Name                     | Shape           | Size        | Type    |
+--------------------------+-----------------+-------------+---------+
| generator/g_fc1/kernel:0 |      (14, 1024) |      14,336 | float32 |
| generator/g_fc1/bias:0   |         (1024,) |       1,024 | float32 |
| generator/g_bn1/gamma:0  |         (1024,) |       1,024 | float32 |
| generator/g_bn1/beta:0   |         (1024,) |       1,024 | float32 |
| generator/g_fc2/kernel:0 |  (1024, 524288) | 536,870,912 | float32 |
| generator/g_fc2/bias:0   |       (524288,) |     524,288 | float32 |
| generator/g_bn2/gamma:0  |       (524288,) |     524,288 | float32 |
| generator/g_bn2/beta:0   |       (524288,) |     524,288 | float32 |
| generator/g_dc3/kernel:0 | (4, 4, 64, 128) |     131,072 | float32 |
| generator/g_dc3/bias:0   |           (64,) |          64 | float32 |
| generator/g_bn3/gamma:0  |           (64,) |          64 | float32 |
| generator/g_bn3/beta:0   |           (64,) |          64 | float32 |
| generator/g_dc4/kernel:0 |   (4, 4, 3, 64) |       3,072 | float32 |
| generator/g_dc4/bias:0   |            (3,) |           3 | float32 |
+--------------------------+-----------------+-------------+---------+
Total: 538,595,523
I0606 12:01:08.306834 140082657560448 utils.py:174] 
Discriminator variables:
+--------------------------------+-----------------+-------------+---------+
| Name                           | Shape           | Size        | Type    |
+--------------------------------+-----------------+-------------+---------+
| discriminator/d_conv1/kernel:0 |   (4, 4, 3, 64) |       3,072 | float32 |
| discriminator/d_conv1/bias:0   |           (64,) |          64 | float32 |
| discriminator/d_conv2/kernel:0 | (4, 4, 64, 128) |     131,072 | float32 |
| discriminator/d_conv2/bias:0   |          (128,) |         128 | float32 |
| discriminator/d_fc3/kernel:0   |  (524288, 1024) | 536,870,912 | float32 |
| discriminator/d_fc3/bias:0     |         (1024,) |       1,024 | float32 |
| discriminator/d_fc4/kernel:0   |       (1024, 1) |       1,024 | float32 |
| discriminator/d_fc4/bias:0     |            (1,) |           1 | float32 |
+--------------------------------+-----------------+-------------+---------+
Total: 537,007,297

...

I0606 12:09:19.918299 140082657560448 tpu_estimator.py:504] Init TPU system

I0606 12:09:27.281899 140082657560448 tpu_estimator.py:510] Initialized TPU in 7 seconds

I0606 12:09:27.785771 140081668916992 tpu_estimator.py:463] Starting infeed thread controller.

I0606 12:09:27.786308 140081652131584 tpu_estimator.py:482] Starting outfeed thread controller.

I0606 12:09:27.928345 140082657560448 tpu_estimator.py:536] Enqueue next (1000) batch(es) of data to infeed.

I0606 12:09:27.928739 140082657560448 tpu_estimator.py:540] Dequeue next (1000) batch(es) of data from outfeed.

I0606 12:09:34.653408 140081652131584 error_handling.py:70] Error recorded from outfeed: Socket closed
CrackerHax changed the title from "Infogan huge model" to "Infogan huge (12g+) model" on Jun 6, 2019
Marvin182 (Contributor) commented

This is somewhat working as intended. The InfoGAN architecture contains fully connected layers whose size depends on the image size.
The reference code uses this formula to determine the number of output nodes of the layer:
128 * (h / 4) * (w / 4)
where h and w are the height and width of the image (256 in your case). This results in huge fully connected layers in G and D (536M parameters each).
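To see where those numbers come from, here is a small arithmetic sketch (plain Python, not code from the repo) of the layer width formula and the resulting g_fc2 kernel size; the function name is made up for illustration:

```python
# Width of the fully connected layer per the reference formula
# 128 * (h / 4) * (w / 4); the multiplier (128) is the knob to reduce.

def fc_output_nodes(h, w, multiplier=128):
    """Output width of g_fc2 (and input width of d_fc3)."""
    return multiplier * (h // 4) * (w // 4)

h = w = 256
width = fc_output_nodes(h, w)      # 128 * 64 * 64 = 524288
kernel_params = 1024 * width       # g_fc2 kernel shape (1024, 524288)
print(width, kernel_params)        # 524288 536870912
```

This matches the 536,870,912-parameter g_fc2 and d_fc3 kernels in the variable tables above; the count grows quadratically with image resolution.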

If you want to apply the InfoGAN architecture, you might want to reduce the multiplier (128). The corresponding lines are:

net = linear(net, 128 * (h // 4) * (w // 4), scope="g_fc2")
and
net = conv2d(net, 128, 4, 4, 2, 2, name="d_conv2", use_sn=use_sn)


CrackerHax commented Jun 11, 2019

This line too:
net = tf.reshape(net, [bs, h // 4, w // 4, 128])

Thanks for that. I had to lower it to 32 to keep the TPUs from shutting down, and the checkpoint files are now 3 GB rather than 12 GB.
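Those checkpoint sizes are consistent with a back-of-the-envelope estimate. This is a sketch, not repo code: it assumes float32 variables and that the optimizer keeps two Adam slot variables (m, v) per weight, which makes the checkpoint roughly three times the raw parameter bytes. The parameter totals are summed from the variable tables in the log; the multiplier-32 total is a rough estimate.

```python
# Estimated checkpoint size, assuming float32 weights and an Adam-style
# optimizer storing two slot variables per weight (both assumptions).

def ckpt_gb(total_params, slots=2, bytes_per_param=4):
    """Estimated checkpoint size in GB: weights plus optimizer slots."""
    return total_params * bytes_per_param * (1 + slots) / 1e9

# Multiplier 128: ~538.6M generator + ~537.0M discriminator parameters.
print(f"{ckpt_gb(538_595_523 + 537_007_297):.1f} GB")  # 12.9 GB (observed: 12.02 GB)

# Multiplier 32: the dominant FC kernels shrink 4x to ~134M each,
# so roughly 270M parameters in total.
print(f"{ckpt_gb(270_000_000):.1f} GB")  # 3.2 GB (observed: ~3 GB)
```

The small gap versus the observed 12.02 GB is plausible slack (e.g. some variables without optimizer slots), but the scaling with the multiplier lines up with the 12 GB → 3 GB drop reported above.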

Marvin182 (Contributor) commented

I recommend using an architecture better suited for higher resolutions, e.g. BigGAN.
