
MobileViT #54

Closed
KyloRen1 opened this issue Apr 24, 2022 · 10 comments
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@KyloRen1

I tried to run the MobileViT_S model with input shape (256, 256, 3) and got the following error:

UnimplementedError Traceback (most recent call last)
in ()
2
3 history = model.fit(get_training_dataset_with_oversample(repeat_dataset=True, oversample=True), steps_per_epoch=STEPS_PER_EPOCH, epochs=EPOCHS,
----> 4 validation_data=get_validation_dataset(), validation_steps=VALIDATION_STEPS)
5

1 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in _numpy(self)
1189 return self._numpy_internal()
1190 except core._NotOkStatusException as e: # pylint: disable=protected-access
-> 1191 raise core._status_to_exception(e) from None # pylint: disable=protected-access
1192
1193   @property

UnimplementedError: 9 root error(s) found.
(0) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"}
[[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]]
(1) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"}
[[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]]
[[while/body/_1/while/strided_slice_35/_445]]
(2) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"}
[[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]]
[[while/body/_1/while/strided_slice_23/_381]]
(3) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"}
[[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]]
[[while/body/_1/while/Pad_8/_407]]
(4) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"}
[[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]]
[[while/body/_1/while/Maximum_2/y/_341]]
(5) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f3 ... [truncated]

@leondgarse
Owner

leondgarse commented Apr 25, 2022

  • I just took a test using colab: kecam_test.ipynb. It works fine in this basic test. Which version of tensorflow are you using?
  • It seems tf.reshape is complaining about doing too much work in a single call, like mobilevit.py#L69. You may try splitting those reshapes into two lines to see if that helps in your case, like:
    - nn = tf.reshape(nn, [-1, patch_hh, patch_size * patch_size, patch_ww * channel])  # [batch, patch_hh, h_patch_size * w_patch_size, patch_ww * channel]
    + nn = tf.reshape(nn, [-1, patch_hh, patch_size, patch_size, patch_ww, channel])  # split
    + nn = tf.reshape(nn, [-1, patch_hh, patch_size * patch_size, patch_ww * channel])  # combined
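For illustration only, the shape bookkeeping of the two-step reshape can be sanity-checked with NumPy; the sizes below are made up, not taken from the model, and stand in for the dynamic dimensions:

```python
import numpy as np

# Hypothetical sizes standing in for the model's dynamic dimensions.
batch, patch_hh, patch_ww, patch_size, channel = 2, 4, 3, 2, 5

x = np.arange(batch * patch_hh * patch_size * patch_size * patch_ww * channel,
              dtype=np.float32)

# One-step reshape: splits and combines dimensions in a single call,
# which is the pattern XLA rejects for dynamic shapes on TPU.
one_step = x.reshape(batch, patch_hh, patch_size * patch_size, patch_ww * channel)

# Two-step reshape: first split the dimensions, then combine them.
split = x.reshape(batch, patch_hh, patch_size, patch_size, patch_ww, channel)
combined = split.reshape(batch, patch_hh, patch_size * patch_size, patch_ww * channel)

assert np.array_equal(one_step, combined)  # same values, same final shape
```

The two forms are numerically identical; the split only changes how the compiler sees the reshape.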

@vecxoz

vecxoz commented Apr 25, 2022

I saw similar errors on TPU because it does not support dynamic shapes.
One possible mitigation is to use drop_remainder=True in all datasets.

https://www.tensorflow.org/api_docs/python/tf/data/Dataset#batch
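The effect of drop_remainder is simply to discard the final short batch so every batch has a static size. A plain-Python sketch of that semantics (an illustration, not the tf.data implementation):

```python
def batch(dataset, batch_size, drop_remainder=False):
    """Group a sequence into batches, mimicking tf.data.Dataset.batch semantics."""
    batches = [dataset[i:i + batch_size] for i in range(0, len(dataset), batch_size)]
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()  # discard the final partial batch
    return batches

samples = list(range(10))
print(batch(samples, 4))                       # last batch has only 2 elements
print(batch(samples, 4, drop_remainder=True))  # every batch has static size 4
```

With the partial batch dropped, the batch dimension is constant, which is what the TPU's static-shape compiler needs.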

@vecxoz

vecxoz commented Apr 25, 2022

Just a follow-up note: when predicting on a test set on TPU whose size is not divisible by the batch size, using drop_remainder=True will result in incomplete predictions. A GPU will do the job.
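To quantify the caveat: with drop_remainder=True, the samples in the final partial batch receive no predictions. The count is just the remainder (the sizes below are illustrative):

```python
def dropped_samples(num_samples: int, batch_size: int) -> int:
    # Samples in the final partial batch that drop_remainder=True discards.
    return num_samples % batch_size

# e.g. a hypothetical 1000-sample test set with batch size 128 on TPU:
print(dropped_samples(1000, 128))  # 104 samples would receive no prediction
```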

@leondgarse
Owner

leondgarse commented Apr 25, 2022

Yes, you are right; I can reproduce it using TPU. Test results updated in the kecam_test.ipynb above.

  • Regarding drop_remainder=True: I don't think it's an issue, as the model's real accuracy without drop_remainder can be tested on GPU after training is done.
  • In my tests, attention models like mobilevit / coatnet / swinv2 in this package cannot use bfloat16: saving throws an error, and I still don't know how to fix it. I have to use ! python train_script.py -m ... --TPU --disable_float16 in my testing.

@leondgarse
Owner

Model saving in bfloat16 precision also works now. Test results updated in the kecam_test.ipynb above.

@KyloRen1
Author

I have tried adding drop_remainder=True, but the error remains.

@leondgarse
Owner

I've added a TPU training test section to the kecam_test.ipynb colab above that reproduces this using some fake data, without using train_script.py. Could you put together something similar that replicates your setup (dataset / loss usage), to dig out what is actually causing this?

@leondgarse
Owner

@KyloRen1 Have you tried this recently? Still waiting for your response.

@leondgarse
Owner

Please reopen if the issue still exists.

@sayannath

How do I serialize the model in saved_model format?
