
Question about exporting an integer-only MobileBERT to TF-Lite format. #325

Closed
nadongguri opened this issue Jul 15, 2020 · 13 comments

@nadongguri

Hi, I'm trying to export a mobilebert model to tflite format.

Environment
Docker (tensorflow/tensorflow:1.15.0-gpu-py3) image
V100 16GB

As described in the README.md, I followed "Run Quantization-aware-training with Squad" and then "Export an integer-only MobileBERT to TF-Lite format." However, I got an error while converting to a quantized tflite model.

2020-07-15 10:26:10.934857: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: graph_to_optimize
2020-07-15 10:26:10.934903: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788] constant_folding: Graph size after: 4461 nodes (-1120), 4701 edges (-1124), time = 779.203ms.
2020-07-15 10:26:10.934931: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788] constant_folding: Graph size after: 4461 nodes (0), 4701 edges (0), time = 374.792ms.
Traceback (most recent call last):
File "run_squad.py", line 1517, in
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "run_squad.py", line 1508, in main
tflite_model = converter.convert()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/lite/python/lite.py", line 993, in convert
inference_output_type)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/lite/python/lite.py", line 239, in _calibrate_quantize_model
inference_output_type, allow_float)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/lite/python/optimize/calibrator.py", line 78, in calibrate_and_quantize
np.dtype(output_type.as_numpy_dtype()).num, allow_float)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/lite/python/optimize/tensorflow_lite_wrap_calibration_wrapper.py", line 115, in QuantizeModel
return _tensorflow_lite_wrap_calibration_wrapper.CalibrationWrapper_QuantizeModel(self, input_py_type, output_py_type, allow_float)
RuntimeError: Invalid quantization params for op GATHER at index 2 in subgraph 0

I used the pre-trained weights (uncased_L-24_H-128_B-512_A-4_F-4_OPT) mentioned in the README.md.
Is the distillation process required before quantization-aware training?
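
For reference, my understanding is that this export step boils down to a converter setup roughly like the sketch below (TF 1.15 APIs; the saved-model path and the representative-dataset wiring are placeholders, not the exact code in run_squad.py):

```python
import numpy as np
import tensorflow as tf  # 1.15

# Illustrative path; run_squad.py builds the converter from its own export.
converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/mobilebert_squad_savedmodel")

# Post-training calibration + full-integer quantization.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

def representative_dataset():
  # Zeros are placeholders; real SQuAD feature batches should be fed for calibration.
  for _ in range(10):
    yield [np.zeros((1, 384), dtype=np.int32),   # input_ids
           np.zeros((1, 384), dtype=np.int32),   # input_mask
           np.zeros((1, 384), dtype=np.int32)]   # segment_ids

converter.representative_dataset = representative_dataset
tflite_model = converter.convert()  # fails with the GATHER error shown above
```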

Regards,
Dongjin.

@saberkun
Contributor

@renjie-liu

@renjie-liu
Contributor

That's interesting, can you update to the latest tf-nightly and try again?

We're counting on the new quantizer.

thanks

@nadongguri
Author

Hi @renjie-liu,
thank you for your response.
I've already tried to export using the tf-nightly version on a docker image (tensorflow/tensorflow:nightly-gpu).
As far as I know, mobilebert uses tensorflow 1.15, right? So it's not working.

Traceback (most recent call last):
File "run_squad.py", line 31, in
from mobilebert import modeling
File "/home/google-research/mobilebert/modeling.py", line 32, in
from tensorflow.contrib import layers as contrib_layers
ModuleNotFoundError: No module named 'tensorflow.contrib'

BTW, does "counting on the new quantizer" mean using the tensorflow 2.x converter?

Regards,
Dongjin.

@renjie-liu
Contributor

@liufengdb

Can you help take a look?

Thanks

@renjie-liu
Contributor

I think the real issue is that the model was trained in the 1.x world but the quantization needs 2.x. (It's easier for us to do internally.)

@saberkun we probably need to migrate mobilebert to 2.x asap.

wdyt?

@saberkun
Contributor

saberkun commented Jul 20, 2020

I think we already removed all tf.contrib usage. The code could probably run in TF1-compatible mode with tf.compat.v1.disable_v2_behavior. https://www.tensorflow.org/api_docs/python/tf/compat/v1/disable_v2_behavior
We just did not get a chance to test it in open source.

@nadongguri Would you try tf 2.x and add tf.compat.v1.disable_v2_behavior() in main()?
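
Roughly like this (a minimal sketch; the exact spot inside run_squad.py's main may differ):

```python
import tensorflow as tf  # TF 2.x

def main(_):
  # Run the existing TF1-style graph/session code unchanged under TF 2.x.
  tf.compat.v1.disable_v2_behavior()
  # ... rest of run_squad.py's main ...

if __name__ == "__main__":
  tf.compat.v1.app.run(main)
```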

@nadongguri
Author

Hi @saberkun,
I tried to run "python run_squad.py ..." with tf 2.4.0-dev20200712, adding tf.compat.v1.disable_v2_behavior() in main() and also in modeling.py.
The error is as follows.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
Traceback (most recent call last):
File "run_squad.py", line 32, in
from mobilebert import modeling
File "/home/google-research/mobilebert/modeling.py", line 32, in
from tensorflow.contrib import layers as contrib_layers
ModuleNotFoundError: No module named 'tensorflow.contrib'

Should I remove the code that uses the contrib package? (Only contrib_layers and the quantize method are used.)
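
For example, assuming contrib_layers is only used for layer normalization (as in the BERT reference modeling.py; I am not sure that is the only use here), would a drop-in replacement along these lines be reasonable?

```python
import tensorflow as tf  # TF 2.x, with v1 behavior disabled

def layer_norm(input_tensor, name="LayerNorm"):
  """Sketch of a replacement for contrib_layers.layer_norm (last-axis normalization).

  Variable names (gamma/beta under the given scope) are kept compatible with the
  original checkpoints; this is untested, so treat it as an assumption.
  """
  with tf.compat.v1.variable_scope(name):
    width = input_tensor.shape[-1]
    gamma = tf.compat.v1.get_variable(
        "gamma", [width], initializer=tf.ones_initializer())
    beta = tf.compat.v1.get_variable(
        "beta", [width], initializer=tf.zeros_initializer())
    mean, variance = tf.nn.moments(input_tensor, axes=[-1], keepdims=True)
    normalized = (input_tensor - mean) * tf.math.rsqrt(variance + 1e-12)
    return normalized * gamma + beta
```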

Also, you provide saved model files for the float and quantized types, so I tested the quantized saved model file with toco in tf 1.15, but I got an error message during conversion.

2020-07-22 13:05:32.200417: F ./tensorflow/lite/toco/toco_tooling.h:38] Check failed: s.ok() Unimplemented: this graph contains an operator of type Cast for which the quantized form is not yet implemented. Sorry, and patches welcome (that's a relatively fun patch to write, mostly providing the actual quantized arithmetic code for this op).
Fatal Python error: Aborted
Current thread 0x00007f06cdae8740 (most recent call first):
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/lite/toco/python/toco_from_protos.py", line 52 in execute
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250 in _run_main
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299 in run
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/platform/app.py", line 40 in run
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/lite/toco/python/toco_from_protos.py", line 89 in main
File "/usr/local/bin/toco_from_protos", line 8 in
Aborted (core dumped)

Regards,
Dongjin.

@nadongguri
Author

I modified the run_squad.py script to remove the contrib module in order to export a quantized tflite model file with tf 2.4.0-dev20200712.
I used two saved_models: one is the provided one (https://storage.googleapis.com/cloud-tpu-checkpoints/mobilebert/mobilebert_squad_savedmodels.tar.gz), and the other was trained as guided.
I got the same error during conversion.

loc("bert/encoder/layer_2/attention/self/clip_by_value/Minimum"): error: 'tfl.minimum' op quantization parameters violate the same scale constraint: !quant.uniform<i8:f32, 0.32020080089569092:-67> vs. !quant.uniform<i8:f32, 0.18242761492729187:-77>
Traceback (most recent call last):
File "export_tflite.py", line 642, in
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "export_tflite.py", line 632, in main
tflite_model = converter.convert()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/lite.py", line 1972, in convert
return super(TFLiteConverter, self).convert()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/lite.py", line 1341, in convert
result = self._calibrate_quantize_model(result, **flags)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/lite.py", line 444, in _calibrate_quantize_model
return _mlir_quantize(calibrated)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/convert.py", line 147, in mlir_quantize
inference_type)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/wrap_toco.py", line 52, in wrapped_experimental_mlir_quantize
inference_type)
RuntimeError: Failed to quantize: :0: error: loc("bert/encoder/layer_2/attention/self/clip_by_value/Minimum"): 'tfl.minimum' op quantization parameters violate the same scale constraint: !quant.uniform<i8:f32, 0.32020080089569092:-67> vs. !quant.uniform<i8:f32, 0.18242761492729187:-77>
:0: note: loc("bert/encoder/layer_2/attention/self/clip_by_value/Minimum"): see current operation: %3377 = "tfl.minimum"(%3373, %36) : (tensor<1x4x384x384x!quant.uniform<i8:f32, 0.32020080089569092:-67>>, tensor<!quant.uniform<i8:f32, 0.18242761492729187:-77>>) -> tensor<1x4x384x384x!quant.uniform<i8:f32, 0.32020080089569092:-67>>
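
For context, the conversion in export_tflite.py is set up roughly as below (a simplified sketch; the path and calibration data are placeholders):

```python
import numpy as np
import tensorflow as tf  # 2.4.0-dev20200712

def representative_dataset():
  # Placeholder calibration data; real SQuAD features should be used here.
  for _ in range(10):
    yield [np.zeros((1, 384), dtype=np.int32)] * 3  # input_ids, input_mask, segment_ids

converter = tf.lite.TFLiteConverter.from_saved_model("/path/to/mobilebert_squad_savedmodel")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
converter.experimental_new_quantizer = True  # quantization goes through the MLIR path
converter.representative_dataset = representative_dataset
tflite_model = converter.convert()  # raises the tfl.minimum scale-constraint error above
```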

@manojpreveen

Can someone please clarify:
To reproduce the quantized int8 savedmodel as provided here (https://storage.googleapis.com/cloud-tpu-checkpoints/mobilebert/mobilebert_squad_savedmodels.tar.gz),

  1. Distillation on Pre-trained mobilebert checkpoint using Pre-trained data (https://github.com/google-research/google-research/tree/master/mobilebert#distillation)
  2. Running Quantization-aware-training with Squad (https://github.com/google-research/google-research/tree/master/mobilebert#run-quantization-aware-training-with-squad)

Are both of the above steps required to get the quantized int8 savedmodel, or will just step 2 with the pre-trained model give the quantized int8 model?

@saberkun @renjie-liu @liufengdb @nadongguri

@nadongguri
Author

Hi all,
the quantized int8 SavedModel can be converted with tensorflow v2.3.0.
I got the following result:
{"exact_match": 81.06906338694418, "f1": 88.54016833795568}.
Thank you.

@saberkun
Contributor

I think this (88.54) matches the expectation.

@nadongguri
Author

Thanks, I'll close this issue.

@jk78346

jk78346 commented Dec 4, 2021

I modified the run_squad.py script to remove the contrib module in order to export a quantized tflite model file with tf 2.4.0-dev20200712. I used two saved_models: one is the provided one (https://storage.googleapis.com/cloud-tpu-checkpoints/mobilebert/mobilebert_squad_savedmodels.tar.gz), and the other was trained as guided. I got the same error during conversion.

loc("bert/encoder/layer_2/attention/self/clip_by_value/Minimum"): error: 'tfl.minimum' op quantization parameters violate the same scale constraint: !quant.uniform<i8:f32, 0.32020080089569092:-67> vs. !quant.uniform<i8:f32, 0.18242761492729187:-77>

Hi @nadongguri, I'm having the exact same issue you had:

Traceback (most recent call last):
  File "mobilebert/run_squad.py", line 32, in <module>
    from mobilebert import modeling
  File "/home/khsu4/google-research/mobilebert/modeling.py", line 32, in <module>
    from tensorflow.contrib import layers as contrib_layers
ModuleNotFoundError: No module named 'tensorflow.contrib'

What exactly did you modify in run_squad.py? Thanks.
