
Question about exporting an integer-only MobileBERT to TF-Lite format. #325

Closed
nadongguri opened this issue Jul 15, 2020 · 13 comments

@nadongguri

Hi, I'm trying to export a mobilebert model to tflite format.

Environment
Docker (tensorflow/tensorflow:1.15.0-gpu-py3) image
V100 16GB

As described in the README.md, I followed "Run Quantization-aware-training with Squad" and then "Export an integer-only MobileBERT to TF-Lite format." However, I got an error while converting to a quantized tflite model.

2020-07-15 10:26:10.934857: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: graph_to_optimize
2020-07-15 10:26:10.934903: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788] constant_folding: Graph size after: 4461 nodes (-1120), 4701 edges (-1124), time = 779.203ms.
2020-07-15 10:26:10.934931: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788] constant_folding: Graph size after: 4461 nodes (0), 4701 edges (0), time = 374.792ms.
Traceback (most recent call last):
File "run_squad.py", line 1517, in
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "run_squad.py", line 1508, in main
tflite_model = converter.convert()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/lite/python/lite.py", line 993, in convert
inference_output_type)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/lite/python/lite.py", line 239, in _calibrate_quantize_model
inference_output_type, allow_float)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/lite/python/optimize/calibrator.py", line 78, in calibrate_and_quantize
np.dtype(output_type.as_numpy_dtype()).num, allow_float)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/lite/python/optimize/tensorflow_lite_wrap_calibration_wrapper.py", line 115, in QuantizeModel
return _tensorflow_lite_wrap_calibration_wrapper.CalibrationWrapper_QuantizeModel(self, input_py_type, output_py_type, allow_float)
RuntimeError: Invalid quantization params for op GATHER at index 2 in subgraph 0

I used the pre-trained weights (uncased_L-24_H-128_B-512_A-4_F-4_OPT) mentioned in the README.md.
Is the distillation process required before quantization-aware training?
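
For reference, my understanding is that this export step boils down to a converter setup roughly like the sketch below (TF 1.15 APIs; the saved-model path and the representative-dataset wiring are placeholders, not the exact code in run_squad.py):

```python
import numpy as np
import tensorflow as tf  # 1.15

# Illustrative path; run_squad.py builds the converter from its own export.
converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/mobilebert_squad_savedmodel")

# Post-training calibration + full-integer quantization.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

def representative_dataset():
  # Zeros are placeholders; real SQuAD feature batches should be fed for calibration.
  for _ in range(10):
    yield [np.zeros((1, 384), dtype=np.int32),   # input_ids
           np.zeros((1, 384), dtype=np.int32),   # input_mask
           np.zeros((1, 384), dtype=np.int32)]   # segment_ids

converter.representative_dataset = representative_dataset
tflite_model = converter.convert()  # fails with the GATHER error shown above
```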

Regards,
Dongjin.

@saberkun
Contributor

@renjie-liu

@renjie-liu
Contributor

That's interesting, can you update to the latest tf-nightly and try again?

We're counting on the new quantizer.

thanks

@nadongguri
Author

Hi @renjie-liu,
thank you for your response.
I've already tried to export using the tf-nightly version on a docker image (tensorflow/tensorflow:nightly-gpu).
As far as I know, mobilebert uses tensorflow 1.15, right? So it's not working.

Traceback (most recent call last):
File "run_squad.py", line 31, in
from mobilebert import modeling
File "/home/google-research/mobilebert/modeling.py", line 32, in
from tensorflow.contrib import layers as contrib_layers
ModuleNotFoundError: No module named 'tensorflow.contrib'

BTW, does "counting on the new quantizer" mean using the tensorflow 2.x converter?

Regards,
Dongjin.

@renjie-liu
Contributor

@liufengdb

Can you help take a look?

Thanks

@renjie-liu
Contributor

I think the real issue is that the model was trained in the 1.x world but the quantization needs 2.x. (It's easier for us to do internally.)

@saberkun we probably need to migrate mobilebert to 2.x asap.

wdyt?

@saberkun
Contributor

saberkun commented Jul 20, 2020

I think we already removed all tf.contrib usage. The code could probably run in TF1-compatible mode with tf.compat.v1.disable_v2_behavior. https://www.tensorflow.org/api_docs/python/tf/compat/v1/disable_v2_behavior
We just did not get a chance to test it in open source.

@nadongguri Would you try tf 2.x and add tf.compat.v1.disable_v2_behavior() in main()?
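
Roughly like this (a minimal sketch; the exact spot inside run_squad.py's main may differ):

```python
import tensorflow as tf  # TF 2.x

def main(_):
  # Run the existing TF1-style graph/session code unchanged under TF 2.x.
  tf.compat.v1.disable_v2_behavior()
  # ... rest of run_squad.py's main ...

if __name__ == "__main__":
  tf.compat.v1.app.run(main)
```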

@nadongguri
Author

Hi @saberkun,
I tried to run "python run_squad.py ..." with tf 2.4.0-dev20200712, adding tf.compat.v1.disable_v2_behavior() in main() and also in modeling.py.
The error is as follows.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
Traceback (most recent call last):
File "run_squad.py", line 32, in
from mobilebert import modeling
File "/home/google-research/mobilebert/modeling.py", line 32, in
from tensorflow.contrib import layers as contrib_layers
ModuleNotFoundError: No module named 'tensorflow.contrib'

Should I remove the code that uses the contrib package? (Only contrib_layers and the quantize method are used.)
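
For example, assuming contrib_layers is only used for layer normalization (as in the BERT reference modeling.py; I am not sure that is the only use here), would a drop-in replacement along these lines be reasonable?

```python
import tensorflow as tf  # TF 2.x, with v1 behavior disabled

def layer_norm(input_tensor, name="LayerNorm"):
  """Sketch of a replacement for contrib_layers.layer_norm (last-axis normalization).

  Variable names (gamma/beta under the given scope) are kept compatible with the
  original checkpoints; this is untested, so treat it as an assumption.
  """
  with tf.compat.v1.variable_scope(name):
    width = input_tensor.shape[-1]
    gamma = tf.compat.v1.get_variable(
        "gamma", [width], initializer=tf.ones_initializer())
    beta = tf.compat.v1.get_variable(
        "beta", [width], initializer=tf.zeros_initializer())
    mean, variance = tf.nn.moments(input_tensor, axes=[-1], keepdims=True)
    normalized = (input_tensor - mean) * tf.math.rsqrt(variance + 1e-12)
    return normalized * gamma + beta
```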

Also, you provide saved model files for the float and quantized types, so I tested the quantized saved model file with toco in tf 1.15, but I got an error message during conversion.

2020-07-22 13:05:32.200417: F ./tensorflow/lite/toco/toco_tooling.h:38] Check failed: s.ok() Unimplemented: this graph contains an operator of type Cast for which the quantized form is not yet implemented. Sorry, and patches welcome (that's a relatively fun patch to write, mostly providing the actual quantized arithmetic code for this op).
Fatal Python error: Aborted
Current thread 0x00007f06cdae8740 (most recent call first):
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/lite/toco/python/toco_from_protos.py", line 52 in execute
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250 in _run_main
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299 in run
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/platform/app.py", line 40 in run
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/lite/toco/python/toco_from_protos.py", line 89 in main
File "/usr/local/bin/toco_from_protos", line 8 in
Aborted (core dumped)

Regards,
Dongjin.

@nadongguri
Author

I modified the run_squad.py script to remove the contrib module in order to export a quantized tflite model file with tf 2.4.0-dev20200712.
I used two saved_models: one is the provided one (https://storage.googleapis.com/cloud-tpu-checkpoints/mobilebert/mobilebert_squad_savedmodels.tar.gz), and the other was trained as guided.
I got the same error during conversion.

loc("bert/encoder/layer_2/attention/self/clip_by_value/Minimum"): error: 'tfl.minimum' op quantization parameters violate the same scale constraint: !quant.uniform<i8:f32, 0.32020080089569092:-67> vs. !quant.uniform<i8:f32, 0.18242761492729187:-77>
Traceback (most recent call last):
File "export_tflite.py", line 642, in
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "export_tflite.py", line 632, in main
tflite_model = converter.convert()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/lite.py", line 1972, in convert
return super(TFLiteConverter, self).convert()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/lite.py", line 1341, in convert
result = self._calibrate_quantize_model(result, **flags)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/lite.py", line 444, in _calibrate_quantize_model
return _mlir_quantize(calibrated)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/convert.py", line 147, in mlir_quantize
inference_type)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/wrap_toco.py", line 52, in wrapped_experimental_mlir_quantize
inference_type)
RuntimeError: Failed to quantize: :0: error: loc("bert/encoder/layer_2/attention/self/clip_by_value/Minimum"): 'tfl.minimum' op quantization parameters violate the same scale constraint: !quant.uniform<i8:f32, 0.32020080089569092:-67> vs. !quant.uniform<i8:f32, 0.18242761492729187:-77>
:0: note: loc("bert/encoder/layer_2/attention/self/clip_by_value/Minimum"): see current operation: %3377 = "tfl.minimum"(%3373, %36) : (tensor<1x4x384x384x!quant.uniform<i8:f32, 0.32020080089569092:-67>>, tensor<!quant.uniform<i8:f32, 0.18242761492729187:-77>>) -> tensor<1x4x384x384x!quant.uniform<i8:f32, 0.32020080089569092:-67>>
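
For context, the conversion in export_tflite.py is set up roughly as below (a simplified sketch; the path and calibration data are placeholders):

```python
import numpy as np
import tensorflow as tf  # 2.4.0-dev20200712

def representative_dataset():
  # Placeholder calibration data; real SQuAD features should be used here.
  for _ in range(10):
    yield [np.zeros((1, 384), dtype=np.int32)] * 3  # input_ids, input_mask, segment_ids

converter = tf.lite.TFLiteConverter.from_saved_model("/path/to/mobilebert_squad_savedmodel")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
converter.experimental_new_quantizer = True  # quantization goes through the MLIR path
converter.representative_dataset = representative_dataset
tflite_model = converter.convert()  # raises the tfl.minimum scale-constraint error above
```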

@manojpreveen

Can someone please clarify:
To reproduce the quantized int8 savedmodel as provided here (https://storage.googleapis.com/cloud-tpu-checkpoints/mobilebert/mobilebert_squad_savedmodels.tar.gz),

  1. Distillation on Pre-trained mobilebert checkpoint using Pre-trained data (https://github.com/google-research/google-research/tree/master/mobilebert#distillation)
  2. Running Quantization-aware-training with Squad (https://github.com/google-research/google-research/tree/master/mobilebert#run-quantization-aware-training-with-squad)

Are both of the above steps required to get the quantized int8 savedmodel, or will just step 2 with the pre-trained model give the quantized int8 model?

@saberkun @renjie-liu @liufengdb @nadongguri

@nadongguri
Author

Hi all,
the quantized int8 SavedModel can be converted with tensorflow v2.3.0.
I got the following result:
{"exact_match": 81.06906338694418, "f1": 88.54016833795568}.
Thank you.

@saberkun
Contributor

I think this (88.54) matches the expectation.

@nadongguri
Author

Thanks, I'll close this issue.

@jk78346

jk78346 commented Dec 4, 2021

I modified the run_squad.py script to remove the contrib module in order to export a quantized tflite model file with tf 2.4.0-dev20200712. I used two saved_models: one is the provided one (https://storage.googleapis.com/cloud-tpu-checkpoints/mobilebert/mobilebert_squad_savedmodels.tar.gz), and the other was trained as guided. I got the same error during conversion.

loc("bert/encoder/layer_2/attention/self/clip_by_value/Minimum"): error: 'tfl.minimum' op quantization parameters violate the same scale constraint: !quant.uniform<i8:f32, 0.32020080089569092:-67> vs. !quant.uniform<i8:f32, 0.18242761492729187:-77>

Hi @nadongguri, I'm having the exact same issue you had:

Traceback (most recent call last):
  File "mobilebert/run_squad.py", line 32, in <module>
    from mobilebert import modeling
  File "/home/khsu4/google-research/mobilebert/modeling.py", line 32, in <module>
    from tensorflow.contrib import layers as contrib_layers
ModuleNotFoundError: No module named 'tensorflow.contrib'

What exactly did you modify in run_squad.py? Thanks.
