
Are the weights with quantization? #11

Closed
seranus opened this issue Jun 17, 2018 · 8 comments
Labels
enhancement New feature or request

Comments

@seranus

seranus commented Jun 17, 2018

I was looking at the weight file sizes; they seem to be the same size as in the original repos. It would be nice if the size could be reduced with quantization.

@justadudewhohacks
Owner

Hi,

You are right, the weights are not quantized. I am not yet familiar with how to run inference with a quantized model, or whether that's even possible with tfjs, but it would be awesome if we could reduce the model sizes that way. I will dig into it.

@seranus
Author

seranus commented Jun 18, 2018

Quantization is just converting your weights from float32 to uint8, so you get a 4x size decrease. I usually do it through the converter.
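
To make that concrete, here is a minimal sketch of affine min/max quantization from float32 to uint8, the kind of mapping the converter's quantization appears to apply. The function name and return shape are just illustrative, not the converter's actual code:

```ts
// Sketch: map each float32 weight into one of 256 uint8 buckets spanning [min, max].
function quantize(weights: Float32Array): { data: Uint8Array; scale: number; min: number } {
  let min = Infinity;
  let max = -Infinity;
  for (const w of weights) {
    if (w < min) min = w;
    if (w > max) max = w;
  }
  // One bucket covers (max - min) / 255; guard against constant tensors.
  const scale = (max - min) / 255 || 1;
  const data = new Uint8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    data[i] = Math.round((weights[i] - min) / scale);
  }
  // scale and min have to be stored alongside the uint8 data for dequantization.
  return { data, scale, min };
}
```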

@justadudewhohacks
Owner

I know that you can quantize the weights using Bazel, but do the weights simply get dequantized again once you load them?

I read somewhere that the ops in the network have to be aware of the quantized weights to run inference, but I might be wrong here.

If it's simply the former, that should hopefully be easy to implement.

@seranus
Author

seranus commented Jun 18, 2018

I'm not sure; I never did manual quantization.

https://github.com/tensorflow/tfjs-converter/blob/master/python/tensorflowjs/quantization_test.py

From the looks of it, there could be a default scaling based on the data type.

@justadudewhohacks added the enhancement (New feature or request) label on Jun 18, 2018
@justadudewhohacks
Owner

Yep, seems like you are right. Looking at the weight loader, it's a simple scaling operation to dequantize the weights.

Awesome! I will try to get this running soon; decreasing the model size from 28 MB to 7 MB looks promising.
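
For reference, a minimal sketch of what that load-time scaling could look like (illustrative only, not the actual weight loader code), assuming the scale and min values were stored with each quantized tensor:

```ts
// Sketch: invert the quantization with a single scale-and-shift per value,
// so none of the ops in the network need to know the weights were quantized.
function dequantize(data: Uint8Array, scale: number, min: number): Float32Array {
  const weights = new Float32Array(data.length);
  for (let i = 0; i < data.length; i++) {
    weights[i] = data[i] * scale + min;
  }
  return weights;
}
```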

@justadudewhohacks
Owner

Update: So I managed to quantize the weights for the face detection and the face landmark model. Currently the changes are available on this branch.

Apparently quantizing the face recognition model is not as straightforward, as it originally was not a TensorFlow model. The issue here is that simply quantizing all weights makes the model unusable, in the sense that it returns wrong outputs. Right now, however, it seems that leaving the weights of the conv64 layers uncompressed and quantizing the rest does work out.

Long story short: I am still working on it.
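
For anyone curious, a hypothetical sketch of that selective approach, reusing the quantize sketch from the earlier comment and skipping layers by name. The pattern and map-based layout are assumptions for illustration, not the repo's actual weight format:

```ts
// Sketch: quantize every tensor except those whose names match a skip list
// (here the conv64 layers that produced wrong outputs when quantized).
const SKIP_PATTERNS = [/conv64/];

function quantizeSelectively(tensors: Map<string, Float32Array>) {
  const out = new Map<string, Float32Array | { data: Uint8Array; scale: number; min: number }>();
  for (const [name, weights] of tensors) {
    const skip = SKIP_PATTERNS.some(re => re.test(name));
    // Sensitive layers stay float32; everything else becomes uint8.
    out.set(name, skip ? weights : quantize(weights));
  }
  return out;
}
```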

@justadudewhohacks
Owner

And here it is :)

Model weights have been quantized, reducing the model sizes by ~75%:

  • face detection model: 21.7 MB -> 5.4 MB
  • face recognition model: 28.7 MB -> 7.0 MB
  • face landmark model: 21.9 MB -> 6.2 MB

Plus, model weights are now sharded into chunks of 4 MB to allow them to be cached by the browser.
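
A rough sketch of how such sharding could look; the 4 MB constant matches the description above, everything else (function name, in-memory slicing) is illustrative:

```ts
// Sketch: split one large weight buffer into 4 MB shards so each shard
// can be fetched and cached by the browser individually.
const SHARD_SIZE = 4 * 1024 * 1024; // 4 MB

function shardWeights(buffer: ArrayBuffer): ArrayBuffer[] {
  const shards: ArrayBuffer[] = [];
  for (let offset = 0; offset < buffer.byteLength; offset += SHARD_SIZE) {
    // slice clamps to the buffer length, so the last shard may be smaller.
    shards.push(buffer.slice(offset, offset + SHARD_SIZE));
  }
  return shards;
}
```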

@seranus
Author

seranus commented Jun 23, 2018

Thanks, will check it out

@seranus closed this as completed on Jun 23, 2018