StableDiffusion Tensorflow to TF Lite #1033
Thanks @charbull for the report! Will take a look at this. TFLite conversion would be an awesome addition. |
Leaving another conversion issue here, related to the TFLite 2 GB max size limit from protobuf: divamgupta/stable-diffusion-tensorflow#58 (comment) |
Looking forward to having a TFLite StableDiffusion on the Coral :) |
I was able to convert

def _create_broadcast_shape(self, input_shape):
    broadcast_shape = [1] * len(input_shape)
    broadcast_shape[self.axis] = input_shape[self.axis] // self.groups
    broadcast_shape.insert(self.axis, self.groups)
    return broadcast_shape

to

def _create_broadcast_shape(self, input_shape):
    broadcast_shape = [1] * input_shape.shape.rank
    broadcast_shape[self.axis] = input_shape[self.axis] // self.groups
    broadcast_shape.insert(self.axis, self.groups)
    return broadcast_shape
|
Thank you @freedomtan, I am trying those now. |
Hi @freedomtan
|
@charbull batch_size isn't configured at a StableDiffusion-wide level in our implementation, but rather passed as a parameter to specific SD flows (e.g. the |
@ianstenbit oh I see ! thank you for the clarification. I am not there yet. |
I wonder why the 2 GB limitation applies here, since the weights are not part of the proto? |
Yup, you have to modify the source code; either adding an argument or directly modifying the Input layers works :-) |
Yup, that's not trivial. In TensorFlow 2.10 and before (I didn't check if it is changed in 2.11 or later), it seems that when you try to convert a model to tflite from a concrete function, it is converted to a frozen GraphDef first. Thus, the protobuf 2 GiB limitation applies. |
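For reference, a minimal sketch of the concrete-function conversion path under discussion (a toy Dense model stands in for the real graph; none of these names come from the thread). When convert() runs, the traced function is frozen to a GraphDef, which is where the protobuf 2 GiB limit bites for a model the size of Stable Diffusion:

import tensorflow as tf

# Toy model standing in for a large graph (illustrative only).
model = tf.keras.Sequential([tf.keras.layers.Dense(8, input_shape=(4,))])

# Trace the model into a concrete function with a fixed input signature.
run_model = tf.function(lambda x: model(x))
concrete_fn = run_model.get_concrete_function(tf.TensorSpec([1, 4], tf.float32))

# Conversion from a concrete function freezes the graph to a GraphDef first,
# so models whose frozen graph exceeds 2 GiB fail at this step.
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_fn], model)
tflite_bytes = converter.convert()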
I see, that makes sense. Though in the future, I'd really hope this part of the logic gets done at TFLite runtime, instead of at model saving. |
you mean the 2GB limit is still true across 2.10 and master, correct? |
Yes, that link is pointing to master. Then, if we look at the conversion workflow: https://www.tensorflow.org/lite/models/convert?hl=en#model_conversion |
Nope, there are two 2 GiB limitations: one is protobuf, the other is FlatBuffers, which is the file format of TFLite. |
Right, I think this is suboptimal; freezing can be done at runtime. I will try to figure out how to push this forward with TFLite separately. |
But I think that we could probably still solve this using something like https://github.com/tensorflow/hub/blob/master/examples/text_embeddings/export.py#L204-L219. What do you think? |
As I saw another case pointing to that workaround (tensorflow/tensorflow#47326 (comment)) |
Hmm... that's basically rewriting the graph and using a placeholder instead... is this even doable in TF2? |
@abattery What do you think? |
I was able to make some progress based on advice from @costiash here: divamgupta/stable-diffusion-tensorflow#58 (comment)
It generates a float16 TFLite file of ~1.6 GB. However, since I am running TFLite on the Edge TPU, it seems I need to go to full int8 quantization. According to the reference, a https://www.tensorflow.org/api_docs/python/tf/lite/RepresentativeDataset is needed for full quantization:
Are there samples I can use from this project, or do I need to generate some images from TF Stable Diffusion and use them? |
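For context, a minimal sketch of the kind of float16 post-training conversion described above. It assumes conversion straight from the keras_cv model (the linked comment used a different workaround), and the output file name is illustrative:

import tensorflow as tf
import keras_cv

model = keras_cv.models.StableDiffusion(img_width=512, img_height=512)

# Float16 post-training quantization: no representative dataset required.
converter = tf.lite.TFLiteConverter.from_keras_model(model.diffusion_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_fp16 = converter.convert()

with open("diffusion_model_fp16.tflite", "wb") as f:
    f.write(tflite_fp16)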
@charbull, my 2 cents: DO NOT use it. With a recent TF master + keras_cv (0.3.4 + the group norm patch), converting from the Keras model works like a charm.
The TF master I tested was built from tensorflow/tensorflow@680a9b2 |
@freedomtan thank you! We are getting closer. It turns out I need to run the following for the Edge TPU quantization: https://www.tensorflow.org/lite/performance/post_training_quantization#integer_only
I am trying to figure out how to get the representative dataset.
Are there samples I can use from this project, or do I need to generate some images from TF Stable Diffusion and use them? Cheers, |
I don't remember what checkpoint we have used. Probably for the calibration you can pass prompt samples from: |
@bhack thank you ! will give it a try :) |
Hi, I tried the following so far with @ianstenbit.
I. I generated images from prompts and put them in a csv file so that I can prepare the representative_dataset.
Then the conversion:
Getting the following error, not sure what is the issue:
II. Tried to cut the conversion for the encoder alone:
which also produced the same error:
Any ideas what is going wrong? Thank you |
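In case it helps others reading along, here is a minimal sketch of a representative dataset built from a CSV of image file paths. The file name images.csv, the "path" column, the 512x512 size, the [0, 1] scaling, and the choice of the image_encoder sub-model are all assumptions for illustration, not taken from the original (elided) code:

import csv

import tensorflow as tf
import keras_cv

model = keras_cv.models.StableDiffusion(img_width=512, img_height=512)

def representative_dataset():
    # Assumes a single float32 image input of shape (1, 512, 512, 3);
    # the exact preprocessing must match what the converted sub-model expects.
    with open("images.csv") as f:
        for row in csv.DictReader(f):
            image = tf.io.decode_image(
                tf.io.read_file(row["path"]), channels=3, expand_animations=False
            )
            image = tf.image.resize(image, (512, 512))
            image = tf.cast(image, tf.float32) / 255.0
            yield [tf.expand_dims(image, axis=0)]

converter = tf.lite.TFLiteConverter.from_keras_model(model.image_encoder)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset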
As discussed, I think what you stored in In order to convert the full StableDiffusion model, we'll need to either
|
@charbull
|
@freedomtan @ianstenbit What would be the best approach to get to TFLite int8 in this case? Would the Edge TPU "know" how to handle this? There is also the TFLite interpreter. I am not sure exactly what would be the best way going forward :) If you can highlight the steps and the libraries/tools, I am happy to give it a shot. Cheers |
@charbull FYR, I gave it a try. Ideally, if we feed input tensors with the correct ranges, we should be able to get a quantized model.
import math

import tensorflow as tf
import keras_cv

model = keras_cv.models.StableDiffusion(img_width=512, img_height=512)

prompt_1 = "A watercolor painting of a Golden Retriever at the beach"
encoding_1 = model.encode_text(prompt_1)

def get_timestep_embedding(timestep, batch_size, dim=320, max_period=10000):
    half = dim // 2
    freqs = tf.math.exp(
        -math.log(max_period) * tf.range(0, half, dtype=tf.float32) / half
    )
    args = tf.convert_to_tensor([timestep], dtype=tf.float32) * freqs
    embedding = tf.concat([tf.math.cos(args), tf.math.sin(args)], 0)
    embedding = tf.reshape(embedding, [1, -1])
    return tf.repeat(embedding, batch_size, axis=0)

def representative_data_gen():
    for i in range(1):
        em = get_timestep_embedding(i, 1)
        noise = tf.random.normal((1, 64, 64, 4))  # shape must be passed as a single tuple
        yield [encoding_1, em, noise]

converter = tf.lite.TFLiteConverter.from_keras_model(model.diffusion_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8  # or tf.int8
converter.inference_output_type = tf.uint8  # or tf.int8
converter.representative_dataset = representative_data_gen
tflite_model_int8 = converter.convert()

Unfortunately, I got the following message. Dunno how to solve it yet.
|
It turns out an important message escaped my attention:
But the quantized model isn't supposed to be bigger than 2 GB :-( Checking the converter source code, it seems that for TFLite full-integer quantization, a TF model is converted to a TFLite model and an intermediate tflite model file is saved. Then the converter tries to optimize the intermediate TFLite model. But since the intermediate tflite model is > 2 GB, the conversion failed. |
@freedomtan, thank you for updating here. Should we save each model separately and then convert them to tflite/int8? |
@charbull |
hey @charbull so are all TFLite models converted? |
Hey @LukeWood @freedomtan, I wasn't able to make the TFLite/int8 conversion work yet, as the int8 conversion requires data samples. |
@charbull FYR. I wrote a script that could be used to split the diffusion model into two chunks, so that it will be trivial to convert the two split models into (fp32) tflite models and easier to generate post-training quantization models. https://github.com/freedomtan/keras_cv_stable_diffusion_to_tflite |
@charbull Generating quantized int8 models also works, see https://github.com/freedomtan/keras_cv_stable_diffusion_to_tflite/blob/main/convert_keras_diffusion_model_into_two_tflite_models_qint8.ipynb |
@freedomtan thank you, I ran the qint8 version, but for some reason it is crashing my notebook. Maybe I need to upgrade :) |
@charbull you mean you tried my conversion Jupyter notebook? my guess is, mostly memory problem (out of memory, etc.) |
how does the tokenizer work in this conversion? |
I don't really know how to generate representative dataset, which is essential for getting reasonable accuracy. I just showed that quantization is possible :-) |
Added a Jupyter notebook showing that the converted tflite models work: https://github.com/freedomtan/keras_cv_stable_diffusion_to_tflite/blob/main/text_to_image_using_converted_tflite_models.ipynb. Most of the code is from the keras_cv implementation; I replaced the Keras model inference code with tflite model inference. It's not a complete implementation, just a test per the request from @sayakpaul. |
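For anyone following along, a minimal sketch of the TFLite-interpreter side of that swap. The file name and the dummy inputs are illustrative, not taken from the notebook:

import numpy as np
import tensorflow as tf

# Load one of the converted .tflite files and run a single inference.
interpreter = tf.lite.Interpreter(model_path="diffusion_model_fp16.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed dummy data matching each declared input; real code would feed the
# text encoding, timestep embedding, and latent noise instead.
for detail in input_details:
    dummy = np.zeros(detail["shape"], dtype=detail["dtype"])
    interpreter.set_tensor(detail["index"], dummy)

interpreter.invoke()
output = interpreter.get_tensor(output_details[0]["index"])
print(output.shape)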
Here's an end to end Colab notebook: https://github.com/sayakpaul/Adventures-in-TensorFlow-Lite/blob/master/Stable_Diffusion_to_TFLite.ipynb building on top of @freedomtan's awesome work. |
@freedomtan May I ask how you got this error message for integer-only quantization?
I can only get the original error message which is
|
Error Message:
Model_Id: https://huggingface.co/kadirnar/dreambooth_diffusion_model_v5 |
This issue is stale because it has been open for 180 days with no activity. It will be closed if no further activity occurs. Thank you. |
Up |
Hi @LukeWood,
For fun, I tried converting the Stable Diffusion model from TensorFlow to TF Lite, so that I can run it on the Coral Edge TPU.
I tried two approaches:
I- Saved model approach
II- Go through h5
I will try to document them as much as possible (sorry in advance for the long traces).
For both, I am using:
!pip install --upgrade keras-cv
I was not able to save the model with either approach.

I- Saved model approach:

II- Go through h5:
TF 2.11.0 does not load h5 files anymore, so I removed tf 2.11.0 and installed tf2.1.0. The load_model call throws the following error:
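For completeness, a minimal sketch of the two export paths described above. The choice of the diffusion_model sub-model and the file names are illustrative; this is not the exact code from the issue:

import tensorflow as tf
import keras_cv

model = keras_cv.models.StableDiffusion(img_width=512, img_height=512)

# I- Saved model approach: export a sub-model as a TF SavedModel directory.
tf.saved_model.save(model.diffusion_model, "diffusion_model_savedmodel")

# II- H5 approach: save the same sub-model as a single .h5 file.
# Loading it back requires a TF/Keras version that still supports h5, and
# custom layers may need to be passed via custom_objects=... on load.
model.diffusion_model.save("diffusion_model.h5")
loaded = tf.keras.models.load_model("diffusion_model.h5")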