Export & use T5-Base model for summarization #33
Indeed you would have to manage all that stuff yourself. Edit: It might be useful if we provided some Swift wrapper code for this that would hide the complexity (since it's the same for most Transformer models), but right now we don't have this.
Yikes! I was ready to put my gloves on, but I've spent two days now trying to get the encoder/decoder models to run in Python without going through `model.generate`.
@hollance Hey, I came around to implementing "that stuff" and have it running in Swift on macOS and iOS now :)
Hi @seboslaw! I've recently done a similar exercise and discovered that if the model accepts flexible shapes, then Core ML only uses the CPU. In the case of sequence-to-sequence models such as T5, the decoder is configured to accept inputs whose length is unbounded, as you can see in the Predictions tab of Xcode. I tried to work around this issue by using fixed shapes, but so far I've only tested autoregressive models; using a fixed sequence length allows Core ML to schedule the model on the GPU. In addition, using fixed shapes requires that you prepare your inputs using padding and the appropriate attention masks, which is a bit more work to be done in the Swift code. This is a very interesting area for us, and as Matthijs mentioned we are considering whether to create some Swift wrappers and a set of "best practices" for conversion to help with these tasks. (No promises though, we're still assessing the problem :)
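For reference, this is roughly the difference at conversion time (a minimal sketch; the input name and bounds are placeholders, not the actual exported interface):

```python
import coremltools as ct

# Flexible sequence length: in my tests this forces Core ML onto the CPU.
flexible_input = ct.TensorType(name="decoder_input_ids",
                               shape=(1, ct.RangeDim(1, 128)))

# Fixed sequence length: lets Core ML schedule the model on the GPU.
fixed_input = ct.TensorType(name="decoder_input_ids", shape=(1, 128))

# Either would be passed to ct.convert(traced_model, inputs=[...]).
```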
Hey @pcuenca, thx for your reply! I've tried your suggestion (I think I did :) and updated the upperBounds of the input parameters. However, the Performance Report still says "CPU only" :( I used coremltools to edit the inputs of my already converted decoder model:
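Roughly like this (a reconstruction sketch; the model path and the bound value are placeholders):

```python
import coremltools as ct

# Load the already converted decoder and grab its protobuf spec.
model = ct.models.MLModel("decoder.mlpackage")
spec = model.get_spec()

# Cap every unbounded input dimension at a finite upper bound.
for inp in spec.description.input:
    array_type = inp.type.multiArrayType
    if array_type.WhichOneof("ShapeFlexibility") == "shapeRange":
        for size_range in array_type.shapeRange.sizeRanges:
            if size_range.upperBound == -1:  # -1 means "unbounded"
                size_range.upperBound = 128

# Re-wrap the spec and save (mlprogram models need the weights directory).
ct.models.MLModel(spec, weights_dir=model.weights_dir).save("decoder-bounded.mlpackage")
```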
Since this didn't seem to work, I looked into providing the inputs to the hf exporters tool directly. But then I saw that "The sequence_length specified in the configuration object is ignored" if "seq2seq" is provided.
Hey @pcuenca, no worries - you were clear, I simply lack experience with the exporter :) I think I understand what needs to be done now. However, it seems that I need to export the T5 as two separate models, thus providing the `sequence_length` for the encoder and the decoder separately.
Why is it this way anyway? And is there a way to get this done aside from patching the exporter? This is what I've started with (only the encoder so far):
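Something along these lines (a sketch; the `task` and `seq2seq` arguments are my assumptions about the exporters API, not verified):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer
from exporters.coreml import export
from exporters.coreml.models import T5CoreMLConfig

model = T5ForConditionalGeneration.from_pretrained("t5-base", torchscript=True)
tokenizer = T5Tokenizer.from_pretrained("t5-base")

# Assumption: seq2seq="encoder" selects just the encoder half for export.
config = T5CoreMLConfig(model.config, task="text2text-generation", seq2seq="encoder")
mlmodel = export(tokenizer, model, config)
mlmodel.save("exported/t5-base-encoder.mlpackage")
```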
In the meantime I've tried editing the MLModel exported by the exporter through coremltools:
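Essentially something like this (a sketch; the path and the fixed length are placeholders):

```python
import coremltools as ct

model = ct.models.MLModel("decoder.mlpackage")
spec = model.get_spec()

# Replace each flexible input shape with a fixed one.
for inp in spec.description.input:
    array_type = inp.type.multiArrayType
    if array_type.WhichOneof("ShapeFlexibility") == "shapeRange":
        array_type.ClearField("shapeRange")   # drop the flexible range
        del array_type.shape[:]
        array_type.shape.extend([1, 128])     # batch 1, fixed sequence length

ct.models.MLModel(spec, weights_dir=model.weights_dir).save("decoder-fixed.mlpackage")
```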
However, I receive an error when loading the modified model. So as far as I understand, modifying an already exported MLModel is off the table. @pcuenca Do you think doing it the way described in my previous post will be possible?
Testing T5 is high up on my to-do list; I hope to get to it pretty soon, and hopefully I'll have some insight then :) Sorry for the non-answer though.
@pcuenca no worries and I totally understand :)
I think I originally made it ignore the `sequence_length` for seq2seq models.
@seboslaw What you tried to do here used to work, but in newer versions of Core ML it results in the error you've seen. The problem is that the model was compiled with flexible shapes and this is inconsistent with the (fixed) shape you assign later on. I'm working in a local branch with some quick and dirty modifications to convert T5 using fixed shapes. I can push it later today so that you can keep testing on your end.
@seboslaw This is the branch: #37. I have other local changes, so I hope I didn't break or miss anything. I verified that T5 encoder and decoder export with fixed shapes for all their inputs, and that Xcode's performance report successfully chooses the GPU for all operations. I haven't tried to run inference inside an app yet.
@pcuenca awesome! I’ll give it a try as soon as I’m in front of my computer. Thanks a lot already for the effort!
@pcuenca I tried it, but unfortunately it gives different results when compared to the non-GPU model. Hopefully I simply messed up the padding. Right now I'm focusing on the decoder. I padded as follows:
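Roughly this scheme (a sketch; the fixed length and pad token id are what I assumed for T5):

```python
import numpy as np

SEQ_LEN = 128       # the fixed length the decoder was exported with (assumed)
PAD_TOKEN_ID = 0    # T5's pad token id

def pad_inputs(token_ids):
    """Right-pad the ids to SEQ_LEN and build the matching attention mask."""
    token_ids = token_ids[:SEQ_LEN]
    mask = [1] * len(token_ids) + [0] * (SEQ_LEN - len(token_ids))
    ids = token_ids + [PAD_TOKEN_ID] * (SEQ_LEN - len(token_ids))
    return (np.array([ids], dtype=np.int32),   # decoder_input_ids
            np.array([mask], dtype=np.int32))  # decoder_attention_mask
```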
Would you say that's correct? EDIT: Another problem I found concerns the padded inputs. With the new model my inputs now include the extra padding tokens, and I'm not experienced with the seq2seq model architecture, but aren't the attention_masks supposed to suppress those additional padding tokens?
@seboslaw Did you get summarization to work in Swift? How did you implement it? I converted the model, but don't know how to use it, and wasn't able to find much information online.
Hey guys,
I'm pretty new to Core ML conversion stuff and took the naive approach of converting a T5-Base model to Core ML (I want to use it to generate summaries). As laid out in the README I created an encoder and a decoder model, which worked without a problem:
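I followed something like this (a sketch from memory of the README; the exact class and argument names, in particular `seq2seq`, are assumptions):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer
from exporters.coreml import export
from exporters.coreml.models import T5CoreMLConfig

model = T5ForConditionalGeneration.from_pretrained("t5-base", torchscript=True)
tokenizer = T5Tokenizer.from_pretrained("t5-base")

# Assumption: seq2seq selects which half of the model gets exported.
for part in ("encoder", "decoder"):
    config = T5CoreMLConfig(model.config, task="text2text-generation", seq2seq=part)
    mlmodel = export(tokenizer, model, config)
    mlmodel.save(f"exported/t5-base-{part}.mlpackage")
```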
This is where the fun begins :) I've only ever worked with the T5 model through transformers & pipelines, like this:
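Roughly like this (a minimal sketch):

```python
from transformers import pipeline

# The pipeline hides tokenization, encoding, and autoregressive decoding.
summarizer = pipeline("summarization", model="t5-base")
article = "Long article text goes here ..."
print(summarizer(article, max_length=60, min_length=10)[0]["summary_text"])
```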
As far as I understand, by using the `model.generate` method the transformers utilities do all the heavy lifting here, like creating the attention_masks, running the encoder, passing the encoder_hidden_states along, and so on. Am I right to assume that I would have to implement all this functionality by hand if I want to work with the Core ML encoder/decoder models?
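In other words, I assume I'd need to write my own generation loop, something like this greedy sketch (the input/output names are guesses, not the actual exported interface):

```python
import numpy as np
import coremltools as ct
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-base")
encoder = ct.models.MLModel("exported/t5-base-encoder.mlpackage")
decoder = ct.models.MLModel("exported/t5-base-decoder.mlpackage")

text = "summarize: " + "Long article text goes here ..."
input_ids = np.array([tokenizer(text).input_ids], dtype=np.int32)

# Run the encoder once; feed its hidden states to every decoder step.
encoder_out = encoder.predict({"input_ids": input_ids})["last_hidden_state"]

tokens = [tokenizer.pad_token_id]  # T5 decoding starts from the pad token
for _ in range(60):
    logits = decoder.predict({
        "decoder_input_ids": np.array([tokens], dtype=np.int32),
        "encoder_hidden_states": encoder_out,
    })["logits"]
    next_token = int(logits[0, -1].argmax())  # greedy: most likely next token
    if next_token == tokenizer.eos_token_id:
        break
    tokens.append(next_token)

print(tokenizer.decode(tokens[1:]))
```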
I'm not only worried about using them in Python, but would also like to use them in Swift. But I guess there's no easy plug'n play solution here, right? :)