
Trying to load bert vocab would result in unpickling error #249

Closed
wassimseif opened this issue Apr 22, 2020 · 11 comments
Assignees
Labels
bug Something isn't working

Comments

@wassimseif

Hey,

I was trying to serve a fine-tuned BERT model. In the TorchServe logs I noticed that the workers were not being loaded.

The exception I got is:

2020-04-22 14:06:23,581 [INFO ] W-9004-sentiment-analysis_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     self.source_vocab = torch.load(self.manifest['model']['sourceVocab'])
2020-04-22 14:06:23,581 [INFO ] W-9007-sentiment-analysis_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - _pickle.UnpicklingError: invalid load key, '['.

The vocab file I used is this one

Also, I noticed that even trying torch.load('vocab.txt') would throw the same error. Am I missing something with how to load the vocab?

Thank you
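For context: torch.load() deserializes with Python's pickle, while BERT's vocab.txt is a plain-text file whose first line is typically "[PAD]", so the leading "[" byte is read as an invalid pickle opcode. A minimal stdlib sketch reproducing the same error without TorchServe or torch (the in-memory vocab content here is illustrative):

```python
import io
import pickle

# BERT's vocab.txt is plain text; its first line is typically "[PAD]".
# torch.load() deserializes with pickle, so the leading "[" byte is read
# as an invalid pickle opcode -- the same failure shown in the worker log.
fake_vocab = io.BytesIO(b"[PAD]\n[UNK]\nthe\n")

try:
    pickle.load(fake_vocab)
    error_message = None
except pickle.UnpicklingError as exc:
    error_message = str(exc)

print(error_message)  # invalid load key, '['.
```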

@wassimseif
Author

So I noticed that I'm loading a plain-text vocab file, which is not supported?

@wassimseif
Author

Using a custom handler solved the issue, but I would love to know how to get it running with the predefined handlers.
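For anyone hitting the same wall: the core of the workaround is reading the vocab as text instead of letting the handler torch.load() it. A minimal sketch (the one-token-per-line format follows BERT's vocab.txt convention; `load_vocab` is a made-up helper name, not the actual handler code):

```python
import os
import tempfile

def load_vocab(vocab_path):
    """Read a BERT-style vocab.txt: one token per line, index = line number."""
    with open(vocab_path, encoding="utf-8") as f:
        return {token.rstrip("\n"): index for index, token in enumerate(f)}

# Tiny demo file in the same one-token-per-line format as BERT's vocab.txt.
path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("[PAD]\n[UNK]\n[CLS]\n[SEP]\nthe\n")

vocab = load_vocab(path)
print(vocab["[CLS]"])  # 2
print(len(vocab))      # 5
```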

@lokeshgupta1975
Collaborator

@wassimseif Default handlers are not available for all model types. If you look at https://github.com/pytorch/serve/tree/master/examples, you will find some examples. Over time, we will keep increasing the number of default handlers. Here is the current list of supported default handlers:
https://pytorch.org/serve/default_handlers.html

@adityabindal adityabindal added the bug Something isn't working label Apr 23, 2020
@fbbradheintz
Contributor

If we're going to treat this as a bug (and not a limitation of the built-in handler), we'll need to figure out a way to detect whether we're dealing with a parameter file serialized with torch.save() or a plain text file, and it must not rely on checking the file extension.
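One way to do that sniffing (a sketch, not TorchServe code; the helper name is hypothetical): torch.save() writes either a zip archive (magic bytes "PK") or a legacy pickle stream (first byte 0x80, the pickle PROTO opcode), while a plain-text vocab starts with a printable character such as "[".

```python
import os
import pickle
import tempfile

def looks_like_torch_save(path):
    """Sniff file content instead of the extension (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(b"PK"):   # zip archive: torch.save's newer format
        return True
    if head[:1] == b"\x80":      # pickle PROTO opcode: legacy format
        return True
    return False

d = tempfile.mkdtemp()

text_path = os.path.join(d, "vocab.txt")
with open(text_path, "w", encoding="utf-8") as f:
    f.write("[PAD]\n")

# pickle.dump(protocol=2) stands in for a legacy torch.save() stream here.
pickled_path = os.path.join(d, "weights.bin")
with open(pickled_path, "wb") as f:
    pickle.dump({"weight": [0.1, 0.2]}, f, protocol=2)

print(looks_like_torch_save(text_path))     # False
print(looks_like_torch_save(pickled_path))  # True
```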

@wassimseif
Author

Is there a way to save a tokenizer's vocab in a format other than a plain text file?

I'm using the transformers library, and it seems that it only saves the vocab as a regular text file.

[Screenshot attached, 2020-04-25]
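If the default handler really requires a pickled file, one possible workaround (my assumption, not an official transformers feature) is to convert vocab.txt into a pickled dict up front. pickle stands in for torch.save below; `convert_vocab` is a made-up helper name.

```python
import os
import pickle
import tempfile

def convert_vocab(txt_path, out_path):
    """Turn a plain-text vocab into a pickled dict (hypothetical helper)."""
    with open(txt_path, encoding="utf-8") as f:
        vocab = {token.rstrip("\n"): index for index, token in enumerate(f)}
    with open(out_path, "wb") as f:
        # With torch installed this would be: torch.save(vocab, out_path)
        pickle.dump(vocab, f)
    return vocab

d = tempfile.mkdtemp()
txt_path = os.path.join(d, "vocab.txt")
with open(txt_path, "w", encoding="utf-8") as f:
    f.write("[PAD]\n[UNK]\nhello\n")

out_path = os.path.join(d, "vocab.pkl")
vocab = convert_vocab(txt_path, out_path)
with open(out_path, "rb") as f:
    print(pickle.load(f) == vocab)  # True
```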

@MFreidank
Collaborator

MFreidank commented Apr 25, 2020

@wassimseif For transformers models, you will need to provide custom handler code; the default text_handler does not support transformers models as of now.
For more details on how this would work, see:
https://medium.com/@freidankm_39840/deploy-huggingface-s-bert-to-production-with-pytorch-serve-27b068026d18

@laohur

laohur commented Apr 27, 2020


What is the benchmark of TorchServe for BERT?

@vgoklani

Hey @wassimseif, how did you set up your zshrc profile? It looks amazing!

@wassimseif
Author

Hey @vgoklani
Thanks!

Oh-My-Zsh + A LOT of configurations & long restless nights

@dhaniram-kshirsagar
Contributor

This will be addressed via #302.

@harshbafna
Contributor

BERT models are already part of the TorchServe examples. Closing.
