This repository has been archived by the owner on Nov 21, 2022. It is now read-only.

Store tokenizer metadata/object within model #13

Closed
SeanNaren opened this issue Jan 4, 2021 · 0 comments · Fixed by #18
Labels
bug / fix (Something isn't working) · help wanted (Extra attention is needed) · Priority P0

Comments

@SeanNaren (Contributor)

When a model is saved, we do not store any information pertaining to the tokenizer. This means the tokenizer has to be re-created and assigned manually at inference/test time, as below:

```python
# Re-create and attach the tokenizer by hand after loading the checkpoint
model = LitAutoModelTransformer.load_from_checkpoint('checkpoint.pt')
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
model.tokenizer = tokenizer
...
```

Ideally, once the tokenizer is specified at training time, inference would automatically know which tokenizer to use.
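One possible approach (a sketch only, not necessarily what PR #18 implements) is to persist the tokenizer's pretrained name inside the checkpoint via Lightning-style `on_save_checkpoint` / `on_load_checkpoint` hooks. The class and attribute names below are hypothetical, and plain dicts stand in for the real checkpoint so the flow is easy to follow:

```python
class LitAutoModelTransformer:
    """Minimal stand-in for the LightningModule; only the hook logic matters."""

    def __init__(self, tokenizer_name=None):
        # Hypothetical attribute holding the pretrained tokenizer identifier,
        # e.g. 'bert-base-cased'.
        self.tokenizer_name = tokenizer_name

    def on_save_checkpoint(self, checkpoint):
        # Store tokenizer metadata alongside the model weights.
        checkpoint['tokenizer_name'] = self.tokenizer_name

    def on_load_checkpoint(self, checkpoint):
        # Recover the tokenizer name; a real module would then call
        # AutoTokenizer.from_pretrained(self.tokenizer_name) here.
        self.tokenizer_name = checkpoint.get('tokenizer_name')


# Round-trip the metadata through a checkpoint dict:
checkpoint = {'state_dict': {}}
LitAutoModelTransformer('bert-base-cased').on_save_checkpoint(checkpoint)

restored = LitAutoModelTransformer()
restored.on_load_checkpoint(checkpoint)
print(restored.tokenizer_name)  # bert-base-cased
```

Storing only the pretrained name keeps checkpoints small, at the cost of requiring the same tokenizer files to be available at load time; serializing the full tokenizer object would trade size for self-containment.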

@SeanNaren SeanNaren added bug / fix Something isn't working help wanted Extra attention is needed Priority P0 labels Jan 4, 2021