XLMR tokenizer is fully picklable #13577

ben-davidson-6 · 2021-09-15T11:28:22Z

What does this PR do?

This addresses issue #13200. To summarize:

* Previously, unpickling depended on what was on disk.
* The tokenizer is now unpickled using only the serialised proto.

This is needed if you want to write a PySpark UDF that tokenizes a column, because the tokenizer must be pickled and sent to other nodes.
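
For readers unfamiliar with the pattern, here is a minimal sketch of what "unpickled only with the serialised proto" can look like. It is not the code from this PR: `FakeProcessor`, `ToyTokenizer`, `load_from_bytes`, `to_bytes` and `model_proto` are hypothetical names standing in for the native SentencePiece handle and the tokenizer, and the "model" is just a newline-separated vocabulary. The idea shown is that `__getstate__` drops the unpicklable handle and stores its serialised bytes, and `__setstate__` rebuilds the handle from those bytes without touching disk.

```python
import pickle
import threading


class FakeProcessor:
    """Hypothetical stand-in for a native model handle (not a real library class)."""

    def __init__(self):
        # A lock cannot be pickled, which mimics an unpicklable native handle.
        self._lock = threading.Lock()
        self._vocab = {}

    def load_from_bytes(self, blob: bytes) -> None:
        # Toy "model format": one piece per line, id = line index.
        pieces = blob.decode("utf-8").split("\n")
        self._vocab = {piece: idx for idx, piece in enumerate(pieces)}

    def to_bytes(self) -> bytes:
        ordered = sorted(self._vocab, key=self._vocab.get)
        return "\n".join(ordered).encode("utf-8")

    def encode(self, text: str) -> list:
        return [self._vocab[token] for token in text.split()]


class ToyTokenizer:
    """Hypothetical tokenizer whose pickled state carries the model bytes."""

    def __init__(self, model_bytes: bytes):
        self.model = FakeProcessor()
        self.model.load_from_bytes(model_bytes)

    def __getstate__(self):
        state = self.__dict__.copy()
        state["model"] = None  # drop the unpicklable handle
        state["model_proto"] = self.model.to_bytes()  # keep its serialised form
        return state

    def __setstate__(self, state):
        self.__dict__ = state
        # Rebuild the handle from the pickled bytes only; no file path is read.
        self.model = FakeProcessor()
        self.model.load_from_bytes(self.model_proto)

    def tokenize(self, text: str) -> list:
        return self.model.encode(text)


# The raw handle itself cannot be pickled.
try:
    pickle.dumps(FakeProcessor())
except TypeError:
    handle_is_picklable = False
else:
    handle_is_picklable = True

tokenizer = ToyTokenizer(b"hello\nworld")
restored = pickle.loads(pickle.dumps(tokenizer))
print(restored.tokenize("world hello"))  # [1, 0]
```

Because the pickled payload is self-contained, a worker process that receives it (for example a Spark executor running a UDF) would not need the original vocabulary file to exist on its own filesystem.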

Who can help

@LysandreJik

LysandreJik

This looks good to me - do you think you could implement a test in tests/test_tokenization_xlm_roberta.py?

ben-davidson-6 · 2021-09-16T09:49:25Z

> This looks good to me - do you think you could implement a test in tests/test_tokenization_xlm_roberta.py?

done

LysandreJik

Wonderful! Thank you, @ben-davidson-6!

* made tokenizer fully picklable
* remove whitespace
* added testcase

ben-davidson-6 added 2 commits September 15, 2021 12:19

made tokenizer fully picklable

3d85aca

remove whitespace

d0c3cd3

LysandreJik reviewed Sep 15, 2021

added testcase

9a3654f

LysandreJik approved these changes Sep 16, 2021

LysandreJik merged commit e02ed0e into huggingface:master Sep 16, 2021

Albertobegue pushed a commit to Albertobegue/transformers that referenced this pull request Jan 13, 2022

XLMR tokenizer is fully picklable (huggingface#13577)

ad77528

* made tokenizer fully picklable
* remove whitespace
* added testcase

icyblade mentioned this pull request Jul 6, 2023

LlamaTokenizer should be picklable #24681

Merged
