A way to disable normalization rules in parsbert Tokenizer #31

ShahrzadShaterzadeh · 2024-01-23T08:25:41Z

Hi,
Is there any way to disable all or some of the normalization rules in the ParsBERT tokenizer?
For example, I do not want it to convert "آ" to "ا" or "ئ" to "ی".
The tokenizer also removes all half-spaces (zero-width non-joiners) and concatenates the words.
Setting the do_lower_case and strip_accents parameters to False does not help.
I would be grateful if you could let me know whether there is a solution to this problem.
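
To make the reported behaviour easier to reproduce, here is a minimal standard-library sketch of two Unicode operations that could produce it. It assumes the tokenizer applies BERT-style basic cleaning (NFD decomposition with combining marks dropped, and control/format characters removed); the issue does not confirm that this is what ParsBERT does. The function names `strip_accents` and `clean_text` are illustrative helpers, not the library's API.

```python
import unicodedata

ZWNJ = "\u200c"  # zero-width non-joiner, the Persian "half-space"


def strip_accents(text: str) -> str:
    """Decompose to NFD, then drop combining marks (Unicode category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def clean_text(text: str) -> str:
    """Drop control/format characters (categories starting with 'C').

    Simplified assumption: tab, newline and carriage return are kept.
    """
    return "".join(
        ch
        for ch in text
        if ch in "\t\n\r" or not unicodedata.category(ch).startswith("C")
    )


if __name__ == "__main__":
    # U+0622 (alef with madda) decomposes to U+0627 (alef) + a combining mark.
    print(strip_accents("\u0622") == "\u0627")
    # U+0626 (yeh with hamza) decomposes to U+064A (Arabic yeh) + a combining mark.
    print(strip_accents("\u0626") == "\u064a")
    # The ZWNJ has category Cf, so the cleaning step removes it and joins the word.
    word = "\u0645\u06cc" + ZWNJ + "\u0622\u06cc\u062f"
    print(ZWNJ in clean_text(word))
```

Under this assumption, accent stripping would explain "آ" becoming "ا", while the half-space loss would come from the separate cleaning step, which is not controlled by do_lower_case or strip_accents. Note that accent stripping alone yields the Arabic yeh (U+064A) rather than the Persian "ی" (U+06CC), so the second mapping in the report would need an additional rule.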
