Language of the Evaluator's Tokenizer is set to English when asked for Dutch or French. #62

Closed
amobular opened this issue Jan 19, 2022 · 1 comment · Fixed by #63

Comments

@amobular

The tokenizer of the Evaluator class is set to TokenizerEN when the language is 'nl' or 'fr'.

See the following code:

if language == 'nl':
    from deidentify.tokenizer.tokenizer_ons import TokenizerOns
    self.tokenizer = TokenizerOns(disable=('tagger', 'parser', 'ner'))
if language == 'fr':
    from deidentify.tokenizer.tokenizer_fr import TokenizerFR
    self.tokenizer = TokenizerFR(disable=('tagger', 'parser', 'ner'))
if language == 'de':
    from deidentify.tokenizer.tokenizer_de import TokenizerDE
    self.tokenizer = TokenizerDE(disable=('tagger', 'parser', 'ner'))
else:
    from deidentify.tokenizer.tokenizer_en import TokenizerEN
    self.tokenizer = TokenizerEN(disable=('tagger', 'parser', 'ner'))

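The else belongs only to the last if (language == 'de'), so it also runs for 'nl' and 'fr' and overwrites the tokenizer that was just set. Change the second and third if statements to elif and it will work as intended.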
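A minimal sketch of the difference, assuming stand-in classes in place of the real deidentify tokenizers so it runs on its own. The function names pick_buggy and pick_fixed are hypothetical and only illustrate the control flow; they are not part of the library.

```python
# Stand-in tokenizer classes (hypothetical placeholders). The real classes
# live in deidentify.tokenizer.* and are deliberately not imported here.
class TokenizerOns: ...
class TokenizerFR: ...
class TokenizerDE: ...
class TokenizerEN: ...


def pick_buggy(language):
    """Mirrors the reported chain: three independent ifs, else bound to the last one."""
    tokenizer = None
    if language == 'nl':
        tokenizer = TokenizerOns
    if language == 'fr':
        tokenizer = TokenizerFR
    if language == 'de':
        tokenizer = TokenizerDE
    else:
        # Runs for every language except 'de', including 'nl' and 'fr',
        # so the earlier assignment is overwritten.
        tokenizer = TokenizerEN
    return tokenizer


def pick_fixed(language):
    """Same chain with elif: exactly one branch runs."""
    if language == 'nl':
        return TokenizerOns
    elif language == 'fr':
        return TokenizerFR
    elif language == 'de':
        return TokenizerDE
    else:
        return TokenizerEN


for lang in ('nl', 'fr', 'de', 'en'):
    print(lang, pick_buggy(lang).__name__, pick_fixed(lang).__name__)
```

Only 'nl' and 'fr' differ between the two versions; 'de' and the English fallback behave the same in both.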

Nice project btw :)

@jantrienes
Collaborator

jantrienes commented Jan 19, 2022

Great catch! The fix should be simple: replace the second and third if with elif. Do you mind submitting a PR?
