[QUESTION] How to load only the tokenizer in multilingual pipeline #1199
Comments
You can use the
Unfortunately, this doesn't generalize well, since you'd have to add the same thing for every language your pipeline will encounter. I'll add the ability to pass in a default map.
Sorry, I had to fix a mistake in the code above, so don't rely on the email notification.
That's basically the workaround I found as well, but as you said, it must be specified for each language. It would indeed be nice to be able to pass a default list of processors :) Should I open a feature request instead?
That's fine, I'm working on it right now.
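The per-language workaround discussed here can be sketched as follows. This is a minimal sketch, assuming Stanza's `MultilingualPipeline` accepts a `lang_configs` dict that maps a language code to that language's pipeline options (such as `"processors"`). The helper names and the example language list are hypothetical. As noted in the thread, any language you do not list still gets the default processors, which is why this does not generalize.

```python
from typing import Dict, Iterable


def tokenize_only_configs(langs: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Build a lang_configs mapping restricting each listed language to the tokenizer."""
    return {lang: {"processors": "tokenize"} for lang in langs}


def make_tokenize_only_pipeline(langs: Iterable[str]):
    """Hypothetical helper: a MultilingualPipeline that only tokenizes the listed languages.

    Languages missing from `langs` still load Stanza's default processors.
    """
    # Imported lazily so the config helper above works without Stanza installed.
    from stanza.pipeline.multilingual import MultilingualPipeline

    return MultilingualPipeline(lang_configs=tokenize_only_configs(langs))
```

For example, `make_tokenize_only_pipeline(["en", "fr", "de"])` would only restrict English, French and German; text detected as any other language would still trigger the full default pipeline for that language.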
…s will allow for specifying only the tokenize processor as requested in #1199 for example
If you install the dev branch, you can now pass in a
Not sure it's the best solution, but passing in a defaultdict wouldn't have worked as expected previously.
Perfect! Looks good to me 👍 Thanks very much!
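A default map that covers every language can be sketched with a `collections.defaultdict` passed as `lang_configs`. This is a minimal sketch under two assumptions: that your Stanza version includes the dev-branch change mentioned in this thread (so a defaultdict is honored for languages it has not seen yet), and that `lang_configs` entries accept a `"processors"` key. The helper names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict


def tokenize_only_default_configs() -> DefaultDict[str, Dict[str, str]]:
    """Return a lang_configs mapping whose entry for ANY language is tokenize-only.

    The defaultdict creates a fresh {"processors": "tokenize"} dict the first
    time a language code is looked up, so no language has to be listed in advance.
    """
    return defaultdict(lambda: {"processors": "tokenize"})


def make_any_language_tokenize_pipeline():
    """Hypothetical helper: a MultilingualPipeline that should only tokenize.

    Assumption: the installed Stanza looks up lang_configs[lang] in a way that
    triggers the defaultdict's default (the change discussed in this thread).
    """
    from stanza.pipeline.multilingual import MultilingualPipeline

    return MultilingualPipeline(lang_configs=tokenize_only_default_configs())
```

Using a factory (the `lambda`) rather than one shared dict means each language gets its own config object, so any per-language changes made to one entry do not leak into the others.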
Hello,
I'm struggling to figure out how to avoid loading all the processors in a multilingual context.
When I do:
I get the following processors for the detected language:
But I only need to load and use the "tokenize" processor. How can I avoid loading all the other processors?
Note that I don't know the language of the input text in advance.
Thanks in advance for any help :)