Language and Locale #19
Comments
There are additional models available on Hugging Face's model hub. There are also additional models provided by the Sentence Transformers team. #17 has an example of how to use one of these.
Thanks for the answer, I will try a couple of different models from the hub! On a related note, do you have an example of how to pass a tokenizer, so that the default one is not used? The default is set at: txtai/src/python/txtai/extractor.py Line 33 in c85256f
For the tokenizer, you can pass any object that implements the method: def tokenize(text). In this method, you can split the text as you see fit. It could be as simple as text.split(). There is also a lower-level abstraction called Pipeline, which you may find more useful if you want to skip the tokenization/content filtering process altogether.
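As a rough illustration of the tokenizer contract described above, here is a minimal sketch. The class name SimpleTokenizer and the splitting rule are assumptions made up for this example, not part of txtai; the only requirement taken from this thread is a tokenize(text) method. How the object is handed to the extractor is not shown here, since that depends on the txtai version in use.

```python
import re


class SimpleTokenizer:
    """Hypothetical tokenizer: lowercases text and splits on non-word characters."""

    @staticmethod
    def tokenize(text):
        # \w is Unicode-aware in Python 3, so accented letters stay inside tokens.
        # Empty strings produced by leading/trailing separators are dropped.
        return [token for token in re.split(r"\W+", text.lower()) if token]


# Works on non-English text without any language-specific stop word list
tokens = SimpleTokenizer.tokenize("Où est la bibliothèque ?")
print(tokens)
```

Because tokenize is a static method, it can be called on either the class or an instance, so it does not matter which of the two is passed.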
Thanks a lot 👍
Dear committers,
I would like to use txtai for search queries, but my content is not in English. Are there parameters that can be provided to improve the results based on language and locale?
Thanks,