Any plan to support no-whitespace language? #443

Closed
faisalron opened this issue Oct 8, 2021 · 9 comments

Comments

@faisalron

I am planning to use Rubrix for Japanese text data.
The search functionality doesn't seem to work well for this language.
I think it would be better if we could customize the tokenizer used in Elasticsearch instead of relying on the hardcoded "whitespace" tokenizer.
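
To illustrate the problem, here is a minimal sketch in plain Python (not Rubrix or Elasticsearch code) of what whitespace-based splitting does to a Japanese sentence. The helper name `whitespace_tokenize` and the sample sentences are made up for illustration, and `str.split()` only approximates the Elasticsearch "whitespace" tokenizer.

```python
from typing import List


def whitespace_tokenize(text: str) -> List[str]:
    """Split on runs of whitespace (rough stand-in for a whitespace tokenizer)."""
    return text.split()


english = "the weather is nice today"
japanese = "今日は天気がいいです"  # same meaning, written without spaces

print(whitespace_tokenize(english))   # ['the', 'weather', 'is', 'nice', 'today']
print(whitespace_tokenize(japanese))  # ['今日は天気がいいです']
```

Since the whole Japanese sentence ends up as a single token, a term-level search for one word inside it would most likely not match.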

@frascuchon
Member

frascuchon commented Oct 13, 2021

Hi @faisalron

Thanks for your comment.

Customizing Elasticsearch index settings and mappings is a feature on our medium-term roadmap, but we have some tech debt and design considerations that we must address before applying these customizations.

Anyway, we could make some minimal changes to allow Elasticsearch customizations outside of Rubrix by disabling automatic index template creation here

We could introduce an environment variable to enable or disable automatic index template creation, giving more control to experienced users who would like to customize some aspect of Elasticsearch.

Would you like to tackle that? In that case, I can open an issue with some code guidance.
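
As a rough sketch of the idea, the toggle could look something like the following. This is not the actual Rubrix implementation: the function name `template_creation_enabled` and the set of accepted truthy values are assumptions, and the variable name `DISABLE_ES_INDEX_TEMPLATE_CREATION` is the one mentioned further down in this thread.

```python
import os
from typing import Mapping, Optional

# Values treated as "true" are an assumption made for this sketch.
_TRUTHY = {"1", "true", "yes", "on"}


def template_creation_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return False when the user asked to skip automatic template creation."""
    env = os.environ if env is None else env
    raw = env.get("DISABLE_ES_INDEX_TEMPLATE_CREATION", "false")
    return raw.strip().lower() not in _TRUTHY


if template_creation_enabled():
    print("would create the default index template")
else:
    print("skipping template creation; user-managed templates apply")
```

Passing the environment as a parameter keeps the check easy to test without touching the real process environment.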

@frascuchon
Member

Hi @faisalron

You can try configuring your own Elasticsearch index templates and disabling automatic index template creation in Rubrix using the DISABLE_ES_INDEX_TEMPLATE_CREATION env var.
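
For example, a user-managed template for Japanese text might be built along these lines. This is only a sketch under several assumptions: the index pattern `my-rubrix-records-*` and the field name `text` are hypothetical placeholders (check the index names and mappings your Rubrix version actually creates), the body follows the legacy `_template` layout, and `kuromoji_tokenizer` requires the Elasticsearch `analysis-kuromoji` plugin to be installed.

```python
import json

INDEX_PATTERN = "my-rubrix-records-*"  # hypothetical placeholder, not Rubrix's real pattern
TEXT_FIELD = "text"                    # hypothetical placeholder field name


def build_japanese_template(index_pattern: str, text_field: str) -> dict:
    """Build an index template body that analyzes one text field with kuromoji."""
    return {
        "index_patterns": [index_pattern],
        "settings": {
            "analysis": {
                "analyzer": {
                    # Custom analyzer backed by the kuromoji plugin's tokenizer
                    "ja_analyzer": {
                        "type": "custom",
                        "tokenizer": "kuromoji_tokenizer",
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                text_field: {"type": "text", "analyzer": "ja_analyzer"}
            }
        },
    }


template = build_japanese_template(INDEX_PATTERN, TEXT_FIELD)
# Print the JSON body; sending it to Elasticsearch is left to the reader.
print(json.dumps(template, indent=2, ensure_ascii=False))
```

The printed JSON would then be registered in Elasticsearch before starting Rubrix with `DISABLE_ES_INDEX_TEMPLATE_CREATION` set, so that Rubrix does not replace it with its own template.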

@faisalron
Author

@frascuchon Thank you so much for your quick response.
Let me try it and I'll let you know if anything comes up.

@faisalron
Author

@frascuchon I tried to work with the env var and it works perfectly. Thanks

@frascuchon
Member

Great @faisalron

We would appreciate a little tutorial explaining an example of use with this elasticsearch customization from rubrix. Would you like to try it?

@faisalron
Author

Hi @frascuchon

> We would appreciate a little tutorial explaining an example of use with this elasticsearch customization from rubrix. Would you like to try it?

Sure, I would love to if you don't mind.

However, I'm sorry: I closed the issue the other day because I thought it worked fine, but it turns out that my index template is being overwritten, because this line hasn't been modified yet:
https://github.com/recognai/rubrix/blob/master/src/rubrix/server/tasks/commons/dao/dao.py#L121

@frascuchon
Member

It should work now from master. Could you check it, @faisalron?

@dvsrepo
Member

dvsrepo commented Nov 5, 2021

Hi @faisalron, did you manage to take a look at this? Let us know if you need support.

@frascuchon
Member

Closing due to inactivity.
