Any plan to support no-whitespace language? #443

faisalron · 2021-10-08T08:08:33Z

I am planning to use rubrix for Japanese text data.
The search functionality doesn't seem to work well for this language.
I think it would be better if we could customize the tokenizer used in elasticsearch instead of relying on the hardcoded "whitespace" tokenizer.
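As a rough illustration of why a whitespace tokenizer struggles here, the sketch below approximates whitespace tokenization with Python's `str.split()`. This is only a stand-in for the elasticsearch tokenizer, and the sample sentences and the `whitespace_tokenize` helper are made up for the example.

```python
def whitespace_tokenize(text):
    """Approximate a whitespace tokenizer: split on runs of whitespace."""
    return text.split()


# Sample sentences (illustrative only).
english = "I live in Tokyo"
japanese = "私は東京に住んでいます"  # same meaning, written without spaces

en_tokens = whitespace_tokenize(english)
ja_tokens = whitespace_tokenize(japanese)

# English yields one token per word, so a term query for "Tokyo" can match.
print(en_tokens)

# Japanese yields a single token holding the whole sentence, so a term
# query for "東京" (Tokyo) has no matching token.
print(len(ja_tokens), "東京" in ja_tokens)
```

A tokenizer that performs morphological analysis would instead split the Japanese sentence into word-like units, which is what makes term search usable.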

frascuchon · 2021-10-13T08:56:27Z

Hi @faisalron

Thanks for your comment.

Customizing elasticsearch index settings and mappings is a feature on our medium-term roadmap, but we have some tech debt and design considerations that we must address before supporting these customizations.

Anyway, we could make some minimal changes to allow elasticsearch "out-of-rubrix" customizations by disabling automatic index template creation here

We could introduce an environment variable to enable/disable automatic index template creation, giving more control to experienced users who would like to customize some aspects of elasticsearch.
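A minimal sketch of what such a switch could look like, not the actual rubrix implementation. The variable name is the one mentioned later in this thread; the helper name `template_creation_disabled` and the set of accepted truthy values are assumptions made for illustration.

```python
import os

# Name taken from later in this thread; the parsing rules below are assumed.
ENV_VAR = "DISABLE_ES_INDEX_TEMPLATE_CREATION"


def template_creation_disabled(env=None):
    """Return True when the flag is set to a truthy value (hypothetical helper)."""
    env = os.environ if env is None else env
    value = env.get(ENV_VAR, "false")
    return value.strip().lower() in {"1", "true", "yes"}


# The server would skip its own template setup when the flag is on.
if template_creation_disabled({ENV_VAR: "true"}):
    print("skipping automatic index template creation")
else:
    print("creating default index templates")
```

Passing the environment as an argument keeps the check easy to exercise without touching the real process environment.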

Would you like to take that on? If so, I can open an issue with some code pointers.

frascuchon · 2021-10-18T08:25:33Z

Hi @faisalron

You can try configuring your own elasticsearch index templates and disabling automatic index template creation in rubrix using the DISABLE_ES_INDEX_TEMPLATE_CREATION env var.
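A hedged sketch of a custom index template for Japanese text, assuming the elasticsearch `analysis-kuromoji` plugin is installed (it provides `kuromoji_tokenizer`). The body uses the legacy template layout (`index_patterns`, `settings`). The index pattern shown is a placeholder, not the pattern rubrix actually uses, and fields whose mappings name an explicit analyzer would not pick up the index default.

```python
import json


def build_index_template(index_pattern):
    """Build a template body whose default analyzer uses kuromoji.

    `index_pattern` is a placeholder; check the real rubrix index names
    before applying anything like this.
    """
    return {
        "index_patterns": [index_pattern],
        "settings": {
            "analysis": {
                "analyzer": {
                    # "default" is used for fields without an explicit analyzer.
                    "default": {
                        "type": "custom",
                        "tokenizer": "kuromoji_tokenizer",
                    }
                }
            }
        },
    }


# Hypothetical pattern, for illustration only.
template = build_index_template("my-rubrix-records-*")
print(json.dumps(template, indent=2))
```

The printed JSON could then be sent to elasticsearch as a template body, with DISABLE_ES_INDEX_TEMPLATE_CREATION set on the rubrix server so the custom template is not replaced.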

faisalron · 2021-10-19T07:52:38Z

@frascuchon Thank you so much for your quick response.
Let me try it and I'll let you know if anything comes up.

faisalron · 2021-10-19T10:03:04Z

@frascuchon I tried to work with the env var and it works perfectly. Thanks

frascuchon · 2021-10-20T15:06:51Z

Great @faisalron

We would appreciate a little tutorial explaining an example of use with this elasticsearch customization from rubrix. Would you like to try it?

faisalron · 2021-10-21T09:54:02Z

Hi @frascuchon

> We would appreciate a little tutorial explaining an example of use with this elasticsearch customization from rubrix. Would you like to try it?

Sure, I would love to if you don't mind.

However, I'm sorry: the other day I closed the issue because I thought it worked fine, but it turns out that my index template is still being overwritten, because this line hasn't been modified yet:
https://github.com/recognai/rubrix/blob/master/src/rubrix/server/tasks/commons/dao/dao.py#L121

frascuchon · 2021-10-22T14:59:30Z

It should work now on master. Could you check it, @faisalron?

dvsrepo · 2021-11-05T12:28:50Z

Hi @faisalron , did you manage to take a look at this? Let us know if you need support

frascuchon · 2022-01-24T13:27:57Z

Closing due to inactivity.

frascuchon mentioned this issue Oct 13, 2021

Allow disabling ES index template creation by environment var #455

Closed

torkashvand mentioned this issue Oct 14, 2021

Allow disabling ES index template creation by environment var #469

Merged

faisalron closed this as completed Oct 19, 2021

faisalron reopened this Oct 21, 2021

frascuchon mentioned this issue Oct 22, 2021

fix(server): prevent index template recreation for records index #497

Merged

frascuchon closed this as completed Jan 24, 2022
