Stop using cached components in the _analyze API #19827

Closed
nik9000 opened this issue Aug 5, 2016 · 0 comments
Labels
blocker :Search/Analysis How text is split into tokens v5.0.0-beta1

Comments


nik9000 commented Aug 5, 2016

We'd like to simplify AnalysisService and remove most of its members (#19814), possibly reducing it to just a Map<String, Analyzer>. To do that, we should remove the calls to tokenizer, charFilter, and tokenFilter from the _analyze API and instead build these components on the fly for each request. This will make some calls to the _analyze API slower, but it will reduce the per-index heap overhead.
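To illustrate the trade-off, here is a minimal sketch of building components per request from lightweight providers instead of looking up pre-built instances. Everything in it is a hypothetical stand-in: `OnTheFlyAnalyze`, the `TOKENIZERS` and `TOKEN_FILTERS` maps, and the `componentsBuilt` counter are assumptions made for illustration, not Elasticsearch or Lucene types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-in for the idea in this issue; not Elasticsearch code.
final class OnTheFlyAnalyze {
    // Only cheap providers are kept around. Nothing is built until a request asks for it.
    static final Map<String, Supplier<Function<String, List<String>>>> TOKENIZERS = new HashMap<>();
    static final Map<String, Supplier<Function<String, String>>> TOKEN_FILTERS = new HashMap<>();

    // Counts how many components were actually constructed (illustration only).
    static int componentsBuilt = 0;

    static {
        TOKENIZERS.put("whitespace", () -> {
            componentsBuilt++;
            return text -> Arrays.asList(text.trim().split("\\s+"));
        });
        TOKEN_FILTERS.put("lowercase", () -> {
            componentsBuilt++;
            return token -> token.toLowerCase(Locale.ROOT);
        });
    }

    // Builds the requested tokenizer and filters for this call only, then discards them.
    static List<String> analyze(String tokenizerName, List<String> filterNames, String text) {
        Supplier<Function<String, List<String>>> tokenizerProvider = TOKENIZERS.get(tokenizerName);
        if (tokenizerProvider == null) {
            throw new IllegalArgumentException("unknown tokenizer [" + tokenizerName + "]");
        }
        Function<String, List<String>> tokenizer = tokenizerProvider.get();

        List<Function<String, String>> filters = new ArrayList<>();
        for (String name : filterNames) {
            Supplier<Function<String, String>> filterProvider = TOKEN_FILTERS.get(name);
            if (filterProvider == null) {
                throw new IllegalArgumentException("unknown token filter [" + name + "]");
            }
            filters.add(filterProvider.get());
        }

        List<String> result = new ArrayList<>();
        for (String token : tokenizer.apply(text)) {
            for (Function<String, String> filter : filters) {
                token = filter.apply(token);
            }
            result.add(token);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(analyze("whitespace", Arrays.asList("lowercase"), "Quick Brown FOX"));
    }
}
```

Each call pays the construction cost again, which is the slowdown mentioned above, but no component instance outlives the request.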

@nik9000 nik9000 added blocker :Search/Analysis How text is split into tokens v5.0.0-beta1 labels Aug 5, 2016
johtani added a commit to johtani/elasticsearch that referenced this issue Aug 10, 2016
Stop calling tokenizer/tokenFilters/charFilter methods of IndexService
Add some getAnalysisProvider methods
Change SynonymTokenFilterFactory constructor

Closes elastic#19827
johtani added a commit to johtani/elasticsearch that referenced this issue Aug 12, 2016
s1monw added a commit to s1monw/elasticsearch that referenced this issue Sep 22, 2016
…ping

Today we hold on to all possible tokenizers, token filters, etc. when we create
an index service on a node. This was mainly done to allow the `_analyze` API to
directly access all these primitives. We fixed this in elastic#19827 and can now get rid of
the AnalysisService entirely and replace it with a simple map-like class. This
ensures we don't create a gazillion long-lived objects that are entirely useless since
they are never used in most of the indices. Those objects might also consume a considerable
amount of memory since they might load stopwords, synonyms, etc.

Closes elastic#19828
s1monw added a commit that referenced this issue Sep 23, 2016
…ping (#20627)

Today we hold on to all possible tokenizers, token filters, etc. when we create
an index service on a node. This was mainly done to allow the `_analyze` API to
directly access all these primitives. We fixed this in #19827 and can now get rid of
the AnalysisService entirely and replace it with a simple map-like class. This
ensures we don't create a gazillion long-lived objects that are entirely useless since
they are never used in most of the indices. Those objects might also consume a considerable
amount of memory since they might load stopwords, synonyms, etc.

Closes #19828
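
As a rough sketch of what a "simple map-like class" could look like, the holder below keeps only the analyzers an index actually configures and nothing else. The name `IndexAnalyzersSketch` and the modeling of an analyzer as a `Function<String, String[]>` are assumptions for illustration; the class introduced by the commit is not shown in this thread.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration; not the class added by the referenced commit.
final class IndexAnalyzersSketch {
    // Only analyzers named in the index configuration are stored.
    private final Map<String, Function<String, String[]>> analyzers;

    IndexAnalyzersSketch(Map<String, Function<String, String[]>> configured) {
        // Defensive copy so later changes to the caller's map do not leak in.
        this.analyzers = Collections.unmodifiableMap(new LinkedHashMap<>(configured));
    }

    // Returns the named analyzer, or null if the index does not configure it.
    Function<String, String[]> get(String name) {
        return analyzers.get(name);
    }

    Set<String> names() {
        return analyzers.keySet();
    }

    public static void main(String[] args) {
        Map<String, Function<String, String[]>> configured = new LinkedHashMap<>();
        configured.put("my_lowercase", text -> text.toLowerCase(Locale.ROOT).split("\\s+"));

        IndexAnalyzersSketch index = new IndexAnalyzersSketch(configured);
        System.out.println(index.names());
        System.out.println(String.join(",", index.get("my_lowercase").apply("Hello World")));
    }
}
```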