Move parent/child over from id cache to field data #4930

Closed
martijnvg opened this issue Jan 28, 2014 · 0 comments

Comments

@martijnvg
Member

Move all parent/child queries (has_child, has_parent, top_children) from id cache to field data. This has a number of advantages:

  • Field data reduces the parent/child memory footprint compared to the id cache. The id cache uses concrete object arrays to store the parent ids, which wastes memory compared to field data, which stores the parent ids in native byte arrays (via Lucene's PagedBytes). Initial benchmarks show that parent/child memory usage can be reduced by up to half with field data.
  • Parent/child can use paged data structures, because field data uses paged data structures under the hood as well. This improves stability at the JVM level because there is less garbage collection: the storage behind paged data structures is reused between requests, and paged data structures generally take less memory than the concrete object arrays in the id cache.
  • By reusing field data, parent/child can reuse its infrastructure, for example using the CircuitBreaker to fail search requests if too much memory is being spent on parent/child rather than running out of memory.
  • The id cache is similar to field data in the sense that it loads field values into memory. Removing the id cache removes a lot of duplicate logic and code.
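
To make the memory and circuit-breaker points above more concrete, here is a minimal sketch in Java. It is not Elasticsearch or Lucene code: `PackedIdStore`, its methods and the `maxBytes` budget are hypothetical names made up for illustration. It uses one growing byte array plus an offsets array rather than real pages, so it only approximates what PagedBytes does. The idea it shows is that ids stored as raw bytes avoid one object per id, and that a byte budget lets a request fail early instead of exhausting the heap.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration only; not part of Elasticsearch or Lucene. */
final class PackedIdStore {
    // Budget for the id bytes only (the offsets array is not counted here).
    // It plays the role that a circuit breaker limit would play.
    private final int maxBytes;
    private byte[] data = new byte[16];
    // Id i occupies data[offsets[i]] .. data[offsets[i + 1] - 1].
    private int[] offsets = new int[8];
    private int count = 0;

    PackedIdStore(int maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Appends an id as UTF-8 bytes and returns its ordinal. */
    int add(String id) {
        byte[] bytes = id.getBytes(StandardCharsets.UTF_8);
        int start = offsets[count];
        long needed = (long) start + bytes.length;
        if (needed > maxBytes) {
            // Fail this request instead of letting the JVM run out of memory.
            throw new IllegalStateException(
                "id data would use " + needed + " bytes, limit is " + maxBytes);
        }
        if (needed > data.length) {
            // Grow by doubling, but never beyond the budget.
            int newSize = (int) Math.min(maxBytes, Math.max(needed, data.length * 2L));
            data = Arrays.copyOf(data, newSize);
        }
        System.arraycopy(bytes, 0, data, start, bytes.length);
        if (count + 2 > offsets.length) {
            offsets = Arrays.copyOf(offsets, offsets.length * 2);
        }
        offsets[count + 1] = (int) needed;
        return count++;
    }

    /** Decodes the id stored under the given ordinal. */
    String get(int ordinal) {
        if (ordinal < 0 || ordinal >= count) {
            throw new IndexOutOfBoundsException("no id with ordinal " + ordinal);
        }
        int start = offsets[ordinal];
        int end = offsets[ordinal + 1];
        return new String(data, start, end - start, StandardCharsets.UTF_8);
    }

    /** Number of id bytes currently stored. */
    int usedBytes() {
        return offsets[count];
    }

    /** Number of ids currently stored. */
    int size() {
        return count;
    }
}
```

Compared to a `String[]` holding the same ids, this keeps two arrays alive regardless of how many ids are stored, which is the kind of saving the first bullet describes. A real paged structure would additionally split `data` into fixed-size pages that can be recycled between requests.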

These advantages come at the cost of a small performance loss of up to 10% in query execution time, but the gains in stability, predictability (fewer sudden garbage collections) and memory usage outweigh that loss.

The id cache can then be removed, since nothing else inside ES uses it. For backward compatibility reasons, in 1.x releases the id cache statistics will be reported as before, but they will be based on the _parent field in field data, and the _parent field will not be reported in the field data statistics.
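
The statistics split described above could look roughly like the sketch below. This is an assumption about the shape of the logic, not the actual implementation: `ParentStatsSplit` and `split` are hypothetical names, and the input is assumed to be a map from field name to the bytes that field uses in field data.

```java
import java.util.Map;

/** Hypothetical illustration only; not the actual Elasticsearch stats code. */
final class ParentStatsSplit {
    /**
     * Splits per-field field data memory into the two reported totals.
     * Returns {idCacheBytes, fieldDataBytes}: memory used by the _parent
     * field is reported as id cache, everything else as field data.
     */
    static long[] split(Map<String, Long> bytesByField) {
        long idCacheBytes = 0;
        long fieldDataBytes = 0;
        for (Map.Entry<String, Long> entry : bytesByField.entrySet()) {
            if ("_parent".equals(entry.getKey())) {
                idCacheBytes += entry.getValue();
            } else {
                fieldDataBytes += entry.getValue();
            }
        }
        return new long[] {idCacheBytes, fieldDataBytes};
    }
}
```

With this split, the _parent memory is counted once, under the id cache figure, so existing monitoring that reads the id cache statistics keeps working in 1.x.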

@ghost ghost assigned martijnvg Jan 28, 2014
martijnvg added a commit that referenced this issue Feb 26, 2014
… to use paging data structures (BytesRefHash, BigFloatArray, BigIntArray) instead of hppc maps / sets.

Also removed the id cache.

Closes #4930