Don't create mapping entry for dynamic templates #6619

Closed
mikemccand opened this issue Jun 25, 2014 · 5 comments

Comments

@mikemccand
Contributor

Today, when a new field name shows up in a document matching a dynamic template, we record that field name, its type information, etc.

But if you add many, many fields this way, the mappings become very large and serializing them into the cluster state becomes very costly.

I think we may be able to get away with not making a mapping entry and just re-matching that same field against the dynamic templates the next time it appears? Or maybe make mapping entries only up to a limit.
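To make the idea concrete, here is a minimal sketch in Python, not Elasticsearch code. The class name `DynamicFieldResolver`, the `max_entries` parameter, and the `(pattern, type)` template shape are all hypothetical, and wildcard matching via `fnmatchcase` is only a stand-in for real dynamic template matching. It shows a field being re-matched on each lookup, with concrete entries stored only up to a limit.

```python
from fnmatch import fnmatchcase


class DynamicFieldResolver:
    """Hypothetical sketch: resolve field types by re-matching templates."""

    def __init__(self, templates, max_entries=0):
        # templates: ordered list of (name_pattern, field_type); first match wins.
        self.templates = list(templates)
        # max_entries == 0 means no concrete entries are ever stored.
        self.max_entries = max_entries
        # Concrete per-field entries: the part that grows with the field count.
        self.mappings = {}

    def resolve(self, field_name):
        # Fast path: a concrete entry was already recorded for this field.
        if field_name in self.mappings:
            return self.mappings[field_name]
        for pattern, field_type in self.templates:
            if fnmatchcase(field_name, pattern):
                # Record an entry only while under the limit; past it,
                # the field is simply re-matched on every lookup.
                if len(self.mappings) < self.max_entries:
                    self.mappings[field_name] = field_type
                return field_type
        return None  # no template matched this field name


resolver = DynamicFieldResolver(
    [("*_count", "long"), ("tag_*", "string")], max_entries=2
)
print(resolver.resolve("hits_count"))
```

The trade-off in this sketch is that lookups past the limit pay for a template scan every time, while the stored state stays bounded.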

@mikemccand
Contributor Author

I spent some time looking at the mapping code, but I don't understand it well enough to make progress here... can someone who knows ObjectMapper.java give some pointers?

I tried commenting out the putMapper(mapper) and context.setMappingsModified() calls at the end of parseDynamicValue, but this makes many tests angry...

@areek areek assigned rjernst and unassigned rjernst Jun 29, 2014
@kimchy
Member

kimchy commented Jun 30, 2014

This will be a rather big change, since we would also need to change every place that looks up a mapping (for search and such). I think that concrete mappings, even with dynamic templates, are very valuable; for example, Kibana can then auto-suggest existing fields and such.

I think that there are a lot of improvements that we can add to ES even when it concretely creates mappings. One is this: #6648. The other is potentially to move from update-on-write data structures (which have a better concurrency story) to update-in-place concurrent data structures above a certain threshold. Based on my tests, I think we can get to very good performance while still maintaining the concrete mappings case.
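As a rough illustration of the threshold idea, here is a minimal Python sketch, not the actual ES data structure. It assumes "update on write" means copying the whole map on every write; the class `ThresholdMap` and its `threshold` parameter are hypothetical names. Below the threshold each write publishes a fresh copy, so readers holding an old snapshot are never disturbed; above it, writes mutate the map in place to avoid copying a large map on every new field.

```python
import threading


class ThresholdMap:
    """Hypothetical sketch: copy-on-write while small, in-place once large."""

    def __init__(self, threshold):
        self.threshold = threshold
        self._data = {}
        self._lock = threading.Lock()  # serializes writers only

    def put(self, key, value):
        with self._lock:
            if len(self._data) < self.threshold:
                # Small map: copy, modify the copy, then publish it.
                # Cost is O(n) per write, but old snapshots stay untouched.
                new_data = dict(self._data)
                new_data[key] = value
                self._data = new_data
            else:
                # Large map: mutate in place, O(1) per write, but
                # snapshots taken from now on can change under readers.
                self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def snapshot(self):
        # Returns the current underlying dict without copying it.
        return self._data


fields = ThresholdMap(threshold=2)
fields.put("title", "string")
print(fields.get("title"))
```

A real implementation would need a concurrent map for the in-place phase; a plain dict behind a writer lock is only the simplest stand-in.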

The cluster state is the place that will suffer, for example when someone has 1 million fields. But I think that this is simply abusing the system, and things will break in other places too (in terms of resources used, ...), not just in mappings.

@kimchy
Member

kimchy commented Jul 3, 2014

update

@kimchy kimchy removed v1.3.0 labels Jul 5, 2014
@kimchy
Member

kimchy commented Jul 5, 2014

#6707 has been pushed as well. I think we are in a good state performance-wise, so I am closing this for now; we can open a new issue if this is still a problem.

@kimchy kimchy closed this as completed Jul 5, 2014
@kimchy
Member

kimchy commented Jul 21, 2014

#6843 is another one that helps a lot with memory usage here.
