Queries cannot be sorted by a field using its defined "index_name" #8980
Comments
Indeed. Setting
It gives:
That said, I'm wondering whether you would be better off looking at
Well it doesn't seem like it's ignored in another scenario I have. That one was a gist I found that seemed to reproduce my same problem, but my original problem is this. I have a dynamic template like this (sorry for lack of valid json here, I just copied it from the head plugin which has formatted it):
That template is used with a document that looks like this: { id: 123, textfields: { summary: "hello" } } And it yields a mapping like this (note the index_name of summary_sort seems to be working):
But it does not like it when I sort by "summary_sort". I have this 'textfields' container for the fields I want to have analyzed, and doing that allows my dynamic template to target them easily by path (I have other string fields that are not analyzed that go into a string fields container). But I don't want to search for them by that nested path, hence my attempt to use index_name to hide the fact they are nested fields from searches.
So anyway, your finding that index_name is ignored in a subfield doesn't seem to hold in my example here, which seems odd.
I thought I had seen somewhere that copy_to was obsolete. I'd be happy to use it though, as long as I can accomplish my goal of 'hiding' the nested path of "textfields.summary_sort". I want it to be just "summary_sort"... can copy_to do that? Thanks so much for your help!
Should work.
Well that'd be awesome! Is there any concern about it being less performant though since copy_to seems to imply we'll have two instances of the data? Or is it just another reference to the same data? Not in a position to try it at the moment, so I will try it out tomorrow and update this issue :) If that works though is this still a legit bug? I'm curious if you were able to reproduce my 2nd mapping having index_name appear in the mapping via the dynamic template, and then still not being able to sort on it (but search/sort on the analyzed version works).
It's a copy of the data, so you can index it in a different way, using another analyzer. It's more of a workaround, I think, as IMO what you described initially looks like a bug, but I'd love to hear @clintongormley's thoughts as well to confirm or refute this. :)
First,
Note: multi-fields (or the new sub-fields) and copy-to do duplicate content as they create an index for each field name. That said, you can set the original (source) field to
However, your intended goal is to be able to refer to these fields without the
While you could have a template which matches
I think you're stuck...
I see. So, I have a document with a mix of fields of different types, and of the string fields some should be analyzed and some should not. I also do not know the names of the fields ahead of time, hence the dynamic template (and there could also be more fields added to documents at any time). I also have containers for numeric fields, date fields, etc, because I want that explicit mapping that "this is a date" without relying on date detection, etc.
If there were a way to create a dynamic mapping that let me differentiate fields I want analyzed from fields I do not, even though they are the same type (string), and that didn't force me to use a field prefix/suffix in searches (in the document is ok), then I could maybe get away without having the 'textfields' container or any other container. Index_name seemed to be the perfect solution to that. Basically, index_name and dynamic templates were a great pair, because it meant you could use something to hang your template on, without having to dictate the structure of the fields in queries.
I'm quite unsure what to do now... I've invested many sprints in rewriting a search system from RavenDB to Elasticsearch, and now I'm not even sure it's possible to support our needs. Not only is index_name not being honored in my case, but it's being completely removed... I beg you to reconsider, or at least validate that my scenario is one you want to support, so I have hope for future versions. Is there a way I can dictate within the document what the mapping should be? I'll do anything at this point... :(
How do you know on the RavenDB side which fields you want analyzed and which fields you don't? What happens if you have duplicate field names, but with different mapping requirements?
With #8870 you are going to have to use the full path to reference fields, no longer the short name, but you can still use wildcards, eg
Are you allowing your users to specify their own queries using the query DSL, or are you providing your own API and generating the DSL for them? If the latter, then all you have to do is maintain a field to namespace.field mapping in your application (which can be refreshed on restart with a GET mapping request); rewriting field names to their namespaced variety will then be easy. If you're exposing the whole DSL then it is still possible, but will take a lot more work to get it right.
I think that your current design will prove to be flawed in the long term. While it may work with your current requirements, later you'll want to do other stuff like retrieve the docs from Elasticsearch, or run aggregations, or highlight on fields etc, and you'll end up with this complicated scheme where the fields in your docs have no relationship with the fields in Elasticsearch.
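For anyone landing here later, the "field to namespace.field" lookup suggested above could be sketched roughly as follows. This is only an illustration: the function names and the sample mapping are hypothetical, and it assumes you have already fetched the `properties` section of the mapping (for example via a GET mapping request) as a plain dict.

```python
from typing import Dict, Iterator, List


def iter_field_paths(properties: Dict[str, dict], prefix: str = "") -> Iterator[str]:
    """Yield the full dotted path of every leaf field in a mapping's 'properties'."""
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            # Object field: descend into its children.
            yield from iter_field_paths(spec["properties"], prefix=path + ".")
        else:
            yield path


def build_short_name_index(properties: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each short field name to every full path that ends with it."""
    index: Dict[str, List[str]] = {}
    for path in iter_field_paths(properties):
        short = path.rsplit(".", 1)[-1]
        index.setdefault(short, []).append(path)
    return index


def resolve_field(short_name: str, index: Dict[str, List[str]]) -> str:
    """Return the single full path for a short name, or fail loudly."""
    paths = index.get(short_name, [])
    if len(paths) != 1:
        raise KeyError(f"{short_name!r} is unknown or ambiguous: {paths}")
    return paths[0]


# Hypothetical mapping shaped like the 'textfields' container in this thread.
SAMPLE_PROPERTIES = {
    "id": {"type": "long"},
    "textfields": {"properties": {"summary": {"type": "string"}}},
    "stringfields": {
        "properties": {"status": {"type": "string", "index": "not_analyzed"}}
    },
}

FIELD_INDEX = build_short_name_index(SAMPLE_PROPERTIES)
```

With this in place, the query generator would call `resolve_field("summary", FIELD_INDEX)` before emitting a field name, and an ambiguous short name surfaces as an error instead of silently picking one path.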
I don't. In fact, we have a hard-coded list of full text fields in the index definition, and we often need to manually fix it when someone needs a new one. Part of rewriting it towards ES was to hopefully get rid of that.
Right now there's absolutely a problem if different projects have the same field name with different mapping types. Thankfully though that just hasn't been an actual problem we ran into. But I was hoping to solve that by separating each project into its own Type in ES. They each get their own mapping. It's only a problem then if they actually try to search or do something with that field across all the types, which is ok, we can live with that. But most searches are within a single type, so that's ok. I've been assuming that ES is ok with two types in an index having the same field name with different mappings. Is that not the case? (Again though, I understand it may have issues with cross type searches etc.)
Since the only reason I had the structure in the document was to hang a path match onto it in a dynamic template, I shall have to switch away from dynamic templates. I think I can do that, but it means that I will have to generate a specific tailored mapping dynamically (and know when I need to amend it). It would be really awesome if dynamic templates were more flexible though, it would save a lot of complexity.
We abstract Lucene away from the user. They are basically writing TSQL-like where clauses using a custom syntax we defined. We take that string, tokenize it, and generate a Lucene query string.
Yeah, I could have a flat list of all the field names across all the types and expand the namespace like you said. If they are searching across all types and there's a conflict, I can't really do that, but we've already established that just can't logically work so that's ok. I will think on this...
Fair enough, I don't want that. All I want is to define a mapping that works for my dynamic schema :) I think I can either (1) generate a non-dynamic mapping from each project configuration (and maintain the mapping by amending it if a field is added -- which is a lot of complexity because there are constant reindexes occurring as documents change and I will need to coordinate the mapping update), or (2) expand field names as you suggested. Either solution will get me unstuck, but both add complexity that I didn't realize I would end up having due to index_name being removed.
In short, I hope that ES could improve on the options we have for dynamic schemas and dynamic mappings. It doesn't have to be index_name, just something that can allow me to map my fields correctly without having to introduce search/sort-breaking structure to my documents. Perhaps a hint field in the document, or the ability to use a prefix on the field name that can be stripped off but matched on... or what have you. I'd be open to anything that makes it easier. Please! :) Thanks for your time and attention, I greatly appreciate it.
Are you planning on moving off RavenDB completely, or using Elasticsearch in conjunction with it? Either way, the manual list of fields is a good approach - that way you have complete control over the mapping, rather than having to try to munge things with dynamic mapping.
This is a problem: fields with the same name in different types are the same field! This is the source of numerous problems, just see how many tickets are linked to #4081. With #8870 we are planning to enforce the requirement that fields with the same name in the same index have the same mapping. You will have to use a separate index for these different projects, rather than separate types.
Actually this isn't very complex at all. You will need to create a new index with the appropriate mappings when you reindex anyway, so it should be very easy to generate the mappings for each field as part of the same process. (The requirement to have separate projects in separate indices actually makes this step easier too)
I don't think there is any more to do here. Closing this ticket
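A rough sketch of what "generate the mappings for each field as part of the same process" might look like, for readers in a similar position. The "kind" labels (`text`, `keyword`, `date`, `number`) and the `build_mapping` helper are made up for this example; the field definitions use the 1.x-era `string` / `not_analyzed` settings discussed in this thread. Treat it as a starting point under those assumptions, not as the recommended implementation.

```python
from typing import Dict

# Hypothetical "kind" labels from a per-project field list, mapped to
# Elasticsearch 1.x-style field definitions.
FIELD_KINDS: Dict[str, dict] = {
    "text": {"type": "string"},  # analyzed (the 1.x default for strings)
    "keyword": {"type": "string", "index": "not_analyzed"},
    "date": {"type": "date"},
    "number": {"type": "double"},
}


def build_mapping(project_fields: Dict[str, str]) -> dict:
    """Build an explicit mapping body from {field name: kind}."""
    properties = {}
    for name, kind in sorted(project_fields.items()):
        if kind not in FIELD_KINDS:
            raise ValueError(f"unknown kind {kind!r} for field {name!r}")
        # Copy the template so callers cannot mutate FIELD_KINDS by accident.
        properties[name] = dict(FIELD_KINDS[kind])
    return {"properties": properties}


# Example project definition (hypothetical field names).
EXAMPLE_MAPPING = build_mapping(
    {"summary": "text", "status": "keyword", "created": "date"}
)
```

The resulting dict would be supplied as the mapping when the new index is created during a reindex, so field names stay flat and no container object is needed just to drive a dynamic template.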
If you try to sort by "fieldname", and "fieldname" is the name of a mapped field as specified by "index_name", you get a parse error stating that no mapping was found for that field. It doesn't seem right that you can search by the index_name but not sort by it, hence this bug. Without a fix for this I may be forced to put my data in the document twice, so that I can have an actual field with the correct name instead of using index_name.
Discussion on the forum:
https://groups.google.com/forum/#!topic/elasticsearch/6-BWdQTPTH0
Reproduced in 1.4 via this gist:
https://gist.github.com/pmishev/11375297