Case insensitive field names #68561
Thanks for the comments.
Unfortunately I think that's the answer. We're built on JSON and its behaviour is something we can't change.
Any breaking change has to reach a certain level of importance for it to be considered. The importance can be measured by things like:
By the above measures:
I'll keep this issue open to see if it attracts any more interest, but I wouldn't bank on this change happening anytime soon.

Pinging @elastic/es-search (Team:Search)

I've wondered this as well. It seems like insanity to me that you can create two fields with the same name and different casing.

I discussed this with the team today, and we agreed that while this would be a nice-to-have for some users, the impact of such a change is huge and not something we will realistically attempt in the foreseeable future. Thanks for reaching out, and sorry we're not able to help with this.
@markharwood - Would any of you assign a different meaning to the word "DOG" if it were spelled "dog"? No, because it's still referring to canis familiaris.

This problem manifests itself as duplicated data points, often with different values. Think that's rare? It happens every time I use NEST. I can create a mapping like this...

```json
{
  "settings": {
    "analysis": {
      "normalizer": {
        "customSearchNormalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "ListName": {
        "type": "keyword",
        "normalizer": "customSearchNormalizer"
      },
      "FieldNames": {
        "type": "keyword",
        "normalizer": "customSearchNormalizer"
      }
    }
  }
}
```

...and verify it in Kibana, but when the first document is indexed, NEST decides to use a different casing for the field names, and the mapping changes to this...

```json
{
  "mappings": {
    "_doc": {
      "properties": {
        "FieldNames": {
          "type": "keyword",
          "normalizer": "customSearchNormalizer"
        },
        "ListName": {
          "type": "keyword",
          "normalizer": "customSearchNormalizer"
        },
        "fieldNames": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "listName": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
```

...and then I can't find any documents, because I created the index using:

```
"_source" : {
  "listName" : "MyList",
  "fieldNames" : [
    "OrgField1",
    ...
```

Yes, I know about JSON, and JSON is specced wrong for the same reason. Nobody really wants to have two object properties with different cases and different values in a data payload. That's part of the recipe for a nightmare-level support and troubleshooting experience, and ES isn't required to jump into that pit just because JSON does. MSSQL (or Sybase at the time) figured out 30+ years ago that users don't want to deal with differences in casing when they go looking for their data, and that they don't want their data altered by some forced normalization scheme just to make that searching possible. The scenario in your user base where someone actually depends on having field names of different cases is going to be exceedingly rare, if it exists at all. Give us an option to allow case-differing duplicates if you think there are users who need it, but please don't make the rest of us keep suffering from this problem.
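A quick way to see why the searches miss: compare the keys actually sent in the document with the explicitly mapped field names. This is a hypothetical Python sketch, not NEST code; `client_field_name` only assumes the client lower-cases the first letter of each property name, which is what the `listName`/`fieldNames` fields above suggest. No key matches exactly, while every key would match under a case-insensitive comparison.

```python
# Field names declared in the explicit mapping above.
MAPPED = {"ListName", "FieldNames"}


def client_field_name(prop: str) -> str:
    """Assumed client behaviour: lower-case only the first letter of a property name."""
    return prop[:1].lower() + prop[1:]


def classify(doc_keys, mapped):
    """Split document keys into exact mapping hits and case-only matches."""
    exact = {k for k in doc_keys if k in mapped}
    by_folded = {m.casefold(): m for m in mapped}
    case_only = {
        k: by_folded[k.casefold()]
        for k in doc_keys
        if k not in mapped and k.casefold() in by_folded
    }
    return exact, case_only


# The .NET property names, as the client (by assumption) would serialize them.
doc_keys = [client_field_name(p) for p in ("ListName", "FieldNames")]
exact, case_only = classify(doc_keys, MAPPED)
print(exact)      # set()
print(case_only)  # {'listName': 'ListName', 'fieldNames': 'FieldNames'}
```

Because there are no exact hits, Elasticsearch treats `listName` and `fieldNames` as new fields and maps them dynamically, which is where the extra `text` fields with `.keyword` sub-fields come from.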
We're not compiling a dictionary here :)
Trust me, someone out there somewhere is using field names to store hashes where a change in case would be catastrophic to them. As a result, we have to go through a complex procedure of introducing opt-in case-insensitivity flags, deprecation warnings and backward-compatibility code for old clusters before flipping the default behaviour. This migration effort for us and our users is what puts this firmly in the "high-hanging fruit" category and why we are not rushing to fix it right now.
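The hash concern is easy to reproduce: if field names were silently case-folded, two keys that differ only in case would collapse into one, and one value would be lost. A hypothetical Python illustration with made-up, hash-like keys:

```python
def fold_keys(doc: dict) -> dict:
    """Naively lower-case every key; on a collision the later value overwrites the earlier one."""
    folded = {}
    for key, value in doc.items():
        folded[key.lower()] = value
    return folded


doc = {"a1B2": "first", "A1b2": "second"}  # hypothetical case-significant keys
print(len(doc))        # 2
print(fold_keys(doc))  # {'a1b2': 'second'}
```

That silent loss is why the change would need an opt-in flag and a deprecation path rather than a quiet switch of the default.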
Are you sure you aren't compiling a dictionary somewhere? At least as a test case? Thank you for hearing my complaints.
I need this. Maybe you can introduce a new API that will let us do this:

Then I could create documents with

with the

without
@niemyjski thanks for creating this issue, I was confused about why Elasticsearch was behaving like this.

Can we open this issue up again to discuss an OPT-IN, non-breaking solution?
I'd love for field names to be case-insensitive; this would enable more scenarios, such as source includes (take a list of fields to include from an API and have it just work). It would probably also save people a lot of time tracking down other issues...
For example, we have a query parser (https://github.com/FoundatioFx/Foundatio.Parsers), and we can resolve any mapped field to the correct case, but we cannot do that for unmapped fields.
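A minimal sketch of that kind of resolution, written in Python and not taken from Foundatio.Parsers: walk the mapping's `properties` (including object sub-properties and multi-fields) and look names up case-insensitively. It can only resolve what the mapping contains, so an unmapped field comes back as `None`, which is exactly the gap described above. `flatten_properties` and `build_resolver` are hypothetical names.

```python
def flatten_properties(properties, prefix=""):
    """Yield dotted field paths from a mapping's "properties" tree."""
    for name, spec in properties.items():
        path = prefix + name
        yield path
        # Object fields keep their children under a nested "properties" key.
        yield from flatten_properties(spec.get("properties", {}), path + ".")
        # Multi-fields (e.g. the dynamic ".keyword" sub-field) live under "fields".
        for sub in spec.get("fields", {}):
            yield f"{path}.{sub}"


def build_resolver(properties):
    """Return a function mapping any casing of a mapped field to its real name."""
    lookup = {}
    for path in flatten_properties(properties):
        # If two mapped names differ only in case, the first one wins (ambiguous!).
        lookup.setdefault(path.casefold(), path)
    return lambda field: lookup.get(field.casefold())


properties = {
    "ListName": {"type": "keyword"},
    "fieldNames": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
}
resolve = build_resolver(properties)
print(resolve("listname"))            # ListName
print(resolve("FIELDNAMES.KEYWORD"))  # fieldNames.keyword
print(resolve("orgField1"))           # None (not in the mapping)
```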
Also, I'd really love to know why field names are still case-sensitive (I understand that JSON is case-sensitive) and why there hasn't been a breaking change to the field name behavior yet. One could log a warning when multiple field names that differ only in case are present, or just index the first field (and discard any extras with the same name?). I could see a document having different casings of a field, and that's OK: they would share a single mapping. This would also help by preventing field explosions and would make querying and using the Elastic API easier.
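Until something like that exists server-side, the "warn and keep the first field" idea can be approximated on the client before indexing. A sketch in Python, assuming documents arrive as JSON text: `json.loads` accepts an `object_pairs_hook` that sees every key in order, including duplicates, so case-insensitive repeats can be logged and dropped. `keep_first_casefolded` is a hypothetical helper, and the hook is applied to nested objects as well.

```python
import json
import logging

log = logging.getLogger("casefold_dedupe")


def keep_first_casefolded(pairs):
    """object_pairs_hook: keep the first key of each case-insensitive group."""
    result = {}
    seen = {}  # casefolded name -> first spelling kept
    for key, value in pairs:
        folded = key.casefold()
        if folded in seen:
            log.warning("dropping %r: same name as %r ignoring case", key, seen[folded])
            continue
        seen[folded] = key
        result[key] = value
    return result


raw = '{"listName": "MyList", "ListName": "Other", "fieldNames": ["OrgField1"]}'
doc = json.loads(raw, object_pairs_hook=keep_first_casefolded)
print(doc)  # {'listName': 'MyList', 'fieldNames': ['OrgField1']}
```

Keeping the first spelling means the surviving casing depends on key order in the payload, so the mapping still has to agree with whatever casing the client sends first.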