Case insensitive field names #68561

niemyjski · 2021-02-04T21:36:18Z

I've searched the issues and community forum but couldn't really find any requests or issues talking about this.

I'd love for field names to be case insensitive. This would enable more scenarios, such as source includes (take a list of fields to include from an API request and have it just work). It would probably also save people a lot of time tracking down other issues...

POST test-v1/_doc/test
{
  "test": "abc",
  "Test": "abcd",
  "tEst": "abcde"
}

GET /test-v1/_mapping
{
  "test-v1" : {
    "mappings" : {
      "properties" : {
        "Test" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "tEst" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "test" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

For example, we have a query parser (https://github.com/FoundatioFx/Foundatio.Parsers) that can resolve any mapped field to the correct case, but it cannot do the same for unmapped fields.
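The resolution step for mapped fields can be sketched as follows. This is not Foundatio.Parsers' actual code: it is a minimal Python sketch that assumes you already have the top-level `properties` object from a `GET /<index>/_mapping` response, and the names `build_case_index` and `resolve_fields` are hypothetical. It also shows why unmapped fields, or mappings like the one above where three fields fold to `test`, cannot be resolved reliably.

```python
from typing import Dict, Iterable, List, Tuple


def build_case_index(properties: Dict[str, dict]) -> Dict[str, List[str]]:
    """Group mapped top-level field names by their lowercased form."""
    index: Dict[str, List[str]] = {}
    for name in properties:
        index.setdefault(name.lower(), []).append(name)
    return index


def resolve_fields(
    requested: Iterable[str], properties: Dict[str, dict]
) -> Tuple[List[str], List[str]]:
    """Split requested field names into (resolved, unresolved).

    An exact match always wins. Otherwise a name resolves only when exactly
    one mapped field matches it case-insensitively; unmapped or ambiguous
    names (several fields differing only by case) are left unresolved.
    """
    index = build_case_index(properties)
    resolved: List[str] = []
    unresolved: List[str] = []
    for field in requested:
        if field in properties:
            resolved.append(field)
            continue
        candidates = index.get(field.lower(), [])
        if len(candidates) == 1:
            resolved.append(candidates[0])
        else:
            unresolved.append(field)
    return resolved, unresolved
```

With the three `test` variants from the mapping above plus a hypothetical `userName` field, `resolve_fields(["TEST", "test", "USERNAME"], props)` returns `(["test", "userName"], ["TEST"])`: `TEST` is ambiguous, so it cannot be passed on as a source include.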

Also, I'd really love to know why field names are still case sensitive (I understand that JSON is case sensitive) and why there hasn't been a breaking change to the field name behavior. One could log a warning when multiple field names that differ only by case are present, or just index the first field (and discard any others with the same name?). I could see a document having different casings of a field, and that's OK; they would share a single mapping. This would also help prevent mapping explosions and make querying and using the Elastic API easier.
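The "index the first field and warn about the rest" policy suggested above can be sketched on the client side. This is a hypothetical illustration, not Elasticsearch behavior: `fold_keys_first_wins` only handles top-level keys and relies on Python dicts (including those produced by `json.loads`) preserving key order.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("fold_keys")


def fold_keys_first_wins(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Lowercase top-level keys; when several keys fold to the same name,
    keep the first one (in document order) and log a warning for the rest."""
    folded: Dict[str, Any] = {}
    winner: Dict[str, str] = {}  # lowercased name -> original key that was kept
    for key, value in doc.items():
        lower = key.lower()
        if lower in folded:
            logger.warning("dropping %r: collides with %r", key, winner[lower])
            continue
        folded[lower] = value
        winner[lower] = key
    return folded
```

Applied to the document from the example above, `fold_keys_first_wins({"test": "abc", "Test": "abcd", "tEst": "abcde"})` returns `{"test": "abc"}` and logs two warnings. Whether to fold with `str.lower()` or the more aggressive Unicode `str.casefold()` is itself a policy choice.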

Databases have a variety of sensitivities. SQL, by default, is case insensitive to identifiers and keywords, but case sensitive to data. JSON is case sensitive to both field names and data.
https://blog.couchbase.com/json-case-sensitive-insensitive-search-index-data/#:~:text=SQL%2C%20by%20default%2C%20is%20case,both%20field%20names%20and%20data.


markharwood · 2021-02-05T10:18:49Z

Thanks for the comments.

> Also, I'd really love to know why field names are still case sensitive (I understand that JSON is case sensitive)

Unfortunately I think that's the answer. We're built on JSON and its behaviour is something we can't change.
As far as I can tell, MongoDB is the same in this regard.

> and why there hasn't been a breaking change to the field name behavior

Any breaking change has to reach a certain level of importance for it to be considered. The importance can be measured by things like:

- The number of people calling for the change
- The lack of any good workarounds in the status quo
- Our ability to migrate cleanly (e.g. having old indices and new indices co-exist under new software)

By the above measures:

- I think this is the first time we've had this issue logged
- Clients can normalise data in their client code or using ingest pipelines (although query/agg field names have no equivalent of doc ingest pipelines to change field names)
- I imagine it will be very difficult to provide software that allows a cluster to run a mix of old and new indices. Also, the alternative of asking customers to reindex all historical data is typically a no-no
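Normalising documents in client code, as mentioned in the second point above, can be as simple as a recursive key rewrite before indexing. A minimal sketch, assuming the document has already been parsed into dicts and lists (e.g. with `json.loads`); `lowercase_keys` is a hypothetical helper, and as noted above it does nothing for field names used in queries or aggregations.

```python
from typing import Any


def lowercase_keys(value: Any) -> Any:
    """Recursively lowercase object keys in a parsed JSON value.

    Lists are walked element by element; scalars (including string values)
    are returned unchanged. If two keys in the same object fold to the same
    name, the later one overwrites the earlier one.
    """
    if isinstance(value, dict):
        return {key.lower(): lowercase_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [lowercase_keys(item) for item in value]
    return value
```

Only keys change: `{"ListName": "MyList"}` becomes `{"listname": "MyList"}`. The silent last-one-wins behavior on collisions is exactly the kind of data loss a server-side change would have to handle more carefully.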

I'll keep this issue open to see if it attracts any more interest but I wouldn't bank on this change happening anytime soon.

elasticmachine · 2021-02-05T10:18:59Z

Pinging @elastic/es-search (Team:Search)

ejsmith · 2021-02-05T16:17:20Z

I've wondered this as well. It seems like insanity to me that you can create 2 fields with the same name and different casing.

markharwood · 2021-02-05T16:52:12Z

I discussed this with the team today, and we agreed that while this would be a nice-to-have for some users, the impact of such a change is huge and not something we will realistically attempt in the foreseeable future.
Closing, but will reopen if this ever changes.

Thanks for reaching out and sorry we're not able to help with this.

StingyJack · 2021-09-18T15:01:15Z

> a nice-to-have for some users

@markharwood - Would any of you assign a different meaning to the word "DOG" if it were spelled "dog"? No, because it's still referring to Canis familiaris.

This problem manifests itself as duplicated data points, often with different values. Think that's rare? It happens every time I use NEST. I can create a mapping like this...

{
    "settings": {
        "analysis": {
            "normalizer": {
                "customSearchNormalizer": {
                    "type": "custom",
                    "char_filter": [],
                    "filter": ["lowercase", "asciifolding"]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "ListName": {
                "type": "keyword",
                "normalizer": "customSearchNormalizer"
            },
            "FieldNames": {
                "type": "keyword",
                "normalizer": "customSearchNormalizer"
            }          
        }
    }
}

... and verify it in Kibana, but when the first document is indexed, NEST decides to use a different casing for the field names, and the mapping changes to this...

{
  "mappings": {
    "_doc": {
      "properties": {
        "FieldNames": {
          "type": "keyword",
          "normalizer": "customSearchNormalizer"
        },
        "ListName": {
          "type": "keyword",
          "normalizer": "customSearchNormalizer"
        },
        "fieldNames": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "listName": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

... and then I can't find any documents, because I created the index using FieldNames, but fieldNames was added to the mapping, and fieldNames is where the data actually gets stored.
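A mapping in this state can at least be detected. The sketch below is a hypothetical helper, not an Elasticsearch API: it walks the `properties` object of a mapping (for the typed output above that is `mappings._doc.properties`, on typeless versions `mappings.properties`) and reports field paths that differ only by case. Multi-fields under `fields` are not visited.

```python
from typing import Dict, List


def case_collisions(properties: Dict[str, dict]) -> Dict[str, List[str]]:
    """Find mapped field paths that differ only by case.

    Walks nested "properties" of object fields and returns
    {lowercased_path: [original paths]} for every lowercased path that
    more than one mapped path folds to.
    """
    groups: Dict[str, List[str]] = {}

    def walk(props: Dict[str, dict], parent: str) -> None:
        for name, spec in props.items():
            path = f"{parent}{name}"
            groups.setdefault(path.lower(), []).append(path)
            if "properties" in spec:
                walk(spec["properties"], path + ".")

    walk(properties, "")
    return {folded: paths for folded, paths in groups.items() if len(paths) > 1}
```

For the mapping above this returns `{"fieldnames": ["FieldNames", "fieldNames"], "listname": ["ListName", "listName"]}`, which could be run as a check after the first document is indexed.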

 "_source" : {
          "listName" : "MyList",
          "fieldNames" : [
            "OrgField1",
...

Yes, I know about .DefaultFieldNameInferrer(p => p), but the point is I shouldn't have to google around for it or remember to do that.
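The mismatch comes down to how the client turns a .NET property name into a JSON field name. The Python sketch below models two inferrers, assuming the client's default lowercases only the first character (which matches the `ListName` → `listName` and `FieldNames` → `fieldNames` renames shown above) and that `p => p` leaves names untouched. `camel_case`, `identity` and `unmapped_fields` are hypothetical names, not NEST APIs.

```python
from typing import Callable, Iterable, List, Set

FieldNameInferrer = Callable[[str], str]


def camel_case(name: str) -> str:
    """Lowercase only the first character: "ListName" -> "listName".
    An approximation of the client's default, not its actual code."""
    return name[:1].lower() + name[1:]


def identity(name: str) -> str:
    """Keep the property name as written, like DefaultFieldNameInferrer(p => p)."""
    return name


def unmapped_fields(
    property_names: Iterable[str], mapped: Set[str], infer: FieldNameInferrer
) -> List[str]:
    """Inferred field names that would miss the explicitly mapped fields
    (and so, under dynamic mapping, create new case-variant fields)."""
    return [infer(p) for p in property_names if infer(p) not in mapped]
```

With `mapped = {"ListName", "FieldNames"}`, the camel-case inferrer yields `["listName", "fieldNames"]` as unmapped, while the identity inferrer yields `[]`.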

JSON is spec'd wrong for the same reason. Nobody really wants two object properties with different cases and different values in a data payload. That's part of the recipe for a nightmare-level support and troubleshooting experience, and ES isn't required to jump into that pit just because JSON does.

MSSQL (or Sybase at the time) figured out 30+ years ago that users don't want to deal with differences in casing when they go looking for their data, and that they don't want their data altered by a forced normalization scheme just to enable that searching. The scenario in your user base where someone actually depends on having field names of different cases is going to be exceedingly rare, if it exists at all. Give an option to allow case-differing duplicates if you think there are users who need it, but please don't make the rest of us continue to suffer this problem.

markharwood · 2021-09-20T08:51:31Z

> Would any of you assign a different meaning to the word "DOG" if it were spelled "dog"?

We're not compiling a dictionary here :)
As you well know, some things in computing are case sensitive, e.g. the Unix file system, and the same questions over "usefulness" could be raised there.
I happen to agree that case sensitivity is not generally useful in field names but it is so firmly entrenched in so many deployments that we cannot simply flick a switch to change this.

> where someone actually depends on having field names of different cases is going to be exceedingly rare if at all.

Trust me, someone out there somewhere is using field names to store hashes where a change in case would be catastrophic to them. As a result, we have to go through a complex procedure of introducing opt-in case-insensitivity flags, deprecation warnings and backward compatibility code for old clusters before flipping default behaviour etc. This migration effort for us and our users is what puts this firmly in the "high-hanging fruit" category and why we are not rushing to fix right now.
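A minimal sketch of the staged rollout described above, for top-level keys only. Everything here is hypothetical: `apply_case_policy`, `CaseCollisionError` and the three flag values are illustrations, not an Elasticsearch setting. The point is that an unset flag keeps today's behavior and only warns, while opting in rejects documents whose keys would collide rather than silently merging them.

```python
import warnings
from typing import Any, Dict, List, Optional, Tuple


class CaseCollisionError(ValueError):
    """Raised when two keys in one document differ only by case."""


def apply_case_policy(doc: Dict[str, Any], case_insensitive: Optional[bool]) -> Dict[str, Any]:
    """Hypothetical per-index flag.

    None  -> flag not set: keep case-sensitive behavior, but emit a
             DeprecationWarning if any keys would collide once folded.
    False -> explicit opt-out: case-sensitive, no warning.
    True  -> opt-in: lowercase keys, rejecting documents with colliding keys.
    """
    first_seen: Dict[str, str] = {}
    collisions: List[Tuple[str, str]] = []
    for key in doc:
        lower = key.lower()
        if lower in first_seen:
            collisions.append((first_seen[lower], key))
        else:
            first_seen[lower] = key

    if case_insensitive is None:
        if collisions:
            warnings.warn(f"keys differ only by case: {collisions}", DeprecationWarning)
        return dict(doc)
    if case_insensitive is False:
        return dict(doc)
    if collisions:
        raise CaseCollisionError(f"keys differ only by case: {collisions}")
    return {key.lower(): value for key, value in doc.items()}
```

With made-up base64-style hash keys such as `{"aGFzaA": 1, "AGFzaA": 2}`, which fold to the same lowercase string, the unset flag returns the document unchanged with a warning and opting in raises `CaseCollisionError`.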

StingyJack · 2021-09-21T18:43:59Z

Are you sure you aren't compiling a dictionary somewhere? At least as a test case?

Thank you for hearing my complaints.

olfek · 2023-02-28T12:35:04Z

I need this.

Maybe you could introduce a new mapping option that lets us do this:

{
    "my_index": {
        "mappings": {
            "dynamic": "strict",
            "properties": {
                "name": {
                    "type": "text",
                    "key_case_sensitivity": "case_insensitive"  <-- THIS
                }
            }
        }
    }
}

olfek · 2023-02-28T12:40:35Z

Then I could create documents with the name key spelled like:

nAmE
nAME
NAME
...

Currently, with the "dynamic": "strict" setting, documents with the correct keys but different casing are rejected.
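`key_case_sensitivity` is a proposal, not an existing mapping parameter. Until something like it exists, a client could approximate the proposed behavior before indexing: rename each key to the mapped field it matches case-insensitively, and reject anything unmatched or ambiguous, as `"dynamic": "strict"` would. A minimal sketch for top-level keys, where `canonicalize_keys` and `StrictMappingError` are hypothetical names and `mapped_fields` would come from the index mapping.

```python
from typing import Any, Dict, Iterable, List


class StrictMappingError(ValueError):
    """Raised for a key that matches no unique mapped field."""


def canonicalize_keys(doc: Dict[str, Any], mapped_fields: Iterable[str]) -> Dict[str, Any]:
    """Rename top-level keys to the mapped field they match case-insensitively.

    Keys matching no mapped field, or several fields that differ only by case,
    are rejected, as is a document that names the same field twice.
    """
    by_lower: Dict[str, List[str]] = {}
    for name in mapped_fields:
        by_lower.setdefault(name.lower(), []).append(name)

    out: Dict[str, Any] = {}
    for key, value in doc.items():
        matches = by_lower.get(key.lower(), [])
        if len(matches) != 1:
            raise StrictMappingError(f"no unique mapped field for {key!r}")
        target = matches[0]
        if target in out:
            raise StrictMappingError(f"{key!r} duplicates field {target!r}")
        out[target] = value
    return out
```

With a strict mapping containing only `name`, `canonicalize_keys({"nAmE": "x"}, ["name"])` returns `{"name": "x"}`, while an unmapped key such as `nickname` raises instead of polluting the mapping.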

olfek · 2023-02-28T12:45:31Z

Without "dynamic": "strict", the mapping can become polluted with casing variations, none of which adhere to the manually configured mapping, potentially altering or breaking application functionality.

olfek · 2023-02-28T12:46:30Z

"key_case_sensitivity": "case_insensitive" can be made OPT-IN so it is not a breaking change.

olfek · 2023-02-28T12:58:12Z

@niemyjski thanks for creating this issue, I was confused why elasticsearch was behaving like this.

Pinging @elastic/es-search (Team:Search)

Can we open this issue up again to discuss an OPT-IN non breaking solution?

niemyjski added the >enhancement and needs:triage labels Feb 4, 2021

niemyjski changed the title from "Support case insensitive field names" to "Case insensitive field names" Feb 4, 2021

markharwood added the :Search Foundations/Mapping label Feb 5, 2021

elasticmachine added the Team:Search label Feb 5, 2021

markharwood removed the needs:triage label Feb 5, 2021

markharwood closed this as completed Feb 5, 2021

markharwood added the high hanging fruit label Sep 20, 2021

javanna added the Team:Search Foundations label and removed the Team:Search label Jul 16, 2024
