Support for a fully numeric flattened field #61550

Open
jimczi opened this issue Aug 25, 2020 · 15 comments
Labels
>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@jimczi
Contributor

jimczi commented Aug 25, 2020

This issue is a spinoff of #43805 that focuses on a specific use case: supporting numeric fields in the flattened field.
We've discussed this internally and agreed that it is something that we'd like to provide.
This new field could be considered as the numeric version of the flattened field where all values should be parseable as numbers. The details of the implementation are still unclear but multiple ideas were shared internally:

  • We could reuse the framework added for the rank_feature query where field names could be indexed as terms and values as frequencies.

  • We could use points with multiple dimensions and/or prefixes/suffixes to index the pair field name, value.

This issue is a placeholder to provide feedback and updates on the overall plan (supporting a fully numeric flattened field).
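The second idea above (prefixing each value with its field name) can be illustrated with a small sketch. This is not how Lucene points or Elasticsearch would implement it; it is a minimal in-memory model, and the key layout, the `0x00` separator, and the helper names `sortable_long`, `encode_pair`, and `range_search` are all assumptions made for illustration.

```python
import struct


def sortable_long(value: int) -> bytes:
    """Encode a signed 64-bit integer so that byte order matches numeric order."""
    # Shifting by 2**63 maps [-2**63, 2**63) onto [0, 2**64) monotonically;
    # struct.pack raises struct.error if the value is out of range.
    return struct.pack(">Q", value + (1 << 63))


def encode_pair(field: str, value: int) -> bytes:
    """Hypothetical key layout: field name, a 0x00 separator, sortable value."""
    # Assumes field names never contain a NUL byte.
    return field.encode("utf-8") + b"\x00" + sortable_long(value)


def range_search(index, field, gte, lte):
    """Return ids of documents whose `field` value lies in [gte, lte]."""
    lo, hi = encode_pair(field, gte), encode_pair(field, lte)
    # A plain byte comparison is enough because the value part is sortable.
    return sorted(doc_id for key, doc_id in index if lo <= key <= hi)


# Toy "index": a list of (key, doc_id) pairs.
index = [
    (encode_pair("num_one", 1323), "doc1"),
    (encode_pair("num_one", 2323), "doc2"),
    (encode_pair("num_one", 3323), "doc3"),
    (encode_pair("num_two", 2500), "doc1"),
]
matches = range_search(index, "num_one", 2000, 4000)
```

Because every key for a given field shares the same prefix, a range over the encoded keys never leaks into another field's values.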

@jimczi jimczi added >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types labels Aug 25, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Mapping)

@jpountz
Contributor

jpountz commented Aug 26, 2020

Once we have this field, I guess the next question will be how to deal with objects that have a mix of strings and numbers. This makes me wonder whether we should try to fold this functionality into the existing flattened field, or start thinking about whether we could have a sort of wrapper that redirects fields to either flattened or its numeric variant at both index and search time, e.g. something like this:

{
  "foo": {
    "type": "flattened",
    "numeric_field_pattern": [ "*.count" ]
  }
}

so that an object like

{
  "foo": {
    "tags": [ "x", "y" ],
    "count": 42
  },
  "bar": {
    "tags": [ "x" ],
    "count": 100
  }
}

would have its foo.tags/bar.tags fields indexed and searched with flattened while the foo.count/bar.count fields would be indexed and searched with the numeric variant.
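The routing described here can be sketched outside Elasticsearch as a partition of leaf paths by glob pattern. The `numeric_field_pattern` option is only a proposal in this comment, so the sketch below is purely illustrative; the helper names `flatten` and `route` are hypothetical, and glob matching via `fnmatch` is an assumption about how such patterns might behave.

```python
from fnmatch import fnmatchcase


def flatten(obj, prefix=""):
    """Yield (dotted_path, leaf_value) pairs for a nested dict."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def route(doc, numeric_patterns):
    """Split leaf fields into keyword-style and numeric-style buckets."""
    keyword, numeric = {}, {}
    for path, value in flatten(doc):
        if any(fnmatchcase(path, pattern) for pattern in numeric_patterns):
            numeric[path] = value   # would go to the numeric variant
        else:
            keyword[path] = value   # would go to the existing flattened field
    return keyword, numeric


doc = {
    "foo": {"tags": ["x", "y"], "count": 42},
    "bar": {"tags": ["x"], "count": 100},
}
keyword, numeric = route(doc, ["*.count"])
```

The same pattern check would have to run at search time so that a query on `foo.count` is sent to the numeric variant rather than to the keyword terms.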

@jtibshirani
Contributor

jtibshirani commented Oct 28, 2020

@polyfractal brought up the good point that in some telemetry use cases, all values represent counts. This type of data is similar to a histogram, but with labeled buckets. For example, we could be tracking the usage of every aggregation:

{
  "agg_usage": {
    "terms": 101,
    "date_histogram": 2450,
    ...
  }
}

It would be natural to perform a histogram-like aggregation on agg_usage to sum up the counts for each entry terms, date_histogram, etc. When designing the feature, it'd be good to keep this case in mind -- for example, it could affect whether we want to distinguish long counts vs. arbitrary numerics.
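As a rough illustration of the histogram-like aggregation described above, the following sketch sums the per-label counts across documents on the client side. It is not an Elasticsearch aggregation; `sum_labeled_counts` is a hypothetical helper, and the second document's values are invented for the example.

```python
from collections import Counter


def sum_labeled_counts(docs, field):
    """Sum the counts stored under `field`, label by label, across docs."""
    totals = Counter()
    for doc in docs:
        for label, count in doc.get(field, {}).items():
            totals[label] += count
    return totals


docs = [
    {"agg_usage": {"terms": 101, "date_histogram": 2450}},
    # Hypothetical second document, values made up for illustration.
    {"agg_usage": {"terms": 9, "date_histogram": 50, "avg": 3}},
]
totals = sum_labeled_counts(docs, "agg_usage")
```

Labels that appear in only some documents (such as `avg` here) still get a bucket, which is the main difference from a fixed-bucket histogram.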

@hendrikmuhs
Contributor

it could affect whether we want to distinguish long counts vs. arbitrary numerics

In a similar fashion, this feature might be useful for ML use cases. It seems to me that being able to specify the sub-type (long, float, double, ...) would be good. For ML these vectors can become huge, but on the other hand they don't necessarily require a double. Being able to define the sub-type (e.g. float) would be a way to trade precision for space.
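The precision versus space trade-off can be made concrete with a short sketch. It only models raw value width using Python's `struct` module and says nothing about how Elasticsearch would actually store or compress the values; `storage_bytes` and `roundtrip` are hypothetical helpers.

```python
import struct


def storage_bytes(n_values: int, fmt: str) -> int:
    """Raw bytes needed for n_values packed with the given struct format."""
    return n_values * struct.calcsize(fmt)


def roundtrip(value: float, fmt: str) -> float:
    """Pack and unpack a value to see what precision survives."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


FLOAT32, FLOAT64 = "<f", "<d"  # standard sizes: 4 and 8 bytes

float_size = storage_bytes(1_000_000, FLOAT32)
double_size = storage_bytes(1_000_000, FLOAT64)
float_error = abs(roundtrip(0.1, FLOAT32) - 0.1)
double_error = abs(roundtrip(0.1, FLOAT64) - 0.1)
```

A million values take half the raw space as 32-bit floats, at the cost of a small rounding error that 64-bit doubles do not introduce for the same input.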

@axw
Member

axw commented Feb 3, 2021

Does this issue cover support for histogram and aggregate_metric_double fields? For the APM/Metrics use-case of #63530, we will need to store basic numbers, histograms, and at some point probably aggregate metrics.

@egalpin

egalpin commented May 6, 2021

+1, following. This feature will unblock the ability to remove nested fields in a use case I have 😁

@baybatu

baybatu commented May 31, 2021

+1, following. I need numeric (float) flattened fields so that I can use field_value_factor functions across thousands of unique field names. For now I have had to increase the default mapping count, but that is bad practice according to the docs.

@patodevilla

+1, following!

@yshyshkin

yshyshkin commented Jul 30, 2021

+1. It would really help in storing lots of financial information without a mapping explosion.

@Fgerthoffert

+1

@koenbouwmans

+1

@vchhabra

vchhabra commented Aug 1, 2023

While this is being worked on, I have been able to work around the lack of numeric range queries on the flattened type by leveraging runtime fields at query time ('query time' because in my case the numeric field names are not known in advance).

Example:

Index Mappings

{
  "flattened_test": {
    "mappings": {
      "properties": {
        "host": {
          "type": "flattened"
        }
      }
    }
  }
}

Sample documents

"host": {
  "hostname": "bionic_1",
  "name": "bionic_1",
  "num_one": 1323
}
---
"host": {
  "hostname": "bionic_2",
  "name": "bionic_2",
  "num_one": 2323
}
---
"host": {
  "hostname": "bionic_3",
  "name": "bionic_3",
  "num_one": 3323
}

Sample Range Query

GET flattened_test/_search
{
  "runtime_mappings": {
    "doc['host.num_one']": {
      "type": "long"
    }
  },
  
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "host.num_one": {
              "gte": 4000,
              "lte": 7000
            }
          }
        }
      ]
    }
  }
}

This serves me well for the use case at hand. I understand the performance implications of query-time runtime fields, and the trade-off is acceptable in my case.

However, being new to ES, I wanted to validate this here: am I overlooking anything obvious, or is there any other feedback?

Thanks,
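For context on why a workaround is needed at all: the flattened field indexes leaf values as keywords, so a plain range query on a flattened key compares terms as strings rather than as numbers. The sketch below models that difference in plain Python using the sample values above; the range bounds and the helper names `keyword_range` and `numeric_range` are chosen for illustration only.

```python
def keyword_range(values, gte, lte):
    """Range over string terms: comparison is lexicographic."""
    return [v for v in values if gte <= v <= lte]


def numeric_range(values, gte, lte):
    """Range over the same terms parsed as integers."""
    return [v for v in values if gte <= int(v) <= lte]


# The num_one values from the sample documents, as keyword terms.
terms = ["1323", "2323", "3323"]

as_keywords = keyword_range(terms, "1000", "20000")
as_numbers = numeric_range(terms, 1000, 20000)
```

Here the lexicographic range keeps only "1323", because "2323" and "3323" sort after "20000" as strings, while the numeric range matches all three values.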

@felixbarny
Member

felixbarny commented Aug 17, 2023

To follow up and update on the use case in Elastic APM (#61550 (comment)):

We're not planning to use flattened. Instead, we'll use subobjects: false at the root of the metric mappings. This will allow ingesting metrics such as connections and connections.idle in the same index, without causing a mapping conflict. Currently, this requires all incoming documents to be flat but the ES team is working on also supporting nested object notations in documents where subobjects are disabled in the mapping: #97972. This makes adding the subobjects: false flag backwards compatible.
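To illustrate why `connections` and `connections.idle` conflict when dotted names are treated as object paths, here is a simplified model in Python. It is not Elasticsearch's mapping logic; `expand_dotted` is a hypothetical helper, and the metric values are invented.

```python
def expand_dotted(flat):
    """Expand dotted keys into nested dicts, as object-style mapping would."""
    root = {}
    for dotted, value in flat.items():
        node = root
        *parents, leaf = dotted.split(".")
        for part in parents:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                # The parent name is already used as a leaf value.
                raise ValueError(f"{part!r} is both a value and an object")
            node = child
        if isinstance(node.get(leaf), dict):
            # The leaf name is already used as an object.
            raise ValueError(f"{leaf!r} is both a value and an object")
        node[leaf] = value
    return root


# Hypothetical metric values for illustration.
metrics = {"connections": 5, "connections.idle": 2}

try:
    expand_dotted(metrics)
    conflict = False
except ValueError:
    conflict = True
```

Keeping the keys flat, which is roughly what `subobjects: false` asks for, sidesteps the problem because `connections` and `connections.idle` are then just two distinct field names.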

I'm sure there are other valid use cases for numeric flattened fields, though, such as avoiding field explosions.

Having said that, we're also working on a new way of dealing with field explosions by ignoring fields that exceed the limit instead of rejecting documents: #96235

@leehaotan

+1 need numeric fields in flattened types to be fully supported for range queries

@javanna javanna added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 16, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)
