Support ip_range field formatting (cidr, range) #89698

Open
warewolf opened this issue Aug 29, 2022 · 4 comments
Labels
>enhancement :Search/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team


warewolf commented Aug 29, 2022

Description

Borrowing from the concepts here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html#search-api-fields I would like to be able to format an ip_range field as either a range or CIDR notation using "fields" in a query, so that the returned data is consistent rather than dependent on how it was indexed.

Since "fields" always returns an array of values, and documents can be indexed with a range that doesn't line up with a single CIDR block, such as { "gt": "192.168.1.22", "lte": "192.168.1.37" }, those edge cases in "cidr" format should be returned as a deaggregated array of CIDR netblocks like:
[ "192.168.1.23/32", "192.168.1.24/29", "192.168.1.32/30", "192.168.1.36/31" ]. In "range" format it should return a single inclusive range, as if the document had been indexed with "gte" and "lte", e.g. [ "192.168.1.23-192.168.1.37" ]
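To illustrate the deaggregation described above, here is a minimal client-side sketch in Python using the standard library's `ipaddress.summarize_address_range`. The helper name `range_to_cidrs` is hypothetical and is not part of Elasticsearch; it only shows what a "cidr" format could be expected to return, assuming exclusive bounds ("gt"/"lt") are first converted to inclusive ones.

```python
import ipaddress


def range_to_cidrs(bounds):
    """Deaggregate ip_range-style bounds into a minimal list of CIDR blocks.

    `bounds` is a dict using the gt/gte/lt/lte keys shown in this issue.
    Hypothetical helper for illustration only, not an Elasticsearch API.
    """
    # Convert an exclusive lower bound to an inclusive one by adding 1.
    if "gte" in bounds:
        first = ipaddress.ip_address(bounds["gte"])
    else:
        first = ipaddress.ip_address(bounds["gt"]) + 1

    # Convert an exclusive upper bound to an inclusive one by subtracting 1.
    if "lte" in bounds:
        last = ipaddress.ip_address(bounds["lte"])
    else:
        last = ipaddress.ip_address(bounds["lt"]) - 1

    # summarize_address_range yields the networks that exactly cover first..last.
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


print(range_to_cidrs({"gt": "192.168.1.22", "lte": "192.168.1.37"}))
# ['192.168.1.23/32', '192.168.1.24/29', '192.168.1.32/30', '192.168.1.36/31']
```

A range that lines up with one block, such as { "gte": "192.168.0.0", "lte": "192.168.0.255" }, collapses to a single-element list, which matches the array shape that "fields" always returns.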

Example mapping:

PUT /networks
{
  "mappings": {
    "properties": {
      "network": {
        "type": "ip_range"
      }
    }
  }
}

Example documents:

PUT /networks/_doc/1
{
  "network": "192.168.1.0/24"
}

PUT /networks/_doc/2
{
  "network": {
    "gte": "192.168.0.0",
    "lte": "192.168.0.255"
  }
}

Example query, showing results today:

GET /networks/_search
{
  "query": {
    "match_all": {}
  }
}

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "networks",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "network" : "192.168.1.0/24"
        }
      },
      {
        "_index" : "networks",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "network" : {
            "gte" : "192.168.0.0",
            "lte" : "192.168.0.255"
          }
        }
      }
    ]
  }
}

Example query, showing desired cidr format functionality:

GET /networks/_search
{
  "query": {
    "match_all": {}
  },
  "fields": [
    {
      "field": "network",
      "format": "cidr"
    }
  ]
}

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "networks",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "network" : "192.168.1.0/24"
        },
        "fields" : {
          "network": [
            "192.168.1.0/24"
          ]
        }
      },
      {
        "_index" : "networks",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "network" : {
            "gte" : "192.168.0.0",
            "lte" : "192.168.0.255"
          }
        },
        "fields" : {
          "network": [
            "192.168.0.0/24"
          ]
        }
      }
    ]
  }
}

Example query, showing desired range format functionality:

GET /networks/_search
{
  "query": {
    "match_all": {}
  },
  "fields": [
    {
      "field": "network",
      "format": "range"
    }
  ]
}

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "networks",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "network" : "192.168.1.0/24"
        },
        "fields" : {
          "network": [
            "192.168.1.0-192.168.1.255"
          ]
        }
      },
      {
        "_index" : "networks",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "network" : {
            "gte" : "192.168.0.0",
            "lte" : "192.168.0.255"
          }
        },
        "fields" : {
          "network": [
            "192.168.0.0-192.168.0.255"
          ]
        }
      }
    ]
  }
}
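Until something like this exists server-side, the "range" format can be approximated on the client by normalizing each `_source` value. The sketch below is an assumption about the desired output, not an Elasticsearch feature: `to_inclusive_range` is a hypothetical helper that accepts either a CIDR string or a gt/gte/lt/lte object and returns the "first-last" string used in the example response above. It parses CIDR strings strictly, so a value with host bits set raises `ValueError`.

```python
import ipaddress


def to_inclusive_range(value):
    """Normalize an ip_range _source value to an inclusive 'first-last' string.

    Accepts a CIDR string ("192.168.1.0/24") or a dict with gt/gte/lt/lte keys.
    Hypothetical helper for illustration only, not an Elasticsearch API.
    """
    if isinstance(value, str):
        # CIDR notation: the first and last addresses of the network.
        net = ipaddress.ip_network(value)
        first, last = net[0], net[-1]
    else:
        # Object notation: convert exclusive bounds to inclusive ones.
        if "gte" in value:
            first = ipaddress.ip_address(value["gte"])
        else:
            first = ipaddress.ip_address(value["gt"]) + 1
        if "lte" in value:
            last = ipaddress.ip_address(value["lte"])
        else:
            last = ipaddress.ip_address(value["lt"]) - 1
    return f"{first}-{last}"


print(to_inclusive_range("192.168.1.0/24"))
# 192.168.1.0-192.168.1.255
print(to_inclusive_range({"gte": "192.168.0.0", "lte": "192.168.0.255"}))
# 192.168.0.0-192.168.0.255
```

Both documents from the example end up in the same notation regardless of how they were indexed, which is the consistency this request is after.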
@warewolf warewolf added >enhancement needs:triage Requires assignment of a team area label labels Aug 29, 2022
@DJRickyB DJRickyB added :Search/Mapping Index mappings, including merging and defining field types and removed needs:triage Requires assignment of a team area label labels Aug 31, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Aug 31, 2022
@benwtrent
Member

benwtrent commented Feb 15, 2024

I like this idea, but it seems to me like it would be even better if it was used to aggregate information. One of the downsides is that we may be losing fidelity a bit. Even though a cidr could encompass 64 IPs, the actual count that was indexed or we care about is only 57.

So, in a multi-bucket range count, we would overcount various ranges. To alleviate this, we have a metadata field called _doc_count that could be used in conjunction with this new field type.

This may be helpful in the o11y space as a whole.

@felixbarny what do you think?

@warewolf
Author

@benwtrent the point I was making here is that consistency of the returned data is desirable. It's the same data, but it is represented seemingly at random to the end user/consumer, because the output depends on how the data was indexed, even if the ranges are exactly the same.

If I index 192.168.0.0/24 it comes back as a CIDR block 192.168.0.0/24.
If I index 192.168.0.0-192.168.0.255 it comes back as a range.

These are exactly the same value just expressed with a different notation.

@benwtrent
Member

Thank you for the clarification @warewolf
