# Mapping Explosion

## Mapping
Elasticsearch uses a **dynamic schema**: if you do not provide a schema in advance, it infers one from the documents that are inserted into the index.

In Elasticsearch, defining the schema is referred to as **mapping**: the process of defining how a document and its fields are stored and indexed.

We can use **dynamic mapping** or **explicit mapping** to define how our data will be represented.

In general, the recommendation for production use cases is to use explicit mapping.
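As a minimal sketch of what an explicit mapping can look like, the following Python snippet builds the body of an index-creation request. The choice of fields and of the `keyword` type is an assumption, loosely based on the `agent` object used later in this notebook. Setting `"dynamic": "strict"` asks the cluster to reject documents that contain fields not declared in the mapping.

```python
import json

# Hypothetical explicit mapping for an index that only stores `agent` metadata.
# "dynamic": "strict" makes indexing fail when a document has an undeclared field.
explicit_mapping = {
    "mappings": {
        "dynamic": "strict",
        "properties": {
            "agent": {
                "properties": {
                    "version": {"type": "keyword"},
                    "hostname": {"type": "keyword"},
                    "name": {"type": "keyword"},
                    "type": {"type": "keyword"},
                }
            }
        },
    }
}

# JSON body that could be sent when creating the index (PUT /<index-name>).
request_body = json.dumps(explicit_mapping, indent=2)
print(request_body)
```

With a mapping like this, a document carrying an extra key such as `agent.uptime` would be rejected instead of silently extending the mapping.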

## Preventing mapping explosion
There is a situation we should avoid called **mapping explosion**: it happens when the number of fields in an index grows so large that it causes out-of-memory errors and other problems.

This can happen, for example, when using dynamic mapping and every new document introduces new fields, which causes Elasticsearch to add a new mapping entry for each new field.

For example, imagine a document like the following, generated by Filebeat from the logs of a server (more on that later):
```json
{
  "@timestamp" : "2022-10-11T23:03:18.494Z",
  "input" : {
    "type" : "log"
  },
  "host" : {
    "name" : "c27-32",
    "mac" : [
      "bc:97:e1:11:22:33",
      "bc:97:e1:11:22:33"         
    ],
    "hostname" : "c27-32",
    "architecture" : "x86_64",
    "os" : {
      "type" : "linux",
      "platform" : "centos",
      "version" : "8",
      "family" : "redhat",
      "name" : "CentOS Stream",
      "kernel" : "4.18.0-348.2.1.el8_5.x86_64"
    },
    "id" : "41ea3b672705481fb9112c32c461d10e",
    "containerized" : false,
    "ip" : [
      "fe80::be97:e1ff:fee5:9430",
      "1.2.3.4"
    ]
  },
  "agent" : {
    "version" : "7.12.1",
    "hostname" : "c27-32",
    "ephemeral_id" : "0cd90f91-0c99-4346-a747-0516d3e2613f",
    "id" : "dfdee6c4-68f4-41b5-94a1-b0fe87c72209",
    "name" : "c27-32",
    "type" : "filebeat"
  },
  "ecs" : {
    "version" : "1.8.0"
  },
  "container" : {
    "id" : "neutron-server.log"
  },
  "log" : {
    "offset" : 51873356,
    "file" : {
      "path" : "/var/log/kolla/neutron/neutron-server.log"
    }
  },
  "message" : """2022-10-12 01:03:18.283 23 INFO neutron.wsgi [req-114a9dd4-be1b-49d3-b85a-4c2de4138e0d 1e16a103f9f44d9591cd4b4046050d36 77619eace356487892f47ff36c3e45c4 - default default] 10.108.27.24,10.108.27.35 "GET /v2.0/networks/4381458d-1edc-4ad5-bb46-b54272123f58?fields=segments HTTP/1.1" status: 200  len: 188 time: 0.0732589""",
  "tags" : [
    "cloud",
    "openstack-prod"
  ]
}
```

Now imagine that a new document arrives with additional fields, for example extending the information about the agent running on the server with the agent's `uptime` and the `memory` it uses:
```json
...
  "agent" : {
    "version" : "7.12.1",
    "hostname" : "c27-32",
    "ephemeral_id" : "0cd90f91-0c99-4346-a747-0516d3e2613f",
    "id" : "dfdee6c4-68f4-41b5-94a1-b0fe87c72209",
    "name" : "c27-32",
    "type" : "filebeat",
    "uptime": 3600,
    "memory": 100
  },
...
```

If documents keep introducing new keys in this way, the number of mapped fields will keep growing.
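The growth can be illustrated with a small Python model that collects the dotted path of every leaf field seen so far. This is only a simplified sketch, not how the cluster counts fields: the helper `field_paths` is hypothetical, and real dynamic mapping also creates `keyword` sub-fields for strings, so the actual numbers are higher.

```python
def field_paths(obj, prefix=""):
    """Yield the dotted path of every leaf field in a (possibly nested) document."""
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from field_paths(value, prefix=f"{path}.")
        else:
            yield path


first_doc = {"agent": {"version": "7.12.1", "hostname": "c27-32",
                       "name": "c27-32", "type": "filebeat"}}
second_doc = {"agent": {**first_doc["agent"], "uptime": 3600, "memory": 100}}

# Mimic dynamic mapping: every path not seen before becomes a new mapped field.
known_fields = set()
for doc in (first_doc, second_doc):
    new_fields = set(field_paths(doc)) - known_fields
    known_fields |= new_fields
    print(f"new fields: {sorted(new_fields)}, total: {len(known_fields)}")
```

Running it reports four fields after the first document and six after the second; each further key that shows up in a document adds one more.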

To prevent this we can define the `agent` field as a **flattened data type**:
```json
  "properties": {
    "agent": {
      "type": "flattened"
    }
  }
```
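To give an intuition for what this changes, here is a rough Python model (not Elasticsearch code) of how a `flattened` field treats an object: the whole object counts as one mapped field, and its leaf values are indexed as keyword strings. The helper name `leaf_keywords` is hypothetical.

```python
def leaf_keywords(obj, prefix=""):
    """Return (dotted key, value-as-string) pairs for every leaf of a nested object."""
    pairs = []
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            pairs.extend(leaf_keywords(value, prefix=f"{path}."))
        else:
            # Every leaf is kept as a string, whatever its original JSON type.
            pairs.append((path, str(value)))
    return pairs


agent = {"version": "7.12.1", "type": "filebeat", "uptime": 3600, "memory": 100}

# One mapped field (`agent`), however many keys the object carries.
mapped_fields = ["agent"]
indexed_values = leaf_keywords(agent)
print(len(mapped_fields), indexed_values)
```

The trade-off is that values such as `uptime` are handled as strings rather than numbers, so numeric-style queries on them behave differently than on a regular `long` field.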

NOTE: The `flattened` data type is not available in OpenSearch or in the OSS distribution of Elasticsearch; you need an Elasticsearch distribution with at least the basic license.

- [Flattened field type](https://www.elastic.co/guide/en/elasticsearch/reference/current/flattened.html)

In [2]:
%%bash
# First we will delete the testing index if it exists
curl --silent --insecure -u "${OPENSEARCH_USER}:${OPENSEARCH_PASSWD}" \
    -X DELETE \
    "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/mapping-explosion"

# And now we insert the data using automatic index creation and dynamic mapping
curl --silent --insecure -u "${OPENSEARCH_USER}:${OPENSEARCH_PASSWD}" \
    -X POST -H "Content-Type: application/json" \
    "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/mapping-explosion/_doc/1" -d '
{
  "agent" : {
    "version" : "7.12.1",
    "hostname" : "c27-32",
    "name" : "c27-32",
    "type" : "filebeat"
  }
}'

{"acknowledged":true}{"_index":"mapping-explosion","_type":"_doc","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

In [3]:
%%bash

curl --silent --insecure -u "${OPENSEARCH_USER}:${OPENSEARCH_PASSWD}" \
    -X GET \
    "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/mapping-explosion?pretty"

{
  "mapping-explosion" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "agent" : {
          "properties" : {
            "hostname" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "type" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "version" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keywo

In [4]:
%%bash

# Now a new document with additional fields
curl --silent --insecure -u "${OPENSEARCH_USER}:${OPENSEARCH_PASSWD}" \
    -X POST -H "Content-Type: application/json" \
    "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/mapping-explosion/_doc/2" -d '
{
  "agent" : {
    "version" : "7.12.1",
    "hostname" : "c27-32",
    "name" : "c27-32",
    "type" : "filebeat",
    "uptime": 3600,
    "memory": 100
  }
}'

{"_index":"mapping-explosion","_type":"_doc","_id":"2","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":1,"_primary_term":1}

In [5]:
%%bash

curl --silent --insecure -u "${OPENSEARCH_USER}:${OPENSEARCH_PASSWD}" \
    -X GET \
    "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/mapping-explosion?pretty"

{
  "mapping-explosion" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "agent" : {
          "properties" : {
            "hostname" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "memory" : {
              "type" : "long"
            },
            "name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "type" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "uptime" : {
              "type" : "long"
            },
       

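To avoid this issue we could try to update the `agent` field mapping and define it as type `flattened`. Note that, even where the type is supported, the type of an already mapped field cannot be changed in place, so in practice `agent` would be declared as `flattened` when the index is created: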

In [6]:
%%bash

curl --silent --insecure -u "${OPENSEARCH_USER}:${OPENSEARCH_PASSWD}" \
    -X PUT -H "Content-Type: application/json" \
    "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/mapping-explosion/_mapping" -d '
{
    "properties": {
        "agent": {"type": "flattened"}
    }
}'

{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"No handler for type [flattened] declared on field [agent]"}],"type":"mapper_parsing_exception","reason":"No handler for type [flattened] declared on field [agent]"},"status":400}

## Final note

Unfortunately, as the error above shows, OpenSearch does not yet support the `flattened` type at the time of writing.

The type is still worth taking into account if you run a version of Elasticsearch with at least the basic license.
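Where `flattened` is not available, one possible workaround is to stop dynamic mapping below `agent` by setting `"dynamic": false` on that object. This is sketched here as an assumption rather than something tested in this notebook. Keys that are not already mapped then stay in `_source` but are not indexed or searchable. The index setting `index.mapping.total_fields.limit` (1000 by default) can act as an additional safety net. The snippet only builds the request bodies; the value `200` is an arbitrary illustration.

```python
import json

# Body for PUT /mapping-explosion/_mapping: keep `agent` as an object, but stop
# adding new sub-fields to the mapping when unseen keys arrive.
stop_dynamic_agent = {
    "properties": {
        "agent": {"type": "object", "dynamic": False}
    }
}

# Body for PUT /mapping-explosion/_settings: cap the number of mapped fields.
# 200 is an arbitrary example value, not a recommendation.
limit_fields = {"index.mapping.total_fields.limit": 200}

for body in (stop_dynamic_agent, limit_fields):
    print(json.dumps(body))
```

Either body could be sent with the same `curl` pattern used in the cells above, passing the printed JSON via `-d`.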

Reference:
- [Flattened field type](https://www.elastic.co/guide/en/elasticsearch/reference/current/flattened.html)
