Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Disable dynamic mappings for all ML indices #61083

Open
droberts195 opened this issue Aug 13, 2020 · 6 comments
Open

[ML] Disable dynamic mappings for all ML indices #61083

droberts195 opened this issue Aug 13, 2020 · 6 comments
Labels
:ml Machine learning

Comments

@droberts195
Copy link
Contributor

Dynamic mappings cause untold problems for ML. If new mappings aren't added to an ML index as we expect after upgrading then dynamic mappings are created for the new fields. These usually have the wrong types and then cause problems either for searching the index or when we eventually try to put the correct mappings on the index.

In the case where our desired mappings do not get applied for some reason it would be much better if the new fields had no mappings at all than the wrong mappings. At least then it is possible to added the correct mappings at a later time without running into a conflict.

The difficult part is how we get from where we are today to the desired state. Every ML user today has ML indices that have dynamic mappings enabled. We need to continue to work with these indices and safely upgrade them while also gradually moving over to the future world where ML indices do not create dynamic mappings.

Here is a tentative plan:

  1. In 7.10 change the templates used to create new ML indices so that they disallow dynamic mappings.
  2. Request that when the migration assistant reindexes an ML index prior to upgrading from 7.x to 8.x instead of copying the mappings for the new index from the old one it uses the latest template to create the new index. ML indices created in 7.x will not need reindexing to move to 8.x, but this change would mean that by the time we're into 9.x no ML indices would permit dynamic mappings.

The biggest risk here is that a few releases after disabling dynamic mappings we forget we ever had them and get lazy about ensuring the latest upgrade code works in situations where the ML indices have dynamic mappings.

@droberts195 droberts195 added the :ml Machine learning label Aug 13, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

@droberts195
Copy link
Contributor Author

Problems that would have been easier to work around and recover from if we didn't have dynamic mappings are #37607, #61157 and elastic/kibana#74935.

@davidkyle
Copy link
Member

elastic/kibana#74935 is caused by a broadly applied dynamic template defined in the .ml-config mappings.

The template matches every unknown field regardless of inferred type and maps it as a string. Dynamic mappings would have correctly deduced the type and mapped the new field as a boolean if it had not been for this dynamic template.

@davidkyle
Copy link
Member

Depending on the inferred type dynamic mappings can cause these specific errors:

1. Strings are mapped as multi-fields with keyword and String parts creating the mappings foo.keyword and foo.string for the field foo

This behaviour can be overridden with a dynamic template to map strings as keyword which is usually the intended mapping.

    "dynamic_templates" : [
      {
        "strings_as_keywords" : {
          "match" : "string",
          "mapping" : {
            "type" : "keyword"
          }
        }
      }
    ],

2. Boolean fields will only be correctly mapped if the value is true or false

0 and 1 will be mapped as integers not a boolean and the strings "true" & "false" are strings.

3. Dates in epoch seconds or milli-seconds format will not be recognised as Dates

The native processes write dates as epoch values, this is a problem if the mappings are missing as the epoch will be dynamically mapped as a long.

Dynamic mappings do not consider anything that is not a string to be a candidate date and the default set of date formats that are checked do not include epoch_millis. Additional formats used to identify a date field dynamically can be added with the dynamic_date_formats setting.

Dynamic mappings break down because of the use of epoch values which will not mapped as Dates. The other problems are easy to work around.

@droberts195
Copy link
Contributor Author

The biggest risk here is that a few releases after disabling dynamic mappings we forget we ever had them and get lazy about ensuring the latest upgrade code works in situations where the ML indices have dynamic mappings.

We discussed this aspect in the team meeting.

One option would be to do something along the lines of what's been done for the transforms internal index, where we stop writing to existing indices with dynamic mappings and instead write to replacement indices that don't have dynamic mappings. The downside is that when we search we need to search both and then deduplicate, favouring the newer index if a particular document exists in both.

Another option is to stick with the approach we are using today, of adding mappings to existing indices after upgrade. But to keep us honest we could have an upgrade test that deliberately creates indices with 7.0 mappings before our internal code has a chance to create indices. This will prove that our code can cope with upgrading mappings on a 7.0 index created with dynamic mappings enabled. We would need to keep this test until 8.last, and could remove it in 9.0 (when we will be sure we're fully into a no dynamic mappings on ML internal indices environment).

@droberts195
Copy link
Contributor Author

The biggest risk here is that a few releases after disabling dynamic mappings we forget we ever had them and get lazy about ensuring the latest upgrade code works in situations where the ML indices have dynamic mappings.

#68044 revealed something which solves this problem very nicely for the indices that are becoming system indices: dynamic updates are banned for system indices.

This means that from 7.12 onwards, the two system indices whose mappings allow dynamic updates, .ml-config and .ml-meta, we can no longer rely on these dynamic updates. But we don't have to worry about how to remove the section of the mappings that says dynamic updates are allowed, as it's completely separate code that bans the dynamic updates on system indices. We can keep our existing mappings yet still benefit from disallowing dynamic updates. For these two indices, as soon as someone adds a new field to a config or meta document and fails to update the mappings any test that uses that new field will fail. So as long as we have some test coverage somewhere for every new field we will not ship a release that doesn't contain the required mappings.

The approaches outlined in #61083 (comment) are still relevant for the indices that have dynamic mappings and are hidden indices rather than system indices.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:ml Machine learning
Projects
None yet
Development

No branches or pull requests

3 participants