Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Transition to typeless (mapping) APIs #39256

Merged
merged 16 commits into from
Feb 28, 2019

Conversation

davidkyle
Copy link
Member

@davidkyle davidkyle commented Feb 21, 2019

Removes deprecated usage of types in the APIs and resolves a reindex issue with the upgrade assistant in 7 to wit:

  • ml indices use doc as the single mapping type by convention
  • 7.0 has an upgrade assistant which advises to reindex the ml indices from the previous version
  • after reindex the mapping type of the destination index is _doc unless a target type is specified
  • ml may update the mapping of the results index when a new job is created
  • if ml continues to use the doc type after reindex a mapping conflict occurs as the index would have 2 types doc and _doc. This is the cause of [ML] Need to cope with .ml-config being an alias #38796

Depending on whether or not the upgrade assistant has been run the single mapping type of the ml indices may be doc or _doc, moving to the typeless APIs largely covers this situation however, there is the case of updating the results index mapping on job creation in which case the new mapping must have the same type as the old otherwise the index will have 2 types of mappings

WIP in progress as testing is required, depends on #39243

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

@droberts195
Copy link
Contributor

droberts195 commented Feb 21, 2019

These changes are definitely great for 8.0 but for 7.x some edge cases that need testing are:

  1. Pure 7.x cluster with indices created in 6.x and not reindexed
  2. Mixed 7.x/6.7 cluster with indices created in 6.x and not reindexed
  3. Pure 7.x cluster with indices originally created in 5.x and reindexed to 6.x format via the 6.7 upgrade assistant
  4. Mixed 7.x/6.7 cluster with indices originally created in 5.x and reindexed to 6.x format via the 6.7 upgrade assistant

For cases 2. and 4. it could possibly make a difference if the shards of the 6.x indices are on a 6.7 or 7.x node and also whether the master node is a 6.7 or 7.x node, so actually there are 10 possibly different scenarios in the list above.

@davidkyle
Copy link
Member Author

please run elasticsearch-ci/1

@davidkyle
Copy link
Member Author

please run elasticsearch-ci/2

@davidkyle
Copy link
Member Author

After removing the mapping type from the ml templates I hit a problem that caused the TemplateUpgradeService to continually update the ml templates as this test was failing. It appears if no type is specified in a template mapping it is rewritten with the mapping type _doc, this then caused the equality test above to fail and the template would be re-applied.

I reverted the template mappings back to using a type to fix this using the _doc type in 5cceabe

@droberts195
Copy link
Contributor

I reverted the template mappings back to using a type to fix this using the _doc type

This sounds fine for master but for 7.x I think it will run into a problem in the case where we update the mappings of an existing index that was created in 6.x. In this case we need to use whatever type the index was created with. We don't care what that type is, we just need to make sure our updated mapping specifies it, otherwise the update will fail with a "more than one type" error.

It makes me think that to solve this problem in 7.x will require a change to non-ML code - either TemplateUpgradeService or something in core ES.

I wonder if what is needed is an API where we can say, "update the mappings for whatever single type this index is already using".

/cc @jtibshirani @markharwood

@davidkyle
Copy link
Member Author

run elasticsearch-ci/packaging-sample

@davidkyle
Copy link
Member Author

run elasticsearch-ci/2

@droberts195
Copy link
Contributor

Once #39469 is fixed the BWC problems should be resolved and these changes should be good to merge.

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@davidkyle
Copy link
Member Author

run elasticsearch-ci/2

@davidkyle davidkyle merged commit c075107 into elastic:master Feb 28, 2019
@davidkyle davidkyle deleted the remove-type branch February 28, 2019 13:36
droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Mar 1, 2019
Following elastic#39256 some of these tests were generating
errors when indexing documents.  For consistency it's
best that all of them use typeless APIs.
droberts195 added a commit that referenced this pull request Mar 1, 2019
Following #39256 some of these tests were generating
errors when indexing documents.  For consistency it's
best that all of them use typeless APIs.
davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Mar 2, 2019
ML has historically used doc as the single mapping type but reindex in 7.x
will change the mapping to _doc. Switching to the typeless APIs handles the
case where the mapping type is either doc or _doc. This change removes 
deprecated typed usages.
davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Mar 2, 2019
ML has historically used doc as the single mapping type but reindex in 7.x
will change the mapping to _doc. Switching to the typeless APIs handles the
case where the mapping type is either doc or _doc. This change removes 
deprecated typed usages.
davidkyle pushed a commit to davidkyle/elasticsearch that referenced this pull request Mar 4, 2019
…9574)

Following elastic#39256 some of these tests were generating
errors when indexing documents.  For consistency it's
best that all of them use typeless APIs.
davidkyle pushed a commit to davidkyle/elasticsearch that referenced this pull request Mar 4, 2019
…9574)

Following elastic#39256 some of these tests were generating
errors when indexing documents.  For consistency it's
best that all of them use typeless APIs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants