Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Using a snapshot with ML job model snapshots in 7.x but created in 6.8 fails to update mappings to .ml-config #78456

Closed
sabarasaba opened this issue Sep 29, 2021 · 7 comments
Assignees
Labels
>bug :ml Machine learning

Comments

@sabarasaba
Copy link
Member

In a 6.8 installation I created a few ML jobs in order to trigger snapshots so I could test deprecations logs in 7.x. Upon starting ES with that es snapshot, I see it outputs quite a few info lines saying that mapping updates to .ml-config failed:

info [o.e.i.SystemIndexManager] [Ignacios-macbook.local] Put mapping request for [.ml-config] failed
     java.lang.IllegalArgumentException: Rejecting mapping update to [.ml-config] as the final mapping would have more than 1 type: [doc, _doc]
     	at org.elasticsearch.cluster.metadata.MetadataMappingService$PutMappingExecutor.applyRequest(MetadataMappingService.java:115) ~[elasticsearch-7.16.0-SNAPSHOT.jar:7.16.0-SNAPSHOT]

This is the snapshot I used: 6.8-data-snapshot.zip

@sabarasaba sabarasaba added >bug needs:triage Requires assignment of a team area label labels Sep 29, 2021
@droberts195 droberts195 added :ml Machine learning and removed needs:triage Requires assignment of a team area label labels Sep 29, 2021
@droberts195
Copy link
Contributor

These errors are coming from the SystemIndexManager class.

We have a nasty situation with the .ml-config index where if it was created in 6.x then its single mapping type will be doc, and if it was created in 7.x then its single mapping type will be _doc.

In the 7.x version of the ML code that updates index mappings, we account for this by looking at the existing mapping type and preserving that into the update:

// Use the mapping type of the first index in the update
IndexMetadata indexMetadata = state.metadata().index(indicesThatRequireAnUpdate[0]);
String mappingType = indexMetadata.mapping().type();
try {
String mapping = mappingSupplier.apply(mappingType);
PutMappingRequest putMappingRequest = new PutMappingRequest(indicesThatRequireAnUpdate);
putMappingRequest.type(mappingType);

@gwbrown @williamrandolph and @pugnascotia would you mind if I changed SystemIndexManager to do something similar in 7.x, i.e. preserve a 6.x mapping type that's different from the 7.x mapping type? So I'd be making a change around:

PutMappingRequest request = new PutMappingRequest(indexName).type(descriptor.getIndexType())

Or can you think of a better way to fix this.

I am also thinking that the reason we haven't had test failures or end users reporting bugs due to this is probably because it's non-fatal, because the ML code is upgrading index mappings as well as the system indices code. So tests that assert the mappings are correct after upgrade but don't assert that there are no error messages in the logs (which our integration tests don't) will pass. I still need to check this theory but it sounds plausible. @sabarasaba did you experience any failures of ML functionality as a result of these errors, or did you just see the errors in the log?

@sabarasaba
Copy link
Member Author

@droberts195 I haven't tested anything from ML tbh, I was just focusing on getting deprecation logs for UA and just happen to notice these errors in the logs

@droberts195 droberts195 self-assigned this Sep 29, 2021
@droberts195 droberts195 changed the title Using a snapshot with ML job model snapshots in 7.x but created in 6.8 fails to update mappings to .ml-config [ML] Using a snapshot with ML job model snapshots in 7.x but created in 6.8 fails to update mappings to .ml-config Sep 29, 2021
@droberts195
Copy link
Contributor

@jtibshirani sorry for the blast from the past here, but I seem to remember you led the types removal work between 6.x and 7.x. We have a 6.x index with a type that isn't _doc. We have some updated mappings that are suitable for updating those original mappings except they are for type _doc. I am just wondering if there's already code somewhere in the 7.x branch that can apply the mappings to whatever type happens to exist in the index - we know there's only one. Or is there already some code that can take mappings in string form and switch the type. Or any other sneakily simple way you can think of to deal with the problem in this issue? If you can't then I'll create something. But I thought I'd ask before spending time reinventing the wheel.

@droberts195
Copy link
Contributor

I am also thinking that the reason we haven't had test failures or end users reporting bugs due to this is probably because it's non-fatal, because the ML code is upgrading index mappings as well as the system indices code. So tests that assert the mappings are correct after upgrade but don't assert that there are no error messages in the logs (which our integration tests don't) will pass. I still need to check this theory but it sounds plausible.

I just confirmed this, and indeed it is the case.

After upgrading from 6.8 where ML has been used to a version that has system indices functionality, i.e. 7.12 and above, every single cluster state update causes the logs to be spammed with the error that the mappings on the .ml-config index cannot be updated due to differing types. However, once you do something in the newer cluster that causes ML's own mappings update code to run, that runs successfully and then the errors stop.

So it's a log spam problem rather than a problem that breaks the externally visible functionality. The users worst affected by this will be users who tried out ML in 6.x, stopped using it altogether, then upgraded to 7.12+, because they won't use ML in the upgraded cluster and that's what stops the log spam. Users who use ML immediately after upgrading to 7.12+ may not even realise anything was wrong.

Since we haven't had any user-reported bugs for this problem I am thinking the safest way to fix it is to change the system indices upgrade code to report that the index needs upgrading if its type is different to what the index descriptor says. The way the code is currently structured, this means a mappings update won't be attempted by the system index code - this isn't a regression on what happens today because today that update fails. I'll open a PR for this.

droberts195 added a commit to droberts195/elasticsearch that referenced this issue Oct 4, 2021
The system indices code that updates index mappings fails if
a system index was originally created in 6.x and has a type
that's not "_doc".

This change treats such indices as requiring upgrade rather
than requiring a mappings update. The mappings update was
failing anyway, so this doesn't really change functionality,
it just removes log spam that was occurring on every cluster
state update until the index mappings were corrected by some
other code.

It is assumed that every system index that didn't have type
"_doc" in 6.x must have separate mappings update code or full
upgrade code outside of the system indices framework.
Certainly this is true for the ML indices that are affected
by this problem.

Fixes elastic#78456
@jtibshirani
Copy link
Contributor

@droberts195 it sounds like you may already have a different plan, but in case it's helpful: in 7.0 we added a "typeless" mode for all the indices APIs that work with any index, regardless of its old concrete type. So it's surprising this is not already working! Using these typeless APIs from the transport layer is a bit tricky. I'm guessing what's happening is that the mapping contains "_doc" as a top-level key, so it's not being recognized as a "typeless" API call. Is it possible to remove this and see if the error goes away?

@droberts195
Copy link
Contributor

Thanks for the insight @jtibshirani. I had a go at switching ML system index mappings to be typeless in 7.x and some parts of the code worked, but others didn't, for example:

CreateIndexClusterStateUpdateRequest updateRequest =
new CreateIndexClusterStateUpdateRequest(request.cause(), concreteIndexName, request.index())
.ackTimeout(request.timeout())
.masterNodeTimeout(request.masterNodeTimeout());
updateRequest.waitForActiveShards(ActiveShardCount.ALL);
if (mappings != null) {
updateRequest.mappings(Collections.singletonMap(descriptor.getIndexType(), descriptor.getMappings()));
}

I guess this shows that system indices cannot use the "typeless" functionality in 7.x, because they're hooking into cluster state updates below the level where the type switching magic happens.

Maybe it would be possible to change the system indices code to work with typeless mappings, but it seems like quite a big change to attempt just to solve a log spam problem. And of course none of this is relevant in 8.x, so the benefit would be pretty short-lived.

Therefore I think the best thing is to merge my log spam avoidance PR, which is #78622. @gwbrown or @williamrandolph please could one of you review it?

droberts195 added a commit that referenced this issue Oct 11, 2021
The system indices code that updates index mappings fails if
a system index was originally created in 6.x and has a type
that's not "_doc".

This change treats such indices as requiring upgrade rather
than requiring a mappings update. The mappings update was
failing anyway, so this doesn't really change functionality,
it just removes log spam that was occurring on every cluster
state update until the index mappings were corrected by some
other code.

It is assumed that every system index that didn't have type
"_doc" in 6.x must have separate mappings update code or full
upgrade code outside of the system indices framework.
Certainly this is true for the ML indices that are affected
by this problem.

Fixes #78456
@droberts195
Copy link
Contributor

Fixed by #78622

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :ml Machine learning
Projects
None yet
Development

No branches or pull requests

3 participants