[ML] Using a snapshot with ML job model snapshots in 7.x but created in 6.8 fails to update mappings to .ml-config #78456
These errors are coming from the … We have a nasty situation with the … In the 7.x version of the ML code that updates index mappings, we account for this by looking at the existing mapping type and preserving it in the update (lines 175 to 182 in 7e2b7bc).
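A minimal sketch of that type-preserving approach, in Java. Everything here is hypothetical: the class and method names, the example type name `"doc"`, and modeling mappings as plain nested Maps keyed by type name. It is not the ML code at 7e2b7bc, only an illustration of reusing whatever type an index already has instead of always sending `"_doc"`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: build a mappings update keyed by the index's existing
 * mapping type (e.g. a non-"_doc" type from 6.x) rather than by "_doc".
 */
public class TypePreservingMappingUpdate {

    public static final String DEFAULT_TYPE = "_doc";

    /**
     * Returns the single mapping type of an index, or DEFAULT_TYPE if it has
     * no mappings yet. Assumes at most one type, as for indices created in 6.x.
     */
    public static String existingType(Map<String, Object> mappingsByType) {
        if (mappingsByType.isEmpty()) {
            return DEFAULT_TYPE;
        }
        if (mappingsByType.size() > 1) {
            throw new IllegalStateException("expected at most one type, got " + mappingsByType.keySet());
        }
        return mappingsByType.keySet().iterator().next();
    }

    /** Wraps the new mapping body under the existing type name. */
    public static Map<String, Object> buildUpdate(Map<String, Object> mappingsByType,
                                                  Map<String, Object> newMapping) {
        Map<String, Object> update = new LinkedHashMap<>();
        update.put(existingType(mappingsByType), newMapping);
        return update;
    }

    public static void main(String[] args) {
        // "doc" is an example 6.x type name (assumption)
        Map<String, Object> sixXIndex = Map.of("doc", Map.of("properties", Map.of()));
        Map<String, Object> newMapping = Map.of("properties", Map.of("job_id", Map.of("type", "keyword")));
        System.out.println(buildUpdate(sixXIndex, newMapping).keySet()); // [doc]
    }
}
```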
@gwbrown, @williamrandolph and @pugnascotia, would you mind if I changed line 185 of elasticsearch/server/src/main/java/org/elasticsearch/indices/SystemIndexManager.java in 7e2b7bc?
Or can you think of a better way to fix this? I also suspect that the reason we haven't had test failures or bug reports from end users is that the problem is non-fatal: the ML code upgrades the index mappings in addition to the system indices code. So tests that assert the mappings are correct after upgrade, but don't assert that there are no error messages in the logs (which our integration tests don't), will pass. I still need to check this theory, but it sounds plausible. @sabarasaba, did you experience any failures of ML functionality as a result of these errors, or did you just see the errors in the log?
@droberts195 I haven't tested anything from ML tbh. I was just focusing on getting deprecation logs for UA and happened to notice these errors in the logs.
@jtibshirani sorry for the blast from the past here, but I seem to remember that you led the types removal work between 6.x and 7.x. We have a 6.x index with a type that isn't …
I just confirmed this, and it is indeed the case. After upgrading from a 6.8 cluster where ML has been used to a version that has system indices functionality (i.e. 7.12 and above), every single cluster state update causes the logs to be spammed with an error saying that the mappings on the … So it's a log spam problem rather than a problem that breaks externally visible functionality. The worst-affected users will be those who tried out ML in 6.x, stopped using it altogether, and then upgraded to 7.12+: using ML in the upgraded cluster is what stops the log spam, and they won't do that. Users who use ML immediately after upgrading to 7.12+ may not even realise anything was wrong.

Since we haven't had any user-reported bugs for this problem, I think the safest fix is to change the system indices upgrade code to report that the index needs upgrading if its type differs from the one in the index descriptor. Given how the code is currently structured, this means the system indices code won't attempt a mappings update. That isn't a regression compared to today, because today that update fails anyway. I'll open a PR for this.
The system indices code that updates index mappings fails if a system index was originally created in 6.x and has a type that's not "_doc". This change treats such indices as requiring upgrade rather than requiring a mappings update. The mappings update was failing anyway, so this doesn't really change functionality, it just removes log spam that was occurring on every cluster state update until the index mappings were corrected by some other code. It is assumed that every system index that didn't have type "_doc" in 6.x must have separate mappings update code or full upgrade code outside of the system indices framework. Certainly this is true for the ML indices that are affected by this problem. Fixes elastic#78456
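The decision this PR describes can be sketched as follows. This is a hypothetical illustration, not the code in #78622: the class, the enum, the method name, and the use of integer mapping versions are assumptions made for the example.

```java
/**
 * Hypothetical sketch: decide what the system indices code should do with an
 * index, given its actual mapping type/version and the descriptor's.
 */
public class SystemIndexStateSketch {

    public enum State { UP_TO_DATE, NEEDS_MAPPINGS_UPDATE, NEEDS_UPGRADE }

    public static State classify(String actualType, int actualMappingsVersion,
                                 String descriptorType, int descriptorMappingsVersion) {
        if (!actualType.equals(descriptorType)) {
            // e.g. a 6.x type vs "_doc": a mappings update would be rejected,
            // so report "needs upgrade" and attempt no update at all
            return State.NEEDS_UPGRADE;
        }
        if (actualMappingsVersion < descriptorMappingsVersion) {
            // same type, older mappings: a plain mappings update can succeed
            return State.NEEDS_MAPPINGS_UPDATE;
        }
        return State.UP_TO_DATE;
    }

    public static void main(String[] args) {
        System.out.println(classify("doc", 1, "_doc", 2));  // NEEDS_UPGRADE
        System.out.println(classify("_doc", 1, "_doc", 2)); // NEEDS_MAPPINGS_UPDATE
        System.out.println(classify("_doc", 2, "_doc", 2)); // UP_TO_DATE
    }
}
```

Checking the type before the version is what removes the log spam in this sketch: with the checks reversed, a 6.x index with stale mappings would land in `NEEDS_MAPPINGS_UPDATE`, and the failing update would be retried on every cluster state update.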
@droberts195 it sounds like you may already have a different plan, but in case it's helpful: in 7.0 we added a "typeless" mode to all the indices APIs that works with any index, regardless of its old concrete type. So it's surprising that this isn't already working! Using these typeless APIs from the transport layer is a bit tricky. My guess is that the mapping contains "_doc" as a top-level key, so the call isn't recognized as a "typeless" API call. Is it possible to remove this key and see if the error goes away?
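The check being guessed at here can be sketched like this. It is a hypothetical heuristic, not Elasticsearch's actual detection logic: a mapping counts as typed when its only top-level key is the type name and that key's value is an object, and removing that wrapper yields the typeless form. The class and method names are illustrative.

```java
import java.util.Map;

/** Hypothetical sketch of telling a typed mapping body from a typeless one. */
public class TypelessMappingCheck {

    /** True if the mapping is wrapped in a single top-level type key, e.g. {"_doc": {...}}. */
    public static boolean isTyped(String typeName, Map<String, Object> mapping) {
        return mapping.size() == 1 && mapping.get(typeName) instanceof Map;
    }

    /** Removes the type wrapper if present, yielding a typeless mapping body. */
    @SuppressWarnings("unchecked")
    public static Map<String, Object> stripType(String typeName, Map<String, Object> mapping) {
        if (isTyped(typeName, mapping)) {
            return (Map<String, Object>) mapping.get(typeName);
        }
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, Object> typed = Map.of("_doc", Map.of("properties", Map.of()));
        System.out.println(isTyped("_doc", typed));                    // true
        System.out.println(isTyped("_doc", stripType("_doc", typed))); // false
    }
}
```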
Thanks for the insight, @jtibshirani. I had a go at switching the ML system index mappings to be typeless in 7.x. Some parts of the code worked but others didn't, for example lines 197 to 206 in 83355d3.
I guess this shows that system indices can't use the "typeless" functionality in 7.x, because they hook into cluster state updates below the level where the type-switching magic happens. It might be possible to change the system indices code to work with typeless mappings, but that seems like quite a big change to attempt just to solve a log spam problem. And of course none of this is relevant in 8.x, so the benefit would be short-lived. Therefore I think the best thing is to merge my log spam avoidance PR, #78622. @gwbrown or @williamrandolph, could one of you please review it?
Fixed by #78622
In a 6.8 installation I created a few ML jobs in order to trigger model snapshots, so that I could test deprecation logs in 7.x. On starting ES with that snapshot, I see it output quite a few info lines saying that mapping updates to .ml-config failed: …

This is the snapshot I used: 6.8-data-snapshot.zip