Our masked LM saved model test appears to flake occasionally when saving in the tf format. See https://github.com/keras-team/keras-nlp/pull/856/checks?check_run_id=12517800555 for an example. We should investigate the root cause. It may be related to the TF 2.12 release.