This repository has been archived by the owner on Sep 3, 2021. It is now read-only.

DBReadError(MapLoadError(CorruptChunk(Corrupt("missing key")))) #351

Open
phritz opened this issue May 19, 2021 · 5 comments

phritz commented May 19, 2021

https://rocicorp.slack.com/archives/C01JJGGS6CU/p1621426062298400

UnhandledRejection
Non-Error promise rejection captured with value: DBReadError(MapLoadError(CorruptChunk(Corrupt("missing key"))))
Pull returned: PullFailed(FetchFailed(RequestTimeout(TimeoutError { _private: () })))
logger: console
arguments: ["Pull returned: PullFailed(FetchFailed(RequestTimeout(TimeoutError { _private: () })))"]

phritz commented May 19, 2021

[screenshot]


phritz commented May 19, 2021

FYI: 11 occurrences across 8 users.


phritz commented May 19, 2021

In debugging this I discovered a separate annoyance: #354

phritz self-assigned this May 19, 2021

phritz commented May 20, 2021

The line throwing the error is here:

    return Err(LoadError::Corrupt("missing key"));

The key in the leafentry proto is None. This happens when we do an open-transaction and read the main head: the main head chunk is corrupt in this way. However, here's where we create the proto, and it does not look possible for it to write None:

    key: Some(builder.create_vector(e.key)),

I can't find anywhere else where we construct this proto (other than tests). I also don't see how there could be a replicache-level bug in how we read the proto, which is here:

    let root = leaf::get_root_as_leaf(chunk.data());

We're just iterating the entries in the proto; there's literally nothing else going on.
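
To make the failure mode concrete, here is a rough sketch of the read path described above. Only leaf::get_root_as_leaf and the Corrupt("missing key") line are taken from the code linked above; the Chunk and Entry types, the accessor names, and the surrounding loop are assumptions for illustration:

    // Sketch only: get_root_as_leaf and the "missing key" error are from the
    // repo; the types and accessor names here are assumptions.
    fn load_entries(chunk: &Chunk) -> Result<Vec<Entry>, LoadError> {
        let root = leaf::get_root_as_leaf(chunk.data());
        let mut entries = Vec::new();
        if let Some(leaf_entries) = root.entries() {
            for e in leaf_entries.iter() {
                // The failing check: flatbuffers fields are optional, so a
                // chunk can parse successfully and still yield key == None.
                let key = e.key().ok_or(LoadError::Corrupt("missing key"))?;
                let val = e.val().ok_or(LoadError::Corrupt("missing val"))?;
                entries.push(Entry { key: key.to_vec(), val: val.to_vec() });
            }
        }
        Ok(entries)
    }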

I don't see a pattern in what happens in the logs just before this error is hit, other than pushes and pulls completing just before. The 18 occurrences of the error were not limited to one user; they were spread across 14 users.

I'm wondering if it really is the chunk's bytes being corrupted somehow. But that's a bit of a stretch: the data would have to be corrupted in such a way that it still parses correctly as a proto. There are no other map-load or corrupt-chunk errors, only this one. If the chunk were being corrupted with random data, I would expect it to fail to parse at all at least some of the time. But we don't see that. Perhaps the data is being partially written? Or partially overwritten?

Something I did notice is that 18 out of 18 occurrences of this error are on Chrome Mobile 91.0.4472, which I think is a newish version (89% Chrome Mobile 91.0.4472, the rest Chrome Mobile WebView 91.0.4472). @arv @aboodman is there a clue in that, maybe? It seems a pretty clear indicator of... something.

As for what to do next, I'm open to suggestions, but I'm thinking:

  1. Improve the logging/error so that we get the chunk hash and bytes when this happens, then get it into users' hands if we can (see the sketch after this list).
  2. Go through the flatbuffers bug reports and see if anything jumps out.
  3. Carefully read the memstore and prolly-map code to see if anything jumps out. For example, I can imagine that if a map entry gets aliased and is accessed without synchronization, we could read a partially written value. (But Rust should make this hard, so....)
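
A minimal sketch of what item 1 could look like, assuming the Corrupt variant is widened from &'static str to String and that Chunk exposes hash() and data() as used above (hex::encode is from the hex crate; both are assumptions):

    // Hypothetical: include the chunk hash and raw bytes in the error so a
    // report from the field gives us something to reproduce with. Assumes
    // LoadError::Corrupt is changed to hold a String.
    return Err(LoadError::Corrupt(format!(
        "missing key in chunk {}; bytes: {}",
        chunk.hash(),
        hex::encode(chunk.data()),
    )));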


phritz commented May 20, 2021

Suggestion from Aaron, which I think is good: try to craft the minimal byte array that yields this error.
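
One way to approach that, sketched below: start from a known-good serialized leaf, mutate one byte at a time, and keep any candidate that still parses but fails with Corrupt("missing key"). In a flatbuffer, zeroing the right vtable slot can make an optional field read back as None while the rest of the buffer stays intact, which would match the symptom. Everything here except the error type is hypothetical; try_load stands in for whatever function wraps leaf::get_root_as_leaf:

    // Hypothetical harness: flip single bytes in a good buffer and look for
    // the exact "missing key" failure. Returns the first corrupting mutation.
    fn find_corrupting_byte(good: &[u8]) -> Option<Vec<u8>> {
        for i in 0..good.len() {
            for b in [0x00u8, 0xff] {
                let mut candidate = good.to_vec();
                candidate[i] = b;
                if matches!(
                    try_load(&candidate),
                    Err(LoadError::Corrupt("missing key"))
                ) {
                    return Some(candidate);
                }
            }
        }
        None
    }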
