[CI] SourceOnlySnapshotIT testSnapshotAndRestore fails reproducibly #36330
I was able to reproduce this locally. Looks like a node wasn't able to start.
After some investigation I think the issue is the same as #36276. There are stack traces that come before the timeout waiting for green:
Additionally, I think I have isolated the underlying issue, although I do not yet know what is causing it.
What I see is that we start a recovery from snapshot.
This obtains a filesystem lock in the
This lock will be held until
In the test we update the number of replicas.
At this point I see another recovery start. However, this is a
The peer recovery throws an exception when it goes to get the
If I remove one case of
I see the recovery from snapshot that opens a
And the recovery from
The only difference that I see is that the filesystem locks are obtained for different temp directories. In the failing case both are for directory 002, whereas in the passing case one is on directory 002 and the other is on directory 003.
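To illustrate why sharing directory 002 would matter, here is a minimal sketch of lock contention using only the JDK's `FileChannel.tryLock`. It is not the Elasticsearch or Lucene code path. The class name `DirectoryLockSketch`, the helper `tryLock`, the lock file name `demo.lock`, and the temp directory prefixes are all hypothetical and chosen only for this example. It assumes both lock attempts happen inside the same JVM, as they would for nodes started in-process by a test.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not taken from Elasticsearch or Lucene.
class DirectoryLockSketch {

    /** Pairs a held file lock with its channel so both are released together. */
    static final class DirLock implements AutoCloseable {
        private final FileChannel channel;
        private final FileLock lock;

        DirLock(FileChannel channel, FileLock lock) {
            this.channel = channel;
            this.lock = lock;
        }

        @Override
        public void close() throws IOException {
            lock.release();
            channel.close();
        }
    }

    /**
     * Tries to lock a marker file inside dir.
     * Returns the held lock, or null if the directory is already locked.
     */
    static DirLock tryLock(Path dir) throws IOException {
        Path lockFile = dir.resolve("demo.lock"); // assumed name, for illustration
        FileChannel channel = FileChannel.open(
                lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            FileLock lock = channel.tryLock();
            if (lock == null) {
                // Another process holds the lock.
                channel.close();
                return null;
            }
            return new DirLock(channel, lock);
        } catch (OverlappingFileLockException e) {
            // This JVM already holds a lock on the same file.
            channel.close();
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir002 = Files.createTempDirectory("002");
        Path dir003 = Files.createTempDirectory("003");

        try (DirLock first = tryLock(dir002)) {
            // Second attempt on the same directory while the first lock is held.
            DirLock second = tryLock(dir002);
            System.out.println("second lock on same directory acquired: " + (second != null));

            // An attempt on a different directory does not contend with the first.
            try (DirLock other = tryLock(dir003)) {
                System.out.println("lock on different directory acquired: " + (other != null));
            }
        }
    }
}
```

In this sketch the second attempt on the same directory is refused while the first lock is held, whereas the attempt on a separate directory succeeds. That would be consistent with the difference between the failing and passing runs described above, if both recoveries really do end up pointing at the same path.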
My understanding of
Where we create the new node: