Mongod sometimes fails to start due to ETXTBSY #872

Closed
Simon-Hayden-iteratec opened this issue May 16, 2024 · 10 comments
Labels
bug, released on @beta, released (Pull Request released | Issue is fixed)

Comments

@Simon-Hayden-iteratec

Versions

  • NodeJS: ^20 (GitHub Action uses actions/setup-node@v3 with node-version: 20)
  • mongodb-memory-server: 9.1.6
  • mongodb (the binary version): 0
  • mongodb (the js package): 6.3.0 (from mongoose)
  • mongoose: 8.1.2
  • system: Linux, most likely Ubuntu, but we are running on a self-hosted runner provided by my client

package: mongodb-memory-server

What is the Problem?

We have a project with two integration tests that use mongodb-memory-server with a replica set, because we are using Change Streams. When we run the tests on CI, we sometimes get this error (full logs below):

2024-05-15T07:08:36.6267264Z 2024-05-15T07:08:27.279Z MongoMS:MongoMemoryServer Mongo[unknown]: _startUpInstance threw a Error:  Error: spawn ETXTBSY
2024-05-15T07:08:36.6269314Z     at ChildProcess.spawn (node:internal/child_process:421:11)
2024-05-15T07:08:36.6270360Z     at spawn (node:child_process:761:9)
2024-05-15T07:08:36.6273299Z     at MongoInstance._launchMongod (/home/runner/work/<redacted>/<redacted>/node_modules/.pnpm/mongodb-memory-server-core@9.1.6/node_modules/mongodb-memory-server-core/src/util/MongoInstance.ts:504:31)
2024-05-15T07:08:36.6277477Z     at MongoInstance.start (/home/runner/work/<redacted>/<redacted>/node_modules/.pnpm/mongodb-memory-server-core@9.1.6/node_modules/mongodb-memory-server-core/src/util/MongoInstance.ts:392:31)
2024-05-15T07:08:36.6281617Z     at Function.create (/home/runner/work/<redacted>/<redacted>/node_modules/.pnpm/mongodb-memory-server-core@9.1.6/node_modules/mongodb-memory-server-core/src/util/MongoInstance.ts:292:5)
2024-05-15T07:08:36.6286043Z     at MongoMemoryServer._startUpInstance (/home/runner/work/<redacted>/<redacted>/node_modules/.pnpm/mongodb-memory-server-core@9.1.6/node_modules/mongodb-memory-server-core/src/MongoMemoryServer.ts:490:22)
2024-05-15T07:08:36.6290386Z     at MongoMemoryServer.start (/home/runner/work/<redacted>/<redacted>/node_modules/.pnpm/mongodb-memory-server-core@9.1.6/node_modules/mongodb-memory-server-core/src/MongoMemoryServer.ts:297:5)
2024-05-15T07:08:36.6293230Z     at async Promise.all (index 0)
2024-05-15T07:08:36.6296114Z     at MongoMemoryReplSet.initAllServers (/home/runner/work/<redacted>/<redacted>/node_modules/.pnpm/mongodb-memory-server-core@9.1.6/node_modules/mongodb-memory-server-core/src/MongoMemoryReplSet.ts:468:5)
2024-05-15T07:08:36.6300501Z     at MongoMemoryReplSet.start (/home/runner/work/<redacted>/<redacted>/node_modules/.pnpm/mongodb-memory-server-core@9.1.6/node_modules/mongodb-memory-server-core/src/MongoMemoryReplSet.ts:380:5)
2024-05-15T07:08:36.6304725Z     at Function.create (/home/runner/work/<redacted>/<redacted>/node_modules/.pnpm/mongodb-memory-server-core@9.1.6/node_modules/mongodb-memory-server-core/src/MongoMemoryReplSet.ts:193:5)
2024-05-15T07:08:36.6308684Z     at Object.<anonymous> (/home/runner/work/<redacted>/<redacted>/libs/backend/db/src/lib/mongodb/services/user-follow-entity/user-follow-entity.service.spec.ts:337:21) {
2024-05-15T07:08:36.6310935Z   errno: -26,
2024-05-15T07:08:36.6311790Z   code: 'ETXTBSY',
2024-05-15T07:08:36.6312428Z   syscall: 'spawn'
2024-05-15T07:08:36.6313120Z }

After some googling, the errno ETXTBSY is supposed to mean "Text file busy". So I assume there is some race condition that disallows mongod from being called/spawned twice in quick succession. However, I did not find any information about mongod + ETXTBSY.

Because I assume this to be a race condition issue, it is also hard to reproduce. Locally, this error has not yet happened on any dev machine. On the CI, however, I'd say this error happens in around 1/3 to 2/3 of runs. My assumption is that the CI cores are just slower and thus give a larger time window for the race condition to happen? Or maybe there is some weird OS issue with the runner, but I couldn't find anything providing an OS-level fix for this error either.

As a workaround, we are running the tests with the --runInBand flag to instruct Jest to run one test file after another. This is okay-ish for us, because we are using Nx to manage the monorepo, so we still parallelize based on Nx projects. However, if we add a second project that also needs to run MongoDB integration tests, I'd assume the error would return.
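
For illustration, a minimal sketch of what the workaround looks like at the config level (a hypothetical jest.config.ts; maxWorkers: 1 is only roughly comparable to --runInBand, which additionally runs all tests in the main process):

// jest.config.ts - hypothetical sketch: serialize test files so that only one
// in-memory MongoDB replica set is started at a time
import type { Config } from 'jest';

const config: Config = {
  // run test files one after another instead of in parallel worker processes
  maxWorkers: 1,
};

export default config;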

Code Example

// Wish I could pin it down to a specific code example
// My best bet: Have multiple tests starting an in-memory MongoDB and run them in parallel.
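
A rough sketch of that setup (hypothetical, simplified test file; the race presumably only shows up when several Jest worker processes each start their own replica set while the binary cache is still empty):

// hypothetical spec file - each parallel Jest worker runs a beforeAll like this,
// so several workers may download/extract and spawn mongod at nearly the same time
import { MongoMemoryReplSet } from 'mongodb-memory-server';

let replSet: MongoMemoryReplSet;

beforeAll(async () => {
  // a replica set is needed because the code under test uses Change Streams
  replSet = await MongoMemoryReplSet.create({ replSet: { count: 1 } });
}, 60_000);

afterAll(async () => {
  await replSet.stop();
});

it('connects to the in-memory replica set', () => {
  expect(replSet.getUri()).toContain('mongodb://');
});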

Debug Output

The logs are quite verbose:

https://gist.github.com/Simon-Hayden-iteratec/93a2a822a8a081d987b4b427061adac4

Something I noticed is that it looks like there are multiple downloads of the mongod binary. Maybe the issue is also caused by process A still downloading the binary while process B tries to start mongod?

Do you know why it happens?

Sadly no

@hasezoey
Collaborator

Because I assume this to be a race condition issue, it is also hard to reproduce

it is, and my best guess is that the following happens (according to the logs):

  1. worker 1 tries to start, does not find the binary, acquires the download lock and downloads
  2. worker 2 tries to start, does not find the binary, tries to acquire the download lock, but it already exists, and so waits until it is done
  3. worker 1 starts to extract the binary
  4. worker 3 tries to start, does find the binary, and tries to use it (but it is still incomplete, so ETXTBSY)
  5. worker 1 finishes extracting the binary

this is at least my best guess from the logs, currently the checks are:

  1. if file exists, try to use it
  2. if file does not exist, look for the download lock

i guess i need to add a check to see if the download lock still exists before trying to use the binary
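
roughly, the changed check order could look like this (a hypothetical helper for illustration, not the actual library internals):

// hypothetical illustration of the check order described above: treat the binary
// as missing while a download lock file still exists, instead of spawning a
// possibly half-extracted binary
import { existsSync } from 'node:fs';

function resolveUsableBinary(binaryPath: string, lockPath: string): string | undefined {
  if (existsSync(lockPath)) {
    // another process is still downloading/extracting; the caller should wait
    // on the lock instead of using the (possibly incomplete) binary
    return undefined;
  }
  return existsSync(binaryPath) ? binaryPath : undefined;
}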

@Simon-Hayden-iteratec
Author

Would be amazing if this is the issue and you could fix it 😄

Though I wonder why we are the only ones reporting this. Since the race condition you're describing could have happened to anybody, there should have been at least some report already?

Anyway, thanks a lot for your fast reply :)

@hasezoey
Collaborator

Though I wonder why we are the only ones reporting this. Since the race condition you're describing could have happened to anybody, there should have been at least some report already?

i guess the extraction part is quite quick and so is basically "atomic" on most other systems (the archive is about ~70mb and the extracted binary around ~130mb to ~180mb).
also, mostly the instances are run in the same process instead of in separate processes (like in a globalSetup instead of in each setupFile / beforeAll; thanks to how jest works), because if they are in the same process they would likely wait (don't know the specifics for that at the moment)
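
for reference, a minimal sketch of that globalSetup pattern (file name and env variable name are just placeholders, assuming a standard jest setup):

// jest.global-setup.ts - hypothetical: start one replica set in the single Jest
// main process, so worker processes never race on the binary download/extraction
import { MongoMemoryReplSet } from 'mongodb-memory-server';

export default async function globalSetup(): Promise<void> {
  const replSet = await MongoMemoryReplSet.create({ replSet: { count: 1 } });
  // hand the connection string to the test workers via an env variable
  process.env.MONGO_URI = replSet.getUri();
  // keep a reference so a matching globalTeardown can stop it
  (globalThis as any).__MONGO_REPLSET__ = replSet;
}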

hasezoey added a commit that referenced this issue May 16, 2024
and treat it like the binary does not exist

fixes #872
@hasezoey
Collaborator

hasezoey commented May 16, 2024

put in a fix in c600609, which will be available as 9.2.1-beta.1. please test it for a while once available and report back (there will be a bot message)


🎉 This issue has been resolved in version 9.2.1-beta.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

@Simon-Hayden-iteratec
Author

I'll give it a spin and see if we get any errors

@Simon-Hayden-iteratec
Author

I've run the pipeline five times now with the beta version and it didn't fail. But I also can't find the log message "binary found, but also a download-lock, trying to resolve lock" anywhere, which most likely means that I didn't trigger the race condition this time.

I'm not sure if something else is currently preventing the issue (e.g. lower load on the runner?) but I will keep an eye on this. If there are more failed runs I will let you know. But maybe in a new ticket, so this one can be closed.

Thanks for your effort and quick fix implementation 🥇

@hasezoey
Collaborator

i will leave it open a few more days since the message you were searching for has not appeared yet; at the latest it will be closed once the stale message comes up (in case i forget to close it beforehand)

@Simon-Hayden-iteratec
Author

So our test pipeline hasn't been failing the last few days. Looking at the logs, I also found at least one run that had the log line "binary found, but also a download-lock, trying to resolve lock" in it. To me that sounds like the race condition is fixed 🎉

I'll close the bug report now. Thanks again for your super fast help :)


@github-actions bot added the "released" label (Pull Request released | Issue is fixed) on May 30, 2024