-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[1.6b23] tests fails on startup: MongoError: not master and slaveOk=false #9026
Comments
I haven't seen this problem so far in Meteor 1.6-beta.* so a reproduction of sorts would be most helpful as I'm not sure how else to debug your problem. Do you have your own I might suggest running Meteor from a checkout and adding some debug statements in the |
so frustrating, it doesn't seem to fail from a checkout, only from a release |
no custom |
Problem is still in b24. I managed to capture the Mongo logs and everything seems to be the same except that in the failure situation it seems to open 2 connections instead of 7 on success. Update: number of connections seems to vary between runs, so that's not it Here is the MongoD output. https://gist.github.com/hexsprite/e563eed0898bc60b6969e774fc0b4e33 |
Doing some further debugging, I can see that Meteor successfully starts Mongo. It initially starts as a SECONDARY and then after waiting for some time it achieves the PRIMARY status. However, when the Meteor app is connecting the response from Mongo server indicates that it is NOT a primary. If I run the tests without --once then when the app restarts it works fine. The gist shows some of the debugging output: https://gist.github.com/hexsprite/e563eed0898bc60b6969e774fc0b4e33#file-more-debug-logs |
Also tested with Mongo versions back to 3.2.12 and 3.2.16 (newer) as well. Same issue. newer npm mongo 2.2.31 doesn't help. Nor does rolling back to 2.2.24 What does help is adding: |
I believe a related traceback I was getting from my CI server
|
I don't have a solution, but just so you know you're not alone: An internal team at MDG has reported experiencing this as well:
|
After spending some time debugging this with @abernix yesterday, we determined a few things: |
Oh, good to know! This has been driving me crazy, especially on the CI server side especially. I updated the traceback I posted above with of the preceding lines of output as well since it seemed relevant: #9026 (comment) |
@hexsprite Can you try downloading https://fastdl.mongodb.org/osx/mongodb-osx-x86_64-3.2.16.tgz and putting the The exact location of the You can confirm that you've successfully replaced the correct files by running $ meteor mongo
MongoDB shell version: 3.2.16
connecting to: 127.0.0.1:3001/meteor
meteor:PRIMARY> db.version()
3.2.16 Both the |
I'm having the same issue and upgrading to |
@alexpriceonline What about downgrading the mongo to 3.2.12 (which is what it was before)? So, assuming you're on macOS (which the above instructions also assumed) same instructions as before, except with https://fastdl.mongodb.org/osx/mongodb-osx-x86_64-3.2.12.tgz. |
@abernix Thanks, but no dice. Same error using |
@alexpriceonline Is this happening to you on a new project as well? |
@abernix This project is only a few days old |
I experienced this, even with my own suggestions (above) applied. I'm working to reliably reproduce it and find the cause. |
@abernix Great. Let me know if there is anything I can help with. |
Ok, I'm able to reproduce this, outside of tests even, with Meteor versions back to at least 1.4.4.3. It happens frequently, in development (i.e. when running our own local Mongo), after the database database configuration has been cleared (that is to say It makes perfect sense that it would happen under The race condition isn't too hard to see. Essentially, when a fresh replica set is being created on the new Mongo configuration, the meteor/tools/runners/run-mongo.js Lines 515 to 520 in b2df97b
...is satisfied prior to the replica set actually being elected as the new primary. I believe this is due to an idle interval in the MongoDB replica set "heartbeat", which isn't fully reflected until the next heartbeat. This is indicated by this (heavily edited) log. Mongo logs start with => Running Meteor from a checkout
[[[[[ ~/Dev/Misc/Meteor/tests/mongo-miss ]]]]]
=> Started proxy.
... trying to get mongo started ...
MONGO: I CONTROL [initandlisten] MongoDB starting : pid=89940 port=3001 dbpath=.meteor/local/db 64-bit
MONGO: I NETWORK [initandlisten] waiting for connections on port 3001
readyToTalkPromise has been resolved (ALREADY!)
** replicaset is NOT READY YET. initiating replicaset **
MONGO: I REPL [conn1] replSetInitiate config object with 1 members parses ok
MONGO: I REPL [ReplicationExecutor] New replica set config in use: { ... }
MONGO: I REPL [ReplicationExecutor] transition to STARTUP2
MONGO: I REPL [conn1] Starting replication applier threads
MONGO: I COMMAND [conn1] command local.replset.minvalid command: replSetInitiate { ... } protocol:op_query 139ms
MONGO: I REPL [ReplicationExecutor] transition to RECOVERING
MONGO: I REPL [ReplicationExecutor] transition to SECONDARY
MONGO: I REPL [ReplicationExecutor] conducting a dry run election to see if we could be elected
MONGO: I REPL [ReplicationExecutor] dry election run succeeded, running for election
waiting for election...
replicaSet State: SECONDARY
replicaSet State: SECONDARY
replicaSet State: SECONDARY
replicaSet State: SECONDARY
MONGO: I REPL [ReplicationExecutor] election succeeded, assuming primary role in term 1
MONGO: I REPL [ReplicationExecutor] transition to PRIMARY
replicaSet State: PRIMARY
^^^ We think this means things are ready, but they really aren't yet.
=> Started MongoDB.
^^^ The application has began its startup now!
MONGO: I NETWORK [initandlisten] connection accepted from 127.0.0.1:49890 #2 (1 connection now open)
MONGO: I NETWORK [initandlisten] connection accepted from 127.0.0.1:49891 #3 (2 connections now open)
MONGO: I NETWORK [initandlisten] connection accepted from 127.0.0.1:49892 #4 (3 connections now open)
^^^ this is the application connecting and getting ready to do things.
* First write to the database by accounts-base (for Meteor.users)
* Error: MongoError: not master and slaveOk=false
=> Exited with code: 1
MONGO: I REPL [rsSync] transition to primary complete; database writes are now permitted
^^^ This is ACTUALLY when the database is ready for writes though! Since the I'm guessing this is more pronounced now that Node.js 8 actually starts applications faster. Fix incoming! |
> Prologue: A heartbeat is used amongst members of a MongoDB replica set to poll the status of said members. When we are initiating a new replicaset for the test Mongo server, the replicaset is not fully prepared to accept writes until the voting members have negotiated and propagated their decision about who is the "primary" to all members involved. This seems to be delayed by almost _exactly_ the default heartbeat interval, which is 2000ms. The heartbeat interval is marked as an "internal only" property in Mongo so I was hesitant to lower it. It's also a new property in Mongo 3.2 which might explain why this cropped up a while ago. I believe this heartbeat delay is the only explanation for why the `rs.status()` (i.e. `replSetGetStatus`) believes it is ready before the `mongod` has actually printed "transition to primary complete" to the log. Fixes #9026.
Oh, that makes sense. Also I expect that the (maybe single virtual) CPU is quite clogged on CI as well. |
…tion. (#9119) * When awaiting Mongo, wait until the heartbeat has pulsed after election. > Prologue: A heartbeat is used amongst members of a MongoDB replica set to poll the status of said members. When we are initiating a new replicaset for the test Mongo server, the replicaset is not fully prepared to accept writes until the voting members have negotiated and propagated their decision about who is the "primary" to all members involved. This seems to be delayed by almost _exactly_ the default heartbeat interval, which is 2000ms. The heartbeat interval is marked as an "internal only" property in Mongo so I was hesitant to lower it. It's also a new property in Mongo 3.2 which might explain why this cropped up a while ago. I believe this heartbeat delay is the only explanation for why the `rs.status()` (i.e. `replSetGetStatus`) believes it is ready before the `mongod` has actually printed "transition to primary complete" to the log. Fixes #9026. * Replace addl. variable occurrence with the new `firstMemberState` var.
What's the resolution for this? I had to cleanup my |
I had only mongodb-tools installed (Arch), so I installed the mongodb package, and it all did go away. Not sure if this is what actually resolved it as I had other standing errors (node-fibers and node-gyp, among others), but this one disappeared with this. |
still happening for me using Meteor 1.6.0 |
Yeah, I saw it again, today, along with |
So, this was closed two months ago, but the problem is there there! What can we do to help debug this? It suddenly happened to me when I did a |
In an attempt at finding a solution, I cleaned everything from the project
And I noticed this :
I thought that, maybe, something with CoffeeScript was breaking and reverting would work. But, no, it did not. Instead, I had this issue on the client, while running tests : Well, the proposed solution is to upgrade to
Well... good.. I guess?
Well.... crap? 🤷♂️ |
It's still present on
|
Still have this issue with version Happens for me on startup with |
Can confirm this problem still persists on our CircleCI server at version |
This is still a problem in Meteor. Shouldn't this issue be re-opened so that it gets eyes on it? |
This is still happening for us on Circle CI (currently running on 1.6.1.3). Very annoying. It fails randomly. This issue should be reopened. |
I don't know how to really fix this problem, so I just reinstall Meteor, then I can test one time....... |
This is what I do in CI:
|
Same problem for me in Meteor 1.7.0.3 |
Still an issue for us as well on Meteor 1.6.1 |
@hexsprite are you still experiencing this? We tried your |
@abernix any suggestions on how we can help diagnose the issue? Any places to add logging, waits or anything else. This has gotten progressively worse for us from 1.6 --> 1.7 --> 1.8 and all the hacks we've attempted have fallen short 😞 |
@etyp I can't see why it would OOM any more than a single invocation. It's just sequentially running the tests up to 3 times and exiting early if they pass. |
So our solution here, which has worked well so far, is running a separate local mongodb and having Meteor use that. We haven't encountered the issue since, but hopefully we don't have to stick with it forever. |
@etyp can you share the config? I might switch over to use that as well |
@hexsprite sure. See modified version of our config.yml file below:
The key things to call out are:
Alternatively, if you want to just run meteor as-is (like we do elsewhere for e2e tests), just make sure to prefix the
Hope that helps. We haven't experienced this issue at all since making the change. |
Still having this problem on 1.8, February 2019 |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
* trying update based on https://unix.stackexchange.com/questions/508724/failed-to-fetch-jessie-backports-repository * added apt-transport-https based on https://askubuntu.com/questions/165676/how-do-i-fix-a-e-the-method-driver-usr-lib-apt-methods-http-could-not-be-foun * alternative solution to https://unix.stackexchange.com/questions/508724/failed-to-fetch-jessie-backports-repository * trying feedback in meteor/meteor#9026 (comment) to fix mongo not slave problem * changing port to 3001 since that's what circleci uses for mongo docker image... lets see if that works * fixed typo ordering of arguments
* copied some caching techniques from meteor/meteor#9026 * fix the meteor install from breaking * moved order to not break Meteor build
Some important notes:
Getting "MongoError: not master and slaveOk=false" since updating to beta.18. I even tried adding the mongo-dev-server package but still the same. Fine if rollback to 17. Also present in b23.
I've tried this in
devel
branch and can reliably avoid the problem by adding a 2000ms sleep after the "Started MongoDB" message is printed. So it seems like a possible race condition of some sort where Meteor thinks MongoDB is ready but it really isn't.When I have more spare time I'll try to put together a repro but perhaps this may have given you some clues.
The text was updated successfully, but these errors were encountered: