Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INFO Exception in queued task: Error: No replica set primary available for query with ReadPreference PRIMARY #1060

Closed
stephentcannon opened this issue May 20, 2013 · 49 comments

Comments

@stephentcannon
Copy link

@stephentcannon stephentcannon commented May 20, 2013

Seeing this a lot on apps that have been migrated to production-db-a3.meteor.io:27017/atomic_etherpos_com

[Mon May 20 2013 16:46:30 GMT+0000 (UTC)] INFO Exception in queued task: Error: No replica set primary available for query with ReadPreference PRIMARY
at Object.Future.wait (/meteor/dev_bundles/0.3.0/lib/node_modules/fibers/future.js:319:16)
at _.extend._nextObject (app/packages/mongo-livedata/mongo_driver.js:466:47)
at _.extend.forEach (app/packages/mongo-livedata/mongo_driver.js:487:22)
at _.extend.getRawObjects (app/packages/mongo-livedata/mongo_driver.js:531:12)
at _.extend.pollMongo (app/packages/mongo-livedata/mongo_driver.js:802:46)
at Object.
.extend._unthrottledEnsurePollIsScheduled as task
at _.extend._run (app/packages/meteor/fiber_helpers.js:124:18)
at _.extend._scheduleRun (app/packages/meteor/fiber_helpers.js:102:14)
- - - - -
at ReplSet.checkoutReader (/meteor/containers/003e5cf7-b661-9001-1f2c-0e5139f21cdb/bundle/app/packages/mongo-livedata/node_modules/mongodb/lib/mongodb/connection/repl_set.js:1099:14)
at Cursor.nextObject (/meteor/containers/003e5cf7-b661-9001-1f2c-0e5139f21cdb/bundle/app/packages/mongo-livedata/node_modules/mongodb/lib/mongodb/cursor.js:638:108)
at Future.wrap as _synchronousNextObject
at _.extend._nextObject (app/packages/mongo-livedata/mongo_driver.js:466:22)
at _.extend.forEach (app/packages/mongo-livedata/mongo_driver.js:487:22)
at _.extend.getRawObjects (app/packages/mongo-livedata/mongo_driver.js:531:12)
at _.extend.pollMongo (app/packages/mongo-livedata/mongo_driver.js:802:46)
at Object.
.extend._unthrottledEnsurePollIsScheduled as task
at _.extend._run (app/packages/meteor/fiber_helpers.js:124:18)
at _.extend._scheduleRun (app/packages/meteor/fiber_helpers.js:102:14)

@stephentcannon
Copy link
Author

@stephentcannon stephentcannon commented May 20, 2013

from meteor mongo shell into db I get this weirdness

Mon May 20 12:49:24.922 Socket recv() errno:104 Connection reset by peer 54.235.126.126:27017
Mon May 20 12:49:24.923 SocketException: remote: 54.235.126.126:27017 error: 9001 socket exception [1] server [54.235.126.126:27017]
Mon May 20 12:49:24.923 DBClientCursor::init call() failed
Mon May 20 12:49:24.925 JavaScript execution failed: Error: error doing query: failed at src/mongo/shell/query.js:L78
Mon May 20 12:49:24.926 trying reconnect to production-db-a3.meteor.io:27017
Mon May 20 12:49:25.207 reconnect production-db-a3.meteor.io:27017 ok
Mon May 20 12:49:25.778 reconnect: auth failed db:userSource: "atomic_etherpos_com" user:user: "client" { code: 18, ok: 0.0, errmsg: "auth fails" }

db.registers.findOne()
Mon May 20 12:49:48.124 JavaScript execution failed: error: {
"$err" : "not authorized for query on atomic_etherpos_com.registers",
"code" : 16550
} at src/mongo/shell/query.js:L128

@BenjaminRH
Copy link

@BenjaminRH BenjaminRH commented May 20, 2013

This is a major issue! I'm getting this too, and nobody can currently log into my site with external services.

Exception while invoking method 'login' Error: No replica set primary available for query with ReadPreference PRIMARY
@n1mmy
Copy link
Member

@n1mmy n1mmy commented May 20, 2013

I'm looking into this.

@n1mmy
Copy link
Member

@n1mmy n1mmy commented May 20, 2013

I've restarted a bunch of things, hopefully this is better now. Still looking to find the root cause.

@BenjaminRH
Copy link

@BenjaminRH BenjaminRH commented May 20, 2013

Hmm... Doesn't seem to have solved the issue. Incidentally, thanks for the really quick response!

@n1mmy
Copy link
Member

@n1mmy n1mmy commented May 20, 2013

@BenjaminRH what is your app's name? (feel free to send to me personally at nim@meteor.com if it is a secret)

@BenjaminRH
Copy link

@BenjaminRH BenjaminRH commented May 20, 2013

http://taskly.meteor.com. I was testing it with the person I'm doing it for today, or it wouldn't really be a problem at all.

@stephentcannon
Copy link
Author

@stephentcannon stephentcannon commented May 20, 2013

I had to redeploy after n1mmy last update and it seems fine.

Mongo was misbehavin! It seemed related to inserts, updates and possibly removes. Reads seemed okay but inserts and updates were seemed like the culprit but it is hard to tell from my perspective.

@BenjaminRH
Copy link

@BenjaminRH BenjaminRH commented May 20, 2013

Ahh, I haven't tried redeploying, it didn't occur to me that would be an issue. I'll try that now.

@n1mmy
Copy link
Member

@n1mmy n1mmy commented May 20, 2013

@BenjaminRH I restarted and moved your app. Has this cleared up the error for you?

@BenjaminRH
Copy link

@BenjaminRH BenjaminRH commented May 20, 2013

Yep, it's working fine! Thanks very much @n1mmy

@stephentcannon
Copy link
Author

@stephentcannon stephentcannon commented May 20, 2013

Should I move my apps? I figured redeploying should work but it wasn't for about an hour until after I reported this and you worked your wizardry!

@n1mmy
Copy link
Member

@n1mmy n1mmy commented May 20, 2013

@stephentcannon Is your app still broken too? What are your app names? (again, feel free to email me if you want to keep them secret)

@stephentcannon
Copy link
Author

@stephentcannon stephentcannon commented May 20, 2013

No, it is working now. It wasn't when you first announced that the issue was fixed but I just deployed to see if that would fix it and it is working now. Thank you very much though for fixing this!

@n1mmy
Copy link
Member

@n1mmy n1mmy commented May 20, 2013

No prob. Still working to address the root cause. Sorry for the inconvenience!

@stephentcannon
Copy link
Author

@stephentcannon stephentcannon commented May 20, 2013

Please keep us updated. We just saw it happen again. This time just redeploying seems to have fixed it whereas earlier today that didn't seem to work. Definitely appreciate the help.

@n1mmy
Copy link
Member

@n1mmy n1mmy commented May 20, 2013

The root of the issue is that the mongo driver is not reconnecting to the new server after a mongodb handoff.

There was just another handoff, so I've restarted everyone's apps again.

There are a couple next steps here:

  1. stabilize the A cluster. I think the A cluster is a bit overloaded after the last batch of apps I moved, so I'm gonna try and move some of the big traffic sources off. Maybe also see if there is a single app or two causing all the load and see if they need an index or something.

  2. figure out why the mongo driver isn't reconnecting, and fix it. Possibly this entails upgrading to the newest driver? Our version is getting old at this point, and the changelog has a note replica set reconnects.

@n1mmy
Copy link
Member

@n1mmy n1mmy commented May 21, 2013

I just did one more mongo handoff. I already restarted all the apps, so hopefully no one will notice. Now it's on a bigger server and will hopefully handle the load better.

@n1mmy
Copy link
Member

@n1mmy n1mmy commented May 21, 2013

Ok, I think I've got this nailed finally.

As I said above the immediate cause of the issue was the production-a cluster becoming overloaded and handing off to a new primary. Here's what I've done to address the overload:

  • upgraded the production-a to a larger instance size. It now has 4x as many cores, and somewhat beefier IO. It looks like this is handling the load much better =)
  • examined the logs and found an app with a frequent large query. added an index to their collection. This reduced mongo CPU by about 40%.

The deeper cause was that the node mongo driver does not reconnect correctly, so the error persists until the app is restarted. I was able to replicate this by forcing handoffs on a test cluster. It turns out the mongo driver only reconnects after handoff when a write occurs, not when there are only reads! This was fixed when they re-wrote the whole HA/replica set connection code in node-mongo-native 1.3.

There are two things you can do to fix your app so it doesn't break on handoff:

  • Run with the upgraded driver. Run from the devel branch, run with --release issue-1060-test-1, or cherry pick 9822ace. This will be in the next meteor release.
  • Add a periodic write to your app. It doesn't actually have to write anything. Eg:
Meteor.setInterval(function () {
  SomeCollection.update({_id: 'NoSuch'}, {$unset: {nosuch: 1}});
}, 30000);

This will force the mongo driver to reconnect when mongo hands over.

@n1mmy n1mmy closed this May 21, 2013
@stephentcannon
Copy link
Author

@stephentcannon stephentcannon commented May 21, 2013

Doh! I hope it isn't my app with the big query! If it is/was please tell me. Thanks Nick! I owe you a beer, actually lots of beers!

@n1mmy
Copy link
Member

@n1mmy n1mmy commented May 21, 2013

Just doing my job =)

@Primigenus
Copy link
Contributor

@Primigenus Primigenus commented Aug 12, 2013

This is happening for me today:

[Mon Aug 12 2013 12:20:33 GMT+0000 (UTC)] INFO Exception in queued task: Error: No replica set primary available for query with ReadPreference PRIMARY
at Object.Future.wait (/meteor/dev_bundles/0.3.6/lib/node_modules/fibers/future.js:319:16)
at _.extend._nextObject (app/packages/mongo-livedata/mongo_driver.js:466:47)
at _.extend.forEach (app/packages/mongo-livedata/mongo_driver.js:493:22)
at _.extend.getRawObjects (app/packages/mongo-livedata/mongo_driver.js:537:12)
at _.extend._pollMongo (app/packages/mongo-livedata/mongo_driver.js:808:46)
at Object._.extend._unthrottledEnsurePollIsScheduled [as task] (app/packages/mongo-livedata/mongo_driver.js:752:12)
at _.extend._run (app/packages/meteor/fiber_helpers.js:143:18)
at _.extend._scheduleRun (app/packages/meteor/fiber_helpers.js:121:14)
- - - - -
at ReplSet.checkoutReader (/meteor/containers/4945e4c4-4cdc-4d05-7f5b-36709e6a9049/bundle/app/packages/mongo-livedata/node_modules/mongodb/lib/mongodb/connection/repl_set/repl_set.js:537:14)
at __executeQueryCommand (/meteor/containers/4945e4c4-4cdc-4d05-7f5b-36709e6a9049/bundle/app/packages/mongo-livedata/node_modules/mongodb/lib/mongodb/db.js:1448:38)
at Db._executeQueryCommand (/meteor/containers/4945e4c4-4cdc-4d05-7f5b-36709e6a9049/bundle/app/packages/mongo-livedata/node_modules/mongodb/lib/mongodb/db.js:1640:5)
at Cursor.nextObject (/meteor/containers/4945e4c4-4cdc-4d05-7f5b-36709e6a9049/bundle/app/packages/mongo-livedata/node_modules/mongodb/lib/mongodb/cursor.js:673:13)
at Future.wrap [as _synchronousNextObject] (/meteor/dev_bundles/0.3.6/lib/node_modules/fibers/future.js:30:6)
at _.extend._nextObject (app/packages/mongo-livedata/mongo_driver.js:466:22)
at _.extend.forEach (app/packages/mongo-livedata/mongo_driver.js:493:22)
at _.extend.getRawObjects (app/packages/mongo-livedata/mongo_driver.js:537:12)
at _.extend._pollMongo (app/packages/mongo-livedata/mongo_driver.js:808:46)
at Object._.extend._unthrottledEnsurePollIsScheduled [as task] (app/packages/mongo-livedata/mongo_driver.js:752:12)

The affected sites are q42.nl and q42.com, both hosted with meteor.com. You can see the effect by going to q42.com/blog, where none of the blog posts will load; you'll just see a loading spinner indefinitely.

@avital
Copy link
Contributor

@avital avital commented Aug 12, 2013

@Primigenus Are you running on the latest Meteor release? @n1mmy upgraded the mongo driver in 0.6.4 to a new one with better reconnect logic.

@Primigenus
Copy link
Contributor

@Primigenus Primigenus commented Aug 12, 2013

@avital Yep, the site has been running on 0.6.4 for the past two months (https://github.com/Q42/q42.nl/blob/develop/.meteor/release)

@avital
Copy link
Contributor

@avital avital commented Aug 12, 2013

Looks like it's back up. Did you redeploy? (I was looking into Mongo but couldn't find a reason for the problem. There was definitely a primary on the replica set used by your app.)

@wackazong
Copy link

@wackazong wackazong commented Aug 12, 2013

I have this at the moment since today 11:00 CET. Will redeploy now and see.

@Primigenus
Copy link
Contributor

@Primigenus Primigenus commented Aug 12, 2013

I didn't redeploy, no. Still showing loading spinner for me. Should I redeploy to see if it fixes the problem?

@wackazong
Copy link

@wackazong wackazong commented Aug 12, 2013

I meant to say: My application has the same problem right now. I redeployed it now, that fixed it. Some data loss, though.

@Primigenus
Copy link
Contributor

@Primigenus Primigenus commented Aug 12, 2013

I also just redeployed and that seems to have fixed it for me also. Not sure if there's any data loss as we don't write very often.

@avital
Copy link
Contributor

@avital avital commented Aug 13, 2013

The problem was that one of the mongo nodes in our replica set failed and reached a state it could not get out of (this is the cause of the small amount of data loss). In addition, the mongo driver got stuck trying to connect to the failed node, causing apps to go down. We restored this node and restarted all apps. At the moment we believe that all apps should be functioning. We're still trying to fully understand the root cause and improve our monitoring so that we can more quickly detect this failure state.

@Primigenus
Copy link
Contributor

@Primigenus Primigenus commented Aug 13, 2013

Thanks for taking a look at this, @avital. Everything seems to be fine for now.

@TimothyAhong
Copy link

@TimothyAhong TimothyAhong commented May 5, 2015

I'm running into this issue again when I attempt to deploy via meteor deploy x.meteor.com. I have posted more details on stackoverflow: http://stackoverflow.com/questions/30063102/meteorjs-mongodb-deploy-error

@nabiltntn
Copy link

@nabiltntn nabiltntn commented May 12, 2015

It's back again, i have the same issue after redeploying my app, the same error in logs :No replica set primary available for query with ReadPreference PRIMARY.
The url of my application : http://www.tuniliv.com

Thanks.

@RazZziel
Copy link

@RazZziel RazZziel commented May 12, 2015

My site is still offline since yesterday evening, even after a re-deploy: http://pastebin.com/raw.php?i=UY6z9Bqr
I can't even extract the mongo url with meteor mongo --url {site}: http://pastebin.com/raw.php?i=cSgYLmLQ

@jirkamelichar
Copy link

@jirkamelichar jirkamelichar commented May 12, 2015

I'm having exactly the same issue as well when deploying to *.meteor.com

Log: http://pastebin.com/GpWf8i6f

Connection to Mongo via meteor mongo --url {site} not working either.

@cpalanzo
Copy link

@cpalanzo cpalanzo commented May 12, 2015

Also having this problem since this morning after deploy.

@RazZziel
Copy link

@RazZziel RazZziel commented May 12, 2015

Now it's working again for me

@nabiltntn
Copy link

@nabiltntn nabiltntn commented May 12, 2015

Me too, things are getting better !

@jirkamelichar
Copy link

@jirkamelichar jirkamelichar commented May 12, 2015

Working now - thank you!

@glasser
Copy link
Member

@glasser glasser commented May 12, 2015

We are aware of this outage to some of our Mongo servers and are actively working on resolving it.

@glasser
Copy link
Member

@glasser glasser commented May 12, 2015

(Let's track this in #4373.)

@theartsnetwork
Copy link

@theartsnetwork theartsnetwork commented May 13, 2015

I am facing the same issue with my application site buzzin.artfrontierproject.com since night before yesterday. I couldn't connect with mongoBD and error page showing "This site is loading" message. I am waiting for this issue to be resolved sooner. Please help.

@abhisheksoni27
Copy link

@abhisheksoni27 abhisheksoni27 commented Jul 8, 2015

How can this be resolved?

@jaredtmartin
Copy link

@jaredtmartin jaredtmartin commented Jul 14, 2015

I'm having this problem with returnvisits.meteor.com
To be more specific, I keep getting:
WARNING Error: No replica set primary available for query with ReadPreference PRIMARY

@niranjans
Copy link

@niranjans niranjans commented Aug 26, 2015

I am getting this too on my App deployed on Modulus.

Exception while invoking method 'login' Error: No replica set primary available for query with ReadPreference PRIMARY
@mrcoles
Copy link

@mrcoles mrcoles commented Dec 15, 2015

This is happening to me for the last couple days on http://instaslideshow.meteor.com/

@NikosEfthias
Copy link

@NikosEfthias NikosEfthias commented May 29, 2016

i have the same problem

@motomux
Copy link

@motomux motomux commented Jun 10, 2016

I got the same issue with Meteor 1.2.1.
Is this issue solved on Meteor 1.3?

@33Fraise33
Copy link

@33Fraise33 33Fraise33 commented Jun 14, 2016

Some guy made a comment on a similar problem over her:
#2092 (comment)

Could this be the solution? I got this error yesterday but I don't know why exactly as the only thing that has changed is that I converted a standalone to a single replica set member for Meteor Oplog

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.