INFO Exception in queued task: Error: No replica set primary available for query with ReadPreference PRIMARY #1060
Comments
From the meteor mongo shell into the db I get this weirdness: Mon May 20 12:49:24.922 Socket recv() errno:104 Connection reset by peer 54.235.126.126:27017
|
This is a major issue! I'm getting this too, and nobody can currently log into my site with external services.
|
I'm looking into this. |
I've restarted a bunch of things, hopefully this is better now. Still looking to find the root cause. |
Hmm... Doesn't seem to have solved the issue. Incidentally, thanks for the really quick response! |
@BenjaminRH what is your app's name? (feel free to send to me personally at nim@meteor.com if it is a secret) |
http://taskly.meteor.com. I was testing it with the person I'm doing it for today, or it wouldn't really be a problem at all. |
I had to redeploy after n1mmy's last update and it seems fine. Mongo was misbehavin'! It seemed related to inserts, updates and possibly removes. Reads seemed okay, while inserts and updates seemed like the culprit, but it is hard to tell from my perspective. |
Ahh, I haven't tried redeploying, it didn't occur to me that would be an issue. I'll try that now. |
@BenjaminRH I restarted and moved your app. Has this cleared up the error for you? |
Yep, it's working fine! Thanks very much @n1mmy |
Should I move my apps? I figured redeploying should work, but it didn't until about an hour after I reported this and you worked your wizardry! |
@stephentcannon Is your app still broken too? What are your app names? (again, feel free to email me if you want to keep them secret) |
No, it is working now. It wasn't when you first announced that the issue was fixed but I just deployed to see if that would fix it and it is working now. Thank you very much though for fixing this! |
No prob. Still working to address the root cause. Sorry for the inconvenience! |
Please keep us updated. We just saw it happen again. This time just redeploying seems to have fixed it whereas earlier today that didn't seem to work. Definitely appreciate the help. |
The root of the issue is that the mongo driver is not reconnecting to the new server after a mongodb handoff. There was just another handoff, so I've restarted everyone's apps again. There are a couple next steps here:
|
I just did one more mongo handoff. I already restarted all the apps, so hopefully no one will notice. Now it's on a bigger server and will hopefully handle the load better. |
Ok, I think I've got this nailed finally. As I said above, the immediate cause of the issue was the production-a cluster becoming overloaded and handing off to a new primary. Here's what I've done to address the overload:
The deeper cause was that the node mongo driver does not reconnect correctly, so the error persists until the app is restarted. I was able to replicate this by forcing handoffs on a test cluster. It turns out the mongo driver only reconnects after handoff when a write occurs, not when there are only reads! This was fixed when they re-wrote the whole HA/replica set connection code in node-mongo-native 1.3. There are two things you can do to fix your app so it doesn't break on handoff:
This will force the mongo driver to reconnect when mongo hands over. |
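For anyone who wants to see what "force a write" could look like: here is a minimal sketch, assuming a Meteor-style collection with synchronous `insert` and `remove` on the server. The helper name `makeHeartbeat`, the `heartbeats` collection and the interval are made up for illustration and are not taken from the comment above or from Meteor itself.

```javascript
// Sketch only: makeHeartbeat and the 'heartbeats' collection are hypothetical names.
// `collection` is assumed to expose Meteor-style synchronous
// insert(doc) -> id and remove(selector) on the server.
function makeHeartbeat(collection, now) {
  now = now || Date.now;
  return function tick() {
    try {
      // Any write will do; insert then remove so the collection does not grow.
      var id = collection.insert({ at: now() });
      collection.remove({ _id: id });
      return true;
    } catch (err) {
      // During a handoff the write itself may fail; swallow the error
      // so the timer keeps ticking and the next write can trigger a reconnect.
      console.log('heartbeat write failed: ' + err.message);
      return false;
    }
  };
}
```

In server code this would be wired up roughly as `Meteor.setInterval(makeHeartbeat(new Meteor.Collection('heartbeats')), 30 * 1000)`. The 30 second interval is arbitrary, and whether a periodic write is enough depends on the driver version your app bundles.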
Doh! I hope it isn't my app with the big query! If it is/was please tell me. Thanks Nick! I owe you a beer, actually lots of beers! |
Just doing my job =) |
This is happening for me today:
The affected sites are q42.nl and q42.com, both hosted with meteor.com. You can see the effect by going to q42.com/blog, where none of the blog posts will load; you'll just see a loading spinner indefinitely. |
@Primigenus Are you running on the latest Meteor release? @n1mmy upgraded the mongo driver in 0.6.4 to a new one with better reconnect logic. |
@avital Yep, the site has been running on 0.6.4 for the past two months (https://github.com/Q42/q42.nl/blob/develop/.meteor/release) |
Looks like it's back up. Did you redeploy? (I was looking into Mongo but couldn't find a reason for the problem. There was definitely a primary on the replica set used by your app.) |
I've had this since 11:00 CET today. Will redeploy now and see. |
I didn't redeploy, no. Still showing loading spinner for me. Should I redeploy to see if it fixes the problem? |
I meant to say: My application has the same problem right now. I redeployed it now, that fixed it. Some data loss, though. |
I also just redeployed and that seems to have fixed it for me also. Not sure if there's any data loss as we don't write very often. |
The problem was that one of the mongo nodes in our replica set failed and reached a state it could not get out of (this is the cause of the small amount of data loss). In addition, the mongo driver got stuck trying to connect to the failed node, causing apps to go down. We restored this node and restarted all apps. At the moment we believe that all apps should be functioning. We're still trying to fully understand the root cause and improve our monitoring so that we can more quickly detect this failure state. |
Thanks for taking a look at this, @avital. Everything seems to be fine for now. |
I'm running into this issue again when I attempt to deploy via meteor deploy x.meteor.com. I have posted more details on stackoverflow: http://stackoverflow.com/questions/30063102/meteorjs-mongodb-deploy-error |
It's back again. I have the same issue after redeploying my app, with the same error in the logs: No replica set primary available for query with ReadPreference PRIMARY. Thanks. |
My site is still offline since yesterday evening, even after a re-deploy: http://pastebin.com/raw.php?i=UY6z9Bqr |
I'm having exactly the same issue as well when deploying to Log: http://pastebin.com/GpWf8i6f Connection to Mongo via |
Also having this problem since this morning after deploy. |
Now it's working again for me |
Me too, things are getting better! |
Working now - thank you! |
We are aware of this outage to some of our Mongo servers and are actively working on resolving it. |
(Let's track this in #4373.) |
I have been facing the same issue with my application site buzzin.artfrontierproject.com since the night before last. I can't connect to MongoDB, and the error page shows the "This site is loading" message. I hope this issue gets resolved soon. Please help. |
How can this be resolved? |
I'm having this problem with returnvisits.meteor.com |
I am getting this too on my App deployed on Modulus.
|
This is happening to me for the last couple days on http://instaslideshow.meteor.com/ |
i have the same problem |
I got the same issue with Meteor 1.2.1. |
Some guy made a comment on a similar problem over here: Could this be the solution? I got this error yesterday, but I don't know exactly why, as the only thing that has changed is that I converted a standalone to a single replica set member for Meteor Oplog |
Seeing this a lot on apps that have been migrated to production-db-a3.meteor.io:27017/atomic_etherpos_com
[Mon May 20 2013 16:46:30 GMT+0000 (UTC)] INFO Exception in queued task: Error: No replica set primary available for query with ReadPreference PRIMARY
at Object.Future.wait (/meteor/dev_bundles/0.3.0/lib/node_modules/fibers/future.js:319:16)
at _.extend._nextObject (app/packages/mongo-livedata/mongo_driver.js:466:47)
at _.extend.forEach (app/packages/mongo-livedata/mongo_driver.js:487:22)
at _.extend.getRawObjects (app/packages/mongo-livedata/mongo_driver.js:531:12)
at _.extend.pollMongo (app/packages/mongo-livedata/mongo_driver.js:802:46)
at Object._.extend._unthrottledEnsurePollIsScheduled as task
at _.extend._run (app/packages/meteor/fiber_helpers.js:124:18)
at _.extend._scheduleRun (app/packages/meteor/fiber_helpers.js:102:14)
- - - - -
at ReplSet.checkoutReader (/meteor/containers/003e5cf7-b661-9001-1f2c-0e5139f21cdb/bundle/app/packages/mongo-livedata/node_modules/mongodb/lib/mongodb/connection/repl_set.js:1099:14)
at Cursor.nextObject (/meteor/containers/003e5cf7-b661-9001-1f2c-0e5139f21cdb/bundle/app/packages/mongo-livedata/node_modules/mongodb/lib/mongodb/cursor.js:638:108)
at Future.wrap as _synchronousNextObject
at _.extend._nextObject (app/packages/mongo-livedata/mongo_driver.js:466:22)
at _.extend.forEach (app/packages/mongo-livedata/mongo_driver.js:487:22)
at _.extend.getRawObjects (app/packages/mongo-livedata/mongo_driver.js:531:12)
at _.extend.pollMongo (app/packages/mongo-livedata/mongo_driver.js:802:46)
at Object._.extend._unthrottledEnsurePollIsScheduled as task
at _.extend._run (app/packages/meteor/fiber_helpers.js:124:18)
at _.extend._scheduleRun (app/packages/meteor/fiber_helpers.js:102:14)