on collections sharded by "_id", Mongoid::Locker can't get a lock #19

Open
mepatterson opened this issue Sep 11, 2013 · 5 comments

@mepatterson

(at least in my production environment)
On any collection sharded by the shard key "_id", this code

# irb session, Ruby 2.0.0p247
i = Item.first
i.with_lock do
  puts i.inspect
end

throws the exception "Mongoid::Locker::LockError: could not get lock"
from line 148 in lib/mongoid/locker.rb 'lock'

A collection sharded by some other key doesn't seem to have this problem, nor does an unsharded collection.

@afeld
Collaborator

afeld commented Sep 11, 2013

consistently?? iiiiiinteresting. not sure how easy it will be to replicate that in a test environment :-/ will try to set up a couple of sharded mongo instances locally.

just for due diligence, would you mind upping the :retries value and the timeout value and see if that makes any difference?

@mepatterson
Author

Yeah, man. I tried EVERYTHING. The only reason I figured it out is I had 3 collections, two sharded on "_id" and one sharded on some other field. The latter was the only one that didn't throw the locker exception. So I had my ops guy rebuild the other two collections with different shard keys and it started working perfectly, no code changes on my side.

Certainly open to the idea that you might discover something even more insidious going on, but that's what we determined.

I traced it down to your lock() method, where you do the atomic check to see whether the document is already locked or the lock can be acquired (and then acquire it). On my "_id"-sharded collections, that would fail (return false) on a totally new, totally unlocked document with nils for all the locked* fields. At that point I couldn't see an obvious problem, but you do use "_id" in your atomic query, so perhaps something is going on there when a collection is sharded by _id?
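
For anyone following along, the kind of atomic check-then-acquire step described above can be sketched roughly like this. It is a minimal in-memory illustration and not Mongoid::Locker's actual code: `FakeCollection` and `acquire_lock` are hypothetical names, the `locked_at` / `locked_until` field names are assumed, and a Mutex stands in for the atomicity that a single conditional update on the server would provide.

```ruby
# Hypothetical in-memory stand-in for a collection (NOT the gem's code).
class FakeCollection
  def initialize
    @docs = {}
    @mutex = Mutex.new
  end

  # Store a copy of the document, keyed by its _id.
  def insert(doc)
    @docs[doc[:_id]] = doc.dup
  end

  def find(id)
    @docs[id]
  end

  # Check and acquire in one step: succeeds only if the document exists and
  # is either unlocked (locked_until is nil) or its lock has expired.
  # Returns true when the lock was taken, false otherwise.
  def acquire_lock(id, now:, timeout:)
    @mutex.synchronize do
      doc = @docs[id]
      return false if doc.nil?

      locked_until = doc[:locked_until]
      return false unless locked_until.nil? || locked_until <= now

      doc[:locked_at] = now
      doc[:locked_until] = now + timeout
      true
    end
  end
end
```

In this sketch a brand-new document with nil lock fields passes the check and gets the lock on the first attempt, which is the case that was returning false on the "_id"-sharded collections.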

@mepatterson
Author

I set the retries to 20 or something and it just spun and spun and then threw the exception
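
A rough sketch of why a higher retry count makes no difference when the acquire step itself never succeeds. `with_retries` and `LockError` here are hypothetical stand-ins for illustration, not the gem's API; the block passed in plays the role of the atomic acquire attempt.

```ruby
# Hypothetical error class and retry helper (NOT Mongoid::Locker's API).
class LockError < StandardError; end

# Calls the block up to `retries` times, pausing `sleep_for` seconds between
# failed attempts. Returns the number of attempts used on success, and
# raises LockError once every attempt has failed.
def with_retries(retries:, sleep_for: 0)
  attempts = 0
  retries.times do
    attempts += 1
    return attempts if yield
    sleep(sleep_for) if sleep_for > 0
  end
  raise LockError, "could not get lock after #{attempts} attempts"
end
```

If the block returns false every time, as the acquire step did here, `with_retries(retries: 20) { false }` just runs all 20 attempts and then raises.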

@afeld
Collaborator

afeld commented Sep 11, 2013

Sharding/replication are the things about Mongo I know the least about, so I might pop over to the MongoDB office hours they hold in NYC to see if they have ideas.

Just a stab in the dark, but what indexes do you have on that collection that fails? Any compound indexes that include the _id?

@mepatterson
Author

Nope. One of the two troubled collections has a bunch of compound indexes, but none with _id
