Emily Stolfo edited this page · 8 revisions

Ruby MongoDB FAQ

This is a list of frequently asked questions about using Ruby with MongoDB. If you have a question you'd like to have answered here, please post your question to the mongodb-user list.

Can I run (insert command name here) from the Ruby driver?

Yes. You can run any of the available database commands from the driver using the DB#command method. The only trick is to use an OrderedHash when specifying the command. For example, here's how you'd run an asynchronous fsync from the driver:

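require 'mongo'
include Mongo

# This command is run on the admin database.
@db = MongoClient.new('localhost', 27017).db('admin')

# Build the command. The command name must be the first key.
cmd = BSON::OrderedHash.new
cmd['fsync'] = 1
cmd['async'] = true

# Run it.
@db.command(cmd)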
It's important to keep in mind that some commands, like fsync, must be run on the admin database, while other commands can be run on any database. If you're having trouble, check the command reference (List of Database Commands) to make sure you're using the command correctly.

Does the Ruby driver support an EXPLAIN command?

Yes. explain is, technically speaking, an option sent to a query that tells MongoDB to return an explain plan rather than the query's results. You can use explain by constructing a query and calling explain at the end:

@collection = @db['users']
result = @collection.find({:name => "jones"}).explain

The result is a document describing the query plan. If the collection has an index on the "name" field, the plan shows the query using that index and scanning only a single record. "n" is the number of records the query will return. "millis" is the time the query takes, in milliseconds. "oldPlan" indicates that the query optimizer has already seen this kind of query and has, therefore, saved an efficient query plan. "allPlans" shows all the plans considered for this query.

I see that BSON supports a symbol type. Does this mean that I can store Ruby symbols in MongoDB?

You can store Ruby symbols in MongoDB, but only as values. BSON specifies that document keys must be strings. So, for instance, you can do this:

@collection = @db['test']

boat_id = @collection.insert({:vehicle  => :boat})
car_id  = @collection.insert({"vehicle" => "car"})

@collection.find_one('_id' => boat_id)
# => {"_id" => ObjectID('4bb372a8238d3b5c8c000001'), "vehicle" => :boat}

@collection.find_one('_id' => car_id)
# => {"_id" => ObjectID('4bb372a8238d3b5c8c000002'), "vehicle" => "car"}

Notice that the symbol values are returned as expected, but that symbol keys are treated as strings.

I see BSON documents with identical keys. What happened?

As a rule of thumb, always use ALL symbols or ALL strings as Ruby hash keys, never a mix of the two. As noted above, Ruby symbols are serialized into BSON strings. You could feasibly end up in the following situation:

record = collection.find_one({ :_id => "an_id" })
record.update(:something_to_update => "my_original_value")
# The record document now looks like this in Ruby: { "_id" => "an_id", :something_to_update => "my_original_value" }
# Note that the _id is a string because the original Ruby document's symbol keys were converted to BSON strings.
# You have a mix of symbol and string keys here (which breaks the rule of thumb).

collection.update({ :_id => "an_id" }, record)
# The record is serialized into BSON as: { "_id" => "an_id", "something_to_update" => "my_original_value"}
# Note that the key, "something_to_update" is now a string.

record = collection.find_one({ :_id => "an_id" })
# The record document looks like this in Ruby when deserialized: { "_id" => "an_id", "something_to_update" => "my_original_value" }
# Note that the key "something_to_update" is a string.

record.update(:something_to_update => "a_more_recent_value")
# This is allowed because the Ruby hash has a key "something_to_update" as a string and adding the symbol :something_to_update is allowed. 
# The Ruby hash is now: { "_id" => "an_id", "something_to_update" => "my_original_value", :something_to_update => "a_more_recent_value" }

collection.update({ :_id => "an_id" }, record)
# The document gets serialized to { "_id" => "an_id", "something_to_update" => "my_original_value", "something_to_update" => "a_more_recent_value"}
# because keys and values are just written to a buffer upon serialization. There is no validation on key uniqueness done either by the driver or MongoDB.

As mentioned above, make sure you use ALL symbols or ALL strings as keys in Ruby hashes being converted to BSON documents. In this particular case, another way to avoid this problem is to use update operators. Update operators have the added benefit of requiring only one trip to the server. For example:

collection.update({ :_id => "an_id"}, { "$set" => { :something_to_update => "a_more_recent_value" } } )
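
The duplicate-key situation starts in the Ruby hash itself, so it can be reproduced without a database. The following is a minimal sketch of one way to guard against it: stringify_keys is a hypothetical helper written for this illustration, not part of the driver, and it only normalizes top-level keys.

```ruby
# Hypothetical helper (not part of the driver): returns a copy of the hash
# whose top-level keys are all strings. If both "k" and :k are present,
# the entry that appears later in the hash wins.
def stringify_keys(hash)
  hash.each_with_object({}) do |(key, value), result|
    result[key.to_s] = value
  end
end

# A document as it comes back from the database: all keys are strings.
record = { "_id" => "an_id", "something_to_update" => "my_original_value" }

# Merging in a symbol key adds a second entry instead of replacing the first.
record.update(:something_to_update => "a_more_recent_value")

# Normalizing the keys collapses the two entries into one before saving.
normalized = stringify_keys(record)
```

Passing the normalized hash to the driver instead of the mixed-key hash means only one "something_to_update" entry is written to the serialization buffer.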

Why can't I access random elements within a cursor?

MongoDB cursors are designed for sequentially iterating over a result set, and all the drivers, including the Ruby driver, stick closely to this directive. Internally, a Ruby cursor fetches results in batches by running a MongoDB getmore operation. The results are buffered for efficient iteration on the application side.

What this means is that a cursor is nothing more than a device for returning a result set on a query that's been initiated on the server. Cursors are not containers for result sets. If we allow a cursor to be randomly accessed, then we run into issues regarding the freshness of the data. For instance, if I iterate over a cursor and then want to retrieve the cursor's first element, should a stored copy be returned, or should the cursor re-run the query? If we returned a stored copy, it may not be fresh. And if the query is re-run, then we're technically dealing with a new cursor.

To avoid those issues, anyone who needs flexible access to the results of a query should store those results in an array and then access the data as needed.
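
As a rough sketch of that advice, the example below copies a result set into an array once and then indexes into the array. The fake_cursor method is a hypothetical stand-in that yields documents in order, so the example runs without a server; with the driver, the object returned by find would take its place.

```ruby
# Hypothetical stand-in for a driver cursor: yields each document in order.
def fake_cursor
  Enumerator.new do |yielder|
    yielder << { "name" => "adams" }
    yielder << { "name" => "jones" }
    yielder << { "name" => "smith" }
  end
end

# Copy the results into an array once. The array is a snapshot: it will not
# reflect changes made on the server after this point.
def load_results(cursor)
  cursor.to_a
end

results = load_results(fake_cursor)
first  = results.first  # random access is now an ordinary array lookup
second = results[1]
```

The trade-off is memory: the whole result set is held on the application side, so this suits result sets of modest size.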

Why can't I save an instance of TimeWithZone?

MongoDB stores times in UTC as the number of milliseconds since the epoch. This means that the Ruby driver serializes Ruby Time objects only. While it would certainly be possible to serialize a TimeWithZone, this isn't preferable since the driver would still deserialize to a Time object.

All that said, if necessary, it'd be easy to write a thin wrapper over the driver that would store an extra time zone attribute and handle the serialization/deserialization of TimeWithZone transparently.
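
A minimal sketch of such a wrapper follows. Both helper names are hypothetical, and a plain Ruby Time with a UTC offset stands in for TimeWithZone so that the example runs without ActiveSupport. A real wrapper would more likely store the zone name, since an offset alone carries no daylight-saving rules.

```ruby
# Hypothetical helpers, not part of the driver.

# Split a zoned time into the two values that would be stored in a document:
# the instant in UTC and the offset (in seconds) needed to restore the zone.
def dump_zoned_time(time)
  { "at" => time.getutc, "utc_offset" => time.utc_offset }
end

# Rebuild a zoned time from the stored pair.
def load_zoned_time(doc)
  doc["at"].getlocal(doc["utc_offset"])
end

original = Time.new(2014, 3, 1, 9, 30, 0, "-05:00")
stored   = dump_zoned_time(original)   # "at" is a UTC Time the driver can serialize
restored = load_zoned_time(stored)     # same instant, same offset as the original
```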

I keep getting CURSOR_NOT_FOUND exceptions. What's happening?

The most likely culprit here is that the cursor is timing out on the server. Whenever you issue a query, a cursor is created on the server. Cursors naturally time out after ten minutes, which means that if you happen to be iterating over a cursor for more than ten minutes, you risk a CURSOR_NOT_FOUND exception.

There are two solutions to this problem. You can either:

  1. Limit your query. Use some combination of limit and skip to reduce the total number of query results. This will, obviously, bring down the time it takes to iterate.

  2. Turn off the cursor timeout. To do that, invoke find with a block, and pass :timeout => false:

@collection.find({}, :timeout => false) do |cursor|
  cursor.each do |document|
    # Process documents here
  end
end

I periodically see connection failures between the driver and MongoDB. Why can't the driver retry the operation automatically?

A connection failure can indicate any number of failure scenarios. Has the server crashed? Are we experiencing a temporary network partition? Is there a bug in our SSH tunnel?

Without further investigation, it's impossible to know exactly what has caused the connection failure. Furthermore, when we do see a connection failure, it's impossible to know how many operations prior to the failure succeeded. For example, imagine that we are expecting writes to be acknowledged, and we send an $inc operation to the server. It's entirely possible that the server has received the $inc but failed on the call to getLastError. In that case, retrying the operation would result in a double-increment.

Because of the indeterminacy involved, the MongoDB drivers will not retry operations on connection failure. How connection failures should be handled is entirely dependent on the application. Therefore, we leave it to the application developers to make the best decision in this case.

The drivers will reconnect on the subsequent operation.
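
If an application decides that a particular operation is safe to repeat, the retry logic can live in application code. The sketch below is one possible shape for it, under stated assumptions: with_retries is a hypothetical helper, and the ConnectionFailure class defined here is only a stand-in so the example runs without the driver. In real code you would rescue the driver's own connection exception.

```ruby
# Stand-in for the driver's connection error (an assumption for this sketch).
class ConnectionFailure < StandardError; end

# Hypothetical helper: runs the block and retries it up to max_retries times
# when the connection fails. Only use it for idempotent operations; retrying
# something like $inc could apply the change twice.
def with_retries(max_retries)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue ConnectionFailure
    retry if attempts <= max_retries
    raise
  end
end

# Usage: the block fails twice with a simulated error, then succeeds.
calls = 0
result = with_retries(2) do
  calls += 1
  raise ConnectionFailure, "simulated failure" if calls < 3
  :ok
end
```

Once the retries are exhausted the original exception is re-raised, so the caller still sees the failure and can decide what to do next.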

Why is the 'm' option always set when Ruby regexes are serialized to BSON regexes?

You should always first consider using the BSON::Regex class over defining regexes in Ruby if you are going to save them to the database or use them in a query.

Concerning deserialization, consider setting the compile_regex option to false in case a BSON regular expression won't compile correctly to a Ruby regular expression.

coll.find({}, :compile_regex => false)

With this option set to false, BSON regular expressions will be deserialized to instances of the BSON::Regex class. BSON::Regex instances define the regex and flags separately and aren't compiled, unless you call the #try_compile method on them.

In Ruby, the ^ and $ characters ALWAYS match the start and end of a line. This is not configurable. BSON regular expressions (and many other regex implementations) do make this an option and refer to it as the MULTILINE mode ('m').

Since Ruby does not allow this as an option for regular expressions, it instead uses the MULTILINE flag to mean that the dot (.) matches newlines. BSON regular expressions refer to this as the DOTALL mode ('s').
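
Both behaviors can be checked in plain Ruby, without the driver. The method name below is made up for this illustration; it simply reports how a few patterns behave against a two-line string.

```ruby
# Hypothetical helper: reports how Ruby treats ^ and . on the given text.
def ruby_regex_behaviour(text)
  {
    # ^ matches at the start of every line, with no flag needed.
    :caret_matches_after_newline => !(text =~ /^Mich/).nil?,
    # \A anchors to the start of the whole string instead.
    :string_start_matches        => !(text =~ /\AMich/).nil?,
    # Without the m flag, the dot does not match a newline.
    :dot_crosses_newline         => !(text =~ /Anna.Mich/).nil?,
    # With the m flag, it does (what BSON calls DOTALL, 's').
    :dot_crosses_newline_with_m  => !(text =~ /Anna.Mich/m).nil?
  }
end

behaviour = ruby_regex_behaviour("Anna\nMichael")
```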

As an example, consider the following regular expression defined in your Ruby code:

collection.find({ name: /^Mich/ })

Because the behavior that BSON regular expressions call the "m" option is always on in Ruby regular expressions, the above expression translates to:

query: { $query: { name: /^Mich/m } }

If you are looking to create a BSON regular expression without the m flag, you should use the BSON::Regex class:

collection.find({ name: BSON::Regex.new("^Mich") })

will correspond to the following expression on the server:

query: { $query: { name: /^Mich/ } }

Additionally, be aware that defining a Ruby regular expression with the "m" option will translate to an expression with the "s" option on the server:

collection.find({ name: /^Mich/m })

will become:

query: { $query: { name: /^Mich/ms } }

I occasionally get an error saying that responses are out of order. What's happening?

See (this JIRA issue)[].
