This repository has been archived by the owner on Dec 22, 2020. It is now read-only.

Initial open-source release
nelhage committed Jan 22, 2013
0 parents commit df1dcf5
Showing 20 changed files with 1,122 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
collections.yml
/.bundle/
4 changes: 4 additions & 0 deletions Gemfile
@@ -0,0 +1,4 @@
source 'https://rubygems.org'

gemspec

48 changes: 48 additions & 0 deletions Gemfile.lock
@@ -0,0 +1,48 @@
GIT
  remote: git@github.com:stripe-internal/mongoriver
  revision: d5b5ca1471f9efe7c91b3abe2c26f612a2dd4e9c
  ref: d5b5ca1471f9efe7c91b3abe2c26f612a2dd4e9c
  specs:
    mongoriver (0.0.1)
      bson_ext
      log4r
      mongo (>= 1.7)

PATH
  remote: .
  specs:
    mosql (0.0.1)
      bson_ext
      json
      log4r
      mongo
      pg
      rake
      sequel

GEM
  remote: https://intgems.stripe.com:446/
  specs:
    bson (1.7.1)
    bson_ext (1.7.1)
      bson (~> 1.7.1)
    json (1.7.5)
    log4r (1.1.10)
    metaclass (0.0.1)
    minitest (3.0.0)
    mocha (0.10.5)
      metaclass (~> 0.0.1)
    mongo (1.7.1)
      bson (~> 1.7.1)
    pg (0.14.1)
    rake (10.0.2)
    sequel (3.41.0)

PLATFORMS
  ruby

DEPENDENCIES
  minitest
  mocha
  mongoriver!
  mosql!
160 changes: 160 additions & 0 deletions README.md
@@ -0,0 +1,160 @@
# MoSQL: a MongoDB → SQL streaming translator

At Stripe, we love MongoDB. We love the flexibility it gives us in
changing data schemas as we grow and learn, and we love its
operational properties. We love replsets. We love the uniform query
language that doesn't require generating and parsing strings, tracking
placeholder parameters, or any of that nonsense.

The thing is, we also love SQL. We love the ease of doing ad-hoc data
analysis over small-to-mid-size datasets in SQL. We love doing JOINs
to pull together reports summarizing properties across multiple
datasets. We love the fact that virtually every employee we hire
already knows SQL and is comfortable using it to ask and answer
questions about data.

So, we thought, why can't we have the best of both worlds? Thus:
MoSQL.

# MoSQL: Put Mo' SQL in your NoSQL

![MoSQL](https://stripe.com/img/blog/posts/mosql/mosql.png)

MoSQL imports the contents of your MongoDB database cluster into a
PostgreSQL instance, using an oplog tailer to keep the SQL mirror
continuously up to date. This lets you run production services against
a MongoDB database, and then run offline analytics or reporting using
the full power of SQL.

## Installation

Install from Rubygems as:

    $ gem install mosql

Or build from source by:

    $ gem build mosql.gemspec

And then install the built gem.

## The Collection Map file

To define a SQL schema and import your data, MoSQL needs a
collection map file describing the schema of your MongoDB data. (Don't
worry -- MoSQL can handle it if your MongoDB data doesn't always exactly
fit the stated schema. More on that later.)

The collection map is a YAML file describing the databases and
collections in Mongo that you want to import, in terms of their SQL
types. An example collection map might be:


    mongodb:
      blog_posts:
        :columns:
        - _id: TEXT
        - author: TEXT
        - title: TEXT
        - created: DOUBLE PRECISION
        :meta:
          :table: blog_posts
          :extra_props: true

Said another way, the collection map is a YAML file containing a hash
mapping

    <Mongo DB name> -> { <Mongo Collection Name> -> <Collection Definition> }

where a `<Collection Definition>` is a hash with `:columns` and
`:meta` fields. `:columns` is a list of one-element hashes, each mapping
a field name to a SQL type. It must include at least an `_id`
mapping. `:meta` contains metadata about this collection/table. It
must include at least `:table`, naming the SQL table this
collection will be mapped to. `:extra_props` determines the handling of
unknown fields in MongoDB objects -- more about that later.

By default, `mosql` looks for a collection map in a file named
`collections.yml` in your current working directory, but you can
specify a different one with `-c` or `--collections`.
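
As a rough illustration of the structure described above, the following
sketch loads the example map with Ruby's standard `yaml` library and
checks the two stated requirements (an `_id` column and a `:table`
entry). The helpers `load_collection_map` and `collection_map_errors`
are hypothetical names invented for this example; they are not part of
MoSQL, and MoSQL's own loading and validation may well differ.

```ruby
require 'yaml'

# The example collection map from above, inlined so the sketch is
# self-contained.
EXAMPLE_MAP = <<~MAP
  mongodb:
    blog_posts:
      :columns:
      - _id: TEXT
      - author: TEXT
      - title: TEXT
      - created: DOUBLE PRECISION
      :meta:
        :table: blog_posts
        :extra_props: true
MAP

# Hypothetical helper (not MoSQL's API): parse a collection map.
# Keys such as ":columns" are YAML-encoded Ruby symbols, so Symbol has
# to be permitted explicitly when using safe_load.
def load_collection_map(text)
  YAML.safe_load(text, permitted_classes: [Symbol])
end

# Hypothetical helper (not MoSQL's API): return a list of problems found,
# based only on the two requirements stated in this README.
def collection_map_errors(map)
  errors = []
  map.each do |db_name, collections|
    collections.each do |coll_name, defn|
      where = "#{db_name}.#{coll_name}"
      # :columns is a list of one-element hashes (field name => SQL type).
      fields = (defn[:columns] || []).flat_map(&:keys)
      errors << "#{where}: :columns has no _id mapping" unless fields.include?('_id')
      errors << "#{where}: :meta has no :table" unless (defn[:meta] || {})[:table]
    end
  end
  errors
end

map = load_collection_map(EXAMPLE_MAP)
puts collection_map_errors(map).empty? ? 'collection map looks OK' : collection_map_errors(map)
```

Note that field names such as `_id` parse as plain strings, while
`:columns`, `:meta`, `:table` and `:extra_props` parse as symbols.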

## Usage

Once you have a collection map, MoSQL usage is easy. The basic form
is:

    mosql [-c collections.yml] [--sql postgres://sql-server/sql-db] [--mongo mongodb://mongo-uri]

By default, `mosql` connects to both PostgreSQL and MongoDB instances
running on default ports on localhost without authentication. You can
point it at different targets using the `--sql` and `--mongo`
command-line parameters.

`mosql` will:

1. Create the appropriate SQL tables
2. Import data from the Mongo database
3. Start tailing the mongo oplog, propagating changes from MongoDB to SQL.


After the first run, `mosql` will store the status of the optailer in
the `mongo_sql` table in your SQL database, and automatically resume
where it left off. `mosql` uses the replset name to keep track of
which MongoDB database it's tailing, so that you can tail multiple
databases into the same SQL database. If for some reason you want to
tail the same replset more than once, or multiple replsets with the
same name, you can use the `--service` flag to change the name `mosql`
uses to track state.

You likely want to run `mosql` against a secondary node, at least for
the initial import, which will cause large amounts of disk activity on
the target node. One option is to use read preferences in your
connection URI:

    mosql --mongo mongodb://node1,node2,node3?readPreference=secondary

## Advanced usage

For advanced scenarios, you can pass options to control mosql's
behavior. If you pass `--skip-tail`, mosql will do the initial import,
but not tail the oplog. This could be used, for example, to do an
import off of a backup snapshot, and then start the tailer on the live
cluster.

If you need to force a fresh reimport, pass `--reimport`, which will
cause `mosql` to drop tables, create them anew, and do another import.

## Schema mismatches and _extra_props

If MoSQL encounters values in the MongoDB database that don't fit
within the stated schema (e.g. a floating-point value in an INTEGER
field), it will log a warning, ignore the entire object, and continue.

If it encounters a MongoDB object with fields not listed in the
collection map, it will discard the extra fields, unless
`:extra_props` is set in the `:meta` hash. If it is, it will collect
any such unlisted fields, JSON-encode them as a hash, and store the
resulting text in `_extra_props` in SQL. It's up to you to do
something useful with the JSON. One option is to use [plv8][plv8] to
parse it inside PostgreSQL, or you can just pull the JSON out whole
and parse it in application code.

This is also currently the only way to handle array or object values
inside records -- specify `:extra_props`, and they'll get JSON-encoded
into `_extra_props`. There's no reason we couldn't support
JSON-encoded values for individual columns/fields, but we haven't
written that code yet.
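
The behaviour described in this section can be sketched in a few lines
of plain Ruby. This is an illustration under stated assumptions, not
MoSQL's actual code: `document_to_row` is a hypothetical helper, and the
MongoDB document is represented as an ordinary Ruby hash with string
keys.

```ruby
require 'json'

# Hypothetical helper (not MoSQL's API): turn one MongoDB document into a
# SQL row hash, given the :columns list of a collection definition.
def document_to_row(doc, columns, extra_props: false)
  # :columns is a list of one-element hashes (field name => SQL type).
  known = columns.flat_map(&:keys)

  # Copy only the fields listed in the collection map.
  row = {}
  known.each { |name| row[name] = doc[name] }

  if extra_props
    # Gather every unlisted field (including arrays and nested objects)
    # and store them JSON-encoded in a single text column.
    extras = doc.reject { |name, _value| known.include?(name) }
    row['_extra_props'] = JSON.generate(extras)
  end

  row
end

columns = [{ '_id' => 'TEXT' }, { 'title' => 'TEXT' }]
doc = { '_id' => 'abc123', 'title' => 'Hello', 'tags' => %w[ruby sql], 'views' => 3 }

p document_to_row(doc, columns)                     # unlisted fields are dropped
p document_to_row(doc, columns, extra_props: true)  # unlisted fields kept as JSON
```

Application code can then recover the extra fields with
`JSON.parse(row['_extra_props'])`.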

[plv8]: http://code.google.com/p/plv8js/

# Development

Patches and contributions are welcome! Please fork the project and
open a pull request on [github][github], or just report issues.

MoSQL includes a small but hopefully growing test suite. It assumes
running PostgreSQL and MongoDB instances on the local host. You can
point it at a different target via environment variables; see
`test/functional/_lib.rb` for more information.

[github]: https://github.com/stripe/mosql
12 changes: 12 additions & 0 deletions Rakefile
@@ -0,0 +1,12 @@
require 'rake/testtask'

task :default
task :build

Rake::TestTask.new do |t|
  t.libs = ["lib"]
  t.verbose = true
  t.test_files = FileList['test/**/*.rb'].reject do |file|
    file.end_with?('_lib.rb')
  end
end
7 changes: 7 additions & 0 deletions bin/mosql
@@ -0,0 +1,7 @@
#!/usr/bin/env ruby

require 'rubygems'
require 'bundler/setup'
require 'mosql/cli'

MoSQL::CLI.run(ARGV)
11 changes: 11 additions & 0 deletions lib/mosql.rb
@@ -0,0 +1,11 @@
require 'log4r'
require 'mongo'
require 'sequel'
require 'mongoriver'
require 'json'

require 'mosql/version'
require 'mosql/log'
require 'mosql/sql'
require 'mosql/schema'
require 'mosql/tailer'