Initial open-source release

Commit df1dcf55085edcf0e1ae7f3d8da0591344b12d32 (0 parents), committed by nelhage on Jan 22, 2013
@@ -0,0 +1,2 @@
+collections.yml
+/.bundle/
@@ -0,0 +1,4 @@
+source 'https://rubygems.org'
+
+gemspec
+
@@ -0,0 +1,48 @@
+GIT
+ remote: git@github.com:stripe-internal/mongoriver
+ revision: d5b5ca1471f9efe7c91b3abe2c26f612a2dd4e9c
+ ref: d5b5ca1471f9efe7c91b3abe2c26f612a2dd4e9c
+ specs:
+ mongoriver (0.0.1)
+ bson_ext
+ log4r
+ mongo (>= 1.7)
+
+PATH
+ remote: .
+ specs:
+ mosql (0.0.1)
+ bson_ext
+ json
+ log4r
+ mongo
+ pg
+ rake
+ sequel
+
+GEM
+ remote: https://intgems.stripe.com:446/
+ specs:
+ bson (1.7.1)
+ bson_ext (1.7.1)
+ bson (~> 1.7.1)
+ json (1.7.5)
+ log4r (1.1.10)
+ metaclass (0.0.1)
+ minitest (3.0.0)
+ mocha (0.10.5)
+ metaclass (~> 0.0.1)
+ mongo (1.7.1)
+ bson (~> 1.7.1)
+ pg (0.14.1)
+ rake (10.0.2)
+ sequel (3.41.0)
+
+PLATFORMS
+ ruby
+
+DEPENDENCIES
+ minitest
+ mocha
+ mongoriver!
+ mosql!
@@ -0,0 +1,160 @@
+# MoSQL: a MongoDB → SQL streaming translator
+
+At Stripe, we love MongoDB. We love the flexibility it gives us in
+changing data schemas as we grow and learn, and we love its
+operational properties. We love replsets. We love the uniform query
+language that doesn't require generating and parsing strings, tracking
+placeholder parameters, or any of that nonsense.
+
+The thing is, we also love SQL. We love the ease of doing ad-hoc data
+analysis over small-to-mid-size datasets in SQL. We love doing JOINs
+to pull together reports summarizing properties across multiple
+datasets. We love the fact that virtually every employee we hire
+already knows SQL and is comfortable using it to ask and answer
+questions about data.
+
+So, we thought, why can't we have the best of both worlds? Thus:
+MoSQL.
+
+# MoSQL: Put Mo' SQL in your NoSQL
+
+![MoSQL](https://stripe.com/img/blog/posts/mosql/mosql.png)
+
+MoSQL imports the contents of your MongoDB database cluster into a
+PostgreSQL instance, using an oplog tailer to keep the SQL mirror
+continuously up to date. This lets you run production services against a MongoDB
+database, and then run offline analytics or reporting using the full
+power of SQL.
+
+## Installation
+
+Install from Rubygems as:
+
+    $ gem install mosql
+
+Or build from source by:
+
+    $ gem build mosql.gemspec
+
+And then install the built gem.
+
+## The Collection Map file
+
+In order to define a SQL schema and import your data, MoSQL needs a
+collection map file describing the schema of your MongoDB data. (Don't
+worry -- MoSQL can handle it if your MongoDB data doesn't always exactly
+fit the stated schema. More on that later.)
+
+The collection map is a YAML file describing the databases and
+collections in Mongo that you want to import, in terms of their SQL
+types. An example collection map might be:
+
+
+    mongodb:
+      blog_posts:
+        :columns:
+        - _id: TEXT
+        - author: TEXT
+        - title: TEXT
+        - created: DOUBLE PRECISION
+        :meta:
+          :table: blog_posts
+          :extra_props: true
+
+Said another way, the collection map is a YAML file containing a hash
+mapping
+
+    <Mongo DB name> -> { <Mongo Collection Name> -> <Collection Definition> }
+
+Where a `<Collection Definition>` is a hash with `:columns` and
+`:meta` fields. `:columns` is a list of one-element hashes, mapping
+field-name to SQL type. It is required to include at least an `_id`
+mapping. `:meta` contains metadata about this collection/table. It is
+required to include at least `:table`, naming the SQL table this
+collection will be mapped to. The optional `:extra_props` key in `:meta`
+determines the handling of unknown fields in MongoDB objects -- more about that later.
+
+By default, `mosql` looks for a collection map in a file named
+`collections.yml` in your current working directory, but you can
+specify a different one with `-c` or `--collections`.
+
+## Usage
+
+Once you have a collection map, MoSQL usage is easy. The basic form
+is:
+
+    mosql [-c collections.yml] [--sql postgres://sql-server/sql-db] [--mongo mongodb://mongo-uri]
+
+By default, `mosql` connects to both PostgreSQL and MongoDB instances
+running on default ports on localhost without authentication. You can
+point it at different targets using the `--sql` and `--mongo`
+command-line parameters.
+
+`mosql` will:
+
+ 1. Create the appropriate SQL tables
+ 2. Import data from the Mongo database
+ 3. Start tailing the mongo oplog, propagating changes from MongoDB to SQL.
+
+
+After the first run, `mosql` will store the status of the optailer in
+the `mongo_sql` table in your SQL database, and automatically resume
+where it left off. `mosql` uses the replset name to keep track of
+which mongo database it's tailing, so that you can tail multiple
+databases into the same SQL database. If you want to tail the same
+replSet, or multiple replSets with the same name, for some reason, you
+can use the `--service` flag to change the name `mosql` uses to track
+state.
+
+You likely want to run `mosql` against a secondary node, at least for
+the initial import, which will cause large amounts of disk activity on
+the target node. One option is to use read preferences in your
+connection URI:
+
+    mosql --mongo mongodb://node1,node2,node3?readPreference=secondary
+
+## Advanced usage
+
+For advanced scenarios, you can pass options to control mosql's
+behavior. If you pass `--skip-tail`, mosql will do the initial import,
+but not tail the oplog. This could be used, for example, to do an
+import off of a backup snapshot, and then start the tailer on the live
+cluster.
+
+If you need to force a fresh reimport, pass `--reimport`, which will
+cause `mosql` to drop tables, create them anew, and do another import.
+
+## Schema mismatches and _extra_props
+
+If MoSQL encounters values in the MongoDB database that don't fit
+within the stated schema (e.g. a floating-point value in an INTEGER
+field), it will log a warning, ignore the entire object, and continue.
+
+If it encounters a MongoDB object with fields not listed in the
+collection map, it will discard the extra fields, unless
+`:extra_props` is set in the `:meta` hash. If it is, it will collect
+the unlisted fields into a hash, JSON-encode it, and store the
+resulting text in `_extra_props` in SQL. It's up to you to do
+something useful with the JSON. One option is to use [plv8][plv8] to
+parse them inside PostgreSQL, or you can just pull the JSON out whole
+and parse it in application code.
+
+This is also currently the only way to handle array or object values
+inside records -- specify `:extra_props`, and they'll get JSON-encoded
+into `_extra_props`. There's no reason we couldn't support
+JSON-encoded values for individual columns/fields, but we haven't
+written that code yet.
+
+[plv8]: http://code.google.com/p/plv8js/
+
+# Development
+
+Patches and contributions are welcome! Please fork the project and
+open a pull request on [github][github], or just report issues.
+
+MoSQL includes a small but hopefully-growing test suite. It assumes a
+running PostgreSQL and MongoDB instance on localhost. You can
+point it at a different target via environment variables; see
+`test/functional/_lib.rb` for more information.
+
+[github]: https://github.com/stripe/mosql
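
The collection map and `_extra_props` behavior described in the README above can be illustrated with a small Ruby sketch. This is not MoSQL's implementation: `to_row` is a hypothetical helper, the sample document is made up, and the sketch assumes a Ruby recent enough that `YAML.safe_load` accepts `permitted_classes`. It only shows how the declared columns and the `:extra_props` flag might split one MongoDB document into SQL values.

```ruby
require 'yaml'
require 'json'

# Hypothetical helper (not MoSQL's actual code): returns the values for the
# declared columns, in order, followed by a JSON string holding every field
# that the collection map does not list (only when :extra_props is set).
def to_row(spec, doc)
  names = spec[:columns].flat_map(&:keys)
  row = names.map { |name| doc[name] }
  if spec[:meta][:extra_props]
    extra = doc.reject { |key, _| names.include?(key) }
    row << JSON.generate(extra)
  end
  row
end

# The example collection map from the README.
COLLECTION_MAP = <<~MAP
  mongodb:
    blog_posts:
      :columns:
      - _id: TEXT
      - author: TEXT
      - title: TEXT
      - created: DOUBLE PRECISION
      :meta:
        :table: blog_posts
        :extra_props: true
MAP

# Keys such as :columns are YAML-encoded Ruby symbols, so Symbol must be
# explicitly permitted when loading safely.
map = YAML.safe_load(COLLECTION_MAP, permitted_classes: [Symbol])
spec = map['mongodb']['blog_posts']

# Made-up document; 'tags' is not declared in the collection map.
doc = {
  '_id' => 'abc123',
  'author' => 'alice',
  'title' => 'Hello',
  'created' => 1358812800.0,
  'tags' => ['mongo', 'sql']
}

row = to_row(spec, doc)
puts row.inspect
```

Here the undeclared `tags` array ends up JSON-encoded in the last element of `row`, which corresponds to the `_extra_props` column. With `:extra_props` unset, the same field would simply be dropped.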
@@ -0,0 +1,12 @@
+require 'rake/testtask'
+
+task :default
+task :build
+
+Rake::TestTask.new do |t|
+  t.libs = ["lib"]
+  t.verbose = true
+  t.test_files = FileList['test/**/*.rb'].reject do |file|
+    file.end_with?('_lib.rb')
+  end
+end
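
To make the effect of the `reject` filter in the Rakefile above concrete, here is a minimal sketch using a plain array in place of `FileList['test/**/*.rb']`. The paths are invented for illustration, apart from `test/functional/_lib.rb`, which the README mentions.

```ruby
# Plain array standing in for FileList['test/**/*.rb']; the paths other than
# test/functional/_lib.rb are hypothetical.
candidates = [
  'test/functional/_lib.rb',
  'test/functional/sql.rb',
  'test/unit/schema.rb'
]

# Same predicate as the Rakefile: shared helper files ending in "_lib.rb"
# are not run as tests themselves.
test_files = candidates.reject { |file| file.end_with?('_lib.rb') }
puts test_files
```

Only the helper file is filtered out; every other `.rb` file under `test/` would be handed to the test task.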
@@ -0,0 +1,7 @@
+#!/usr/bin/env ruby
+
+require 'rubygems'
+require 'bundler/setup'
+require 'mosql/cli'
+
+MoSQL::CLI.run(ARGV)
@@ -0,0 +1,11 @@
+require 'log4r'
+require 'mongo'
+require 'sequel'
+require 'mongoriver'
+require 'json'
+
+require 'mosql/version'
+require 'mosql/log'
+require 'mosql/sql'
+require 'mosql/schema'
+require 'mosql/tailer'