Permalink
Find file
Fetching contributors…
Cannot retrieve contributors at this time
814 lines (552 sloc) 21.9 KB
% MongoDB
% Valtech *Cours du Soir*, Eric Le Merdy
% Feb 28, 2012
What's the plan ?
=================
![](00_mongodb-logo.png)
#. MongoDB startup for developpers
#. Collection of documents != Relational tables
+ Database design and Indexes
#. Learn MongoDB query language
#. Java Driver basic usage
#. Replication
#. Map/Reduce
Slides...
=========
- ...are available in a github repository:
git clone https://ericlemerdy@github.com/ericlemerdy/mongodb-slidy-tutorial.git
firefox mongodb-slidy-tutorial/tutorial.html
MongoDB startup
===============
_Have you ever installed Oracle ?_
Install the database
====================
- Visit the [Download page](http://www.mongodb.org/downloads)
- Install the "Production Release"
- used 2.0.2 for this tutorial
Run the database
================
- MongoDB is essentially a database daemon :
cd mongodb-linux-i686-2.0.2/bin
./mongod
- Expected output:
ericlemerdy@harukiya:~/mongodb-linux-i686-2.0.2/bin$ ./mongod
./mongod --help for help and startup options
Mon Feb 6 22:35:59
Mon Feb 6 22:35:59 warning: 32-bit servers don't have journaling enabled by default. Please use --journal if you want durability.
Mon Feb 6 22:35:59
Mon Feb 6 22:35:59 [initandlisten] MongoDB starting : pid=19742 port=27017 dbpath=/data/db/ 32-bit host=harukiya
...
Mon Feb 6 22:35:59 [initandlisten] db version v2.0.2, pdfile version 4.5
...
Mon Feb 6 22:35:59 [initandlisten] exception in initAndListen: 10296 dbpath (/data/db/) does not exist, terminating
...
Mon Feb 6 22:35:59 dbexit: really exiting now
ericlemerdy@harukiya:~/mongodb-linux-i686-2.0.2/bin$
Why it shutdown ?
=================
- In the logs:
dbpath (/data/db/) does not exist, terminating
- MongoDB expects a place on disk to store the database !
Run the database on a specified path
====================================
- Either
- Create the _/data/db_ default directory
- Or use the _dbpath_ parameter:
./mongod --help | grep "\-\-dbpath"
--dbpath arg directory for datafiles
- Create a space and restart mongo on this space
mkdir ~/mongodatadb
./mongod --dbpath ~/mongodatadb
- The mongod should now run successfully:
...
Mon Feb 6 23:58:04 [initandlisten] options: { dbpath: "~/mongodatadb" }
Mon Feb 6 23:58:04 [websvr] admin web console waiting for connections on port 28017
Mon Feb 6 23:58:04 [initandlisten] waiting for connections on port 27017
The Admin Web Console ?
=======================
- [http://localhost:28017](http://localhost:28017)
![the admin web console screenshot](01_admin-web-console.png)
Does it work out-of-the-box ?
=============================
- Click on [listDatabases](http://localhost:28017/listDatabases?text=1)
REST is not enabled. use --rest to turn on.
check that port 28017 is secured for the network too.
- Damn !
Activate rest
=============
- Shutdown mongod
- Restart with _--rest_ option:
./mongod --dbpath ../../mongodatadb --rest
- Option has been passed:
Tue Feb 7 00:36:02 [initandlisten] options: { dbpath: "../../mongodatadb", rest: true }
- The [listDatabases](http://localhost:28017/listDatabases?text=1) should now present an empty json list:
{ "databases" : [
{ "name" : "local",
"sizeOnDisk" : 1,
"empty" : true } ],
"totalSize" : 0 }
Why JSON ?
==========
{ "whyJSON" :
{ "because" :
{ "JSON" : "rocks" } }
}
- MongoDB Admin Web Console uses the rest extension to easily display the results of commands
- BSON is the binary encoded version of JSON
- JSON is first class citizen in MongoDB
It is time to open a MongoDB shell !
====================================
- Practice first, how to start a mongo client:
./mongo
- You should get a brand new shell
MongoDB shell version: 2.0.2
connecting to: test
>
- This has worked because
- You started _mongod_ on the default _27017_ port
- _mongo_ connects on a database named _test_ by default
Help
====
> help
- Returns
db.help() help on db methods
db.mycoll.help() help on collection methods
rs.help() help on replica set methods
help admin administrative help
help connect connecting to a db help
help keys key shortcuts
help misc misc things to know
help mr mapreduce
show dbs show database names
show collections show collections in current database
show users show users in current database
show profile show most recent system.profile entries with time >= 1ms
show logs show the accessible logger names
show log [name] prints out the last segment of log in memory, 'global' is default
use <db_name> set current database
db.foo.find() list objects in collection foo
db.foo.find( { a : 1 } ) list objects in foo where a == 1
it result of the last line evaluated; use to further iterate
DBQuery.shellBatchSize = x set default number of items to display on shell
exit quit the mongo shell
First Achievement : MongoDB startup
===================================
- First achievement :
![MongoDB startup is easy](02_badge-easy.jpeg) MongoDB startup for developpers is easy !
Data breakdown structure
========================
- Document < Collection < Database < Daemon
- Document : JSON fragment
ex:
{ "mongo" :
{ "features" : [fail-over, replication, sharding, geolocation] }
}
- Collection : several Documents
> show collections
- Database : several Collections
> show dbs
local (empty)
- Daemon : several Databases (physically stored in the dbpath)
JSON
====
- Object
{ pairs }
- Pair
string : Value
- Value
Array, primitive values
- Array
[ Values ]
Sample document
===============
- A JSON document :
> clipart = {
"title" : "Code Story logo",
"uploader" : {
"name" : "ericlemerdy",
"href" : "http://openclipart.org/user-detail/ericlemerdy"
},
"drawn by" : "ericlemerdy",
"created" : ISODate("2012-02-08T02:05:15.401Z"),
"description" : "My contribution for the www.codestory.net logo.",
"tags" : [ "developpers", "code", "computer", "people", "logo", "public domain" ],
"viewed by" : 236,
"image" : {
"image/svg+xml" : "http://openclipart.org/people/ericlemerdy/Code_Story_Logo.svg",
"image/png" : "http://openclipart.org/image/800px/svg_to_png/166309/Code_Story_Logo.png"
}
}
Relational schema
==================
- ![](03_relational-schema.png "Relational schema")
http://yuml.me/diagram/scruffy/class/[Clipart|title : String;drawn-by : String;created : Date;description : String;tags : List<String>;viewed-by : Integer], [User|name : String;href : Url], [Image|svg : Url;png : Url],[Clipart]<>*-uploader 1>[User], [Clipart]<>1-image 1>[Image]
I ♥ SQL
========
- MongoDB is __NOT__ a relational database
- Forget what you learn about normalisation
This just not apply to what mongodb can do
- In relational model, there is a promise : "You can decouple schema design from applicative work (thanks to normalization)"
That's why we get RDBMs specialists after all...
- In MongoDB, you have to design your documents __for the job__ that your application must do.
Document based schema design principles
=======================================
- Transaction is only __document-based__
- Storage is cheap
- __Do not__ perform runtime relations
__Duplicate__ !
E.g. : Clipart -> User
- Think of consequences in your application :
When a user change its display name...
- Does not apply to all domains where A(tomicity), C(onsistency), I(solation), D(urability) is appreciated through the entire schema.
Schema free ?
=============
- A single collection can store different "types" of documents
- Manage the schema evolution by your application
Evolving data structure when accessing the document for exemple
Achievement 2 : Document schema design principles
=================================================
- Second achievement :
![NoSQL document database != relationnal](04_badge-nosql.jpeg) Document oriented schema design is different from relational !
- More informations : [10gen - MongoDB Presentations - Schema Design Principles and Practice](http://www.10gen.com/presentations/mongosv-2011/schema-design-principles-and-practice)
Are you empty ?
===============
- Connect to a database
> use test
switched to db test
- Find the collections of the database
> db.getCollectionNames()
[ ]
- The database is empty !
Create a collection
===================
- Let's create the cliparts collection.
- There is nothing to do as long you do not insert to a first document in it :
> db.cliparts.findOne()
null
- Does a read has created the collection ?
> db.getCollectionNames()
[ ]
- Obviously no
Insert the document in the collection
=====================================
> db.cliparts.save( clipart )
- Does a write has created the collection ?
> db.getCollectionNames()
[ "cliparts", "system.indexes" ]
- Obviously yes
- Even 2 collections !
- cliparts : the collection that contains our new document
- system.indexes : the index to allow mongo to be fast searching
Counting
========
- Counting elements
> db.cliparts.count()
1
Display collection content
==========================
- Show me the content
> db.cliparts.find()
{ "_id" : ObjectId("4f32ed58a1d1c40cacf43637"), "title" : "Code Story logo", (...) "image/png" : "http://openclipart.org/image/800px/svg_to_png/166309/Code_Story_Logo.png" } }
- Formatted
> db.cliparts.findOne()
{
"_id" : ObjectId("4f32ed58a1d1c40cacf43637"),
"title" : "Code Story logo",
(...)
"image/png" : "http://openclipart.org/image/800px/svg_to_png/166309/Code_Story_Logo.png"
}
}
What's new ?
============
- There is now an id in the object :
> db.cliparts.find( { }, { "_id" : 1, "title" : 1 } )
{ "_id" : ObjectId("4f32ed58a1d1c40cacf43637"), "title" : "Code Story logo" }
- Generated by the mongoDB instance
- Must be unique per collection (and per cluser)
+-------+-------+---+-------+
|0 1 2 3|4 5 6 |7 8|9 10 11|
+=======+=======+===+=======+
|time |machine|pid|inc |
+-------+-------+---+-------+
- Unix time + md5 machine + process id + increment value
Insert another document
=======================
- Create a second variable
> clipart2 = {
"title" : "PC LAPTOP NOTEBOOK",
"uploader" : { "name" : "Keistutis", "href" : "http://openclipart.org/user-detail/Keistutis" },
"drawn by" : "Kostas Šliažas / Keistutis", "created" : ISODate("2012-02-23T09:28:36.700Z"),
"description" : null, "tags" : [ "PC", "LAPTOP", "NOTEBOOK" ],
"viewed by" : 173, "image" : {
"image/svg+xml" : "http://openclipart.org/people/Keistutis/PC.svg",
"image/png" : "http://openclipart.org/image/800px/svg_to_png/168489/PC.png"
}
}
- Insert it twice
> db.cliparts.save( clipart2 )
> delete clipart2._id # Otherwise it will be saved in the same document.
true
> db.cliparts.save( clipart2 )
- Verify
> db.cliparts.count()
3
Edit a document
===============
- Extract the first uploader :
> var ericUploader = db.cliparts.findOne( { "uploader.name" : /eric/ } ).uploader
> ericUploader
{
"name" : "ericlemerdy",
"href" : "http://openclipart.org/user-detail/ericlemerdy"
}
- Set it in the second document :
> db.cliparts.findAndModify( {
query : { "title" : "PC LAPTOP NOTEBOOK" }, // Query selector.
sort : {}, // Sort in case several docs are matched.
// remove : true, // To delete the document.
update : { $set : { "uploader" : ericUploader } }, // Modifier object.
new : false, // True to get the newly created document.
// fields : {}, // Retrieve only a sub-set of the document fields.
upsert : false // True to create object if it does not exists.
} )
{
"_id" : ObjectId("4f4a3df18d08530d9516afbe"),
"title" : "PC LAPTOP NOTEBOOK",
"uploader" : {
"name" : "Keistutis",
"href" : "http://openclipart.org/user-detail/Keistutis"
},
(...)
}
Verify document
===============
- Find it :
> db.cliparts.findOne( { "title" : /PC/, "uploader.name" : /eric/ } )
{
"_id" : ObjectId("4f4a3df18d08530d9516afbe"),
"created" : ISODate("2012-02-23T09:28:36.700Z"),
"description" : null,
"drawn by" : "Kostas Šliažas / Keistutis",
"image" : {
"image/svg+xml" : "http://openclipart.org/people/Keistutis/PC.svg",
"image/png" : "http://openclipart.org/image/800px/svg_to_png/168489/PC.png"
},
"tags" : [
"PC",
"LAPTOP",
"NOTEBOOK"
],
"title" : "PC LAPTOP NOTEBOOK",
"uploader" : {
"name" : "ericlemerdy",
"href" : "http://openclipart.org/user-detail/ericlemerdy"
},
"viewed by" : 173
}
Distinct values
===============
- What are the distinct title of the cliparts collection ?
> db.cliparts.distinct("title")
[ "Code Story logo", "PC LAPTOP NOTEBOOK" ]
- What are the distinct tags of the cliparts collection ?
> db.cliparts.distinct("tags")
[ "code", "computer", "developpers", "logo", "people", "public domain", "LAPTOP", "NOTEBOOK", "PC" ]
Explain query
=============
> db.cliparts.find().explain()
{
"cursor" : "BasicCursor",
"nscanned" : 3,
"nscannedObjects" : 3,
"n" : 3,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
- Traversing the entire collection
> db.cliparts.find( { "uploader.name": /eric/ } ).explain()
{
"cursor" : "BasicCursor",
"nscanned" : 3,
"nscannedObjects" : 3,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
- Same traversing but only 2 objects returned.
Create an index
===============
- Create a composite index on uploader name:
> db.cliparts.createIndex( { "uploader.name" : 1 } )
- Visualize effect on search
> db.cliparts.find( { "uploader.name": /eric/ } ).explain()
{
"cursor" : "BtreeCursor uploader.name_1 multi",
"nscanned" : 3,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"uploader.name" : [
[ "", { } ],
[ /eric/, /eric/ ]
]
}
}
Achievement 3 : Master shell
============================
- Third achievement :
![Be a shell master or the mastershell](05_badge-shell.jpeg) Be a master shell !
- Let's move on to java !
Maven project
=============
- There is a maven project ready to import in your IDE:
mongodb-slidy-tutorial/mongodb-java-project/
- Un-ignore all tests and make it green !
Achievement 4 : Use mongoDB from java
=====================================
- Fourth achievement :
![Learn MongoDB in Java](06_badge-java.jpeg) You have learnt how to speak MongoDB in java !
- Now, let's replicate
Why replicate a database ?
==========================
- __Fail-over__ :
if a node goes down the service is still up and running
- __Performance__ :
if several nodes can serve the data, performance should increase
Replication in MongoDB
======================
- Implemented with : Replica Set
- Out-of-the-box feature
- Let's build a replica set to understand
I will be your master
=====================
#. Restart mongo with a replicaset name
./mongod --dbpath ../../mongodatadb --rest --replSet valtech
(...)
Mon Feb 27 02:39:28 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
Mon Feb 27 02:39:28 [rsStart] replSet info you may need to run replSetInitiate -- rs.initiate() in the shell -- if that is not already done
#. See the status : [http://127.0.0.1:28017/_replSet](http://127.0.0.1:28017/_replSet)
#. Initiating the replica set :
> rs.initiate()
{
"info2" : "no configuration explicitly specified -- making one",
"me" : "harukiya:27017",
"info" : "Config now saved locally. Should come online in about a minute.",
"ok" : 1
}
And what is your IP address ?
=============================
#. Please join
PRIMARY> rs.add("your_ip")
{ "ok" : 1 }
On you logs:
Mon Feb 27 03:02:47 [rsHealthPoll] replSet member harukiya:27018 is now in state STARTUP2
Mon Feb 27 03:02:49 [rsHealthPoll] replSet member harukiya:27018 is now in state RECOVERING
(...)
Mon Feb 27 03:03:04 [slaveTracking] build index local.slaves { _id: 1 }
Mon Feb 27 03:03:04 [slaveTracking] build index done 0 records 0 secs
(...)
Mon Feb 27 03:03:07 [rsHealthPoll] replSet member harukiya:27018 is now in state SECONDARY
#. An arbiter or another node must join
PRIMARY> rs.addArb("your_ip")
{ “ok” : 1 }
Connecting on a replica set
===========================
- The configuration is known from the client side
- If you try to connect to a slave:
./mongo --host harukiya --port 27018
SECONDARY> use test
switched to db test
SECONDARY> db.cliparts.findOne()
Mon Feb 27 03:13:16 uncaught exception: error { "$err" : "not master and slaveok=false", "code" : 13435 }
- You can not execute queries unless you set slaveok=true :
SECONDARY> rs.slaveOk()
SECONDARY> db.cliparts.findOne()
{
"_id" : ObjectId("4f4a3ee88d08530d9516afbf"),
(...)
}
Simulate a network failure
==========================
- Shutdown master or 'stepDown'
PRIMARY> rs.stepDown()
Mon Feb 27 03:21:55 DBClientCursor::init call() failed
Mon Feb 27 03:21:55 query failed : admin.$cmd { replSetStepDown: 60.0 } to: 127.0.0.1
Mon Feb 27 03:21:55 Error: error doing query: failed shell/collection.js:151
Mon Feb 27 03:21:55 trying reconnect to 127.0.0.1
Mon Feb 27 03:21:55 reconnect 127.0.0.1 ok
SECONDARY>
- If there are enough node (a majority), there is a vote and a new primary is elected.
In this case, the PRIMARY has been step down to SECONDARY. And the SECONDARY has been step up to PRIMARY.
There can still have writings on database.
Achievement 5 : MongoDB never fails
===================================
- Fifth achievement :
![MongoDB never fails](07_badge-fail-over.jpeg) MongoDB will never fail !
- Now, let's map reduce
Map/Reduce
==========
- [Source](http://www.slideshare.net/nateabele/building-apps-with-mongodb) from slide 25 to 29
- Define the map function:
> var map = function() {
this.tags.forEach(function(t) {
emit(t, { count : 1 } );
});
}
Reduce
======
- Define the reduce function:
> var reduce = function(key, value) {
var count = 0;
for (var i = 0, len = value.length; i < len; i++) {
count += value[i].count;
}
return { "count" : count };
}
Get the result
==============
- Get the result object
> var result = db.cliparts.mapReduce(map, reduce, { out : "mapreduceresult" });
> result
{
"result" : "mapreduceresult",
"timeMillis" : 188,
"counts" : {
"input" : 2,
"emit" : 6,
"reduce" : 3,
"output" : 3
},
"ok" : 1,
}
- Reading the output collection:
> db.mapreduceresult.find()
{ "_id" : "LAPTOP", "value" : { "count" : 2 } }
{ "_id" : "NOTEBOOK", "value" : { "count" : 2 } }
{ "_id" : "PC", "value" : { "count" : 2 } }
Achievement 6 : Map/Reduce
==========================
- Six achievement :
![MongoDB map reduce](08_badge-map-reduce.jpeg) MongoDB Map Reduce
Thank you
=========
- hu·mon·gous adj
: extremely large : huge <a humongous building> <humongous amounts of money>
perhaps alteration of huge + monstrous
First Known Use: circa 1967
![](02_badge-easy.jpeg) ![](04_badge-nosql.jpeg) ![](05_badge-shell.jpeg) ![](06_badge-java.jpeg) ![](07_badge-fail-over.jpeg) ![](08_badge-map-reduce.jpeg)