# Introduction

MongoDB is a powerful, flexible, and scalable general-purpose database. It combines the ability to scale out with features such as secondary indexes, range queries, sorting, aggregations, and geospatial indexes.

MongoDB was built for high scalability. It provides features such as sharding and replication out of the box. It solves the scalability problems of relational databases by sacrificing some relational principles and offloading part of the processing to client-side code.

## Features

The following are some of MongoDB's features. It provides all of them without sacrificing speed.

- Ease of use
    MongoDB is a document-based database. A document-oriented database replaces the concept of a "row" with a more flexible model, the "document". Unlike rows, documents don't have a fixed structure, so a developer can add any key-value pair at any time.

- Easy Scaling
    MongoDB was built with scalability in mind. The document-oriented model helps a lot with horizontal scaling, and MongoDB provides features such as sharding and replication out of the box.

- Indexing
    MongoDB supports generic secondary indexes, allowing a variety of fast queries, and provides unique, compound, geospatial, and full-text indexing capabilities as well.

- Aggregation
    MongoDB supports an "aggregation pipeline" that allows you to build complex aggregations from simple pieces and allows the database to optimize them.

- Special collection types
    MongoDB supports time-to-live collections for data that should expire at a certain time, such as sessions. It also supports fixed-size collections, which are useful for holding recent data, such as logs.

- File storage
    MongoDB supports an easy-to-use protocol for storing large files and file metadata.


# Getting Started

Here we go through the basics of MongoDB.

## Installation

MongoDB works by providing a service called ```mongod```. This service performs all the database operations. Every MongoDB client (e.g. the mongo shell) connects to this service and performs operations when required by the user.

For installation on Ubuntu follow this guide.

The ```mongod``` config file is at ```/etc/mongod.conf```.

## Documents
At the heart of MongoDB is the document: an ordered set of keys with associated values. An example of a document is as below.

In [None]:
{
    "key1": "value1",
    "key2": 2,
    "key3": ["abc", "pqr", 3]
}

Things to note.

- Keys must not contain the null character. It is used to signify the end of a key.
- Key names are case-sensitive: ```Key1``` and ```key1``` are different.
- Key/value pairs in documents are ordered: ```{"x" : 1, "y" : 2}``` is not the same as ```{"y" : 2, "x" : 1}```.

## Collection

A collection is a group of documents. If a document is the MongoDB analog of a row in a relational database, then a collection can be thought of as the analog to a table.

Collections have dynamic schemas. This means that the documents within a single collection can have any number of different "shapes". For example, both of the following documents could be stored in a single collection:

In [None]:
{ "key1": "value1" }

{ "key2": 2 }

Things to note
- One convention for organizing collections is to use namespaced subcollections separated by the ```.``` (dot) character. For example, an application containing a ```blog``` might have a collection named ```blog.posts``` and a separate collection named ```blog.authors```. This is for organizational purposes only; there is no relationship between the ```blog``` collection (it doesn't even have to exist) and its "children".

## Database

In addition to grouping documents by collection, MongoDB groups collections into databases. A single instance of MongoDB can host several databases, each grouping together zero or more collections. A database has its own permissions, and each database is stored in separate files on disk. A good rule of thumb is to store all data for a single application in the same database.

Things to note
- A database name cannot contain any of these characters: ```/```, ```\```, ```.```, ```"```, ```*```, ```<```, ```>```, ```:```, ```|```, ```?```, ```$```, a single space, or the null character. Basically, stick with alphanumeric ASCII.
- Database names are case-sensitive.
- Database names are limited to a maximum of 64 bytes.

## Existing Database
There are also several reserved database names, which you can access but which have special semantics. These are as follows:

- ```admin```

    This is the **root** database, in terms of authentication. If a user is added to the ```admin``` database, the user automatically inherits permissions for all databases. There are also certain server-wide commands that can be run only from the ```admin``` database, such as listing all of the databases or shutting down the server.

- ```local```

    This database will never be replicated and can be used to store any collections that should be local to a single server.

- ```config```

    When MongoDB is being used in a sharded setup, it uses the ```config``` database to store information about the shards.

## Document Data Type
The following data types for values are supported by MongoDB.

- **null**
    
    Null can be used to represent both a null value and a nonexistent field.

In [None]:
{ "key1":  null }

- **boolean**

    There is a boolean type, which can be used for the values true and false.

In [None]:
{ "key1":  true }

- **number**

    The shell defaults to using 64-bit floating point numbers

In [None]:
{ "key1":  3.14 }

//Or 

{ "key1":  3 }

For integers, use the ```NumberInt``` or ```NumberLong``` classes, which represent 4-byte or 8-byte signed integers, respectively.

In [None]:
{"key1" : NumberInt("3")}

{"key1" : NumberLong("3")}

- **string**

    Any string of UTF-8 characters can be represented using the string type.

In [None]:
{"key1" : "foobar"}

- **date**

    Dates are stored as milliseconds since the epoch. The time zone is not stored

In [None]:
{"key1" : new Date()}

- **regular expression**

    Queries can use regular expressions using JavaScript's regular expression syntax

In [None]:
{"key1" : /foobar/i}

- **array**

    Sets or lists of values can be represented as arrays

In [None]:
{"key1" : ["a", "b", "c"]}

- **embedded document**

    Documents can contain entire documents embedded as values in a parent document

In [None]:
{"key1" : {"foo" : "bar"}}

- **object id**

    An object id is a 12-byte ID for documents.

In [None]:
{"key1" : ObjectId()}

- **binary data**

    Binary data is a string of arbitrary bytes. It cannot be manipulated from the shell. Binary data is the only way to save non-UTF-8 strings to the database.

- **code**

    Queries and documents can also contain arbitrary JavaScript code:

In [None]:
{"key1" : function() { /* ... */ }}

## _id and ObjectIds

Every document stored in MongoDB must have an ```_id``` key. The ```_id``` key's value can be any type, but it defaults to an ```ObjectId```. In a single collection, every document must have a unique value for ```_id```, which ensures that every document in a collection can be uniquely identified.

### ObjectIds
The 12-byte ID generated by the ```ObjectId``` function has the following format.

![object_id_format](/static/img/notes/books/databases/mongo/objectid_format.png)

- The first four bytes of an ObjectId are a timestamp in seconds since the epoch. This provides uniqueness at the granularity of a second.
- The next three bytes of an ObjectId are a unique identifier of the machine on which it was generated. This is usually a hash of the machine's hostname.
- To provide uniqueness among different processes generating ObjectIds concurrently on a single machine, the next two bytes are taken from the process identifier (PID) of the ObjectId-generating process.
- The last three bytes are simply an incrementing counter that is responsible for uniqueness within a second in a single process. This allows for up to $256^3 = 16,777,216$ unique ObjectIds to be generated per process in a single second.
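
The layout above can be checked by splitting an ObjectId's 24-character hex form into its four parts. The following is a minimal sketch in plain JavaScript; ```decodeObjectId``` is a hypothetical helper written for these notes, not a mongo shell function, and it assumes the byte layout described in the list above.

```javascript
// Decode a 24-character hex ObjectId string using the 4-3-2-3 byte layout
// described above. decodeObjectId is a hypothetical helper, not a shell API.
function decodeObjectId(hex) {
    if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
        throw new Error("expected 24 hex characters");
    }
    return {
        timestamp: parseInt(hex.slice(0, 8), 16),  // bytes 0-3: seconds since the epoch
        machine: hex.slice(8, 14),                 // bytes 4-6: machine identifier
        pid: parseInt(hex.slice(14, 18), 16),      // bytes 7-8: process id
        counter: parseInt(hex.slice(18, 24), 16)   // bytes 9-11: incrementing counter
    };
}

const parts = decodeObjectId("4b253b067525f35f94b60a31");
console.log(parts);
console.log(new Date(parts.timestamp * 1000).toISOString());

// Three counter bytes give 256^3 distinct values per process per second.
console.log(Math.pow(256, 3));
```

Because the timestamp comes first, ObjectIds generated later sort after earlier ones, at least at one-second granularity.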

## Mongo Shell

The mongo shell is started with the ```mongo``` command. It is a JavaScript interpreter that also provides some MongoDB-specific functionality.

There is usually a file named ```.mongorc.js``` in the home folder, which is loaded every time the mongo shell starts. We can use this file to define utility functions or declare global variables.

# Creating, Updating, and Deleting Documents

MongoDB (through the mongo shell or a driver library) provides three methods to modify the contents of a collection: ```insert()```, ```update()```, and ```remove()```.

## Inserting documents
A single document can be inserted into a collection by using the ```insert()``` method of that collection.

For example, to create a user document in the ```users``` collection of the ```blog``` database, we could write the following code in the mongo shell.

In [None]:
var user_data = {
    "first_name": "user1",
    "last_name": "user1",
    "age": 10
};

use blog
// Here blog is the name of our database.
// This statement selects the blog database; if it does not exist, it is created on the first insert.
// Note: use is not a JavaScript keyword; it is provided by the mongo shell.

db.users.insert(user_data);
// Here we insert a document into the users collection.
// If the users collection does not exist, it is created and then the document is inserted.

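To insert several documents in a single request, we can use the ```batchInsert``` method as follows (passing the array to ```insert()``` has the same effect). Note that the batch is not atomic: a failure partway through leaves the earlier documents inserted.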

In [None]:
var user_list = [
    {
        "first_name": "user_1"
    },{
        "first_name": "user_2"
    },{
        "first_name": "user_3"
    }
];

db.users.batchInsert(user_list);

**NOTE:**

If the array contains an invalid document, only the documents before it are inserted and the rest are ignored. For example:

In [None]:
db.foo.batchInsert([{"_id" : 0}, {"_id" : 1}, {"_id" : 1}, {"_id" : 2}]) 

For the above statement, only the documents with ```_id: 0``` and ```_id: 1``` are created. Since the third document causes an error (a duplicate ```_id```), the rest of the array is ignored. To prevent this, we can use the ```continueOnError``` option to continue after an insert failure.

**NOTE:**

MongoDB does minimal checks on data being inserted: it checks the document's basic structure and adds an ```_id``` field if one does not exist. One of the basic structure checks is size: all documents must be smaller than 16 MB. This is a somewhat arbitrary limit (and may be raised in the future); it is mostly to prevent bad schema design and ensure consistent performance.

## Removing Documents

Documents can be deleted from a collection by using the ```remove()``` method as follows.

In [None]:
db.users.remove({"first_name": "user_1"})
// Remove the documents in the users collection that match the criteria.

db.users.remove({})
// Remove all documents from the users collection.
// Note: this is much slower than dropping the collection and recreating it.

## Updating Documents

Documents can be updated using the ```update()``` method. It takes two parameters: a query document, which locates the documents to update, and a modifier document, which describes the changes to make to the documents found.

In [None]:
db.users.update({"first_name": "user_1"}, {"x": "y"})

// This replaces the entire user_1 document with the new document {"x": "y"}.
// Note: without update modifiers, update() does not change individual fields; it replaces the whole document.

### Update Modifier
Update modifiers are special keys that can be used to specify complex update operations, such as altering, adding, or removing keys, and even manipulating arrays and embedded documents. Following are the list of commonly used modifiers.

#### ```$set```
```$set``` sets the value of a field. If the field does not yet exist, it will be created. This is used to update a field in a document instead of replacing the entire document. $set can be used to update the value as well as the data type of the value. For example.

In [None]:
db.users.update({"_id" : ObjectId("4b253b067525f35f94b60a31")}, 
                    {"$set" : {"favorite book" : "War and Peace"}})
// Add/Update "favorite book" key to user.

db.users.update({"_id" : ObjectId("4b253b067525f35f94b60a31")}, 
                    {"$set" : {"favorite book" :  [ "War and Peace", "Calculus"] }})
// Update "favorite book" to an array 

#### ```$inc```
The ```$inc``` modifier can be used to increment/decrement the value for an existing key or to create a new key if it does not already exist. For example

In [None]:
db.users.update({"_id" : ObjectId("4b253b067525f35f94b60a31")}, {"$inc" : {"age" : 10}})
// If the age key exists, it is incremented by 10.
// If the age key doesn't exist, $inc adds an age key and assigns it the value 10.
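
To make the difference between a plain replacement update and the ```$set``` and ```$inc``` modifiers concrete, here is a rough model in plain JavaScript. It is only a sketch of the semantics described above, not MongoDB's implementation: ```applyUpdate``` is a hypothetical helper, it handles top-level keys only, and it assumes that ```_id``` survives a replacement.

```javascript
// A plain JavaScript model of update semantics (not MongoDB code).
// applyUpdate is a hypothetical helper and handles only top-level keys.
function applyUpdate(doc, update) {
    const usesModifiers = Object.keys(update).some((key) => key.startsWith("$"));
    if (!usesModifiers) {
        // No modifiers: the whole document is replaced, only _id is kept.
        return Object.assign({ _id: doc._id }, update);
    }
    const result = Object.assign({}, doc);
    for (const [key, value] of Object.entries(update.$set || {})) {
        result[key] = value;                        // create or overwrite the key
    }
    for (const [key, amount] of Object.entries(update.$inc || {})) {
        result[key] = (result[key] || 0) + amount;  // a missing key starts at 0
    }
    return result;
}

const user = { _id: 1, first_name: "user_1", age: 10 };
console.log(applyUpdate(user, { x: "y" }));
console.log(applyUpdate(user, { $set: { "favorite book": "War and Peace" } }));
console.log(applyUpdate(user, { $inc: { age: 10 } }));
```

The first call drops ```first_name``` and ```age```, while the two modifier calls leave every other field untouched.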

### Array Modifier

#### ```$push```
```$push``` adds elements to the end of an array if the array exists and creates a new array if it does not. For example.

In [None]:
db.users.update({"_id" : ObjectId("4b253b067525f35f94b60a31")}, {"$push" : {"friends" : { "first_name": "friend 1"}}})
// If the friends array exists, the new document is appended to it.
// If the friends array doesn't exist, $push creates an array under the friends key and appends the new document to it.

This is the simple form of push, but you can use it for more complex array operations as well. You can push multiple values in one operation using the ```$each``` suboperator.

In [None]:
db.stock.ticker.update({"_id" : "GOOG"}, {"$push" : {"hourly" : {"$each" : [562.776, 562.790, 559.123]}}})

This would push three new elements onto the array.

If we only want the array to grow to a certain length, we can use the ```$slice``` operator in conjunction with ```$push``` to prevent the array from growing beyond a certain size, effectively making a "top N" list of items.

In [None]:
db.movies.update({"genre" : "horror"},
    {"$push" : {"top10" : {
        "$each" : ["Nightmare on Elm Street", "Saw"],
        "$slice" : -10}}})

This example would limit the array to the last 10 elements pushed. Slices must always be negative numbers.

Finally, you can ```$sort``` before trimming, so long as you are pushing subobjects onto the array.

In [None]:
db.movies.update({"genre" : "horror"},
    {"$push" : {"top10" : {
        "$each" : [{"name" : "Nightmare on Elm Street", "rating" : 6.6}, {"name" : "Saw", "rating" : 4.3}],
        "$slice" : -10,
        "$sort" : {"rating" : -1}}}})

This will sort all of the objects in the array by their rating field and then, because ```$slice``` is -10, keep the last 10.
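
The interaction between ```$each```, ```$sort```, and a negative ```$slice``` is easy to get wrong, so here is a small model of it in plain JavaScript. This is a sketch under stated assumptions, not MongoDB code: ```pushWithModifiers``` is a hypothetical helper, it sorts on a single numeric field, and it supports only zero or negative slices as described above.

```javascript
// A plain JavaScript model of $push with $each, $sort and $slice.
// pushWithModifiers is a hypothetical helper, not a MongoDB API.
function pushWithModifiers(array, spec) {
    // $each: append every new element to a copy of the existing array.
    let result = (array || []).concat(spec.$each);
    if (spec.$sort) {
        // $sort: order by one numeric field, 1 ascending or -1 descending.
        const [field, direction] = Object.entries(spec.$sort)[0];
        result.sort((a, b) => (a[field] - b[field]) * direction);
    }
    if (spec.$slice !== undefined) {
        // $slice: a negative value keeps that many elements from the END.
        result = spec.$slice === 0 ? [] : result.slice(spec.$slice);
    }
    return result;
}

const current = [{ name: "A", rating: 9 }, { name: "B", rating: 5 }];
const pushed = [{ name: "C", rating: 7 }];

// Descending sort plus a negative slice keeps the lowest-rated entries.
console.log(pushWithModifiers(current, { $each: pushed, $sort: { rating: -1 }, $slice: -2 }));

// Ascending sort plus a negative slice keeps the highest-rated entries.
console.log(pushWithModifiers(current, { $each: pushed, $sort: { rating: 1 }, $slice: -2 }));
```

In this model, a list of the highest-rated items therefore needs an ascending sort when the slice is negative.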

We can also treat an array as a set by using the ```$addToSet``` modifier. For example, when adding another email address, we can use ```$addToSet``` to prevent duplicates.

In [None]:
db.users.update({"_id" : ObjectId("4b2d75476cc613d5ee930164")}, 
        {"$addToSet" : {"emails" : "joe@gmail.com"}})

#### ```$pop``` and ```$pull```

There are a few ways to remove elements from an array. If we want to treat the array like a queue or a stack, we can use ```$pop```, which can remove elements from either end. ```{"$pop" : {"key" : 1}}``` removes an element from the end of the array. ```{"$pop" : {"key" : -1}}``` removes it from the beginning.

Sometimes an element should be removed because of its value rather than its position in the array. ```$pull``` removes the elements of an array that match the given criteria. For example, suppose we have a list of things that need to be done but not in any specific order.

In [None]:
db.lists.insert({"todo" : ["dishes", "laundry", "dry cleaning"]})

If we do the laundry first, we can remove it from the list with the following

In [None]:
db.lists.update({}, {"$pull" : {"todo" : "laundry"}})
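
As a rough illustration of how these array modifiers differ, the sketch below models ```$addToSet```, ```$pop```, and ```$pull``` on plain JavaScript arrays. The three functions are hypothetical helpers written for these notes; they assume primitive element values and return a new array instead of updating a stored document.

```javascript
// Plain JavaScript models of three array modifiers (not MongoDB code).
// All three helpers are hypothetical and assume primitive element values.

// $addToSet: append the value only if it is not already present.
function addToSet(array, value) {
    return array.includes(value) ? array.slice() : array.concat([value]);
}

// $pop: 1 removes the last element, -1 removes the first element.
function pop(array, direction) {
    return direction === -1 ? array.slice(1) : array.slice(0, -1);
}

// $pull: remove every element equal to the given value, wherever it is.
function pull(array, value) {
    return array.filter((item) => item !== value);
}

const todo = ["dishes", "laundry", "dry cleaning"];
console.log(pull(todo, "laundry"));
console.log(pop(todo, 1));
console.log(pop(todo, -1));
console.log(addToSet(todo, "dishes"));
```

Note that the ```pull``` model removes all matching elements, not just the first one.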

## Changing size of document

When you start inserting documents into MongoDB, it puts each document right next to the previous one on disk. Thus, if a document gets bigger, it will no longer fit in the space it was originally written to and will be moved to another part of the collection. The following example shows this.

In [None]:
db.coll.insert({"x" : "a"})
db.coll.insert({"x" : "b"})
db.coll.insert({"x" : "c"})
db.coll.find()

// { "_id" : ObjectId("507c3581d87d6a342e1c81d3"), "x" : "a" }
// { "_id" : ObjectId("507c3583d87d6a342e1c81d4"), "x" : "b" }
// { "_id" : ObjectId("507c3585d87d6a342e1c81d5"), "x" : "c" }

db.coll.update({"x" : "b"}, {$set: {"x" : "bbb"}})
db.coll.find()

// { "_id" : ObjectId("507c3581d87d6a342e1c81d3"), "x" : "a" }
// { "_id" : ObjectId("507c3585d87d6a342e1c81d5"), "x" : "c" }
// { "_id" : ObjectId("507c3583d87d6a342e1c81d4"), "x" : "bbb" }

In the example above, we created a new collection with just a few documents and then made the document sandwiched between two others larger. It was bumped to the end of the collection.

When MongoDB has to move a document, it bumps the collection's padding factor, which is the amount of extra space MongoDB leaves around new documents to give them room to grow. You can see the padding factor by running ```db.coll.stats()```. Before doing the update above, the "paddingFactor" field will be 1: MongoDB allocates exactly the size of the document for each new document, as shown in the figure below.

![doc_size_1](/static/img/notes/books/databases/mongo/doc_size_1.png)

If you run it again after making one of the documents larger (as shown in the figure below), you'll see that it has grown to around 1.5: each new document will be given half of its size in free space to grow.

![doc_size_2](/static/img/notes/books/databases/mongo/doc_size_2.png)

If subsequent updates cause more moves, the padding factor will continue to grow (although not as dramatically as it did on the first move). If there aren't more moves, the padding factor will slowly go down.

![doc_size_3](/static/img/notes/books/databases/mongo/doc_size_3.png)

## Upserts

An upsert is a special type of update. If no document is found that matches the update criteria, a new document will be created by combining the criteria and updated documents. If a matching document is found, it will be updated normally.

In [None]:
db.analytics.update({"url" : "/blog"}, {"$inc" : {"pageviews" : 1}}, true)

This code creates the document if it doesn't exist and updates it if it does.
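
The upsert behaviour can be modelled with an in-memory array standing in for the collection. The sketch below is an illustration in plain JavaScript, not MongoDB code: ```upsertInc``` is a hypothetical helper that supports only exact-match criteria and a single ```$inc```-style increment.

```javascript
// A plain JavaScript model of an upsert combined with $inc.
// upsertInc is a hypothetical helper; "collection" is just an array.
function upsertInc(collection, criteria, field, amount) {
    const matches = (doc) =>
        Object.keys(criteria).every((key) => doc[key] === criteria[key]);
    let doc = collection.find(matches);
    if (!doc) {
        // No match: seed a new document from the criteria, then apply the update.
        doc = Object.assign({}, criteria);
        collection.push(doc);
    }
    doc[field] = (doc[field] || 0) + amount;
    return doc;
}

const analytics = [];
upsertInc(analytics, { url: "/blog" }, "pageviews", 1); // creates the document
upsertInc(analytics, { url: "/blog" }, "pageviews", 1); // updates the same document
console.log(analytics);
```

The new document combines the criteria with the update, which is why the created document carries the ```url``` key as well as the counter.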

## Updating Multiple documents

Updates, by default, update only the first document found that matches the criteria. If there are more matching documents, they will remain unchanged. To modify all of the documents matching the criteria, you can pass true as the fourth parameter to update.

In [None]:
db.users.update({"birthday" : "10/13/1978"},
                {"$set" : {"gift" : "Happy Birthday!"}}, false, true)

# Reading Documents

MongoDB provides a ```find()``` method to perform queries. Examples are given below.

In [None]:
db.users.find({"age" : 27})
// Find user with age 27

db.users.find({"age" : 27}, {"username" : 1, "email" : 1})
// Specify which keys to return, return only username and email

## Query Criteria

### Query Conditionals
```$lt``` , ```$lte``` , ```$gt``` , and ```$gte``` are all comparison operators, corresponding to ```<, <=, >,``` and ```>=``` respectively. They can be combined to look for a range of values. For example, to look for users who are between the ages of 18 and 30, we can do this

In [None]:
db.users.find({"age" : {"$gte" : 18, "$lte" : 30}})

### OR Queries

There are two ways to do an OR query in MongoDB. ```$in``` can be used to query for a variety of values for a single key. ```$or``` is more general; it can be used to query for any of the given values across multiple keys. For example,

In [None]:
db.raffle.find({"ticket_no" : {"$in" : [725, 542, 390]}})

db.raffle.find({"$or" : [{"ticket_no" : 725}, {"winner" : true}]})


db.raffle.find({"$or" : [{"ticket_no" : {"$in" : [725, 542, 390]}},
                        {"winner" : true}]})

### ```$not```
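```$not``` is a metaconditional: it can be applied on top of any other criteria. For example, ```{"$mod" : [5, 1]}``` matches values that leave a remainder of 1 when divided by 5 (1, 6, 11, and so on). If we instead want to return users with ```id_num``` values of 2, 3, 4, 5, 7, 8, 9, 10, 12, etc., we can use ```$not```.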

In [None]:
db.users.find({"id_num" : {"$not" : {"$mod" : [5, 1]}}})
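
To show how these conditionals combine, here is a small matcher in plain JavaScript that evaluates a condition document against a single value. It is a sketch of the semantics only: ```matchesCondition``` is a hypothetical helper that covers just the operators named in this section, and MongoDB itself evaluates queries on the server.

```javascript
// A plain JavaScript model of query conditionals (not MongoDB code).
// matchesCondition is a hypothetical helper for a single field value.
function matchesCondition(value, condition) {
    // Every operator in the condition document must hold (implicit AND).
    return Object.entries(condition).every(([op, arg]) => {
        switch (op) {
            case "$lt": return value < arg;
            case "$lte": return value <= arg;
            case "$gt": return value > arg;
            case "$gte": return value >= arg;
            case "$in": return arg.includes(value);
            case "$mod": return value % arg[0] === arg[1];
            case "$not": return !matchesCondition(value, arg);
            default: throw new Error("unsupported operator: " + op);
        }
    });
}

const ages = [12, 18, 25, 30, 41];
console.log(ages.filter((age) => matchesCondition(age, { $gte: 18, $lte: 30 })));

const idNums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12];
console.log(idNums.filter((id) => matchesCondition(id, { $not: { $mod: [5, 1] } })));
```

An ```$or``` query would sit one level above this helper, testing whether any one of several such conditions matches.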

## Querying Arrays

If you need to match arrays by more than one element, you can use ```$all```.

In [None]:
db.food.insert({"_id" : 1, "fruit" : ["apple", "banana", "peach"]})
db.food.insert({"_id" : 2, "fruit" : ["apple", "kumquat", "orange"]})
db.food.insert({"_id" : 3, "fruit" : ["cherry", "banana", "apple"]})

db.food.find({fruit : {$all : ["apple", "banana"]}})
// {"_id" : 1, "fruit" : ["apple", "banana", "peach"]}
// {"_id" : 3, "fruit" : ["cherry", "banana", "apple"]}

A useful conditional for querying arrays is ```$size``` , which allows you to query for arrays of a given size.

In [None]:
db.food.find({"fruit" : {"$size" : 3}})

The ```$slice``` operator can be used to return a subset of elements for an array key.

In [None]:
db.blog.posts.findOne(criteria, {"comments" : {"$slice" : 10}})
// Return first 10 comments

db.blog.posts.findOne(criteria, {"comments" : {"$slice" : -10}})
// Return last 10 comments

db.blog.posts.findOne(criteria, {"comments" : {"$slice" : [23, 10]}})
// Skip the first 23 elements and return the 24th through 33rd
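
The array operators above map closely onto ordinary array operations. The sketch below models ```$all```, ```$size```, and the ```$slice``` projection in plain JavaScript; the three functions are hypothetical helpers for illustration and assume arrays of primitive values.

```javascript
// Plain JavaScript models of array query operators (not MongoDB code).
// The helpers below are hypothetical and assume primitive elements.

// $all: every wanted value must appear somewhere in the array; order is irrelevant.
function matchesAll(array, wanted) {
    return wanted.every((value) => array.includes(value));
}

// $size: the array must have exactly this many elements.
function matchesSize(array, size) {
    return array.length === size;
}

// $slice projection: n keeps the first n, -n keeps the last n,
// [skip, limit] skips some elements and then keeps up to limit of them.
function sliceProjection(array, spec) {
    if (Array.isArray(spec)) {
        const [skip, limit] = spec;
        return array.slice(skip, skip + limit);
    }
    return spec >= 0 ? array.slice(0, spec) : array.slice(spec);
}

const food = [
    { _id: 1, fruit: ["apple", "banana", "peach"] },
    { _id: 2, fruit: ["apple", "kumquat", "orange"] },
    { _id: 3, fruit: ["cherry", "banana", "apple"] }
];
console.log(food.filter((doc) => matchesAll(doc.fruit, ["apple", "banana"])));

const comments = Array.from({ length: 40 }, (_, i) => i + 1); // 1..40
console.log(sliceProjection(comments, [23, 10]));
```

Unlike the query operators, the ```$slice``` projection does not filter documents; it only trims the array that is returned.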

## Querying on Embedded Documents

We can query an embedded document by using the ```.``` (dot) syntax. For example:

In [None]:
/*
{
    "name" : {
        "first" : "Joe",
        "last" : "Schmoe"
    },
    "age" : 45
}
*/

db.people.find({"name.first" : "Joe", "name.last" : "Schmoe"})
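
Dot notation simply walks down the document one key at a time. A minimal sketch of that lookup in plain JavaScript follows; ```getPath``` is a hypothetical helper, and it ignores the special handling MongoDB applies when a path passes through an array.

```javascript
// Resolve a dotted path such as "name.first" against a nested document.
// getPath is a hypothetical helper; array traversal is not modelled.
function getPath(doc, path) {
    return path.split(".").reduce(
        (value, key) => (value !== null && value !== undefined ? value[key] : undefined),
        doc
    );
}

const person = { name: { first: "Joe", last: "Schmoe" }, age: 45 };
console.log(getPath(person, "name.first"));
console.log(getPath(person, "name.middle"));
```

A missing intermediate key yields ```undefined``` rather than an error, so a query on a path that does not exist simply fails to match.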

## ```$where``` Queries

For queries that cannot be done any other way, there are ```$where``` clauses, which allow you to execute arbitrary JavaScript as part of your query. This allows you to do (almost) anything within a query.

In [None]:
db.foo.find({"$where" : function () {
        if(this['key'] == 'value'){
            return true;
        }else{
            return false;
        }

    }});

## Cursors

The database returns results from find using a cursor. The client-side implementations of cursors generally allow you to control a great deal about the eventual output of a query. You can limit the number of results, skip over some number of results, sort results by any combination of keys in any direction, and perform a number of other powerful operations.

If we assign the result of ```find()``` to a variable, it is stored as a cursor, which requests only a limited amount of data at a time (typically up to 4 MB per batch).

## Database Commands

There is one very special type of query called a database command. Database commands do "everything else", from administrative tasks like shutting down the server and cloning databases to counting documents in a collection and performing aggregations.

In [None]:
db.runCommand({"drop" : "test"});
// Dropping a collection is done via the "drop" database command

db.test.drop()
// The shell helper, which wraps the command and provides a simpler interface

### How Commands Work

A database command always returns a document containing the key ```ok```. If ```ok``` is ```1```, the command was successful; if it is ```0```, the command failed for some reason. If ```ok``` is ```0```, an additional key will be present, ```errmsg```. The value of ```errmsg``` is a string explaining why the command failed.

Commands in MongoDB are implemented as a special type of query that gets performed on the ```$cmd``` collection. When the MongoDB server gets a query on the ```$cmd``` collection, it handles it using special logic, rather than the normal code for handling queries.
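
Since every command result carries the ```ok``` key, client code can check results uniformly. The sketch below shows one way to do that in plain JavaScript; ```assertCommandOk``` is a hypothetical helper, and the sample result documents are made up for illustration rather than captured from a server.

```javascript
// Turn a failed command result into an exception.
// assertCommandOk is a hypothetical helper, not a shell or driver API.
function assertCommandOk(result) {
    if (result.ok !== 1) {
        throw new Error(result.errmsg || "command failed");
    }
    return result;
}

// Illustrative result documents (assumed shapes, not real server output).
const success = { ok: 1 };
const failure = { ok: 0, errmsg: "ns not found" };

console.log(assertCommandOk(success));
try {
    assertCommandOk(failure);
} catch (err) {
    console.log("command failed: " + err.message);
}
```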

# Special Collection and Index Types

## Explain function

Mongo shell has an ```explain()``` method, to see more internal information about a query.

In [None]:
db.users.find({"age" : 42}).explain()

// Output: 
//
// {
//   "cursor": "BtreeCursor age_1_username_1",
//   "isMultiKey": false,
//   "n": 8332,
//   "nscannedObjects": 8332,
//   "nscanned": 8332,
//   "nscannedObjectsAllPlans": 8332,
//   "nscannedAllPlans": 8332,
//   "scanAndOrder": false,
//   "indexOnly": false,
//   "nYields": 0,
//   "nChunkSkips": 0,
//   "millis": 91,
//   "indexBounds": {
//     "age": [
//       [
//         42,
//         42
//       ]
//     ],
//     "username": [
//       [
//         {
//           "$minElement": 1
//         },
//         {
//           "$maxElement": 1
//         }
//       ]
//     ]
//   },
//   "server": "ubuntu:27017"
// }

Here we get following information.

- **"cursor" : "BtreeCursor age_1_username_1"**

    BtreeCursor means that an index was used, specifically the index on age and username: ```{"age" : 1, "username" : 1}```. For a query that uses no index, it is "BasicCursor".


- **"isMultiKey" : false**

    Whether this query used a multikey index.


- **"n" : 8332**

    Number of documents returned by the query.


- **"nscannedObjects" : 8332**

    This is a count of the number of times MongoDB had to follow an index pointer to the actual document on disk. If the query contains criteria that is not part of the index or requests fields back that aren't contained in the index, MongoDB must look up the document each index entry points to.


- **"nscanned" : 8332**

    The number of index entries looked at if an index was used. If this was a table scan, it is the number of documents examined.


- **"scanAndOrder" : false**

    Whether MongoDB had to sort results in memory.


- **"indexOnly" : false**

    Whether MongoDB was able to fulfill this query using only the index entries. In this example, MongoDB found all matching documents using the index, which we know because "nscanned" is the same as "n". However, the query was told to return every field in the matching documents and the index only contained the "age" and "username" fields. If we changed the query to have a second argument, ```{"_id" : 0, "age" : 1, "username" : 1}```, then it would be covered by the index and "indexOnly" would be true.


- **"nYields": 0**

    The number of times this query yielded (paused) to allow a write request to proceed. If there are writes waiting to go, queries will periodically release their lock and allow them to do so.


- **"millis" : 91**

    The number of milliseconds it took the database to execute the query.


- **"indexBounds" : {…}**

    This field describes how the index was used, giving ranges of the index traversed. As the first clause in the query was an exact match, the index only needed to look at that value: 42. The second index key was a free variable, as the query didn't specify any restrictions on it.
    
## Index

MongoDB has the same concept of indexes as relational databases. An index generally forms a B-tree on the key being indexed, and MongoDB uses it when querying documents by that key. A query that does not use an index is called a table scan (a term inherited from relational databases), which means that the server has to "look through the whole collection" to find a query's results.

An index can be created using the following method.

In [None]:
db.users.ensureIndex({"username" : 1})

### Unique Index

Unique indexes guarantee that each value will appear at most once in the index. For example, if we want to make sure no two documents can have the same value in the "username" key, we can create a unique index.

In [None]:
db.users.ensureIndex({"username" : 1}, {"unique" : true})

**NOTE:**

Unique indexes count null as a value, so we cannot have a unique index with more than one document missing the key.

### Sparse Indexes

As mentioned in an earlier section, unique indexes count null as a value, so we cannot have a unique index with more than one document missing the key. However, there are lots of cases where we may want the unique index to be enforced only if the key exists. If we have a field that may or may not exist but must be unique when it does, we can combine the unique option with the sparse option.

In [None]:
db.users.ensureIndex({"email" : 1}, {"unique" : true, "sparse" : true})
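
The effect of combining ```unique``` with ```sparse``` can be modelled with a set of seen key values. The following is a sketch in plain JavaScript, not how MongoDB implements indexes: ```UniqueIndex``` is a hypothetical class, and it assumes, as described above, that a missing key is indexed as null unless the index is sparse.

```javascript
// A plain JavaScript model of a unique index with an optional sparse flag.
// UniqueIndex is a hypothetical class written for illustration.
class UniqueIndex {
    constructor(field, options = {}) {
        this.field = field;
        this.sparse = Boolean(options.sparse);
        this.seen = new Set();
    }

    // Returns true if the document is accepted, false on a duplicate key.
    insert(doc) {
        const hasField = Object.prototype.hasOwnProperty.call(doc, this.field);
        if (!hasField && this.sparse) {
            return true; // sparse: documents without the key are not indexed
        }
        const key = hasField ? doc[this.field] : null; // a missing key counts as null
        if (this.seen.has(key)) {
            return false;
        }
        this.seen.add(key);
        return true;
    }
}

const plain = new UniqueIndex("email");
console.log(plain.insert({ name: "a" }), plain.insert({ name: "b" }));

const sparse = new UniqueIndex("email", { sparse: true });
console.log(sparse.insert({ name: "a" }), sparse.insert({ name: "b" }));
```

With the plain unique index the second document without an email is rejected, while the sparse variant accepts both and still rejects duplicate emails.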

## Capped Collections

Normal collections in MongoDB are created dynamically and automatically grow in size to fit additional data. MongoDB also supports a different type of collection, called a capped collection, which is created in advance and is fixed in size. Capped collections behave like circular queues: if we're out of space, the oldest document will be deleted, and the new one will take its place.

Certain operations are not allowed on capped collections. Documents cannot be removed or deleted (aside from the automatic age-out described earlier), and updates that would cause documents to grow in size are disallowed. By preventing these two operations, we guarantee that documents in a capped collection are stored in insertion order and that there is no need to maintain a free list for space from removed documents.

To create a capped collection we can do the following.

In [None]:
db.createCollection("my_collection2", {"capped" : true, "size" : 100000, "max" : 100})

// Where size (required): Size in bytes for the collection.
// max (optional): Max number of documents.
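
The circular-queue behaviour can be illustrated with a few lines of plain JavaScript. This is a rough model only: ```CappedCollection``` is a hypothetical class that enforces just the ```max``` document count, whereas a real capped collection is also bounded by its ```size``` in bytes.

```javascript
// A plain JavaScript model of a capped collection limited by document count.
// CappedCollection is a hypothetical class; the byte-size limit is not modelled.
class CappedCollection {
    constructor(max) {
        this.max = max;
        this.docs = [];
    }

    insert(doc) {
        this.docs.push(doc);          // documents stay in insertion order
        if (this.docs.length > this.max) {
            this.docs.shift();        // age out the oldest document
        }
    }
}

const log = new CappedCollection(3);
for (let i = 1; i <= 5; i += 1) {
    log.insert({ line: i });
}
console.log(log.docs);
```

After five inserts into a collection capped at three documents, only the three most recent documents remain, still in insertion order.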

## Tailable Cursors

Tailable cursors are a special type of cursor that are not closed when their results are exhausted. They were inspired by the ```tail -f``` command and, similar to the command, will continue fetching output for as long as possible. Because the cursors do not die when they run out of results, they can continue to fetch new results as documents are added to the collection. Tailable cursors can be used only on capped collections, since insert order is not tracked for normal collections.

Tailable cursors are often used for processing documents as they are inserted onto a "work queue" (the capped collection). Because tailable cursors will time out after 10 minutes of no results, it is important to include logic to re-query the collection if they die.

## Time-To-Live Indexes

Time-to-live (TTL) indexes allow us to set a timeout for each document. When a document reaches a preconfigured age, it is deleted. This type of index is useful for caching use cases such as session storage.
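
The expiry rule can be sketched as a filter over documents that carry a date field. The code below is a plain JavaScript model, not MongoDB code: ```purgeExpired``` and the ```lastUpdated``` field name are hypothetical, and the model assumes that documents without a date in the indexed field never expire.

```javascript
// A plain JavaScript model of TTL expiry (not MongoDB code).
// purgeExpired is a hypothetical helper; it returns the surviving documents.
function purgeExpired(docs, field, expireAfterSeconds, now) {
    const cutoff = now.getTime() - expireAfterSeconds * 1000;
    return docs.filter((doc) => {
        const value = doc[field];
        // Assumption: documents without a date in the field are kept.
        return !(value instanceof Date) || value.getTime() > cutoff;
    });
}

const sessions = [
    { _id: "old", lastUpdated: new Date("2020-01-01T00:00:00Z") },
    { _id: "fresh", lastUpdated: new Date("2020-01-01T00:59:30Z") },
    { _id: "undated" }
];
const now = new Date("2020-01-01T01:00:00Z");

// Expire sessions that have been idle for more than 30 minutes.
console.log(purgeExpired(sessions, "lastUpdated", 30 * 60, now));
```

Refreshing the date field on each request keeps an active session alive, because the age is always measured from that field.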

## Full-Text Indexes

MongoDB has a special type of index for searching for text within documents. We can query for strings using exact matches and regular expressions, but these techniques have some limitations. Searching a large block of text for a regular expression is slow and it's tough to take linguistic issues into account (e.g., that "entry" should match "entries"). Full-text indexes give us the ability to search text quickly, as well as provide built-in support for multi-language stemming and stop words.

For example.

In [None]:
db.hn.ensureIndex({"title" : "text"})
// To run a search over the text, we first need to create a "text" index

db.runCommand({"text" : "hn", "search" : "ask hn"})
// This will match titles like "Ask Hn", "Show Hn"

## Geospatial Indexing

MongoDB has a few types of geospatial indexes. The most commonly used ones are ```2dsphere```, for surface-of-the-earth-type maps, and ```2d```, for flat maps (and time series data).

```2dsphere``` allows us to specify points, lines, and polygons in ```GeoJSON``` format. A point is given by a two-element array, representing ```[ longitude , latitude ]```. For example,

In [None]:
{
    "name" : "New York City",
    "loc" : {
        "type" : "Point",
        "coordinates" : [-74.006, 40.7128]
    }
}

# GridFS

GridFS is a mechanism for storing large binary files in MongoDB.

**Advantages of GridFS**

- Using GridFS can simplify our stack. If we are already using MongoDB, we might be able to use GridFS instead of a separate tool for file storage.

- GridFS will leverage any existing replication or autosharding that we've set up for MongoDB, so getting failover and scale-out for file storage is easier.

- GridFS can alleviate some of the issues that certain filesystems can exhibit when being used to store user uploads. For example, GridFS does not have issues with storing large numbers of files in the same directory.

- We can get great disk locality with GridFS, because MongoDB allocates data files in 2 GB chunks.

**Disadvantages of GridFS**

- Slower performance: accessing files from MongoDB will not be as fast as going directly through the filesystem.

- We can only modify documents by deleting them and resaving the whole thing. MongoDB stores files as multiple documents so it cannot lock all of the chunks in a file at the same time.

## Working With GridFS

GridFS tools generally provide the following operations: list (list all files), delete (delete a specific file), search (search for a file), put (add a file from the filesystem), and get (save a file from GridFS to the filesystem).

## Under the Hood

GridFS is a lightweight specification for storing files that is built on top of normal MongoDB documents. The MongoDB server actually does almost nothing to “special-case” the handling of GridFS requests; all the work is handled by the client-side drivers and tools.

The basic idea behind GridFS is that we can store large files by splitting them up into chunks and storing each chunk as a separate document. Because MongoDB supports storing binary data in documents, we can keep storage overhead for chunks to a minimum. In addition to storing each chunk of a file, we store a single document that groups the chunks together and contains metadata about the file.

The chunks for GridFS are stored in their own collection. By default chunks will use the collection ```fs.chunks```, but this can be overridden. Within the chunks collection the structure of the individual documents is pretty simple.

In [None]:
{
    "_id" : ObjectId("..."),
    "n" : 0,
    "data" : BinData("..."),
    "files_id" : ObjectId("...")
}

Where,

- ```files_id```

    The ```_id``` of the file document that contains the metadata for this chunk.


- ```n```

    The chunk's position in the file, relative to the other chunks.


- ```data```

    The bytes in this chunk of the file.
   
The metadata for each file is stored in a separate collection, which defaults to ```fs.files```. Each file document has the following keys.

- ```_id```

    A unique id for the file; this is what will be stored in each chunk as the value for the “files_id” key.


- ```length```

    The total number of bytes making up the content of the file.


- ```chunkSize```

    The size of each chunk comprising the file, in bytes. The default is 256K, but this can be adjusted if needed.


- ```uploadDate```

    A timestamp representing when this file was stored in GridFS.


- ```md5```

    An md5 checksum of this file’s contents, generated on the server side.
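
Putting the two collections together, the chunking idea can be sketched in plain JavaScript with Node's ```Buffer``` and ```crypto``` modules. This is an illustration of the document shapes described above, not a GridFS driver: ```splitIntoChunks``` is a hypothetical helper, the file id is a plain string instead of an ObjectId, the md5 is computed locally rather than on the server, and the tiny chunk size is chosen only to keep the output readable.

```javascript
const crypto = require("crypto");

// Split a Buffer into GridFS-style chunk documents plus one file document.
// splitIntoChunks is a hypothetical helper, not part of any MongoDB driver.
function splitIntoChunks(filesId, data, chunkSize) {
    const chunks = [];
    for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n += 1) {
        chunks.push({
            files_id: filesId,                              // points back to the file document
            n: n,                                           // position of this chunk in the file
            data: data.subarray(offset, offset + chunkSize) // the bytes of this chunk
        });
    }
    const file = {
        _id: filesId,
        length: data.length,
        chunkSize: chunkSize,
        uploadDate: new Date(),
        md5: crypto.createHash("md5").update(data).digest("hex") // computed locally in this sketch
    };
    return { file, chunks };
}

const content = Buffer.from("hello gridfs");
const { file, chunks } = splitIntoChunks("file-1", content, 5);
console.log(file.length, chunks.length);

// Reading the file back means concatenating the chunks in order of n.
const restored = Buffer.concat(chunks.map((chunk) => chunk.data));
console.log(restored.toString());
```

Because each chunk is an ordinary document, it is stored, replicated, and sharded like any other document in the chunks collection.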

# References

- MongoDB: The Definitive Guide, Kristina Chodorow ([link](http://shop.oreilly.com/product/0636920028031.do))