# Python and MongoDB


<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Python-logo-notext.svg/1024px-Python-logo-notext.svg.png" alt="Python Logo" width="100px">
<img src="http://s3.amazonaws.com/info-mongodb-com/_com_assets/media/mongodb-logo-rgb.jpeg" alt="MongoDB Logo" width="200px">

## Python Atlanta May 2016

Rick Copeland

# MongoDB

- What is MongoDB?
- Why use MongoDB?
- How does it integrate with Python?

# What is MongoDB?

> MongoDB is an open-source, document database designed for ease of development and scaling.
> - https://docs.mongodb.com/manual/

## Open-Source

Yay!

## Open-source, document database

Documents don't mean Word. Documents, in this case, mean JSON-like records: nested keys and values, which MongoDB stores in a binary form called BSON.

## Open-source, document database designed for ease of development

 - Do you like the Python `dict` type? 
 - Would you like to be able to store them in a database easily? 
 
If the answers to those two questions are "yes," you might enjoy using MongoDB!
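To make the "document means JSON means `dict`" idea concrete, here is a minimal sketch using only the standard library. The `talk` record and its `tags` and `venue` fields are made up for illustration; nothing here talks to MongoDB.

```python
import json

# A hypothetical talk record, written as an ordinary Python dict.
talk = {
    'title': 'Python and MongoDB',
    'speaker': 'Rick Copeland',
    'tags': ['python', 'mongodb'],   # lists nest naturally
    'venue': {'city': 'Atlanta'},    # so do sub-documents
}

as_json = json.dumps(talk, sort_keys=True)   # dict -> JSON text
round_tripped = json.loads(as_json)          # JSON text -> dict
print(as_json)
```

The dict survives the round trip unchanged, which is roughly the experience MongoDB aims for when you hand it a `dict`.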


## Open-source, document database designed for ease of development and scaling

Obligatory... 

> MongoDB is web-scale.

![MongoDB is Web Scale](https://i.ytimg.com/vi/HdnDXsqiPYo/hqdefault.jpg)
https://www.youtube.com/watch?v=b2F-DItXtZs

# Why use MongoDB?

- You need a simple persistence layer
- You change your schema a lot
- You deal with lots of polymorphic data
- You generally know which apps will access the DB
- You need performance you're having trouble getting from a relational database

# How does it integrate with Python?

`pip install pymongo`

## Yeah, but what did you *really* do?

- Install docker-machine (https://docs.docker.com/machine/install-machine/)
- Install (and run) docker-pf (https://github.com/noseglid/docker-pf)
- `docker pull mongo` (the official image is named `mongo`)
- `docker run -d -p 27017:27017 mongo`
- `pip install pymongo`


# So let's get started

## First, a bit of terminology...

MongoDB is organized into *Databases*, *Collections*, and *Documents*, and can use *Indexes*.

| SQL      | MongoDB    |
|----------|------------|
| Database | Database   |
| Table    | Collection |
| Column   | Field      |
| Row      | Document   |
| Index    | Index      |

# Getting a connection to the server

We get a connection by creating a `pymongo.MongoClient` object.

In [1]:
import pymongo
cli = pymongo.MongoClient()    # connect to localhost by default
cli.drop_database('pyatl')     # start the demo from a clean slate

# Accessing Databases and Collections

MongoDB, like Python, makes it easy to inspect data you're unfamiliar with.

In [2]:
cli.database_names()    # spelled list_database_names() in PyMongo 3.6+

[u'SilverServer',
 u'blog',
 u'enron',
 u'local',
 u'm101',
 u'school',
 u'students',
 u'test']

In [3]:
db = cli.enron
db.collection_names()    # spelled list_collection_names() in PyMongo 3.6+

[u'messages']

In [4]:
db.messages

Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), u'enron'), u'messages')

# Inserting data

1. Choose your database
2. Choose your collection
3. Call the `insert_one()` method with a `dict`

In [5]:
from datetime import datetime

db = cli.pyatl
db.meetings.insert_one({
        'date': datetime.utcnow(),
        'talks': [
            { 
                'title': 'Using Anaconda to Get Started with Data Science and Python',
                'speaker': 'Tony Fast',
                'python-version': None},
            {
                'title': 'Python and MongoDB',
                'speaker': 'Rick Copeland',
                'python-version': '2.7'}
            ]})

<pymongo.results.InsertOneResult at 0x10997c320>
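The `InsertOneResult` shown above carries the new document's `_id` in its `inserted_id` attribute. Here is a minimal sketch of wrapping the three steps in a function. The names `build_meeting` and `record_meeting` are hypothetical helpers, and the timezone-aware timestamp is an assumption (the cell above uses `datetime.utcnow()`).

```python
from datetime import datetime, timezone


def build_meeting(talks, when=None):
    """Return a meeting document shaped like the one inserted above."""
    return {
        # Timezone-aware UTC timestamp (an assumption; the talk uses utcnow()).
        'date': when or datetime.now(timezone.utc),
        'talks': list(talks),
    }


def record_meeting(collection, talks, when=None):
    """Insert a meeting and return the _id assigned to it."""
    result = collection.insert_one(build_meeting(talks, when))
    return result.inserted_id
```

With a live connection you would call it as `record_meeting(db.meetings, [{'title': 'Python and MongoDB', 'speaker': 'Rick Copeland'}])`.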

# Finding data

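On the collection object again, call `find()` (which returns a cursor you iterate over) or `find_one()` (which returns a single document, or `None` if nothing matches):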

In [6]:
db.meetings.find()

<pymongo.cursor.Cursor at 0x10998c150>

In [7]:
for meeting in db.meetings.find():
    print(meeting)

{u'date': datetime.datetime(2016, 5, 12, 20, 44, 27, 73000), u'talks': [{u'python-version': None, u'speaker': u'Tony Fast', u'title': u'Using Anaconda to Get Started with Data Science and Python'}, {u'python-version': u'2.7', u'speaker': u'Rick Copeland', u'title': u'Python and MongoDB'}], u'_id': ObjectId('5734eb2bb26b933dcb67e147')}


In [8]:
db.meetings.find_one()

{u'_id': ObjectId('5734eb2bb26b933dcb67e147'),
 u'date': datetime.datetime(2016, 5, 12, 20, 44, 27, 73000),
 u'talks': [{u'python-version': None,
   u'speaker': u'Tony Fast',
   u'title': u'Using Anaconda to Get Started with Data Science and Python'},
  {u'python-version': u'2.7',
   u'speaker': u'Rick Copeland',
   u'title': u'Python and MongoDB'}]}

# Finding data (queries)

You can find documents in MongoDB by querying by example:

In [9]:
meeting = db.meetings.find_one()
db.meetings.find_one({'_id': meeting['_id']})

{u'_id': ObjectId('5734eb2bb26b933dcb67e147'),
 u'date': datetime.datetime(2016, 5, 12, 20, 44, 27, 73000),
 u'talks': [{u'python-version': None,
   u'speaker': u'Tony Fast',
   u'title': u'Using Anaconda to Get Started with Data Science and Python'},
  {u'python-version': u'2.7',
   u'speaker': u'Rick Copeland',
   u'title': u'Python and MongoDB'}]}

# Finding data (queries)

You can also find documents using query operators such as `$gte` and `$lte`:

In [10]:
from datetime import time
today = datetime.utcnow().date()    # 'date' was stored with utcnow(), so use the UTC day
today_min = datetime.combine(today, time.min)
today_max = datetime.combine(today, time.max)

In [11]:
for meeting in db.meetings.find({'date': {'$gte': today_min, '$lte': today_max}}):
    talks = meeting['talks']
    print([talk['title'] for talk in talks])

[u'Using Anaconda to Get Started with Data Science and Python', u'Python and MongoDB']
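A "everything on this day" query comes up often enough that it may be worth a helper. The sketch below builds the filter document only. `day_filter` is a hypothetical name, and it uses a half-open range (`$gte` start, `$lt` next midnight) instead of `time.max`, which is a design choice of this sketch rather than what the cells above do.

```python
from datetime import datetime, time, timedelta


def day_filter(day, field='date'):
    """Build a filter matching datetimes that fall on the given date."""
    start = datetime.combine(day, time.min)   # midnight at the start of the day
    end = start + timedelta(days=1)           # midnight at the start of the next day
    # Half-open range: start <= value < end
    return {field: {'$gte': start, '$lt': end}}
```

You would pass the result straight to `find()`, for example `db.meetings.find(day_filter(datetime.utcnow().date()))`. The half-open form avoids having to think about the last microsecond of the day.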


# Updating data (replace)

You can update data by overwriting a document using `replace_one()`:

In [12]:
db.names.insert_one({'name': 'Rick'})
name_doc = db.names.find_one()

In [13]:
name_doc['name'] = 'Richard'
db.names.replace_one({'_id': name_doc['_id']}, name_doc)
db.names.find_one()

{u'_id': ObjectId('5734eb2bb26b933dcb67e148'), u'name': u'Richard'}

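# Updating data (modify)

You can also modify a document in place using `update_one()` with update operators such as `$set` and `$push`: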

In [14]:
meeting = db.meetings.find_one()
db.meetings.update_one(
    {'_id': meeting['_id']},
    {
        '$set': {'talks.1.speaker': 'Richard Copeland'}
    })
db.meetings.find_one()['talks'][1]


{u'python-version': u'2.7',
 u'speaker': u'Richard Copeland',
 u'title': u'Python and MongoDB'}
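The key `'talks.1.speaker'` is dot notation: field `talks`, array index 1, subfield `speaker`. If you build such paths often, a tiny helper keeps the string formatting in one place. `set_in_array` is a hypothetical name, and the sketch only builds the update document.

```python
def set_in_array(array_field, index, subfield, value):
    """Build a $set update for one field of one array element."""
    path = '{}.{}.{}'.format(array_field, index, subfield)
    return {'$set': {path: value}}


update = set_in_array('talks', 1, 'speaker', 'Richard Copeland')
print(update)   # {'$set': {'talks.1.speaker': 'Richard Copeland'}}
```

The result is the same update document used in the cell above, so it can be passed as the second argument to `update_one()`.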

In [15]:
db.meetings.update_one(
    {'_id': meeting['_id']},
    {
        '$push': {'talks': {
                'speaker': 'Dan Rocco',
                'title': 'Something even more awesome',
                'python-version': ['2.7', '3.5']
            }}
    }
)
db.meetings.find_one()['talks']


[{u'python-version': None,
  u'speaker': u'Tony Fast',
  u'title': u'Using Anaconda to Get Started with Data Science and Python'},
 {u'python-version': u'2.7',
  u'speaker': u'Richard Copeland',
  u'title': u'Python and MongoDB'},
 {u'python-version': [u'2.7', u'3.5'],
  u'speaker': u'Dan Rocco',
  u'title': u'Something even more awesome'}]

# Indexes

Use them.

Seriously, though. Typical process:

1. Develop with tiny data set, observe that MongoDB is lightning-fast
2. Deploy to production with no users (yet), see that MongoDB is still lightning-fast
3. Users sign up
4. System quickly grinds to a standstill 
5. **MongoDB doesn't scale!!!! Don't use MongoDB!!!!**

Each cursor object has an `.explain()` method. Read up on its output, and use it. Things you don't want to see:

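- Collection scan (a `COLLSCAN` stage): every document is examined because no index applies
- ScanAndOrder (`scanAndOrder: true` in older explain output, a `SORT` stage in MongoDB 3.0+): results are sorted in memory instead of being read in index order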
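Here is a minimal sketch of checking `explain()` output for a collection scan automatically. It assumes the MongoDB 3.0+ layout, where the chosen plan lives under `queryPlanner.winningPlan` as nested stages and a collection scan shows up as a stage named `COLLSCAN`; other server versions may nest the plan differently. The function names are hypothetical.

```python
def plan_stages(plan):
    """Yield every stage name in a (nested) winning plan."""
    yield plan.get('stage')
    if 'inputStage' in plan:                    # single child stage
        for stage in plan_stages(plan['inputStage']):
            yield stage
    for child in plan.get('inputStages', []):   # multiple child stages
        for stage in plan_stages(child):
            yield stage


def has_collection_scan(explain_output):
    """Return True if the winning plan contains a COLLSCAN stage."""
    winning = explain_output['queryPlanner']['winningPlan']
    return 'COLLSCAN' in plan_stages(winning)
```

Typical use would be `has_collection_scan(db.meetings.find({'date': ...}).explain())`. If it returns `True`, an index such as `db.meetings.create_index([('date', pymongo.ASCENDING)])` is the usual remedy.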


# Aggregation

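MongoDB includes an aggregation framework that passes documents through a pipeline of stages (here `$match`, `$group`, `$sort`, and `$limit`), each transforming the output of the one before.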

In [16]:
db = cli.enron
curs = db.messages.aggregate(
    [
        {'$match': {'headers.To': 'bryan.hull@enron.com'}},
        {'$group': {
                '_id': '$headers.From',
                'count': {'$sum': 1}
            }},
        {'$sort': {'count': -1}},
        {'$limit': 5}
        ])
for result in curs:
    print(result)

{u'count': 171, u'_id': u'matthew.lenhart@enron.com'}
{u'count': 65, u'_id': u'veronica.espinoza@enron.com'}
{u'count': 64, u'_id': u'veronica.gonzalez@enron.com'}
{u'count': 46, u'_id': u'phillip.love@enron.com'}
{u'count': 24, u'_id': u'eric.bass@enron.com'}
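If the pipeline stages are unfamiliar, it may help to see roughly the same computation in plain Python. This is only an illustration of what each stage does, not how MongoDB executes it. It assumes each message's `headers['To']` is a list of addresses, and `top_senders` is a hypothetical name.

```python
from collections import Counter


def top_senders(messages, recipient, limit=5):
    """Count senders of messages addressed to recipient, most frequent first."""
    # $match: keep messages whose To list contains the recipient
    matched = (m for m in messages if recipient in m['headers']['To'])
    # $group with {'$sum': 1}: one counter per distinct From address
    counts = Counter(m['headers']['From'] for m in matched)
    # $sort by count descending, then $limit
    return [{'_id': sender, 'count': n}
            for sender, n in counts.most_common(limit)]
```

The difference in practice is that the pipeline runs inside the server, so only the five result documents cross the network instead of every message.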


# *Lots* and lots more...

- Indexing
- GeoSpatial queries
- Full-text queries
- Replication
- Sharding (partitioning)
- Schema design


... but really 99% of your problems can be solved by fixing your schema and adding the right index.

# Thank you

Any questions?