# MongoDB

## Some fiddling with setup is still required
You need a MongoDB instance running on the default host and port on your computer for this exercise. We will then interact with MongoDB through a Python module called `pymongo`.


1. First, install MongoDB. Depending on whether you use **anaconda** or **brew**, run one of the following commands in the terminal:
```
conda install -c anaconda mongodb
```
```
brew install mongodb
```
In case this does not work for you, please follow the detailed instructions on the MongoDB website: https://docs.mongodb.com/manual/installation/

2. Then install the `pymongo` package for Python. Depending on whether you manage your dependencies with **pip** or **anaconda**, run one of the following commands in the terminal:
```
pip install pymongo
```
```
conda install -c anaconda pymongo
```

3. Before you start MongoDB for the first time, create a directory to which the `mongod` process can write its data. You will need to specify this directory whenever you start MongoDB. I named my directory `mongodb-data`:
```
mkdir ~/mongodb-data
```

4. Then, before you run the notebook, start your MongoDB instance by typing the following command into your terminal:
```
mongod --dbpath ~/mongodb-data
```
You can stop it again with
```
CTRL + C
```
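
While `mongod` is running, a quick way to confirm that something is listening on the default port is a plain TCP check from Python. This is only a minimal sketch: the function name `mongod_port_is_open` is made up for illustration, and it assumes the MongoDB default port 27017 on `localhost`. It only tells you that the port accepts connections, not that the process behind it is really MongoDB.

```python
import socket


def mongod_port_is_open(host="localhost", port=27017, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper: 27017 is MongoDB's default port, but an open
    port does not prove that the listener is actually mongod.
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Connection refused, timed out, or host not reachable.
        return False
    conn.close()
    return True
```

Calling `mongod_port_is_open()` should return `True` while `mongod` is running and `False` after you stop it with CTRL + C.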


In case you run into problems, please post the steps you took, your error message, and a screenshot on Piazza.

## Let's get started

In [None]:
from pymongo import MongoClient

In [None]:
maxSevSelDelay = 1 # Assume 1ms maximum server selection delay
client = MongoClient(serverSelectionTimeoutMS = maxSevSelDelay)

In [None]:
# Verify that the instance is running
try:
    client.admin.command('ismaster') # The ismaster command is cheap and does not require auth
except Exception as ex:
    # ex
    print("We have a problem: the server is not running")

## Class examples

In [None]:
# Start from a clean slate: drop the database if it already exists
client.drop_database('test-database')

db = client['test-database']

In [None]:
users = db['user']

users.drop()

In [None]:
user1 = {"name": "Alice",
         "age" : 21,
         "status": "A",
         "groups": ["algorithms", "theory"]}

In [None]:
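# Number of documents in the collection (count() is deprecated in newer pymongo versions)
users.count_documents({})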

In [None]:
uid = users.insert_one(user1).inserted_id
uid

In [None]:
uid = users.insert_one(
    {"name": "Bob",
     "age" : 18,
     "status": "B",
     "groups": ["databases", "cooking"]}
).inserted_id
uid

In [None]:
# List all of the collections in our database:
db.list_collection_names()

In [None]:
users.count_documents({})

In [None]:
# return a single document (matching a query)
users.find_one({"name" : "Bob"})

In [None]:
user3 = {"name": "Charly",
         "age" : 22,
         "status": "A",
         "groups": ["databases", "cars"]}
user4 = {"name": "Dorothee",
         "age" : 16,
         "status": "A",
         "groups": ["cars", "sports"]}

In [None]:
result = users.insert_many([user3, user4])

users.count_documents({})

In [None]:
import pprint # pretty-prints nested documents

# find users of age 18
for user in users.find({"age": 18}): 
    pprint.pprint(user)

In [None]:
# find users younger than 19
for user in users.find({"age": {"$lt": 19}}): 
    pprint.pprint(user)

In [None]:
# find names of users younger than 19
for user in users.find({"age": {"$lt": 19}}, projection={"name": 1, "_id" : 0}): 
    pprint.pprint(user)
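
To make the semantics of the last query concrete, here is a rough plain-Python sketch of what the filter `{"age": {"$lt": 19}}` and the projection on `name` do. The helpers `matches` and `project` are hypothetical and cover only equality, `$lt`, and `$gt` on top-level fields; real MongoDB matching has many more rules (for example for arrays and embedded documents). The names and ages are those of the four users inserted above.

```python
def matches(doc, query):
    """Return True if doc satisfies query (tiny subset of MongoDB's filter syntax)."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            # Operator form such as {"$lt": 19}; only $lt and $gt are sketched here.
            for op, bound in condition.items():
                if op not in ("$lt", "$gt"):
                    raise ValueError("operator not covered by this sketch: " + op)
                if value is None:
                    return False
                if op == "$lt" and not value < bound:
                    return False
                if op == "$gt" and not value > bound:
                    return False
        elif value != condition:
            # Plain value: simple equality test.
            return False
    return True


def project(doc, fields):
    """Keep only the listed fields, similar to projection={"name": 1, "_id": 0}."""
    return {key: doc[key] for key in fields if key in doc}


people = [
    {"name": "Alice", "age": 21},
    {"name": "Bob", "age": 18},
    {"name": "Charly", "age": 22},
    {"name": "Dorothee", "age": 16},
]

young_names = [project(p, ["name"]) for p in people
               if matches(p, {"age": {"$lt": 19}})]
print(young_names)  # [{'name': 'Bob'}, {'name': 'Dorothee'}]
```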

For more examples see, e.g.:
http://api.mongodb.com/python/current/tutorial.html

## More examples

In [None]:
# Start from a clean slate: drop the database if it already exists
client.drop_database('example_db')

# operator overloading is cool!
db = client['example_db']
db = client.example_db

In [None]:
# collections are MongoDB's analogue of tables
collection = db['mycollection']

In [None]:
collection.count_documents({})

In [None]:
doc_id = collection.insert_one({"name": "Peter", "age" : 99})

In [None]:
print(doc_id.inserted_id)

In [None]:
collection.find_one()

In [None]:
collection.insert_one({"name": "Ruth", "age" : 93})

In [None]:
collection.find_one()

In [None]:
collection.find_one({"name" : "Ruth"})

In [None]:
# no document matches this query, so find_one returns None
collection.find_one({"name" : "bob"})

In [None]:
for r in collection.find({"name" : "Ruth"}):
    print(r)

In [None]:
collection.insert_one({"this_is": "bananas", "schemas" : "LOL"})

In [None]:
for r in collection.find():
    print(r)

In [None]:
collection.insert_one({"russian" : {"nesting" : {"dolls" : "rock"}}})

In [None]:
# a non-dict argument is treated as an _id lookup, so this finds nothing (returns None)
collection.find_one("russian")

In [None]:
rec = collection.find_one({"russian":{"$exists":True}})
print(rec)

In [None]:
rec['russian']

In [None]:
rec['russian']['nesting']['dolls']
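
MongoDB queries can also address nested fields directly with dot notation, for example `{"russian.nesting.dolls": "rock"}`. The sketch below shows, in plain Python, how such a dotted path corresponds to the chained lookups used above. The helper `get_path` is hypothetical and only handles nested dictionaries, not arrays.

```python
def get_path(doc, dotted_path):
    """Follow a dotted path such as 'a.b.c' through nested dicts; None if missing."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


nested = {"russian": {"nesting": {"dolls": "rock"}}}

# Same value as nested['russian']['nesting']['dolls']
print(get_path(nested, "russian.nesting.dolls"))  # rock

# With a running server, the corresponding query on the collection above would be:
# collection.find_one({"russian.nesting.dolls": "rock"})
```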

# IMDB exercise
We now use MongoDB to implement the IMDB movie database. As a test run, we store a subset of the schema (Actors, Movies, Directors, and the connecting tables) and include a few tuples and queries.

* We decided to store the movie data in a single collection (called movies). Each document is a movie with fields that include the attributes of a movie, a list of actors (including the actor attributes and a list of the roles they play in that movie), and a list of directors (including their respective attributes).

* When you look up the attributes in the IMDB movie database, ignore the attributes movie id, director id, and actor id. We needed those for our PK-FK relationships; here we don't need them anymore.
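
The document layout described in the two points above could be sketched as follows. This is only one possible shape: the helper functions and the field names `lname`, `year`, `roles`, `actors`, and `directors` are assumptions for illustration (only `name` and `fname` are mentioned in this exercise), and all values are placeholders that you need to replace with the results of your own SQL queries.

```python
def make_person(fname, lname, roles=None):
    """Build an actor or director sub-document (no id field, as described above)."""
    person = {"fname": fname, "lname": lname}
    if roles is not None:
        # Only actors carry the list of roles they play in this movie.
        person["roles"] = list(roles)
    return person


def make_movie(name, year, actors, directors):
    """Build one movie document that embeds its actors and directors."""
    return {
        "name": name,
        "year": year,
        "actors": list(actors),
        "directors": list(directors),
    }


# Placeholder values only; nothing here is real IMDB data.
example = make_movie(
    name="<movie name>",
    year="<year>",
    actors=[make_person("<actor fname>", "<actor lname>",
                        roles=["<role 1>", "<role 2>"])],
    directors=[make_person("<director fname>", "<director lname>")],
)
```

A document built this way can be passed to `insert_one` like the dictionaries in the earlier examples.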

For the following problems, you need to issue appropriate SQL queries over your local IMDB movie database to find the missing attribute values (e.g., the name of the actor with id `538826`).

**a)** Create a new movie collection and make sure that it is empty before you start adding your documents.

**b)** Create an entry for the movie with movie id 476084, including all its attributes (like its name), together with all its directors, and three of its actors, namely those with actor ids 538826, 1794091, 1810514 as found in the IMDB movie database. For each of the actors, don't forget to include their attributes (like fname) and all roles they play in that movie.

**c)** Create an entry for the movie with movie id 433969, together with all of its 3 directors as listed in our IMDB movie database. Ignore all actors in the movie, but include all movie attributes.

**d)** Give a query that returns the names of all movies made before 2010, together with their directors.