# MongoDB
Please read the instructions carefully, and post questions (including the steps you took and the errors you encountered) on Piazza if anything is unclear.

## Submission Instructions

We will first walk through some examples; you will then answer the questions marked with "**a)**", "**b)**", etc.

To submit your solutions, copy/paste the contents of each answer cell to the corresponding answer box in Gradescope HW8.


## Setup Instructions

1. For this notebook, you need to install MongoDB and have a MongoDB server running.

 To install for POSIX, use an appropriate package manager, e.g.:  
    `brew install mongodb` (Mac)  
    `apt-get install mongodb` (Ubuntu)

 To install for Windows, see:  
 https://docs.mongodb.com/manual/installation/


2. Before you can run this notebook, you must start your MongoDB server from a terminal:

  *Before you start MongoDB for the first time*, you must create a directory to store your data. Here, we have made a folder at our home directory; you may put yours elsewhere.

 `mkdir ~/mongodb-data`

 Then start the MongoDB server, specifying the folder where the data is stored: `mongod --dbpath ~/mongodb-data`.

 You can stop it again with `CTRL + C`.  


3. Finally, you must also install a Python client library to interact with your local MongoDB server. Be sure you have run `pip install -r requirements.txt` with the up-to-date requirements file.


## Let's get started

In [None]:
from pymongo import MongoClient

In [None]:
maxSevSelDelay = 1 # Assume 1ms maximum server selection delay
client = MongoClient(serverSelectionTimeoutMS = maxSevSelDelay)

In [None]:
# Verify that the instance is running
try:
    client.admin.command('ping') # The ping command is cheap and does not require auth
except Exception as ex:
    print("We have a problem: the server is not running")
    print(ex) # show the underlying error to help with debugging
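
If the check above reports a problem, it can help to first find out whether anything is listening at all. Below is a minimal sketch using only the standard library, assuming the server runs on the default address `localhost:27017` (the address `MongoClient()` uses when none is given). The helper name `port_open` is hypothetical and not part of PyMongo.

```python
import socket

def port_open(host="localhost", port=27017, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused connection, timeout, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if port_open():
    print("Something is listening on localhost:27017")
else:
    print("Nothing is listening on localhost:27017; is mongod running?")
```

A `True` result only shows that the port accepts connections; it does not prove that the process behind it is MongoDB.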

## Class examples

In [None]:
client.drop_database('test-database') # start from a clean database

db = client['test-database']

In [None]:
users = db['user']

users.drop()

In [None]:
user1 = {"name": "Alice",
         "age" : 21,
         "status": "A",
         "groups": ["algorithms", "theory"]}

In [None]:
users.count_documents({}) # An empty filter matches every document, so this counts them all

In [None]:
uid = users.insert_one(user1).inserted_id
uid

In [None]:
uid = users.insert_one(
    {"name": "Bob",
     "age" : 18,
     "status": "B",
     "groups": ["databases", "cooking"]}
).inserted_id
print(uid)

In [None]:
# List all of the collections in our database:
db.list_collection_names()

In [None]:
users.count_documents({})

In [None]:
# return a single document (matching a query)
users.find_one({"name" : "Bob"})

In [None]:
user3 = {"name": "Charly",
         "age" : 22,
         "status": "A",
         "groups": ["databases", "cars"]}
user4 = {"name": "Dorothee",
         "age" : 16,
         "status": "A",
         "groups": ["cars", "sports"]}

In [None]:
result = users.insert_many([user3, user4])

users.count_documents({})

In [None]:
from pprint import pprint # pretty printing library

# find users of age 18
for user in users.find({"age": 18}): 
    pprint(user)

In [None]:
# find users younger than 19
for user in users.find({"age": {"$lt": 19}}): 
    pprint(user)

In [None]:
# find names of users younger than 19
for user in users.find({"age": {"$lt": 19}}, projection={"name": 1, "_id" : 0}): 
    pprint(user)
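
To make the meaning of the filter and the projection concrete, here is a plain-Python sketch that mimics the last query on an in-memory list. It only illustrates what the query returns, not how MongoDB evaluates it; the list `docs` simply repeats the four users inserted above.

```python
docs = [
    {"name": "Alice", "age": 21, "status": "A", "groups": ["algorithms", "theory"]},
    {"name": "Bob", "age": 18, "status": "B", "groups": ["databases", "cooking"]},
    {"name": "Charly", "age": 22, "status": "A", "groups": ["databases", "cars"]},
    {"name": "Dorothee", "age": 16, "status": "A", "groups": ["cars", "sports"]},
]

# Filter {"age": {"$lt": 19}}: keep documents whose age is below 19.
# Projection {"name": 1, "_id": 0}: keep only the name field.
young_names = [{"name": d["name"]} for d in docs if d["age"] < 19]
print(young_names)  # [{'name': 'Bob'}, {'name': 'Dorothee'}]
```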

For more examples see, e.g.:
http://api.mongodb.com/python/current/tutorial.html

## More examples

In [None]:
client.drop_database('example_db') # start from a clean database

# two equivalent ways to get a database handle: item access and attribute access
db = client['example_db']
db = client.example_db

In [None]:
# collections are the MongoDB analogue of tables
# drop_collection does not raise if the collection does not exist
db.drop_collection('mycollection')
    
collection = db['mycollection']

In [None]:
collection.count_documents({})

In [None]:
result = collection.insert_one({"name": "Peter", "age" : 99}) # returns an InsertOneResult

In [None]:
print(result.inserted_id)

In [None]:
collection.find_one()

In [None]:
collection.insert_one({"name": "Ruth", "age" : 93})

In [None]:
collection.find_one()

In [None]:
collection.find_one({"name" : "Ruth"})

In [None]:
collection.find_one({"name" : "bob"}) # no document matches, so this returns None

In [None]:
for r in collection.find({"name" : "Ruth"}):
    print(r)

In [None]:
collection.insert_one({"fruit": "banana", "vegetable": "potato"})

In [None]:
for r in collection.find():
    print(r)

In [None]:
collection.insert_one({"russian" : {"nesting" : {"dolls" : "rock"}}})

In [None]:
# a non-dict argument is treated as a lookup on _id, so this prints None
print(collection.find_one("russian"))

In [None]:
rec = collection.find_one({"russian":{"$exists":True}})
print(rec)

In [None]:
rec['russian']

In [None]:
rec['russian']['nesting']['dolls']
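
Nested fields can also be addressed in a filter with MongoDB's dot notation, e.g. `{"russian.nesting.dolls": "rock"}`. The sketch below shows, in plain Python, how such a dotted path maps onto a nested dictionary. `get_path` is a hypothetical helper written for illustration and is not part of PyMongo.

```python
def get_path(doc, dotted_path):
    """Follow a dotted path such as "a.b.c" through nested dicts; return None if missing."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

rec_example = {"russian": {"nesting": {"dolls": "rock"}}}
print(get_path(rec_example, "russian.nesting.dolls"))  # rock

# The corresponding server-side filter; in the notebook it would be passed
# as collection.find_one(nested_filter).
nested_filter = {"russian.nesting.dolls": "rock"}
```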

# IMDB exercise
We now use MongoDB to implement part of the IMDB movie database. As a test run, we store a subset of the schema (Actors, Movies, Directors, and the connecting tables) and include a few tuples and queries.

* Here, we choose to store the movie data with a single collection (called movies). Each document is a movie with fields that include the attributes of a movie, as well as a list of actors (including the actor attributes, and a list of roles they play in that movie), and a list of directors (including their respective attributes).

* When you look up the attributes in the IMDB movie database, ignore the attributes movie id, director id, and actor id. We needed those for our PK-FK relationships; here we don't need them anymore.
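
The nesting described above might look like the following sketch. All field names and values here are placeholders chosen for illustration (assumptions, not taken from the IMDB schema or data); use the attribute names and values from your own database. The helper `make_movie_doc` is likewise hypothetical.

```python
def make_movie_doc(movie_attrs, actors, directors):
    """Combine movie attributes with embedded actor and director lists into one document."""
    doc = dict(movie_attrs)             # top-level movie attributes
    doc["actors"] = list(actors)        # each actor: attributes plus a list of roles
    doc["directors"] = list(directors)  # each director: attributes only
    return doc

example_movie = make_movie_doc(
    {"name": "<movie name>", "year": None},  # placeholder attributes
    actors=[{"fname": "<first name>", "lname": "<last name>", "roles": ["<role>"]}],
    directors=[{"fname": "<first name>", "lname": "<last name>"}],
)
print(example_movie)
```

A dictionary of this shape can be passed to `insert_one` like the user documents in the class examples.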

For the following problems, you need to issue appropriate SQL queries over your local IMDB movie database to find the missing attribute values (e.g., the name of the actor with id `538826`).

**a)** Create a new movie collection and make sure that it is empty to begin with by deleting all documents inside.

**b)** Insert an entry for the movie with movie id 476084. You should include all its attributes (like its name), together with all its directors, and three of its actors, namely those with actor ids 538826, 1794091, 1810514 as found in the IMDB movie database. For each of the actors, don't forget to include their attributes (like fname) and all roles they play in that movie.

**c)** Create an entry for the movie with movie id 433969, together with all of its 3 directors as listed in our IMDB movie database. Ignore all actors in the movie, but include all movie attributes.

**d)** Write a query to find the movie name and director names of all movies made before 2010.