# PyMongo!

## Connect to the Cluster!

In this activity, you will use Python to interact with a MongoDB database stored on a remote (cloud) server! This is a realistic picture of how you might work with a NoSQL database like MongoDB: the data will usually be large, spread across more than one machine, and certainly not stored on your local machine.
We have loaded a collection of documents about restaurants in New York City into a MongoDB cluster. First, you will import the necessary modules, PyMongo and pandas. Next, you will establish a connection to the cluster (either Morgan's or Dr. Rigas') using a connection string. Lastly, you will specify the database and the collection within the cluster that you want to work with.

In [None]:
from pymongo import MongoClient  # the package name is all lowercase
import pandas as pd
client = MongoClient('mongodb+srv://MongoMarc:jyxkah-moqzAd-myqvi0@marc1-k2auc.mongodb.net/test?retryWrites=true')  #Dr.Rigas connection string
db = client.test
restaurants = db.restaurants
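
The connection string above contains a username and password. As a minimal sketch of an alternative, the cell below reads the string from an environment variable instead of hard-coding it. The variable name `MONGODB_URI`, the helper names `get_connection_uri` and `connect`, and the example URI are all assumptions made for illustration; they are not part of this activity's setup.

```python
import os


def get_connection_uri(env=os.environ, var_name="MONGODB_URI"):
    """Return the connection string stored in an environment variable.

    MONGODB_URI is a hypothetical variable name chosen for this sketch.
    """
    uri = env.get(var_name)
    if not uri:
        raise RuntimeError(f"Set {var_name} to your cluster's connection string")
    return uri


def connect(uri):
    """Open a client for the given URI (requires the pymongo package)."""
    # Imported here so the helpers above can be loaded without pymongo installed.
    from pymongo import MongoClient
    return MongoClient(uri)


# Demonstration with a stand-in mapping instead of the real environment.
# The URI below is a placeholder, not a working cluster address.
demo_env = {"MONGODB_URI": "mongodb+srv://user:password@example.mongodb.net/test"}
print(get_connection_uri(demo_env))
```

With a real variable set, `client = connect(get_connection_uri())` would take the place of the hard-coded `MongoClient(...)` call.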

## Let's Query!

Querying from PyMongo works much the same as querying from your command prompt. One difference you will notice is that PyMongo doesn't support .pretty(). For a simple example, run the cell below to find one of the Bronx restaurants and see the output.

In [None]:
print(restaurants.find_one({"borough":"Bronx"}))

Now, if we want to find ALL of the Bronx restaurants, we cannot simply print the query: find() returns a cursor object rather than the documents themselves. To see the query results, we need to loop over the cursor with a for loop. See below. This example is limited to the first 5 documents to save space; if you want to see all of the restaurants, just remove the .limit(5).

In [None]:
for restaurant in restaurants.find({"borough":"Bronx"}).limit(5):
    print(restaurant)

This is not the prettiest output to look at, so we can make use of pandas dataframes! First, we create an empty list. Next, we loop through the restaurants returned by the query and append each one to the list. Finally, we make a dataframe out of the list of documents (each one is a Python dictionary) and display the result. Run the cell below to see a dataframe of 15 of the Indian cuisine restaurants in Brooklyn in our database.

In [None]:
brooklynIndian = []
for rest in restaurants.find({"$and": [{"cuisine":"Indian"}, {"borough": "Brooklyn"}]}).limit(15):
    brooklynIndian.append(rest)
testDF = pd.DataFrame(brooklynIndian)
testDF
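
The `$and` in the cell above is the explicit form. MongoDB also treats several fields listed in one filter document as an implicit AND, so the shorter filter below should select the same restaurants. The sketch also wraps the "loop and append" step in a small helper. The helper name `to_records` and the stand-in documents are made up for illustration, so that the cell runs without a cluster connection.

```python
from itertools import islice

# Explicit form, as used in the cell above.
explicit_filter = {"$and": [{"cuisine": "Indian"}, {"borough": "Brooklyn"}]}

# Implicit form: MongoDB ANDs together the fields of a single filter document.
implicit_filter = {"cuisine": "Indian", "borough": "Brooklyn"}


def to_records(cursor, limit=None):
    """Collect documents from any iterable (such as a PyMongo cursor) into a list.

    Hypothetical helper: `limit` caps the count on the client side, whereas
    cursor.limit(n) asks the server to stop early.
    """
    return list(islice(cursor, limit))


# Stand-in documents (made up) that play the role of a cursor.
fake_cursor = iter([
    {"name": "Example A", "cuisine": "Indian", "borough": "Brooklyn"},
    {"name": "Example B", "cuisine": "Indian", "borough": "Brooklyn"},
    {"name": "Example C", "cuisine": "Indian", "borough": "Brooklyn"},
])
records = to_records(fake_cursor, limit=2)
print(len(records))
```

With a live connection, `pd.DataFrame(to_records(restaurants.find(implicit_filter).limit(15)))` would be one way to build the same kind of dataframe in a single line.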

## Your turn! 

Use the example above to complete the following exercises. For documentation on Mongo querying visit https://docs.mongodb.com/manual/tutorial/query-documents/ and https://docs.mongodb.com/manual/tutorial/query-embedded-documents/

#1. In the cell below, fill in the missing details to display the fields restaurant_id, name, borough and cuisine, but exclude the _id field, for the first 10 documents in the collection.

In [None]:
for rest in restaurants.find({},{"restaurant_id" : 1,"name":1,"...,"_id":0}).limit(...):
    print(rest)

#2. Find the first 10 restaurants that have received a _score less than_ 50.

In [None]:

print(rest)

#3. Return _only_ the name, cuisine, and borough for 20 documents that are _NOT IN_ the Bronx, Queens, or Staten Island. (Hint: There is a Mongo query operator called $nin for "not in".)

In [None]:

print(rest)

#4. Due to a recent inventory mishap, a client needs the names and addresses of the Mexican restaurants north of the Central Park Zoo (latitude: 40.7678). Do not return the id. This client has already visited the first 45, so skip those and sort the next 20 restaurants in ascending order by name. __Hint__: You will need to make use of pymongo.ASCENDING. You can also use a property of the address attribute, address.coord.

In [None]:
print(rest)

#5. Did you know that MongoDB also supports regular expressions? We have a client that can't remember the name of a restaurant they rated. All this client knows is that it was an American cuisine restaurant in Manhattan whose name started with "Mad". We have a few of these in our database. We will create a dataframe of these restaurants to send to our client. Fill in the missing detail below to run this and print the result. 

In [None]:
mysteryRestaurant = []
for rest in restaurants.find({"name": {"$regex" : "^Mad"}, "cuisine": "American", "borough": "Manhattan"}):
    mysteryRestaurant.append(rest)
result = pd.DataFrame(...)
result
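
To see what the pattern in this query matches, here is a small sketch that applies the same `^Mad` pattern with Python's own `re` module. The restaurant names are made up for illustration and are not taken from the database. MongoDB uses a different regular expression engine than Python, but for a simple anchored pattern like this one the two behave the same way.

```python
import re

# The same pattern used in the query above: "^" anchors the match
# to the start of the name, and the match is case-sensitive.
pattern = re.compile(r"^Mad")

# Made-up names for illustration only.
sample_names = [
    "Madison Example Grill",
    "Nomad Example Cafe",
    "madhouse example diner",
    "Mad Example Bar",
]

matches = [name for name in sample_names if pattern.search(name)]
print(matches)
# → ['Madison Example Grill', 'Mad Example Bar']
```

If the client were unsure about capitalization, adding `"$options": "i"` next to `"$regex"` in the query makes the match case-insensitive, which corresponds to `re.IGNORECASE` in this sketch.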

Keep in mind: The flexibility of Mongo's schema can cause these dataframes to get very messy. Data cleaning will be a crucial step before analysis when using Python on top of MongoDB!
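
One common source of mess is nested fields such as address, which land in a dataframe as a single column of dictionaries. Below is a minimal sketch of one cleaning step: flattening nested dictionaries into dotted keys before building the dataframe. The helper name `flatten` and the sample document are assumptions for illustration; the sample is only loosely shaped like a restaurant record and its values are invented.

```python
def flatten(document, parent_key="", sep="."):
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in document.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into sub-documents, carrying the dotted prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            # Lists and scalar values are kept as they are.
            flat[full_key] = value
    return flat


# Made-up document; the "street" field and all values are illustrative.
sample = {
    "name": "Example Diner",
    "borough": "Bronx",
    "address": {"street": "Example Ave", "coord": [-73.0, 40.0]},
}
print(flatten(sample))
```

Passing a list of flattened documents to `pd.DataFrame` gives one column per dotted key. pandas also provides `pd.json_normalize`, which performs a similar flattening on a list of dictionaries.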