## MongoDB and Python
MongoDB is a NoSQL ("Not Only SQL") database.

NoSQL databases don't use tables and rows the way SQL databases do; they offer a more object-oriented view of the data. In MongoDB, a database holds collections, and each collection holds documents. Each document is a JSON-like object (stored internally as BSON, a binary encoding of JSON), so MongoDB can easily store nested, complex objects that would be hard to represent in a relational database.
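To make the contrast concrete, here is a minimal sketch of one nested document, written as a Python dict (pymongo represents documents as dicts), next to the flat rows a relational schema would typically need for the same data. The business fields are made up for illustration, and `to_relational_rows` is a hypothetical helper, not part of pymongo.

```python
# One nested document: a list and a sub-object live inside a single record.
business_doc = {
    "business_id": "b1",
    "name": "Example Deli",
    "categories": ["Delis", "Sandwiches"],
    "attributes": {"HasTV": True, "NoiseLevel": "average"},
}

def to_relational_rows(doc):
    """Split one nested document into flat rows for two hypothetical tables."""
    # The sub-object's fields become extra columns of the business table.
    business_row = (doc["business_id"], doc["name"],
                    doc["attributes"]["HasTV"], doc["attributes"]["NoiseLevel"])
    # The list needs a separate table with one row per category.
    category_rows = [(doc["business_id"], c) for c in doc["categories"]]
    return business_row, category_rows

business_row, category_rows = to_relational_rows(business_doc)
print(business_row)   # ('b1', 'Example Deli', True, 'average')
print(category_rows)  # [('b1', 'Delis'), ('b1', 'Sandwiches')]
```

In MongoDB the whole `business_doc` is stored and retrieved as one unit, with no join needed to put the categories back together.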
MongoDB can also run on a cluster of servers that together serve one database: the documents are distributed (sharded) across the servers, and retrieval and searching are done in parallel.
Since the basic data object in MongoDB is a JSON-like document, the output of many APIs can be stored in MongoDB directly. For instance, the Yelp dataset is distributed as several JSON files, one per entity type (business, tip, review, checkin, ...). Each JSON file can be loaded into its own collection with a single `mongoimport` command.
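`mongoimport` is the usual tool for this (for example `mongoimport --db yelp --collection tip --file tip.json`, which by default expects one JSON document per line). For comparison, here is a rough pure-Python sketch of the same idea using pymongo's `insert_many`, assuming the file is in JSON-lines format (one object per line) as the Yelp files are. `load_json_lines` and `iter_batches` are hypothetical helpers, and the file name in the usage comment is a placeholder.

```python
import json

def iter_batches(lines, batch_size=1000):
    """Yield lists of parsed documents, batch_size at a time, skipping blank lines."""
    batch = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        batch.append(json.loads(line))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # leftover documents at the end of the file
        yield batch

def load_json_lines(path, collection, batch_size=1000):
    """Insert every JSON object in a JSON-lines file into `collection`.

    `collection` is anything with an insert_many(list_of_dicts) method,
    such as a pymongo Collection. Returns the number of documents inserted.
    """
    total = 0
    with open(path, encoding="utf-8") as f:
        for batch in iter_batches(f, batch_size):
            collection.insert_many(batch)  # one round trip per batch
            total += len(batch)
    return total

# Usage (placeholder file name; assumes the `client` created later in this notebook):
# n = load_json_lines("tip.json", client.yelp.tip)
```

Batching keeps memory bounded for large files while still avoiding one network round trip per document.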

## MongoDB and the Yelp database
### Let's see how to connect to a MongoDB database using Python

In [1]:
!pip install pymongo



In [2]:
import pymongo

## First, establish an SSH tunnel to bigdata
## The tunnel forwards port 27017 on localhost to port 27017 on bigdata.stern.nyu.edu
### This is needed until NYU opens that port in the firewall.
Open a terminal window and type the following command (using your own team account in place of DealingF18GB10):

ssh -L 27017:bigdata.stern.nyu.edu:27017 DealingF18GB10@bigdata.stern.nyu.edu
Then continue with this notebook.
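Before running the next cells, you can check from Python that something is listening on the local end of the tunnel. This is a minimal sketch using only the standard library `socket` module; `port_is_open` is a hypothetical helper. A successful connection only shows that the tunnel is up, not that MongoDB is answering behind it.

```python
import socket

def port_is_open(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, host unreachable, ...
        return False

# False here usually means the ssh -L tunnel is not running.
print(port_is_open())
```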

In [3]:
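from pymongo import MongoClient
# Direct connection (only works once NYU opens port 27017 in the firewall):
#client = MongoClient('bigdata.stern.nyu.edu',27017)
# Connect through the local end of the SSH tunnel instead:
client = MongoClient('mongodb://localhost:27017')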

In [4]:
db=client['yelp']


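#### So far so good: the client is configured to reach the MongoDB server on bigdata.stern.nyu.edu through the tunnel. pymongo does not report an unreachable server until the first operation, so the first query is the real test.
#### Let's do a little test: create a database called 'myFirstMB' with a collection called 'countries'. MongoDB creates both automatically when the first document is inserted.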

In [7]:
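def get_db():
    from pymongo import MongoClient
    client = MongoClient('localhost:27017')  # goes through the SSH tunnel
    db = client.myFirstMB                    # created on the first insert
    return db

def add_country(db):
    # insert() was removed in pymongo 4; insert_one() is the supported call.
    # Each call adds a new document, even if "Canada" is already there.
    db.countries.insert_one({"name" : "Canada"})
    
def get_country(db):
    # find_one() returns the first matching document (or None)
    return db.countries.find_one()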

if __name__ == "__main__":

    db = get_db() 
    add_country(db)
    print( get_country(db))

{'_id': ObjectId('5c01b2e9400de000a33a164c'), 'name': 'Canada'}


  


In [8]:
for x in db.countries.find():
    print(x)

{'_id': ObjectId('5c01b2e9400de000a33a164c'), 'name': 'Canada'}
{'_id': ObjectId('5c01b399400de000a33a164d'), 'name': 'USA'}
{'_id': ObjectId('5c01b3b1400de000a33a164e'), 'name': 'USA'}
{'_id': ObjectId('5c01b3d2400de000a33a164f'), 'name': 'USA'}
{'_id': ObjectId('5c01b3d9400de000a33a1650'), 'name': 'USA'}
{'_id': ObjectId('5c01b3f8400de000a33a1651'), 'name': 'USA'}
{'_id': ObjectId('5c01b414400de000a33a1652'), 'name': 'USA'}
{'_id': ObjectId('5c01b427400de000a33a1653'), 'name': 'USA'}
{'_id': ObjectId('5c01b4a8400de000a33a1654'), 'name': 'USA'}
{'_id': ObjectId('5c02df0f400de000e52d57f1'), 'name': 'Canada'}
{'_id': ObjectId('5c02e047400de000e52d57f3'), 'name': 'Canada'}
{'_id': ObjectId('5c02e163400de000e52d57f5'), 'name': 'Canada'}
{'_id': ObjectId('5c054e4e400de0018d09afeb'), 'name': 'Canada'}
{'_id': ObjectId('5c054e65400de0018d09afed'), 'name': 'Canada'}
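The repeated documents above appear because every run of the insert cell adds another copy: MongoDB gives each new document a fresh `_id` and does not check for duplicates. One way to make the insert idempotent is an upsert with `update_one(..., upsert=True)`, sketched below; `add_country_once` is a hypothetical name, not part of pymongo.

```python
def add_country_once(db, name):
    """Insert {"name": name} into db.countries only if no such document exists.

    `db` is a pymongo Database (for example, get_db() above). Returns True
    if a new document was inserted, False if a matching one was already there.
    """
    result = db.countries.update_one(
        {"name": name},            # match on the country name
        {"$set": {"name": name}},  # same value: a no-op when the document exists
        upsert=True,               # insert a new document when nothing matches
    )
    return result.upserted_id is not None
```

Usage: `add_country_once(get_db(), "Canada")`. Another option is `db.countries.create_index("name", unique=True)`, which makes the server reject duplicate names (existing duplicates must be removed before the index can be built).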


### Now let's try the Yelp database
### But it is very large
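Because the collections are large, it helps to look at a few documents at a time and fetch only the fields you need. This is a sketch using pymongo's `find(filter, projection)` and `Cursor.limit()`; `preview` is a hypothetical helper, and the field names in the usage comments are taken from the tip documents printed further down.

```python
def preview(collection, query, fields, n=5):
    """Return at most n documents matching `query`, with only `fields` (and no _id).

    `collection` is a pymongo Collection; `fields` is a list of field names.
    """
    projection = {field: 1 for field in fields}
    projection["_id"] = 0  # _id is returned by default unless excluded
    return list(collection.find(query, projection).limit(n))

# Usage, assuming db = client.yelp:
# preview(db.tip, {'business_id': 'FsDogwXYckKUgs5VUoPbJw'}, ['date', 'text'], n=3)
# db.tip.count_documents({'business_id': 'FsDogwXYckKUgs5VUoPbJw'})  # exact count for a filter
# db.tip.estimated_document_count()  # fast estimate of the whole collection's size
```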

In [9]:
# switch to the yelp database
db=client.yelp

In [10]:
# find() returns a lazy Cursor: no documents are fetched until you iterate over it
x=db.tip.find({'business_id':'FsDogwXYckKUgs5VUoPbJw'})
print(x)

<pymongo.cursor.Cursor object at 0x7f533a36cf60>


In [11]:
for i in x:
    print(i)

{'_id': ObjectId('5c0187e636bcfcbe7d0a079e'), 'text': 'Great deli', 'date': '2011-11-11', 'likes': 0, 'business_id': 'FsDogwXYckKUgs5VUoPbJw', 'user_id': 'blrWvPePSv87aU9hV1Zd8Q'}
{'_id': ObjectId('5c0187f236bcfcbe7d134f7b'), 'text': "How long does it typically take to get food here?  Been sitting for about 40 minutes... Nothing.  I'm starving.", 'date': '2012-07-30', 'likes': 0, 'business_id': 'FsDogwXYckKUgs5VUoPbJw', 'user_id': 'JGGdzfzTOqv2l--aAT4Qcw'}
{'_id': ObjectId('5c0187f236bcfcbe7d135a76'), 'text': 'Broasted Chicken takes 30 minutes to prepare fresh.  Worth the wait.', 'date': '2015-02-23', 'likes': 0, 'business_id': 'FsDogwXYckKUgs5VUoPbJw', 'user_id': 'heqccWi1Fn-Apli9o8eVQw'}
{'_id': ObjectId('5c0187f236bcfcbe7d135a7e'), 'text': 'Place is permanently closed.', 'date': '2016-10-31', 'likes': 0, 'business_id': 'FsDogwXYckKUgs5VUoPbJw', 'user_id': 'dpglkz_wX19VSsYnTX2_xQ'}


In [12]:

for business in db.businesses.find({'business_id':'FsDogwXYckKUgs5VUoPbJw'}):
    print(business)

{'_id': ObjectId('5c017cf236bcfcbe7da6cd33'), 'business_id': 'FsDogwXYckKUgs5VUoPbJw', 'name': "Frisch's", 'neighborhood': '', 'address': '5301 Grove Rd', 'city': 'Pittsburgh', 'state': 'PA', 'postal_code': '15236', 'latitude': 40.3592663, 'longitude': -80.0022905, 'stars': 3.5, 'review_count': 38, 'is_open': 1, 'attributes': {'Alcohol': 'beer_and_wine', 'Ambience': "{'romantic': False, 'intimate': False, 'classy': False, 'hipster': False, 'divey': False, 'touristy': False, 'trendy': False, 'upscale': False, 'casual': True}", 'BikeParking': 'True', 'BusinessAcceptsCreditCards': 'True', 'BusinessParking': "{'garage': False, 'street': False, 'validated': False, 'lot': True, 'valet': False}", 'Caters': 'True', 'GoodForKids': 'True', 'GoodForMeal': "{'dessert': False, 'latenight': False, 'lunch': True, 'dinner': False, 'breakfast': False, 'brunch': False}", 'HasTV': 'True', 'NoiseLevel': 'average', 'OutdoorSeating': 'True', 'RestaurantsAttire': 'casual', 'RestaurantsDelivery': 'True', 'Res
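Notice that several values under `attributes` are not nested documents but strings holding Python-style literals, such as `'True'` and `"{'garage': False, ...}"`. A minimal sketch that converts them with the standard library's `ast.literal_eval`, leaving plain words like `'beer_and_wine'` unchanged; `parse_attributes` is a hypothetical helper, and the sample values are copied from the document above.

```python
import ast

def parse_attributes(attributes):
    """Return a copy of a Yelp `attributes` dict with literal strings converted.

    'True' -> True, "{'lot': True}" -> {'lot': True}; strings that are not
    Python literals (e.g. 'casual') are kept as they are.
    """
    parsed = {}
    for key, value in attributes.items():
        if isinstance(value, str):
            try:
                value = ast.literal_eval(value)
            except (ValueError, SyntaxError, TypeError):
                pass  # not a literal: keep the original string
        parsed[key] = value
    return parsed

sample = {
    'Alcohol': 'beer_and_wine',
    'HasTV': 'True',
    'BusinessParking': "{'garage': False, 'street': False, 'lot': True}",
}
print(parse_attributes(sample))
# {'Alcohol': 'beer_and_wine', 'HasTV': True, 'BusinessParking': {'garage': False, 'street': False, 'lot': True}}
```

For the business above this would be called as `parse_attributes(business['attributes'])`.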

In [13]:
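# The collection is 'businesses' (as in the previous cell); 'business' would silently match nothing.
# limit(5) keeps the output short, since Pennsylvania has many businesses.
PA=db.businesses.find({'state':'PA'}).limit(5)
for business in PA:
    print(business)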