# On MongoDB and NoSQL Databases

In [None]:
import json
import requests
import pandas as pd
# pip install pymongo OR conda install -c anaconda pymongo
import pymongo

![nosql](https://pragatisoftware.com/wp-content/uploads/2017/04/An-Introduction-to-NoSQL-1.jpg)

![mongodb](https://nakedsecurity.sophos.com/wp-content/uploads/sites/2/2017/01/mongodb.png?resize=780,408)

SQL is the paradigmatic tool for working with relational databases, where the data takes the form of tables that relate to each other through foreign keys. But we have already seen data that doesn't take this form. Sometimes we have long-form pieces of text. Sometimes we have JSON objects with many nested levels, or JSON objects whose values are often long or often missing. Or maybe we have pictures or sound files that are part of a record. In such cases a NoSQL database is often a good choice.

There are several options for NoSQL databases:

- [Cassandra](http://cassandra.apache.org/)
- [Couchbase](https://www.couchbase.com/)
- [Riak](https://riak.com/index.html)

But MongoDB is probably the most widely used.

## Agenda

SWBAT:

- Use MongoDB to read and manipulate JSON data;
- Use `pymongo` to do the same in a Python interpreter;
- Use MongoDB Atlas to interact with JSON data in the cloud.

## MongoDB Installation

The basic installation with Homebrew goes like this:

```
brew tap mongodb/brew
brew install mongodb-community
```

Notes:

- You may need to adjust your security settings.
- There is a known problem with installing MongoDB on macOS Catalina. By default, `mongod` expects a data directory at `/data/db`, but Catalina made the root volume read-only, so the data directory should live at `/System/Volumes/Data/data/db` instead.

So Catalina (or later) users should try:

```
sudo mkdir -p /System/Volumes/Data/data/db
sudo chown -R `id -un` /System/Volumes/Data/data/db
```
***
Then you'll need to start the Mongo Daemon:
```
mongod
```
Catalina users:
```
mongod --dbpath=/System/Volumes/Data/data/db
```
Windows users should follow the installation guide [here](https://docs.mongodb.com/manual/tutorial/install-mongodb-on-windows/).

If you're having trouble installing MongoDB, consider using MongoDB Atlas, the cloud-hosted version of MongoDB. (See the end of the current notebook.)
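Once `mongod` is running (locally on the default port 27017, which is an assumption here), a quick way to confirm that Python can reach it is to send the lightweight `ping` command. This is a minimal sketch: `mongo_is_up` is a hypothetical helper, and it returns `False` rather than raising if `pymongo` is missing or no server answers within the timeout.

```python
def mongo_is_up(uri='mongodb://127.0.0.1:27017', timeout_ms=1000):
    """Return True if a MongoDB server answers a ping at `uri`, else False."""
    try:
        import pymongo
        from pymongo.errors import PyMongoError
    except ImportError:
        # pymongo isn't installed in this environment
        return False
    # serverSelectionTimeoutMS caps how long we wait for a server to respond
    client = pymongo.MongoClient(uri, serverSelectionTimeoutMS=timeout_ms)
    try:
        # 'ping' is a cheap admin command; running it forces a real connection
        client.admin.command('ping')
        return True
    except PyMongoError:
        return False
    finally:
        client.close()


print('mongod reachable:', mongo_is_up())
```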

## What Is Mongo?

MongoDB is one of the leading *non-relational* databases. Instead of rows in tables, it stores JSON-like documents (BSON, internally) grouped into collections.

With Mongo we should at least be able to Create, Read, Update, and Delete (CRUD): the four basic functions of persistent storage.
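As a minimal sketch of those four operations in `pymongo` (the Python driver used later in this notebook), the hypothetical function below runs one full cycle on whatever collection you pass in. The document fields (`name`, `price`) and the collection name in the usage comment are made up for illustration, and actually calling it needs a running server.

```python
def crud_roundtrip(collection):
    """Run one Create/Read/Update/Delete cycle on a pymongo Collection.

    Returns (original_doc, updated_doc, deleted_count).
    """
    # Create: insert_one returns an InsertOneResult holding the new _id
    new_id = collection.insert_one({'name': 'Test Slice', 'price': 3}).inserted_id
    # Read: find_one returns the first matching document (a dict) or None
    original = collection.find_one({'_id': new_id})
    # Update: $set changes only the listed field, leaving the rest intact
    collection.update_one({'_id': new_id}, {'$set': {'price': 4}})
    updated = collection.find_one({'_id': new_id})
    # Delete: delete_one returns a DeleteResult with deleted_count
    deleted = collection.delete_one({'_id': new_id}).deleted_count
    return original, updated, deleted


# Usage with a running server (hypothetical database/collection names):
# client = pymongo.MongoClient('mongodb://127.0.0.1:27017')
# crud_roundtrip(client['scratch']['crud_demo'])
```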

[This site](https://www.tutorialspoint.com/mongodb/index.htm) is an excellent resource on Mongo. (See also the documentation [here](https://docs.mongodb.com/manual/tutorial/).) Let's check it out!

- Overview: Terminology: SQL vs. NoSQL (Not only SQL)
- Advantages: NoSQL
- Data Modeling: Example
- Queries: Equivalents of SQL 'WHERE', 'AND', and 'OR'
- Aggregation: Equivalents of SQL 'GROUP BY'
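To make the `WHERE`/`AND`/`OR` correspondence concrete, here are MongoDB filter documents written as Python dicts, the form `pymongo` expects. The field names `location.city` and `location.distance` are assumptions about the Foursquare venue documents loaded later; the dicts build without a server, and you would pass one to `.find()` to run it.

```python
# SQL: SELECT * FROM sea_pizza WHERE name = 'Big Mario''s Pizza'
where_eq = {'name': "Big Mario's Pizza"}

# SQL: ... WHERE city = 'Seattle' AND distance < 2000
# Several conditions in one filter document are ANDed together.
where_and = {'location.city': 'Seattle', 'location.distance': {'$lt': 2000}}

# The same condition with an explicit $and
where_and_explicit = {'$and': [{'location.city': 'Seattle'},
                               {'location.distance': {'$lt': 2000}}]}

# SQL: ... WHERE city = 'Seattle' OR city = 'Bellevue'
where_or = {'$or': [{'location.city': 'Seattle'}, {'location.city': 'Bellevue'}]}

# For alternatives on a single field, $in is more compact
where_in = {'location.city': {'$in': ['Seattle', 'Bellevue']}}

# With a running server: pizza.find(where_and)
```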

## Mongo in the Terminal

Let's try a few simple commands.

1. Run `mongo` to launch the shell (in MongoDB 6.0 and later, the shell is `mongosh`).
2. Now run `help` to see some mongo hints.
3. Run `show dbs` to list the databases on the server we're connected to.
4. To use or create a particular database, type `use` followed by the name of the database. (A new database isn't actually created until you store data in it.)
5. Once we're accessing a particular database, we can list its collections by running `show collections`.

## Loading Data

Let's use both:

- data that we already have in JSON form; and
- data that we acquire from an API.

### JSON Data

In [None]:
with open('data/burgers.json', 'r') as f:
    burgers = json.load(f)

In [None]:
burgers

### Foursquare API

In [None]:
url = 'https://api.foursquare.com/v2/venues/explore'
with open('.secrets/credentials.json') as f:
    params = json.load(f)

In [None]:
# No trailing commas here: `x = 'a',` would assign a one-element tuple
params['v'] = '20201201'
params['ll'] = '47.608, -122.336'
params['query'] = 'pizza'
params['intent'] = 'browse'
params['radius'] = 10000
params['limit'] = 100

In [None]:
response = requests.get(url=url, params=params)
response.raise_for_status()  # raise an error early if the request failed
data = response.json()

In [None]:
data.keys()

In [None]:
type(data['response'])

In [None]:
data['response'].keys()

After some exploration ...

In [None]:
data['response']['groups'][0]['items']
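That exploration can be partly automated. The hypothetical helper below walks a parsed JSON object and lists dotted key paths, peeking only at the first element of each list (it assumes list elements share one shape). The `sample` dictionary is made up to mimic the nesting of the Foursquare response.

```python
def key_paths(obj, prefix='', max_depth=6):
    """Yield dotted key paths found in a nested JSON-like object.

    Lists are explored through their first element only, on the
    assumption that every element has the same shape.
    """
    if max_depth < 0:
        return
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f'{prefix}.{key}' if prefix else key
            yield path
            yield from key_paths(value, path, max_depth - 1)
    elif isinstance(obj, list) and obj:
        yield from key_paths(obj[0], f'{prefix}[0]', max_depth - 1)


# Made-up object with roughly the same nesting as the Foursquare response
sample = {
    'meta': {'code': 200},
    'response': {'groups': [{'items': [{'venue': {'name': 'Some Pizza'}}]}]},
}

for path in key_paths(sample):
    print(path)
```

On the real response, `for path in key_paths(data): print(path)` should surface paths like `response.groups[0].items[0].venue`.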

Let's grab the value for 'venue' in each establishment in this list:

In [None]:
info = []

for store in data['response']['groups'][0]['items']:
    info.append(store['venue'])

foursq_df = pd.DataFrame(info)

In [None]:
foursq_df.sample(10)
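Columns such as `location` in `foursq_df` hold whole nested dictionaries. `pd.json_normalize` flattens nested dicts into dotted column names, while lists (such as `categories`) are left as-is. The two records below are invented to mirror the venue shape.

```python
import pandas as pd

# Two invented records shaped like Foursquare venues
venues = [
    {'name': 'Slice A', 'location': {'city': 'Seattle', 'distance': 350},
     'categories': [{'name': 'Pizza Place'}]},
    {'name': 'Slice B', 'location': {'city': 'Bellevue', 'distance': 9100},
     'categories': [{'name': 'Pizza Place'}]},
]

nested = pd.DataFrame(venues)      # the 'location' column holds dicts
flat = pd.json_normalize(venues)   # nested dict keys become 'location.city', etc.
print(flat.columns.tolist())
```

On the real data, the equivalent call would be `pd.json_normalize(info)`.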

## Putting into Mongo

We could do all we need in the terminal, but we can also make use of `pymongo`, which is a Python package that interfaces with mongo databases!

In [None]:
#!conda install pymongo

client = pymongo.MongoClient('mongodb://127.0.0.1:27017')

In [None]:
client.list_database_names()

In [None]:
db = client['foursquare']

In [None]:
db.list_collection_names()

In [None]:
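# Optional: MongoDB also creates a collection automatically on its first insert.
# (create_collection raises CollectionInvalid if the collection already exists.)
db.create_collection('sea_pizza')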

In [None]:
db.list_collection_names()

In [None]:
result = db['sea_pizza'].insert_many(info)

In [None]:
# insert_many returns an InsertManyResult; the new _ids live on that object
result.inserted_ids

In [None]:
pizza = db['sea_pizza']

In [None]:
pizza.find_one({})

In [None]:
pizza.find_one({'name': "Big Mario's Pizza"})

### Updating

In [None]:
big_m = {'name': 'Big Mario\'s Pizza'}

In [None]:
pizza.update_one(big_m, {'$set': {'greg_rating': 'five stars'}})

In [None]:
pizza.find_one(big_m)
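`$set` is only one of MongoDB's update operators. A few other common ones are sketched below as plain update documents; `visits` is a hypothetical field, and the calls in the comments need a running server.

```python
# Add or overwrite a field (what the update above does)
set_rating = {'$set': {'greg_rating': 'five stars'}}

# Increment a number; if the field doesn't exist yet, it is created with the increment
add_visit = {'$inc': {'visits': 1}}

# Remove a field entirely (the value given to $unset is ignored)
drop_rating = {'$unset': {'greg_rating': ''}}

# With a running server:
# pizza.update_one(big_m, add_visit)
# pizza.update_many({}, drop_rating)                  # applies to every document
# pizza.update_one({'name': 'New Place'}, set_rating, upsert=True)  # insert if no match
# pizza.delete_one(big_m)
```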

### Filtering Fields (Projection)

The optional second argument to `.find()` is a *projection*. We can specify either the keys/values we want displayed (with `1`) or the keys/values we do NOT want displayed (with `0`).

In [None]:
for eatery in pizza.find({}, {'name': 1}):
    print(eatery)

A projection can't mix `1`s and `0`s, with one exception: `_id` is returned by default and can be set to `0` even when the other fields are set to `1`.

In [None]:
for eatery in pizza.find({}, {'_id': 0, 'name': 1, 'location': 1}):
    print(eatery)

### Sorting

In [None]:
pizza.find({}, {'_id': 0, 'name': 1, 'location': 1}).sort('name')[0]
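Sorting pairs naturally with `.limit()`. A minimal sketch, assuming the venue documents have a numeric `location.distance` field (as the Foursquare results appear to), returns the closest few venues; `nearest_venues` is a hypothetical helper name, and calling it needs a running server.

```python
def nearest_venues(collection, n=5):
    """Return the n venues with the smallest location.distance."""
    cursor = (
        collection.find(
            {'location.distance': {'$exists': True}},       # skip docs without it
            {'_id': 0, 'name': 1, 'location.distance': 1},  # projection
        )
        .sort('location.distance', 1)  # 1 = ascending, -1 = descending
        .limit(n)
    )
    return list(cursor)


# With a running server: nearest_venues(pizza, 3)
```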

### Aggregating

Try this one yourselves!
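If you'd like a starting point, here is a sketch of a pipeline that mirrors a SQL `GROUP BY`: it counts venues per city and averages their distance. It assumes the venue documents carry `location.city` and `location.distance` fields (check with `pizza.find_one()`); the pipeline itself is just a list of Python dicts, so it builds without a server.

```python
# Rough SQL equivalent:
#   SELECT city, COUNT(*) AS n_venues, AVG(distance) AS avg_distance
#   FROM sea_pizza GROUP BY city ORDER BY n_venues DESC
pipeline = [
    # $group: _id is the grouping key; a leading '$' refers to a field path
    {'$group': {
        '_id': '$location.city',
        'n_venues': {'$sum': 1},
        'avg_distance': {'$avg': '$location.distance'},
    }},
    # $sort: 1 ascending, -1 descending
    {'$sort': {'n_venues': -1}},
]

# With a running server:
# for row in pizza.aggregate(pipeline):
#     print(row)
```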

### Some Useful `pymongo` Methods

- `.find()`
- `.find_one()`
- `.delete_one()`
- `.update_one()`
- `.update_many()`
- `.insert_one()`
- `.insert_many()`

## MongoDB Atlas: MongoDB in the Cloud

MongoDB Atlas is your ticket to MongoDB in the cloud! This may be desirable if you're running into space issues with large databases or if you need to collaborate with others on a project.

Here I'll walk through simple first steps to setting up MongoDB Atlas:

1. Start here: https://www.mongodb.com/cloud/atlas
2. Click on “Start Free”
3. You’ll supply your email, first and last name, and a password at the registration site (https://www.mongodb.com/cloud/atlas/register)
4. Now click “Create a Cluster”
5. Select “Starter Clusters”
6. Configure Cluster
7. Select AWS as provider and “Oregon” as region
8. Leave Cluster Tier as is
9. Leave Additional Settings as is
10. Edit Cluster Name
11. Wait for your Cluster to be built

To connect:

12. Click on ‘Connect’
13. Add your current IP address to the IP Access List
14. Give the entry a description, e.g. “Laptop on WeWork wifi”
15. Create a MongoDB user
16. Choose a connection method
17. Click on ‘Connect Your Application’
18. Choose your driver version
19. Driver: Python
20. Version: 3.6 or later
21. Add your connection string into your application code
22. Click on Full Driver Example
23. Replace `<password>` with the password for the `<dbUser>` user.

In the terminal, be sure you’re in the learn-env conda environment, then run:

- `conda install pymongo`
- `conda install dnspython` (needed for `mongodb+srv://` connection strings)


To share your database with a team member:

1. Go to Database Access and click on “Add New User”.
2. Assign each team member a user name and a password.
3. Send your team member the user name and password you assigned them, along with your MongoDB connection string.

In [None]:
# Let's do a quick demo of adding data to a cluster on MongoDB Atlas!

# import pymongo

In [None]:
#!pip install dnspython

In [None]:
# Remember to add your current IP address to the access list!
# Go to Security -> Network Access (on the left control panel on
# cloud.mongodb.com)

with open('.secrets/atlas.json') as f:
    creds = json.load(f)

client = pymongo.MongoClient(
    "mongodb+srv://gadamico:" + creds['phrase']
    + "@gregcluster200204-7ckf3.mongodb.net/test?retryWrites=true&w=majority"
)
db = client.test
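If a password contains characters such as `@`, `:` or `/`, it has to be percent-escaped before it goes into the connection string. A minimal sketch using the standard library's `urllib.parse.quote_plus`; `build_srv_uri`, the credentials, and the host below are hypothetical.

```python
from urllib.parse import quote_plus


def build_srv_uri(user, password, host, db='test'):
    """Return a mongodb+srv URI with the user name and password percent-escaped."""
    return (
        f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}/{db}"
        "?retryWrites=true&w=majority"
    )


# Hypothetical credentials and host, for illustration only
print(build_srv_uri('someuser', 'p@ss:word', 'cluster0.example.mongodb.net'))
```

The resulting string can be passed straight to `pymongo.MongoClient`.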

In [None]:
client

In [None]:
db.list_collection_names()

In [None]:
db.people.find({})[0]

In [None]:
import datetime
personDocument = {
  "name": { "first": "Charles", "last": "Babbage" },
  "birth": datetime.datetime(1791, 12, 26),
  "death": datetime.datetime(1871, 10, 18),
  "contribs": [ "computer", "difference engine"]
}

db.people.insert_one(personDocument)

In [None]:
db.people.find({})[1]
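Because documents like `personDocument` are nested, queries reach into them with dot notation. The filters below are plain dicts, so they build without a connection; the calls in the comments assume the Atlas `db` from above.

```python
import datetime

# Dot notation reaches into an embedded document
by_last_name = {'name.last': 'Babbage'}

# Matching a scalar against an array field succeeds if any element equals it
by_contrib = {'contribs': 'difference engine'}

# Comparison operators also work on dates
born_before_1800 = {'birth': {'$lt': datetime.datetime(1800, 1, 1)}}

# With the Atlas `db` connection from above (needs network access):
# db.people.find_one(by_last_name)
# db.people.count_documents(born_before_1800)
```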