# Intro to Mongo
1. SQL vs noSQL
2. Installation
3. Create a database
4. Adding a document to a collection
5. Retrieving documents from a collection
6. Deleting a document
7. Modifying a document
8. Practical application

## SQL vs noSQL
Why? When?

## Installation
Install mongodb

### Windows
- https://docs.mongodb.com/manual/installation/


### OS X / linux
Using [Homebrew](https://brew.sh/):

- `brew tap mongodb/brew`
- `brew install mongodb-community`
- `brew services start mongodb/brew/mongodb-community`

### Pymongo
`!conda install pymongo -y`

In [1]:
!conda install pymongo -y

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /Users/suneelchakravorty/opt/anaconda3

  added / updated specs:
    - pymongo


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    pymongo-3.11.3             |   py38h23ab428_0         1.2 MB
    ------------------------------------------------------------
                                           Total:         1.2 MB

The following packages will be UPDATED:

  pymongo                             3.11.2-py38h23ab428_0 --> 3.11.3-py38h23ab428_0



Downloading and Extracting Packages
pymongo-3.11.3       | 1.2 MB    | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done


In [16]:
from pymongo import MongoClient

In [17]:
client = MongoClient("mongodb://localhost:27017/")

In [18]:
client.list_database_names()

['admin', 'config', 'local']

## Create a database

In [21]:
db = client["great_db"]

Check the database names again. `great_db` is not listed yet: MongoDB creates a database only once it contains data.

In [23]:
client.list_database_names()

['admin', 'config', 'local']

## Create a collection

In [24]:
db.list_collection_names()

[]

In [25]:
recs = db["recommendations"]

Check the collections again. The list is still empty: like databases, collections are created on the first insert.

In [26]:
db.list_collection_names()

[]

## Inserting a document

Let's create a dictionary

In [27]:
rec = {"brewery_id": 1415, "name": "O'Brien Ale House"}

`insert_one`

In [28]:
x = recs.insert_one(rec)

`inserted_id`

In [29]:
x.inserted_id

ObjectId('60830d1b4e24523d97869d9d')

`insert_many`

In [38]:
to_insert = [
    {"brewery_id": 1415, "name": "O'Brien Ale House"},
    {"brewery_id": 1416, "name": "Another Pub"},
    {"brewery_id": 1417, "name": "ABC Bar"},
]    

In [39]:
y = recs.insert_many(to_insert)

`inserted_ids`

In [40]:
y.inserted_ids

[ObjectId('60830d8d4e24523d97869d9e'),
 ObjectId('60830d8d4e24523d97869d9f'),
 ObjectId('60830d8d4e24523d97869da0')]

### What happens when...
- You specify the `_id`?
- What if it matches an existing `_id`?

## Data Retrieval

In [41]:
result = recs.find_one()

`find_one`

In [42]:
result

{'_id': ObjectId('60830d1b4e24523d97869d9d'),
 'brewery_id': 1415,
 'name': "O'Brien Ale House"}

`find`

In [43]:
results = recs.find()
for result in results:
    print(result)

{'_id': ObjectId('60830d1b4e24523d97869d9d'), 'brewery_id': 1415, 'name': "O'Brien Ale House"}
{'_id': ObjectId('60830d8d4e24523d97869d9e'), 'brewery_id': 1415, 'name': "O'Brien Ale House"}
{'_id': ObjectId('60830d8d4e24523d97869d9f'), 'brewery_id': 1416, 'name': 'Another Pub'}
{'_id': ObjectId('60830d8d4e24523d97869da0'), 'brewery_id': 1417, 'name': 'ABC Bar'}


Projection: `{}, {"name": 1}` returns only `name` (plus `_id`); `{}, {"name": 0}` returns everything except `name`.

In [44]:
results = recs.find({}, {"name": 1})
for result in results:
    print(result)

{'_id': ObjectId('60830d1b4e24523d97869d9d'), 'name': "O'Brien Ale House"}
{'_id': ObjectId('60830d8d4e24523d97869d9e'), 'name': "O'Brien Ale House"}
{'_id': ObjectId('60830d8d4e24523d97869d9f'), 'name': 'Another Pub'}
{'_id': ObjectId('60830d8d4e24523d97869da0'), 'name': 'ABC Bar'}


Combining 0s and 1s?

## Querying
Equality

In [47]:
recs.find_one({"name": "ABC Bar"})

{'_id': ObjectId('60830d8d4e24523d97869da0'),
 'brewery_id': 1417,
 'name': 'ABC Bar'}

In [50]:
res = recs.find({"name": {"$gt": "O"}})
for r in res:
    print(r)

{'_id': ObjectId('60830d1b4e24523d97869d9d'), 'brewery_id': 1415, 'name': "O'Brien Ale House"}
{'_id': ObjectId('60830d8d4e24523d97869d9e'), 'brewery_id': 1415, 'name': "O'Brien Ale House"}


Starts with

`{ "$gt": "S"}`

Regex
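A "starts with" match can be written as an anchored regular expression via the `$regex` operator. The sketch below is a minimal illustration: `starts_with_query` is a hypothetical helper, and the local preview uses Python's `re` module, which agrees with MongoDB's regex engine for a simple anchored prefix but may differ for more complex patterns.

```python
import re

def starts_with_query(field, prefix):
    """Build a MongoDB filter matching values of `field` that begin with `prefix`."""
    # re.escape keeps characters such as "." or "(" in the prefix literal
    return {field: {"$regex": "^" + re.escape(prefix)}}

# Preview the pattern locally against the sample names used in this notebook
names = ["O'Brien Ale House", "Another Pub", "ABC Bar"]
query = starts_with_query("name", "A")
pattern = re.compile(query["name"]["$regex"])
matches = [n for n in names if pattern.search(n)]

print(query)    # {'name': {'$regex': '^A'}}
print(matches)  # ['Another Pub', 'ABC Bar']
```

The resulting dictionary can be passed straight to the collection, for example `recs.find(starts_with_query("name", "A"))`.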

`count_documents`

In [52]:
recs.count_documents({"name": {"$gt": "O"}})

2

`.sort()`

- -1
- 1

In [58]:
for r in recs.find().sort("name", 1):
    print(r)

{'_id': ObjectId('60830d8d4e24523d97869da0'), 'brewery_id': 1417, 'name': 'ABC Bar'}
{'_id': ObjectId('60830d8d4e24523d97869d9f'), 'brewery_id': 1416, 'name': 'Another Pub'}
{'_id': ObjectId('60830d1b4e24523d97869d9d'), 'brewery_id': 1415, 'name': "O'Brien Ale House"}
{'_id': ObjectId('60830d8d4e24523d97869d9e'), 'brewery_id': 1415, 'name': "O'Brien Ale House"}


## Delete 

In [70]:
x = recs.delete_one({"brewery_id": 1415})

`delete_one`

In [71]:
x.raw_result

{'n': 1, 'ok': 1.0}

In [72]:
x.deleted_count

1

`delete_many`
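`delete_many` takes the same kind of filter as `delete_one` but removes every matching document. A minimal sketch, assuming a collection shaped like `recs`; `delete_breweries_from` is a hypothetical helper name, not part of PyMongo.

```python
def delete_breweries_from(collection, min_id):
    """Delete every document whose brewery_id is >= min_id.

    Returns the number of documents removed.
    """
    result = collection.delete_many({"brewery_id": {"$gte": min_id}})
    # DeleteResult exposes deleted_count, just like the delete_one result above
    return result.deleted_count
```

For example, `delete_breweries_from(recs, 1416)` would remove "Another Pub" and "ABC Bar", so skip running it if you want to follow the update examples with the same data.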

## Update

In [85]:
recs.update_one({"name": "ABC Bar"}, {'$set': {'name': 'ABCD Bar'}})

<pymongo.results.UpdateResult at 0x7ff2cd1db0c0>

`.update_one(query, new_values_dict)`

`.update_many`
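`update_many` applies the same update document to every match instead of only the first. A minimal sketch, assuming a collection shaped like `recs`; `rename_brewery` is a hypothetical helper name.

```python
def rename_brewery(collection, old_name, new_name):
    """Rename every document whose name equals old_name.

    Returns (matched, modified) counts from the UpdateResult.
    """
    result = collection.update_many(
        {"name": old_name},            # filter: which documents to touch
        {"$set": {"name": new_name}},  # update: only the name field changes
    )
    return result.matched_count, result.modified_count
```

A call such as `rename_brewery(recs, "O'Brien Ale House", "O'Brien Pub")` would update all documents carrying that name in one request.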

`.drop()`

In [77]:
db.list_collection_names()

['recommendations']

In [79]:
client.list_database_names()

['admin', 'config', 'great_db', 'local']

In [81]:
recs.drop()

In [82]:
db.list_collection_names()

[]

# Mini-Lab
MongoDB is a good fit for data that does not easily conform to a tabular, SQL-style structure, for example documents or highly variable JSON.

It also works well when there is a lot of nested data that we don't need to query on but still want to store and access in some form.

Let's go through such a case.

### Part 1: MongoDB as Cache datastore
Create a function `smart_request` that first checks whether we have already made a request by searching Mongo for the target URL. If a cached result exists, return it. If not, make the request, save the result in Mongo, and return it.

You'll need to create a collection for this.

In [94]:
import requests 

caches = db['caches']

def smart_request(url):
    res = caches.find_one({"url": url})
    if not res:
        response = requests.get(url)
        data = response.json()
        result = {"url": url, "data": data}
        caches.insert_one(result)
        return data
    print("Fetching from cache...")
    return res["data"]

In [90]:
test_url = "https://hacker-news.firebaseio.com/v0/item/8863.json?print=pretty"

In [120]:
for rec in caches.find():
    print(rec)
    print()

{'_id': ObjectId('6083155b4e24523d97869da1'), 'url': 'https://hacker-news.firebaseio.com/v0/item/8863.json?print=pretty', 'data': {'by': 'dhouston', 'descendants': 71, 'id': 8863, 'kids': [9224, 8917, 8952, 8958, 8884, 8887, 8869, 8940, 8908, 9005, 8873, 9671, 9067, 9055, 8865, 8881, 8872, 8955, 10403, 8903, 8928, 9125, 8998, 8901, 8902, 8907, 8894, 8870, 8878, 8980, 8934, 8943, 8876], 'score': 104, 'time': 1175714200, 'title': 'My YC app: Dropbox - Throw away your USB drive', 'type': 'story', 'url': 'http://www.getdropbox.com/u/2/screencast.html'}}

{'_id': ObjectId('60832ccd4e24523d97869da2'), 'date_modified': datetime.datetime(2021, 4, 23, 16, 23, 41, 992000)}

{'_id': ObjectId('60832e094e24523d97869da5'), 'url': 'https://hacker-news.firebaseio.com/v0/item/8864.json?print=pretty', 'data': {'by': 'yaacovtp', 'id': 8864, 'parent': 8700, 'text': "New restaurants need anonymous and confidential reviews more than web 2.0 startups do. Aren't 1000's of blogs already writing about the lates

In [95]:
smart_request(test_url)

Fetching from cache...


{'by': 'dhouston',
 'descendants': 71,
 'id': 8863,
 'kids': [9224,
  8917,
  8952,
  8958,
  8884,
  8887,
  8869,
  8940,
  8908,
  9005,
  8873,
  9671,
  9067,
  9055,
  8865,
  8881,
  8872,
  8955,
  10403,
  8903,
  8928,
  9125,
  8998,
  8901,
  8902,
  8907,
  8894,
  8870,
  8878,
  8980,
  8934,
  8943,
  8876],
 'score': 104,
 'time': 1175714200,
 'title': 'My YC app: Dropbox - Throw away your USB drive',
 'type': 'story',
 'url': 'http://www.getdropbox.com/u/2/screencast.html'}

### Part 2: Invalidate cache if it has been more than a day
Modify the above function to check WHEN the record was last modified (HINT: you will need to track this date now) and if a day has transpired, then make the request and update it.

You can manually alter the last modified date to a much older date in order to check that your function works.
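One way to backdate a cached record for testing is a `$set` update on its timestamp. This is a minimal sketch: `backdate` is a hypothetical helper, and it assumes the timestamp is stored in a field named `date_modified` on documents keyed by `url`.

```python
import datetime

def backdate(collection, url, days=2, now=None):
    """Move a cached record's date_modified `days` into the past.

    Returns the new timestamp, or None if no record matched the URL.
    """
    now = now or datetime.datetime.now()
    old = now - datetime.timedelta(days=days)
    result = collection.update_one(
        {"url": url},
        {"$set": {"date_modified": old}},
    )
    return old if result.matched_count else None
```

After calling `backdate(caches, test_url)`, the next `smart_request(test_url)` should skip the cache and fetch a fresh copy.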

In [96]:
import datetime

In [116]:
current_date = datetime.datetime.now()

def is_within(current_date, target_date, hours=24):
    diff_hours = (current_date - target_date).total_seconds() / (60 * 60)
    return diff_hours <= hours


import requests 

caches = db['caches']

def smart_request(url):
    res = caches.find_one({"url": url})
    now = datetime.datetime.now()

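    # Expired means the cached record is older than the allowed window (one day).
    # Records without a date_modified field are treated as still valid.
    is_expired = bool(res and res.get('date_modified')) and not is_within(now, res['date_modified'], hours=24)
    if is_expired:
        caches.delete_one({"url": url})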

    # If not in the cache or cache is out of date
    if not res or is_expired:
        response = requests.get(url)
        data = response.json()
        result = {"url": url, "data": data, "date_modified": now}
        caches.insert_one(result)
        return data

    print("Fetching from cache...")
    return res["data"]

In [117]:
test_url = "https://hacker-news.firebaseio.com/v0/item/8864.json?print=pretty"
res = smart_request(test_url)

In [118]:
res

{'by': 'yaacovtp',
 'id': 8864,
 'parent': 8700,
 'text': "New restaurants need anonymous and confidential reviews more than web 2.0 startups do. Aren't 1000's of blogs already writing about the latest web startup?",
 'time': 1175714547,
 'type': 'comment'}

In [119]:
caches.find_one({"url": test_url})

{'_id': ObjectId('60832e094e24523d97869da5'),
 'url': 'https://hacker-news.firebaseio.com/v0/item/8864.json?print=pretty',
 'data': {'by': 'yaacovtp',
  'id': 8864,
  'parent': 8700,
  'text': "New restaurants need anonymous and confidential reviews more than web 2.0 startups do. Aren't 1000's of blogs already writing about the latest web startup?",
  'time': 1175714547,
  'type': 'comment'},
 'date_modified': datetime.datetime(2021, 4, 23, 16, 28, 56, 352000)}


### Part 3: Populate our DB with HN API
Write code to hit the HN API and populate an Authors collection, based on the top stories.
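One possible shape for this step, as a sketch rather than a reference solution: walk the top stories, look up each story's author, and collect one document per author. The `topstories`, `item`, and `user` endpoints are part of the public HN API; `collect_authors`, the `fetch_json` parameter, and the `name`/`karma` document layout are assumptions made for illustration.

```python
HN_BASE = "https://hacker-news.firebaseio.com/v0"

def collect_authors(fetch_json, limit=10):
    """Return one {"name", "karma"} document per distinct top-story author.

    fetch_json is any callable that takes a URL and returns parsed JSON,
    which keeps this function testable without network access.
    """
    story_ids = fetch_json(f"{HN_BASE}/topstories.json")[:limit]
    authors = {}
    for story_id in story_ids:
        story = fetch_json(f"{HN_BASE}/item/{story_id}.json")
        name = (story or {}).get("by")  # missing or deleted items have no author
        if not name or name in authors:
            continue
        user = fetch_json(f"{HN_BASE}/user/{name}.json")
        if user:
            authors[name] = {"name": name, "karma": user.get("karma", 0)}
    return list(authors.values())
```

Passing the notebook's `smart_request` as `fetch_json` would reuse the Mongo cache from Parts 1 and 2. The returned list can then go to `db["authors"].insert_many(docs)`, guarded by `if docs:` because `insert_many` rejects an empty list.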

### Part 4: API Endpoint
Write a Bottle API endpoint that returns the 5 authors in our dataset with the most karma points.
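The query behind such an endpoint could be sketched as follows. It assumes an authors collection whose documents carry a numeric `karma` field; `top_authors` is a hypothetical helper, and the Bottle route itself is left out so the query logic stands alone.

```python
def top_authors(collection, n=5):
    """Return the n author documents with the highest karma, best first."""
    cursor = (
        collection.find({}, {"_id": 0})  # drop _id: ObjectId is not JSON-serializable
        .sort("karma", -1)               # -1 sorts descending
        .limit(n)
    )
    return list(cursor)
```

A Bottle handler could then return something like `{"authors": top_authors(db["authors"])}`, since Bottle converts a returned dictionary to a JSON response.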