# Document-Based Stores (MongoDB)

# Introduction:

### MongoDB:
   <a href="https://www.mongodb.com/">MongoDB</a> is a general purpose, document-based, distributed database built for modern application developers and for the cloud era. MongoDB stores data in JSON-like documents, which makes the database very flexible and scalable. <br/>
   
<img src="https://infinapps.com/wp-content/uploads/2018/10/mongodb-logo.png" width ="125" height="75">


### MongoDB Hierarchy:
<img src="https://cdn.educba.com/academy/wp-content/uploads/2019/04/MongoDB-chart2.jpg" width ="400" >

### PyMongo Python Driver

- Python needs a MongoDB driver to access the MongoDB database.
- <b>'Pymongo'</b> documentation: https://api.mongodb.com/python/current/tutorial.html 
- Install the 'pymongo' Python driver:
```
pip install pymongo
```

- The first thing that we need to do in order to establish a connection is import the MongoClient class.

In [1]:
from pymongo import MongoClient
from random import randint
from pprint import pprint

import warnings
warnings.filterwarnings('ignore')

# First Steps with MongoDB, CRUD Operations

## CREATE

#### Creating a Database
- Unlike SQL databases, databases and collections in MongoDB only have to be named to be created. 

- To create a database in MongoDB, start by creating a MongoClient object with a connection URL that points to the correct IP address (or host name), then select the database you want to create by name.
- MongoDB will create the database if it does not exist, and make a connection to it.
 

In [2]:
mongo = "localhost"

# NOTE: if you are running this notebook in docker you need to 
# refer to the container name "mongodb://mongo:27017/"
# mongo = "mongo"

# we use the MongoClient to communicate with the running database instance.
myclient = MongoClient(
                    "mongodb://"+mongo+":27017/",  
                    username='admin',
                    password='admin') #Mongo URI format
mydb = myclient["customer_db"]
# Or you can use the attribute style 
# mydb = myclient.customer_db

- Note: you can also specify the host and/or port using:
```python 
client = MongoClient('mongo', 27017)
```
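
Credentials can also be carried inside the connection string itself. The sketch below only builds such a URI with the standard library; the helper name `build_mongo_uri` is hypothetical, and percent-escaping the username and password with `quote_plus` is the usual precaution when they contain characters such as `@` or `:`.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host, port=27017, username=None, password=None):
    """Return a mongodb:// URI; credentials are percent-escaped if given."""
    credentials = ""
    if username is not None and password is not None:
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    return f"mongodb://{credentials}{host}:{port}/"


uri = build_mongo_uri("localhost", username="admin", password="admin")
print(uri)  # mongodb://admin:admin@localhost:27017/
```

The resulting string can be passed to `MongoClient(uri)` in place of the separate `username` and `password` arguments.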

<b style="color:red"> Important Note</b>: In MongoDB, a database is not created until it gets content!

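###### You can check if a database exists by listing all databases in your system:
- Note that on a fresh MongoDB instance 'customer_db' is not created yet! It appears in the output below only because documents were already inserted during an earlier run of this notebook.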

In [3]:
myclient.list_database_names()

['admin', 'config', 'customer_db', 'local']

#### Creating a Collection
- To create a collection in MongoDB, use database object and specify the name of the collection you want to create.
- MongoDB will create the collection if it does not exist.

In [4]:
customer_coll = mydb["customers"]

#### Check if the collection is created !

In [5]:
mydb.list_collection_names()

['customers']

#### Check if the DB itself is created !

In [6]:
myclient.list_database_names()

['admin', 'config', 'customer_db', 'local']

- This means that MongoDB follows a <em>lazy</em> creation approach.
    - That is, the database and the corresponding collection are only actually created when the first document is inserted.

### Insert into the collection
- To insert a single record, or more accurately a document as it is called in MongoDB, into a collection, we use the **insert_one()** method.

In [7]:
first_customer_doc = {"name": "Jane", "age": 25, "gender": "female"}
first_customer = mydb.customers.insert_one(first_customer_doc)

- Each document is allocated a unique identifier which can be accessed via the **inserted_id** attribute.

In [8]:
first_customer.inserted_id

ObjectId('68e761b74c46bffe50ebeb4c')

**Notes about Document_IDs:** 
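- Although these identifiers look pretty random, there is actually a well-defined structure.
    - The first 8 characters (4 bytes) are a timestamp
    - followed by 10 characters (5 bytes) that older MongoDB versions split into a 6 character machine identifier and a 4 character process identifier; current versions fill them with a random value generated once per process
    - and finally a 6 character (3 byte) counter.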
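
The timestamp prefix can be checked by hand. This sketch uses only the standard library and the ObjectId string printed above; the helper name `objectid_timestamp` is hypothetical (PyMongo's `ObjectId` exposes the same value as `generation_time`).

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex):
    """Decode the creation time stored in the first 4 bytes of an ObjectId."""
    seconds = int(oid_hex[:8], 16)  # first 8 hex characters = 4 bytes
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_timestamp("68e761b74c46bffe50ebeb4c")
print(created.date())  # 2025-10-09
```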
    
- <font color='red'>Note to consider</font>:
    - Instead of relying on the default `_id` values, we can supply our own `_id` for each document, for example an ID that already exists in the dataset.
    - It is usually better to stick to the automatically created ObjectIds in order to scale well.
    - However, sometimes you really want something more readable than the long ObjectId value.
        - In that case, consider an appropriate atomic increment strategy.
   ```javascript
   db.coll.insertOne({_id: "myUniqueValue", a: 1, b: 1})
   ```

#### Insert Multiple Documents
- To insert multiple documents into a collection in MongoDB, we use the <code>insert_many()</code> method.
- The first parameter of the <code>insert_many()</code> method is a list containing dictionaries with the data you want to insert.

In [9]:
customer_List = [
  { "name": "Karim", "age":14, "gender":"male"},
  { "name": "Kate","age":75, "gender":"female"},
  { "name": "Riccardo","age":34, "gender":"male", "phone": 474612}
]
customers = mydb.customers.insert_many(customer_List)
# print documents in a collection
print(customers.inserted_ids)
for document in mydb.customers.find().limit(5):
  print(document)

[ObjectId('68e761b74c46bffe50ebeb4d'), ObjectId('68e761b74c46bffe50ebeb4e'), ObjectId('68e761b74c46bffe50ebeb4f')]
{'_id': ObjectId('68e66a6d721a60d40e0243d2'), 'name': 'Jane', 'age': 25, 'gender': 'female'}
{'_id': ObjectId('68e66ae2721a60d40e0243d3'), 'name': 'Karim', 'age': 14, 'gender': 'male'}
{'_id': ObjectId('68e66ae2721a60d40e0243d4'), 'name': 'Kate', 'age': 75, 'gender': 'female'}
{'_id': ObjectId('68e66ae2721a60d40e0243d5'), 'name': 'Riccardo', 'age': 34, 'gender': 'male', 'phone': 474612}
{'_id': ObjectId('68e66b19721a60d40e0243d8'), 'name': 'Riccardo', 'age': 34, 'gender': 'male', 'phone': 474612}


- Notice that the last document has a "**phone**" field, even though we didn't specify it for the other documents.

## READ (Querying our Data)

- The **find_one()** method is one of several find methods for querying MongoDB data; it returns the first matching document, or `None` if nothing matches.

####  Get the first customer (document) in the customers collection

In [10]:
first_customer= mydb.customers.find_one()
pprint(first_customer)

{'_id': ObjectId('68e66a6d721a60d40e0243d2'),
 'age': 25,
 'gender': 'female',
 'name': 'Jane'}


#### Find a specific document using find
Typically, we use the unique `_id` if we want one specific document, but any other field can serve as a filter too.

- Find the customer with the name 'Jane'

In [11]:
jane =mydb.customers.find_one({"name": "Jane"})
print(jane)

{'_id': ObjectId('68e66a6d721a60d40e0243d2'), 'name': 'Jane', 'age': 25, 'gender': 'female'}


- **find method** : The find() method returns all occurrences in the selection.
    - More precisely, it returns a **cursor** which can be used to iterate over the results.
    - A cursor is an iterable and can be used to neatly access the query results.

- **Notes**:
    - The second parameter of the find() method is an object describing which fields to include in the result.

    - This parameter is optional, and if omitted, all fields will be included in the result.

####  List all the customers (documents) in the customers collection

In [12]:
# corresponds to the SQL query "SELECT * FROM customer_tbl"
allCustomers=mydb.customers.find({}) # the empty filter {} can be omitted: find()
for customer in allCustomers:
    print(customer)

{'_id': ObjectId('68e66a6d721a60d40e0243d2'), 'name': 'Jane', 'age': 25, 'gender': 'female'}
{'_id': ObjectId('68e66ae2721a60d40e0243d3'), 'name': 'Karim', 'age': 14, 'gender': 'male'}
{'_id': ObjectId('68e66ae2721a60d40e0243d4'), 'name': 'Kate', 'age': 75, 'gender': 'female'}
{'_id': ObjectId('68e66ae2721a60d40e0243d5'), 'name': 'Riccardo', 'age': 34, 'gender': 'male', 'phone': 474612}
{'_id': ObjectId('68e66b19721a60d40e0243d8'), 'name': 'Riccardo', 'age': 34, 'gender': 'male', 'phone': 474612}
{'_id': ObjectId('68e66b3a721a60d40e0243d9'), 'name': 'Karim', 'age': 14, 'gender': 'male'}
{'_id': ObjectId('68e66b3a721a60d40e0243da'), 'name': 'Kate', 'age': 75, 'gender': 'female'}
{'_id': ObjectId('68e66b3a721a60d40e0243db'), 'name': 'Riccardo', 'age': 34, 'gender': 'male', 'phone': 474612}
{'_id': ObjectId('68e761b74c46bffe50ebeb4c'), 'name': 'Jane', 'age': 25, 'gender': 'female'}
{'_id': ObjectId('68e761b74c46bffe50ebeb4d'), 'name': 'Karim', 'age': 14, 'gender': 'male'}
{'_id': ObjectId

#### MongoDB Projections
- Notes:
    - The **_id** field is returned by default.
    - A projection lists either the fields to include (value 1) or the fields to exclude (value 0); the two styles cannot be mixed in one projection.
    - The exception is **_id**: it can be excluded (`"_id": 0`) even in an inclusion projection.
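
The inclusion rule can be mimicked in plain Python to see what comes back. This is only a sketch of the behaviour for top-level fields, not how MongoDB implements it; the function name `project` is made up for illustration.

```python
def project(document, projection):
    """Apply an inclusion projection (top-level fields only) to a dict."""
    include_id = projection.get("_id", 1) != 0  # _id is kept unless set to 0
    wanted = {f for f, flag in projection.items() if flag and f != "_id"}
    result = {}
    if include_id and "_id" in document:
        result["_id"] = document["_id"]
    for field, value in document.items():
        if field in wanted:
            result[field] = value
    return result


doc = {"_id": 1, "name": "Jane", "age": 25, "gender": "female"}
print(project(doc, {"_id": 0, "name": 1, "age": 1}))  # {'name': 'Jane', 'age': 25}
print(project(doc, {"name": 1}))                      # {'_id': 1, 'name': 'Jane'}
```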

#### Get the name, age fields only of the customers

In [13]:
allCustomers=mydb.customers.find({}, {"_id": 0, "name": 1, "age": 1}).limit(5)
for customer in allCustomers:
    print(customer)

{'name': 'Jane', 'age': 25}
{'name': 'Karim', 'age': 14}
{'name': 'Kate', 'age': 75}
{'name': 'Riccardo', 'age': 34}
{'name': 'Riccardo', 'age': 34}


#### Get the customers with name 'Jane' or 'Kate'
- Hint: use the **"$or"** operator

In [14]:
janeOrKate = mydb.customers.find(
    {"$or": [{"name":"Jane"}, 
             {"name":"Kate"} 
            ] } )



for customer in janeOrKate:
    print(customer)

{'_id': ObjectId('68e66a6d721a60d40e0243d2'), 'name': 'Jane', 'age': 25, 'gender': 'female'}
{'_id': ObjectId('68e66ae2721a60d40e0243d4'), 'name': 'Kate', 'age': 75, 'gender': 'female'}
{'_id': ObjectId('68e66b3a721a60d40e0243da'), 'name': 'Kate', 'age': 75, 'gender': 'female'}
{'_id': ObjectId('68e761b74c46bffe50ebeb4c'), 'name': 'Jane', 'age': 25, 'gender': 'female'}
{'_id': ObjectId('68e761b74c46bffe50ebeb4e'), 'name': 'Kate', 'age': 75, 'gender': 'female'}


- Similarly, we can use `$and`, `$not` , `$nor` operators to join or negate query clauses/conditions.

#### Get the customers with name 'Jane' and age 25

In [15]:
janeAnd25 = mydb.customers.find({
    "$and": [{"name": "Jane"}, {"age": 25}]
})

for customer in janeAnd25:
    print(customer)

{'_id': ObjectId('68e66a6d721a60d40e0243d2'), 'name': 'Jane', 'age': 25, 'gender': 'female'}
{'_id': ObjectId('68e761b74c46bffe50ebeb4c'), 'name': 'Jane', 'age': 25, 'gender': 'female'}


#### Find customers aged 16 or older
- we use the <code>"$gte"</code> operator.

In [16]:
adult_customers = mydb.customers.find({
    "age": {"$gte": 16}
})

for customer in adult_customers:
    print(customer["name"],customer["age"])

Jane 25
Kate 75
Riccardo 34
Riccardo 34
Kate 75
Riccardo 34
Jane 25
Kate 75
Riccardo 34


- Obviously, we can also use `$lt` (<), `$gt` (>), `$lte` (<=), `$gte` (>=)
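
How these operators combine can be sketched in plain Python. The `matches` function below is a made-up illustration covering only equality, `$or`, `$and` and the four comparison operators; it is not how MongoDB evaluates queries internally.

```python
import operator

COMPARISONS = {
    "$lt": operator.lt, "$lte": operator.le,
    "$gt": operator.gt, "$gte": operator.ge,
}


def matches(document, query):
    """Return True if a dict satisfies a small subset of MongoDB query syntax."""
    for key, condition in query.items():
        if key == "$or":
            if not any(matches(document, sub) for sub in condition):
                return False
        elif key == "$and":
            if not all(matches(document, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):  # e.g. {"age": {"$gte": 16}}
            if key not in document:
                return False
            for op, operand in condition.items():
                if not COMPARISONS[op](document[key], operand):
                    return False
        elif document.get(key) != condition:  # plain equality
            return False
    return True


customers = [
    {"name": "Jane", "age": 25},
    {"name": "Karim", "age": 14},
    {"name": "Kate", "age": 75},
]
adults = [c["name"] for c in customers if matches(c, {"age": {"$gte": 16}})]
print(adults)  # ['Jane', 'Kate']
```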

### Sorting the Results
#### Get all Customers, sorted by age descending

In [17]:
customers_Sorted = mydb.customers.find().sort("age", -1)
for customer in customers_Sorted:
    print(customer)

{'_id': ObjectId('68e66ae2721a60d40e0243d4'), 'name': 'Kate', 'age': 75, 'gender': 'female'}
{'_id': ObjectId('68e66b3a721a60d40e0243da'), 'name': 'Kate', 'age': 75, 'gender': 'female'}
{'_id': ObjectId('68e761b74c46bffe50ebeb4e'), 'name': 'Kate', 'age': 75, 'gender': 'female'}
{'_id': ObjectId('68e66ae2721a60d40e0243d5'), 'name': 'Riccardo', 'age': 34, 'gender': 'male', 'phone': 474612}
{'_id': ObjectId('68e66b19721a60d40e0243d8'), 'name': 'Riccardo', 'age': 34, 'gender': 'male', 'phone': 474612}
{'_id': ObjectId('68e66b3a721a60d40e0243db'), 'name': 'Riccardo', 'age': 34, 'gender': 'male', 'phone': 474612}
{'_id': ObjectId('68e761b74c46bffe50ebeb4f'), 'name': 'Riccardo', 'age': 34, 'gender': 'male', 'phone': 474612}
{'_id': ObjectId('68e66a6d721a60d40e0243d2'), 'name': 'Jane', 'age': 25, 'gender': 'female'}
{'_id': ObjectId('68e761b74c46bffe50ebeb4c'), 'name': 'Jane', 'age': 25, 'gender': 'female'}
{'_id': ObjectId('68e66ae2721a60d40e0243d3'), 'name': 'Karim', 'age': 14, 'gender': 'ma

* Clearly, in order to sort ascending, we would use 1 

#### Get the count of the customers in your DB

In [18]:
customer_count1 = mydb.customers.count_documents({})
customer_count2 = list(mydb.customers.aggregate([
    {"$count": "total"}
]))[0]["total"]

print(customer_count1)
print(customer_count2)

12
12


#### Get only the first 2 customers' names and ages, sorted by age ascending

In [19]:
two_customers= mydb.customers.find({}, {"_id":0,
                                       "name":1,
                                       "age":1}).limit(2).sort([("age",1)])
for customer in two_customers:
    print(customer)

{'name': 'Karim', 'age': 14}
{'name': 'Karim', 'age': 14}
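
The two cursor methods can be chained in either order, because MongoDB applies the sort before the limit. A plain-Python equivalent, using a few (name, age) pairs taken from the collection as sample data:

```python
ages = [("Jane", 25), ("Karim", 14), ("Kate", 75), ("Riccardo", 34), ("Karim", 14)]

# Sort first, then keep the first two rows: the order the server uses
youngest_two = sorted(ages, key=lambda row: row[1])[:2]
print(youngest_two)  # [('Karim', 14), ('Karim', 14)]
```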


### Aggregations

#### For each gender, get the average of ages

In [20]:
agg_result= mydb.customers.aggregate([
    {  "$group": {"_id":{"gender": "$gender"},
                  "average": {"$avg":"$age"} }}])
for gen_Avg_age in agg_result:
    print(gen_Avg_age)

{'_id': {'gender': 'female'}, 'average': 55.0}
{'_id': {'gender': 'male'}, 'average': 25.428571428571427}


#### Get the count of customers for each gender

In [21]:
agg_result= mydb.customers.aggregate([
    {
        "$group": {"_id": {"gender": "$gender"}, "total": {"$sum": 1}}
    }
])
for gen_count in agg_result:
    print(gen_count)

{'_id': {'gender': 'female'}, 'total': 5}
{'_id': {'gender': 'male'}, 'total': 7}
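
What `$group` computes in the two pipelines above can be reproduced in plain Python. The rows below are an assumption: they reconstruct the twelve documents implied by the earlier outputs (two Janes, three Kates, three Karims, four Riccardos).

```python
from collections import defaultdict

# (name, age, gender) rows assumed to mirror the twelve documents in the collection
rows = (
    [("Jane", 25, "female")] * 2 + [("Kate", 75, "female")] * 3
    + [("Karim", 14, "male")] * 3 + [("Riccardo", 34, "male")] * 4
)

ages_by_gender = defaultdict(list)
for _name, age, gender in rows:
    ages_by_gender[gender].append(age)  # the "$group" step

for gender, ages in ages_by_gender.items():
    # {"$sum": 1} counts the group's documents, "$avg" averages their ages
    print(gender, len(ages), sum(ages) / len(ages))
```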


## UPDATE

- The update_one() method is used to modify an existing document.
- Two documents are passed as arguments to update_one():

     - The first is a filter that selects the document to which the change is applied (only the first match is updated).
     - If the filter matches nothing, nothing is modified.
     - The second gives the details of the change.

- To update many documents we use update_many()

#### Update the age of Kate to be 25 instead of 75

In [22]:
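# Note: "customer" (singular) is not the "customers" collection populated above,
# so nothing matches here and both lookups print None.
kate =mydb.customer.find_one({"name":"Kate"}, {"name":1, "age":1, "_id":0}) 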
print(kate)

mydb.customer.update_one({"name":"Kate"},{"$set": {"age":25}  })

kate =mydb.customer.find_one({"name":"Kate"}, {"name":1, "age":1, "_id":0}) 
print(kate)

None
None


#### The same update against the correct collection, `customers`

In [27]:
kate =mydb.customers.find_one({"name":"Kate"}, {"name":1, "age":1, "_id":0}) 
print(kate)

mydb.customers.update_one({"name":"Kate"},{"$set": {"age":25}})

kate =mydb.customers.find_one({"name":"Kate"}, {"name":1, "age":1, "_id":0}) 
print(kate)

{'name': 'Kate', 'age': 75}
{'name': 'Kate', 'age': 25}


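- The example above uses the `$set` modifier. If the update document contains no operator at all (for example `{"age": 25}`), PyMongo's update_one() raises a `ValueError`; use replace_one() when the whole document should be replaced.
- There are a number of other modifiers available like `$inc`, `$mul`, `$rename` and `$unset`.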

#### Remove the attribute "gender" from the customer "Riccardo"

In [28]:
ricc =mydb.customers.find_one({"name":"Riccardo"}, {"_id":0}) 
print(ricc)

mydb.customers.update_one({"name":"Riccardo"},{"$unset": {"gender":1}  })

ricc =mydb.customers.find_one({"name":"Riccardo"}, {"_id":0}) 
print(ricc)

{'name': 'Riccardo', 'age': 34, 'gender': 'male', 'phone': 474612}
{'name': 'Riccardo', 'age': 34, 'phone': 474612}


#### Adding the "address" attribute to an existing document 

In [29]:
jane =mydb.customers.find_one({"name":"Jane"}, {"_id":0}) 
print(jane)

address= {
    "street": "UUS 70",
    "county":"Tartu",
    "country":"Estonia"
}
mydb.customers.update_one({"name":"Jane"},
                         {"$set": {"address" :address }}, 
                         upsert=True)

jane =mydb.customers.find_one({"name":"Jane"}, {"_id":0}) 
print(jane)

{'name': 'Jane', 'age': 25, 'gender': 'female'}
{'name': 'Jane', 'age': 25, 'gender': 'female', 'address': {'street': 'UUS 70', 'county': 'Tartu', 'country': 'Estonia'}}


#### Update the address of "Jane" changing the street to be 'Narva 99'

In [31]:
jane =mydb.customers.find_one({"name":"Jane"}, {"_id":0}) 
print(jane)


mydb.customers.update_one(
    {"name": "Jane", "address": {"$exists": True}},
    {"$set": {"address.street": "Narva 99"}}
)

jane =mydb.customers.find_one({"name":"Jane"}, {"_id":0}) 
print(jane)

{'name': 'Jane', 'age': 25, 'gender': 'female', 'address': {'street': 'UUS 70', 'county': 'Tartu', 'country': 'Estonia'}}
{'name': 'Jane', 'age': 25, 'gender': 'female', 'address': {'street': 'Narva 99', 'county': 'Tartu', 'country': 'Estonia'}}
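
The dotted key `"address.street"` addresses a field inside the embedded document. A plain-Python sketch of that behaviour follows; the helper `apply_set` is hypothetical and only imitates `$set` on nested dictionaries.

```python
def apply_set(document, changes):
    """Mimic {"$set": changes} on a dict, following dotted paths into sub-dicts."""
    for path, value in changes.items():
        *parents, leaf = path.split(".")
        target = document
        for key in parents:
            target = target.setdefault(key, {})  # create missing sub-documents
        target[leaf] = value
    return document


jane = {"name": "Jane", "address": {"street": "UUS 70", "county": "Tartu"}}
apply_set(jane, {"address.street": "Narva 99"})
print(jane["address"])  # {'street': 'Narva 99', 'county': 'Tartu'}
```

Only the addressed field changes; the sibling fields of `address` are left untouched, which is the difference from setting the whole `address` document again.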


## DELETE

- The functions delete_one() and delete_many() are used to delete document(s) from MongoDB collections.

#### Delete all the male customers from the DB!

In [32]:
print("\n //////////////////BEFORE//////")
for cust in mydb.customers.find({}):
    print(cust)
mydb.customers.delete_many({"gender":"male"})

print("\n //////////////////AFTER//////")

for cust in mydb.customers.find({}):
    print(cust)


 //////////////////BEFORE//////
{'_id': ObjectId('68e66a6d721a60d40e0243d2'), 'name': 'Jane', 'age': 25, 'gender': 'female', 'address': {'street': 'Narva 99', 'county': 'Tartu', 'country': 'Estonia'}}
{'_id': ObjectId('68e66ae2721a60d40e0243d3'), 'name': 'Karim', 'age': 14, 'gender': 'male'}
{'_id': ObjectId('68e66ae2721a60d40e0243d4'), 'name': 'Kate', 'age': 25, 'gender': 'female'}
{'_id': ObjectId('68e66ae2721a60d40e0243d5'), 'name': 'Riccardo', 'age': 34, 'phone': 474612}
{'_id': ObjectId('68e66b19721a60d40e0243d8'), 'name': 'Riccardo', 'age': 34, 'gender': 'male', 'phone': 474612}
{'_id': ObjectId('68e66b3a721a60d40e0243d9'), 'name': 'Karim', 'age': 14, 'gender': 'male'}
{'_id': ObjectId('68e66b3a721a60d40e0243da'), 'name': 'Kate', 'age': 75, 'gender': 'female'}
{'_id': ObjectId('68e66b3a721a60d40e0243db'), 'name': 'Riccardo', 'age': 34, 'gender': 'male', 'phone': 474612}
{'_id': ObjectId('68e761b74c46bffe50ebeb4c'), 'name': 'Jane', 'age': 25, 'gender': 'female'}
{'_id': ObjectId(
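
One consequence worth noting: the filter `{"gender": "male"}` only matches documents that still carry that field, so the Riccardo document whose `gender` was removed earlier is not deleted. A plain-Python sketch of the same filter, on three sample documents shaped like the ones above:

```python
customers = [
    {"name": "Karim", "age": 14, "gender": "male"},
    {"name": "Kate", "age": 25, "gender": "female"},
    {"name": "Riccardo", "age": 34, "phone": 474612},  # gender was unset earlier
]

# Keep everything that does NOT match {"gender": "male"}
remaining = [c for c in customers if c.get("gender") != "male"]
print([c["name"] for c in remaining])  # ['Kate', 'Riccardo']
```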

## THANK YOU