# Week 4: Document-Based Stores (MongoDB)

# Introduction:

### MongoDB:
   <a href="https://www.mongodb.com/">MongoDB</a> is a general-purpose, document-based, distributed database built for modern application developers and for the cloud era. MongoDB stores data in JSON-like documents, which makes the database very flexible and scalable. <br/>
   
<img src="https://infinapps.com/wp-content/uploads/2018/10/mongodb-logo.png" width ="125" height="75">


### MongoDB Hierarchy:
<img src="https://cdn.educba.com/academy/wp-content/uploads/2019/04/MongoDB-chart2.jpg" width ="400" >

### PreLab

#### 1. Install MongoDB on Windows

- Download the MSI installer (https://www.mongodb.com/try/download/community?tck=docs_server) and customize the installation directory to "c:/mongodb".
- Create the "data/db" and "logs" directories inside that installation directory.
- From a Command Prompt opened "As administrator", configure the log and database directories and install MongoDB as a Windows service:
    -  from the "bin" directory, run the following command: `mongod --directoryperdb --dbpath c:\mongodb\data\db --logpath c:\mongodb\logs\mongo.log --logappend --install` (older guides also pass `--rest`, but that option was removed in MongoDB 3.6).

- Start the MongoDB service:
    - net start mongodb
- Add your MongoDB `bin` directory (e.g. c:\mongodb\bin) to the `Path` environment variable:
    - so you can start the MongoDB shell with the command `mongo` (MongoDB 6.0 and later ship the `mongosh` shell instead).


#### For Linux users (ubuntu):
- Follow the instructions in this [tutorial](https://docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/) to install MongoDB 4.4 Community Edition on LTS (long-term support) releases of Ubuntu Linux using the apt package manager.
- More generally, you can follow this [link](https://docs.mongodb.com/manual/administration/install-on-linux/) to install MongoDB Community Edition on any supported Linux system.





#### 2. PyMongo Python Driver

- Python needs a MongoDB driver to access the MongoDB database.
- <b>PyMongo</b> documentation: https://api.mongodb.com/python/current/tutorial.html 
- Install the 'pymongo' Python driver:
```
pip install pymongo
```

In [None]:
! pip install pymongo

- The first thing we need to do to establish a connection is to import the `MongoClient` class.

In [None]:
from pymongo import MongoClient
from random import randint
from pprint import pprint

import warnings
warnings.filterwarnings('ignore')

# First Steps with MongoDB, CRUD Operations

## CREATE

#### Creating a Database
- Unlike SQL databases, databases and collections in MongoDB only have to be named to be created. 

- To create a database in MongoDB, start by creating a MongoClient object with a connection URL that contains the correct IP address and port, then select the database you want to create by its name.
- MongoDB will create the database if it does not exist, and make a connection to it.
 

In [None]:
#we use the MongoClient to communicate with the running database instance.
myclient = MongoClient("mongodb://localhost:27017/") #Mongo URI format
mydb = myclient["customer_db"]

#Or you can use the attribute style 
#mydb = myclient.customer_db

- Note: you can also specify the host and/or port using:
```python 
client = MongoClient('localhost', 27017)
```

<b style="color:red"> Important Note</b>: In MongoDB, a database is not created until it gets content!

###### You can check whether a database exists by listing all databases on your system:
- Note that the 'customer_db' database has not been created yet!

In [None]:
myclient.list_database_names()

#### Creating a Collection
- To create a collection in MongoDB, use the database object and specify the name of the collection you want to create.
- MongoDB will create the collection if it does not exist.

In [None]:
customer_coll = mydb["customer"]

#### Check whether the collection has been created!

In [None]:
mydb.list_collection_names()

#### Check whether the DB itself has been created!

In [None]:
myclient.list_database_names()

- This means that MongoDB follows a <em>lazy</em> creation approach:
    - the database and its collections are only actually created when the first document is inserted.

### Insert into the collection
- To insert a single record, or more accurately a document as it is called in MongoDB, into a collection, we use the **insert_one()** method.

In [None]:
first_customer_doc = {"name": "Jane", "age": 25, "gender": "female"}
first_customer = customer_coll.insert_one(first_customer_doc)

- Each document is allocated a unique identifier which can be accessed via the **inserted_id** attribute.

In [None]:
first_customer.inserted_id

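**Notes about document IDs:** 
- Although these identifiers look random, they have a well-defined structure. The 24 hex characters encode 12 bytes:
    - the first 8 characters (4 bytes) are a timestamp (seconds since the Unix epoch),
    - followed by 10 characters (5 bytes) of a random value generated once per process (older drivers used a 6-character machine identifier plus a 4-character process identifier here),
    - and finally a 6-character (3-byte) incrementing counter.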
    
- <font color='red'> Note to consider</font>:
    - Instead of letting MongoDB generate the default `_id` values, we could use the IDs that already exist in our dataset as `_id`.
    - However, it is usually better to stick to the automatically generated ObjectIds, because they can be created without any coordination between servers and therefore scale well.
    - If you really need shorter, human-readable IDs, you can set `_id` yourself, but you are then responsible for keeping the values unique, e.g. with an appropriate atomic increment strategy. From the mongo shell:
    ```javascript
    db.coll.insertOne({_id: "myUniqueValue", a: 1, b: 1})
    ```
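To see the ObjectId structure described above, the sketch below splits an ObjectId's hex string into its parts using only the standard library. `decode_object_id` is a hypothetical helper, it assumes the current layout (timestamp, 5-byte random value, counter), and the id it decodes is just a sample value.

```python
from datetime import datetime, timezone


def decode_object_id(hex_id: str) -> dict:
    """Split a 24-character hex ObjectId into timestamp, random value and counter.

    Assumes the current layout: 4-byte timestamp, 5-byte random value,
    3-byte counter (hypothetical helper, for illustration only).
    """
    if len(hex_id) != 24:
        raise ValueError("an ObjectId is 12 bytes = 24 hex characters")
    seconds = int(hex_id[:8], 16)  # bytes 0-3: seconds since the Unix epoch
    return {
        "created_at": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "random": hex_id[8:18],           # bytes 4-8: per-process random value
        "counter": int(hex_id[18:], 16),  # bytes 9-11: incrementing counter
    }


parts = decode_object_id("507f1f77bcf86cd799439011")  # sample id
print(parts["created_at"])  # 2012-10-17 21:13:27+00:00
```

With PyMongo you rarely need this by hand: `first_customer.inserted_id.generation_time` gives the timestamp as a timezone-aware UTC `datetime`. Also keep in mind that `_id` is stored as an `ObjectId`, not a string, so looking a document up by its hex string needs a conversion, e.g. `customer_coll.find_one({"_id": ObjectId(hex_id)})` after `from bson import ObjectId`.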

#### Insert Multiple Documents
- To insert multiple documents into a collection in MongoDB, we use the <code>insert_many()</code> method.
- The first parameter of the <code>insert_many()</code> method is a list containing dictionaries with the data you want to insert.

In [None]:
customer_List = [
  { "name": "Karim", "age":14, "gender":"male"},
  { "name": "Kate","age":75, "gender":"female"},
  { "name": "Riccardo","age":34, "gender":"male", "phone": 474612}
]
customers = ...#YOUR CODE HERE

#print list of the _id values of the inserted documents:
print(...)#YOUR CODE HERE

- Notice that the last document has a "**phone**" field, even though the other documents do not: documents in the same collection do not have to share the same fields.
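Because documents in one collection can have different fields, it is often useful to ask which documents actually carry an optional field such as `phone`. The sketch below builds such a filter with MongoDB's `$exists` operator; the helper names `field_exists_filter` and `names_with_field` are hypothetical, and `collection` is assumed to be a PyMongo `Collection` such as `mydb.customer`.

```python
def field_exists_filter(field: str, present: bool = True) -> dict:
    """Filter matching documents that do (present=True) or do not contain `field`."""
    return {field: {"$exists": present}}


def names_with_field(collection, field: str) -> list:
    """Names of the customers whose document contains `field`.

    `collection` is assumed to be a PyMongo Collection (e.g. mydb.customer).
    """
    cursor = collection.find(field_exists_filter(field), {"_id": 0, "name": 1})
    return [doc.get("name") for doc in cursor]


print(field_exists_filter("phone"))         # {'phone': {'$exists': True}}
print(field_exists_filter("phone", False))  # {'phone': {'$exists': False}}
```

With the sample customers above (once the `insert_many()` exercise is done), `names_with_field(mydb.customer, "phone")` should return only Riccardo's name.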

## READ (Querying our Data)

- The **find_one** method is one of a family of find methods for querying MongoDB data.

####  Get the first customer (document) in the customers collection

In [None]:
first_customer= mydb.customer.find_one()
pprint(first_customer)

#### Find a specific document using find
Typically, we query by a unique field (such as `_id`) when we want one specific document.

- Find the customer with the name 'Jane'

In [None]:
jane =... #YOUR CODE HERE
print(jane)

- **find() method**: The find() method returns all documents that match the filter.
    - More precisely, it returns a **cursor** which can be used to iterate over the results.
    - A cursor is an iterable and can be used to neatly access the query results.

- **Notes**:
    - The second parameter of the find() method is an object describing which fields to include in the result.

    - This parameter is optional, and if omitted, all fields will be included in the result.

####  List all the customers (documents) in the customers collection

In [None]:
#corresponds to the SQL query "SELECT * FROM customer_tbl"
allCustomers=mydb.customer.find({}) #the empty filter '{}' matches every document and can be omitted
for customer in allCustomers:
    print(customer)

#### MongoDB Projections
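- Notes:
    - The **_id** field is returned by default.
    - A projection either lists the fields you want to include (value 1) or the fields you want to exclude (value 0); you cannot mix inclusion and exclusion in the same projection.
    - The exception is **_id**: you can exclude it (`"_id": 0`) even in a projection that includes other fields.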
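To make these rules concrete, here is a simplified, pure-Python sketch that applies a projection to a single document the way MongoDB does for top-level fields. `project` is a hypothetical helper, not part of PyMongo; the real projection is evaluated by the server and also supports nested fields and projection operators.

```python
def project(doc: dict, projection: dict) -> dict:
    """Apply a MongoDB-style projection to the top-level fields of `doc`.

    Simplified sketch: inclusion (1) keeps only the listed fields, exclusion (0)
    drops them, and mixing both raises ValueError, except for _id, which is
    returned unless it is explicitly set to 0.
    """
    others = {k: bool(v) for k, v in projection.items() if k != "_id"}
    if len(set(others.values())) > 1:
        raise ValueError("cannot mix inclusion and exclusion (except for _id)")
    keep_id = bool(projection.get("_id", True))
    only_id_requested = not others and "_id" in projection and keep_id
    if True in others.values() or only_id_requested:
        # Inclusion projection: keep only the fields set to 1
        result = {k: v for k, v in doc.items() if others.get(k)}
    else:
        # Exclusion projection (or empty projection): drop the fields set to 0
        result = {k: v for k, v in doc.items() if k != "_id" and k not in others}
    if keep_id and "_id" in doc:
        result = {"_id": doc["_id"], **result}
    return result


customer = {"_id": 1, "name": "Riccardo", "age": 34, "gender": "male", "phone": 474612}
print(project(customer, {"name": 1, "_id": 0}))   # {'name': 'Riccardo'}
print(project(customer, {"phone": 0, "_id": 0}))  # {'name': 'Riccardo', 'age': 34, 'gender': 'male'}
```

In PyMongo you pass the same dictionary as the second argument of `find()` or `find_one()`, and the server applies it.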

#### Get only the name and age fields of the customers

In [None]:
allCustomers=...#YOUR CODE HERE
for customer in allCustomers:
    print(customer)

#### Get the customers named 'Jane' or 'Kate'
- Hint: use the **"$or"** operator

In [None]:
janeOrKate = mydb.customer.find(
    {"$or": [{"name": "Jane"},
             {"name": "Kate"}
            ]})

for customer in janeOrKate:
    print(customer)

- Similarly, we can use `$and`, `$not` , `$nor` operators to join or negate query clauses/conditions.
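As a hedged sketch of the two less common operators (the helper names are hypothetical; pass the resulting filters to `mydb.customer.find()`): `$nor` takes a list of conditions like `$or`, whereas `$not` negates an operator expression on a single field.

```python
def nor_filter(*conditions: dict) -> dict:
    """$nor: match documents that fail EVERY condition (including ones missing the fields)."""
    return {"$nor": list(conditions)}


def age_not_greater_than(age: int) -> dict:
    """$not negates an operator expression on one field.

    Unlike {"age": {"$lte": age}}, it also matches documents without an age field.
    """
    return {"age": {"$not": {"$gt": age}}}


# Customers who are neither called Jane nor older than 60
print(nor_filter({"name": "Jane"}, {"age": {"$gt": 60}}))
# {'$nor': [{'name': 'Jane'}, {'age': {'$gt': 60}}]}
print(age_not_greater_than(30))  # {'age': {'$not': {'$gt': 30}}}
```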

#### Get the customers named 'Jane' whose age is 25

In [None]:
janeAged25=...#YOUR CODE HERE

for customer in janeAged25:
    print(customer)

#### Find customers older than 16 
- We use the <code>"$gt"</code> operator.

In [None]:
adult_customers=...#YOUR CODE HERE

for customer in adult_customers:
    print(customer["name"],customer["age"])

- Similarly, the other comparison operators are `$lt` (<), `$gt` (>), `$lte` (<=) and `$gte` (>=), as well as `$eq` (==) and `$ne` (!=).
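Several operators on the same field are combined with AND, which is how range queries are written. A small sketch with hypothetical helpers: `compare()` maps the usual comparison symbols to MongoDB operators, and `age_between()` builds a half-open age range.

```python
# Comparison symbol -> MongoDB query operator
COMPARISON_OPERATORS = {
    "<": "$lt", "<=": "$lte", ">": "$gt", ">=": "$gte", "==": "$eq", "!=": "$ne",
}


def compare(field: str, op: str, value) -> dict:
    """Filter for a single comparison, e.g. compare("age", "<=", 30)."""
    return {field: {COMPARISON_OPERATORS[op]: value}}


def age_between(min_age: int, max_age: int) -> dict:
    """min_age <= age < max_age: two operators on one field are ANDed."""
    return {"age": {"$gte": min_age, "$lt": max_age}}


print(compare("age", "<=", 30))  # {'age': {'$lte': 30}}
print(age_between(18, 65))       # {'age': {'$gte': 18, '$lt': 65}}
```

Either result can be passed straight to `find()`, e.g. `mydb.customer.find(age_between(18, 65))`.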

### Sorting the Results
#### Get all Customers, sorted by age descending

In [None]:
customers_Sorted=...#YOUR CODE HERE
for customer in customers_Sorted:
    print(customer)

* To sort in ascending order instead, use 1 as the sort direction.

#### Get the count of the customers in your DB

In [None]:
#YOUR CODE HERE

#### Get the name and age of only the first 2 customers in your DB, sorted by age in ascending order.

In [None]:
two_customers= mydb.customer.find({}, {"_id":0,
                                       "name":1,
                                       "age":1}).limit(2).sort([("age",1)])
for customer in two_customers:
    print(customer)
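A common next step after `sort()` and `limit()` is pagination with `skip()`. The sketch below is one way to do it, assuming `collection` is a PyMongo `Collection` such as `mydb.customer`; the helper names are hypothetical. The server always applies the sort before skip and limit, so the order in which the cursor methods are chained (as in `.limit(2).sort(...)` above) does not change the result.

```python
def skip_count(page: int, page_size: int) -> int:
    """Number of documents to skip to reach `page` (1-based)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return (page - 1) * page_size


def customers_page(collection, page: int, page_size: int = 2) -> list:
    """One page of customers: oldest first, ties broken by name (A-Z).

    `collection` is assumed to be a PyMongo Collection (e.g. mydb.customer).
    """
    sort_spec = [("age", -1), ("name", 1)]  # -1 = descending, 1 = ascending
    cursor = (
        collection.find({}, {"_id": 0, "name": 1, "age": 1})
        .sort(sort_spec)
        .skip(skip_count(page, page_size))
        .limit(page_size)
    )
    return list(cursor)
```

For example, `customers_page(mydb.customer, page=2)` would return the third and fourth oldest customers. On large collections, skipping many documents gets slow because the server still walks over the skipped ones; range queries on an indexed field are the usual alternative.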

### Aggregations

#### For each gender, get the average of ages

In [None]:
agg_result= mydb.customer.aggregate([
    {  "$group": {"_id":{"gender": "$gender"},
                  "average": {"$avg":"$age"} }}])
for gen_Avg_age in agg_result:
    print(gen_Avg_age)
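Pipelines can have several stages. As a hedged illustration (the helper name and the age threshold are made up for this example), the pipeline below first filters with `$match`, then groups by gender using the `$min`/`$max` accumulators, and finally sorts the groups. It uses the shorter group key `"_id": "$gender"`, which yields `_id` values like `'female'` instead of `{'gender': 'female'}`.

```python
def age_range_by_gender_pipeline(min_age: int = 0) -> list:
    """Pipeline: youngest and oldest customer age per gender, for ages >= min_age."""
    return [
        # Stage 1: keep only matching documents (like SQL WHERE)
        {"$match": {"age": {"$gte": min_age}}},
        # Stage 2: one output document per distinct gender value;
        # documents without a gender field end up in a group with _id None
        {"$group": {
            "_id": "$gender",
            "youngest": {"$min": "$age"},
            "oldest": {"$max": "$age"},
        }},
        # Stage 3: groups with the oldest customer first
        {"$sort": {"oldest": -1}},
    ]


pipeline = age_range_by_gender_pipeline(min_age=18)
print([next(iter(stage)) for stage in pipeline])  # ['$match', '$group', '$sort']
```

To run it: `for row in mydb.customer.aggregate(pipeline): print(row)`.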

#### Get the count of customers for each gender

In [None]:
agg_result= ...#YOUR CODE HERE
for gen_count in agg_result:
    print(gen_count)

## Update

- PyMongo's `update_one()` and `update_many()` methods modify existing documents (the older `update()` method is deprecated and was removed in PyMongo 4.0). They take two documents as arguments:
     - The first is a filter that selects the documents the change applies to.
     - If nothing matches the filter, nothing is modified!
     - The second describes the change, using update operators such as `$set`.

#### Update the age of Kate to be 25 instead of 75

In [None]:
kate = mydb.customer.find_one({"name": "Kate"}, {"name": 1, "age": 1, "_id": 0})
print(kate)

# update_one() modifies the first document that matches the filter
mydb.customer.update_one({"name": "Kate"}, {"$set": {"age": 25}})

kate = mydb.customer.find_one({"name": "Kate"}, {"name": 1, "age": 1, "_id": 0})
print(kate)

#### What will happen if we don't specify the $set operator?!

In [None]:
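kate = mydb.customer.find_one({"name": "Kate"}, {"name": 1, "age": 1, "_id": 0})
print(kate)

# update_one() only accepts update operators such as $set; a plain document raises ValueError
try:
    mydb.customer.update_one({"name": "Kate"}, {"age": 25})
except ValueError as err:
    print("update_one() rejected it:", err)

# replace_one() swaps the WHOLE document (keeping _id), like the old update() without $set
mydb.customer.replace_one({"name": "Kate"}, {"age": 25})
kate = mydb.customer.find_one({"name": "Kate"}, {"name": 1, "age": 1, "_id": 0})
print(kate)  # the replaced document has no "name" field anymore, so it is not found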

- The example above uses the `$set` modifier.
- There are a number of other modifiers available like `$inc`, `$mul`, `$rename` and `$unset`.
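`$inc` is the building block for the atomic increment strategy mentioned in the notes on document IDs: the server applies the addition in a single operation, so two clients incrementing at the same time cannot overwrite each other's change, unlike reading the value into Python, adding to it, and writing it back with `$set`. A small sketch with hypothetical helper names, assuming `collection` is a PyMongo `Collection`:

```python
def inc_update(field: str, amount: int = 1) -> dict:
    """$inc adds `amount` (negative to subtract); a missing field is created with that value."""
    return {"$inc": {field: amount}}


def rename_update(old: str, new: str) -> dict:
    """$rename changes a field name while keeping its value."""
    return {"$rename": {old: new}}


def celebrate_birthday(collection, name: str) -> int:
    """Add one year to the age of the first customer called `name`.

    `collection` is assumed to be a PyMongo Collection (e.g. mydb.customer).
    Returns the number of modified documents (0 if nobody matched).
    """
    result = collection.update_one({"name": name}, inc_update("age"))
    return result.modified_count


print(inc_update("age"))                 # {'$inc': {'age': 1}}
print(rename_update("phone", "mobile"))  # {'$rename': {'phone': 'mobile'}}
```

For example, `celebrate_birthday(mydb.customer, "Jane")`, or `mydb.customer.update_many({}, rename_update("phone", "mobile"))` to rename the field in every document that has it.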

#### Remove the attribute "gender" from the customer "Riccardo"

In [None]:
ricc =mydb.customer.find_one({"name":"Riccardo"}, {"_id":0}) 
print(ricc)

mydb.customer.update_one({"name": "Riccardo"}, {"$unset": {"gender": 1}})

ricc =mydb.customer.find_one({"name":"Riccardo"}, {"_id":0}) 
print(ricc)

#### Adding the "address" attribute to an existing document 

In [None]:
jane =mydb.customer.find_one({"name":"Jane"}, {"_id":0}) 
print(jane)

address= {
    "street": "UUS 70",
    "county":"Tartu",
    "country":"Estonia"
}
mydb.customer.update_one({"name":"Jane"},
                         {"$set": {"address" :address }}, 
                         upsert=True)

jane =mydb.customer.find_one({"name":"Jane"}, {"_id":0}) 
print(jane)

#### Update the address of "Jane" changing the street to be 'Narva 99'

In [None]:
jane =mydb.customer.find_one({"name":"Jane"}, {"_id":0}) 
print(jane)


#YOUR CODE HERE


jane =mydb.customer.find_one({"name":"Jane"}, {"_id":0}) 
print(jane)

## Delete

- The functions delete_one() and delete_many() are used to delete document(s) from MongoDB collections.
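Both methods take a filter and return a `DeleteResult` whose `deleted_count` tells you how many documents were removed: `delete_one()` removes at most the first match, `delete_many()` removes every match. The sketch below wraps them in a hypothetical helper; refusing an empty filter is a design choice of this example, not PyMongo behaviour (PyMongo will happily delete everything with `delete_many({})`).

```python
def delete_customers(collection, query: dict, many: bool = False) -> int:
    """Delete the first matching document, or all of them if many=True.

    `collection` is assumed to be a PyMongo Collection (e.g. mydb.customer).
    Returns DeleteResult.deleted_count.
    """
    if not query:
        # Design choice of this sketch: an empty filter matches every document,
        # so refuse it instead of silently emptying the collection.
        raise ValueError("refusing to delete with an empty filter")
    if many:
        result = collection.delete_many(query)
    else:
        result = collection.delete_one(query)
    return result.deleted_count
```

For example, `delete_customers(mydb.customer, {"gender": "male"}, many=True)` does the same as `delete_many()` in the cell below, but also reports how many documents were removed.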

#### Delete all the male customers from the DB!

In [None]:
print("\n //////////////////BEFORE//////")
for cust in mydb.customer.find({}):
    print(cust)
mydb.customer.delete_many({"gender":"male"})

print("\n //////////////////AFTER//////")

for cust in mydb.customer.find({}):
    print(cust)

## THANK YOU