# Introduction:

### MongoDB:
   <a href="https://www.mongodb.com/">MongoDB</a> is a general purpose, document-based, distributed database built for modern application developers and for the cloud era. MongoDB stores data in JSON-like documents, which makes the database very flexible and scalable. <br/>
   
<img src="https://infinapps.com/wp-content/uploads/2018/10/mongodb-logo.png" width ="125" height="75">


### MongoDB Hierarchy:
<img src="https://cdn.educba.com/academy/wp-content/uploads/2019/04/MongoDB-chart2.jpg" width ="400" >

### PreLab

#### 1. Install MongoDB on Windows

- Install MongoDB on Windows using the MSI installer (https://www.mongodb.com/try/download/community?tck=docs_server), and customize the installation path to "c:/mongodb".
- Create the "data/db" and "log" directories inside the installation directory you chose.
- From a command prompt opened "As administrator", configure the log and database directories and install MongoDB as a Windows service:
    -  from the "bin" directory, run the following command (omit `--rest` on MongoDB 3.6 and later, where that option was removed)>>> mongod --directoryperdb --dbpath c:\mongodb\data\db --logpath c:\mongodb\log\mongo.log --logappend --rest --install 

- Now we can start the MongoDB service:
    - net start mongodb
- Add your MongoDB home "bin" directory to the PATH environment variable:
    - so you can run the MongoDB shell using the command '>mongo' (recent MongoDB releases ship the shell as 'mongosh' instead)

#### 2. PyMongo Python Driver

- Python needs a MongoDB driver to access the MongoDB database.
- <b>'Pymongo'</b> documentation: https://api.mongodb.com/python/current/tutorial.html 
- Install the 'pymongo' Python driver:
```
pip install pymongo
```

In [None]:
! pip install pymongo

In [5]:
from pymongo import MongoClient
from random import randint
from pprint import pprint

import warnings
warnings.filterwarnings('ignore')

### First Steps in MongoDB

#### Task 1 Create a simple MongoDB out of this relational model

This is a toy DB about movies and the actors who played roles in them. The DB consists of:

- A "Person" table with a unique id and a name field.

- A "Movie" table with a unique id, a title, the country where it was made, and the year it was released.

- There is an (m-n) or "many-to-many" relationship between these two tables (i.e. an actor can act in many movies, and a movie includes many actors).
- Therefore, we use the "Roles" table, from which we can deduce which person acted in which movie and what role(s) they played.


<img src="RDBSchema.png" alt="3" border="0">

#### Creating a Database

- To create a database in MongoDB, start by creating a MongoClient object, then specify a connection URL with the correct ip address and the name of the database you want to create.
- MongoDB will create the database if it does not exist, and make a connection to it.
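
The connection step above can be sketched as two small helpers. This is only an illustration: `build_mongo_uri` and `get_database` are hypothetical names introduced here, and the sketch assumes an unauthenticated server on the given host and port.

```python
def build_mongo_uri(host="localhost", port=27017):
    """Return a connection URL of the form mongodb://host:port/."""
    return f"mongodb://{host}:{port}/"


def get_database(db_name, host="localhost", port=27017):
    """Return a handle to db_name; nothing is created on the server yet."""
    # Imported lazily so the URL helper also works without PyMongo installed.
    from pymongo import MongoClient

    client = MongoClient(build_mongo_uri(host, port))
    return client[db_name]  # lazy handle: the DB appears once data is written


print(build_mongo_uri())  # mongodb://localhost:27017/
```

Calling `get_database("moviedb")` would give the same kind of handle as `mydb` in the next cell.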
 

In [273]:
myclient = MongoClient("mongodb://localhost:27017/")
mydb = myclient["moviedb"]

<b> Important Note</b>: In MongoDB, a database is not created until it gets content!

###### You can check if a database exists by listing all databases in your system:
- Note that the 'moviedb' DB is not created yet!

In [274]:
print(myclient.list_database_names())

['admin', 'business', 'config', 'local', 'movielens', 'mycustomers']


#### Creating a Collection
- To create a collection in MongoDB, use the database object and specify the name of the collection you want to create.
- MongoDB will create the collection if it does not exist.

In [275]:
personcol = mydb["person"]

In [276]:
print(mydb.list_collection_names())

[]


#### Insert Into Collection
- To insert a record, or document as it is called in MongoDB, into a collection, we use the insert_one() method.

<font color='red'>Note: instead of relying on the auto-generated _id values, we can store the dataset's own IDs in the _id field.</font>
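
A minimal sketch of that idea, assuming each record carries its dataset key in an `id` field. The helper name `with_custom_id` is hypothetical and not part of PyMongo.

```python
def with_custom_id(record, id_field="id"):
    """Return a copy of record whose id_field value is stored under _id."""
    doc = dict(record)              # copy, so the caller's dict is not mutated
    doc["_id"] = doc.pop(id_field)  # MongoDB treats _id as the primary key
    return doc


person = {"id": 1, "name": "Charlie Sheen"}
print(with_custom_id(person))
# {'name': 'Charlie Sheen', '_id': 1}
```

The result could then be passed to `personcol.insert_one(...)`. Because `_id` must be unique within a collection, inserting the same record twice would be rejected by the server.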

In [277]:
mydict = { "id": 1, "name": "Charlie Sheen" }

x = personcol.insert_one(mydict)

In [278]:
#check the created Collection Again after adding a document/record
print(mydb.list_collection_names())

#OR

collist = mydb.list_collection_names()
if "person" in collist:
    print("The collection exists.")

['person']
The collection exists.


In [279]:
#check the databases again after creating the collection 'person'

print(myclient.list_database_names())

#OR

dblist = myclient.list_database_names()
if "moviedb" in dblist:
    print("The database exists.")

['admin', 'business', 'config', 'local', 'moviedb', 'movielens', 'mycustomers']
The database exists.


#### Insert Multiple Documents
- To insert multiple documents into a collection in MongoDB, we use the <code>insert_many()</code> method.
- The first parameter of the <code>insert_many()</code> method is a list containing dictionaries with the data you want to insert:
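
When the source data is CSV text, such a list of dictionaries can be built with the standard library. The sketch below assumes a two-column `id,name` layout; `csv_to_documents` and `PERSON_CSV` are illustrative names only.

```python
import csv
import io

PERSON_CSV = """id,name
2,Michael Douglas
3,Martin Sheen
4,Morgan Freeman
"""


def csv_to_documents(text):
    """Parse CSV text into a list of dicts, converting the id column to int."""
    rows = csv.DictReader(io.StringIO(text))
    return [{"id": int(row["id"]), "name": row["name"]} for row in rows]


documents = csv_to_documents(PERSON_CSV)
print(documents)
```

The resulting `documents` list has the shape that `insert_many()` expects as its first argument.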


In [280]:
# id,name
# 1,Charlie Sheen
# 2,Michael Douglas
# 3,Martin Sheen
# 4,Morgan Freeman

personList = [
  { "id": 2, "name": "Michael Douglas"},
  { "id": 3, "name": "Martin Sheen"},
  { "id": 4, "name": "Morgan Freeman"}
]

x = personcol.insert_many(personList)

#print list of the _id values of the inserted documents:
print(x.inserted_ids)

[ObjectId('5f4be5636877049b11af810b'), ObjectId('5f4be5636877049b11af810c'), ObjectId('5f4be5636877049b11af810d')]


#### Creating rest of Collections

In [281]:
restcols = ["movie","roles"]

for col in restcols:
    mydb[col]  # only a lazy handle; the collection appears once a document is inserted

#### Inserting data into the movie Collection


In [282]:
# 1,Wall Street,USA,1987
# 2,The American President,USA,1995
# 3,The Shawshank Redemption,USA,1994


moviecol = mydb["movie"]

movieList = [
  { "id": 1, "title": "Wall Street", "country":"USA","year":1987},
  { "id": 2, "title": "The American President", "country":"USA","year":1995},
  { "id": 3, "title": "The Shawshank Redemption", "country":"USA","year":1994},
]

x = moviecol.insert_many(movieList)

#print list of the _id values of the inserted documents:
print(x.inserted_ids)

[ObjectId('5f4be5696877049b11af810e'), ObjectId('5f4be5696877049b11af810f'), ObjectId('5f4be5696877049b11af8110')]


In [283]:
#Roles
# personId,movieId,role
# 1,1,Bud Fox
# 3,1,Carl Fox
# 2,1,Gordon Gekko
# 3,2,A.J. MacInerney
# 2,2,President Andrew Shepherd
# 4,3,Ellis Boyd 'Red' Redding

rolesCol = mydb["roles"]

roleList = [
  { "personId": 1, "movieId": 1, "role":["Bud Fox"]},
  { "personId": 2, "movieId": 1, "role":["Gordon Gekko"]},
  { "personId": 3, "movieId": 1, "role":["Carl Fox"]},
  { "personId": 2, "movieId": 2, "role":["President Andrew Shepherd"]},
  { "personId": 3, "movieId": 2, "role":["A.J. MacInerney"]},
  { "personId": 4, "movieId": 3, "role":["Ellis Boyd 'Red' Redding"]}
]

x = rolesCol.insert_many(roleList)

#print list of the _id values of the inserted documents:
print(x.inserted_ids)

[ObjectId('5f4be56c6877049b11af8111'), ObjectId('5f4be56c6877049b11af8112'), ObjectId('5f4be56c6877049b11af8113'), ObjectId('5f4be56c6877049b11af8114'), ObjectId('5f4be56c6877049b11af8115'), ObjectId('5f4be56c6877049b11af8116')]


#### Another Way of Modeling this M-N Relationship in Mongo Would Be Using Foreign Keys


* Movies


```
[
{
	"_id": 1,
	"title":"Wall Street",
	"country":"USA",
	"year":1987,
	"persons":[1,2]
},

{
	"_id": 2,
	"title":"The American President",
	"country":"USA",
	"year":1995,
	"persons":[2]
}]
```
* Actors

```
[{
    "_id": 1,
    "name": "Charlie Sheen",
    "movies":[
    {"role": "Bud Fox", "movie_id":1}
    ]
},

{
    "_id": 2,
    "name": "Michael Douglas",
    "movies":[
    {"role": "Gordon Gekko", "movie_id":1},
    {"role": "President Andrew Shepherd", "movie_id":2}
    ]
}

]
```


##### GET the Persons Who Acted in Wall Street

```
var WallSt= db.movies.findOne({title:"Wall Street"});
db.persons.find({_id: {$in: WallSt.persons}}).toArray();
```

##### GET the Movies in which "Michael Douglas" has played a role
* The map function is the key here: it extracts the movie ids from the embedded role documents.

```
var Douglas = db.persons.findOne({name:"Michael Douglas"})
db.movies.find({_id:{$in : Douglas.movies.map(item=>item.movie_id)   }  }).pretty();
```
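
To see what the two shell queries compute, the same two-step lookup can be mimicked in plain Python over the example documents. This is an in-memory sketch, not a PyMongo call, and the function names are hypothetical.

```python
movies = [
    {"_id": 1, "title": "Wall Street", "persons": [1, 2]},
    {"_id": 2, "title": "The American President", "persons": [2]},
]
persons = [
    {"_id": 1, "name": "Charlie Sheen",
     "movies": [{"role": "Bud Fox", "movie_id": 1}]},
    {"_id": 2, "name": "Michael Douglas",
     "movies": [{"role": "Gordon Gekko", "movie_id": 1},
                {"role": "President Andrew Shepherd", "movie_id": 2}]},
]


def persons_in_movie(title):
    """Step 1: find the movie (findOne). Step 2: keep persons whose _id is referenced ($in)."""
    movie = next(m for m in movies if m["title"] == title)
    return [p["name"] for p in persons if p["_id"] in movie["persons"]]


def movies_of_person(name):
    """Step 1: find the person. Step 2: map role entries to movie ids, then match them."""
    person = next(p for p in persons if p["name"] == name)
    movie_ids = [item["movie_id"] for item in person["movies"]]  # the map() step
    return [m["title"] for m in movies if m["_id"] in movie_ids]


print(persons_in_movie("Wall Street"))
print(movies_of_person("Michael Douglas"))
```

Each query needs two round trips to the database, which is the price of storing references instead of embedding the full documents.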

## Querying our Data

### Get Persons in your Mongo DB

- find_one method is just one in a series of find statements that support querying MongoDB data.


In [24]:
firstActor = mydb.person.find_one()
print(firstActor)

{'_id': ObjectId('5f4b6bd76877049b11af80fc'), 'id': 1, 'name': 'Charlie Sheen'}


- find method : The find() method returns all occurrences in the selection.
- Notes:
    - The second parameter of the find() method is an object describing which fields to include in the result.

    - This parameter is optional, and if omitted, all fields will be included in the result.


In [30]:
allActors=mydb.person.find()
for actor in allActors:
    print(actor)

{'_id': ObjectId('5f4b6bd76877049b11af80fc'), 'id': 1, 'name': 'Charlie Sheen'}
{'_id': ObjectId('5f4b6e366877049b11af80fd'), 'id': 2, 'name': 'Michael Douglas'}
{'_id': ObjectId('5f4b6e366877049b11af80fe'), 'id': 3, 'name': 'Martin Sheen'}
{'_id': ObjectId('5f4b6e366877049b11af80ff'), 'id': 4, 'name': 'Morgan Freeman'}


#### Get persons whose names start with the letter 'C'

In [453]:
# use regex to get only 'Charlie Sheen'

#the query is an ordinary Python dict
query={ "name": {'$regex':'^C' } }

#then pass the query to the find() method
cActors =mydb.person.find(query)
for actor in cActors:
    pprint(actor)

{'_id': ObjectId('5f4be55a6877049b11af810a'), 'id': 1, 'name': 'Charlie Sheen'}
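
If the prefix comes from user input, it may be safer to escape it before building the regex filter. The sketch below assumes that Python's `re.escape` output is also acceptable to MongoDB's regex engine for ordinary text; `starts_with` is a hypothetical helper.

```python
import re


def starts_with(field, prefix):
    """Build a filter matching documents whose field starts with prefix."""
    # re.escape keeps characters such as '.' or '(' from acting as regex syntax.
    return {field: {"$regex": "^" + re.escape(prefix)}}


query = starts_with("name", "C")
print(query)  # {'name': {'$regex': '^C'}}
```

The returned dict can be passed to `find()` exactly like the hand-written query above.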


### Sorting the Results
#### Get All Movies , sorted from recent to old

In [68]:
moviesSorted=mydb.movie.find().sort([("year",-1)])
for movie in moviesSorted:
    print(movie)

{'_id': ObjectId('5f4b72ac6877049b11af8101'), 'id': 2, 'title': 'The American President', 'country': 'USA', 'year': 1995}
{'_id': ObjectId('5f4b72ac6877049b11af8102'), 'id': 3, 'title': 'The Shawshank Redemption', 'country': 'USA', 'year': 1994}
{'_id': ObjectId('5f4b72ac6877049b11af8100'), 'id': 1, 'title': 'Wall Street', 'country': 'USA', 'year': 1987}


In [84]:
#Projections get only the title and the year sorting by the year field
moviesSorted=mydb.movie.find({}, {"title":1,"year":1, "_id":0}).sort([("year",-1)])
for movie in moviesSorted:
    pprint(movie)

{'title': 'The American President', 'year': 1995}
{'title': 'The Shawshank Redemption', 'year': 1994}
{'title': 'Wall Street', 'year': 1987}


### Filtering the Results
#### Get All Movies released in the 90s (from 1990 up to, but not including, 2000) ordered from old to recent.

In [99]:
#use $gte and $lt for the year range, then sort ascending by year
ninetiesMovies= mydb.movie.find( {'year':{'$gte': 1990,'$lt':2000} }).sort("year", 1)
for movie in ninetiesMovies:
    print(movie)

{'_id': ObjectId('5f4b72ac6877049b11af8102'), 'id': 3, 'title': 'The Shawshank Redemption', 'country': 'USA', 'year': 1994}
{'_id': ObjectId('5f4b72ac6877049b11af8101'), 'id': 2, 'title': 'The American President', 'country': 'USA', 'year': 1995}


### Querying from multiple collections

#### Get Movies and Actors from your "moviedb" DB

##### Hint : use the '$lookup' operator

In [450]:
# x =mydb.movie.aggregate([{
    
#     '$lookup':{
#         "from":"roles",
#         "localField":"id",
#         "foreignField":"movieId",
#         "as": "actors_movies"
#     }}

# ])

# for z in x:
#     pprint(z)

In [309]:
#This is the one we need!
for z in mydb.roles.aggregate([
  {"$lookup":{
    "from":"person",
    "localField":"personId",
    "foreignField":"id",
    "as":"actor_roles"
  }},
    
    {"$lookup":{
        
    "from":"movie",
    "localField":"movieId",
    "foreignField":"id",
    "as":"movie_roles" 
        
    }},
    
{
    "$project":
    {
        "_id":0,
        "actor_roles.name":1,
        "movie_roles.title":1
    }
}

]) :
    print(z.get('actor_roles')[0].get('name'), ":", z.get('movie_roles')[0].get('title'))

Charlie Sheen : Wall Street
Michael Douglas : Wall Street
Martin Sheen : Wall Street
Michael Douglas : The American President
Martin Sheen : The American President
Morgan Freeman : The Shawshank Redemption
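
The pipeline above indexes `[0]` into `actor_roles` and `movie_roles` because `$lookup` always stores its matches as an array. A simplified pure-Python emulation, covering only equality matching on scalar fields (an assumption; the real stage handles more cases), makes this visible. The `lookup` function and the sample lists are illustrative.

```python
def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Attach to each local document the list of matching foreign documents."""
    joined = []
    for doc in local_docs:
        matches = [f for f in foreign_docs
                   if f.get(foreign_field) == doc.get(local_field)]
        joined.append({**doc, as_field: matches})  # always a list, possibly empty
    return joined


roles = [{"personId": 1, "movieId": 1}, {"personId": 4, "movieId": 3}]
people = [{"id": 1, "name": "Charlie Sheen"}, {"id": 4, "name": "Morgan Freeman"}]
films = [{"id": 1, "title": "Wall Street"},
         {"id": 3, "title": "The Shawshank Redemption"}]

step1 = lookup(roles, people, "personId", "id", "actor_roles")
step2 = lookup(step1, films, "movieId", "id", "movie_roles")
for row in step2:
    print(row["actor_roles"][0]["name"], ":", row["movie_roles"][0]["title"])
```

A role whose `personId` has no matching person would get an empty list, so indexing `[0]` on it would raise an `IndexError`.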


### Aggregations in MongoDB

#### Get count of "Movies" in your MongoDB

In [121]:
moviesCount=mydb.movie.count_documents({})  # Cursor.count() was removed in PyMongo 4
print("Movies Count: ",moviesCount)

Movies Count:  3


In [285]:
for z in mydb.person.aggregate([
  {"$lookup":{
    "from":"roles",
    "localField":"id",
    "foreignField":"personId",
    "as":"actor_roles"
  }},
  {"$project":{
    "_id":0,
    "name":1,
    "Count":{'$size':'$actor_roles'} #size here to get the size of the actor_roles array
  }}
]) :
    print(z)

{'name': 'Charlie Sheen', 'Count': 1}
{'name': 'Michael Douglas', 'Count': 2}
{'name': 'Martin Sheen', 'Count': 2}
{'name': 'Morgan Freeman', 'Count': 1}


#### In this DB, list the movies that each actor played in

In [286]:
for z in mydb.roles.aggregate([
  {"$lookup":{
    "from":"person",
    "localField":"personId",
    "foreignField":"id",
    "as":"actor_roles"
  }},
    
    {"$lookup":{
        
    "from":"movie",
    "localField":"movieId",
    "foreignField":"id",
    "as":"movie_roles" 
        
    }},
    
    {"$group":
    
    {"_id":"$actor_roles.name", "movies": {"$push":"$movie_roles.title"}}
    }

]) :
    print(z)

{'_id': ['Michael Douglas'], 'movies': [['Wall Street'], ['The American President']]}
{'_id': ['Morgan Freeman'], 'movies': [['The Shawshank Redemption']]}
{'_id': ['Martin Sheen'], 'movies': [['Wall Street'], ['The American President']]}
{'_id': ['Charlie Sheen'], 'movies': [['Wall Street']]}
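
The nested lists in this output come from grouping on array-valued `$lookup` results. One possible variant, sketched here without a recorded run, adds `$unwind` stages so the group key and the pushed titles become plain strings. The field names `actor` and `movie` and the function name are choices made for this sketch.

```python
def movies_per_actor_pipeline():
    """Build a pipeline that flattens the joined arrays before grouping."""
    return [
        {"$lookup": {"from": "person", "localField": "personId",
                     "foreignField": "id", "as": "actor"}},
        {"$lookup": {"from": "movie", "localField": "movieId",
                     "foreignField": "id", "as": "movie"}},
        {"$unwind": "$actor"},   # one-element array -> embedded document
        {"$unwind": "$movie"},
        {"$group": {"_id": "$actor.name",
                    "movies": {"$push": "$movie.title"}}},
    ]


pipeline = movies_per_actor_pipeline()
print([next(iter(stage)) for stage in pipeline])
# ['$lookup', '$lookup', '$unwind', '$unwind', '$group']
```

It would be run as `mydb.roles.aggregate(movies_per_actor_pipeline())`, in the same way as the pipeline above.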


### Updating MongoDB Data

* Hint: use the <code>$set</code> operator; if the key is not there it will be added, and if it exists it will be overwritten.
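
The add-or-overwrite behaviour described in the hint can be pictured with a plain dictionary update. This is a simplified model for top-level keys only; `apply_set` and the `language` field are hypothetical.

```python
def apply_set(document, changes):
    """Return a copy of document with the keys in changes added or overwritten."""
    updated = dict(document)
    updated.update(changes)
    return updated


movie = {"title": "Wall Street", "year": 1987}
print(apply_set(movie, {"year": 2000}))           # existing key is overwritten
print(apply_set(movie, {"language": "English"}))  # missing key is added
```

In MongoDB the change is applied on the server by `update_one` or `update_many`, and keys that `$set` does not mention stay untouched.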

In [None]:
# Show the Movie before Updating 

print("'Wall Street' movie  Before updating the document: ")
wallStreet=mydb.movie.find_one({"title":"Wall Street"})
pprint(wallStreet)

#update the year the wallstreet movie was released in 
updatedWallStreet = mydb.movie.update_one({'_id' : wallStreet.get('_id') }, {'$set': {'year': 2000}})

print("\n 'Wall Street' movie  After updating the document: ")
wallStreet=mydb.movie.find_one({"title":"Wall Street"})
pprint(wallStreet)


### Delete Documents From a Collection
- Like the other commands discussed so far, delete_one and delete_many take a query that matches the document(s) to delete as the first parameter.

- For example, to delete all documents in the person collection whose name starts with the letter 'M', issue the following:

In [247]:
#This will delete all the persons whose names start with the letter 'M'.
mydb.person.delete_many({'name': {"$regex":"^M"} })
for x in mydb.person.find():
    print(x)

{'_id': ObjectId('5f4b6bd76877049b11af80fc'), 'id': 1, 'name': 'Charlie Sheen'}


In [360]:
# db=client.business
# # Issue the serverStatus command and print the results
# serverStatusResult=db.command("serverStatus")
# pprint(serverStatusResult)

### Exercise
* Create a new MongoDB database with the name 'movrevs'
* Use the following code to generate the documents for a new collection with the name 'reviews'
    * every random review has the following structure:
        * <b>title</b>
        * <b>rating (from 1 to 5)</b>
        * <b>producer</b>
In [420]:
titles = ['Killers','Breaking','Bright Places','Lost','King','Game','State', 'Tears', 'Big','City','Dark Night', 'Rise','Redemption', 'Bad','Guys','Lazy', 'Thrones']
company_type = ['Comombia','Western Movie','Fox Inc','Rise Corporation','Universal Pictures','20th Century Studios']

movReviews=[]

for x in range(1, 501):
    movrev = {
        'title' : titles[randint(0, (len(titles)-1))] + ' ' + titles[randint(0, (len(titles)-1))],
        'rating' : randint(1, 5),
        'producer' :company_type[randint(0, (len(company_type)-1))]
    }
    
    print(movrev)
    movReviews.append(movrev)

#print(len(movReviews))
#print(movReviews[1])

{'title': 'Breaking Lost', 'rating': 1, 'producer': 'Universal Pictures'}
{'title': 'Big Big', 'rating': 5, 'producer': 'Fox Inc'}
{'title': 'City State', 'rating': 4, 'producer': 'Universal Pictures'}
{'title': 'Breaking State', 'rating': 3, 'producer': '20th Century Studios'}
{'title': 'Lazy Killers', 'rating': 4, 'producer': 'Fox Inc'}
{'title': 'Rise Rise', 'rating': 2, 'producer': 'Universal Pictures'}
{'title': 'Redemption Guys', 'rating': 1, 'producer': 'Fox Inc'}
{'title': 'King Killers', 'rating': 5, 'producer': 'Comombia'}
{'title': 'Dark Night Rise', 'rating': 1, 'producer': '20th Century Studios'}
{'title': 'Rise Redemption', 'rating': 4, 'producer': 'Rise Corporation'}
{'title': 'Rise Tears', 'rating': 1, 'producer': 'Universal Pictures'}
{'title': 'Bad Game', 'rating': 5, 'producer': 'Universal Pictures'}
{'title': 'Killers Guys', 'rating': 4, 'producer': 'Rise Corporation'}
{'title': 'Bright Places Bad', 'rating': 3, 'producer': 'Universal Pictures'}
{'title': 'Dark Nigh

##### Create the DB and the collection, and insert the documents into this collection.

In [None]:
#Step 1: Connect to MongoDB - Note: Change connection string as needed
client = MongoClient(port=27017)

#Step 2: Get a handle to the DB 'movrevs' (it is created on the first insert)
movrevsDb=client.movrevs

for idx, movrev in enumerate(movReviews, start=1):
    #Step 3: Insert the review document directly into MongoDB via insert_one
    result=movrevsDb.reviews.insert_one(movrev)
    #Step 4: Print to the console the ObjectID of the new document
    print('Created {0} of 500 as {1}'.format(idx,result.inserted_id))

#Step 5: Tell us that you are done
print('finished creating 500 movie reviews')
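
Inserting 500 documents one at a time costs one round trip per document. A possible alternative is to send them in batches with `insert_many`. The helper names and the batch size below are assumptions made for this sketch.

```python
def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def insert_in_batches(collection, documents, batch_size=100):
    """Insert documents batch by batch and return how many were inserted."""
    inserted = 0
    for batch in chunked(documents, batch_size):
        result = collection.insert_many(batch)
        inserted += len(result.inserted_ids)
    return inserted


sizes = [len(batch) for batch in chunked(list(range(500)), 200)]
print(sizes)  # [200, 200, 100]
```

With the objects from this notebook, the call would look like `insert_in_batches(movrevsDb.reviews, movReviews)`.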

In [379]:
# #Step 1: Connect to MongoDB - Note: Change connection string as needed
# client = MongoClient(port=27017)

# db=client.business
# #Step 2: Create sample data
# names = ['Kitchen','Animal','State', 'Tastey', 'Big','City','Fish', 'Pizza','Goat', 'Salty','Sandwich','Lazy', 'Fun']
# company_type = ['LLC','Inc','Company','Corporation']
# company_cuisine = ['Pizza', 'Bar Food', 'Fast Food', 'Italian', 'Mexican', 'American', 'Sushi Bar', 'Vegetarian']
# for x in range(1, 501):
#     business = {
#         'name' : names[randint(0, (len(names)-1))] + ' ' + names[randint(0, (len(names)-1))]  + ' ' + company_type[randint(0, (len(company_type)-1))],
#         'rating' : randint(1, 5),
#         'cuisine' : company_cuisine[randint(0, (len(company_cuisine)-1))] 
#     }
#     #Step 3: Insert business object directly into MongoDB via isnert_one
#     result=db.reviews.insert_one(business)
#     #Step 4: Print to the console the ObjectID of the new document
#     print('Created {0} of 500 as {1}'.format(x,result.inserted_id))
# #Step 5: Tell us that you are done
# print('finished creating 500 business reviews')

#### Find the first occurrence of a review with a rating of 5

In [380]:
fivestarFirst = movrevsDb.reviews.find_one({'rating': 5})
print(fivestarFirst)

{'_id': ObjectId('5f4cc0f76877049b11af8310'), 'title': 'Bad Redemption', 'rating': 5, 'producer': 'Western Movie'}


#### Check if there is a movie with the title 'Bad Redemption'

In [449]:

x= movrevsDb.reviews.find_one({'title': 'Bad Redemption' },{'_id': 0})
print(x)

{'title': 'Bad Redemption', 'rating': 5, 'producer': '20th Century Studios'}


#### Get the count of reviews with a rating of 5

In [422]:
fivestarcount = movrevsDb.reviews.count_documents({'rating': 5})  # Cursor.count() was removed in PyMongo 4
print(fivestarcount)

106


#### Get the count of each rating across all data, grouped by rating and sorted by rating

In [386]:
print('\nThe count of each rating across all data, grouped by rating ')
stargroup=movrevsDb.reviews.aggregate(
# The aggregation pipeline is defined as a list of stages
[
# The first stage groups the documents by rating and counts each group
{ '$group':
    { '_id': "$rating",
     "count" : 
                 { '$sum' :1 }
    }
},
# The second stage sorts the groups by rating (ascending)
{"$sort":  { "_id":1}
}
# Close the list of stages
] )

# Print the result
for group in stargroup:
    print(group)


The count of each rating across all data, grouped by rating 
{'_id': 2, 'count': 102}
{'_id': 3, 'count': 109}
{'_id': 4, 'count': 95}
{'_id': 5, 'count': 106}
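
For intuition, the `$group` with `$sum: 1` followed by `$sort` corresponds roughly to counting with `collections.Counter` on the client side. The function and sample data below are illustrative and are not taken from the generated reviews.

```python
from collections import Counter


def count_by_rating(reviews):
    """Count reviews per rating and return the groups sorted by rating."""
    counts = Counter(review["rating"] for review in reviews)
    return [{"_id": rating, "count": counts[rating]} for rating in sorted(counts)]


sample = [{"rating": 5}, {"rating": 2}, {"rating": 5}, {"rating": 3}]
print(count_by_rating(sample))
# [{'_id': 2, 'count': 1}, {'_id': 3, 'count': 1}, {'_id': 5, 'count': 2}]
```

The aggregation pipeline does the same work inside the server, so only the grouped result travels over the network.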


#### Get the number of movies by producer and rating

In [444]:
for x in movrevsDb.reviews.aggregate([
    
    { '$group':
    { '_id': {
        'producer':"$producer",
        'rating':"$rating"
    },
     "countofMovies" : { '$sum' :1 }
    }}
    
]):
    print('[ {0} WITH {1}] -> {2}'.format(x.get('_id').get('producer'), x.get('_id').get('rating'), x.get ('countofMovies')))

[ 20th Century Studios WITH 5] -> 1
[ Comombia WITH 3] -> 22
[ Comombia WITH 2] -> 18
[ Western Movie WITH 5] -> 23
[ Western Movie WITH 3] -> 22
[ Comombia WITH 4] -> 21
[ Western Movie WITH 4] -> 17
[ Rise Corporation WITH 3] -> 16
[ Fox Inc WITH 3] -> 26
[ Universal Pictures WITH 3] -> 23
[ Rise Corporation WITH 2] -> 23
[ Comombia WITH 5] -> 24
[ Universal Pictures WITH 5] -> 23
[ Fox Inc WITH 4] -> 10
[ Fox Inc WITH 5] -> 14
[ Rise Corporation WITH 4] -> 26
[ Universal Pictures WITH 2] -> 21
[ Western Movie WITH 2] -> 16
[ Fox Inc WITH 2] -> 24
[ Universal Pictures WITH 4] -> 21
[ Rise Corporation WITH 5] -> 21


#### Get the average rating of the movies produced by each company, sorted by the average rating (highest first)

In [429]:
for x in movrevsDb.reviews.aggregate([
    
    { '$group':
        { '_id': "$producer",
         "avgRating" : 
                     { '$avg' :"$rating" }
        }},
    
    {"$sort":  { "avgRating":-1}
    }
    
]):
    print(x)

{'_id': '20th Century Studios', 'avgRating': 5.0}
{'_id': 'Western Movie', 'avgRating': 3.6025641025641026}
{'_id': 'Comombia', 'avgRating': 3.6}
{'_id': 'Rise Corporation', 'avgRating': 3.5232558139534884}
{'_id': 'Universal Pictures', 'avgRating': 3.522727272727273}
{'_id': 'Fox Inc', 'avgRating': 3.189189189189189}


#### Update the first occurrence of a review with a rating of 5: change the "producer" to '20th Century Studios'

In [383]:
result = movrevsDb.reviews.update_one({'_id' : fivestarFirst.get('_id') }, {'$set': {'producer': '20th Century Studios'}})
print('Number of documents modified : ' + str(result.modified_count))

print("\n after update:\n",movrevsDb.reviews.find_one({'_id' : fivestarFirst.get('_id') }))

Number of documents modified : 1

 after update:
 {'_id': ObjectId('5f4cc0f76877049b11af8310'), 'title': 'Bad Redemption', 'rating': 5, 'producer': '20th Century Studios'}


Notice that <code>$set</code> would also let us add a field that the original document did not have. This ability to add keys dynamically, without the hassle of costly ALTER TABLE statements, is the power of MongoDB's flexible data model. It makes rapid application development a reality.

##### Delete all movie documents in the 'reviews' collection where the rating is less than 2

In [385]:
lowRatedMovies = movrevsDb.reviews.count_documents({'rating': {'$lt':2} })  # Cursor.count() was removed in PyMongo 4
print(lowRatedMovies)

# This will delete all the documents with a rating less than 2
result = movrevsDb.reviews.delete_many({'rating': {'$lt':2}})
print(result.deleted_count)

88
88


#### Drop the 'reviews' collection from the 'movrevs' database

In [334]:
movrevsDb.reviews.drop()