# NoSQL: Basic queries on non-relational databases

## NoSQL: Review

A NoSQL database is designed to handle data that are not structured in tabular relations. The data can be stored in key-value pairs (like Python dictionaries), documents (JSON), or graphs.

<img src="../assets/nosql_vs_sql.jpeg" width="500"></img>

Unlike relational (SQL) databases, NoSQL databases do not necessarily use tables and rows. They are also more flexible, because the structure of the entries is not predefined: you can add data without declaring anything first (not even the table itself).

[This article](https://www.integrate.io/blog/the-sql-vs-nosql-difference/) summarizes the main differences between SQL and NoSQL database systems:

|SQL|NoSQL|
|---|---|
|SQL databases are relational.|NoSQL databases are non-relational. SQL-style `JOIN`s between collections are generally not available (MongoDB offers a limited equivalent, `$lookup`, in aggregation pipelines).|
|SQL databases are table-based |NoSQL databases are document, key-value, graph, or wide-column stores.|
|SQL databases use structured query language and have a predefined schema.| NoSQL databases have dynamic schemas for unstructured data.|
|SQL databases are vertically scalable (by adding processing power)|NoSQL databases are horizontally scalable (by adding servers/machines).|
|SQL databases are better for multi-row transactions|NoSQL is better for unstructured data like documents or JSON.|

The image below shows examples of popular SQL and NoSQL databases.

<img src="../assets/popular_examples_nosql_sql.jpeg" width="500"></img>

In the next section, we'll cover an example using MongoDB. You will see that we can translate many of the basic SQL queries you know.

## MongoDB

<img src="https://upload.wikimedia.org/wikipedia/fr/thumb/4/45/MongoDB-Logo.svg/527px-MongoDB-Logo.svg.png" />

Many companies provide NoSQL databases. One of the most popular is [MongoDB](https://www.mongodb.com/). A MongoDB database contains collections (the equivalent of tables) of documents (the equivalent of rows). Documents are stored in a JSON-like format (BSON), which is very convenient to handle with Python!

The query syntax comes from JavaScript. In Python, queries basically look like dictionaries.

Before diving into the exercises, have a look at [this quick intro](https://www.mongodb.com/docs/manual/tutorial/query-documents/) to MongoDB queries.
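
To make the "queries look like dictionaries" idea concrete, here is a minimal sketch of three filter documents with their SQL counterparts in comments. `$gte` and `$in` are standard MongoDB query operators; the variable names and the `matches` helper are hypothetical. `matches` is only a toy evaluator written for this illustration, and it is not how MongoDB evaluates queries.

```python
from datetime import datetime

# Filter documents as they would be passed to find() / find_one().
born_1950_or_later = {"birth_date": {"$gte": datetime(1950, 1, 1)}}  # WHERE birth_date >= '1950-01-01'
french_or_belgian = {"country": {"$in": ["fr", "be"]}}              # WHERE country IN ('fr', 'be')
both = {**born_1950_or_later, **french_or_belgian}                   # conditions on several fields are ANDed


def matches(doc, query):
    """Toy evaluator for the three forms used above: equality, $gte and $in."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            for op, operand in condition.items():
                if op == "$gte":
                    ok = value is not None and value >= operand
                elif op == "$in":
                    ok = value in operand
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
        elif value != condition:
            return False
    return True


leader = {"first_name": "Guy", "country": "be", "birth_date": datetime(1953, 4, 11)}
print(matches(leader, both))               # True
print(matches(leader, {"country": "fr"}))  # False
```

With a live connection, the same dictionaries would be passed unchanged, for example `db["leaders"].find(both)`.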

### Creating the database

We have created and filled a MongoDB database for you. You probably know the data already: it is the list of country leaders that you used in the Wikipedia project.

You can set it up by deploying the Docker image we have pre-built:

In [2]:
# You can run it from this notebook with:
!docker-compose up -d

# Or in your terminal with:
# docker-compose up -d

 Container mongodb  Running
 Container 02-nosql-mongo-seed-1  Created
 Container 02-nosql-mongo-seed-1  Starting
 Container 02-nosql-mongo-seed-1  Started


### Connecting to the database

To use MongoDB from Python, you need to install the `pymongo` library.

In [3]:
from pymongo import MongoClient

# Creation of a MongoDB Client (by giving the host and the port)
client = MongoClient(host="localhost", port=27017)

# Instantiation of the database
db = client["becode"]

# Let's see which collections are in the database
db.list_collection_names()

['leaders']

### Basic Queries

**1. Show the first leader of the `leaders` collection by using the `find_one` method**

The corresponding `SQL` query would be:

```sql
SELECT * FROM leaders LIMIT 1;
```

In [4]:
db["leaders"].find_one()

{'_id': ObjectId('66bc9c5657b75da88ded5ec6'),
 'id': 'Q7747',
 'first_name': 'Vladimir',
 'last_name': 'Putin',
 'birth_date': datetime.datetime(1952, 10, 7, 0, 0),
 'death_date': None,
 'place_of_birth': 'Saint Petersburg',
 'wikipedia_url': 'https://ru.wikipedia.org/wiki/%D0%9F%D1%83%D1%82%D0%B8%D0%BD,_%D0%92%D0%BB%D0%B0%D0%B4%D0%B8%D0%BC%D0%B8%D1%80_%D0%92%D0%BB%D0%B0%D0%B4%D0%B8%D0%BC%D0%B8%D1%80%D0%BE%D0%B2%D0%B8%D1%87',
 'start_mandate': datetime.datetime(2000, 5, 7, 0, 0),
 'end_mandate': datetime.datetime(2008, 5, 7, 0, 0),
 'country': 'ru'}

**2. Show the first leader of the collection whose country is Belgium**

To do so, we pass a query as the first parameter of the `find_one` method. The query is formatted as a dictionary.

In SQL it would be like adding a `WHERE` condition to the query:

```sql
SELECT * FROM leaders WHERE country = 'be' LIMIT 1;
```

In [5]:
db["leaders"].find_one({"country": "be"})

{'_id': ObjectId('66bc9c5657b75da88ded5ef5'),
 'id': 'Q12978',
 'first_name': 'Guy',
 'last_name': 'Verhofstadt',
 'birth_date': datetime.datetime(1953, 4, 11, 0, 0),
 'death_date': None,
 'place_of_birth': 'Dendermonde',
 'wikipedia_url': 'https://nl.wikipedia.org/wiki/Guy_Verhofstadt',
 'start_mandate': datetime.datetime(1999, 7, 12, 0, 0),
 'end_mandate': datetime.datetime(2008, 3, 20, 0, 0),
 'country': 'be'}

**3. Select some fields to display**

Let's run the same query but display only the `first_name` and the `last_name` of the leader. This corresponds to listing columns after `SELECT` in SQL.

We will pass a [projection](https://www.mongodb.com/docs/manual/tutorial/project-fields-from-query-results/) as the second parameter of the method. It is also formatted as a dictionary: each key names a field, and the value is `1` if we want that field to be displayed. Note that `_id` is returned by default unless it is explicitly excluded.

The corresponding SQL query would be:

```sql
SELECT first_name, last_name FROM leaders WHERE country = 'be' LIMIT 1;
```

In [6]:
db["leaders"].find_one({"country": "be"}, {"first_name": 1, "last_name": 1})

{'_id': ObjectId('66bc9c5657b75da88ded5ef5'),
 'first_name': 'Guy',
 'last_name': 'Verhofstadt'}

We can also decide not to display a field. In that case, we use `0` as the value in the dictionary.

In [7]:
db["leaders"].find_one({"country": "be"}, {"wikipedia_url": 0, "id": 0})

{'_id': ObjectId('66bc9c5657b75da88ded5ef5'),
 'first_name': 'Guy',
 'last_name': 'Verhofstadt',
 'birth_date': datetime.datetime(1953, 4, 11, 0, 0),
 'death_date': None,
 'place_of_birth': 'Dendermonde',
 'start_mandate': datetime.datetime(1999, 7, 12, 0, 0),
 'end_mandate': datetime.datetime(2008, 3, 20, 0, 0),
 'country': 'be'}
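
Both outputs above still contain `_id`, because MongoDB returns it unless the projection excludes it. The sketch below asks for the two name fields and drops `_id` explicitly. Apart from `_id`, a projection cannot mix inclusions (`1`) and exclusions (`0`). The helper name `leader_names` is hypothetical, and `collection` is assumed to be a PyMongo collection such as `db["leaders"]`.

```python
def leader_names(collection, country):
    """Return 'First Last' strings for every leader of one country.

    `collection` is assumed to be a PyMongo collection such as db["leaders"].
    """
    # Include the two name fields and explicitly drop the default `_id`.
    projection = {"_id": 0, "first_name": 1, "last_name": 1}
    cursor = collection.find({"country": country}, projection)
    return [f"{doc.get('first_name')} {doc.get('last_name')}" for doc in cursor]
```

Usage would look like `leader_names(db["leaders"], "be")`.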

**4. Find the distinct countries**

The SQL equivalent:

```sql
SELECT DISTINCT country FROM leaders;
```

In [8]:
db["leaders"].find().distinct("country")

['be', 'fr', 'ma', 'ru', 'us']

**5. Find all the leaders who are still alive**

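We can assume that they have no `death_date`, right? Note that the filter `{"death_date": None}` matches documents where the field is `null` as well as documents where the field is missing altogether.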

In [9]:
db["leaders"].find({"death_date": None}, {"last_name":1, "first_name":1, "country":1})

<pymongo.cursor.Cursor at 0x2a9e5a269c0>

As you can see, the `find` method returns a cursor. No worries: we can process it with a simple Python loop!

In [10]:
for leader in db["leaders"].find({"death_date": None}, {"last_name":1, "first_name":1, "country":1}):
    print(f"{leader['first_name']} {leader['last_name']} ({leader['country']})")

Vladimir Putin (ru)
Mohammed None (ma)
Barack Obama (us)
George Bush (us)
Joe Biden (us)
Bill Clinton (us)
Mohammed None (ma)
Dmitry Medvedev (ru)
Mohammed None (ma)
Donald Trump (us)
Jimmy Carter (us)
Guy Verhofstadt (be)
Yves Leterme (be)
Herman Van Rompaey (be)
Mark Eyskens (be)
Elio Di Rupo (be)
Charles Michel (be)
Sophie Wilmès (be)
François Hollande (fr)
Alexander De Croo (be)
Nicolas Sarkozy (fr)
Emmanuel Macron (fr)
Dmitry Medvedev (ru)
Mohammed None (ma)
Barack Obama (us)
Bill Clinton (us)
George Bush (us)
Joe Biden (us)
Mohammed None (ma)
Vladimir Putin (ru)
Mohammed None (ma)
Donald Trump (us)
Guy Verhofstadt (be)
Yves Leterme (be)
Herman Van Rompaey (be)
Jimmy Carter (us)
Elio Di Rupo (be)
Alexander De Croo (be)
Mark Eyskens (be)
Charles Michel (be)
François Hollande (fr)
Sophie Wilmès (be)
Nicolas Sarkozy (fr)
Emmanuel Macron (fr)
Mohammed None (ma)
Barack Obama (us)
George Bush (us)
Bill Clinton (us)
Mohammed None (ma)
Joe Biden (us)
Vladimir Putin (ru)
Mohammed None (ma)
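
The same names appear several times in the output above, which suggests that the collection holds several copies of each leader (possibly because the seed container ran more than once, but that cause is an assumption). A minimal sketch for removing such copies on the Python side, assuming the `id` field identifies a leader; the helper name `unique_by_key` is hypothetical.

```python
def unique_by_key(docs, key="id"):
    """Keep the first document seen for each value of `key`, preserving order.

    Documents that lack `key` are always kept, since they cannot be compared.
    """
    seen = set()
    unique = []
    for doc in docs:
        marker = doc.get(key)
        if marker is not None and marker in seen:
            continue  # a copy of a leader we already kept
        seen.add(marker)
        unique.append(doc)
    return unique


sample = [
    {"id": "Q7747", "first_name": "Vladimir", "last_name": "Putin"},
    {"id": "Q12978", "first_name": "Guy", "last_name": "Verhofstadt"},
    {"id": "Q7747", "first_name": "Vladimir", "last_name": "Putin"},
]
for leader in unique_by_key(sample):
    print(leader["first_name"], leader["last_name"])
```

The function accepts any iterable, so a cursor can be passed directly, as in `unique_by_key(db["leaders"].find({"death_date": None}))`. Keep in mind that a cursor is consumed as it is iterated; wrap it in `list(...)` first if the documents are needed more than once.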

**6. Let's now insert the leader of tomorrow: you?**

As you know, MongoDB is flexible: you can add entries even if some fields are missing. Let's give it a try:

In [11]:
you = {
    'first_name': 'ADD HERE YOUR FIRST NAME',
    'last_name': 'ADD HERE YOUR LAST NAME',
    'birth_date': 'ADD HERE YOUR BIRTH DATE',
    'country': 'ADD HERE YOUR COUNTRY CODE'
}
db["leaders"].insert_one(you)

InsertOneResult(ObjectId('66be43da4ec6f98db27e563e'), acknowledged=True)

In [12]:
me = {
    'first_name' : 'Mehmet',
    'last_name' : 'Batar',
    'birth_date' : '22.12.1991',
    'country' : 'tr'
}
db['leaders'].insert_one(me)

InsertOneResult(ObjectId('66be43da4ec6f98db27e563f'), acknowledged=True)

Let's have a look at your data!

In [13]:
db["leaders"].find_one({"first_name": "ADD HERE YOUR FIRST NAME"})

{'_id': ObjectId('66be43da4ec6f98db27e563e'),
 'first_name': 'ADD HERE YOUR FIRST NAME',
 'last_name': 'ADD HERE YOUR LAST NAME',
 'birth_date': 'ADD HERE YOUR BIRTH DATE',
 'country': 'ADD HERE YOUR COUNTRY CODE'}

In [14]:
db['leaders'].find_one({'first_name': 'Mehmet'})

{'_id': ObjectId('66be43da4ec6f98db27e563f'),
 'first_name': 'Mehmet',
 'last_name': 'Batar',
 'birth_date': '22.12.1991',
 'country': 'tr'}

We can observe two things here:
- An `_id` field has been added automatically by MongoDB. By default it is an `ObjectId`, which starts with a creation timestamp, so newer documents generally get higher values than older ones (it is not a strict auto-increment counter).
- Some fields are missing (`place_of_birth`, for instance). This is a typical property of NoSQL: apart from `_id`, no field is mandatory!

We can look for documents that lack a field with the `$exists` operator, using `{"$exists": False}`:

In [15]:
for leader in db["leaders"].find({"place_of_birth":{"$exists":False}}):
    print(leader)

{'_id': ObjectId('66be43da4ec6f98db27e563e'), 'first_name': 'ADD HERE YOUR FIRST NAME', 'last_name': 'ADD HERE YOUR LAST NAME', 'birth_date': 'ADD HERE YOUR BIRTH DATE', 'country': 'ADD HERE YOUR COUNTRY CODE'}
{'_id': ObjectId('66be43da4ec6f98db27e563f'), 'first_name': 'Mehmet', 'last_name': 'Batar', 'birth_date': '22.12.1991', 'country': 'tr'}


### Update data

Since your place of birth is missing from the data, let's add it now. The `update_one` method has two main arguments:
- a query that selects the entry to update (only the first match is modified)
- an update operation. As always, it is formatted as a dictionary

In [16]:
db["leaders"].update_one({"first_name": "ADD HERE YOUR FIRST NAME"}, {"$set": {"place_of_birth": "ADD HERE YOUR PLACE OF BIRTH"}})

UpdateResult({'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}, acknowledged=True)

In [17]:
db['leaders'].update_one({'first_name': 'Mehmet'}, {"$set": {'place_of_birth': 'Izmir'}})

UpdateResult({'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}, acknowledged=True)
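
`update_one` stops at the first match. When several documents should change at once, `update_many` takes the same two arguments. The sketch below also reads `matched_count` and `modified_count` from the returned `UpdateResult`, which is easier to work with than the printed representation. The helper name `set_place_of_birth` is hypothetical, and `collection` is assumed to be a PyMongo collection.

```python
def set_place_of_birth(collection, first_name, place):
    """Set place_of_birth on every document with the given first name.

    Returns (matched, modified) so the caller can tell "nothing matched"
    apart from "matched but already up to date".
    """
    result = collection.update_many(
        {"first_name": first_name},
        {"$set": {"place_of_birth": place}},
    )
    return result.matched_count, result.modified_count
```

For example, `set_place_of_birth(db["leaders"], "Mehmet", "Izmir")` would return `(0, 0)` if no such document exists.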

### Remove data

Your dream is over ;-) Since you will not be a leader of tomorrow, we will remove you from the collection. The `delete_one` (or `delete_many`) method has one main argument: the query that selects the entries to remove.

In [18]:
db["leaders"].delete_one({"first_name": "ADD HERE YOUR FIRST NAME"})

DeleteResult({'n': 1, 'ok': 1.0}, acknowledged=True)

In [19]:
db['leaders'].delete_one({'first_name': 'Mehmet'})

DeleteResult({'n': 1, 'ok': 1.0}, acknowledged=True)

## Your Turn!

Based on what you have learned and some searching online, try to write the following queries:

- Remove the leaders who have an empty or null (`None`) last name
- Display all unique first names from the collection
- Convert all the dates of the dataset into datetime objects (they are currently strings, which is not good practice). You can use a Python script that interacts with the DB instead of doing everything in a single query
- Display the 10 oldest leaders ordered by their birth date (search for how to sort and limit results in MongoDB)
- Create a Python script that computes the number of leaders by country
- Do the same by using a MongoDB [aggregation pipeline](https://www.mongodb.com/docs/manual/aggregation/)

# ******************************************************************** 

- Remove the leaders who have an empty or null (`None`) last name


In [20]:
for leader in db['leaders'].find().limit(5):
    print(leader)

{'_id': ObjectId('66bc9c5657b75da88ded5ec6'), 'id': 'Q7747', 'first_name': 'Vladimir', 'last_name': 'Putin', 'birth_date': datetime.datetime(1952, 10, 7, 0, 0), 'death_date': None, 'place_of_birth': 'Saint Petersburg', 'wikipedia_url': 'https://ru.wikipedia.org/wiki/%D0%9F%D1%83%D1%82%D0%B8%D0%BD,_%D0%92%D0%BB%D0%B0%D0%B4%D0%B8%D0%BC%D0%B8%D1%80_%D0%92%D0%BB%D0%B0%D0%B4%D0%B8%D0%BC%D0%B8%D1%80%D0%BE%D0%B2%D0%B8%D1%87', 'start_mandate': datetime.datetime(2000, 5, 7, 0, 0), 'end_mandate': datetime.datetime(2008, 5, 7, 0, 0), 'country': 'ru'}
{'_id': ObjectId('66bc9c5657b75da88ded5ec7'), 'id': 'Q1951892', 'first_name': 'Mohammed', 'last_name': 'None', 'birth_date': None, 'death_date': None, 'place_of_birth': 'None', 'wikipedia_url': 'https://ar.wikipedia.org/wiki/%D9%85%D8%AD%D9%85%D8%AF_%D8%A7%D9%84%D8%B4%D9%8A%D8%AE', 'start_mandate': datetime.datetime(1544, 1, 1, 0, 0), 'end_mandate': datetime.datetime(1554, 1, 1, 0, 0), 'country': 'ma'}
{'_id': ObjectId('66bc9c5657b75da88ded5ec8'), 'i

In [21]:
leader = db['leaders'].find_one()
print(leader)

{'_id': ObjectId('66bc9c5657b75da88ded5ec6'), 'id': 'Q7747', 'first_name': 'Vladimir', 'last_name': 'Putin', 'birth_date': datetime.datetime(1952, 10, 7, 0, 0), 'death_date': None, 'place_of_birth': 'Saint Petersburg', 'wikipedia_url': 'https://ru.wikipedia.org/wiki/%D0%9F%D1%83%D1%82%D0%B8%D0%BD,_%D0%92%D0%BB%D0%B0%D0%B4%D0%B8%D0%BC%D0%B8%D1%80_%D0%92%D0%BB%D0%B0%D0%B4%D0%B8%D0%BC%D0%B8%D1%80%D0%BE%D0%B2%D0%B8%D1%87', 'start_mandate': datetime.datetime(2000, 5, 7, 0, 0), 'end_mandate': datetime.datetime(2008, 5, 7, 0, 0), 'country': 'ru'}


In [22]:
db['leaders'].delete_many({'last_name': None})



DeleteResult({'n': 0, 'ok': 1.0}, acknowledged=True)

In [23]:
count = db['leaders'].count_documents({})
print(f"Total number of documents: {count}")

Total number of documents: 411
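
The `delete_many` call above reports `'n': 0`, so nothing was removed. The documents printed earlier store the last name as the string `'None'` rather than a real null, which `{'last_name': None}` does not match. A minimal sketch that covers the three cases (null, empty string, and the literal text `'None'`); the helper names are hypothetical.

```python
def nameless_filter():
    """Match a null last name, an empty string, or the literal text 'None'."""
    return {"last_name": {"$in": [None, "", "None"]}}


def remove_nameless(collection):
    """Delete the matching documents and return how many were removed."""
    result = collection.delete_many(nameless_filter())
    return result.deleted_count
```

Running `db["leaders"].count_documents(nameless_filter())` first shows how many documents would be deleted before anything is actually removed.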


- Display all unique first names from the collection

In [24]:
unique_first_names = db['leaders'].distinct('first_name')
print("Unique First Names: ")
for name in unique_first_names:
    print(name)


Unique First Names: 
Abraham
Achille
Adolphe
Alain
Albert
Alexander
Alexandre
Aloïs
Andrew
Auguste
Barack
Barthélemy
Benjamin
Bill
Boris
Camille
Charles
Chester
Clément
Dmitry
Donald
Dwight
Edmond
Elio
Emmanuel
Felix
Franklin
Frans
François
Frédéric
Félix
Gaston
George
Georges
Gerald
Guy
Gérard
Harry
Hassan
Henri
Henry
Herbert
Herman
Hubert
Jacques
James
Jean
Jean-Baptiste
Jean-Luc
Jimmy
Joe
John
Joseph
Jules
Leo
Louis
Lyndon
Léon
Marie
Mark
Martin
Millard
Mohammed
Napoléon
Nicolas
Patrice
Paul
Paul-Henri
Paul-Émile
Pierre
Pieter
Prosper
Raymond
René
Richard
Ronald
Rutherford
Sophie
Stephen
Sylvain
Theodore
Thomas
Théodore
Ulysses
Valéry
Vincent
Vladimir
Walthère
Warren
Wilfried
William
Woodrow
Yves
Zachary
Émile
Étienne



- Convert all the dates of the dataset into datetime objects (they are currently strings, which is not good practice). You can use a Python script that interacts with the DB instead of doing everything in a single query


In [25]:
sample_document = db['leaders'].find_one()

for field, value in sample_document.items():
    print(f"Field: {field}, Type: {type(value)}")

Field: _id, Type: <class 'bson.objectid.ObjectId'>
Field: id, Type: <class 'str'>
Field: first_name, Type: <class 'str'>
Field: last_name, Type: <class 'str'>
Field: birth_date, Type: <class 'datetime.datetime'>
Field: death_date, Type: <class 'NoneType'>
Field: place_of_birth, Type: <class 'str'>
Field: wikipedia_url, Type: <class 'str'>
Field: start_mandate, Type: <class 'datetime.datetime'>
Field: end_mandate, Type: <class 'datetime.datetime'>
Field: country, Type: <class 'str'>


In [26]:
from datetime import datetime

def convert_date(date_str):
    if date_str:
        try:
            return datetime.strptime(date_str, '%Y-%m-%d')
        except ValueError:
            print(f"Date format error for value: {date_str}")
            return None
    return None

In [27]:

for doc in db['leaders'].find():
    birth_date = convert_date(doc.get('birth_date', ''))
    death_date = convert_date(doc.get('death_date', ''))
    start_mandate = convert_date(doc.get('start_mandate', ''))
    end_mandate = convert_date(doc.get('end_mandate', ''))

    update_query = {
        'birth_date' : birth_date,
        'death_date' : death_date,
        'start_mandate': start_mandate,
        'end_mandate' : end_mandate 
    }

    db['leaders'].update_one({'_id' : doc['_id']}, {'$set': update_query})
print('Date transformation completed.')

TypeError: strptime() argument 1 must be str, not datetime.datetime

In [None]:
from datetime import datetime

def convert_date(value):
    """Return a datetime for a 'YYYY-MM-DD' string; pass datetimes through."""
    if isinstance(value, datetime):
        return value  # already a datetime: this case caused the TypeError
    if isinstance(value, str) and value:
        try:
            return datetime.strptime(value, '%Y-%m-%d')
        except ValueError:
            print(f"Date format error for value: {value}")
    return None

date_fields = ['birth_date', 'death_date', 'start_mandate', 'end_mandate']

for doc in db['leaders'].find():
    # Only rewrite the fields that are still stored as strings
    update_query = {
        field: convert_date(doc[field])
        for field in date_fields
        if isinstance(doc.get(field), str)
    }
    if update_query:
        db['leaders'].update_one({'_id': doc['_id']}, {'$set': update_query})

print('Date transformation completed.')

In [None]:
# Check the updated data
sample_document = db['leaders'].find_one()
for field, value in sample_document.items():
    print(f"Field: {field}, Value: {value}, Type: {type(value)}")


Field: _id, Value: 66bc9c5657b75da88ded5ec6, Type: <class 'bson.objectid.ObjectId'>
Field: id, Value: Q7747, Type: <class 'str'>
Field: first_name, Value: Vladimir, Type: <class 'str'>
Field: last_name, Value: Putin, Type: <class 'str'>
Field: birth_date, Value: 1952-10-07 00:00:00, Type: <class 'datetime.datetime'>
Field: death_date, Value: None, Type: <class 'NoneType'>
Field: place_of_birth, Value: Saint Petersburg, Type: <class 'str'>
Field: wikipedia_url, Value: https://ru.wikipedia.org/wiki/%D0%9F%D1%83%D1%82%D0%B8%D0%BD,_%D0%92%D0%BB%D0%B0%D0%B4%D0%B8%D0%BC%D0%B8%D1%80_%D0%92%D0%BB%D0%B0%D0%B4%D0%B8%D0%BC%D0%B8%D1%80%D0%BE%D0%B2%D0%B8%D1%87, Type: <class 'str'>
Field: start_mandate, Value: 2000-05-07 00:00:00, Type: <class 'datetime.datetime'>
Field: end_mandate, Value: 2008-05-07 00:00:00, Type: <class 'datetime.datetime'>
Field: country, Value: ru, Type: <class 'str'>


- Display the 10 oldest leaders ordered by their birth date (search for how to sort and limit results in MongoDB)


In [None]:
oldest_leaders = db['leaders'].find().sort('birth_date', 1).limit(10)

for leader in oldest_leaders:
    print(leader)

{'_id': ObjectId('66bc9c5657b75da88ded5ec7'), 'id': 'Q1951892', 'first_name': 'Mohammed', 'last_name': 'None', 'birth_date': None, 'death_date': None, 'place_of_birth': 'None', 'wikipedia_url': 'https://ar.wikipedia.org/wiki/%D9%85%D8%AD%D9%85%D8%AF_%D8%A7%D9%84%D8%B4%D9%8A%D8%AE', 'start_mandate': datetime.datetime(1544, 1, 1, 0, 0), 'end_mandate': datetime.datetime(1554, 1, 1, 0, 0), 'country': 'ma'}
{'_id': ObjectId('66bcc1b058ed3e93a6e3ea2c'), 'id': 'Q334782', 'first_name': 'Mohammed', 'last_name': 'None', 'birth_date': None, 'death_date': None, 'place_of_birth': 'None', 'wikipedia_url': 'https://ar.wikipedia.org/wiki/%D8%A7%D9%84%D9%82%D8%A7%D8%A6%D9%85_%D8%A8%D8%A3%D9%85%D8%B1_%D8%A7%D9%84%D9%84%D9%87_%D8%A7%D9%84%D8%B3%D8%B9%D8%AF%D9%8A', 'start_mandate': None, 'end_mandate': '1517-01-01', 'country': 'ma'}
{'_id': ObjectId('66bcc1b058ed3e93a6e3ea32'), 'id': 'Q11806', 'first_name': 'John', 'last_name': 'Adams', 'birth_date': None, 'death_date': '1826-07-04', 'place_of_birth': 'Br

In [None]:
oldest_leaders = db['leaders'].find(
    {'birth_date': {'$ne': None}},
    {'first_name': 1, 'last_name': 1, 'birth_date': 1, 'country': 1}
).sort('birth_date', 1).limit(10)


for leader in oldest_leaders:
    print(f"Name: {leader.get('first_name', 'N/A')} {leader.get('last_name', 'N/A')}")
    print(f"Birth Date: {leader.get('birth_date', 'N/A')}")
    print(f"Country: {leader.get('country', 'N/A')}")
    print('-' * 25)

Name: George Washington
Birth Date: 1732-02-22
Country: us
-------------------------
Name: James Madison
Birth Date: 1751-03-16
Country: us
-------------------------
Name: James Monroe
Birth Date: 1758-04-28
Country: us
-------------------------
Name: Andrew Jackson
Birth Date: 1767-03-15
Country: us
-------------------------
Name: John Adams
Birth Date: 1767-07-11
Country: us
-------------------------
Name: William Harrison
Birth Date: 1773-02-09
Country: us
-------------------------
Name: Martin Van Buren
Birth Date: 1782-12-05
Country: us
-------------------------
Name: Zachary Taylor
Birth Date: 1784-11-24
Country: us
-------------------------
Name: Étienne Gerlache
Birth Date: 1785-12-26
Country: be
-------------------------
Name: John Tyler
Birth Date: 1790-03-29
Country: us
-------------------------
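
The first attempt above printed documents whose date fields hold `None` or strings such as `'1826-07-04'`, so the collection appears to mix value types. MongoDB orders values of different BSON types by type before comparing them, which means strings and real dates do not interleave chronologically in a sort. A minimal sketch that only considers `birth_date` values stored as real dates, using the `$type` operator; the helper name `oldest_leaders` is hypothetical.

```python
def oldest_leaders(collection, n=10):
    """Return the n leaders with the earliest birth_date stored as a real date."""
    cursor = (
        collection.find(
            {"birth_date": {"$type": "date"}},  # skips null, missing and string values
            {"_id": 0, "first_name": 1, "last_name": 1, "birth_date": 1, "country": 1},
        )
        .sort("birth_date", 1)  # 1 = ascending, oldest first
        .limit(n)
    )
    return list(cursor)
```

This is most useful once every date has been converted; until then, leaders whose birth date is still a string are left out of the result.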


In [None]:
sample = db['leaders'].find_one({'birth_date': {'$ne': None}})


print(f"{sample.get('birth_date')}")
print(f"{type(sample.get('birth_date'))}")

1952-10-07 00:00:00
<class 'datetime.datetime'>


In [None]:
docs = db['leaders'].find({'birth_date': {'$ne': None}}).limit(10)

for doc in docs:
    birth_date = doc.get('birth_date')
    print(f"{birth_date}, {type(birth_date)}")


# print(f"{sample.get('birth_date')}")
# print(f"{type(sample.get('birth_date'))}")

1952-10-07 00:00:00, <class 'datetime.datetime'>
1732-02-22 00:00:00, <class 'datetime.datetime'>
1961-08-04 00:00:00, <class 'datetime.datetime'>
1809-02-12 00:00:00, <class 'datetime.datetime'>
1946-07-06 00:00:00, <class 'datetime.datetime'>
1909-08-10 00:00:00, <class 'datetime.datetime'>
1942-11-20 00:00:00, <class 'datetime.datetime'>
1946-08-19 00:00:00, <class 'datetime.datetime'>
1882-01-30 00:00:00, <class 'datetime.datetime'>
1913-07-14 00:00:00, <class 'datetime.datetime'>


- Create a Python script that computes the number of leaders by country

In [None]:
country_counts = {}

# Tally the leaders per country; documents without a country fall under 'Unknown'
for leader in db['leaders'].find():
    country = leader.get('country', 'Unknown')
    if country not in country_counts:
        country_counts[country] = 0
    country_counts[country] += 1

for country, count in country_counts.items():
    print(f"Country: {country}, Number of Leaders: {count}")



Country: ru, Number of Leaders: 6
Country: ma, Number of Leaders: 10
Country: us, Number of Leaders: 90
Country: be, Number of Leaders: 106
Country: fr, Number of Leaders: 62


- Do the same by using a MongoDB [aggregation pipeline](https://www.mongodb.com/docs/manual/aggregation/)


In [28]:
# Python version again, this time sorted by count for comparison with the pipeline
country_counts = {}

for leader in db['leaders'].find():
    country = leader.get('country', 'Unknown')
    if country not in country_counts:
        country_counts[country] = 0
    country_counts[country] += 1

# Highest count first
sorted_country_leader_count = sorted(country_counts.items(), key=lambda x: x[1], reverse=True)

for country, count in sorted_country_leader_count:
    print(f"Country: {country}, Number of Leaders: {count}")



Country: be, Number of Leaders: 159
Country: us, Number of Leaders: 135
Country: fr, Number of Leaders: 93
Country: ma, Number of Leaders: 15
Country: ru, Number of Leaders: 9


In [29]:
pipeline = [
    {
        '$group':{
            '_id': '$country',
            'count': { '$sum': 1 }
        }
    },
    {
        '$sort': {'count': -1}
    }
]

result = db['leaders'].aggregate(pipeline)

for item in result:
    print(f"Country: {item['_id']}, Number of Leaders: {item['count']}")




Country: be, Number of Leaders: 159
Country: us, Number of Leaders: 135
Country: fr, Number of Leaders: 93
Country: ma, Number of Leaders: 15
Country: ru, Number of Leaders: 9
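
A pipeline is just a list of stages, so it can be assembled step by step. The sketch below builds the same `$group` and `$sort` stages as above and optionally adds a `$match` stage in front (to count only leaders without a `death_date`) and a `$limit` stage at the end. The function name and its parameters are hypothetical.

```python
def leaders_per_country_pipeline(only_alive=False, top=None):
    """Build an aggregation pipeline that counts leaders per country."""
    pipeline = []
    if only_alive:
        # Same filter syntax as find(): keep documents without a death date
        pipeline.append({"$match": {"death_date": None}})
    pipeline.append({"$group": {"_id": "$country", "count": {"$sum": 1}}})
    pipeline.append({"$sort": {"count": -1}})  # highest count first
    if top is not None:
        pipeline.append({"$limit": top})
    return pipeline
```

It would be used as `db["leaders"].aggregate(leaders_per_country_pipeline(only_alive=True))`. Placing `$match` first means fewer documents reach the `$group` stage.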


## Resources:
* [NoSQL Concepts (DataCamp)](https://www.datacamp.com/courses/nosql-concepts)
* [Introduction to MongoDB using Python (DataCamp)](https://www.datacamp.com/courses/introduction-to-using-mongodb-for-data-science-with-python)
* [Getting started with MongoDB](https://docs.mongodb.com/manual/tutorial/)
* [Python MongoDB Tutorial](https://www.mongodb.com/blog/post/getting-started-with-python-and-mongodb)
* [Introduction to MongoDB Learning Path](https://learn.mongodb.com/learning-paths/introduction-to-mongodb)
* [Build an App With Python, Flask, and MongoDB to Track UFOs](https://www.mongodb.com/developer/languages/python/flask-app-ufo-tracking/)