# MongoDB with Python

---

<img src="images/mongodb_logo.png" width="80%">

**MongoDB** is an open-source document database that provides high performance, high availability and automatic scaling. MongoDB is written in C++.

MongoDB stores data in the form of documents, which are JSON-like field and value pairs. Documents are analogous to structures in programming languages that associate keys with values (e.g. dictionaries, hashes, maps, and associative arrays). _Documents are analogous to one row of a table in relational databases_. Formally, MongoDB documents are BSON documents. _BSON_ is a binary representation of JSON with additional type information. In the documents, the value of a field can be any of the BSON data types, including other documents, arrays, and arrays of documents. 

MongoDB stores all documents in collections. A collection is a group of related documents that have a set of shared common indexes. _Collections are analogous to a table in relational databases_. A collection exists within a single database. Collections do not enforce a schema. Documents within a collection can have different fields. Typically, all documents in a collection are of similar or related purpose.

Documents have a dynamic schema. This means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data.
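A minimal sketch of what a dynamic schema means in practice, using plain Python dicts as stand-ins for documents and a list as a stand-in for a collection. No MongoDB server is involved, and the names `people`, `frodo` and `rocky` are made up for illustration:

```python
import json

# Two hypothetical documents that could sit side by side in one collection.
frodo = {"name": "Frodo", "age": 20}
rocky = {
    "name": "Rocky",
    "age": "thirty-eight",                  # same field, different type
    "titles": ["heavyweight champion"],     # an array value
    "address": {"city": "Philadelphia"},    # an embedded document
}

# A plain list standing in for a collection: no schema is declared anywhere.
people = [frodo, rocky]

# Each document carries its own set of fields and value types.
field_sets = [sorted(doc) for doc in people]
age_types = [type(doc["age"]).__name__ for doc in people]
print(field_sets)
print(age_types)

# Documents of this shape are JSON-like, so they survive a JSON round trip.
restored = json.loads(json.dumps(people))
```

Nothing in the "collection" had to be altered to accept the second, differently shaped document.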

<img src="images/document.jpg">

### Advantages of MongoDB over RDBMS

* Schemaless: MongoDB is a document database in which one collection can hold many different documents. The number of fields, the content and the size of a document can differ from one document to another.
* Structure of a single object is clear.
* No complex joins.
* Deep query-ability. MongoDB supports dynamic queries on documents using a document-based query language that's nearly as powerful as SQL.
* Tuning.
* Ease of scale-out: MongoDB is easy to scale.
* Conversion / mapping of application objects to database objects is not needed.
* Uses internal memory for storing the (windowed) working set, enabling faster access to data.
* Flexible schema: supports hierarchical data structures.
* Oriented toward programmers: documents map onto associative arrays such as PHP arrays, Python dictionaries, JSON objects, Ruby hashes, etc.
* Lots of MongoDB drivers and client libraries. Drivers are used for connectivity between client applications and the database. For example, to connect a Python program to MongoDB we need to install the Python driver so that the program can work with the database.
* Flexible deployment.
* Documents correspond to native data types in many programming languages.
* Dynamic schema supports fluent polymorphism.

---
## Installing MongoDB

### <span style="color: blue"><u>Windows OS</u></span>

1. **_Download MongoDB:_**
    
    Download MongoDB from the [MongoDB downloads page](https://www.mongodb.org/downloads?_ga=1.49600800.705977082.1452086983#production). Choose the 32-bit or 64-bit Windows build, unzip it and extract it to your preferred location, for example: "C:\mongodb\"
    
2. **_Configuration File:_**

    Create a MongoDB config file (it’s just a text file), for example: "C:\mongodb\mongo.config"

        ##store data here
        dbpath=C:\mongodb\data

        ##all output go here
        logpath=C:\mongodb\log\mongo.log

        ##log read and write operations
        diaglog=3

    _Note_: MongoDB needs a folder (data directory) to store its data. By default, it stores data in "C:\data\db"; create this folder manually, because MongoDB won’t create it for you. You can also specify an alternate data directory with the `--dbpath` option.
    
3. **_Run MongoDB server:_**

    Use `mongod.exe --config C:\mongodb\mongo.config` to start MongoDB server:

        C:\mongodb\bin>mongod --config C:\mongodb\mongo.config
        all output going to: C:\mongodb\log\mongo.log
        
4. **_Connect to MongoDB:_**

    Use `mongo.exe` to connect to the running MongoDB server.

        C:\mongodb\bin>mongo
        MongoDB shell version: 3.2.1
        connecting to: test
        > //mongodb shell
        
5. **_MongoDB as Windows Service:_**

    Add MongoDB as a Windows Service, so that MongoDB starts automatically after each system restart:
            
        C:\mongodb\bin> mongod --config C:\mongodb\mongo.config --install

    To start MongoDB Service

        net start MongoDB

    To stop MongoDB Service

        net stop MongoDB

    To remove MongoDB Service

        C:\mongodb\bin>mongod --remove

### <span style="color: blue"><u>Linux</u></span>

The following steps are written for Ubuntu. Other Linux distributions may need a slightly different approach to installation (see the [official site](https://docs.mongodb.org/manual/installation/)).

1. **_Importing the Public Key:_**
    
    MongoDB is already included in the Ubuntu package repositories, but the official MongoDB repository provides the most up-to-date version and is the recommended way of installing the software. Ubuntu ensures the authenticity of software packages by verifying that they are signed with GPG keys, so we first have to import the key for the official MongoDB repository:

        sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10

2. **_Creating a List File:_**

    Next, we have to add the MongoDB repository details so APT will know where to download the packages from. Issue the following command to create a list file for MongoDB:
    
        echo "deb http://repo.mongodb.org/apt/ubuntu "$(lsb_release -sc)"/mongodb-org/3.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.2.list

    After adding the repository details, we need to update the packages list.

        sudo apt-get update

3. **_Installing and Verifying MongoDB:_**

    Now we can install the MongoDB package itself.

        sudo apt-get install -y mongodb-org

    This command installs several packages containing the latest stable version of MongoDB, along with helpful management tools for the MongoDB server. After the packages are installed, MongoDB is started automatically. You can check this by running the following command.

        service mongod status

    If MongoDB is running, you'll see an output like this (with a different process ID).

        mongod start/running, process 1611

    You can also stop, start, and restart MongoDB using the service command (e.g. `service mongod stop`, `service mongod start`).


### <span style="color: blue"><u>Mac OS</u></span>

1. **_Download MongoDB:_**

    Get MongoDB from the official website and extract it:

        $ cd ~/Downloads
        $ tar xzf mongodb-osx-x86_64-3.2.1.tgz
        $ sudo mv mongodb-osx-x86_64-3.2.1 /usr/local/mongodb
        
2. **_MongoDB Data:_**
    
    By default, MongoDB stores its data in the "/data/db" folder. You need to create this folder manually and assign the proper permissions.

        $ sudo mkdir -p /data/db
        $ whoami
        mkyong
        $ sudo chown mkyong /data/db

3. **_Add mongodb/bin to `$PATH`:_**

    Create a `~/.bash_profile` file and add "/usr/local/mongodb/bin" to the `$PATH` environment variable, so that you can access Mongo’s commands easily.

        $ cd ~
        $ pwd
        /Users/mkyong
        $ touch .bash_profile
        $ vim .bash_profile

        export MONGO_PATH=/usr/local/mongodb
        export PATH=$PATH:$MONGO_PATH/bin

        ##restart terminal

        $ mongo -version
        MongoDB shell version: 3.2.1
        
4. **_Start MongoDB:_**

    Start MongoDB with `mongod` and make a simple connection with `mongo`.

    Terminal 1

        $ mongod
        MongoDB starting : pid=34022 port=27017 dbpath=/data/db/ 64-bit host=mkyong.local
        //...
        waiting for connections on port 27017

    Terminal 2

        $ mongo
        MongoDB shell version: 3.2.1
        connecting to: test
        > show dbs
        local	(empty)

    _Note_: If you don’t like the default "/data/db" folder, just specify an alternate path with `--dbpath`:
        
        $ mongod --dbpath /any-directory

5. **_Auto Start MongoDB:_**

    To start MongoDB automatically, create a launchd job on Mac.

        $ sudo vim /Library/LaunchDaemons/mongodb.plist

Installation instructions for different platforms are also described in detail on the [official site](https://docs.mongodb.org/manual/installation/) of MongoDB.

---

# Interaction of Python and MongoDB through PyMongo Library

PyMongo is a Python distribution containing tools for working with MongoDB, and is the recommended way to work with MongoDB from Python. The recommended way of installing PyMongo on your computer is with pip:

    pip install pymongo
    
Before starting, make sure that the [mongod](https://docs.mongodb.org/manual/reference/program/mongod/#bin.mongod) process is running:

* **For Windows OS** (assuming that you have installed MongoDB in the folder `C:\mongodb`):
    
    `C:\mongodb\bin\mongod.exe`
    
    
* **For Linux:**
    
    
    sudo service mongod start
    
    
* **For Mac OS:**
   
   `mongod`
   
This material covers only basic MongoDB commands and basic usage of PyMongo. You can find more information on the [official site of MongoDB](https://docs.mongodb.org/manual/).

### Connection to the server

The first step when working with PyMongo is to create a `MongoClient` to the running mongod instance:

In [None]:
import pymongo
from pymongo import MongoClient
from pymongo.errors import ConnectionFailure

print("pymongo.__version__ = {}\n".format(pymongo.__version__))

# Connect to MongoDB
try:
    client = MongoClient()
    print("Connected successfully to {}".format(client.address))
    print("MongoDB version: {}".format(client.server_info()['version']))
except ConnectionFailure as e:
    print("Could not connect to MongoDB: {}".format(e))

# We can also specify the host and port explicitly, as follows:
# `client = MongoClient('localhost', 27017)`
# or MongoDB URI format:
# `client = MongoClient('mongodb://localhost:27017/')`
    
client
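The URI form is convenient when connection details come from configuration. The helper below is a hypothetical sketch (`build_mongo_uri` is not part of PyMongo, and the host, user name and password are made up) that assembles such a string. The user name and password, if present, are percent-escaped with `quote_plus`, since characters such as `@` and `:` are reserved in a URI. The resulting string could then be passed to `MongoClient`.

```python
try:
    from urllib.parse import quote_plus   # Python 3
except ImportError:
    from urllib import quote_plus         # Python 2


def build_mongo_uri(host="localhost", port=27017, username=None, password=None):
    """Assemble a 'mongodb://' URI (hypothetical helper, not a PyMongo API)."""
    credentials = ""
    if username is not None and password is not None:
        # Reserved characters such as '@' or ':' must be percent-escaped.
        credentials = "{}:{}@".format(quote_plus(username), quote_plus(password))
    return "mongodb://{}{}:{}/".format(credentials, host, port)


default_uri = build_mongo_uri()
auth_uri = build_mongo_uri("db.example.com", 27017, "frodo", "p@ss:word")
print(default_uri)
print(auth_uri)
```

With no arguments the helper reproduces the default URI shown in the cell above.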

### Quick start

MongoDB creates databases and collections automatically for you if they don't exist already. A single instance of MongoDB can support multiple independent databases. When working with PyMongo you access databases using attribute style access:

In [None]:
db = client.my_database   
# If your database name is such that using attribute style access won’t work (like db-name), 
# you can use dictionary style access instead 
# db = client['my-database']
db

In [None]:
# To know which databases are available:
client.database_names()

We have already created one new database. Why didn't it show up with the above command? Databases with no collections, or with only empty collections, are not listed by `database_names()`. The same goes for empty collections when we list the collections in a database.

A collection is a group of documents stored in MongoDB, and can be thought of as roughly the equivalent of a table in a relational database. Getting a collection in PyMongo works the same as getting a database:

In [None]:
collection = db.my_collection
collection

In [None]:
# List all non-empty collections
db.collection_names()

> However one must be careful when trying to get existing collections. For example, if you have a collection `db.user` and you type `db.usr` this is clearly a mistake. Unlike an RDBMS, MongoDB won't protect you from this class of mistake.
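One way to guard against this class of mistake is to check the name against the list of existing collections before using it. The helper below is a sketch under that assumption: `get_existing_collection` is a hypothetical name, and it relies only on the `collection_names()` method used above and on dictionary-style access (`db[name]`).

```python
def get_existing_collection(db, name):
    """Return db[name], but refuse names that do not exist yet.

    Hypothetical helper: `db` is expected to behave like a PyMongo database,
    i.e. to offer collection_names() and dictionary-style access.
    """
    existing = db.collection_names()
    if name not in existing:
        raise KeyError("no collection named {!r}; existing: {}".format(
            name, sorted(existing)))
    return db[name]
```

With this helper, `get_existing_collection(db, 'usr')` raises a `KeyError` instead of silently handing back a new, empty collection. Keep in mind that a collection that has never been written to does not exist yet, so the check only suits collections that already hold data.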

To **insert** some data into MongoDB, all we need to do is create a dict and call `insert_one()` or `insert_many()` methods on the collection object:

In [None]:
collection.insert_one({"name": "Frodo", "surname": "Baggins", "age": 20})
collection.insert_many(
    [
        {"name": "Rocky", "surname": "Balboa", "age": 38},
        {"name": "Luke", "surname": "Skywalker", "age": 32}
    ]
)

When a document is inserted a special key, `_id`, is automatically added if the document doesn’t already contain an `_id` key. The value of `_id` must be unique across the collection. `insert_one()` returns an instance of `InsertOneResult`.

In [None]:
# Let's see collections now
db.collection_names()

In [None]:
# Get full name of collection including database name
db.my_collection.full_name

In [None]:
# Select a single document (the first one in the collection's natural order)
my_document = collection.find_one()
my_document

You may **delete** documents, collections and databases. `delete_one()` deletes a single document, and `delete_many()` deletes all documents that match the filter. (The older `remove()` method is deprecated and, by default, removes every matching document.)

In [None]:
# Delete a document
collection.delete_one(my_document)

In [None]:
# Delete a collection
db.drop_collection('my_collection')

In [None]:
# Delete a database
client.drop_database("my_database")

In [None]:
# Let's see whether the database was removed:
client.database_names()

### Basic usage

Let's create a new database "cinema" with the collection "movies", in which each document holds data about a film, its actors, director, etc. First, let's create the document for the film ["Forrest Gump"](https://en.wikipedia.org/wiki/Forrest_Gump). The following picture shows how a SQL table may be transformed into a MongoDB document, i.e. how a one-to-many relationship can be represented.

<img src="images/forrest_gump.jpg">

In [None]:
db = client.cinema
collection = db.movies

In [None]:
forrest_gump_dict = {
    "title": "Forrest Gump", 
    "released": 1994,
    "duration_min": 142,
    "country": "USA",
    "lang": "English",
    "persons": [
        {
            "name": "Tom Hanks",
            "born": 1956,
            "country": "USA",
            "relation": "ACTED_IN",
            "role": "Forrest Gump"
        },
        {
            "name": "Gary Sinise",
            "born": 1955,
            "country": "USA",
            "relation": "ACTED_IN",
            "role": "Lieutenant Dan Taylor"
        },
        {
            "name": "Robert Zemeckis",
            "born": 1952,
            "country": "USA",
            "relation": "DIRECTED",
        }
    ]
}

collection.insert_one(forrest_gump_dict)

The `find_one()` method **selects and returns** a single document from a collection (or `None` if there are no matches). It is useful when you know there is only one matching document, or are only interested in the first match:

In [None]:
forrest_gump = collection.find_one()
forrest_gump

To **get more than a single document** as the result of a query we use the `find()` method. `find()` returns a `Cursor` instance, which allows us to iterate over all matching documents.

The JSON file "data/movies.json" contains data for a few other movies. Let's read it and insert the data into the `movies` collection:

In [None]:
import json

# Read JSON file and collect its data into dict `data`
with open('data/movies.json', 'r') as f:    
    data = json.load(f)

collection.insert_many(data["movies"])

To **know how many documents** match a query we can perform a `count()` operation: 

In [None]:
collection.count()

In [None]:
db.collection_names(include_system_collections=False)
# `include_system_collections=False` leaves out system collections such as "system.indexes"

In [None]:
# Print movie titles 
for movie in collection.find():
    print(movie['title'])

MongoDB queries are represented as a JSON-like structure, just like documents. To build a query, you just need to specify a dictionary with the properties you wish the results to match. For example, this query will match all documents in the "movies" collection with `"country" == "USA"`:

In [None]:
for movie in collection.find({"country": "USA"}):
    print('{}, {}'.format(movie['title'], movie["country"]))
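To make the matching rule concrete, here is a pure-Python sketch of how such an equality query selects documents. It is an illustration only, not how MongoDB evaluates queries: `matches` is a hypothetical helper, the sample documents are made up, and only top-level fields with plain values are handled (no dotted paths, arrays or operators).

```python
def matches(document, query):
    """True if every field in `query` is present in `document` with an equal value."""
    return all(
        field in document and document[field] == expected
        for field, expected in query.items()
    )


# Made-up sample documents, used only for this illustration.
movies = [
    {"title": "Movie A", "country": "USA", "released": 1994},
    {"title": "Movie B", "country": "France", "released": 1998},
    {"title": "Movie C", "country": "USA", "released": 2010},
]

usa_titles = [m["title"] for m in movies if matches(m, {"country": "USA"})]
both_titles = [m["title"] for m in movies
               if matches(m, {"country": "USA", "released": 2010})]
print(usa_titles)
print(both_titles)
```

Several fields in one query are combined with a logical AND. An empty query matches every document, which is why `find()` without arguments returns the whole collection.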

Let's check the country of the movie "Taxi", which was not included in the list above:

In [None]:
print(collection.find_one({"title": "Taxi"})['country'])

Queries can also use **special query operators**. These operators include `$gt` (`>`), `$gte` (`>=`), `$lt` (`<`), `$lte` (`<=`), `$ne` (`!=`), `$in` (in array), `$nin` (not in array), `$exists`, `$size` (for arrays), `$not`, `$or` and many more. The full list of query operators can be found [here](https://docs.mongodb.org/manual/reference/operator/query/). The following queries show the use of some of these operators:

In [None]:
for movie in collection.find({"released": {"$gte":1999}}):
    print('{}, {}'.format(movie['title'], movie["released"]))

In [None]:
q = {
    "$or":[
        {"country": "USA", "released": {"$not": {"$in": [1994, 2010]}}},
        {"continuation": {"$exists": True}, "persons": {"$size": 1}}
    ]
}
for movie in collection.find(q):
    print('{}, {}'.format(movie['title'], movie["released"]))
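The operators can be read as small predicates applied to a field's value. The sketch below spells out a handful of them in plain Python to show the semantics. It is a simplification rather than MongoDB's implementation: `OPERATORS`, `field_matches` and `matches` are hypothetical names, the documents are made up, and `$exists`, `$size`, `$not`, nested paths and array matching are deliberately left out. In this sketch a document lacking a queried field never matches, which differs from MongoDB for `$ne` and `$nin`.

```python
import operator

# Comparison operators and the Python predicate each one corresponds to.
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$ne": operator.ne,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def field_matches(value, condition):
    """Check one field value against a plain value or an operator document."""
    if isinstance(condition, dict):
        return all(OPERATORS[op](value, arg) for op, arg in condition.items())
    return value == condition


def matches(document, query):
    """Evaluate a query against a document; '$or' takes a list of sub-queries."""
    for key, condition in query.items():
        if key == "$or":
            if not any(matches(document, sub) for sub in condition):
                return False
        elif key not in document or not field_matches(document[key], condition):
            return False
    return True


# Made-up sample documents, used only for this illustration.
movies = [
    {"title": "Movie A", "country": "USA", "released": 1994},
    {"title": "Movie B", "country": "France", "released": 1998},
    {"title": "Movie C", "country": "USA", "released": 2010},
]

recent = [m["title"] for m in movies if matches(m, {"released": {"$gte": 1998}})]
either = [m["title"] for m in movies if matches(m, {
    "$or": [{"country": "France"}, {"released": {"$in": [1994]}}]
})]
print(recent)
print(either)
```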

MongoDB can sort query results for you on the server-side. Especially if you are sorting results on a property which has an index, it can sort these far more efficiently than your client program can:

In [None]:
for movie in collection.find().sort([("released", pymongo.DESCENDING)]):
    print('{}, {}'.format(movie['title'], movie["released"]))

The above queries are not optimal when you have large result sets. PyMongo cursors have `limit()` and `skip()` methods which let you fetch a limited number of results or skip some of them:

In [None]:
for movie in collection.find().sort([("released", pymongo.ASCENDING)]).limit(3):
    print('{}, {}'.format(movie['title'], movie["released"]))

In [None]:
for movie in collection.find().sort([("released", pymongo.ASCENDING)]).skip(2).limit(2):
    print('{}, {}'.format(movie['title'], movie["released"]))
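For intuition, the cursor chain `sort(...).skip(n).limit(m)` corresponds to sorting a list and then slicing it. The plain-Python sketch below mirrors that on made-up data; with MongoDB the work happens on the server, so only the selected documents are sent to the client.

```python
# Made-up sample documents, used only for this illustration.
movies = [
    {"title": "Movie A", "released": 1994},
    {"title": "Movie B", "released": 1998},
    {"title": "Movie C", "released": 2010},
    {"title": "Movie D", "released": 1972},
]

skip, limit = 2, 2

# sort([("released", ASCENDING)]) -> sorted(...); skip/limit -> a slice
ordered = sorted(movies, key=lambda m: m["released"])
page = ordered[skip:skip + limit]

page_titles = [m["title"] for m in page]
print(page_titles)
```

Regardless of the order in which the cursor methods are chained, the server sorts first, then skips, then limits.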

The `distinct()` method returns only the unique values of a field:

In [None]:
print("All present languages:")
for movie in collection.find():
    print(movie["lang"])

print("\nOnly unique languages:")
for lang in collection.find().distinct("lang"):
    print(lang)

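PyMongo can **update documents** in a number of different ways. Before looking at them, note that a document returned by `find_one()` is an ordinary Python dict. We can modify it with the dict's own `update()` method, but this changes only the local copy and not the document stored in MongoDB: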

In [None]:
forrest_gump

In [None]:
forrest_gump.update({"box_office_Mdol": 177.9})
forrest_gump

If a field already exists, you can change its value as in any Python dict. Like any change to the local dict, this is not written to the database:

In [None]:
forrest_gump["box_office_Mdol"] = 677.9
forrest_gump

### Updating commands:

Passing a plain document to the legacy `update()` collection method, or to `replace_one()`, replaces the whole stored document, so be careful! If instead we want to modify specific fields of the document, we can use MongoDB's update operators like `$set`, `$inc`, `$push`, `$pull` and many [more](https://docs.mongodb.org/manual/reference/operator/update/) together with the `update_one()` or `update_many()` methods.

**Update operator `$set`:**

A statement of the form `update_one({field: value1}, {"$set": {field1: value2}})` updates the first document in the collection where `field` matches `value1` by replacing the value of `field1` with `value2`. The operator adds the specified field or fields if they do not exist in the document, and replaces the existing values if they do.

In [None]:
collection.update_one({"title": "Taxi"}, {"$set":{"box_office_Mdol": 10, "continuation": None}})
collection.find_one({"title": "Taxi"})

`update_one()` modifies only the first document that matches the query. If you want to modify all documents that match the query, use `update_many()` instead.

**Update operator `$inc`:**

The `$inc` operator increments a value by a specified amount if the field is present in the document. If the field does not exist, `$inc` sets the field to the specified amount.

In [None]:
collection.update_one({"title": "Taxi"}, {"$inc":{"box_office_Mdol": 100}})
# Look at how the value of "box_office_Mdol" changed
collection.find_one({"title": "Taxi"})

**Update operator `$unset`:**

The `$unset` operator deletes a particular field. If a document matches the query but does not have the field specified in the `$unset` operation, the statement has no effect on that document.

In [None]:
collection.update_one({"title": "Taxi"}, {"$unset":{"box_office_Mdol": ""}})
collection.find_one({"title": "Taxi"})

**Update operator `$rename`:**

The `$rename` operator updates the name of a field. The new field name must differ from the existing field name.

In [None]:
collection.update_one({"title": "Taxi"}, {"$rename":{"lang": "language"}})
collection.find_one({"title": "Taxi"})
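To summarise the four field operators above, here is a pure-Python sketch of their effect on a single document. It is an illustration under simplifying assumptions, not MongoDB's implementation: `apply_update` is a hypothetical helper, it handles top-level fields only, and it works on a copy of a made-up document.

```python
import copy


def apply_update(document, update):
    """Return a copy of `document` with $set/$inc/$unset/$rename applied."""
    result = copy.deepcopy(document)
    for field, value in update.get("$set", {}).items():
        result[field] = value                          # add or overwrite
    for field, amount in update.get("$inc", {}).items():
        result[field] = result.get(field, 0) + amount  # missing field starts at 0
    for field in update.get("$unset", {}):
        result.pop(field, None)                        # no effect if absent
    for old_name, new_name in update.get("$rename", {}).items():
        if old_name in result:
            result[new_name] = result.pop(old_name)
    return result


# Made-up document, used only for this illustration.
taxi = {"title": "Taxi", "lang": "French"}

step1 = apply_update(taxi, {"$set": {"box_office_Mdol": 10}})
step2 = apply_update(step1, {"$inc": {"box_office_Mdol": 100}})
step3 = apply_update(step2, {"$unset": {"box_office_Mdol": ""}})
step4 = apply_update(step3, {"$rename": {"lang": "language"}})
print(step2)
print(step4)
```

Each step follows one of the notebook cells above: set the field, increment it, remove it, then rename another field.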

**Update operator `$push`:**

The `$push` operator appends a specified value to an array. Be aware of the following behaviors:

* If the field specified in the push statement (e.g. `{$push: {field: value1}}`) does not exist in the matched document, the operation adds a new array field containing the specified value (e.g. `value1`) to the matched document.

* The operation fails if the field specified in the push statement exists but is not an array.

* If `value1` is an array itself, `$push` appends the whole array as a single element of the identified array. To add multiple items to an array, use `$push` with the `$each` modifier (the older `$pushAll` operator is deprecated).

In [None]:
# The next command will generate an error because "language" field is not an array
collection.update_one({"title": "Taxi"}, {"$push":{"language": "Hindi"}})
collection.find_one({"title": "Taxi"})

In [None]:
collection.update_one({"title": "Taxi"}, 
                      {"$push":{"persons": {'name': 'Frédéric Diefenthal', 'born': 1968, 'country': 'France'} }})
collection.find_one({"title": "Taxi"})

**Update operator `$pop`:**

The `$pop` operator removes the first or last element of an array. Pass `$pop` a value of 1 to remove the last element in an array and a value of -1 to remove the first element of an array. Be aware of the following behaviors:

* The `$pop` operation fails if the field is not an array.
* If the array holds a single element, `$pop` removes it and the field then holds an empty array.

In [None]:
# Let's create a new array field in the document for "Taxi" film 
import random 

collection.update_one({"title": "Taxi"}, {"$set":{"array": 
                                                  [{"item1": random.randint(0,10), "item2": random.choice('abcdef')} 
                                                   for i in range(5)]
                                                }})
collection.find_one({"title": "Taxi"})

In [None]:
collection.update_one({"title": "Taxi"}, {"$pop":{"array": 1}})
collection.find_one({"title": "Taxi"})

**Update operator `$pull`:**

The `$pull` operator removes all instances of a value from an existing array. If the value occurs multiple times in the array, `$pull` removes every occurrence. It is very handy when you know exactly which value you want to remove.

In [None]:
collection.update_one({"title": "Taxi"}, {"$set":{"episodes": ["Taxi", "Taxi 2", "Taxi 3", "Taxi 4"] }})
collection.find_one({"title": "Taxi"})

In [None]:
collection.update_one({"title": "Taxi"}, {"$pull":{"episodes": "Taxi 3"}})
collection.find_one({"title": "Taxi"})

**Update operator `$addToSet`:**

The `$addToSet` operator adds a value to an array only if the value is not in the array already. If the value is in the array, `$addToSet` returns without modifying the array. Otherwise, `$addToSet` behaves the same as `$push`.

In [None]:
collection.update_one({"title": "Taxi"}, {"$addToSet":{"episodes": "Taxi 2"}})
collection.find_one({"title": "Taxi"})

In [None]:
collection.update_one({"title": "Taxi"}, {"$addToSet":{"episodes": "Taxi 3"}})
collection.find_one({"title": "Taxi"})
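The array operators can be pictured as ordinary list operations. The sketch below models `$push`, `$pop`, `$pull` and `$addToSet` in plain Python. It is an illustration, not MongoDB's implementation: `apply_array_update` is a hypothetical helper, it handles top-level fields and plain values only, and it does not model the errors MongoDB raises for non-array fields.

```python
import copy


def apply_array_update(document, update):
    """Return a copy of `document` with $push/$pop/$pull/$addToSet applied."""
    result = copy.deepcopy(document)
    for field, value in update.get("$push", {}).items():
        result.setdefault(field, []).append(value)       # creates the array if missing
    for field, end in update.get("$pop", {}).items():
        if result.get(field):
            result[field].pop(-1 if end == 1 else 0)     # 1: last element, -1: first
    for field, value in update.get("$pull", {}).items():
        if field in result:
            result[field] = [item for item in result[field] if item != value]
    for field, value in update.get("$addToSet", {}).items():
        if value not in result.setdefault(field, []):
            result[field].append(value)
    return result


# Made-up document mirroring the "episodes" example above.
taxi = {"title": "Taxi", "episodes": ["Taxi", "Taxi 2", "Taxi 3", "Taxi 4"]}

pulled = apply_array_update(taxi, {"$pull": {"episodes": "Taxi 3"}})
unchanged = apply_array_update(pulled, {"$addToSet": {"episodes": "Taxi 2"}})
added = apply_array_update(pulled, {"$addToSet": {"episodes": "Taxi 3"}})
popped = apply_array_update(added, {"$pop": {"episodes": 1}})
print(pulled["episodes"])
print(added["episodes"])
print(popped["episodes"])
```

Note that `$addToSet` appends the new value at the end of the array, so "Taxi 3" does not return to its original position.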

**Update operator `$`:**

The positional `$` operator identifies an element in an array field to update without explicitly specifying the position of the element in the array. When used in an update, it acts as a placeholder for the first array element that matches the query filter.

In [None]:
# Update "Taxi 2" to "Taxi 5"
collection.update_one({"title": "Taxi", "episodes": "Taxi 2"}, {"$set":{"episodes.$": "Taxi 5"}})
collection.find_one({"title": "Taxi"})

In [None]:
# Use the positional $ operator to update the value of the "item2" field to zero 
# in the embedded document with the "item1" less than 9:
collection.update_one({"title": "Taxi", "array.item1": {"$lt": 9}}, {"$set":{"array.$.item2": 0}})
collection.find_one({"title": "Taxi"})
# As you may see only one value was updated
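The "first match only" behaviour can be sketched in plain Python as a loop that stops at the first matching element. `set_first_match` is a hypothetical helper written for this illustration, the document is made up, and the `predicate` argument stands in for the query condition on the array:

```python
import copy


def set_first_match(document, array_field, predicate, new_value):
    """Replace the first element of an array that satisfies `predicate`."""
    result = copy.deepcopy(document)
    for index, element in enumerate(result[array_field]):
        if predicate(element):
            result[array_field][index] = new_value   # this index is what '$' stands for
            break                                    # later matches stay untouched
    return result


# Made-up document with a repeated value, used only for this illustration.
taxi = {"title": "Taxi", "episodes": ["Taxi", "Taxi 2", "Taxi 3", "Taxi 2"]}
updated = set_first_match(taxi, "episodes", lambda e: e == "Taxi 2", "Taxi 5")
print(updated["episodes"])
```

Only the first "Taxi 2" is replaced; the second occurrence is left as it was.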

### Aggregation:

Aggregation operations process data records and return computed results. They group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result.

The `aggregate()` method performs an aggregation using the aggregation framework on this collection. It accepts as its argument an array of stages, where each stage, processed sequentially, describes a data processing step.

The basic pipeline operators are listed below; we will apply some of them to the sample data created above. Map-Reduce is not discussed here.

* `$match`: similar to the `find()` method and SQL's `WHERE` clause; it filters the documents that are passed on to the next stage. There can be multiple `$match` stages in the pipeline.
* `$unwind`: deconstructs an array field; a document holding an array is in a sense pre-joined, and `$unwind` undoes this by emitting one document per array element.
* `$group`: similar to SQL's `GROUP BY` clause and comparable to the `group()` method. The full list of group accumulator operators can be found [here](https://docs.mongodb.org/manual/reference/operator/aggregation/group/).
* `$skip`: skips forward in the list of documents by a given number of documents; equivalent to the `skip()` method mentioned above.
* `$limit`: limits the number of documents passed on to the given number, starting from the current position; equivalent to the `limit()` method.
* `$sort`: sorts the documents; equivalent to the `sort()` method.
* `$project`: selects specific fields from the documents and can add computed fields.
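Since `$unwind` is the least intuitive of these stages, here is a plain-Python sketch of what it does to a stream of documents. It is an illustration only: `unwind` is a hypothetical helper and the documents are made up. In a real pipeline the stage would be written as `{"$unwind": "$persons"}`.

```python
def unwind(documents, field):
    """Yield one copy of each document per element of its `field` array."""
    for document in documents:
        for element in document.get(field, []):
            flattened = dict(document)    # shallow copy of the other fields
            flattened[field] = element    # the array is replaced by one element
            yield flattened


# Made-up sample documents, used only for this illustration.
movies = [
    {"title": "Movie A", "persons": [{"name": "Ann"}, {"name": "Bob"}]},
    {"title": "Movie B", "persons": [{"name": "Ann"}]},
    {"title": "Movie C", "persons": []},
]

pairs = [(doc["title"], doc["persons"]["name"]) for doc in unwind(movies, "persons")]
print(pairs)
```

A document whose array is empty or missing produces no output, which matches the default behaviour of `$unwind`.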

In [None]:
list(collection.aggregate([{"$match": {"title": "Taxi"}}]))
# Like collection.find({"title": "Taxi"}): the result is a list of all matching documents

In [None]:
# Find all English-language movies that involve Tom Hanks
list(collection.aggregate([
            {"$match": {"lang": "English"}},
            {"$match": {"persons.name": "Tom Hanks"}}
        ]))

In [None]:
# Count the movies per language using $group and collect their titles
list(collection.aggregate([
            {"$group": {"_id": "$lang", "count": {"$sum": 1}, "titles": {"$push": "$title"}}}
        ]))
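The same grouping can be written out by hand to show what `$sum` and `$push` accumulate. This is a plain-Python sketch on made-up documents, not a description of how the server executes the stage:

```python
from collections import OrderedDict

# Made-up sample documents, used only for this illustration.
movies = [
    {"title": "Movie A", "lang": "English"},
    {"title": "Movie B", "lang": "French"},
    {"title": "Movie C", "lang": "English"},
]

# Equivalent in spirit to:
# {"$group": {"_id": "$lang", "count": {"$sum": 1}, "titles": {"$push": "$title"}}}
groups = OrderedDict()
for movie in movies:
    group = groups.setdefault(
        movie["lang"], {"_id": movie["lang"], "count": 0, "titles": []})
    group["count"] += 1                     # {"$sum": 1}
    group["titles"].append(movie["title"])  # {"$push": "$title"}

result = list(groups.values())
print(result)
```

The sketch keeps groups in order of first appearance; `$group` itself does not guarantee any output order, so add a `$sort` stage when the order matters.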

In [None]:
# Keep only "title", "released", the names in "persons" and the number of persons, then sort
list(collection.aggregate([
            {"$project": {"title": 1, "persons.name": 1, "released": 1, "size": {"$size": "$persons"}}},
            {"$sort": {"released": -1,   # descending order
                       "size": 1}},      # ascending order
            {"$limit": 4}
        ]))

### Some helpful commands:

1\) To get some statistics about your database, use the dbstats command:

In [None]:
db.command({'dbstats': 1})

2\) To get collection statistics use the collstats command:

In [None]:
db.command({'collstats': 'movies'})

3\) `mongodump` is a utility for creating a binary export of the contents of a database. 

To dump your database for backup, run the following command in your terminal:

    mongodump --db <database_name> --collection <collection_name>

This command dumps the given collection as a BSON data file with a JSON metadata file. To import the backup into MongoDB, use the following command in your terminal:

    mongorestore --db <database_name> <path_to_bson_file>

You can also use gzip for taking backup of one collection and compressing the backup on the fly:

    mongodump --db <database_name> --collection <collection_name> --out - | gzip > <dump_name>.gz

or with a date in the file name:

    mongodump --db <database_name> --collection <collection_name> --out - | gzip > dump_`date "+%Y-%m-%d"`.gz
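If the backup is launched from a Python script rather than from the shell, the dated file name can be built with `datetime`. A small sketch (the helper name `dump_file_name` is hypothetical, and the `dump_` prefix is simply taken over from the shell example; it is not a MongoDB convention):

```python
import datetime


def dump_file_name(day=None, prefix="dump_"):
    """Build a name such as 'dump_2016-01-06.gz' for the given date (default: today)."""
    day = day or datetime.date.today()
    return "{}{}.gz".format(prefix, day.strftime("%Y-%m-%d"))


example_name = dump_file_name(datetime.date(2016, 1, 6))
print(example_name)
```

The `"%Y-%m-%d"` format is the same one passed to `date` in the shell command above.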

---
>### Exercise:
> Many Internet companies, such as Facebook, Google, and Twitter, provide
Application Programming Interfaces (APIs) that you can use to build your
own applications.

> An API is a set of programming instructions and standards for accessing web-based
software applications. A wrapper is an API client that wraps the API in
easy-to-use functions and makes the API calls itself.

> A large list of APIs for which Python wrappers exist is available [here](http://www.pythonapi.com/).

> We have met APIs in the previous lessons, and the following tasks are based on the Twitter API (including streaming). We use it here because responses to Twitter API queries are available in JSON format, which, as you now know, is very convenient for converting and storing data in MongoDB.

> **1\.** Create a new MongoDB database "twitter" with two new collections "tweets" and "users". The collection "tweets" will contain data about the 10000 most recent tweets with the hashtags _"BigData"_ and _"DataScience"_. The collection "users" will contain data about the users who created the tweets, i.e. the authors.

> **2\.** Using `requests` URL queries or `tweepy`'s `search()` method, collect the 10000 most recent tweets with the hashtags _"BigData"_ and _"DataScience"_. This can be done as follows:

>    `tweets = []`<br></br>
>    `last_id = 0`<br></br>

>    `while len(tweets) < 10000:`<br></br>
>    <span style="margin-left:2em"></span>`response = api.search(q=['BigData', 'DataScience'], since_id=last_id, count=100)`<br></br>
>    <span style="margin-left:2em"></span>`last_id = str(response[-1].id)`<br></br>
>    <span style="margin-left:2em"></span>`tweets.extend(response)`

> where `api = tweepy.API(auth)`. Try to get this result using the `requests` library as well.

> **3\.** Collection "tweets" should contain the following fields from available tweet fields:

>    * `'created_at'`;
>    * `'author_id'` (corresponds to author.id);
>    * `'author_name'` (corresponds to author.name);
>    * `'retweet_count'`;
>    * `'id'`;
>    * `'lang'`;
>    * `'source'`;
>    * `'text'`.

> Necessary fields in the collection "users":

>    * `'created_at'`;
>    * `'id'`;
>    * `'name'`;
>    * `'description'`;
>    * `'followers_count'`;
>    * `'friends_count'`;
>    * `'lang'`;
>    * `'profile_image_url'`;
>    * `'profile_banner_url'`;
>    * `'location'`;
>    * `'time_zone'`;
>    * `'tweets'` (an array of tweet ids from the "tweets" collection).

> At this step you need to fill both collections with the respective data. Note that one user could have created more than one tweet in the "tweets" collection. The IDs of all these tweets should be written to the `'tweets'` field of the respective user.

> **4\.** Check that both collections contain only unique tweets and users. <br></br>
> _**Hint:**_ You may use, for example, method `distinct()`.

> **5\.** Create a new collection "bigdata\_tweets\_#date1#\_#date2#" with tweets that contain only the "#BigData" hashtag, are written in English, were not retweeted and were created during the last 30 minutes. #date1# is the full date, in the format 'YYYY-MM-DD hh:mm:ss', of the first created tweet and #date2# is the full date of the last created tweet. <br></br>
> _**Hint:**_ The value of the `'created_at'` field is a `datetime.datetime` object.

> **6\.** Find the TOP 5 tweets (from the "tweets" collection) with the largest number of retweets for each language. If several tweets have the same number of retweets, sort them by `"created_at"` in descending order. Display their text, author name and date of creation.

> **7\.** For each time zone, find the user with the maximal average of friends and followers among those who specified "en", "es" or "fr" in the `"lang"` field. Display the user's name, avatar and list of tweets (text and creation date of each tweet) from the "tweets" collection. <br></br>
> _**Hint:**_ You may display an image by URL in the following way:

> `In [1]: from IPython.display import HTML`<br></br>
> <span style="margin-left:4.5em"></span>`bg = api.get_user("BillGates")`<br></br>
> <span style="margin-left:4.5em"></span>`print bg.name`<br></br>
> <span style="margin-left:4.5em"></span>`HTML('<img src="' + bg.profile_banner_url + '" width="700">')`<br></br>
> <span style="margin-left:4.5em"></span>`Bill Gates`<br></br>
> `Out [1]:`
> <img src="images/bill_gates.jpg">