# MongoDB Optimization Walkthrough

## Database and Query Creation

First, all required modules are imported:

In [9]:
from datetime import datetime
import pymongo
from mongo_data_insertion import insert
from mongo_optimization import create_indexes, create_additional_fields
from mongo_queries import get_genre, get_band
from mongo_optimization_measurements import measure_query_runtime, query_bands, query_bands_optimized

Next, the MongoDB database is created and the documents are inserted:

In [3]:
insert()

Inserted 10000 of 10000 documents in database

For data retrieval, two complex queries are implemented in `mongo_queries.py`. They perform the following tasks:

1. **Get the most successful band in the 2010s (01.01.2010 - 31.12.2019) in the most successful genre of the 1990s (01.01.1990 - 31.12.1999)**

2. **Add a new album to the most successful band of the most successful genre in the 1990s, so that it is more successful than every album of the most successful band in this genre in the 2010s**

Only the first query is considered in the following steps, because both queries perform similar computations. The next cell runs the first query and outputs the result:

In [10]:
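# Period boundaries passed to the queries: the day before the first day
# and the day after the last day of each decade.
start_90s = datetime(1989, 12, 31, 0, 0, 0, 0)
end_90s = datetime(2000, 1, 1, 0, 0, 0, 0)
start_10s = datetime(2009, 12, 31, 0, 0, 0, 0)
end_10s = datetime(2020, 1, 1, 0, 0, 0, 0)

with pymongo.MongoClient('mongodb://localhost:27017/') as client:
    col = client.musicians.bands
    # Most successful genre of the 1990s, then the top band of that genre in the 2010s.
    genre = get_genre(col, start_90s, end_90s)
    band_10s = get_band(col, start_10s, end_10s, genre)
band_10s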

'http://dbpedia.org/resource/The_Offspring'
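
The source does not show how `get_genre` is implemented. As a rough illustration only, the "most successful genre in a period" part of query 1 could be expressed as a MongoDB aggregation pipeline. The sketch below assumes a schema that is not confirmed by the notebook: each band document has an `albums` array of subdocuments with `release_date` and `sales` fields, and a `genres` array of strings. The function name `top_genre_pipeline` and the field name `sales` are hypothetical.

```python
from datetime import datetime


def top_genre_pipeline(start, end):
    """Build an aggregation pipeline for the best-selling genre in [start, end).

    Assumed (hypothetical) schema: albums = [{release_date, sales}, ...],
    genres = [str, ...].
    """
    period = {"$gte": start, "$lt": end}
    return [
        # Pre-filter bands with at least one album in the period; as the first
        # stage, this match can use an index on albums.release_date if one exists.
        {"$match": {"albums": {"$elemMatch": {"release_date": period}}}},
        # One document per album, then keep only albums inside the period.
        {"$unwind": "$albums"},
        {"$match": {"albums.release_date": period}},
        # One document per (album, genre) pair, then sum sales per genre.
        {"$unwind": "$genres"},
        {"$group": {"_id": "$genres", "total_sales": {"$sum": "$albums.sales"}}},
        # Keep only the genre with the highest total.
        {"$sort": {"total_sales": -1}},
        {"$limit": 1},
    ]


pipeline = top_genre_pipeline(datetime(1990, 1, 1), datetime(2000, 1, 1))
```

With a live collection, such a pipeline could be run via `col.aggregate(pipeline)`. Whether the actual queries in `mongo_queries.py` use the aggregation framework is not stated in the source.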

## Database Optimization

The following optimization steps are defined for the MongoDB database:

1. **No optimization**
2. **Optimization by adding indexes on the fields `albums.release_date` and `genres`**
3. **Optimization by precomputing the sum of sales in the 1990s and 2010s and storing the sums in additional fields of the relevant documents**
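
As a rough illustration of step 2, the two indexes could be created with pymongo's `Collection.create_index`. This is only a sketch: the function name `create_indexes_sketch` is hypothetical, and the actual `create_indexes` in `mongo_optimization.py` may be implemented differently.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def create_indexes_sketch(col):
    """Create single-field indexes on the two fields named in step 2.

    `col` is expected to be a pymongo Collection, or any object that
    exposes a compatible `create_index` method.
    Returns the index names reported by `create_index`.
    """
    names = []
    # Index on the release date inside the albums field
    # (MongoDB makes it a multikey index if albums is an array).
    names.append(col.create_index([("albums.release_date", ASCENDING)]))
    # Index on the genres field (multikey if genres is an array).
    names.append(col.create_index([("genres", ASCENDING)]))
    return names
```

Calling `create_index` for an index that already exists with the same specification is a no-op in MongoDB, so a function like this can be run repeatedly.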

To optimize the database, the cell below creates indexes on the fields `albums.release_date` and `genres`, precomputes the sums of sales in the 1990s and 2010s, and stores them in additional fields of the relevant documents:

In [11]:
with pymongo.MongoClient('mongodb://localhost:27017/') as client:
    col = client.musicians.bands
    create_indexes(col)
    create_additional_fields(col)

Inserted 3317 of 3317 fields in database
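
The precomputation step can be pictured as follows. This is a minimal sketch under assumptions, not the code of `create_additional_fields`: the album field `sales` and the new field names `sales_90s` and `sales_10s` are hypothetical, and the helper works on a plain dictionary rather than on the database.

```python
from datetime import datetime


def decade_sales(albums, start, end):
    """Sum the sales of all albums released within [start, end)."""
    return sum(
        album.get("sales", 0)
        for album in albums
        if start <= album["release_date"] < end
    )


def build_precomputed_fields(band):
    """Return the per-decade sales sums to store on one band document."""
    albums = band.get("albums", [])
    fields = {
        "sales_90s": decade_sales(albums, datetime(1990, 1, 1), datetime(2000, 1, 1)),
        "sales_10s": decade_sales(albums, datetime(2010, 1, 1), datetime(2020, 1, 1)),
    }
    # Keep only non-zero sums, so that fields are added to relevant documents only.
    return {name: total for name, total in fields.items() if total}


# Example band document with hypothetical sales figures.
band = {
    "albums": [
        {"release_date": datetime(1994, 4, 8), "sales": 100},
        {"release_date": datetime(2012, 6, 26), "sales": 30},
    ]
}
payload = build_precomputed_fields(band)
```

With a real collection, a non-empty payload could be written back with `col.update_one({"_id": band["_id"]}, {"$set": payload})`. An optimized query can then read the stored sums instead of aggregating over all albums on every run.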

Each optimization step is measured by running query 1 `n_runs` times and averaging the runtimes:

In [8]:
n_runs = 10

# Step 1: drop the indexes and measure the runtime of the unoptimized query
with pymongo.MongoClient('mongodb://localhost:27017/') as client:
    col = client.musicians.bands
    col.drop_indexes()
runtime_not_optimized = measure_query_runtime(query_bands, n_runs)
print('Runtime not optimized: {}'.format(runtime_not_optimized))

# Step 2: recreate the indexes on albums.release_date and genres, then measure again
with pymongo.MongoClient('mongodb://localhost:27017/') as client:
    col = client.musicians.bands
    create_indexes(col)
runtime_with_indexes = measure_query_runtime(query_bands, n_runs)
print('Runtime with indexes: {}'.format(runtime_with_indexes))

# Step 3: measure the runtime of the optimized query (indexes are still in place)
runtime_optimized = measure_query_runtime(query_bands_optimized, n_runs)
print('Runtime optimized: {}'.format(runtime_optimized))

Runtime not optimized: 0.18725986480712892
Runtime with indexes: 0.16331088542938232
Runtime optimized: 0.10087838172912597
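
The averaging described above, and a comparison of the three reported averages, can be sketched as follows. The helper names `average_runtime` and `relative_reduction` are hypothetical and are not taken from `mongo_optimization_measurements.py`. The sketch assumes the query is passed as a callable without arguments. The source does not state the unit of the reported runtimes; the relative reduction does not depend on it.

```python
import time


def average_runtime(query_fn, n_runs):
    """Call `query_fn` n_runs times and return the mean wall-clock time in seconds."""
    if n_runs <= 0:
        raise ValueError("n_runs must be positive")
    durations = []
    for _ in range(n_runs):
        start = time.perf_counter()
        query_fn()
        durations.append(time.perf_counter() - start)
    return sum(durations) / n_runs


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# Averages reported in the output above.
not_optimized = 0.18725986480712892
with_indexes = 0.16331088542938232
optimized = 0.10087838172912597

print(f"With indexes: {relative_reduction(not_optimized, with_indexes):.1%} lower")  # 12.8% lower
print(f"Optimized:    {relative_reduction(not_optimized, optimized):.1%} lower")     # 46.1% lower
```

For this single measurement of 10 runs each, the indexes reduce the average runtime by about 13%, and the optimized query reduces it by about 46% relative to the unoptimized baseline.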
