#### MongoBase starting guide

In [263]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import sys
import time
import threading
import multiprocessing
import datetime as dt
sys.path.append('../')
from models.mongobase import MongoBase, db_context
from bson import ObjectId

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


#### ObjectId

First, let's talk about ObjectId.

In [264]:
x = ObjectId()
time.sleep(1)
y = ObjectId()
time.sleep(1)
z = ObjectId()

In [265]:
x

ObjectId('5c80c7c516fa0d6c102c3270')

In [266]:
str(x)

'5c80c7c516fa0d6c102c3270'

In [267]:
x.generation_time

datetime.datetime(2019, 3, 7, 7, 27, 1, tzinfo=<bson.tz_util.FixedOffset object at 0x7f32c5282400>)

In [268]:
y.generation_time

datetime.datetime(2019, 3, 7, 7, 27, 2, tzinfo=<bson.tz_util.FixedOffset object at 0x7f32c5282400>)

In [269]:
x < y and y < z

True

ObjectId is useful: it is unique, sortable by creation time, and compact (12 bytes).

http://api.mongodb.com/python/current/api/bson/objectid.html

>An ObjectId is a 12-byte unique identifier consisting of:

>a 4-byte value representing the seconds since the Unix epoch,  
>a 3-byte machine identifier,  
>a 2-byte process id, and  
>a 3-byte counter, starting with a random value.  
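
The layout quoted above can be checked by hand. The following is a minimal sketch that uses only the standard library and assumes the leading 4 bytes are a big-endian count of seconds since the Unix epoch; `objectid_timestamp` is a hypothetical helper, not part of `bson`.

```python
import datetime as dt


def objectid_timestamp(hex_id: str) -> dt.datetime:
    """Decode the creation time stored in the first 4 bytes of an ObjectId."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError('an ObjectId is exactly 12 bytes (24 hex characters)')
    # The leading 4 bytes hold the seconds since the Unix epoch (big-endian).
    seconds = int.from_bytes(raw[:4], 'big')
    return dt.datetime.fromtimestamp(seconds, tz=dt.timezone.utc)


created = objectid_timestamp('5c80c7c516fa0d6c102c3270')
print(created)  # 2019-03-07 07:27:01+00:00
```

The result agrees with the `generation_time` printed for `x` above. Because the timestamp comes first, comparing the hex strings also orders ids by creation second.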

ObjectId is also fast to insert and index, and the resulting index is small. The benchmark below compares it with GUIDs.

https://github.com/Restuta/mongo.Guid-vs-ObjectId-performance

#### Define a database model

Now let's define a simple test collection with MongoBase.

In [270]:
class Bird(MongoBase):
    __collection__ = 'birds'
    __structure__ = {
        '_id': ObjectId,
        'name': str,
        'age': int,
        'is_able_to_fly': bool,
        'created': dt.datetime,
        'updated': dt.datetime
    }
    __required_fields__ = ['_id', 'name']
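    # Note: these defaults are evaluated once, when the class is defined,
    # so instances that rely on them share the same values.
    __default_values__ = {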
        '_id': ObjectId(),
        'is_able_to_fly': False,
        'created': dt.datetime.now(dt.timezone.utc),
        'updated': dt.datetime.now(dt.timezone.utc)
    }
    __validators__ = {}
    __indexed_keys__ = {}

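The `__structure__` attribute defines the model: it maps each field name to its expected type. `__required_fields__` lists the fields that must be present, and `__default_values__` fills in fields that are omitted.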
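
To make the roles of these attributes concrete, here is a minimal sketch of how a structure, a list of required fields and a set of defaults could be applied to a plain dict. It is not MongoBase's implementation; `build_document` is a hypothetical helper, and accepting callables as defaults is an assumption made here so that each document gets a fresh value.

```python
import datetime as dt


def build_document(data, structure, required, defaults):
    """Apply defaults, then check required fields and field types."""
    doc = dict(data)
    for key, default in defaults.items():
        if key not in doc:
            # Call factories so that each document gets a fresh value.
            doc[key] = default() if callable(default) else default
    missing = [key for key in required if key not in doc]
    if missing:
        raise ValueError(f'missing required fields: {missing}')
    for key, value in doc.items():
        if key not in structure:
            raise KeyError(f'unknown field: {key}')
        if not isinstance(value, structure[key]):
            raise TypeError(f'{key} must be {structure[key].__name__}')
    return doc


structure = {'name': str, 'age': int, 'is_able_to_fly': bool, 'created': dt.datetime}
required = ['name']
defaults = {
    'is_able_to_fly': False,
    'created': lambda: dt.datetime.now(dt.timezone.utc),  # evaluated per document
}

bird = build_document({'name': 'chicken', 'age': 3}, structure, required, defaults)
```

Passing a factory such as the `lambda` above is one way to avoid sharing a single timestamp between all instances.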


#### Basic instructions (insert, update, find, remove)

Let's try the basic instructions: insert, update, find and remove.

First, let's create an instance to store.

In [271]:
chicken = Bird({'_id': ObjectId(), 'name': 'chicken', 'age': 3})

In [272]:
chicken

{'_id': ObjectId('5c80c7c716fa0d6c102c3274'),
 'name': 'chicken',
 'age': 3,
 'is_able_to_fly': False,
 'created': datetime.datetime(2019, 3, 7, 7, 27, 3, 190594, tzinfo=datetime.timezone.utc),
 'updated': datetime.datetime(2019, 3, 7, 7, 27, 3, 190605, tzinfo=datetime.timezone.utc)}

In [273]:
chicken._id.generation_time

datetime.datetime(2019, 3, 7, 7, 27, 3, tzinfo=<bson.tz_util.FixedOffset object at 0x7f32c5282400>)

Good chicken. Let's save it while it is fresh.

In [274]:
chicken.save()

{'_id': ObjectId('5c80c7c716fa0d6c102c3274'),
 'name': 'chicken',
 'age': 3,
 'is_able_to_fly': False,
 'created': datetime.datetime(2019, 3, 7, 7, 27, 3, 190594, tzinfo=datetime.timezone.utc),
 'updated': datetime.datetime(2019, 3, 7, 7, 27, 3, 190605, tzinfo=datetime.timezone.utc)}

By default, a chicken is considered unable to fly. We can change that with an update.

In [275]:
chicken.is_able_to_fly

False

In [276]:
chicken.is_able_to_fly = True
chicken.update()

{'_id': ObjectId('5c80c7c716fa0d6c102c3274'),
 'name': 'chicken',
 'age': 3,
 'is_able_to_fly': True,
 'created': datetime.datetime(2019, 3, 7, 7, 27, 3, 190594, tzinfo=datetime.timezone.utc),
 'updated': datetime.datetime(2019, 3, 7, 7, 27, 3, 190605, tzinfo=datetime.timezone.utc)}

You should now see `'is_able_to_fly': True`.  

There are several ways to let a chicken grow up, that is, to update its age.

In [277]:
chicken.age = 5
chicken = chicken.update()
assert chicken.age == 5, 'something wrong on update()'
chicken = Bird.findAndUpdateById(chicken._id, {'age': 6})
assert chicken.age == 6, 'something wrong on findAndUpdateById()'

Next, let's try the find methods.

In [278]:
mother_chicken = Bird({'_id': ObjectId(), 'name': 'mother chicken', 'age': 63})
mother_chicken.save()

{'_id': ObjectId('5c80c7c716fa0d6c102c3275'),
 'name': 'mother chicken',
 'age': 63,
 'is_able_to_fly': False,
 'created': datetime.datetime(2019, 3, 7, 7, 27, 3, 190594, tzinfo=datetime.timezone.utc),
 'updated': datetime.datetime(2019, 3, 7, 7, 27, 3, 190605, tzinfo=datetime.timezone.utc)}

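Now we can retrieve a document with that name from the database. Note that the query is by `name`, so if earlier runs left other mother chickens behind, the returned `_id` can differ from the one just saved, as in the output below.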

In [279]:
Bird.findOne({'name': 'mother chicken'})

{'_id': ObjectId('5c79166716fa0d215968d3ba'),
 'name': 'mother chicken',
 'age': 63,
 'is_able_to_fly': False,
 'created': datetime.datetime(2019, 3, 1, 11, 20, 21, 306000),
 'updated': datetime.datetime(2019, 3, 1, 11, 20, 21, 306000)}

It is a mother chicken, isn't it? Great. Let's clear (eat) the one we just saved.

In [280]:
mother_chicken.remove()

1

In [281]:
if not Bird.findOne({'_id': mother_chicken._id}):
    print('Yes. The mother chicken was not found. Someone might have eaten it.')

Yes. The mother chicken was not found. Someone might have eaten it.


Now let's get all the chickens we have stored so far.

In [282]:
all_chickens = Bird.find({'name': 'chicken'}, sort=[('_id', 1)])

In [283]:
len(all_chickens)

17

Or we can count them directly with the `count()` method.

In [284]:
Bird.count({'name': 'chicken'})

17

Let's check that the latest chicken is the one we just saved.

In [285]:
all_chickens[-1]._id.generation_time == chicken._id.generation_time

True

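That should be `True`. Note that `generation_time` has one-second resolution, so comparing the `_id` values directly is a stricter check.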

#### Contextual database

MongoBase automatically creates a MongoDB client for each process.  
In some cases, though, instances must be read from or written to a different client or database.  
Inside a `db_context` block, operations that receive the yielded `db` use that designated database.  
Let's try it.

In [286]:
with db_context(db_uri='localhost', db_name='test') as db:
    print(db)
    flamingo = Bird({'_id': ObjectId(), 'name': 'flamingo', 'age': 20})
    flamingo.save(db=db)
    
    flamingo.age = 23
    flamingo = flamingo.update(db=db)
    assert flamingo.age == 23, 'something wrong on update()'
    flamingo = Bird.findAndUpdateById(flamingo._id, {'age': 24}, db=db)
    assert flamingo.age == 24, 'something wrong on findAndUpdateById()'
    
    n_flamingo = Bird.count({'name': 'flamingo'}, db=db)
    print(f'{n_flamingo} flamingo found in the test database.')

n_flamingo = Bird.count({'name': 'flamingo'})
print(f'{n_flamingo} flamingo found in the default database.')
assert n_flamingo == 0

Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, connecttimeoutms=10000, serverselectiontimeoutms=10000, sockettimeoutms=10000, socketkeepalive=True, maxidletimems=10000, maxpoolsize=200, minpoolsize=10, waitqueuemultiple=12, waitqueuetimeoutms=100), 'test')
32 flamingo found in the test database.
0 flamingo found in the default database.
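
For readers curious how such a context could work, here is a minimal sketch built on `contextlib.contextmanager`. It is not the actual `db_context` implementation: `FakeClient`, `db_context_sketch` and `tracking_factory` are hypothetical names, and closing the client on exit is an assumption.

```python
from contextlib import contextmanager


class FakeClient:
    """Stand-in for a database client; records whether it was closed."""

    def __init__(self, uri):
        self.uri = uri
        self.closed = False

    def __getitem__(self, db_name):
        # A real client would return a database handle here.
        return (self.uri, db_name)

    def close(self):
        self.closed = True


@contextmanager
def db_context_sketch(db_uri, db_name, client_factory=FakeClient):
    """Yield a handle to the designated database, then release the client."""
    client = client_factory(db_uri)
    try:
        yield client[db_name]
    finally:
        client.close()  # runs even if the body of the with-block raises


created = []


def tracking_factory(uri):
    """Remember each client so its state can be inspected afterwards."""
    client = FakeClient(uri)
    created.append(client)
    return client


with db_context_sketch('localhost', 'test', tracking_factory) as db:
    inside = (db, created[0].closed)
```

Inside the block the handle points at the designated database, and the client is released as soon as the block ends.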


#### Bulk Operation

Running many insert operations one at a time is expensive. Fortunately, MongoDB provides an operation called "bulk write",  
which inserts many documents in a single operation.

Bulk Insert

In [313]:
many_pigeon = []
for i in range(10000):
    many_pigeon.append(Bird({'_id': ObjectId(), 'name': 'pigeon', 'age': i}))
print(many_pigeon[1])

{'_id': ObjectId('5c80c93b16fa0d6c102cab92'), 'name': 'pigeon', 'age': 1, 'is_able_to_fly': False, 'created': datetime.datetime(2019, 3, 7, 7, 27, 3, 190594, tzinfo=datetime.timezone.utc), 'updated': datetime.datetime(2019, 3, 7, 7, 27, 3, 190605, tzinfo=datetime.timezone.utc)}


In [314]:
%%time
Bird.bulk_insert(many_pigeon)

CPU times: user 244 ms, sys: 12 ms, total: 256 ms
Wall time: 295 ms


10000

In [315]:
Bird.count({'name': 'pigeon'})

10000

Bulk Update

In [316]:
updates = []
for pigeon in many_pigeon:
    pigeon.age *= 3  # triple each pigeon's age in memory
    updates.append(pigeon)

In [317]:
%%time
print(len(updates))
Bird.bulk_update(updates)

10000
UpdateOne({'_id': ObjectId('5c80c93b16fa0d6c102cab91')}, {'$set': {'name': 'pigeon', 'age': 0, 'is_able_to_fly': False, 'created': datetime.datetime(2019, 3, 7, 7, 27, 3, 190594, tzinfo=datetime.timezone.utc), 'updated': datetime.datetime(2019, 3, 7, 7, 33, 16, 491090, tzinfo=datetime.timezone.utc)}}, False)
CPU times: user 600 ms, sys: 24 ms, total: 624 ms
Wall time: 1.11 s
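
The `UpdateOne(...)` printed above suggests that each document is turned into a filter on `_id` plus a `$set` of the remaining fields. The sketch below reproduces that split with plain dicts; `to_update_spec` is a hypothetical helper and not necessarily how `bulk_update` is written.

```python
def to_update_spec(doc, id_field='_id'):
    """Split a document into a filter on its id and a $set of the other fields."""
    if id_field not in doc:
        raise KeyError(f'document has no {id_field!r} field')
    query = {id_field: doc[id_field]}
    changes = {key: value for key, value in doc.items() if key != id_field}
    return query, {'$set': changes}


# A small integer stands in for an ObjectId to keep the example self-contained.
query, update = to_update_spec({'_id': 1, 'name': 'pigeon', 'age': 0})
print(query)   # {'_id': 1}
print(update)  # {'$set': {'name': 'pigeon', 'age': 0}}
```

With PyMongo, each such pair could be wrapped in `UpdateOne(query, update)` and the whole list sent with a single `bulk_write` call on the collection.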


Check that all ages were updated.

In [319]:
%%time
for i, pigeon in enumerate(many_pigeon):
    check = Bird.findOne({'_id': pigeon._id})
    assert check.age == i*3

CPU times: user 3.21 s, sys: 344 ms, total: 3.56 s
Wall time: 4.81 s


No error? Cool.

In [320]:
Bird.delete({'name': 'pigeon'})

10000

#### Multi Threading and Processing

In [294]:
def breed(i):
    """Save a sparrow, then age it by one year with an update."""
    try:
        sparrow = Bird({'_id': ObjectId(), 'name': 'sparrow', 'age': 0})
        sparrow.save()
        sparrow.age += 1
        sparrow.update()
    except Exception as e:
        print(f'Exception occurred. {e} in thread {threading.current_thread()}')
    else:
        print(f'{i} saved in thread {threading.current_thread()}.')

Threading (using the same memory space)

>The threading module uses threads, the multiprocessing module uses processes. The difference is that threads run in the same memory space, while processes have separate memory. This makes it a bit harder to share objects between processes with multiprocessing. Since threads use the same memory, precautions have to be taken or two threads will write to the same memory at the same time. This is what the global interpreter lock is for.

https://stackoverflow.com/questions/3044580/multiprocessing-vs-threading-python
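
Because threads share memory, shared state needs a lock, and the main thread should wait for the workers before it cleans up. The following is a minimal sketch using only the standard library; `breed_stub` is a hypothetical stand-in for `breed` and touches no database, and the pool size of 8 is an arbitrary choice.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

state = {'saved': 0}
state_lock = threading.Lock()


def breed_stub(i):
    """Pretend to save one sparrow; update the shared counter under a lock."""
    with state_lock:
        state['saved'] += 1
    return i


# A bounded pool reuses a few threads instead of starting 1000 of them.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(breed_stub, range(1000)))

# Leaving the with-block waits for every task, so cleanup is safe from here on.
print(state['saved'])  # 1000
```

`pool.map` returns results in input order even though the tasks themselves may finish in any order.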

In [295]:
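%%time
threads = []
for i in range(1000):
    t = threading.Thread(target=breed, name=f'breed sparrow {i}', args=(i,))
    t.start()
    threads.append(t)
for t in threads:
    t.join()  # wait for every sparrow to be saved before deleting them

Bird.delete({'name': 'sparrow'})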

0 saved in thread <Thread(breed sparrow 0, started 139852918023936)>.
3 saved in thread <Thread(breed sparrow 3, started 139853274806016)>.
2 saved in thread <Thread(breed sparrow 2, started 139853312554752)>.
4 saved in thread <Thread(breed sparrow 4, started 139852389545728)>.
1 saved in thread <Thread(breed sparrow 1, started 139853891618560)>.
5 saved in thread <Thread(breed sparrow 5, started 139852381153024)>.
6 saved in thread <Thread(breed sparrow 6, started 139852372760320)>.
8 saved in thread <Thread(breed sparrow 8, started 139852918023936)>.
7 saved in thread <Thread(breed sparrow 7, started 139852364367616)>.
9 saved in thread <Thread(breed sparrow 9, started 139853274806016)>.
11 saved in thread <Thread(breed sparrow 11, started 139853891618560)>.
10 saved in thread <Thread(breed sparrow 10, started 139852389545728)>.
13 saved in thread <Thread(breed sparrow 13, started 139852372760320)>.
15 saved in thread <Thread(breed sparrow 15, started 139853312554752)>.
...
999 saved in thread <Thread(breed sparrow 999, started 139852919338752)>.
998 saved in thread <Thread(breed sparrow 998, started 139853312554752)>.
994 saved in thread <Thread(breed sparrow 994, started 139852372760320)>.
996 saved in thread <Thread(breed sparrow 996, started 139853274806016)>.
997 saved in thread <Thread(breed sparrow 997, started 139852255328000)>.
CPU times: user 1.57 s, sys: 392 ms, total: 1.96 s
Wall time: 1.69 s


Multiprocessing (using separate memory for each process)

>PyMongo is not fork-safe. Care must be taken when using instances of MongoClient with fork(). Specifically, instances of MongoClient must not be copied from a parent process to a child process. Instead, the parent process and each child process must create their own instances of MongoClient. Instances of MongoClient copied from the parent process have a high probability of deadlock in the child process due to the inherent incompatibilities between fork(), threads, and locks described below. PyMongo will attempt to issue a warning if there is a chance of this deadlock occurring.
http://api.mongodb.com/python/current/faq.html#pymongo-fork-safe

In [296]:
def breed2(tasks):
    db = Bird._db()  # create a MongoDB client for the forked process
    try:
        for _ in tasks:  # save and update one sparrow per task
            sparrow = Bird({'_id': ObjectId(), 'name': 'sparrow', 'age': 0})
            sparrow.save(db=db)
            sparrow.age += 1
            sparrow.update(db=db)
    except Exception as e:
        print(f'Exception occurred. {e} in process {multiprocessing.current_process()}')
    else:
        print(f'{len(tasks)} sparrow saved in process {multiprocessing.current_process()}.')

In [297]:
%%time
print(f'{multiprocessing.cpu_count()} cpu resources found.')
tasks = [[f'sparrow {i}' for i in range(250)] for j in range(4)]
with multiprocessing.Pool(4) as process_pool:  # the pool is shut down on exit
    process_pool.map(breed2, tasks)

40 cpu resources found.
250 sparrow saved in process <ForkProcess(ForkPoolWorker-28, started daemon)>.
250 sparrow saved in process <ForkProcess(ForkPoolWorker-26, started daemon)>.
250 sparrow saved in process <ForkProcess(ForkPoolWorker-27, started daemon)>.
250 sparrow saved in process <ForkProcess(ForkPoolWorker-25, started daemon)>.
CPU times: user 16 ms, sys: 28 ms, total: 44 ms
Wall time: 373 ms
