#### MongoBase starting guide

In [10]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import sys
import time
import threading
import multiprocessing
import datetime as dt
from mongobase.mongobase import MongoBase, db_context
from bson import ObjectId



#### ObjectId

First, let's talk about ObjectId.

In [11]:
x = ObjectId()
time.sleep(1)
y = ObjectId()
time.sleep(1)
z = ObjectId()

In [12]:
x

ObjectId('5c821fd0b520d9a7d9820a5d')

In [13]:
str(x)

'5c821fd0b520d9a7d9820a5d'

In [14]:
x.generation_time

datetime.datetime(2019, 3, 8, 7, 54, 56, tzinfo=<bson.tz_util.FixedOffset object at 0x113e38550>)

In [15]:
y.generation_time

datetime.datetime(2019, 3, 8, 7, 54, 57, tzinfo=<bson.tz_util.FixedOffset object at 0x113e38550>)

In [16]:
x < y and y < z

True

ObjectId is useful: it is unique, sortable, and memory efficient.

http://api.mongodb.com/python/current/api/bson/objectid.html

>An ObjectId is a 12-byte unique identifier consisting of:

>a 4-byte value representing the seconds since the Unix epoch,  
>a 3-byte machine identifier,  
>a 2-byte process id, and  
>a 3-byte counter, starting with a random value.  
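
The timestamp layout quoted above can be checked without a database. The following is a minimal sketch that uses only the standard library; the helper name `objectid_timestamp` is made up for illustration and is not part of `bson`. The hex strings are copied from cell outputs in this guide.

```python
import datetime as dt


def objectid_timestamp(hex_id: str) -> dt.datetime:
    """Decode the leading 4-byte timestamp of a 24-character ObjectId hex string."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError('an ObjectId is exactly 12 bytes')
    # The first four bytes are a big-endian count of seconds since the Unix epoch.
    seconds = int.from_bytes(raw[:4], byteorder='big')
    return dt.datetime.fromtimestamp(seconds, tz=dt.timezone.utc)


# Hex strings taken from the cell outputs in this guide.
first = '5c821fd0b520d9a7d9820a5d'
later = '5c821fd2b520d9a7d9820a61'

print(objectid_timestamp(first))   # 2019-03-08 07:54:56+00:00
print(first < later)               # True
```

Because the timestamp occupies the most significant bytes, comparing two ids also compares their creation times (to the second), which is why sorting by `_id` gives roughly chronological order.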

ObjectId is also fast to insert and index, and the resulting index is small.

https://github.com/Restuta/mongo.Guid-vs-ObjectId-performance

#### Define a database model

Now let's create a simple test collection with MongoBase.

In [17]:
class Bird(MongoBase):
    __collection__ = 'birds'
    __structure__ = {
        '_id': ObjectId,
        'name': str,
        'age': int,
        'is_able_to_fly': bool,
        'created': dt.datetime,
        'updated': dt.datetime
    }
    __required_fields__ = ['_id', 'name']
    __default_values__ = {
        '_id': ObjectId(),
        'is_able_to_fly': False,
        'created': dt.datetime.now(dt.timezone.utc),
        'updated': dt.datetime.now(dt.timezone.utc)
    }
    __validators__ = {}
    __indexed_keys__ = {}

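The `__structure__` part defines the fields of the model and their types. `__required_fields__` names the fields that must be present, and `__default_values__` supplies values for fields that are not given. Note that the default values are evaluated once, when the class is defined, so every `Bird` created below without an explicit `created` or `updated` value shares the same timestamps.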
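
To make the role of a structure definition concrete, here is a minimal sketch of a type check driven by a field-to-type mapping. MongoBase's own validation logic is not shown in this guide, so `check_document`, `STRUCTURE` and `REQUIRED` are hypothetical names used only for illustration.

```python
import datetime as dt

# Hypothetical stand-ins for the class attributes shown above.
STRUCTURE = {'name': str, 'age': int, 'created': dt.datetime}
REQUIRED = ['name']


def check_document(doc: dict, structure: dict, required: list) -> list:
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    for field in required:
        if field not in doc:
            problems.append(f'missing required field: {field}')
    for field, value in doc.items():
        expected = structure.get(field)
        if expected is None:
            problems.append(f'unknown field: {field}')
        elif not isinstance(value, expected):
            problems.append(
                f'{field} should be {expected.__name__}, got {type(value).__name__}'
            )
    return problems


print(check_document({'name': 'chicken', 'age': 3}, STRUCTURE, REQUIRED))  # []
print(check_document({'age': 'three'}, STRUCTURE, REQUIRED))
```

The second call reports both a missing `name` and a wrongly typed `age`, which is the kind of feedback a structure definition makes possible before anything reaches the database.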


#### Basic operations (insert, update, find, remove)

Let's try the basic operations: insert, update, find and remove.

First, let's create an instance to store.

In [18]:
chicken = Bird({'_id': ObjectId(), 'name': 'chicken', 'age': 3})

In [19]:
chicken

{'_id': ObjectId('5c821fd2b520d9a7d9820a61'),
 'name': 'chicken',
 'age': 3,
 'is_able_to_fly': False,
 'created': datetime.datetime(2019, 3, 8, 7, 54, 58, 601376, tzinfo=datetime.timezone.utc),
 'updated': datetime.datetime(2019, 3, 8, 7, 54, 58, 601380, tzinfo=datetime.timezone.utc)}

In [20]:
chicken._id.generation_time

datetime.datetime(2019, 3, 8, 7, 54, 58, tzinfo=<bson.tz_util.FixedOffset object at 0x113e38550>)

Good chicken. Let's save while it is fresh.

In [21]:
chicken.save()

{'_id': ObjectId('5c821fd2b520d9a7d9820a61'),
 'name': 'chicken',
 'age': 3,
 'is_able_to_fly': False,
 'created': datetime.datetime(2019, 3, 8, 7, 54, 58, 601376, tzinfo=datetime.timezone.utc),
 'updated': datetime.datetime(2019, 3, 8, 7, 54, 58, 601380, tzinfo=datetime.timezone.utc)}

Chickens are considered unable to fly by default. We can change that with an update.

In [22]:
chicken.is_able_to_fly

False

In [23]:
chicken.is_able_to_fly = True
chicken.update()

{'_id': ObjectId('5c821fd2b520d9a7d9820a61'),
 'name': 'chicken',
 'age': 3,
 'is_able_to_fly': True,
 'created': datetime.datetime(2019, 3, 8, 7, 54, 58, 601376, tzinfo=datetime.timezone.utc),
 'updated': datetime.datetime(2019, 3, 8, 7, 54, 58, 601380, tzinfo=datetime.timezone.utc)}

You should now see `'is_able_to_fly': True`.  

There is more than one way to make a chicken older: change the attribute and call `update()`, or call `findAndUpdateById()`.

In [24]:
chicken.age = 5
chicken = chicken.update()
assert chicken.age == 5, 'something wrong on update()'
chicken = Bird.findAndUpdateById(chicken._id, {'age': 6})
assert chicken.age == 6, 'something wrong on findAndUpdateById()'

Next, let's try the find methods.

In [25]:
mother_chicken = Bird({'_id': ObjectId(), 'name': 'mother chicken', 'age': 63})
mother_chicken.save()

{'_id': ObjectId('5c821fd2b520d9a7d9820a62'),
 'name': 'mother chicken',
 'age': 63,
 'is_able_to_fly': False,
 'created': datetime.datetime(2019, 3, 8, 7, 54, 58, 601376, tzinfo=datetime.timezone.utc),
 'updated': datetime.datetime(2019, 3, 8, 7, 54, 58, 601380, tzinfo=datetime.timezone.utc)}

Now we can retrieve the same document from the database.

In [26]:
Bird.findOne({'name': 'mother chicken'})

{'_id': ObjectId('5c821fd2b520d9a7d9820a62'),
 'name': 'mother chicken',
 'age': 63,
 'is_able_to_fly': False,
 'created': datetime.datetime(2019, 3, 8, 7, 54, 58, 601000),
 'updated': datetime.datetime(2019, 3, 8, 7, 54, 58, 601000)}

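It is the same chicken, isn't it? Great. The timestamps come back truncated to milliseconds and without `tzinfo`, because BSON datetimes have millisecond precision and this client is not timezone-aware. Let's clear (eat) it.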

In [27]:
mother_chicken.remove()

1

In [28]:
if not Bird.findOne({'_id': mother_chicken._id}):
    print('Yes. The mother chicken was not found. Someone might have eaten it.')

Yes. The mother chicken was not found. Someone might have eaten it.


Now let's fetch all the chickens we have stored so far.

In [29]:
all_chickens = Bird.find({'name': 'chicken'}, sort=[('_id', 1)])

In [30]:
len(all_chickens)

1

Alternatively, we can count them directly with the `count()` method.

In [31]:
Bird.count({'name': 'chicken'})

1

Let's check whether the last chicken in the list is the one we just saved.

In [32]:
all_chickens[-1]._id.generation_time == chicken._id.generation_time

True

The result should be `True`.

#### Contextual database

MongoBase automatically creates a MongoDB client for each process.  
In some cases, however, documents must be written to or read from a different client or database.  
Inside a `db_context` block you get a handle to the designated database, which you pass to each call via `db=`.  
Let's give it a try.

In [33]:
with db_context(db_uri='localhost', db_name='test') as db:
    print(db)
    flamingo = Bird({'_id': ObjectId(), 'name': 'flamingo', 'age': 20})
    flamingo.save(db=db)
    
    flamingo.age = 23
    flamingo = flamingo.update(db=db)
    assert flamingo.age == 23, 'something wrong on update()'
    flamingo = Bird.findAndUpdateById(flamingo._id, {'age': 24}, db=db)
    assert flamingo.age == 24, 'something wrong on findAndUpdateById()'
    
    n_flamingo = Bird.count({'name': 'flamingo'}, db=db)
    print(f'{n_flamingo} flamingo found in the test database.')

n_flamingo = Bird.count({'name': 'flamingo'})
print(f'{n_flamingo} flamingo found in the default database.')
assert n_flamingo == 0

Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, connecttimeoutms=3000, serverselectiontimeoutms=3000, sockettimeoutms=300000, socketkeepalive=True, maxidletimems=40000, maxpoolsize=200, minpoolsize=10, waitqueuemultiple=12, waitqueuetimeoutms=100), 'test')
1 flamingo found in the test database.
0 flamingo found in the default database.
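
The pattern above (an explicit handle inside the block, the default database outside it) can be sketched with a plain context manager. This is not how `db_context` is implemented; `fake_db_context`, `count` and `DEFAULT_DB` are hypothetical stand-ins that only show the control flow.

```python
import contextlib

DEFAULT_DB = 'default-db'   # placeholder for the per-process default database


@contextlib.contextmanager
def fake_db_context(db_name: str):
    """Yield a stand-in database handle and release it on exit."""
    handle = {'name': db_name, 'open': True}   # a real version would open a client here
    try:
        yield handle
    finally:
        handle['open'] = False                 # a real version would close the client here


def count(query: dict, db=None) -> str:
    """Report which database a call would use: the explicit one, else the default."""
    return db['name'] if db is not None else DEFAULT_DB


with fake_db_context('test') as db:
    print(count({'name': 'flamingo'}, db=db))   # test
print(count({'name': 'flamingo'}))              # default-db
```

The `try`/`finally` ensures the handle is released even if a call inside the block raises.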


#### Bulk Operation

Inserting many documents one at a time is expensive. Fortunately, MongoDB provides an operation called "bulk write",  
which inserts many documents in a single operation.

Bulk Insert

In [34]:
many_pigeon = []
for i in range(10000):
    many_pigeon.append(Bird({'_id': ObjectId(), 'name': 'pigeon', 'age': i}))
print(many_pigeon[1])

{'_id': ObjectId('5c821fd3b520d9a7d9820a66'), 'name': 'pigeon', 'age': 1, 'is_able_to_fly': False, 'created': datetime.datetime(2019, 3, 8, 7, 54, 58, 601376, tzinfo=datetime.timezone.utc), 'updated': datetime.datetime(2019, 3, 8, 7, 54, 58, 601380, tzinfo=datetime.timezone.utc)}


In [35]:
%%time
Bird.bulk_insert(many_pigeon)

CPU times: user 99.1 ms, sys: 3.27 ms, total: 102 ms
Wall time: 127 ms


10000

In [36]:
Bird.count({'name': 'pigeon'})

10000

Bulk Update

In [37]:
updates = []
for pigeon in many_pigeon:
    pigeon.age *= 3
    updates += [pigeon]

In [38]:
%%time
print(len(updates))
Bird.bulk_update(updates)

10000
UpdateOne({'_id': ObjectId('5c821fd3b520d9a7d9820a65')}, {'$set': {'name': 'pigeon', 'age': 0, 'is_able_to_fly': False, 'created': datetime.datetime(2019, 3, 8, 7, 54, 58, 601376, tzinfo=datetime.timezone.utc), 'updated': datetime.datetime(2019, 3, 8, 7, 54, 59, 398739, tzinfo=datetime.timezone.utc)}}, False, None, None)
CPU times: user 247 ms, sys: 5.24 ms, total: 252 ms
Wall time: 475 ms
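
The `UpdateOne(...)` line printed above suggests that each document becomes one operation that filters on `_id` and sets the remaining fields with `$set`. The sketch below mirrors that shape with plain tuples and dictionaries. How `bulk_update` actually builds its operations is an assumption here, and `build_set_updates` is a hypothetical helper with integer ids as placeholders.

```python
def build_set_updates(documents: list) -> list:
    """Turn each document into a (filter, update) pair that sets every field except `_id`."""
    operations = []
    for doc in documents:
        fields = {key: value for key, value in doc.items() if key != '_id'}
        operations.append(({'_id': doc['_id']}, {'$set': fields}))
    return operations


# Placeholder documents; real ones would carry ObjectId values in `_id`.
pigeons = [
    {'_id': 1, 'name': 'pigeon', 'age': 0},
    {'_id': 2, 'name': 'pigeon', 'age': 3},
]
for selector, update in build_set_updates(pigeons):
    print(selector, update)
```

Sending all of these operations in one request is what saves the per-document round trips.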


Check that all ages have been updated.

In [39]:
%%time
for i, pigeon in enumerate(many_pigeon):
    check = Bird.findOne({'_id': pigeon._id})
    assert check.age == i*3

CPU times: user 1.53 s, sys: 167 ms, total: 1.69 s
Wall time: 2.21 s


No error? Cool.

In [40]:
Bird.delete({'name': 'pigeon'})

10000

#### Multi Threading and Processing

In [41]:
def breed(i):
    try:
        sparrow = Bird({'_id': ObjectId(), 'name': 'sparrow', 'age': 0})
        sparrow.save()
        sparrow.age += 1
        sparrow.update()
    except Exception as e:
        print(f'Exception occurred. {e} in thread {threading.current_thread()}')
    else:
        print(f'{i} saved in thread {threading.current_thread()}.')

Threading (threads share the same memory space)

>The threading module uses threads, the multiprocessing module uses processes. The difference is that threads run in the same memory space, while processes have separate memory. This makes it a bit harder to share objects between processes with multiprocessing. Since threads use the same memory, precautions have to be taken or two threads will write to the same memory at the same time. This is what the global interpreter lock is for.

https://stackoverflow.com/questions/3044580/multiprocessing-vs-threading-python

In [42]:
%%time
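threads = []
for i in range(1000):
    t = threading.Thread(target=breed, name=f'breed sparrow {i}', args=(i,))
    t.start()
    threads.append(t)

# Wait for every thread to finish before cleaning up.
for t in threads:
    t.join()

Bird.delete({'name':'sparrow'})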

7 saved in thread <Thread(breed sparrow 7, started 123145615044608)>.
6 saved in thread <Thread(breed sparrow 6, started 123145609789440)>.
0 saved in thread <Thread(breed sparrow 0, started 123145573003264)>.
1 saved in thread <Thread(breed sparrow 1, started 123145583513600)>.
4 saved in thread <Thread(breed sparrow 4, started 123145599279104)>.
3 saved in thread <Thread(breed sparrow 3, started 123145594023936)>.
5 saved in thread <Thread(breed sparrow 5, started 123145604534272)>.
8 saved in thread <Thread(breed sparrow 8, started 123145620299776)>.
2 saved in thread <Thread(breed sparrow 2, started 123145588768768)>.
9 saved in thread <Thread(breed sparrow 9, started 123145625554944)>.
14 saved in thread <Thread(breed sparrow 14, started 123145573003264)>.
12 saved in thread <Thread(breed sparrow 12, started 123145641320448)>.
10 saved in thread <Thread(breed sparrow 10, started 123145630810112)>.
11 saved in thread <Thread(breed sparrow 11, started 123145636065280)>.
...
206 saved in thread <Thread(breed sparrow 206, started 123145620299776)>.
205 saved in thread <Thread(breed sparrow 205, started 123145609789440)>.
207 saved in thread <Thread(breed sparrow 207, started 123145594023936)>.
209 saved in thread <Thread(breed sparrow 209, started 123145625554944)>.
211 saved in thread <Thread(breed sparrow 211, started 123145588768768)>.
210 saved in thread <Thread(breed sparrow 210, started 123145573003264)>.
212 saved in thread <Thread(breed sparrow 212, started 123145615044608)>.
208 saved in thread <Thread(breed sparrow 208, started 123145583513600)>.
213 saved in thread <Thread(breed sparrow 213, started 123145604534272)>.
215 saved in thread <Thread(breed sparrow 215, started 123145609789440)>.
216 saved in thread <Thread(breed sparrow 216, started 123145594023936)>.
218 saved in thread <Thread(breed sparrow 218, started 123145588768768)>.
217 saved in thread <Thread(breed sparrow 217, started 123145573003264)>.
...
419 saved in thread <Thread(breed sparrow 419, started 123145594023936)>.
418 saved in thread <Thread(breed sparrow 418, started 123145573003264)>.
420 saved in thread <Thread(breed sparrow 420, started 123145599279104)>.
422 saved in thread <Thread(breed sparrow 422, started 123145615044608)>.
421 saved in thread <Thread(breed sparrow 421, started 123145604534272)>.
423 saved in thread <Thread(breed sparrow 423, started 123145620299776)>.
424 saved in thread <Thread(breed sparrow 424, started 123145583513600)>.
425 saved in thread <Thread(breed sparrow 425, started 123145588768768)>.
426 saved in thread <Thread(breed sparrow 426, started 123145609789440)>.
427 saved in thread <Thread(breed sparrow 427, started 123145625554944)>.
429 saved in thread <Thread(breed sparrow 429, started 123145599279104)>.
428 saved in thread <Thread(breed sparrow 428, started 123145630810112)>.
431 saved in thread <Thread(breed sparrow 431, started 123145594023936)>.
...
633 saved in thread <Thread(breed sparrow 633, started 123145573003264)>.
635 saved in thread <Thread(breed sparrow 635, started 123145615044608)>.
634 saved in thread <Thread(breed sparrow 634, started 123145583513600)>.
636 saved in thread <Thread(breed sparrow 636, started 123145620299776)>.
637 saved in thread <Thread(breed sparrow 637, started 123145625554944)>.
638 saved in thread <Thread(breed sparrow 638, started 123145588768768)>.
640 saved in thread <Thread(breed sparrow 640, started 123145604534272)>.
639 saved in thread <Thread(breed sparrow 639, started 123145599279104)>.
641 saved in thread <Thread(breed sparrow 641, started 123145594023936)>.
642 saved in thread <Thread(breed sparrow 642, started 123145573003264)>.
643 saved in thread <Thread(breed sparrow 643, started 123145583513600)>.
646 saved in thread <Thread(breed sparrow 646, started 123145588768768)>.
645 saved in thread <Thread(breed sparrow 645, started 123145615044608)>.
...
839 saved in thread <Thread(breed sparrow 839, started 123145573003264)>.
838 saved in thread <Thread(breed sparrow 838, started 123145620299776)>.
837 saved in thread <Thread(breed sparrow 837, started 123145615044608)>.
840 saved in thread <Thread(breed sparrow 840, started 123145583513600)>.
841 saved in thread <Thread(breed sparrow 841, started 123145588768768)>.
842 saved in thread <Thread(breed sparrow 842, started 123145594023936)>.
843 saved in thread <Thread(breed sparrow 843, started 123145599279104)>.
844 saved in thread <Thread(breed sparrow 844, started 123145604534272)>.
846 saved in thread <Thread(breed sparrow 846, started 123145573003264)>.
849 saved in thread <Thread(breed sparrow 849, started 123145588768768)>.
845 saved in thread <Thread(breed sparrow 845, started 123145609789440)>.
847 saved in thread <Thread(breed sparrow 847, started 123145615044608)>.
848 saved in thread <Thread(breed sparrow 848, started 123145583513600)>.
...
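
The cell above starts one thread per sparrow. A bounded pool is a common alternative when the number of tasks is large. The sketch below uses the standard library's `ThreadPoolExecutor`; `fake_breed` is a hypothetical stand-in for `breed` so the example runs without a database, and `max_workers=10` is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_breed(i: int) -> int:
    """Stand-in for breed(i): pretend to save a sparrow and return its index."""
    return i


# The pool reuses at most 10 threads; leaving the `with` block waits for all tasks.
with ThreadPoolExecutor(max_workers=10) as pool:
    finished = list(pool.map(fake_breed, range(1000)))

print(len(finished))   # 1000
```

`pool.map` returns results in submission order, so the caller does not have to track or join individual threads.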

Multiprocessing (each process has its own memory)

>PyMongo is not fork-safe. Care must be taken when using instances of MongoClient with fork(). Specifically, instances of MongoClient must not be copied from a parent process to a child process. Instead, the parent process and each child process must create their own instances of MongoClient. Instances of MongoClient copied from the parent process have a high probability of deadlock in the child process due to the inherent incompatibilities between fork(), threads, and locks described below. PyMongo will attempt to issue a warning if there is a chance of this deadlock occurring.

http://api.mongodb.com/python/current/faq.html#pymongo-fork-safe

In [43]:
def breed2(tasks):
    db = Bird._db()  # create a MongoDB client for the forked process
    try:
        for _ in tasks:
            sparrow = Bird({'_id': ObjectId(), 'name': 'sparrow', 'age': 0})
            sparrow.save(db=db)
            sparrow.age += 1
            sparrow.update(db=db)
    except Exception as e:
        print(f'Exception occurred. {e} in process {multiprocessing.current_process()}')
    else:
        print(f'{len(tasks)} sparrow saved in process {multiprocessing.current_process()}.')

In [44]:
%%time
print(f'{multiprocessing.cpu_count()} cpu resources found.')
tasks = [[f'sparrow {i}' for i in range(250)] for j in range(4)]
# The context manager shuts the pool down once map() has returned.
with multiprocessing.Pool(4) as process_pool:
    process_pool.map(breed2, tasks)

12 cpu resources found.
250 sparrow saved in process <ForkProcess(ForkPoolWorker-4, started daemon)>.
250 sparrow saved in process <ForkProcess(ForkPoolWorker-2, started daemon)>.
250 sparrow saved in process <ForkProcess(ForkPoolWorker-1, started daemon)>.
250 sparrow saved in process <ForkProcess(ForkPoolWorker-3, started daemon)>.
CPU times: user 11.9 ms, sys: 15.9 ms, total: 27.8 ms
Wall time: 286 ms
