# H5mongo

This database interface uses MongoDB as its backend, accessed through `pymongo`. Only metadata (or part of it) is stored in the database, not the raw data.

In [1]:
import pymongo
from pymongo import MongoClient

from h5rdmtoolbox import tutorial
import h5rdmtoolbox as h5tbx

from pprint import pprint

h5tbx.__version__

'0.1.3'

In [2]:
client = MongoClient()
client

MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)

In [3]:
db = client['h5database_notebook_tutorial']
collection = db['test']

Let's generate some test data

In [4]:
usernames = ('Allen', 'Mike', 'Ellen', 'Alliot')
companies = ('bikeCompany', 'shoeCompany', 'bikeCompany', 'shoeCompany')
filenames = []
# `companies` (plural) keeps the tuple from being overwritten by the loop variable `company`
for i, (username, company) in enumerate(zip(usernames, companies)):
    with h5tbx.H5File(h5tbx.generate_temporary_filename(), 'w') as h5:
        filenames.append(h5.hdf_filename)
        h5.attrs['username'] = username
        h5.attrs['company'] = company
        h5.attrs['meta'] = {'day': 'monday', 'iday': 0}
        g = h5.create_group('idgroup')
        g.attrs['id'] = i

Import the `mongo` module, which adds the accessor `mongo` to datasets and groups:

In [5]:
from h5rdmtoolbox.h5database import mongo

## Filling the database

The accessor object `mongo` can be applied to both groups and datasets. When applied to a group, two parameters deserve particular attention:
- `recursive`: Whether to write only the group itself or the group and everything below it.
- `flatten_tree`: Whether to flatten the tree structure. If True, there is one DB entry per group and dataset. If False, only one entry is written, as a nested dictionary.
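To make the difference between a nested and a flattened entry concrete, here is a minimal sketch in plain Python. It is not the toolbox's implementation: the helper `flatten_attribute_tree` and the `name` field are hypothetical, and nested dictionaries simply stand in for HDF5 groups.

```python
def flatten_attribute_tree(tree, path="/"):
    """Return one record per group from a nested attribute tree.

    Hypothetical helper for illustration only: nested dicts stand in for
    HDF5 groups, all other values for attributes of the current group.
    """
    record = {"name": path}  # "name" is an assumed field, not a toolbox field
    records = [record]
    for key, value in tree.items():
        if isinstance(value, dict):
            # A nested dict plays the role of a sub-group: recurse into it.
            child_path = path.rstrip("/") + "/" + key
            records.extend(flatten_attribute_tree(value, child_path))
        else:
            record[key] = value
    return records


# One nested entry (comparable to flatten_tree=False) ...
nested = {"username": "Allen", "company": "bikeCompany", "idgroup": {"id": 0}}
# ... becomes one record per group (comparable to flatten_tree=True).
flat = flatten_attribute_tree(nested)
```

In this sketch, `nested` would be stored as a single document, whereas `flat` holds two records, one for the root group and one for `idgroup`.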

We choose `recursive=True` and check how the DB entries differ between `flatten_tree=True` and `flatten_tree=False`.

In [6]:
collection.drop()  # delete all existing entries, if any
for fname in filenames:
    with h5tbx.H5File(fname) as h5:
        h5.mongo.insert(collection=collection, recursive=True, flatten_tree=False)



Let's inspect the first result:

### Find all:

In [7]:
res = collection.find({})
pprint(res.rewind()[0])

{'__h5rdmtoolbox_version__': '0.1.3',
 '__standard_name_table__': 'EmptyStandardizedNameTable-v0',
 '__wrcls__': 'H5File',
 '_id': ObjectId('62ff27af2a7fa06235ffd468'),
 'company': 'bikeCompany',
 'file_creation_time': datetime.datetime(2022, 8, 19, 6, 3, 27, 488000),
 'idgroup': {'id': 0},
 'meta': {'day': 'monday', 'iday': 0},
 'username': 'Allen'}


### Find inside a dictionary:

A key inside a nested dictionary can be used in the filter, too, by joining the keys with a dot:

In [8]:
for r in collection.find({'meta.day': 'monday'}):
    pprint(r)

{'__h5rdmtoolbox_version__': '0.1.3',
 '__standard_name_table__': 'EmptyStandardizedNameTable-v0',
 '__wrcls__': 'H5File',
 '_id': ObjectId('62ff27af2a7fa06235ffd468'),
 'company': 'bikeCompany',
 'file_creation_time': datetime.datetime(2022, 8, 19, 6, 3, 27, 488000),
 'idgroup': {'id': 0},
 'meta': {'day': 'monday', 'iday': 0},
 'username': 'Allen'}
{'__h5rdmtoolbox_version__': '0.1.3',
 '__standard_name_table__': 'EmptyStandardizedNameTable-v0',
 '__wrcls__': 'H5File',
 '_id': ObjectId('62ff27af2a7fa06235ffd469'),
 'company': 'shoeCompany',
 'file_creation_time': datetime.datetime(2022, 8, 19, 6, 3, 27, 498000),
 'idgroup': {'id': 1},
 'meta': {'day': 'monday', 'iday': 0},
 'username': 'Mike'}
{'__h5rdmtoolbox_version__': '0.1.3',
 '__standard_name_table__': 'EmptyStandardizedNameTable-v0',
 '__wrcls__': 'H5File',
 '_id': ObjectId('62ff27af2a7fa06235ffd46a'),
 'company': 'bikeCompany',
 'file_creation_time': datetime.datetime(2022, 8, 19, 6, 3, 27, 498000),
 'idgroup': {'id': 2},
 'm
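The dotted key in the query above walks through the nested dictionaries of each document. The following plain-Python sketch mimics that lookup for the simple case of nested dictionaries only. MongoDB's real matching rules also cover arrays and operators, and the helper `get_by_dotted_key` is hypothetical.

```python
def get_by_dotted_key(document, dotted_key):
    """Follow a dotted key such as 'meta.day' through nested dictionaries."""
    value = document
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # the path does not exist in this document
        value = value[part]
    return value


# Two documents shaped like the entries printed above (reduced to a few keys).
documents = [
    {"username": "Allen", "meta": {"day": "monday", "iday": 0}, "idgroup": {"id": 0}},
    {"username": "Mike", "meta": {"day": "monday", "iday": 0}, "idgroup": {"id": 1}},
]

# Emulates a filter such as {'idgroup.id': 1}
matches = [d["username"] for d in documents if get_by_dotted_key(d, "idgroup.id") == 1]
```

The same dot notation should work for any nested key, so a filter like `{'idgroup.id': 1}` passed to `collection.find` would be the database-side counterpart of the list comprehension.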

### Filter for datetime:
Each entry has a `document_last_modified` field (time when the document entry was last modified) and a `file_creation_time` field (time when the HDF file was created):

In [9]:
import datetime
d = datetime.datetime.utcnow() - datetime.timedelta(minutes=50)
print(d)
for r in collection.find({"document_last_modified": {"$gt": d}}).sort("username"):
    pprint(r)

2022-08-19 05:13:27.638861
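The time filter can be wrapped in a small helper, sketched below. The function name `modified_within` is hypothetical, and the sketch assumes that `document_last_modified` is stored as a naive UTC datetime, which matches the `tz_aware=False` default of the client shown earlier. It uses `datetime.now(timezone.utc)` because `utcnow()` is deprecated as of Python 3.12.

```python
import datetime


def modified_within(minutes, now=None):
    """Build a filter for documents modified during the last `minutes` minutes."""
    if now is None:
        # Timezone-aware "now", converted to naive UTC (assumed storage format).
        now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    cutoff = now - datetime.timedelta(minutes=minutes)
    return {"document_last_modified": {"$gt": cutoff}}


# A fixed reference time makes the result reproducible.
reference = datetime.datetime(2022, 8, 19, 6, 3, 27)
query = modified_within(50, now=reference)
```

The resulting dictionary can be passed to `collection.find` in place of the hand-written filter.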


### Get the number of found documents:

In [10]:
collection.count_documents({'company': "shoeCompany"})

2

Total number of documents:

In [11]:
collection.count_documents({})

4

### Get the `generation_time` of a document:

In [12]:
res.rewind()[0]['_id'].generation_time

datetime.datetime(2022, 8, 19, 6, 3, 27, tzinfo=<bson.tz_util.FixedOffset object at 0x000001EC123E0850>)
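The generation time comes from the `_id` itself: the first four bytes of an ObjectId hold the creation time as seconds since the Unix epoch. A minimal sketch that decodes it without any database access, using the hypothetical helper `objectid_generation_time` and the `_id` printed above:

```python
import datetime


def objectid_generation_time(hex_id):
    """Decode the creation time embedded in a 24-character ObjectId hex string."""
    seconds = int(hex_id[:8], 16)  # first 4 bytes: big-endian Unix timestamp
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)


generated = objectid_generation_time("62ff27af2a7fa06235ffd468")
```

The resolution is one second, which is why the value above has no microseconds, unlike `file_creation_time`.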

### Update a document:

In [13]:
query = {'username': 'Allen'}  # named `query` to avoid shadowing the built-in `filter`
newvalues = {"$set": {'username': 'me'}}

collection.update_one(query, newvalues)

<pymongo.results.UpdateResult at 0x1ec31c5f490>
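The returned `UpdateResult` exposes `matched_count` and `modified_count`, which tell whether the filter found a document and whether that document actually changed. The sketch below emulates this behaviour on a plain list of dictionaries. The function `emulate_update_one` is hypothetical, supports only equality filters and `$set`, and is not how pymongo works internally.

```python
def emulate_update_one(documents, query, new_values):
    """Apply a `$set` update to the first document matching `query`.

    Returns a tuple (matched_count, modified_count).
    """
    for doc in documents:
        if all(doc.get(key) == value for key, value in query.items()):
            # Only values that differ from the stored ones count as a modification.
            changes = {k: v for k, v in new_values["$set"].items() if doc.get(k) != v}
            doc.update(changes)
            return 1, int(bool(changes))
    return 0, 0  # no document matched the query


documents = [{"username": "Allen"}, {"username": "Mike"}]
update = {"$set": {"username": "me"}}

first = emulate_update_one(documents, {"username": "Allen"}, update)
# Running the same update again finds nothing, since 'Allen' no longer exists.
second = emulate_update_one(documents, {"username": "Allen"}, update)
```

Checking the two counts after `collection.update_one` is a cheap way to detect a filter that silently matched nothing.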