Improve Sync Data Structures #336

Merged
Changes from 55 commits
58 commits
2db2f68
Added SyncedCollection, SyncedDict, SyncedList Classes.
vishav1771 Jun 2, 2020
0034669
Added JsonCollection
vishav1771 Jun 4, 2020
d1dbb45
Added reset for SyncedDict and SyncedList. Implemented JsonDict
vishav1771 Jun 6, 2020
5494f4f
Removed _load and _sync
vishav1771 Jun 6, 2020
99b9dbe
Added mypy ignore
vishav1771 Jun 6, 2020
4385161
Added _check_methods defination
vishav1771 Jun 6, 2020
698ac95
Added parent hook and implented _dfs_update as reset
vishav1771 Jun 10, 2020
c3fc1c3
Changes
vishav1771 Jun 13, 2020
b2cf492
Changes
vishav1771 Jun 17, 2020
c88a9c4
added safe_sync
vishav1771 Jun 18, 2020
1f6068b
Updated test_synced_attrdict for JSONDict
vishav1771 Jun 19, 2020
e3af863
Updated _validate_key
vishav1771 Jun 21, 2020
a929172
Revert Changes in test_synced_attrdict
vishav1771 Jun 23, 2020
73f3d5b
Updated list reset
vishav1771 Jun 24, 2020
32714b5
removed __isinstancecheck__
vishav1771 Jun 25, 2020
d32293c
Apply suggestions from code review
vishav1771 Jun 27, 2020
3b37c75
Suggested Chnages
vishav1771 Jun 27, 2020
8c5bee5
added _update
vishav1771 Jun 29, 2020
9c81919
Added metaclass
vishav1771 Jul 1, 2020
7b973a3
Removed test.json
vishav1771 Jul 1, 2020
e6f5add
Added mypy ignore
vishav1771 Jul 1, 2020
17cf07f
Added Test
vishav1771 Jul 3, 2020
8c75207
Added Test
vishav1771 Jul 3, 2020
0b4f6c1
Added test_remove
vishav1771 Jul 3, 2020
e839fcc
Minor Changes
vishav1771 Jul 5, 2020
6dac397
Add dependabot (#341)
Jul 7, 2020
130d6df
Bump pandas from 0.25.3 to 1.0.5 (#353)
dependabot[bot] Jul 7, 2020
3f48b61
Bump gitdb2 from 3.0.2 to 4.0.2 (#352)
dependabot[bot] Jul 7, 2020
ab1e630
Bump coverage from 5.0.3 to 5.2 (#351)
dependabot[bot] Jul 7, 2020
d87ca4b
Bump gitpython from 3.0.8 to 3.1.3 (#350)
dependabot[bot] Jul 7, 2020
534a9e1
Update dependabot.yml
Jul 7, 2020
51c3110
Apply suggestions from code review
vishav1771 Jul 7, 2020
96d7528
Bump numpy from 1.18 to 1.19.0 (#349)
dependabot[bot] Jul 7, 2020
64bef8f
Bump pytest-cov from 2.8.1 to 2.10.0 (#345)
dependabot[bot] Jul 7, 2020
ef50cd4
Bump psutil from 5.6.7 to 5.7.0 (#346)
dependabot[bot] Jul 7, 2020
c326f8f
Bump pytest-subtests from 0.3.0 to 0.3.1 (#347)
dependabot[bot] Jul 7, 2020
184613f
udated docstring and test
vishav1771 Jul 7, 2020
c0d37c4
udated docstring
vishav1771 Jul 7, 2020
6418493
Merge branch 'master' into syncedCollection
vishav1771 Jul 7, 2020
c1a1808
Added tests
vishav1771 Jul 8, 2020
1a446b6
Updated doctstring
vishav1771 Jul 9, 2020
ac363eb
Changes
vishav1771 Jul 12, 2020
c597750
Removed metaclass
vishav1771 Jul 13, 2020
6303364
Updated docstring of register
vishav1771 Jul 14, 2020
acec603
Updated test_init
vishav1771 Jul 14, 2020
eceddd3
Correction in test_init
vishav1771 Jul 14, 2020
3eddb2e
Applied Docstring changes from suggestion
vishav1771 Jul 16, 2020
51efd31
Added tests
vishav1771 Jul 16, 2020
7c7cd31
Added test_reversed
vishav1771 Jul 16, 2020
d2512d9
Upadted test_update_recursive for codecov
vishav1771 Jul 16, 2020
45cc230
Correction in _update in SyncedList
vishav1771 Jul 16, 2020
74a6b4c
Updated modular docstring
vishav1771 Jul 20, 2020
830660b
Added test_call
vishav1771 Jul 21, 2020
c877ae0
Updated test to be backend generic
vishav1771 Jul 25, 2020
c105954
Applied Suggestions
vishav1771 Jul 25, 2020
85ef4e4
inserted deleted part
vishav1771 Jul 27, 2020
db30f52
Updated test_call
vishav1771 Jul 29, 2020
cc895ea
Removed `self.cls` from tests
vishav1771 Jul 30, 2020
9 changes: 9 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,9 @@
version: 2
updates:
  - package-ecosystem: pip
    directory: "/"
    schedule:
      interval: monthly
    open-pull-requests-limit: 10
    reviewers:
      - glotzerlab/signac-committers
12 changes: 6 additions & 6 deletions requirements-benchmark.txt
@@ -1,7 +1,7 @@
 click>=7.0
-numpy==1.18
-gitdb2==3.0.2
-GitPython==3.0.8
-pandas==0.25.3; implementation_name!='cpython' --no-binary pandas
-pandas==0.25.3; implementation_name=='cpython' --no-binary :none:
-psutil==5.6.7
+numpy==1.19.0
+gitdb2==4.0.2
+GitPython==3.1.3
+pandas==1.0.5; implementation_name!='cpython' --no-binary pandas
+pandas==1.0.5; implementation_name=='cpython' --no-binary :none:
+psutil==5.7.0
12 changes: 6 additions & 6 deletions requirements-dev.txt
@@ -1,13 +1,13 @@
-coverage==5.0.3
-numpy==1.18
-pandas==0.25.3; implementation_name!='cpython' --no-binary pandas
-pandas==0.25.3; implementation_name=='cpython' --no-binary :none:
+coverage==5.2
+numpy==1.19.0
+pandas==1.0.5; implementation_name!='cpython' --no-binary pandas
+pandas==1.0.5; implementation_name=='cpython' --no-binary :none:
 h5py==2.10; implementation_name=='cpython'
 tables==3.6.1; implementation_name=='cpython'
 click>=7.0
 ruamel.yaml>=0.15.89
 pytest==5.3.4
-pytest-subtests==0.3.0
+pytest-subtests==0.3.1
 pydocstyle==5.0.2
-pytest-cov==2.8.1
+pytest-cov==2.10.0
 pymongo==3.10.1
2 changes: 1 addition & 1 deletion setup.cfg
@@ -15,7 +15,7 @@ max-line-length = 100
 exclude = configobj,passlib,cite.py,conf.py

 [pydocstyle]
-match = jsondict.py
+match = jsondict.py | synced_collection.py | jsoncollection.py | syncedattrdict.py | synced_list.py
 ignore = D105, D107, D203, D213

 [mypy]
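
A note on the `match` value above: if the tool interprets it as a Python regular expression applied to the bare file name (an assumption about the tool, not something this diff shows), then the spaces around `|` become part of each alternative. The sketch below uses only the standard `re` module; the helper `selected` and the compact pattern are illustrative and not taken from the PR.

```python
import re

# Assumption: the whole file name must match the pattern
# (modelled here with re.fullmatch).
spaced = "jsondict.py | synced_collection.py"
compact = r"jsondict\.py|synced_collection\.py"


def selected(pattern, filename):
    """Return True if the whole file name matches the pattern."""
    return re.fullmatch(pattern, filename) is not None


# With spaces, the alternatives are "jsondict.py " and " synced_collection.py".
print(selected(spaced, "synced_collection.py"))
print(selected(compact, "synced_collection.py"))
```

Under that assumption, writing the alternatives without surrounding whitespace is the safer form.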
151 changes: 151 additions & 0 deletions signac/core/jsoncollection.py
@@ -0,0 +1,151 @@
# Copyright (c) 2020 The Regents of the University of Michigan
# All rights reserved.
# This software is licensed under the BSD 3-Clause License.
"""Implement the JSON backend.

This module implements the JSON backend for the SyncedCollection API by
implementing the sync and load methods.
"""

import os
import json
import errno
import uuid

from .synced_collection import SyncedCollection
from .syncedattrdict import SyncedAttrDict
from .synced_list import SyncedList


class JSONCollection(SyncedCollection):
    """Implement sync and load using a JSON back end."""

    backend = __name__  # type: ignore

    def __init__(self, filename=None, write_concern=False, **kwargs):
        self._filename = os.path.realpath(filename) if filename is not None else None
        self._write_concern = write_concern
        super().__init__(**kwargs)
        if (filename is None) == (self._parent is None):
            raise ValueError(
                "Illegal argument combination, one of the two arguments, "
                "parent or filename must be None, but not both.")

    def _load(self):
        """Load the data from a JSON file."""
        try:
            with open(self._filename, 'rb') as file:
                blob = file.read()
            return json.loads(blob)
        except IOError as error:
            if error.errno == errno.ENOENT:
                return None
            raise  # do not silently swallow other I/O errors

    def _sync(self):
        """Write the data to a JSON file."""
        data = self.to_base()
        # Serialize data:
        blob = json.dumps(data).encode()
        # When the write_concern flag is set, we write the data into a temporary
        # file and then replace the original file with it.
        if self._write_concern:
            dirname, filename = os.path.split(self._filename)
            fn_tmp = os.path.join(dirname, '._{uid}_{fn}'.format(
                uid=uuid.uuid4(), fn=filename))
            with open(fn_tmp, 'wb') as tmpfile:
                tmpfile.write(blob)
            os.replace(fn_tmp, self._filename)
        else:
            with open(self._filename, 'wb') as file:
                file.write(blob)
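
The write-to-temporary-file-then-replace pattern used by `_sync` can be tried in isolation. The following is a minimal sketch using only the standard library; `write_json` and `read_json` are hypothetical helper names, not part of this PR, and the sketch catches `FileNotFoundError` instead of comparing `errno` values.

```python
import json
import os
import tempfile
import uuid


def write_json(path, data, write_concern=False):
    """Serialize data to path, optionally via a temporary file."""
    blob = json.dumps(data).encode()
    if write_concern:
        dirname, filename = os.path.split(os.path.realpath(path))
        # The temporary file lives in the same directory as the target,
        # so os.replace does not have to cross a filesystem boundary.
        tmp_path = os.path.join(dirname, "._{}_{}".format(uuid.uuid4(), filename))
        with open(tmp_path, "wb") as tmpfile:
            tmpfile.write(blob)
        os.replace(tmp_path, path)
    else:
        with open(path, "wb") as file:
            file.write(blob)


def read_json(path):
    """Return the parsed file content, or None if the file does not exist."""
    try:
        with open(path, "rb") as file:
            return json.loads(file.read())
    except FileNotFoundError:
        return None


with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "data.json")
    missing_before = read_json(target)
    write_json(target, {"foo": "bar"}, write_concern=True)
    round_trip = read_json(target)
    leftovers = [name for name in os.listdir(tmpdir) if name.startswith("._")]
```

Because the target is only ever swapped in as a whole, a reader should see either the old content or the new content, never a partially written file.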


class JSONDict(JSONCollection, SyncedAttrDict):
    """A dict-like mapping interface to a persistent JSON file.

    The JSONDict inherits from :class:`~core.synced_collection.SyncedCollection`
    and :class:`~core.syncedattrdict.SyncedAttrDict`.

    .. code-block:: python

        doc = JSONDict('data.json', write_concern=True)
        doc['foo'] = "bar"
        assert doc.foo == doc['foo'] == "bar"
        assert 'foo' in doc
        del doc['foo']

    .. code-block:: python

        >>> doc['foo'] = dict(bar=True)
        >>> doc
        {'foo': {'bar': True}}
        >>> doc.foo.bar = False
        >>> doc
        {'foo': {'bar': False}}

    .. warning::

        While the JSONDict object behaves like a dictionary, there are
        important distinctions to remember. In particular, because operations
        are reflected as changes to an underlying file, copying (even deep
        copying) a JSONDict instance may exhibit unexpected behavior. If a
        true copy is required, you should use the `to_base()` method to get a
        dictionary representation, and if necessary construct a new JSONDict
        instance: `new_dict = JSONDict(new_filename, data=old_dict.to_base())`.

    Parameters
    ----------
    filename: str, optional
        The filename of the associated JSON file on disk (Default value = None).
    data: mapping, optional
        The initial data passed to JSONDict. Defaults to `dict()`.
    parent: object, optional
        A parent instance of JSONDict or None (Default value = None).
    write_concern: bool, optional
        Ensure file consistency by writing changes back to a temporary file
        first, before replacing the original file (Default value = False).
    """

    pass


class JSONList(JSONCollection, SyncedList):
    """A non-string sequence interface to a persistent JSON file.

    The JSONList inherits from :class:`~core.synced_collection.SyncedCollection`
    and :class:`~core.synced_list.SyncedList`.

    .. code-block:: python

        synced_list = JSONList('data.json', write_concern=True)
        synced_list.append("bar")
        assert synced_list[0] == "bar"
        assert len(synced_list) == 1
        del synced_list[0]

    .. warning::

        While the JSONList object behaves like a list, there are
        important distinctions to remember. In particular, because operations
        are reflected as changes to an underlying file, copying (even deep
        copying) a JSONList instance may exhibit unexpected behavior. If a
        true copy is required, you should use the `to_base()` method to get a
        list representation, and if necessary construct a new JSONList
        instance: `new_list = JSONList(new_filename, data=old_list.to_base())`.

    Parameters
    ----------
    filename: str
        The filename of the associated JSON file on disk (Default value = None).
    data: non-str Sequence
        The initial data passed to JSONList.
    parent: object
        A parent instance of JSONDict or None (Default value = None).
    write_concern: bool
        Ensure file consistency by writing changes back to a temporary file
        first, before replacing the original file (Default value = False).
    """

    pass


SyncedCollection.register(JSONDict, JSONList)
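
The copy warning in the docstrings can be illustrated without signac. In the sketch below, `TinyFileDict` is a hypothetical stand-in (not a class from this PR) whose reads and writes all go through one JSON file; it shows why a deep copy is not independent, and how plain data from `to_base()` plus a new file name gives a true copy.

```python
import copy
import json
import os
import tempfile


class TinyFileDict:
    """Hypothetical stand-in: every read and write goes through one JSON file."""

    def __init__(self, filename, data=None):
        self._filename = filename
        if data is not None:
            self._write(dict(data))

    def _read(self):
        try:
            with open(self._filename) as file:
                return json.load(file)
        except FileNotFoundError:
            return {}

    def _write(self, data):
        with open(self._filename, "w") as file:
            json.dump(data, file)

    def __getitem__(self, key):
        return self._read()[key]

    def __setitem__(self, key, value):
        data = self._read()
        data[key] = value
        self._write(data)

    def to_base(self):
        return self._read()


with tempfile.TemporaryDirectory() as tmpdir:
    original = TinyFileDict(os.path.join(tmpdir, "data.json"), {"foo": 1})

    # A deep copy duplicates the Python object, but both objects still
    # point at the same file, so a write through one is seen by the other.
    clone = copy.deepcopy(original)
    clone["foo"] = 2
    shared_value = original["foo"]

    # A true copy: plain data from to_base(), stored under a new file name.
    independent = TinyFileDict(os.path.join(tmpdir, "copy.json"), original.to_base())
    independent["foo"] = 3
    independent_value = independent["foo"]
    value_after_true_copy = original["foo"]
```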
164 changes: 164 additions & 0 deletions signac/core/synced_collection.py
@@ -0,0 +1,164 @@
# Copyright (c) 2020 The Regents of the University of Michigan
# All rights reserved.
# This software is licensed under the BSD 3-Clause License.
"""Implement the SyncedCollection class.

SyncedCollection encapsulates the synchronization of different data structures.
These features are implemented in different subclasses, which enables us to use a
backend with different data structures or vice versa. It declares as abstract
methods the methods that must be implemented by any subclass to match the API.
"""

from contextlib import contextmanager
from abc import abstractmethod
from collections import defaultdict
from collections.abc import Collection

try:
    import numpy
    NUMPY = True
except ImportError:
    NUMPY = False


class SyncedCollection(Collection):
    """The base synced collection represents a collection that is synced with a backend.

    The class is intended for use as an ABC. The SyncedCollection is a
    :class:`~collections.abc.Collection` where all data is stored persistently
    in the underlying backend. The backend name will be the same as the module name.
    """

    backend = None

    def __init__(self, parent=None):
        self._data = None
        self._parent = parent
        self._suspend_sync_ = 0

    @classmethod
    def register(cls, *args):
        """Register the synced data structures.

        The registry is used when recursively converting synced data structures to
        determine what to convert their children into.

        Parameters
        ----------
        *args
            Classes to register.
        """
        if not hasattr(cls, 'registry'):
            cls.registry = defaultdict(list)
        for _cls in args:
            cls.registry[_cls.backend].append(_cls)

    @classmethod
    def from_base(cls, data, backend=None, **kwargs):
        """Dynamically resolve the type of object to the corresponding synced collection.

        Parameters
        ----------
        data : any
            Data to be converted from base class.
        backend: str
            Name of backend for synchronization. Defaults to the backend of the class.
        **kwargs:
            Kwargs passed to instance of synced collection.

        Returns
        -------
        data : object
            Synced object of corresponding base type.
        """
        backend = cls.backend if backend is None else backend
        if backend is None:
            raise ValueError("No backend found.")
        for _cls in cls.registry[backend]:
            if _cls.is_base_type(data):
                return _cls(data=data, **kwargs)
        if NUMPY:
            if isinstance(data, numpy.number):
                return data.item()
        return data
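
The interplay of `register` and `from_base` is a registry keyed by backend name, with each registered class deciding through `is_base_type` whether it can wrap a given value. A minimal sketch of that dispatch follows; `Registry`, `DemoDict`, and `DemoList` are hypothetical stand-ins, not the classes from this PR, and they omit synchronization entirely.

```python
from collections import defaultdict


class Registry:
    """Hypothetical stand-in for the register/from_base pair."""

    registry = defaultdict(list)

    @classmethod
    def register(cls, *classes):
        for klass in classes:
            cls.registry[klass.backend].append(klass)

    @classmethod
    def from_base(cls, data, backend):
        for klass in cls.registry[backend]:
            if klass.is_base_type(data):
                return klass(data)
        return data  # values no registered class claims pass through unchanged


class DemoDict:
    backend = "demo"

    def __init__(self, data):
        # Children are converted recursively through the same registry.
        self.data = {k: Registry.from_base(v, self.backend) for k, v in data.items()}

    @classmethod
    def is_base_type(cls, data):
        return isinstance(data, dict)


class DemoList:
    backend = "demo"

    def __init__(self, data):
        self.data = [Registry.from_base(v, self.backend) for v in data]

    @classmethod
    def is_base_type(cls, data):
        return isinstance(data, list)


Registry.register(DemoDict, DemoList)
converted = Registry.from_base({"a": [1, {"b": 2}], "c": "text"}, backend="demo")
```

Keying the registry by backend is what allows a nested dict inside a list to be converted to the dict type of the same backend rather than of an unrelated one.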

    @abstractmethod
    def to_base(self):
        """Dynamically resolve the synced collection to the corresponding base type."""
        pass

    @contextmanager
    def _suspend_sync(self):
        """Prepare context where load and sync are suspended."""
        self._suspend_sync_ += 1
        try:
            yield
        finally:
            # Restore the counter even if the body of the with-block raises.
            self._suspend_sync_ -= 1
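
Using a counter rather than a boolean lets suspension contexts nest: synchronization resumes only once the outermost context has exited. The sketch below shows the idea on a hypothetical `Suspendable` class (not part of this PR); wrapping the `yield` in `try`/`finally` is a choice made in the sketch so that the counter is restored when the body raises.

```python
from contextlib import contextmanager


class Suspendable:
    """Hypothetical object with a nestable 'suspend sync' counter."""

    def __init__(self):
        self._suspended = 0
        self.sync_calls = 0

    @contextmanager
    def suspend_sync(self):
        self._suspended += 1
        try:
            yield
        finally:
            # Runs on normal exit and on exceptions alike.
            self._suspended -= 1

    def sync(self):
        if self._suspended <= 0:
            self.sync_calls += 1


obj = Suspendable()
with obj.suspend_sync():
    with obj.suspend_sync():  # nesting raises the counter to 2
        obj.sync()            # skipped
    obj.sync()                # still skipped: the outer context is active

try:
    with obj.suspend_sync():
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

obj.sync()  # the counter is back at 0, so this call is counted
```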

    @classmethod
    @abstractmethod
    def is_base_type(cls, data):
        """Check whether data is of the same base type (such as list or dict) as this class."""
        pass

    @abstractmethod
    def _load(self):
        """Load data from underlying backend."""
        pass

    @abstractmethod
    def _sync(self):
        """Write data to underlying backend."""
        pass

    def sync(self):
        """Synchronize the data with the underlying backend."""
        if self._suspend_sync_ <= 0:
            if self._parent is None:
                self._sync()
            else:
                self._parent.sync()

    def load(self):
        """Load the data from the underlying backend."""
        if self._suspend_sync_ <= 0:
            if self._parent is None:
                data = self._load()
                with self._suspend_sync():
                    self._update(data)
            else:
                self._parent.load()
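
In `sync` and `load`, a nested collection never touches the backend itself: it forwards the request to its parent until the root (the object whose parent is `None`) is reached. A minimal sketch of that delegation, using a hypothetical `Node` class that only counts backend writes:

```python
class Node:
    """Hypothetical nested collection that syncs through its root."""

    def __init__(self, parent=None):
        self._parent = parent
        self.writes = 0  # counts backend writes performed by this node

    def _sync(self):
        self.writes += 1

    def sync(self):
        if self._parent is None:
            self._sync()          # only the root talks to the backend
        else:
            self._parent.sync()   # children forward the request upward


root = Node()
child = Node(parent=root)
grandchild = Node(parent=child)

grandchild.sync()
child.sync()
```

This is consistent with the constructor check in `JSONCollection`, where exactly one of `filename` and `parent` must be given: a root has a file, a child has a parent.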

    # Methods having the same implementation for all data structures
    def __getitem__(self, key):
        self.load()
        return self._data[key]

    def __delitem__(self, item):
        del self._data[item]
        self.sync()

    def __iter__(self):
        self.load()
        return iter(self._data)

    def __len__(self):
        self.load()
        return len(self._data)

    def __call__(self):
        self.load()
        return self.to_base()

    def __eq__(self, other):
        if isinstance(other, type(self)):
            return self() == other()
        else:
            return self() == other

    def __repr__(self):
        self.load()
        return repr(self._data)

    def __str__(self):
        self.load()
        return str(self._data)