Improve Sync Data Structures #336

vishav1771 · 2020-06-02T19:15:27Z

Description

This PR is related to #249 . In this PR, we are implementing SyncedCollection, SyncedAttrDict, SyncedList, JSONCollection, JSONDict, JSONList.

Motivation and Context

This refractor is to provide support for the multiple backends and resolve #196.

Types of Changes

Documentation update
Bug fix
New feature
Breaking change¹

¹The change breaks (or has the potential to break) existing functionality.

Checklist:

I am familiar with the Contributing Guidelines.
I agree with the terms of the Contributor Agreement.
My name is on the list of contributors.
My code follows the code style guideline of this project.
The changes introduced by this pull request are covered by existing or newly introduced tests.

If necessary:

I have updated the API documentation as part of the package doc-strings.
I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
I have updated the changelog and added all related issue and pull request numbers for future reference (if applicable). See example below.

Example for a changelog entry: Fix issue with launching rockets to the moon (#101, #212).

vyasr · 2020-06-02T21:25:03Z

Cool! What are you thinking of as your next step? Adding file synchronization, or something else?

vishav1771 · 2020-06-04T17:10:10Z

Yes, I'm thinking of adding JSON backend as my next step and then move to buffering.

csadorf

I think it will be important to break the PRs related to this project up into many smaller PRs. I already had a very hard time reading every line in this PR. Ideally a PR should not be much larger than 400 LoCs. I assume this does not matter yet, because this is a draft?

I think it would also be helpful to get some context of which classes were copied from where and whether there were just copied as-is or whether any modifications were made.

Finally, it would be important to know what kind of review you expect from us. I see that this is a draft, but I was explicitly requested for review, so I'm not sure what I should be looking for. General guidance on design? Correctness?

signac/core/collection_api.py

vishav1771 · 2020-06-06T17:38:02Z

I think it will be important to break the PRs related to this project up into many smaller PRs. I already had a very hard time reading every line in this PR. Ideally a PR should not be much larger than 400 LoCs. I assume this does not matter yet, because this is a draft?

@csadorf All the classes are interdependent on each other. So, it will be hard to break them in separate PRs. But I'll try.

Finally, it would be important to know what kind of review you expect from us. I see that this is a draft, but I was explicitly requested for review, so I'm not sure what I should be looking for. General guidance on design? Correctness?

This PR is to track my GSOC project. I want you to make sure I am on the right track.

csadorf · 2020-06-08T07:53:20Z

I think it will be important to break the PRs related to this project up into many smaller PRs. I already had a very hard time reading every line in this PR. Ideally a PR should not be much larger than 400 LoCs. I assume this does not matter yet, because this is a draft?

@csadorf All the classes are interdependent on each other. So, it will be hard to break them in separate PRs. But I'll try.

A PR does not have to be into master. Furthermore, breaking up the changes into smaller increments will force you better identify the interdependencies, which is a good thing. I don't think that asking reviewers to review the changes related to this larger project in one PR is feasible or constructive.

This PR is to track my GSOC project. I want you to make sure I am on the right track.

That's totally fine, but the PR shows up in my review queue when you request a review. So I need to know what I'm supposed to review.

codecov · 2020-06-10T19:52:25Z

Codecov Report

Merging #336 into feature/synced_collections will increase coverage by 0.77%.
The diff coverage is 98.27%.

@@                      Coverage Diff                       @@
##           feature/synced_collections     #336      +/-   ##
==============================================================
+ Coverage                       76.22%   77.00%   +0.77%     
==============================================================
  Files                              43       47       +4     
  Lines                            7096     7367     +271     
==============================================================
+ Hits                             5409     5673     +264     
- Misses                           1687     1694       +7

Impacted Files	Coverage Δ
signac/core/synced_collection.py	`92.77% <92.77%> (ø)`
signac/core/jsoncollection.py	`100.00% <100.00%> (ø)`
signac/core/synced_list.py	`100.00% <100.00%> (ø)`
signac/core/syncedattrdict.py	`100.00% <100.00%> (ø)`
signac/common/connection.py	`35.06% <0.00%> (-3.21%)`	⬇️
signac/common/crypt.py	`75.00% <0.00%> (-1.75%)`	⬇️
signac/common/host.py	`44.85% <0.00%> (-1.58%)`	⬇️
signac/contrib/filesystems.py	`44.66% <0.00%> (-1.57%)`	⬇️
signac/core/utility.py	`70.58% <0.00%> (-0.85%)`	⬇️
signac/contrib/job.py	`90.63% <0.00%> (-0.53%)`	⬇️
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2044a76...cc895ea. Read the comment docs.

bdice

@vishav1771 Nice work so far. I left some comments.

signac/core/collection_api.py

tests/test_synced_collection.py

signac/core/collection_api.py

vishav1771 · 2020-07-16T19:46:00Z

tests/test_synced_collection.py

+        # deleting _parent will lead to recursion as _parent is treated as key
+        # load() will check for _parent and __getattr__ will call __getitem__ which calls load()
+        with pytest.raises(RecursionError):
+            sd.load()


I am not sure about this test (deleting a protected attribute). Please suggest any other way to test this.

I think del sd._x should generally be considered unsafe, so the fact that you are testing the behavior at all can be commended. So no further action needed IMO.

vishav1771 · 2020-07-17T18:16:44Z

I have added the tests.

vyasr

@vishav1771 can you add more extended module-level documentation? In particular, synced_collection.py should have a longer docstring explaining the reason that synced collections are useful and what we hope to achieve with them. A couple of other minor suggestions:

I think from_base should error if it's passed an unsynced collection that it can't convert. Right now it just returns the passed object if it can't convert it, but I think the expected behavior would be an isinstance(data, SyncedCollection) test prior to returning.
Regarding the _update method, I can't immediately think of a natural way to combine the implementations so I think the current version is fine for now. The tests look good, so it should be possible to go back and optimize this later if we want.

Aside from that, I think this PR is good to go. We'll probably expose other issues as we go through implementing caching/lazy loading and additional backends, and since we want to do those in separate PRs I don't see a reason to hold this one up any further. @bdice @csadorf @mikemhenry whenever you review please verify that the module docstrings are updated as per my first request, I am approving contingent on that change.

Nice work @vishav1771!

csadorf

This is really coming together now. Very good work! 👏

There are still a few remaining critical issues and I think that the test implementation needs some restructuring before we can merge this.

I've tried to provide some guidance with respect to the test implementation in the comments, but I want to briefly summarize here for improved clarity. The test must be 100% backend generic. That means it must be possible to run the same tests for all classes for all backends. This implies that none of the tests can contain any backend specific code. That is currently not the case. I think the best way to achieve this is to ensure that backend-specific code is only injected via the fixtures.

So, I would just implement all tests for one specific backend, e.g., JSON, and then override only the fixtures for the other backends:

class TestJSONDict:

    @pytest.fixture
    def synced_dict(self):
    # setup JSON backend ... 
    yield json_dict
    # ... tear down JSON backend

    # [rest of test code]
    # ^^^ CAN NOT REFERENCE JSON!!
   
class TestZarrDict(TestJSONDict):

    @pytest.fixture
    def synced_dict(self):
        # setup zarr backend
        yield zarr_dict
        # ... tear down zarr backend

Also, I would like to see at least one, ideally two other backend implementations. My vote is a redis implementation and a zarr implementation, but I assume that is going to be done in separate PRs?

signac/core/synced_collection.py

signac/core/jsoncollection.py

signac/core/synced_collection.py

tests/test_synced_collection.py

csadorf · 2020-07-24T14:56:49Z

tests/test_synced_collection.py

+        # deleting _parent will lead to recursion as _parent is treated as key
+        # load() will check for _parent and __getattr__ will call __getitem__ which calls load()
+        with pytest.raises(RecursionError):
+            sd.load()


I think del sd._x should generally be considered unsafe, so the fact that you are testing the behavior at all can be commended. So no further action needed IMO.

tests/test_synced_collection.py

csadorf

I'm mostly satisfied, however, there are still a number of issues that will warrant another review iteration.

csadorf · 2020-07-27T16:56:42Z

tests/test_synced_collection.py

+        self._tmp_dir.cleanup()
+
+    def test_from_base(self):
+        print(JSONDict.backend)


This seems like it was accidentally added. Please make sure to use git add -up instead of git add or git add -u alone and use git diff --staged to review every single line of code prior to creating the commit.

The reason that these kind of mistakes are highly problematic is because it means I can now no longer trust that you put in due diligence when you created the commit, but need to review every single line of the diff for similar issues, vastly increasing the required effort.

Sorry for the mistake. I'll make sure to check every single line of code before creating the commit. This commit actually changed the whole file, so I must have missed it. It will not happen again.

csadorf · 2020-07-27T16:57:18Z

tests/test_synced_collection.py

+
+class TestSyncedCollectionBase():
+
+    @pytest.fixture(autouse=True)


Can you briefly explain why autouse=True is necessary?

Addendum: After reading the rest of the code, I assume it's because of the use of the implicit self._cls class variable. I've provided explanation below as to why that is a very bad idea. Please remove this.

This fixture only sets temp directory for TestSyncedCollectionBase and cleanup after the test. It does not provide any Collection (synced_dict, synced_list). That's why I have used autouse=True. This is only for TestSyncedCollectionBase.

csadorf · 2020-07-27T16:57:55Z

tests/test_synced_collection.py

+    return str(uuid.uuid4())
+
+
+class TestSyncedCollectionBase():


Why is there still a test base class? It seems to me like this is a JSON-specific test class, because there are lot of references to JSON classes in its implementation. This should be reflected in the class name.

This class tests the code implemented in Synced_Collection. Currently, we only have JSON backend implemented so this class has a lot of references to JSON class. I will add more tests while adding another backend.

csadorf · 2020-07-28T07:21:18Z

tests/test_synced_collection.py

+    def test_invalid_kwargs(self):
+        with pytest.raises(ValueError):
+            return self._cls()


I appreciate your explanation as to the purpose of this test as part of your response to my questions on the PR, but this won't help anyone who stumbles upon this piece of code in the future. The tests must be self-explanatory, and in cases where they are not, you must provide sufficient in-code commentary.

In general, whenever someone asks for more context/explanation regarding as specific code section, it is almost always warranted to at least add more comments for explanation and sometimes to refactor the code to make it clearer.

Furthermore, as pointed out above, using this kind of implicitly created hidden class variable is a serious design flaw. If you wanted to test the constructor of a fixture use type(fixture)().

I apologize again, this has always been a serious shortcoming in my codes. I have been trying to overcome this problem for some time now. Will keep in mind your points. I'll try not to repeat this in the future.

csadorf · 2020-07-28T07:28:02Z

tests/test_synced_collection.py

+    def synced_dict(self):
+        self._tmp_dir = TemporaryDirectory(prefix='jsondict_')
+        self._fn_ = os.path.join(self._tmp_dir.name, FN_JSON)
+        self._cls = JSONDict


I don't think we should be introducing some kind of hidden class variable as part of the fixture creation. And then especially, we should never reference it outside of the fixture implementation. That is a serious flaw in test design.

csadorf · 2020-07-28T07:30:13Z

tests/test_synced_collection.py

+    def test_isinstance(self, synced_dict):
+        assert isinstance(synced_dict, SyncedCollection)
+        assert isinstance(synced_dict, MutableMapping)
+        assert isinstance(synced_dict, self._cls)


As pointed out before, the self._cls variable defeats the purpose of using fixtures in the first place.

csadorf · 2020-07-28T07:37:50Z

tests/test_synced_collection.py

+    @pytest.fixture(autouse=True)
+    def synced_list(self):
+        self._tmp_dir = TemporaryDirectory(prefix='jsondict_')
+        self._fn_ = os.path.join(self._tmp_dir.name, FN_JSON)
+        self._cls = JSONList
+        self._backend_kwargs = {'filename': self._fn_, 'write_concern': self._write_concern}
+        yield self._cls(**self._backend_kwargs)
+        self._tmp_dir.cleanup()


Same issue as with the other test class: please remove the use of the implicit class variable.

csadorf · 2020-07-28T07:38:38Z

tests/test_synced_collection.py

+
+class TestJSONListWriteConcern(TestJSONList):
+
+    _write_concern = True


I think it would be preferable for clarity to override the fixture also for the control of this fixture constructor parameter.

signac/core/synced_collection.py

csadorf

Great job!! 🛳

* Improve Sync Data Structures (#336) Adds a generic framework for synced collections to signac. The abstract SyncedCollection defines the interface for any Collection-like object (as defined by Python's collections.abc.Collection) that is synced with some form of persistent storage. This concept generalizes the existing _SyncedDict and SyncedList classes by implementing a common framework for these. Critically, this framework decouples the act of saving to and loading from the backend from the recursive conversion between a SyncedCollection and its base data type (e.g. a SyncedDict to a dict and vice versa), the latter of which is necessary to ensure proper synchronization of nested collections. This decoupling makes it much easier to enable different back ends for a given data structure without having to modify the synchronization logic. * Add alternative backends for synced_collection (#364) Adds Redis, Zarr, and MongoDB backends. * Added validation layer to SyncedCollection (#378) SyncedCollections can now arbitrary validators to input data to ensure that data conforms to a desired specification. This layer can be employed to ensure, for instance, that all inputs are valid JSON. * Added buffering to SyncedCollection (#363) Adds the ability to buffer I/O in SyncedCollections using an in-memory cache to store data. * Feature/synced collection/reorg (#437) Reorganizes synced collections into their own subpackage and move all corresponding tests into a subdirectory for easier identification. * Feature/synced collection/reorg tests (#438) Cleans up and standardizes the testing architecture so that all backends can easily apply the exact same set of tets using a few minor extra pieces of information. Prevents duplicate execution of tests upon import, while also adding a large number of additional tests for backends that were not previously tested. * Fix incomplete merge * Remove lingering old file. * Feature/synced collection/cleanup (#445) Rename various methods and remove unnecessary ones from the public API. Standardize internal APIs and simplify the implementation of new backends by automating more subclass behaviors. Improve constructors to enable positional arguments. Improve interfaces for various backends by making it easier for the user to specify and access the precise storage mechanisms. * Merge master, apply all relevant formatting tools, and add documentation (#446) Makes the SyncedCollection framework adhere to black, isort, flake8, pydocstyle, and mypy requirements while adding sufficiently detailed documentation to all classes. * Feature/synced collection/cleanup2 (#447) Simplifies and standardizes file buffering implementation. Adds extra tests for SyncedAttrDict.update and simplifies its implementation to use _update. * Feature/synced collection/optimization (#453) Optimize various aspects of the SyncedCollection framework, including validators, abstract base class type lookups, and the core data structures. * Remove unnecessary backend str from tests. * Feature/synced collection/test mongodb redis (#464) MongoDB and Redis will no longer be silently skipped on CI, so any changes that break support for those will be immediately discovered. * Make SyncedCollections thread-safe (#463) Operations that modify SyncedCollections (and their subclasses) now acquire object-specific mutexes prior to modification in order to guarantee consistency between threads. Thread safety can be enabled or disabled by the user. * Implements an in-memory buffer for SyncedCollections (#462) The new buffering mode is a variant on the old one that avoids unnecessary encoding, decoding, and in-memory updating by directly sharing memory between various collections accessing the same file in the buffer. This direct sharing allows all changes to be automatically persisted, avoiding any cache inconsistencies without the high overhead of JSON serialization for every single modification. * Clean up miscellaneous outstanding to-do items (#466) Completes TODO items scattered throughout the code base and removes a number of outdated ones that have already been addressed. * Make buffering thread safe (#468) In addition to synced collections being thread safe individually, while in buffered mode the buffer accesses also have to be made thread safe for multithreaded operation to be safe. This pull request introduces sufficient locking mechanisms to support thread-safe reads from and writes to the buffers. * Feature/synced collection/unify buffering (#469) Unifies the implementation of the two different file buffering modes as much as possible using a shared base class. In addition, this fixes a couple of issues with the thread-safe buffering solution in #468 that only show up on PyPy where execution is fast enough to stress-test multithreaded execution. It also reduces thread locking scopes to the minimum possible. * Feature/synced collection/contexts (#470) Removes usage of contextlib.contextmanager and replaces it with custom context classes implementing the context manager protocol. The contextlib decorator has measurable overhead that these pre-instantiated context managers avoid. Furthermore, many of the context managers in synced collections follow a very similar counter-based pattern that is now generalized and shared between them. * Install pymongo on pypy. * Don't sync on construction. * Add comparison operators to SyncedList and make sure modifying the filename of JSONDict is safe. * Remove unnecessary constructor validation, providing both data and resource arguments (e.g. filename for a JSONDict) is valid. * Fix unused imports. * Feature/synced collection/replace jsondict (#472) The old JSONDict and SyncedAttrDict classes are replaced with the new ones from the SyncedCollections framework. The new classes are now used for the Job's statepoint and document attributes as well as the Project document. The state point is now stored in the new _StatePointDict class, which extends the new JSONDict class to afford greater control over saving and loading of data. Much of the internals of the Job class have also been simplified, with most of the complex logic for job migration and validation when the state point changes now contained within the _StatePointDict. * Replace old JSONDict with new BufferedJSONDict. * Verify that replacing BufferedJSONDict with MemoryBufferedJSONDict. * Remove largely redundant _reset_sp method. * Remove single-use internal functions in Job to reduce surface area for SyncedCollection integration. * Move logic from _init into init. * Working implementation of statepoint using new SyncedCollection. * Remove _check_manifest. * Expose loading explicitly to remove the need for internal laziness in the StatepointDict. * Simplify the code as much as possible by inlining move method and catching the correct error. * Improve documentation of context manager for statepoint loading. * Replace MemoryBufferedJSONDict in Project for now. * Add documentation of why jobs must be stored as a list in the statepoint. * Address PR comments. * Add back import. * Ensure _StatepointDict is always initialized in constructor. * Change _StatepointDict to validate id on load. * Refactor error handling into _StatepointDict class. * Update docstrings. * Update comment. * Fix some docstrings. * Remove redundant JobsCorruptedError check. * Rewrite reset_statepoint to not depend on creating another job. * Reduce direct accesses of internal attributes and do some simplification of the code. * Reraise errors in JSONCollection. * Change reset to require a non-None argument and to call _update internally. * Add reset_data method to provide clear access points of the _StatepointDict for the Job. * Create new internal method for handling resetting. * Move statepoint resetting logic into the statepoint object itself. * Stop accessing internal statepoint filename attribute directly and rely on validation on construction. * Make statepoint thread safe. * Some minor cleanup. * Remove now unnecessary protection of the filename key. * Explicitly document behavior of returning None from _load_from_resource. * Apply suggestions from code review Co-authored-by: Bradley Dice <bdice@bradleydice.com> * Rename SCJSONEncoder to SyncedCollectionJSONEncoder. * Only access old id once. * Move lazy attribute initialization into one location. * Address PR requests that don't cause any issues. * Remove the temporary state point file backup. * Make as many old buffer tests as possible. * Reset buffer size after test. * Last changes from PR review. Co-authored-by: Bradley Dice <bdice@bradleydice.com> * Fix synced collection support for 0d numpy arrays. * Add oldest supported version of pymongo and ensure that zarr/mongo collections don't fail on import. * Deprecate json module (#480) Change all non-deprecated modules to import the built-in json module instead of signac's and deprecate signac.core.json. * Feature/synced collection/reorg (#481) Reorganizes the package structure so that the synced_collections subpackage is now at the package root and is internally structured with subpackages for data types, backends, and buffers. * Move synced collections from core to package root. * Fix all import locations. * Reorganize internals of synced collection package. * Fix all imports for reorganized package. * Hide caching module since it's still experimental. * Address PR comments. * Make __all__ into an empty list not rather than a list containing an empty string. * Feature/synced collection/simplify global buffering (#482) Eliminate the global buffering mode across all backends in favor of a more targeted per-backend approach. * Enable per backend buffering. * Remove global buffering in favor of class-specific buffering. * Reintroduce warnings for deprecated functionality. * Remove truly global buffering and replace it with class-level buffering. * Document new features. * Feature/synced collection/deprecate old (#483) Deprecates the old _SyncedDict, SyncedAttrDict, and JSONDict classes, along with any associated functions and exceptions. * Deprecate old synced dict classes. * Move class deprecation warnings to constructors. * Feature/synced collection/fix buffer reload (#486) * Make sure data type is preserved when reloading from buffer after flush. * Add test of new error case. * Fix lots of documentation issues. * Address first round of PR comments. * Update changelog. * Fix mypy error. * Don't error check uid unless it's provided. * First pass to address PR comments. * Feature/synced collection/remove attr access (#504) This patch removes attribute-based access to synced dictionaries. This logic is moved to a new `AttrDict` mixin class that can be inherited by any other subclasses if this feature is desired. * Simplify definition of __setattr__ by relying on complete list of protected keys. * Move all attribute-based access to a separate mixin class AttrDict. * Rename SyncedAttrDict to SyncedDict. * Move synced_attr_dict to synced_dict. * Remove attribute-based access from existing backend dict types and add new validator to check string keys. * Isolate deprecated non-str key handling to the _convert_key_to_str json validator. * Add tests of the AttrDict behavior. * Use new attrdict based classes in signac and make all tests pass. * Remove support for inheriting protected attributes, they must be defined by the class itself. * Change initialization order to call super first wherever possible. * Address PR comments. * Address final round of PR comments. * Feature/synced collection/numpy (#503) Isolate all numpy logic into a single utility file so that handling of numpy arrays can be standardized. Also substantially improves test coverage, testing a large matrix of different numpy dtypes and shapes with different types of synced collections. The testing framework as a whole is refactored to simplify the code and reduce the amount of boilerplate required to add the new numpy tests. * Initial pass at isolating all numpy logic to a single file. * Use pytest to generate temporary directories. * Stop saving backend kwargs as class variables. * Remove most references to class-level variables in tests. * Remove the _backend_collection variable. * Remove unnecessary autouse fixtures. * Start adding comprehensive tests for numpy conversion. * Make sure locks are released even if saving fails. * Add tests of vector types and add tests for SyncedList as well as SyncedDict. * Use pytest to parametrize numpy types instead of looping manually. * Unify vector and scalar tests. * Stop testing squeezing and just test the relevant shapes directly. * Add test of reset. * Move numpy tests back to main file. * Remove _conert_numpy_scalar, which performed undesirable squeezing of 1-element 1d arrays, and replace usage with _convert_numpy. * Separate numpy testing into separate functions and limit supported behavior to only the necessary use cases. * Add a warning when implicit numpy conversion occurs. * Update changelog. * Address all PR comments aside from numpy conversion in type resolver. * Catch numpy warnings in signac numpy integration tests. * Support older versions of numpy for random number generation. * Fix isort issue introduced by rebase. * Address PR comments. * Allow AbstractTypeResolver to perform arbitrary preprocessing, delegating the numpy-specialized code to the caller and making it less confusing. * Add missing call to _convert_numpy. * Set NUMPY_SHAPES for MongoDB tests when numpy is not present. * Remove add_validators and specify validators in class definition (#507) * Remove add_validators classmethod and instead require validators to be defined at class definition time, preventing one application from modifying validation process for others and giving a standard means to completely override parent validators in parents. * Change type in docstring. * Don't use os.path.join where not needed. (#511) The extra work performed by os.path.join can be slow, so this PR replaces it with direct string concatenation of os.sep. * Don't use os.path.join where not needed. * Update signac/contrib/job.py Co-authored-by: Bradley Dice <bdice@bradleydice.com> * Disable recursive validation during recursive conversion of nested types. (#509) * Feature/synced collection/optimize jsondict validation (#508) Define a single validator for JSONAttrDict classes that combines the logic of other validators while reducing collection traversal costs. Also switch from converting numpy arrays to just bypassing the resolver's cache for them. * Use single separate validator for state points for performance. * Remove preprocessor from type resolver and instead use a blocklist that prevents caching data for certain types. * Reorder resolvers to optimize performance. * Make sure not to include strings as sequences. * Move state point validator to collection_json and use for all JSONAttrDict types. * Make sure to also check complex types. * Add back missing period lost during stash merge. * Address review comments. * Feature/synced collection/optimize protected key lookup (#510) Since the most common operation on protected keys is to check if some key is within the list of protected keys, this patch changes the `_PROTECTED_KEYS` attribute to a set for faster O(1) membership checks. * Switch protected keys from a sequence to a set for faster containment checks. * Change evaluation order of checks in setattr. * Address PR comments. * Defer statepoint instantiation, unify reset_statepoint logic (#497) * Defer state point initialization when lazy loading. * Allow validation to be disabled in SyncedCollection._update. * Unify reset_statepoint logic across methods. * Revert validation-related changes. * Add test, fix bug. * Update tests/test_job.py * Feature/synced collection/optimize (#513) * Add a resoler to fast-exit _from_base for non-collection types. * Optimize synced collection init method. * Remove validators property and access internal variable directly. * Add early exit for _convert_array. * Use locally cached root var instead of self._root. * Remove superfluous duplicate check. * Optionally skip validation in SyncedCollection _update. (#512) * Add option to trust the data source during _update. * Skip validation if JSON is valid. * Fix pre-commit. * Rename trust_source to _validate. * Set _validate=False. Co-authored-by: Vishav Sharma <46069089+vishav1771@users.noreply.github.com> Co-authored-by: Bradley Dice <bdice@bradleydice.com>

Added SyncedCollection, SyncedDict, SyncedList Classes.

2db2f68

vishav1771 requested review from a team as code owners June 2, 2020 19:15

vishav1771 requested review from zhou-pj and yuanzhou0827 and removed request for a team June 2, 2020 19:15

vishav1771 marked this pull request as draft June 2, 2020 19:15

vishav1771 requested review from bdice, csadorf and vyasr and removed request for zhou-pj and yuanzhou0827 June 2, 2020 19:16

Added JsonCollection

0034669

csadorf reviewed Jun 5, 2020

View reviewed changes

signac/core/collection_api.py Outdated Show resolved Hide resolved

signac/core/collection_api.py Outdated Show resolved Hide resolved

vishav1771 added 4 commits June 7, 2020 00:53

Added reset for SyncedDict and SyncedList. Implemented JsonDict

d1dbb45

Removed _load and _sync

5494f4f

Added mypy ignore

99b9dbe

Added _check_methods defination

4385161

Added parent hook and implented _dfs_update as reset

698ac95

bdice reviewed Jun 11, 2020

View reviewed changes

Changes

c3fc1c3

vyasr reviewed Jun 16, 2020

View reviewed changes

signac/core/collection_api.py Outdated Show resolved Hide resolved

vyasr reviewed Jun 16, 2020

View reviewed changes

signac/core/collection_api.py Outdated Show resolved Hide resolved

vyasr reviewed Jun 16, 2020

View reviewed changes

signac/core/collection_api.py Outdated Show resolved Hide resolved

vishav1771 added 4 commits July 16, 2020 16:01

Applied Docstring changes from suggestion

3eddb2e

Added tests

51efd31

Added test_reversed

7c7cd31

Upadted test_update_recursive for codecov

d2512d9

vishav1771 commented Jul 16, 2020

View reviewed changes

Correction in _update in SyncedList

45cc230

vishav1771 requested review from csadorf, mikemhenry and bdice July 17, 2020 18:16

vyasr approved these changes Jul 20, 2020

View reviewed changes

mikemhenry approved these changes Jul 20, 2020

View reviewed changes

vishav1771 added 2 commits July 21, 2020 00:24

Updated modular docstring

74a6b4c

Added test_call

830660b

csadorf requested changes Jul 24, 2020

View reviewed changes

vishav1771 added 2 commits July 25, 2020 22:35

Updated test to be backend generic

c877ae0

Applied Suggestions

c105954

vishav1771 requested a review from csadorf July 25, 2020 19:02

inserted deleted part

85ef4e4

csadorf requested changes Jul 28, 2020

View reviewed changes

vishav1771 added 2 commits July 30, 2020 02:22

Updated test_call

db30f52

Removed self.cls from tests

cc895ea

vishav1771 force-pushed the syncedCollection branch from a5fa53e to cc895ea Compare July 30, 2020 20:28

vishav1771 requested a review from csadorf July 30, 2020 20:32

csadorf approved these changes Jul 31, 2020

View reviewed changes

vyasr merged commit 7605b6b into glotzerlab:feature/synced_collections Jul 31, 2020

vishav1771 mentioned this pull request Aug 3, 2020

Added hypothesis based tesing to SyncedCollection #373

Closed

12 tasks

vishav1771 mentioned this pull request Aug 27, 2020

Added validation layer to SyncedCollection #378

Merged

12 tasks

cbkerr mentioned this pull request Dec 31, 2020

Proposal: Unify dict classes and improve buffering and synchronization #249

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve Sync Data Structures #336

Improve Sync Data Structures #336

vishav1771 commented Jun 2, 2020 •

edited

vyasr commented Jun 2, 2020

vishav1771 commented Jun 4, 2020

csadorf left a comment

vishav1771 commented Jun 6, 2020

csadorf commented Jun 8, 2020

codecov bot commented Jun 10, 2020 •

edited

bdice left a comment

vishav1771 Jul 16, 2020

csadorf Jul 24, 2020

vishav1771 commented Jul 17, 2020

vyasr left a comment

csadorf left a comment

csadorf Jul 24, 2020

csadorf left a comment

csadorf Jul 27, 2020

vishav1771 Jul 30, 2020

csadorf Jul 27, 2020

vishav1771 Jul 30, 2020

csadorf Jul 27, 2020

vishav1771 Jul 30, 2020

csadorf Jul 28, 2020

vishav1771 Jul 30, 2020 •

edited

csadorf Jul 28, 2020

csadorf Jul 28, 2020

csadorf Jul 28, 2020

csadorf Jul 28, 2020

csadorf left a comment


		class TestSyncedCollectionBase():

		@pytest.fixture(autouse=True)


		class TestJSONListWriteConcern(TestJSONList):

		_write_concern = True

Improve Sync Data Structures #336

Improve Sync Data Structures #336

Conversation

vishav1771 commented Jun 2, 2020 • edited

Description

Motivation and Context

Types of Changes

Checklist:

vyasr commented Jun 2, 2020

vishav1771 commented Jun 4, 2020

csadorf left a comment

Choose a reason for hiding this comment

vishav1771 commented Jun 6, 2020

csadorf commented Jun 8, 2020

codecov bot commented Jun 10, 2020 • edited

Codecov Report

bdice left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vishav1771 commented Jul 17, 2020

vyasr left a comment

Choose a reason for hiding this comment

csadorf left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

csadorf left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vishav1771 Jul 30, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

csadorf left a comment

Choose a reason for hiding this comment

vishav1771 commented Jun 2, 2020 •

edited

codecov bot commented Jun 10, 2020 •

edited

vishav1771 Jul 30, 2020 •

edited