DM-14225: Make PosixDatastore's internal records persistent #35

TallJimbo · 2018-04-30T22:07:37Z

The approach I've taken here is:

Add a DatabaseDict ABC that just provides a mechanism for constructing persistent dict-like objects from Config.
Add a SQL implementation thereof, and a way to get an instance of that SQL-backed dict from a Registry.
Replace the internalRegistry in PosixDatastore with a DatabaseDict that it constructs from config.

Because DatabaseDict uses namedtuple objects for its value type (since, being backed by a database, that needs to be known up front), there's a bit of redundancy between StoredFileInfo and the new namedtuple PosixDatastore.RecordTuple (basically, they only differ on whether to hold a StorageClass instance or name). I imagine we'll want to address that eventually, but just letting them both exist for now minimizes the changes to PosixDatastore, and I think that's a good thing.
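
As a rough illustration of the idea described above (not the code in this PR), a dict-like object with namedtuple values and a database behind it might look like the sketch below. It uses the standard-library sqlite3 module rather than the SQL layer a Registry would provide, and `SqliteDict`, `Record`, the table name, and the column names are all hypothetical.

```python
import sqlite3
from collections import namedtuple
from collections.abc import MutableMapping


class SqliteDict(MutableMapping):
    """Hypothetical dict-like view of one table; values are namedtuples."""

    def __init__(self, connection, table, key, value):
        # `key` is the primary-key column name; `value` is a namedtuple type
        # whose `_fields` name the remaining columns.
        self._db = connection
        self._table = table
        self._key = key
        self._value = value
        columns = ", ".join(value._fields)
        self._db.execute(
            f"CREATE TABLE IF NOT EXISTS {table} ({key} PRIMARY KEY, {columns})"
        )

    def __getitem__(self, key):
        columns = ", ".join(self._value._fields)
        row = self._db.execute(
            f"SELECT {columns} FROM {self._table} WHERE {self._key} = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        # Rebuild the namedtuple from the row, in field order.
        return self._value._make(row)

    def __setitem__(self, key, value):
        names = ", ".join((self._key,) + self._value._fields)
        marks = ", ".join("?" * (1 + len(self._value._fields)))
        self._db.execute(
            f"INSERT OR REPLACE INTO {self._table} ({names}) VALUES ({marks})",
            (key,) + tuple(value),
        )

    def __delitem__(self, key):
        cursor = self._db.execute(
            f"DELETE FROM {self._table} WHERE {self._key} = ?", (key,)
        )
        if cursor.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        # Iterate over keys only, as a dict does.
        for row in self._db.execute(f"SELECT {self._key} FROM {self._table}"):
            yield row[0]

    def __len__(self):
        return self._db.execute(f"SELECT COUNT(*) FROM {self._table}").fetchone()[0]


# Hypothetical record type holding a storage class *name* rather than an instance.
Record = namedtuple("Record", ["path", "storage_class"])
records = SqliteDict(sqlite3.connect(":memory:"), "records", "dataset_id", Record)
records[1] = Record(path="a.fits", storage_class="Exposure")
```

Because the value type is fixed when the object is constructed, the table columns are known up front, which is the constraint the description refers to.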

timj

I think there need to be a few more tests. I've made some comments about that, and I think there is a bad method call (discard vs. del). I was initially confused by key/value being everywhere, but I think I've got the hang of it.

timj · 2018-05-01T19:50:11Z

tests/test_sqlDatabaseDict.py

+        d = DatabaseDict.fromConfig(self.config, key=self.key, fields=self.fields, value=value)
+        self.checkDatabaseDict(d, data)
+
+    def testFromRegistry(self):


Missing docstring.

timj · 2018-05-01T19:50:57Z

tests/test_sqlDatabaseDict.py

+        self.checkDatabaseDict(d, data)
+
+    def testExtraFields(self):
+        """Test when there are fields not in the value or the key."""


I don't understand this docstring. x and y are allowed so I don't see what "ExtraFields" means. "Extra" implies to me you define a namedtuple with a and b in it and try to store that when the DatabaseDict only expects x, y and z.

This test is to check that it's okay if the table has additional fields the DatabaseDict is not expected to set or retrieve. I've renamed it to testExtraFieldsInTable. I'll add a new testExtraFieldsInValue method for the case you described.

timj · 2018-05-01T19:51:42Z

tests/test_sqlDatabaseDict.py

+        """Test when the value includes all fields, including the key."""
+        value = namedtuple("TestValue", ["x", "y", "z"])
+        data = {
+            0: value(x=0, y="zero", z=0.0),


I'd like a test where the datatypes don't match those used in self.fields.

timj · 2018-05-01T19:53:43Z

tests/test_sqlDatabaseDict.py

+            0: value(x=0, y="zero", z=0.0),
+            1: value(x=1, y="one", z=0.1),
+        }
+        d = registry.makeDatabaseDict(table="TestTable", key=self.key, fields=self.fields, value=value)


Call this table a different name to ensure that nothing odd is happening relating to the config version?

timj · 2018-05-01T20:04:52Z

python/lsst/daf/butler/core/databaseDict.py

+        a tuple thereof.
+    value : `type`
+        The type used for the dictionary's values, typically a `namedtuple`.
+        Must have a ``_field`` class attribute that is a tuple of field


Is this _fields rather than _field? I found this all a bit confusing until I worked out that _fields and _make came from namedtuple.

Yes, _fields. I've updated the documentation to clarify where _fields and _make come from.
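
For readers unfamiliar with these attributes, the short example below shows what `_fields`, `_make`, and `_asdict` do on a standard-library namedtuple. The `TestValue` type mirrors the one in the test diff above; the variable names are only for illustration.

```python
from collections import namedtuple

TestValue = namedtuple("TestValue", ["x", "y", "z"])

# `_fields` is a class attribute: a tuple of the field names, in order.
field_names = TestValue._fields

# `_make` is a classmethod that builds an instance from any iterable,
# for example a row returned by a database query.
row = (0, "zero", 0.0)
record = TestValue._make(row)

# `_asdict` goes the other way: field names mapped to values.
as_dict = record._asdict()
```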

timj · 2018-05-01T20:27:28Z

python/lsst/daf/butler/core/sqlDatabaseDict.py

+                pass
+        # If we fail due to an IntegrityError (i.e. duplicate primary key values),
+        # try to do an update instead.
+        kwds.discard(self._key)


Is there a test for this? I think kwds is a dict (OrderedDict probably) and that does not have a discard() method. I think this should be pop() or del.

Good catch - yes, the behavior I was looking for would be kwds.pop(self._key, None), and it does need a test.
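
The difference can be checked in isolation. The snippet below is a minimal illustration with a made-up `Value` type; it only demonstrates that the mapping returned by `_asdict()` has no `discard()` method and that `pop()` with a default tolerates a missing key.

```python
from collections import namedtuple

Value = namedtuple("Value", ["x", "y"])
kwds = Value(x=1, y="one")._asdict()

# Mappings have no discard(); that method belongs to set.
has_discard = hasattr(kwds, "discard")

# pop() with a default removes the key if present and is a no-op otherwise.
removed = kwds.pop("x", None)
missing = kwds.pop("not_a_field", None)
```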

timj · 2018-05-01T20:28:13Z

python/lsst/daf/butler/core/sqlDatabaseDict.py

+        kwds = value._asdict()
+        with self._engine.begin() as connection:
+            try:
+                if self._key not in self._value._fields:


Should it be documented that if the key field is found in value, that the supplied key will be ignored? Should we test if key is supplied and it's different to the key field from value?

I suppose this question illustrates that we don't actually need to support the case where the value tuple includes the key. I think it'd simplify the code and the tests to just reject that possibility at construction.
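
A construction-time check of this kind could be as small as the sketch below. The helper name `check_key_not_in_value` and the choice of `ValueError` are assumptions for illustration; the actual check in SqlDatabaseDict may look different.

```python
from collections import namedtuple


def check_key_not_in_value(key, value):
    """Raise ValueError if the key column is also a field of the value type.

    Hypothetical helper; the real check and exception type may differ.
    """
    if key in value._fields:
        raise ValueError(
            f"Key field {key!r} must not appear in value fields {value._fields}"
        )


WithKey = namedtuple("WithKey", ["x", "y", "z"])
WithoutKey = namedtuple("WithoutKey", ["y", "z"])

check_key_not_in_value("x", WithoutKey)  # passes silently
try:
    check_key_not_in_value("x", WithKey)
    rejected = False
except ValueError:
    rejected = True
```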

timj · 2018-05-01T20:29:34Z

python/lsst/daf/butler/core/sqlDatabaseDict.py

+        # try to do an update instead.
+        kwds.discard(self._key)
+        with self._engine.begin() as connection:
+            connection.execute(self._updateSql, key=key, **kwds)


Is it inconsistent that set could use the internal key from value but if that fails update is done here with the supplied key?

Another reason I'm glad I removed support for having the key duplicated in the value.
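
With the key kept out of the value, the insert-then-update fallback uses the supplied key on both paths. The sketch below shows that pattern with the standard-library sqlite3 module standing in for the SQLAlchemy engine used in the PR; `set_row`, `test_table`, and the column names are hypothetical.

```python
import sqlite3
from collections import namedtuple

Value = namedtuple("Value", ["y", "z"])


def set_row(connection, key, value):
    """Insert a row for `key`, or update it if the key already exists."""
    kwds = value._asdict()
    try:
        # The connection context manager commits on success and rolls back
        # on error.
        with connection:
            connection.execute(
                "INSERT INTO test_table (x, y, z) VALUES (:x, :y, :z)",
                dict(kwds, x=key),
            )
    except sqlite3.IntegrityError:
        # Duplicate primary key: fall back to an update with the same key.
        with connection:
            connection.execute(
                "UPDATE test_table SET y = :y, z = :z WHERE x = :x",
                dict(kwds, x=key),
            )


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE test_table (x INTEGER PRIMARY KEY, y TEXT, z REAL)")
set_row(connection, 0, Value(y="zero", z=0.0))
set_row(connection, 0, Value(y="nought", z=0.5))  # same key: takes the update path
rows = connection.execute("SELECT x, y, z FROM test_table").fetchall()
```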

timj · 2018-05-01T20:37:02Z

python/lsst/daf/butler/core/databaseDict.py

+
+
+class DatabaseDict(MutableMapping):
+    """An abstract base class for dict-like objects backed by a database.


"dict-like objects" implies something a bit simpler than what this generally is. It's not simple key/value pairs in a dict that are persisted to a database, it's more of a python model of a database table. I feel like we are close to reimplementing an astropy.table or afw table database serialization system.

Yeah, "dict-like object" would be both simpler and more flexible than what this is. But from the API perspective I think the big difference is the strong types for the key and value, especially the namedtuple values. The implementation in SqlDatabaseDict is getting a bit towards a Python model of a database table, but the very limited dict-like lookup we support is still a lot simpler. I'll update the docstring with a focus on the typing and use of namedtuples.

timj · 2018-05-01T20:47:22Z

python/lsst/daf/butler/core/sqlDatabaseDict.py

+    def __iter__(self):
+        with self._engine.begin() as connection:
+            for row in connection.execute(self._keysSql).fetchall():
+                yield row[0]


I'm a bit surprised that sqlalchemy doesn't have a built in iterator return. On the other hand from the docs for fetchall() I have no idea what it really returns.

I think it does have a built-in iterator that yields full rows, but I think it does need to be adapted as above to obtain an iterator over a single column from each row (even if the row only has one column).

TallJimbo

I believe I've addressed all review comments. I also renamed the "fields" argument to "types", since its purpose is really to provide the types of the fields provided in the "key" and "value". That makes the docs a lot clearer.

timj · 2018-05-02T20:42:25Z

Looks great. Much clearer now. Thank you.


timj approved these changes May 1, 2018


TallJimbo commented May 2, 2018


TallJimbo added 2 commits May 2, 2018 16:42

Add database-backed dict hierarchy.

b8c1459

Primarily intended for Datastore's internal records.

Add makeDatabaseDict to SqlRegistry.

af70221

TallJimbo force-pushed the tickets/DM-14225 branch from 811e165 to 56711b9 Compare May 2, 2018 20:44

Use DatabaseDict for PosixDatastore internal records.

8bf463c

TallJimbo force-pushed the tickets/DM-14225 branch from 56711b9 to 8bf463c Compare May 2, 2018 23:42

TallJimbo merged commit 37f97aa into master May 2, 2018

ktlim deleted the tickets/DM-14225 branch August 25, 2018 06:50
