New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-14225: Make PosixDatastore's internal records persistent #35
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there needs to be a few more tests. I've made some comments about that and I think there is a bad method call discard vs del. I was initially confused by key/value being everywhere but I think I've got the hang of it.
tests/test_sqlDatabaseDict.py
Outdated
d = DatabaseDict.fromConfig(self.config, key=self.key, fields=self.fields, value=value) | ||
self.checkDatabaseDict(d, data) | ||
|
||
def testFromRegistry(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing docstring.
tests/test_sqlDatabaseDict.py
Outdated
self.checkDatabaseDict(d, data) | ||
|
||
def testExtraFields(self): | ||
"""Test when there are fields not in the value or the key.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this docstring. x
and y
are allowed so I don't see what "ExtraFields" means. "Extra" implies to me you define a namedtuple
with a
and b
in it and try to store that when the DatabaseDict
only expects x
, y
and z
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is to check that it's okay if the table has additional fields the DatabaseDict is not expected to set or retrieve. I've renamed it to testExtraFieldsInTable
. I'll add a new testExtraFieldsInValue
method for the case you described.
tests/test_sqlDatabaseDict.py
Outdated
"""Test when the value includes all fields, including the key.""" | ||
value = namedtuple("TestValue", ["x", "y", "z"]) | ||
data = { | ||
0: value(x=0, y="zero", z=0.0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like a test where the datatypes don't match those used in self.fields
.
tests/test_sqlDatabaseDict.py
Outdated
0: value(x=0, y="zero", z=0.0), | ||
1: value(x=1, y="one", z=0.1), | ||
} | ||
d = registry.makeDatabaseDict(table="TestTable", key=self.key, fields=self.fields, value=value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call this table a different name to ensure that nothing odd is happening relating to the config version?
a tuple thereof. | ||
value : `type` | ||
The type used for the dictionary's values, typically a `namedtuple`. | ||
Must have a ``_field`` class attribute that is a tuple of field |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this _fields
rather than _field
? I found this all a bit confusing until I worked out that _fields
and _make
came from namedtuple
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, _fields
. I've updated the documentation to clarify where _fields
and _make
come from.
pass | ||
# If we fail due to an IntegrityError (i.e. duplicate primary key values), | ||
# try to do an update instead. | ||
kwds.discard(self._key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a test for this? I think kwds
is a dict (OrderedDict
probably) and that does not have a discard()
method. I think this should be pop()
or del
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch - yes, the behavior I was looking for would be kwds.pop(self._key, None)
, and it does need a test.
kwds = value._asdict() | ||
with self._engine.begin() as connection: | ||
try: | ||
if self._key not in self._value._fields: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be documented that if the key field is found in value, that the supplied key will be ignored? Should we test if key is supplied and it's different to the key field from value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this question illustrates that we don't actually need to support the case where the value tuple includes the key. I think it'd simplify the code and the tests to just reject that possibility at construction.
# try to do an update instead. | ||
kwds.discard(self._key) | ||
with self._engine.begin() as connection: | ||
connection.execute(self._updateSql, key=key, **kwds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it inconsistent that set could use the internal key from value but if that fails update is done here with the supplied key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another reason I'm glad I removed support for having the key duplicated in the value.
|
||
|
||
class DatabaseDict(MutableMapping): | ||
"""An abstract base class for dict-like objects backed by a database. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"dict-like objects" implies something a bit simpler than what this generally is. It's not simple key/value pairs in a dict that are persisted to a database, it's more of a python model of a database table. I feel like we are close to reimplementing an astropy.table or afw table database serialization system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, "dict-like object" would be both simpler and more flexible than than what this. But from the API perspective I think the big difference is the strong types for the key and value, especially the namedtuple
values. The implementation in SqlDatabaseDict
is getting a bit towards a Python model of a database table, but the very limited dict-like lookup we support is still a lot simpler. I'll update the docstring with a focus on the typing and use of namedtuples
.
def __iter__(self): | ||
with self._engine.begin() as connection: | ||
for row in connection.execute(self._keysSql).fetchall(): | ||
yield row[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit surprised that sqlalchemy doesn't have a built in iterator return. On the other hand from the docs for fetchall()
I have no idea what it really returns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it does have a built-in iterator that yields full rows, but I think it does need to be adapted as above to obtain an iterator over a single column from each row (even if the row only has one column).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I've addressed all review comments. I also renamed the "fields" argument to "types", since its purpose is really to provide the types of the fields provided in the "key" and "value". That makes the docs a lot clearer.
Looks great. Much clearer now. Thank you. |
Primarily intended for Datastore's internal records.
811e165
to
56711b9
Compare
56711b9
to
8bf463c
Compare
The approach I've taken here is:
DatabaseDict
ABC that just provides a mechanism for constructing persistent dict-like objects from Config.internalRegistry
in PosixDatastore with DatabaseDict it constructs from config.Because
DatabaseDict
usesnamedtuple
objects for its value type (since, being backed by a database, that needs to be known up front), there's a bit of redundancy betweenStoredFileInfo
and the newnamedtuple
PosixDatastore.RecordTuple
(basically, they only differ on whether to hold aStorageClass
instance or name). I imagine we'll want to address that eventually, but just letting them both exist for now minimizes the changes toPosixDatastore
, and I think that's a good thing.