Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-14225: Make PosixDatastore's internal records persistent #35

Merged
merged 3 commits into from May 2, 2018

Conversation

TallJimbo
Copy link
Member

The approach I've taken here is:

  • Add a DatabaseDict ABC that just provides a mechanism for constructing persistent dict-like objects from Config.
  • Add a SQL implementation thereof, and a way to get an instance of that SQL-backed dict from a Registry.
  • Replace the internalRegistry in PosixDatastore with DatabaseDict it constructs from config.

Because DatabaseDict uses namedtuple objects for its value type (since, being backed by a database, that needs to be known up front), there's a bit of redundancy between StoredFileInfo and the new namedtuple PosixDatastore.RecordTuple (basically, they only differ on whether to hold a StorageClass instance or name). I imagine we'll want to address that eventually, but just letting them both exist for now minimizes the changes to PosixDatastore, and I think that's a good thing.

Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there needs to be a few more tests. I've made some comments about that and I think there is a bad method call discard vs del. I was initially confused by key/value being everywhere but I think I've got the hang of it.

d = DatabaseDict.fromConfig(self.config, key=self.key, fields=self.fields, value=value)
self.checkDatabaseDict(d, data)

def testFromRegistry(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing docstring.

self.checkDatabaseDict(d, data)

def testExtraFields(self):
"""Test when there are fields not in the value or the key."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this docstring. x and y are allowed so I don't see what "ExtraFields" means. "Extra" implies to me you define a namedtuple with a and b in it and try to store that when the DatabaseDict only expects x, y and z.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is to check that it's okay if the table has additional fields the DatabaseDict is not expected to set or retrieve. I've renamed it to testExtraFieldsInTable. I'll add a new testExtraFieldsInValue method for the case you described.

"""Test when the value includes all fields, including the key."""
value = namedtuple("TestValue", ["x", "y", "z"])
data = {
0: value(x=0, y="zero", z=0.0),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like a test where the datatypes don't match those used in self.fields.

0: value(x=0, y="zero", z=0.0),
1: value(x=1, y="one", z=0.1),
}
d = registry.makeDatabaseDict(table="TestTable", key=self.key, fields=self.fields, value=value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call this table a different name to ensure that nothing odd is happening relating to the config version?

a tuple thereof.
value : `type`
The type used for the dictionary's values, typically a `namedtuple`.
Must have a ``_field`` class attribute that is a tuple of field
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this _fields rather than _field? I found this all a bit confusing until I worked out that _fields and _make came from namedtuple.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, _fields. I've updated the documentation to clarify where _fields and _make come from.

pass
# If we fail due to an IntegrityError (i.e. duplicate primary key values),
# try to do an update instead.
kwds.discard(self._key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test for this? I think kwds is a dict (OrderedDict probably) and that does not have a discard() method. I think this should be pop() or del.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch - yes, the behavior I was looking for would be kwds.pop(self._key, None), and it does need a test.

kwds = value._asdict()
with self._engine.begin() as connection:
try:
if self._key not in self._value._fields:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be documented that if the key field is found in value, that the supplied key will be ignored? Should we test if key is supplied and it's different to the key field from value?

Copy link
Member Author

@TallJimbo TallJimbo May 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this question illustrates that we don't actually need to support the case where the value tuple includes the key. I think it'd simplify the code and the tests to just reject that possibility at construction.

# try to do an update instead.
kwds.discard(self._key)
with self._engine.begin() as connection:
connection.execute(self._updateSql, key=key, **kwds)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it inconsistent that set could use the internal key from value but if that fails update is done here with the supplied key?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another reason I'm glad I removed support for having the key duplicated in the value.



class DatabaseDict(MutableMapping):
"""An abstract base class for dict-like objects backed by a database.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"dict-like objects" implies something a bit simpler than what this generally is. It's not simple key/value pairs in a dict that are persisted to a database, it's more of a python model of a database table. I feel like we are close to reimplementing an astropy.table or afw table database serialization system.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, "dict-like object" would be both simpler and more flexible than than what this. But from the API perspective I think the big difference is the strong types for the key and value, especially the namedtuple values. The implementation in SqlDatabaseDict is getting a bit towards a Python model of a database table, but the very limited dict-like lookup we support is still a lot simpler. I'll update the docstring with a focus on the typing and use of namedtuples.

def __iter__(self):
with self._engine.begin() as connection:
for row in connection.execute(self._keysSql).fetchall():
yield row[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit surprised that sqlalchemy doesn't have a built in iterator return. On the other hand from the docs for fetchall() I have no idea what it really returns.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it does have a built-in iterator that yields full rows, but I think it does need to be adapted as above to obtain an iterator over a single column from each row (even if the row only has one column).

Copy link
Member Author

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I've addressed all review comments. I also renamed the "fields" argument to "types", since its purpose is really to provide the types of the fields provided in the "key" and "value". That makes the docs a lot clearer.

@timj
Copy link
Member

timj commented May 2, 2018

Looks great. Much clearer now. Thank you.

@TallJimbo TallJimbo merged commit 37f97aa into master May 2, 2018
@ktlim ktlim deleted the tickets/DM-14225 branch August 25, 2018 06:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants