New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-13842: Implement access to DatasetStorage table #22
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Some minor comments and tweaks.
|
||
from .utils import slotValuesAreEqual | ||
|
||
__all__ = ("StorageInfo") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need the ,
to stop this being a str
.
|
||
class StorageInfo: | ||
"""Represents storage information. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parameters
docstring required. Need to know the type of md5
and the units of size
, what do we mean by a datastore name?
def __init__(self, datastoreName, md5=None, size=None): | ||
assert isinstance(datastoreName, str) | ||
self._datastoreName = datastoreName | ||
assert md5 is None or isinstance(md5, str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we using md5 because it is ubiquitous? Should it be datastore specific or must it be defined by gen3 butler to be md5? blake2 is meant to be as fast as md5 but better (and is part of python3.6). Does it need to be md5 because @MichelleGower requires that from NCSA?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are using md5
because that is what @TallJimbo put in the schema description. But it could/should probably be checksum
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no preference as to what kind of checksum we use. Probably best to make the name generic.
with self._engine.begin() as connection: | ||
connection.execute(datasetStorageTable.update().where(and_( | ||
datasetStorageTable.c.dataset_id == ref.id, | ||
datasetStorageTable.c.datastore_name == datastoreName)).values( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by the relationship between datastoreName
and storageInfo.datastoreName
. This may partly be because I'm not sure what a datastore name really is. In this code you update the row that has the supplied datastoreName
by changing it to be the name stored in the storage info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datastoreName
is the identifier of the particular Datastore
used to save the file (i.e. we haven't decided what that should be yet). The difference between datastoreName
and storageInfo.datastoreName
in update
is that we may want to change the datastoreName
to a new one for an existing entry. Thus we need to supply both the current datastore name (part of the primary key) to look it up (this is datastoreName
) and the new one which is part of storageInfo
just as everything else we want to update is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By "we haven't decided yet" I mean that I don't really know off the top of my head. But @TallJimbo probably has some Idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment in the schema just says: "Name of the Datastore this entry corresponds to.".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Registry definitely needs some kind of identifier for each associated Datastore. I added a string column for this to the Registry schema for this and called it datastore_name
in SQL, but I would have no objection to making it an integer (assuming we have an approach for assigning them) or changing the name.
6ab6a87
to
b99012f
Compare
d512d10
to
65e9eb5
Compare
Will be rebased to
master
after DM-13840 is merged.