Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-13842: Implement access to DatasetStorage table #22

Merged
merged 2 commits into from Mar 20, 2018

Conversation

pschella
Copy link
Collaborator

Will be rebased to master after DM-13840 is merged.

Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Some minor comments and tweaks.


from .utils import slotValuesAreEqual

__all__ = ("StorageInfo")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need the , to stop this being a str.


class StorageInfo:
"""Represents storage information.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameters docstring required. Need to know the type of md5 and the units of size, what do we mean by a datastore name?

def __init__(self, datastoreName, md5=None, size=None):
assert isinstance(datastoreName, str)
self._datastoreName = datastoreName
assert md5 is None or isinstance(md5, str)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we using md5 because it is ubiquitous? Should it be datastore specific or must it be defined by gen3 butler to be md5? blake2 is meant to be as fast as md5 but better (and is part of python3.6). Does it need to be md5 because @MichelleGower requires that from NCSA?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using md5 because that is what @TallJimbo put in the schema description. But it could/should probably be checksum.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no preference as to what kind of checksum we use. Probably best to make the name generic.

with self._engine.begin() as connection:
connection.execute(datasetStorageTable.update().where(and_(
datasetStorageTable.c.dataset_id == ref.id,
datasetStorageTable.c.datastore_name == datastoreName)).values(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by the relationship between datastoreName and storageInfo.datastoreName. This may partly be because I'm not sure what a datastore name really is. In this code you update the row that has the supplied datastoreName by changing it to be the name stored in the storage info?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datastoreName is the identifier of the particular Datastore used to save the file (i.e. we haven't decided what that should be yet). The difference between datastoreName and storageInfo.datastoreName in update is that we may want to change the datastoreName to a new one for an existing entry. Thus we need to supply both the current datastore name (part of the primary key) to look it up (this is datastoreName) and the new one which is part of storageInfo just as everything else we want to update is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "we haven't decided yet" I mean that I don't really know off the top of my head. But @TallJimbo probably has some Idea.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment in the schema just says: "Name of the Datastore this entry corresponds to.".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Registry definitely needs some kind of identifier for each associated Datastore. I added a string column for this to the Registry schema for this and called it datastore_name in SQL, but I would have no objection to making it an integer (assuming we have an approach for assigning them) or changing the name.

@pschella pschella changed the base branch from tickets/DM-13840 to master March 20, 2018 16:14
@pschella pschella merged commit 6028092 into master Mar 20, 2018
@ktlim ktlim deleted the tickets/DM-13842 branch August 25, 2018 06:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants