Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-24288: Do disassembly inside datastore #269

Merged
merged 8 commits into from Apr 30, 2020
Merged

DM-24288: Do disassembly inside datastore #269

merged 8 commits into from Apr 30, 2020

Conversation

timj
Copy link
Member

@timj timj commented Apr 29, 2020

No description provided.

@timj timj marked this pull request as ready for review April 30, 2020 00:20
@timj timj requested a review from TallJimbo April 30, 2020 00:20
Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few line comments, and you can feel free to defer any that seem involved to other tickets.

Bigger picture (and definitely out of scope for now), the path of reading and writing internal records through the various intermediate base classes is feeling pretty twisty now, and I think we may be approaching the point where we need to do some refactoring. I don't know what the solution is, but GenericBaseDatastore in particular now adds a ton of abstract methods and indirection while providing very little nontrivial logic to its subclasses.

# Add Registry Dataset entry. If not a virtual composite, add
# and attach components at the same time.
dataId = self.registry.expandDataId(dataId, graph=datasetType.dimensions, **kwds)
ref, = self.registry.insertDatasets(datasetType, run=run, dataIds=[dataId],
producer=producer, recursive=not isVirtualComposite)
producer=producer, recursive=False) # not isVirtualComposite)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relic comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I wasn't brave enough to disable it completely on the first go.

checksum=record["checksum"],
file_size=record["file_size"])

def getStoredItemsInfo(self, ref):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have both a getStoredItemsInfo and a getStoredItemInfo? Seems like one of those ought to be renamed or removed, though this seems preexisting so maybe it's out of scope for this ticket.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plural is the new one. The old one has to stay around until export and getUri are fixed.

# Use the path to determine the location
return [(self.locationFactory.fromPath(r.path), r) for r in records]

def _can_remove_dataset_artifact(self, ref, location):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this function only exists for cases like DECam raws, where there are multiple top-level datasets associated with the same artifact, not anything to do with composites?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was there previously (I think somewhere else) and is for DECam-like things but also to do with composites when they are stored in the registry. If you wanted to delete a component I had to make sure you didn't delete the file if there were multiple components associated with one file.

if parentRef is not None:
# Should already conform and we know no components
return DatasetRef(storage.datasetType, parentRef.dataId, id=parentRef.id,
run=parentRef.run, conform=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little concerned that there's no way to tell from the DatasetRef itself that the ID refers to the parent. Maybe that concern just goes away when we drop in-Registry disassembly/components, but for now I think it might be good to add a field to DatasetRef to indicate when it's in this state. I suspect we'd also find that the existence of that flag might simplify logic elsewhere, and in the future (if in-Registry disassembly is gone) we could turn that field into a property that delegates to whether the DatasetType is a component.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you drop registry-composites then findDatasets should immediately drop the component name from the search and assume it's present. That would stop the "oh, you asked for a component but I know nothing about the components". You want some kind of "isParent" property?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to say isComponent, but I suspect we're talking about the same meaning and just differing on whether the subject of the name is the ID or the dataset type.

Anyhow, maybe it would just make more sense to rip out Registry components sooner rather than later. I do think we'll want that related-but-not-the-same baggage gone when we try to actually support rawplus anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was saying "this may look like a component DatasetRef but the ID comes from the parent". Whereas I'm not sure "isComponent" is as explicit. You can tell it's a component by looking at ref.datasetType, what you can't tell is that there is no ID directly associated with this DatasetRef.

Now add components into the composite DatasetRef since this
is currently how registry gives that information to datastore.
This failure was previously masked by datastore tests not
properly emulating registry.
* Add component name to internal datastore records
* Allow a component to be retrieved even if
  registry does not know about component.
* Allow registry to find datasets by looking
  at the parent of a component.
* Do disassembly and assembly in datastore
* Disable deletion of components
  (if registry doesn't know about them and files
   may or may not exist what could it possibly do?)
The code itself is still there for now but is turned off.
This indicates that the DatasetRef was not directly associated
with a component in the registry but is associated with the
composite.
Also add some tests.
@timj timj merged commit 0d85109 into master Apr 30, 2020
@timj timj deleted the tickets/DM-24288 branch April 30, 2020 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants