Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-21451: Remove DatabaseDict and vectorize Datastore/Butler ingest #197

Merged
merged 10 commits into from Oct 25, 2019

Conversation

TallJimbo
Copy link
Member

No description provided.

Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good in general. My main concern is that I'm not entirely clear we have a story for cases where some files are not ingested due to constraints. Previously this triggered DatasetTypeNotSupporterdError but I can't see where that happens now (maybe it does and I've missed it).

"""Add an opaque (to the `Registry`) table for use by a `Datastore` or
other data repository client.

Opaque table record can be added via `insertOpaqueData`, retreived via
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: retreived


# A "move" is sometimes a "copy"
moveIsCopy = False
if transfer == "move" and os.path.isabs(path):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You decided against keeping this logic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. At first I was just going to move it around, but it was a lot messier after the split into two methods, and it wasn't clear to me we had a real need for it.

notAcceptedCounter = 0
# Filter down to just datasets the chained datastore's own
# configuration accepts.
okForParent: List[FileDataset] = [dataset for dataset in datasets
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If okForParent is empty shouldn't we raise DatasetTypeNotSupportedError ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raising that exception (in ingest) is now done exclusively by Datastore.ingest, which will check that the set of datasets actually ingested matches the set of datasets that the caller asked to be ingested and raise if that isn't true. That's because we need _prepIngest to still return the list of datasets it can ingest even when it can't ingest all of them, and it can't do that if it raises.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that now. Thanks.

if moveIsCopy:
dstransfer = "copy"
if constraints is not None:
okForChild: List[FileDataset] = [dataset for dataset in okForParent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned that we've lost any visibility into whether any of the supplied datasets were ingested or not. Don't we worry if a dataset was rejected by all the child datastores because of constraints? As written it seems we worry about transfer modes but not constraints. If I ask 10 datasets to be ingested and only 9 are accepted how do I tell that? Previously ingest worked if a dataset was accepted by at least one datastore (and then we had the issue of deciding what to do if it was only accepted by an ephemeral datastore).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's all still true (see reply at #197 (comment)). Unlike the constraint-based rejections, transfer-mode rejections mean that all datasets would be rejected by the nested datastore, so we don't have to return anything and we can still use the raise-and-catch logic that was in place before.

# Docstring inherited from GenericBaseDatastore
records = list(self.registry.fetchOpaqueData(self._tableName, dataset_id=ref.id))
if len(records) == 0:
raise KeyError("Unable to retrieve formatter associated with Dataset {}".format(ref.id))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that error message should be referring to a formatter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just moved it here from 797a3df#diff-4b3b7a0376839e94233cd3a2bddd157bL112, but I think I agree. "Unable to retrieve location associated with Dataset {}" instead?

# Docstring inherited from Datastore._prepIngest.
filtered = []
for dataset in datasets:
if not self.constraints.isAcceptable(dataset.ref):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again how do we know that some of the datasets were not ingested?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think this is already answered above)

# self.registry internally. Probably need to add
# transactions to DatabaseDict to do better than that.
self.addStoredItemInfo(ref, itemInfo)
for ref, itemInfo in zip(refs, itemInfos):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think zip is going to complain if the number of refs is not equal to the number of infos. Would it be safer to accept a sequence of tuples instead?

TallJimbo and others added 9 commits October 24, 2019 14:05
Previously if a single test created two datastores they shared
a registry.
We're about to start using it for ingest, so the old name no longer
made sense.
This is safer than zipping over two sequences, which doesn't complain
if they have different lengths, and no harder for callers to provide.
@TallJimbo TallJimbo merged commit 80af466 into master Oct 25, 2019
@TallJimbo TallJimbo deleted the tickets/DM-21451 branch October 25, 2019 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants