Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-28636: Make data ID expansion in ingest sometimes optional. #502

Merged
merged 1 commit into from Apr 6, 2021

Conversation

TallJimbo
Copy link
Member

This would be better with a bit more work, but it already speeds up gen2to3 conversion (with direct or in-place) transfers a lot.

dataIds=groupForType.keys(),
run=run,
# TODO: ask Datastore if it needs expansion instead.
expand=(transfer not in ("direct", None)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would possibly be better to ask datastore the question as to whether it needs expanded dataIds or not given the transfer mode, rather than assuming in butler level but I'm sure we can sort this out later.

Copy link
Member Author

@TallJimbo TallJimbo Apr 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about I add this to base Datastore and use it here and in obs_base:

def needs_expanded_data_ids(
    self,
    transfer: Optional[str],
    entity: Optional[Union[DatasetRef, DatasetType, StorageClass]] = None,
) -> bool:
    """Test whether this datastore needs expanded data IDs to ingest the given entity.

    Parameters
    ----------
    transfer : `str` or `None`
        Transfer mode for ingest.
    entity, optional
        Object representing what will be ingested.  If not provided (or less informative),
        `True` may be returned even if expanded data IDs aren't necessary.

    Returns
    -------
    needed : `bool`
        If `True`, expanded data IDs may be needed.  `False` only if expansion
        definitely isn't necessary.
    """
    return True

and use (transfer not in ("direct", None)) as the initial implementation in FileDatastore?

Copy link
Member

@timj timj Apr 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would work. I like how you added in a representative entity just in case we need to look up in a datastore config file.

Copy link
Member Author

@TallJimbo TallJimbo Apr 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I also added implementations for InMemoryDatastore (return False) and ChainedDatastore, and the latter is worth a look - I don't think it's actually able to forward that entity argument on to the child datastores while still responding conservatively, which is a bit unfortunate but (I think) not a big deal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chained implementation is fine I think.

@TallJimbo TallJimbo merged commit b3ce6e2 into master Apr 6, 2021
@TallJimbo TallJimbo deleted the tickets/DM-28636 branch April 6, 2021 19:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants