DM-17611: performance optimizations for data ID manipulations #126

Merged
merged 7 commits into master from tickets/DM-17611 on Feb 2, 2019

TallJimbo
Member

No description provided.

Callers of toNameSet didn't actually care that they got a
DimensionNameSet back; they just wanted something with .names, and
frequently the object they passed in would have already qualified.
In that case, we now just return that object directly, and toNameSet
has been renamed to conformSet to reflect that new behavior.
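
The pass-through behavior described above can be sketched as follows. This is a minimal illustration, not the code from this pull request: `conform_set` and the stand-in `DimensionNameSet` class below are hypothetical, and the real `conformSet` may differ in signature and in how it checks its argument.

```python
class DimensionNameSet:
    """Hypothetical stand-in: a lightweight container that only exposes ``names``."""

    def __init__(self, names):
        self.names = frozenset(names)


def conform_set(dimensions):
    """Return an object that has a ``.names`` attribute.

    Assumption: anything that already exposes ``.names`` is good enough for
    callers, so it is returned unchanged and no new object is constructed.
    """
    if hasattr(dimensions, "names"):
        return dimensions
    # Otherwise wrap the plain iterable of names.
    return DimensionNameSet(dimensions)
```

Under these assumptions, passing an object that already qualifies costs only an attribute check, which matters when the call is made once per data ID.
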
Doing this repeatedly for every data ID with a certain set of keys
was wasting a lot of time.
This separates the operations unique to DimensionGraph construction and
makes them a bit easier to follow and control.  This will be useful
for later commits that try to limit the time spent in DimensionGraph
construction via caching.
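
One way such caching could look is sketched below, assuming the graph is fully determined by its set of key names. Everything here is hypothetical: `Graph` stands in for DimensionGraph, `graph_for_keys` is an invented helper, and the later commits may use a different mechanism entirely.

```python
from functools import lru_cache


class Graph:
    """Hypothetical stand-in for an expensive-to-build DimensionGraph."""

    constructions = 0  # counts how many times a graph is actually built

    def __init__(self, names):
        Graph.constructions += 1
        self.names = names


@lru_cache(maxsize=None)
def _graph_for(names):
    # ``names`` must be hashable (a frozenset) to serve as the cache key.
    return Graph(names)


def graph_for_keys(keys):
    """Return a shared Graph for this set of keys, building it at most once."""
    return _graph_for(frozenset(keys))
```

With this sketch, every data ID that has the same keys (in any order) reuses one graph object instead of constructing a new one each time.
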
We only need the per-DatasetType data IDs expanded, because no one
ever looks at the extra metadata associated with the row-wide one.

In the future we should investigate whether we can get rid of the
row-wide data ID entirely, but it does currently appear to be used in
pipe.base.GraphBuilder.
@TallJimbo
Member Author

This change was reviewed on DM-17496; I'm just moving it here to separate it from unrelated ongoing work on that ticket.

@TallJimbo TallJimbo merged commit 5d5c9dd into master Feb 2, 2019
@TallJimbo TallJimbo deleted the tickets/DM-17611 branch February 2, 2019 02:46