Compare entities by ID in DFS #637

Merged
CJStadler merged 13 commits into master from compare-entity-ids on Jul 8, 2019

@CJStadler
Contributor

commented Jul 3, 2019

This is much faster than fully comparing the entities.

@codecov

commented Jul 3, 2019

Codecov Report

Merging #637 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

```diff
@@            Coverage Diff             @@
##           master     #637      +/-   ##
==========================================
- Coverage   97.43%   97.42%   -0.02%     
==========================================
  Files         118      118              
  Lines        9538     9539       +1     
==========================================
  Hits         9293     9293              
- Misses        245      246       +1
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| featuretools/feature_base/feature_base.py | `97.63% <100%> (ø)` | ⬆️ |
| featuretools/entityset/entity.py | `96.09% <100%> (-0.41%)` | ⬇️ |
| featuretools/variable_types/variable.py | `98.23% <100%> (+0.02%)` | ⬆️ |
| featuretools/entityset/relationship.py | `98.68% <100%> (ø)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a990858...39c3819.

CJStadler added 3 commits Jul 3, 2019
```diff
@@ -108,7 +108,8 @@ def _is_unique(self):
         """Is there any other relationship with same parent and child entities?"""
         es = self.child_entity.entityset
         relationships = es.get_forward_relationships(self._child_entity_id)
-        n = len([r for r in relationships if r.parent_entity == self.parent_entity])
+        n = len([r for r in relationships
+                 if r._parent_entity_id == self._parent_entity_id])
```

CJStadler Jul 3, 2019

Author Contributor

This case wasn't causing the observed performance drop, but I also don't see any reason to compare the actual entities here.
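
To illustrate why comparing IDs is cheaper, here is a minimal sketch built on hypothetical stand-in classes (`Entity`, `Relationship`, `is_unique_by_entity` and `is_unique_by_id` are illustrative only, not featuretools' actual classes). The stand-in `Entity.__eq__` counts its calls: the entity-based check invokes it once per forward relationship, while the ID-based check only compares strings.

```python
class Entity:
    """Hypothetical stand-in for an entity whose __eq__ is expensive."""

    eq_calls = 0  # class-level counter of __eq__ invocations

    def __init__(self, entity_id, variables):
        self.id = entity_id
        self.variables = list(variables)

    def __eq__(self, other):
        Entity.eq_calls += 1
        if not isinstance(other, Entity):
            return NotImplemented
        # Comparing variable lists stands in for the costly part of the check.
        return (self.id == other.id
                and sorted(self.variables) == sorted(other.variables))


class Relationship:
    """Hypothetical relationship that also stores its entities' IDs."""

    def __init__(self, parent, child):
        self.parent_entity = parent
        self.child_entity = child
        self._parent_entity_id = parent.id
        self._child_entity_id = child.id


def is_unique_by_entity(rel, forward_relationships):
    # Before: every comparison calls Entity.__eq__.
    n = len([r for r in forward_relationships
             if r.parent_entity == rel.parent_entity])
    return n == 1


def is_unique_by_id(rel, forward_relationships):
    # After: only plain string IDs are compared.
    n = len([r for r in forward_relationships
             if r._parent_entity_id == rel._parent_entity_id])
    return n == 1


customers = Entity("customers", ["id", "age"])
stores = Entity("stores", ["id"])
sessions = Entity("sessions", ["id", "customer_id", "store_id"])
rels = [Relationship(customers, sessions), Relationship(stores, sessions)]

Entity.eq_calls = 0
print(is_unique_by_entity(rels[0], rels), Entity.eq_calls)  # True 2

Entity.eq_calls = 0
print(is_unique_by_id(rels[0], rels), Entity.eq_calls)  # True 0
```

Assuming, as in an EntitySet, that entity IDs are unique, both checks give the same answer; the ID version just never touches the entities' contents.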

@CJStadler
Contributor Author

commented Jul 3, 2019

Even though `__eq__` wasn't being called with `deep=True` before (so it wasn't comparing dataframes), comparing the variables is still non-trivial: it is quadratic in the number of variables, because each `in` check scans `other.variables` linearly:

```python
for v in self.variables:
    if v not in other.variables:
        return False
```
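
One way to see the quadratic behaviour is to count `__eq__` calls. The sketch below uses a hypothetical `Variable` stand-in (not featuretools' class) and distinct-but-equal objects, since Python's `in` on a list short-circuits on identity and would otherwise skip `__eq__` for the very same object.

```python
class Variable:
    """Hypothetical stand-in for a variable; counts __eq__ calls."""

    eq_calls = 0

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        Variable.eq_calls += 1
        return isinstance(other, Variable) and self.name == other.name


def variables_match_loop(mine, theirs):
    # Same shape as the loop above: each `in` scans `theirs` from the start.
    for v in mine:
        if v not in theirs:
            return False
    return True


n = 100
mine = [Variable(f"v{i}") for i in range(n)]
theirs = [Variable(f"v{i}") for i in range(n)]  # equal, but distinct objects

Variable.eq_calls = 0
assert variables_match_loop(mine, theirs)
# The i-th variable is found after i + 1 comparisons: 1 + 2 + ... + n calls.
print(Variable.eq_calls)  # 5050
```

Doubling `n` roughly quadruples the count, which is what makes this noticeable for entities with many variables.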

CJStadler added 4 commits Jul 3, 2019
```diff
-        for v in self.variables:
-            if v not in other.variables:
-                return False
+        if set(self.variables) != set(other.variables):
```

CJStadler Jul 3, 2019

Author Contributor

I removed most of the places where this would be hit, but I thought I might as well make this comparison faster while we're looking at it.
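
The set-based comparison can be sketched as follows. Python sets require hashable elements, so the hypothetical `Variable` stand-in defines `__hash__` consistently with `__eq__` (this sketch does not claim how featuretools' own `Variable` does it). Building the two sets costs O(n + m) hash operations on average instead of O(n·m) equality checks. There is one semantic difference: taken on its own, the loop only checks that every variable of `self` appears in `other`, whereas set equality also requires the reverse and ignores duplicates.

```python
class Variable:
    """Hypothetical hashable stand-in for a variable."""

    def __init__(self, entity_id, name):
        self.entity_id = entity_id
        self.name = name

    def _key(self):
        return (self.entity_id, self.name)

    def __eq__(self, other):
        return isinstance(other, Variable) and self._key() == other._key()

    def __hash__(self):
        # Equal variables must hash equally for set semantics to be correct.
        return hash(self._key())


def variables_match_loop(mine, theirs):
    # One-directional containment check; quadratic for lists.
    for v in mine:
        if v not in theirs:
            return False
    return True


def variables_match_set(mine, theirs):
    # Average O(len(mine) + len(theirs)); checks both directions.
    return set(mine) == set(theirs)


a = [Variable("customers", "id"), Variable("customers", "age")]
b = [Variable("customers", "age"), Variable("customers", "id")]
c = a + [Variable("customers", "zip")]

print(variables_match_set(a, b))   # True: order does not matter
print(variables_match_loop(a, c))  # True: a is contained in c
print(variables_match_set(a, c))   # False: c has an extra variable
```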

@kmax12
Member

left a comment

LGTM

```diff
@@ -14,6 +14,7 @@ Changelog
     * Keep dataframe sorted by time during feature calculation (:pr:`626`)
     * Fix bug in encode_features that created duplicate columns of
       features with multiple outputs (:pr:`622`)
+    * Fix performance regression in DFS (:pr:`637`)
```

kmax12 Jul 7, 2019

Member

Move this entry to the Future Release section.

CJStadler added 2 commits Jul 8, 2019
@kmax12
Member

left a comment

LGTM

kmax12 approved these changes Jul 8, 2019

@CJStadler CJStadler merged commit 4add6be into master Jul 8, 2019

4 checks passed

codecov/patch 100% of diff hit (target 97.43%)
codecov/project Absolute coverage decreased by -0.01% but relative coverage increased by +2.56% compared to a990858
license/cla Contributor License Agreement is signed.
test_all_python_versions Workflow: test_all_python_versions

@CJStadler CJStadler deleted the compare-entity-ids branch Jul 8, 2019

@rwedge rwedge referenced this pull request Aug 19, 2019