Improve duplicate features check in DFS #538

Merged
CJStadler merged 6 commits into master from duplicate-features-check May 10, 2019

@CJStadler
Contributor

commented May 9, 2019

The previous version became very slow as the number of features
increased. Because it looked up the count for every element, its run time
was quadratic. The implementation in this commit should be linear.
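
To illustrate the difference in complexity, here is a minimal sketch. It is not the code from this PR: the function names and the sample feature names are hypothetical, and it assumes each feature can be identified by a hashable name.

```python
from collections import Counter


def find_duplicates_quadratic(names):
    # list.count scans the whole list, and it is called once per element,
    # so the total work grows as O(n^2).
    return sorted({name for name in names if names.count(name) > 1})


def find_duplicates_linear(names):
    # One pass to tally every name, then one pass over the tallies: O(n).
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical feature names, for illustration only.
feature_names = ["MEAN(log.value)", "SUM(log.value)", "MEAN(log.value)", "COUNT(log)"]
print(find_duplicates_quadratic(feature_names))
print(find_duplicates_linear(feature_names))
```

Both functions report the same duplicates; only the amount of work differs as the list grows.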

Is this check even necessary though?

CJStadler added some commits May 9, 2019

Improve duplicate features check in DFS
@codecov


commented May 9, 2019

Codecov Report

Merging #538 into master will decrease coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #538      +/-   ##
==========================================
- Coverage   96.26%   96.26%   -0.01%     
==========================================
  Files         114      114              
  Lines        9258     9256       -2     
==========================================
- Hits         8912     8910       -2     
  Misses        346      346
| Impacted Files | Coverage Δ |
|---|---|
| featuretools/synthesis/deep_feature_synthesis.py | 96.75% <100%> (-0.02%) ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@rwedge

Contributor

commented May 9, 2019

Logic seems good.

Is this check even necessary though?

@kmax12 thoughts? Does the hash check in DeepFeatureSynthesis._handle_new_feature make this check redundant?

@kmax12

Member

commented May 10, 2019

@rwedge I believe you're right. Looks like _handle_new_feature won't add a duplicate feature, and all features get added by calling that, so this check is unnecessary.
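
The reasoning above can be sketched as follows. This is an assumption-laden illustration, not the featuretools implementation: `handle_new_feature`, the `all_features` dict, and the tuple-based features are all hypothetical stand-ins. The point is only that if every feature goes through a single insertion path that rejects an already-seen hash, a later duplicates scan has nothing to find.

```python
def handle_new_feature(new_feature, all_features):
    """Add new_feature unless one with the same hash is already registered.

    Hypothetical helper: returns True if the feature was added, False if skipped.
    """
    key = hash(new_feature)
    if key in all_features:
        # Already registered, so adding it again would create a duplicate.
        return False
    all_features[key] = new_feature
    return True


# Hypothetical features modeled as (primitive, base column) tuples.
all_features = {}
candidates = [("MEAN", "log.value"), ("SUM", "log.value"), ("MEAN", "log.value")]
for feature in candidates:
    handle_new_feature(feature, all_features)

# Only the two distinct features were stored.
print(len(all_features))
```

Keying on the hash alone treats a hash collision as a duplicate; a real implementation might also compare the features for equality.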

@rwedge rwedge self-requested a review May 10, 2019

@rwedge rwedge dismissed their stale review May 10, 2019

Jumped the gun

CJStadler and others added some commits May 10, 2019

@rwedge
Contributor

left a comment

Looks good

CJStadler added some commits May 10, 2019

Import filter and object from builtin
For possible compatibility issues
Merge branch 'duplicate-features-check' of github.com:Featuretools/fe…
…aturetools into duplicate-features-check
@rwedge

rwedge approved these changes May 10, 2019

@CJStadler CJStadler merged commit 429adb0 into master May 10, 2019

4 checks passed

codecov/patch 100% of diff hit (target 96.26%)
codecov/project Absolute coverage decreased by -<.01% but relative coverage increased by +3.73% compared to 8f8a94f
license/cla Contributor License Agreement is signed.
test_all_python_versions Workflow: test_all_python_versions

@CJStadler CJStadler deleted the duplicate-features-check branch May 10, 2019

@rwedge rwedge referenced this pull request May 17, 2019

Merged

v0.8.0 #548
