Cache feature names #536

Merged
CJStadler merged 3 commits into master from cache-feature-names on May 9, 2019

3 participants
@CJStadler
Contributor

commented May 9, 2019

Computing the name of a feature can be expensive because primitives use
reflection to generate their names, and when features are stacked the
number of calls to get_name() grows. This commit caches the name when it
is computed the first time. Is it reasonable to assume that features are
immutable, or are there places where we should invalidate the cache?

With max_depth=2 this sped up ft.dfs (just building the features) by
about 3x, and the improvement is greater for higher values of max_depth.
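
A minimal sketch of why caching helps when features are stacked. The `Feature` class, its `label` and `cache` parameters, and the call counter below are hypothetical stand-ins for illustration, not the featuretools implementation. Only the `get_name()` / `generate_name()` / `_name` naming follows this PR.

```python
class Feature:
    """Toy stand-in for a feature; not the featuretools class."""

    generate_calls = 0  # counts name computations across all instances

    def __init__(self, label, base_features=(), cache=True):
        self.label = label
        self.base_features = list(base_features)
        self.cache = cache
        self._name = None  # filled in lazily by get_name()

    def generate_name(self):
        # Stands in for the expensive, reflection-based name generation.
        Feature.generate_calls += 1
        if not self.base_features:
            return self.label
        inner = ", ".join(f.get_name() for f in self.base_features)
        return f"{self.label}({inner})"

    def get_name(self):
        if not self.cache:
            return self.generate_name()
        if self._name is None:
            self._name = self.generate_name()
        return self._name


def count_generate_calls(cache, depth=4, repeats=3):
    """Build a chain of stacked features and name each one `repeats` times."""
    Feature.generate_calls = 0
    features = [Feature("x", cache=cache)]
    for level in range(depth):
        features.append(Feature(f"f{level}", [features[-1]], cache=cache))
    for _ in range(repeats):
        for feature in features:
            feature.get_name()
    calls = Feature.generate_calls
    return calls, features[-1].get_name()


print(count_generate_calls(cache=True)[0])   # 5
print(count_generate_calls(cache=False)[0])  # 45
```

In this toy chain, every uncached `get_name()` call recomputes the names of all features beneath it, so the gap widens as the stack gets deeper. The numbers illustrate the mechanism only and say nothing about the measured 3x speedup.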

@codecov


commented May 9, 2019

Codecov Report

Merging #536 into master will not change coverage.
The diff coverage is 100%.


@@           Coverage Diff           @@
##           master     #536   +/-   ##
=======================================
  Coverage   96.26%   96.26%           
=======================================
  Files         114      114           
  Lines        9258     9258           
=======================================
  Hits         8912     8912           
  Misses        346      346
Impacted Files Coverage Δ
featuretools/feature_base/feature_base.py 96.95% <100%> (ø) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a7fcbd8...7e1fbfe.

@kmax12

Member

commented May 9, 2019

I can't think of any reason the name would have to be invalidated since there are no public methods on features that allow you to change the underlying pieces that are used to generate a name. @rwedge can you think of anything?

@rwedge

Contributor

commented May 9, 2019

> I can't think of any reason the name would have to be invalidated since there are no public methods on features that allow you to change the underlying pieces that are used to generate a name. @rwedge can you think of anything?

I can't think of anything
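
To make the immutability assumption concrete, here is a hedged sketch of what would go wrong if a name input could change after the first `get_name()` call. `CachedFeature`, `primitive_name`, and `variable` are hypothetical names for illustration and are not part of featuretools.

```python
class CachedFeature:
    """Hypothetical feature whose name depends on a mutable attribute."""

    def __init__(self, primitive_name, variable):
        self.primitive_name = primitive_name
        self.variable = variable
        self._name = None  # cache, filled on first get_name()

    def generate_name(self):
        return f"{self.primitive_name}({self.variable})"

    def get_name(self):
        if self._name is None:
            self._name = self.generate_name()
        return self._name


feature = CachedFeature("MEAN", "amount")
first = feature.get_name()        # computes and caches the name
feature.variable = "price"        # mutation the cache does not know about
stale = feature.get_name()        # still the cached value
fresh = feature.generate_name()   # what the name would be if recomputed
print(first, stale, fresh)
```

As long as nothing mutates the inputs to the name, as the comments above conclude, the cached and recomputed values cannot diverge and no invalidation is needed.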

    if self._name is not None:
        return self._name
    return self.generate_name()
    if not self._name:

@CJStadler

CJStadler May 9, 2019

Author Contributor

@rwedge I realized that _name is defined as a class variable, but when we generate the name we create an instance variable that shadows the class variable. This seems more confusing to me than putting self._name = None in the initializer. Or is it a Python idiom that I'm not familiar with?

@kmax12

kmax12 May 9, 2019

Member

you're right. we should move to self._name = None in the init
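
The shadowing discussed in this thread can be sketched as follows. Both classes are hypothetical illustrations rather than the featuretools code, and the string "computed" stands in for a generated name.

```python
class ClassLevelDefault:
    _name = None  # class attribute, shared until an instance shadows it

    def get_name(self):
        if not self._name:
            # Assignment through self creates a new instance attribute;
            # the class attribute itself stays None.
            self._name = "computed"
        return self._name


class InitDefault:
    def __init__(self):
        self._name = None  # every instance owns its cache slot from the start

    def get_name(self):
        if not self._name:
            self._name = "computed"
        return self._name


a = ClassLevelDefault()
print("_name" in vars(a))       # False: lookup falls back to the class
a.get_name()
print("_name" in vars(a))       # True: instance attribute now shadows it
print(ClassLevelDefault._name)  # None: class attribute is untouched

b = InitDefault()
print("_name" in vars(b))       # True: defined in the initializer
```

Both variants behave the same from the caller's side. Setting the attribute in the initializer simply makes it explicit that the cache is per-instance state.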

CJStadler added some commits May 9, 2019

@kmax12

kmax12 approved these changes May 9, 2019

Member

left a comment

LGTM

@CJStadler CJStadler merged commit d399898 into master May 9, 2019

4 checks passed

codecov/patch 100% of diff hit (target 96.26%)
codecov/project 96.26% (+0%) compared to a7fcbd8
license/cla Contributor License Agreement is signed.
test_all_python_versions Workflow: test_all_python_versions

@CJStadler CJStadler deleted the cache-feature-names branch May 9, 2019

@rwedge rwedge referenced this pull request May 17, 2019

Merged

v0.8.0 #548
