Primitive refactor #364

Merged
rwedge merged 36 commits into master on Jan 18, 2019

Member

kmax12 commented Jan 7, 2019

This PR separates the concept of a Primitive from a Feature. The internals of Featuretools change, but there is minimal impact on the external API.

Compared to before, a primitive is now only aware of the data it takes in. A feature is then defined by its input variables and/or features, as well as the primitive that will be applied. Put another way, a feature takes the specific entities and variables of an entityset, plus the primitive to be applied, as its input, so the primitive doesn't have to worry about them.

This has several advantages:

  1. It is easier to unit test primitives. There is no need to have an entity set to test a primitive.
  2. Primitive definitions become more reusable since they are not tied to the concept of an entity set.
  3. Conceptually, the user defining a primitive only has to think about input and output data, which are just numpy arrays.
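
To illustrate the first and third points, here is a minimal sketch of a primitive being tested without an entity set. `CountSketch` and its `get_function` method are hypothetical stand-ins written for this example, not the Featuretools classes, and a plain list stands in for the numpy array a real primitive would receive.

```python
class CountSketch:
    """Hypothetical aggregation primitive: it only knows about its input data."""
    name = "count"

    def get_function(self):
        # Return a plain callable over values; no entity set is involved.
        def count(values):
            return sum(1 for v in values if v is not None)
        return count


# The primitive can be exercised directly on raw data.
count = CountSketch().get_function()
result = count([3, None, 7, 1])
print(result)
```

Because the callable depends only on the values passed in, a unit test needs nothing more than a small list of inputs.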

To give an example, here is how a feature is currently defined using the Count primitive:

from featuretools.primitives import Count
f = Count(es["logs"]["value"], parent_entity=es["customers"])

Now, you define the inputs to the feature and pass the primitive as an argument:

import featuretools as ft
from featuretools.primitives import Count
f = ft.Feature(es["logs"]["value"], parent_entity=es["customers"], primitive=Count)

If a primitive has parameters, it can be used like this:

f = ft.Feature(es["logs"]["comment"], parent_entity=es["customers"], primitive=CountString(string="coffee"))
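
The two calls above pass the primitive once as a class and once as a configured instance, and the docstring under review in this PR notes that a primitive that is not initialized when passed gets initialized with no arguments. A minimal sketch of how that could work follows. `FeatureSketch`, `CountSketch`, and `CountStringSketch` are hypothetical names for illustration only, not the Featuretools implementation, and plain lists stand in for entity set variables.

```python
import inspect


class CountSketch:
    """Hypothetical primitive with no parameters."""

    def get_function(self):
        return len


class CountStringSketch:
    """Hypothetical primitive parameterized by the string to count."""

    def __init__(self, string="the"):
        self.string = string

    def get_function(self):
        def count_string(values):
            return sum(v.count(self.string) for v in values)
        return count_string


class FeatureSketch:
    """Hypothetical feature: pairs input data with the primitive to apply."""

    def __init__(self, base_values, primitive):
        # Accept either a primitive class or an already-configured instance.
        if inspect.isclass(primitive):
            primitive = primitive()
        self.base_values = base_values
        self.primitive = primitive

    def compute(self):
        return self.primitive.get_function()(self.base_values)


comments = ["coffee please", "tea", "more coffee, coffee", "water"]
n_rows = FeatureSketch(comments, primitive=CountSketch).compute()
n_coffee = FeatureSketch(comments, primitive=CountStringSketch(string="coffee")).compute()
```

The feature owns the question of which data to use, while the primitive only describes the computation, which is the separation this PR describes.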

The API for calling DFS doesn't change, except that primitives can now be passed with arguments:

ft.dfs(target_entity="customers",
       entityset=es,
       agg_primitives=["count", Sum],
       trans_primitives=[CountString(string="coffee")])
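
The call above mixes a primitive name, a primitive class, and a configured instance in the same arguments. One way such a mixed list could be normalized is sketched below. The registry, the `resolve_primitives` helper, and the `*Sketch` classes are all assumptions made for this example and do not describe how Featuretools resolves primitives internally.

```python
import inspect


class CountSketch:
    name = "count"


class SumSketch:
    name = "sum"


class CountStringSketch:
    name = "count_string"

    def __init__(self, string="the"):
        self.string = string


# Hypothetical registry mapping primitive names to classes.
REGISTRY = {cls.name: cls for cls in (CountSketch, SumSketch, CountStringSketch)}


def resolve_primitives(primitives):
    """Turn names, classes, and instances into configured instances."""
    resolved = []
    for p in primitives:
        if isinstance(p, str):
            key = p.lower()
            if key not in REGISTRY:
                raise ValueError("Unknown primitive name: %s" % p)
            p = REGISTRY[key]
        if inspect.isclass(p):
            # Classes are instantiated with their default arguments.
            p = p()
        resolved.append(p)
    return resolved


resolved = resolve_primitives(["count", SumSketch, CountStringSketch(string="coffee")])
names = [p.name for p in resolved]
```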

codecov bot commented Jan 7, 2019

Codecov Report

Merging #364 into master will increase coverage by 0.2%.
The diff coverage is 97.48%.

@@            Coverage Diff            @@
##           master     #364     +/-   ##
=========================================
+ Coverage   95.33%   95.53%   +0.2%     
=========================================
  Files          86       89      +3     
  Lines        8032     7555    -477     
=========================================
- Hits         7657     7218    -439     
+ Misses        375      337     -38
Impacted Files Coverage Δ
featuretools/primitives/api.py 100% <ø> (ø) ⬆️
featuretools/wrappers/sklearn.py 95.65% <ø> (ø) ⬆️
featuretools/synthesis/dfs.py 100% <ø> (ø) ⬆️
featuretools/utils/pickle_utils.py 100% <ø> (ø) ⬆️
featuretools/selection/variance_selection.py 0% <ø> (ø) ⬆️
featuretools/synthesis/encode_features.py 98.03% <ø> (ø) ⬆️
featuretools/selection/selection.py 100% <ø> (ø) ⬆️
featuretools/synthesis/deep_feature_synthesis.py 93.52% <100%> (+0.06%) ⬆️
featuretools/feature_base/api.py 100% <100%> (ø)
...aturetools/tests/entityset_tests/test_timedelta.py 100% <100%> (ø) ⬆️
... and 32 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ffb8081...40946ba.

kmax12 and others added some commits Jan 7, 2019

Primitive refactor updates (#365)
* remove incorrect commutative attributes

* handle rsub override and test reverse overrides

* rename weekend primitive is_weekend

* updated weekend to is_weekend in docs

* test values for scalar_subtract_numeric

* rename subtract_numeric and scalar_subtract_numeric to subtract_numeric_feature and scalar_subtract_numeric_feature

* revert subtract_numeric_feature to subtract_numeric

kmax12 added some commits Jan 8, 2019

kmax12 changed the title from "[WIP] Primitive refactor" to "Primitive refactor" Jan 16, 2019

kmax12 requested a review from rwedge Jan 16, 2019

Args:
    entity (Entity): entity this feature is being calculated for
    base_features (list[FeatureBase]): list of base features for primitive
    primitive (): primitive to calculate. if not initialized when passed, gets initialized with no arguments

rwedge Jan 17, 2019

Contributor

missing the type information for primitive

# if self.use_previous and self.use_previous.is_absolute():
#     entity = self.entity
#     time_var = IdentityFeature(entity[entity.time_index])
#     deps += [time_var]

rwedge Jan 17, 2019

Contributor

This has been commented out for a while, I think we should either remove it or make an issue about it.

kmax12 Jan 18, 2019

Author Member

The time index gets added automatically, so we don't need to list the time index feature as a dependency. Removed.


base_entity = set([f.entity for f in base_features])
assert len(base_entity) == 1, \
    "More than one entity for base features"

rwedge Jan 17, 2019

Contributor

There are potentially two checks in this init for whether the base features share the same entity.

kmax12 Jan 18, 2019

Author Member

good catch. fixed
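
One way to avoid repeating the same-entity check is to put it in a single helper that also returns the shared entity. The sketch below is only an illustration: `FeatureStub` and `single_entity` are hypothetical names, and the fix actually made in this PR may look different.

```python
from collections import namedtuple

# Hypothetical stand-in for a feature; only the `entity` attribute matters here.
FeatureStub = namedtuple("FeatureStub", ["name", "entity"])


def single_entity(base_features):
    """Return the one entity shared by all base features, or raise."""
    entities = {f.entity for f in base_features}
    if len(entities) != 1:
        raise ValueError(
            "Expected exactly one entity for base features, found %d" % len(entities)
        )
    return entities.pop()


shared = single_entity([FeatureStub("id", "log"), FeatureStub("value", "log")])
```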

seed_feature_sessions = Count(es['log']["id"], es['sessions']) > 2
seed_feature_log = Hour(es['log']['datetime'])
session_agg = Last(seed_feature_log, es['sessions'])
seed_feature_sessions = ft.Feature(es['log']["id"], parent_entity=es['sessions'], primitive=Count)

rwedge Jan 17, 2019

Contributor

I don't think it changes the test but seed_feature_sessions is missing the > 2

kmax12 Jan 18, 2019

Author Member

Yeah, I noticed that. I don't think adding the > 2 improved the test, so I simplified it.

rwedge approved these changes Jan 18, 2019

rwedge merged commit 36ce3c3 into master Jan 18, 2019

3 checks passed

codecov/patch 97.48% of diff hit (target 95.33%)
codecov/project 95.53% (+0.2%) compared to ffb8081
license/cla Contributor License Agreement is signed.

rwedge referenced this pull request Jan 30, 2019

Merged

v0.6.0 #387
