New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uses full entity update for dependencies of uses_full_entity features #110

Merged
merged 5 commits into from Mar 14, 2018

Conversation

Projects
None yet
3 participants
@bschreck
Contributor

bschreck commented Mar 13, 2018

This PR fixes two issues related to dependencies of uses_full_entity features.

Issue 1:

If:

a feature X does not have the uses_full_entity attribute
a dependent feature Y does have the uses_full_entity attribute
another dependent feature Z does not have the uses_full_entity attribute, and is one of the requested output features
Then the output frame as a result of computing this feature X should be placed in both entity_frames, and large_entity_frames in pandas_backend.py. However, we previously only check if the feature X itself is a requested output feature, not if it has a dependent that is an output feature (and not a uses_full_entity feature). This means the output of X was only placed in large_entity_frames. The computation of feature Z then failed because it depended on X being in entity_frames.

Issue 2:
To make sure we don't have overlapping columns when we concatenate the output frame as a result of computing a feature with the existing entity_frames, we drop duplicated columns in feature computation output frame (called result_frame in the code). However, we removed these columns in-place, such that if we have to do 2 concats (placing the output frame in both entity_frames and large_entity_frames, it's possible that we remove some computed features that wouldn't get placed in the second output frame.

It is much cleaner to explicitly label each feature with the input frame is should be given, and which output frames its result should be placed in. This fixes issue 2 nicely

I wrote a test case that checks for both of these conditions.

@bschreck bschreck requested a review from kmax12 Mar 13, 2018

bschreck added some commits Mar 13, 2018

@codecov-io

This comment has been minimized.

codecov-io commented Mar 14, 2018

Codecov Report

Merging #110 into master will increase coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #110      +/-   ##
==========================================
+ Coverage   88.21%   88.24%   +0.03%     
==========================================
  Files          73       73              
  Lines        7393     7412      +19     
==========================================
+ Hits         6522     6541      +19     
  Misses        871      871
Impacted Files Coverage Δ
...turetools/computational_backends/pandas_backend.py 89.27% <100%> (+0.03%) ⬆️
...eaturetools/computational_backends/feature_tree.py 99.31% <100%> (+0.02%) ⬆️
.../feature_function_tests/test_transform_features.py 87.84% <100%> (+0.2%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7b16155...b1b1107. Read the comment docs.

@kmax12

This comment has been minimized.

Member

kmax12 commented Mar 14, 2018

Looks great! Merging

@kmax12 kmax12 merged commit 6c9a011 into master Mar 14, 2018

2 checks passed

ci/circleci Your tests passed on CircleCI!
Details
license/cla Contributor License Agreement is signed.
Details

@rwedge rwedge referenced this pull request Mar 21, 2018

Merged

Release v0.1.19 #114

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment