Check duplicate rows cutoff times #276

Merged
kmax12 merged 3 commits into master from check_duplicate_rows_cutoff_times on Oct 4, 2018

@WillKoehrsen
Contributor

WillKoehrsen commented Oct 3, 2018

Addresses #275 by raising an assertion error if there are duplicated rows in the cutoff_time dataframe.
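
As an illustration of the kind of check described here, the following is a minimal sketch and not the code from this PR. It assumes cutoff times arrive as a pandas DataFrame; the helper name `assert_no_duplicate_rows`, the sample data, and the error message are hypothetical.

```python
import pandas as pd


def assert_no_duplicate_rows(cutoff_time):
    """Raise AssertionError if any row of cutoff_time repeats an earlier row.

    Hypothetical helper for illustration only.
    """
    # DataFrame.duplicated() flags each row that exactly matches an earlier row.
    n_duplicates = int(cutoff_time.duplicated().sum())
    assert n_duplicates == 0, (
        "cutoff_time contains {} duplicated row(s)".format(n_duplicates)
    )


# Sample data (made up): the second and third rows are identical.
cutoff_time = pd.DataFrame({
    "instance_id": [1, 2, 2],
    "time": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-02"]),
})

try:
    assert_no_duplicate_rows(cutoff_time)
    raised = False
except AssertionError:
    raised = True
```

Note that this whole-row comparison only catches rows that match in every column, including any extra columns carried along with the cutoff times.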

@codecov-io

codecov-io commented Oct 3, 2018

Codecov Report

Merging #276 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #276      +/-   ##
==========================================
+ Coverage   94.45%   94.45%   +<.01%     
==========================================
  Files          71       71              
  Lines        7700     7704       +4     
==========================================
+ Hits         7273     7277       +4     
  Misses        427      427

| Impacted Files | Coverage Δ |
| --- | --- |
| ...computational_backends/calculate_feature_matrix.py | 97.03% <100%> (+0.01%) ⬆️ |
| ...utational_backend/test_calculate_feature_matrix.py | 99.28% <100%> (ø) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update da09b20...f41cdfc.

@@ -141,7 +141,8 @@ def calculate_feature_matrix(features, entityset=None, cutoff_time=None, instanc
        cutoff_time = pd.DataFrame(map_args, columns=['instance_id', 'time'])
    else:
        cutoff_time = cutoff_time.reset_index(drop=True)
        assert (cutoff_time.duplicated().sum() == 0), \

@kmax12

kmax12 Oct 3, 2018

Member

Let's make this check more specific. It should check whether the instance id and time columns are duplicated, regardless of whether the pass-through columns differ.

@WillKoehrsen

Contributor

WillKoehrsen commented Oct 4, 2018

Updated the check so that it is specific to instance_id and time.
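
To show why the narrower check matters, here is a minimal sketch, not the merged code. It assumes the id and time columns are named `instance_id` and `time`; the helper `assert_unique_cutoffs`, the `label` pass-through column, and the error message are hypothetical. It relies on the `subset` argument of pandas `DataFrame.duplicated`.

```python
import pandas as pd


def assert_unique_cutoffs(cutoff_time, id_col="instance_id", time_col="time"):
    """Raise AssertionError if any (id, time) pair appears more than once.

    Hypothetical helper; other columns are ignored in the comparison.
    """
    duplicated_pairs = cutoff_time.duplicated(subset=[id_col, time_col])
    assert not duplicated_pairs.any(), (
        "Duplicated ({}, {}) pairs in cutoff_time".format(id_col, time_col)
    )


# Sample data (made up): same instance and time, different pass-through value.
cutoff_time = pd.DataFrame({
    "instance_id": [1, 1],
    "time": pd.to_datetime(["2018-01-01", "2018-01-01"]),
    "label": [True, False],
})

# A whole-row comparison sees two distinct rows ...
whole_row_duplicates = int(cutoff_time.duplicated().sum())
# ... while comparing only the id and time columns finds the repeat.
pair_duplicates = int(
    cutoff_time.duplicated(subset=["instance_id", "time"]).sum()
)

try:
    assert_unique_cutoffs(cutoff_time)
    pair_check_raised = False
except AssertionError:
    pair_check_raised = True
```

In this sample the whole-row check would pass silently, while the check restricted to the id and time columns flags the repeated cutoff.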

@kmax12

Member

kmax12 commented Oct 4, 2018

Looks good. Merging

@kmax12 kmax12 merged commit 0e93f2a into master Oct 4, 2018

2 checks passed

ci/circleci Your tests passed on CircleCI!
license/cla Contributor License Agreement is signed.

@gsheni gsheni deleted the check_duplicate_rows_cutoff_times branch Oct 24, 2018

@rwedge rwedge referenced this pull request Oct 31, 2018

Merged

v0.4.0 #304