Remove unnecessary sorting from normalize_entity #535

Merged
CJStadler merged 3 commits into master from remove-normalize-sort on May 9, 2019

2 participants
@CJStadler
Contributor

commented May 9, 2019

We can assume the base entity dataframe is sorted by time index, so if
we are using the same time index for the new entity then there is no
need to sort.
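
The reasoning can be illustrated without featuretools. Below is a minimal sketch, assuming rows are plain dictionaries and that the new entity keeps the first row seen for each key; `first_rows_per_key` and the sample column names are hypothetical and not part of the library:

```python
from datetime import datetime


def first_rows_per_key(rows, key):
    """Keep the first row seen for each key value, preserving input order."""
    seen = set()
    kept = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept


# Base rows, already sorted by their time index ("time").
base_rows = [
    {"customer_id": 2, "time": datetime(2019, 1, 1)},
    {"customer_id": 1, "time": datetime(2019, 1, 2)},
    {"customer_id": 2, "time": datetime(2019, 1, 3)},
    {"customer_id": 1, "time": datetime(2019, 1, 4)},
]

new_rows = first_rows_per_key(base_rows, "customer_id")
new_times = [row["time"] for row in new_rows]

# A subsequence of a sorted sequence is itself sorted, so no re-sort is needed.
print(new_times == sorted(new_times))
```

Because the kept rows are a subsequence of the base rows, their order by time is inherited from the base entity.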

Remove unnecessary sorting from normalize_entity
We can assume the base entity dataframe is sorted by time index, so if
we are using the same time index for the new entity then there is no
need to sort.
@@ -730,7 +730,7 @@ def normalize_entity(self, base_entity_id, new_entity_id, index,

        if isinstance(make_time_index, str):
            base_time_index = make_time_index
            new_entity_time_index = base_entity[make_time_index].id

@CJStadler

CJStadler May 9, 2019

Author Contributor

The only reason I can see for this is to check that the make_time_index exists in the base entity (if it doesn't then this would raise). If that should be checked here then I think it would be better to add an explicit assertion.

@rwedge

rwedge May 9, 2019

Contributor

Yes, let's add an explicit assertion.

@CJStadler

CJStadler May 9, 2019

Author Contributor

It looks like this is already checked when the entity is created:

    if time_index is not None and time_index not in df.columns:
        raise LookupError('Time index not found in dataframe')

Is it worthwhile to add an earlier check?

@rwedge

rwedge May 9, 2019

Contributor

Let's split this off to a separate issue to be dealt with later. Looks like there are two situations where we could improve the error message:

  • Specifying an existing column for make_time_index but not adding that column to additional_variables or copy_variables
  • Setting make_time_index to a column that doesn't exist in the base frame
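
The two situations above could be caught early with a check along the following lines. This is only a sketch: `check_make_time_index`, its parameters, and the error wording are hypothetical, and it assumes the first situation should be rejected rather than silently handled. It is not code from featuretools.

```python
def check_make_time_index(make_time_index, base_columns, carried_columns):
    """Raise a descriptive error for a bad make_time_index column name.

    base_columns: column names of the base entity's dataframe.
    carried_columns: names listed in additional_variables or copy_variables.
    All names here are hypothetical; this is not featuretools code.
    """
    if not isinstance(make_time_index, str):
        return  # True, False, or None: there is no column name to validate

    if make_time_index not in base_columns:
        raise LookupError(
            "make_time_index %r not found in the base entity" % make_time_index
        )

    if make_time_index not in carried_columns:
        raise ValueError(
            "make_time_index %r must also be listed in additional_variables "
            "or copy_variables" % make_time_index
        )


# Example: the column exists but was not carried over to the new entity.
try:
    check_make_time_index("signup", ["id", "signup"], [])
except ValueError as err:
    print(err)
```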

@CJStadler

CJStadler May 9, 2019

Author Contributor

👍 I can open an issue.

@codecov

commented May 9, 2019

Codecov Report

Merging #535 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #535      +/-   ##
==========================================
+ Coverage   96.26%   96.26%   +<.01%     
==========================================
  Files         114      114              
  Lines        9253     9258       +5     
==========================================
+ Hits         8907     8912       +5     
  Misses        346      346
Impacted Files                                   Coverage Δ
featuretools/entityset/entityset.py              95.08% <100%> (+0.02%) ⬆️
featuretools/tests/entityset_tests/test_es.py    100% <100%> (ø) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4b6a5a4...ba30d54. Read the comment docs.

Correct already_sorted
If make_time_index=True then the base time index will be used, so the df
is already sorted, but it will be given a new name.
@@ -790,6 +792,7 @@ def normalize_entity(self, base_entity_id, new_entity_id, index,
new_entity_id,
new_entity_df,
index,
already_sorted=already_sorted,
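
The rule described in the two commit messages can be summarized as a small decision function. This is a sketch of the stated reasoning, not the code in `normalize_entity`: `is_already_sorted` is a hypothetical name, and returning `False` when no time index is requested is an assumption.

```python
def is_already_sorted(make_time_index, base_time_index):
    """Decide whether the new entity's rows can skip sorting.

    make_time_index: True, False, None, or a column name (str).
    base_time_index: name of the base entity's time index column.
    """
    if make_time_index is True:
        # The base time index is reused under a new name, so order is kept.
        return True
    if isinstance(make_time_index, str):
        # Only safe when the chosen column is the base time index itself.
        return make_time_index == base_time_index
    # No time index requested (assumption: treat as not sorted).
    return False


print(is_already_sorted(True, "transaction_time"))
print(is_already_sorted("transaction_time", "transaction_time"))
print(is_already_sorted("signup_date", "transaction_time"))
```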

@rwedge

rwedge May 9, 2019

Contributor

Can we update the normalize_entityset tests to confirm that created time indexes are sorted?
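
A check of this kind could look roughly like the following. It is a library-independent sketch: `assert_time_index_sorted` is a hypothetical helper, and in a real test the list of times would be read from the new entity's time index column.

```python
from datetime import datetime


def assert_time_index_sorted(times):
    """Fail if any value is earlier than the one before it."""
    for earlier, later in zip(times, times[1:]):
        assert earlier <= later, (
            "time index out of order: %s comes before %s" % (earlier, later)
        )


# Stand-in for the time index values of a newly created entity.
created_times = [
    datetime(2019, 5, 1),
    datetime(2019, 5, 1),
    datetime(2019, 5, 9),
]
assert_time_index_sorted(created_times)
```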

@CJStadler CJStadler requested a review from rwedge May 9, 2019

@rwedge

rwedge approved these changes May 9, 2019

Contributor

left a comment

Looks good

@CJStadler CJStadler merged commit a7fcbd8 into master May 9, 2019

4 checks passed

codecov/patch 100% of diff hit (target 96.26%)
codecov/project 96.26% (+<.01%) compared to 4b6a5a4
license/cla Contributor License Agreement is signed.
test_all_python_versions Workflow: test_all_python_versions

@CJStadler CJStadler deleted the remove-normalize-sort branch May 9, 2019

@rwedge rwedge referenced this pull request May 17, 2019

Merged

v0.8.0 #548
