New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved handling of integer time index #128

Merged
merged 27 commits into from Apr 6, 2018

Conversation

Projects
None yet
3 participants
@rwedge
Contributor

rwedge commented Apr 2, 2018

Addresses #99 and #125.

  • Time values in cutoff_time that are not numeric or datetime will raise an error
  • If target entity has an integer time index, feature matrix will also have integer time index
  • Extra columns in cutoff_time will now pass through correctly when using integer time index
  • Incompatible time index and cutoff time types will raise an error
@codecov-io

This comment has been minimized.

codecov-io commented Apr 2, 2018

Codecov Report

Merging #128 into master will increase coverage by 0.35%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #128      +/-   ##
==========================================
+ Coverage   88.47%   88.83%   +0.35%     
==========================================
  Files          73       73              
  Lines        7558     7730     +172     
==========================================
+ Hits         6687     6867     +180     
+ Misses        871      863       -8
Impacted Files Coverage Δ
featuretools/tests/testing_utils/mock_ds.py 87.02% <100%> (+1.07%) ⬆️
featuretools/variable_types/variable.py 79.06% <100%> (+0.24%) ⬆️
...computational_backends/calculate_feature_matrix.py 98.63% <100%> (+0.04%) ⬆️
...utational_backend/test_calculate_feature_matrix.py 99.11% <100%> (+0.18%) ⬆️
...aturetools/tests/entityset_tests/test_pandas_es.py 99.76% <100%> (+0.03%) ⬆️
featuretools/tests/dfs_tests/test_dfs_method.py 98.07% <100%> (+0.03%) ⬆️
featuretools/entityset/entity.py 79.37% <100%> (+1.3%) ⬆️
featuretools/utils/wrangle.py 53.5% <100%> (+3.04%) ⬆️
featuretools/entityset/base_entityset.py 91.16% <100%> (+0.04%) ⬆️
featuretools/entityset/entityset.py 90.26% <100%> (-0.03%) ⬇️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 59fc551...4fa9090. Read the comment docs.

if isinstance(group[target_time].iloc[0], _numeric_types):
grouped = [[np.inf, group]]
else:
grouped = [[datetime.now(), group]]

This comment has been minimized.

@kmax12

kmax12 Apr 3, 2018

Member

should we move this datetime.now() outside of calc_results. basically, we want every chunk that's part of a single cfm call to use the same time.

@@ -591,25 +591,19 @@ def _filter_and_sort(self, df, time_last=None,
"""
if self.time_index:
if time_last is not None and not df.empty:

This comment has been minimized.

@kmax12

kmax12 Apr 3, 2018

Member

is it possible for time last to be none by this point in the code? If not, let's remove

This comment has been minimized.

@rwedge

rwedge Apr 4, 2018

Contributor

Entity.get_all_instances and a few others call this without specifying a time last

@@ -110,6 +113,11 @@ def calculate_feature_matrix(features, cutoff_time=None, instance_ids=None,
if not isinstance(cutoff_time, pd.DataFrame):
if cutoff_time is None:
cutoff_time = datetime.now()
# if integer time index, use max value as cutoff time instead
if target_entity.time_index:

This comment has been minimized.

@kmax12

kmax12 Apr 4, 2018

Member

the target entity might not have a time index, but some other entity in the entityset might. perhaps it makes sense to add a method or attribute to an entityset that returns if it is a datetime or numeric time index.

we should assume all entities in an entityset are using the same units for the time index, so this method could do the check when being initialized or when set_time_index is called.

@@ -291,6 +291,11 @@ class TimeIndex(Variable):
_dtype_repr = "time_index"
class NumericTimeIndex(TimeIndex, Numeric):

This comment has been minimized.

@kmax12

kmax12 Apr 5, 2018

Member

can you add to API reference in documentation?

@kmax12

This comment has been minimized.

Member

kmax12 commented Apr 6, 2018

Looks great. Merging

@kmax12 kmax12 changed the title from Integer time index bugfixes to Improved handling of integer time index Apr 6, 2018

@kmax12 kmax12 merged commit 906777b into master Apr 6, 2018

2 checks passed

ci/circleci Your tests passed on CircleCI!
Details
license/cla Contributor License Agreement is signed.
Details

@rwedge rwedge referenced this pull request Apr 13, 2018

Merged

Release v0.1.20 #131

rwedge added a commit that referenced this pull request Apr 13, 2018

Release v0.1.20 (#131)
**v0.1.20** Apr 13, 2018
* Improved chunking when calculating feature matrices  (#121)
* Primitives as strings in DFS parameters (#129)
* Integer time index bugfixes (#128)
* Add make_temporal_cutoffs utility function (#126)
* Show all entities, switch shape display to row/col (#124)
* fixed num characters nan fix (#118)
* modify ignore_variables docstring (#117)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment