Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor get_pandas_data_slice to take single entity #547

Merged
merged 9 commits into from May 24, 2019

Conversation

Projects
None yet
3 participants
@CJStadler
Copy link
Contributor

commented May 17, 2019

Instead of getting the data for all entities at once. This hypothetically could reduce memory usage because we only need to have the data for one entity at a time in memory. It also makes the code
somewhat clearer (no more nested dictionary).

Refactor get_pandas_data_slice to take single entity
Instead of getting the data for all entities at once. This
hypothetically could reduce memory usage because we only need to have
the data for one entity at a time in memory. It also makes the code
somewhat clearer (no more nested dictionary).
@codecov

This comment has been minimized.

Copy link

commented May 17, 2019

Codecov Report

Merging #547 into master will decrease coverage by <.01%.
The diff coverage is 98.59%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #547      +/-   ##
==========================================
- Coverage   96.35%   96.35%   -0.01%     
==========================================
  Files         116      116              
  Lines        9246     9234      -12     
==========================================
- Hits         8909     8897      -12     
  Misses        337      337
Impacted Files Coverage Δ
...turetools/computational_backends/pandas_backend.py 98.05% <100%> (-0.01%) ⬇️
...computational_backends/calculate_feature_matrix.py 98.31% <100%> (-0.02%) ⬇️
featuretools/tests/entityset_tests/test_es.py 100% <100%> (ø) ⬆️
featuretools/entityset/entityset.py 94.86% <95%> (-0.08%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0fc1c00...a826220. Read the comment docs.

@kmax12 kmax12 requested a review from rwedge May 17, 2019

CJStadler added some commits May 20, 2019

CJStadler added some commits May 23, 2019

Remove unused verbose parameters
- In get_pandas_data_slice it was not used.
- It was used in calculate_all_features but the only caller does not
pass it so it will always be False.

@CJStadler CJStadler requested a review from rwedge May 23, 2019

@rwedge rwedge requested a review from kmax12 May 23, 2019

@@ -84,9 +84,6 @@ def calculate_all_features(self, instance_ids, time_last,

target_entity = self.entityset[self.target_eid]

if not target_entity.df.index.isin(instance_ids).any():
return self.generate_default_df(instance_ids=instance_ids)

This comment has been minimized.

Copy link
@CJStadler

CJStadler May 23, 2019

Author Contributor

Should we leave this as an optimization, since we can catch this case early?

This comment has been minimized.

Copy link
@rwedge

rwedge May 23, 2019

Contributor

I think it depends on the time necessary to perform this check. How does the run time change as the index grows / we add more missing instances? This code will be run for each unique time in cutoff_time. Quickly catching if every instance is not present in the data might not be worth optimizing for.

This comment has been minimized.

Copy link
@CJStadler

CJStadler May 23, 2019

Author Contributor

Good point, I didn't think about the cost of the check itself. Since it should be a rare case it seems unlikely to be beneficial, and adds complexity.

@kmax12

kmax12 approved these changes May 24, 2019

Copy link
Member

left a comment

LGTM

@rwedge

rwedge approved these changes May 24, 2019

@CJStadler CJStadler merged commit 8c0d698 into master May 24, 2019

4 checks passed

codecov/patch 98.59% of diff hit (target 96.35%)
Details
codecov/project Absolute coverage decreased by -<.01% but relative coverage increased by +2.23% compared to 0fc1c00
Details
license/cla Contributor License Agreement is signed.
Details
test_all_python_versions Workflow: test_all_python_versions
Details

@CJStadler CJStadler deleted the refactor-get-data-slice branch May 24, 2019

@rwedge rwedge referenced this pull request Jun 19, 2019

Merged

v0.9.0 #610

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.