Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add relationship_path to agg and direct features #544

Merged
merged 24 commits into from May 29, 2019

Conversation

Projects
None yet
2 participants
@CJStadler
Copy link
Contributor

commented May 16, 2019

This adds an optional constructor parameter, and stores the relationship_path on the feature. If no relationship_path is passed to the constructor we search for a path between the target entity and base entity. If none or more than one is found we raise an error. When there are more than one possible paths between entities the feature name will include the full path.

Also add parent_name and child_name to Relationship.

This does not update any other logic to use the paths, as this will be added in later commits.

See #543 for context.

@codecov

This comment has been minimized.

Copy link

commented May 16, 2019

Codecov Report

Merging #544 into master will increase coverage by 0.11%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #544      +/-   ##
==========================================
+ Coverage   96.35%   96.46%   +0.11%     
==========================================
  Files         116      117       +1     
  Lines        9234     9511     +277     
==========================================
+ Hits         8897     9175     +278     
+ Misses        337      336       -1
Impacted Files Coverage Δ
...eaturetools/computational_backends/feature_tree.py 100% <ø> (ø) ⬆️
...etools/tests/entityset_tests/test_serialization.py 100% <ø> (ø) ⬆️
...tests/computational_backend/test_pandas_backend.py 100% <ø> (ø) ⬆️
featuretools/synthesis/deep_feature_synthesis.py 96.76% <100%> (ø) ⬆️
featuretools/entityset/entityset.py 94.97% <100%> (+0.11%) ⬆️
.../tests/primitive_tests/test_features_serializer.py 100% <100%> (ø) ⬆️
featuretools/feature_base/features_deserializer.py 98.07% <100%> (+0.11%) ⬆️
featuretools/entityset/deserialize.py 100% <100%> (ø) ⬆️
featuretools/entityset/relationship.py 97.67% <100%> (+2.43%) ⬆️
featuretools/entityset/serialize.py 100% <100%> (ø) ⬆️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8c0d698...d86b9dc. Read the comment docs.

@CJStadler CJStadler force-pushed the feature-relationship-paths branch from 6c9b644 to 79a3e41 May 16, 2019

@kmax12 kmax12 self-requested a review May 16, 2019

CJStadler added some commits May 13, 2019

Add relationship_path to agg and direct features
This adds an optional constructor parameter, and stores the
relationship_path on the feature. If no relationship_path is passed to
the constructor we search for a path between the target entity and base
entity. If none or more than one is found we raise an error.  When there
are more than one possible paths between entities the feature name will
include the full path.

Also add parent_name and child_name to Relationship.

This does not update any other logic to use the paths, as this will be
added in later commits.
Fix infinite loop in depth first search
Since we need to detect nodes to which there are multiple paths, we only
want to stop recursion the second time we visit a node.
Check entity equality using names
Comparing the entities directly is very expensive, and this was
significantly slowing down dfs.
Set relationship_path on FeatureBase
So that the caller can access it without needing to check the feature
type.

@CJStadler CJStadler force-pushed the feature-relationship-paths branch from bb5d1c1 to 3d087ea May 20, 2019

@kmax12
Copy link
Member

left a comment

the overall approach looks good. I left a few comments about methods that were implemented that look similar to stuff we already have. In particular, I wonder if there is overlap with the path find logic and the methods we already have on EntitySet.

Show resolved Hide resolved featuretools/tests/computational_backend/test_pandas_backend.py Outdated
Show resolved Hide resolved featuretools/entityset/relationship.py Outdated
Show resolved Hide resolved featuretools/entityset/relationship.py
Show resolved Hide resolved featuretools/feature_base/feature_base.py Outdated
Show resolved Hide resolved featuretools/feature_base/feature_base.py
Show resolved Hide resolved featuretools/feature_base/feature_base.py Outdated
Show resolved Hide resolved featuretools/feature_base/feature_base.py Outdated
Show resolved Hide resolved featuretools/entityset/relationship.py

CJStadler added some commits May 22, 2019

Update features JSON schema version
And add logic for detecting old version.
Consolidate relationship serialization
- Remove description_to_relationship and relationship_to_description.
- Rename get_arguments to to_dictionary.
Replace find_forward_path with find_forward_paths
And the same for the backward version.

The new methods are generators which yield each path. This allows us to
stop searching if we only need one path.

Note that we ignore paths with cycles, meaning that a path may be said
to be unique if the only other paths contain cycles.

@CJStadler CJStadler requested a review from kmax12 May 22, 2019

Show resolved Hide resolved featuretools/feature_base/feature_base.py
Show resolved Hide resolved featuretools/feature_base/feature_base.py Outdated
Show resolved Hide resolved featuretools/feature_base/feature_base.py Outdated
Show resolved Hide resolved featuretools/feature_base/feature_base.py Outdated
Show resolved Hide resolved featuretools/primitives/standard/aggregation_primitives.py Outdated
Show resolved Hide resolved featuretools/tests/entityset_tests/test_es_metadata.py Outdated
Show resolved Hide resolved featuretools/feature_base/feature_base.py Outdated
Show resolved Hide resolved featuretools/entityset/entityset.py Outdated

CJStadler added some commits May 28, 2019

@CJStadler CJStadler requested a review from kmax12 May 28, 2019

Update feature copy to use path
These were previously overlooked, and untested.
@kmax12
Copy link
Member

left a comment

Left one minor comment, but otherwise I think this looks good!

Replace if with else
This is equivalent, but makes the flow clearer.
@kmax12

kmax12 approved these changes May 29, 2019

Copy link
Member

left a comment

Looks great. Go ahead and merge!

@CJStadler CJStadler merged commit ecd722c into master May 29, 2019

4 checks passed

codecov/patch 100% of diff hit (target 96.35%)
Details
codecov/project 96.46% (+0.11%) compared to 8c0d698
Details
license/cla Contributor License Agreement is signed.
Details
test_all_python_versions Workflow: test_all_python_versions
Details

@CJStadler CJStadler deleted the feature-relationship-paths branch May 29, 2019

@rwedge rwedge referenced this pull request Jun 19, 2019

Merged

v0.9.0 #610

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.