drop_first attribute added in encode features #647

Merged
merged 10 commits into Featuretools:master on Jul 14, 2019

Conversation

@ayushpatidar
Contributor

commented Jul 8, 2019

Pull Request Description

Adds a drop_first parameter to the encode_features() function.
This lets users control whether the redundant encoded feature is kept in the dataframe.

Fixes #635

After creating the pull request: in order to pass the changelog_updated check you will need to update the "Future Release" section of docs/source/changelog.rst to include this pull request.

@CLAassistant


commented Jul 8, 2019

CLA assistant check
All committers have signed the CLA.

@codecov


commented Jul 8, 2019

Codecov Report

❗️ No coverage uploaded for pull request base (master@6da8c8b).
The diff coverage is 100%.


@@            Coverage Diff            @@
##             master     #647   +/-   ##
=========================================
  Coverage          ?   97.42%           
=========================================
  Files             ?      118           
  Lines             ?     9560           
  Branches          ?        0           
=========================================
  Hits              ?     9314           
  Misses            ?      246           
  Partials          ?        0
Impacted Files | Coverage Δ
featuretools/synthesis/encode_features.py | 98.43% <100%> (ø)
...aturetools/tests/synthesis/test_encode_features.py | 100% <100%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6da8c8b...c57b7fa.

@kmax12

Member

commented Jul 8, 2019

Thank you for the contribution, @ayushpatidar.

A couple of things:

  1. Can you sign the CLA? (You may have to do it a second time for this PR.)
  2. Please add a test case for this new parameter to https://github.com/Featuretools/featuretools/blob/master/featuretools/tests/synthesis/test_encode_features.py

Once you are passing all the GitHub checks, we can review.

@kmax12
Member

left a comment

Left a few comments. Once they are addressed, I believe this is ready to merge!

@@ -23,6 +23,9 @@ def encode_features(feature_matrix, features, top_n=10, include_unknown=True,
defaults to encode all necessary features.
inplace (bool): Encode feature_matrix in place. Defaults to False.
verbose (str): Print progress info.
drop_first (bool): Whether to get l-1 dummies out of l categorical

@kmax12

kmax12 Jul 9, 2019

Member

let's use k instead of l
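
For readers unfamiliar with the "k-1 dummies out of k categorical levels" wording, the idea can be illustrated with pandas' own `get_dummies`, which has a parameter of the same name. This is only a sketch of the general concept, assuming the new parameter is meant to behave analogously; the sample data is made up, and `encode_features` in this PR picks categories by frequency rather than by sorted order.

```python
import pandas as pd

# Made-up sample column with k = 3 distinct categories.
colors = pd.Series(["red", "green", "blue", "red"])

# Default one-hot encoding: one indicator column per category (k columns).
full = pd.get_dummies(colors)

# drop_first=True: the first (sorted) level is dropped, leaving k - 1 columns.
# The dropped level is implied when all remaining indicators are 0.
reduced = pd.get_dummies(colors, drop_first=True)

print(list(full.columns))     # ['blue', 'green', 'red']
print(list(reduced.columns))  # ['green', 'red']
```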

unique = val_counts.head(top_n).index.tolist()
select_n = top_n
if drop_first:
    if len(val_counts) != 1 or top_n != 1:

@kmax12

kmax12 Jul 9, 2019

Member

what if we remove the conditional and just do these two lines?

select_n = min(len(val_counts), top_n) 
select_n = max(select_n - 1, 1)  # make sure at least 1 category is selected

@ayushpatidar

ayushpatidar Jul 11, 2019

Author Contributor

@kmax12
But we have to keep the outer conditional statement (if drop_first).

@kmax12

kmax12 Jul 11, 2019

Member

Yep, sorry, I was just referring to the inner conditional.
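
To make the outcome of this thread concrete, here is a minimal sketch of the agreed shape: the outer `if drop_first` stays, and the inner conditional is replaced by the two `min`/`max` lines. The function name `select_top_categories` and its signature are hypothetical and exist only for illustration; this is not the actual `encode_features` code.

```python
import pandas as pd


def select_top_categories(values, top_n=10, drop_first=False):
    """Return the categories that would each get an indicator column.

    Hypothetical helper for illustration only; not part of featuretools.
    """
    # Categories ordered from most to least frequent.
    val_counts = pd.Series(values).value_counts()
    select_n = top_n
    if drop_first:
        # Cap at the number of distinct categories, then drop one ...
        select_n = min(len(val_counts), top_n)
        # ... but make sure at least 1 category is selected.
        select_n = max(select_n - 1, 1)
    return val_counts.head(select_n).index.tolist()


sample = ["a", "a", "a", "b", "b", "c"]
print(select_top_categories(sample))                   # ['a', 'b', 'c']
print(select_top_categories(sample, drop_first=True))  # ['a', 'b']
```

The `max(..., 1)` guard covers the edge cases the original inner conditional was checking for: a column with a single distinct value, or `top_n=1`, still yields one encoded category rather than none.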

es.entity_from_dataframe(entity_id='a', dataframe=df, index='index', make_index=True)
features, feature_defs = dfs(entityset=es, target_entity='a')
features_enc, feature_defs_enc = encode_features(features, feature_defs,
                                                 drop_first=True, top_n=10, include_unknown=False)

@kmax12

kmax12 Jul 9, 2019

Member

no need to include top_n in this test case

ayushpatidar and others added 3 commits Jul 14, 2019
@kmax12
kmax12 approved these changes Jul 14, 2019
Member

left a comment

Looks good to me. Thanks for the contribution!

@kmax12 kmax12 merged commit eb86522 into Featuretools:master Jul 14, 2019

4 checks passed

codecov/patch: No report found to compare against
codecov/project: No report found to compare against
license/cla: Contributor License Agreement is signed.
test_all_python_versions: Workflow: test_all_python_versions
@rwedge rwedge referenced this pull request Aug 19, 2019