ENH ColumnTransformer.get_feature_names() handles passthrough #14048

Merged 16 commits on Apr 19, 2020

Commits on Jun 8, 2019

  1. Implemented get_feature_names() for ColumnTransformer to include 'passthrough'

    Currently, if remainder='passthrough', get_feature_names() raises a NotImplementedError; this pull request adds that functionality. If the transformer is fit on a DataFrame, the passthrough columns appear in get_feature_names() under their DataFrame column names; if it is not fit on a DataFrame, the column indices are used instead.
    lrjball committed Jun 8, 2019
    c6e92bb

Commits on Jun 13, 2019

  1. Updated get_feature_names() to always return feature names.

    While making the changes for 'passthrough', it seemed sensible to make one more change to ColumnTransformer.get_feature_names() so that it always returns something. This implementation adds the feature names name__x0, …, name__xN for transformers without a get_feature_names method.
    
    Raising an error seems harsh when some transformers do have feature names, and even if none of them has a get_feature_names method it is still helpful to know which features came from which transformer.
    lrjball committed Jun 13, 2019
    76c8b66
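
The fallback naming described in commit 76c8b66 can be sketched in plain Python. This is only an illustration, not the code from the pull request: `WithNames`, `WithoutNames` and `collect_feature_names` are hypothetical names, and passing the number of output features explicitly is an assumption made to keep the sketch self-contained.

```python
class WithNames:
    """Stand-in transformer that can report its output feature names."""

    def get_feature_names(self):
        return ["a", "b"]


class WithoutNames:
    """Stand-in transformer with no get_feature_names method."""


def collect_feature_names(transformers):
    """Build prefixed feature names for (name, transformer, n_outputs) triples.

    Hypothetical helper: n_outputs is supplied by the caller here, which is
    an assumption of this sketch rather than how scikit-learn tracks it.
    """
    names = []
    for name, trans, n_outputs in transformers:
        if hasattr(trans, "get_feature_names"):
            # The transformer knows its names: prefix them with its own name.
            names.extend(f"{name}__{f}" for f in trans.get_feature_names())
        else:
            # No names available: fall back to name__x0, ..., name__xN.
            names.extend(f"{name}__x{i}" for i in range(n_outputs))
    return names


print(collect_feature_names([("onehot", WithNames(), 2),
                             ("scale", WithoutNames(), 3)]))
# ['onehot__a', 'onehot__b', 'scale__x0', 'scale__x1', 'scale__x2']
```

Even when no transformer reports names, the prefix still shows which transformer produced each output column.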

Commits on Jun 19, 2019

  1. 728cc8a

Commits on Jun 26, 2019

  1. Implemented get_feature_names() for ColumnTransformer to include 'passthrough'

    Currently, if remainder='passthrough', get_feature_names() raises a NotImplementedError; this pull request adds that functionality. If the transformer is fit on a DataFrame, the passthrough columns appear in get_feature_names() under their DataFrame column names; if it is not fit on a DataFrame, the column indices are used instead, with the feature name 'xi' for the ith index.
    lrjball committed Jun 26, 2019
    b9e3718
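
The naming rule for remainder='passthrough' in commit b9e3718 might look roughly like the sketch below. `remainder_feature_names` is a hypothetical helper written for illustration; it takes a plain list of column labels instead of a real DataFrame so that it runs without pandas.

```python
def remainder_feature_names(remainder_indices, column_names=None):
    """Name the columns passed through by remainder='passthrough'.

    Hypothetical helper, not scikit-learn code.
    column_names: list of DataFrame column labels, or None for array input.
    """
    if column_names is not None:
        # Fitted on a DataFrame: reuse the original column labels.
        return [column_names[i] for i in remainder_indices]
    # Fitted on an array: fall back to 'x<i>' for the ith column.
    return [f"x{i}" for i in remainder_indices]


print(remainder_feature_names([1, 3], ["age", "city", "fare", "sex"]))
# ['city', 'sex']
print(remainder_feature_names([1, 3]))
# ['x1', 'x3']
```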

Commits on Jul 26, 2019

  1. 073848d
  2. 4cacd93

Commits on Jul 28, 2019

  1. Updated the behaviour of get_feature_names to treat trans='passthrough' the same way as remainder='passthrough'.

    The behaviour of get_feature_names for passthrough is now the following:
    - If fitted on a dataframe, columns passed with trans='passthrough' are treated
    as positional if they are ints (the feature name is then the column name at that
    position); otherwise they are used directly as the feature names. Any columns
    covered by remainder='passthrough' are appended to the end of the feature names,
    under their column names from the fitted dataframe.
    - If fitted on an array, the feature names for both trans='passthrough' and
    remainder='passthrough' are 'xi' for each index value i. The remainder columns
    again come after the rest of the feature names.
    lrjball committed Jul 28, 2019
    6e49c95
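
The two rules listed in commit 6e49c95 can be sketched as a single function. This is an assumption-laden illustration rather than the merged implementation: `passthrough_names` is a hypothetical name, DataFrame input is modelled as a list of column labels, and array input is assumed to use integer column positions only.

```python
def passthrough_names(columns, remainder, column_names=None):
    """Feature names for a trans='passthrough' entry followed by the remainder.

    Hypothetical helper, not scikit-learn code.
    columns: int positions and/or str labels selected with trans='passthrough'.
    remainder: int positions left over for remainder='passthrough'.
    column_names: DataFrame column labels, or None when fitted on an array.
    """
    def name_of(col):
        if column_names is None:
            return f"x{col}"          # array input: positional 'x<i>' names
        if isinstance(col, int):
            return column_names[col]  # DataFrame, int: treated as a position
        return col                    # DataFrame, label: used as-is

    # Remainder columns always come after the explicitly listed ones.
    return [name_of(c) for c in columns] + [name_of(i) for i in remainder]


print(passthrough_names([2, "a"], [1], ["a", "b", "c"]))
# ['c', 'a', 'b']
print(passthrough_names([2, 0], [1]))
# ['x2', 'x0', 'x1']
```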

Commits on Aug 18, 2019

  1. removed checks on scalar columns for passthrough

    Removed as this was waiting on PR #14495, which did not go ahead.
    lrjball committed Aug 18, 2019
    38f9db0
  2. 1b55757

Commits on Mar 2, 2020

  1. e1c16ba
  2. Fixed missing import issue

    Fixed issue caused by missing import introduced when doing a merge in the browser. _check_key_type has been replaced with _determine_key_type.
    lrjball committed Mar 2, 2020
    76f47ac

Commits on Mar 3, 2020

  1. Update sklearn/compose/_column_transformer.py

    Co-Authored-By: Thomas J Fan <thomasjpfan@gmail.com>
    lrjball and thomasjpfan committed Mar 3, 2020
    07b9403
  2. Separated pandas test into own function, and removed unused attribute.

    - Separated the pandas part of the test into its own function so that the whole test is not skipped when pandas is not installed.
    - Removed the unused _output_dims attribute.
    lrjball committed Mar 3, 2020
    762850e
  3. d031673
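
The test split described in commit 762850e (a separate function for the pandas part, so the array checks still run without pandas) can be illustrated with the standard library's unittest. This is a swapped-in technique for the sake of a dependency-free sketch, not the project's own test setup; `index_names`, `TestPassthroughNames` and `run_suite` are hypothetical names.

```python
import importlib.util
import unittest

# True only when pandas can be imported in this environment.
HAS_PANDAS = importlib.util.find_spec("pandas") is not None


def index_names(indices):
    """Hypothetical helper: name array columns 'x<i>' by position."""
    return [f"x{i}" for i in indices]


class TestPassthroughNames(unittest.TestCase):
    def test_array_input(self):
        # Needs no optional dependency, so it always runs.
        self.assertEqual(index_names([0, 2]), ["x0", "x2"])

    @unittest.skipUnless(HAS_PANDAS, "pandas is not installed")
    def test_dataframe_input(self):
        # Imported lazily so that loading this module never requires pandas.
        import pandas as pd

        df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})
        self.assertEqual(list(df.columns[[0, 2]]), ["a", "c"])


def run_suite():
    """Run both tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestPassthroughNames)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With the checks split this way, only `test_dataframe_input` is skipped when pandas is missing; had both checks lived in one function, the array assertions would be skipped along with it.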

Commits on Mar 5, 2020

  1. 319842e
  2. Added support for boolean masks and slices

    Added support for boolean masks and slices for both dataframes and arrays, as well as tests for each case.
    lrjball committed Mar 5, 2020
    447a3ba
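
One way to picture the boolean mask and slice support from commit 447a3ba is to resolve every column specifier to integer positions before naming. The sketch below assumes positional slices only (label-based DataFrame slices are not covered), and both `resolve_columns` and `names_for` are hypothetical helpers rather than functions from the pull request.

```python
def resolve_columns(spec, n_features):
    """Turn a column specifier into a list of integer positions.

    Hypothetical helper: handles positional slices, boolean masks and
    lists of ints only.
    """
    if isinstance(spec, slice):
        return list(range(n_features))[spec]
    spec = list(spec)
    if spec and all(isinstance(s, bool) for s in spec):
        if len(spec) != n_features:
            raise ValueError("boolean mask must have one entry per column")
        # Keep the positions where the mask is True.
        return [i for i, keep in enumerate(spec) if keep]
    return spec


def names_for(spec, n_features, column_names=None):
    """Name the selected columns: DataFrame labels if given, else 'x<i>'."""
    idx = resolve_columns(spec, n_features)
    if column_names is not None:
        return [column_names[i] for i in idx]
    return [f"x{i}" for i in idx]


print(names_for(slice(1, 3), 4))
# ['x1', 'x2']
print(names_for([True, False, False, True], 4, ["a", "b", "c", "d"]))
# ['a', 'd']
```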