Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/ohe refactor #267

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open

Feature/ohe refactor #267

wants to merge 11 commits into from

Conversation

Chip2916
Copy link
Collaborator

@Chip2916 Chip2916 commented Jul 3, 2024

PR to resolve following issues:

Bring OHE transformer testing setup in line here
Alphabetise minimum attribute dict here

Nothing unusual needed to be done for the OHE testing steup, drop_original mixin already exists, so really just removing redundant implementation tests and removing tests which are covered by the generic test suite. Added a single type test for the separator argument as this didn't exist.


x = OneHotEncodingTransformer(columns="b")
class TestTransform(DropOriginalTransformMixinTests, GenericTransformTests):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this wants to inherit from test_BaseNominalTransformer.GenericBaseNominalTransformerTests (which in other parallel PRS #259 #261 I have renamed to GenericNominalTransformTests because the name is currently a bit confusing and not following the pattern set elsewhere)

Copy link
Collaborator Author

@Chip2916 Chip2916 Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey David - might need to discuss this further.

On the 'test_not_fitted_error_raised' which is inherited from 'GenericBaseNominalTransformerTests' the test fails for OHE as trying to call transform without fitting throws the following error for OHE 'AttributeError: 'OneHotEncodingTransformer' object has no attribute 'categories_''' - I came across this with some other tests but added in a fit method to resolve.

I think this is because in 'sklearn/preprocessing/_encoders.py' the '_BaseEncoder' class creates an empty list for self.categories during fit, which is called in transform. I think the Ordinal encoder would also fall over on this but don't think we've bought it up to speed in the new framework.

Also the 'test_exception_raised' passes due to OHE not actually requiring a mappings attribute. I know your off today so we can catch up next week!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Options here are to either fix the code to generate the same error as the test is expecting or to overwrite the test (i.e. create a test in the child test class with the same name that just contains pass, which will then supersede the test from the parent).

For test_not_fitted_error_raised I think we should add in a check that the transformer is fitted which should be caught before the call to sklearn.

For the other one, I suggest we overwrite the test. Note this test has been given a more descriptive name in #259 test_non_mappable_rows_exception_raised

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks David - have addressed these comments here.

In terms of the 'test_exception_raised' (which I know has been renamed in another PR so will need to be resolved when merging) I have overwritten the test as suggested.

After our conversation in relation to 'test_not_fitted_error_raised ' I have used the check_is_fitted wrapper we have created for the sklearn.utils check_is_fitted function to check the attribute '_categories' exists before the .transform method is called. '_categories' is created by the Sklearn base encoder so will only be there if our call to the OHE transformer is successfully fit.

As per our conversation, this has introduced a dependency on the sklearn OHE implementation. Think we are both agreed that this is something we would rather catch so that if OHE is changed, we get notification rather than a change of behaviour not highlighted. Added this as a note to the changelog also.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed 'test_exception_raised' here - didn't realise new naming convention was already in main!

@davidhopkinson26
Copy link
Contributor

reminder to update changelog :)

x = OneHotEncodingTransformer(columns="b")

mocker.patch("sklearn.preprocessing.OneHotEncoder.fit")
def test_separator_column_not_str_error(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for adding!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No probs!

@@ -319,6 +330,60 @@ def create_MeanResponseTransformer_test_df_unseen_levels():
return df


def create_OneHotEncoderTransformer_test_df_1():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S: If these are specific to the OneHotEncoderTransformer tests I think it might be better to create them in that test module.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey David - there are many functions in here which seem to be specific to only one test module e.g. 'create_datediff_test_df', 'create_date_test_df', 'create_NearestMeanResponseImputer_test_df'. Would you prefer that I just move the OHE specific functions here and then create a fast follow ticket to move the others? Or I could do them all as part of this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please do that. I think that fixtures should probably be co-located with tests in the scope they serve.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dealt with that in this commit. If a fixture is used in just a single test have moved it to that test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants