
Take care of the long tail of attributes in init #157

Merged · 1 commit merged into kiudee:master on Sep 15, 2020

Conversation

@timokau (Collaborator) commented Sep 7, 2020

Description

After removing some big sets (#154, #155), this PR is supposed to take care of the "long tail" of failures of the `check_no_attributes_set_in_init` sklearn estimator check.
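For reference, the failing check can be exercised in isolation roughly like this (a sketch: `check_no_attributes_set_in_init` is real sklearn API, but the assumption that `FATEChoiceFunction` constructs with default arguments is mine):

```python
# Run the single failing sklearn check against one csrank estimator.
from sklearn.utils.estimator_checks import check_no_attributes_set_in_init

from csrank.choicefunction.fate_choice import FATEChoiceFunction

# Assumes the estimator can be constructed with default arguments.
estimator = FATEChoiceFunction()
check_no_attributes_set_in_init("FATEChoiceFunction", estimator)
```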

This is a work in progress. Builds on #154, #155 which should be reviewed first.

Also see the commit message:

Most of these attributes are actually initialized during fit and the
values do not need to be set in init. To conform to the scikit-learn
estimator API, we should not set them in init and postfix values that
are learned from the data with an underscore.

This takes care of most of the remaining attributes in init. The
remaining attributes are due to construct_model being called in
__init__, which cannot currently be easily fixed. A workaround for
that will follow.
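For anyone unfamiliar with the convention the commit message describes, here is a minimal sketch with a made-up estimator (none of this is csrank code):

```python
from sklearn.base import BaseEstimator


class ExampleEstimator(BaseEstimator):
    def __init__(self, n_hidden=16):
        # __init__ stores hyperparameters verbatim and sets nothing else.
        self.n_hidden = n_hidden

    def fit(self, X, y):
        # Anything derived from the data is set during fit with a
        # trailing underscore, so the estimator checks can tell learned
        # state apart from hyperparameters.
        self.n_features_in_ = X.shape[1]
        self.weights_ = [0.0] * self.n_hidden
        return self
```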

Motivation and Context

Trying to meet the scikit-learn estimator requirements.

How Has This Been Tested?

Work in progress.

Does this close/impact existing issues?

Related to #94, #116.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.

@@ -170,6 +172,9 @@ def fit(
logger.info(
"Fitting utility function finished. Start tuning threshold."
)
self.threshold_ = self._tune_threshold(
@timokau (Collaborator, Author) commented:
I think it was a bug that this was previously missing. Without this line, the threshold would always be 0.5 as set in `__init__`. The bug was revealed when I removed this default value from `__init__`. @kiudee am I right here?
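To illustrate the failure mode (a rough sketch; only `_tune_threshold` is a real method name here, its argument list and everything else are made up):

```python
class Before:
    def __init__(self):
        self.threshold = 0.5  # hard-coded default

    def fit(self, X, y):
        # The utility function is fitted here, but _tune_threshold is
        # never called, so predictions always use the 0.5 default.
        return self


class After:
    def _tune_threshold(self, X, y):
        return 0.42  # stand-in for the real tuning logic

    def fit(self, X, y):
        # Learned from the data, hence the trailing underscore.
        self.threshold_ = self._tune_threshold(X, y)
        return self
```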

@timokau (Collaborator, Author) commented:

(All the other classes follow the pattern of using `_tune_threshold` here.)

@timokau (Collaborator, Author) commented:
This is probably easier to see after #155 is done and the diff is cleaned up a bit.

@kiudee (Owner) commented:
Yes, this seems to be a bug.

@timokau (Collaborator, Author) commented Sep 8, 2020

Rebased now that #155 is in master.

Review thread on csrank/core/fate_network.py (outdated, resolved).
@timokau (Collaborator, Author) commented Sep 10, 2020

This PR is ready for review. There are two unresolved questions that I have commented on inline. After this is done, there are just a handful of "no attributes set in init" failures remaining:

E       AssertionError: Estimator <class 'csrank.choicefunction.fate_choice.FATEChoiceFunction'> should not set any attribute apart from parameters during init. Found attributes ['joint_layers', 'kernel_regularizer_', 'optimizer_', 'scorer', 'set_layer'].
E       AssertionError: Estimator <class 'csrank.discretechoice.fate_discrete_choice.FATEDiscreteChoiceFunction'> should not set any attribute apart from parameters during init. Found attributes ['joint_layers', 'kernel_regularizer_', 'optimizer_', 'scorer', 'set_layer'].
E       AssertionError: Estimator <class 'csrank.objectranking.fate_object_ranker.FATEObjectRanker'> should not set any attribute apart from parameters during init. Found attributes ['joint_layers', 'kernel_regularizer_', 'optimizer_', 'scorer', 'set_layer'].

All of these have the same cause: the FATE core calls `_construct_layers` in `__init__`. It is not trivial to work around that right now, since the core's `_construct_layers` has to be called before `fit`, and there is no easy way to do that except copying the call into all the `fit` functions.

I plan to fix that by renaming all our `fit` functions to `_fit` and then creating a wrapper `fit` in `learner.py` that first calls something like `_fit_init` and then `_fit`. That way we would be able to inherit pre-fit stateful initialization; the wrapper should pass on the documentation. I will do that in a separate PR, since it's largely unrelated to the changes here and I think it is easier to review them separately. A sketch of the idea follows.
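Roughly (a hypothetical sketch; only the names `fit`, `_fit`, and `_fit_init` come from the plan above, everything else is assumed):

```python
class Learner:
    def _fit_init(self):
        """Stateful pre-fit setup (e.g. constructing layers) that used
        to live in __init__. Subclasses override as needed."""

    def _fit(self, X, y, **kwargs):
        raise NotImplementedError

    def fit(self, X, y, **kwargs):
        # Shared entry point: run pre-fit initialization, then delegate
        # to the subclass implementation.
        self._fit_init()
        return self._fit(X, y, **kwargs)
```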

@timokau changed the title from "[WIP] Take care of the long tail of attributes in init" to "Take care of the long tail of attributes in init" on Sep 15, 2020
@timokau (Collaborator, Author) commented Sep 15, 2020

Rebased to test the new CI. Still ready for review / feedback.

@kiudee (Owner) previously approved these changes on Sep 15, 2020:

Looks good to me 👍

@timokau (Collaborator, Author) commented Sep 15, 2020

The two threads are now resolved. I only changed the way the `is_variadic` parameter (which is no longer a parameter) is handled. Please have another look; I think this is good to go now.

@timokau merged commit bf138c5 into kiudee:master on Sep 15, 2020
@timokau deleted the misc-init-attributes branch on Sep 15, 2020 at 17:16