
Take care of the long tail of attributes in init #157

Merged · 1 commit merged into kiudee:master on Sep 15, 2020

Conversation

@timokau (Collaborator) commented Sep 7, 2020

Description

After removing some big sets (#154, #155), this PR is supposed to take care of the "long tail" of failures of the `check_no_attributes_set_in_init` sklearn estimator check.
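For reference, the failing check can be exercised in isolation roughly like this (a sketch: `check_no_attributes_set_in_init` is real sklearn API, but the assumption that `FATEChoiceFunction` constructs with default arguments is mine):

```python
# Run the single failing sklearn check against one csrank estimator.
from sklearn.utils.estimator_checks import check_no_attributes_set_in_init

from csrank.choicefunction.fate_choice import FATEChoiceFunction

# Assumes the estimator can be constructed with default arguments.
estimator = FATEChoiceFunction()
check_no_attributes_set_in_init("FATEChoiceFunction", estimator)
```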

This is a work in progress. Builds on #154, #155 which should be reviewed first.

Also see the commit message:

Most of these attributes are actually initialized during fit and the
values do not need to be set in init. To conform to the scikit-learn
estimator API, we should not set them in init and postfix values that
are learned from the data with an underscore.

This takes care of most of the remaining attributes in init. The
remaining attributes are due to construct_model being called in
__init__, which cannot currently be easily fixed. A workaround for
that will follow.
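For anyone unfamiliar with the convention the commit message describes, here is a minimal sketch with a made-up estimator (none of this is csrank code):

```python
from sklearn.base import BaseEstimator


class ExampleEstimator(BaseEstimator):
    def __init__(self, n_hidden=16):
        # __init__ stores hyperparameters verbatim and sets nothing else.
        self.n_hidden = n_hidden

    def fit(self, X, y):
        # Anything derived from the data is set during fit with a
        # trailing underscore, so the estimator checks can tell learned
        # state apart from hyperparameters.
        self.n_features_in_ = X.shape[1]
        self.weights_ = [0.0] * self.n_hidden
        return self
```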

Motivation and Context

Trying to meet the scikit-learn estimator requirements.

How Has This Been Tested?

Work in progress.

Does this close/impact existing issues?

Related to #94, #116.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.

@@ -170,6 +172,9 @@ def fit(
logger.info(
"Fitting utility function finished. Start tuning threshold."
)
self.threshold_ = self._tune_threshold(
@timokau (Collaborator, Author) commented:
I think it was a bug that this was previously missing. Without this line, the threshold would always be 0.5 as set in `__init__`. The bug was revealed when I removed this default value from `__init__`. @kiudee am I right here?
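To illustrate the failure mode (a rough sketch; only `_tune_threshold` is a real method name here, its argument list and everything else are made up):

```python
class Before:
    def __init__(self):
        self.threshold = 0.5  # hard-coded default

    def fit(self, X, y):
        # The utility function is fitted here, but _tune_threshold is
        # never called, so predictions always use the 0.5 default.
        return self


class After:
    def _tune_threshold(self, X, y):
        return 0.42  # stand-in for the real tuning logic

    def fit(self, X, y):
        # Learned from the data, hence the trailing underscore.
        self.threshold_ = self._tune_threshold(X, y)
        return self
```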

@timokau (Collaborator, Author) commented:

(All the other classes follow the pattern of using `_tune_threshold` here.)

@timokau (Collaborator, Author) commented:
This is probably easier to see after #155 is done and the diff is cleaned up a bit.

@kiudee (Owner) commented:
Yes, this seems to be a bug.

@timokau (Collaborator, Author) commented Sep 8, 2020

Rebased now that #155 is in master.

Review thread on csrank/core/fate_network.py (outdated, resolved).
@timokau (Collaborator, Author) commented Sep 10, 2020

This PR is ready for review. There are two unresolved questions that I have commented on inline. After this is done, there are just a handful of "no attributes set in init" failures remaining:

E       AssertionError: Estimator <class 'csrank.choicefunction.fate_choice.FATEChoiceFunction'> should not set any attribute apart from parameters during init. Found attributes ['joint_layers', 'kernel_regularizer_', 'optimizer_', 'scorer', 'set_layer'].
E       AssertionError: Estimator <class 'csrank.discretechoice.fate_discrete_choice.FATEDiscreteChoiceFunction'> should not set any attribute apart from parameters during init. Found attributes ['joint_layers', 'kernel_regularizer_', 'optimizer_', 'scorer', 'set_layer'].
E       AssertionError: Estimator <class 'csrank.objectranking.fate_object_ranker.FATEObjectRanker'> should not set any attribute apart from parameters during init. Found attributes ['joint_layers', 'kernel_regularizer_', 'optimizer_', 'scorer', 'set_layer'].

All of these have the same cause: the FATE core calls `_construct_layers` in `__init__`. It is not trivial to work around that right now, since the core's `_construct_layers` has to be called before `fit`, and there is no easy way to do that except copying the call into all the `fit` functions.

I plan to fix that by renaming all our `fit` functions to `_fit` and then creating a wrapper `fit` in `learner.py` that first calls something like `_fit_init` and then `_fit`. That way we would be able to inherit pre-fit stateful initialization; the wrapper should pass on the documentation. I will do that in a separate PR, since it's largely unrelated to the changes here and I think it is easier to review them separately. A sketch of the idea follows.
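Roughly (a hypothetical sketch; only the names `fit`, `_fit`, and `_fit_init` come from the plan above, everything else is assumed):

```python
class Learner:
    def _fit_init(self):
        """Stateful pre-fit setup (e.g. constructing layers) that used
        to live in __init__. Subclasses override as needed."""

    def _fit(self, X, y, **kwargs):
        raise NotImplementedError

    def fit(self, X, y, **kwargs):
        # Shared entry point: run pre-fit initialization, then delegate
        # to the subclass implementation.
        self._fit_init()
        return self._fit(X, y, **kwargs)
```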

@timokau changed the title from "[WIP] Take care of the long tail of attributes in init" to "Take care of the long tail of attributes in init" on Sep 15, 2020
@timokau (Collaborator, Author) commented Sep 15, 2020

Rebased to test the new CI. Still ready for review / feedback.

@kiudee (Owner) previously approved these changes on Sep 15, 2020:

Looks good to me 👍

@timokau (Collaborator, Author) commented Sep 15, 2020

The two threads are now resolved. I only changed the way the `is_variadic` parameter (which is no longer a parameter) is handled. Please have another look; I think this is good to go now.

@timokau merged commit bf138c5 into kiudee:master on Sep 15, 2020
@timokau deleted the misc-init-attributes branch on Sep 15, 2020 at 17:16