Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API design with penalty and datafit jitted inside fit #44

Merged
merged 41 commits into from
Jul 28, 2022

Conversation

mathurinm
Copy link
Collaborator

@mathurinm mathurinm commented Jul 22, 2022

This attempts to simplify the API of datafits by jit complying classes when fitting models.

Currently, we have a distinct jit class for each type (e.g. Quadratic, Quadratic_32)

@mathurinm mathurinm marked this pull request as draft July 22, 2022 14:06
test_design.py Outdated Show resolved Hide resolved
debug_script.py Outdated Show resolved Hide resolved
debug_script.py Outdated Show resolved Hide resolved
@PABannier
Copy link
Collaborator

I pushed a streamlined version of our_jit_compiler that uses built-in LRU cache.
The performance remains the same:
Screen Shot 2022-07-23 at 11 00 45 AM

Minor change to make it work: change all specs to tuples instead of lists.

@mathurinm
Copy link
Collaborator Author

@PABannier thanks!

I have merged my design proposal into the existing code. It seems to be working OK.

For now I did it only for L1 and Quadratic. It should be straightforward to extend to other datafits and penalties.
It removes then need for the *_32 classes, and the need to hardcode them into generalized estimator.

Something needs to be fixed to handle path correctly (there is duplicated code currently that I could not fix easily)

WDYT of the design @QB3 @Klopfe ?

@mathurinm mathurinm changed the title WIP test design with penalty and datafit jitted inside fit API design with penalty and datafit jitted inside fit Jul 25, 2022
@PABannier
Copy link
Collaborator

Looks very clean to me. The kind of PR that removes a lot of hurdles at once.
You mentioned an issue with path @mathurinm , what is it?

@mathurinm
Copy link
Collaborator Author

I was surprised at first that fit does not call self.path (there is duplicated code in these functions). But path is designed for coherence with the Lasso (I think) and does not use penalty nor datafit.

I think in the end we can/should just call self.path in fit without defining the jitted penalty and datafit there, just define them in the path. That should be OK. @Badr-MOUFAD if you read this :)

Copy link
Collaborator

@Badr-MOUFAD Badr-MOUFAD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked the design. Big appreciation @mathurinm!

Before we proceed forward, we need to enhance spec_to_float32 to handle non-float attributes (e.g. spec of QuadraticGroup ). Indeed, in this case, it returns a wrong spec (cf. debug_script).

skglm/datafits/base.py Outdated Show resolved Hide resolved
skglm/estimators.py Outdated Show resolved Hide resolved
skglm/penalties/separable.py Show resolved Hide resolved
skglm/estimators.py Outdated Show resolved Hide resolved
skglm/estimators.py Outdated Show resolved Hide resolved
skglm/datafits/single_task.py Outdated Show resolved Hide resolved
Datafit : datafit class, inheriting from BaseDatafit
A datafit class, to be compiled.
klass : class
Un instantiated Datafit or Penalty.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what is meant here, can you explain?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to indicate that Datafit, resp Penalty, is passed and not Datafit(), resp Penalty().

skglm/datafits/base.py Outdated Show resolved Hide resolved
skglm/datafits/base.py Outdated Show resolved Hide resolved
skglm/datafits/__init__.py Outdated Show resolved Hide resolved
skglm/tests/test_estimators.py Outdated Show resolved Hide resolved
skglm/tests/test_estimators.py Outdated Show resolved Hide resolved
test_design.py Outdated Show resolved Hide resolved
skglm/estimators.py Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@mathurinm mathurinm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last thing to do is to add get_spec and params_to_dict to BaseDatafit and BasePenalty, to add docstring explaining what these methods do. then LGTM!

skglm/utils.py Show resolved Hide resolved
skglm/utils.py Show resolved Hide resolved
@mathurinm mathurinm marked this pull request as ready for review July 28, 2022 05:58
@mathurinm mathurinm merged commit f7a1983 into scikit-learn-contrib:main Jul 28, 2022
@mathurinm mathurinm deleted the design_check branch July 28, 2022 06:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants