Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: do not require an empty struct in cross_validate to define … #231

Open
wants to merge 2 commits into
base: development
Choose a base branch
from

Conversation

morenol
Copy link
Collaborator

@morenol morenol commented Nov 4, 2022

…which is the estimator

Fixes #230

Checklist

  • My branch is up-to-date with development branch.
  • Everything works and tested on latest stable Rust.
  • Coverage and Linting have been applied

Current behaviour

New expected behaviour

Change logs

@morenol
Copy link
Collaborator Author

morenol commented Nov 4, 2022

Even though, it looks like more easy to use this passing the parameter, it looks weird to me to pass an unitialized empty struct, instead we can take advantage of that we can explicitly define which is the Estimator that we want to use using turbofish syntax

@morenol morenol marked this pull request as ready for review November 4, 2022 21:02
@morenol morenol requested a review from Mec-iS as a code owner November 4, 2022 21:02
@Mec-iS
Copy link
Collaborator

Mec-iS commented Nov 4, 2022

@morenol from a programming perspective I agree, but writing ::<_, _, _, _, _, LogisticRegression<_, _, _, _>, _, _> is really not viable for an interface that has to face an end-user. Think if you had to write that to issue a request using Python requests: requests.request::<_,_,_,_, POST,_,_,_>(...) ?!!? would you ever use something like that?

If you want to remove the new there are these options imo:

  • passing the single fit function using Fn or FnOnce but it is hard as every estimator has different signatures
  • passing the string name "LogisticRegression" and instantiate an object inside fit using an enumerator (as proposed for Kernel in Deserialization for Kernel #221) or an enumerator key like EstimatorEnum::LogisticRegression
  • refactoring cross_validate in a way that the user has to write only cross_validate::<LogisticRegression>(...)
  • anything else I don't know about

I used new because there are some other constraints in place and the alternatives are not straightforward. As we have as a main point to keep an easy "pythonic" API we cannot allow Rust's types to reach the end-user's level.

PS: I don't like how cross_validate and cross_validate_predict are implemented, so maybe we should rewrite them in a better way. This could also lead to a better definition of the SupervisedEstimator and Predictor traits. For an example see fit and predict.

@codecov-commenter
Copy link

Codecov Report

Merging #231 (e95798d) into development (aab3817) will increase coverage by 0.11%.
The diff coverage is n/a.

@@               Coverage Diff               @@
##           development     #231      +/-   ##
===============================================
+ Coverage        43.93%   44.05%   +0.11%     
===============================================
  Files               85       85              
  Lines             7290     7268      -22     
===============================================
- Hits              3203     3202       -1     
+ Misses            4087     4066      -21     
Impacted Files Coverage Δ
src/ensemble/random_forest_classifier.rs 53.54% <ø> (+0.34%) ⬆️
src/ensemble/random_forest_regressor.rs 57.37% <ø> (+0.46%) ⬆️
src/linear/elastic_net.rs 50.00% <ø> (+0.39%) ⬆️
src/linear/lasso.rs 50.00% <ø> (+0.45%) ⬆️
src/linear/linear_regression.rs 35.41% <ø> (+1.41%) ⬆️
src/linear/logistic_regression.rs 38.14% <ø> (+0.19%) ⬆️
src/linear/ridge_regression.rs 40.17% <ø> (+0.34%) ⬆️
src/model_selection/mod.rs 72.22% <ø> (ø)
src/naive_bayes/bernoulli.rs 41.56% <ø> (+0.84%) ⬆️
src/naive_bayes/categorical.rs 34.50% <ø> (-1.16%) ⬇️
... and 14 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@morenol
Copy link
Collaborator Author

morenol commented Nov 5, 2022

@morenol from a programming perspective I agree, but writing ::<_, _, _, _, _, LogisticRegression<_, _, _, _>, _, _> is really not viable for an interface that has to face an end-user. Think if you had to write that to issue a request using Python requests: requests.request::<_,_,_,_, POST,_,_,_>(...) ?!!? would you ever use something like that?

If you want to remove the new there are these options imo:

  • passing the single fit function using Fn or FnOnce but it is hard as every estimator has different signatures
  • passing the string name "LogisticRegression" and instantiate an object inside fit using an enumerator (as proposed for Kernel in Deserialization for Kernel #221) or an enumerator key like EstimatorEnum::LogisticRegression
  • refactoring cross_validate in a way that the user has to write only cross_validate::<LogisticRegression>(...)
  • anything else I don't know about

I used new because there are some other constraints in place and the alternatives are not straightforward. As we have as a main point to keep an easy "pythonic" API we cannot allow Rust's types to reach the end-user's level.

PS: I don't like how cross_validate and cross_validate_predict are implemented, so maybe we should rewrite them in a better way. This could also lead to a better definition of the SupervisedEstimator and Predictor traits. For an example see fit and predict.

I agree, I am not fan of having an API were something like this is required ::<_, _, _, _, _, LogisticRegression<_, _, _, _>, _, _>. I will keep this open until I find a better way

@Sammyreal1
Copy link

Even though, it looks like more easy to use this passing the parameter, it looks weird to me to pass an unitialized empty struct, instead we can take advantage of that we can explicitly define which is the Estimator that we want to use using turbofish syntax

@Mec-iS
Copy link
Collaborator

Mec-iS commented May 7, 2023

Even though, it looks like more easy to use this passing the parameter, it looks weird to me to pass an unitialized empty struct, instead we can take advantage of that we can explicitly define which is the Estimator that we want to use using turbofish syntax

see discussion at #230

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use Default::default instead of new in traits
4 participants