This repository has been archived by the owner on Aug 12, 2020. It is now read-only.

Default and optional parameters #2

Open
rth opened this issue May 8, 2019 · 11 comments

@rth

rth commented May 8, 2019

Related to the goal of defining a set of common ML traits as discussed in #1, another question is how to define default and optional parameters in ML models.

For instance, let's take as an example the LogisticRegression estimator that takes as input following parameters (with their default values),

  • regularization penalty: penalty="l2"
  • Tolerance for stopping criteria: tol=0.0001
  • inverse of the regularization strenght: C=1.0
  • etc.

There are then multiple choices for how to initialize the model with some of these parameters:

1. Passing parameters explicitly + default method

// initialize the model with default parameters
let model = LogisticRegression::default();
// otherwise initialize all parameters explicitly
let model = LogisticRegression::new("l2", 0.001, 0.1);

From a quick look, this pattern (or something similar) appears to be used in the rusty-machine crate (e.g. here). For models with 10+ parameters, passing all of them explicitly hardly seems practical.

2. Builder pattern

One could also use the builder pattern and either construct a LogisticRegressionBuilder or maybe even use it directly with LogisticRegression,

use some_crate::linear_model::logistic_regression::Hyperparameters;

// Other parameters are left at their default
let params = Hyperparameters::new()
                .penalty("l1")
                .tol(0.01);
// build a model with these parameters
let model = params.build();

This is used by rustlearn, as far as I can tell (e.g. here).

The advantage is that the hyperparameters are already in a struct, which helps if you want to pass them between estimators or serialize them. The disadvantage is that it requires creating a builder for each model. Also, I find that having multiple objects called Hyperparameters in the code base is somewhat confusing (and it will definitely be an issue when searching the code for something).
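For instance, the hyperparameter struct can be made serializable almost for free (a quick sketch, assuming serde with the derive feature and serde_json; untested):

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct Hyperparameters {
    penalty: String,
    tol: f64,
}

// persist the hyperparameters, e.g. as JSON
let hp = Hyperparameters { penalty: "l1".to_string(), tol: 0.01 };
let json = serde_json::to_string(&hp).unwrap();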

3. Using struct update syntax

Possibly something around the struct update syntax, though I have not explored this topic too much.

struct LogisticRegression {
    penalty: String,
    tol: f64,
    c: f64, // `C` above, lowercased to keep rustc's naming lint happy
}

impl LogisticRegression {
    fn default() -> Self {
        LogisticRegression {
            penalty: "l2".to_string(),
            tol: 0.001,
            c: 1.0,
        }
    }
}

// update one parameter, keep the others at their defaults
let model = LogisticRegression { tol: 0.1, ..LogisticRegression::default() };

(Note: I have not tried to compile this to check that it actually works.)

4. Other approaches

This topic was discussed at length in rust-lang/rfcs#323 and related RFCs, but I'm not sure what has been accepted as of now, or what could be used in stable Rust now (or in the near future).

Comments would be very welcome. Please let me know if I missed something.

@ehsanmok

ehsanmok commented May 8, 2019

The builder pattern is better suited in my opinion, and it is more widely adopted in Rust than in most other languages. For straightforward builders like plain hyperparams, one can use the handy derive_builder crate alongside Default.
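For instance, a rough sketch (untested; the struct name here is made up, and the exact API generated by derive_builder may vary between versions):

use derive_builder::Builder;

#[derive(Builder, Debug)]
#[builder(default)] // any unset field falls back to the struct's Default
struct LogisticRegressionParams {
    penalty: String,
    tol: f64,
    c: f64, // `C` in the examples above, lowercased per Rust naming conventions
}

impl Default for LogisticRegressionParams {
    fn default() -> Self {
        LogisticRegressionParams {
            penalty: "l2".to_string(),
            tol: 1e-4,
            c: 1.0,
        }
    }
}

// only override what differs from the defaults
let params = LogisticRegressionParamsBuilder::default()
    .penalty("l1".to_string())
    .tol(0.01)
    .build()
    .unwrap();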

@kazimuth

kazimuth commented May 8, 2019

In some cases macros can work for stuff like this, but honestly, the more I use macros the more I think they should be avoided except for very special cases. derive_builder is good enough for most use cases.

@rth

rth commented May 8, 2019

Thanks for the feedback! For the above example, would you say that the builder struct should be named LogisticRegressionBuilder, or something similar to Hyperparameters as in rustlearn?

The former case could lead to somewhat long names, e.g. LatentDirichletAllocationBuilder, but I suppose it's better to be explicit.

@ehsanmok

ehsanmok commented May 8, 2019

Perhaps a better way is to make a distinction between the hyperparams/config and the actual logistic regression, via LogisticRegressionConfig and LogisticRegression. Something like

#[derive(Builder)]
struct LogisticRegressionConfig { ... }

struct LogisticRegression {
    weights: NDArray,
    bias: Option<NDArray>,
    config: LogisticRegressionConfig
}

@rth

rth commented May 8, 2019

@ehsanmok How would you instantiate the LogisticRegression struct in the above case then,

let config = LogisticRegressionConfigBuilder::default()
               .some_option(value)
               .build();

let model = LogisticRegression::new(config);

?

That would still have an issue with overly verbose config builder names, wouldn't it?

@ehsanmok

ehsanmok commented May 8, 2019

Oh, yes, yes! Let me rephrase. These are the options:

  1. All in one
struct LogisticRegression {
    ws: NDArray, 
    penalty: String,
    ....
}
  2. Separation of hyperparams/config (for easier builder) and the model itself
#[derive(Builder)]
struct LogisticRegressionConfig { ... }

struct LogisticRegression {
    ws: NDArray,
    config: LogisticRegressionConfig,
}

impl LogisticRegression {
    fn new() -> Self { ... } // or init method with specific initializer type!
    fn with_hyperparams(self, hp: LogisticRegressionConfig) -> Self { ... }
}
  3. Bridge and refactor with trait
#[derive(Builder)]
struct LogisticRegressionConfig { ... }

struct LogisticRegression {
    ws: NDArray,
    config: LogisticRegressionConfig
}

trait HyperParams {
    type Config;
    fn with_hyperparams(config: Self::Config) -> Self;
    // or
    fn with_hyperparams(self, config: Self::Config) -> Self;
}

impl HyperParams for LogisticRegression { 
    type Config = LogisticRegressionConfig;
    .... 
}
  4. Maybe overkill: specify the hyperparams in a proc-macro-like attribute!
#[derive(HyperParams(type=LinearModel, params=...))]  // needs succinct definition though!
struct LogisticRegression {
    ws: NDArray,
}

Among all these, I prefer the 3rd option, which provides more flexibility. Ultimately, we wish to have a clean, scikit-learn-like client setup:

let hp = LogisticRegressionConfig::default(). ... .build();
let model = LogisticRegression::new().with_hyperparams(hp);
// or cleaner with `LogisticRegression::with_hyperparams(hp)`
model.train(X_train, y_train)?;
let y_hat = model.predict(X_test)?;

@jblondin

jblondin commented May 9, 2019

This is stepping away from the exact builder pattern a little bit, but I like the idea of a config-taking constructor for a given estimator, pretty similar to the with_hyperparams method:

trait FromConfig<C> {
    fn from_config(config: C) -> Self;
}

struct LogisticRegression {
    config: LogisticRegressionConfig,
    ...
}

impl FromConfig<LogisticRegressionConfig> for LogisticRegression {
    fn from_config(config: LogisticRegressionConfig) -> Self {
        LogisticRegression { config, ... }
    }
}

You could add an easy default LogisticRegression constructor:

impl Default for LogisticRegressionConfig { ... }

// This could even be derived -- derive(DefaultFromConfig(config=LogisticRegressionConfig))
impl Default for LogisticRegression {
    fn default() -> Self {
        Self::from_config(LogisticRegressionConfig::default())
    }
}

Also, if the config object is just a holding area for parameters, I don't think we necessarily need an explicit 'build' method; but then we're getting close to just having a plain struct with struct update syntax, which... I think I'm warming up to?
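E.g. something like this (untested sketch, reusing the hypothetical config from above):

// override a single parameter, keep the rest at their defaults
let model = LogisticRegression::from_config(LogisticRegressionConfig {
    tol: 0.01,
    ..LogisticRegressionConfig::default()
});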

Just brainstorming, really.

One other note, a bit off-topic, but I would really prefer that any config specification of a regularizer (or a common loss function, error function, whatever), like "l2", be done using an enum as opposed to a string-based option. It saves us from doing string parsing, and from returning a runtime error when someone typos their desired penalty while building the config struct.
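E.g. a quick sketch:

enum Penalty {
    L1,
    L2,
    None,
}

struct LogisticRegressionConfig {
    penalty: Penalty, // a typo here is a compile-time error, not a runtime one
    ...
}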

@LukeMathWalker

I have two issues with struct update syntax:

  • it's not "obvious", as in something you would do if you were given an API that allows it. Most people would probably go ahead and specify all parameters. This can probably be mitigated by good docs;
  • the builder pattern, with a clear separation between model and configs (as in @ehsanmok's third example or @jblondin's last comment), gives you flexibility: you either expose all parameters as settable using a derive macro, or you explicitly code in relationships between different parameters, preventing people from specifying nonsensical hyperparams (e.g. don't specify a regularizer coefficient if you have no regularizer!).

On the other hand, one could argue that the encoding of constraints can be done by grouping related params together in a struct, e.g.

trait FromConfig<C> {
    fn from_config(config: C) -> Self;
}

struct LogisticRegression {
    config: LogisticRegressionConfig,
    ...
}

struct LogisticRegressionConfig {
    regularizer_config: RegularizerConfig,
    optimizer_config: OptimizerConfig,
    ...
}

enum RegularizerConfig {
    L2(f64),
    L1(f64),
    ...
}

struct OptimizerConfig {
   ...
}

impl FromConfig<LogisticRegressionConfig> for LogisticRegression {
    fn from_config(config: LogisticRegressionConfig) -> Self {
        LogisticRegression { config, ... }
    }
}

Being even more aggressive though, having configs for what are actually different entities (regularizer, loss function, ...) in the same struct probably means that we are conflating too much behaviour in our LogisticRegression, and we could extract it, e.g. by having a Loss trait and an Optimizer trait, like Keras does.
Having more than 3 or 4 optional parameters is probably a smell warning us that we are putting together a bunch of things that could be separated and fleshed out independently.
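Sketched very roughly (all names hypothetical, reusing the NDArray placeholder from above):

trait Loss {
    // mean loss over a batch of predictions
    fn loss(&self, y_true: &NDArray, y_pred: &NDArray) -> f64;
}

trait Optimizer {
    // update the weights in place, given the gradient
    fn step(&self, weights: &mut NDArray, gradient: &NDArray);
}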
On my side as well, this is just food for thought :)

@ehsanmok

ehsanmok commented May 9, 2019

Being even more aggressive though, having configs for what are actually different entities (regularizer, loss function, ...) in the same struct probably means that we are conflating too much behaviour in our LogisticRegression, and we could extract it, e.g. by having a Loss trait and an Optimizer trait, like Keras does.

Yes, agreed! I have actually thought about this: having traits like Loss, Optimizer, and Initializer is perhaps the best way of providing separation of concerns and making the hyperparams/configs disjoint. After that, maybe have a compile-like method (as in Keras), which is essentially a builder that assembles the learnable model at the end. These are in fact common in many DL frameworks, as opposed to scikit-learn, where everything goes into the constructor.
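Something in the spirit of (every name here is hypothetical):

let model = LogisticRegression::new()
    .with_loss(CrossEntropy::default())
    .with_optimizer(Sgd::new(0.01))
    .compile()?;
model.train(x_train, y_train)?;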

Overall, I like this discussion and it makes me think more about the design 🙂

@uberblah

Perhaps what Rust is missing to make the builder pattern a little smoother is something for generating builders, similar to Lombok in Java.

@nestordemeure

In some cases macros can work for stuff like this

The duang crate might provide a viable macro solution to this problem.
