This repository has been archived by the owner on Aug 12, 2020. It is now read-only.

Design exploration #2

Open · wants to merge 35 commits into master

Conversation

LukeMathWalker
Contributor

I have started to play around with some traits to explore how we could structure the different concepts in a ML workflow.

For now I have kept it very simple:

  • a Model trait (should it be renamed to Transformer?);
  • a Blueprint trait (serving as initializer for Model types, it holds the model configuration);
  • an Optimizer trait (encoding the training step).

For the same Model type we could potentially have multiple Blueprints, each one providing a different parametrization of the space of possible models, as well as multiple Optimizers.

Model, as defined here, could potentially be used to represent any kind of transformation (e.g. preprocessing steps).
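
As a rough illustration of how these three concepts might relate (names follow the description above, but the signatures are assumptions rather than the actual code in this PR):

use std::error;

/// A (possibly trained) transformation from inputs to outputs.
pub trait Model {
    type Input;
    type Output;

    fn predict(&self, input: &Self::Input) -> Self::Output;
}

/// Holds the configuration needed to build a `Model`;
/// initialization may be data-dependent.
pub trait Blueprint<M: Model> {
    type Error: error::Error;

    fn initialize(&self, inputs: &M::Input, targets: &M::Output) -> Result<M, Self::Error>;
}

/// Encodes the training step; it never has to initialize the model itself.
pub trait Optimizer<M: Model> {
    type Error: error::Error;

    /// Consume a model and return a (hopefully improved) one.
    fn train(&self, inputs: &M::Input, targets: &M::Output, model: M) -> Result<M, Self::Error>;
}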

I am now trying to come up with something to encode the concept of a pipeline or network of transformations, but I have not nailed it down yet.

@LukeMathWalker
Contributor Author

I have added another trait, called BlueprintGenerator, to mark the possible parametrizations of a Blueprint - mostly with the final aim of performing some kind of hyperparameter optimization routine (grid search, random search, bayesian fancy search, etc.).

@jblondin (Collaborator) left a comment

Looking good!

I think I need to see a few example workflows using this trait set to really consider the ergonomics. If I have some time over the next couple days, I'll take a crack at adding some example code to this pull request (most likely with no-op estimators).

src/lib.rs Outdated
/// In the same way, it has no notion of loss or "correct" predictions.
/// Those concepts are embedded elsewhere.
pub trait Model {
type Input;
Collaborator

I'm trying to think whether Input and Output should be associated types or struct generics (Model<Input, Output>). It's definitely possible that a trained model could be implemented to provide predictions over multiple types of input / output. For instance, we could have a model defined over ndarray input, or dataframe input, or even a Vec<T>.

I could also see a case for Model<Input> with Output being an associated type -- given a particular input, the output could only be a specific type.
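
For concreteness, the two options might look roughly like this (illustrative names and signatures only; ModelOver is just a placeholder to distinguish the variants):

// Option A: fully generic -- the same model type could implement the trait
// for several input/output combinations (ndarray, dataframe, Vec<T>, ...).
pub trait Model<Input, Output> {
    fn predict(&self, input: &Input) -> Output;
}

// Option B: generic over the input only, with the output fixed per input type.
pub trait ModelOver<Input> {
    type Output;

    fn predict(&self, input: &Input) -> Self::Output;
}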

src/lib.rs Outdated
/// This means that there is no difference between one-shot training and incremental training.
/// Furthermore, the optimizer doesn't have to "own" the model or know anything about its hyperparameters,
/// because it never has to initialize it.
pub trait Optimizer<M>
Collaborator

Wording: Optimizer or something like Estimator? Optimizer might be confusing given that some algorithms are actually optimization algorithms, but others aren't.

src/lib.rs Outdated
/// Each of these strategies can take different (hyper)parameters, even though they return an
/// instance of the same model type in the end.
///
/// The initialization procedure could be data-dependent, hence the signature of `initialize`.
Collaborator

I'm a bit concerned about potential user confusion about what should be put in the Blueprint's initialize method vs Optimizer's train method, given the similarities in method signatures (they both take input and targets, they both return models).

What would be an example of a workflow with a data-dependent initialization? Are there any other options for handling that initialization?

@LukeMathWalker
Contributor Author

I have been trying to sketch out some usage and I ran into some of the issues you have identified.
So let me take a step back - what do I want to achieve?
My goals are:

  • use the type system to avoid incorrect or inconsistent usage of models. From a design perspective, I'd like to have models as state machines, e.g. they can only be in a finite set of states and there are clear transition paths between states. Nonsensical transitions are not allowed (e.g. making predictions with an untrained model);
  • for the same model class, it should be possible to specify configuration APIs at different levels of granularity/control, without having all the extensions in the main library (another crate could provide an alternative constructor for our SVMs with fewer hyperparams because some interesting/smart defaults have been found out to work very well);
  • for the same model class, it should be possible to use different training methodologies, without having to duplicate the model code itself. For example, you might want to do the first training pass of your recommendation engine on a lot of public data using a fairly standard linear-algebra method, while you might want to resort to gradient descent to fine-tune it on your much smaller proprietary dataset. In the same spirit, some training algorithms could be more or less high-level, thus allowing the re-use of this crate as the engine of a more off-the-shelf solution (see PyTorch vs Fast.ai or Keras vs Tensorflow);
  • for the same model class, we should be able to provide different training routines depending on the data coming in (Is it batched? Is it just a huge table?);
  • the space of primitive "concepts" should be small. For example, it should be possible to express pipelines and pipeline optimization using the same set of traits used to express single models and their optimization routines. They should be, in a certain sense, recursive.

Looking back at what I have written, and at your comments @jblondin, I can see how this draft fails to accommodate some of these requirements. I'll try to put down a revised sketch tonight, ideally with some example code using no-op estimators or very simple estimators (computing the mean).

@jblondin
Collaborator

Some brief thoughts on the goals:

  • use the type system to avoid incorrect or inconsistent usage of models. From a design perspective, I'd like to have models as state machines, e.g. they can only be in a finite set of states and there are clear transition paths between states. Nonsensical transitions are not allowed (e.g. making predictions with an untrained model);

Agreed. I also would prefer a mutation-free workflow (which you already have with the optimizer consuming the model and creating a new one). In other words, nothing like this:

// would NOT prefer this style
let mut model = Model::from(blueprint);
model.train(train_data, targets);
let predictions = model.predict(test_data);
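
By contrast, a mutation-free flow along those lines (hypothetical names, mirroring the snippet above) would read:

// Preferred: the optimizer consumes the model and returns a new, trained one.
let model = blueprint.initialize(&train_data, &targets)?;
let model = optimizer.train(&train_data, &targets, model)?;
let predictions = model.predict(&test_data);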
  • for the same model class, it should be possible to specify configuration APIs at different levels of granularity/control, without having all the extensions in the main library (another crate could provide an alternative constructor for our SVMs with fewer hyperparams because some interesting/smart defaults have been found out to work very well);

So, a trait-based Blueprint concept, like you have? A downstream crate could just create a new struct that implements Blueprint that can be used to create a new model?

  • for the same model class, it should be possible to use different training methodologies, without having to duplicate the model code itself. For example, you might want to do the first training pass of your recommendation engine on a lot of public data using a fairly standard linear-algebra method, while you might want to resort to gradient descent to fine-tune it on your much smaller proprietary dataset. In the same spirit, some training algorithms could be more or less high-level, thus allowing the re-use of this crate as the engine of a more off-the-shelf solution (see PyTorch vs Fast.ai or Keras vs Tensorflow);

This seems like it would be useful for transfer learning tasks -- taking a model trained with one algorithm / data set, and then updating it (or a subset of it) with another algorithm. The model could even support different components that are trained differently. In the deep learning / CNN use case, the convolutional layers are usually transferred, and the fully-connected neural network at the 'end' of the network is retrained for the new learning problem.

  • the space of primitive "concepts" should be small. For example, it should be possible to express pipelines and pipeline optimization using the same set of traits used to express single models and their optimization routines. They should be, in a certain sense, recursive.

Agreed. A pipeline could have a pipeline component. I would love to be able to just define an SVM pipeline, then use that as a component in a Bayesian optimization pipeline for model selection without needing new 'concepts'.

@LukeMathWalker
Contributor Author

LukeMathWalker commented May 15, 2019

Some brief thoughts on the goals:

  • use the type system to avoid incorrect or inconsistent usage of models. From a design perspective, I'd like to have models as state machines, e.g. they can only be in a finite set of states and there are clear transition paths between states. Nonsensical transitions are not allowed (e.g. making predictions with an untrained model);

Agreed. I also would prefer a mutation-free workflow (which you already have with the optimizer consuming the model and creating a new one). In other words, nothing like this:

// would NOT prefer this style
let mut model = Model::from(blueprint);
model.train(train_data, targets);
let predictions = model.predict(test_data);

Same feeling - Rust gives us move semantics, which we can use to have optimized routines (using mutation inside the method) while still providing a more side-effect-free API to consumers.
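
Concretely, a training routine can take the model by value, mutate it internally for performance, and hand ownership back; a minimal sketch with placeholder types:

// Mutation stays inside the method; the public API stays move-based.
fn train(&self, inputs: &Inputs, targets: &Targets, mut model: Model) -> Result<Model, Error> {
    model.update_weights(inputs, targets); // in-place update, invisible to the caller
    Ok(model)                              // the caller only ever sees the returned model
}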

  • for the same model class, it should be possible to use different training methodologies, without having to duplicate the model code itself. For example, you might want to do the first training pass of your recommendation engine on a lot of public data using a fairly standard linear-algebra method, while you might want to resort to gradient descent to fine-tune it on your much smaller proprietary dataset. In the same spirit, some training algorithms could be more or less high-level, thus allowing the re-use of this crate as the engine of a more off-the-shelf solution (see PyTorch vs Fast.ai or Keras vs Tensorflow);

This seems like it would be useful for transfer learning tasks -- taking a model trained with one algorithm / data set, and then updating it (or a subset of it) with another algorithm. The model could even support different components that are trained differently. In the deep learning / CNN use case, the convolutional layers are usually transferred, and the fully-connected neural network at the 'end' of the network is retrained for the new learning problem.

This was exactly one of my driving examples.

I have done another iteration; unfortunately I didn't manage to find the time to provide a code example, but I'd still appreciate your feedback @jblondin.
What I have changed (a rough sketch of the resulting traits follows the list):

  • I have changed Model to Transformer, given that they could either be proper estimators or preprocessing steps;
  • I have changed Optimizer to Fit, as you suggested that optimization could be seen as a too narrow concept;
  • I have introduced IncrementalFit, to distinguish between an additional training round on the same transformer and the initial training round from a Blueprint;
  • Transformer is now an associated type of Blueprint: I couldn't find any meaningful example where I would reuse the same configuration for different model types. If we find one, it's not difficult to change it into a generic parameter again;
  • Transformer is now generic over both input and output. The reason for being generic over input is quite clear; when it comes to output, my main concern was predicting classes/single-value vs returning a probability distribution. We could potentially support both in a single trait with a generic output type.
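
A rough sketch of how the revised pieces might fit together (illustrative signatures only, not the exact code in src/lib.rs):

use std::error;

/// Any transformation from an input type to an output type
/// (an estimator or a preprocessing step).
pub trait Transformer<I, O> {
    fn transform(&self, inputs: &I) -> O;
}

/// Configuration from which a specific transformer type can be built.
pub trait Blueprint<I, O> {
    type Transformer: Transformer<I, O>;
}

/// First training round: from a blueprint to a fitted transformer.
pub trait Fit<B, I, O>
where
    B: Blueprint<I, O>,
{
    type Error: error::Error;

    fn fit(&self, inputs: &I, targets: &O, blueprint: B) -> Result<B::Transformer, Self::Error>;
}

/// Additional training rounds on an already-fitted transformer.
pub trait IncrementalFit<T, I, O>
where
    T: Transformer<I, O>,
{
    type Error: error::Error;

    fn incremental_fit(&self, inputs: &I, targets: &O, transformer: T) -> Result<T, Self::Error>;
}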

@LukeMathWalker
Contributor Author

I have added a first, very simple example: standard scaler, supporting one-off and incremental computation of both mean and standard deviation.
Let me know how it feels @jblondin.

The main issue I experienced is around optimizers: I had to modify both Fit and IncrementalFit to take self as a mutable reference; otherwise there was no way for me to record the number of samples that had been seen so far. How should we work around this? Returning a tuple, Result<(T: Transformer, F: Fit), Self::Error>?
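
For reference, the tuple-returning alternative mentioned above might look like this (a hypothetical signature, assuming the Blueprint<I, O> trait discussed earlier):

pub trait Fit<B, I, O>
where
    B: Blueprint<I, O>,
{
    type Error: std::error::Error;

    /// Consumes the optimizer state and returns it alongside the fitted
    /// transformer, instead of taking `&mut self`.
    fn fit(self, inputs: &I, targets: &O, blueprint: B)
        -> Result<(B::Transformer, Self), Self::Error>
    where
        Self: Sized;
}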

@jblondin
Collaborator

Sorry for the delay in getting to this! I've been a bit backed up the past week or so. I'm going to have to give this some thought, but here's a few quick comments...

The main issue I experienced is around optimizers: I had to modify both Fit and IncrementalFit to take self as a mutable reference; otherwise there was no way for me to record the number of samples that had been seen so far. How should we work around this? Returning a tuple, Result<(T: Transformer, F: Fit), Self::Error>?

My initial reaction is that the number of samples should actually be an update to the config (Blueprint) instead of the Fit object itself. But this would require adding a Blueprint parameter to the IncrementalFit method. I feel like this should be the case, though, since you're basically using the Transformer object to carry through the ddof config setting to IncrementalFit -- the ddof isn't actually used in the prediction (transformation) step at all.

Of course, even if you do pass a configuration to incremental_fit, this configuration would need to either be mutated or returned by the initial fit, which gets us back to your initial issue, which I don't have a good solution for.

I feel like this demonstrates that this Fit -> IncrementalFit methodology may be a bit unwieldy. I don't have concrete suggestions to resolve this at the moment; I've started playing with some of my own examples on this framework locally to see if I can come up with something. I'll hopefully have some more thoughts in the next couple days as I work with it!

@jblondin
Collaborator

jblondin commented May 29, 2019

One more thought - I like using the generic name Transformer for something that transforms an input set to an output set, but it's less clear what it does when functioning as a predictive model. Some of this can be resolved with good documentation.

We could also have Model be a separate trait with Transformer as a supertrait -- I'm secretly hoping we find some functionality that Model has that Transformer doesn't need that would necessitate this separation of traits 😆

@jblondin
Collaborator

I think I prefer the original workflow, without the separate IncrementalFit. I'm not sure what the advantage is of moving the model initialization code outside the Blueprint -- the blueprint seems like a good place to represent the initial model state.

pub trait Blueprint<I, O> {
    type Transformer: Transformer<I, O>;
    fn initialize(&self) -> Self::Transformer;
}

pub trait Fit<T, I, O> 
where
    T: Transformer<I, O>
{
    type Error: error::Error;
    fn fit(&self, inputs: &I, targets: &O, transformer: T) -> Result<T, Self::Error>;
}

with an example workflow

let blueprint = SomeConfig::new();
let model = my_algorithm.fit(&train, &targets, blueprint.initialize())?;
let preds = model.transform(&test)?;
// generate new batch of input
let model = my_algorithm.fit(&new_train, &new_targets, model)?;
let better_preds = model.transform(&test)?;

@jblondin
Collaborator

Sorry, should've added a couple more thoughts to my last comment.

This would require a bit more 'weight' to the Transformer object -- it would have to carry any relevant configuration information that would be needed by the optimizer. In the case of your StandardScaler, it would need to include the ddof (which you already have in there) as well as the n_samples seen so far.

On the plus side, this avoids any modification to the Fit object or the Blueprint object (which, in hindsight, seems like a bad idea on my part) and keeps the method signature cleaner (no returning of tuples).

I'm sure there are workflow quirks we're not considering at this point -- I feel like we're close to the point where we should prototype something and start iterating as we implement different models, algorithms, and data science workflows.
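
For the standard scaler, that slightly heavier transformer might carry something like this (a sketch; field names are assumptions):

// Hypothetical shape: the transformer carries the state the optimizer needs
// to resume fitting, not only what `transform` itself needs.
pub struct StandardScaler {
    mean: f64,
    standard_deviation: f64,
    ddof: u8,       // used only while fitting, not when transforming
    n_samples: u64, // number of samples seen so far
}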

let (x, y) = generate_batch(n_samples);

let mut optimizer = OnlineOptimizer::default();
let standard_scaler = optimizer.fit(&x, &y, Config::default())?;

Passing the config at fit time might make it difficult to compose estimators. Say the estimator is a pipeline of estimators: we wouldn't want to pass all the config in a single fit call. Having two steps, a) building the pipeline and b) fitting it, is more natural IMO.

Contributor Author

The two things are not mutually exclusive I'd say. You could compose the configuration of all steps in the pipeline and then pass that in when you want to fit it; it shouldn't look very different.
But I have yet to actually prototype it, so take it with a grain of salt.

check(&standard_scaler, &x)?;

let (x2, y2) = generate_batch(n_samples);
let standard_scaler = optimizer.incremental_fit(&x2, &y2, standard_scaler)?;

I find this conceptually difficult to follow. If anything I would have expected,

standard_scaler.incremental_fit(&x2, &y2, &optimizer)

not the other way around.

Contributor Author

Yeah, that's a good point. It should be easy enough to flip it.

&mut self,
inputs: &Input<S>,
_targets: &Output,
blueprint: Config,

why not call this config or params? Is there any other ML library that uses the "blueprint" vocabulary?

params as in scikit-learn would be more accurate IMO -- we are not providing model configuration but model parameters.

Contributor Author

Params is kind of an overloaded term: in this case, I'd say that we are passing hyperparameters (e.g. number of convolutional layers in a CNN), not parameters (e.g. the network weights).
I think it's quite natural to call the set of model hyperparameters model configuration.
We can safely discard the blueprint terminology, but I'd try to stick to terms that are not ambiguous.

@Ten0 commented Dec 26, 2019

Why do you prefer "model configuration" to "hyperparameter" for that purpose?
(Is it the sole fact that Config is smaller?)

#[macro_use]
extern crate derive_more;

use crate::standard_scaler::{Config, OnlineOptimizer, ScalingError, StandardScaler};

So if we have several models, this means we would need to use the full paths, e.g.

use crate::standard_scaler;
use crate::linear_model::logistic_regression;

let mut standard_scaler_optimizer = standard_scaler::OnlineOptimizer::default();
let standard_scaler = standard_scaler_optimizer.fit(&x, &y, standard_scaler::Config::default())?;
let (x_tr, y_tr) = standard_scaler.transform(&x, &y);

let mut logregr_optimizer = logistic_regression::OnlineOptimizer::default();
let log_regr = logregr_optimizer.fit(&x, &y, logistic_regression::Config::default())?;

which might become somewhat difficult to manage?

Also, purely from the user experience and readability standpoint (I understand this has other advantages), I find the builder pattern in rustlearn somewhat simpler because one doesn't have to deal with the optimizer.

@LukeMathWalker
Contributor Author

I think I prefer the original workflow, without the separate IncrementalFit. I'm not sure what the advantage is of moving the model initialization code outside the Blueprint -- the blueprint seems like a good place to represent the initial model state.

The main issue is correctness: after you have called blueprint.initialize(), can that Transformer be used to do predictions? If you don't go and actually check that a fit call has happened, you risk proceeding with something that returns nonsensical predictions.
Ideally, the type system should tell you that something has never been fit, so it shouldn't be used to predict.
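
As a toy illustration of that point (all names hypothetical), separate types make the nonsensical transition a compile-time error rather than a runtime check:

// An untrained model simply has no way to predict; only training produces
// a type that exposes `transform`.
struct UntrainedScaler;

struct TrainedScaler {
    mean: f64,
    standard_deviation: f64,
}

impl UntrainedScaler {
    /// Consume the untrained state and return a fitted one.
    fn fit(self, data: &[f64]) -> TrainedScaler {
        let n = data.len() as f64;
        let mean = data.iter().sum::<f64>() / n;
        let variance = data.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
        TrainedScaler {
            mean,
            standard_deviation: variance.sqrt(),
        }
    }
}

impl TrainedScaler {
    fn transform(&self, x: f64) -> f64 {
        (x - self.mean) / self.standard_deviation
    }
}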

I'm sure there are workflow quirks we're not considering at this point -- I feel like we're close to the point where we should prototype something and start iterating as we implement different models, algorithms, and data science workflows.

I do agree 100%. I have been a little bit busy lately with a bunch of side projects, but now I should be able to get focused on it again. Should we put together a list of models to start with, ideally one giving us a sufficiently diverse range of quirks to validate the design, @jblondin?

@jblondin
Collaborator

The main issue is correctness: after you have called blueprint.initialize(), can that Transformer be used to do predictions? If you don't go and actually check that a fit call has happened, you risk proceeding with something that returns nonsensical predictions.

If initialize returns the base non-optimized model, that model could indeed be used to do predictions (although likely poor ones since you haven't done any training yet). Those predictions may even be useful in tracking the effectiveness of your optimizer by giving a baseline for your prior.

Conceptually, I don't think the model and optimizer should be so interwoven that the optimizer is absolutely required to help build the initial model (which would demand a fit call on the optimizer to produce a valid model). I would prefer to keep the concepts separate.

Can you give me an example of a violation of 'correctness' in this context? I feel like you could still effectively apply local reasoning in my example workflow.

Should we put together a list of models to start with, ideally one giving us a sufficiently diverse range of quirks to validate the design, @jblondin?

Sounds good. I'll start giving it some thought!

@LukeMathWalker
Contributor Author

I have found a couple of repos that should allow me to get a sizeable collection of algorithms up and running in a short amount of time:

They use just NumPy and vanilla Python, so it should be quite straightforward to port them to Rust using ndarray 👍
It should give us a sense of which abstractions will work across a sufficiently vast array of algorithms @jblondin @rth.

@LukeMathWalker
Contributor Author

I have started with rust-ndarray/ndarray-linalg#166 👀
