Rework to conversion and allowing many input types #49

JonathanWoollett-Light · 2022-02-16T19:26:29Z

In essence the traits:

pub trait Input: InternalConvert<nalgebra::base::DMatrix<f64>> + Clone
{
    type Output: InternalConvert<nalgebra::base::DVector<f64>> + Clone;
}
pub trait InternalConvert<T>
{
    fn i_into(self) -> T;
    fn i_from(x: T) -> Self;
}

are in my opinion a more explicit and extensible approach than:

pub trait Input: Sized
{
    type InVector: Sized;
    type OutVector;

    fn to_dmatrix(m: &Self) -> DMatrix<f64>;
    fn into_dmatrix(m: Self) -> DMatrix<f64>
    {
        Self::to_dmatrix(&m)
    }
    fn to_dvector(v: &Self::InVector) -> DVector<f64>;
    fn into_dvector(v: Self::InVector) -> DVector<f64>
    {
        Self::to_dvector(&v)
    }
    fn from_dvector(v: &DVector<f64>) -> Self::OutVector;
}

While there are no criterion benchmarks to guarantee no performance regression I do not believe these change would have any affect.

While clones may seem more common I believe they have simply been made more explicit (not hiding inside the respective `into()` method).

nestordemeure · 2022-02-16T20:01:57Z

The design is simpler and I like that. The thing I am not a fan of is that there is no conversion from reference in this version leading to clones in the code that might be avoided (if the data will be copied during the convertion process anyway).

The two potential fixes I see are:

adding a transformation from reference (fn i_to(&self) -> T; that is basically what the previous implementation did)
taking things that we need to clone as value so that the user can decide wether they need to clone or can just release ownership of them (which makes for a weird API on the user side, there is no reason for some of those things to be taken as value)

I would go with the first one.

nestordemeure · 2022-02-16T20:02:38Z

Also, adding tests to cover the various conversions would be good, that is a hole in the test coverage.

JonathanWoollett-Light · 2022-02-16T20:24:11Z

Considering:

there is no conversion from reference in this version leading to clones in the code

I would note that in the conversion via reference in the old approach it was just obfuscating a clone, I tend to be of the opinion it is best to bubble up things like clones so they are as explicit as possible.

nestordemeure · 2022-02-16T20:28:21Z

I would note that in the conversion via reference in the old approach it was just obfuscating a clone, I tend to be of the opinion it is best to bubble up things like clones so they are as explicit as possible.

The old approach used a clone as the default implementation but few of the actual implementations used a clone (from memory only the ones taking an nalgebra type as reference and returning an owned nalgebra type).

Most conversion functions used under the hood take a reference so a lot of clones are not needed.

JonathanWoollett-Light · 2022-02-16T20:34:38Z

Okay, I'll look into it and commit some changes to sort it.

JonathanWoollett-Light · 2022-02-17T20:39:07Z

I can't find where data is now being cloned that it was previously reinterpreted.

Notably all the cases of conversion functions I've found as you mention with:

Most conversion functions used under the hood take a reference so a lot of clones are not needed.

Are the same as clones, just less explicit. They not reinterpret castes or creating a DMatrix which individually borrows elements/row/columns.

nestordemeure · 2022-02-17T21:02:58Z

Your version does a clone of the initial Input type (in the GP code) followed by a clone when copying to the nalgebra type (in the Input trait). The previous implementation (in the reference case) only did the copying to the nalgebra type (in the Input trait).

I can give you some line numbers if my explanation is not clear.

JonathanWoollett-Light · 2022-02-17T21:15:44Z

Copying to nalgebra types is no different to cloning (copying is just implicit cloning after all), it will compile to the same. Except implementing a From<&T> may mislead the user in suggesting it is not in fact clone/copying the underlying data when it is.

But if you have any specific examples you disagree with if you reference some code that'd be good.

nestordemeure · 2022-02-17T21:43:15Z

Copying to nalgebra types is no different to cloning (copying is just implicit cloning after all)

That's my point, your version does a clone followed by the conversion (so two clones, here for example: inputs.clone().i_into()) whereas the conversion can usually be done from a reference letting us skip the first clone (we are indeed still paying for the second one, there for example).

My suggestion would be to have an fn i_clone_into(&self) -> T; operation in the trait. This way the fact that it will clone is explicit in the name but we can insure that there is only one clone.

JonathanWoollett-Light · 2022-02-17T22:14:58Z

It doesn't do 2 clone, for instance see this code:

use std::collections::HashSet;
fn main() {
    let a = vec![String::from("1"),String::from("2"),String::from("3")]
    // Without using `a`  and `b` further it will simply be moved here, not cloned
    let b = some_function(a.clone());
   // let b = some_function_2(&a);
    // If we use both `a` and `b`, we have effectively cloned `a` once.
   
}
fn some_function(x: Vec<String>) -> HashSet<String> {
    x.into_iter().collect::<HashSet<_>>()
}
// If we replace our `some_function` with something like, we have obfuscated the clone 
//  from the user and produced the exact same result.
fn some_function_2(x: &[String]) -> HashSet<String> {
    x.iter().cloned().collect::<HashSet<_>>()
}

Some of your previous implementations are like some_function_2.

There is a reason std::convert::From notes:

Used to do value-to-value conversions while consuming the input value.

This is effectively mimicking that (it may be helpful to look up move semantics).

Saying:

conversion can usually be done from a reference

Is entirely counter to this point and the design of conversion, if a clone is happening, which it is, the user should know about it (and further you don't need to be particularly concerned about things clone/copying twice, unless it is necessary it will not happen and will almost always be optimized out by the compiler in any possible case).

You should not pass a reference to a conversion in which values are cloned or copied as this gives the user the implicit impression a copy/clone is not occurring when it is. All the methods for converting to nalgebra structures involve cloning/copying all of the given data.

nestordemeure · 2022-02-17T23:12:43Z

I understand your point. There are two aspect to it, the implementation and the user interface:

Implementation

When you call nalgebra::base::DVector::from_iterator(self.len(), self.into_iter()) inside i_into (here) you take the data by value but it will be copied into a continuous buffer by nalgebra (here).
Thus, inputs.clone().i_into() is equivalent to two clones.

Now, if we have a function that takes the data by reference (fn i_clone_into(&self) -> T), gets its flat buffer as a slice (no clone needed) and uses nalgebra's from_row_slice function (or the equivalent column major function) then we do a single clone (the copy done inside nalgebra).

(Yes, the previous ndarray conversion functions didn't do that and did some useless cloning along the .iter().cloned() lines. I should have pointed out that one could remove the cloning from those functions when accepting the relevant PR but it seemed okay at the time.)

User interface

For all versions (the main branch as well as your PR), the conversion functions are used inside friedrich's code. On the API side, this design decision makes no change to the user who passes their value by reference.

Introducing fn i_clone_into(&self) -> T reduces the number of clones being performed inside friedrich while not obscuring the fact that a clone will happen to friedrich's developpers (we are not calling .clone but I believe the name is sufficiently explicit).

The other alternative is to take user input by value eliminating the case for i_clone_into but your PR doesn't do that and I am not sure it is the best way forward (as needing ownership for those inputs seems conterintuitive API wise).

Summary

My point is that introducing i_clone_into make it possible to do a single clone (rather than two) when you need to call .clone().i_into() while not changing the API nor making it obscure to friedrich's developpers that a clone is happening.

Am I missing something?

JonathanWoollett-Light · 2022-02-17T23:28:30Z

If I changed my implementation to implement std::convert::From the logic would be the same. Do you disagree with all libraries that clone into std::convert::From? One of the reason there is no convert from reference is because logically it is not useful.

The point being, it is okay to clone. This is not an unoptimized malloc, clone is ultimately no different than a lifetime when compiled. There will almost never be 2 actual copies of the underlying memory, when you do x.clone() in the underlying compiled optimized code the x identifier is no different to a pointer. I would even suggest (although have not tested this) the compiled machine code would be the exact same.

nestordemeure · 2022-02-17T23:49:31Z

There will almost never be 2 actual copies of the underlying memory,

I am not convinced of that in this particular case. The clone and actual use of the data would happen in two different crates and the datastructure might be non-trivial. I usually trust the compiler but here I would bet on an actual memcopy happening.

But I might be shaving yacks here. The code is fine and, ultimately, even the cost of two clones is small compared to the cost of the Cholesky decomposition.

You can add some tests and I will merge :)

Jonathan added 3 commits February 16, 2022 19:22

Conversion rework

ed095a5

While clones may seem more common I believe they have simply been made more explicit (not hiding inside the respective `into()` method).

Merge remote-tracking branch 'JonathanWoollett-Light/master'

03ae661

readme input table consistent with Input rustdoc

b9fafbc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rework to conversion and allowing many input types #49

Rework to conversion and allowing many input types #49

JonathanWoollett-Light commented Feb 16, 2022

nestordemeure commented Feb 16, 2022

nestordemeure commented Feb 16, 2022

JonathanWoollett-Light commented Feb 16, 2022

nestordemeure commented Feb 16, 2022 •

edited

JonathanWoollett-Light commented Feb 16, 2022

JonathanWoollett-Light commented Feb 17, 2022 •

edited

nestordemeure commented Feb 17, 2022 •

edited

JonathanWoollett-Light commented Feb 17, 2022 •

edited

nestordemeure commented Feb 17, 2022 •

edited

JonathanWoollett-Light commented Feb 17, 2022

nestordemeure commented Feb 17, 2022 •

edited

JonathanWoollett-Light commented Feb 17, 2022

nestordemeure commented Feb 17, 2022

Rework to conversion and allowing many input types #49

Are you sure you want to change the base?

Rework to conversion and allowing many input types #49

Conversation

JonathanWoollett-Light commented Feb 16, 2022

nestordemeure commented Feb 16, 2022

nestordemeure commented Feb 16, 2022

JonathanWoollett-Light commented Feb 16, 2022

nestordemeure commented Feb 16, 2022 • edited

JonathanWoollett-Light commented Feb 16, 2022

JonathanWoollett-Light commented Feb 17, 2022 • edited

nestordemeure commented Feb 17, 2022 • edited

JonathanWoollett-Light commented Feb 17, 2022 • edited

nestordemeure commented Feb 17, 2022 • edited

JonathanWoollett-Light commented Feb 17, 2022

nestordemeure commented Feb 17, 2022 • edited

Implementation

User interface

Summary

JonathanWoollett-Light commented Feb 17, 2022

nestordemeure commented Feb 17, 2022

nestordemeure commented Feb 16, 2022 •

edited

JonathanWoollett-Light commented Feb 17, 2022 •

edited

nestordemeure commented Feb 17, 2022 •

edited

JonathanWoollett-Light commented Feb 17, 2022 •

edited

nestordemeure commented Feb 17, 2022 •

edited

nestordemeure commented Feb 17, 2022 •

edited