impressions #12

Closed
nilsbecker opened this issue Dec 12, 2016 · 8 comments

@nilsbecker
Contributor

I played around with Owl today. Great job on making it easy to pull a Docker image.

Overall, I find it impressively comprehensive already! I have some comments, coming from a scientist with a background in NumPy/SciPy etc., which I list below. My main concern is that I'm unsure whether the design is extensible enough for a general-purpose numerical library.

  • The filter function returns tuples, but the setter functions do not accept tuples. Would it make sense to abandon the curried get i j in favor of get (i,j)? One would lose partial application, but it would otherwise seem more consistent.

  • reshape in Ndarray copies; most of the time I would use it to do a flat iteration. Is this handled by the N.iter function instead?

  • I did not find an easy way to turn 2d Ndarray slices into matrices.

  • It seems a bit Matlab-style backwards to represent vectors as 1-row matrices.

  • Ideally there would be seamless interoperability between 1d, 2d, and nd arrays, so that 1d slices are in fact of the same type as 1d arrays, etc.

  • Can one get a Bigarray back from an Ndarray.t?

  • Slices with a stride other than 1 would be great.

  • Slices that select subarrays would also be great.

(Actually, I'm not sure whether doing all of these NumPy things efficiently would require some blocked memory layout for arrays.)
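On the strided-slice point: a stride does not by itself require a blocked layout, because a view can be described by an offset, a stride, and a length over the existing buffer. The following is a minimal OCaml sketch over a plain 1-d Bigarray; the strided type and the slice, get, and set functions are hypothetical and not part of Owl or Bigarray.

```ocaml
open Bigarray

(* Hypothetical strided view over a 1-d Bigarray: no data is copied.
   Element i of the view is element (offset + i * stride) of base. *)
type ('a, 'b) strided = {
  base : ('a, 'b, c_layout) Array1.t;
  offset : int;
  stride : int;
  length : int;
}

(* Like NumPy's base[start:stop:step], restricted to start >= 0, step > 0. *)
let slice ?(start = 0) ?stop ~step base =
  if start < 0 || step <= 0 then invalid_arg "slice: need start >= 0, step > 0";
  let n = Array1.dim base in
  let stop = match stop with Some s -> min s n | None -> n in
  let length = if stop <= start then 0 else (stop - start + step - 1) / step in
  { base; offset = start; stride = step; length }

(* Bounds are checked against the view, then mapped onto the base array. *)
let get v i =
  if i < 0 || i >= v.length then invalid_arg "get: index out of bounds";
  Array1.get v.base (v.offset + (i * v.stride))

let set v i x =
  if i < 0 || i >= v.length then invalid_arg "set: index out of bounds";
  Array1.set v.base (v.offset + (i * v.stride)) x
```

For example, slice ~start:1 ~step:2 over a 7-element array addresses elements 1, 3, and 5, and writes made through set land in the original array.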

  • In NumPy, indexing arrays with boolean arrays is useful: one creates a boolean array with a filtering operation and then picks out the corresponding elements. How would this be done here? With filter?

  • Int and bool arrays can serve useful purposes. Is there a way to get these? It seems Bigarray does not provide for such a thing natively...

  • (It seems odd that uniform_int actually returns floats; I would throw this function out.)

  • The plural of axis is axes.
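On the last two array questions: Bigarray does provide integer kinds (int8_signed, int8_unsigned, int16_signed, int32, int64, int, and others), although it has no boolean kind. A minimal sketch of NumPy-style mask selection follows, assuming an int8_unsigned array holding 0 or 1 stands in for the boolean mask; mask_of and select are hypothetical helpers, not Owl functions.

```ocaml
open Bigarray

(* Bigarray has no bool kind, so an int8_unsigned array holding 0 or 1
   stands in for a boolean mask here. *)
let mask_of pred (a : (float, _, c_layout) Array1.t) =
  let m = Array1.create int8_unsigned c_layout (Array1.dim a) in
  for i = 0 to Array1.dim a - 1 do
    m.{i} <- (if pred a.{i} then 1 else 0)
  done;
  m

(* Copy out the elements of [a] whose mask entry is non-zero, like a[mask]
   in NumPy. The result is a fresh float64 array. *)
let select (a : (float, _, c_layout) Array1.t)
    (m : (int, int8_unsigned_elt, c_layout) Array1.t) =
  if Array1.dim a <> Array1.dim m then invalid_arg "select: length mismatch";
  let count = ref 0 in
  for i = 0 to Array1.dim m - 1 do
    if m.{i} <> 0 then incr count
  done;
  let out = Array1.create float64 c_layout !count in
  let j = ref 0 in
  for i = 0 to Array1.dim a - 1 do
    if m.{i} <> 0 then begin
      out.{!j} <- a.{i};
      incr j
    end
  done;
  out
```

Splitting the mask from the selection mirrors the NumPy idiom: the mask can be built once, combined with others, and reused.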

@hcarty

hcarty commented Dec 12, 2016

> The filter function returns tuples, but the setter functions do not accept tuples. Would it make sense to abandon the curried get i j in favor of get (i,j)? One would lose partial application, but it would otherwise seem more consistent.

Maybe provide get_tuple or similar instead of ditching get i j entirely?
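A minimal sketch of this suggestion over plain Bigarray.Array2, keeping the curried accessors and adding tuple-taking variants next to them; get_tuple, set_tuple, filter_ij, and zero_negatives are hypothetical names, not Owl's API.

```ocaml
open Bigarray

(* Hypothetical tuple-taking accessors that sit next to the curried
   Array2.get/Array2.set rather than replacing them. *)
let get_tuple m (i, j) = Array2.get m i j
let set_tuple m (i, j) x = Array2.set m i j x

(* Hypothetical filter returning the (i, j) positions where [pred] holds,
   in row-major order. *)
let filter_ij pred (m : (_, _, c_layout) Array2.t) =
  let acc = ref [] in
  for i = Array2.dim1 m - 1 downto 0 do
    for j = Array2.dim2 m - 1 downto 0 do
      if pred (Array2.get m i j) then acc := (i, j) :: !acc
    done
  done;
  !acc

(* Indices from the filter can be fed straight back into the setter. *)
let zero_negatives m =
  List.iter (fun ij -> set_tuple m ij 0.) (filter_ij (fun x -> x < 0.) m)
```

Keeping both forms preserves partial application (for example Array2.get m i) while letting filter results flow directly into the setter.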

@ryanrhymes
Member

ryanrhymes commented Dec 13, 2016

Hi @nilsbecker, thank you very much for trying Owl, and many thanks for your comments; they are really useful. I will spend some time digesting them and then come back to you with possible answers. @hcarty, thanks also for the possible solution; I will think about it :)

Yes, I do acknowledge that many interfaces need to be fine-tuned to make Owl even easier to use. The overall structure of Owl may undergo a lot of changes next year based on the feedback received. At the moment, I am still optimistic about making Owl an extensible and high-performance numerical library :)

@ryanrhymes
Member

ryanrhymes commented Dec 13, 2016

@nilsbecker, regarding these comments:

> reshape in Ndarray copies; most of the time I would use it to do a flat iteration. Is this handled by the N.iter function instead?

A: There are two functions in the Ndarray module, flatten and reshape; both are cheap and fast and do not copy the data. Alternatively, you can use the iter function, which iterates over the elements one by one and is also fast.
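Since Owl's Ndarray.t is a Bigarray.Genarray.t (as stated later in this reply), the no-copy behaviour can be illustrated with the standard library's Bigarray.reshape_1, which returns a view sharing the original buffer. This sketch uses only Bigarray, not Owl's own flatten or reshape, and does not claim to reproduce their internals.

```ocaml
open Bigarray

(* A 2 x 3 array of zeros in row-major (C) layout. *)
let x = Genarray.create float64 c_layout [| 2; 3 |]
let () = Genarray.fill x 0.

(* reshape_1 returns a 1-d view over the same buffer; nothing is copied. *)
let flat = reshape_1 x 6

(* Writing through the view is visible in the original:
   flat index 4 is row 1, column 1 in row-major order. *)
let () = flat.{4} <- 42.

(* A flat loop over the view visits every element of x once. *)
let total =
  let s = ref 0. in
  for i = 0 to Array1.dim flat - 1 do
    s := !s +. flat.{i}
  done;
  !s
```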

> I did not find an easy way to turn 2d Ndarray slices into matrices.

A: This is actually very easy. Ndarray operates on Bigarray.Genarray.t directly, whereas Dense.Complex and Dense.Real operate on Bigarray.Array2.t directly. You only need to call Bigarray.array2_of_genarray x to convert a (2d) ndarray into an Array2.t, which you can then pass to the Matrix module.
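A short sketch of the 2d-slice case with plain Bigarray: Genarray.slice_left fixes leading indices without copying (C layout only), and array2_of_genarray then retypes the 2-d result as an Array2.t. The shapes and values are illustrative only.

```ocaml
open Bigarray

(* A stack of four 2 x 3 matrices stored as one 3-d array. *)
let stack = Genarray.create float64 c_layout [| 4; 2; 3 |]
let () = Genarray.fill stack 1.

(* slice_left fixes the leading index (C layout only) and returns a view of
   the remaining 2 x 3 block; array2_of_genarray retypes it as Array2.t. *)
let second : (float, float64_elt, c_layout) Array2.t =
  array2_of_genarray (Genarray.slice_left stack [| 1 |])

(* Both steps produce views, so this write lands in the original 3-d array. *)
let () = second.{0, 2} <- 5.
```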

I have provided functions such as to_ndarray and of_ndarray in both Dense.Real and Dense.Complex, listed below. Each is essentially just one line of code :)

  • Dense.Real.to_ndarray
  • Dense.Real.of_ndarray
  • Dense.Complex.to_ndarray
  • Dense.Complex.of_ndarray

> Can one get a Bigarray back from an Ndarray.t?

A: Yes, Ndarray.t is equivalent to Bigarray.Genarray.t.

> Ideally there would be seamless interoperability between 1d, 2d, and nd arrays, so that 1d slices are in fact of the same type as 1d arrays, etc.

A: I think you can do that with Ndarray, why not? It is just a Genarray.t, so you can certainly create a one-dimensional ndarray, e.g., Dense.Ndarray.zeros Float64 [|5|];;
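To illustrate the point about 1d types with plain Bigarray (the Owl call above is not used here): a freshly created 1-d array and a row sliced out of a matrix both end up as Array1.t, so one function accepts either. The dot helper is hypothetical.

```ocaml
open Bigarray

(* A 1-d array created directly, and the same data viewed as Array1.t. *)
let v = Genarray.create float64 c_layout [| 5 |]
let () = Genarray.fill v 0.
let v1 : (float, float64_elt, c_layout) Array1.t = array1_of_genarray v

(* A row sliced out of a matrix is a view with the very same type. *)
let m = Array2.of_array float64 c_layout [| [| 1.; 2. |]; [| 3.; 4. |] |]
let row1 = Array2.slice_left m 1

(* Hypothetical helper: works on created arrays and slices alike. *)
let dot (a : (float, float64_elt, c_layout) Array1.t) b =
  if Array1.dim a <> Array1.dim b then invalid_arg "dot: length mismatch";
  let s = ref 0. in
  for i = 0 to Array1.dim a - 1 do
    s := !s +. (a.{i} *. b.{i})
  done;
  !s
```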

@nilsbecker
Contributor Author

> A: There are two functions in the Ndarray module, flatten and reshape; both are cheap and fast and do not copy the data. Alternatively, you can use the iter function, which iterates over the elements one by one and is also fast.

Ah, OK. I misunderstood reshape as making a copy; now I see that Dense.Real.reshape is actually the one that copies. Then I would suggest renaming one of them to avoid confusion. Is there a way to reshape (2d) matrices without a copy?
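One possible answer for 2d matrices, sketched with plain Bigarray rather than any Owl function: wrapping the Array2.t with genarray_of_array2 and calling reshape_2 gives a new shape over the same buffer, with no copy.

```ocaml
open Bigarray

let m = Array2.of_array float64 c_layout [| [| 1.; 2.; 3. |]; [| 4.; 5.; 6. |] |]

(* genarray_of_array2 and reshape_2 both return views over m's buffer,
   so the 3 x 2 result shares storage with the 2 x 3 original. *)
let m' = reshape_2 (genarray_of_array2 m) 3 2

(* m'.{2, 1} is flat index 5 in row-major order, i.e. m.{1, 2}. *)
let () = m'.{2, 1} <- 60.
```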

> I did not find an easy way to turn 2d Ndarray slices into matrices.

> A: This is actually very easy. Ndarray operates on Bigarray.Genarray.t directly, whereas Dense.Complex and Dense.Real operate on Bigarray.Array2.t directly. You only need to call Bigarray.array2_of_genarray x to convert a (2d) ndarray into an Array2.t, which you can then pass to the Matrix module.

Ah, OK. I didn't realize that the types are actually equal to the Bigarray types. Is that in the docs?

> Ideally there would be seamless interoperability between 1d, 2d, and nd arrays, so that 1d slices are in fact of the same type as 1d arrays, etc.

> A: I think you can do that with Ndarray, why not? It is just a Genarray.t, so you can certainly create a one-dimensional ndarray, e.g., Dense.Ndarray.zeros Float64 [|5|];;

OK, good to know.

Another point: why have separate ndarray and matrix modules at all? I guess it's because linear algebra generally only makes sense on (2d) matrices. That's fair enough, and maybe this does justify the design.

Alternatively, one could have all the linear algebra functions always operate on the last 2 dimensions of an ndarray by default and broadcast over the others. An optional argument would allow specifying different dimensions to treat as the matrix to operate on. One would then run into runtime errors when trying that on a 1d ndarray, which cannot be interpreted as a matrix in a meaningful way, so the simplicity comes at the cost of less static safety.
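The broadcasting proposal can be sketched as a hypothetical map_matrices helper (not part of Owl) over plain Bigarray: it views the input as a stack of matrices formed by the last two dimensions, applies a function to each, and returns an array shaped like the leading dimensions. Inputs of rank below 2 fail at runtime, which is exactly the loss of static safety described in the paragraph above.

```ocaml
open Bigarray

(* Hypothetical helper: apply [f] to every matrix formed by the last two
   dimensions of [x]; results are laid out over the leading dimensions. *)
let map_matrices f (x : (float, float64_elt, c_layout) Genarray.t) =
  let dims = Genarray.dims x in
  let rank = Array.length dims in
  (* The runtime check that replaces a static guarantee. *)
  if rank < 2 then invalid_arg "map_matrices: need at least 2 dimensions";
  let lead = Array.sub dims 0 (rank - 2) in
  let r = dims.(rank - 2) and c = dims.(rank - 1) in
  let n = Array.fold_left ( * ) 1 lead in
  (* View x as n stacked r x c matrices; reshape does not copy. *)
  let stacked = reshape x [| n; r; c |] in
  let out = Array1.create float64 c_layout n in
  for k = 0 to n - 1 do
    out.{k} <- f (array2_of_genarray (Genarray.slice_left stacked [| k |]))
  done;
  reshape (genarray_of_array1 out) lead

(* Example matrix function: the trace. *)
let trace (m : (float, _, c_layout) Array2.t) =
  let s = ref 0. in
  for i = 0 to min (Array2.dim1 m) (Array2.dim2 m) - 1 do
    s := !s +. m.{i, i}
  done;
  !s
```

An optional argument selecting other axes, as proposed above, would amount to a transpose or index remapping before the reshape; it is left out to keep the sketch short.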

In general, I'm still not sure what the best matrix or array type system for OCaml would be. One idea I find intriguing is a matrix type that carries structural information: either a parametrized type, like ('a,'b) matrix where 'a might be `Symmetric or `Upper_triangular etc. and 'b Real or Complex, or perhaps a record with a field indicating the extra information.

Linear algebra functions should be able to dispatch to the most efficient BLAS/LAPACK routine, and the type system would have to know that adding diagonal matrices gives a diagonal matrix, etc., preserving as much structural information as possible. Apparently SciRuby does something like that. By default, creation functions would give a general matrix, on which all functions would work with the most general, potentially slower, algorithm and with minimal user overhead. A user could optionally tell the type system that a matrix is, e.g., Hermitian, and this information would propagate as far as possible through the type system. In addition to function dispatch, more efficient storage for structured matrices could also be used in the backend, with no user input.
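A minimal sketch of the structural-type idea, under the assumption that a phantom type parameter carries the structure statically while a runtime tag (the record-field variant mentioned above) drives dispatch. The Mat module, its general and diagonal types, and its functions are hypothetical; dense float array array storage stands in for BLAS/LAPACK-backed storage, and det falls back to plain Gaussian elimination for general matrices.

```ocaml
(* Hypothetical sketch: the phantom parameter 's records structure that the
   type checker propagates, while a runtime tag selects the algorithm. *)
module Mat : sig
  type general
  type diagonal
  type 's t

  val of_rows : float array array -> general t
  val diag : float array -> diagonal t
  val to_general : _ t -> general t  (* give up structure explicitly *)
  val add : 's t -> 's t -> 's t     (* diagonal + diagonal stays diagonal *)
  val det : _ t -> float             (* dispatches on the runtime tag *)
  val get : _ t -> int -> int -> float
end = struct
  type general
  type diagonal
  type structure = General | Diagonal
  type 's t = { structure : structure; data : float array array }

  let of_rows rows = { structure = General; data = Array.map Array.copy rows }

  let diag d =
    let n = Array.length d in
    let data = Array.init n (fun i -> Array.init n (fun j -> if i = j then d.(i) else 0.)) in
    { structure = Diagonal; data }

  let to_general m = { structure = General; data = m.data }

  (* Both operands share the phantom tag, so the result keeps it. *)
  let add a b =
    if Array.length a.data <> Array.length b.data then invalid_arg "add: size mismatch";
    let data = Array.mapi (fun i row -> Array.mapi (fun j x -> x +. b.data.(i).(j)) row) a.data in
    { structure = a.structure; data }

  let det m =
    match m.structure with
    | Diagonal ->
        (* Fast path: product of the diagonal entries. *)
        let p = ref 1. in
        Array.iteri (fun i row -> p := !p *. row.(i)) m.data;
        !p
    | General ->
        (* General path: Gaussian elimination with partial pivoting. *)
        let a = Array.map Array.copy m.data in
        let n = Array.length a in
        let d = ref 1. in
        (try
           for k = 0 to n - 1 do
             let piv = ref k in
             for i = k + 1 to n - 1 do
               if abs_float a.(i).(k) > abs_float a.(!piv).(k) then piv := i
             done;
             if a.(!piv).(k) = 0. then begin d := 0.; raise Exit end;
             if !piv <> k then begin
               let tmp = a.(k) in
               a.(k) <- a.(!piv);
               a.(!piv) <- tmp;
               d := -. !d
             end;
             d := !d *. a.(k).(k);
             for i = k + 1 to n - 1 do
               let f = a.(i).(k) /. a.(k).(k) in
               for j = k to n - 1 do
                 a.(i).(j) <- a.(i).(j) -. (f *. a.(k).(j))
               done
             done
           done
         with Exit -> ());
        !d

  let get m i j = m.data.(i).(j)
end
```

With this signature, Mat.add (Mat.diag d) (Mat.of_rows rows) is rejected by the type checker; Mat.to_general makes giving up the structure explicit, so the sum of a diagonal and a general matrix is a general one.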

Anyway, I'm not sure whether such a design is a good idea or even feasible, but I just thought I'd air my ideas...

@nilsbecker
Contributor Author

Another comment: I think what would be really useful for shaping the API is one or a few small but real use cases that exercise a variety of matrix and array manipulations, including getting data into an appropriate shape and the nitty-gritty that is often required. A NumPy implementation, and maybe a Matlab, Julia, or Fortran one, should also be available. Then one could get a feeling for how well the API works (in the REPL and in a program) compared to the state of the art. Unfortunately, I can't think of anything like that right now.

@ryanrhymes
Member

ryanrhymes commented Dec 14, 2016

These are really good points, thank you. There is a trade-off between verbosity and efficiency: certainly, the more type information we have, the better we can choose which BLAS/LAPACK functions to call. At the moment, I am leaning towards ease of use as long as performance is not bad (it is actually pretty good in many tests I have done).

However, because the new Ndarray module works similarly to Bigarray, where you pass in the type and precision using Float32, Float64, etc., Owl's matrix APIs no longer look very consistent. I am currently reworking Dense.Complex and Dense.Real to unify them in a Dense.Matrix module, so that you can also pass in type and precision information to create different types of matrices.
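A sketch of what kind-parametrized matrix creation might look like, written against plain Bigarray; zero and zeros are hypothetical and are not the eventual Dense.Matrix API. The element kind is passed first, as with Bigarray, and only the two floating-point and two complex kinds are handled.

```ocaml
open Bigarray

(* Zero value for a given element kind; matching on the kind GADT lets the
   return type follow the kind. Other kinds are rejected at runtime. *)
let zero : type a b. (a, b) kind -> a = function
  | Float32 -> 0.
  | Float64 -> 0.
  | Complex32 -> Complex.zero
  | Complex64 -> Complex.zero
  | _ -> invalid_arg "zero: unsupported kind"

(* One creation function for real and complex, single and double precision. *)
let zeros kind rows cols =
  let m = Array2.create kind c_layout rows cols in
  Array2.fill m (zero kind);
  m
```

With this shape, zeros Float64 2 3 and zeros Complex32 2 2 come from the same function, mirroring how Bigarray itself takes the kind as an argument.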

I will update the documentation and tutorial accordingly after this work is finished.

@ryanrhymes
Member

ryanrhymes commented Dec 14, 2016

> Another comment: I think what would be really useful for shaping the API is one or a few small but real use cases that exercise a variety of matrix and array manipulations, including getting data into an appropriate shape and the nitty-gritty that is often required. A NumPy implementation, and maybe a Matlab, Julia, or Fortran one, should also be available. Then one could get a feeling for how well the API works (in the REPL and in a program) compared to the state of the art. Unfortunately, I can't think of anything like that right now.

This is a good idea! Usually there is a lot of inconsistency and discontinuity in the beginning. Some APIs will be changed and some will be ditched (which I think is totally fine :). The API should start stabilising later next year as more feedback arrives :)

@nilsbecker
Contributor Author

nilsbecker commented Dec 15, 2016 via email
