NDArray support #3

Closed
mikera opened this issue Jan 9, 2013 · 13 comments

Comments

@mikera
Owner

mikera commented Jan 9, 2013

Add support for general purpose N-dimensional arrays (like NumPy ndarray)

Features:

  • Allow arbitrary objects (not just numbers)
  • Allow in-place modifications
  • Allow "views" - i.e. slices / subsets of other arrays that can be modified via the view
  • Can be used as 1D / 2D / 3D vectors / matrices / tensors if filled with java.lang.Numbers

Initial implementation stubs:

@mjwillson
Collaborator

This looks like a good start, nice one. Something based on Java arrays seems like a good dependency-free baseline implementation. (A fair bit faster than basing it on Clojure vectors, I'd imagine!)

By the way: does the ^longs type annotation work for higher-dimensional arrays? I was under the impression it only covered a plain 1D array, and that you might need to annotate with something a bit more clunky like (Class/forName "[[[D") or similar for higher-dimensional array types. Couldn't spot much documentation on it.
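For reference, a small sketch of how the two kinds of hint differ in plain Clojure. The function names `sum-1d` and `get-2d-nested` are made up for illustration and are not part of core.matrix; the string hint uses the JVM's own array class name.

```clojure
;; Illustration only: type hints for primitive Java arrays of different ranks.

(defn sum-1d
  "^longs hints a one-dimensional long[]."
  [^longs xs]
  (areduce xs i acc 0 (+ acc (aget xs i))))

(defn get-2d-nested
  "A string hint names the JVM array class directly: \"[[D\" is double[][]."
  [^"[[D" m i j]
  ;; Each row of a double[][] is itself a double[], so it gets its own hint.
  (let [^doubles row (aget m (int i))]
    (aget row (int j))))

;; into-array over double[] elements yields a double[][].
(def nested (into-array [(double-array [1 2 3]) (double-array [4 5 6])]))
```

With these definitions, `(sum-1d (long-array [1 2 3]))` sums a 1D array and `(get-2d-nested nested 1 2)` reads the last element of the second row.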

@mikera
Owner Author

mikera commented Jan 9, 2013

The Java array implementation is necessary for fast mutability more than anything else, so that we can provide a decent default implementation of mutable array operations.

^longs doesn't work for higher-dimensional arrays. But for our NDArray we don't care about multi-dimensional Java arrays, since we are going to flatten the array into a long 1D array (with variable strides, like NumPy).
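A minimal sketch of the flattened layout, assuming a densely packed row-major ordering. The helpers `row-major-strides`, `flat-index` and `get-nd` are hypothetical names for illustration, not the actual NDArray code.

```clojure
;; Hypothetical helpers for illustration; not part of core.matrix.

(defn row-major-strides
  "Strides (in elements) for a densely packed row-major array of the given shape."
  [shape]
  (vec (reverse (reductions * 1 (reverse (rest shape))))))

(defn flat-index
  "Position in the flat backing array of the element at `indices`."
  [offset strides indices]
  (reduce + offset (map * strides indices)))

;; A 2x3x4 array stored in one flat long[] of 24 elements.
(def shape [2 3 4])
(def strides (row-major-strides shape))
(def data (long-array (range 24)))

(defn get-nd
  "Looks up an element by its N-dimensional index in the flat array."
  [indices]
  (aget ^longs data (int (flat-index 0 strides indices))))
```

Here the strides come out as `[12 4 1]`, so the index `[1 2 3]` maps to position 12 + 8 + 3 = 23 in the flat array. Only a 1D `^longs` hint is needed, whatever the number of dimensions.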

@heffalump
Contributor

Should this just replace the PersistentVector implementation? Otherwise we may well end up doubling up on a whole load of matrix algorithms internally, to little benefit. The matrix constructor should take nested persistent vectors for ease of use, of course.

@mikera
Owner Author

mikera commented Jan 10, 2013

@heffalump: I'm not sure.

Clearly the NDArray has the potential to be a more "serious" implementation. It gives us a good base implementation that approximates NumPy-style functionality. And it lets us test a "mutable" array implementation.

At the same time the persistent vector implementation is very useful for quick tests / interop with idiomatic Clojure where it is very easy to construct and use vectors. It's also good for testing as an "immutable" array implementation.

I'm hoping that many of the generic implementations of matrix functions can be written in a way that works on both. Consider my primitive trace implementation for example:

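    (trace [m]
      ;; The trace is only defined for square matrices.
      (when-not (square? m) (error "Can't compute trace of non-square matrix"))
      (let [dims (long (row-count m))]
        ;; Sum the diagonal elements using only the generic 2D accessor.
        (loop [i 0 res 0.0]
          (if (>= i dims)
            res
            (recur (inc i) (+ res (double (mp/get-2d m i i))))))))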

That should work fine on both NDArrays and persistent vectors (and anything else that implements the standard matrix access protocols).
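To make the idea concrete, here is a self-contained sketch of one function running over two different implementations. The protocol `PSquareAccess`, the record `FlatMatrix` and the function `generic-trace` are stand-ins invented for this example; the real core.matrix protocols have different names and more methods.

```clojure
;; Stand-in protocol for illustration; the real core.matrix protocols differ.
(defprotocol PSquareAccess
  (dim-count [m] "Number of rows.")
  (get-2d [m i j] "Element at row i, column j."))

;; Immutable implementation: nested persistent vectors.
(extend-protocol PSquareAccess
  clojure.lang.IPersistentVector
  (dim-count [m] (count m))
  (get-2d [m i j] (get-in m [i j])))

;; Mutable implementation: a flat double[] plus a column count.
(defrecord FlatMatrix [^doubles data ^long cols]
  PSquareAccess
  (dim-count [_] (quot (alength data) cols))
  (get-2d [_ i j] (aget data (int (+ (* (long i) cols) (long j))))))

(defn generic-trace
  "Sums the diagonal using only protocol calls, so it works on either type.
  Assumes the matrix is square."
  [m]
  (let [n (long (dim-count m))]
    (loop [i 0 res 0.0]
      (if (>= i n)
        res
        (recur (inc i) (+ res (double (get-2d m i i))))))))
```

Both `(generic-trace [[1 2] [3 4]])` and `(generic-trace (->FlatMatrix (double-array [1 2 3 4]) 2))` go through the same loop; only the protocol dispatch differs.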

So for the moment I suggest keeping both in... they both have their uses, they are relatively standalone, and it doesn't seem likely to cost us too much in terms of extra code.

@mikera
Owner Author

mikera commented Jan 12, 2013

Worth looking at in this context:

@mjwillson
Collaborator

Re flattened array -- ah of course, should've spotted that :) and strided access makes views possible, nice.

@mikera
Owner Author

mikera commented Jan 13, 2013

Strided arrays are an awesome trick: probably NumPy's secret weapon in fact.

Some of the things you can do with them:

  • Copy-free array broadcasting (set strides to zero)
  • Copy-free transposes (permute dimensions / strides)
  • Copy-free submatrices (combine original strides with an offset and reduced dimensions)
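The three tricks above can be sketched on a flat `double[]` as follows. The map-based view representation and every function name here are assumptions made for illustration; this is not how the core.matrix NDArray is actually structured.

```clojure
;; Hypothetical view representation for illustration; not the core.matrix NDArray.
;; A view is a map of the shared backing array, an offset, a shape and strides.

(defn view-get
  "Reads the element at `indices` through the view."
  [{:keys [data offset strides]} indices]
  (aget ^doubles data (int (reduce + offset (map * strides indices)))))

(defn view-set!
  "Writes through the view into the shared backing array."
  [{:keys [data offset strides]} indices v]
  (aset ^doubles data (int (reduce + offset (map * strides indices))) (double v)))

;; 2x3 row-major matrix: [[0 1 2] [3 4 5]]
(def base {:data (double-array [0 1 2 3 4 5]) :offset 0 :shape [2 3] :strides [3 1]})

(defn transpose-view
  "Swaps the two dimensions by reversing shape and strides; no data is copied."
  [v]
  (-> v
      (update :shape (comp vec reverse))
      (update :strides (comp vec reverse))))

(defn submatrix-view
  "View of `rows` x `cols` elements starting at [r0 c0]: new offset, same strides."
  [v r0 c0 rows cols]
  (let [[sr sc] (:strides v)]
    (assoc v
           :offset (+ (:offset v) (* r0 sr) (* c0 sc))
           :shape [rows cols])))

(defn broadcast-row-view
  "Presents a length-n vector as a `rows` x n matrix by giving the new
  dimension a stride of 0, so every row reads the same elements."
  [data rows]
  {:data data :offset 0 :shape [rows (alength ^doubles data)] :strides [0 1]})
```

Because every view shares the backing array, a `view-set!` through `(submatrix-view base 0 1 2 2)` is visible when reading `base` afterwards, which is the "modifiable view" behaviour asked for at the top of this issue.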

The technique originates in computer graphics I think: I certainly remember coding strided image access in assembler during the 80s/90s.... funny to see the same technique being used now for matrix computations!

@mjwillson
Collaborator

Another library I spotted for NDArray stuff on the JVM: http://code.google.com/p/array4j/

"a vector, matrix and N-dimensional array library for Java that combines ideas from JAMA, Matrix Toolkits for Java, JScience and NumPy. ... uses JNA to interface with vendor BLAS implementations".

Doesn't seem to've been much activity in the last 5 years or so mind.

@mikera
Owner Author

mikera commented Jan 24, 2013

egads..... more of them coming out of the woodwork.

Unless I'm missing something though, this one looks only barely started, e.g. the double array implementation has hardly any code in it:

Still, it looks like it would fit the core.matrix API if it ever got finished :-)

@mjwillson
Collaborator

Just starting to look at NDArray support, and I think some terminology tweaks might be a good start: #18

@mikera
Owner Author

mikera commented Jan 25, 2013

Good idea on the terminology tweaks. I'll keep this issue open to represent the need for a proper NDArray implementation inside core.matrix itself.

@mikera
Owner Author

mikera commented Feb 18, 2013

Have got a basic implementation working in core/matrix/impl/ndarray.clj as of release 0.2.0.

@mikera
Owner Author

mikera commented Feb 18, 2013

Closing this issue as it doesn't seem actionable. Discussion of further enhancements should move to the Google group.

https://groups.google.com/forum/?fromgroups#!forum/numerical-clojure

@mikera mikera closed this as completed Feb 18, 2013