Merge d2c6326 into b8e4a40
chewxy committed Feb 8, 2017
2 parents b8e4a40 + d2c6326 commit cf34dbb
Showing 35 changed files with 670 additions and 1,653 deletions.
108 changes: 87 additions & 21 deletions tensor/README.md
@@ -1,31 +1,97 @@
## Known Bugs ##
Tests:
- Inverse tests fail, and have been disabled (not generated)
- Bugs involving changing from int/uint type to float type and back
- Identity tests for Pow
# Package Tensor #
Package `tensor` provides efficient, generic (by some definitions of generic) n-dimensional arrays in Go. It also includes functions and methods that are commonly used in arithmetic, comparison and linear algebra operations.

The main purpose of this package is to support the operations required by [Gorgonia](https://github.com/chewxy/gorgonia).

## Introduction ##
In the data analysis world, [Numpy](http://www.numpy.org/) and [Matlab](https://www.mathworks.com/products/matlab.html) currently reign supreme. Both tools rely heavily on having performant n-dimensional arrays, or tensors. **There is an obvious need for multidimensional arrays in Go**.

While slices are cool, a large majority of scientific and numeric computing work relies heavily on matrices (two-dimensional arrays), three dimensional arrays and so on. In Go, the typical way of getting multidimensional arrays is to use something like `[][]T`. Applications that are more math heavy may opt to use the very excellent Gonum [`matrix` package](https://github.com/gonum/matrix). What then if we want to go beyond having a `float64` matrix? What if we wanted a 3-dimensional `float32` array?

It stands to reason, then, that there should be a data structure that handles these things. The `tensor` package fills that niche.

### Basic Concepts: Tensor ###
A tensor is a multidimensional array.

With slices, some usage patterns are repeated often enough to warrant abstraction - `append`, `len`, `cap` and `range` are abstractions used to manipulate and query slices. Slicing operations (`a[:1]` for example) are also abstractions provided by the language. Andrew Gerrand wrote a very good write-up on [Go's slice usage and internals](https://blog.golang.org/go-slices-usage-and-internals).

Tensors come with their own set of usage patterns and abstractions. Most of these have analogues in slices, enumerated below (do note that certain slice operations have more than one tensor analogue - this is due to the number of options available):

| Slice Operation | Tensor Operation |
|:---------------:|:----------------:|
| `len(a)` | `T.Shape()` |
| `cap(a)` | `T.DataSize()` |
| `a[:]` | `T.Slice(...)` |
| `a[0]` | `T.At(x,y)` |
| `append(a, ...)`| `T.Stack(...)`, `T.Concat(...)` |
| `copy(dest, src)`| `T.CopyTo(dest)`, `tensor.Copy(dest, src)` |
| `for _, v := range a` | `for i, err := iterator.Next(); err == nil; i, err = iterator.Next()` |

Some tensor operations do not have direct analogues among slice operations. However, they stem from the same idea, and can be considered a superset of all operations common to slices. They're enumerated below:

| Tensor Operation | Basic idea in slices |
|:----------------:|:--------------------:|
|`T.Strides()` | The stride of a slice will always be one element |
|`T.Dims()` | The dimensions of a slice will always be one |
|`T.Size()` | The size of a slice will always be its length |
|`T.Dtype()` | The type of a slice is always known at compile time |
|`T.Reshape()` | Given the shape of a slice is static, you can't really reshape a slice |
|`T.T(...)` / `T.Transpose()` / `T.UT()` | No equivalent with slices |


## The Types of Tensors ##

As of the current revision of this package, only dense tensors are supported. Support for sparse matrices (in the form of a sparse column matrix and a dictionary-of-keys matrix) will be coming shortly.


### Dense Tensors ###

The `*Dense` tensor is the primary tensor and is represented by a single flat array, regardless of dimensions.
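
As a rough illustration of how a flat backing array can serve any number of dimensions, the sketch below computes row-major strides for a shape and maps a coordinate to an offset in the flat array. `rowMajorStrides` and `flatIndex` are hypothetical helper names made up for this example and are not part of the package's API; the package's own stride calculation may differ in detail.

```go
package main

import "fmt"

// rowMajorStrides is a hypothetical helper (not part of the tensor package).
// It returns, for each dimension, how many elements to skip in the flat
// array when the coordinate in that dimension increases by one.
func rowMajorStrides(shape []int) []int {
	strides := make([]int, len(shape))
	acc := 1
	for i := len(shape) - 1; i >= 0; i-- {
		strides[i] = acc
		acc *= shape[i]
	}
	return strides
}

// flatIndex is a hypothetical helper that maps a coordinate to an offset in
// the flat array. It assumes len(coords) == len(strides).
func flatIndex(strides []int, coords ...int) int {
	idx := 0
	for i, c := range coords {
		idx += c * strides[i]
	}
	return idx
}

func main() {
	shape := []int{2, 3, 2}
	strides := rowMajorStrides(shape)
	fmt.Println(strides)                     // [6 2 1]
	fmt.Println(flatIndex(strides, 1, 2, 1)) // 11
}
```

The shape `(2, 3, 2)` with strides `[6 2 1]` matches the values asserted in `tensor/ap_test.go` in this commit.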

Edge Cases:

This fails due to loss of accuracy from conversion:
## Generic Features ##

```go
// identity property of exponentiation: a ^ 1 == a
a := New(WithBacking([]int{1,2,3}))
b, _ := a.PowOf(1) // or a.Pow(New(WithBacking([]int{1,1,1})))
t := a.ElemEq(b) // false
Example:

```go

x := New(WithBacking([]string{"hello", "world", "hello", "world"}), WithShape(2,2))
x = New(WithBacking([]int{1,2,3,4}), WithShape(2,2))
```

Large number float operations - inverse of Vector-Scalar ops have not been generated because tests to handle the correctness of weird cases haven't been written
The above code will not cause a compile error, because the structure holding the underlying array (of `string`s and then of `int`s) is a `*Dense`.

One could argue that this sidesteps the compiler's type checking system, deferring it to runtime (which a number of people consider dangerous). However, tools are being developed to type check these things, and until Go does support typechecked generics, unfortunately this will be the way it has to be.
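
To make the trade-off concrete, here is a minimal sketch of the general pattern: a container that stores its backing slice as an `interface{}` and checks the element type only at run time. The `box` type and its methods are hypothetical and exist only for this illustration; they are not how `*Dense` is implemented.

```go
package main

import (
	"fmt"
	"reflect"
)

// box is a hypothetical container (not the package's *Dense). Its element
// type is unknown to the compiler and is only discovered at run time.
type box struct {
	data interface{}
}

// newBox accepts any slice as backing and rejects everything else.
func newBox(backing interface{}) (*box, error) {
	t := reflect.TypeOf(backing)
	if t == nil || t.Kind() != reflect.Slice {
		return nil, fmt.Errorf("backing must be a slice, got %T", backing)
	}
	return &box{data: backing}, nil
}

// ints returns the backing as []int, or an error if the box holds another type.
func (b *box) ints() ([]int, error) {
	v, ok := b.data.([]int)
	if !ok {
		return nil, fmt.Errorf("expected []int, got %T", b.data)
	}
	return v, nil
}

func main() {
	// Both assignments compile, because the variable's static type is *box.
	x, _ := newBox([]string{"hello", "world", "hello", "world"})
	if _, err := x.ints(); err != nil {
		fmt.Println("runtime type error:", err)
	}

	x, _ = newBox([]int{1, 2, 3, 4})
	v, _ := x.ints()
	fmt.Println(v)
}
```

The type mismatch in the first half only surfaces when `ints` is called, which is the kind of check that is deferred to run time.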


Currently, the tensor package supports a limited form of genericity: a tensor may hold elements of any primitive type.

## How This Package is Developed ##


## Things Knowingly Untested For ##
- Inverse tests fail, and have been disabled (not generated)
- Bugs involving changing from int/uint type to float type and back
- Identity tests for Pow

### Edge Cases: ###

Because `testing/quick` is used, a number of edge cases were found, most of them caused by loss of accuracy. Handling these edge cases is left to the user of this package, so all of them are enumerated here:

1. `Pow`-related functions suffer from loss of accuracy.
This fails due to loss of accuracy from conversion:

```go
// identity property of exponentiation: a ^ 1 == a
a := New(WithBacking([]int{1,2,3}))
b, _ := a.PowOf(1) // or a.Pow(New(WithBacking([]int{1,1,1})))
t := a.ElemEq(b) // []bool{false, false, false}
```

2. Large number float operations - inverse of Vector-Scalar ops have not been generated because tests to handle the correctness of weird cases haven't been written
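
The README does not say exactly where precision is lost. One way a round trip through floating point can break an identity test is sketched below. `powViaFloat` is a hypothetical helper written for this example, and routing an integer power through `math.Pow` is an assumption about the general technique, not a description of the package's code.

```go
package main

import (
	"fmt"
	"math"
)

// powViaFloat is a hypothetical helper: it computes an integer power by
// converting to float64, calling math.Pow, and converting back.
func powViaFloat(a int64, exp float64) int64 {
	return int64(math.Pow(float64(a), exp))
}

func main() {
	small := int64(3)
	// 2^53 + 1 is the smallest positive integer that float64 cannot represent.
	big := int64(1)<<53 + 1

	fmt.Println(powViaFloat(small, 1) == small) // true
	fmt.Println(powViaFloat(big, 1) == big)     // false: the conversion rounded
}
```

With `testing/quick` generating arbitrary integers, values like `big` turn up regularly, which is consistent with the identity property `a ^ 1 == a` being reported as failing.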

TODO:

* Identity optimizations for op
* Zero value optimizations
* fix SVD tests
* fix Random() - super dodgy


Interesting things:
Memset(0xdeadbeef) -> memset(uintptr(0))
Memset(uintptr(0xdeadbeef)) -> correct!
this is because 0xdeadbeef is interpreted as an int
* fix Random() - super dodgy
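
The `Memset` note under "Interesting things" comes down to how Go types untyped constants. Assuming the value is passed through an `interface{}` parameter, an untyped integer constant arrives as an `int`, so an assertion to `uintptr` fails and yields the zero value. `asUintptr` below is a hypothetical stand-in for that step and not the package's `Memset`. The constant is shortened to `0xbeef` so that the example also compiles where `int` is 32 bits wide.

```go
package main

import "fmt"

// asUintptr is a hypothetical helper that mimics a function receiving a
// value as interface{} and expecting a uintptr inside it.
func asUintptr(v interface{}) (uintptr, bool) {
	u, ok := v.(uintptr)
	return u, ok
}

func main() {
	// An untyped constant stored in an interface{} gets the default type int.
	u, ok := asUintptr(0xbeef)
	fmt.Println(u, ok) // 0 false

	// An explicit conversion keeps the intended type.
	u, ok = asUintptr(uintptr(0xbeef))
	fmt.Println(u, ok) // 48879 true
}
```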
26 changes: 19 additions & 7 deletions tensor/ap.go
@@ -60,19 +60,30 @@ func (ap *AP) SetShape(s ...int) {
ap.strides = nil
}
ap.shape = Shape(s).Clone()
ap.strides = ap.shape.CalcStrides()
ap.strides = ap.shape.calcStrides()
}
}

func (ap *AP) Lock() { ap.fin = true }
func (ap *AP) Unlock() { ap.fin = false }
// lock and unlock are used to ensure that the shape and strides don't change (this is not really safe though, as a direct mutation of the strides/shape would still mutate them, but at least the dimensions cannot change)
func (ap *AP) lock() { ap.fin = true }
func (ap *AP) unlock() { ap.fin = false }

func (ap *AP) Shape() Shape { return ap.shape }
// Shape returns the shape of the AP
func (ap *AP) Shape() Shape { return ap.shape }

// Strides returns the strides of the AP
func (ap *AP) Strides() []int { return ap.strides }
func (ap *AP) Dims() int { return ap.shape.Dims() }
func (ap *AP) Size() int { return ap.shape.TotalSize() }

// Dims returns the dimensions of the shape in the AP
func (ap *AP) Dims() int { return ap.shape.Dims() }

// Size returns the expected array size of the shape
func (ap *AP) Size() int { return ap.shape.TotalSize() }

// String implements fmt.Stringer and runtime.Stringer
func (ap *AP) String() string { return fmt.Sprintf("%v", ap) }

// Format implements fmt.Formatter
func (ap *AP) Format(state fmt.State, c rune) {
fmt.Fprintf(state, "Shape: %v, Stride: %v, Lock: %t", ap.shape, ap.strides, ap.fin)
}
@@ -174,7 +185,7 @@ func (ap *AP) S(size int, slices ...Slice) (newAP *AP, ndStart, ndEnd int, err e
// scalars are a special case
newAP = new(AP)
newAP.SetShape() // make it a Scalar
newAP.Lock()
newAP.lock()
} else {

// drop any dimension with size 1, except the last dimension
@@ -275,6 +286,7 @@ func TransposeIndex(i int, oldShape, pattern, oldStrides, newStrides []int) int
return index
}

// UntransposeIndex returns the old index given the new index
func UntransposeIndex(i int, oldShape, pattern, oldStrides, newStrides []int) int {
newPattern := make([]int, len(pattern))
for i, p := range pattern {
6 changes: 3 additions & 3 deletions tensor/ap_test.go
@@ -91,12 +91,12 @@ func TestAccessPatternBasics(t *testing.T) {
assert.Equal([]int{6, 2, 1}, ap.Strides())
assert.Equal(12, ap.Size())

ap.Lock()
ap.lock()
ap.SetShape(1, 2, 3)
assert.Equal(Shape{2, 3, 2}, ap.shape)
assert.Equal([]int{6, 2, 1}, ap.strides)

ap.Unlock()
ap.unlock()
ap.SetShape(1, 2)
assert.Equal(Shape{1, 2}, ap.Shape())
assert.Equal([]int{1}, ap.Strides())
@@ -223,7 +223,7 @@ func TestAccessPatternS(t *testing.T) {
var err error

for _, sts := range sliceTests {
ap = NewAP(sts.shape, sts.shape.CalcStrides())
ap = NewAP(sts.shape, sts.shape.calcStrides())
if apS, ndStart, ndEnd, err = ap.S(sts.shape.TotalSize(), sts.slices...); err != nil {
t.Errorf("%v errored: %v", sts.name, err)
continue
1 change: 1 addition & 0 deletions tensor/api_matop.go
@@ -44,6 +44,7 @@ func Concat(axis int, t Tensor, others ...Tensor) (retVal Tensor, err error) {
panic("Unreachable")
}

// Copy copies a tensor to another. For *Dense views, only the relevant slots are copied.
func Copy(dst, src Tensor) error {
switch st := src.(type) {
case *Dense:
5 changes: 3 additions & 2 deletions tensor/api_unary.go
@@ -12,7 +12,7 @@ import (

// Square squares the elements of the Tensor. This function used to be called PointwiseSquare instead of Square.
// If you want to achieve a Matrix Square as defined:
// A^2 = A · A,
// A^2 = A · A
// You should call this function instead:
// A.MatMul(A)
//
@@ -37,6 +37,7 @@ func Square(a Tensor, opts ...FuncOpt) (retVal Tensor, err error) {
var ret *Dense
if ret, err = at.Mul(at); err != nil {
err = errors.Wrapf(err, opFail, "Mul")
return
}
return reuse.Add(ret, UseUnsafe())
case toReuse:
@@ -129,7 +130,7 @@ func Sqrt(a Tensor, opts ...FuncOpt) (retVal Tensor, err error) {
return
}

// InvSqrt calculates 1/sqrt(v) of each element in the *Tensor. Does not support incr option yet
// InvSqrt calculates 1/sqrt(v) of each element in the Tensor.
func InvSqrt(a Tensor, opts ...FuncOpt) (retVal Tensor, err error) {
switch t := a.(type) {
case *Dense:
2 changes: 1 addition & 1 deletion tensor/api_utils.go
@@ -10,7 +10,7 @@ import (
"github.com/chewxy/math32"
)

// similar to numpy argsort
// SortIndex is similar to numpy's argsort
// TODO: tidy this up
func SortIndex(in interface{}) (out []int) {
switch list := in.(type) {
43 changes: 14 additions & 29 deletions tensor/dense.go
@@ -8,6 +8,7 @@ import (
"github.com/pkg/errors"
)

// Dense represents a dense tensor - this is the most common form of tensors. It can be used to represent vectors, matrices.. etc
type Dense struct {
*AP

@@ -90,14 +91,21 @@ func (t *Dense) fromSlice(x interface{}) {
t.hdr = hdr
}

func (t *Dense) Info() *AP { return t.AP }
// Info returns the accesspattern which explains how the data in the underlying array is accessed. This is mostly used for debugging.
func (t *Dense) Info() *AP { return t.AP }

// Dtype returns the data type of the *Dense tensor.
func (t *Dense) Dtype() Dtype { return t.t }

// Data returns the underlying array. If the *Dense represents a scalar value, the scalar value is returned instead
func (t *Dense) Data() interface{} {
if t.IsScalar() {
return t.Get(0)
}
return t.v
}

// DataSize returns the size of the array. Typically t.DataSize() == t.Shape().TotalSize()
func (t *Dense) DataSize() int {
if t.IsScalar() {
return 0
@@ -151,30 +159,6 @@ func (t *Dense) IsMaterializable() bool {
return t.viewOf != nil || t.old != nil
}

// // Eq checks that any two things are equal. If the shapes are the same, but the strides are not the same, it's will still be considered the same
// func (t *Dense) Eq(other interface{}) bool {
// if ot, ok := other.(*Dense); ok {
// if ot == t {
// return true
// }

// if ot.len() != t.len() {
// return false
// }

// if !t.Shape().Eq(ot.Shape()) {
// return false
// }

// if t.data != ot.data {
// return false
// }

// return true
// }
// return false
// }

// Clone clones a *Dense. It creates a copy of the data, and the underlying array will be allocated
func (t *Dense) Clone() interface{} {
retVal := recycledDense(t.t, t.Shape().Clone())
@@ -186,17 +170,17 @@ func (t *Dense) Clone() interface{} {
}

copyDense(retVal, t)
retVal.Lock()
retVal.lock()
return retVal
}

func (t *Dense) cap() int { return t.hdr.Cap }
func (t *Dense) len() int { return t.hdr.Len } // exactly the same as DataSize

func (t *Dense) setShape(s ...int) {
t.Unlock()
t.unlock()
t.SetShape(s...)
t.Lock()
t.lock()
return
}

@@ -219,7 +203,7 @@ func (t *Dense) fix() {
size := t.Shape().TotalSize()
t.makeArray(size)
}
t.Lock() // don't put this in a defer - if t.data == nil and t.Shape() == nil. then leave it unlocked
t.lock() // don't put this in a defer - if t.data == nil and t.Shape() == nil. then leave it unlocked
}

// sanity is a function that sanity checks that a tensor is correct.
@@ -237,6 +221,7 @@ func (t *Dense) sanity() error {
return nil
}

// oshape returns the original shape
func (t *Dense) oshape() Shape {
if t.old != nil {
return t.old.Shape()
4 changes: 2 additions & 2 deletions tensor/dense_argmethods.go
@@ -12,6 +12,7 @@ GENERATED FILE. DO NOT EDIT

/* Argmax */

// Argmax finds the index of the max value along the axis provided
func (t *Dense) Argmax(axis int) (retVal *Dense, err error) {
if axis == AllAxes {
return t.argmax(nil)
@@ -39,7 +40,6 @@ func (t *Dense) Argmax(axis int) (retVal *Dense, err error) {
if _, ok := err.(NoOpError); !ok && err != nil {
return
} else if ok {
err = nil // reset errs
newAP = t.AP.Clone()
}
defer ReturnAP(newAP)
@@ -367,6 +367,7 @@ func (t *Dense) argmax(it *FlatIterator) (retVal *Dense, err error) {

/* Argmin */

// Argmin finds the index of the min value along the axis provided
func (t *Dense) Argmin(axis int) (retVal *Dense, err error) {
if axis == AllAxes {
return t.argmin(nil)
@@ -394,7 +395,6 @@ func (t *Dense) Argmin(axis int) (retVal *Dense, err error) {
if _, ok := err.(NoOpError); !ok && err != nil {
return
} else if ok {
err = nil // reset errs
newAP = t.AP.Clone()
}
defer ReturnAP(newAP)
2 changes: 1 addition & 1 deletion tensor/dense_argmethods_test.go
@@ -4,8 +4,8 @@ import (
"math"
"testing"

"github.com/chewxy/math32"
"github.com/stretchr/testify/assert"
"github.com/chewxy/math32"
)

/*