proposal: spec: multidimensional slices #6282

Closed
dadkins opened this Issue Aug 29, 2013 · 66 comments

dadkins commented Aug 29, 2013

As per Rob and Andrew's request:
https://groups.google.com/forum/#!topic/golang-nuts/Q7lwBDPmQh4

kisielk Aug 29, 2013

Contributor

Comment 1:

Some more discussion on gonum-dev:
https://groups.google.com/d/topic/gonum-dev/WnptzWjqhmk/discussion

gopherbot Aug 29, 2013

Comment 2 by nagle@animats.com:

There's been some discussion about this.  Main points so far:
1. The general goal is to make it easy to do Matlab/Octave-like work in Go.  Most
   engineering math today is prototyped in one of those languages, then translated
   to something faster for production use.
2. It should be possible to easily convert standard libraries from "Numerical Recipes"
   (the book series "Numerical Recipes in Fortran", "... in C", "... in C++", etc.) to Go.
   That would give Go a needed set of standard numerical libraries.  It should also
   be reasonably easy to translate Matlab code to Go code.
3. It's helpful to have only one standard way to represent matrices.  Otherwise,
   you get math libraries where a matrix coming out of one library won't go into
   another.  (This is one of Matlab's selling points.)
4. Performance is a major issue.  These constructs should go fast, approaching
   or exceeding C/FORTRAN performance.  
Language issues:
1. In general, multidimensional array/slice access syntax would look like "arr[i,j]".
2. Multidimensional slices would be supported, derived from multidimensional arrays.
3. While there is interest in generics and overloading, that's probably too radical.
   However, extending "+", "-", and "*" to vectors and matrices might be reasonable.
   Many machines have hardware that can speed up such operations (MMX, etc.) so
   that's something a compiler can do that a library alone cannot.
4. There's interest in "reshaping", where a subarray or sub-slice is derived from
   an array.  This can easily get overcomplicated (it did in FORTRAN 77) and needs
   to be carefully designed.  Further discussion and use cases are needed for this.
The machine learning community within Google might be consulted on what they want in
this area.  The machine learning theorists write in Matlab, and then someone has to
make that work in production code. 
It would be helpful to get agreement on needed functionality before putting too much
effort into syntax, so there are few specific syntax suggestions here yet.
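For comparison, here is a minimal sketch of how the arr[i,j] access described in the list above is usually approximated in current Go: one contiguous slice plus a row stride, with the index arithmetic done by hand. The Matrix type and its At and Set methods are hypothetical names used only for illustration; they are not part of any proposal in this thread.

```go
package main

import "fmt"

// Matrix is a hypothetical row-major matrix stored in one contiguous slice.
// Element (i, j) lives at data[i*stride+j], which is the arithmetic that a
// built-in arr[i,j] would perform implicitly.
type Matrix struct {
	rows, cols int
	stride     int // distance in elements between the starts of consecutive rows
	data       []float64
}

// NewMatrix allocates a zeroed rows x cols matrix.
func NewMatrix(rows, cols int) *Matrix {
	return &Matrix{rows: rows, cols: cols, stride: cols, data: make([]float64, rows*cols)}
}

// At returns element (i, j), panicking on out-of-range indices.
func (m *Matrix) At(i, j int) float64 {
	m.check(i, j)
	return m.data[i*m.stride+j]
}

// Set stores v at element (i, j).
func (m *Matrix) Set(i, j int, v float64) {
	m.check(i, j)
	m.data[i*m.stride+j] = v
}

func (m *Matrix) check(i, j int) {
	if i < 0 || i >= m.rows || j < 0 || j >= m.cols {
		panic(fmt.Sprintf("index (%d, %d) out of range [%d, %d]", i, j, m.rows, m.cols))
	}
}

func main() {
	m := NewMatrix(2, 3)
	for i := 0; i < 2; i++ {
		for j := 0; j < 3; j++ {
			m.Set(i, j, float64(10*i+j))
		}
	}
	fmt.Println(m.At(1, 2)) // 12
	fmt.Println(m.data)     // [0 1 2 10 11 12]
}
```

Each access here is a method call plus an explicit bounds check; the proposals in this thread would have the compiler generate the same i*stride+j arithmetic from m[i, j] directly.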

adg Aug 30, 2013

Contributor

Comment 3:

Thanks for filing the issue.

Labels changed: added feature, languagechange, removed priority-triage.

Status changed to Thinking.

dadkins Aug 31, 2013

Comment 5:

My desires are modest. I'd like to be able to write something like:

    // LUPDecompose performs an in-place LU factorization of a square
    // matrix A. It returns a permutation vector P such that PA = LU.
    // On completion, A[i][j] = L[i][j] if i > j, and U[i][j] if i <= j.
    func LUPDecompose(A [][]float64) (P []int, err error)

When writing this kind of code in C, I used to define a macro:

    #define A_(i,j) (A[(i)*rowsep + (j)])

That is precisely the kind of thing I wish the compiler would do for me.
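The macro above maps naturally onto a small closure in current Go. Below is a minimal sketch of an in-place LUP decomposition (Gaussian elimination with partial pivoting) written against a flat slice and an explicit row stride; the flat-slice signature, the at helper and the []int permutation are assumptions made for illustration, not the API the comment asks for.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// LUPDecompose factors, in place, the n x n matrix stored row-major in a
// with row stride `stride` (stride >= n), using Gaussian elimination with
// partial pivoting. On return the strict lower triangle of a holds L (its
// unit diagonal is implied), the upper triangle holds U, and p[i] is the
// index of the original row now in row i, so that PA = LU.
func LUPDecompose(a []float64, n, stride int) (p []int, err error) {
	// at plays the role of the C macro A_(i,j).
	at := func(i, j int) int { return i*stride + j }

	p = make([]int, n)
	for i := range p {
		p[i] = i
	}
	for k := 0; k < n; k++ {
		// Pivot on the entry of largest magnitude in column k, rows k..n-1.
		piv, best := k, math.Abs(a[at(k, k)])
		for i := k + 1; i < n; i++ {
			if v := math.Abs(a[at(i, k)]); v > best {
				piv, best = i, v
			}
		}
		if best == 0 {
			return nil, errors.New("LUPDecompose: matrix is singular")
		}
		// Swap whole rows k and piv, and record the swap in p.
		if piv != k {
			p[k], p[piv] = p[piv], p[k]
			for j := 0; j < n; j++ {
				a[at(k, j)], a[at(piv, j)] = a[at(piv, j)], a[at(k, j)]
			}
		}
		// Eliminate below the pivot, storing each multiplier where the zero would go.
		for i := k + 1; i < n; i++ {
			a[at(i, k)] /= a[at(k, k)]
			for j := k + 1; j < n; j++ {
				a[at(i, j)] -= a[at(i, k)] * a[at(k, j)]
			}
		}
	}
	return p, nil
}

func main() {
	// A 2x2 matrix embedded in storage with row stride 3 (the last column is padding).
	a := []float64{
		1, 2, 0,
		3, 4, 0,
	}
	p, err := LUPDecompose(a, 2, 3)
	fmt.Println(p, err)           // [1 0] <nil>
	fmt.Println(a[0], a[1], a[3]) // U[0][0]=3, U[0][1]=4, L[1][0]=1/3
}
```

Passing a stride larger than n, as main does, shows why the macro needs rowsep at all: the same code then works on a square block embedded in a wider matrix.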

gopherbot Sep 9, 2013

Comment 6 by gexarcha1:

This document contains a collection of different proposals, described at different levels
of detail:
https://docs.google.com/document/d/1gejfouITT25k29eHYTgdvi4cCLVt5dhYiJVvF46gp78/edit
It also contains links to any proposals or information relevant to the topic that we
found interesting. Feel free to edit the document to clarify any questions relevant to
implementation.
I think most of the people who need multidimensional arrays in Go are not used to
writing proposals for programming language changes. If you have any input that would
make the process less awkward, it would be highly appreciated.

gopherbot Sep 10, 2013

Comment 7 by nagle@animats.com:

There's been considerable discussion of this subject on "comp.lang.go.general".
Areas of agreement:
   There seems to be a consensus that multidimensional arrays and slices would be
useful for numerical work in Go.
   The array element access syntax generally proposed is
       arr[n,m]     // 2D
       arr[n,m,o]   // 3D
and so on.  Existing arrays of arrays would be retained in Go.  Arrays of arrays can be
"ragged" (rows of different lengths), but multidimensional arrays always have a uniform
length in each dimension, allowing FORTRAN-type optimizations.
   Both fixed-size multidimensional array types defined at compile time and arrays sized
at run time have use cases.  The former are widely used for graphics, and the latter are
widely used for general numerical work.  It should be possible to pass both as parameters
to functions, although not necessarily to the same functions.
Areas of disagreement:
   There's much disagreement over what facilities for reslicing and reshaping should
be provided.
   The minimum capability discussed is the ability to pass an entire array or
multidimensional slice object to a function.  Creating a slice that refers to a
portion of an array, though, is controversial.  Syntactically, this refers to
expressions of the form
         arr[a0:a1, b0:b1]
The problem is that values of b0 and b1 which are smaller than the array bounds
result in the need to represent a kind of sparse array, where the memory
distance between adjacent elements is not the same as the size of an element.
This complicates the internal representation of slices; even the lowest dimension
must have a "stride" value.  There's a performance penalty for this feature when it is
not being used.  But it is convenient to have.  
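To make the representation question concrete, here is a minimal sketch, assuming row-major storage, of how a view such as arr[a0:a1, b0:b1] can be built in current Go without copying. The View type and its Slice method are hypothetical; the view keeps the parent's row stride, so consecutive rows of the view are no longer adjacent in memory.

```go
package main

import "fmt"

// View is a hypothetical 2-D view onto a row-major backing slice.
// Element (i, j) is data[i*stride+j]; stride may exceed cols when the
// view is a window into a wider matrix.
type View struct {
	data       []float64
	rows, cols int
	stride     int
}

func (v View) At(i, j int) float64 {
	if i < 0 || i >= v.rows || j < 0 || j >= v.cols {
		panic("index out of range")
	}
	return v.data[i*v.stride+j]
}

func (v View) Set(i, j int, x float64) {
	if i < 0 || i >= v.rows || j < 0 || j >= v.cols {
		panic("index out of range")
	}
	v.data[i*v.stride+j] = x
}

// Slice returns the sub-view [a0:a1, b0:b1] sharing v's storage.
// Only the starting offset and the extents change; the stride is inherited.
func (v View) Slice(a0, a1, b0, b1 int) View {
	if a0 < 0 || a1 > v.rows || a0 > a1 || b0 < 0 || b1 > v.cols || b0 > b1 {
		panic("slice bounds out of range")
	}
	if a0 == a1 || b0 == b1 {
		// Empty view: keep the shape, but there is no data to point at.
		return View{rows: a1 - a0, cols: b1 - b0, stride: v.stride}
	}
	start := a0*v.stride + b0
	end := (a1-1)*v.stride + b1 // one past the last element of the last row
	return View{data: v.data[start:end], rows: a1 - a0, cols: b1 - b0, stride: v.stride}
}

func main() {
	// A 3x4 matrix with element (i, j) = 10*i + j.
	m := View{data: make([]float64, 12), rows: 3, cols: 4, stride: 4}
	for i := 0; i < 3; i++ {
		for j := 0; j < 4; j++ {
			m.Set(i, j, float64(10*i+j))
		}
	}
	s := m.Slice(1, 3, 1, 3)            // rows 1-2, columns 1-2
	fmt.Println(s.At(0, 0), s.At(1, 1)) // 11 22
	s.Set(0, 0, -1)                     // writes through to m
	fmt.Println(m.At(1, 1))             // -1
}
```

In this sketch a rectangular window only needs a row stride, because elements within a row stay adjacent; a separate stride for the lowest dimension becomes necessary for views such as a single column treated as a vector, or a transpose.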
   The other big issue is the ability to "grow" multidimensional slices.
Should "append" be allowed for multidimensional slices, and if so, how
general should it be?  Is growth allowed in all dimensions?  It's a useful
capability, but it complicates the implementation considerably.
   Those are the two biggest issues being discussed.
   Lesser issues include whether operators "+", "-", and "*" should be provided
as built-in operations for multidimensional arrays and slices.  Providing them
permits easy transliteration of math libraries from other languages (especially
C++ and Matlab).  But they're not essential; NumPy uses "A.dot(B)" for matrix
multiplication.  There are a lot of cases to handle for "*": vector*scalar,
vector*matrix, matrix*matrix, and so on.  "+" and "-" are somewhat simpler.
   That's roughly where the discussions are.  Once the slice and growth issues are
decided, the rest should fall into place.
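Without operator overloading, the NumPy-style A.dot(B) approach mentioned above becomes an ordinary Go method. A minimal sketch follows, assuming a hypothetical Dense type with contiguous row-major storage; it covers only the matrix*matrix case (a vector is treated as an n x 1 matrix) and reports shape mismatches as errors.

```go
package main

import (
	"errors"
	"fmt"
)

// Dense is a hypothetical row-major matrix with contiguous storage.
type Dense struct {
	Rows, Cols int
	Data       []float64 // len(Data) == Rows*Cols
}

// Dot returns the matrix product a*b, in the spirit of NumPy's A.dot(B).
func (a Dense) Dot(b Dense) (Dense, error) {
	if a.Cols != b.Rows {
		return Dense{}, errors.New("dimension mismatch")
	}
	c := Dense{Rows: a.Rows, Cols: b.Cols, Data: make([]float64, a.Rows*b.Cols)}
	for i := 0; i < a.Rows; i++ {
		for k := 0; k < a.Cols; k++ {
			aik := a.Data[i*a.Cols+k]
			// The i-k-j loop order walks rows of b and c contiguously.
			for j := 0; j < b.Cols; j++ {
				c.Data[i*c.Cols+j] += aik * b.Data[k*b.Cols+j]
			}
		}
	}
	return c, nil
}

func main() {
	a := Dense{Rows: 2, Cols: 2, Data: []float64{1, 2, 3, 4}}
	b := Dense{Rows: 2, Cols: 1, Data: []float64{5, 6}} // a column vector
	c, err := a.Dot(b)
	fmt.Println(c.Data, err) // [17 39] <nil>
}
```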

adg Sep 10, 2013

Contributor

Comment 8:

The discussion referred to by nagle@ is on golang-nuts ("comp.lang.go.general" is just a
name made up by gmane.org):
https://groups.google.com/forum/#!topic/golang-nuts/Q7lwBDPmQh4

gopherbot Oct 3, 2013

Comment 9 by norman.yarvin:

My proposal, at
https://docs.google.com/document/d/1xHyOK3hSxwMtyMH-pbXSXxo7Td9A_Teoj6cbr9ECeLM/edit?pli=1#
has now reached a reasonably mature state.  I'd appreciate it if people had a look at it.
It's a rather long proposal, but that's because this is not a small addition to the
language and I've tried to be thorough.  The proposal is purely about the basics of
multidimensional slices; math operations on whole slices are not part of it.  (It does
not preclude later adding them, but does not set up a situation where they'd be
necessary, either.)
As regards comments that people have made so far on the document, please keep in mind
that most of them were made against earlier versions, in which some things were not
explained as well.  Thus if any appear to be foolish that should not be held against the
commenter, or at least not too much.  (Though some comments have been marked as
resolved, I haven't marked them that way just because I disagree and have improved my
explanation, as that seems too dictatorial; I figure people can retire their own
comments if their objections have truly been addressed.)
As a personal comment, it is rather rare to have an opportunity to add a major feature
to a language so cleanly.  Usually one either has to break compatibility or introduce
some serious ugliness or severe compromise.  But here it's almost like Go was designed
for this feature to just be plugged in and almost seamlessly fill a gap in its
capabilities.  Or so I believe; go ahead, hammer on the proposal and prove me wrong.

gopherbot Oct 17, 2013

Comment 10 by A.Vansteenkiste:

In my opinion, instead of changing the language, it might be useful if the standard
library simply provided a standard matrix format. That should guide the creation of
idiomatic, compatible matrix libraries. Cf. database/sql, which guided the creation of
excellent database drivers.
The image package already provides nice guidance on how to handle 2D matrices. It uses
contiguous underlying storage, which is needed when passing data to external libraries. I
modelled a matrix (storage) package after image; this is what it looks like:
http://godoc.org/github.com/barnex/mat. However, many people have written such packages
with slightly different, incompatible conventions. Hence it might be nice to have a
standard convention in the standard library, once and for all.

rsc Nov 27, 2013

Contributor

Comment 11:

Labels changed: added go1.3maybe.

rsc Nov 27, 2013

Contributor

Comment 12:

Labels changed: removed feature.

rsc Dec 4, 2013

Contributor

Comment 13:

Labels changed: added release-none, removed go1.3maybe.

rsc Dec 4, 2013

Contributor

Comment 14:

Labels changed: added repo-main.

btracey May 5, 2014

Contributor

Comment 15:

About a month ago, I put together a proposal for this. I'm adding this comment for
reference.
Here is a reduced proposal for multi-dimensional arrays:
https://docs.google.com/document/d/1eHm7KqfKP9_s4vR1zToxq-FBazdUQ9ZYi-YhcEtdfR0/edit
Golang-nuts discussion thread:
https://groups.google.com/forum/#!searchin/golang-nuts/tables$20in$20go/golang-nuts/osTLUEmB5Gk/3-f9_UKfE9MJ
In the thread, Dan Kortschak proposed a nice syntax for range which could be an addition
to the proposal (https://groups.google.com/d/msg/golang-nuts/osTLUEmB5Gk/-A15bJpuTzsJ).

MichaelTJones Jun 12, 2014

Contributor

Comment 16:

Related discussion in my go-nuts post:
https://groups.google.com/forum/#!topic/golang-nuts/ScFRRxqHTkY

btracey Jun 27, 2014

Contributor

Comment 17:

My proposal has been updated and now includes a proposal for range syntax.

adg Jun 18, 2015

Contributor

@nigeltao's response to the proposal: https://groups.google.com/d/msg/golang-dev/T2oH4MK5kj8/Kwpe8NfD45IJ

We appreciate the amount of work and care that went into this
proposal, but we're sorry that it's not likely to be adopted, at least
for Go 1.

The problem isn't with the proposal itself. The reason is that the
language is stable: the bar for new language features is high, and is
only getting higher. The "Changes to the language" sections of
http://golang.org/doc/go1.1, http://golang.org/doc/go1.2 and
http://golang.org/doc/go1.3 are getting smaller, not bigger. The
changes that did happen were also minor compared to introducing
tables, which touch make, literals, indexing, slices, len/cap, copy,
range, reflect and the type system in general.

We don't doubt that adding tables to the language has benefits. Any
reasonable language feature has benefits, and the gonum-dev mailing
list's existence clearly shows that it would make some people very
happy. Yet even if tables were part of the language, it's hard to see
any packages in the standard library, or even in the
code.google.com/p/go.foo sub-repositories, that would use them. There
is a chicken-and-egg factor here, but it's still not encouraging for
tables.

The only candidate seems to be the image-related packages, but even
then, we can't change, for instance, the image.RGBA type in the Go 1.x
time frame, and even for a hypothetical Go 2.0, it's not clear that
changing the design of the image package is a win. One of the
motivations for the current design is that, when decoding from or
encoding to formats like GIF or PNG, it's useful to linearize the
rectangle of pixels as a []byte, as spoken by general-purpose
compressors like LZW and ZLIB. Another deliberate design decision,
based on Plan 9 GUI experience, was that the top-left of an image
isn't necessarily at (0, 0).

In any case, debating the proposal's benefits is secondary. To repeat
the main point, we value API and language stability very highly. Yes,
the proposal is backwards-compatible, but it's a feature request, not
a bug fix, and we err on the side of making no changes.

As an alternative, one could define a computational language a la
halide-lang, and write a program that worked with "go generate". This
program would parse the specialized code and generate Go 1.x code
(which possibly uses package unsafe for pointer arithmetic), or
generate C code, or generate 6a-compatible assembly code, or generate
GPU-specific code. Of course, this still requires finding someone to
do the work, but that person or group of people doesn't have to be
familiar with the runtime and compilers, blocked on Go's release
cycles, or bound by the Go 1 compatibility promise.

adg added Go2 and removed Thinking labels Jun 18, 2015

btracey Nov 10, 2015

Contributor

Design document uploaded https://go-review.googlesource.com/16801

ianlancetaylor Nov 14, 2015

Contributor

A somewhat different suggestion at #13253.

sbinet Aug 24, 2016

Member

@griesemer: I suppose people who dealt with the sawzall -> lingo migration would also have some insights into data science use cases. (I suppose you may have facilities to tap into their brains :P)

Before Go, I was into C++ and (C)Python. There, NumPy is king and builds on top of the "buffer protocol" to provide data interop (between C-land and CPython, but also between various CPython extension modules, e.g. a SQL module and NumPy, or NumPy and PIL/Pillow, the Python imaging libraries) and cheap slice-and-dice operations.

Having an n-dimensional slice like the buffer protocol (so, with strides), with no possibility of allocation (so, without cap) but with reshape, would fit the bill for image processing, ML, high energy physics and, probably, all science-y data-crunching applications.

Having said that, between having this proposal implemented and the current status quo, I would go with this proposal (even if it somehow repeats the mistakes of C++'s std::valarray<T>).


btracey Aug 24, 2016

Contributor

Thanks for the comments, @yiyus. It's a good conversation to have.

It would be difficult to explain such a limitation to someone coming from Python, Fortran or Matlab, for example.

In Matlab, such slicing creates a copy, so the behavior in Go is going to be surprising no matter what.

Could you give some examples in Fortran? I've only worked in old Fortran codebases. In BLAS and LAPACK, for instance, data arrays are contiguous, as in this proposal. Many of the operations do work on column vectors, of course, but that is implemented by explicitly passing the stride and doing the strided indexing manually. See the implementation of Dasum https://github.com/gonum/blas/blob/master/native/level1double.go#L91 or http://www.netlib.org/lapack/explore-html/de/d05/dasum_8f_source.html
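The explicit-stride style described above can be sketched as follows: a simplified dasum (the BLAS sum of absolute values) over n elements of x spaced incX apart, used here to sum one column of a row-major matrix without copying. This is a hypothetical illustration of the calling convention, not the gonum or netlib code linked above; like reference BLAS, it returns 0 for non-positive n or incX.

```go
package main

import (
	"fmt"
	"math"
)

// dasum returns the sum of |x[i*incX]| for i in [0, n), the BLAS level-1
// "asum" operation. Non-positive n or incX yields 0.
func dasum(n int, x []float64, incX int) float64 {
	if n <= 0 || incX <= 0 {
		return 0
	}
	if (n-1)*incX >= len(x) {
		panic("dasum: x too short for n and incX")
	}
	var sum float64
	for i := 0; i < n; i++ {
		sum += math.Abs(x[i*incX])
	}
	return sum
}

func main() {
	// A 3x2 row-major matrix; column 1 starts at index 1 with stride 2.
	a := []float64{
		1, -2,
		3, 4,
		-5, 6,
	}
	fmt.Println(dasum(3, a[1:], 2)) // |-2| + |4| + |6| = 12
}
```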

Python does have the behavior you propose. Do you know how they deal with it? I suspect that they have to make a copy of the array before passing it to LAPACK. I tried to read the source, but when I got to the call to _makearray, I got lost in the C code for the array function and in wrap = getattr(a, "__array_prepare__", new.__array_wrap__)

I ask because the goals of Go are different from those of Python/Matlab/Julia. Go will never have the syntactic flexibility of these languages. The benefit of Go is having a consistent language across users, and having a language that is possible to implement efficiently. In Matlab, the tradeoff is to allow really easy access to columns, even if that means a silent copy. In Python, it's likely okay to sacrifice performance opportunities for syntactic ease -- Python is not a fast language in general. Just because it's the right choice for those languages doesn't mean it's the right choice for Go -- the definition of "not a problem" is different.


I agree that we want to be able to perform operations on the data in columns, and I also agree that in some cases we want to do that copy-free. There is a problem with my proposal at the moment: one cannot extract a single column of a table. A few of us have been discussing this offline. The solution is to also add an unshape function that takes a table and returns the linear data.

func unshape([,*]T) ([]T, [N-1]int)

where [,*]T is shorthand for a table of dimension N, and the returned values are the underlying linear data of the table and the strides of its first N-1 dimensions (the innermost dimension is contiguous). With this function, one can extract a specific column of data, copy-free. Updating gonum/matrix, for instance, we would have

type Dense [,]float64

type Vector struct {
    data   []float64
    stride int
}

func (d Dense) ColView(i int) Vector {
    data, strides := unshape(d) // strides has N-1 = 1 entry for a 2-D table
    return Vector{
        data:   data[i:],
        stride: strides[0],
    }
}

The returned vector is a copy-free view of the column.

Personally, I think this meaningfully meets your desiderata. You can use a Vector anywhere you "need something that is a vector", and you can use a []T anywhere else.
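The same column-view idea can be sketched in today's Go, with no language change, by storing a matrix as a flat row-major slice plus an explicit row stride. The `Dense`, `Vector`, `NewDense`, `ColView`, `At` and `Set` names below are hypothetical, loosely modeled on the gonum approach rather than copied from its API. `Vector.At` is a single multiply and index, small enough that the Go compiler can typically inline it.

```go
package main

// Dense is a row-major matrix stored contiguously in data:
// element (i, j) lives at data[i*stride+j], with stride >= cols.
type Dense struct {
	rows, cols, stride int
	data               []float64
}

// NewDense allocates a zeroed rows x cols matrix with stride == cols.
func NewDense(rows, cols int) *Dense {
	return &Dense{rows: rows, cols: cols, stride: cols, data: make([]float64, rows*cols)}
}

func (m *Dense) At(i, j int) float64     { return m.data[i*m.stride+j] }
func (m *Dense) Set(i, j int, v float64) { m.data[i*m.stride+j] = v }

// Vector is a strided view: logical element k lives at data[k*inc].
type Vector struct {
	n, inc int
	data   []float64
}

func (v Vector) Len() int             { return v.n }
func (v Vector) At(k int) float64     { return v.data[k*v.inc] }
func (v Vector) Set(k int, x float64) { v.data[k*v.inc] = x }

// ColView returns column j as a Vector aliasing m's storage.
// No elements are copied, so writes through the view update m.
func (m *Dense) ColView(j int) Vector {
	if j < 0 || j >= m.cols {
		panic("ColView: column out of range")
	}
	return Vector{n: m.rows, inc: m.stride, data: m.data[j:]}
}
```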


There are two costs to strided slices. The first is having two types that are basically the same thing. The second is the loss of optimization opportunities. It seems possible to me that strided slices are much harder to optimize than contiguous ones, especially in the area of SIMD. If SIMD optimizations are too difficult to implement, then Go is never going to be competitive with C and Fortran. And there would still be tension over when to use strided slices and when to use contiguous ones.

I guess I'm playing the role of champion against strided slices. In that role, what I want to see is:

  1. An argument for why the Vector type is insufficient, making strided slices necessary. Such an argument should keep in mind that:
    a) Assuming inlining, accessing through a Vector is just as expensive as accessing a strided slice
    b) As far as I understand, the storage costs for a strided slice are identical to the storage costs of a Vector

  2. An explanation why a future Go compiler will still be able to implement SIMD operations, even understanding the goals of Go for fast compile times and small binaries

  3. Rules of thumb on when to use a strided slice and when to use a contiguous one. Should the entirety of gonum change? That is, anywhere I would have had a []float64, should I instead have a [']float64 (or whatever the syntax is)? If not, clearly it should be type Vector [']float64, but should gonum/floats use [']float64 or []float64? Should there also be a duplicate "stridedfloats" package? What about the weights of a categorical random variable (https://github.com/gonum/stat/blob/master/distuv/categorical.go#L15)?


@griesemer Your post came in while I was writing mine. Should I update the proposal to add the unshape operation? I think it's necessary for the proposal to be useful.

@sbinet

sbinet Aug 24, 2016

Member

@btracey no, numpy.array has all the information needed to hand lapack a pointer to a C array of floats, together with strides, row/col-major ordering and rank information. that's the buffer protocol I was talking about.
(it also has knobs for r/w access and ref-counting but that's only relevant to the CPython implementation)

see: https://docs.python.org/3/c-api/buffer.html
(numpy.array was the champion of this protocol during the python2 era, and of course it also implements it in python3)

wrt SIMD, even with the current proposal we might have some issues with alignment (especially when you sub-slice an n-dim slice, strided or not).
SIMD has support for scatter/gather operations anyway. They are of course less efficient than operating on completely adjacent data, but given the alignment issues mentioned above, I'd say you would want to copy your data anyway to be sure to use the SIMD instructions. (in effect, that's what you would do to tap into GPUs and their remote device memory)

@yiyus

yiyus Aug 24, 2016

@btracey Answering your question: since Fortran 90 it has been possible to take rows of a matrix with syntax similar to Python's, for example row = matrix(1,:). Fortran uses column-major ordering, so col = matrix(:,1) would just be a sub-array with stride 1 (every array has a stride in Fortran).
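A minimal Go sketch of the column-major layout described above, using 0-based indices (Fortran defaults to 1-based). The `ColMajor` type and its `Col`/`Row` methods are hypothetical. A column comes out as an ordinary contiguous slice, while a row's elements are `ld` apart; Fortran can describe that row as a strided section, but a plain Go slice cannot, so this sketch copies it.

```go
package main

// ColMajor stores an r x c matrix Fortran-style: element (i, j)
// is at data[i+j*ld], where ld (the leading dimension) >= r.
type ColMajor struct {
	r, c, ld int
	data     []float64
}

// Col returns column j as a contiguous slice (stride 1) aliasing m,
// analogous to Fortran's matrix(:, j).
func (m ColMajor) Col(j int) []float64 {
	return m.data[j*m.ld : j*m.ld+m.r]
}

// Row copies row i, analogous to Fortran's matrix(i, :).
// Its elements are ld apart in memory, so a plain Go slice
// cannot alias them without a stride.
func (m ColMajor) Row(i int) []float64 {
	out := make([]float64, m.c)
	for j := range out {
		out[j] = m.data[i+j*m.ld]
	}
	return out
}
```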

Also, I do not think that supporting strided slices would pose a problem for optimization; on the contrary. If they are not supported, some workaround will be needed to extract columns from a matrix, and that workaround would prevent applying SIMD optimizations that use strided loads.

@btracey

btracey Aug 24, 2016

Contributor

@sbinet it has the information to pass to Lapack, but I'm pretty sure it has to copy the data before passing to Lapack if the inner stride is not 1.
From http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html
"Most third-party libraries expect contiguous arrays. But, often it is not difficult to support general-purpose striding. I encourage you to use the striding information in your own code whenever possible, and reserve single-segment requirements for wrapping third-party code. Using the striding information provided with the ndarray rather than requiring a contiguous striding reduces copying that otherwise must be made."

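When a routine requires unit stride, as the quoted passage says most third-party libraries do, a strided view has to be packed into a contiguous buffer before the call. A minimal Go sketch of that packing step; `packStrided` is a hypothetical helper, not part of numpy or gonum.

```go
package main

// packStrided returns n elements spaced inc apart in src as a
// contiguous slice. When inc == 1 the data is already contiguous,
// so the sub-slice is returned without copying (it aliases src);
// otherwise a new buffer is allocated and filled.
func packStrided(src []float64, n, inc int) []float64 {
	if inc == 1 {
		return src[:n:n]
	}
	dst := make([]float64, n)
	for i := range dst {
		dst[i] = src[i*inc]
	}
	return dst
}
```

The extra allocation and copy in the strided branch is the cost being discussed: it is paid every time a non-contiguous view crosses into contiguous-only code.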
@yiyus

yiyus Aug 24, 2016

To add to the discussion, I have cleaned up my old proposal for "shaped slices" a bit:
https://docs.google.com/document/d/1IvvkX60AMObA11CB6Gcc3xxG14VayGqBOjfpwg3qIRY/edit?usp=sharing

It is not really a counter-proposal (it is far less mature than the table proposal, and the chosen syntax and other details may not be the best options, or even possible). I just want to show that, as I see it, all that is essential to implement multidimensional slices in a useful form is a new type, a reshape operation, and multidimensional slicing/indexing.

If you have specific comments, please add them directly in the document, to avoid hijacking this issue.

@btracey

btracey Aug 25, 2016

Contributor

I updated my proposal with the unpack built-in described above (called unshape in my earlier comment), and added some code examples of operations that are very similar to strided slices. I also reworded the criticism of strided-slice based proposals.

Current PR still at https://go-review.googlesource.com/#/c/25180/

@sbinet

sbinet Aug 27, 2016

Member

with the new unshape builtin, I must say the proposal is starting to become a bit crowded.

I am tempted to bring back my original suggestion (somewhere on gonum-dev):

slice := make([*]int, 2, 3, 4)      // a 2x3x4 n-dim slice
sub1 := slice[:,1,:]                // a 2x4 n-dim slice
sub2 := reshape(slice, 6, 4)[:2,:3] // a 2x3 n-dim slice

for i, v := range slice {
    for j, u := range v {
        for k, x := range u {
            fmt.Printf("slice[%d,%d,%d] = %v\n", i, j, k, x)
        }
    }
}
  • [*]T is a strided ndim-slice whose elements have type T.
  • len(slice) returns a []int with one entry per dimension of the ndim-slice (i.e. []int{2,3,4} for the slice above)
  • an ndim-slice cannot be appended to, and has a fixed capacity equal to its length

at the reflect level, an ndim-slice could be represented like:

type NdSliceHeader struct {
    Data   unsafe.Pointer
    Len    []int
    Stride []int
}

nd-slice literals

v := [*]int{1,2,3,4} // a 1x4 nd-slice
u := reshape([*]int{1,2,3,4}, 2, 2) // a 2x2 nd-slice
w := reshape([*]int{1,2,3,4}, 2, 1, 2) // a 2x1x2 nd-slice

// reshaping a slice is also allowed and creates an nd-slice
s := reshape([]int{1,2,3,4}, 2, 2) // a 2x2 nd-slice
// perhaps this conversion could also be allowed, like string/[]byte:
t := [*]int([]int{1,2,3,4}) // a 1x4 nd-slice

copy

src := make([*]int, 2, 3)
dst := make([*]int, 4, 3)
// copy returns the number of elements copied in each dimension
n := copy(dst[:2,:], src[:1,1:]) // n == []int{1, 2}
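The per-dimension count returned by copy can be sketched in today's Go for the 2-D case. `View2D`, `NewView2D`, `Sub` and `Copy2D` are hypothetical names. `Copy2D` copies the overlapping top-left region and returns the per-dimension minimum of the two shapes, by analogy with the built-in copy returning min(len(dst), len(src)). For brevity, `Sub` does no bounds checking, and `Copy2D` does not handle overlapping source and destination memory the way the built-in does.

```go
package main

// View2D is a strided 2-D view over an int slice:
// element (i, j) is stored at data[off+i*rs+j*cs].
type View2D struct {
	data       []int
	off        int
	rows, cols int
	rs, cs     int
}

// NewView2D allocates a zeroed row-major rows x cols view.
func NewView2D(rows, cols int) View2D {
	return View2D{data: make([]int, rows*cols), rows: rows, cols: cols, rs: cols, cs: 1}
}

// Sub restricts v to rows [r0, r1) and columns [c0, c1),
// sharing storage with v like ordinary sub-slicing.
func (v View2D) Sub(r0, r1, c0, c1 int) View2D {
	v.off += r0*v.rs + c0*v.cs
	v.rows, v.cols = r1-r0, c1-c0
	return v
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// Copy2D copies the overlapping region of src into dst and returns
// the number of elements copied along each dimension.
func Copy2D(dst, src View2D) []int {
	n := []int{minInt(dst.rows, src.rows), minInt(dst.cols, src.cols)}
	for i := 0; i < n[0]; i++ {
		for j := 0; j < n[1]; j++ {
			dst.data[dst.off+i*dst.rs+j*dst.cs] = src.data[src.off+i*src.rs+j*src.cs]
		}
	}
	return n
}
```

Mirroring the nd-slice example, `Copy2D(dst.Sub(0, 2, 0, 3), src.Sub(0, 1, 1, 3))` on a 2x3 `src` and a 4x3 `dst` copies one row of two elements and returns `[]int{1, 2}`.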

compared to the non-strided proposal, you gain the ability to slice in all dimensions.
you lose the compile-time check of rank (i.e. number of dimensions) of an ndim-slice, so you could imagine a situation where you'd pass a 2d ndim-slice to a function expecting a 3d one.
and you probably lose a bit in random access to elements, because you need to multiply by the stride in each dimension (and fetch those strides).
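To make the random-access cost concrete, here is a minimal Go sketch of a strided n-dim view built on an ordinary slice, loosely following the NdSliceHeader layout above (a `[]float64` plus per-dimension shape and strides stands in for the `unsafe.Pointer`). The `NdSlice`, `Reshape` and `At` names are assumptions for illustration only. Each access loops over every dimension, multiplying by its stride, and the rank is only checked at run time.

```go
package main

// NdSlice is a strided n-dimensional view over an ordinary slice.
// Element (i0, ..., ik) lives at data[off + i0*Strides[0] + ... + ik*Strides[k]];
// off would let a sub-view start in the middle of the buffer.
type NdSlice struct {
	data    []float64
	off     int
	Shape   []int
	Strides []int
}

// Reshape wraps data as a row-major nd-slice with the given shape,
// panicking if the element counts do not match.
func Reshape(data []float64, shape ...int) NdSlice {
	strides := make([]int, len(shape))
	n := 1
	for d := len(shape) - 1; d >= 0; d-- {
		strides[d] = n
		n *= shape[d]
	}
	if n != len(data) {
		panic("Reshape: element count mismatch")
	}
	return NdSlice{data: data, Shape: append([]int(nil), shape...), Strides: strides}
}

// At returns the element at idx. The rank check happens at run time,
// and every access multiplies by each dimension's stride.
func (s NdSlice) At(idx ...int) float64 {
	if len(idx) != len(s.Shape) {
		panic("At: wrong number of indices")
	}
	p := s.off
	for d, i := range idx {
		if i < 0 || i >= s.Shape[d] {
			panic("At: index out of range")
		}
		p += i * s.Strides[d]
	}
	return s.data[p]
}
```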

@btracey

btracey Aug 27, 2016

Contributor

This is similar in spirit to @yiyus's proposal. I don't think it actually gets you around unshape though, unless you want to forbid people from ever getting the underlying slice without using unsafe. One important use is passing data to C, e.g. Lapack and other programs. Also note that with strided slices one needs to allocate and make a copy before calling Lapack (and others), since they require contiguous data.

@sbinet

sbinet Aug 28, 2016

Member

wrt using unsafe: that's already the case for a slice and its array. you can't get back to the underlying array of a slice.
I see "my" nd-slice and its non-reshaped slice as the same kind of pair as a slice and its underlying array.

if/when a mechanism is devised to (safely) get the underlying array from a slice, I suppose it could be carried over to the nd-slice/slice pair.
but, to pass data to C, you need to use unsafe in some way, so...

@sbinet

sbinet Aug 28, 2016

Member

ah! one thing I forgot to mention in my nd-slice post: slicing an nd-slice while modifying its stride.
since an nd-slice's capacity can't be modified, we can use the 3-index slice as a way to specify the stride when extracting a sub-nd-slice:

v := reshape([*]int{1,2,3,4,5,6}, 2, 3) // a 2x3 nd-slice
// 1 2 3
// 4 5 6
u := v[:, 0:3:2] // a 2x2 nd-slice, taking one column every two
// 1 3
// 4 6
w := v[::, ::2] // == u
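A minimal Go sketch of what `v[:, 0:3:2]` would do under this scheme: select every second column by adjusting the offset, the column count and the column stride, without copying. `Mat` and `SliceCols` are hypothetical names. Note that this repurposes the third index as a step, whereas in today's Go the third index of a full slice expression sets the capacity.

```go
package main

// Mat is a strided 2-D view over an int slice:
// element (i, j) is at data[off+i*rs+j*cs].
type Mat struct {
	data       []int
	off        int
	rows, cols int
	rs, cs     int
}

func (m Mat) At(i, j int) int { return m.data[m.off+i*m.rs+j*m.cs] }

// SliceCols mimics the proposed v[:, lo:hi:step]: it keeps every
// step-th column in [lo, hi) by shifting the offset and multiplying
// the column stride, so the result shares storage with m.
func (m Mat) SliceCols(lo, hi, step int) Mat {
	if step <= 0 || lo < 0 || hi > m.cols || lo > hi {
		panic("SliceCols: bad bounds")
	}
	m.off += lo * m.cs
	m.cols = (hi - lo + step - 1) / step
	m.cs *= step
	return m
}
```

With `v` holding the 2x3 matrix `1 2 3 / 4 5 6` (row stride 3, column stride 1), `v.SliceCols(0, 3, 2)` is a 2x2 view reading `1 3 / 4 6`, matching `u` above.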
@sbinet

sbinet Sep 10, 2016

Member

looking at how tensorflow wraps its C API is interesting and a valuable data point, /me thinks:
https://github.com/tensorflow/tensorflow/blob/0e5d49d362a5ef72179e385d1f71ec29ab0392f6/tensorflow/go/tensor.go

gopherbot pushed a commit to golang/proposal that referenced this issue Nov 18, 2016

proposal: rename tables, update range, rename down-slicing, add unpack
This PR incorporates the larger structural changes suggested in the previous round of review. It does six things:
1) Renames tables to just slices.
2) Recasts down-slicing as just a specific form of indexing.
3) Changes the behavior of range to be much simpler, and to follow from the idea of indexing.
4) Change the behavior of reshape to allow data structures with different numbers of elements
5) Add the unpack language built-in
6) Reworks the discussion of the performance of the single struct representation
It also cleans up a number of minor language issues

Updates golang/go#6282.

Change-Id: Iabb905c089c6d41195ca3a3602157cd49af7acd1
Reviewed-on: https://go-review.googlesource.com/25180
Reviewed-by: Robert Griesemer <gri@golang.org>
@griesemer

griesemer Jan 18, 2017

Contributor

To make progress with this proposal with the goal to come to a decision eventually, here's an abbreviated summary of the discussion so far.

Primary goals

First of all, there are some overarching goals that this proposal attempts to achieve. To name a few (borrowing from #6282 (comment), and #6282 (comment)):

  • It should be straightforward to write typical numerical algorithms in Go in a natural way.
  • There should be a “standard” mechanism to represent multi-dimensional slices/matrices in Go (one standard way to represent matrices, for instance).
  • Such algorithms should be implementable in a reasonably efficient manner, with performance close to that of a typical C implementation.

Virtually everybody also seems to be in agreement with respect to indexing notation and memory layout:

  • Slice/vector/matrix elements should be accessed via the familiar indexing notation, suitably extended to multiple dimensions: v[i], m[i, j], t[i, j, k], etc.

  • A multi-dimensional slice or matrix must be laid out contiguously in memory, without pointers to sub-slices. Successive slice elements need not be adjacent in memory if they have a "stride" that is > 1 (where 1 is the size of a slice element).

Proposed design

@btracey, with input from the gonum team, has spent considerable effort coming up with a concrete design for multi-dimensional slices with the intent to address the goals of this proposal: https://github.com/golang/proposal/blob/master/design/6282-table-data.md .
Thank you, @btracey and the gonum team, for your significant effort!

The above design addresses many of the desired goals of this proposal and, as a significant plus, the proposed multi-dimensional slices are in many ways a natural extension of the one-dim. slices we already have in Go.

Problem areas

The single biggest issue with the proposal is that one-dim. slices don’t have a stride, precisely because multi-dim. slices gracefully degrade into Go’s existing slices. This problem has been pointed out as early as #6282 (comment), before a concrete design document was posted. The design document addresses this issue with various work-arounds.

As a concrete example, given a two-dim. slice m representing a matrix [,]float64, with the proposed design it is easy and efficient (no copy of matrix elements involved) to select a matrix row i as a sub-slice m[i, :] but it is impossible to select a matrix column j that way (m[:, j] is invalid). In other words, indexing is asymmetric, and the asymmetry will lead to special treatment in algorithms (columns must be explicitly copied element-wise).
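The asymmetry can be seen in a minimal Go sketch that stands in for a contiguous `[,]float64` using a flat row-major slice. `RowMajor`, `Row` and `Col` are hypothetical names: a row can be returned as an ordinary `[]float64` aliasing the matrix, but a plain slice has no stride, so a column has to be copied element by element.

```go
package main

// RowMajor is a contiguous r x c matrix: (i, j) is at data[i*c+j].
type RowMajor struct {
	r, c int
	data []float64
}

// Row is the analogue of m[i, :]: the elements are adjacent, so an
// ordinary (unstrided) slice can alias them with no copy.
func (m RowMajor) Row(i int) []float64 {
	return m.data[i*m.c : (i+1)*m.c : (i+1)*m.c]
}

// Col is what m[:, j] would have to become: the elements are c apart
// and a plain slice has no stride, so they are copied out.
func (m RowMajor) Col(j int) []float64 {
	out := make([]float64, m.r)
	for i := range out {
		out[i] = m.data[i*m.c+j]
	}
	return out
}
```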

To support various "reshaping" operations for multi-dim. slices, the design proposes operations such as reshape and unshape. @sbinet points out that the design doc has become a bit crowded in terms of additional predeclared functions ( #6282 (comment) ).

Alternatives

Several alternative proposals have been floated as well:

  • #6282 (comment) proposes that the std library simply define a standard Matrix format (and perhaps Vectors, etc.).

  • #6282 (comment) proposes a “shaped slice” type for Go which is similar to multi-dimensional slices but supports strides in each dimension, and consequently doesn’t gracefully degrade into a regular (unstrided) slice in the one-dim. case.

  • https://talks.golang.org/2016/prototype-your-design.pdf uses as its example the implementation of a rewriter that can automatically rewrite indexing expressions of the form a[i, j, k] and a[i, j, k] = x into method calls a.At(i, j, k) and a.Set(i, j, k, x), respectively. While this approach does not extend the language per se, it permits writing numerical algorithms using "nice" notation which is then automatically translated into regular Go. It also has the advantage of providing full control over the underlying implementation of multi-dim. slices. A complete prototype implementation can be found at https://github.com/griesemer/dotGo2016.
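A minimal Go sketch of the kind of type such a rewriter could target: with `At` and `Set` methods, `a[i, j, k]` becomes `a.At(i, j, k)` and `a[i, j, k] = x` becomes `a.Set(i, j, k, x)`. The `Tensor3` type, `NewTensor3`, and its row-major layout are assumptions for illustration, not the code from the dotGo2016 prototype.

```go
package main

// Tensor3 is a dense 3-D array stored row-major in data:
// element (i, j, k) lives at data[(i*n1+j)*n2+k].
type Tensor3 struct {
	n0, n1, n2 int
	data       []float64
}

// NewTensor3 allocates a zeroed n0 x n1 x n2 tensor.
func NewTensor3(n0, n1, n2 int) *Tensor3 {
	return &Tensor3{n0: n0, n1: n1, n2: n2, data: make([]float64, n0*n1*n2)}
}

// index maps (i, j, k) to a position in data, checking each bound
// separately so an out-of-range j or k cannot silently alias
// another element.
func (t *Tensor3) index(i, j, k int) int {
	if uint(i) >= uint(t.n0) || uint(j) >= uint(t.n1) || uint(k) >= uint(t.n2) {
		panic("Tensor3: index out of range")
	}
	return (i*t.n1+j)*t.n2 + k
}

// At is the target of the rewrite a[i, j, k].
func (t *Tensor3) At(i, j, k int) float64 { return t.data[t.index(i, j, k)] }

// Set is the target of the rewrite a[i, j, k] = x.
func (t *Tensor3) Set(i, j, k int, x float64) { t.data[t.index(i, j, k)] = x }
```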

Summary

The proposed design of multi-dim. slices is a natural extension of Go's existing one-dim. slices. From a language point of view, the design appears backward-compatible with Go 1; and it does address many goals of the proposal. That said, the asymmetry of indexing operations requires non-obvious work-arounds when implementing numerical algorithms, which runs counter to one of the primary goals ( #6282 (comment) ) of this proposal.

@btracey

btracey Jan 19, 2017

Contributor

I agree with the summary above. Two quick comments (especially for those thinking of making a new proposal):

  • I think any proposal that seeks to extend Go slices will have the same downsides as my proposal. The asymmetry is fundamental, but I also think it is difficult to reduce the scope of the changes without harming the benefits they bring. As a simple example, removing unshape and reshape makes it harder to interface multi-dim. slices with data streams and C code.
  • Another alternative suggestion is to modify Go to allow index operator methods. This would be similar to the talk by @griesemer, except an actual change to the Go spec, and not just a rewriter program.

Thanks to @griesemer for the significant effort invested in this issue.

@j6k4m8

j6k4m8 Jan 19, 2017

Thank you @griesemer — a really good summary of the challenges and benefits.

To elaborate on my 👍 to @btracey's comment above, I especially want to get behind index-operator methods: I suspect that moving index-operator syntax from rewriter-land to native Go would buy considerable power, and would allow libraries to handle more of the heavy-lifting when our implementation preferences diverge (e.g. axis-reordering operations can exist in a library, and needn't exist in native implementation).

@griesemer

griesemer Jan 24, 2017

Contributor

This proposal addresses a large portion of the originally stated goal: better Go support for numerical applications. However, as has also become quite clear, it falls short in other respects (asymmetry of the proposed solution, potential proliferation of builtins). Judging from the feedback received on this issue, there is no clear consensus that the shortcomings can be safely ignored.

Adding multi-dim. slices to the existing Go language would be a significant engineering effort. It seems unwise to make this effort with full knowledge of the proposal's inherent problems. Furthermore, exactly because the proposed solution ties so closely into the existing language, it would be nearly impossible to change or adjust the design down the road.

Thus, after repeated and careful consideration, we are going to decline this proposal.

That said, the discussions so far have been extremely helpful in delineating the problem domain, identifying key issues, and identifying possible alternative approaches.

We suggest that this discussion continue with the intent to come up with a new and improved design (possibly along the lines of one of the alternatives above) with the intent of having a blueprint for consideration for Go 2.

Again, thanks to everybody, and particularly @btracey, for your contributions and time spent on this proposal.

-gri, for @golang/proposal-review

@dm319

dm319 Jul 15, 2017

Time to revive this discussion? As a Go and R fan, I wonder: how does Fortran approach the problems seen here?

@btracey

btracey Jul 15, 2017

Contributor

This issue is closed (and the proposal declined) for good reason. If you're looking for similar functionality, please see the gonum packages (gonum.org)

@dm319

dm319 Jul 16, 2017

We suggest that this discussion continue with the intent to come up with a new and improved design (possibly along the lines of one of the alternatives above) with the intent of having a blueprint for consideration for Go 2.

-gri

Maybe I should have mentioned that my comment was prompted by the recent announcement about working towards Go 2 (https://blog.golang.org/toward-go2).

@SamWhited

SamWhited Jul 16, 2017

Member

Maybe I should have mentioned that my comment was prompted by the recent announcement about working towards Go 2

Consider filing an Experience Report with any specific issues you've run into with real code. We can't talk about proposals and solutions until we know what the actual underlying problems are.

@shelby3

shelby3 Feb 10, 2018

@griesemer wrote:

[…] but it is impossible to select a matrix column j that way (m[:, j] is invalid). In other words, indexing is asymmetric […]

I only had 5 minutes to learn about this issue by perusing this thread, so my very rushed thought is that perhaps the problem could in theory be solved with typeclasses.

Essentially, option "3. Struct type" is the most correct solution to the issue, presuming monomorphisation (inlining) and a smart enough optimizing compiler¹. And btw, it's the first solution that came to mind within 1 minute, before I saw it proposed, so why did it take 4 years? If we have typeclasses and operator overloading, then the SetAt noise is replaced with the [] as we desire. Also, a column slice of a matrix becomes another kind of struct with a different typeclass implementation. Tada!

Afaics, with typeclasses the entire thing can be handled with libraries, and so we stop burdening the native language with what can live in a library. Actually, might Go's interfaces as is be sufficient to implement the column slice as a specialized struct? [Edit: on further thought, no: Go's interfaces lack the concept of an associated type, which instances of the typeclass need in order to recursively declare the special struct for taking a column slice of a matrix.]

Remember that typeclass bounds at the call site select the correct interface automatically based on the input data type. Go's interfaces sort of do that too, but there are some differences and limitations.

Apologies if in my haste I missed some points that cause my post to be noise. Please courteously correct me if so.

P.S. Does this imply that Go has lacked matrix sub-slicing for 5 years because the language lacks sufficient support for higher-level abstractions? Then again, I presume that combining higher-level abstractions with low-level control over performance is a difficult design challenge, maybe even insurmountable in general.

¹ Perhaps this, along with higher-level abstractions, is one of the reasons OCaml is favored by hedge funds?

@Jonconradt

Jonconradt Aug 7, 2018

It appears that this was not submitted as an Experience Report. I wonder why not.

@sbinet

sbinet Aug 7, 2018

Member

Probably because this happened before that procedure existed.
