Proposal: Go2: Vector Basic Type - Similar to []T but with enhancements #35307

Closed
scott-boyce opened this issue Nov 1, 2019 · 25 comments

@scott-boyce scott-boyce commented Nov 1, 2019

Background

Forgive me if I don't do this right; it's actually my first post on GitHub, let alone a feature development proposal. I primarily develop in Fortran 95-2008, with Python for pre/post-processing of data and glue code. I have been experimenting with Go for the past three months and think the language can have some powerful extensions beyond server-side programming.

Proposal

Add a "vect" basic type that is analogous to standard slices but is optimized for vector math, contiguous memory, and pass by reference, and that compiles with CPU SIMD instructions or additional optimizations. In addition, there could be compiler flags to customize instructions for specific OSes or CPUs (e.g., AVX2), which would make things easier for the developer, who would not have to dive into assembly code.

Potential type names could be

vecfloat32   or   float32vec
vecfloat64   or   float64vec 
vecint32     or   int32vec
vecint64     or   int64vec 

Using the vec types would require make to initialize the base vector, which would point to an underlying array guaranteed to be in contiguous memory; once made, the vector can only point to that underlying array or nil (nothing else). Operations that break the contiguity or the constant dimension, such as append, should be disabled.

A vector size is set at run-time, but cannot be dynamically changed once established.

For example:

     x := make(float64vec, 32)
     y := make(float64vec, 32)
  ivec := make(int64vec,  100) 
  jvec := make(int64vec,  100) 

This would create x and y as float64 vector-arrays and ivec and jvec as int64 vector-arrays. If possible, it might be good to make implicit padding part of the basic type (such as rounding the allocation for the 100 elements up to the best size for the cache, while still limiting the usable size to 100).

Like a slice, a vec would point to an underlying array, but there would be additional restrictions:
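The fixed-size rule can be approximated in today's Go without a language change. The following is a minimal sketch, assuming a hypothetical `Float64Vec` wrapper type (the name and methods are illustrative only and are not part of the proposal or of any library). Because the backing slice is unexported, code outside the defining package cannot append to it or re-slice it.

```go
package main

import "fmt"

// Float64Vec is a hypothetical fixed-length vector. The backing slice is
// unexported, so code outside the defining package cannot append to it.
type Float64Vec struct {
	data []float64
}

// NewFloat64Vec allocates a vector whose length is chosen at run time
// and never changes afterwards.
func NewFloat64Vec(n int) Float64Vec {
	return Float64Vec{data: make([]float64, n)}
}

// Len reports the fixed number of elements.
func (v Float64Vec) Len() int { return len(v.data) }

// At returns element i; an out-of-range index panics, as with a slice.
func (v Float64Vec) At(i int) float64 { return v.data[i] }

// Set stores x at index i.
func (v Float64Vec) Set(i int, x float64) { v.data[i] = x }

func main() {
	x := NewFloat64Vec(32)
	x.Set(3, 1.5)
	fmt.Println(x.Len(), x.At(3))
}
```

Copying a `Float64Vec` copies only the slice header, so both copies share the same storage, which roughly matches the "pass by reference" behavior asked for here.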

  1. A vec that has not been allocated with make() is of nil type and is not usable (like a map)

  2. No dimension changing; after creating with make() cannot change len/cap

  3. Underlying array is always in contiguous memory

  4. Vec passed by reference, loops do not create copy variables (see later example) only pointer reference

  5. The life of a vec variable is nil to start and then once allocated with make() cannot be associated with any other memory location unless set to nil first.

    • No automatic repointing of a non-pointer vec type
    • A vec's memory is freed when it goes out of scope or set to nil.
    • A vec can be set to nil and then subsequently reallocated with make() to change its size, but it should be thought of as being a new variable (makes it simpler for name reuse).
      • If a vec is set to nil, then all pointers (eg *float32vec) that point to it are set to nil and no longer associated with vec.
  6. Pointers can only point to contiguous portions of allocated vectors

  7. Pointer versions cannot be allocated with make()

  8. Pointers may only point to

    • nil
    • a vec of the same type
      • If vec is allocated, then points to contiguous memory it points to
      • If vec is nil, then pointer follows vec
        -if vec is allocated with make(vec, len), then pointer will point to newly allocated memory along with vec
    • Another pointer, but must be a contiguous portion of it or nil
      • If p1 = &p2 and then later p2 => &vec, then by association p1 => &vec
  9. Vec pointers are automatically dereferenced when a dimension is specified.

    • p1[3] = 5 is syntactic sugar for (*p1)[3] = 5
    • Consequently: *p1 == p1[:]
      -With no dimension, it functions just like a pointer that requires an address
  10. Pointers must point to an existing dimension or panic (in these examples p1 and p2 are *vecint32 and vec is vecint32)

    • p1 = &p2 is ok if p2 = nil or p2 => &vec

    • p1 = &p2[2:6] is ok only if p2 != nil and p2 => &vec and vec spans at least 6 elements

      • dereferencing is not necessary as p2 should just return the address at vec[2]
      • An example: vec := make(vecint32,10); p2 => &vec; p1 => &p2[2:6]
  11. Pointers may point to a sub-dimension of a pointer that points to a sub-dimension of a vec as long as the memory exists
    • For example:
      • vec := make(vecint32, 20) // vec contains 20 elements
      • p2 => &vec[2:] // p2 points to vec[2]
      • p1 => &p2[4:8] // p1 points to p2[4], which points to vec[6]
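For comparison, ordinary Go slices already behave like the nested views in the last rule. A minimal sketch using plain `[]int32` in place of the proposed `vecint32` (the helper name `views` is hypothetical):

```go
package main

import "fmt"

// views returns two nested views of vec that share its memory:
// p2 starts at vec[2], and p1 starts at p2[4], which is vec[6].
func views(vec []int32) (p2, p1 []int32) {
	p2 = vec[2:]
	p1 = p2[4:8]
	return p2, p1
}

func main() {
	vec := make([]int32, 20) // vec contains 20 elements
	p2, p1 := views(vec)

	p1[0] = 100 // writes through to p2[4] and vec[6]
	fmt.Println(vec[6], p2[4], len(p1))
}
```

A slice expression whose bounds exceed the capacity of the underlying array panics at run time, which corresponds to the "existing dimension or panic" rule.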

Vector Operations

The advantage of a vector type is that it allows vector operations (element-wise operations and setting values).

For example:

     x := make(float64vec, 32)
     y := make(float64vec, 32)
     z := make(float64vec, 32) 
     
     z = x + y
     z = x - y
     z = x * y
     z = x / y
     z = x % y

Scalar operations that use the same base type are applied to the entire vector.

For example:

     x := make(float64vec, 32)
     y := make(float64vec, 32)
     
     x = 1.0           //all of x is set to 1
     y = x[3] + 10  //scalar value sets entire y vector

If a vector is combined with a scalar of the same type, then the scalar is applied to the entire vector.
For example:

     x := make(float64vec, 32)
     y := make(float64vec, 32)
     var z float64 = 15 
     
     x = 1.0           //all of x is set to 1
     y = x * z         // Set matching elements in y to the product of z to x
     x*=z              // Multiply all elements in x by z
     y = x + y + z  // vector addition for x and y, then add z to all elements       

The advantage is that these operations may lend themselves to faster vector processing by the compiler.
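In current Go these operations have to be written as explicit loops. A minimal sketch of what `z = x + y`, `x = 1.0`, and `x *= z` would expand to (the helper names `addVec`, `fill`, and `scale` are hypothetical, and whether the compiler emits SIMD instructions for these loops is not guaranteed):

```go
package main

import "fmt"

// addVec stores x[i] + y[i] in z[i]. All three slices must have the same length.
func addVec(z, x, y []float64) {
	if len(x) != len(z) || len(y) != len(z) {
		panic("addVec: length mismatch")
	}
	for i := range z {
		z[i] = x[i] + y[i]
	}
}

// fill sets every element of x to s (the proposal's "x = 1.0").
func fill(x []float64, s float64) {
	for i := range x {
		x[i] = s
	}
}

// scale multiplies every element of x by s (the proposal's "x *= z").
func scale(x []float64, s float64) {
	for i := range x {
		x[i] *= s
	}
}

func main() {
	x := make([]float64, 32)
	y := make([]float64, 32)
	fill(x, 1.0)
	scale(x, 15)
	addVec(y, x, y) // y = x + y, element by element
	fmt.Println(y[0], y[31])
}
```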

Looping and Reference Values

Looping with range should be syntactic sugar for referencing an index.

For example:

     x := make(float64vec, 32)
      
     for i,v := range x{
      v = i
     }     
     
// is equivalent to:
      
     for i := range x{
      x[i] = i
     }  

Again, the main goal of this is to take advantage of looping over contiguous memory for long vectors of numbers.
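For context, this differs from how range works on slices today: the loop variable is a copy of each element, so assigning to it does not change the slice. A minimal sketch of the current behavior (function names are hypothetical):

```go
package main

import "fmt"

// fillByValue assigns to the loop variable, which is a copy of each
// element, so x is left unchanged.
func fillByValue(x []float64) {
	for i, v := range x {
		v = float64(i)
		_ = v // keeps the compiler from rejecting v as unused
	}
}

// fillByIndex writes through the index, so x is modified in place.
func fillByIndex(x []float64) {
	for i := range x {
		x[i] = float64(i)
	}
}

func main() {
	x := make([]float64, 4)
	fillByValue(x)
	fmt.Println(x) // still all zeros
	fillByIndex(x)
	fmt.Println(x)
}
```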

Possible multi-dimension extensions

While I would not advocate this, it does open the possibility of creating an alias to a vec type that creates pointers to a vector to be referenced by multiple dimensions.

Not sure how this would be done, but one possibility is to set one of the dimensions like how an array is declared.

For example:

     x := make([8]float64vec, 24)
     
     // or
     //x := make(float64vec, dim1, dim2, ...) //dimX is the size of dimension X and total size is prod(dimX)
     
     x := make(float64vec, 3, 8)  // row or column major can be discussed at a later time

This would create a vector in contiguous memory holding 24 float64 values (3 * 8) that can be referenced in groupings of 8.

Such as:

x[1,3] would be syntactic sugar for x[1*8 + 3 - 1] or simply x[10]

Under the hood it would just be a contiguous memory vector, but the multi-index would open the doors to many numerical applications and potential compiler optimizations.
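A multi-index view over one contiguous slice can be sketched in current Go as follows. This assumes zero-based indices and row-major order (the proposal leaves the ordering open), so in a 3 x 8 layout the pair (1, 3) maps to flat offset 1*8 + 3 = 11. The `Matrix` type and its methods are hypothetical names used only for illustration.

```go
package main

import "fmt"

// Matrix is a hypothetical row-major 2-D view over one contiguous slice.
type Matrix struct {
	rows, cols int
	data       []float64
}

// NewMatrix allocates rows*cols elements in a single contiguous block.
func NewMatrix(rows, cols int) *Matrix {
	return &Matrix{rows: rows, cols: cols, data: make([]float64, rows*cols)}
}

// offset maps zero-based (i, j) to an index into data.
func (m *Matrix) offset(i, j int) int {
	if i < 0 || i >= m.rows || j < 0 || j >= m.cols {
		panic("Matrix: index out of range")
	}
	return i*m.cols + j
}

// At returns the element at row i, column j.
func (m *Matrix) At(i, j int) float64 { return m.data[m.offset(i, j)] }

// Set stores v at row i, column j.
func (m *Matrix) Set(i, j int, v float64) { m.data[m.offset(i, j)] = v }

func main() {
	x := NewMatrix(3, 8)
	x.Set(1, 3, 7.5)
	fmt.Println(len(x.data), x.At(1, 3))
}
```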

Hopefully this is a useful suggestion that would open the door for Go applications in numerics, as well as adding faster vector math for its current applications.

@gopherbot gopherbot added this to the Proposal milestone Nov 1, 2019
@gopherbot gopherbot added the Proposal label Nov 1, 2019
Contributor

@ianlancetaylor ianlancetaylor commented Nov 1, 2019

The long list of restrictions is pretty complicated and not very Go like. In Go we aim for orthogonality of language features. I particularly note "If a vec is set to nil, then all pointers (eg *float32vec) that point to it are set to nil and no longer associated with vec"; I don't see how that can be implemented.

The usual problem with general purpose vector types is that different processors implement different kinds of support. That makes it hard to use general purpose types and be confident that they will work efficiently on all processors. So if you care about portability you wind up writing processor-specific code anyhow. And if you are going to write processor-specific code, it's not obvious why the language needs to support it. For example, we could instead provide processor-specific versions of the vector intrinsic functions used in C (e.g., https://gcc.gnu.org/onlinedocs/gcc-9.2.0/gcc/x86-Built-in-Functions.html#x86-Built-in-Functions).

Author

@scott-boyce scott-boyce commented Nov 1, 2019

I am not a compiler developer, so those restrictions are more suggestions to keep common array pitfalls from occurring in numerical models (what I frequently see in debugging).

Even in Fortran, you are supposed to nullify all pointers (as NULLIFY(p) or P=>NULL()) once their referenced array is deallocated. Fortran is good about stack cleanup, so pointers are auto-nullified and arrays auto-deallocated once they go out of a function's scope.

I can see how the inferred pointer nil'ing would be a problem. I was more hopeful, as that would improve the robustness of code, but requiring an explicit nil or auto-nil'ing when they go out of scope could work.

For example in Fortran

  REAL, DIMENSION(:), ALLOCATABLE,  TARGET:: VEC
  REAL, DIMENSION(:),  CONTIGUOUS, POINTER:: PNT
  REAL, DIMENSION(:),  CONTIGUOUS, POINTER:: SUBPNT
  !
  ALLOCATE(VEC(8))
  PNT    => NULL()
  SUBPNT => NULL()
  !
  DO CONCURRENT (I=1:SIZE(VEC));  VEC(I) = I
  END DO
  PNT => VEC
  SUBPNT => PNT(3:6)
  !
  WRITE(*,'(/*(F3.1, :2x))') VEC     ! 1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0
  WRITE(*,'( *(F3.1, :2x))') PNT     ! 1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0
  WRITE(*,'( *(F3.1, :2x))') SUBPNT  ! 3.0  4.0  5.0  6.0
  !
  SUBPNT = 5.0
  !
  WRITE(*,'(/*(F3.1, :2x))') VEC     ! 1.0  2.0  5.0  5.0  5.0  5.0  7.0  8.0
  WRITE(*,'( *(F3.1, :2x))') PNT     ! 1.0  2.0  5.0  5.0  5.0  5.0  7.0  8.0
  WRITE(*,'( *(F3.1, :2x))') SUBPNT  ! 5.0  5.0  5.0  5.0
  !
  PNT => NULL()
  !
  WRITE(*,'(/*(F3.1, :2x))') VEC     ! 1.0  2.0  5.0  5.0  5.0  5.0  7.0  8.0
  !WRITE(*,'( *(F3.1, :2x))') PNT    ! Raises Error
  WRITE(*,'( *(F3.1, :2x))') SUBPNT  ! 5.0  5.0  5.0  5.0
  !
  DEALLOCATE(VEC)                   !Explicit deallocate -- not part of GO, but used for this example
  !
  !WRITE(*,'(*(F3.1, :2x))') VEC    ! Raises Error 
  !WRITE(*,'(*(F3.1, :2x))') PNT    ! Raises Error
  WRITE(*,'(/*(F3.1, :2x))') SUBPNT  ! Pointer holds onto memory location and prints gibberish

In terms of implementation, it would have to be pushed onto the compiler to optimize the code based on the architecture that it is on. This is what is done in Fortran: you write the code, and the compiler translates it for the appropriate OS/CPU and optimizes the hell out of it. It's why code written in the 1980s in Fortran is still usable today and did not have the hangups that C did when machines went to 64-bit. In fact, some code I compile into my projects was written in the early 90s; one program that I can compile, but don't use, solves the determinant and was written in the 1960s!

The Fortran compiler knows that the arrays are contiguous in memory and imposes additional restrictions on the pointers such that the binary is optimized for the target OS/CPU.

For example, my current project is a regional surface and groundwater simulation software platform composed of ~300,000 lines of Fortran that compiles on Windows and Linux on Intel/AMD64 CPUs. The compiler then auto-vectorizes any array manipulations (the DO CONCURRENT, or loops similar to the examples I gave earlier).

It was something I noticed that would extend the utility of Go by including some sort of vector support. The one thing I don't like about Python is that everyone just turns to C for real array operations (hence why numpy is faster than Python, because it's really just a C wrapper on top of Fortran).

Another option is to develop bindings for fgo, which allows Go to pass slices to Fortran subroutines. From what I read about recommendations about CGO, I would not advocate that. (Personally I would love it, but it sounds like the benefits are lost in translation.)

Contributor

@randall77 randall77 commented Nov 1, 2019

Add a "vect" basic type that is analogous to standard slices but is optimized for vector math, contiguous memory, and pass by reference, and that compiles with CPU SIMD instructions or additional optimizations.

Slices provide contiguous memory and pass by reference already.

Have you looked at GoNum? It provides a lot of what you are describing as far as optimizations go.

I think that just leaves the operator overloading.

@maj-o maj-o commented Nov 4, 2019

The benefit of this proposal is the operators. Go lacks the ability to define operators (not overloading!). GoNum is no help. Using methods makes the mathematical expressions unreadable, untestable and at the end wrong, because methods have no precedence.
Why can I add strings but not decimals or vectors using the plus operator?
The inner data may be a slice or whatever. Each developer could solve this on their own. This is not the problem.
But declaring indexing or operators is not possible. This drives many solutions to other programming languages.

Contributor

@randall77 randall77 commented Nov 4, 2019

So could this proposal just be operator overloading on slice types? Do we need a separate vector type?

b := make([]float64, 100)
c := make([]float64, 100)
b *= 5.0 // multiply every element of b by 5
x := b + c // vector addition

Author

@scott-boyce scott-boyce commented Nov 4, 2019

It's more that creating a "vec" type also clues in the Go compiler to make additional optimizations at the sacrifice of flexibility.

The compiler can then offer flags for additional CPU optimizations, such as SSE/AVX/etc.

In Fortran, there is automatic vectorization, so when you compile you specify the instructions it should try to use (or the minimum CPU it should support), and it looks for contiguous blocks and vector math and adds the instructions for the CPU (such as AVX2) into the assembly.

For example, the Fortran code I posted earlier would automatically be compiled to AVX2 instructions.

Contributor

@ianlancetaylor ianlancetaylor commented Nov 4, 2019

We should certainly consider adding vectorization to the gc compiler. Note that you can already get vectorization with Go by using gccgo or GoLLVM.

Adding new types to the language is a much higher bar.

Author

@scott-boyce scott-boyce commented Nov 4, 2019

We should certainly consider adding vectorization to the gc compiler. Note that you can already get vectorization with Go by using gccgo or GoLLVM.

Adding new types to the language is a much higher bar.

Does this only apply to the C code developed for Go, or to the Go code itself? I saw the LLVM work, and put on my todo list to figure out their Fortran LLVM (mostly curious if it can compete with the private compilers, like Intel).

Member

@thanm thanm commented Nov 4, 2019

@maj-o maj-o commented Nov 4, 2019

@thanm I've tried to read the disassemblies, but sorry, my Z80 and K68 times are long ago.
As I understand it, the compiler already does a good job optimizing additions in a loop. Am I right?

Member

@thanm thanm commented Nov 4, 2019

@maj-o the intent of my post was just to show that the gccgo backend is using the same sorts of vector instructions that you would expect to see for a comparable C/C++ example compiled with "gcc -O3".

Member

@smasher164 smasher164 commented Nov 5, 2019

In practice, depending on auto-vectorization becomes a game of tug-of-war with the optimizer to generate the assembly one wants, across future changes to one's code. Most code that auto-vectorization takes care of is usually manually vectorized anyways, so I would prefer to see manual vectorization surfaced in the language/standard-library (possibly via intrinsics) as opposed to assembly.

However, I think that dynamic vectors introduce loads of complexity to the optimizer as to what happens across architectures. Even with packed SIMD intrinsics, the fallbacks and API surface area get unwieldy, like with Vec128 (see https://godoc.org/github.com/smasher164/simd). I would be more open to first introducing fixed-width intrinsics.

@deanveloper deanveloper commented Nov 6, 2019

Note that an alternative that would solve many of these issues is with a more general approach of considering operator functions: #27605

It may be a good idea to look for a more generic language feature rather than having tons of built-in types that all have their own operators defined for them by the language.

Author

@scott-boyce scott-boyce commented Nov 6, 2019

While operator overloading would be really nice, it's more about setting up a datatype with limitations that aid the compiler in determining optimization and auto-vectorization.

I probably would not recommend making tons of vec versions, but rather 5 of them, maybe one each for byte, int32, int64, float32, and float64. Another option is to make them an alias to the slice versions of the same name, but one that imposes more restrictions.

Directives are nice to some degree, but I know Go is pretty anti-directive. While I think it would be better to impose a consistent protocol for the life of a variable, there could be some benefit for the programmer to say,
"//go:AVX var s"
or
"//go:VecOpt var s" then pass a compiler flag for the VecOpt to use (AVX/SSE/etc).

On a side note, this is Fortran operator overloading, shown in a clipped version of my Fortran library that mimics Python's datetime (the bottom set of GENERIC bindings is the operator overloading):

  TYPE DATE_OPERATOR
     CHARACTER(10):: DATE = NO_DATE
     INTEGER:: YEAR  = NINER              ! NINER = -999
     INTEGER:: MONTH = NINER
     INTEGER:: DAY   = NINER
     INTEGER:: JDN   = NINER              !Julian day number
     REAL(REAL64):: FRAC  = DZ            !Fraction of 24 hour day
     REAL(REAL64):: DYEAR = DZ  
     LOGICAL:: MONTH_DAY = FALSE
     !
     CONTAINS
     !
     GENERIC            :: INIT         => INITIALIZE_DATE_OPERATOR_STR,        & !(DATE_STR, [FRAC], [LEAP], [FOUND_DATE], [ONLY_DYEAR], [TIME_SPACE])
                                           INITIALIZE_DATE_OPERATOR_DMY,        & !(DAY, MONTH, YEAR, [FRAC])
                                           INITIALIZE_DATE_OPERATOR_DMYHMS,     & !(DAY, MONTH, YEAR, HOUR, MIN, SEC) HMS int
                                           INITIALIZE_DATE_OPERATOR_DMYHMSdbl,  & !(DAY, MONTH, YEAR, HOUR, MIN, SEC) S is dble
                                           INITIALIZE_DATE_OPERATOR_JY,         & !(JDN, YEAR, [FRAC])
                                           INITIALIZE_DATE_OPERATOR_JYdbl,      & !(JDN, YEAR,       )  JDN is dbl
                                           INITIALIZE_DATE_OPERATOR_DYEAR_DBLE, & !(DYEAR)
                                           INITIALIZE_DATE_OPERATOR_DYEAR_SNGL, & !(DYEAR)
                                           DESTROY_DATE_OPERATOR
     PROCEDURE, PASS(DT):: NOW          => INITIALIZE_DATE_OPERATOR_CURRENT    !()
     GENERIC            :: INTERPOLATE  => DATE_OPERATOR_INTERPOLATE_DBLE, DATE_OPERATOR_INTERPOLATE_SNGL
     !
     GENERIC            :: ADD_DAY      => ADD_DAY_INT,   ADD_DAY_DBLE  !(DAY,  FRAC, LEAP)
     GENERIC            :: ADD_MONTH    => ADD_MONTH_INT, ADD_MONTH_DBL !(MON,  LEAP)
     GENERIC            :: ADD_YEAR     => ADD_YEAR_INT,  ADD_YEAR_DBL  !(YEAR, LEAP)
     GENERIC            :: ADD_SEC      => ADD_SEC_INT, ADD_SEC_DBLE    !(SEC,  LEAP)
     GENERIC            :: ADD_MIN      => ADD_MIN_INT, ADD_MIN_DBLE    !(MIN,  LEAP)
     GENERIC            :: ADD_HOUR     => ADD_HOUR_INT, ADD_HOUR_DBLE  !(HOUR, LEAP)
     !
     PROCEDURE, PASS(DT):: STR          => DATE_OPERATOR_STRING_REPRESENTATION
     PROCEDURE, PASS(DT):: STR_ELAPSED  => DATE_OPERATOR_PRINT_DIF !(DT, DT2, [UNIT])  UNIT => 0 = all units, 1 = sec, 2 = min, 3 = hour, 4 = day, 5 = largest Unit
     PROCEDURE, PASS(DT):: PRETTYPRINT  => DATE_OPERATOR_PRETTYPRINT
     !
     GENERIC            :: READ(FORMATTED)  => DATE_OPERATOR_FMTREAD   ! Fortran User-Defined Derived-Type I/O
     GENERIC            :: READ(UNFORMATTED)=> DATE_OPERATOR_BINREAD
     GENERIC            :: WRITE(FORMATTED) => DATE_OPERATOR_FMTWRITE
     GENERIC            :: OPERATOR(+)      => DATE_OPERATOR_ADD_INT, DATE_OPERATOR_ADD_DBLE, DATE_OPERATOR_ADD_SNGL
     GENERIC            :: OPERATOR(-)      => DATE_OPERATOR_SUB_INT, DATE_OPERATOR_SUB_DBLE, DATE_OPERATOR_SUB_SNGL, DATE_OPERATOR_SUB
     GENERIC            :: ASSIGNMENT(=)    => COPY_DATE_OPERATOR, STR_TO_DATE_OPERATOR
     GENERIC            :: OPERATOR(==)     => DATE_OPERATOR_EQUALITY
     GENERIC            :: OPERATOR(<)      => DATE_OPERATOR_LESS_THAN
     GENERIC            :: OPERATOR(<=)     => DATE_OPERATOR_LESS_THAN_EQUAL
     GENERIC            :: OPERATOR(>)      => DATE_OPERATOR_GREATER_THAN
     GENERIC            :: OPERATOR(>=)     => DATE_OPERATOR_GREATER_THAN_EQUAL
     !
  END TYPE

Contributor

@ianlancetaylor ianlancetaylor commented Nov 19, 2019

This proposal has a long list of rules which I think are intended to force the use of vector instructions. But those rules are complicated, and seem hard to remember, and don't seem particularly Go like.

It might be simpler to just permit some binary operations on slice and array types. That would make it easier for the compiler to vectorize those operations when possible. Though it would have other drawbacks, as the time required for the operation would depend on the length of the slice.

If the desire is to be sure that operations use vector instructions, then I don't think this is the right approach. We need something simpler.

Author

@scott-boyce scott-boyce commented Nov 20, 2019

I am open to suggestions; this was more to start a discussion thread about ways to make Go more suitable for numerical simulations without having to co-compile with C/Fortran (like what is done with Python to make it able to compete with Fortran).

It would be great to have the language have some level of support for fast vector math or fast looping through a vector or multi-dimensional array.

I was not sure if the slice operations could be modified. There could be something added to the make() function for slices that indicates that the slice should be treated as a vectorizable array. Another option also is to have make() use the capacity as a minimum and to pick the optimal size for vectorization or the type of vectorization requested.

So something like

x := make([]float64, 10, 16, []string{"AVX", "SIMD"})

would use the best requested instruction.

A better option is to just have an on/off flag that must be used in tandem with a compiler flag (here I just use the word fast to indicate the array should use fast systemics).

x := make([]float64, 10, 16, "fast")

then when compiled it would by default ignore the "fast"

but could have something like go -fast:avx myCode.go

which would tell the compiler to use AVX for any array tagged with fast (it may also increase the capacity to make the operations work better).

If it's necessary to disable the optimization, loops could have a macro directive as I mentioned earlier:
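The capacity padding part of this idea can be expressed with the existing two-argument form of make. A minimal sketch, assuming a hypothetical `makePadded` helper and a lane count chosen by the caller (for example, 4 float64 values fit in one 256-bit AVX register); nothing here makes the compiler emit vector instructions:

```go
package main

import "fmt"

// roundUp returns the smallest multiple of lanes that is >= n.
// lanes must be positive.
func roundUp(n, lanes int) int {
	return (n + lanes - 1) / lanes * lanes
}

// makePadded returns a slice of length n whose capacity is rounded up to a
// whole number of lanes, so the tail of the allocation is addressable.
func makePadded(n, lanes int) []float64 {
	return make([]float64, n, roundUp(n, lanes))
}

func main() {
	x := makePadded(10, 4) // length stays 10, capacity becomes 12
	fmt.Println(len(x), cap(x))
}
```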

"//go:AVX var s"
or
"//go:VecOpt var s" then pass a compiler flag for the VecOpt to use (AVX/SSE/etc).

but could be something more like:
//go:fast
to indicate the use of any vectorization within the loop

@IanTayler IanTayler commented Nov 21, 2019

The question, for me, although it might not belong to this issue, is whether it makes sense to add something like a vector stdlib package with functions like func Addf64(x, y []float64), probably modifying x in place and implemented in a way that makes it most likely to be vectorized (potentially using compiler directives/whatever). Generics might make something like that more organic, if they allow us to have a single function for all numeric slice types.

That's not going to take advantage of every possible SIMD optimization, but in my opinion anything more complex than that belongs in a package like gonum. And if that's not possible/good enough then I'm happy doing numerical programming in julia/python/something else and using golang for what I'm using it now.
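With the type parameters that later shipped in Go 1.18, a single in-place function for several numeric slice types can be written roughly as below. This is only a sketch of the idea: the `Number` constraint and the `Add` function are hypothetical names, not an existing standard library API, and the loop is ordinary Go that the gc compiler may or may not vectorize.

```go
package main

import "fmt"

// Number lists the element types this sketch accepts.
type Number interface {
	~int32 | ~int64 | ~float32 | ~float64
}

// Add stores x[i] + y[i] into x[i], modifying x in place.
// It panics if the lengths differ.
func Add[T Number](x, y []T) {
	if len(x) != len(y) {
		panic("Add: length mismatch")
	}
	for i := range x {
		x[i] += y[i]
	}
}

func main() {
	x := []float64{1, 2, 3}
	y := []float64{10, 20, 30}
	Add(x, y)
	fmt.Println(x)
}
```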

As a side note, I'm not a big fan of operator overloading in general, more so if there's no way of making sure certain properties hold (e.g. + defines something reasonably close to an abelian group barring floating-point approximation and weirdnesses).

I've seen it go wrong even in very simple cases like overloading of operators for atomic values in C++, where it makes it much harder to detect at a simple glance when an operation is guaranteed to be atomic and when it isn't, compared to using fetch_add, etc.

Contributor

@ianlancetaylor ianlancetaylor commented Nov 22, 2019

@scott-boyce Adding options to make is not simple. I want to stress that we are unlikely to make the language more complex so that it can support vectorized loops. We're happy to make Go code run as fast as possible, by making the compiler and runtime complex if necessary. We're not interested in making the language more complex in order to make it run faster.

@IanTayler's suggestion of a vector package is something that could more easily be done. The vector package would provide a comprehensible API, and anybody who chooses to use it could do so. The language would not change.

Author

@scott-boyce scott-boyce commented Nov 25, 2019

@ianlancetaylor That works for me as well. Most people don't use the directives in Fortran, but rather rely on compiler flags for the compiler target to do the optimizations (it would be my preference to have it that way; I just was unsure if the Go compiler required some sort of flag scheme in the code to help it).

Member

@smasher164 smasher164 commented Nov 26, 2019

Regarding a standard vectorization package, it is worth looking at prior art in the form of the WebAssembly SIMD Proposal, which itself is inspired by Dart's SIMD Numeric Types.

They both acknowledge that a 128-bit lane width is enough for most people. However, a standard vectorization package isn't feasible either without some form of generics or assistance from the language.

For example, instead of defining an ExtractLaneXX(V128, int) XX function for every fixed-width builtin (int8, int16, int32, int64, uint8, uint16, uint32, uint64, float32, float64), a simpler API should parameterize these functions over the vector and element type. Given the current contracts draft, that would look like

package simd

contract V128(V, T) {
    V [4]float32, [2]float64, [16]int8, [8]int16, [4]int32, [2]int64, [16]uint8, [8]uint16, [4]uint32, [2]uint64
    T int8, int16, int32, int64, uint8, uint16, uint32, uint64, float32, float64
}

func ExtractLane(type V, T V128)(vec V, i int) T { ... }

Usage would look like:

vec := [4]float32{1,2,3,4}
simd.ExtractLane([4]float32, int16)(vec, 2)

Aside: It would be nice to be able to omit the type argument for [4]float32, since it could be deduced from vec, but I don't think the draft allows something like simd.ExtractLane(T:int16)(vec, 2).

These generic functions would define scalar fallbacks in the package, but would need the compiler's assistance to be intrinsified (like math/bits) into SSE, NEON, MSA, Altivec, VIS across different GOARCHs. Runtime feature detection would likely slow down these operations unless the instructions can be scheduled to batch them together. Alternatively, we could introduce an environment variable to disregard feature detection at build-time.

Perhaps this issue can be repurposed into a proposal for a vector package?

Contributor

@ianlancetaylor ianlancetaylor commented Nov 26, 2019

I recommend that we use a different issue for a different vector proposal.

Based on the discussion above, and the complexity of the proposal, this particular proposal is a likely decline. Leaving open for four weeks for final comments.

@maj-o maj-o commented Dec 10, 2019

I think it is worth thinking about a solid and simple base for numerical types in Go (without breaking anything).
For me this seems to be a more general issue. It's not just "removing complex numbers" (#19921) or SIMD (or even MIMD) optimized vector types or arbitrary precision decimals or the pure existence of the math package or ...
Please, talk about how all this could lead to a solid and simple solution.

Don't get me wrong, i love go.

Contributor

@pbarker pbarker commented Dec 28, 2019

There's an incredible opportunity for Go in this space. The current go-to numerical languages (Python/R/Julia) are all flawed in ways that make them painful to write. Bringing all the things Go has done right into numerical computing would have a big impact.

I agree that operator overloading can go very wrong after writing Scala for a couple years. Which means they probably need to be a native type as proposed with the vector package. Having pragmas that can support different hardware accelerators would be incredible.

Contributor

@kortschak kortschak commented Jan 7, 2020

Using methods makes the mathematical expressions unreadable, untestable and at the end wrong, because methods have no precedence.

This is a strong and evidentially empty statement.

Contributor

@ianlancetaylor ianlancetaylor commented Jan 7, 2020

We agree that there is an opportunity here, but we don't know what it looks like. Also, this is an area that is likely to be affected by generics, which are in progress. For this specific proposal, there is no change in consensus. Closing.
