Add IBM System/390 support #291

shibatch · 2020-03-23T06:35:51Z

This patch adds IBM System/390 support.
Clang is not supported at this time because it does not seem to support the VX intrinsics properly.

Note that this is the first big-endian architecture that SLEEF supports.
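Byte order matters whenever code reinterprets a floating-point value as integer words. The following is a minimal sketch of that pitfall, not code taken from SLEEF; the function names are hypothetical and the example assumes IEEE 754 binary64 doubles. Shifting the 64-bit pattern is independent of byte order, whereas indexing the value as two 32-bit words in memory order is not.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a big-endian host (such as s390x), 0 on a little-endian one. */
static int host_is_big_endian(void) {
    const uint16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);      /* look at the byte stored first in memory */
    return first == 0x01;
}

/* Endian-independent: take the upper 32 bits of the 64-bit pattern. */
static uint32_t high_word_portable(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (uint32_t)(bits >> 32);
}

/* Endian-dependent: view the double as two 32-bit words in memory order.
   The word holding the sign and exponent is index 0 on big-endian hosts
   and index 1 on little-endian hosts. */
static uint32_t high_word_by_index(double d) {
    uint32_t words[2];
    memcpy(words, &d, sizeof words);
    return words[host_is_big_endian() ? 0 : 1];
}
```

Code that hard-codes index 1 works on x86 and AArch64 but silently reads the low word on a big-endian target.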

Configure.cmake

src/arch/helpers390x_128.h

seiko2plus · 2020-03-23T08:03:41Z

src/arch/helpers390x_128.h

+typedef VECTOR double vdouble;
+typedef VECTOR int vint;
+
+typedef VECTOR float vfloat;


Z13 doesn't support single-precision, but it can be emulated via two double-precision registers

I don't understand this.
It seems that both gcc and qemu support single-precision vector operations with the z13 option.
Are both of those buggy?

Ah, gcc unrolls each vector operation into scalar operations.
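For readers who want to reproduce this behaviour, here is a small sketch using the generic vector extension that GCC and Clang both provide. It is not taken from the patch; the type and function names are hypothetical. The same source compiles for any target, and the compiler is free to lower the addition to one SIMD instruction where the hardware has one or to a sequence of scalar operations where it does not.

```c
#include <string.h>

/* Four packed floats, declared with the GCC/Clang generic vector extension. */
typedef float v4f __attribute__((vector_size(16)));

/* Element-wise add. The compiler picks the lowering: a single vector
   instruction if the target has one, scalar code otherwise. */
static v4f v4f_add(v4f a, v4f b) {
    return a + b;
}

/* Array wrapper so callers do not need the vector type. */
static void add4(const float *x, const float *y, float *out) {
    v4f a, b, r;
    memcpy(&a, x, sizeof a);
    memcpy(&b, y, sizeof b);
    r = v4f_add(a, b);
    memcpy(out, &r, sizeof r);
}
```

Assuming an s390x cross compiler is available, comparing the assembly generated with `-march=z13` and `-march=z14` should show whether the single-precision add was kept as a vector instruction or unrolled.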

The vector extension in both compilers (Clang and GCC) always emulates instructions that do not exist on the target, which may lead to bad performance. That is why I prefer to use the ZVECTOR and VSX intrinsic prototypes instead.
An example of how we should handle single-precision in Z13:

    // type
    #if CONFIG <= 131
    typedef struct { vdouble val[2]; } vfloat;
    #else
    typedef __vector float vfloat;
    #endif

    // load
    static INLINE vfloat vloadu_vf_p(const float *p) {
    #if CONFIG <= 131
      vfloat r;
      r.val[0] = vec_ld2f(p);     // load and convert
      r.val[1] = vec_ld2f(p + 2);
      return r;
    #else
      return vec_xl(0, p);
    #endif
    }

    // store
    static INLINE void vstoreu_v_p_vf(float *p, vfloat v) {
    #if CONFIG <= 131
      vec_st2f(v.val[0], p);      // convert and store
      vec_st2f(v.val[1], p + 2);
    #else
      vec_xst(v, 0, p);           // no return value: the function is void
    #endif
    }

    // Now emulate all operations via two double-precision vectors
    static INLINE vfloat vsqrt_vf_vf(vfloat vf) {
    #if CONFIG <= 131
      vf.val[0] = vec_sqrt(vf.val[0]);
      vf.val[1] = vec_sqrt(vf.val[1]);
      return vf;
    #else
      return vec_sqrt(vf);
    #endif
    }

This is emulation using double-precision operations.
SLEEF has functions that return bit-identical results across all platforms, and those functions cannot be implemented using this method. We need genuine single-precision operations.
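One way to see why emulation breaks bit-identical results is the scalar sketch below. It is an illustration only, not SLEEF code; the function names are hypothetical, and it assumes IEEE 754 binary32/binary64 arithmetic with round-to-nearest-even. When values stay in double precision between operations, the intermediate rounding to float that real single-precision hardware performs is skipped, so the final float can differ.

```c
/* Genuine single precision: the product is rounded to float before the add. */
static float mla_float(float a, float b, float c) {
    /* volatile forces the product to be rounded and stored as a float,
       and keeps the compiler from fusing the multiply and add. */
    volatile float product = a * b;
    return product + c;
}

/* Emulation: multiply and add in double, round to float only at the end. */
static float mla_via_double(float a, float b, float c) {
    double wide = (double)a * (double)b + (double)c;
    return (float)wide;
}
```

With `a = b = 0x1.001p0f` and `c = 0x1p-24f`, the float path rounds the product down on a tie and then rounds down again after the add, giving `0x1.002p0f`. The double path carries the extra bit through and returns `0x1.002002p0f`, one unit in the last place higher.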
I don't know how widely Z13 computers are currently deployed, but is single-precision support for Z13 so important?

The Z13 mainframe launched in 2015. I am not sure whether it is still widely used, but most of the LinuxONE instances provided through IBM Cloud are Z14, so I guess we can drop Z13 support for now and focus only on Z14/Z15.

edelsohn · 2020-03-23T16:56:40Z

z13 has special instructions to load and store a single precision floating point pair as a double precision floating point pair, and then use normal double precision operations. z14 adds full single precision support.

edelsohn · 2020-03-25T01:38:44Z

The Linux Community Cloud systems now are z15.

shibatch · 2020-03-26T23:48:34Z

@seiko2plus @edelsohn

I now wonder whether it is worth implementing the single-precision functions for ZVECTOR1 by emulating them with double-precision vector computation.
There are two obstacles to implementing the functions that way.

1. Data types for vfloat will be different between ZVECTOR1 and ZVECTOR2. This will make the code messy.
2. We cannot implement the deterministic version of the functions. We need true single-precision computation for this.

The importance of such functions is uncertain. I don't know how widely Z13 computers are being used, and it is hard to imagine users who need such functions only on Z13 computers.

The reason that I implemented ZVECTOR1 support is that QEMU 4.2.0 supports up to Z13 processors. So, testing is possible for ZVECTOR1 without real hardware. I also did not notice that the single precision vector operations on Z13 are emulated within the compiler.

There is another option, which is to drop ZVECTOR1 support. So, there are three options.

1. Continue implementing the single-precision functions by emulating them with double-precision vector computation.
2. Go with the current implementation.
3. Drop ZVECTOR1 support.

I would like to know how important ZVECTOR1 support is. The double-precision functions in the current implementation work normally, and we can simply state that the single-precision functions with ZVECTOR1 are supplementary. So I think the current implementation is satisfactory. What do you think?

edelsohn · 2020-03-27T00:56:54Z

The Linux Community Cloud systems all should be z15.
I think that you can safely ignore z13. z13 will not be important for PyTorch users.

seiko2plus · 2020-03-27T00:58:01Z

travis/before_install.s390x-gcc.sh

@@ -0,0 +1,2 @@
+#!/bin/bash
+set -ev


Could you please dump the auxiliary vector via LD_SHOW_AUXV=1 /bin/true to determine the ZARCH version?
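The same information can also be read programmatically. The sketch below is a rough illustration rather than part of the patch: the macro and function names are hypothetical, and the bit positions are redefined locally from the values the Linux kernel and glibc use for s390 `AT_HWCAP` (vector facility, vector enhancements 1 and 2), so they should be checked against the headers on the target system. The classification is a pure function so that it compiles and can be exercised on any host.

```c
#if defined(__linux__) && defined(__s390x__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#endif

/* Assumed s390 AT_HWCAP bits, redefined here so the sketch builds anywhere. */
#define S390_HWCAP_VX        (1UL << 11)  /* vector facility (z13) */
#define S390_HWCAP_VXE       (1UL << 13)  /* vector enhancements 1 (z14) */
#define S390_HWCAP_VXRS_EXT2 (1UL << 15)  /* vector enhancements 2 (z15) */

/* Maps a hwcap bitmask to the newest machine generation it implies:
   15, 14 or 13, or 0 when no vector facility is reported. */
static int s390_vector_level(unsigned long hwcap) {
    if (hwcap & S390_HWCAP_VXRS_EXT2) return 15;
    if (hwcap & S390_HWCAP_VXE)       return 14;
    if (hwcap & S390_HWCAP_VX)        return 13;
    return 0;
}

/* Queries the running system; returns 0 on anything that is not Linux/s390x. */
static int detect_s390_vector_level(void) {
#if defined(__linux__) && defined(__s390x__)
    return s390_vector_level(getauxval(AT_HWCAP));
#else
    return 0;
#endif
}
```

On a CI machine, printing the result of `detect_s390_vector_level()` at the start of the build would show which generation the tests actually run on.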

…ch/sleef into Add_s390x_support_rebased

shibatch · 2020-04-07T00:03:16Z

@seiko2plus @edelsohn
I have now removed Z13 support.
Please review the patch again.

seiko2plus

It looks good to me. It still needs several improvements similar to #288, but I will work on them later.

shibatch added 5 commits March 23, 2020 13:40

no message

23a6236

no message

bd44a91

no message

b1545b1

no message

db539e0

no message

6bd027a

shibatch requested review from seiko2plus, fpetrogalli and edelsohn March 23, 2020 06:35