need to decide on how to support FLOAT*16 for Fortran. #5

iains · 2020-08-30T15:14:14Z

Unfortunately, the current arm64 iOS/macOS long double is 64 bit - we need to figure out some reasonable way to give users access to a larger version.

Immediate options:

* Support IEEE754-128 somehow (soft/hard as possible). This might be a bit painful performance-wise, TBD.
* If performance is an issue, there's always "ibm128 - double double".

fxcoudert · 2020-08-30T16:09:04Z

From the Fortran point of view: if we have access to binary32 and binary64 floating-point types, not having larger types is not a show-stopper, because:

* There are Fortran codes out there that use 80-bit extended-precision floating-point types, but they are not portable anyway, and compilers have traditionally not been consistent w.r.t. access to such a type.
* Support for binary128 is nice (I am not going to say otherwise, as I was among those who introduced libquadmath and Fortran support for binary128), but… here again, codes using it are the exception, rather than the norm, because not all compilers support it anyway.

fxcoudert · 2020-08-30T21:52:00Z

Here is the current situation confirmed with a build of 6363793:

* there are 5 integer (and logical) kinds, which I believe is correct: 8, 16, 32, 64 and 128-bit integer types
* there are only 2 real (and complex) kinds: 32 and 64-bit floating-point types

Both the front-end and libgfortran correctly detect and build for those kinds. From my point of view, this is standard-conforming under the Fortran standard.

I note that we could in theory expose _Float16 as a Fortran real and complex kind, but we don't do it on other ARM targets and I do not think there is a real need. And we would need to have math functions for it.

rhdtownsend · 2020-12-20T20:46:20Z

> From the Fortran point of view: if we have access to binary32 and binary64 floating-point types, not having larger types is not a show-stopper, because:
>
> * There are Fortran codes out there that use 80-bit extended-precision floating-point types, but they are not portable anyway, and compilers have traditionally not been consistent w.r.t. access to such a type.
> * Support for binary128 is nice (I am not going to say otherwise, as I was among those who introduced libquadmath and Fortran support for binary128), but… here again, codes using it are the exception, rather than the norm, because not all compilers support it anyway.

Given that gcc on x86_64/linux and aarch64/linux supports 128-bit floats, I think it would be very important to offer them on aarch64/darwin. Plus, there are a couple of big codes that I use/develop that rely on quad precision at certain points in the execution path.

iains · 2020-12-21T09:02:32Z

(As a writer of scientific and signal processing code) I am inclined to agree; there are times when intermediates for a double calculation are easier to handle as quads than with some numerical gymnastics.

(As a toolchain developer) I have to point out that this is not just something that requires us to write a bit of library code and/or support in the compiler.

Adding a different-size type requires ABI changes, C++ mangling, C++ library support and (in some way) libc support [think, printf]. GCC's Fortran is part of a suite of compilers, that share much common infrastructure.

While we could adopt the [128b long double] ABI as per AAPCS (the aarch64 'official' ABI) .. in fact, Darwin/macOS has a different ABI (darwinpcs) that specifies 64b long double. It would be prudent, at least, to discuss the way forward with the Apple folks.

stephentyrone · 2021-03-26T15:56:28Z

I'll talk to some folks within Apple and see if we can add an explicit C ABI for _Float128 in the darwinpcs. Speaking personally, double-double is something that no language should foist on users unless they explicitly say they want it. Binding it to FLOAT*16 would be quite unfortunate.

iains · 2021-03-26T16:13:41Z

(some of) the Apple folks are already aware of the interest.

what would be great is to minimise any additional divergence between the AAPCS and darwinpcs - every such difference is a maintenance burden (my guess is that's equally true for those responsible for maintaining the aarch64 LLVM backend).

At least in terms of CC and type mangling etc. that's very highly desirable (of course, if we already have [64] libc++ symbols mangled as 'long double' then we will be obliged to use a different mangling for __float128 / _Float128, however it is eventually spelt).

The transition from mlong-double-64 => mlong-double-128 was handled once before at the OS level with libc symbols with $LDBL128 appended. That worked (and continues to work) well on powerpc-darwin (the fact that the underlying implementation of the 128b version is double-double there is academic to the overall presentation of an API).

iains · 2021-03-26T16:17:30Z

note that if one did use double-double as a stop-gap, it would be mangled __ibm128 for compatibility - so would not actually tread on the toes of a final solution.

It might be instructive to look at how fortran handles the two possible 128 long doubles for powerpc ...

(there are more pressing code-gen issues on my agenda before this one tho, sadly).

fxcoudert · 2021-03-26T17:54:13Z

Given that aarch64-linux gcc supports TFmode (which it calls long double, as I understand), we should ideally use the same ABI as that (except in our case it would be called __float128).

Nobody wants double-double/ibm128.

utf · 2021-11-05T13:40:56Z

Hi all. Are there any updates on this issue?

iains · 2021-11-05T13:48:38Z

no, the priority at the moment is on getting the base port ready for inclusion in gcc-12
We would not be able to do this "unilaterally" anyway - the ABI owner is Apple, so we might make a suggestion (or offer a design) - but they would have to initiate the changes, and I'd imagine that would require a clang implementation as well.

utf · 2021-11-05T13:50:48Z

Ok, thanks for the update. All your work on this port is very much appreciated.

jeffhammond · 2021-12-06T09:32:01Z

Is it possible to add a configure option to tell GCC to use int128 as the ABI for binary128? What doesn't work in that scenario?

fxcoudert · 2021-12-06T10:00:50Z

> Is it possible to add a configure option to tell GCC to use int128 as the ABI for binary128? What doesn't work in that scenario?

I think the limitation is not “what do we choose” but “how do we make sure the choices made are consistent”. In a way, Apple “owns” this ABI and we probably cannot decide without them.

jeffhammond · 2021-12-06T10:07:33Z

Given that Apple doesn't have a Fortran compiler, I don't see how GCC could be inconsistent with their REAL*16 ABI. I also think 15 months is more than enough opportunity for Apple to have contributed here, and their failure to contribute means that the open-source community should be able to solve the problem themselves. Perhaps seeing the ecosystem move on without them will inspire Apple to do something here. It's not like they lack the resources.

iains · 2021-12-06T10:07:39Z

> In a way, Apple “owns” this ABI and we probably cannot decide without them.

I think this is the critical factor, if there's a demand for 128b float (I would expect that to come at some point), then raising radars with Apple is a good starting point (since the absence affects C-family as well and we would need a coherent change to allow c-bindings to work)
For my 0.02GBP it would be sad if the chosen ABI diverged more from the AAPCS than we already have, so engaging in the discussion with the Apple ABI folks will also be on my agenda.
It also needs support from libSystem (libc) for printing etc. and a sensible migration path (e.g. as for the case when PPC migrated from long-double-64 => long-double-128 - we have to attach some discriminator to the builtins and libc calls).

jeffhammond · 2021-12-06T10:45:54Z

Can we at least converge on the idea that REAL*16 must be equivalent to REAL(kind=REAL128) and this must be binary128, ie that neither long double nor double double is relevant? That means we can connect directly to C/C++ float128.

I think we're mostly here but I figure it's good to be unambiguous.

iains · 2021-12-06T11:00:57Z

" C/C++ float128 " is defined where and with what ABI?

The version of the darwinpcs I am working from has no 128b float (C long double is 64 bits).

Iff the port transitions to having a 128b float type, then I would expect C long double to be using that ( at least selectively with -mlong-double-128 as is done elsewhere ) .. at some point one would also expect that to become the default and then -mlong-double-64 would be provided as a legacy option.

This is the kind of mechanism that has been used in the past.

It seems that no-one is interested in using double-double, so that is a moot point (although as can be seen with the current PPC64 work, there are strategies for supporting both that and IEEE128 in the same toolchain)

jeffhammond · 2021-12-06T11:24:51Z

I'm not saying Apple has a C/C++ float128 ABI, I'm merely saying that mapping REAL*16 to that simplifies the problem and gives Apple more of a reason to care.

I don't think long double is worth thinking about. It's a garbage feature of C that exists to expose x87 80-bit floats, and is beyond useless in any other context, because it's not even a platform stable ABI (Linux yes, but not always).

fxcoudert · 2021-12-06T11:27:33Z

My own take on this:

* No-one wants double-double. Even on PPC64, where this exists, the transition to IEEE128 is painful.
* IEEE128 exists, it's there, it's become a real standard, let's use it.
* libSystem support is something I would not expect, and that is not relevant for Fortran: we provide our own libraries for printing and math (libquadmath).
* On Intel, there was never any system support for __float128, and people still used it :) so having it on ARM would allow us to reach ARM/Intel feature parity, at least.

> Iff the port transitions to having a 128b float type, then I would expect C long double to be using that ( at least selectively with -mlong-double-128 as is done elsewhere )

I disagree with that. On most targets, C long double is equal to double, at least with default options. We should keep it that way, as there may be code that expects it. In fact, I think we have to keep double = long double != __float128, to retain compatibility with ABI and system clang, which treats long double as the same size as double:

$ cat u.c
#include <math.h>
#include <stdio.h>

int main (void) {
  long double x = 2;
  printf("%20.17Lg\n", sqrtl(x));
  printf("%zu\n", sizeof x);
}
$ clang u.c -W -Wall && ./a.out 
  1.4142135623730951
8

Edit, to add clarification on that sqrtl example above: also, I don't think we should or can switch long double to something other than Apple's choice (long double = double), because then the library functions would not match anymore, and that's a nightmare (it happened on powerpc-darwin at some point, and created a lot of trouble).

iains · 2021-12-06T12:43:28Z

Well, I understand that your focus is on Fortran - but some of us transitioned away from Fortran around the same time the VAX 11/785 was retired (not a coincidence; sometimes one has no choice in an employer's decisions). We carried on writing numerical analysis, signal processing and simulation code (just now in C) .. if Fortran users have a requirement for 128b float, then so do c-family users ;)

.. clearly, most of the use of aarch64 darwin has been under iOS and therefore has not seen any such desires(?) - but I'd think to move forward, the platform would have to adopt a 128b float type and agree on an ABI before we could plug it into Fortran.

That would mean that clang and the relevant parts of the system would be updated to deal with this; FWIW (despite that you recall pain associated) the approach used for PPC is solid and functional to this day (completely independent of what the 128bits represent).

iains · 2021-12-06T12:49:44Z

clarification: none of what I've said implies switching from Apple's choice - rather, my thesis is that we have to adhere to Apple's choice - if they elect not to adopt a 128b float (officially) then we might think again - but in the first instance, the right solution is to lobby for an official 128b float type on the platform.

I do not think it significant that no Apple comments have been made on this thread (there is no reason for anyone from Apple to read it).

jeffhammond · 2021-12-06T13:05:19Z

> Well, I understand that your focus is on Fortran

It's my focus here because it's in the title of the issue, and I came here as a result of not being able to build Fortran applications on my Apple M1 laptop.

jeffhammond · 2021-12-06T13:08:50Z

> I do not think it significant that no Apple comments have been made on this thread (there is no reason for anyone from Apple to read it).

That's not correct (#5 (comment)), although I know Steve isn't speaking with institutional authority here.

iains · 2021-12-06T13:54:41Z

I fear there is an amount of "talking past" going on here...

* Fortran does not exist independently of the rest of the system (even if it is the only thing one uses).
* If we unilaterally adopt some 128b float format, and invent our own ABI (or use that of AAPCS), that might not be what is eventually chosen by Apple - ergo the first step should be to lobby for an official 128b format.
* My preference is strongly for using the same as specified in AAPCS (and with the same ABI) - that means there is least for us to maintain that is "different" for Darwin/Mach-O.
* Speaking purely for myself (in c-family), it is a nuisance when long double does not offer any more precision than double; quite a lot of code assumes the contrary .. but I guess in the end if it's spelt __float128 that's just mechanical editing.

jeffhammond · 2021-12-06T15:00:57Z

> If we unilaterally adopt some 128b float format, and invent our own ABI (or use that of AAPCS), that might not be what is eventually chosen by Apple - ergo the first step should be to lobby for an official 128b format.

If Apple declines to respond, does that mean we can never fix this issue for users? I contend that Apple has had their chance to respond, and they have not, so GCC should do the right thing for its users and solve the problem using the best available option. I agree with your arguments that the best non-Apple solution is AAPCS.

I also believe that the only reasonable choice here is REAL*16 = REAL(kind=REAL128) = __float128 = binary128, all with 16 bytes of storage and 16-byte alignment, so I don't see a huge risk of picking this without Apple's involvement. Do we think they are going to choose something else? What other reasonable options are there?

> speaking purely for myself (in c-family), it is a nuisance when long double does not offer any more precision than double; quite a lot of code assumes the contrary .. but I guess in the end if it's spelt __float128 that's just mechanical editing.

On the long double tangent, if I want something well-defined that's wider than binary64, I use binary128, via __float128. I've never seen any evidence for long double to exist except to support x87.

stephentyrone · 2021-12-06T15:26:00Z

> I also believe that the only reasonable choice here is REAL*16 = REAL(kind=REAL128) = __float128 = binary128, all with 16 bytes of storage and 16-byte alignment, so I don't see a huge risk of picking this without Apple's involvement.

This is correct. Furthermore, Tim Northover correctly observed in an email correspondence:

> The [Darwin] PCS is pretty much covered by falling back to the ARM AAPCS for anything that's not explicitly mentioned as a difference. The relevant parts are worded in terms of "quad precision floating-point type" rather than "long double".
>
> Similarly, the Itanium ABI provides a mangling for __float128 (it's 'g'). If and when support gets added for C++ it's very unlikely we'd diverge from that.

So there's no need to "invent our own ABI (or use that of AAPCS)"; since the Darwin PCS document does not say anything about quad, the behavior is defined by AAPCS.

fxcoudert · 2021-12-06T15:46:33Z

> it is a nuisance when long double does not offer any more precision than double

@iains unless I misunderstand something, I think that ship has sailed, anyway. Apple's ABI (https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms) explicitly states:

> The long double type is a double precision IEEE754 binary floating-point type, which makes it identical to the double type.

iains · 2021-12-06T15:58:54Z

which is what we implement .. (and does not make any difference to me as a toolchain maintainer)

the comment was from me as a "user" of c-family : I do not see long double as an X87 thing, but then most of my DSP and numerical analysis coding was/is not on X86 platforms, it's still a nuisance to have to edit the source to use a different type for the extra precision. Anyway, we can agree to differ on this ..

Is there a current working port of libquadmath?
the priority for me is to get the main port in (so "patches welcome" for this for now).

jeffhammond · 2021-12-06T16:08:18Z

I don't see why the long double ABI matters here. REAL128 is not long double on most systems. It's certainly not on x86 or ARM ones.

If Apple has no ABI for binary128 then we fall back on AAPCS.

fxcoudert · 2021-12-06T16:08:30Z

> Is there a current working port of libquadmath?

The libquadmath in GCC is designed to provide quad-precision math and I/O routines on targets for which __float128 is implemented (and is an IEEE binary128 floating-point type).

> the priority for me is to get the main port in (so "patches welcome" for this for now).

I'll try to come up with something.

stephentyrone · 2021-12-07T15:17:43Z

I have discussed this with the relevant folks, and the following is an official statement:

* long double will continue to bind double (IEEE binary64) in the Darwin ABI for ARM64.
* If we were to add IEEE binary128 support for C or C++ in Apple's toolchains, it would follow the AAPCS layout and calling conventions for quad-precision floating-point, and the Itanium name mangling.

We're going to look into where it makes the most sense to document this other than a random post on a GitHub issue. I'll add a reference to that once it's available.

iains · 2021-12-07T15:23:57Z

thanks very much for the info!

we are currently looking into implementing __float128 (with some success, mostly down to FX), which automatically enables C's _Float128 in GCC

this is currently following the existing aarch64 backend ABI (which is AAPCS64).

It might be necessary to document how we deal with variadic calls, since there are significant differences between darwinpcs and AAPCS64 there.

stephentyrone · 2021-12-07T15:39:20Z

Ok. If you need further clarification on variadic conventions, please let us know.

iains · 2021-12-08T13:54:27Z

I think we should be in a position to close this issue now (technically, the decision is made, and practically the latest sources support __float128 in c-family and IEEE QP in libgfortran).

I am sure that there are wrinkles - but those can be dealt with separately.

> Ok. If you need further clarification on variadic conventions, please let us know.

My intention is that the technical content of the README.md will be added to the GCC port directory as documentation for how the Darwin sub-port differs from AAPCS64.

so yes, there might be questions (ISTR that when comparing the two ABIs there were a few things I had to make assumptions about) - but that can be moved to a different issue or topic.
We (GCC-side) should also document (in the same place) the additions that we've made that are not covered by the darwinpcs.

Unless anyone else has questions or additions I will close this in 24h or so.

iains · 2021-12-15T16:16:33Z

we have an implementation on the branch now.

…imize or target pragmas [PR103012] The following testcases ICE when an optimize or target pragma is followed by a long line (4096+ chars). This is because on such long lines we can't use columns anymore, but the cpp_define calls performed by c_cpp_builtins_optimize_pragma or from the backend hooks for target pragma are done on temporary buffers and expect to get columns from whatever line they appear on (which happens to be the long line after optimize/target pragma), and we run into: #0 fancy_abort (file=0x3abec67 "../../libcpp/line-map.c", line=502, function=0x3abecfc "linemap_add") at ../../gcc/diagnostic.c:1986 #1 0x0000000002e7c335 in linemap_add (set=0x7ffff7fca000, reason=LC_RENAME, sysp=0, to_file=0x41287a0 "pr103012.i", to_line=3) at ../../libcpp/line-map.c:502 #2 0x0000000002e7cc24 in linemap_line_start (set=0x7ffff7fca000, to_line=3, max_column_hint=128) at ../../libcpp/line-map.c:827 #3 0x0000000002e7ce2b in linemap_position_for_column (set=0x7ffff7fca000, to_column=1) at ../../libcpp/line-map.c:898 #4 0x0000000002e771f9 in _cpp_lex_direct (pfile=0x40c3b60) at ../../libcpp/lex.c:3592 #5 0x0000000002e76c3e in _cpp_lex_token (pfile=0x40c3b60) at ../../libcpp/lex.c:3394 #6 0x0000000002e610ef in lex_macro_node (pfile=0x40c3b60, is_def_or_undef=true) at ../../libcpp/directives.c:601 #7 0x0000000002e61226 in do_define (pfile=0x40c3b60) at ../../libcpp/directives.c:639 #8 0x0000000002e610b2 in run_directive (pfile=0x40c3b60, dir_no=0, buf=0x7fffffffd430 "__OPTIMIZE__ 1\n", count=14) at ../../libcpp/directives.c:589 #9 0x0000000002e650c1 in cpp_define (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2513 #10 0x0000000002e65100 in cpp_define_unused (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2522 #11 0x0000000000f50685 in c_cpp_builtins_optimize_pragma (pfile=0x40c3b60, prev_tree=<optimization_node 0x7fffea042000>, cur_tree=<optimization_node 0x7fffea042020>) at ../../gcc/c-family/c-cppbuiltin.c:600 
assertion that LC_RENAME doesn't happen first. I think the right fix is emit those predefined macros upon optimize/target pragmas with BUILTINS_LOCATION, like we already do for those macros at the start of the TU, they don't appear in columns of the next line after it. Another possibility would be to force them at the location of the pragma. 2021-12-30 Jakub Jelinek <jakub@redhat.com> PR c++/103012 gcc/ * config/i386/i386-c.c (ix86_pragma_target_parse): Perform cpp_define/cpp_undef calls with forced token locations BUILTINS_LOCATION. * config/arm/arm-c.c (arm_pragma_target_parse): Likewise. * config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Likewise. * config/s390/s390-c.c (s390_pragma_target_parse): Likewise. gcc/c-family/ * c-cppbuiltin.c (c_cpp_builtins_optimize_pragma): Perform cpp_define_unused/cpp_undef calls with forced token locations BUILTINS_LOCATION. gcc/testsuite/ PR c++/103012 * g++.dg/cpp/pr103012.C: New test. * g++.target/i386/pr103012.C: New test.

…04617] On #define A(n) int foo1##n(void) { return 1##n; } #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9) #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9) #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9) #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9) E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325) B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642) testcase with ./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto -O0 -o foo1.o foo1.c -ffunction-sections ./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o /tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized (testcase too slow to be included into testsuite). The problem is clearly reported by readelf: readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321 readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321 readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323 readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section. readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section. readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section. because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE inclusive. 
Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be used instead and .symtab_shndx section should contain the real section index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >= SHN_LORESERVE value is needed it should put those into Shdr[0].sh_{size,link}. But, sh_{link,info} are 32-bit fields which can contain any section index. Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before 2011) used to mishandle the > 63.75K sections case and assumed there is a hole in between the sections, but what simple_object_elf_copy_lto_debug_sections does wouldn't help in that case for the debug temp object creation, we'd need to detect the case also in that routine and take it into account in the remapping etc. I think it is not worth it given that it is over 10 years, if somebody needs 63.75K or more sections, better use more recent binutils. 2022-02-22 Jakub Jelinek <jakub@redhat.com> PR lto/104617 * simple-object-elf.c (simple_object_elf_match): Fix up URL in comment. (simple_object_elf_copy_lto_debug_sections): Remap sh_info and sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE range (inclusive).

Here we weren't respecting SFINAE when evaluating a call to a consteval function, which caused us to reject the new testcase below. This patch fixes this by making build_over_call use the SFINAE-friendly version of cxx_constant_value.

This change causes us to no longer diagnose ahead of time a couple of non-constant non-dependent consteval calls in consteval-if2.C with -fchecking=2. These errors were apparently coming from the call to fold_non_dependent_expr in build_non_dependent_expr (for the RHS of the +=) despite complain=tf_none being passed. Now that build_over_call respects the value of complain during constant evaluation of a consteval call, the errors are gone.

That the errors are also gone without -fchecking=2 is a regression caused by r12-7264-gc19f317a78c0e4 and is the subject of PR104620. As described in comment #5, I think it's basically an accident that we were diagnosing these two calls correctly before r12-7264, so perhaps we can live without these errors for GCC 12. Thus this patch just XFAILs the two tests.

PR c++/104620

gcc/cp/ChangeLog:

* call.cc (build_over_call): Use cxx_constant_value_sfinae instead of cxx_constant_value to evaluate a consteval call.
* constexpr.cc (cxx_constant_value_sfinae): Add decl parameter and pass it to cxx_eval_outermost_constant_expr.
* cp-tree.h (cxx_constant_value_sfinae): Add decl parameter.
* pt.cc (fold_targs_r): Pass NULL_TREE as decl parameter to cxx_constant_value_sfinae.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/consteval-if2.C: XFAIL two dg-error tests where the argument to the non-constant non-dependent consteval call is wrapped by NON_DEPENDENT_EXPR.
* g++.dg/cpp2a/consteval30.C: New test.

I noticed that for member class templates of a class template we were unnecessarily substituting both the template and its type. Avoiding that duplication speeds compilation of this silly testcase from ~12s to ~9s on my laptop. It's unlikely to make a difference on any real code, but the simplification is also nice.

We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation of the template class, but it makes more sense to do that in tsubst_template_decl anyway.

#define NC(X) \
  template <class U> struct X##1; \
  template <class U> struct X##2; \
  template <class U> struct X##3; \
  template <class U> struct X##4; \
  template <class U> struct X##5; \
  template <class U> struct X##6;
#define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
#define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
template <int I> struct A
{
  NC3(am)
};
template <class...Ts> void sink(Ts...);
template <int...Is> void g()
{
  sink(A<Is>()...);
}
template <int I> void f()
{
  g<__integer_pack(I)...>();
}
int main()
{
  f<1000>();
}

gcc/cp/ChangeLog:

* pt.cc (instantiate_class_template): Skip the RECORD_TYPE of a class template.
(tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.

Consider

constexpr int VAL = 1;
struct foo {
  template <int B>
  void bar(typename std::conditional<B==VAL, int, float>::type arg) { }
};
template void foo::bar<1>(int arg);

where we since r11-291 fail to emit the code for the explicit instantiation. That's because cp_walk_subtrees/TYPENAME_TYPE now walks TYPE_CONTEXT ('conditional' here) as well, and in a template finds the B==VAL template argument. VAL is constexpr, which implies const, which in the global scope implies static. constrain_visibility_for_template then makes "struct conditional<(B == VAL), int, float>" non-TREE_PUBLIC. Then symtab_node::needed_p checks TREE_PUBLIC, sees it's 0, and we don't emit any code.

I thought the fix would be some ODR-esque check to not consider constexpr variables/fns that are used just for their value. But it turned out to be tricky. For instance, we can't skip determine_visibility in a template; we can't even skip it for value-dep expressions. For example, no-linkage-expr1.C has

using P = struct {}*;
template <int N>
void f(int(*)[((P)0, N)]) {}

where ((P)0, N) is value-dep, but N is not relevant here: we have to ferret out the anonymous type. When instantiating, it's already gone.

This patch uses decl_constant_var_p. This is to implement (an approximation) [basic.def.odr]#14.5.1 and [basic.def.odr]#5.2.

PR c++/110323

gcc/cp/ChangeLog:

* decl2.cc (min_vis_expr_r) <case VAR_DECL>: Do nothing for decl_constant_var_p VAR_DECLs.

gcc/testsuite/ChangeLog:

* g++.dg/template/explicit-instantiation6.C: New test.
* g++.dg/template/explicit-instantiation7.C: New test.

iains added the enhancement New feature or request label Sep 3, 2020

fxcoudert mentioned this issue Apr 7, 2021

compile GCC without quadmath on Apple Silicon to unbreak Fortran REAL128 Homebrew/homebrew-core#73949

Closed

wiremoons mentioned this issue Aug 21, 2021

Build error: Error: Invalid type-spec (./src/M_strings.f90:9602:34) urbanjost/M_strings#2

Open

Romendakil mentioned this issue Sep 17, 2021

stdlib build error with fpm (macOS Apple Silicon) fortran-lang/stdlib#527

Closed

iains closed this as completed Dec 15, 2021

mjuric mentioned this issue Apr 14, 2022

Can we avoid the use of REAL*16 type? oorb/oorb#146

Closed

davidchall mentioned this issue May 3, 2022

gfortran libquadmath unavailable on Apple Silicon davidchall/homebrew-hep#203

Closed

amontoison mentioned this issue Aug 19, 2024

Try the sifdecoder on Mac Silicon ralna/SIFDecode#25

Merged
