need to decide on how to support FLOAT*16 for Fortran. #5

Closed
iains opened this issue Aug 30, 2020 · 35 comments
Labels
enhancement New feature or request

Comments

@iains
Owner

iains commented Aug 30, 2020

Unfortunately, the current arm64 iOS/macOS long double is 64 bit - we need to figure out some reasonable way to give users access to a larger version.

immediate options:

  • support IEEE754-128 somehow (soft/hard as possible). This might be a bit painful performance-wise, TBD.
  • if performance is an issue, there's always "ibm128 - double double".
@fxcoudert
Contributor

From the Fortran point of view: if we have access to binary32 and binary64 floating-point types, not having larger types is not a show-stopper, because:

  • There are Fortran codes out there that use 80-bit extended-precision floating-point types, but they are not portable anyway, and compilers have traditionally not been consistent w.r.t. access to such a type.
  • Support for binary128 is nice (I am not going to say otherwise, as I was among those who introduced libquadmath and Fortran support for binary128), but… here again, codes using it are the exception, rather than the norm — because not all compilers support it anyway.

@fxcoudert
Contributor

Here is the current situation confirmed with a build of 6363793:

  • there are 5 integer (and logical) kinds, which I believe is correct: 8, 16, 32, 64 and 128-bit integer types
  • there are only 2 real (and complex) kinds: 32 and 64-bit floating-point types

Both the front-end and libgfortran correctly detect and build for those kinds. From my point of view, this is standard-conforming under the Fortran standard.

I note that we could in theory expose _Float16 as a Fortran real and complex kind, but we don't do it on other ARM targets and I do not think there is a real need. And we would need to have math functions for it.

@iains iains added the enhancement New feature or request label Sep 3, 2020
@rhdtownsend

> From the Fortran point of view: if we have access to binary32 and binary64 floating-point types, not having larger types is not a show-stopper, because:
>
> * There are Fortran codes out there that use 80-bit extended-precision floating-point types, but they are not portable anyway, and compilers have traditionally not been consistent w.r.t. access to such a type.
> * Support for binary128 is nice (I am not going to say otherwise, as I was among those who introduced libquadmath and Fortran support for binary128), but… here again, codes using it are the exception, rather than the norm — because not all compilers support it anyway.

Given that gcc on x86_64/linux and aarch64/linux supports 128-bit floats, I think it would be very important to offer them on aarch64/darwin. Plus, there are a couple of big codes that I use/develop that rely on quad precision at certain points in the execution path.

@iains
Owner Author

iains commented Dec 21, 2020

(As a writer of scientific and signal-processing code) I am inclined to agree: there are times when the intermediates of a double calculation are easier to handle as quads than with numerical gymnastics.

(As a toolchain developer) I have to point out that this is not just something that requires us to write a bit of library code and/or support in the compiler.

Adding a different-size type requires ABI changes, C++ mangling, C++ library support and (in some way) libc support [think, printf]. GCC's Fortran is part of a suite of compilers, that share much common infrastructure.

While we could adopt the [128b long double] ABI as per AAPCS (the aarch64 'official' ABI) .. in fact, Darwin/macOS has a different ABI (darwinpcs) that specifies 64b long double. It would be prudent, at least, to discuss the way forward with the Apple folks.

iains pushed a commit that referenced this issue Feb 28, 2021
/home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77:25: runtime error: left shift of 0x0000000000000000fffffffffffffffb by 96 places cannot be represented in type '__int128'
    #0 0x7ffff754edfe in __ubsan::Value::getSIntValue() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77
    #1 0x7ffff7548719 in __ubsan::Value::isNegative() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.h:190
    #2 0x7ffff7542a34 in handleShiftOutOfBoundsImpl /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:338
    #3 0x7ffff75431b7 in __ubsan_handle_shift_out_of_bounds /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:370
    #4 0x40067f in main (/home/marxin/Programming/testcases/a.out+0x40067f)
    #5 0x7ffff72c8b24 in __libc_start_main (/lib64/libc.so.6+0x27b24)
    #6 0x4005bd in _start (/home/marxin/Programming/testcases/a.out+0x4005bd)

Differential Revision: https://reviews.llvm.org/D97263

Cherry-pick from 16ede0956cb1f4b692dfa619ccfa6ab1de28e19b.
@stephentyrone

stephentyrone commented Mar 26, 2021

I'll talk to some folks within Apple and see if we can add an explicit C ABI for _Float128 in the darwinpcs. Speaking personally, double-double is something that no language should foist on users unless they explicitly say they want it. Binding it to FLOAT*16 would be quite unfortunate.

@iains
Owner Author

iains commented Mar 26, 2021

(some of) the Apple folks are already aware of the interest.

what would be great is to minimise any additional divergence between the AAPCS and darwinpcs - every such difference is a maintenance burden (my guess is that's equally true for those responsible for maintaining the aarch64 LLVM backend).

At least in terms of CC and type mangling etc. that's very highly desirable (of course, if we already have [64] libc++ symbols mangled as 'long double' then we will be obliged to use a different mangling for __float128 / _Float128, however it is eventually spelt).

The transition from mlong-double-64 => mlong-double-128 was handled once before at the OS level with libc symbols with $LDBL128 appended. That worked (and continues to work) well on powerpc-darwin (the fact that the underlying implementation of the 128b version is double-double there is academic to the overall presentation of an API).

@iains
Owner Author

iains commented Mar 26, 2021

note that if one did use double-double as a stop-gap, it would be mangled __ibm128 for compatibility - so would not actually tread on the toes of a final solution.

It might be instructive to look at how fortran handles the two possible 128 long doubles for powerpc ...

(there are more pressing code-gen issues on my agenda before this one tho, sadly).

@fxcoudert
Contributor

Given that aarch64-linux gcc supports TFmode (which it calls long double, as I understand), we should ideally use the same ABI as that (except in our case it would be called __float128).

Nobody wants double-double/ibm128.

iains pushed a commit that referenced this issue Jun 18, 2021
gcc/ada/

	* raise-gcc.c (__gnat_others_value): Remove const qualifier.
	(__gnat_all_others_value): Likewise.
	(__gnat_unhandled_others_value): Likewise.
	(GNAT_OTHERS): Cast to Exception_Id instead of _Unwind_Ptr.
	(GNAT_ALL_OTHERS): Likewise.
	(GNAT_UNHANDLED_OTHERS): Likewise.
	(Is_Handled_By_Others): Change parameter type to Exception_Id.
	(Language_For): Likewise.
	(Foreign_Data_For): Likewise.
	(is_handled_by): Likewise.  Adjust throughout, remove redundant
	line and fix indentation.
	* libgnat/a-exexpr.adb (Is_Handled_By_Others): Remove pragma and
	useless qualification from parameter type.
	(Foreign_Data_For): Likewise.
	(Language_For): Likewise.
iains pushed a commit that referenced this issue Jun 18, 2021
The fixed error is:

==21166==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x60300000d900
    #0 0x7367d7 in operator delete(void*, unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:172
    #1 0x3b82e6e in pointer_equiv_analyzer::~pointer_equiv_analyzer() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:161
    #2 0x3b83387 in hybrid_folder::~hybrid_folder() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:517
    #3 0x3b83387 in execute_early_vrp /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:686
    #4 0x1790611 in execute_one_pass(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2567
    #5 0x1792003 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2656
    #6 0x1792029 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2657
    #7 0x179209f in execute_pass_list(function*, opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2667
    #8 0x178a5f3 in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:1773
    #9 0x1792fac in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/plugin.h:191
    #10 0x1792fac in execute_ipa_pass_list(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:3001
    #11 0xc525fc in ipa_passes /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2154
    #12 0xc525fc in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2289
    #13 0xc5a096 in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2269
    #14 0xc5a096 in symbol_table::finalize_compilation_unit() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2537
    #15 0x1a7a17c in compile_file /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:482
    #16 0x69c758 in do_compile /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2210
    #17 0x69c758 in toplev::main(int, char**) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2349
    #18 0x6a932a in main /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/main.c:39
    #19 0x7ffff7820b34 in __libc_start_main ../csu/libc-start.c:332
    #20 0x6aa5fd in _start (/home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/objdir/gcc/cc1+0x6aa5fd)

0x60300000d900 is located 0 bytes inside of 32-byte region [0x60300000d900,0x60300000d920)
allocated by thread T0 here:
    #0 0x735ab7 in operator new[](unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:102
    #1 0x3b82dac in pointer_equiv_analyzer::pointer_equiv_analyzer(gimple_ranger*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:156

gcc/ChangeLog:

	* gimple-ssa-evrp.c (pointer_equiv_analyzer::~pointer_equiv_analyzer): Use delete[].
iains pushed a commit that referenced this issue Jul 11, 2021
…atch.pd

All of the optimizations/transformations mentioned in bugzilla for
PR tree-optimization/40210 are already implemented in mainline GCC,
with one exception.  In comment #5, there's a suggestion that
(bswap64(x)>>56)&0xff can be implemented without the bswap as
(unsigned char)x, or equivalently x&0xff.

This patch implements the above optimization, and closely related
variants.  For any single bit, (bswap(X)>>C1)&1 can be simplified
to (X>>C2)&1, where bit position C2 is the appropriate permutation
of C1.  Similarly, the bswap can be eliminated if the desired set of
bits all lie within the same byte, hence (bswap(x)>>8)&255 can
always be optimized, as can (bswap(x)>>8)&123.

Previously,
int foo(long long x) {
  return (__builtin_bswap64(x) >> 56) & 0xff;
}

compiled with -O2 to
foo:	movq    %rdi, %rax
        bswap   %rax
        shrq    $56, %rax
        ret

with this patch, it now compiles to
foo:	movzbl  %dil, %eax
        ret

2021-07-08  Roger Sayle  <roger@nextmovesoftware.com>
	    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
	PR tree-optimization/40210
	* match.pd (bswap optimizations): Simplify (bswap(x)>>C1)&C2 as
	(x>>C3)&C2 when possible.  Simplify bswap(x)>>C1 as ((T)x)>>C2
	when possible.  Simplify bswap(x)&C1 as (x>>C2)&C1 when 0<=C1<=255.

gcc/testsuite/ChangeLog
	PR tree-optimization/40210
	* gcc.dg/builtin-bswap-13.c: New test.
	* gcc.dg/builtin-bswap-14.c: New test.
@utf

utf commented Nov 5, 2021

Hi all. Are there any updates on this issue?

@iains
Owner Author

iains commented Nov 5, 2021

no, the priority at the moment is on getting the base port ready for inclusion in gcc-12
We would not be able to do this "unilaterally" anyway - the ABI owner is Apple, so we might make a suggestion (or offer a design) - but they would have to initiate the changes, and I'd imagine that would require a clang implementation as well.

@utf

utf commented Nov 5, 2021

Ok, thanks for the update. All your work on this port is very much appreciated.

iains pushed a commit that referenced this issue Nov 28, 2021
Fixes:

==129444==ERROR: AddressSanitizer: global-buffer-overflow on address 0x00000666ca5c at pc 0x000000ef094b bp 0x7fffffff8180 sp 0x7fffffff8178
READ of size 4 at 0x00000666ca5c thread T0
    #0 0xef094a in parse_optimize_options ../../gcc/d/d-attribs.cc:855
    #1 0xef0d36 in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:916
    #2 0xef107e in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:887
    #3 0xff85b1 in decl_attributes(tree_node**, tree_node*, int, tree_node*) ../../gcc/attribs.c:829
    #4 0xef2a91 in apply_user_attributes(Dsymbol*, tree_node*) ../../gcc/d/d-attribs.cc:427
    #5 0xf7b7f3 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:1346
    #6 0xf87bc7 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:967
    #7 0xf87bc7 in DeclVisitor::visit(FuncDeclaration*) ../../gcc/d/decl.cc:808
    #8 0xf83db5 in DeclVisitor::build_dsymbol(Dsymbol*) ../../gcc/d/decl.cc:146

for the following test-case: gcc/testsuite/gdc.dg/attr_optimize1.d.

gcc/d/ChangeLog:

	* d-attribs.cc (parse_optimize_options): Check index before
	accessing cl_options.
@jeffhammond

Is it possible to add a configure option to tell GCC to use int128 as the ABI for binary128? What doesn't work in that scenario?

@fxcoudert
Contributor

> Is it possible to add a configure option to tell GCC to use int128 as the ABI for binary128? What doesn't work in that scenario?

I think the limitation is not “what do we choose” but “how do we make sure the choices made are consistent”. In a way, Apple “owns” this ABI and we probably cannot decide without them.

@jeffhammond

Given that Apple doesn't have a Fortran compiler, I don't see how GCC could be inconsistent with their REAL*16 ABI. I also think 15 months is more than enough opportunity for Apple to have contributed here, and their failure to contribute means that the open-source community should be able to solve the problem themselves. Perhaps seeing the ecosystem move on without them will inspire Apple to do something here. It's not like they lack the resources.

@iains
Owner Author

iains commented Dec 6, 2021

> In a way, Apple “owns” this ABI and we probably cannot decide without them.

  • I think this is the critical factor, if there's a demand for 128b float (I would expect that to come at some point), then raising radars with Apple is a good starting point (since the absence affects C-family as well and we would need a coherent change to allow c-bindings to work)
  • For my 0.02GBP it would be sad if the chosen ABI diverged more from the AAPCS than we already have, so engaging in the discussion with the Apple ABI folks will also be on my agenda.
  • It also needs support from libSystem (libc) for printing etc. and a sensible migration path (e.g. as for the case when PPC migrated from long-double-64 => long-double-128 - we have to attach some discriminator to the builtins and libc calls).

@jeffhammond

Can we at least converge on the idea that REAL*16 must be equivalent to REAL(kind=REAL128) and this must be binary128, ie that neither long double nor double double is relevant? That means we can connect directly to C/C++ float128.

I think we're mostly here but I figure it's good to be unambiguous.

@iains
Owner Author

iains commented Dec 6, 2021

" C/C++ float128 " is defined where and with what ABI?

The version of the darwinpcs I am working from has no 128b float (C long double is 64 bits).

Iff the port transitions to having a 128b float type, then I would expect C long double to be using that ( at least selectively with -mlong-double-128 as is done elsewhere ) .. at some point one would also expect that to become the default and then -mlong-double-64 would be provided as a legacy option.

This is the kind of mechanism that has been used in the past.

It seems that no-one is interested in using double-double, so that is a moot point (although as can be seen with the current PPC64 work, there are strategies for supporting both that and IEEE128 in the same toolchain)

@jeffhammond

jeffhammond commented Dec 6, 2021

I'm not saying Apple has a C/C++ float128 ABI, I'm merely saying that mapping REAL*16 to that simplifies the problem and gives Apple more of a reason to care.

I don't think long double is worth thinking about. It's a garbage feature of C that exists to expose x87 80-bit floats, and is beyond useless in any other context, because it's not even a platform stable ABI (Linux yes, but not always).

@fxcoudert
Contributor

fxcoudert commented Dec 6, 2021

My own take on this:

  • No-one wants double-double. Even on PPC64, where this exists, the transition to IEEE128 is painful
  • IEEE128 exists, it's there, it's become a real standard, let's use it
  • libSystem support is something I would not expect, and that is not relevant for Fortran: we provide our own libraries for printing and math (libquadmath)
  • On Intel, there was never any system support for __float128, and people still used it :) so having it on ARM would allow us to reach ARM/Intel feature parity, at least

> Iff the port transitions to having a 128b float type, then I would expect C long double to be using that ( at least selectively with -mlong-double-128 as is done elsewhere )

I disagree with that. On most targets, C long double is equal to double, at least with default options. We should keep it that way, as there may be code that expects it. In fact, I think we have to keep double = long double != __float128, to retain compatibility with ABI and system clang, which treats long double as the same size as double:

$ cat u.c
#include <math.h>
#include <stdio.h>

int main (void) {
  long double x = 2;
  printf("%20.17Lg\n", sqrtl(x));
  printf("%zu\n", sizeof x);
}
$ clang u.c -W -Wall && ./a.out 
  1.4142135623730951
8

Edit, to add clarification on that sqrtl example above: also, I don't think we should or can switch long double to something other than Apple's choice (long double = double), because then the library functions would not match anymore, and that's a nightmare (it happened on powerpc-darwin at some point, and created a lot of trouble).

@iains
Owner Author

iains commented Dec 6, 2021

Well, I understand that your focus is on Fortran - but some of us transitioned away from Fortran around the same time the VAX 11/785 was retired (not a coincidence; sometimes one has no choice over an employer's decisions). We carried on writing numerical analysis, signal processing and simulation code (just now in C) .. if Fortran users have a requirement for 128b float, then so do c-family users ;)

.. clearly, most of the use of aarch64 darwin has been under iOS and therefore has not seen any such desires(?) - but I'd think to move forward, the platform would have to adopt a 128b float type and agree an ABI before we could plug it into fortran.

That would mean that clang and the relevant parts of the system would be updated to deal with this; FWIW (despite the pain you recall) the approach used for PPC is solid and functional to this day (completely independent of what the 128 bits represent).

@iains
Owner Author

iains commented Dec 6, 2021

clarification: none of what I've said implies switching from Apple's choice - rather, my thesis is that we have to adhere to Apple's choice - if they elect not to adopt a 128b float (officially) then we might think again - but in the first instance, the right solution is to lobby for an official 128b float type on the platform.

I do not think it significant that no Apple comments have been made on this thread (there is no reason for anyone from Apple to read it).

@jeffhammond

> Well, I understand that your focus is on Fortran

It's my focus here because it's in the title of the issue, and I came here as a result of not being able to build Fortran applications on my Apple M1 laptop.

@jeffhammond

> I do not think it significant that no Apple comments have been made on this thread (there is no reason for anyone from Apple to read it).

That's not correct (#5 (comment)), although I know Steve isn't speaking with institutional authority here.

@iains
Owner Author

iains commented Dec 6, 2021

I fear there is an amount of "talking past" going on here...

  • Fortran does not exist independently of the rest of the system (even if it is the only thing one uses)

  • If we unilaterally adopt some 128b float format, and invent our own ABI (or use that of AAPCS), that might not be what is eventually chosen by Apple - ergo the first step should be to lobby for an official 128b format.

  • My preference is strongly for using the same as specified in AAPCS (and with the same ABI) - that means there is least for us to maintain that is "different" for Darwin/Mach-O.

  • speaking purely for myself (in c-family), it is a nuisance when long double does not offer any more precision than double; quite a lot of code assumes the contrary .. but I guess in the end if it's spelt __float128 that's just mechanical editing.

@jeffhammond

> If we unilaterally adopt some 128b float format, and invent our own ABI (or use that of AAPCS), that might not be what is eventually chosen by Apple - ergo the first step should be to lobby for an official 128b format.

If Apple declines to respond, does that mean we can never fix this issue for users? I contend that Apple has had their chance to respond, and they have not, so GCC should do the right thing for its users and solve the problem using the best available option. I agree with your arguments that the best non-Apple solution is AAPCS.

I also believe that the only reasonable choice here is REAL*16 = REAL(kind=REAL128) = __float128 = binary128, all with 16 bytes of storage and 16-byte alignment, so I don't see a huge risk of picking this without Apple's involvement. Do we think they are going to choose something else? What other reasonable options are there?

> speaking purely for myself (in c-family), it is a nuisance when long double does not offer any more precision than double; quite a lot of code assumes the contrary .. but I guess in the end if it's spelt __float128 that's just mechanical editing.

On the long double tangent, if I want something well-defined that's wider than binary64, I use binary128, via __float128. I've never seen any evidence for long double to exist except to support x87.

@stephentyrone

> I also believe that the only reasonable choice here is REAL*16 = REAL(kind=REAL128) = __float128 = binary128, all with 16 bytes of storage and 16-byte alignment, so I don't see a huge risk of picking this without Apple's involvement.

This is correct. Furthermore, Tim Northover correctly observed in an email correspondence:

> The [Darwin] PCS is pretty much covered by falling back to the ARM AAPCS for anything that's not explicitly mentioned as a difference. The relevant parts are worded in terms of "quad precision floating-point type" rather than "long double".
>
> Similarly, the Itanium ABI provides a mangling for __float128 (it's 'g'). If and when support gets added for C++ it's very unlikely we'd diverge from that.

So there's no need to "invent our own ABI (or use that of AAPCS)"; since the Darwin PCS document does not say anything about quad, the behavior is defined by AAPCS.

@fxcoudert
Contributor

> it is a nuisance when long double does not offer any more precision than double

@iains unless I misunderstand something, I think that ship has sailed, anyway. Apple's ABI (https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms) explicitly states:

> The long double type is a double precision IEEE754 binary floating-point type, which makes it identical to the double type.

@iains
Owner Author

iains commented Dec 6, 2021

which is what we implement .. (and does not make any difference to me as a toolchain maintainer)

the comment was from me as a "user" of c-family: I do not see long double as an X87 thing, but then most of my DSP and numerical analysis coding was/is not on X86 platforms. It's still a nuisance to have to edit the source to use a different type for the extra precision. Anyway, we can agree to differ on this ..

  • Is there a current working port of libquadmath?

  • the priority for me is to get the main port in (so "patches welcome" for this for now).

@jeffhammond

I don't see why the long double ABI matters here. REAL128 is not long double on most systems. It's certainly not on x86 or ARM ones.

If Apple has no ABI for binary128 then we fall back on AAPCS.

@fxcoudert
Contributor

> Is there a current working port of libquadmath?

The libquadmath in GCC is designed to provide quad-precision math and I/O routines on targets for which __float128 is implemented (and is an IEEE binary128 floating-point type).

> the priority for me is to get the main port in (so "patches welcome" for this for now).

I'll try to come up with something.

@stephentyrone

I have discussed this with the relevant folks, and the following is an official statement:

  1. long double will continue to bind double (IEEE binary64) in the Darwin ABI for ARM64.
  2. if we were to add IEEE binary128 support for C or C++ in Apple's toolchains, it would follow the AAPCS layout and calling conventions for quad-precision floating-point, and the Itanium name mangling.

We're going to look into where it makes the most sense to document this other than a random post on a GitHub issue. I'll add a reference to that once it's available.

@iains
Owner Author

iains commented Dec 7, 2021

thanks very much for the info!

we are currently looking into (with some success, mostly down to FX) implementing __float128 (which automatically enables C's _Float128 in GCC)

this is currently following the existing aarch64 backend ABI (which is AAPCS64).

It might be necessary to document how we deal with variadic calls, since there are significant differences between darwinpcs and AAPCS64 there.

@stephentyrone

Ok. If you need further clarification on variadic conventions, please let us know.

@iains
Owner Author

iains commented Dec 8, 2021

I think we should be in a position to close this issue now (technically, the decision is made, and practically the latest sources support __float128 in c-family and IEEE QP in libgfortran).

I am sure that there are wrinkles - but those can be dealt with separately.

> Ok. If you need further clarification on variadic conventions, please let us know.

My intention is that the technical content of the README.md will be added to the GCC port directory as documentation for how the Darwin sub-port differs from AAPCS64.

  • so yes, there might be questions (ISTR that when comparing the two ABIs there were a few things I had to make assumptions about) - but that can be moved to a different issue or topic.

  • We (GCC-side) should also document (in the same place) the additions that we've made that are not covered by the darwinpcs.

Unless anyone else has questions or additions I will close this in 24h or so.

@iains
Owner Author

iains commented Dec 15, 2021

we have an implementation on the branch now.

@iains iains closed this as completed Dec 15, 2021
iains pushed a commit that referenced this issue Jan 2, 2022
…imize or target pragmas [PR103012]

The following testcases ICE when an optimize or target pragma
is followed by a long line (4096+ chars).
This is because on such long lines we can't use columns anymore,
but the cpp_define calls performed by c_cpp_builtins_optimize_pragma
or from the backend hooks for target pragma are done on temporary
buffers and expect to get columns from whatever line they appear on
(which happens to be the long line after optimize/target pragma),
and we run into:
 #0  fancy_abort (file=0x3abec67 "../../libcpp/line-map.c", line=502, function=0x3abecfc "linemap_add") at ../../gcc/diagnostic.c:1986
 #1  0x0000000002e7c335 in linemap_add (set=0x7ffff7fca000, reason=LC_RENAME, sysp=0, to_file=0x41287a0 "pr103012.i", to_line=3) at ../../libcpp/line-map.c:502
 #2  0x0000000002e7cc24 in linemap_line_start (set=0x7ffff7fca000, to_line=3, max_column_hint=128) at ../../libcpp/line-map.c:827
 #3  0x0000000002e7ce2b in linemap_position_for_column (set=0x7ffff7fca000, to_column=1) at ../../libcpp/line-map.c:898
 #4  0x0000000002e771f9 in _cpp_lex_direct (pfile=0x40c3b60) at ../../libcpp/lex.c:3592
 #5  0x0000000002e76c3e in _cpp_lex_token (pfile=0x40c3b60) at ../../libcpp/lex.c:3394
 #6  0x0000000002e610ef in lex_macro_node (pfile=0x40c3b60, is_def_or_undef=true) at ../../libcpp/directives.c:601
 #7  0x0000000002e61226 in do_define (pfile=0x40c3b60) at ../../libcpp/directives.c:639
 #8  0x0000000002e610b2 in run_directive (pfile=0x40c3b60, dir_no=0, buf=0x7fffffffd430 "__OPTIMIZE__ 1\n", count=14) at ../../libcpp/directives.c:589
 #9  0x0000000002e650c1 in cpp_define (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2513
 #10 0x0000000002e65100 in cpp_define_unused (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2522
 #11 0x0000000000f50685 in c_cpp_builtins_optimize_pragma (pfile=0x40c3b60, prev_tree=<optimization_node 0x7fffea042000>, cur_tree=<optimization_node 0x7fffea042020>)
     at ../../gcc/c-family/c-cppbuiltin.c:600
assertion that LC_RENAME doesn't happen first.

I think the right fix is to emit those predefined macros upon
optimize/target pragmas with BUILTINS_LOCATION, like we already do
for those macros at the start of the TU, they don't appear in columns
of the next line after it.  Another possibility would be to force them
at the location of the pragma.

2021-12-30  Jakub Jelinek  <jakub@redhat.com>

	PR c++/103012
gcc/
	* config/i386/i386-c.c (ix86_pragma_target_parse): Perform
	cpp_define/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
	* config/arm/arm-c.c (arm_pragma_target_parse): Likewise.
	* config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Likewise.
	* config/s390/s390-c.c (s390_pragma_target_parse): Likewise.
gcc/c-family/
	* c-cppbuiltin.c (c_cpp_builtins_optimize_pragma): Perform
	cpp_define_unused/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
gcc/testsuite/
	PR c++/103012
	* g++.dg/cpp/pr103012.C: New test.
	* g++.target/i386/pr103012.C: New test.
iains pushed a commit that referenced this issue Feb 26, 2022
…04617]

On
 #define A(n) int foo1##n(void) { return 1##n; }
 #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
 #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
 #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
 #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
 E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325)
 B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642)
testcase with
./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto  -O0 -o foo1.o foo1.c -ffunction-sections
./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o
/tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized
(testcase too slow to be included into testsuite).
The problem is clearly reported by readelf:
readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323
readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section.
because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and
sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE
inclusive.  Not adjusting those is incorrect though: the SHN_{LO,HI}RESERVE
range is only relevant to the 16-bit fields.  These are mainly st_shndx in
ElfNN_Sym, where if one needs a >= SHN_LORESERVE section number, SHN_XINDEX
should be used instead and the .symtab_shndx section should contain the real
section index, and the e_shnum and e_shstrndx fields in ElfNN_Ehdr, where if
a >= SHN_LORESERVE value is needed, it should be put into
Shdr[0].sh_{size,link}.  But sh_{link,info} are 32-bit fields which can
contain any section index.
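
The distinction above can be sketched in C.  This is only an illustration
under stated assumptions, not the code from simple-object-elf.c: the helper
names remap_shdr_index and encode_st_shndx and the MAP array are hypothetical,
and the SHN_* values are the ones given in the ELF specification.  The first
helper remaps a 32-bit sh_link value (or sh_info, when it holds a section
index) without treating the reserved range specially; the second shows the
escape that only the 16-bit st_shndx field needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the ELF specification.  */
#define SHN_UNDEF     0x0000u
#define SHN_LORESERVE 0xff00u   /* 65280, i.e. 63.75K */
#define SHN_XINDEX    0xffffu

/* Hypothetical helper: remap a 32-bit sh_link value through MAP, which
   maps old section indices to new ones.  The field is 32 bits wide, so
   a value in the SHN_LORESERVE .. SHN_HIRESERVE range is an ordinary
   section index here and is remapped like any other.  */
static uint32_t
remap_shdr_index (uint32_t old_index, const uint32_t *map, size_t shnum)
{
  if (old_index == SHN_UNDEF || old_index >= shnum)
    return old_index;   /* Nothing to remap.  */
  return map[old_index];
}

/* Hypothetical helper: encode a section index into the 16-bit st_shndx
   field.  Indices >= SHN_LORESERVE do not fit, so SHN_XINDEX is stored
   and the real index goes to *XINDEX_OUT (the .symtab_shndx entry).  */
static uint16_t
encode_st_shndx (uint32_t index, uint32_t *xindex_out)
{
  if (index >= SHN_LORESERVE)
    {
      *xindex_out = index;
      return (uint16_t) SHN_XINDEX;
    }
  *xindex_out = 0;
  return (uint16_t) index;
}
```

With such a remapping, an sh_link of 65321 (one of the values readelf
complains about above) is translated like any other index instead of being
copied through unchanged.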

Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before
2011) used to mishandle the > 63.75K sections case and assumed there is a
hole in between the sections, but what
simple_object_elf_copy_lto_debug_sections does wouldn't help in that case
for the debug temp object creation; we'd need to detect the case also in
that routine and take it into account in the remapping etc.  I think
it is not worth it given that those releases are over 10 years old; if
somebody needs 63.75K or more sections, better use more recent binutils.

2022-02-22  Jakub Jelinek  <jakub@redhat.com>

	PR lto/104617
	* simple-object-elf.c (simple_object_elf_match): Fix up URL
	in comment.
	(simple_object_elf_copy_lto_debug_sections): Remap sh_info and
	sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE
	range (inclusive).
iains pushed a commit that referenced this issue Mar 26, 2022
Here we weren't respecting SFINAE when evaluating a call to a consteval
function, which caused us to reject the new testcase below.  This patch
fixes this by making build_over_call use the SFINAE-friendly version of
cxx_constant_value.

This change causes us to no longer diagnose ahead of time a couple of
non-constant non-dependent consteval calls in consteval-if2.C with
-fchecking=2.  These errors were apparently coming from the call to
fold_non_dependent_expr in build_non_dependent_expr (for the RHS of the +=)
despite complain=tf_none being passed.  Now that build_over_call respects
the value of complain during constant evaluation of a consteval call,
the errors are gone.

That the errors are also gone without -fchecking=2 is a regression caused
by r12-7264-gc19f317a78c0e4 and is the subject of PR104620.  As described
in comment #5, I think it's basically an accident that we were diagnosing
these two calls correctly before r12-7264, so perhaps we can live without
these errors for GCC 12.  Thus this patch just XFAILs the two tests.

	PR c++/104620

gcc/cp/ChangeLog:

	* call.cc (build_over_call): Use cxx_constant_value_sfinae
	instead of cxx_constant_value to evaluate a consteval call.
	* constexpr.cc (cxx_constant_value_sfinae): Add decl parameter
	and pass it to cxx_eval_outermost_constant_expr.
	* cp-tree.h (cxx_constant_value_sfinae): Add decl parameter.
	* pt.cc (fold_targs_r): Pass NULL_TREE as decl parameter to
	cxx_constant_value_sfinae.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp23/consteval-if2.C: XFAIL two dg-error tests where
	the argument to the non-constant non-dependent consteval call is
	wrapped by NON_DEPENDENT_EXPR.
	* g++.dg/cpp2a/consteval30.C: New test.
iains pushed a commit that referenced this issue May 26, 2023
I noticed that for member class templates of a class template we were
unnecessarily substituting both the template and its type.  Avoiding that
duplication speeds compilation of this silly testcase from ~12s to ~9s on my
laptop.  It's unlikely to make a difference on any real code, but the
simplification is also nice.

We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation
of the template class, but it makes more sense to do that in
tsubst_template_decl anyway.

  #define NC(X)					\
    template <class U> struct X##1;		\
    template <class U> struct X##2;		\
    template <class U> struct X##3;		\
    template <class U> struct X##4;		\
    template <class U> struct X##5;		\
    template <class U> struct X##6;
  #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
  #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
  template <int I> struct A
  {
    NC3(am)
  };
  template <class...Ts> void sink(Ts...);
  template <int...Is> void g()
  {
    sink(A<Is>()...);
  }
  template <int I> void f()
  {
    g<__integer_pack(I)...>();
  }
  int main()
  {
    f<1000>();
  }

gcc/cp/ChangeLog:

	* pt.cc (instantiate_class_template): Skip the RECORD_TYPE
	of a class template.
	(tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
iains pushed a commit that referenced this issue Mar 25, 2024
Consider

  #include <type_traits>
  constexpr int VAL = 1;
  struct foo {
      template <int B>
      void bar(typename std::conditional<B==VAL, int, float>::type arg) { }
  };
  template void foo::bar<1>(int arg);

where, since r11-291, we fail to emit the code for the explicit
instantiation.  That's because cp_walk_subtrees/TYPENAME_TYPE now
walks TYPE_CONTEXT ('conditional' here) as well, and in a template
finds the B==VAL template argument.  VAL is constexpr, which implies const,
which in the global scope implies static.  constrain_visibility_for_template
then makes "struct conditional<(B == VAL), int, float>" non-TREE_PUBLIC.
Then symtab_node::needed_p checks TREE_PUBLIC, sees it's 0, and we don't
emit any code.

I thought the fix would be some ODR-esque check to not consider
constexpr variables/fns that are used just for their value.  But
it turned out to be tricky.  For instance, we can't skip
determine_visibility in a template; we can't even skip it for value-dep
expressions.  For example, no-linkage-expr1.C has

  using P = struct {}*;
  template <int N>
  void f(int(*)[((P)0, N)]) {}

where ((P)0, N) is value-dep, but N is not relevant here: we have to
ferret out the anonymous type.  When instantiating, it's already gone.

This patch uses decl_constant_var_p.  This is to implement (an
approximation of) [basic.def.odr]#14.5.1 and [basic.def.odr]#5.2.

	PR c++/110323

gcc/cp/ChangeLog:

	* decl2.cc (min_vis_expr_r) <case VAR_DECL>: Do nothing for
	decl_constant_var_p VAR_DECLs.

gcc/testsuite/ChangeLog:

	* g++.dg/template/explicit-instantiation6.C: New test.
	* g++.dg/template/explicit-instantiation7.C: New test.