Add proper build system which automatically detects architecture (or at least add aarch64 support) #2355

Closed
ThanosApostolou opened this issue Oct 10, 2019 · 8 comments

Comments

@ThanosApostolou

It's a little bit messy to build stockfish for multiple architectures by passing the ARCH variable. Is adding a build system like cmake or meson something that you would consider?

Also, there is currently no architecture option for aarch64, and it's not clear from the Makefile how to override this manually. So when I included Stockfish as a module for gnome-chess on Flathub, I chose to use armv7 even for the aarch64 architecture, as you can see here:
flathub/org.gnome.Chess@01ff82e
Maybe you could add an aarch64 option to the Makefile as well, until a decision on the build system is made?
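
To make the request concrete, here is a minimal sketch of the kind of detection a wrapper script could do before calling make. It is not part of Stockfish. The `detect_arch` helper and the mapping table are hypothetical, and `armv8` is only an assumed name for an aarch64 target; the other ARCH values should be checked against the Makefile's own help output.

```python
import platform

# Hypothetical mapping from `uname -m`-style machine names to ARCH values.
# "armv8" is an assumed name for the aarch64 target requested in this issue.
MACHINE_TO_ARCH = {
    "x86_64": "x86-64",
    "amd64": "x86-64",
    "i686": "x86-32",
    "i386": "x86-32",
    "armv7l": "armv7",
    "aarch64": "armv8",
    "arm64": "armv8",
}


def detect_arch(machine=None):
    """Return an ARCH value for `machine`, or None if it is not recognised."""
    if machine is None:
        # platform.machine() reports the host's machine type, e.g. "aarch64".
        machine = platform.machine()
    return MACHINE_TO_ARCH.get(machine.lower())


if __name__ == "__main__":
    arch = detect_arch()
    if arch is None:
        raise SystemExit(
            f"unrecognised machine {platform.machine()!r}; pass ARCH explicitly"
        )
    # Print the command rather than running it, so nothing is built by surprise.
    print(f"make build ARCH={arch}")
```

Returning None for unknown machines, rather than guessing a generic target, keeps the fallback decision with whoever packages the build.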

@snicolet
Member

snicolet commented Jan 9, 2020

Adding an aarch64 option in the makefile: sure, we could consider this. However, the SF team's experience with aarch64 is quite limited, so maybe you would be the best person to propose a patch, as you obviously have access to such a machine?

About cmake or meson: probably not an option, because each build tool we add needs extra maintenance, and we prefer to keep the SF toolchain as simple as possible.

@hgy59

hgy59 commented May 3, 2020

Please add aarch64 (ARM64) support to src/Makefile, as this is the preferred ARM arch for Synology's Diskstation (NAS) and is supported by Raspberry Pi 3 and 4.

@vondele
Member

vondele commented May 3, 2020

I assume you have access? In that case, please suggest patches (as a pull request or as a diff), indicating how they have been tested.

@bftjoe
Contributor

bftjoe commented May 9, 2020

There already is cmake/Visual Studio support but it's autogenerated by appveyor.yml

I extracted it and shortened it here: https://github.com/bftjoe/Stockfish/blob/master/CMakeLists.txt

Not sure how it's more maintenance to include CMakeLists.txt in the repo instead of autogenerating it...

@abdulbadii

abdulbadii commented Jun 22, 2020

Try passing its options via CXXFLAGS= in place of ARCH= and COMP=,
guided by the GCC reference:
AArch64 Options. These options are defined for AArch64 implementations:

-mabi=name
Generate code for the specified data model. Permissible values are ‘ilp32’ for SysV-like data model where int, long int and pointers are 32 bits, and ‘lp64’ for SysV-like data model where int is 32 bits, but long int and pointers are 64 bits.
The default depends on the specific target configuration. Note that the LP64 and ILP32 ABIs
are not link-compatible; you must compile your entire program with the same ABI, and link
with a compatible set of libraries.
-mbig-endian
Generate big-endian code. This is the default when GCC is configured for an ‘aarch64_be-*-*’ target.
-mgeneral-regs-only
Generate code which uses only the general-purpose registers. This will prevent the compiler from
using floating-point and Advanced SIMD registers but will not impose any restrictions on the assembler.
-mlittle-endian
Generate little-endian code. This is the default when GCC is configured for an
‘aarch64-*-*’ but not an ‘aarch64_be-*-*’ target.
-mcmodel=tiny
Generate code for the tiny code model. The program and its statically defined
symbols must be within 1MB of each other. Programs can be statically or
dynamically linked.
-mcmodel=small
Generate code for the small code model. The program and its statically defined
symbols must be within 4GB of each other. Programs can be statically or
dynamically linked. This is the default code model.
-mcmodel=large
Generate code for the large code model. This makes no assumptions about
addresses and sizes of sections. Programs can be statically linked only.
-mstrict-align
-mno-strict-align
Avoid or allow generating memory accesses that may not be aligned on a natural
object boundary as described in the architecture specification.
-momit-leaf-frame-pointer
-mno-omit-leaf-frame-pointer
Omit or keep the frame pointer in leaf functions. The former behavior is the
default.
-mstack-protector-guard=guard
-mstack-protector-guard-reg=reg
-mstack-protector-guard-offset=offset
Generate stack protection code using canary at guard. Supported locations are
‘global’ for a global canary or ‘sysreg’ for a canary in an appropriate system
register.
With the latter choice the options ‘-mstack-protector-guard-reg=reg’ and
‘-mstack-protector-guard-offset=offset’ furthermore specify which system
register to use as base register for reading the canary, and from what offset
from that base register. There is no default register or offset as this is entirely
for use within the Linux kernel.
-mtls-dialect=desc
Use TLS descriptors as the thread-local storage mechanism for dynamic accesses
of TLS variables. This is the default.
-mtls-dialect=traditional
Use traditional TLS as the thread-local storage mechanism for dynamic accesses
of TLS variables.
-mtls-size=size
Specify bit size of immediate TLS offsets. Valid values are 12, 24, 32, 48. This
option requires binutils 2.26 or newer.
-mfix-cortex-a53-835769
-mno-fix-cortex-a53-835769
Enable or disable the workaround for the ARM Cortex-A53 erratum number
835769. This involves inserting a NOP instruction between memory instructions
and 64-bit integer multiply-accumulate instructions.
-mfix-cortex-a53-843419
-mno-fix-cortex-a53-843419
Enable or disable the workaround for the ARM Cortex-A53 erratum number
843419. This erratum workaround is made at link time and this will only pass
the corresponding flag to the linker.
-mlow-precision-recip-sqrt
-mno-low-precision-recip-sqrt
Enable or disable the reciprocal square root approximation. This option only
has an effect if ‘-ffast-math’ or ‘-funsafe-math-optimizations’ is used as
well. Enabling this reduces precision of reciprocal square root results to about
16 bits for single precision and to 32 bits for double precision.
-mlow-precision-sqrt
-mno-low-precision-sqrt
Enable or disable the square root approximation. This option only has an effect if
‘-ffast-math’ or ‘-funsafe-math-optimizations’ is used as well.
Enabling this reduces precision of square root results to about 16 bits for
single precision and to 32 bits for double precision. If enabled, it implies
‘-mlow-precision-recip-sqrt’.
-mlow-precision-div
-mno-low-precision-div
Enable or disable the division approximation. This option only has an effect if
‘-ffast-math’ or ‘-funsafe-math-optimizations’ is used as well. Enabling
this reduces precision of division results to about 16 bits for single precision
and to 32 bits for double precision.

-mtrack-speculation
-mno-track-speculation
Enable or disable generation of additional code to track speculative execution
through conditional branches. The tracking state can then be used by the
compiler when expanding calls to __builtin_speculation_safe_copy to permit
a more efficient code sequence to be generated.
-moutline-atomics
-mno-outline-atomics
Enable or disable calls to out-of-line helpers to implement atomic operations.
These helpers will, at runtime, determine if the LSE instructions from
ARMv8.1-A can be used; if not, they will use the load/store-exclusive
instructions that are present in the base ARMv8.0 ISA.
This option is only applicable when compiling for the base ARMv8.0
instruction set. If using a later revision, e.g. ‘-march=armv8.1-a’ or ‘-march=armv8-a+lse’,
the ARMv8.1-Atomics instructions will be used directly. The same applies when using ‘-mcpu=’
when the selected cpu supports the ‘lse’ feature.

-march=name
Specify the name of the target architecture and, optionally, one or more feature
modifiers. This option has the form ‘-march=arch{+[no]feature}*’.
The permissible values for arch are ‘armv8-a’, ‘armv8.1-a’, ‘armv8.2-a’,
‘armv8.3-a’, ‘armv8.4-a’, ‘armv8.5-a’ or ‘native’.
The value ‘armv8.5-a’ implies ‘armv8.4-a’ and enables compiler support for
the ARMv8.5-A architecture extensions.
The value ‘armv8.4-a’ implies ‘armv8.3-a’ and enables compiler support for
the ARMv8.4-A architecture extensions.
The value ‘armv8.3-a’ implies ‘armv8.2-a’ and enables compiler support for
the ARMv8.3-A architecture extensions.
The value ‘armv8.2-a’ implies ‘armv8.1-a’ and enables compiler support for
the ARMv8.2-A architecture extensions.
The value ‘armv8.1-a’ implies ‘armv8-a’ and enables compiler support for the
ARMv8.1-A architecture extension. In particular, it enables the ‘+crc’, ‘+lse’,
and ‘+rdma’ features.
The value ‘native’ is available on native AArch64 GNU/Linux and causes the
compiler to pick the architecture of the host system. This option has no effect
if the compiler is unable to recognize the architecture of the host system.
The permissible values for feature are listed in the sub-section on [‘-march’ and
‘-mcpu’ Feature Modifiers]. Where conflicting feature modifiers are
specified, the right-most feature is used.
GCC uses name to determine what kind of instructions it can emit when generating
assembly code. If ‘-march’ is specified without either of ‘-mtune’ or
‘-mcpu’ also being specified, the code is tuned to perform well across a range of
target processors implementing the target architecture.

-mtune=name
Specify the name of the target processor for which GCC should tune the
performance of the code. Permissible values for this option are: ‘generic’,
‘cortex-a35’, ‘cortex-a53’, ‘cortex-a55’, ‘cortex-a57’, ‘cortex-a72’,
‘cortex-a73’, ‘cortex-a75’, ‘cortex-a76’, ‘cortex-a76ae’, ‘cortex-a77’,
‘cortex-a65’, ‘cortex-a65ae’, ‘cortex-a34’, ‘ares’, ‘exynos-m1’, ‘emag’,
‘falkor’, ‘neoverse-e1’, ‘neoverse-n1’, ‘qdf24xx’, ‘saphira’, ‘phecda’,
‘xgene1’, ‘vulcan’, ‘octeontx’, ‘octeontx81’, ‘octeontx83’, ‘thunderx’,
‘thunderxt88’, ‘thunderxt88p1’, ‘thunderxt81’, ‘tsv110’, ‘thunderxt83’,
‘thunderx2t99’, ‘cortex-a57.cortex-a53’, ‘cortex-a72.cortex-a53’,
‘cortex-a73.cortex-a35’, ‘cortex-a73.cortex-a53’, ‘cortex-a75.cortex-a55’,
‘cortex-a76.cortex-a55’, ‘native’.
The values ‘cortex-a57.cortex-a53’, ‘cortex-a72.cortex-a53’,
‘cortex-a73.cortex-a35’, ‘cortex-a73.cortex-a53’, ‘cortex-a75.cortex-a55’,
‘cortex-a76.cortex-a55’ specify that GCC should tune for a big.LITTLE
system.
Additionally on native AArch64 GNU/Linux systems the value ‘native’ tunes
performance to the host system. This option has no effect if the compiler is
unable to recognize the processor of the host system.
Where none of ‘-mtune=’, ‘-mcpu=’ or ‘-march=’ are specified, the code is tuned
to perform well across a range of target processors.
This option cannot be suffixed by feature modifiers.

-mcpu=name
Specify the name of the target processor, optionally suffixed by one or more
feature modifiers. This option has the form ‘-mcpu=cpu{+[no]feature}*’, where
the permissible values for cpu are the same as those available for ‘-mtune’. The
permissible values for feature are documented in the sub-section on [‘-march’
and ‘-mcpu’ Feature Modifiers].
Where conflicting feature modifiers are specified, the right-most feature is used.
GCC uses name to determine what kind of instructions it can emit when generating assembly
code (as if by ‘-march’) and to determine the target processor for which to tune for
performance (as if by ‘-mtune’). Where this option is used
in conjunction with ‘-march’ or ‘-mtune’, those options take precedence over
the appropriate part of this option.
-moverride=string
Override tuning decisions made by the back-end in response to a ‘-mtune=’
switch. The syntax, semantics, and accepted values for string in this option are
not guaranteed to be consistent across releases.
This option is only intended to be useful when developing GCC.
-mverbose-cost-dump
Enable verbose cost model dumping in the debug dump files. This option is
provided for use in debugging the compiler.
-mpc-relative-literal-loads
-mno-pc-relative-literal-loads
Enable or disable PC-relative literal loads. With this option literal pools are
accessed using a single instruction and emitted after each function. This
limits the maximum size of functions to 1MB. This is enabled by default for
‘-mcmodel=tiny’.
-msign-return-address=scope
Select the function scope on which return address signing will be applied. Permissible values
are ‘none’, which disables return address signing, ‘non-leaf’,
which enables pointer signing for functions which are not leaf functions, and
‘all’, which enables pointer signing for all functions. The default value is
‘none’. This option has been deprecated by -mbranch-protection.
-mbranch-protection=none|standard|pac-ret[+leaf+b-key]|bti
Select the branch protection features to use. ‘none’ is the default and turns
off all types of branch protection. ‘standard’ turns on all types of branch
protection features. If a feature has additional tuning options, then ‘standard’
sets it to its standard level. ‘pac-ret[+leaf]’ turns on return address signing
to its standard level: signing functions that save the return address to memory
(non-leaf functions will practically always do this) using the a-key. The optional
argument ‘leaf’ can be used to extend the signing to include leaf functions. The
optional argument ‘b-key’ can be used to sign the functions with the B-key
instead of the A-key. ‘bti’ turns on branch target identification mechanism.
-msve-vector-bits=bits
Specify the number of bits in an SVE vector register. This option only has an
effect when SVE is enabled.
GCC supports two forms of SVE code generation: “vector-length agnostic”
output that works with any size of vector register and “vector-length specific”
output that allows GCC to make assumptions about the vector length when it is
useful for optimization reasons. The possible values of ‘bits’ are: ‘scalable’,
‘128’, ‘256’, ‘512’, ‘1024’ and ‘2048’. Specifying ‘scalable’ selects
vector-length agnostic output. At present ‘-msve-vector-bits=128’ also generates
vector-length agnostic output. All other values generate vector-length specific
code. The behavior of these values may change in future releases and no value
except ‘scalable’ should be relied on for producing code that is portable across
different hardware SVE vector lengths.
The default is ‘-msve-vector-bits=scalable’, which produces vector-length agnostic code.
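
As a rough illustration of how the ‘-march=arch{+[no]feature}*’ form quoted above composes into a CXXFLAGS value, here is a small sketch. The `march_flag` and `cxxflags` helpers are hypothetical names, not part of GCC or Stockfish. The architecture list is copied from the excerpt and may differ in other GCC releases, and feature names are passed through without validation.

```python
# Architecture names taken from the GCC excerpt above; other GCC versions
# may accept a different set.
KNOWN_ARCHES = (
    "armv8-a", "armv8.1-a", "armv8.2-a", "armv8.3-a",
    "armv8.4-a", "armv8.5-a", "native",
)


def march_flag(arch, enable=(), disable=()):
    """Build '-march=arch{+[no]feature}*' from an arch and feature lists."""
    if arch not in KNOWN_ARCHES:
        raise ValueError(f"unknown architecture: {arch!r}")
    parts = [arch]
    parts += [f"+{name}" for name in enable]      # e.g. +crc
    parts += [f"+no{name}" for name in disable]   # e.g. +nolse
    return "-march=" + "".join(parts)


def cxxflags(arch, tune=None, enable=(), disable=()):
    """Join -march with an optional -mtune (which takes no feature modifiers)."""
    flags = [march_flag(arch, enable, disable)]
    if tune is not None:
        flags.append(f"-mtune={tune}")
    return " ".join(flags)


if __name__ == "__main__":
    print(cxxflags("armv8-a", tune="cortex-a72", enable=["crc"]))
```

The resulting string is the kind of value the comment above suggests passing as CXXFLAGS=. How the Makefile combines it with its own flags is not covered by this sketch.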

niklasf added a commit to niklasf/Stockfish that referenced this issue Jun 23, 2020
Tested with bench run after compiling with

- g++ (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
- clang version 3.8.1-24

on ThunderX CN8890.

No functional change.
@niklasf
Contributor

niklasf commented Jun 23, 2020

Here's a minimal patch to maybe make some progress on this: #2760

@abdulbadii

Thanks, may God bless you. Ameen.

@vondele
Member

vondele commented Jun 24, 2020

I'll merge the pull request to support armv8 and close the issue with it.
