Commit 5e3ab2c

Merge pull request #34 from su2code/simd_updates
Updating the options and instructions related to vectorization
2 parents faaef23 + eca69b3 commit 5e3ab2c

2 files changed

+14
-4
lines changed


_docs_v7/Build-SU2-Linux-MacOS.md

Lines changed: 6 additions & 2 deletions
@@ -142,7 +142,7 @@ To set a installation directory for the binaries and python scripts, use the `--
142142
If you are not interested in setting custom compiler flags and other options you can now go directly to the [Compilation](#compilation) section, otherwise continue reading the next section.
143143

144144
### Advanced Configuration ###
145-
In general meson appends flags set with the environment variable `CXX_FLAGS`. It is however recommended to use
145+
In general meson appends flags set with the environment variable `CXXFLAGS`. It is however recommended to use
146146
meson's built-in options to set debug mode, warning levels, and optimizations. All options can be found [here](https://mesonbuild.com/Builtin-options.html) or by using `./meson.py configure`. An already created configuration can be modified by using the `--reconfigure` flag, e.g.:
147147
```
148148
./meson.py build --reconfigure --buildtype=debug
@@ -155,7 +155,11 @@ The debug mode can be enabled by using the `--buildtype=debug` option. This adds
155155

156156
#### Compiler optimizations ####
157157

158-
The optimization level can be set with `--optimization=level`, where `level` corresponds to a number between 0 (no optimization) and 3 (highest level of optimizations). The default level is 3.
158+
The optimization level can be set with `--optimization=level`, where `level` is a number between 0 (no optimization) and 3 (highest level of optimization); the default is 3.
159+
However, the highest level may not give the best performance. For example, with the GNU compilers, level 2 combined with the extra flag `-funroll-loops` gives better performance for most problems.
160+
161+
Some numerical schemes support vectorization (see the Convective Schemes page for which ones). To make the most of it, the compiler needs to be told the target CPU architecture so that it knows which vector width it can generate code for (256 or 512 bit; 128 bit is the default).
162+
With gcc, clang, and icc this can be done via the `-march=??` and `-mtune=??` options, where `??` must be set to the target architecture, e.g. `skylake` or `znver2`. These flags can be passed to the compiler by setting `CXXFLAGS` before running meson for the first time (meson will print some messages acknowledging the flags).
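
As a minimal sketch of the step described above (not taken from the SU2 sources), the flags could be exported and the build configured as follows. The `skylake` value and the `ARCH` variable are placeholders chosen for illustration, and the pairing of level 2 with `-funroll-loops` follows the GNU compiler remark above. The `./meson.py build` call is guarded so the snippet only configures when run from the SU2 source root.

```shell
# ARCH is a hypothetical helper variable; set it to the architecture name
# your compiler expects for the machine that will run SU2.
ARCH=skylake

# Export the flags before meson is run for the first time.
export CXXFLAGS="-march=${ARCH} -mtune=${ARCH} -funroll-loops"
echo "CXXFLAGS=${CXXFLAGS}"

# Configure only if we are in the SU2 source root (where meson.py lives).
if [ -x ./meson.py ]; then
  ./meson.py build --optimization=2
fi
```

GCC and clang also accept `-march=native`, which targets the machine the compiler runs on; that is only appropriate when the build host and the machines running SU2 have the same CPU type.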
159163

160164
#### Warning level ####
161165

_docs_v7/Convective-Schemes.md

Lines changed: 8 additions & 2 deletions
@@ -37,12 +37,16 @@ To achieve second order upwind schemes need to be used with MUSCL reconstruction
3737
### Central Schemes ###
3838

3939
- `JST` - Jameson-Schmidt-Turkel scheme with scalar dissipation defined by the second and fourth order dissipation coefficients in option `JST_SENSOR_COEFF = (2nd, 4th)`; the default values are 0.5 and 0.02, respectively. This scheme offers a good compromise between accuracy and robustness, but it will overpredict viscous drag contributions on low-Re meshes.
40-
- `JST-KE` - Equivalent to `JST` with 0 fourth order coefficient (the computational effort is slightly reduced as solution Laplacians no longer need to be computed);
40+
- `JST_KE` - Equivalent to `JST` with 0 fourth order coefficient (the computational effort is slightly reduced as solution Laplacians no longer need to be computed);
41+
- `JST_MAT` - Jameson-Schmidt-Turkel scheme with matrix dissipation: the classical dissipation term is scaled by the flux Jacobian, with the minimum eigenvalue limited by `ENTROPY_FIX_COEFF` (0.05-0.2 is recommended; larger values mean more numerical dissipation). This scheme gives better viscous drag predictions on low-Re meshes than `JST`.
4142
- `LAX-FRIEDRICH` - The simplest of the central schemes, with a first order dissipation term specified via `LAX_SENSOR_COEFF` (the default is 0.15). This scheme is the most stable and least accurate due to its very dissipative nature.
4243

4344
The option `CENTRAL_JACOBIAN_FIX_FACTOR` (default value 4.0) affects all central schemes.
4445
In implicit time marching it improves the numerical properties of the Jacobian matrix so that higher CFL values can be used.
4546
To maintain CFL at lower-than-default values of dissipation coefficients, a higher factor should be used.
47+
`JST_MAT` benefits from higher values (~8.0).
48+
49+
All compressible central schemes support vectorization (`USE_VECTORIZATION= YES`) with no robustness downsides. See the build instructions for how to tune the compilation for maximum vectorization performance.
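
A configuration fragment combining the options discussed above might look like the sketch below. It is written as a shell here-document so that it can be appended to an existing case file. `my_case.cfg` is a hypothetical file name, `CONV_NUM_METHOD_FLOW` is assumed to be the option that selects the convective scheme (it is not named in this section), and the numeric values are simply picked from the ranges suggested above rather than tuned for any particular case.

```shell
# Append an illustrative JST_MAT setup to a (hypothetical) case file.
cat >> my_case.cfg <<'EOF'
% Central scheme with matrix dissipation (option name assumed, see text)
CONV_NUM_METHOD_FLOW= JST_MAT
% Within the 0.05-0.2 range recommended for JST_MAT
ENTROPY_FIX_COEFF= 0.1
% JST_MAT benefits from values around 8.0
CENTRAL_JACOBIAN_FIX_FACTOR= 8.0
% Use the vectorized implementation of the scheme
USE_VECTORIZATION= YES
EOF
```

The quoted `'EOF'` delimiter keeps the shell from expanding anything inside the fragment, so the lines are written to the file exactly as shown.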
4650

4751
**Note:** The Lax-Friedrich scheme is always used on coarse multigrid levels when any central scheme is selected.
4852

@@ -70,12 +74,14 @@ Some of the schemes above have tunning parameters or accept extra options, the f
7074
| **`ROE_LOW_DISSIPATION`** | X | | | | X | | |
7175
| **`USE_ACCURATE_FLUX_JACOBIANS`** | | | | X | X | | |
7276
| **`MIN/MAX_ROE_TURKEL_PREC`** | | | X | | | | |
77+
| **`USE_VECTORIZATION`** | X | | | | | | |
7378

7479
- `ROE_KAPPA`, default 0.5, constant that multiplies the left and right state sum;
7580
- `ENTROPY_FIX_COEFF`, default 0.001, puts a lower bound on dissipation by limiting the minimum convective Eigenvalue to a fraction of the speed of sound. Increasing it may help overcome convergence issues, at the expense of making the solution sensitive to this parameter.
7681
- `ROE_LOW_DISSIPATION`, default `NONE`, methods to reduce dissipation in regions where certain conditions are verified, `FD` (wall distance based), `NTS` (Travin and Shur), `FD_DUCROS` and `NTS_DUCROS` as before plus Ducros' shock sensor;
7782
- `USE_ACCURATE_FLUX_JACOBIANS`, default `NO`, if set to `YES` accurate flux Jacobians are used instead of Roe approximates, slower on a per iteration basis but in some cases allows much higher CFL values to be used and therefore faster overall convergence;
78-
- `MIN_ROE_TURKEL_PREC` and `MAX_ROE_TURKEL_PREC`, defaults 0.01 and 0.2 respectively, reference Mach numbers for Turkel preconditioning.
83+
- `MIN_ROE_TURKEL_PREC` and `MAX_ROE_TURKEL_PREC`, defaults 0.01 and 0.2 respectively, reference Mach numbers for Turkel preconditioning;
84+
- `USE_VECTORIZATION`, default `NO`, if set to `YES` the vectorized (SSE, AVX, or AVX-512) implementation is used, which is faster but may be less robust against initial solution transients.
7985

8086
**Note:** Some schemes are not compatible with all other features of SU2, the AUSM family and CUSP are not compatible with unsteady simulations of moving grids, non-ideal gases are only compatible with the standard Roe and HLLC schemes.
8187
