GCC internal error #1477

Closed
FlorianDeconinck opened this issue Dec 12, 2023 · 7 comments

@FlorianDeconinck
Contributor

Describe the bug
Code generated by the CPU backend of DaCe can (infrequently) crash GCC with an internal compiler error.

/home/runner/work/dace/dace/pace/.gt_cache_FV3_A/dacecache/pace_fv3core_stencils_remapping_LagrangianToEulerian___call__/src/cpu/pace_fv3core_stencils_remapping_LagrangianToEulerian___call__.cpp:653:52: internal compiler error: in convert_like_internal, at cp/call.c:7952
  653 |                 const double* const &pe1__ = &pe1[0];
      |                                                    ^
0xe34ddb internal_error(char const*, ...)
	???:0
0xe2b903 fancy_abort(char const*, int, char const*)
	???:0
0xfe340e cp_finish_decl(tree_node*, tree_node*, bool, tree_node*, int)
	???:0
0x14a51e3 c_parse_file()
	???:0
0x149493e c_common_parse_file()
	???:0
Please submit a full bug report,

The above pattern const double* const &X = &Y[0] is indeed the culprit. Removing it by hand saves GCC from dying.

This happens with GCC 10 through 13 inclusive.

Can we get rid of the const &, which arguably is correct but seems to make GCC sad?
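
For context, the two declaration forms in question behave the same for read-only access: the const reference binds to a temporary pointer whose lifetime is extended to match the reference. The sketch below is a hand-written illustration, not DaCe's generated code. The function names are hypothetical, and since the issue does not show how `pe1` is declared, the sketch assumes a plain `double*` source buffer. It is not expected to reproduce the internal compiler error, which only shows up in some generated files.

```cpp
#include <cassert>
#include <cstddef>

// Form seen in the failing generated code: a const reference bound to the
// temporary pointer produced by &buf[0]. This is valid C++; the temporary
// lives as long as the reference does.
// (Hypothetical helper; `double*` for the source buffer is an assumption.)
inline double sum_via_const_ref(double* buf, std::size_t n) {
    assert(buf != nullptr);
    const double* const& view = &buf[0];
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += view[i];
    }
    return acc;
}

// Workaround form: a plain pointer copy, with no reference to a temporary.
inline double sum_via_pointer(double* buf, std::size_t n) {
    assert(buf != nullptr);
    const double* view = &buf[0];
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += view[i];
    }
    return acc;
}

// Both forms read the same elements, so the results must match.
inline bool forms_agree() {
    double data[3] = {1.0, 2.0, 3.5};
    return sum_via_const_ref(data, 3) == sum_via_pointer(data, 3);
}
```

Because the two forms read the same memory, dropping the `const &` should not change what the generated code computes, only how the pointer is declared.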

To Reproduce

git clone https://github.com/GEOS-ESM/pace.git
cd pace
python -m pip install --upgrade pip wheel setuptools
python -m venv .venv
source .venv/bin/activate
pip install -r requirements_dev.txt
mkdir -p test_data
cd test_data
wget https://portal.nccs.nasa.gov/datashare/astg/smt/pace-regression-data/8.1.3_c12_6_ranks_standard.Remapping.tar.gz
tar -xzvf 8.1.3_c12_6_ranks_standard.Remapping.tar.gz
cd ..
export FV3_DACEMODE=BuildAndRun
export PACE_CONSTANTS=GFS
pytest -v -s --data_path=./test_data/8.1.3/c12_6ranks_standard/dycore \
    --backend=dace:cpu --which_modules=Remapping --which_rank=0 \
    --threshold_overrides_file=./fv3core/tests/savepoint/translate/overrides/standard.yaml \
    ./fv3core/tests/savepoint
@alexnick83
Contributor

Is it the const & or the combination with const * that creates the issue? I suspect that it is the latter. Could you write here the relevant SDFG information (Memlet expression and data container information; if one of the data is a view then please also write the data connected to it)?

@FlorianDeconinck
Contributor Author

It's the combination. We are running under a separate branch on my fork where I generate the above as const* x = &y[0] and it's working fine.

The issue is that, since it's an internal GCC failure, the pattern doesn't fail every time. I'll endeavor to get the smallest failing SDFG I can.

@FlorianDeconinck
Copy link
Contributor Author

Here's the smallest SDFG that triggers it, albeit a big one, I'm afraid. A load/compile shows it: remapping.sdfgz.zip. As usual with GitHub, I appended .zip, but it's a straight .sdfgz.

@FlorianDeconinck
Contributor Author

@FlorianDeconinck
Contributor Author

Bumping this issue.

It's still very much present with the latest DaCe on main. At the moment, we have to run our own fork of DaCe, which is far from ideal. How can we move forward?

@alexnick83
Contributor

Can you please check whether #1522 fixes the issue?

@FlorianDeconinck
Contributor Author

#1522 fixes it.

github-merge-queue bot pushed a commit that referenced this issue Feb 22, 2024
@tbennun tbennun closed this as completed Feb 26, 2024
github-merge-queue bot pushed a commit that referenced this issue May 9, 2024
Follow up of #1460 

- [x] Fixed the `ci` script (including `git checkout issues` around
selecting the correct `dace`)
- [x] Move `D_SW` to execute only on rank 0 to avoid rebuild
- [x] Swapped Riemann Solver on C-grid for D-grid for better coverage

~~WARNING: this PR is blocked by #1477~~
~~WARNING: this PR is blocked by #1568~~

---------

Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>