clang crashes on riscv64 #50090

Closed
sergev mannequin opened this issue Jun 17, 2021 · 9 comments
Labels
backend:RISC-V bugzilla Issues migrated from bugzilla question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!

Comments

@sergev
Mannequin

sergev mannequin commented Jun 17, 2021

Bugzilla Link: 50746
Version: 11.0
OS: Linux
CC: @asb, @topperc, @DimitryAndric, @efriedma-quic, @frasercrmck, @luismarques, @zygoloid

Extended Description

I use Debian 11 installed on a RISC-V platform: a Nezha board with an Allwinner D1 processor.

I installed clang as usual: "sudo apt install clang". The version is 1:11.0-51+nmu5. The source of the packages is http://ftp.ports.debian.org/debian-ports/.

When I run clang from the command line without parameters, it crashes with this message:

$ clang
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments: clang
1. Compilation construction
/usr/lib/riscv64-linux-gnu/libLLVM-11.so.1(_ZN4llvm3sys15PrintStackTraceERNS_11raw_ostreamE+0x28)[0x3fec874c08]
Illegal instruction

Other compilers seem to work fine. I checked gcc, rustc, go.

Information about the system:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 11 (bullseye)
Release: 11
Codename: bullseye

$ uname -a
Linux nezha 5.4.61 #68 PREEMPT Tue Jun 1 04:18:22 UTC 2021 riscv64 GNU/Linux

$ /usr/sbin/hwinfo --short
cpu:
  rv64imafdcvu
keyboard:
  /dev/ttyS0 serial console
network:
  eth0 ARM Ethernet controller
  wlan0 ARM Ethernet controller
  Network controller
network interface:
  eth0 Ethernet network interface
  lo Loopback network interface
  sit0 Network Interface
  wlan0 WLAN network interface
disk:
  /dev/mmcblk0 Disk
partition:
  /dev/mmcblk0p1 Partition
  /dev/mmcblk0p2 Partition
  /dev/mmcblk0p3 Partition
  /dev/mmcblk0p4 Partition
  /dev/mmcblk0p5 Partition
  /dev/mmcblk0p6 Partition
  /dev/mmcblk0p7 Partition
  /dev/mmcblk0p8 Partition
hub:
  Linux Foundation 2.0 root hub
  Linux Foundation 1.1 root hub
memory:
  Main Memory

Thanks,
--Serge

@DimitryAndric
Collaborator

I think you should first report this to the Debian package maintainer(s). If you're able, can you build clang from source on this particular system and see whether it crashes in the same way?

@sergev
Mannequin Author

sergev mannequin commented Jun 18, 2021

Let's investigate with gdb.

$ gdb -q /usr/bin/clang-11
Reading symbols from /usr/bin/clang-11...
(No debugging symbols found in /usr/bin/clang-11)
(gdb) r
Starting program: /usr/bin/clang-11
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
0x0000003ff215b098 in ?? () from /usr/lib/riscv64-linux-gnu/libLLVM-11.so.1

Here is the instruction which caused exception:

(gdb) x/i 0x0000003ff215b098
=> 0x3ff215b098: fence.tso

From RISC-V Instruction Set Manual: "The optional FENCE.TSO instruction is encoded as a FENCE instruction with fm=1000, predecessor=RW, and successor=RW. FENCE.TSO orders all load operations in its predecessor set before all memory operations in its successor set, and all store operations in its predecessor set before all store operations in its successor set. This leaves non-AMO store operations in the FENCE.TSO’s predecessor set unordered with non-AMO loads in its successor set."

Note: "The optional FENCE.TSO instruction".
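The encoding quoted from the manual can be checked by hand. The following is a minimal sketch, assuming the standard FENCE layout (fm in bits 31:28, predecessor set in bits 27:24, successor set in bits 23:20, MISC-MEM major opcode, with rd, rs1 and funct3 all zero). The helper names `access_set` and `encode_fence` are made up for this illustration and are not part of any toolchain.

```python
MISC_MEM_OPCODE = 0b0001111  # major opcode shared by FENCE and FENCE.TSO

# Bit positions inside the 4-bit predecessor/successor sets (I, O, R, W).
SET_BITS = {"i": 0b1000, "o": 0b0100, "r": 0b0010, "w": 0b0001}


def access_set(names):
    """Turn a string such as "rw" into the 4-bit set used by FENCE."""
    value = 0
    for ch in names:
        value |= SET_BITS[ch]
    return value


def encode_fence(fm, pred, succ):
    """Build the 32-bit instruction word; rd, rs1 and funct3 stay zero."""
    return (
        (fm << 28)
        | (access_set(pred) << 24)
        | (access_set(succ) << 20)
        | MISC_MEM_OPCODE
    )


FENCE_TSO = encode_fence(0b1000, "rw", "rw")    # fm=1000, pred=RW, succ=RW
FENCE_RW_RW = encode_fence(0b0000, "rw", "rw")  # ordinary fence rw,rw

print(hex(FENCE_TSO))    # 0x8330000f
print(hex(FENCE_RW_RW))  # 0x330000f
```

The two words differ only in the fm field, which is why a core that ignores fm would execute fence.tso as a plain fence rw,rw.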

The Allwinner D1 processor does not support this instruction, which is why it raises an exception. So it's wrong for clang to generate it unconditionally. I'm not sure how clang was built for Debian. Maybe this option was enabled somehow, in which case it's the fault of the Debian maintainers.

@asb
Contributor

asb commented Jun 21, 2021

Thanks for the bug report.

It actually is correct for this instruction to be generated unconditionally. As the explanatory note in the ISA manual states, compliant RISC-V implementations should ignore unrecognised values in the 'fm' field, meaning the instruction falls back to a full fence: "The FENCE.TSO encoding was added as an optional extension to the original base FENCE instruction encoding. The base definition requires that implementations ignore any set bits and treat the FENCE as global, and so this is a backwards-compatible extension."

As you'll see in table A.6 in the ISA manual, fence.tso is part of the standard lowerings to map the C/C++ memory model to RISC-V. In cores that don't implement fence.tso specifically, this should just be a stronger fence than necessary.
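The fallback described above can be modelled in a few lines. This is only a sketch of the two behaviours discussed in this thread, not real decoder logic: `decode_compliant` stands for a core without dedicated FENCE.TSO support that ignores the fm field, and `decode_strict` is a hypothetical model of a core that traps on any non-zero fm, which is how the D1 appears to behave according to the gdb session above. All names here are invented for illustration.

```python
MISC_MEM_OPCODE = 0b0001111
FENCE_TSO = 0x8330000F  # fm=1000, pred=RW, succ=RW


class IllegalInstruction(Exception):
    """Stands in for the SIGILL seen by the user program."""


def split_fence(word):
    """Extract the fields of a FENCE-format instruction word."""
    if (word & 0x7F) != MISC_MEM_OPCODE or ((word >> 12) & 0x7) != 0:
        raise IllegalInstruction(hex(word))
    fm = (word >> 28) & 0xF
    pred = (word >> 24) & 0xF
    succ = (word >> 20) & 0xF
    return fm, pred, succ


def decode_compliant(word):
    """Ignore fm: execute as a normal fence over the same sets."""
    _fm, pred, succ = split_fence(word)
    return ("fence", pred, succ)


def decode_strict(word):
    """Hypothetical non-compliant core: reject any fm other than 0000."""
    fm, pred, succ = split_fence(word)
    if fm != 0:
        raise IllegalInstruction(hex(word))
    return ("fence", pred, succ)


print(decode_compliant(FENCE_TSO))  # ('fence', 3, 3), i.e. fence rw,rw
```

Under the compliant model the result is a fence that is stronger than fence.tso requires, so the program stays correct. Under the strict model the same word raises, which matches the "Illegal instruction" reported at the top of this issue.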

We can add a flag to enable a different lowering, but because of this hardware bug, that core is going to have problems with any code not compiled specifically for it.

@sergev
Mannequin Author

sergev mannequin commented Jun 29, 2021

Thank you for the explanation. I understand that it's probably a mistake by the Allwinner D1 chip designers to treat FENCE.TSO as an undefined opcode instead of a regular FENCE. That would give stricter semantics than needed, but it would work. It's the first version of the chip. Hopefully this issue will be fixed, or at least documented in the errata sheet.

Anyway, I managed to build clang-13 from sources (https://github.com/llvm/llvm-project.git) and it works pretty well. It seems the FENCE.TSO issue has been resolved somehow. I built clang directly on the RISC-V board itself, under Debian. It took a few days to finish, as a single-core 1 GHz processor with 1 GB of RAM is clearly not enough for such a huge source base, but it worked. I don't see any 'Illegal instruction' exceptions, neither from binaries I compile with clang nor from other Linux software.

So I guess we can consider the FENCE.TSO issue resolved, as the latest clang works fine.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
@jrtc27
Collaborator

jrtc27 commented Jan 6, 2022

Not a bug, and reporter says it's also no longer causing issues

@sanderjo

sanderjo commented Mar 23, 2022

Anyway, I managed to build clang-13 from sources (https://github.com/llvm/llvm-project.git) and it works pretty well. Seems like the FENCE.TSO issue has been resolved somehow. I build clang directly on the RISC-V board itself, under Debian. I took a few days

Can you tell how you did that? Because even the plain

git clone https://github.com/llvm/llvm-project.git

failed on my Sipeed Lichee (D1, 512MB RAM) with

fatal: fetch-pack: invalid index-pack output

So if that's already too much for my D1, I have doubts about the real compiling process.

EDIT:

Ah, git gets killed by the oom-killer:

[Wed Mar 23 12:21:42 2022] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),task=git,pid=3087,uid=1000
[Wed Mar 23 12:21:42 2022] Out of memory: Killed process 3087 (git) total-vm:522396kB, anon-rss:303800kB, file-rss:316kB, shmem-rss:0kB, UID:1000 pgtables:656kB oom_score_adj:0
[Wed Mar 23 12:21:42 2022] oom_reaper: reaped process 3087 (git), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

@brucehoult

Can you tell how you did that? Because even the plain

git clone https://github.com/llvm/llvm-project.git

failed on my Sipeed Lichee (D1, 512MB RAM) with

512 MB is just barely enough to run a modern Linux desktop and do light development work.

LLVM is not light development work.

The git repo is 1.1 GB. Git expects to be able to hold a repo in RAM, to process it quickly.

You may be able to make it work using a large enough swap space. This will be very slow.

I wouldn't recommend trying to build LLVM on a machine with less than 8 GB RAM. At minimum, maximise what you can do with that 512 MB by not running a GUI.
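Whether RAM plus swap is in the right range can be estimated before attempting the clone. This is a rough sketch only: it parses text in the format of /proc/meminfo and compares MemTotal plus SwapTotal against the 1.1 GB repository size quoted above. Using that figure as a threshold is an assumption of this sketch, not a documented git requirement, and the function names are hypothetical.

```python
REPO_SIZE_KB = int(1.1 * 1024 * 1024)  # the 1.1 GB figure quoted above, in kB


def parse_meminfo(text):
    """Return the numeric fields of /proc/meminfo-style text, in kB."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def ram_plus_swap_kb(fields):
    """Total memory the kernel can hand out before the OOM killer steps in."""
    return fields.get("MemTotal", 0) + fields.get("SwapTotal", 0)


def likely_enough_for_clone(fields, needed_kb=REPO_SIZE_KB):
    """Heuristic only: True if RAM plus swap is at least the repo size."""
    return ram_plus_swap_kb(fields) >= needed_kb


# Example with made-up numbers for a 512 MB board without swap.
sample = "MemTotal:  512000 kB\nSwapTotal:      0 kB\n"
print(likely_enough_for_clone(parse_meminfo(sample)))  # False
```

On a real system the input would come from reading /proc/meminfo. A True result does not guarantee success; it only indicates that the oom-killer is less likely to fire during the clone.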

@sanderjo

@brucehoult Thanks. I agree. My hope is on a compiled LLVM that does run on my D1

@brucehoult

You should be able to copy the SD card to a PC/Mac and boot it on qemu-system-riscv64 with lots of RAM and maybe lots of CPU cores.

You can most definitely do that with the Ubuntu images for the HiFive Unmatched, which should build binaries compatible with Debian and with the Nezha boards (they are all RV64GC after all).

https://wiki.ubuntu.com/RISC-V

The qemu instructions there might also help with booting the D1 image on qemu.

@EugeneZelenko EugeneZelenko added the question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead! label Aug 31, 2023

6 participants