Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

busybox free printing stray characters #372

Closed
vineetgarc opened this issue Apr 28, 2021 · 23 comments
Closed

busybox free printing stray characters #372

vineetgarc opened this issue Apr 28, 2021 · 23 comments

Comments

@vineetgarc
Copy link
Contributor

vineetgarc commented Apr 28, 2021

Busybox free prints stray characters


Linux version 5.6.0-00223-gf03b92a6f9a7 (gcc version 10.2.0 (Buildroot 2021.02-6-g5e29ba7bf732)) #1 PREEMPT Tue Apr 20 11:54:40 PDT 2021
Memory @ 80000000 [1024M] 
Memory @ 100000000 [1024M] Not used
OF: fdt: Machine model: snps,zebu_hs
earlycon: uart8250 at MMIO32 0xf0000000 (options '115200n8')
printk: bootconsole [uart8250] enabled

IDENTITY	: ARCVER [0x54] ARCNUM [0x0] CHIPID [0xffff]
processor [0]	: HS38 R3.10a (ARCv2 ISA) 
ISA Extn	: atomic ll64 unalign mpy[opt 7] div_rem 
BPU		: partial match, cache:256, Predict Table:2048 Return stk: 4
MMU [v4]	: 8k/2M, swalk 2 lvl, JTLB 256x4, uDTLB 8, uITLB 4, PAE40 (not used) 
I-Cache		: 64K, 4way/set, 64B Line, VIPT aliasing
D-Cache		: 64K, 2way/set, 64B Line, PIPT
Peripherals	: 0xc0000000
Timers		: Timer0 Timer1 
Vector Table	: 0x80000000 [64-bit]
archs-intc	: 16 priority levels (default 1) FIRQ (not used) hw autosave
...
# free -m
              total        used        free      shared  buff/cache   available
Mem:           1013           3         965 %1@llu %1@�lu %1@�pu
Swap:   %1@llu %1@�lu %1@�pu

This is off of Buildroot 2021.02, upstream gcc 10.2 + (foss-for-synopsys-dwc-arc-processors/gcc@5aeabae "arc: Refurbish adc/sbc patterns"


BR2_GCC_TARGET_CPU="hs4x"
BR2_OPTIMIZE_2=y
BR2_BINUTILS_VERSION_2_36_X=y
BR2_BINUTILS_VERSION="2.36.1"
BR2_GCC_VERSION_10_X=y
BR2_GCC_VERSION="10.2.0"
BR2_TARGET_OPTIMIZATION="-mfpu=fpud_all"

Linux kernel is built with -O3 (CC_OPTIMIZE_FOR_PERFORMANCE_O3)

@vineetgarc
Copy link
Contributor Author

small test case

void main(void)
{
  unsigned long long a = 12345678901234567, b = 23456789012345678,  c = 34567890123456789,
                                 d = 45678901234567891, e = 56789012345678901, f = 67890123456789012;
  printf("%12llu %11llu %11llu %11llu %11llu %11llu\n", a, b, c, d, e, f);
}

$ arc-linux-gcc printf-6-double.c -O2 -o print6 -g -static -fno-delayed-branch

# /print6 
12345678901234567 23456789012345678 34567890123456789 %llu %�lu %�u

@vineetgarc
Copy link
Contributor Author

Started pouring into glibc printf() call stack - gdb (2021.03-eng1) is useless as it can't seem to unravel the heavily macro'ized vfprintf_internal() code so wading through assembly muck ! Deep inside the state machine for parsing format specifier, there's a little function read_int() which needs to be called 6 times for the 6 format specifiers, however it is called only 4 times, with it bailing early in the 4th round - which explains why only 3 specifiers are parsed and 3 numbers printed in output above,

@vineetgarc
Copy link
Contributor Author

4th read_int() call is tripping because format specifier seems corrupted.

(gdb) p/s (char *)0x51228
"%12llu %11llu %11llu %1\bllu %1\b\200lu %1\b\200\030u\n"

Expected
"%12llu %11llu %11llu %11llu %11llu %11llu\n"

The control characters in the 4th format specifier is the ones causing the issue.

objump of the binary shows the format specifier string is OK in the binary

Disassembly of section .rodata:
51228: 25 31 32 6c 6c 75 20 25 % 1 2 l l u ‘space’ %
51230: 31 31 6c 6c 75 20 25 31 1 1 l l u ‘space’ % 1
51238: 31 6c 6c 75 20 25 31 31 1 l l u 'space' % 1 1
51240: 6c 6c 75 20 25 31 31 6c
51248: 6c 75 20 25 31 31 6c 6c

@vineetgarc
Copy link
Contributor Author

vineetgarc commented Apr 28, 2021

At the time of main() entry the string is already clobbered.

Breakpoint 4, 0x00010300 in ?? () <-- main
2: x/i $pc
=> 0x10300: push_s blink
(gdb) x/b 0x5123e <-- OK
0x5123e: 0x31
(gdb) x/b 0x5123f <-- NOK
0x5123f: 0x08
(gdb) p/s (char *)0x51228
$3 = 0x51228 "%12llu %11llu %11llu %1\bllu %1\b\200lu %1\b\200\030u\n"

Next was to check the page contents when it first wired up into user-space. Address 0x5_123f is in .rodata, but shares the physical page with .text too.

  [ 3] .text             PROGBITS        00010140 000140 0408e0 00  AX  0   0  4
  [ 4] __libc_freeres_fn PROGBITS        00050a20 040a20 0007f8 00  AX  0   0  4
  [ 5] .fini             PROGBITS        00051218 041218 00000a 00  AX  0   0  1
  [ 6] .rodata           PROGBITS        00051224 041224 0185b0 00   A  0   0  4

This is wired up at the time libc startup due to a jump to .text in that page

@vineetgarc
Copy link
Contributor Author

vineetgarc commented Apr 28, 2021

Observed the ITLB Misses for the process, until one pertaining to 0x5_0000 hits (page size on ARC is 8k so 0x5_0000 to 0x5_1FFF). After update_mmu_cache() looked at the physical page containing the .rodata

  tlbpd0: 00050248: vaddr=      50000 sz0 Vg asid=48
  tlbpd1: be78a05b: paddr=   be78a000 k:RwX u:RwX f:C

The address of interest is 0x5123f => 0xbe78_a000 + 0x123f = 0xbe78_b23f
And it is clobbered even then. So the binary on host is fine, but somehow corrupted on target. It must be linux kernel causing this then.

Used nsim trace output to see where this address is being populated. Sure enough during initramfs unpacking / populating it is being memcpy() - with incorrect byte.

[0x80edee74] ldd.ab  r4,[r1,0x8] : ldd [0xbf17b3b0] => 0x08 31 25 20 75 6c 6c 31
                                                         ^^
[0x80edee84] std.ab  r4,[r3,0x8] : sw [0xbe78_b238] <= 0x08 31 25 20 75 6c 6c 31
                                                         ^^

@vineetgarc
Copy link
Contributor Author

The memcpy src in turn is being setup in kernel initramfs gzip inflate code - since initramfs is GZIP compressed per build.
Interestingly if I disable the initramfs compression

-CONFIG_INITRAMFS_COMPRESSION_GZIP=y
+CONFIG_INITRAMFS_COMPRESSION_NONE=y

and rebuild the kernel, the problem goes away with everything else being same

# /print6 
12345678901234567 23456789012345678 34567890123456789 45678901234567891 56789012345678901 67890123456789012

@vineetgarc
Copy link
Contributor Author

vineetgarc commented Apr 28, 2021

The memcpy source in turn is populated from a different copy loop in inflaste_fast() => initramfs gzip inflate code

|0x80d783d6> lp             0xe : LP_START <= 0x80d783da, LP_END <= 0x80d783e2, LP_COUNT = 3 *
[0x80d783da] ldd.a          r16,[r23,0x8] : ldd [0xbf17b3a9] => 0x08312520756c6c31
                                                                  ^^
[0x80d783de] std.ab         r16,[r13,0x8] :  sw [0xbf17b3b0] <= 0x08312520756c6c31
                                                                  ^^
[0x80d783da] ldd.a          r16,[r23,0x8] : ldd [0xbf17b3b1] => 0x8008312520756c6c
[0x80d783de] std.ab         r16,[r13,0x8] :  sw [0xbf17b3b8] <= 0x8008312520756c6c
[0x80d783da] ldd.a          r16,[r23,0x8] : ldd [0xbf17b3b9] => 0x188008312520756c
<0x80d783de| std.ab         r16,[r13,0x8] :  sw [0xbf17b3c0] <= 0x188008312520756c

In the first 8-byte copy, src and dst overlap which is NOK since these are short pointers, so only need to copy 2 bytes and not 8.

The relevant source code from Linux kernel : lib/zlib_inflate/inffast.c

271:		    sout = (unsigned short *)(out - OFF);
		    if (dist > 2) {
			unsigned short *sfrom = (unsigned short *)(from - OFF);
			loops = len >> 1;
			do
#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS     <-- this is enabled for ARCv2
279:			    PUP(sout) = PUP(sfrom);
#else
			    PUP(sout) = UP_UNALIGNED(sfrom);
#endif
			while (--loops);
			out = (unsigned char *)sout + OFF;
			from = (unsigned char *)sfrom + OFF;
		    }

So the issue is why is gcc generating 8-byte LDD when only 2 bytes are intended (given the short/uint16 pointer).
The culprit byte is 8th one, so if gcc was generating 2bytes LDH/STH the issue won't happen.

So summarize: the issue is gcc generating an 8 byte at a time access (LDD/STD) for a short pointer access which requires 2 bytes at a time.

Not sure of related, but arc64 gcc has a similar problem with unaligned access where it fetches more data than warranted (and the reason I disable unaligned access for arc64).

@vineetgarc
Copy link
Contributor Author

Weied that building the specific kernel file with -mno-ll64 changes the access to LDH/STH, however the issue remains (so happening elsewhere too)

@vineetgarc
Copy link
Contributor Author

testcase for gcc is attached:
arc-linux-gcc-O3 -c inffast.i --save-temps
inffast.tar.gz

look for zol generated at .L52

FWIW the problem is seen with upstream gcc 10.2, as well as gcc from 2020.09 release

@vineetgarc
Copy link
Contributor Author

vineetgarc commented Apr 29, 2021

FWIW the issue remains (gcc generating LDD/STD for uint16 * access) even with -mno-unaligned-access

With -mno-ll64 (for just this file), gcc still does 8 byte access in loop (vs. 2) albeit with two LD/ST instead of single LDD/STD

	mov	lp_count,r24
	lp	@.L223
	.align 2
.L52:
# ../lib/zlib_inflate/inffast.c:281:    PUP(sout) = PUP(sfrom);
	ld.a	r16,[r23,8]
	ld	r17,[r23,4]
	st	r17,[r13,4]
	st.ab	r16,[r13,8]
	.align 2
.L223:
	; ZOL_END, begins @.L52

And to prove that this and only this is the issue I replaced the "C" code with inline asm (and switched back -ll64 for whole file)

#if 0
			    PUP(sout) = PUP(sfrom);
#else
				asm volatile(
					"add %0, %0, 2	\r\n"
					"add %1, %1, 2	\r\n"
					"ldh %2, [%1]	\r\n"
					"sth %2, [%0]	\r\n"
				:"+r" (sout), "+r" (sfrom), "=r" (tmpreg)::"memory");
#endif

And the problem goes away

# /print6 
12345678901234567 23456789012345678 34567890123456789 45678901234567891 56789012345678901 67890123456789012

Over to you Claudiu. Phew !

@vineetgarc
Copy link
Contributor Author

This seems a mainline issue as I can reproduce it with aarch64 gcc 10.2 as well.
Created a gcc PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363

@abrodkin
Copy link
Member

abrodkin commented May 3, 2021

For the record that's a work-around for the Linux kernel with use of more conservative -O2 optimization:

diff --git a/lib/zlib_inflate/Makefile b/lib/zlib_inflate/Makefile
index 27327d3e9f54..6f97070d08a3 100644
--- a/lib/zlib_inflate/Makefile
+++ b/lib/zlib_inflate/Makefile
@@ -18,3 +18,5 @@ obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate.o

 zlib_inflate-objs := inffast.o inflate.o infutil.o \
                     inftrees.o inflate_syms.o
+
+CFLAGS_inffast.o += -O2

With that contents of initramfs get decompressed correctly, in particularly that fixes #373.

@claziss
Copy link
Contributor

claziss commented May 4, 2021

The movmisalign pattern are active only when we do have unaligned accesses.
foss-for-synopsys-dwc-arc-processors/gcc@082536e

@claziss
Copy link
Contributor

claziss commented May 4, 2021

Per Richard, this is not a bug see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363#c17

I propose to close this issue.

@vineetgarc
Copy link
Contributor Author

vineetgarc commented May 4, 2021

Per Richard, this is not a bug see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363#c17

And I'm Mahatma Gandhi !

I propose to close this issue.

Seriously I'm surprised you say that. Did you even read through the whole bug report. There's clearly something the compiler is doing "more" than it is expected and the only way to make it NOT do that is either lower the optimization level or even worse align a uint16 to byte. Is that the solution you propose we adopt in the Linux kernel.

@vineetgarc
Copy link
Contributor Author

The movmisalign pattern are active only when we do have unaligned accesses.
foss-for-synopsys-dwc-arc-processors/gcc@082536e

Can you please explain a bit more for mere mortals what is trying to do - we already have unaligned access disabled by default - is this disabling something more when we have -munaligned-access. Is this yet more of sweeping the problem under the rug.

@claziss
Copy link
Contributor

claziss commented May 4, 2021

movmisalign is used by autovectorizer to deal with misaligned addresses of the vectors. If a user passes -mno-unaligned-access then generating ldd/ldh/ld instructions may be restricted when using the movmisalign pattern.
Thus, the patch in question just honors the user will of no unaligned access option.

@VVIsaev
Copy link
Contributor

VVIsaev commented Jul 1, 2021

Got this issue one more time (actually Cuper had it when debugging QEMU), now just ps doesn't work and checksums of busybox before and after unpacking differs. This was for almost empty buildroot config with release toolchain (2021.03) and buildroot (2021.05).

Do we plan to fix it somehow? As I can understand the issue is in kernel zlib inflate implementation with -O3 optimization level.
Looks like gcc guys won't fix it (this is not a gcc bug) and Linux guys also won't fix it (they recommended to use -O2).

So we can at least apply this patch #372 (comment) to our buildroot, because we need stable environment for testing.

@abrodkin
Copy link
Member

abrodkin commented Jul 1, 2021

@vineetgarc what's your take on this one? I.e. which would be the best solution to tis med- and long-term?

@cupertinomiranda
Copy link

Can someone please fix this. I want to have a Linux Image for HS that is not crippled with this issues.

@VVIsaev
Copy link
Contributor

VVIsaev commented Jul 2, 2021

@cupertinomiranda please use https://github.com/foss-for-synopsys-dwc-arc-processors/buildroot/tree/arc-2021.05.x-kernel-02 branch with snps_archs38_qemu_defconfig . This configuration uses custom kernel config with kernel optimization level set to O2.

@vineetgarc
Copy link
Contributor Author

Can someone please fix this. I want to have a Linux Image for HS that is not crippled with this issues.

The workaround is already mentioned several times in the issue - the simplest way is to do this w/o any source changes is in your buildroot folder, make linux-menuconfig -> General Setup -> Compiler optimization level -> O2

The actual fix is still pending (neither party wants to fix it) and will take lot more time to resolve correctly.

vineetgarc added a commit to foss-for-synopsys-dwc-arc-processors/linux that referenced this issue Jul 12, 2021
vineetgarc added a commit to foss-for-synopsys-dwc-arc-processors/linux that referenced this issue Jul 12, 2021
vineetgarc added a commit to foss-for-synopsys-dwc-arc-processors/linux that referenced this issue Aug 11, 2021
@VVIsaev
Copy link
Contributor

VVIsaev commented Jan 27, 2022

I think we agree that the issue is related to kernel code and there is nothing to fix in toolchain. So I opened kernel bug for it here: foss-for-synopsys-dwc-arc-processors/linux#68 .

@claziss since we have workaround in the kernel sources (see https://github.com/foss-for-synopsys-dwc-arc-processors/linux/blob/bc5fec251f1087dc999fa744aa64799fc02a5476/lib/zlib_inflate/Makefile#L22) I think you can enable unaligned access back for arc64 linux toolchain. And then please close this, because I don't see what can be done here in toolchain.

@claziss claziss closed this as completed Jan 28, 2022
geomatsi pushed a commit to foss-for-synopsys-dwc-arc-processors/linux that referenced this issue Feb 11, 2022
geomatsi pushed a commit to foss-for-synopsys-dwc-arc-processors/linux that referenced this issue Feb 18, 2022
geomatsi pushed a commit to foss-for-synopsys-dwc-arc-processors/linux that referenced this issue Feb 18, 2022
geomatsi pushed a commit to foss-for-synopsys-dwc-arc-processors/linux that referenced this issue Feb 18, 2022
geomatsi pushed a commit to foss-for-synopsys-dwc-arc-processors/linux that referenced this issue Feb 21, 2022
xxkent pushed a commit to foss-for-synopsys-dwc-arc-processors/linux that referenced this issue Aug 4, 2023
xxkent pushed a commit to foss-for-synopsys-dwc-arc-processors/linux that referenced this issue Oct 10, 2023
xxkent pushed a commit to foss-for-synopsys-dwc-arc-processors/linux that referenced this issue Oct 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants