Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/bisect: detect incorrect bisections #1271

Closed
dvyukov opened this issue Jul 5, 2019 · 19 comments
Closed

pkg/bisect: detect incorrect bisections #1271

dvyukov opened this issue Jul 5, 2019 · 19 comments

Comments

@dvyukov
Copy link
Collaborator

dvyukov commented Jul 5, 2019

Branched off #1051
We could detect incorrect bisection results based on some patch characteristics. The following checks were proposed:

  1. Comment-only change (proposed by Al Viro).
  2. Documentation-only change (proposed to Eric Biggers).
  3. Change to a non-current arch (e.g. only arch/arm/* changes on an x86 instance).
  4. Change only to syscall.*.tbl files (proposed to Eric Biggers).
  5. Changes to tests only (e.g. pointing to tools/testing/selftests/, filter out all of tools/?)
  6. Merges (proposed to Eric Biggers).

These seem to suggest 2 types of incorrect results:

  1. Completely incorrect results (e.g. bisected to comment-only change due to flaky crash).
  2. Correct results from the strict point of view, but not the commit ultimately introduced the bug (syscall.*.tbl changes, KASAN/LOCKDEP changes that detect more bugs).

Open question: do we want to handle them differently? E.g. a change bisected to syscall.*.tbl change is a relevant bit of information too.

Open question: what to do with merges? Usually we falsely point to merges because of flaky crashes or disabled configs. However, they can as well be the right commit. They can also belong to type 2. But they also can be the ultimately correct commit too (depending on what the actual merge did).

@ebiggers
Copy link
Collaborator

ebiggers commented Jul 5, 2019

If the bisection result is not "certainly incorrect" but rather just "probably incorrect", IMO it needs to undergo human review before being mailed out. Otherwise syzbot will continue spamming too many wrong results, decreasing the percent of correct results that get looked into. So I suggest just not automatically mailing out "probably incorrect" results, but leaving them available on the syzbot dashboard and marked as such.

@dvyukov
Copy link
Collaborator Author

dvyukov commented Jul 22, 2019

Makes sense, thanks.

@dvyukov
Copy link
Collaborator Author

dvyukov commented Sep 2, 2019

Detecting comment-only changes is not possible looking at the patch alone. It would require getting full file copy before/after the patch and then doing some analysis.
An interesting option is to compare kernel binaries. If binaries are the same before/after the patch, then definitely the patch does not affect anything and is a false result. This should reliably handle all of 1, 2, 3, 5 and more (even lots of no-op code changes like adding parens, swapping variable declarations, etc).
It's not this simple because e.g. changes to debug info (changed lines when a comment is added) are not actually relevant. Kbuild also embeds build timestamps into binaries.

dvyukov added a commit that referenced this issue Sep 2, 2019
Kbuild inserts build timestamp into every build.
This makes vmlinux change always even if no present.
Make the build more deterministic.
We plan to use it for detecting no-op changes during bisection.

Update #1271
dvyukov added a commit to dvyukov/syzkaller that referenced this issue Sep 2, 2019
dvyukov added a commit that referenced this issue Nov 6, 2019
Add optional build signature for images, currently only implemented for linux.
This can be used in bisection process to detect changes that does not affect kernel.

Update #1271
@dvyukov
Copy link
Collaborator Author

dvyukov commented Nov 6, 2019

@zsm-oss FYI I am going to tackle this, just so that we don't step on each other. This is going to be useful for fix bisection too.

dvyukov added a commit to dvyukov/syzkaller that referenced this issue Nov 7, 2019
dvyukov added a commit that referenced this issue Nov 7, 2019
Detect bisection to merge commits and to commits that don't affect
kernel binary (comments, other arches, whitespaces, etc).
Such bisections are not reported in emails (but shown on web).

Update #1271
@dvyukov
Copy link
Collaborator Author

dvyukov commented Nov 7, 2019

This should be mostly done.
We don't do the "Correct results from the strict point of view, but not the commit ultimately introduced the bug (syscall.*.tbl changes, KASAN/LOCKDEP changes that detect more bugs)" part yet. But it's much less frequent and not as critical as completely bogus results.

dvyukov added a commit that referenced this issue Nov 7, 2019
Show [report pending] for fix bisections that we will send.
Help to analyze what will happen when we enable mailing of fix bisections.

Update #1271
dvyukov added a commit that referenced this issue Nov 25, 2019
@dvyukov
Copy link
Collaborator Author

dvyukov commented Nov 27, 2019

One case of non-deterministic/non-hermetic kernel build:
https://syzkaller.appspot.com/bug?extid=19b469021157c136116a
https://groups.google.com/forum/#!msg/syzkaller-bugs/4FU25pFDMAo/1wME8ZdHBQAJ
https://syzkaller.appspot.com/x/bisect.txt?x=143090dee00000

Bisecting: 107 revisions left to test after this (roughly 7 steps)
[910cd32e552ea09caa89cdbe328e468979b030dd] parisc: Fix and enable seccomp filter support
testing commit 910cd32e552ea09caa89cdbe328e468979b030dd with gcc (GCC) 5.5.0
kernel signature: 53254f14d716ef4e40b90890cbd5d216ccd0186d
...
# git bisect bad 910cd32e552ea09caa89cdbe328e468979b030dd
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[4f4acc9472e54ce702f1d85fc9e6d57767dec91f] parisc: Fix SIGSYS signals in compat case
testing commit 4f4acc9472e54ce702f1d85fc9e6d57767dec91f with gcc (GCC) 5.5.0
kernel signature: aeb302881aba93101d5445ecd72b75dd79be5e59
all runs: OK
# git bisect good 4f4acc9472e54ce702f1d85fc9e6d57767dec91f
910cd32e552ea09caa89cdbe328e468979b030dd is the first bad commit
commit 910cd32e552ea09caa89cdbe328e468979b030dd
Author: Helge Deller <deller@gmx.de>
Date:   Wed Mar 30 14:14:31 2016 +0200

Trying to reproduce this locally with:

KBUILD_BUILD_VERSION=0 KBUILD_BUILD_TIMESTAMP=now KBUILD_BUILD_USER=syzkaller KBUILD_BUILD_HOST=syzkaller CFLAGS="-w -Wno-error=attributes -Wno-error=restrict" HOSTCFLAGS="-w -Wno-error=attributes -Wno-error=restrict" HOSTCC=/usr/local/google/home/dvyukov/syzkaller/bisect_bin/gcc-5.5.0/bin/gcc CC=/usr/local/google/home/dvyukov/syzkaller/bisect_bin/gcc-5.5.0/bin/gcc make CC=/usr/local/google/home/dvyukov/syzkaller/bisect_bin/gcc-5.5.0/bin/gcc  CFLAGS="-w -Wno-error=attributes -Wno-error=restrict" HOSTCFLAGS="-w -Wno-error=attributes -Wno-error=restrict" KERNELVERSION=syzkaller LOCALVERSION=-syzkaller

building on commits 910cd32e552ea09caa89cdbe328e468979b030dd and 4f4acc9472e54ce702f1d85fc9e6d57767dec91f.

$ /tmp/binsig vmlinux.910cd32e552ea09caa89cdbe328e468979b030dd
.text	21639290	8f65e23ec3380384230719e8c3602630802c9200
__ex_table	15232	ca10750e9b0deae09977e7ab63709abb9b28f154
.rodata	7485623	9887d04cf8c795c067cc1ed295e25312e6659886
__bug_table	128316	16fae33bff376f930fd34bd25ab13b5c00fa7bc7
.pci_fixup	12720	eeface69f50fc3e9f2c5fd6d38fdc8938356807f
.builtin_fw	792	a0a6bc32bc802ef7b6b9b1a6140b805bdc4ac437
.tracedata	120	edae3c071d566e365e4b558e07d185322d62592a
__ksymtab	100208	4dd35545e7372c3a7158a2c58b109ca420e31587
__ksymtab_gpl	87392	29a544238591a311411abf691c5d6cf61e59be44
__ksymtab_strings	231192	f71fb6329ee25a029001c3d41a3480e0b81e5c45
__init_rodata	168	7caf4e2d0fcf8d34310ed119489b68a5ad510e38
__param	32240	de182728331e61beebe02b4e0480568e753d2798
__modver	2600	875523441672ba66ec2f715876ce8cebefcbda07
.data	2270784	7726b563f1dfdefcd6b4207b9251ae419f373343
_kprobe_blacklist	32	49da0faf510a49f2b37bff6d2219fbbf20b6f39e
.vvar	4096	1ceaf73df40e531df3bfb26b4fb7cd95fb7bff1d
.data..percpu	100376	ad8045b0a9bc62691cddd6956fa1a28c2e2c334c
.init.text	594518	187ae776657c3cbfad91b567642f017d5804d072
.init.data	661216	9acfb550b6427eac3442b07dd882bd68c5983574
.x86_cpu_dev.init	24	95a4a1d1f72bb713f081a2e9dab97a6497a69f2a
.parainstructions	65420	18758b04e79e529e03f1ac0c3c368e5007092a24
.altinstructions	34320	5d56465475bf65d0a4a6ef4facab222880cc071d
.altinstr_replacement	8807	c78c8c3f198ec91caa2b29ed3b5a7996ec6909ee
.iommu_table	240	a65915a1db1134278ee57463cc6f02308e41edd5
.apicdrivers	40	8982fb06574f4932de44fd0070b28f0ca24bf696
.exit.text	31839	3d29137239e081f2a620328c5ad3c97d561346ae
.smp_locks	69632	a856b18c833bcc827f114cbf7d4644d44f1e43e5
.data_nosave	4096	1ceaf73df40e531df3bfb26b4fb7cd95fb7bff1d
88159ffa1c5f5b380d250a7e5d95608d6ea645a5

$ /tmp/binsig vmlinux.4f4acc9472e54ce702f1d85fc9e6d57767dec91f
.text	21639290	8f65e23ec3380384230719e8c3602630802c9200
__ex_table	15232	ca10750e9b0deae09977e7ab63709abb9b28f154
.rodata	7485623	5377417846e39eb55691e7b9392f4150e8238167
__bug_table	128316	16fae33bff376f930fd34bd25ab13b5c00fa7bc7
.pci_fixup	12720	eeface69f50fc3e9f2c5fd6d38fdc8938356807f
.builtin_fw	792	a0a6bc32bc802ef7b6b9b1a6140b805bdc4ac437
.tracedata	120	edae3c071d566e365e4b558e07d185322d62592a
__ksymtab	100208	4dd35545e7372c3a7158a2c58b109ca420e31587
__ksymtab_gpl	87392	29a544238591a311411abf691c5d6cf61e59be44
__ksymtab_strings	231192	f71fb6329ee25a029001c3d41a3480e0b81e5c45
__init_rodata	168	7caf4e2d0fcf8d34310ed119489b68a5ad510e38
__param	32240	de182728331e61beebe02b4e0480568e753d2798
__modver	2600	875523441672ba66ec2f715876ce8cebefcbda07
.data	2270784	7726b563f1dfdefcd6b4207b9251ae419f373343
_kprobe_blacklist	32	49da0faf510a49f2b37bff6d2219fbbf20b6f39e
.vvar	4096	1ceaf73df40e531df3bfb26b4fb7cd95fb7bff1d
.data..percpu	100376	ad8045b0a9bc62691cddd6956fa1a28c2e2c334c
.init.text	594518	187ae776657c3cbfad91b567642f017d5804d072
.init.data	661216	f48009c823b40f737b8c2680d125ddfcfd016bfa
.x86_cpu_dev.init	24	95a4a1d1f72bb713f081a2e9dab97a6497a69f2a
.parainstructions	65420	18758b04e79e529e03f1ac0c3c368e5007092a24
.altinstructions	34320	5d56465475bf65d0a4a6ef4facab222880cc071d
.altinstr_replacement	8807	c78c8c3f198ec91caa2b29ed3b5a7996ec6909ee
.iommu_table	240	a65915a1db1134278ee57463cc6f02308e41edd5
.apicdrivers	40	8982fb06574f4932de44fd0070b28f0ca24bf696
.exit.text	31839	3d29137239e081f2a620328c5ad3c97d561346ae
.smp_locks	69632	a856b18c833bcc827f114cbf7d4644d44f1e43e5
.data_nosave	4096	1ceaf73df40e531df3bfb26b4fb7cd95fb7bff1d
df4873e341de435636e894242dd5572aa96ebff5

I see small content diffs in .rodata and .init.data.
Actually trying full clean build on just 910cd32e55, I see the same.
Is deterministic kernel build a myth? Or we need more magic env vars?....

@dvyukov
Copy link
Collaborator Author

dvyukov commented Nov 27, 2019

The .rodata diff is:

003C 8640: D6 86 17 83 FF FF FF FF  D8 86 17 83 FF FF FF FF  ........ ........ 
003C 8640: D7 86 17 83 FF FF FF FF  D8 86 17 83 FF FF FF FF  ........ ........

.init.data:

000A 1670: B8 B8 9A 5B 1A B9 A2 89  1B 43 69 43 1C FA 4C 60  ...[.... .CiC..L`
000A 1680: 8C 94 D4 32 06 06 06 03  64 3B 8D C0 94 A1 05 0E  ...2.... d;......
000A 1690: AD 06 86 44 DA 69 8A C6  77 46 B2 53 3F 39 3F AF  ...D.i.. wF.S?9?.
000A 16A0: 38 3F 27 15 DD 6E 63 88  7F 9D 0D A8 EC 5F 98 5B  8?'..nc. ....._.[
000A 16B0: 0C 8A F2 F3 4B 50 EC 24  0C 88 55 87 0E 9C 60 8C  ....KP.$ ..U...`.
000A 16C0: 90 20 47 4F 1F D7 20 45  45 45 06 22 00 00 C2 CD  . GO.. E EE."....
000A 16D0: 46 4A 00 02 00 00 00 00  86 00 00 00 00 00 00 00  FJ...... ........
vs
000A 1670: B8 B8 9A 3B 1A 5B A0 89  1B 43 69 43 1C FA 4C 60  ...;.[.. .CiC..L`
000A 1680: 8C 94 D4 32 06 06 06 03  64 3B 8D C0 94 A1 05 0E  ...2.... d;......
000A 1690: AD 06 86 44 DA 69 8A C6  77 46 B2 53 3F 39 3F AF  ...D.i.. wF.S?9?.
000A 16A0: 38 3F 27 15 DD 6E 63 88  7F 9D 0D A8 EC 5F 98 5B  8?'..nc. ....._.[
000A 16B0: 0C 8A F2 F3 4B 50 EC 24  0C 88 55 87 0E 9C 60 8C  ....KP.$ ..U...`.
000A 16C0: 90 20 47 4F 1F D7 20 45  45 45 06 22 00 00 73 55  . GO.. E EE."..sU
000A 16D0: 87 63 00 02 00 00 00 00  86 00 00 00 00 00 00 00  .c...... ........
vs
000A 1670: B8 B8 9A 3B 1B 19 A1 89  1B 43 69 43 1C FA 4C 60  ...;.... .CiC..L`  
000A 1680: 8C 94 D4 32 06 06 06 03  64 3B C1 66 19 19 5A E0  ...2.... d;.f..Z.  
000A 1690: D0 6A 60 48 A4 9D A6 68  7C 67 24 3B F5 93 F3 F3  .j`H...h |g$;....  
000A 16A0: 8A F3 73 52 D1 ED 36 86  F8 D7 D9 80 CA FE 85 B9  ..sR..6. ........  
000A 16B0: C5 A0 28 3F BF 04 C5 4E  C2 80 58 75 E8 C0 09 C6  ..(?...N ..Xu....  
000A 16C0: 08 09 72 F4 F4 71 0D 52  54 54 64 20 02 00 00 22  ..r..q.R TTd ..."  
000A 16D0: 18 D6 FC 00 02 00 00 00  87 00 00 00 00 00 00 00  ........ ........  

@dvyukov
Copy link
Collaborator Author

dvyukov commented Nov 27, 2019

The .rodata diff points somewhere into kallsyms_addresses?

  [ 4] .rodata           PROGBITS         ffffffff82600000  01601000
       00000000007238b7  0000000000000000   A       0     0     256

ffffffff828ac968 R kallsyms_addresses
ffffffff829d7608 R kallsyms_num_syms

@dvyukov
Copy link
Collaborator Author

dvyukov commented Nov 27, 2019

For .init.data I can't figure what it belongs to:

  [20] .init.data        PROGBITS         ffffffff830d7000  0208d000
       00000000000a16e0  0000000000000000  WA       0     0     4096

That's 0xffffffff830d7000 + 0xa1670 = 0xffffffff83178670
which contains some text?

ffffffff83178640 0000000000000008 t __initcall_apparmor_init
ffffffff83178648 0000000000000008 t __initcall_integrity_iintcache_init
ffffffff83178650 T __initramfs_start
ffffffff83178650 t __irf_start
ffffffff83178650 T __security_initcall_end
ffffffff831786d7 t __irf_end
ffffffff831786d8 T __initramfs_size
ffffffff831786e0 0000000000000008 r __cpu_dev_intel_cpu_dev
ffffffff831786e0 R __x86_cpu_dev_start
ffffffff831786e8 0000000000000008 r __cpu_dev_amd_cpu_dev
ffffffff831786f0 0000000000000008 r __cpu_dev_centaur_cpu_dev
ffffffff831786f8 R __parainstructions
ffffffff831786f8 R __x86_cpu_dev_end
ffffffff83188684 R __parainstructions_end
ffffffff83188688 R __alt_instructions
ffffffff83190c98 R __alt_instructions_end

dvyukov added a commit that referenced this issue Nov 27, 2019
3/4 added tests currently fail.
The problem is that we don't collect results
from initial bisection range detection.
As the result we won't detect "same binary"
for release and first commit in a release.

Update #1271
dvyukov added a commit that referenced this issue Nov 27, 2019
They should have been detected by "same binary" logic.
But the problem is that we may use different compilers
for different commits and they switch exactly at release commits.
So we can build the release with a differnet compiler than the
rest of commits and then obviously it won't be "same binary".
Detect release commits separately.

Update #1271
@dvyukov
Copy link
Collaborator Author

dvyukov commented Dec 4, 2019

Weird case or I don't understand something:
https://groups.google.com/d/msg/syzkaller-bugs/8gr_KXp-SYc/O_KcmdN5AQAJ
This is bisected to 2de50e9674fc4ca3c6174b04477f69eb26b4ee31
and it has this parent:
$ git log -n 1 --format="%P" 2de50e9674fc4ca3c6174b04477f69eb26b4ee31
388f7b1d6e8ca06762e2454d28d6c3c55ad0fe95
But the parent was never tested... how is this possible?

Another similar case:
https://syzkaller.appspot.com/bug?extid=4455ca3b3291de891abc
Points to 2de50e9674fc4ca3c6174b04477f69eb26b4ee31
$ git log -n 1 --format=%P 2de50e9674fc4ca3c6174b04477f69eb26b4ee31
388f7b1d6e8ca06762e2454d28d6c3c55ad0fe95
which was not tested...

@dvyukov
Copy link
Collaborator Author

dvyukov commented Dec 5, 2019

Moved the git bisect parent commit issue to #1527 (too much context and info).

@dvyukov
Copy link
Collaborator Author

dvyukov commented Jan 8, 2020

Another non-deterministic build:
https://groups.google.com/g/syzkaller-bugs/c/XEbliACT0zA/m/LIKSSEccCgAJ

@hogander-unikie
Copy link
Contributor

init.data part of this bug could be due to initramfs which is part of vmlinux. if CONFIG_BLK_DEV_INITRD is enabled in configuration and CONFIG_INITRAMFS_SOURCE is not -> small default initramfs is generated. This is in usr/Makefile. For some reason content of this generated initramfs in init.data is varying from build to build. This could be easily get rid of by disabling CONFIG_BLK_DEV_INITRD from syzbot configs?

@dvyukov
Copy link
Collaborator Author

dvyukov commented Oct 2, 2020

Good find!
Indeed I see diff in usr/initramfs_data.cpio after just removing old one and re-running make.
We can drop CONFIG_BLK_DEV_INITRD as we don't use initramfs, but wonder if the non-determinism it fixable easily maybe.

@dvyukov
Copy link
Collaborator Author

dvyukov commented Oct 2, 2020

I guess it's this guy in usr/gen_init_cpio.c

	default_mtime = time(NULL);

which means we can't fix it in the older kernels. I hoped for some env var or something similar.

@dvyukov
Copy link
Collaborator Author

dvyukov commented Oct 2, 2020

I think we need to remove CONFIG_BLK_DEV_INITRD in pkg/vcs/linux.go:linuxAlterConfigs. This will auto-heal all bisections even with already stored configs with CONFIG_BLK_DEV_INITRD=y.
Probably better to drop CONFIG_BLK_DEV_INITRD from current configs as well.

Jouni you previously provided a link to some page called along the lines of "Deterministic Kernel Builds". Can make sense to notify that group about the issue with CONFIG_BLK_DEV_INITRD.

@hogander-unikie
Copy link
Contributor

Jouni you previously provided a link to some page called along the lines of "Deterministic Kernel Builds". Can make sense to notify that group about the issue with CONFIG_BLK_DEV_INITRD.

Yes, I will ping them or update the document by myself.

@dvyukov
Copy link
Collaborator Author

dvyukov commented Oct 2, 2020

I think we need to remove CONFIG_BLK_DEV_INITRD in pkg/vcs/linux.go:linuxAlterConfigs. ...

@hogander-unikie will you send a PR for this? Want to understand if I need to keep track of this or I can wait for a PR.

@hogander-unikie
Copy link
Contributor

I think we need to remove CONFIG_BLK_DEV_INITRD in pkg/vcs/linux.go:linuxAlterConfigs. ...

@hogander-unikie will you send a PR for this? Want to understand if I need to keep track of this or I can wait for a PR.

Yes, I will prepare pull request for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants