panic - 'vdso symbol __vdso_clock_getres's real size is 10 bytes, but trying to replace it with 16 bytes' #16

Closed
1 of 6 tasks
androm3da opened this issue Nov 23, 2022 · 15 comments
Labels
bug Something isn't working

Comments

@androm3da

androm3da commented Nov 23, 2022

Describe the bug

I hit this panic using hermit built from 95b3ac7. The development node I'm allocated is a VM that models only a very limited PMU. That may not be related to the vdso panic, though.

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.6 LTS
Release:	18.04
Codename:	bionic
$ cat /proc/version
Linux version 5.4.0-120-generic (buildd@lcy02-amd64-037) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #136~18.04.1-Ubuntu SMP Fri Jun 10 18:00:44 UTC 2022

2022-11-23T01:04:51.638065Z  WARN reverie_ptrace::perf: Pmu bugs detected: HardwareCountersNotWorking { actual_events: 0, expected_min_events: 500, config: 5308625 }
thread 'main' panicked at 'vdso symbol __vdso_clock_getres's real size is 10 bytes, but trying to replace it with 16 bytes', /local/mnt/workspace/install/rust/git/checkouts/reverie-9a587e40a0d7d3be/6f03658/reverie-ptrace/src/vdso.rs:162:17

Indicate any of these common scenarios that apply:

  • a program hangs under hermit
  • hermit panics internally
  • hermit runs the program but divergence (nondeterminism) occurs

To Reproduce
Minimal input to reproduce the behavior.

Expected behavior
A clear and concise description of what you expected to happen.

Environment

  • Linux kernel version (uname -a):
  • CPU version (/proc/cpuinfo):
  • Linux distro flavor (/etc/issue, /etc/redhat-release):

Additional context
Attach the logs to this issue as a text file generated by hermit --log=trace --log-file=FOO run.

Add any other context about the problem here.

hermit_trace.log

@androm3da androm3da added the bug Something isn't working label Nov 23, 2022
@arjo129

arjo129 commented Nov 23, 2022

I have the same problem on Ubuntu 22.04

Linux version 5.15.0-53-generic (buildd@lcy02-amd64-047) (gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #59-Ubuntu SMP Mon Oct 17 18:53:30 UTC 2022

It does feel like the error is coming from reverie though.

@EspenG

EspenG commented Nov 23, 2022

I see a very similar issue. The trace log is attached.
trace.txt

Environment:

$ uname -a
Linux eg 5.15.0-46-generic #49-Ubuntu SMP Thu Aug 4 18:03:25 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Ubuntu 22.04.1 LTS

proc cpuinfo

@samth

samth commented Nov 23, 2022

This is really a bug in reverie (it reproduces when running the reverie strace implementation on ls). It affects both clock_gettime and gettimeofday for me, but not clock_getres. I tested this by commenting out those two entries in VDSO_SYMBOLS in reverie-ptrace/src/vdso.rs, after which I was able to strace /bin/ls successfully.

@samth

samth commented Nov 23, 2022

The following change fixes reverie for me: samth/reverie@a7f6cae but I cannot be sure if it's right.

@jasonwhite
Contributor

I have an upstream fix for this in Reverie that should be getting synced into the repo within the next hour or so. @samth was right on the money with the fix. The NOP padding at the end of the VDSO patches was unnecessary and can just be removed. I also have a fix for the getcpu vdso patch. I'll update this thread when the fix is in.

Thanks @androm3da for reporting this issue! Keep those bug reports coming! :)

@samth

samth commented Nov 23, 2022

I don't think this actually fixes the issue. The problem is now 5 bytes vs 8 bytes, but it still errors. You need the changes to handle things being up to 8-byte aligned that are in my patch, or something different (I still don't know if that change is right).
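
For readers following along, here is a minimal sketch of the kind of size check being discussed. It is not the code from reverie or from the patch linked above; `round_up` and `patch_fits` are hypothetical names, and the sketch assumes that the bytes between the end of a vdso symbol and the next alignment boundary are padding that can be overwritten.

```rust
/// Rounds `size` up to the next multiple of `align`.
/// `align` must be a power of two.
fn round_up(size: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    (size + align - 1) & !(align - 1)
}

/// Hypothetical check: does a patch of `patch_len` bytes fit into a symbol
/// whose reported size is `symbol_size`, if the padding up to the next
/// `align`-byte boundary may also be overwritten?
/// With `align = 1` this degenerates to the strict check from the panic.
fn patch_fits(symbol_size: usize, patch_len: usize, align: usize) -> bool {
    patch_len <= round_up(symbol_size, align)
}
```

With the numbers from this thread, `patch_fits(5, 8, 1)` is false (the strict check that panics), while `patch_fits(5, 8, 8)` is true once padding up to an 8-byte boundary is counted. Whether that padding is really safe to overwrite depends on the vdso layout of the running kernel.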

@jasonwhite
Contributor

@samth Not quite sure I follow. Does facebookexperimental/reverie@debce82 not fix the issue? The vdso patches are now 8 bytes instead of 16 (not 5 bytes).

@samth

samth commented Nov 24, 2022

Right, that commit does not fix the problem. On my system the original vdso entry is 5 bytes.

@jasonwhite
Contributor

Ohh, now I understand. I thought you meant the patch was 5 bytes. That's a pretty small vdso entry size. What distro+version are you running? And what is the kernel version? I'd like to see what those entries are actually doing. Maybe we don't really need to patch them.

@jasonwhite jasonwhite reopened this Nov 24, 2022
@samth

samth commented Nov 24, 2022

It's the same machine as this issue: #18

Ubuntu 22.10 and 5.19.0 is the short answer.

@EspenG

EspenG commented Nov 24, 2022

The fix does not work for me either.

thread 'main' panicked at 'vdso symbol __vdso_clock_gettime's real size is 5 bytes, but trying to replace it with 8 bytes', /home/eg/.cargo/git/checkouts/reverie-9a587e40a0d7d3be/c448d10/reverie-ptrace/src/vdso.rs:148:17

@jasonwhite
Contributor

I was able to reproduce on an Ubuntu 22.04 VM. This is the disassembly of gettimeofday and clock_gettime:

0000000000000bd0 <__vdso_gettimeofday@@LINUX_2.6>:
 bd0:   e9 4b fe ff ff          jmp    a20 <LINUX_2.6@@LINUX_2.6+0xa20>
 bd5:   66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)
 bdc:   00 00 00 00

0000000000000c10 <__vdso_clock_gettime@@LINUX_2.6>:
 c10:   e9 9b fb ff ff          jmp    7b0 <LINUX_2.6@@LINUX_2.6+0x7b0>
 c15:   66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)
 c1c:   00 00 00 00 

In these implementations, it's just a jmp to another internal function. Seems like the inner function didn't get inlined. Luckily, since both of these functions are aligned to 16 bytes via padding, they should be safe to patch. A real fix is landing soon.
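
The disassembly above can be checked by hand. The sketch below decodes the 5-byte `jmp rel32` (opcode `e9`) at the start of each symbol and computes how many bytes are available from the symbol start up to the next alignment boundary. The function names are hypothetical and this is not code from reverie; the available-space figure assumes the next symbol starts at the next 16-byte boundary, as the nop padding in the listing suggests.

```rust
/// Decodes an x86-64 `jmp rel32` (opcode 0xE9) located at `addr` and
/// returns its target, or `None` if `bytes` does not start with one.
fn jmp_rel32_target(addr: u64, bytes: &[u8]) -> Option<u64> {
    if bytes.len() < 5 || bytes[0] != 0xE9 {
        return None;
    }
    // The 32-bit displacement is little-endian and signed.
    let rel = i32::from_le_bytes([bytes[1], bytes[2], bytes[3], bytes[4]]);
    // It is relative to the address of the next instruction (addr + 5).
    Some(addr.wrapping_add(5).wrapping_add(rel as i64 as u64))
}

/// Bytes from `addr` up to the first `align`-byte boundary at or after
/// `addr + code_len`. `align` must be a power of two.
fn space_to_boundary(addr: u64, code_len: u64, align: u64) -> u64 {
    assert!(align.is_power_of_two());
    let end = (addr + code_len + align - 1) & !(align - 1);
    end - addr
}
```

For `__vdso_gettimeofday` at `0xbd0`, the bytes `e9 4b fe ff ff` decode to a jump to `0xa20`, and for `__vdso_clock_gettime` at `0xc10`, the bytes `e9 9b fb ff ff` decode to `0x7b0`, matching the listing. In both cases `space_to_boundary(addr, 5, 16)` gives 16 bytes, which is why an 8-byte patch fits even though the symbol itself is only 5 bytes.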

facebook-github-bot pushed a commit to facebookexperimental/reverie that referenced this issue Nov 29, 2022
Summary: Helper script for debugging facebookexperimental/hermit#16.

Differential Revision: D41565407

fbshipit-source-id: 01cd3121dde671a563a8d2674cc42bfa6bce1226
@androm3da
Author

I am using commit 159c343 and I'm on Ubuntu 20.04.

$ cat /proc/version
Linux version 5.4.0-122-generic (buildd@lcy02-amd64-095) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)) #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022

I did a cargo clean && cargo build && ./target/debug/hermit run --chaos ls -- this seems to fail the same way it did before. Would those steps get this fix from reverie mentioned above (facebookexperimental/reverie@fa44c91) or do I need to purge some cache somewhere?

@jasonwhite
Contributor

@androm3da Try deleting Cargo.lock and doing the build again. (I don't think cargo clean will delete it.) Then, Cargo should pull down the latest commit.

Also, for reference, the commit with the fix is facebookexperimental/reverie@5478e47.

@androm3da
Author

Try deleting Cargo.lock and doing the build again

This did the trick, tyvm @jasonwhite
