RFC: vm: split no output from test machine errors #4808
It's quite confusing that we in fact combine two kinds of problems into one:
1) We are no longer able to execute syzkaller-generated programs.
2) The VM has hung.
Let's use different error messages and different timeouts for these issues. In the first case, keep on monitoring for the program execution logs. In the second case, look at any output from the VM. Additionally, attempt periodic SSH connections.
Now it becomes possible to test (2) using C reproducers.
vmDiagnosisStart      = "\nVM DIAGNOSIS:\n"
lostConnectionCrash   = "lost connection to test machine"
noOutputCrash         = "no output from test machine"
executionStalledCrash = "execution stalled"
This needs to be understandable by kernel developers who know nothing about syzkaller. All they will get is this string and likely not much else (no reproducer, no report).
}

func (mon *monitor) monitorExecution() *report.Report {
	ticker := time.NewTicker(tickerPeriod * mon.inst.pool.timeouts.Scale)
	defer ticker.Stop()

	alive := make(chan bool)
This change needs a test.
I don't understand the exact difference. In the first case it's also likely hung; in the second case it's also not executing programs. So both cases are "no output" and hung. We can already create C reproducers for both of these failures, no?
The periodic
In light of (2), and given that we have explicitly tried to avoid running C reproducers for more than 5 minutes so as not to hit the no-output timeout, it's very strange to see C reproducers in https://syzkaller.appspot.com/bug?extid=2e40940976be9f8fce8ba3d1d03b77aee9f4df9d. They should have remained syz reproducers. So the idea here is to detect kernel hangs more precisely: there must be no output at all, and the VM should no longer accept any new connections. It's the actual
Independently from that, if we test a syz reproducer, we can look for
We print "executing program" in C reproducers:
Hmm, interesting. I've just looked at ~10 random C reproducers from syzbot and I see
Either they were not repeating, or something broke.
The flag is indeed set before we start reproduction: syzkaller/pkg/csource/options.go, line 172 at 4130c19.
But then we clear it right after we have found a reproducer: lines 238 to 242 at 4130c19.
We form the actual repro C code right before sending it to the dashboard, now with: syzkaller/syz-manager/manager.go, line 1068 at 4130c19.
It has been this way for a very long time already (?), so we likely don't have
UPD: sent #4816
TODO: