Unicode in kernel print()s panics comm CPU (Utf8Error) #1379
It might be this line, but please decode the backtrace. The idea is that valid UTF-8 goes in, any partially overwritten characters are left out, and valid UTF-8 goes out, so that …
Ah, the second chunk of lines in the above is already the decoded backtrace.
Right, OK. I'm not sure what causes this. It'd be necessary to apply a local patch and take a look at the data, unless you see any obvious bugs in the log_buffer code.
This turns out to be easy to reproduce, or at least an adjacent bug is: just print a non-ASCII character from the kernel, e.g.

```python
from artiq.experiment import *


class UnicodeKernelLog(EnvExperiment):
    def build(self):
        self.setattr_device("core")

    @kernel
    def run(self):
        print("σ")
```
(This led to:)
The generated IR seems obviously wrong:

```llvm
@S.nn = private unnamed_addr constant [3 x i8] c"n:n"
@S.sn = private unnamed_addr constant [3 x i8] c"s:n"
@S. = private unnamed_addr constant [2 x i8] c"\CF\83"
@typeinfo = local_unnamed_addr global [1 x %D.11*] zeroinitializer
[…]
define private fastcc void @_Z49artiq_run_unicode_kernel_log.UnicodeKernelLog.runzz() unnamed_addr #0 personality i32 (...)* @__artiq_personality !dbg !11 {
entry:
  %rpc.ret.alloc = alloca {}, align 4, !dbg !13
  %.9 = alloca { i8*, i32 }, align 4, !dbg !13
  %.9.repack = getelementptr inbounds { i8*, i32 }, { i8*, i32 }* %.9, i32 0, i32 0, !dbg !13
  store i8* getelementptr inbounds ([3 x i8], [3 x i8]* @S.sn, i32 0, i32 0), i8** %.9.repack, align 4, !dbg !13
  %.9.repack1 = getelementptr inbounds { i8*, i32 }, { i8*, i32 }* %.9, i32 0, i32 1, !dbg !13
  store i32 3, i32* %.9.repack1, align 4, !dbg !13
  %rpc.stack = call i8* @llvm.stacksave(), !dbg !13
  %rpc.args = alloca i8*, align 4, !dbg !13
  %rpc.arg0 = alloca { i8*, i32 }, align 4, !dbg !13
  %rpc.arg0.repack = getelementptr inbounds { i8*, i32 }, { i8*, i32 }* %rpc.arg0, i32 0, i32 0, !dbg !13
  store i8* getelementptr inbounds ([2 x i8], [2 x i8]* @S., i32 0, i32 0), i8** %rpc.arg0.repack, align 4, !dbg !13
  %rpc.arg0.repack2 = getelementptr inbounds { i8*, i32 }, { i8*, i32 }* %rpc.arg0, i32 0, i32 1, !dbg !13
  store i32 1, i32* %rpc.arg0.repack2, align 4, !dbg !13
  %0 = bitcast i8** %rpc.args to { i8*, i32 }**, !dbg !13
  store { i8*, i32 }* %rpc.arg0, { i8*, i32 }** %0, align 4, !dbg !13
  call void @rpc_send(i32 2, { i8*, i32 }* nonnull %.9, i8** nonnull %rpc.args), !dbg !13
```

σ should be U+03C3, and is …
Python disagrees: …
Ah, interesting; then just the length is wrong. Edit: Fixed the above; I had accidentally used upper-case sigma (U+03A3). If I read the IR correctly, it's still a byte short, though.
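For reference, a quick check of the byte lengths involved (my own sketch, not code from the thread): both sigmas encode to two UTF-8 bytes, so the `store i32 1` length in the IR quoted earlier is indeed one byte short of the `[2 x i8]` constant.

```rust
fn main() {
    // Lower-case sigma U+03C3 encodes as CF 83 (two bytes).
    assert_eq!("σ".len(), 2);
    assert_eq!("σ".as_bytes(), &[0xCF, 0x83]);
    // Upper-case sigma U+03A3 encodes as CE A3 (also two bytes).
    assert_eq!("Σ".len(), 2);
    assert_eq!("Σ".as_bytes(), &[0xCE, 0xA3]);
    println!("both sigmas are 2 bytes in UTF-8");
}
```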
It still seems wrong that prints from the kernel CPU (invalid UTF-8 or not) can crash the comms CPU. Perhaps the Request::PullLog handler should just forward the bytes, and let the client code deal with the fallout? Inserting replacement characters at this stage is a little annoying to implement (the string length changes, so we'd need to transcode directly into the socket buffer).
Inserting invalid UTF-8 into a Rust …
Hmm, wouldn't all that's required be to not needlessly convert user data that is just ferried between host and kernel from &[u8] to str (and variants)? In other words, treat it as untrusted? Something along these lines (incomplete):
Will finish this up tomorrow…
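A minimal sketch of that idea (hypothetical, with names of my own invention, not the actual patch referenced above): the comm-CPU side ferries the log data as raw bytes and never constructs a `str` from it, so malformed sequences cannot trigger a `Utf8Error` there; any validation is deferred to the host, where a failure is recoverable.

```rust
// Hypothetical sketch: forward untrusted kernel bytes without a &str
// conversion, so invalid UTF-8 cannot panic the comm CPU.
fn forward_log(kernel_bytes: &[u8], socket_buf: &mut Vec<u8>) {
    socket_buf.extend_from_slice(kernel_bytes);
}

fn main() {
    let mut socket_buf = Vec::new();
    // A lone 0xCF lead byte is invalid UTF-8; forwarding it must not panic.
    forward_log(&[0xCF], &mut socket_buf);
    assert_eq!(socket_buf, vec![0xCF]);
    // Validation, if any, happens on the host, where it is recoverable.
    assert!(std::str::from_utf8(&socket_buf).is_err());
    println!("forwarded {} byte(s)", socket_buf.len());
}
```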
(NB: This issue now conflates three separate issues: handling of kernel-generated strings on the comm CPU, the print() RPC literal argument issue, and whatever else (truncation/corruption?) the origin of the log forwarding crash might be.)
Is there a way to reproduce the origin log forwarding crash? |
Fixed the typo in the above message; I simply meant the origin of the log forwarding crashes we have been seeing. To reproduce it, you'd need some way of generating an ARTIQ Python string that's not valid UTF-8, and then just log it. You could get one by running my above test case without your #1990, or perhaps by generating an invalid string in a class attribute or RPC return value (though those might have some checks in place). As for the actual root cause of the crashes we've been sporadically seeing (i.e. the reason for the data to be invalid UTF-8 to begin with, not why it causes crashes), I'm not sure. There are, unfortunately, several soundness bugs in the ARTIQ Python memory lifetime analysis, so it could e.g. be memory corruption caused by any of these. For instance, IIRC one of them was related to corruption of exception messages when catching/re-raising exceptions, before the rewrite that made them statically fixed. Ideally, there should never be any invalid strings in ARTIQ Python, of course, but in practice, I'd argue that the runtime crashing without any (even partial) indication of the source of the invalid data is much worse than just passing it through to the host (which is faster as well!) and dealing with it there (e.g. by inserting replacement characters).
I don't think changing that behavior would be meaningful and possible, since logging relies on the third-party's …
Bug Report
One-Line Summary
The core device (current master) crashes every once in a while with a Utf8Error.
Issue Details
This is a bit hard to reproduce, but our core devices currently crash fairly reliably after a while (maybe once every day or two) with a Utf8Error. The backtrace is as follows:
This is in the core log forwarding handler. At this point, I'm not sure what the actual issue is: whether, say, memory corruption leads to invalid UTF-8 or reading beyond the initialised buffer, or e.g. a long backlog just gets truncated at a multi-byte boundary.
Clearly, crashing in the log forwarding code isn't very helpful, though, as it obscures the actual problem. I'd argue that we probably shouldn't be validating UTF-8 here at all for performance, but at the very least we should be inserting replacement characters (U+FFFD) for invalid UTF-8 rather than panicking (and thus eliminating any trace as to what could be going on).
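As an illustration of the replacement-character approach (my own sketch, not code from the firmware): Rust's standard library already offers lossy decoding that substitutes U+FFFD for invalid sequences instead of failing.

```rust
fn main() {
    // Simulate a log buffer truncated in the middle of a multi-byte
    // character: "σ" is CF 83, and only the lead byte CF survives.
    let buf: &[u8] = b"log line \xCF";
    let decoded = String::from_utf8_lossy(buf);
    // The invalid tail becomes U+FFFD instead of causing a Utf8Error panic.
    assert_eq!(decoded.as_ref(), "log line \u{FFFD}");
    println!("{}", decoded);
}
```

Note that `from_utf8_lossy` allocates a new `String` when a replacement is needed, which reflects the transcoding concern raised above: the string length changes, so this can't be done in place in the existing buffer.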
Your System (omit irrelevant parts)