What to do about the zero terminator of a string in a Span? #273
The use of Span has raised some interesting questions.
In the SQLite API, when a parameter is a string, it is usually just a pointer, with the (documented) expectation that the string ends with a zero terminator byte.
So, at a higher layer, if we use a
Suppose the string is "hello". That's 5 bytes of actual string plus one byte for the zero terminator. We could pass a span of length 5 or a span of length 6. It doesn't matter to SQLite, because it's just going to look for that zero byte anyway.
Heck, we could pass a span of length 1, and it still wouldn't matter. But I mention that just to illustrate the problem. Clearly, if we're going to use a span, we should have the length be correct.
But which is more correct? Should the Length include the zero terminator or not?
Including the zero byte in the length seems wrong because the zero is not part of the string data. If we called strlen() on the pointer, it would return a length which did not include the zero. If we passed this pointer to System.Text methods to convert it from UTF-8 to System.String, the length we provide should not include the zero byte. The length of the string "hello" is 5.
However, it also seems wrong to not include the zero byte in the length. The zero byte may not be part of the string, but it is a documented and required part of the block of bytes the function expects. And if we don't include the zero byte, then we are passing a span to a function which is going to walk beyond the end of that span's length, which seems wrong. The length of the zero-terminated block of memory that represents "hello" is 6.
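To make the two candidate lengths concrete, here is a minimal C# sketch. The class and method names are purely illustrative and are not taken from SQLitePCLRaw; a managed array stands in for the block of bytes handed to the native function.

```csharp
using System;
using System.Text;

static class TerminatorLengths
{
    // UTF-8 bytes of "hello" followed by one zero byte: 6 bytes in total.
    public static byte[] HelloWithTerminator() => Encoding.UTF8.GetBytes("hello\0");

    // Length of the zero-terminated block that the C function walks.
    public static int BlockLength() => HelloWithTerminator().Length;

    // Length of the string data alone, which is what strlen() would report.
    public static int TextLength() => HelloWithTerminator().Length - 1;

    // System.Text must be given only the string data, so the terminator is sliced off.
    public static string Decode()
    {
        ReadOnlySpan<byte> block = HelloWithTerminator();
        ReadOnlySpan<byte> text = block.Slice(0, block.Length - 1);
        return Encoding.UTF8.GetString(text);
    }
}
```

Decoding the full 6-byte block instead would produce a `System.String` of length 6 whose last character is U+0000, which is why the span handed to `System.Text` should stop before the terminator.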
At the moment, the solution I have chosen is to include the zero byte in the span length. And the provider code checks to make sure the last byte of the span is a zero.
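A check along those lines might look roughly like the following. This is only a sketch of the idea, not the actual provider code; `EndsWithZero` and `RequireTerminator` are hypothetical names.

```csharp
using System;

static class TerminatorCheck
{
    // Hypothetical guard: true only when the span is non-empty and its last byte is zero.
    public static bool EndsWithZero(ReadOnlySpan<byte> s) =>
        s.Length > 0 && s[s.Length - 1] == 0;

    // Hypothetical wrapper-side validation, run before the pointer goes to native code.
    public static void RequireTerminator(ReadOnlySpan<byte> s)
    {
        if (!EndsWithZero(s))
            throw new ArgumentException("span must end with a zero terminator byte", nameof(s));
    }
}
```

A check like this only inspects the final byte. If the span contains an earlier zero byte, the C side will still stop reading there.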
But this is only half the story. We have a similar but slightly different set of problems for cases where SQLite is returning a string or passing one in a callback.
In this case, SQLite itself is responsible for putting that zero byte in place, and it does. But when we construct a
In this case, a big difference is that the code that receives that span is likely to use the Length and ignore the presence of any zero terminator. For example, using System.Text functions to convert from UTF-8 to a string, the span Length must be correct and must not include the zero terminator.
If we did include the zero terminator in this upward-moving set of strings, then most callers that use the spans would need to subtract one from the Length before using it. And should they check to make sure the last byte actually is a zero before doing so?
So for the moment, the solution I have chosen is to NOT include the zero terminator when constructing a span for a string that is coming from SQLite.
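As a rough illustration of that choice, the sketch below trims a zero-terminated block down to just the string data before handing it upward. It makes two assumptions: the real code would start from a native pointer, whereas here a managed buffer stands in for memory owned by SQLite, and `WithoutTerminator` is a hypothetical helper name.

```csharp
using System;
using System.Text;

static class FromNative
{
    // Hypothetical helper: returns the bytes before the first zero byte.
    public static ReadOnlySpan<byte> WithoutTerminator(ReadOnlySpan<byte> block)
    {
        int zero = block.IndexOf((byte)0);
        // If no zero byte is present, return the block unchanged rather than guess.
        return zero < 0 ? block : block.Slice(0, zero);
    }

    // The receiver can pass the trimmed span straight to System.Text.
    public static string Decode(ReadOnlySpan<byte> block) =>
        Encoding.UTF8.GetString(WithoutTerminator(block));
}
```

With the terminator excluded, the receiving code can use `Length` directly and never has to subtract one.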
Neither of these decisions seems correct. Each one merely seems to be the least incorrect option.
And then there is the inconsistency. I have two very similar questions, but I came up with two different answers. That too seems wrong. For example, as things stand right now, if you receive a string from SQLite as a span and then pass that span directly back to another SQLite function, it won't work.
For public .NET Core APIs that take or return
The internal .NET Core implementation has a few places that create and operate on Spans that are zero-terminated. This is a local, internal implementation detail that does not leak out through public APIs.
P/Invoke methods cannot take Spans directly today.
I think you are really asking whether you should include zero byte in the Span length in your internal implementation details that prepare string passed to PInvoke. It is up to you and what makes sense for the type of code that you are writing.
For public APIs, I would recommend to be on the same plan as public .NET Core APIs.
For others who might visit this issue, I want to add a bit more explanation for why "be on the same plan as public .NET Core APIs" doesn't seem helpful in this context.
The underlying C function requires a zero-terminated string. That's just the way it is.
So let's say I expose a C# wrapper for that function which takes a Span, and to be consistent with .NET practices, the caller is not expected to include a zero terminator.
What that means is that I have to make a copy of the data in order to append the zero terminator before passing it down to the unmanaged function.
And this would defeat the whole point of using Span.
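For illustration, the copy in question would look something like this. `CopyWithZero` is a hypothetical name, and this is a sketch of the cost being described rather than code from the library.

```csharp
using System;

static class AppendTerminator
{
    // Hypothetical helper: copies the caller's bytes into a new array
    // that has room for one extra zero byte at the end.
    public static byte[] CopyWithZero(ReadOnlySpan<byte> text)
    {
        // New arrays are zero-initialized, so the final byte is already the terminator.
        var block = new byte[text.Length + 1];
        text.CopyTo(block);
        return block;
    }
}
```

Every call pays for an allocation and a copy proportional to the string length, which is exactly the overhead Span was supposed to avoid.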
A tidbit of good news noticed by sgjennings on Twitter:
This writeup on the upcoming Utf8String type:
contains this bit:
"ensuring a null terminator (important for p/invoke scenarios)"
which indicates an awareness of this issue.
I was aware of Utf8String, but had not yet seen any indication that dealing with zero terminator cases was (or was not) being considered as part of the design.
The notion here is that
As mentioned in the commit message, this is an experimental idea. I'm just throwing some code around to see what works.
After continuing the
I'm starting to believe the best practice here is to NOT publicly use
As an alternative, the
This may in fact be what @jkotas meant when he said "in your internal implementation details that prepare string passed to PInvoke", but I initially didn't understand it because I had not yet implemented my own
Compared to the Span approach, afaik, Utf8String won't allow you to wrap native memory. Nor can you stackalloc the string.
In a callback the
Yeah, Utf8String (someday) will be complementary, but occupies a different place in the world. I'm trying to make it possible to use SQLitePCLRaw in a situation where everything is UTF-8, with minimal conversion and copying. For now, if the enclosing app or library is built around System.String, it won't be able to take full advantage, but I'll be more ready for the day when Utf8String is available.
Yeah, the method I experimentally converted over was already returning