Consider changing the treatment of null pointers. #137

Open
dtarditi opened this issue Apr 4, 2017 · 3 comments
@dtarditi
Contributor

dtarditi commented Apr 4, 2017

Overview

I am concerned that the treatment of null pointers in Checked C will lead to too many runtime checks. We have been implementing the runtime checks required by the current Checked C specification. At each memory access through an array_ptr, there is a null pointer check followed by a bounds check. For pointer arithmetic involving an array_ptr, there is also a non-null check before the arithmetic operation. That adds up to a lot of checking.
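
The checks described above can be modeled in plain C. This is only a sketch under stated assumptions: the `bounded_ptr` struct and the `checked_load` and `checked_add` helpers are hypothetical names, not what the Checked C compiler emits, and a failed check returns `false` here instead of trapping. Offsets are used instead of raw pointer arithmetic so that the model itself stays within defined C behavior.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an array_ptr<int> with bounds. */
typedef struct {
    int *base;         /* NULL models a null pointer */
    ptrdiff_t off;     /* current position, in elements from base */
    ptrdiff_t lb, ub;  /* bounds as element offsets: [lb, ub) */
} bounded_ptr;

/* Memory access: null check first, then bounds check. */
bool checked_load(bounded_ptr bp, int *out) {
    if (bp.base == NULL) return false;                 /* null check */
    if (bp.off < bp.lb || bp.off >= bp.ub) return false; /* bounds check */
    *out = bp.base[bp.off];
    return true;
}

/* Pointer arithmetic: a non-null check precedes the operation. */
bool checked_add(bounded_ptr bp, ptrdiff_t n, bounded_ptr *out) {
    if (bp.base == NULL) return false;  /* non-null check */
    bp.off += n;                        /* may leave the bounds; caught on access */
    *out = bp;
    return true;
}
```

Every load costs two comparisons groups and every arithmetic step costs one more, which is the overhead this issue is concerned about.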

The root cause is the semantics that we’ve chosen for bounds when null pointers are around: a pointer is either null or has valid bounds. This means that a null pointer may not have valid bounds. From Section 3.1 of the Checked C v0.6 specification:

The meaning of a bounds expression can be defined more precisely. At runtime, given an expression e with a bounds expression bounds(lb, ub), let the runtime values of e, lb, and ub be ev, lbv, and ubv, respectively. The value ev will be 0 (null) or have been derived via a sequence of operations from a pointer to some object obj with bounds(low, high). The following statement will be true at runtime: ev == 0 || (low <= lbv && ubv <= high). In other words, if ev is null, the bounds may or may not be valid. If ev is non-null, the bounds must be valid. This implies that any access to memory where ev != 0 && lbv <= ev && ev < ubv will be within the bounds of obj.

We chose this definition because C treats null pointers as interchangeable with other pointers. The definition results in less work and typing when converting programs. However, it has led to several issues in the semantics:

  • We can’t allow arithmetic involving a null pointer because that could lead to the forging of a non-null pointer with invalid bounds. This is why we need runtime checks on pointer arithmetic.
  • We “lose” bounds information when a pointer becomes null.
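
To make the forging concern concrete, here is a small model in plain C. It is a sketch, not specification text: addresses are modeled as integers so that no real pointer arithmetic on null takes place, and the type and function names (`ptr_state`, `object_range`, `add_unchecked`, and so on) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uintptr_t low, high; } object_range;  /* bounds of obj */
typedef struct { uintptr_t ev, lbv, ubv; } ptr_state;  /* runtime values */

/* The Section 3.1 invariant: ev == 0 || (low <= lbv && ubv <= high). */
bool invariant_holds(ptr_state s, object_range obj) {
    return s.ev == 0 || (obj.low <= s.lbv && s.ubv <= obj.high);
}

/* The bounds check performed at a memory access: lbv <= ev < ubv. */
bool passes_bounds_check(ptr_state s) {
    return s.lbv <= s.ev && s.ev < s.ubv;
}

/* Arithmetic WITHOUT the non-null check: moves ev, keeps the bounds. */
ptr_state add_unchecked(ptr_state s, uintptr_t bytes) {
    s.ev += bytes;
    return s;
}
```

Starting from a null pointer with bounds (0, 20), the invariant holds trivially. After `add_unchecked(s, 4)` the value is non-null, still passes the bounds check, and yet its bounds are not inside any object, so the invariant is broken. That is the case the runtime check on pointer arithmetic exists to prevent.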

Proposal

We’re running into problems because we’re trying to combine bounds checking and the handling of null pointers. The fact that C pointers can either be null or point to valid objects is a source of complexity when reasoning about C programs.

I propose that we adapt the idea of nullable pointers to Checked C. We would use types to distinguish between the different ways in which null will be allowed or handled:

  • ptr<T> values must point to valid objects that can hold values of type T. ptr<T> values cannot be null.
  • array_ptr<T> values can point anywhere in memory or be null. Bounds for array_ptr values must always be valid (a subrange of a valid object). This restricts when array_ptr values that have bounds can be null. It also prevents array_ptr values that are null from being used to access memory: null is not within the range of any object, so bounds checks on a null value will always fail. No runtime checks are needed for pointer arithmetic.
  • We introduce a nullable modifier that can be applied to ptr and array_ptr types.
  • For a pointer of type nullable ptr<T>, a runtime null check is done before accessing memory.
  • For a pointer of type nullable array_ptr<T>, a runtime null check is done before accessing memory. The runtime null check precedes the bounds check. The bounds for a nullable array_ptr<T> are only required to be valid when the value is non-null.
  • Null pointer constants have empty bounds (corresponding to the empty object) instead of having ‘any’ bounds.
  • We may decide to allow conditional bounds expressions. I’d prefer to put this off for now.
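
A rough model of the proposed access checks, again in plain C with addresses as integers. The names are hypothetical and the code only illustrates the rules in the list above; it is not how the compiler would implement them.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uintptr_t ev, lbv, ubv; } ptr_state;  /* runtime values */

/* Proposed array_ptr<T>: bounds are always valid, so the bounds check
   alone decides whether an access is allowed. */
bool array_ptr_access_ok(ptr_state s) {
    return s.lbv <= s.ev && s.ev < s.ubv;
}

/* Proposed nullable array_ptr<T>: bounds are only guaranteed valid when
   the value is non-null, so the null check precedes the bounds check. */
bool nullable_array_ptr_access_ok(ptr_state s) {
    return s.ev != 0 && array_ptr_access_ok(s);
}

/* A null pointer constant carries empty bounds under the proposal. */
ptr_state null_with_empty_bounds(void) {
    ptr_state s = {0, 0, 0};
    return s;
}
```

With empty bounds, a null array_ptr fails the bounds check on its own (0 < 0 is false), so no separate null check is needed. A nullable array_ptr that is null but declared with count(5) would pass the bare bounds check, which is why the null check has to come first for that type.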

Examples

It is valid to assign a ptr variable a value that is guaranteed to be non-null. The following declarations and assignments are valid:

int y;
ptr<int> px = &y;
int arr[10];
px = &arr[5];

It is not valid to assign a ptr variable a value that is null. The following will be rejected at compile time:

ptr<int> px = NULL;

void f(int *a) {
  ptr<int> p = &*a;  // a could be null and a may not have valid bounds. 
}

It is valid to assign null to an array_ptr variable with bounds, if the bounds are empty:

int len = 0;
array_ptr<int> x : count(len) = NULL;

The empty bounds are a subrange of any valid object.

It is invalid to assign null to an array_ptr variable with non-empty bounds. This declaration is invalid:

array_ptr<int> x : count(5) = NULL;

bounds(NULL, NULL + 5) is not a subrange of a valid object.
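
The two array_ptr cases above can be checked with a small subrange test. This is a sketch in plain C with addresses modeled as integers. Treating an empty range as a subrange of every object is my reading of "The empty bounds are a subrange of any valid object", and `is_subrange` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Is [lb, ub) a subrange of the object occupying [low, high)?
   An empty range (lb >= ub) is treated as a subrange of any object. */
bool is_subrange(uintptr_t lb, uintptr_t ub, uintptr_t low, uintptr_t high) {
    if (lb >= ub) return true;          /* empty bounds */
    return low <= lb && ub <= high;
}
```

For an object at [0x1000, 0x1040), `is_subrange(0, 0, 0x1000, 0x1040)` holds (the count(0) case), while `is_subrange(0, 5 * sizeof(int), 0x1000, 0x1040)` does not (the count(5) case), because no valid object starts at address 0.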

It is valid to assign null to a nullable array_ptr variable with non-empty bounds. This declaration is valid:

nullable array_ptr<int> x : count(5) = NULL;

Additional thoughts

  • There is another way to understand why values with ptr cannot be null. The declaration ptr<T> x is equivalent to array_ptr<T> x : count(1). The bounds (NULL, NULL + 1) are invalid because no valid object includes NULL in its bounds.
  • ptr values become pointers that can be used unconditionally (without runtime checks).
  • array_ptr only requires bounds checks.

Bounds-safe interfaces

My strawman proposal is to allow the keyword nullable to precede the in-line bounds declaration for an unchecked pointer type. For example:

void *calloc(size_t num, size_t size) : nullable byte_count(num * size);

This implies in a checked context that calloc returns a nullable array_ptr<void>.
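
In today's C, the caller is simply expected to remember the null test that the nullable annotation would make explicit. A minimal illustration using only the standard library (`first_slot_demo` is a hypothetical name):

```c
#include <stdbool.h>
#include <stdlib.h>

/* Allocates n zeroed ints, writes 7 to the first slot, and reads it back.
   Returns false if n is 0 or calloc returned NULL. */
bool first_slot_demo(size_t n, int *out) {
    int *buf = calloc(n, sizeof *buf);  /* result may be NULL */
    if (buf == NULL || n == 0) {        /* null test before any access */
        free(buf);                      /* free(NULL) is a no-op */
        return false;
    }
    buf[0] = 7;
    *out = buf[0];
    free(buf);
    return true;
}
```

Under the proposal, a checked caller would receive a nullable array_ptr<void> and the null test would be required rather than optional.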

For interface types, nullable can be applied as a type qualifier to _Ptr types. For example, the bounds-safe interface for the string-to-double function would be:

double strtod(const char * restrict nptr,
                char ** restrict endptr : itype(restrict _Nullable _Ptr<char *>));

If endptr is non-null, strtod returns the location where the conversion stopped by modifying *endptr.
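
Both ways of calling strtod look like this in standard C. The helper names are hypothetical; the strtod behavior shown is the standard library's.

```c
#include <stddef.h>
#include <stdlib.h>

/* Non-null endptr: strtod stores where the conversion stopped. */
size_t chars_consumed(const char *s) {
    char *end = NULL;
    (void)strtod(s, &end);
    return (size_t)(end - s);
}

/* Null endptr: allowed, and strtod stores nothing. */
double parse_only(const char *s) {
    return strtod(s, NULL);
}
```

Because NULL is a legitimate argument for endptr, the interface type needs the nullable qualifier. A non-nullable _Ptr would reject the second call.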

Conversions

  • ptr values and array_ptr values can always be converted to nullable ptr and nullable array_ptr, respectively.
  • The reverse conversion (from nullable ptr and nullable array_ptr to ptr and array_ptr, respectively) is allowed only when it is provable that the value being converted is not null.
  • Conversions from array_ptr to ptr continue to require that the array_ptr have bounds large enough to hold the ptr value.
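
The first two conversion rules can be approximated in plain C with wrapper types. This is only an analogy: the proposal asks for a compile-time proof of non-nullness, whereas this sketch uses a runtime test as the "proof", and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { int *p; } nullable_int_ptr;  /* may hold NULL */
typedef struct { int *p; } nonnull_int_ptr;   /* never NULL by convention */

/* ptr to nullable ptr: always allowed. */
nullable_int_ptr to_nullable(nonnull_int_ptr q) {
    nullable_int_ptr n = { q.p };
    return n;
}

/* nullable ptr to ptr: only after the value is known to be non-null. */
bool to_nonnull(nullable_int_ptr n, nonnull_int_ptr *out) {
    if (n.p == NULL) return false;
    out->p = n.p;
    return true;
}
```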

Next steps

I modified the Checked C wrappers for the C standard library to add nullable type modifiers where necessary. I didn’t modify functions involving strings because we haven’t added support for null-terminated arrays. The results are on GitHub at https://github.com/dtarditi/checkedc/tree/nullable. There are two quick take-aways:

  • Most functions aren’t expecting or prepared to handle a null pointer, so nullable modifiers were needed in relatively few places.
  • It makes the interface descriptions more precise. This is no surprise; comparisons with SAL may arise. It seems better to have machine-checkable descriptions than to rely on imprecise English descriptions.
@dtarditi dtarditi changed the title Consider changing the treatment of null pointers in Checked C. Consider changing the treatment of null pointers. Apr 4, 2017
@lenary
Collaborator

lenary commented Apr 5, 2017

Two things we need to think about, as brought up in the meeting today:

  • Array to Pointer Decays: What kind of pointers do arrays become? This probably changes when we consider local arrays vs global arrays vs parameter arrays (which "decay" in the declaration to array_ptrs)
  • Function Pointers: I assume named functions become non-null ptrs to functions, and nullable ptrs to functions get a dynamic check before every call.
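
For the function pointer case, the dynamic check would presumably look something like the following plain C sketch. The names are hypothetical, and the check returns `false` rather than trapping.

```c
#include <stdbool.h>
#include <stddef.h>

typedef int (*unary_fn)(int);

/* A named function: its address is never null. */
int twice(int x) { return 2 * x; }

/* Call through a possibly-null function pointer: null check, then call. */
bool call_nullable(unary_fn f, int arg, int *out) {
    if (f == NULL) return false;  /* dynamic check before the call */
    *out = f(arg);
    return true;
}
```

A call through `twice` directly would need no check, while `call_nullable(NULL, 1, &r)` is rejected before any call is attempted.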

@dtarditi
Contributor Author

There was additional feedback from the meeting that it would be useful to understand where nullable pointers would be useful. The conjecture was that nullable pointers would be used a lot in data structures, but not used that much for local variables. A suggestion was that it would be useful to take some real-world code (such as OpenSSL) and mock up part of it with the proposed changes.

I think one reason why nullable pointers might be used in data structures is that requiring members to have non-nullable pointer types means that we need to check that members are properly initialized to non-null values before they are used. This would mean expanding the treatment of initialization of data structures in the Checked C specification: zeroing allocated data would not be sufficient. For objects, we would have to make sure that an object with a non-null pointer member does not escape before it is initialized. We would probably also need a flow-sensitive treatment of initialization of variables and data structures.

@dtarditi
Contributor Author

After thinking about this, we decided that this would be a substantial language change that would require some effort to implement in the Checked C version of clang. We think it is more useful to get a working compiler with runtime checking first before making this language change, that is, implement the version 0.6 spec before making substantial changes to the language extension in this area.
