Skip to content

Detailed status

David Tarditi edited this page Nov 21, 2017 · 11 revisions

We are posting detailed status updates on this page.

2017 Oct. 17 to 2017 Nov. 20

Implementation status

Significant advances in the handling of null-terminated string and pointers and checking bounds declarations, as well as a complete set of checked headers for the C standard library:

  • Renamed bound(none) to bounds(unknown). This terminology is clearer.
  • String literals with a checked type. In checked scopes, the type of string literals is now "checked null-terminated array of character". This allow them to be used in checked scopes.
  • Better interoperation of array literals with checked code. Array literals can be converted implicitly to checked pointer types, when they occur in a context where an expression with checked pointer type is expected.
  • Implemented bounds checking for null-terminated pointers. The element just at the upper bounds can be read so that it can be checked for NUL.
  • Default bounds for null-terminated pointers of count(0). If no bounds are specified, use count(0) instead of bounds(unknown).
  • Correctly determine type of integer literals for inferred bounds expressions for array types. For bounds for array types, we were always typing the size as an unsigned long long integer. This caused subtle problems when comparing expressions lexically because the C rule is to use the smallest sized integer type for an integer.
  • Bounds declaration checking for constant-sized ranges. The specification requires checking that declared bounds for the left-hand side of an initializer or assignment be implied by the right-hand side. The compiler now identifies bounds of the form (e1 + c1, e1 + c2), where c1 and c2 are constants. It reasons about these ranges to sometimes prove (or disprove) that the left-hand side bounds imply the right-hand sound bounds (the analysis returns true, false, or maybe).
  • Warn about out-of-bounds accesses. The compiler uses the same analysis to determine when a memory access will definitely be out-of-bounds. This is helpful for accesses beyond the bounds of null-terminated pointers with the default bounds of count(0).
  • Warn about unprovable bounds declarations in checked scopes. The compiler now always warns about bounds declarations that are not provably true or false in checked scopes. It is still off-by-default for unchecked scopes.
  • Update the Checked C header files for the C standard library to include bounds declarations for null-terminated strings. There are now bounds-safe interfaces for the entire C standard library. Most functions can be used in a checked scope, except for varargs functions and a few unsafe string functions.
  • Improve Fortify source support for the Checked C headers.

Project news

  • A revised paper on Checked C has been submitted to a conference: http://www.cs.umd.edu/~mwh/papers/ruef17checkedc.html. This paper is more focused on the language than the prior paper.
  • Added examples to the Checked C repo of some code from Kernighan and Ritchie's book converted to Checked C.

Language design news

  • Replace bounds(none) with bounds(uknown).
  • Include null terminators in the number of elements for null-termianted arrays.

2017 Oct. 3 to 2017 Oct. 16

Implementation status

  • Completed parsing and type checking for new Checked C null-terminated array types (nt_checked) and pointers to null-terminated arrays (nt_array_ptr) This included implementing the implicit conversion rules between pointer types. Most of the work involved extending the existing Checked C tests for array_ptrs to test nt_array_ptrs.
  • Finish support for literals and initializers with checked types.
    • In checked scopes, give string literals null-terminated array types.
    • Allow string or array literals to convert implicitly to checked pointer types at assignments, function calls, and within initializers. This includes allowing conversions to null-terminated pointers (the literal has to be null-terminated - we still need to implement that check - tracked by issue #397).
    • Checked array initializers already mostly worked. Fixed an issue where initializers for incomplete checked arrays did not work properly (int arr checked[] = { 0, 1, 2, 3};). The type is supposed to be completed based on the number of elements in the initializer. However, the checked property was being dropped when the array type was completed, causing a type checking complaint when the array variable was used in a checked scope.
    • In checked scopes, require that the element types of compound array literals be checked (or have bounds-safe interfaces).
  • Disallow array_ptrs of function types. Programmers should use ptrs to function types instead. Functions have indeterminate sizes, so bounds checking does not make sense for them.
  • Improved the error messages for uses in checked scops of unchecked types in C-style casts. For complex constructed types, it might not be clear what the problem is. We now provide similar diagnostics to those provided for declarations and issue a note identifying the problematic type if it isn't obvious.

Language design status

  • Updated specification to disallow array_ptrs to function types.

2017 Sept. 19 to 2017 Oct. 2

Implementation status

  • We now enforce that all uses of variables and functions with bounds-safe interfaces are treated as though they have checked types in checked scopes. This means that in checked scopes, they have bounds checking and are subject to the same type restrictions as checked pointers.
  • We now check that call arguments have bounds when expected.
  • Use the declared bounds for an array if they are present, instead of using the bounds implied by the size of the array. This allows us to check bounds for accesses to variable-length arrays at the ends of structures:
struct S {
   int len;
   int arr checked[] : count(len);
};
The bounds specified by `count(len)` will be used to check memory accesses through `arr`.
  • Bug fixes:
    • Change the way that paths are computed in the Checked converter (commited by awruef)
    • Fix incorrect logic for inserting casts into the clang AST at bounds-safe interfaces (issue 369).
    • Avoid duplicate error messages T * is cast to ptr<T> (issue 242).
    • Fix duplicate error message when initializer has an error (issue 386).
    • Checked pointer types display incorrect in aka clauses for typedef'ed types (issue 254).

2017 May 14 to 2017 Sept. 18

Implementation news

  • We upgraded to clang/LLVM sources as of May, 2017. We are relatively close to the mainline clang sources.

  • We looked at the performance of Checked C programs with null checks and bounds checks on pointer dereferences, when compiled by the Checked C fork of clang. Sam Elliott did a 2nd internship this summer and focused on that. Sam improved the LLVM IR that we are generating to enable better optimization. For a subset of the Olden and PtrDist benchmarks, Sam found that the runtime overhead was < 2%. A number of programs had no overhead at all. I expect the overhead to increase as larger programs are added and null checks on pointer arithmetic are added. This is a great start, though!

    We are pretty happy with the level and sophistication of LLVM optimizations for the runtime checks that we are injecting. For performance outliers, Sam investigated what runtime checks were being left behind by the compiler in inner loops. He found two possible improvements:

    • For bounds checks, we might need an advanced optimization such as partial redundancy elimination of checks (this has been discussed in the literature on bounds checking before. See the paper “ABCD: eliminating array bounds checks on demand” ).
    • It looks like we need to work on the representation and optimization of null checks. We represent them as explicit checks in LLVM IR that fault if a pointer is null. The checks could sometimes be combined with subsequent memory accesses (which would fault anyway on most OSes).

    We were able to optimize C programs by hand to reduce the cost of bounds checking by inserting runtime tests before inner loops, as we had hoped. We used this to improve the performance of two benchmarks.

    When we upgraded the clang/LLVM sources, the average runtime overhead went down. It did not get worse for any individual benchmark, which is good.

  • Our goal is to implement enough of Checked C that it can be used for runtime bounds checking of real-world code. This includes implementing bounds-safe interfaces. We would also like to check the correctness of simple bounds declarations. We made good progress:

    • We improved the inference of bounds expressions so that most C expressions are covered now. Sam extended it to cover function calls and integer operators used in code that treats pointers as integers. The following work remains to be done: support for current_expr_value, detecting expressions that modify variables that are used in bounds checks in the same expression without an intervening sequence point, bounds for conditional expressions, and bounds for compound literal expressions.
    • Typechecking now works properly for bounds-safe interfaces for callback functions. We implemented missing functionality for bounds-safe interfaces and function pointers. Thanks to Saeed Nejati for reporting this issue.
    • The Checked C header files are now included in the system header directory for our fork of clang. Files don’t need to be copied across repos and programmers do not need to figure how to add the Checked C headers to build paths.
    • We improved error messages for accidental uses of unchecked pointers in checked scope so that it is easier for programmers to understand what to fix. With a complicated constructed types such as function pointers and typedefs, it isn’t always obvious what the problem is.
  • We would also like to check the correctness of simple bounds declarations. For any assignment or initializer where the left-hand has declared bounds, we must check that the right-hand side expression implies the validity of the declared bounds. This is complicated to implement, so we plan to bring this up in stages. While this is going on, programmers will need to manually review bounds declarations for correctness. It is helpful to filter out simple bounds declaration that can already be proved correct:

    • We added a compiler flag -Wcheck-bounds-decls that checks bounds declarations for correctness and warns when they cannot be proved correct. It is disabled by default. Currently, it only checks for syntactic equality of bounds expressions, as well as uses of bounds(any)/bounds(none) so there are lots of warnings. David Tarditi has been extending it to handle syntactic equality of bounds expressions modulo dataflow facts about variable equality, but this has not landed yet.
  • Jay Lim implemented support for simple generic functions for C (see the design section for more details).

Design news

  • We extended the draft Checked C specification 0.7 with a new type for representing pointers to null terminated arrays (_Nt_array_ptr). The description of the static checking of bounds declaration still needs to be added to the specification.

  • We began to design language extensions that programmers can use to replace void pointers in a type-safe fashion. Our design is inspired by Dan Grossman’s work using polymorphic types to replace uses of void * in Cyclone (see Dan’s thesis) This was the focus of Jay Lim’s summer internship.

    Jay analyzed the use of void * pointers in 4 open-source code bases and we looked at OS header files as well. He found that the use of void pointers by standard library functions was a main cause of the use of void pointers (or void pointer conversions) in these code bases. It accounted for about 60% of the usage. This can be addressed by adding language support for simple restricted generic functions. Type variables are treated as incomplete types in the definitions and declarations of generic functions, so they are restricted to being used as the referent types of pointers. Jay implemented restricted generic functions in the Checked C fork of clang.

Project news

  • We submitted a paper on Checked C to the ACM Computer and Communications Security Conference, but it was not accepted.
  • We hope to submit another paper this fall to a different conference, with larger and improved benchmarks.

2017 March 27 to 2017 May 7th

Implementation news

  • David Tarditi implemented runtime bounds checking for member bounds. Reading or writing memory using an array_ptr obtained via a member access now is bounds checked. This also includes member accesses to arrays in structs. An interesting point about member arrays is that the bounds checking is at sub-object granularity (less than object allocation sizes). This is something that approaches based on red zones do not do.
  • Sam Elliott implemented non-null checks for uses of array_ptrs to access memory.
  • Wonsub Kim implemented checked and unchecked scopes. He also implemented the BOUNDS_CHECKED pragma. The implementation includes parsing and enforcing restrictions in checked scopes on types (no unchecked pointers), cast operations (no casts to unchecked pointers) and uses of variable argument functions (not allowed in checked scopes). He also changed the type produced by the address-of operators in checked blocks.
  • Jijoong Moon implemented IR support, parsing, and typechecking for relative alignment clauses in bounds declarations.
  • Jijoong Moon implemented IR support, parsing, typechecking, and some static checking for dynamic_bounds_cast and assume_bounds_cast operators.
  • We now have reliable automated build tests running on Linux and Windows for Checked C, implemented by David Tarditi. The automated build tests use Visual Studio Team Services. They are run on every commit to the Checked C clang repo. The tests include the Checked C tests, the clang regression test suite, and on Linux)the LLVM Nightly Test Suite The tests on Windows are run on both x86 and x64. The tests on Linux are run on x64. David learned how to make parallel builds of clang using msbuild work well on Windows better controlling parallelism
  • Test changes:
    • Sam, Wonsub, and Jijoong finished converting 10 benchmarks from Olden and PtrDist to use Checked C. The converted benchmarks are in the checkedc-llvm-test-suite repo (and built and run as part of the automated build testing).
    • Tests have been written and committed for the new functionality from Wonsub and Jijoong.
    • David Tarditi has put out a pull request for testing of runtime bounds check of member bounds, as well as reorganizing existing runtime tests of bounds checking.

Design news

  • The span type has been removed from version 0.6 of the language extension design. The span types seems more like a library type than a fundamental concept of the Checked C extension. We may decide to add it back in the future. For now, we have removed it until the need becomes clearer.
  • The address-of operator in checked scopes now produces an array_ptr type, not a ptr type. Using a ptr type requires that the resulting pointer value be in bounds. However, &(a[i]) is equivalent to &(*(a+i)), which the same as (a + i). This is no requirement that a + i produce an in-bounds pointer.
  • David Tarditi wrote an initial proposalfor adding nullable pointer to the Checked C extension. This would reduce the number of required dynamic checks. We decided to postpone further work on this in favor of getting a working compiler with runtime checking and until we had performance data to support adding this.

Project news

  • Sam Elliott finished his internship at MSR on April 7th and returned to Ras Bodik’s group at the University of Washington.
  • David Tarditi published an implementation roadmap to the Checked C clang Wiki. This lists the Checked C features that we intend to implement in a “Version 1” compiler and the order in which we plan to implement them. The goal is to implement enough of the language extension that it can be used for significant parts of real-world C code bases.

Research news

  • Sam Elliott gave a MSR talk where he presented an evaluation of the changes to the converted Olden and PtrDist benchmarks to use Checked C, as well as performance. These results are quite preliminary given the small size of the benchmarks. He found that about 10% of the code needed to be changed. The performance results had a few benchmarks that were significant outliers, with slowdowns of up to 140%. we'll need to investigate them.
  • Andrew Ruef and Mike Hicks continue to work on formalizing the Checked C semantics and properties of the semantics.