Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checked-c conversion for Ptrdist/KS benchmark (#007) #8

Merged
merged 9 commits into from
Mar 24, 2017

Conversation

wonsubkim
Copy link
Collaborator

PR for checked-c conversion for Ptrdist/KS benchmark

  • Ptrdist/KS benchmark is converted into checked-c code
  • Ptrdist/anagram, ft is slightly modified to apply fix-up

Ptrdist llvm nightly test (lnt) are completely passed

 - This is first conversion for applications in Ptrdist benchmarks
  - anagram benchmark exploits pointer intensively to handle input string
  - it reads input file, makes dictionary, extract some phrase and builds new words
 - CheckedC feature lists touched by conversion
  - CheckedC _Ptr used
  - CheckedC _Array_ptr used
   - bounds(start, end) used
   - count(length) used
  - CheckedC _Checked array type used
   - checked array type of T == checked array pointer to T
  - CheckedC function call
   - formal parameter bounds information
   - return value bounds information
  - CheckedC pointer arithmetic - assume that dynamic bounds check will be inserted
  - CheckedC pointer dereference - assume that dynamic bounds check will be inserted
 - feature lists not touched by conversion
  - _Where syntax is not yet implemented
   - bounds definition at expression statement but not compiled
   - disalbe _Where syntax
 - Conversion status
  - LNT (LLVM Nightly Tests) framework passed
   - compilation passed - clang compilation with "--cflags -fcheckedc-extension" arguments
   - execution passed (identical result to pure C code)
 - What is done for conversion
  - unchecked pointer type to checked pointer type (_Array_ptr, _Ptr)
  - add bounds information for checked array pointer type
  - add qualifier _Checked for checked array type
  - add specific comment pattern to represent code modification
   - refer to comment (// CHECKEDC) to understand meaning of checkedc pointer types in converted source
   - Also include added runtime bounds check which is inserted by compiler
 - application property
  - minimum spanning tree calculation, use heap structure (Fheap)
  - lots of dynamic memory allocation and deallocation (malloc, free)
  - very little pointer arithmetic -> _Array_ptr type is rare
 - CheckedC conversion
  - application has very little pointer arithmetic
  it means that there are almost _Ptr pointer type in pointer type
  - application uses graph and heap data structure
  graph consists of vertex and edge and each vertex/edge has related edge/vertex pointer
  heap is also similar to tree/graph represented by linked list having child/parent/sibling pointer
  - Checkedc feature list
   - CheckedC _Ptr type
    - automatic checked pointer variable initializer(=0)
   - CheckedC _Array_ptr type
 - LNT verification
  - LNT framework passed with "--cflags -fcheckedc-extension"
   - compilation passed
   - execution passed
 - Follow-up feedback
 - Remove incorrect dynamic_check insertion
  - pointer arithmetic checks if pointer is non-null & expression is integer overflow
 - Rewrite function call
  - If parameter does not have enough bounds information, declare more parameters
  to provide bounds information
 - Correction bounds information
  - since compiler assumes that function call can clobber global variables
  modify bounds declaration without global pointer variable if function call is in body
 - Apply bounds-safe interface using checked header files
  - include checked header files
  - there is a problem when using bounds-safe interface, refer #143 issue in checkedc-clang
  - after resolved, I will remove unncessary explicit pointer type casting in standard library function
 - Modify comments
 - Fix up missed conversion
+ Make include directory for checked header
 - For LNT (llvm nightly test), added a include directory
 - Those files are same as checkedc/include files except string_checked.h
 - While compiling, there is an error related to strncmp function
 - After issue#143 is solved, we will remove unnecessary explicit type casting
 - fix up indentation for comments
 - algorithmic bounds to represent more precise bounds
  - AddWords - build the list of words
   - pointer traverses dictionary to iterate entire words within dictionary
   - pointer bounds is set using pointer variable to indicate dictionary start/end
  - BuildWord - build a word
   - To represent more precise bounds for each word, use wordStart/end instead of dictionary start/end
   - To guarantee that wordEnd is within dictionaryEnd (wordEnd <= pchUpperBounds)
    add additional facts using programmer-inserted dynamic_check
 - checked array type
 - ptr type
 - when deferencing, runtime dynamic check is required
+ anagram, ft modification
 - fix error according to the checked header
+ follow changes of checked header file
@msftclas
Copy link

@wonsubkim,
Thanks for having already signed the Contribution License Agreement. Your agreement was validated by Microsoft. We will now review your pull request.
Thanks,
Microsoft Pull Request Bot

@wonsubkim
Copy link
Collaborator Author

Is it possible to merge this pull request by myself?
It was also passed in lnt test.

Copy link
Collaborator

@lenary lenary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good.

I really like the // CHECKEDC: comments, very helpful.

I like the dynamic_checks you've inserted. I wonder (but don't need an answer for yet) whether they speed up the benchmarks, or let the compiler optimize out some bounds checks.

If you use #include <stdlib_checked.h> in some places and it works, you should use the <> bracket includes on all checked c headers, and this may mean you don't need to copy the headers into this directory. Let's make sure we're not copying the checkedc headers around too much, because changes will be hard to pick up.

Address these few comments and then you can merge this pull request.

@@ -67,7 +67,7 @@ _Ptr<Vertices> MST(_Ptr<Vertices> graph);
int debug = 1;

int
main(int argc, _Array_ptr<_Ptr<char>> argv )
main(int argc, _Array_ptr<const char*> argv : count(argc) )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good. If you can, you should #include "stdchecked.h" so that you can use ptr<T> instead of _Ptr<t>. Is there a problem with names overlapping?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I have only just found out that the automated converter can currently only output _Ptr<T> types, and will only use the underscore version. If it's the case you used the converter, leave them as _Ptr<T>, when we improve the converter it will manage to output ptr<T>.

@@ -279,7 +279,7 @@ void Fatal(char *pchMsg, unsigned u) {
* byte streams are concatenated, and terminated with a 0.
*/

void ReadDict(_Ptr<char> pchFile) {
void ReadDict(char *pchFile : itype(_Ptr<char>)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good. Strings are allowed to stay char* for the moment, and probably don't even need a itype(ptr<char>). All other pointers, if possible, should be checked pointers

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will remember that string pointers are allowed to stay char * for the moment.

TRY(fgets(line, BUF_LEN, inFile),
// CHECKEDC : bounds-safe interface is required (implemented, stdio_checked.h)
// related issue#143
TRY(fgets(line, BUF_LEN, (FILE*)inFile),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that issue#143 is fixed, you probably don't need this cast.

Copy link
Collaborator Author

@wonsubkim wonsubkim Mar 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have missed. I will fix-up this

@wonsubkim
Copy link
Collaborator Author

I have a question about checked c header file location
As you have mentioned, I have copied checked header file into checkedc-llvm-test-suite/include from checkedc test directory.
If that include directory are not located at that path, lnt framework fails to find out proper header file
that is included via #include <stdio_checked.h>
I have used <> brackes to include checked header files but fails to find out proper checked c header file
In current lnt environment, default include path does not include that path where original checked c header files are located

 - remove unnecessary casting, use bounds-safe interface
 - remove unnecessary comments
@lenary
Copy link
Collaborator

lenary commented Mar 23, 2017

To be clear, as my assumptions about the headers are incorrect:

  • If #include <stdchecked.h> works, use that, otherwise use "" instead of <>.
  • If you don't need to copy the headers to make the benchmarks work, don't, otherwise it's fine.

By "work", I mean make the benchmarks compile and run. This is, after all, the aim of the conversion - that the benchmarks run, but perform the extra checks we've implemented.

@wonsubkim
Copy link
Collaborator Author

Thanks for your review
I will merge this pull request

@wonsubkim wonsubkim merged commit ff6a463 into microsoft:master Mar 24, 2017
@lenary lenary modified the milestone: Preliminary Evaluation Mar 29, 2017
@wonsubkim wonsubkim deleted the issue-007 branch May 19, 2017 06:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants