maint: add a definition-based syscall decoder generator
Implement a code generation tool capable of parsing system call definitions
and generating system call decoders.

* maint/gen/.gitignore: New file.
* maint/gen/Makefile: Likewise.
* maint/gen/README.md: Likewise.
* maint/gen/ast.c: Likewise.
* maint/gen/ast.h: Likewise.
* maint/gen/codegen.c: Likewise.
* maint/gen/deflang.h: Likewise.
* maint/gen/defs/common.def: Likewise.
* maint/gen/lex.l: Likewise.
* maint/gen/parse.y: Likewise.
* maint/gen/preprocess.c: Likewise.
* maint/gen/preprocess.h: Likewise.
* maint/gen/symbols.c: Likewise.
* maint/gen/symbols.h: Likewise.
srikavin authored and ldv-alt committed Nov 13, 2021
1 parent 63ef457 commit 5cad0ff
Showing 14 changed files with 3,001 additions and 0 deletions.
6 changes: 6 additions & 0 deletions maint/gen/.gitignore
@@ -0,0 +1,6 @@
lex.yy.c
parse.tab.c
parse.tab.h
parse
parse.output
/gen
16 changes: 16 additions & 0 deletions maint/gen/Makefile
@@ -0,0 +1,16 @@
CFLAGS += -ggdb -std=gnu99 -Wall -Wextra

all: gen

gen: parse.tab.o lex.yy.o ast.o codegen.o symbols.o parse.tab.h lex.yy.c preprocess.o
	$(CC) $(CFLAGS) parse.tab.o lex.yy.o ast.o codegen.o symbols.o preprocess.o -o ./gen

lex.yy.c: lex.l parse.tab.h
	flex lex.l

parse.tab.c parse.tab.h: parse.y
	bison -d parse.y

clean:
	rm -f lex.yy.o ast.o parse.tab.o codegen.o preprocess.o symbols.o
	rm -f gen parse.tab.c parse.tab.h lex.yy.c lex.yy.h
115 changes: 115 additions & 0 deletions maint/gen/README.md
@@ -0,0 +1,115 @@
Syscall Definitions
====

This syscall definition language is based on the [syzkaller description language](https://github.com/google/syzkaller/blob/master/docs/syscall_descriptions.md).

All non-syscall statements maintain their relative ordering and are placed
before syscall statements in the generated C code.
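
The ordering rule can be pictured as a stable partition. The following is a minimal sketch, not code from this commit: `struct stmt`, `enum stmt_kind`, and `order_for_output` are hypothetical names, and keeping syscall statements in source order as well is an assumption the text above does not state.

```c
#include <stddef.h>

/* Hypothetical statement model; the real AST lives in ast.h. */
enum stmt_kind { STMT_OTHER, STMT_SYSCALL };

struct stmt {
	enum stmt_kind kind;
	const char *text;
};

/*
 * Fill `out` (capacity n) with pointers to the statements in emission
 * order: all non-syscall statements first, then all syscall statements.
 * Source order is preserved inside each group (assumed for syscalls).
 */
static void
order_for_output(const struct stmt *in, size_t n, const struct stmt **out)
{
	size_t pos = 0;

	/* First pass: non-syscall statements, in their original order. */
	for (size_t i = 0; i < n; i++)
		if (in[i].kind != STMT_SYSCALL)
			out[pos++] = &in[i];

	/* Second pass: syscall statements follow. */
	for (size_t i = 0; i < n; i++)
		if (in[i].kind == STMT_SYSCALL)
			out[pos++] = &in[i];
}
```

With an `include`, a syscall, and a `define` given in that order, the sketch emits the `include`, then the `define`, then the syscall.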

## Syntax

### Types

Types have the following format: `type_name[type_option]`.
The `type_name` can include alphanumeric characters as well as `$` and `_`.
The `type_option` can be another type or a number.

Numbers can be specified as a decimal number (`65`), as a hex number (`0x41`), or as a character constant (`'A'`).
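
As an illustration of the three spellings, here is a minimal sketch of a literal parser. It is not taken from `lex.l`; `parse_number_literal` is a hypothetical helper, and it assumes character constants hold exactly one unescaped character.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse a numeric literal written as decimal ("65"),
 * hex ("0x41"), or a character constant ("'A'").
 * Returns true and stores the value in *out on success.
 */
static bool
parse_number_literal(const char *s, long *out)
{
	size_t len = strlen(s);

	/* Character constant: exactly one character between single quotes. */
	if (len == 3 && s[0] == '\'' && s[2] == '\'') {
		*out = (unsigned char) s[1];
		return true;
	}

	/* A "0x" prefix selects base 16; everything else is read as decimal. */
	int base = (len > 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
		   ? 16 : 10;
	char *end;
	long val = strtol(s, &end, base);

	/* Reject empty input and trailing garbage. */
	if (end == s || *end != '\0')
		return false;

	*out = val;
	return true;
}
```

All three example spellings from the sentence above (`65`, `0x41`, `'A'`) yield the value 65 with this sketch.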

The default types are the following:
* standard C types: `void`, `int`, `char`, `long`, `uint`, `ulong`, `longlong`, `ulonglong`, `double`, `float`
* `stddef.h` types: `size_t`, `ssize_t`, ...
* `stdint.h` types: `uint8_t`, `int8_t`, `uint64_t`, `int64_t`, ...
* kernel types: `kernel_long_t`, `kernel_ulong_t`, ...
* `fd`: A file descriptor
* `tid`: A thread id
* `string`: A null-terminated char buffer
* `path`: A null-terminated path string
* `stringnoz[n]`: A char buffer of length `n` that is not null-terminated
* `const[x]`: A constant of value `x` that inherits its parent type
* `const[x:y]`: A constant with a value between `x` and `y` (inclusive) that inherits its parent type
* `ptr[dir, typ]`: A pointer to an object of type `typ`; the direction `dir` can be `in`, `out`, or `inout`
* `ref[argname]`: A reference to the value of another parameter with name `argname` or `@ret`
* `xor_flags[xlat_name, ???, underlying_typ]`: An integer type (`underlying_typ`)
  containing mutually exclusive flags with xlat symbol name `xlat_name`
* `or_flags[xlat_name, ???, underlying_typ]`: An integer type (`underlying_typ`)
  containing flags that are ORed together with xlat symbol name `xlat_name`
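
The difference between the two flag types in the list above can be sketched as follows. This is an illustration only: `struct named_value`, `lookup_exclusive`, and `format_ored` are hypothetical and do not reproduce strace's xlat API or the code this generator emits.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical name table entry; not strace's struct xlat. */
struct named_value {
	unsigned long value;
	const char *name;
};

/* xor_flags style: the whole value must equal exactly one entry. */
static const char *
lookup_exclusive(const struct named_value *tab, size_t n, unsigned long v)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].value == v)
			return tab[i].name;
	return NULL;
}

/*
 * or_flags style: write "A|B" for every entry whose bits are all set,
 * followed by any leftover bits in hex. `cap` must be at least 1.
 */
static void
format_ored(const struct named_value *tab, size_t n, unsigned long v,
	    char *out, size_t cap)
{
	size_t len = 0;

	out[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		if (tab[i].value == 0 || (v & tab[i].value) != tab[i].value)
			continue;
		int w = snprintf(out + len, cap - len, "%s%s",
				 len ? "|" : "", tab[i].name);
		if (w < 0 || (size_t) w >= cap - len)
			return;	/* output truncated, stop here */
		len += (size_t) w;
		v &= ~tab[i].value;
	}
	/* Print unnamed leftover bits, or the bare value if nothing matched. */
	if (v != 0 || len == 0)
		snprintf(out + len, cap - len, "%s%#lx", len ? "|" : "", v);
}
```

An exclusive lookup fails for a value that merely combines two entries, whereas the ORed formatter decomposes it into its named bits.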

Constants (`const`) can only be used within variant syscalls.

### Syscalls
Syscall definitions have the format
```
syscall_name (arg_name1 arg_type1, arg_name2 arg_type2, ...) return_type
```

The `return_type` is optional if no special printing mode is needed.

Some system calls have several modes of operation. Consider the `fcntl` syscall.
Its second parameter determines the types of the remaining arguments. To
handle this, a variant syscall definition can be used:
```
fcntl(filedes fd, cmd xor_flags[fcntl_cmds, F_???, kernel_ulong_t], arg kernel_ulong_t) kernel_ulong_t
fcntl$F_DUPFD(filedes fd, cmd const[F_DUPFD], arg kernel_ulong_t) fd
fcntl$F_DUPFD_CLOEXEC(filedes fd, cmd const[F_DUPFD_CLOEXEC], arg kernel_ulong_t) fd
...
```

The `$` character is used to indicate that a syscall is a variant of another one.
The `const` parameters of a variant syscall will be used to determine which
variant to use. If no variant syscalls match, the base syscall will be used.
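
The selection rule can be sketched in C as below. This is a hedged illustration of the matching logic, not the code emitted by `codegen.c`: `struct variant`, `MAX_ARGS`, and `select_variant` are hypothetical, and trying variants in declaration order is an assumption.

```c
#include <stddef.h>

#define MAX_ARGS 6	/* assumed upper bound on syscall arguments */

/* Hypothetical description of one variant syscall. */
struct variant {
	const char *name;
	/* is_const[i] != 0 means argument i is declared as const[value[i]]. */
	int is_const[MAX_ARGS];
	unsigned long value[MAX_ARGS];
};

/*
 * Return the name of the first variant whose const parameters all equal
 * the actual arguments, or base_name when no variant matches.
 */
static const char *
select_variant(const char *base_name,
	       const struct variant *variants, size_t n_variants,
	       const unsigned long *args, size_t n_args)
{
	for (size_t v = 0; v < n_variants; v++) {
		int match = 1;

		for (size_t i = 0; i < n_args && i < MAX_ARGS; i++) {
			if (variants[v].is_const[i]
			    && variants[v].value[i] != args[i]) {
				match = 0;
				break;
			}
		}
		if (match)
			return variants[v].name;
	}
	return base_name;	/* fall back to the base syscall */
}
```

For the `fcntl` definitions above, a call whose `cmd` argument equals the constant of `fcntl$F_DUPFD` would select that variant, while an unlisted command would fall back to plain `fcntl`.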

### Custom Decoders

Custom decoders have the format
```
:type[argname, arg2[$3], $1] %{
do_something(tcp, $$, $1);
%}
```

The type following the `:` indicates which type this decoder should apply to.
Template variables (`$` followed by one or more digits) can be used to reference
the value of a type option. These variables can be used within the body of the
custom decoder and will be substituted with the resolved value.

The special `$$` variable refers to the root argument.

For example, the syscall `example(arg1 type[test, type2[5], 1])` would have the
following decoder for the arg1 parameter:
```
do_something(tcp, tcp->u_arg[1], 1);
```
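
The substitution step can be sketched as a small string expander. This is an assumption-laden illustration rather than the generator's implementation: `expand_template` and `append` are hypothetical helpers, and the sketch presumes the template variables have already been bound to their resolved values (`vars[k]` holds the value of `$k`, or `NULL` if unbound).

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Append n bytes of s to out, keeping it NUL-terminated. */
static bool
append(char *out, size_t cap, size_t *len, const char *s, size_t n)
{
	if (*len + n >= cap)
		return false;
	memcpy(out + *len, s, n);
	*len += n;
	out[*len] = '\0';
	return true;
}

/*
 * Expand a decoder body: "$$" becomes `root`, "$k" becomes vars[k].
 * Returns false on an unbound variable or when `out` is too small.
 */
static bool
expand_template(const char *body, const char *root,
		const char *const *vars, size_t n_vars,
		char *out, size_t cap)
{
	size_t len = 0;

	if (cap == 0)
		return false;
	out[0] = '\0';

	for (const char *p = body; *p != '\0'; ) {
		if (p[0] == '$' && p[1] == '$') {
			if (!append(out, cap, &len, root, strlen(root)))
				return false;
			p += 2;
		} else if (p[0] == '$' && isdigit((unsigned char) p[1])) {
			char *end;
			unsigned long idx = strtoul(p + 1, &end, 10);

			if (idx >= n_vars || vars[idx] == NULL)
				return false;
			if (!append(out, cap, &len, vars[idx], strlen(vars[idx])))
				return false;
			p = end;
		} else {
			/* Ordinary character: copy through unchanged. */
			if (!append(out, cap, &len, p, 1))
				return false;
			p++;
		}
	}
	return true;
}
```

With `$1` bound to `1` and the root argument given as `tcp->u_arg[1]`, expanding `do_something(tcp, $$, $1);` reproduces the decoder line shown above.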

### #import

Import statements have the format
```
#import "filename.def"
```

The contents of `filename.def` are treated as if they were placed in the current file.

### #ifdef/#ifndef

`#ifdef` and `#ifndef` statements can be nested and have the format
```
#ifdef condition
#ifndef condition
#endif
#endif
```

Ifdef, ifndef, and define statements will be included as-is in the generated output.
Unlike C, these cannot be placed in the middle of another statement.

### define/include

Include and define statements have the format
```
define DEBUG 1
include "filename.h"
include <filename.h>
```

The contents of include and define statements will be included as-is in the generated output.
