Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Tool Trace2Syz #767

Merged
merged 22 commits into from Dec 6, 2018
Merged

Add Tool Trace2Syz #767

merged 22 commits into from Dec 6, 2018

Conversation

@shankarapailoor
Copy link
Contributor

@shankarapailoor shankarapailoor commented Oct 11, 2018

This PR is meant to review a new Syzkaller tool syz-trace2syz that converts outputs of strace to syzkaller programs.

@dvyukov
Copy link
Collaborator

@dvyukov dvyukov commented Oct 12, 2018

The buildbot has failed with:

go install ./...
# github.com/google/syzkaller/tools/syz-trace2syz/parser
tools/syz-trace2syz/parser/parser.go:42:9: undefined: newStraceLexer
tools/syz-trace2syz/parser/parser.go:43:9: undefined: StraceParse

We generally check-in all generated files to not require all users to install wide range of various tools. I think we need to do the same here.

Please extend make install_prerequisites to install goyacc and ragel.

And goyacc and ragel steps need to be moved to a separate generate_trace2syz target. See for example, generate_fidl target.

@shankarapailoor
Copy link
Contributor Author

@shankarapailoor shankarapailoor commented Oct 12, 2018

Ok Sounds good. Would it be nicer to put the generated files in their own package like parser/gen similar to how you do it for the targets?

Makefile Outdated
(cd tools/syz-trace2syz/parser; goyacc -o strace.go -p Strace strace.y)
GOOS=$(HOSTOS) GOARCH=$(HOSTARCH) $(HOSTGO) build $(GOHOSTFLAGS) -o ./bin/syz-trace2syz github.com/google/syzkaller/tools/syz-trace2syz


This comment has been minimized.

@dvyukov

dvyukov Oct 12, 2018
Collaborator

Remove second new line.

This comment has been minimized.

@dvyukov

dvyukov Oct 15, 2018
Collaborator

still holds

@@ -0,0 +1,36 @@
{

This comment has been minimized.

@dvyukov

dvyukov Oct 12, 2018
Collaborator

This file is not needed, right?


// SpecialConsts contains values for flags found in strace. Some of these have different names in Syzkaller
// and others are not yet added to Syzkaller.
SpecialConsts = map[string]uint64{

This comment has been minimized.

@dvyukov

dvyukov Oct 12, 2018
Collaborator

I think we need either eliminate this entirely, or at least try to minimize as much as possible and document exact reasons some of these consts are still needed.

I've looked at some consts:

  1. PROT_NONE and FALLOC_FL_ZERO_RANGE are present in syzkaller consts and have the same values.
  2. USR1/BUS/etc: if strace gives them different names it would be better to have a mapping from strace names to actual constant names as used in source code (and in syzkaller), e.g.
// strace invents own const names, map them to real const names.
constMap = map[string]string{
    "USR1": "SIGUSR1",
   ...
}

and then we could first apply this mapping and then lookup then in syzkaller consts. This will also help with the following point.
3. Lots of ioctl consts, e.g. FS_IOC_GETFLAGS have different values depending on arch. I guess your consts relate to amd64, but we don't want to limit this to amd64. Using syzkaller machinery would help with this.
4. Re missing consts, we should add them to syzkaller descriptions. E.g. you can do:

_ = FUTEX_WAIT_PRIVATE, FUTEX_WAKE_PRIVATE, ...

and them make extract should add these consts to .const files.


import (
"bufio"
"github.com/google/syzkaller/pkg/log"

This comment has been minimized.

@dvyukov

dvyukov Oct 12, 2018
Collaborator

The common convention in Go and in syzkaller is to have:

import (
  ... standard packages ...
  [empty line]
  ... custom packages ...
)

Please do this here and in other files.

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 12, 2018
Author Contributor

Sounds good. I don't need to add spaces between packages in trace2syz and other syzkaller packages correct?

`<... rt_sigtimedwait resumed> ) = 10 (SIGUSR1)` + "\n" +
`fstat() = 0`,
`open(0, SNDCTL_TMR_START or TCSETS,` +
`{c_cc[VMIN]=1, c_cc[VTIME]=0} <unfinished ...>` + "\n" +

This comment has been minimized.

@dvyukov

dvyukov Oct 12, 2018
Collaborator

These:

` + "\n" +

look a bit clumsy. Raw string literals support new lines, so I would something like this:

		`
open(0, SNDCTL_TMR_START or TCSETS, {c_cc[VMIN]=1, c_cc[VTIME]=0} <unfinished ...>
<... open resumed> , FLAG|FLAG) = -1 FLAG (sdfjfjfjf)
fstat() = 0
`,

This allows to read and write the input as is, without additional escaping and manually added new lines.

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 12, 2018
Author Contributor

Wow i overthought that. Will fix!

}

func TestParseLoopPid(t *testing.T) {
/*

This comment has been minimized.

@dvyukov

dvyukov Oct 12, 2018
Collaborator

No C++-style comments please:

// Parses two basic calls. Make sure the trace tree just has one entry with two calls

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 12, 2018
Author Contributor

Ah missed one. Why do you prefer not to have C++ style comments? Just curious :)

@@ -0,0 +1,150 @@
package proggen

This comment has been minimized.

@dvyukov

dvyukov Oct 12, 2018
Collaborator

Add the standard syzkaller copyright for all files.

return ctx
}

func testMatchingCallSequence(calls []*prog.Call, seq []string) error {

This comment has been minimized.

@dvyukov

dvyukov Oct 12, 2018
Collaborator

prog.P has String method which returns concatenation of all calls.
So you can encode the expected result as "mmap-open-connect" and then just compare this with p.String().

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 12, 2018
Author Contributor

That has simplified things. Thanks!

package proggen

import (
//"fmt"

This comment has been minimized.

@dvyukov

dvyukov Oct 12, 2018
Collaborator

remove

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 12, 2018
Author Contributor

Ack

@dvyukov
Copy link
Collaborator

@dvyukov dvyukov commented Oct 12, 2018

Ok Sounds good. Would it be nicer to put the generated files in their own package like ```parser/gen`` similar to how you do it for the targets?

FWIW the only reason to do that is gomeralinter. Even parsing all generated files in sys/ is very slow and takes tons of memory, so adding generated files to exclude section did not work. Instead we put them into skip section which makes gomeralinter skip whole packages.

I don't see strong reasons for your files either way. If it's easy to move them to a separate package, I guess we could do that.

Makefile Outdated
@@ -305,6 +313,8 @@ install_prerequisites:
sudo apt-get install -y -q g++-aarch64-linux-gnu || true
sudo apt-get install -y -q g++-powerpc64le-linux-gnu || true
sudo apt-get install -y -q g++-arm-linux-gnueabihf || true
sudo apt-get install ragel

This comment has been minimized.

@dvyukov

dvyukov Oct 15, 2018
Collaborator

add -y -q so that it does not ask questions

Makefile Outdated
(cd tools/syz-trace2syz/parser; goyacc -o strace.go -p Strace strace.y)
GOOS=$(HOSTOS) GOARCH=$(HOSTARCH) $(HOSTGO) build $(GOHOSTFLAGS) -o ./bin/syz-trace2syz github.com/google/syzkaller/tools/syz-trace2syz


This comment has been minimized.

@dvyukov

dvyukov Oct 15, 2018
Collaborator

still holds

SpecialConsts = map[string]uint64{
"SIG_0": 0,
"BUS": 10,
"USR1": 30,

This comment has been minimized.

@dvyukov

dvyukov Oct 15, 2018
Collaborator

I see that in *.const files we have SIGBUS=7 and SIGUSR1=10 and here you have BUS=10 and USR1=30. Why? Where these numbers come from?
If it's just a mistake, then it would be better to use Consts map more, i.e. map BUS to SIGBUS and USR1 to SIGUSR1, etc. And check out if we can use it for other consts too. This will save us from incorrect values, typos, differences between archs, etc.

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 15, 2018
Author Contributor

Yes this is a mistake. I will move BUS, USR1, IO, ALRM to Consts map. Additional question: the R_OK, W_OK flags are just for the access system call which is not in supported by Syzkaller. Is there a reason you don't support access or should I add the its description along with these flags?

This comment has been minimized.

@dvyukov

dvyukov Oct 15, 2018
Collaborator

I suspect we don't have it because it's effectively the same as faccessat that we have:

SYSCALL_DEFINE3(faccessat, int, dfd, const char __user *, filename, int, mode)
{
	return do_faccessat(dfd, filename, mode);
}

SYSCALL_DEFINE2(access, const char __user *, filename, int, mode)
{
	return do_faccessat(AT_FDCWD, filename, mode);
}

So there is probably not much value in adding it separately. And there is actually some negative effect of adding it: arm64 kernels generally have only *at versions of syscalls and make libc emulate the non-*at versions via *at versions. So if syzkaller finds a crash with access call on amd64, it will not be possible to reuse the program on arm64.

"X_OK": 1,
"F_OK": 0,
"SCHED_OTHER": 0,
"FS_IOC_GETFLAGS": 26113,

This comment has been minimized.

@dvyukov

dvyukov Oct 15, 2018
Collaborator

In the const files we have:

sys/linux/fs_ioctl_386.const:FS_IOC_GETFLAGS = 2147771905
sys/linux/fs_ioctl_amd64.const:FS_IOC_GETFLAGS = 2148034049
sys/linux/fs_ioctl_arm64.const:FS_IOC_GETFLAGS = 2148034049
sys/linux/fs_ioctl_arm.const:FS_IOC_GETFLAGS = 2147771905
sys/linux/fs_ioctl_ppc64le.const:FS_IOC_GETFLAGS = 1074292225

Where did 26113 come from?

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 15, 2018
Author Contributor

This is a mistake :).

This comment has been minimized.

@dvyukov

dvyukov Oct 15, 2018
Collaborator

So relying on automatic const extraction is already useful :)

"FS_IOC_GETFLAGS": 26113,
"FS_IOC_SETFLAGS": 26112,
"PHN_NOT_OH": 28676,
"SNDCTL_TMR_START": 21506,

This comment has been minimized.

@dvyukov

dvyukov Oct 15, 2018
Collaborator

Please add these SNDCTL_TMR* consts to sys/linux/sndtimer.txt
Bonus points for adding real descriptions for this SNDCTL_TMR thing.

// ShouldSkip lists system calls that we should skip when parsing
// Some of these are unsupported or not worth executing
ShouldSkip = map[string]bool{
"brk": true,

This comment has been minimized.

@dvyukov

dvyukov Oct 15, 2018
Collaborator

I would like to see exact reasons we ignore these calls, because in future we may want to enable more and it would be useful to know if there is some fundamental reason why it's ignored or if support is not implemented yet.
No need to comment each syscall independently, but at least do some groups of syscalls, e.g.:

// These don't give any interesting coverage:
getcpu
getcwd
// These something else...:
...
...

Also I see some syscalls here that are not even present in main descriptions (notably, "reboot" and ""). I think the first step should be to check target.SyscallMap and if it's not present there, then skip it. No need to duplicate such calls here (we never want to do reboot).

But I would expect that things like mprotect/mremap/futex we want to support in future.

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 15, 2018
Author Contributor

Yeah I agree. I'm thinking we could have a config file similar to mgr.cfg where you can specify "Disable Calls" or "Enable Calls". Users may have a set of traces and want to only generate programs that contain "open, ioctl, write" or only exclude a few like "execve, clone, etc".

This comment has been minimized.

@dvyukov

dvyukov Oct 15, 2018
Collaborator

Let's get the basic version working first ;)

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 15, 2018
Author Contributor

Sure!

}
ctx.CurrentSyzCall = call

length := (parseLength(syscall.Args[1], ctx)/ctx.Target.PageSize + 1) * ctx.Target.PageSize //RoundUp PageSize

This comment has been minimized.

@dvyukov

dvyukov Oct 15, 2018
Collaborator

This looks wrong. If the original value is multiple of PageSize, then the result will be 1 page larger.
I think we want: (x + PageSize - 1) / PageSize * PageSize

@dvyukov
Copy link
Collaborator

@dvyukov dvyukov commented Oct 15, 2018

I've tried to use it on a simple program and have 2 comments:

  1. Latest versions of strace already define -k flag, which means backtraces. It produces output like the one below which can't be parsed by syz-trace2syz. You need to use some other flag for kcov. I don't know if strace flag parser allows long flags, if yes, then -kcov would be more resilient for such changes in strace.
84737 access("\x2f\x65\x74\x63\x2f\x6c\x64\x2e\x73\x6f\x2e\x6e\x6f\x68\x77\x63\x61\x70", F_OK) = -1 
ENOENT (No such file or directory)
 > /lib/x86_64-linux-gnu/ld-2.24.so(__GI_access+0x7) [0x195a7]
 > /lib/x86_64-linux-gnu/ld-2.24.so(_dl_load_cache_lookup+0x66) [0x168e6]
  1. I've took a simple getpid() programs, converted it to C with:
syz-prog2c -prog /tmp/prog -threaded -collide -sandbox=namespace -segv -tmpdir -tun > /tmp/prog.c

Then:

strace -o tracefile -s 65500 -v -xx -f ./a.out

And tried to parse and it failed:

$ syz-trace2syz -file tracefile
2018/10/15 12:09:42 Parsing 1 traces
error: syntax error
2018/10/15 12:09:42 Error parsing line: 84069 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base={{len=52, type=NLMSG_ERROR, flags=0, seq=0, pid=4}, {error=-ENODEV, msg={{len=32, type=0x10 /* NLMSG_??? */, flags=NLM_F_REQUEST|NLM_F_ACK, seq=0, pid=0}, "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"}}}, iov_len=16384}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 52
@shankarapailoor
Copy link
Contributor Author

@shankarapailoor shankarapailoor commented Oct 15, 2018

I see the issue with that trace. It seems I didn't handle the unary minus operator during the parsing so -ENODEV breaks it. Fixed locally. More generally, do you think it is better to fail with Fatalf when we cannot parse a line or return an error, log it with verbosity 0, and then continue. With a verbosity of 0, users will see the error and report it and we can fix it, but they can still convert most of their trace.

@dvyukov
Copy link
Collaborator

@dvyukov dvyukov commented Oct 15, 2018

I think we should try to proceed with fatal errors as much as possible, and gave up to ignoring errors only if it turns out to be absolutely necessary.
If the thing completes successfully, 99% of users will report nothing and will not allow us to fix anything, they will just ignore the confusing output. Unfortunately that's how things work in real life.
In this case, error=-ENODEV is just a normal strace output that we need to handle, nothing that would suggest giving up on strict errors.

Makefile Outdated
@@ -176,6 +179,10 @@ ifeq ($(TARGETOS),fuchsia)
else
endif

generate_trace2syz:
(cd tools/syz-trace2syz/parser; ragel -Z -G2 -o lex.go straceLex.rl)
(cd tools/syz-trace2syz/parser; goyacc -o strace.go -p Strace strace.y)

This comment has been minimized.

@dvyukov

dvyukov Oct 16, 2018
Collaborator

add -v="" flag, then it won't generate y.output and then you can revert .gitignore change

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 16, 2018
Author Contributor

ack

@@ -108,7 +108,7 @@ func (target *Target) lazyInit() {
target.SanitizeCall = func(c *Call) {}
target.initTarget()
target.initArch(target)
target.ConstMap = nil // currently used only by initArch
target.ConstMap = nil

This comment has been minimized.

@dvyukov

dvyukov Oct 16, 2018
Collaborator

add the comment back

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 16, 2018
Author Contributor

ack

bpf_reg = BPF_REG_0, BPF_REG_1, BPF_REG_2, BPF_REG_3, BPF_REG_4, BPF_REG_5, BPF_REG_6, BPF_REG_7, BPF_REG_8, BPF_REG_9, BPF_REG_10
bpf_reg = BPF_REG_0, BPF_REG_1, BPF_REG_2, BPF_REG_3, BPF_REG_4, BPF_REG_5, BPF_REG_6, BPF_REG_7, BPF_REG_8, BPF_REG_9, BPF_REG_10, __MAX_BPF_REG

define MAX_BPF_REG __MAX_BPF_REG

This comment has been minimized.

@dvyukov

dvyukov Oct 16, 2018
Collaborator

Is MAX_BPF_REG used anywhere? I can't find where.

This comment has been minimized.

@dvyukov

dvyukov Oct 16, 2018
Collaborator

Or is MAX_BPF_REG what strace prints? If yes, then move this to your Const mapping. That's exactly what it is for -- mapping strace names to kernel names.

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 16, 2018
Author Contributor

It is in the kernel. It is defined to be equal to __MAX_BPF_REG. See here. I will remove the define and just add it to the list

@@ -12,6 +12,6 @@ IOCB_FLAG_RESFD = 1
__NR_io_cancel = 247
__NR_io_destroy = 244
__NR_io_getevents = 245
# __NR_io_pgetevents is not set
__NR_io_pgetevents = 399

This comment has been minimized.

@dvyukov

dvyukov Oct 16, 2018
Collaborator

You need to rebase, this is already added here:
6ce1793

@@ -38,6 +38,8 @@
],
"exclude": [
"(sys/(akaros|freebsd|fuchsia|linux|netbsd|test|windows)/init.*|sys/targets/common.go).* don't use ALL_CAPS in Go names",
"(lex.go)",

This comment has been minimized.

@dvyukov

dvyukov Oct 16, 2018
Collaborator

Please use full path because these names are quite generic, they can appear in other dirs too:
tools/syz-trace2syz/parser/(lex|strace).go

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 16, 2018
Author Contributor

ack

// ShouldSkip lists system calls that we should skip when parsing
// Some of these are unsupported or not worth executing
ShouldSkip = map[string]bool{
// execve terminates program execution. Although there is Syzkaller support

This comment has been minimized.

@dvyukov

dvyukov Oct 16, 2018
Collaborator

  1. We never spell syzkaller with a capital letter.
  2. I've noticed here and in some other places that you refer to syzkaller in third person form. It made sense while your code was a separate project, but this is a part of syzkaller now. So you should refer to it in first person form. It's not there now, it's here. This code is also part of syzkaller now. So referring to "some syzkaller" is not useful anymore. You could say something along the following lines "Although syscall descriptions contain execve ...".

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 16, 2018
Author Contributor

ack. Hopefully I got all of them...probably not.

i++
if arg.Address >= memAllocMaxMem {
return 0, fmt.Errorf("Unable to allocate space to store arg: %#v"+
"in Call: %v. Required memory is larger than what is allowed by Syzkaller."+

This comment has been minimized.

@dvyukov

dvyukov Oct 16, 2018
Collaborator

Same here. This is syzkaller now, not something else.

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 16, 2018
Author Contributor

ack

arg.Res = nil
if arg.Address >= memAllocMaxMem || arg.Address+arg.VmaSize > memAllocMaxMem {
return fmt.Errorf("Unable to allocate space for vma Call: %#v "+
"Required memory is larger than what is allowed by Syzkaller."+

This comment has been minimized.

@dvyukov

dvyukov Oct 16, 2018
Collaborator

Same here.

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 16, 2018
Author Contributor

ack

fdArg,
prog.MakeConstArg(mmap.Args[5], 0),
}
//All mmaps have fixed mappings in syzkaller

This comment has been minimized.

@dvyukov

dvyukov Oct 16, 2018
Collaborator

Same here. Referring to syzkaller is not useful anymore. "in prog package" for example refers to at least something.

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 16, 2018
Author Contributor

ack. Stale comment anyways since the calls get sanitized.

}

func parseShmat(shmat *prog.Syscall, syscall *parser.Syscall, ctx *Context) *prog.Call {
/*

This comment has been minimized.

@dvyukov

dvyukov Oct 16, 2018
Collaborator

No C++ comments.

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 16, 2018
Author Contributor

I swear they have a life of their own

This comment has been minimized.

@dvyukov

dvyukov Oct 16, 2018
Collaborator

:)

@dvyukov
Copy link
Collaborator

@dvyukov dvyukov commented Oct 17, 2018

Is this ready for another look? Or somethings are still unresolved? At least github still shows "This branch cannot be rebased due to conflicts".

@shankarapailoor
Copy link
Contributor Author

@shankarapailoor shankarapailoor commented Oct 17, 2018

I think this is ready for another look; I merged yesterday. I also see this as having no conflicts with the base branch.

@xairy
Copy link
Collaborator

@xairy xairy commented Oct 17, 2018

Please do not merge syzkaller master branch into yours, but rebase your commits onto syzkaller master instead.

@shankarapailoor
Copy link
Contributor Author

@shankarapailoor shankarapailoor commented Oct 18, 2018

Super sorry. I merged out of habit. Just so I'm clear, if I make some updates but there are changes in master I need to incorporate I should just do git rebase and then git push -f my-origin master (my understanding following instructions here)

@xairy
Copy link
Collaborator

@xairy xairy commented Oct 18, 2018

Correct, rebase and then force push. Thanks!

@shankarapailoor
Copy link
Contributor Author

@shankarapailoor shankarapailoor commented Oct 18, 2018

Hi @dvyukov and @xairy. I rebased and force pushed last night so I think everything should be ready for review. Whenever you have time take a look. Thanks!

@shankarapailoor
Copy link
Contributor Author

@shankarapailoor shankarapailoor commented Oct 27, 2018

Hi @dvyukov! This is ready for review. Thanks!

@@ -692,5 +692,5 @@ erofs_options [

codepage_nums = "1250", "1251", "1255", "437", "737", "775", "850", "852", "855", "857", "860", "861", "862", "863", "864", "865", "866", "869", "874", "932", "936", "949", "950"
codepages_names = "macceltic", "maccenteuro", "maccroatian", "maccyrillic", "macgaelic", "macgreek", "maciceland", "macinuit", "macroman", "macromanian", "macturkish", "ascii", "default", "cp1250", "cp1251", "cp1255", "cp437", "cp737", "cp775", "cp850", "cp852", "cp855", "cp857", "cp860", "cp861", "cp862", "cp863", "cp864", "cp865", "cp866", "cp869", "cp874", "cp932", "cp936", "cp949", "cp950", "euc-jp", "iso8859-13", "iso8859-14", "iso8859-15", "iso8859-1", "iso8859-2", "iso8859-3", "iso8859-4", "iso8859-5", "iso8859-6", "iso8859-7", "iso8859-9", "koi8-r", "koi8-ru", "koi8-u", "utf8", "none"
mount_flags = MS_BIND, MS_DIRSYNC, MS_MANDLOCK, MS_MOVE, MS_NOATIME, MS_NODEV, MS_NODIRATIME, MS_NOEXEC, MS_NOSUID, MS_RDONLY, MS_RELATIME, MS_REMOUNT, MS_SILENT, MS_STRICTATIME, MS_SYNCHRONOUS, MS_REC, MS_POSIXACL, MS_UNBINDABLE, MS_PRIVATE, MS_SLAVE, MS_SHARED, MS_I_VERSION, MS_LAZYTIME
mount_flags = MS_BIND, MS_DIRSYNC, MS_MANDLOCK, MS_MOVE, MS_NOATIME, MS_NODEV, MS_NODIRATIME, MS_NOEXEC, MS_NOSUID, MS_RDONLY, MS_RELATIME, MS_REMOUNT, MS_SILENT, MS_STRICTATIME, MS_SYNCHRONOUS, MS_REC, MS_POSIXACL, MS_UNBINDABLE, MS_PRIVATE, MS_SLAVE, MS_SHARED, MS_I_VERSION, MS_LAZYTIME, MS_KERNMOUNT, MS_ACTIVE

This comment has been minimized.

@dvyukov

dvyukov Oct 29, 2018
Collaborator

MS_KERNMOUNT, MS_ACTIVE does not seem to be used anywhere in kernel code. Do we really need them? And MS_KERNMOUNT looks like something that would be used as an internal flag, rather than something that is supposed to be passed from userspace.

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 29, 2018
Author Contributor

Yeah it currently appears to be ignored. However, strace adds this flag in its output. E.g.

mount("...", "...", 0x7ffc3231a330, MS_NOEXEC|MS_SYNCHRONOUS|MS_NODIRATIME|MS_BIND|MS_MOVE|MS_SILENT|MS_UNBINDABLE|MS_PRIVATE|MS_SHARED|MS_KERNMOUNT|MS_I_VERSION|MS_ACTIVE|0xa6508ea300000300, 0x7ffc3231a390) = -1 EINVAL (Invalid argument)

What is the best way to handle this? This example is strace output from a Syzkaller program.

This comment has been minimized.

@dvyukov

dvyukov Oct 29, 2018
Collaborator

How do you handle unknown consts? If you ignore them, then we can simply ignore MS_KERNMOUNT.
MS_KERNMOUNT is not supposed to be passed from user-space, so perhaps it's a bug in strace? Does it confuse it with some other const?

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 29, 2018
Author Contributor

Currently if I cannot find the constant in the constant map I call log.Fatalf since it is possible that this is a constant that should be added. We could instead not panic and log a warning with low verbosity

Also, doesn't Syzkaller sometimes throw random values into the flag field like -1 so strace would naturally expand it to all possible flags?

@@ -0,0 +1,3 @@
include <uapi/linux/btrfs.h>

This comment has been minimized.

@dvyukov

dvyukov Oct 29, 2018
Collaborator

Copyright header please.

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 29, 2018
Author Contributor

ack

@@ -14,6 +14,8 @@ include <uapi/linux/netfilter_ipv6/ip6t_REJECT.h>
include <uapi/linux/netfilter_ipv6/ip6t_NPT.h>
include <uapi/linux/netfilter_ipv6/ip6t_HL.h>

_ = IP6T_BASE_CTL, IP6T_SO_SET_REPLACE, IP6T_SO_SET_ADD_COUNTERS, IP6T_SO_SET_MAX, IP6T_SO_GET_INFO, IP6T_SO_GET_ENTRIES, IP6T_SO_GET_REVISION_MATCH, IP6T_SO_GET_REVISION_TARGET, IP6T_SO_GET_MAX

This comment has been minimized.

@dvyukov

dvyukov Oct 29, 2018
Collaborator

IP6T_BASE_CTL, IP6T_SO_SET_MAX, IP6T_SO_GET_MAX are not ioctl commands afair
we already have IP6T_SO_SET_REPLACE, IP6T_SO_SET_ADD_COUNTERS, IP6T_SO_GET_INFO, IP6T_SO_GET_ENTRIES and the rest
do we need this at all?

@@ -43,8 +43,8 @@ socket_domain = AF_UNIX, AF_INET, AF_INET6, AF_IPX, AF_NETLINK, AF_X25, AF_AX25,
socket_type = SOCK_STREAM, SOCK_DGRAM, SOCK_RAW, SOCK_RDM, SOCK_SEQPACKET, SOCK_DCCP, SOCK_PACKET, SOCK_NONBLOCK, SOCK_CLOEXEC
accept_flags = SOCK_NONBLOCK, SOCK_CLOEXEC
shutdown_flags = SHUT_RD, SHUT_WR
send_flags = MSG_CONFIRM, MSG_DONTROUTE, MSG_DONTWAIT, MSG_EOR, MSG_MORE, MSG_NOSIGNAL, MSG_OOB, MSG_PROBE, MSG_BATCH, MSG_FASTOPEN, MSG_ZEROCOPY
recv_flags = MSG_CMSG_CLOEXEC, MSG_DONTWAIT, MSG_ERRQUEUE, MSG_OOB, MSG_PEEK, MSG_TRUNC, MSG_WAITALL, MSG_WAITFORONE
send_flags = MSG_CONFIRM, MSG_DONTROUTE, MSG_DONTWAIT, MSG_EOR, MSG_MORE, MSG_NOSIGNAL, MSG_OOB, MSG_PROBE, MSG_BATCH, MSG_FASTOPEN, MSG_ZEROCOPY, MSG_CTRUNC, MSG_FIN, MSG_EOF, MSG_SYN, MSG_RST

This comment has been minimized.

@dvyukov

dvyukov Oct 29, 2018
Collaborator

MSG_CTRUNC is output flag of recvmsg
the other flags seems to be unused, or output too
why do we need this?

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 29, 2018
Author Contributor

I should have looked more carefully at the actual usage. I added it here because strace had the following output:

sendmsg(-1, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base={{len=20, type=0 /* NLMSG_??? */, flags=0, seq=0, pid=0}, "\x00\x00\x00\x00"}, iov_len=20}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_PEEK|MSG_CTRUNC|MSG_DONTWAIT|MSG_EOR|MSG_WAITALL|MSG_FIN|MSG_SYN|MSG_CONFIRM|MSG_ERRQUEUE|MSG_NOSIGNAL|MSG_WAITFORONE|MSG_BATCH|MSG_CMSG_COMPAT|0x1ba00000) = -1 EINVAL
send_flags = MSG_CONFIRM, MSG_DONTROUTE, MSG_DONTWAIT, MSG_EOR, MSG_MORE, MSG_NOSIGNAL, MSG_OOB, MSG_PROBE, MSG_BATCH, MSG_FASTOPEN, MSG_ZEROCOPY
recv_flags = MSG_CMSG_CLOEXEC, MSG_DONTWAIT, MSG_ERRQUEUE, MSG_OOB, MSG_PEEK, MSG_TRUNC, MSG_WAITALL, MSG_WAITFORONE
send_flags = MSG_CONFIRM, MSG_DONTROUTE, MSG_DONTWAIT, MSG_EOR, MSG_MORE, MSG_NOSIGNAL, MSG_OOB, MSG_PROBE, MSG_BATCH, MSG_FASTOPEN, MSG_ZEROCOPY, MSG_CTRUNC, MSG_FIN, MSG_EOF, MSG_SYN, MSG_RST
recv_flags = MSG_CMSG_CLOEXEC, MSG_DONTWAIT, MSG_ERRQUEUE, MSG_OOB, MSG_PEEK, MSG_TRUNC, MSG_WAITALL, MSG_WAITFORONE, MSG_SENDPAGE_NOTLAST, MSG_NO_SHARED_FRAGS, MSG_CMSG_COMPAT

This comment has been minimized.

@dvyukov

dvyukov Oct 29, 2018
Collaborator

MSG_SENDPAGE_NOTLAST is an internal flag, not a part of api
why do we need this?

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 29, 2018
Author Contributor

Same reason as MS_KERNMOUNT.

@@ -199,7 +199,7 @@ ioctl$sock_SIOCINQ(fd sock, cmd const[SIOCINQ], arg ptr[out, int32])

ioctl$sock_SIOCGIFCONF(fd sock, cmd const[SIOCGIFNAME], arg ptr[inout, ifconf])

ifreq_ioctls = SIOCGIFNAME, SIOCSIFLINK, SIOCGIFFLAGS, SIOCSIFFLAGS, SIOCGIFADDR, SIOCSIFADDR, SIOCGIFDSTADDR, SIOCSIFDSTADDR, SIOCGIFBRDADDR, SIOCSIFBRDADDR, SIOCGIFNETMASK, SIOCSIFNETMASK, SIOCGIFMETRIC, SIOCSIFMETRIC, SIOCGIFMEM, SIOCSIFMEM, SIOCGIFMTU, SIOCSIFMTU, SIOCSIFNAME, SIOCSIFHWADDR, SIOCGIFENCAP, SIOCSIFENCAP, SIOCGIFHWADDR, SIOCGIFSLAVE, SIOCSIFSLAVE, SIOCADDMULTI, SIOCDELMULTI, SIOCGIFINDEX, SIOCSIFPFLAGS, SIOCGIFPFLAGS, SIOCDIFADDR, SIOCSIFHWBROADCAST, SIOCGIFCOUNT, SIOCGIFTXQLEN, SIOCSIFTXQLEN, SIOCETHTOOL, SIOCGMIIPHY, SIOCGMIIREG, SIOCSMIIREG, SIOCWANDEV, SIOCGIFMAP, SIOCSIFMAP, SIOCBONDENSLAVE, SIOCBONDRELEASE, SIOCBONDSETHWADDR, SIOCBONDSLAVEINFOQUERY, SIOCBONDINFOQUERY, SIOCBONDCHANGEACTIVE, SIOCBRADDIF, SIOCBRDELIF, SIOCSHWTSTAMP, SIOCGHWTSTAMP
ifreq_ioctls = SIOCGIFNAME, SIOCSIFLINK, SIOCGIFCONF, SIOCGIFFLAGS, SIOCSIFFLAGS, SIOCGIFADDR, SIOCSIFADDR, SIOCGIFDSTADDR, SIOCSIFDSTADDR, SIOCGIFBRDADDR, SIOCSIFBRDADDR, SIOCGIFNETMASK, SIOCSIFNETMASK, SIOCGIFMETRIC, SIOCSIFMETRIC, SIOCGIFMEM, SIOCSIFMEM, SIOCGIFMTU, SIOCSIFMTU, SIOCSIFNAME, SIOCSIFHWADDR, SIOCGIFENCAP, SIOCSIFENCAP, SIOCGIFHWADDR, SIOCGIFSLAVE, SIOCSIFSLAVE, SIOCADDMULTI, SIOCDELMULTI, SIOCGIFINDEX, SIOCSIFPFLAGS, SIOCGIFPFLAGS, SIOCDIFADDR, SIOCSIFHWBROADCAST, SIOCGIFCOUNT, SIOCGIFTXQLEN, SIOCSIFTXQLEN, SIOCETHTOOL, SIOCGMIIPHY, SIOCGMIIREG, SIOCSMIIREG, SIOCWANDEV, SIOCGIFMAP, SIOCSIFMAP, SIOCBONDENSLAVE, SIOCBONDRELEASE, SIOCBONDSETHWADDR, SIOCBONDSLAVEINFOQUERY, SIOCBONDINFOQUERY, SIOCBONDCHANGEACTIVE, SIOCBRADDIF, SIOCBRDELIF, SIOCSHWTSTAMP, SIOCGHWTSTAMP

This comment has been minimized.

@dvyukov

dvyukov Oct 29, 2018
Collaborator

We have a description for SIOCGIFCONF, but there was a typo. I fixed the typo:
fe65cc8
So now this change is unnecessary.

@@ -849,8 +851,9 @@ kcmp_epoll_slot {
toff int32
}

open_flags = O_RDONLY, O_WRONLY, O_RDWR, O_APPEND, FASYNC, O_CLOEXEC, O_CREAT, O_DIRECT, O_DIRECTORY, O_EXCL, O_LARGEFILE, O_NOATIME, O_NOCTTY, O_NOFOLLOW, O_NONBLOCK, O_PATH, O_SYNC, O_TRUNC, __O_TMPFILE
open_flags = O_RDONLY, O_WRONLY, O_RDWR, O_APPEND, FASYNC, O_CLOEXEC, O_CREAT, O_DIRECT, O_DIRECTORY, O_EXCL, O_LARGEFILE, O_NOATIME, O_NOCTTY, O_NOFOLLOW, O_NONBLOCK, O_PATH, O_SYNC, O_TRUNC, __O_TMPFILE, O_ACCMODE, O_TMPFILE

This comment has been minimized.

@dvyukov

dvyukov Oct 29, 2018
Collaborator

O_ACCMODE is an or of O_RDONLY, O_WRONLY, O_RDWR
Since we have these consts, no need to add O_ACCMODE.
Does strace produce O_ACCMODE? It would be pretty weird.

This comment has been minimized.

@dvyukov

dvyukov Oct 29, 2018
Collaborator

ditto
#define O_TMPFILE (__O_TMPFILE | O_DIRECTORY)

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 29, 2018
Author Contributor

yes it does for both.

7042  open("\x2e", O_ACCMODE|O_NOCTTY|O_TMPFILE, 000) = 4
_ = SIG_0, SIGHUP, SIGINT, SIGQUIT, SIGILL, SIGTRAP, SIGABRT, SIGIOT, SIGBUS, SIGFPE, SIGKILL, SIGUSR1, SIGSEGV, SIGUSR2, SIGPIPE, SIGALRM, SIGTERM, SIGSTKFLT, SIGCHLD, SIGCONT, SIGSTOP, SIGTSTP, SIGTTIN, SIGTTOU, SIGURG, SIGXCPU, SIGXFSZ, SIGVTALRM, SIGPROF, SIGWINCH, SIGIO, SIGPOLL, SIGPWR, SIGSYS, SIGUNUSED, SIGRTMIN, SIGRTMAX
_ = RLIM64_INFINITY
_ = PHN_NOT_OH
_ = _IOC_NRBITS, _IOC_TYPEBITS, _IOC_SIZEBITS, _IOC_DIRBITS, _IOC_NRMASK, _IOC_TYPEMASK, _IOC_SIZEMASK, _IOC_DIRMASK, _IOC_NRSHIFT, _IOC_TYPESHIFT, _IOC_SIZESHIFT, _IOC_DIRSHIFT, _IOC_NONE, _IOC_READ, _IOC_WRITE

This comment has been minimized.

@dvyukov

dvyukov Oct 29, 2018
Collaborator

Does strace produce all these consts? In what context?

This comment has been minimized.

@shankarapailoor

shankarapailoor Oct 29, 2018
Author Contributor

No it doesn't add all these constants but sometimes it will expand the ioctl command into _IOC(..,...,...). Eg.

ioctl(3, _IOC(_IOC_WRITE, 0x66, 0x8, 0x28), 0x20000040) = -1 EPERM (Operation not permitted)

I try and evaluate those and so I added the shifts. I was going to use CGO to evaluate but I read that you should avoid that if possible.

This comment has been minimized.

@dvyukov

dvyukov Oct 29, 2018
Collaborator

Then this is fine.
Just make sure that you check and fail if the consts are missing so that somebody does not remove them tomorrow because they don't understand why they are here.

@@ -49,7 +49,7 @@ ioctl$TIOCMBIC(fd fd_tty, cmd const[TIOCMBIC], arg ptr[in, int32])
ioctl$TIOCMBIS(fd fd_tty, cmd const[TIOCMBIS], arg ptr[in, int32])
ioctl$TIOCGSOFTCAR(fd fd_tty, cmd const[TIOCGSOFTCAR], arg ptr[out, int32])
ioctl$TIOCSSOFTCAR(fd fd_tty, cmd const[TIOCSSOFTCAR], arg ptr[in, int32])

ioctl$TIOCGPTN(fd fd_tty, cmd const[TIOCGPTN], arg ptr[out, int32])

This comment has been minimized.

@dvyukov

dvyukov Oct 29, 2018
Collaborator

This is not implemented in kernel, no point in testing it. You can add a comment about this instead if you want.


define ADJ_OFFSET_SS_READ 0xa001
define SIG_0 0
define SCHED_OTHER SCHED_NORMAL

This comment has been minimized.

@dvyukov

dvyukov Oct 29, 2018
Collaborator

I think it's better to move this to the trace2syz const map as this is just an alias.

@dvyukov
Copy link
Collaborator

@dvyukov dvyukov commented Oct 29, 2018

This becomes very hard to review this as the change grew significantly due to *.txt changes, and github review system makes it hard to track old comments.
Please factor out *.txt changes from this. I.e. copy a single .txt file to a separate branch, run make extract+generate and submit that as separate pull request. A single txt file change should be easy to review, and we can get them merged one by one, which will relief this change from these distracting parts.
Thanks

@dvyukov
Copy link
Collaborator

@dvyukov dvyukov commented Oct 29, 2018

I will answer re all const questions here.

When I proposed to add some consts to txt files, I looked at one use case. And there it made sense to just add the const to txt file. But there seems to be several completely different cases re different consts. And I think we should handle these classes differently.

  1. Some consts make sense to have in txt files. For example, if we already have flags that enumerate a group of consts, but one of them is just missing. Or when we miss descriptions for some subsystem, but we should have it eventually (it's just that we did not describe it yet) for example, SIOCSIWNWID.

  2. strace aliases, when strace just use a different const name for a const that we already have. These are better to keep in the trace2syz map that just maps strace names to kernel names.

  3. Consts that kernel returns to user space, but userspace is not supposed to pass to kernel (MSG_CTRUNC). Or internal kernel consts that userspace is not supposed to pass to kernel (MS_KERNMOUNT). These consts are not interesting for fuzzing and in fact they make fuzzer less efficient because it has larger search space. So you say that trace2syz fails on unknown consts (which is good). I think we can introduce another map in trace2syz which will contain known, but uninteresting consts. We can check on startup that non of them are present in txt descriptions, and then if we see it in strace output, we don't fail but just use 0.

  4. Complex aliases like #define O_TMPFILE (__O_TMPFILE | O_DIRECTORY). It also does not make sense to have them in txt descriptions as duplicate in flags, because we already have __O_TMPFILE and O_DIRECTORY separately. For these I think we should do:

define O_TMPFILE     __O_TMPFILE | O_DIRECTORY

Which will make the const available to trace2syz but won't affect fuzzing.

Are there other classes?
Overall I mean that we don't have to treat all of them in the same way, because they are in fact different cases.

@shankarapailoor
Copy link
Contributor Author

@shankarapailoor shankarapailoor commented Oct 29, 2018

Your breakdown is very nice and I completely agree. The other case is when trace2syz must evaluate some macros like _IOC or makedev but you already answered that previously. In those cases would it be better to create a new macros.txt to put all of these so I don't clutter sys.txt

@dvyukov
Copy link
Collaborator

@dvyukov dvyukov commented Oct 29, 2018

In those cases would it be better to create a new macros.txt to put all of these so I don't clutter sys.txt

Interesting option.
What exactly will go there?
I think, for example, the BPF consts probably better stays in bpf.txt.
Are we talking about only _IOC consts? If so I don't have strong opinion. If it's just _IOC, then I would probably leave them in sys.txt for now (maybe with a comment).
If we can come up with at leasdt 3-5 groups of very special consts that don't belong to any other file, then it may be a good idea to create trace2syz.txt for them.

@shankarapailoor
Copy link
Contributor Author

@shankarapailoor shankarapailoor commented Oct 29, 2018

I haven't found that many. Just these two:

  1. _IOC, _IO, (ioctl variants)
  2. QCMD. See here
    Example:
quotactl(QCMD(0 /* Q_??? */, USRQUOTA), "\x2e\x2f\x66\x69\x6c\x65\x30", 0, 0x20000100) = -1 ENOTBLK (Block device required)

I will start by adding them to sys.txt with a comment and if we get a critical mass then we can move them over to trace2syz.txt

@dvyukov
Copy link
Collaborator

@dvyukov dvyukov commented Oct 29, 2018

Sounds good to me.

@shankarapailoor
Copy link
Contributor Author

@shankarapailoor shankarapailoor commented Oct 30, 2018

Hi Dmitry,

I have updated the PR. I forgot 1 file tty.txt with TIOCGPTN. Hopefully that doesn't clutter things up too much otherwise I can send a separate PR for that.

EDIT: I removed TIOCGPTN from tty.txt so the PR only has trace2syz files :)

.gitignore Outdated
@@ -2,5 +2,4 @@
*~
*.cfg
workdir*

This comment has been minimized.

@dvyukov

dvyukov Nov 1, 2018
Collaborator

please revert unrelated changes

@@ -6,6 +6,7 @@ package prog
import (
"fmt"
"path/filepath"
// "path/filepath"

This comment has been minimized.

@dvyukov

dvyukov Nov 1, 2018
Collaborator

remove

@@ -49,7 +49,6 @@ ioctl$TIOCMBIC(fd fd_tty, cmd const[TIOCMBIC], arg ptr[in, int32])
ioctl$TIOCMBIS(fd fd_tty, cmd const[TIOCMBIS], arg ptr[in, int32])
ioctl$TIOCGSOFTCAR(fd fd_tty, cmd const[TIOCGSOFTCAR], arg ptr[out, int32])
ioctl$TIOCSSOFTCAR(fd fd_tty, cmd const[TIOCSSOFTCAR], arg ptr[in, int32])

This comment has been minimized.

@dvyukov

dvyukov Nov 1, 2018
Collaborator

revert

// syz-trace2syz converts strace traces to syzkaller programs.
//
// Simple usage:
// strace -o trace -s 65500 -v -xx -f ./a.out

This comment has been minimized.

@dvyukov

dvyukov Nov 1, 2018
Collaborator

-a 1 should reduce trace size a bit
not that it's super radical reduction, but since they are huge any reduction will be useful

var names []string
infos, err := ioutil.ReadDir(dir)
if err != nil {
log.Fatalf("Failed to read dir: %s", err.Error())

This comment has been minimized.

@dvyukov

dvyukov Nov 1, 2018
Collaborator

log.Fatalf("failed to read dir: %v", err)

@dvyukov
Copy link
Collaborator

@dvyukov dvyukov commented Nov 1, 2018

I've tried to run it on another syzkaller program, but it fails with:

error: syntax error
2018/11/01 15:13:19 Error parsing line: 6037  futex(0x6c3c08, FUTEX_WAIT_PRIVATE, 0, NULL <ptrace(SYSCALL):No such process>

I've tried to recollect trace 3 times, but I always get this line.

I am starting wondering if using strace is a good idea. Maybe writing own tracer will turn out to be much simpler and more robust. The main part of strace is all these syscall descriptions, but we don't need it, we have our own. If we parse arguments according to our descriptions then we don't need to do any conversion at all, and don't need to mess with all these constants because we will simply read constants as numbers. And all that unfinished ... and ... resumed we will simply not inject. We would write out syzkaller programs on the fly.

@dvyukov
Copy link
Collaborator

@dvyukov dvyukov commented Nov 1, 2018

I've tried it on this syzkaller program:

socket$can_raw(0x1d, 0x3, 0x1)
perf_event_open(&(0x7f000001d000)={0x1, 0x70, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, @perf_config_ext}, 0x0, 0xffffffffffffffff, 0xffffffffffffffff, 0x0)
getsockopt$inet_sctp_SCTP_RESET_STREAMS(0xffffffffffffffff, 0x84, 0x77, &(0x7f00000005c0)={0x0, 0x0, 0x3, [0x7, 0x3, 0x5]}, &(0x7f0000000400)=0xe)
r0 = syz_open_procfs(0x0, &(0x7f0000000000)="2f65786500000000000409004bddd9de91be10eeaf000ee9a90f798058439ed554fa07424ada75af1f02ac06edbcd7a071fb35331ce39c5a00000000")
fsetxattr(r0, &(0x7f0000000280)=@known='user.syz\x00', &(0x7f00000002c0)='\x00', 0x398, 0x0)
fremovexattr(r0, &(0x7f00000000c0)=@known='user.syz\x00')

Here is the strace:
https://gist.githubusercontent.com/dvyukov/8e82525e74b6a49c8835d743410a950b/raw/58156feace776a2f292b06695a2d1154a78c07c3/gistfile1.txt

And here is the result:

mmap(&(0x7f0000000000/0x1000)=nil, 0x1000, 0x3, 0x32, 0xffffffffffffffff, 0x0)
set_robust_list(&(0x7f0000000010)={&(0x7f0000000000), 0x0, &(0x7f0000000008)}, 0x18)
r0 = socket$can_raw(0x1d, 0x3, 0x1)
perf_event_open(&(0x7f0000000029)={0x1, 0x70, 0x0, 0x7f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, @perf_bp={&(0x7f0000000028)}}, 0x0, 0xffffffffffffffff, r0, 0x0)
getsockopt$inet_sctp_SCTP_RESET_STREAMS(r0, 0x84, 0x77, &(0x7f0000000099), &(0x7f00000000a1)=0x8)
open(&(0x7f00000000a5)='/proc/self//exe\x00', 0x2, 0x0)
r1 = open(&(0x7f00000000b5)='/proc/self//exe\x00', 0x0, 0x0)
fsetxattr(r1, &(0x7f00000000c5)=@known='user.syz\x00', &(0x7f00000000ce)="00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000e000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000030007000300050000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000", 0x399, 0x0)
fremovexattr(r1, &(0x7f0000000467)=@known='user.syz\x00')

It actually did good job for fsetxattr/fremovexattr which is nice.

But there seems to be a bug related to usage of socket result instead of -1. E.g. the original program and the trace use -1:

getsockopt(-1, SOL_SCTP, SCTP_RESET_STREAMS,  <unfinished ...>

but the resulting program suddenly uses the socket descriptor:

r0 = socket$can_raw(0x1d, 0x3, 0x1)
getsockopt$inet_sctp_SCTP_RESET_STREAMS(r0, 0x84, 0x77, &(0x7f0000000099), &(0x7f00000000a1)=0x8)
return nil
}

func genDefaultArg(syzType prog.Type, ctx *Context) prog.Arg {

This comment has been minimized.

@dvyukov

dvyukov Nov 1, 2018
Collaborator

Can't we just use prog.DefaultArg instead of this.
For StructType and UnionType this simply copy-pastes code from prog, for most remaining types it calls prog.DefaultArg.
I see a difference for PtrType, but if we are here then we already gave up on generating good value, so we could as well just use DefaultArg for PtrType too.
I don't fully understand what happens for ResourceType. But I suspect it may be the root cause of the bug that I described before where fd=-1 transforms into result of socket call. No, if we use default resource value, we don't need to cache it and reuse in future (which one if there are several calls?). If fd=-1 is used later, then we just use fd=-1 rather than result of some preceding call.
So if I am not terribly mistaken, we can just replace this whole function with prog.DefaultArg.

This comment has been minimized.

@shankarapailoor

shankarapailoor Nov 1, 2018
Author Contributor

The problem is that for calls like pipe() or socketpair() their descriptors get lost and any case where descriptors are out arguments

The reason for the bug is here: https://github.com/shankarapailoor/syzkaller/blob/master/tools/syz-trace2syz/proggen/proggen.go#L474, which I added a few days ago while trying to parse traces of syzkaller generated programs.

This comment has been minimized.

@dvyukov

dvyukov Nov 1, 2018
Collaborator

But this does not make sense for the default value (fd=-1).
First, when -1 is used later we don't need to map it to a previous result arg, we just use -1.
Second, the cache holds only 1 value for given type/value, but we can generate multiple default resource args all with the same value of -1, which one will be cached and used later? which one should be cached and used later? We can't answer this question. Whichever we take is wrong.

This comment has been minimized.

@shankarapailoor

shankarapailoor Nov 1, 2018
Author Contributor

Ok so I agree that this function can be removed. I don't think prog.DefaultArg can be called only in genArgs for now mainly because of resource out-dir args.

"rt_sigqueueinfo": true,
"rt_sigsuspend": true,
// Require function pointers which are not recovered by strace
"rt_sigaction": true,

This comment has been minimized.

@dvyukov

dvyukov Dec 6, 2018
Collaborator

I think we also need to ignore set_robust_list and set_tid_address. They are executed by glibc on process/thread start and are present in just any program in the beginning, but normal programs usually don't use them and we probably won't be able to build interesting programs with set_robust_list.

@dvyukov
Copy link
Collaborator

@dvyukov dvyukov commented Dec 6, 2018

What do you think if I merge this and need do a clean up pass over code? I fell that it can be faster for me just fix some assorted things rather then write it all down.

@shankarapailoor
Copy link
Contributor Author

@shankarapailoor shankarapailoor commented Dec 6, 2018

Sounds good to me!

@dvyukov dvyukov merged commit 6a60a19 into google:master Dec 6, 2018
2 checks passed
2 checks passed
@googlebot
cla/google All necessary CLAs are signed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@dvyukov
Copy link
Collaborator

@dvyukov dvyukov commented Dec 6, 2018

I've pushed first batch of commits:
https://github.com/google/syzkaller/commits/master
Please double check for sanity and also that's what I would write in comments.

@shankarapailoor
Copy link
Contributor Author

@shankarapailoor shankarapailoor commented Dec 6, 2018

I looked over the commits and they seem good to me. The main commit that changed code in trace2syz was the "tidy up code" commit and I think the changes made it better. I forgot to add copyright headers to return_cache.go and context.go so I created a new PR for that.

Presently, the tool cannot convert strace input that has inner call arguments like inet_addr, inet_pton. I wanted to make sure you were aware of that. I have been sending patches to @ldv-alt to remove those under -Xraw but this work is ongoing.

EDIT: Can we also close #706?

@dvyukov
Copy link
Collaborator

@dvyukov dvyukov commented Dec 7, 2018

Presently, the tool cannot convert strace input that has inner call arguments like inet_addr, inet_pton.

Please share links to patches here. I also noticed that SIG* consts are still in strace output and are not parsed.

EDIT: Can we also close #706?

Done

@shankarapailoor
Copy link
Contributor Author

@shankarapailoor shankarapailoor commented Dec 8, 2018

Please share links to patches here.

Merged

Need Revision

Not reviewed

@dvyukov
Copy link
Collaborator

@dvyukov dvyukov commented Dec 9, 2018

Didn't you also hit signal numbers SIG*? Signal numbers SIG* failed parsing for me even on a simplest program.

@shankarapailoor
Copy link
Contributor Author

@shankarapailoor shankarapailoor commented Dec 9, 2018

Not yet. I'm working on that today.

@shankarapailoor
Copy link
Contributor Author

@shankarapailoor shankarapailoor commented Jan 5, 2019

@dvyukov
Copy link
Collaborator

@dvyukov dvyukov commented Jan 6, 2019

Nice! Thanks for the update.

@shankarapailoor
Copy link
Contributor Author

@shankarapailoor shankarapailoor commented Jan 10, 2019

Hi @dvyukov ,

trace2syz has trouble generating sockaddr_storage_in6structs defined here. The issue is that strace decodes these structs as sockaddr_in6 struct so trace2syz tries to generate the addr field of sockaddr_storage_in6 from the af_family field of sockaddr_in6 which triggers an error.

I'm thinking of handling this as a special case, but I'm guessing you have a better idea :).

@shankarapailoor
Copy link
Contributor Author

@shankarapailoor shankarapailoor commented Jan 11, 2019

Couple more questions:

In call_selector.go it seems we use the file descriptor as a discriminator arg. Can the third argument of an ioctl change for a given command if the device is different? If not, could we just use the command to discriminate.

This brings up another point. I agree that if we match on file descriptor we should match exactly, but I think in order for this to work for ioctls we need to use syz_open_dev. Should we handle this case separately (open->syz_open_dev) because I don't see a clean way for it to fit with our current callSelector?

@dvyukov
Copy link
Collaborator

@dvyukov dvyukov commented Jan 14, 2019

Hi @dvyukov ,

trace2syz has trouble generating sockaddr_storage_in6structs defined here. The issue is that strace decodes these structs as sockaddr_in6 struct so trace2syz tries to generate the addr field of sockaddr_storage_in6 from the af_family field of sockaddr_in6 which triggers an error.

I'm thinking of handling this as a special case, but I'm guessing you have a better idea :).

I don't have any great ideas.
An obvious heuristic for this case would be as follows. If we are trying to put a non-struct strace arg into a struct syzkaller type, recurse into the struct syzkaller type.
A more limited version would be to do this check only when we are already recursing into a syzkaller struct. I.e. in this case we recurse into sockaddr_storage_in6, and at this point see that we actually need to recurse into 2 structs.

@dvyukov
Copy link
Collaborator

@dvyukov dvyukov commented Jan 14, 2019

But I don't know how well it will work for other cases, and if it will break other things.

@dvyukov
Copy link
Collaborator

@dvyukov dvyukov commented Jan 14, 2019

Couple more questions:

In call_selector.go it seems we use the file descriptor as a discriminator arg. Can the third argument of an ioctl change for a given command if the device is different? If not, could we just use the command to discriminate.

This brings up another point. I agree that if we match on file descriptor we should match exactly, but I think in order for this to work for ioctls we need to use syz_open_dev. Should we handle this case separately (open->syz_open_dev) because I don't see a clean way for it to fit with our current callSelector?

Unfortunately, ioctl command are not unique, e.g.
VIDIOC_ENUM_DV_TIMINGS on /dev/video0
#define VIDIOC_ENUM_DV_TIMINGS _IOWR('V', 98, struct v4l2_enum_dv_timings)

and VIDIOC_SUBDEV_ENUM_DV_TIMINGS on /dev/v4l-subdev0
#define VIDIOC_SUBDEV_ENUM_DV_TIMINGS _IOWR('V', 98, struct v4l2_enum_dv_timings)

It's only a matter of matching magic char ('V' in this case) and argument size.
So ideally we need both device type and command constant.

Re open vs syz_open_dev. We have one for some devices and another for other devices. It won't work if we emit only syz_open_dev as it won't cover lots of fd types. Perhaps we could check if name matches against one of open variants, emit that open; if it matches against one of openat variants, emit that openat; if it matches against syz_opendev, emit that syz_opendev. Then we will get proper fd type.

@shankarapailoor
Copy link
Contributor Author

@shankarapailoor shankarapailoor commented Jan 16, 2019

In traces of real programs you will often get calls that use stdin, stdout, stderr. This creates a slight issue when selecting calls like ioctl$TCGETS because the file descriptor is not in the result cache so it cannot get an exact match.

Should we also look at the default values in the resource description when making a selection? If so, are 0, 1, 2 default values for an existing file descriptor type and if not, could we add them to, say, fd_tty? Do you have concerns about the fuzzer messing around with stdin, stdout?

@dvyukov
Copy link
Collaborator

@dvyukov dvyukov commented Jan 16, 2019

syzkaller programs should not use stdin/out/err. Using the real ones will break IPC with fuzzer. In executor we remap stderr to 0 and 1, so these are not what was used in the traced program. E.g. read from stdin will fail since it's write-only pipe.
We either need to explicitly create tty for stdin/stdout/stderr and use it when traced program uses 0/1/2. Or maybe simply ignore any calls with fd=0/1/2? Programs usually don't do anything interesting with them. These are usually used to print some logs, e.g. if we trace an LTP test then whatever it logs to stdout is not related to the test itself.

@shankarapailoor
Copy link
Contributor Author

@shankarapailoor shankarapailoor commented Jan 16, 2019

Or maybe simply ignore any calls with fd=0/1/2? Programs usually don't do anything interesting with them. These are usually used to print some logs, e.g. if we trace an LTP test then whatever it logs to stdout is not related to the test itself.

I think ignoring them is better. Most of the ones I see are pretty uninteresting. Probably best to log that we are skipping though.

However, in general, when we select a system call variant with a resource type should we check if value in the trace is one of the default ones for the particular resource?

@dvyukov