Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sys/linux: enhanced descs for io_uring #1926

Merged
merged 33 commits into from
Jul 24, 2020
Merged

Conversation

necipfazil
Copy link
Collaborator

@necipfazil necipfazil commented Jul 9, 2020

@necipfazil necipfazil marked this pull request as draft July 9, 2020 13:20
executor/common_linux.h Outdated Show resolved Hide resolved
executor/common_linux.h Outdated Show resolved Hide resolved
executor/common_linux.h Outdated Show resolved Hide resolved
executor/common_linux.h Outdated Show resolved Hide resolved
executor/common_linux.h Outdated Show resolved Hide resolved
executor/common_linux.h Outdated Show resolved Hide resolved
executor/common_linux.h Outdated Show resolved Hide resolved
pkg/host/syscalls_linux.go Outdated Show resolved Hide resolved
executor/common_linux.h Outdated Show resolved Hide resolved
executor/common_linux.h Outdated Show resolved Hide resolved
executor/common_linux.h Outdated Show resolved Hide resolved
executor/common_linux.h Outdated Show resolved Hide resolved
executor/common_linux.h Outdated Show resolved Hide resolved
executor/common_linux.h Outdated Show resolved Hide resolved
executor/common_linux.h Outdated Show resolved Hide resolved
@necipfazil
Copy link
Collaborator Author

The code does not compile due to this, which will be addressed shortly.

@dvyukov
Copy link
Collaborator

dvyukov commented Jul 11, 2020

I missed that some offsets are dynamic and depend on size.

For check_add_overflow/struct_size/etc, I think we should ignore the possibility of overflows and just open-code the operations they do. First, we are a fuzzer and if we do more nonsense, nothing bad happens. Maybe the opposite: the more nonsense, the better! :)
Second, what matters for us is if we overflow the already allocated sizes for these rings, not the UINT64_MAX or whatever they check in the kernel before the allocation. And overflowing the smaller allocated sizes if much easier. So this check mostly does not do anything useful for us.

For the static offsets I was thinking about calling io_uring_create inside of syz_io_uring_submit and using the values it returns rather then hardcoding them in executor. These offsets may change across kernel versions. And just less logic in executor this way.
But let's think what's the best way to handle the dynamic offset first, because it may affect the best way to handle static offsets as well.

@dvyukov
Copy link
Collaborator

dvyukov commented Jul 11, 2020

FTR, the array offset calculation in the kernel seems wrong to me:
https://lore.kernel.org/io-uring/20200711093111.2490946-1-dvyukov@google.com/T/#u

@dvyukov
Copy link
Collaborator

dvyukov commented Jul 11, 2020

Yes, and for SMP and cache-line alignment, I think we should assume the kernel is SMP maybe with a comment about this assumption. First, non-SMP kernels are quite rare today (maybe only the smallest ARM boards), definitely not anything we care about right now. Second, figuring out if kernel was build as SMP or not may be tricky and can only be done dynamically.

@dvyukov
Copy link
Collaborator

dvyukov commented Jul 11, 2020

The offsets calculation does not look too complex. I used test program for experimentation:
https://gist.githubusercontent.com/dvyukov/2fe31bfd154592f3840dc4676294b77e/raw/c42cb4a3da0a032b0f4d5e28b7f6f0526f43a36f/gistfile1.txt
It prints:

res=3 time=3745
sq: head=0 tail=64 ring_mask=256 ring_entries=264 array=4928
cq: head=128 tail=192 ring_mask=260 ring_entries=268 array=320
sq_entries=128 cq_entries=256
sq_array_off=4928

I see 2 options:

  1. Either duplicate whole calculation in executor (what you do now). But then I think it's simpler to just hardcode all offsets as defines (we duplicate all of that logic, so there is little in doing something more complex).
    or 2. call io_uring_setup with size 1 to get static offsets and then calculate sq array offset manually (as my test program does).

Which one is better?... Hard to say 2 is less hard-coded things that duplicate kernel logic and can break in future (but still some), but 1 is simpler and faster...
Let's do 1 for now as simpler and faster, with some comment and just difines for offsets.

@codecov
Copy link

codecov bot commented Jul 17, 2020

Codecov Report

Merging #1926 into master will increase coverage by 0.0%.
The diff coverage is 80.0%.

Impacted Files Coverage Δ
pkg/host/syscalls_linux.go 72.2% <80.0%> (+0.1%) ⬆️
prog/mutation.go 90.4% <0.0%> (-1.3%) ⬇️
prog/any.go 85.6% <0.0%> (+0.9%) ⬆️
prog/target.go 68.2% <0.0%> (+5.8%) ⬆️

executor/common_linux.h Outdated Show resolved Hide resolved
executor/common_linux.h Outdated Show resolved Hide resolved
executor/common_linux.h Outdated Show resolved Hide resolved
executor/common_linux.h Outdated Show resolved Hide resolved
sys/linux/io_uring.txt Outdated Show resolved Hide resolved
sys/linux/io_uring.txt Outdated Show resolved Hide resolved
sys/linux/io_uring.txt Outdated Show resolved Hide resolved
sys/linux/io_uring.txt Outdated Show resolved Hide resolved
sys/linux/io_uring.txt Outdated Show resolved Hide resolved
sys/linux/io_uring.txt Outdated Show resolved Hide resolved
sys/linux/io_uring.txt Outdated Show resolved Hide resolved
executor/common_linux.h Outdated Show resolved Hide resolved
@dvyukov
Copy link
Collaborator

dvyukov commented Jul 21, 2020

There are few remaining TODO's, please finalize it, and we will do final review passes.

@dvyukov
Copy link
Collaborator

dvyukov commented Jul 21, 2020

One additional thing we need since this is a tricky subsystem and we use custom pseudo-syscalls is a test program.
These are stored in sys/linux/test/*
At least we need 1 successful scenario: create io_uring, submit 1 operation using syz_io_uring_submit, wait for completion, call syz_io_uring_complete, check that the operation has succeeded.
Checking that the operation was successful can be done either with syz_io_uring_complete result (if openat2 operation returns an fd), or by observing some side-effect of the operations (e.g. if we do a write to a pipe, we can then read from the pipe).
Check out other test programs for the use of AUTO, it's handy for specifying address and const values.
The test can be run with syz-execprog utility inside of a VM. Or using go install ./tools/syz-runtest && syz-runtest -config=manager.config -tests yourtestfilename.

sys/linux/io_uring.txt Outdated Show resolved Hide resolved
sys/linux/io_uring.txt Outdated Show resolved Hide resolved
sys/linux/io_uring.txt Show resolved Hide resolved
@dvyukov
Copy link
Collaborator

dvyukov commented Jul 22, 2020

One additional thing we need since this is a tricky subsystem and we use custom pseudo-syscalls is a test program.
These are stored in sys/linux/test/*
At least we need 1 successful scenario: create io_uring, submit 1 operation using syz_io_uring_submit, wait for completion, call syz_io_uring_complete, check that the operation has succeeded.
Checking that the operation was successful can be done either with syz_io_uring_complete result (if openat2 operation returns an fd), or by observing some side-effect of the operations (e.g. if we do a write to a pipe, we can then read from the pipe).
Check out other test programs for the use of AUTO, it's handy for specifying address and const values.
The test can be run with syz-execprog utility inside of a VM. Or using go install ./tools/syz-runtest && syz-runtest -config=manager.config -tests yourtestfilename.

OK, so we have syz_io_uring_complete return an fd in some cases.
A test that obtains that fd and then closes it would be an awesome demonstration of the power of the new descriptions.
And I usually also find lots of bugs in my descriptions when I run such tests :)

@necipfazil
Copy link
Collaborator Author

The test to test the new descriptions and the pseudo-calls now added, which indeed helped as I found a bug in one of the pseudo-calls' implementation.

@necipfazil
Copy link
Collaborator Author

While writing the mmap calls' arguments in the tests, something drew my attention.

In the current state of the descriptions, the length arg for the mmap$IORING_* is written in a very generic way [1].

On the other hand, if we want the fuzzer to be able to utilize the ring in full, we may want to compute the size for the memory region we want to map (as in here) and call the mmap with that size (as in here).

I cannot think of a way to achieve the exact computation without another pseudo-call. One approach might be to have syz_io_uring_setup(), which would wrap io_uring_setup() and do the mapping right after the setup inside the pseudo-call, and return us the ring_ptr. Another benefit of such approach might be that we can make io_cqring_offsets and io_sqring_offsets as out args of the new pseudo-call, and fill them once io_uring_setup() returns inside. This way, we might be able avoid hard-coding the offsets. This might complicate the things for syz_memcpy_off(), though..

Another approach might be to set lower&upper bounds for mmap$IORING_*'s length. The num of sq and cq ring entries are limited, thus, it is doable. This would be the fastest solution.

Copy link
Collaborator

@dvyukov dvyukov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the TODO re detecting supported syscalls this looks good to me.

sys/linux/io_uring.txt Outdated Show resolved Hide resolved
@dvyukov dvyukov requested a review from xairy July 23, 2020 15:35
pkg/host/syscalls_linux.go Outdated Show resolved Hide resolved
pkg/host/syscalls_linux.go Outdated Show resolved Hide resolved
Introduced pseudo-call "syz_io_uring_put_sqes_on_ring()" for writing
submission queue entries (sqes) on sq_ring, which was obtained by
mmap'ping the offsets obtained from io_uring_setup().

Added descriptions for io_ring_register operations that were missing
earlier.

Did misc changes to adapt the descriptions for the updates on the
io_uring subsystem.
addr and len are for the buffer located at buf_index
As a result, IOSQE_BUFFER_SELECT_BIT is included in the iosqe_flags.
The usage of cq_ring->flags is only for manipulating
IORING_CQ_EVENTFD_DISABLED bit. This is achieved by a pseudo-syscall,
which toggles the bit.
Removed syz_io_uring_cq_eventfd_toggle() and introduced
syz_io_uring_put_ring_metadata() instead. We have many pieces of
metadata for both sq_ring and cq_ring, for which we are given the
offsets, and some of are not supposed to be manipulated by the
application. Among them, both sq and cq flags can be changed. Both valid
and invalid cases might cause interesting outcomes. Use the newly
introduced pseudo syscall to manipulate them randomly while also
manipulating the flags to their special values.
Removed syz_io_uring_put_ring_metadata() and instead added a much more
generic pseudo systemcall to achieve the task. This should benefit other
subsystems as well.
syz_io_uring_submit() is called with a union of sqes to reduce
duplication of other parameters of the function.

io_uring_sqe is templated with io_uring_sqe_t, and this template type is
used to describe sqes for different ops.

The organization of io_uring.txt is changed.
The files are registered using
io_uring_register$IORING_REGISTER_FILES(). When IOSQE_FIXED_FILE_BIT is
enabled in iosqe_flags in sqe, a variety of operations can use those
registered files using the index of the file instead of fd.

Changed the sqe descriptions for the eligible operations to utilize
this.
…sqes

A personality_id can be registered for a io_uring fd using
io_uring_register$IORING_REGISTER_PERSONALITY(). This id can be utilized
within sqes. This commit improves the descs for io_uring to utilize it.

In addition, the descriptions for the misc field in io_uring_sqe_t is
refactored as most are shared among sqes.
io_uring_cqe.res is used to carry the return value of operations
achieved through io_uring. The only operations with meaningful return
values (in terms of their possible usage) are openat and openat2. The
pseudo-syscall syz_io_uring_complete() is modified to account for this
and return those fds. The description for sqe_user_data is splitted into
two to identify openat and non-openat io_uring ops.

IORING_OP_IOCTL was suggested but never supported in io_uring. Thus, the
note on this is removed in the descriptions.

tee() expects pipefds, thus, IORING_OP_TEE. The descriptions for the
pipe r/w fds are written as ordinary fd. Thus, in the description for
IORING_OP_TEE, which is io_uring_sqe_tee, fd is used in the place where
pipefds are expected. The note on this is removed in the descriptions.
The changes successfully pass the sys/linux/test/io_uring test.

sys/linux/io_uring.txt: sq_ring_ptr and cq_ring_ptr are really the same.
Thus, they are replaced with ring_ptr.

executor/common_linux.h: thanks to io_uring test, a bug is found in
where the sq_array's address is computed in syz_io_uring_submit().
Fixed. In addition, similar to the descriptions, the naming for the
ring_ptr is changed from {sq,cq}_ring_ptr to ring_ptr.
Used a smaller range to ease the collisions. Used comperatively unique
and magic numbers for openat user_data to avoid thinking as if the cqe
belongs to openat while the user_data is coming from some random
location.
Copy link
Collaborator

@dvyukov dvyukov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minus the linter warnings.

@necipfazil necipfazil marked this pull request as ready for review July 24, 2020 12:50
@necipfazil
Copy link
Collaborator Author

Thanks for your valuable reviews!

The last commit fixes the io_uring test and the linter warnings.

This should be ready for the final review passes @xairy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants