Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syscalls: use gperf for lookups #204

Closed
wants to merge 1 commit into from
Closed

Conversation

giuseppe
Copy link
Contributor

use gperf to generate a perfect hash to lookup syscall names. It
improves significantly the complexity for seccomp_syscall_resolve_name.*
since it replaces the expensive strcmp for each syscall in the
database, with a lookup table.

The complexity for x86_64_syscall_resolve_num is not changed and it
uses the linear search, that is anyway less expensive than
seccomp_syscall_resolve_name.* as it uses an index for comparison
instead of doing a string comparison.

On my machine, calling 1000 seccomp_syscall_resolve_name_arch and
seccomp_syscall_resolve_num_arch over the entire syscalls DB passed
from ~0.45 sec to ~0.06s.

I've implemented it only for x86_64 as I've no access to other archs
for benchmarking the results.

Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com

@giuseppe giuseppe changed the title [RFC] syscals, x86_64: use gperf [RFC] syscalls, x86_64: use gperf Jan 24, 2020
@giuseppe giuseppe requested a review from pcmoore January 24, 2020 22:55
@coveralls
Copy link

coveralls commented Jan 24, 2020

Coverage Status

Coverage increased (+2.1%) to 92.392% when pulling 9e7214a on giuseppe:gperf into dc2831e on seccomp:master.

@giuseppe giuseppe changed the title [RFC] syscalls, x86_64: use gperf syscalls, x86_64: use gperf Jan 28, 2020
src/Makefile.am Outdated Show resolved Hide resolved
@giuseppe
Copy link
Contributor Author

@pcmoore could you please take a look?

Copy link
Member

@drakenclimber drakenclimber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this could be a good improvement for applications that are quickly creating/deleting containers with seccomp filters on them. Nice work.

src/arch-x86_64-syscalls.c.perf Outdated Show resolved Hide resolved
src/arch-x86_64-syscalls.c.perf Outdated Show resolved Hide resolved
src/Makefile.am Outdated Show resolved Hide resolved
src/arch-x86_64-syscalls.c.perf Outdated Show resolved Hide resolved
@pcmoore
Copy link
Member

pcmoore commented Jan 30, 2020

@pcmoore could you please take a look?

Hi @giuseppe, thanks for you work on this. I think this goes well with some other work I wanted to get done with libseccomp, but as I'm looking for the associated issue I realize that I never created one; the only reference I had outside my head was a discussion with @drakenclimber one day at lunch :/

Let me create that issue today and then we can talk further.

@pcmoore
Copy link
Member

pcmoore commented Jan 30, 2020

Hi @giuseppe, thanks for you work on this. I think this goes well with some other work I wanted to get done with libseccomp, but as I'm looking for the associated issue I realize that I never created one; the only reference I had outside my head was a discussion with @drakenclimber one day at lunch :/

Let me create that issue today and then we can talk further.

Okay, take a look at #207 and let me know what you think.

@pcmoore
Copy link
Member

pcmoore commented Jan 30, 2020

Also, if someone wants to work on this and #207, I think we could consider this for the v2.5 milestone/release.

(This would make @drakenclimber and my lives much easier from a maintenance perspective, so anyone willing to work on this get's a large number of gold stars!)

@giuseppe
Copy link
Contributor Author

Okay, take a look at #207 and let me know what you think.

thanks for writing this up. I think it makes a lot of sense to not spread the same information among different files.

Would you be ok with having the .perf file as the single source instead of a .csv file?

It will be really easy to generalize the x86_64 code to handle multiple archs. We will need to add a column for each arch, in the same way as you've proposed for the .csv file, so it will look more like:

+struct arch_syscall_def_internal { int name; int x86_64; int x86; ...; int index; };

We still need the index column as gperf doesn't keep the same ordering we provide.

If you prefer the csv file then we would need to generate the .perf file based on that.

The main disadvantage with using directly gperf is that the lookup by syscall number is still O(n) as we need to scan the entire table, although we are comparing a single int so it is significantly faster than strcmp.
Instead, if we'll use a .csv file we could easily build an additional lookup array for each arch so that the syscall number -> name can also be O(1)

@pcmoore
Copy link
Member

pcmoore commented Jan 30, 2020

Would you be ok with having the .perf file as the single source instead of a .csv file?

Well, a CSV file easy to manipulate with any number of tools, the *.perf file format looks a bit more convoluted. Besides ...

The main disadvantage with using directly gperf is that the lookup by syscall number is still O(n) as we need to scan the entire table, although we are comparing a single int so it is significantly faster than strcmp.
Instead, if we'll use a .csv file we could easily build an additional lookup array for each arch so that the syscall number -> name can also be O(1).

Is using the *.perf file format as the source even desirable given what you've written above?

The CSV file format is tool agnostic and gives us plenty of options for other "stuff", including this gperf optimization. As of right now I think the CSV source format is our best option of the two.

@giuseppe
Copy link
Contributor Author

Is using the *.perf file format as the source even desirable given what you've written above?

The CSV file format is tool agnostic and gives us plenty of options for other "stuff", including this gperf optimization. As of right now I think the CSV source format is our best option of the two.

I was not sure as I didn't want to add yet another level of source generation.

I went forward and added a .csv file. The perf file is generated from there. I've also adapted the code for arm, as an example of how it is possible to move existing archs to gperf.

@pcmoore
Copy link
Member

pcmoore commented Jan 30, 2020

I'm going to preface this by saying I know almost nothing about gperf at the moment, but it would be really nice if we could limit the amount of C code in the "syscalls.perf.template" file to close-to-nothing. It looks like we would need XXX_syscall_resolve_name, XXX_syscall_resolve_num, and XXX_syscall_iterate and that is it, right?

I would also like to see the "syscalls.csv to syscalls.perf" generation commands encapsulated into a separate script instead of open-coded into the makefile.

It would also be nice to drop the quotes around the syscall names in the CSV file. Let's also convert the "__PNR_xxx" references into something generic like just "PNR"; the conversion script can change that into the appropriate "__PNR_xxx" value. Further, can we get rid of the explicit offsets in the CSV, e.g. "(__SCMP_NR_BASE + 140)"? It seems like the conversion script could always assume an offset and generate the "(__SCMP_NR_BASE + xxx)" C code and it would be up to the ABI definition to define the offset; for ABIs like x86 that offset could be zero.

@pcmoore
Copy link
Member

pcmoore commented Jan 30, 2020

Side note: you also need to find a way to get your code coverage numbers back up. It might just be a symptom of the PR being a work-in-progress, but we've worked way too hard to have it the coverage numbers drop down into the 60% range.

@giuseppe
Copy link
Contributor Author

I'm going to preface this by saying I know almost nothing about gperf at the moment, but it would be really nice if we could limit the amount of C code in the "syscalls.perf.template" file to close-to-nothing. It looks like we would need XXX_syscall_resolve_name, XXX_syscall_resolve_num, and XXX_syscall_iterate and that is it, right?

I would also like to see the "syscalls.csv to syscalls.perf" generation commands encapsulated into a separate script instead of open-coded into the makefile.

It would also be nice to drop the quotes around the syscall names in the CSV file. Let's also convert the "__PNR_xxx" references into something generic like just "PNR"; the conversion script can change that into the appropriate "__PNR_xxx" value. Further, can we get rid of the explicit offsets in the CSV, e.g. "(__SCMP_NR_BASE + 140)"? It seems like the conversion script could always assume an offset and generate the "(__SCMP_NR_BASE + xxx)" C code and it would be up to the ABI definition to define the offset; for ABIs like x86 that offset could be zero.

can we do these cleanups for the .csv file incrementally? I've taken care of the other comments, and also the coverage seems to be again ok now.

@pcmoore
Copy link
Member

pcmoore commented Jan 31, 2020

can we do these cleanups for the .csv file incrementally?

I would like to see them all in the same PR. The same goes for supporting all the arch/ABIs.

I'm not a big fan of incomplete work where there is just a promise to do it later ;)

@giuseppe
Copy link
Contributor Author

I'm not a big fan of incomplete work where there is just a promise to do it later ;)

I wasn't even promising that :-) I thought it was fine to just add it for x86_64 and other arches could be adapted later by whoever cares about them.

Can we just use the existing values to populate the csv for now? I am not exactly sure how you'd like to rewrite them an it seems like an extra step that could be done separately.

Some arches, like s390, have some extra logic in $ARCH_syscall_resolve_num, and there are a series of if (num == VALUE) before iterating the table. Is it just an optimization for the most used syscalls or is it something else?

@giuseppe giuseppe changed the title syscalls, x86_64: use gperf syscalls: use gperf Jan 31, 2020
@giuseppe
Copy link
Contributor Author

I've pushed a new version where I've adapted all the architectures

@pcmoore
Copy link
Member

pcmoore commented Mar 9, 2020

Thanks for renaming test 56 :)

Copy link
Member

@drakenclimber drakenclimber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a couple small comments and questions, but I think the code looks really good. I'm excited for the performance improvements.

With this large of a change, @pcmoore and I will have to be extra diligent when we validate the syscalls for the next kernel release, but that doesn't seem tremendously onerous.

src/generate_syscalls_perf.sh Outdated Show resolved Hide resolved
src/Makefile.am Show resolved Hide resolved
src/syscalls.csv Outdated Show resolved Hide resolved
tests/56-basic-iterate_syscalls.c Outdated Show resolved Hide resolved
tests/56-basic-iterate_syscalls.c Show resolved Hide resolved
@giuseppe giuseppe changed the title syscalls: use gperf for lookups [WIP] syscalls: use gperf for lookups Mar 10, 2020
@giuseppe giuseppe force-pushed the gperf branch 3 times, most recently from 93c3a77 to 5e6f2eb Compare March 10, 2020 18:41
@pcmoore pcmoore added this to the v2.5 milestone Mar 11, 2020
use gperf to generate a perfect hash to lookup syscall names.   It
improves significantly the complexity for seccomp_syscall_resolve_name.*
since it replaces the expensive strcmp for each syscall in the
database, with a lookup table.

The complexity for syscall_resolve_num is not changed and it
uses the linear search, that is anyway less expensive than
seccomp_syscall_resolve_name.* as it uses an index for comparison
instead of doing a string comparison.

On my machine, calling 1000 seccomp_syscall_resolve_name_arch and
seccomp_syscall_resolve_num_arch over the entire syscalls DB passed
from ~0.45 sec to ~0.06s.

Closes: seccomp#207

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe giuseppe changed the title [WIP] syscalls: use gperf for lookups syscalls: use gperf for lookups Mar 11, 2020
@rhatdan
Copy link

rhatdan commented Mar 11, 2020

@giuseppe @pcmoore @drakenclimber Is this ready to merge? Would love to get this merged and a new release out, so we could take advantage.

@drakenclimber
Copy link
Member

Awesome! Thanks for all the hard work @giuseppe

Reviewed-by: Tom Hromatka <tom.hromatka@oracle.com>

@pcmoore pcmoore linked an issue Mar 11, 2020 that may be closed by this pull request
@giuseppe
Copy link
Contributor Author

@pcmoore does the last version look good to you?

@pcmoore
Copy link
Member

pcmoore commented Mar 20, 2020

Hi @giuseppe, can you explain the changes to src/arch-syscall-validate?

@pcmoore
Copy link
Member

pcmoore commented Mar 20, 2020

Hi @giuseppe, can you explain the changes to src/arch-syscall-validate?

Actually nevermind; I think I know what you were doing and that is just hiding some underlying problems with these changes and the syscalls table. Let me fixup arch-syscall-validate properly so we can use it to generate the CSV from the kernel sources. We are going to need to do that anyway, and it should avoid having to do what you did to hide problems in the CSV.

@pcmoore
Copy link
Member

pcmoore commented Mar 20, 2020

@giuseppe @drakenclimber I think this is what we need for arch-syscall-validate:

It's really crude but it works (hint: run it with the '-c' option and point it at a recent kernel source tree):

PS1> ./arch-syscall-validate -c ~/sources/linux/linux-linus
# libseccomp syscall table
#
# kernel: 5.6.0-rc6 (Fri, 20 Mar 2020 03:49:05 +0000)
#
#syscall,x86,x86_64,x32,arm,aarch64,mips,mips64,mips64n32,ppc,ppc64,riscv64,s390,s390x
accept,PNR,43,43,285,202,168,42,42,330,330,202,PNR,PNR
accept4,364,288,288,366,242,334,293,297,344,344,242,364,364
access,33,21,21,33,PNR,33,20,20,33,33,PNR,33,33
acct,51,163,163,51,89,51,158,158,51,51,89,51,51
add_key,286,248,248,309,217,280,239,243,269,269,217,278,278
adjtimex,124,159,159,124,171,124,154,154,124,124,171,124,124
...

@giuseppe if that looks okay to you, you can include that in your patchset as a second patch or I can add it when I merge your patch; either approach is fine with me.

@drakenclimber as you mentioned previously, we are going to really need to verify that we haven't screwed up the syscall tables with this change. Thankfully this should be a one-time event, moving forward we can just generate a new CSV file for each release (and give the diff a quick sanity check of course).

@giuseppe
Copy link
Contributor Author

Hi @giuseppe, can you explain the changes to src/arch-syscall-validate?

Actually nevermind; I think I know what you were doing and that is just hiding some underlying problems with these changes and the syscalls table.

I think the issue was in the previous implementation for some archs.

For example s390 had:

int s390_syscall_resolve_name(const char *name)
{
       unsigned int iter;
       const struct arch_syscall_def *table = s390_syscall_table;

       /* XXX - plenty of room for future improvement here */

       if (strcmp(name, "accept") == 0)
               return __PNR_accept;
       if (strcmp(name, "accept4") == 0)
               return __PNR_accept4;
       else if (strcmp(name, "bind") == 0)
               return __PNR_bind;
       else if (strcmp(name, "connect") == 0)
               return __PNR_connect;
       ....

so the value returned by *_resolve_name was different than the value stored into the syscalls table. With the new implementation there is not such per-arch customization and the value returned by *_resolve_name is the same as the value stored in the table.

src/arch-syscall-validate reads directly into the syscalls table and doesn't check for the real value returned by _resolve_name.

@giuseppe
Copy link
Contributor Author


@giuseppe if that looks okay to you, you can include that in your patchset as a second patch or I can add it when I merge your patch; either approach is fine with me.

@pcmoore I am fine with including your patch in the PR, but I think we still need the filters I've added initially to arch-syscall-validate for the comment I've left previously.

Are you fine with something like:

diff --git a/src/arch-syscall-validate b/src/arch-syscall-validate
index 1a508bf..4c21653 100755
--- a/src/arch-syscall-validate
+++ b/src/arch-syscall-validate
@@ -664,15 +664,21 @@ function gen_csv() {
        done
        printf "\n"
        # output the syscall csv details
+       for abi in $abi_list; do
+               dump_sys_$abi $1 > sys_$abi.list
+       done
        for sc in $sc_list; do
                printf "%s" $sc
                for abi in $abi_list; do
-                       num=$(dump_sys_$abi "$1" | grep "^$sc," | awk -F "," '{ print $2 }' )
+                       num=$(cat sys_$abi.list | grep "^$sc," | awk -F "," '{ print $2 }' )
                        [[ -z $num ]] && num="PNR"
                        printf ",%s" $num
                done
                printf "\n"
        done
+       for abi in $abi_list; do
+               rm sys_$abi.list
+       done
 }

on top of your patch?

With the patch above arch-syscall-validate -c takes ~14 seconds on my system, which I think is acceptable.

@pcmoore
Copy link
Member

pcmoore commented Mar 20, 2020

I think the issue was in the previous implementation for some archs.

Sorta, there is an issue in s390 and s390x where we aren't handling the syscall lookup correctly, see the discussion in #215 for more information.

This of course is a reminder that we really need to be able to specify ABI specific lookup functions, e.g. x86_syscall_resolve_name() and x86_syscall_resolve_num(). The default raw hash table lookup function should work for most of the ABIs, but we will need special handling for some like x86, s390, and s390x.

@giuseppe do you mind if I spend some time hacking on your patch/PR? I'm not sure what my weekend is looking like (COVID-19 has introduced a lot of uncertainty in the world), but I may have some time to work on this if that is okay with you. Of course if you would prefer to fix it yourself, please do!

@pcmoore
Copy link
Member

pcmoore commented Mar 20, 2020

@pcmoore I am fine with including your patch in the PR, but I think we still need the filters I've added ...

See my previous comment, those filters are hiding a problem; we do not want them.

As I was reworking that patch (you had the right idea with only dumping the ABI lists once, but we really should use mktemp files like we do in other places) I realized that there is no reason we couldn't merge this now, independent of this PR. I'm going to do a test build and if all looks good I'll merge it "soon".

@drakenclimber I'm going to be forgiveness on this one and merge the arch-syscall-validate without the proper two ACKs. My reasoning is that we are the only ones that use this tool (and it is potentially blocking this PR) so if I mess it up too badly the pain will be limited :)

@giuseppe
Copy link
Contributor Author

@giuseppe do you mind if I spend some time hacking on your patch/PR? I'm not sure what my weekend is looking like (COVID-19 has introduced a lot of uncertainty in the world), but I may have some time to work on this if that is okay with you. Of course if you would prefer to fix it yourself, please do!

of course I won't mind that :)

pcmoore added a commit to pcmoore/misc-libseccomp that referenced this pull request Mar 22, 2020
pcmoore added a commit to pcmoore/misc-libseccomp that referenced this pull request Mar 22, 2020
@pcmoore
Copy link
Member

pcmoore commented Mar 22, 2020

Okay, please check out PR #223, that has my changes incorporated with @giuseppe's changes.

@giuseppe
Copy link
Contributor Author

closing in favor of #223

@giuseppe giuseppe closed this Mar 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFE: move arch/ABI syscall mapping tables into a common location
6 participants