[ML] Update seccomp filter for Fedora 29 #354

Merged
droberts195 merged 3 commits into elastic:master from seccomp_fedora29_changes on Jan 4, 2019

Conversation

droberts195
Contributor

Fedora 29 uses different system calls to platforms
we've previously tested on, and hence suffers from
certain functionality failing due to the seccomp
filter.

This commit permits 3 additional system calls:

  1. __NR_gettimeofday
  2. __NR_unlinkat
  3. __NR_getdents64

(It is likely that other Linux distributions using
modern glibc would also hit one or more of these
system calls. Non-fatal problems probably got
progressively worse in the lead up to the fatal
problem that surfaced in Fedora 29.)

Fixes #350
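
As an illustration of what permitting a system call means here, the following is a minimal sketch of an allow-list filter written with the BPF_STMT/BPF_JUMP macros from the kernel headers. The names `sketchFilter` and `evaluate` are hypothetical and this is not the contents of CSystemCallFilter_Linux.cc, which permits many more calls. `evaluate` interprets only the three instruction kinds used here, in user space, so the jump offsets can be checked without installing anything in the kernel.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

// Hypothetical three-call allow-list; the real filter permits many more.
const sock_filter sketchFilter[] = {
    // Load the system call number from seccomp_data.
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(seccomp_data, nr)),
    // On a match, skip the remaining checks and the disallow
    // instruction so that execution lands on the allow instruction.
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_gettimeofday, 3, 0),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_unlinkat, 2, 0),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getdents64, 1, 0),
    // Disallow: fail the call with EACCES.
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EACCES & SECCOMP_RET_DATA)),
    // Allow.
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

// User-space interpreter for only the instruction kinds used above.
std::uint32_t evaluate(const sock_filter* prog, std::size_t len, std::uint32_t nr) {
    std::uint32_t acc = 0;
    for (std::size_t pc = 0; pc < len; ++pc) {
        const sock_filter& ins = prog[pc];
        switch (ins.code) {
        case BPF_LD | BPF_W | BPF_ABS:
            acc = nr; // only the load of the call number is modelled
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:
            // Offsets are relative to the next instruction.
            pc += (acc == ins.k) ? ins.jt : ins.jf;
            break;
        case BPF_RET | BPF_K:
            return ins.k;
        default:
            return SECCOMP_RET_KILL;
        }
    }
    return SECCOMP_RET_KILL;
}
```

Calling `evaluate(sketchFilter, 6, __NR_getdents64)` walks the same path the kernel would take for that call number. A call that matches none of the comparisons falls through to the disallow instruction.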

@davidkyle
Member

davidkyle commented Jan 4, 2019

There is an error in the existing code here:

BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 0, 5),

That line checks that the architecture is x64. The false branch of the jump-if-equal statement should go to line 100, where the call is disallowed.

I think the 5 should be changed to 42 (edited from 41).

This is impossible to check as we don't build for any other architectures. TBH I'm struggling to remember why it is there in the first place, as the code will not run on any other architecture.

@droberts195
Contributor Author

I think the 5 should be changed to 42 (edited from 41).

I think it should be 41 rather than 42, because we want to jump to disallow rather than allow.

But maybe a better solution is to remove that check. Like you say, since the filter is being configured in native code that will only run on x86_64, if the hardware architecture was something different then the program would have bombed out immediately when it was run.

I can see that this check is useful in the Java seccomp code where the calls are being made via dynamic symbol lookups in the OS libraries using JNA, and the program could be running in a JVM on some other hardware architecture. But for native code that's already tied to a specific architecture it seems like a step too far.
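
The hand-counted offset discussed above (5 versus 41 or 42) is the kind of value that can drift as calls are added. One way to avoid counting by hand, sketched here with the hypothetical helper `buildAllowList`, is to derive every jump distance from the layout. This is not how the ml-cpp filter is written and it omits any other checks the real filter makes; it only illustrates the arithmetic. A jump offset is relative to the next instruction, so with n permitted calls the architecture check's false branch needs an offset of n + 1 to reach the disallow instruction.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stdexcept>
#include <sys/syscall.h>
#include <vector>

// Hypothetical helper. Layout of the generated program:
//   [0] load arch   [1] arch check   [2] load call number
//   [3 .. n+2] one check per call    [n+3] disallow   [n+4] allow
std::vector<sock_filter> buildAllowList(std::uint32_t auditArch,
                                        const std::vector<std::uint32_t>& calls) {
    const std::size_t n = calls.size();
    if (n + 1 > 255) {
        throw std::length_error("BPF jump offsets are only 8 bits wide");
    }
    std::vector<sock_filter> prog;
    prog.push_back(BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(seccomp_data, arch)));
    // False branch: skip the call-number load and all n checks to reach disallow.
    prog.push_back(BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, auditArch, 0,
                            static_cast<std::uint8_t>(n + 1)));
    prog.push_back(BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(seccomp_data, nr)));
    for (std::size_t j = 0; j < n; ++j) {
        // True branch: skip the remaining checks and disallow to reach allow.
        prog.push_back(BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, calls[j],
                                static_cast<std::uint8_t>(n - j), 0));
    }
    prog.push_back(BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EACCES & SECCOMP_RET_DATA)));
    prog.push_back(BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW));
    return prog;
}
```

With three permitted calls, `buildAllowList(AUDIT_ARCH_X86_64, ...)` yields eight instructions and an architecture-check false offset of 4. Dropping the architecture check, as proposed above, removes the first two instructions and leaves the per-call offsets unchanged.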

@davidkyle
Member

Ha, I was right the first time. There's a lesson about double checking there.

The Java seccomp code does make that check, but I followed the example on the seccomp man page: http://man7.org/linux/man-pages/man2/seccomp.2.html. Scroll down to the bottom and you'll see that I was guilty of some copy and paste. I now know where the 5 came from.

               /* [1] Jump forward 5 instructions if architecture does not
                      match 't_arch' */
               BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, t_arch, 0, 5),

But maybe a better solution is to remove that check.

I agree, the check is redundant

Since the filter is being installed by native code
that will only work on one architecture, there is
no need to check the architecture as part of the
filter.
Member

@davidkyle davidkyle left a comment


LGTM

@droberts195 droberts195 merged commit d4e74a1 into elastic:master Jan 4, 2019
@droberts195 droberts195 deleted the seccomp_fedora29_changes branch January 4, 2019 15:04
droberts195 added a commit to droberts195/ml-cpp that referenced this pull request Jan 4, 2019
Fixes elastic#350
Backport of elastic#354
droberts195 added a commit that referenced this pull request Jan 4, 2019
Fixes #350
Backport of #354
@droberts195
Contributor Author

To be clear, the changes in this PR that allow the ML code to work on Fedora 29 are also required for any Linux distribution that uses glibc 2.28.

Fedora 29 uses glibc 2.28: https://fedoraproject.org/wiki/Changes/GLIBC228#Summary

The change in glibc 2.28 that results in the need for getdents64 is https://sourceware.org/git/?p=glibc.git;a=commit;h=298d0e3129c0b5137f4989275b13fe30d0733c4d

@Bobarshad

Hi,
I am trying to compile for AArch64/A53. Here are the errors:

g++ -std=gnu++14 -c -o.objs/CSystemCallFilter_Linux.o -g -O3 -Wdisabled-optimization -fstack-protector -fno-math-errno -fno-permissive -Wall -Wcast-align -Wconversion -Wextra -Winit-self -Wparentheses -Wpointer-arith -march=native -Wswitch-enum  -Wno-ctor-dtor-privacy -Wno-deprecated-declarations -Wold-style-cast -fvisibility-inlines-hidden -fPIC  -isystem /home/ngd/project/ml/ml-cpp/3rd_party/include -isystem /usr/local/gcc75/include -DLinux -D_REENTRANT -DNDEBUG -DEXCLUDE_TRACE_LOGGING -D_FORTIFY_SOURCE=2 -DBOOST_ALL_DYN_LINK -DBOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS -I/home/ngd/project/ml/ml-cpp/include -isystem /usr/local/gcc75/include/boost-1_71 CSystemCallFilter_Linux.cc
In file included from CSystemCallFilter_Linux.cc:11:0:
CSystemCallFilter_Linux.cc:61:41: error: '__NR_lstat' was not declared in this scope
     BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_lstat, 36, 0),
                                         ^
CSystemCallFilter_Linux.cc:61:41: note: suggested alternative: '__NR_fstat'
CSystemCallFilter_Linux.cc:65:41: error: '__NR_readlink' was not declared in this scope
     BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_readlink, 32, 0),
                                         ^
CSystemCallFilter_Linux.cc:65:41: note: suggested alternative: '__NR_readlinkat'
CSystemCallFilter_Linux.cc:66:41: error: '__NR_stat' was not declared in this scope
     BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_stat, 31, 0),
                                         ^
CSystemCallFilter_Linux.cc:66:41: note: suggested alternative: '__NR_fstat'
CSystemCallFilter_Linux.cc:68:41: error: '__NR_open' was not declared in this scope
     BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_open, 29, 0),
                                         ^
CSystemCallFilter_Linux.cc:68:41: note: suggested alternative: '__NR_openat'
CSystemCallFilter_Linux.cc:73:41: error: '__NR_dup2' was not declared in this scope
     BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_dup2, 24, 0),
                                         ^
CSystemCallFilter_Linux.cc:73:41: note: suggested alternative: '__NR_dup'
CSystemCallFilter_Linux.cc:74:41: error: '__NR_mkdir' was not declared in this scope
     BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mkdir, 23, 0), // for forecast temp storage
                                         ^
CSystemCallFilter_Linux.cc:74:41: note: suggested alternative: '__NR_mkdirat'
CSystemCallFilter_Linux.cc:75:41: error: '__NR_rmdir' was not declared in this scope
     BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_rmdir, 22, 0), // for forecast temp storage
                                         ^
CSystemCallFilter_Linux.cc:75:41: note: suggested alternative: '__NR_chdir'
CSystemCallFilter_Linux.cc:77:41: error: '__NR_getdents' was not declared in this scope
     BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getdents, 20, 0), // for forecast temp storage
                                         ^
CSystemCallFilter_Linux.cc:77:41: note: suggested alternative: '__NR_getdents64'
CSystemCallFilter_Linux.cc:85:41: error: '__NR_unlink' was not declared in this scope
     BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_unlink, 12, 0),
                                         ^
CSystemCallFilter_Linux.cc:85:41: note: suggested alternative: '__NR_unlinkat'
CSystemCallFilter_Linux.cc:86:41: error: '__NR_mknod' was not declared in this scope
     BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mknod, 11, 0),
                                         ^
CSystemCallFilter_Linux.cc:86:41: note: suggested alternative: '__NR_mknodat'
CSystemCallFilter_Linux.cc:94:41: error: '__NR_access' was not declared in this scope
     BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_access, 3, 0),
                                         ^
CSystemCallFilter_Linux.cc:94:41: note: suggested alternative: '__NR_accept'
/home/ngd/project/ml/ml-cpp/mk/rules.mk:45: recipe for target '.objs/CSystemCallFilter_Linux.o' failed
make[2]: *** [.objs/CSystemCallFilter_Linux.o] Error 1
make[2]: Leaving directory '/home/ngd/project/ml/ml-cpp/lib/seccomp'
/home/ngd/project/ml/ml-cpp/mk/toplevel.mk:26: recipe for target 'all' failed
make[1]: *** [all] Error 1
make[1]: Leaving directory '/home/ngd/project/ml/ml-cpp/lib'
/home/ngd/project/ml/ml-cpp/mk/toplevel.mk:26: recipe for target 'all' failed
make: *** [all] Error 1

Could you please help me solve these errors?

@droberts195
Contributor Author

@Hossein-b I think you need the changes from #1132. Hopefully that will be merged to master within a day or two.

@Bobarshad

Bobarshad commented Apr 14, 2020

Thanks, it works. Now I need your help with how to bind this code to Elasticsearch. Thank you.

@droberts195
Contributor Author

Now I need your help with how to bind this code to Elasticsearch

@Hossein-b we are still working through this, so the easiest thing would probably be to wait a few weeks until we're finished.

However, if you want to have a go yourself then this is what to do on your aarch64 machine:

  1. Get the latest ml-cpp build system changes from [ML] Add cross compilation support, Docker images and CI for aarch64 #1135
  2. Have a look at Introduce aarch64 packaging elasticsearch#53914 and revert the bits that were related to "ML binaries are not compiled for aarch64, so for now we disable ML on aarch64"
  3. Arrange your ml-cpp and elasticsearch repos with those changes into the following directory layout, with the ml-cpp directory a sub-directory of a directory called elasticsearch-extra that is adjacent to elasticsearch:
elasticsearch
elasticsearch-extra/ml-cpp
  4. Type ./gradlew assemble from within the elasticsearch directory - it should notice that ml-cpp is available locally and build it instead of downloading a snapshot

I cannot promise this will work, as we are still developing it. Maybe there will be a silly one line typo that stops it working or maybe something harder to fix.

@Bobarshad

Thank you @droberts195 for your kind help. I have a real aarch64 platform and I don't want to use Docker. Is there any config I can change so that the build uses the real aarch64 platform? Here is the error that I got:

FAILURE: Build failed with an exception.

* What went wrong:
a problem occurred while using Docker from [/usr/bin/docker] yet it is required to run the following tasks: 
:distribution:docker:buildAarch64DockerImage
:distribution:docker:buildAarch64OssDockerImage
:distribution:docker:buildDockerImage
:distribution:docker:buildOssDockerImage
the problem is that Docker exited with exit code [1] with standard error output:
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/version: dial unix /var/run/docker.sock: connect: permission denied
you can address this by attending to the reported issue, or removing the offending tasks from being executed.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

* Get more help at https://help.gradle.org

BUILD FAILED in 8m 29s

@droberts195
Contributor Author

@Hossein-b try building just the tar distribution. So replace step 4 in my previous instructions with:

cd distribution/archives/linux-aarch64-tar
../../../gradlew assemble

If it works your install bundle will be in distribution/archives/linux-aarch64-tar/build/distributions/elasticsearch-8.0.0-SNAPSHOT-linux-aarch64.tar.gz.

@Bobarshad

Thanks. As you know, Elasticsearch is written in Java, so it doesn't matter which platform you are using. My question is: is there any new release of Elasticsearch in which ml-cpp is activated for the aarch64 platform? If so, I would not need to build Elasticsearch from source.

@droberts195
Contributor Author

is there any new release of Elasticsearch in which ml-cpp is activated for the aarch64 platform?

Not at present. There will be in the future - we are working on it. However, we cannot say exactly which version this will be.

@Bobarshad

Hi David,
Thanks. I have built the project successfully and I have the following file:
elasticsearch-8.0.0-SNAPSHOT-linux-aarch64.tar.gz
I just added the following config to "elasticsearch.yml":
node.ml: true
When I run this query:
curl -XGET localhost:9200/_nodes/settings?pretty=true
part of the result is:

      "roles" : [
        "data",
        "ingest",
        "master",
        "ml",
        "remote_cluster_client",
        "transform"
      ],

Does that mean that after activating a 1 month trial version I can test the machine learning anomaly detection example?

@droberts195
Contributor Author

Does that mean that after activating a 1 month trial version I can test the machine learning anomaly detection example?

Yes, you should be able to use machine learning for one month on a trial license. Of course, you might find bugs specific to aarch64 because you will be the first person ever to try running an ML job on aarch64. Also, you won't be able to extend the trial as it's not a supported platform, so you have to realise that you're basically doing throwaway testing. If you are interested in running on aarch64 in the long term then it will probably be supported in a future version, but we cannot yet say when.

Successfully merging this pull request may close these issues.

[ML] process can hang if large forecast job fails to delete temporary storage
3 participants