runtime/cgo: pthread_create failed: Operation not permitted on glibc 2.34 #6238
Comments
Thanks for the report! I can confirm this reproduces as you described, and it does appear to be related to seccomp. Briefly, you should see something like this logged on startup:
The default policy for linux/amd64 is defined here: https://github.com/elastic/beats/blob/7e4affb6702a39f378170b8866f966d1b29e06fc/libbeat/common/seccomp/policy_linux_amd64.go#L25. This policy can be redefined via apm-server.yml.
You may also disable seccomp altogether with:
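As a rough sketch of the disable option, assuming the seccomp settings documented for Beats apply unchanged to apm-server (the setting name below comes from the Beats seccomp documentation, not from this thread):

```yaml
# apm-server.yml (sketch; assumes the Beats seccomp settings apply to apm-server)
# Turns off syscall filtering entirely, so clone3 is no longer rejected.
seccomp.enabled: false
```

This trades the sandbox for compatibility, so it is best treated as a temporary workaround until the default policy allows clone3.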
Thanks for the answer, your workaround does indeed seem to work fine! I'll open an issue in the beats repository, mentioning this one. Should we keep this issue open until apm-server has updated its beats dependency?
Yes, let's keep this open until the dependency update. Thanks again!
clone3 is a Linux syscall that glibc uses starting with version 2.34. It is invoked when pthread_create() gets called. The current seccomp filters do not allow this syscall, leading to crashes like
runtime/cgo: pthread_create failed: Operation not permitted
See elastic/apm-server#6238 for more details.
Fixed in elastic/beats#28117
* seccomp: allow clone3 syscall for x86 (#28117)
clone3 is a Linux syscall that glibc uses starting with version 2.34. It is invoked when pthread_create() gets called. The current seccomp filters do not allow this syscall, leading to crashes like
runtime/cgo: pthread_create failed: Operation not permitted
See elastic/apm-server#6238 for more details.
(cherry picked from commit 82507fd)
Co-authored-by: Arnaud Lefebvre <a.lefebvre@outlook.fr>
Co-authored-by: Jaime Soriano Pastor <jaime.soriano@elastic.co>
APM Server version (apm-server version): apm-server version 8.0.0 (amd64), libbeat 8.0.0 [unknown built unknown] (current master, but the issue also applies to previous apm-server versions)
Description of the problem including expected versus actual behavior:
Since we upgraded to glibc 2.34, apm-server crashes quite rapidly with the following error:
runtime/cgo: pthread_create failed: Operation not permitted
Steps to reproduce:
* go build the project (an already built binary might have the bug as well)
* ./apm-server -c <configuration file>, the configuration file must point to a working Elasticsearch node / cluster
My apm-server.yml file looks like:
Provide logs (if relevant):
Crash logs
APM logs
GDB backtrace
** Additional information **
This issue started to happen as soon as we switched to glibc 2.34. It did not happen on glibc 2.33. Downgrading glibc from 2.34 to 2.33 resolves the issue, with the exact same binaries. This might well be a Go bug, but I'm not really sure, and since I don't see any matching issue on their bug tracker, I'm wondering if we are the only ones hitting this.
Glibc 2.34 started to use the new clone3 syscall on thread creation (in the pthread_create() function, from my understanding). This syscall sometimes returns EPERM, leading to the following crash (https://github.com/golang/go/blob/4e308d73ba3610838305997b6f4793c4f4dcfc4e/src/runtime/cgo/gcc_libinit.c#L94). I say "sometimes" because there are other successful clone3() calls before.
I've seen a few projects that also had issues with this new syscall (Docker, Firefox and Chromium). They all mention issues with their sandbox, which is based on seccomp. I see the APM project also depends on https://github.com/elastic/go-seccomp-bpf. I couldn't see it really being used, nor any syscall filtering in the go-seccomp-bpf project like there is in the previously mentioned projects, but maybe I'm overlooking something?
It's also worth mentioning that running as root doesn't fix the Operation not permitted error.
Let me know if you need any other information, I'm not a Go expert at all. Thanks for any help you could provide, I'll keep looking on my side.