Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for memory-backed file descriptor #24

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

@dalehamel
Copy link

@dalehamel dalehamel commented Apr 15, 2019

This abstracts the creation of the temporary ELF file to allow for two possible approaches:

  • The current (default) approach of allocating a temporary file to /tmp and dlopen'ing it
  • A new (optional) approach, to use a memory-backed file descriptor, if LIBSTAPSDT_MEMORY_BACKED_FD is specified.

The new approach his beneficial for two reasons:

  • There is no need to clean up the file, it is automatically removed when the process dies (the primary reason for this approach).
  • No calls are made to the underlying filesystem, everything is done in memory. While tmpfs is probably already memory-based, this could be advantageous in some circumstances.

libusdt does something similar, where it just valloc's directly into the processes address space, so there is no need for a temporary file to clean up.

Overall, I find that doing tho whole thing in memory and having it be automatically cleaned up with the process's lifecycle is a bit more elegant.

The drawbacks to this approach are:

  • It is not yet supported by bcc, but I've submitted iovisor/bcc#2314 to show how this could be utilized. Support has been added to BCC by iovisor/bcc#2314
  • If a large number of providers are allocated, the process will have a large number of file descriptors open. This is probably not a problem in practice.
  • Support for memfd_create was not added until linux 3.17 (not a problem in practice, as BCC needs 4.1+ to do anything meaningful with eBPF) and glibc 2.27. The system call may not be defined on some systems.

I think the drawbacks are mitigated by hiding it behind a preprocessor macro check, as the behavior remains unchanged. In the future, hopefully once iovisor/bcc#2314 is in a major bcc release, we could remove the macro and default to a memory-based shared object, falling back to temporary file if we detect that memfd_create is not supported.

Attaching a probe (global:helloworld) this way, the resulting process looks like:

/proc/${PID}/maps:

...
7f9e79f0c000-7f9e79f0d000 r-xp 00000000 00:05 11706507                   /memfd:libstapsdt:global (deleted)
7f9e79f0d000-7f9e7a10c000 ---p 00001000 00:05 11706507                   /memfd:libstapsdt:global (deleted)
7f9e7a10c000-7f9e7a10d000 rw-p 00000000 00:05 11706507                   /memfd:libstapsdt:global (deleted)
...

And if we check out the file descriptors for the process:

lrwx------ 1 dale.hamel dale.hamel 64 Apr 14 20:57 7 -> '/memfd:libstapsdt:global (deleted)'

So if we check for elf notes on that fd:

$readelf --notes /proc/${PID}/fd/7

Displaying notes found in: .note.stapsdt
  Owner                 Data size       Description
  stapsdt              0x00000039       NT_STAPSDT (SystemTap probe descriptors)
    Provider: global
    Name: helloword
    Location: 0x0000000000000260, Base: 0x0000000000000318, Semaphore: 0x0000000000000000
    Arguments: 8@%rdi -8@%rsi

And if I run tplist -p

$ tplist -p ${PID} | grep global
/proc/12646/fd/7 global:hello_nsec

Or use my latest branch of bpftrace with wildcard USDT support:

$ bpftrace -l 'usdt:*:global:*' -p ${PID}
usdt:/proc/12646/fd/7:global:helloworld

I can attach to the probe by that path, or by PID (using my bcc branch)

If I disable the provider, the file descriptor is closed and the elf file is removed from the memory map.


// Note that linux must be 3.17 or greater to support this
static inline int memfd_create(const char *name, unsigned int flags) {
return syscall(__NR_memfd_create, name, flags);
Copy link
Author

@dalehamel dalehamel Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probbaly check that __NR_memfd_create is defined or else bail out elegantly.

Loading

@Drieger Drieger requested a review from mmarchini Apr 15, 2019
@dalehamel dalehamel marked this pull request as ready for review Apr 16, 2019
@dalehamel
Copy link
Author

@dalehamel dalehamel commented Apr 16, 2019

I've marked this as ready for review, as the dependent patch has landed in BCC.

Loading

@mmarchini
Copy link
Contributor

@mmarchini mmarchini commented Apr 22, 2019

Awesome! I was not aware of memfd, and this seems like a much better approach compared to creating a file in the filesystem. libusdt doesn't have to write to a file first because Solaris/DTrace have an interface to register User Tracepoints.

IMO memfd should be the default, and we should fall back to tmpfs only if necessary. Also, it might be useful to have a way to choose which filesystem to use during runtime?

Loading

@dalehamel
Copy link
Author

@dalehamel dalehamel commented Apr 22, 2019

@mmarchini great - i'll remove the macro then, and make it the default, falling back only if we can't find the necessary call.

Also, it might be useful to have a way to choose which filesystem to use during runtime?

to do this I'll need to add a new method signature for providerLoad probably, to force using /tmp files. If the original interface is called, it will go through the fallback path.

I'll modify this PR accordingly :)

Loading

@mmarchini
Copy link
Contributor

@mmarchini mmarchini commented Oct 9, 2019

@dalehamel did you had time to look into this?

Loading

@dalehamel
Copy link
Author

@dalehamel dalehamel commented Oct 9, 2019

@mmarchini yeah i should be able to get this PR revived, I got the necessary plumbing added to iovisor/bcc a few months back in iovisor/bcc#2314 but never circled back to this.

One practical issue though is that bpftrace 0.9.2 that ships with ubuntu 19.10 is linked against an older version of bcc prior to my patch for reading elf nodes from a memory backed FD, so bpftrace users won't be able to benefit from this until a newer version is widely available.

For that reason I think both the current temporary file and memory-backed file will need to be supported in tandem for a while

Loading

@javierhonduco
Copy link

@javierhonduco javierhonduco commented Dec 31, 2019

I've tried this PR on our internal Python wrapper and seems to work fine, It passes all our integration tests! 🎉

I was curious about performance and it seems to be the same as before. Intuitively I thought the performance would be better now. Just in case, will double check the simple benchmarks I wrote in the next couple of days

Loading

@dalehamel dalehamel force-pushed the memfd-dlopen branch 3 times, most recently from e005c0e to c752b00 Feb 3, 2020
@dalehamel
Copy link
Author

@dalehamel dalehamel commented Feb 3, 2020

@mmarchini ok sorry for the super long delay here, I have made the following changes:

  • The tests are run on both trusty and bionic. Trusty doesn't support memfd, so uses the old codepath (glibc too old). bionic does, and tests the new codepath.
  • Support for memfd is detected automatically at compile time, if available it's used by default
  • A new function is added to the public interface, to allow toggling the behavior of using memfd or no, on systems that support it.

I also found some memory leaks that I think I missed on the last pass / weren't running in the test suite before. I've fixed those - the valgrind test is really helpful :)

Can you please take another look?

Loading

This Makes the default behavior of libstapsdt to try and use a memory-backed
file descriptor for systems that support this syscall.

At runtime, this behavior can be changed with the function providerUseMemfd
src/libstapsdt.h Outdated
runtime by calling this with use_memfd = 0 prior after providerInit, and before
providerLoad.
*/
int providerUseMemfd(SDTProvider_t *provider, const int use_memfd);
Copy link
Author

@dalehamel dalehamel Feb 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because it is pure C, we can't overload the providerInit function like I'd normally prefer to, so I added a helper instead.

Loading

@@ -11,7 +18,18 @@ int testElfCreationError() {
int testTmpCreationError() {
SDTProvider_t *provider;
SDTError_t errno;
#ifdef HAVE_LIBSTAPSDT_MEMORY_BACKED_FD
char *providerLongName = "test/probe/creation/error/with/a/name/longer/than/"
Copy link
Author

@dalehamel dalehamel Feb 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the path name used to fail the test for tempfiles is actually valid for memfd, so we force it to fail by sending it a name that is too long.

Loading

Copy link

@javierhonduco javierhonduco left a comment

Looks good to me!

Loading

}

snprintf(path_buffer, sizeof(path_buffer), "/proc/%d/fd/%d", getpid(), *fd);
filename = calloc(sizeof(char), strlen(path_buffer) + 1);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check if calloc returns NULL? (we may want to also check the other allocation callsites that predate this PR)

Loading

Copy link
Contributor

@mmarchini mmarchini Oct 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idr why I didn't originally check if calloc returns NULL on other locations. We probably should yes, and that can be done in a separate PR.

Loading

return filename;
}
#endif
filename = calloc(sizeof(char), strlen("/tmp/-XXXXXX.so") + strlen(name) + 1);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Loading

@mmarchini
Copy link
Contributor

@mmarchini mmarchini commented Feb 6, 2020

Great! I might take a few weeks to get to it though. Just got back from vacation and I'm on a few conferences now, so my backlog is quite huge. Will try to prioritize it though, as it's a nice addition :)

Loading

@dalehamel
Copy link
Author

@dalehamel dalehamel commented Feb 6, 2020

Great! I might take a few weeks to get to it though. Just got back from vacation and I'm on a few conferences now, so my backlog is quite huge

No worries @mmarchini hope you had a restful vacation!

Loading

@dalehamel
Copy link
Author

@dalehamel dalehamel commented Jun 26, 2020

Hi @mmarchini a friendly ping, in case this has fallen off your radar and you can find some time to take a look

Loading

@dalehamel dalehamel changed the title Support for memory-backed file description Support for memory-backed file descriptor Jun 27, 2020
@javierhonduco
Copy link

@javierhonduco javierhonduco commented Sep 24, 2021

Hi! It would be great if this PR could be merged. At my workplace we currently have this custom patch applied but it would be great if we don't depend on custom patches 😄

Loading

Copy link
Contributor

@mmarchini mmarchini left a comment

LGTM with the enum added. I also think the file opening/closing code should be moved to a separate file, but we can do that in a separate PR.

I think there is a bug if the user tries to start two providers with the same name (since they'll have the same filename), so we might want to handle that too (again, can be done in a follow up).

I forked the project, it's now located at https://github.com/linux-usdt/libstapsdt and I'll be continuing development there. Feel free to open the PR there if you're still interested in working on it. If not, I'm happy to make the enum changes and cherry-pick the commits.

Last but not least, I do wonder if this should result in libstapsdt2, because it's technically a breaking change (more from the POV of the consumers than the programs running it though, since now older bcc/bpftrace versions won't be able to load the probes).

Loading

src/libstapsdt.h Outdated
/*
Linux newer than 3.17 with libc that supports the syscall it will default to
using a memory-backed file descriptor. This behavior can be overridden at
runtime by calling this with use_memfd = 0 prior after providerInit, and before
providerLoad.
*/
Copy link
Contributor

@mmarchini mmarchini Oct 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's provide an enum here instead of 0/1, so it's less confusing for users.

Loading

Copy link
Author

@dalehamel dalehamel Oct 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I've addressed this in dalehamel@4c62f85 please take a look and let me know what you think

Loading

src/libstapsdt.c Outdated

#ifdef __NR_memfd_create // older glibc may not have this syscall defined
#define HAVE_LIBSTAPSDT_MEMORY_BACKED_FD
#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */
Copy link
Contributor

@mmarchini mmarchini Oct 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we #include <linux/fcntl.h> instead (see this SO answer)? Or are there any caveats of doing that?

Loading

Copy link
Author

@dalehamel dalehamel Oct 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I've made the necessary change in dalehamel@c4248d0

Loading

}

snprintf(path_buffer, sizeof(path_buffer), "/proc/%d/fd/%d", getpid(), *fd);
filename = calloc(sizeof(char), strlen(path_buffer) + 1);
Copy link
Contributor

@mmarchini mmarchini Oct 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idr why I didn't originally check if calloc returns NULL on other locations. We probably should yes, and that can be done in a separate PR.

Loading

@dalehamel
Copy link
Author

@dalehamel dalehamel commented Oct 18, 2021

hi @mmarchini i'll take a look at this shortly, need to refresh my own memory on this old PR but thank you for all your feedback.

Loading

dalehamel added 2 commits Oct 18, 2021
This introduces a new enumerated type, MemFD_Option_t, which is used in lieu
of the previous convention of using just an integer type.

All references to these values have been updated, and the symbolic values of
"memfd_enabled" is used in place of the literal "1", and "memfd_disabled" is
used in place of the literel "0".
This avoids duplicating the definition of the constant F_SEAL_SEAL, and
allows relying on an existing standard linux header for it.
@dalehamel
Copy link
Author

@dalehamel dalehamel commented Oct 18, 2021

@mmarchini I believe I've addressed your two points, please take a quick look at the two patches and then I'll happily re-file the PR against the new repo, otherwise feel free to cherry pick the changes in.

Last but not least, I do wonder if this should result in libstapsdt2, because it's technically a breaking change (more from the POV of the consumers than the programs running it though, since now older bcc/bpftrace versions won't be able to load the probes).

Yeah I suppose that is true, but I would guess if they are running an up-to-date libstapsdt, they probably have also updated their bcc since the change to support this has landed. I think that this could justify a new semantic version of the package regardless, though I'm not sure what the current practice for versioning this library is.

Loading

@mmarchini
Copy link
Contributor

@mmarchini mmarchini commented Oct 18, 2021

@dalehamel thank you! The patches lgtm. I'll cherry-pick, no need to open a new PR (I'll also rename MemFD_Option_t to MemFDOption_t for consistency while cherry picking).

Loading

@mmarchini
Copy link
Contributor

@mmarchini mmarchini commented Oct 25, 2021

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants