Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use /proc/[pid]/root when looking at /proc/[pid]/maps paths #2324

Merged
merged 1 commit into from Jul 10, 2019

Conversation

davemarchevsky
Copy link
Collaborator

@davemarchevsky davemarchevsky commented Apr 19, 2019

I recently ran into a problem when trying to attach to a USDT in a shared library. Specifically, the shared library was loaded into the address space of a chrooted process. If I just provide the pid of this process, initializing USDT object fails because all paths in /proc/[pid]/maps for the target are relative to its chroot, while the process I'm trying to attach from is not in a chroot and therefore the paths don't make sense and USDT initialization can't look through shared lib ELFs to find the right probe.

Similarly, if I try to initialize the USDT object with pid and bin_path to the shared lib, initialization succeeds but actually attaching fails: address lookup for the usdt's semaphore in pid's address space fails because /proc/[pid]/maps shared lib path is expected to match provided bin_path exactly, but maps paths are chroot-relative so this doesn't work.

Implementation evolved significantly since this PR was created, look at this comment for details about impl.

@palmtenor
Copy link
Member

[buildbot, ok to test]

@yonghong-song
Copy link
Collaborator

the test failed due to an infra error. Let us try again.

@yonghong-song
Copy link
Collaborator

[buildbot, test this please]

@yonghong-song
Copy link
Collaborator

@palmtenor any comment for this pull request?

@yonghong-song
Copy link
Collaborator

@vijunag could you take a look as well?

src/cc/usdt/usdt.cc Outdated Show resolved Hide resolved
@yonghong-song
Copy link
Collaborator

@palmtenor what do you think about this approach?

@yonghong-song
Copy link
Collaborator

@davemarchevsky related to MATCH_FILENAME, could you have false positives? Or because this is shared library and we should just proceed with library-name-only comparision even with potentially incorrect user full binary path.

I do not see any issues with MATCH_FILENAME if user only specifies library name like abbr. c or m in command line. But if user specify a full name, we could still compare name only but after this do a partial comparison to the original path and print an error if it does not partial match based on chroot convention?

@yonghong-song
Copy link
Collaborator

Agree with @vijunag, but changing path, with NSGuard, we do risk going into a different mount namespace then intended, so this change may require NSGuard becomes noop, right? I have not done any experiments to show this is possible or not.

@yonghong-song
Copy link
Collaborator

I did a little bit experiment. For USDT, suppose that we have
. process A is running inside a container with process ID 456 with local path
/tmp/a.out and chroot path /var/local/containers/container1/.
. process A is also visible outside the container as process ID 123456.

If we supply the process ID 123456 and shared binary path displayed by /proc/123456/maps,
e.g., /var/local/containers/container1/usr/lib64/libc.so.6, we should be okay for USDT, right?
All the probes and semaphore addresses can be obtained with process ID 123456 and binary path (also in /proc/123456/maps) /var/local/containers/container1/. Traversing /proc/123456/maps will get the shared library address range. Together with semaphore address in ELF file (/var/local/containers/container1/usr/lib64/libc.so.6), we can derive the runtime address.

So we do not need NSGuard at all in such cases? Did I miss anything?

@palmtenor

@davemarchevsky
Copy link
Collaborator Author

@yonghong-song I think you are correct in saying that if we have pid to access /proc/pid/maps and know which shared library path we're trying to match against we should be able to get semaphore address. I'm pushing up a removal of NSGuard later today.

Speaking of "match against":

related to MATCH_FILENAME, could you have false positives?

Definitely. I think the default should be MATCH_EXACT for this reason. One way of making this less likely that I think you're mentioning here:

But if user specify a full name, we could still compare name only but after this do a partial comparison to the original path and print an error if it does not partial match based on chroot convention?

Is that we could add a match like MATCH_SUBPATH (not best name) such that, if you have a library at /var/local/containers/container1/usr/lib64/libwhatever.so and the process you're trying to attach USDT to has chroot at /var/local/containers/container1, you could try to do match like /usr/lib64/libwhatever.so, MATCH_SUBPATH and since the chrooted path is substr of full path match would succeed. I would imagine that this would reduce false positives relative to MATCH_EXACT.

Another thought I had while doing first pass was matching based on (device, inode) but that feels too expensive. Could use this as a backup w/ MATCH_FILENAME to reduce false positive, perhaps.

@yonghong-song
Copy link
Collaborator

yonghong-song commented Apr 23, 2019

Regarding to MATCH_SUBPATH, I am still thinking we can either to full path comparison if user provides one, or bcc just derives the path if user does not supply one. This seems cleaner and less error prone. But I will see your patch first. Could you add some examples to show how it is tested so we can check usability as well? Thanks!

@vijunag
Copy link
Contributor

vijunag commented Apr 24, 2019

@davemarchevsky @yonghong-song - Now is the time to eliminate ProcNsGuard() and any other fix is incomplete. What we need now is PrefixGuard() which will prefix the paths according to the process namespace that way we can keep this infra same across USDT/Symlookups.

@yonghong-song
Copy link
Collaborator

Just realized that /tmp/perf-<pid>.map may need choot path prefix, '/proc//root/tmp/perf-.mapor/tmp/perf-.map`.

@yonghong-song
Copy link
Collaborator

@davemarchevsky Any update on this?

@davemarchevsky
Copy link
Collaborator Author

@yonghong-song @vijunag @palmtenor

I made some changes to incorporate feedback and added some tests. It might be best to consider this an entirely new PR, solving the same problems outlined in my first post. I will edit the first post to make this more clear.

Here's a summary of the changes:

  • Remove ns_guard everywhere as suggested
  • Modify bcc_resolve_global_addr to compare (inode, device_{major,minor}) instead of filename.
    • stat the module path passed in to bcc_resolve_global_addr and use inode, device info from stat struct to match against same data from /proc/PID/maps
      • Comparing stat device nums w/ /proc/PID/maps device numbers doesn't work on btrfs - major/minor numbers from stat differ from those in proc maps. An inode_match_only kludge was added to work around this. Added USDT::set_probe_matching_kludge function to public API, kludge flag needs to propagate USDT -> Context -> Probe unfortunately.
    • Functionality to do filename comparison as a backup is retained: if inode 0 is used to search with _bcc_syms_find_module, it'll fallback to doing name comparison. Can remove this fallback if desired
  • bcc_procutils_each_module's callback changed to add enter_ns, which is 1 when iterating /proc/PID/maps and 0 in the case where perf map might be outside of chroot. This allows callbacks to know whether they should do anything special to paths that bcc_procutils_each_module gives them.
  • Since enter_ns is added, Context::_each_module can do the right thing and prepend /proc/PID/root to paths where enter_ns is set and leave them alone otherwise. This obviates the need for ns_guard.
  • Some refactoring for clarity / testability
  • Added some tests to validate "iterate through /proc/PID/maps" functionality, one test to resolve addr in a dynamically linked lib, and one test to init USDT on a probe defined in a shared lib
    • This required making some minor additions to cmake files, which I was entirely unfamiliar with before starting this work, and thus likely messed something up here.

@yonghong-song
Copy link
Collaborator

[buildbot, test this please]

@yonghong-song
Copy link
Collaborator

@davemarchevsky there are two test failures,

	 21 - py_uprobes (Failed)
	 32 - py_test_usdt2 (Failed)

They are legitimate failures. could you take a look.
Also, could you rebase on top of the master. There is a conflict when applying the change to the trunk.

@palmtenor if you get some time, could you take a look as well?

@davemarchevsky
Copy link
Collaborator Author

davemarchevsky commented May 28, 2019

(rebased, some small additional changes to address rebase conflict, integrating memfd addition)

still need to unbreak tests

@yonghong-song
Copy link
Collaborator

[buildbot, test this please]

1 similar comment
@yonghong-song
Copy link
Collaborator

[buildbot, test this please]

src/cc/CMakeLists.txt Outdated Show resolved Hide resolved
src/cc/api/BPF.cc Show resolved Hide resolved
src/cc/api/BPF.h Show resolved Hide resolved
src/cc/bcc_proc.c Show resolved Hide resolved
src/cc/bcc_proc.c Outdated Show resolved Hide resolved
if (bcc_perf_map_path(map_path, sizeof(map_path), pid)) {
mod.name = map_path;
mod.end_addr = -1;
if (callback(&mod, 1, payload) < 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, bcc_perf_map_path may already deal with /proc/<pid>/root? So callback function does not need enter_ns?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was the case as well, but it's doing readlink on /proc/PID/root and prepending the path of the symlink to the perf map path. Some processes on the hosts I'm working on have /proc/PID/root symlink resolve to /, but the inode is not the same as the actual root inode.

When I tried modifying this line to do callback(&mod, 0, payload), one of the symcache tests in test_c_api broke. Think it's necessary to do enter_ns here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, the original code indeed entered mount ns.

int Context::_each_module(const char *modpath, uint64_t, uint64_t, uint64_t,
                          bool, void *p) {
  Context *ctx = static_cast<Context *>(p);
  // Modules may be reported multiple times if they contain more than one
  // executable region. We are going to parse the ELF on disk anyway, so we
  // don't need these duplicates.
  if (ctx->modules_.insert(modpath).second /*inserted new?*/) {
    ProcMountNSGuard g(ctx->mount_ns_instance_.get());
    bcc_elf_foreach_usdt(modpath, _each_probe, p);
  }
  return 0;
}

If you got some time, could you dig a little more to find out why this is needed? It is puzzling.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @palmtenor who may know the answer as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually tricky. You don't always want to go with proc root for reading perf map, since they are often only in root /tmp. I remember we try both (with either global PID or NS PID), but I need to take a better look at the code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@palmtenor I think this code is what you are referring to. This PR preserves that functionality.

src/cc/bcc_syms.cc Show resolved Hide resolved
@@ -273,6 +279,10 @@ std::string Context::resolve_bin_path(const std::string &bin_path) {
::free(which_so);
}

if (!result.empty() && pid_) {
result = tfm::format("/proc/%d/root%s", *pid_, result);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this true all the time? Or it is just another try.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context can be initialized without a pid, and separately from that both bcc_procutils_which and bcc_procutils_which_so can fail due to diabolical input. Functions that call Context::resolve_bin_path rely on its return string to be empty as a signal of "I couldn't find a bin path" so we must only prepend /proc/PID/root when there's something to prepend to.

tests/cc/test_c_api.cc Show resolved Hide resolved
tests/cc/usdt_test_lib.c Show resolved Hide resolved
@davemarchevsky
Copy link
Collaborator Author

All ubuntu failures are same:

3: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3: test_libbcc is a Catch v1.4.0 host application.
3: Run with -? for options
3: 
3: -------------------------------------------------------------------------------
3: test find a probe in our process' shared libs with c++ API
3: -------------------------------------------------------------------------------
3: /home/iovisor/jenkins/workspace/bcc-pr/label/ubuntu1710/tests/cc/test_usdt_probes.cc:87
3: ...............................................................................
3: 
3: /home/iovisor/jenkins/workspace/bcc-pr/label/ubuntu1710/tests/cc/test_usdt_probes.cc:92: FAILED:
3:   REQUIRE( res.msg() == "" )
3: with expansion:
3:   "Unable to find USDT libbcc_test:sample_lib_probe_1 from binary  PID 6709 for
3:   probe on_event"
3:   ==
3:   ""
3: 
3: ===============================================================================
3: test cases:  33 |  32 passed | 1 failed
3: assertions: 562 | 561 passed | 1 failed

fc28 looks more complicated, so going to focus on this first.

@yonghong-song
Copy link
Collaborator

For

3: with expansion:
3:   "Unable to find USDT libbcc_test:sample_lib_probe_1 from binary  PID 6709 for
3:   probe on_event"
3:   ==
3:   ""

This is a test case bug. The probe name should be sample_probe_1, not sample_lib_probe_1.

Could you fix and resubmit the pull request?

@davemarchevsky
Copy link
Collaborator Author

@yonghong-song The sample_lib_probe_1 is a tracepoint I'm adding in this PR here (and associated test to confirm that it works). Want to add this test in because looking for usdt in shared lib is different codepath that was previously untested.

@yonghong-song
Copy link
Collaborator

[buildbot, test this please]

@davemarchevsky
Copy link
Collaborator Author

Only fc28 tests remain broken. Addressing.

Ubuntu tests were broken because the shared library (w/ USDT) that I added to tests was never actually used in test_libbcc. Ubuntu's gcc decided that it could remove it from the binary's list of shared libs because of this, which is why the test was unable to locate the USDT in the shared lib's dummy function.

@yonghong-song
Copy link
Collaborator

[buildbot, test this please]

@davemarchevsky
Copy link
Collaborator Author

@yonghong-song It's finally green. The remaining fc28 error was actually indicative of a bug: there are a few places where I prepend /proc/PID/root to the path if the pid is set - by default the pid is set to -1 (e.g. attach_usdt), in this case we shouldn't be prepending /proc/PID/root. perf_event_open was failing for this reason.

@yonghong-song
Copy link
Collaborator

Cools. Thanks! Will do another round of review.
@palmtenor could you also take a look?

Copy link
Member

@palmtenor palmtenor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks @davemarchevsky for the patience!

BTW can you merge all the commits? Ideally do a rebase -i and re-order them to fixup the fixes together with the initial add commit. If it's too much trouble you can just merge them into one.

P.S.: Can you catch me what's with the -1? I got a bit lost there.

@davemarchevsky
Copy link
Collaborator Author

davemarchevsky commented Jul 8, 2019

@palmtenor I squashed down to one commit and pushed. Since the PR got big and messy it was hard to find logical multi-commit split.

FYI there were two REVERT ME commits that added debug prints which I removed.

I'd like to validate that this still fixes the production issue that prompted the PR (since there were lots of minor test-related bugfixes), will update when that's done, day or two at most.

Can you catch me what's with the -1? I got a bit lost there.

A few places in this PR I do a check: "If we have an actual PID, prepend /proc/PID/root to path, otherwise do nothing". The "have an actual PID" check was initially checking if optional wrapper had a value. However, elsewhere in the code -1 is used as a sentinel value for PID when one isn't provided, so we were erroneously prepending /proc/-1/root, which caused a test to break.

@yonghong-song
Copy link
Collaborator

[buildbot, test this please]

@yonghong-song
Copy link
Collaborator

@davemarchevsky sounds good to me for first verifying in production environment. Thanks!

@davemarchevsky
Copy link
Collaborator Author

New CI system seems broken:

Last log line in recent Travis CI build says: /home/travis/.travis/functions: line 107: secure: unbound variable, which doesn't look like an issue with bcc code to me.

@yonghong-song
Copy link
Collaborator

@davemarchevsky do not need to worry Travis CI run for now. There are a different commit trying to address all failures. For your commit, maybe you can rename .travis.yml to .travis.yml.org to
temporarily disable it since it looks travis CI failure will prevent merging?

@yonghong-song
Copy link
Collaborator

Looks like builtbot has some issues. @drzaeus77 is in the middle of fixing them.

@yonghong-song
Copy link
Collaborator

[buildbot, test this please]

@palmtenor
Copy link
Member

[buildbot, retest this please]

@yonghong-song
Copy link
Collaborator

[buildbot, test this please]

@yonghong-song yonghong-song merged commit 01ee0b6 into iovisor:master Jul 10, 2019
dalehamel added a commit to dalehamel/bcc that referenced this pull request Jan 21, 2020
This removes the namespace code from libbpf, as they are no longer necessary
since iovisor#2324 was merged, and have caused regressions on older Kernels that do
not use the new API for creating probes. This also deletes the dead code for
namespace handling in libbpf, as this was the last use of it.

This also introduces regression tests to ensure that processes in containers
can be USDT probed, by adding tests that unshare the mount and process
namespaces.
yonghong-song pushed a commit that referenced this pull request Jan 21, 2020
This removes the namespace code from libbpf, as they are no longer necessary
since #2324 was merged, and have caused regressions on older Kernels that do
not use the new API for creating probes. This also deletes the dead code for
namespace handling in libbpf, as this was the last use of it.

This also introduces regression tests to ensure that processes in containers
can be USDT probed, by adding tests that unshare the mount and process
namespaces.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants