Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pcp.env: _get_pids_by_name returns all pmcd pids on machine with containers #101

Open
test-account-0 opened this issue Aug 2, 2016 · 6 comments

Comments

@test-account-0
Copy link
Contributor

I'm using pmcd on machines with containers which also have pmcd. Theese container are based on openvz and lxc.

With the current version of /etc/pcp.env I cannot start, stop, restart et cetera pmcd (and probably other pcp services).

As a workaround I have changed function _get_pids_by_name from:

_get_pids_by_name()
{
    if [ $# -ne 1 ]
    then
        echo "Usage: _get_pids_by_name process-name" >&2
        exit 1
    fi

    # Algorithm ... all ps(1) variants have a time of the form MM:SS
    # or HH:MM:SS or HH:MM.SS before the psargs field, so we're using
    # this as the search anchor.
    #
    # Matches with $1 (process-name) occur if the first psarg is $1
    # or ends in /$1 or starts ($1) (blame Mac OS X for the last one)
    # ... the matching uses sed's regular expressions, so passing a
    # regex into $1 will work.

    $PCP_PS_PROG $PCP_PS_ALL_FLAGS \
    | sed -n \
        -e 's/$/ /' \
        -e 's/[         ][      ]*/ /g' \
        -e 's/^ //' \
        -e 's/^[^ ]* //' \
        -e "/[0-9][:\.][0-9][0-9]  *[^ ]*\/$1 /s/ .*//p" \
        -e "/[0-9][:\.][0-9][0-9]  *$1 /s/ .*//p" \
        -e "/[0-9][:\.][0-9][0-9]  *($1)/s/ .*//p" \
    # end
}

to:

_get_pids_by_name()
{
    if [ $# -ne 1 ]
    then
        echo "Usage: _get_pids_by_name process-name" >&2
        exit 1
    fi

    # Algorithm ... all ps(1) variants have a time of the form MM:SS
    # or HH:MM:SS or HH:MM.SS before the psargs field, so we're using
    # this as the search anchor.
    #
    # Matches with $1 (process-name) occur if the first psarg is $1
    # or ends in /$1 ... the matching uses sed's regular expressions,
    # so passing a regex into $1 will work.

    pids=$($PCP_PS_PROG $PCP_PS_ALL_FLAGS \
    | sed -n \
        -e 's/$/ /' \
        -e 's/[         ][      ]*/ /g' \
        -e 's/^ //' \
        -e 's/^[^ ]* //' \
        -e "/[0-9][:\.][0-9][0-9]  *[^ ]*\/$1 /s/ .*//p" \
        -e "/[0-9][:\.][0-9][0-9]  *$1 /s/ .*//p")

    if [ -n "$pids" ]; then
        for pid in $pids; do
            ppid=$($PCP_PS_PROG -p "$pid" -o ppid --no-headers | awk '{print $1}')
            if [ "$ppid" = 1 ]; then
                echo "$pid"
            fi
        done
    fi
    unset pids
}

It echoes only those pids that a parent is pid 1.
I'm not sure if it is good/compatilble enough to make a pull request.
It is working on wheezy, jessie, precise, trusty and xenial.

@fche
Copy link
Contributor

fche commented Aug 2, 2016

Can you explain why you're running pmcd servers inside multiple containers? One running on the host should be able to give intra-container data for those other systems (before too long, if not already).

@test-account-0
Copy link
Contributor Author

Because I want to monitor metrics from pcp with a nagios script. These are not docker containers but kind of "normal machines" (one can ssh to them, install stuff and such). I already have automation to set up services and monitoring per host.
I haven't yet checked monitoring containers from outside with pcp, but it would look weird in nagios - all checks on one machine for all containers on this machine instead of on a specific container. And also it is a lot of work to change/write new puppet modules to make it happen automatically.

Apart from that, why the function _get_pids_by_name gets all pids of a specific program and not only those that a parent is pid 1? Is it used not only in init scripts?

@natoscott
Copy link
Member

Apart from that, why the function _get_pids_by_name gets all pids of a specific program
and not only those that a parent is pid 1? Is it used not only in init scripts?

Its used almost exclusively from the init scripts; there's a corner case in that its used from a number of the test scripts. However, that's something that could be resolved (e.g. via separate shell function), and its arguably desirable to restrict the set of processes the init scripts lookup in this way too, as often they are being considered as targets for kill(1) - IOW, maybe this change would be beneficial anyway.

@kmcdonell are there any other situations you know of where _get_pids_by_name() might not be looking for init-parented processes?

@test-account-0
Copy link
Contributor Author

gentle ping

@kmcdonell
Copy link
Member

Not quite 4 years late ... oops (looking at the issues in reverse order for a change).
All of the pmcd, pmproxy, pmie and pmlogger processes will be parented by init/systemd and that's most of the use cases.
But there are some other nasty ones that make this a non-starter, or at least requires a clever plan ([tm] Black Adder):

  1. pmsignal
  2. pmcd's rc script when looking for PMDAs
  3. _pmda_cull in pmdaproc.sh, again looking for PMDAs
  4. pmlogger_daily checks for sh(1) (bizarre concurrent execution check)

@andreasgerstmayr
Copy link
Member

hm, I just ran into this one too with the PCP container (https://src.fedoraproject.org/container/pcp) - once I start the container, I cannot (re)start pmproxy on the host system, because the host system sees two pmproxies running. Restarting pmcd works fine however.

natoscott added a commit to natoscott/pcp that referenced this issue Sep 7, 2022
…8f460da..65fc7b81f3

65fc7b81f3 Release 0.8.1
a63aa04fed Fix async api issues (performancecopilot#107)
a439a20f4f No command retries due to CROSSSLOT
b96d43fd95 Connect using redisConnectWithOptions() (performancecopilot#103)
c47e8a9cf3 Update hiredis version in build examples
0a3b20f691 Update Makefile
4a581e91e4 Use common build warnings
08ffdd30f0 Remove usage of hiredis internal flags (performancecopilot#101)
f3091fef52 Release 0.8.0 (performancecopilot#99)
0e741c6dd8 Add Redis compatibility testing to CI (performancecopilot#97)
1aa93a0862 Add crude support for BITFIELD and BITFIELD_RO (performancecopilot#96)
295bf3c81e Add async transaction tests
6c0aecfcf1 Deprecate non-block options which have no effect (performancecopilot#89)
16ec08bb8b Accept multiple field and value arguments in HSET (performancecopilot#86)
ff76aac0f5 Timeout tests and corrections (performancecopilot#84)
7c39940fa9 Move SSL support to own library (performancecopilot#80)
573c1006f9 README updates (performancecopilot#81)
e642e42df1 Add windows and macOS builds to CI (performancecopilot#76)
5c9e294f75 tests: fix error handling in clusterclient_reconnect_async (performancecopilot#70)
4a69cb65d1 tests: add reconnect test (performancecopilot#68)
512a790dad reset cluster context errors in redisClusterAsyncFormattedCommandToNode

git-subtree-dir: vendor/github.com/Nordix/hiredis-cluster
git-subtree-split: 65fc7b81f31389b878a669710bac9d7042c2404b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants