
Provide safe mechanism for migrating archives to an alternative storage area #1613

Closed
kmcdonell opened this issue Jun 21, 2022 · 3 comments
kmcdonell commented Jun 21, 2022

This issue arose in the context of performance analysis on computing clusters (e.g. XDMoD), although it is more general than computing clusters alone.

In these environments there are 1,000s or 10,000s of compute nodes, each with their own pmcd-pmlogger pair archiving performance data. Consolidation of the archives for analysis is achieved using some cluster-wide parallel filesystem, like GPFS or Lustre or Panasas or (in the distant past) CXFS. What all of the parallel filesystems have in common is a distinct performance hit associated with concurrent operations in the same directory and directories with large numbers of entries.

To overcome this, the XDMoD folk suggest a path for the archives on each node (as specified in the /etc/pcp/pmlogger/control* file) along the lines of:

/gpfs/data/root/pcp/\$(date +%Y)/\$(date +%m)/LOCALHOSTNAME/\$(date +%Y)-\$(date +%m)-\$(date +%d)

This has worked in the past (they are still on PCP 4.3.2) because pmlogger_daily launched a new pmlogger each night, so the directory path changed each day. But it will not work with any PCP version that includes the "reexec" feature for pmlogger, because the path is not re-evaluated by default: the expansion of the $(date ...) parts is done in pmlogger_daily, but unless pmlogger_daily is run with the -z option, an already-running pmlogger is sent a SIGUSR2 rather than being killed off and restarted.

So these users would be forced to edit the pmlogger_daily.service file to add "-z" to PMLOGGER_DAILY_PARAMS to maintain the old semantics, or we could devise a better scheme.

Note that this is a special case of a more generic requirement, namely once PCP has finished creating an archive in the infrastructure supported by pmlogger_check and pmlogger_daily (the scripts behind the PCP systemd bits), it would be helpful to provide some safe mechanism to migrate those archives off to some other storage place (cloud, off-line, ...) or allow data reduction (pmlogextract, pmlogreduce or something else) or report generation or ...

The proposal is:

  1. Augment the pmlogger "control" files to support some additional options via the existing $name=value mechanism (there are no command line equivalents for these new options).
  2. $PCP_DAILY_SAVE_DIR - the directory into which archives are moved once the daily combination of any component archives with pmlogextract has been done, along with compression if applicable (-x 0 or $PCP_COMPRESSAFTER=0) ... so at most one archive per day is added to this directory. By "move" the archives we mean a paranoid checksum-cp-checksum-rm that will bail if the cp fails or the checksums do not match (the archives are important, so we cannot risk something like a full filesystem or a permissions issue corrupting the copy process). If pmlogger_daily is in "catch up" mode (more than one day's worth of archives needs to be combined) then the archives for more than one day could be copied in this step.
  3. $PCP_DAILY_CALLBACK - a user-provided script that is called once the daily processing (as in 2.) has been done. The script would be called once with the basename of the archive as an argument for each archive that was a candidate for copying in 2.
  4. If both $PCP_DAILY_SAVE_DIR and $PCP_DAILY_CALLBACK are set, the archive move happens first, and if this succeeds then the callback(s) are made.
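The paranoid checksum-cp-checksum-rm move from step 2 could be sketched roughly as below. This is illustrative only: safe_move is a hypothetical name, not the actual pmlogger_daily implementation.

```shell
# Sketch of the checksum-cp-checksum-rm move described above; each physical
# file of an archive (.meta, .index, data volumes) would be moved this way,
# bailing on the first failure.  safe_move is a hypothetical name.
safe_move()
{
    src="$1"
    destdir="$2"
    mkdir -p "$destdir" || return 1
    sum_before=$(cksum < "$src") || return 1
    cp "$src" "$destdir/" || return 1
    sum_after=$(cksum < "$destdir/$(basename "$src")") || return 1
    if [ "$sum_before" = "$sum_after" ]
    then
        # only remove the original once the copy is verified
        rm "$src"
    else
        echo "safe_move: checksum mismatch for $src, original retained" >&2
        return 1
    fi
}
```

A full filesystem or a permissions problem makes the cp or the second cksum fail, so the original archive is always retained on error.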

So for the XDMoD example if the pmlogger processes need to create their archives in /gpfs this would require a control file stanza like:

$PCP_DAILY_SAVE_DIR=/gpfs/data/root/pcp/\$(date +%Y)/\$(date +%m)/LOCALHOSTNAME/\$(date +%Y)-\$(date +%m)-\$(date +%d)
LOCALHOSTNAME	y   n	/gpfs/data/root/pcp/LOCALHOSTNAME -r -c /etc/pcp/pmlogger/pmlogger-supremm.config

Otherwise the archives could be created in each node's local filesystem and moved to /gpfs after the daily combination with a control file stanza like:

$PCP_DAILY_SAVE_DIR=/gpfs/data/root/pcp/\$(date +%Y)/\$(date +%m)/LOCALHOSTNAME/\$(date +%Y)-\$(date +%m)-\$(date +%d)
LOCALHOSTNAME	y   n	PCP_ARCHIVE_DIR/LOCALHOSTNAME -r -c /etc/pcp/pmlogger/pmlogger-supremm.config

All of this would be done holding the PCP archive per-directory lock, so it would be protected from any concurrent pmlogger_check activity.

@kmcdonell kmcdonell self-assigned this Jun 21, 2022
@natoscott

Hi @kmcdonell - that's some good sleuthing and the proposed solution seems to me a fair approach to take.

I have another consideration that would be good to think about in this context, and a very similar issue came up in unrelated discussions this past week. When pcp-zeroconf was initially added, a script - pmlogger_daily_report(1) - was included, with the intention of generating daily summaries. The implementation turned out to interact extremely badly with daily rotation, such that it's now disabled by default (but still present and problematic).

IIRC the problem with it is that it runs (too long) after daily log merging, compression and rotation, running a series of pmrep commands on the recently-rotated daily log. I was on a call with another team wanting the same functionality - generating daily log summaries to send back-to-base later.

The problem with our current setup is that these scripts have no knowledge of rotation - they wait until "some time later", and then run pcp client tools (pmrep, pmlogsummary, etc) to generate a daily report which is left in some convenient location. This ends up being hideously expensive because the archive is now compressed - in the case of the existing pmlogger_daily_report it runs multiple (tens of?) pmrep invocations, each uncompressing-reporting-and-discarding the temporary archive.

A better, extensible design would fit into the mechanism you're describing, such that user scripts could run on the daily archive just at the right moment - after daily merging but before the daily compression. So, could we extend the PCP_DAILY_CALLBACK mechanism to allow multiple, user-defined scripts? I think that'd mean we could start enabling pmlogger_daily_report once more, and these other folks would be able to plug in for their needs too.

Final, unrelated thought - should we formalise all those $(date...) calls in the examples and provide "macros" (well, constants) like we've done for LOCALHOSTNAME? (perhaps DATEMM, DATEDD, DATEYYYY - something like that?) There's a teensy race condition I see there currently (in the example) where subsequent calls to 'date' might end up spanning days/months/years and providing different values for the "same" date command on one control line.
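(For what it's worth, the race can also be dodged in plain shell by calling date(1) exactly once and reusing its fields; variable names here are just illustrative:)

```shell
# Call date(1) once so year/month/day cannot straddle a midnight boundary
# between separate $(date ...) expansions; names are illustrative.
eval "$(date '+yyyy=%Y mm=%m dd=%d')"
dir="/gpfs/data/root/pcp/$yyyy/$mm/$(hostname)/$yyyy-$mm-$dd"
echo "$dir"
```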

cheers.

@kmcdonell

I subsequently realized a flaw in the XDMoD example above. Since pmlogger_daily runs soon after midnight:

$PCP_DAILY_SAVE_DIR=/gpfs/data/root/pcp/\$(date +%Y)/\$(date +%m)/LOCALHOSTNAME/\$(date +%Y)-\$(date +%m)-\$(date +%d)

will save yesterday's archive in a directory named with today's date! And even more obscure, if pmlogger_daily failed for some reason yesterday, but worked today then this would save yesterday's archive and the day before's archive in a directory named with today's date, etc, etc.

This adds more force to the arguments from @natoscott to avoid the use of $(date ...) in the control files.

So now we have a revised proposal:

  1. Augment the pmlogger control file syntax to support some additional options via the existing $name=value mechanism (there are no command line equivalents for these new options).
  2. As each day's archive is created by merging and before any compression takes place, if $PCP_MERGE_CALLBACK is defined, then it is assumed to be a script that will be called with one argument being the name of the archive (stripped of any suffixes), so something of the form /some/directory/path/YYYYMMDD. The callback script will be run in the foreground, so pmlogger_daily will wait for it to complete.
  3. If the control file contains more than one $PCP_MERGE_CALLBACK specification then these will be run serially in the order they appear in the control file.
  4. If $PCP_MERGE_CALLBACK is defined in the environment when pmlogger_daily is run, this is treated as though this option was the first in the control file, i.e. it will be run first.
  5. If pmlogger_daily is run with -x 0 or $PCP_COMPRESSAFTER=0, then compression is done immediately after merging. As each day's archive is compressed, if $PCP_COMPRESS_CALLBACK is defined, then it is assumed to be a script that will be called with one argument being the name of the archive (stripped of any suffixes), so something of the form /some/directory/path/YYYYMMDD. The callback script will be run in the foreground, so pmlogger_daily will wait for it to complete.
  6. If the control file contains more than one $PCP_COMPRESS_CALLBACK specification then these will be run serially in the order they appear in the control file.
  7. If $PCP_COMPRESS_CALLBACK is defined in the environment when pmlogger_daily is run, this is treated as though this option was the first in the control file, i.e. it will be run first.
  8. Once the merging and possible compression has been done, if $PCP_AUTOSAVE_DIR is defined then all of the physical files that make up one day's archive will be moved (autosaved) to $PCP_AUTOSAVE_DIR. The basename of the archive is used to set the reserved words DATEYYYY (year), DATEMM (month) and DATEDD (day), and these may be used in $PCP_AUTOSAVE_DIR. So these correspond to the date on which the archive data was collected, not the date on which pmlogger_daily was run. By "move" the archives we mean a paranoid checksum-cp-checksum-rm that will bail if the cp fails or the checksums do not match (the archives are important, so we cannot risk something like a full filesystem or a permissions issue corrupting the copy process). If pmlogger_daily is in "catch up" mode (more than one day's worth of archives needs to be combined) then the archives for more than one day could be copied in this step.
  9. All of the callback script execution and the autosave moving will be executed as the non-privileged user "pcp" and group "pcp", so appropriate permissions may need to have been set up in advance.
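For step 8, deriving the DATE* reserved words from a merged archive basename of the form YYYYMMDD could look something like the following sketch (the archive path is illustrative, not the real implementation):

```shell
# Hypothetical derivation of DATEYYYY/DATEMM/DATEDD from an archive
# basename of the form YYYYMMDD; the archive path is illustrative.
archive=/var/log/pcp/pmlogger/myhost/20220621
base=$(basename "$archive")     # 20220621
DATEYYYY=${base%????}           # strip last 4 chars  -> 2022
rest=${base#????}               # strip first 4 chars -> 0621
DATEMM=${rest%??}               # 06
DATEDD=${rest#??}               # 21
echo "$DATEYYYY $DATEMM $DATEDD"
```

Because the values come from the archive's own name, they track the collection date regardless of when pmlogger_daily happens to run.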

With these changes, the simplest XDMoD variant becomes:

$PCP_AUTOSAVE_DIR=/gpfs/data/root/pcp/DATEYYYY/DATEMM/LOCALHOSTNAME/DATEYYYY-DATEMM-DATEDD
LOCALHOSTNAME	y   n	PCP_ARCHIVE_DIR/LOCALHOSTNAME -r -c /etc/pcp/pmlogger/pmlogger-supremm.config

Note that $PCP_MERGE_CALLBACK dodges the decompression issue, so is a suitable hook for pmlogger_daily_report, or any other customized scripts that need to process one day's worth of archive data.
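A minimal $PCP_MERGE_CALLBACK along those lines might look like the sketch below; it is written as a shell function for illustration, and the .summary destination is an assumption, not anything pmlogger_daily mandates:

```shell
# Hypothetical merge callback: pmlogger_daily would invoke it with one
# argument, the basename (no suffix) of the day's merged, still-uncompressed
# archive, e.g. /some/path/20220621.
merge_callback()
{
    archive="$1"
    if [ -z "$archive" ]
    then
        echo "Usage: merge_callback archive-basename" >&2
        return 1
    fi
    # pmlogsummary reads the archive cheaply here, before compression;
    # writing the report alongside the archive is just one possible choice
    pmlogsummary "$archive" > "$archive.summary"
}
```

A pmlogger_daily_report replacement would do much the same, running its pmrep commands at this point instead of against an already-compressed archive.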

One would not normally expect more than one of the new options to be set, but they could safely all be used to achieve different processing goals.

The nice thing about all of this is that it appears simple to implement and without regression risk.

@kmcdonell

This is all done now, and will be on board when the bright shiny new PCP 6.0 sets sail.
