pmwebd: improve graphite archive-cache performance w.r.t. syscalls #429

Merged
merged 1 commit into from
Mar 7, 2018

@fche commented Jan 26, 2018

It was reported that on a large collection of pcp archives, which
included a number of corrupt (0-byte) ones, the graphite
metric-enumeration query took too long. One source of this was
excessive effort on

  • frequently retrying opening corrupt archives, and
  • fstat'ing all files under -A $DIR

We no longer do either. Corrupt archives are treated as though they
were fresh at the moment of pmwebd startup, but containing no content.
The -A directory's transitive contents are no longer routinely
fstat()d, only readdir() enumerated. This costs us the ability to
follow subdirectory symlinks, but it's a pretty big win otherwise.
No QA impact, only performance.
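A minimal C++ sketch of the two ideas above, assuming a POSIX system whose `readdir()` fills in `d_type` (Linux/glibc and the BSDs do). `enumerate_files`, `ArchiveInfo`, `ArchiveCache`, and the loader callback are hypothetical names for illustration, not pmwebd's actual code. Enumeration uses only `opendir()`/`readdir()` and classifies entries by `d_type`, so a symlink to a directory is reported as `DT_LNK` and is not descended into, which is the symlink trade-off described. The cache calls the loader once per path; on failure it records an empty, corrupt entry stamped with the startup time and never retries it.

```cpp
#include <dirent.h>
#include <ctime>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collect regular-file paths under `dir` recursively,
// using only opendir()/readdir() and never stat()/fstat().
// A symlink to a directory has d_type == DT_LNK, so it is NOT followed.
void enumerate_files(const std::string& dir, std::vector<std::string>& out)
{
    DIR* d = opendir(dir.c_str());
    if (d == nullptr)
        return;                         // unreadable or missing: nothing to add
    while (struct dirent* ent = readdir(d)) {
        std::string name = ent->d_name;
        if (name == "." || name == "..")
            continue;
        std::string path = dir + "/" + name;
        if (ent->d_type == DT_DIR)
            enumerate_files(path, out); // real subdirectory: recurse
        else if (ent->d_type == DT_REG)
            out.push_back(path);        // candidate archive file
        // DT_LNK, DT_UNKNOWN and other types are skipped in this sketch
    }
    closedir(d);
}

// Hypothetical per-archive cache entry.
struct ArchiveInfo {
    std::time_t fresh_as_of;            // time the entry is considered current as of
    std::vector<std::string> metrics;   // metric names; empty for a corrupt archive
    bool corrupt;
};

// Hypothetical loader: fills `metrics` and returns true if the archive opened.
using Loader = std::function<bool(const std::string&, std::vector<std::string>&)>;

class ArchiveCache {
public:
    ArchiveCache(Loader loader, std::time_t startup)
        : loader_(std::move(loader)), startup_(startup) {}

    // Invokes the loader only the first time a path is seen.
    const ArchiveInfo& lookup(const std::string& path)
    {
        auto it = cache_.find(path);
        if (it != cache_.end())
            return it->second;          // cached, including corrupt entries: no retry
        ArchiveInfo info{startup_, {}, false};
        if (loader_(path, info.metrics)) {
            info.fresh_as_of = std::time(nullptr);
        } else {
            info.metrics.clear();       // corrupt: fresh as of startup, no content
            info.corrupt = true;
        }
        return cache_.emplace(path, std::move(info)).first->second;
    }

private:
    Loader loader_;
    std::time_t startup_;
    std::map<std::string, ArchiveInfo> cache_;
};
```

Some filesystems report `DT_UNKNOWN` in `d_type`; a production version would need an `lstat()` fallback for those entries, which this sketch simply skips. The sketch also never re-reads healthy archives; real code would need some invalidation when an archive changes.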

@goodwinos commented:

Ran baseline (master) QA for group 'pmwebapi', and tests 1090 and 1388 failed. Then I built and installed with the patches from this PR and #428, and now tests 661 and 1042 fail. That's on the same machine, so maybe there is residual pmwebapi QA pollution between test runs. Needs more investigation.

@goodwinos commented:

The earlier report mentioning "... residual pmwebapi QA pollution between test runs" has proven correct: I've applied this patch and run QA on a different system, and it's fine now. So I'll be merging this PR shortly.

goodwinos added a commit to goodwinos/pcp that referenced this pull request Mar 7, 2018
…fche-merge

Merged from PR performancecopilot#429 - pmwebd: improve graphite archive-cache performance w.r.t. syscalls
@goodwinos merged commit 0b01b3c into performancecopilot:master Mar 7, 2018

@goodwinos commented:

Merged into upstream master.
