Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instrument Storage Layer #9958

Merged
merged 11 commits into from
Apr 25, 2024
Merged

Instrument Storage Layer #9958

merged 11 commits into from
Apr 25, 2024

Conversation

FahadBSyed
Copy link
Contributor

@FahadBSyed FahadBSyed commented Apr 17, 2024

This is a first pass at instrumenting the storage layer. There's a few high level changes you'll generally see throughout this PR:

  1. Child contexts have been passed throughout. Particular areas of focus are distributed systems like the task service and low level constructs like file set, index, and chunk readers & writers.
  2. Metrics have been added. The metrics vary based on what is relevant, but generally we're reporting bytes read/written as well as indicators like cache misses vs cache hits, indices skipped vs indices read, and so forth.

I'm sure I've missed areas that we want to instrument, but that's fine. We can add those in a follow up PR.
Metrics are visible when running at a 'debug' level.

Here are some snippets of metrics:

{
  "severity": "debug",
  "time": "2024-04-18T19:30:24.935036734Z",
  "logger": "grpc.pfs_v2.API/ModifyFile.fileSetWriter.indexWriter(deletive-index-writer)",
  "caller": "index/writer.go:85",
  "message": "meter: bytes",
  "service": "pfs_v2.API",
  "method": "ModifyFile",
  "command": [
    "pachctl put file images@master -f liberty.jpg"
  ],
  "peer": "127.0.0.1:48528",
  "type": "int",
  "delta": 35,
  "meter": "bytes"
}
{
  "severity": "debug",
  "time": "2024-04-18T19:30:24.935196695Z",
  "logger": "grpc.pfs_v2.API/ModifyFile.fileSetWriter.batcher(chunk-batcher).taskChain.upload",
  "caller": "chunk/uploader.go:186",
  "message": "meter: tx_bytes",
  "service": "pfs_v2.API",
  "method": "ModifyFile",
  "command": [
    "pachctl put file images@master -f liberty.jpg"
  ],
  "peer": "127.0.0.1:48528",
  "type": "int",
  "delta": 133858,
  "meter": "tx_bytes"
}
{
  "severity": "debug",
  "time": "2024-04-18T19:30:24.954077375Z",
  "logger": "grpc.pfs_v2.API/ModifyFile.fileSetWriter.indexWriter(additive-batched-index-writer)",
  "caller": "index/writer.go:85",
  "message": "meter: bytes",
  "service": "pfs_v2.API",
  "method": "ModifyFile",
  "command": [
    "pachctl put file images@master -f liberty.jpg"
  ],
  "peer": "127.0.0.1:48528",
  "type": "int",
  "delta": 155,
  "meter": "bytes"
}

src/internal/storage/chunk/batcher.go Outdated Show resolved Hide resolved
@@ -14,8 +16,10 @@ type Renewer struct {
}

func NewRenewer(ctx context.Context, tr track.Tracker, name string, ttl time.Duration) *Renewer {
ctx = pctx.Child(ctx, "trackerRenewer", pctx.WithCounter("renewals", 0))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You only need a WithCounter in the context if you want to do aggregation. If you're OK logging at every call to Inc, then you don't need to bother with this. (With no options, the value will be written to the logs at most every 10 seconds.)

src/internal/storage/chunk/uploader.go Outdated Show resolved Hide resolved
src/internal/storage/chunk/uploader.go Outdated Show resolved Hide resolved
@FahadBSyed FahadBSyed changed the title Pfs 221 Instrument Storage Layer Apr 18, 2024
Copy link

codecov bot commented Apr 19, 2024

Codecov Report

Attention: Patch coverage is 97.70115% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 58.22%. Comparing base (9569e42) to head (1abe9a6).
Report is 6 commits behind head on master.

❗ Current head 1abe9a6 differs from pull request most recent head 95386fc. Consider uploading reports for the commit 95386fc to get more accurate results

Files Patch % Lines
src/internal/storage/chunk/renewer.go 50.00% 1 Missing ⚠️
src/internal/storage/fileset/index/reader.go 93.75% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #9958   +/-   ##
=======================================
  Coverage   58.21%   58.22%           
=======================================
  Files         614      614           
  Lines       75461    75527   +66     
=======================================
+ Hits        43931    43976   +45     
- Misses      30978    30998   +20     
- Partials      552      553    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@brycemcanally brycemcanally left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@FahadBSyed FahadBSyed merged commit 84a3c0a into master Apr 25, 2024
19 of 20 checks passed
jrockway pushed a commit that referenced this pull request Apr 29, 2024
This is a first pass at instrumenting the storage layer. There's a few
high level changes you'll generally see throughout this PR:
1. Child contexts have been passed throughout. Particular areas of focus
are distributed systems like the task service and low level constructs
like file set, index, and chunk readers & writers.
2. Metrics have been added. The metrics vary based on what is relevant,
but generally we're reporting bytes read/written as well as indicators
like cache misses vs cache hits, indices skipped vs indices read, and so
forth.

I'm sure I've missed areas that we want to instrument, but that's fine.
We can add those in a follow up PR.
Metrics are visible when running at a 'debug' level.

Here are some snippets of metrics:
```json
{
  "severity": "debug",
  "time": "2024-04-18T19:30:24.935036734Z",
  "logger": "grpc.pfs_v2.API/ModifyFile.fileSetWriter.indexWriter(deletive-index-writer)",
  "caller": "index/writer.go:85",
  "message": "meter: bytes",
  "service": "pfs_v2.API",
  "method": "ModifyFile",
  "command": [
    "pachctl put file images@master -f liberty.jpg"
  ],
  "peer": "127.0.0.1:48528",
  "type": "int",
  "delta": 35,
  "meter": "bytes"
}
{
  "severity": "debug",
  "time": "2024-04-18T19:30:24.935196695Z",
  "logger": "grpc.pfs_v2.API/ModifyFile.fileSetWriter.batcher(chunk-batcher).taskChain.upload",
  "caller": "chunk/uploader.go:186",
  "message": "meter: tx_bytes",
  "service": "pfs_v2.API",
  "method": "ModifyFile",
  "command": [
    "pachctl put file images@master -f liberty.jpg"
  ],
  "peer": "127.0.0.1:48528",
  "type": "int",
  "delta": 133858,
  "meter": "tx_bytes"
}
{
  "severity": "debug",
  "time": "2024-04-18T19:30:24.954077375Z",
  "logger": "grpc.pfs_v2.API/ModifyFile.fileSetWriter.indexWriter(additive-batched-index-writer)",
  "caller": "index/writer.go:85",
  "message": "meter: bytes",
  "service": "pfs_v2.API",
  "method": "ModifyFile",
  "command": [
    "pachctl put file images@master -f liberty.jpg"
  ],
  "peer": "127.0.0.1:48528",
  "type": "int",
  "delta": 155,
  "meter": "bytes"
}

```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants