Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEV-1075: add HTFeed::JobMetrics #113

Merged
merged 1 commit into from
May 10, 2024
Merged

DEV-1075: add HTFeed::JobMetrics #113

merged 1 commit into from
May 10, 2024

Conversation

mwarin
Copy link
Contributor

@mwarin mwarin commented Mar 20, 2024

Originating ticket: https://hathitrust.atlassian.net/browse/DEV-1075

What?

This PR adds the class HTFeed::JobMetrics that can be used to gather metrics on the various stages of a job, such as number of items downloaded, number of bytes downloaded and amount of time spent downloading (as well as the other job stages, not just download).

HTFeed::JobMetrics specifies which kinds of metrics can be used, knows how to update them, pretty print and clear metrics. It has its own minimal CLI (see perl lib/HTFeed/JobMetrics.pm --usage)

Why?

We want a better understanding of how much time is spent on different parts of the Ingest workflow.

How?

--help or --usage should provide a short usage page.

Each measured job reports how long it took to HTFeed::JobMetrics, which passes it on to Prometheus::Tiny::Shared which saves the data point in a Hash::SharedMem.

The Hash::SharedMem needs to live somewhere on the filesystem. For now, it defaults to a subdir of /tmp/:

$self->{file} = "/tmp/HTFeed::JobMetrics::Data";
if (defined $ENV{'HTFEED_JOBMETRICS_DATA_DIR'}) {
    $self->{file} = $ENV{'HTFEED_JOBMETRICS_DATA_DIR'};
}
$self->{prom} = Prometheus::Tiny::Shared->new(filename => $self->{file});

... but that can be changed by setting $ENV{'HTFEED_JOBMETRICS_DATA_DIR'} to some other path.

Metrics will generally change when jobs are run.
For ease of testing, they can be viewed/manipulated directly via commandline:

$ docker compose run --rm test bash

perl lib/HTFeed/JobMetrics.pm --clear
  Clearing data...
  Data cleared.
  No operation defined, exiting.

perl lib/HTFeed/JobMetrics.pm --operation match --value collate
  running operation match ...
  # TYPE ingest_collate_bytes counter
  # TYPE ingest_collate_items counter
  # TYPE ingest_collate_seconds counter

perl lib/HTFeed/JobMetrics.pm --operation inc --metric ingest_collate_items
  running operation inc ...

perl lib/HTFeed/JobMetrics.pm --operation add --metric ingest_collate_bytes --value 2460
  running operation add ...

perl lib/HTFeed/JobMetrics.pm --operation match --value collate
  running operation match ...
  # TYPE ingest_collate_bytes counter
  ingest_collate_bytes 2460
  # TYPE ingest_collate_items counter
  ingest_collate_items 1
  # TYPE ingest_collate_seconds counter

Get all metrics with:

$ docker compose run --rm test perl lib/HTFeed/JobMetrics.pm --pretty

Limitations

Note that if you call from outside docker, and have not set $ENV{'HTFEED_JOBMETRICS_DATA_DIR'}, then the metrics data will default to being stored in the container's /tmp, and that data will not persist between commands.

Far from every operation that could be measured is being measured, but the size of this PR is getting out of hand and those missing pieces could be added incrementally after we've proven whether this concept works in production.

Tests

docker compose run --rm test perl t/job_metrics.t to run just the new tests.
docker compose run --rm test to run all tests.

Where?

These are some of the main changes per file:

  • lib/HTFeed/JobMetrics.pm new class, the main innovation in this PR.
  • t/job_metrics.t tests for the new class.
  • lib/HTFeed/QueueRunner.pm uses JobMetrics to observe _items and _seconds metrics for each job stage in run_job_sequence (this is a big and important part of this PR so this needs to work)
  • Makefile.pl getting required perl module Prometheus::Tiny::Shared.
  • lib/HTFeed/Stage.pm giving a JobMetrics object ($self->{job_metrics}) to all Stage objects and their heirs.

The rest of the Files changed are about using HTFeed::JobMetrics, plus some cleanup/cosmetics.

lib/HTFeed/JobMetrics.pm Outdated Show resolved Hide resolved
lib/HTFeed/JobMetrics.pm Outdated Show resolved Hide resolved
lib/HTFeed/JobMetrics.pm Outdated Show resolved Hide resolved
lib/HTFeed/JobMetrics.pm Outdated Show resolved Hide resolved
@mwarin mwarin marked this pull request as ready for review April 18, 2024 13:45
@mwarin mwarin changed the title wip PR for DEV-1075 DEV-1075: add HTFeed::JobMetrics Apr 18, 2024
@aelkiss aelkiss self-requested a review April 23, 2024 14:00
Copy link
Member

@aelkiss aelkiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of good stuff here as a starting point.

One thing to address right now is a couple additional tests:

  • something involving at least one Stage that verifies that it records the expected metrics (not necessarily the exact values, just some nonzero value)
  • an integration test involving QueueRunner and multiple Stages and again verifies we see non-zero values for expected metrics.

as well as the issue of jm vs job_metrics.

A couple of things to think about later on:

I would suggest making follow-up subtasks for these areas, though -- I don't think they need to block merging.

lib/HTFeed/Job.pm Show resolved Hide resolved
lib/HTFeed/JobMetrics.pm Outdated Show resolved Hide resolved
lib/HTFeed/JobMetrics.pm Show resolved Hide resolved
lib/HTFeed/JobMetrics.pm Show resolved Hide resolved
lib/HTFeed/JobMetrics.pm Show resolved Hide resolved
lib/HTFeed/QueueRunner.pm Show resolved Hide resolved
lib/HTFeed/Stage.pm Outdated Show resolved Hide resolved
lib/HTFeed/Stage/Download.pm Outdated Show resolved Hide resolved
lib/HTFeed/Stage/ImageRemediate.pm Outdated Show resolved Hide resolved
lib/HTFeed/Stage/Pack.pm Outdated Show resolved Hide resolved
@@ -60,8 +60,6 @@ RUN echo "deb [signed-by=/usr/share/keyrings/hathitrust-archive-keyring.gpg] htt

RUN apt-get update && apt-get install -y grokj2k-tools

RUN cpan -f -i Net::AMQP::RabbitMQ
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Net::AMQP::RabbitMQ was already included by Makefile.pl, which is the proper place to put required modules, so I removed it.

@@ -24,6 +24,7 @@ WriteMakefile(
'Mouse' => 0,
'Net::AMQP::RabbitMQ' => 0,
'Net::Prometheus' => 0,
'Prometheus::Tiny::Shared' => 0,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New required module, used in new class HTFeed::JobMetrics.

@@ -55,35 +55,33 @@
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only cosmetic/whitespace changes in this file.

@@ -1,12 +1,10 @@
package HTFeed::Job;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly cosmetic/whitespace changes in this file.

my $self = shift;
my $job = shift;

while($job){
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of metrics get recorded here.

my $self = {
volume => undef,
@_,
has_run => 0,
failed => 0,
job_metrics => HTFeed::JobMetrics->get_instance
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes it so that each object that inherits from Stage should be able to access $self->{job_metrics}, so we don't need to say use HTFeed::JobMetrics in every Stage-descendant.

@@ -1,11 +1,11 @@
package HTFeed::Stage::DirectoryMaker;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just re-ordering the use-declarations.

@mwarin mwarin requested a review from aelkiss May 7, 2024 13:17
@aelkiss
Copy link
Member

aelkiss commented May 10, 2024

I thought I had re-reviewed this, but apparently the comment didn't show up. The gist was that all the changes looked good, especially the metrics names & units; I still wanted to see the integration tests:

One thing to address right now is a couple additional tests:

  • something involving at least one Stage that verifies that it records the expected metrics (not necessarily the exact values, just some nonzero value)
  • an integration test involving QueueRunner and multiple Stages and again verifies we see non-zero values for expected metrics.

I see you pushed some commits yesterday so I will take a look at those.

Copy link
Member

@aelkiss aelkiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The integration test looks good to me. The way they work with match and get_value (fishing through the rendered output by regex) is the way I've done integration testing with metrics before, but since we're using those functions exclusively as test helpers (I think) I'd rather see them in job_metrics.t. I don't think that needs to block merging, so I'll add an issue for it (#123)

@mwarin mwarin merged commit 55eda8f into main May 10, 2024
1 check passed
@mwarin mwarin deleted the DEV-1075 branch May 10, 2024 13:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants