DEV-1075: add HTFeed::JobMetrics #113
Conversation
Lots of good stuff here as a starting point.
One thing to address right now is a couple of additional tests:

- something involving at least one `Stage` that verifies it records the expected metrics (not necessarily the exact values, just some nonzero value)
- an integration test involving `QueueRunner` and multiple `Stage`s that again verifies we see nonzero values for the expected metrics

There is also the issue of `jm` vs. `job_metrics`.
A couple of things to think about later on:

- consider whether metrics tracking belongs in `QueueRunner` vs. `Job`
- I would definitely recommend thinking more about specific metric names and labels, although that also needs to be informed by specific use cases; we can talk about that more
- we would definitely want metrics in `HTFeed::Stage::Collate` and probably its more specific operations (https://github.com/hathitrust/feed/blob/main/lib/HTFeed/Stage/Collate.pm#L78-L86)
- we may also want to add some metrics in `feed_internal`; one specific area might be this operation: https://github.com/hathitrust/feed_internal/blob/main/lib/HTFeed/PackageType/Google/Unpack.pm#L42
I would suggest making follow-up subtasks for these areas, though -- I don't think they need to block merging.
@@ -60,8 +60,6 @@ RUN echo "deb [signed-by=/usr/share/keyrings/hathitrust-archive-keyring.gpg] htt

RUN apt-get update && apt-get install -y grokj2k-tools

RUN cpan -f -i Net::AMQP::RabbitMQ
`Net::AMQP::RabbitMQ` was already included by `Makefile.pl`, which is the proper place to put required modules, so I removed it.
@@ -24,6 +24,7 @@ WriteMakefile(
'Mouse' => 0,
'Net::AMQP::RabbitMQ' => 0,
'Net::Prometheus' => 0,
'Prometheus::Tiny::Shared' => 0,
New required module, used in the new class `HTFeed::JobMetrics`.
@@ -55,35 +55,33 @@
}
Only cosmetic/whitespace changes in this file.
@@ -1,12 +1,10 @@
package HTFeed::Job;
Mostly cosmetic/whitespace changes in this file.
my $self = shift;
my $job = shift;

while($job){
A lot of metrics get recorded here.
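The loop above is where per-stage metrics are recorded. A rough sketch of that pattern (in Python for illustration only; the real code is Perl, and the class and metric names here are invented): each stage in the job sequence is timed, and per-stage `_items` and `_seconds` counters are incremented.

```python
import time
from collections import defaultdict

# Hypothetical sketch, not the actual HTFeed code: time each stage in
# the sequence and bump per-stage item and wall-clock-seconds counters.
class JobMetricsSketch:
    def __init__(self):
        self.counters = defaultdict(float)

    def inc(self, name, amount=1.0):
        self.counters[name] += amount

    def run_job_sequence(self, stages):
        for name, stage_fn in stages:
            start = time.monotonic()
            stage_fn()  # run the stage itself
            self.inc(f"{name}_items")
            self.inc(f"{name}_seconds", time.monotonic() - start)

metrics = JobMetricsSketch()
metrics.run_job_sequence([("download", lambda: None), ("unpack", lambda: None)])
print(metrics.counters["download_items"])  # 1.0
```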
my $self = {
    volume => undef,
    @_,
    has_run => 0,
    failed => 0,
    job_metrics => HTFeed::JobMetrics->get_instance
This makes it so that each object that inherits from `Stage` should be able to access `$self->{job_metrics}`, so we don't need to say `use HTFeed::JobMetrics` in every `Stage` descendant.
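A sketch of the singleton pattern this relies on (Python for illustration; the Perl `get_instance` is assumed to behave like this): every `Stage` descendant constructed after this change carries the same shared metrics handle.

```python
# Illustrative sketch only; names mirror the Perl, behavior is assumed.
class JobMetrics:
    _instance = None

    @classmethod
    def get_instance(cls):
        # Create the shared instance on first use, then reuse it.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

class Stage:
    def __init__(self, **kwargs):
        self.volume = kwargs.get("volume")
        self.has_run = 0
        self.failed = 0
        # Every Stage (and subclass) gets the same JobMetrics object.
        self.job_metrics = JobMetrics.get_instance()

class DownloadStage(Stage):
    pass

a, b = Stage(), DownloadStage()
print(a.job_metrics is b.job_metrics)  # True
```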
@@ -1,11 +1,11 @@
package HTFeed::Stage::DirectoryMaker;
Just re-ordering the use-declarations.
I thought I had re-reviewed this, but apparently the comment didn't show up. The gist was that all the changes looked good, especially the metrics names & units; I still wanted to see the integration tests:
I see you pushed some commits yesterday so I will take a look at those.
The integration test looks good to me. The way they work with `match` and `get_value` (fishing through the rendered output by regex) is the way I've done integration testing with metrics before, but since we're using those functions exclusively as test helpers (I think), I'd rather see them in `job_metrics.t`. I don't think that needs to block merging, so I'll add an issue for it (#123).
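The regex-fishing approach described here can be sketched as follows (Python for illustration; the sample metric names and the exact behavior of the helpers are assumptions). The idea is to scan the rendered Prometheus text format for a metric line and pull out its numeric value.

```python
import re

# Example rendered output in Prometheus text exposition format
# (metric names here are made up for illustration).
rendered = """\
# TYPE download_seconds counter
download_seconds 12.5
download_items 3
"""

def get_value(text, metric):
    # Find a line of the form "<metric> <value>" and return the value,
    # or None if the metric is not present.
    m = re.search(rf"^{re.escape(metric)}\s+(\S+)$", text, re.MULTILINE)
    return float(m.group(1)) if m else None

print(get_value(rendered, "download_items"))  # 3.0
```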
Originating ticket: https://hathitrust.atlassian.net/browse/DEV-1075
What?
This PR adds the class `HTFeed::JobMetrics`, which can be used to gather metrics on the various stages of a job, such as the number of items downloaded, the number of bytes downloaded, and the amount of time spent downloading (and likewise for the other job stages, not just download). `HTFeed::JobMetrics` specifies which kinds of metrics can be used, and it knows how to update, pretty-print, and clear metrics. It has its own minimal CLI (see `perl lib/HTFeed/JobMetrics.pm --usage`).

Why?
We want a better understanding of how much time is spent on different parts of the Ingest workflow.
How?
`--help` or `--usage` should provide a short usage page.

Each measured job reports how long it took to `HTFeed::JobMetrics`, which passes it on to `Prometheus::Tiny::Shared`, which saves the data point in a `Hash::SharedMem`.

The `Hash::SharedMem` needs to live somewhere on the filesystem. For now, it defaults to a subdir of `/tmp/`, but that can be changed by setting `$ENV{'HTFEED_JOBMETRICS_DATA_DIR'}` to some other path.

Metrics will generally change when jobs are run.
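The data-directory resolution just described can be sketched like this (Python for illustration; the default subdirectory name is invented, and only the `HTFEED_JOBMETRICS_DATA_DIR` override comes from this PR): use the environment variable if set, otherwise fall back to a subdirectory of `/tmp`.

```python
import os
import tempfile

def metrics_data_dir():
    # Default is a hypothetical subdir of the system temp dir; the env
    # var HTFEED_JOBMETRICS_DATA_DIR overrides it when set.
    default = os.path.join(tempfile.gettempdir(), "job_metrics_data")
    return os.environ.get("HTFEED_JOBMETRICS_DATA_DIR", default)

os.environ.pop("HTFEED_JOBMETRICS_DATA_DIR", None)
print(metrics_data_dir().startswith(tempfile.gettempdir()))  # True

os.environ["HTFEED_JOBMETRICS_DATA_DIR"] = "/data/metrics"
print(metrics_data_dir())  # /data/metrics
```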
For ease of testing, they can be viewed/manipulated directly via the command line. Get all metrics with:

Limitations
Note that if you call from outside docker and have not set `$ENV{'HTFEED_JOBMETRICS_DATA_DIR'}`, then the metrics data will default to being stored in the container's `/tmp`, and that data will not persist between commands.

Far from every operation that could be measured is being measured, but the size of this PR is getting out of hand, and the missing pieces can be added incrementally after we have proven whether this concept works in production.
Tests
`docker compose run --rm test perl t/job_metrics.t` to run just the new tests.

`docker compose run --rm test` to run all tests.

Where?
These are some of the main changes per file:
- `lib/HTFeed/JobMetrics.pm`: new class, the main innovation in this PR.
- `t/job_metrics.t`: tests for the new class.
- `lib/HTFeed/QueueRunner.pm`: uses JobMetrics to observe `_items` and `_seconds` metrics for each job stage in `run_job_sequence` (this is a big and important part of this PR, so it needs to work).
- `Makefile.pl`: getting the required Perl module `Prometheus::Tiny::Shared`.
- `lib/HTFeed/Stage.pm`: giving a `JobMetrics` object (`$self->{job_metrics}`) to all `Stage` objects and their heirs.

The rest of the "Files changed" are about using `HTFeed::JobMetrics`, plus some cleanup/cosmetics.