pre-commit.ci timing out when passes locally #62

Closed
matthewfeickert opened this issue Apr 27, 2021 · 7 comments
@matthewfeickert

matthewfeickert commented Apr 27, 2021

👋 Hi. pre-commit.ci is failing with a timeout for PR scikit-hep/pyhf#1403 even though everything passes locally in a fresh virtual environment (and the hooks have also passed on pre-commit.ci before timing out on later runs).

c.f. https://results.pre-commit.ci/repo/github/118789569

[screenshot: pre-commit_failure]

and for a particular failing run

[screenshot: run_failure]

This is probably just a transitory issue, but I thought I'd still report it.

cc @lukasheinrich @kratsg


Also, here is an example backing up my claim that pre-commit passes locally:

(base) $ git checkout feat/clean-public-api-all  # Branch for the PR that is failing
(base) $ pyenv virtualenv 3.8.7 test-pre-commit
(base) $ pyenv activate test-pre-commit 
(test-pre-commit) $ pip install --upgrade pip setuptools wheel
(test-pre-commit) $ pip install pre-commit
(test-pre-commit) $ pre-commit run --all-files
Check for added large files..............................................Passed
Check for case conflicts.................................................Passed
Check for merge conflicts................................................Passed
Check for broken symlinks................................................Passed
Check JSON...............................................................Passed
Check Yaml...............................................................Passed
Check Toml...............................................................Passed
Check Xml................................................................Passed
Debug Statements (Python)................................................Passed
Fix End of Files.........................................................Passed
Mixed line ending........................................................Passed
Fix requirements.txt.................................(no files to check)Skipped
Trim Trailing Whitespace.................................................Passed
black....................................................................Passed
blacken-docs.............................................................Passed
flake8...................................................................Passed
pyupgrade................................................................Passed
nbqa-black...............................................................Passed
nbqa-pyupgrade...........................................................Passed
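
For reference, individual hooks can also be timed in the same environment to see whether one stage is unusually slow locally; the hook IDs here are assumed to match the hook names shown above, and local timings will of course differ from the pre-commit.ci runners:

(test-pre-commit) $ time pre-commit run nbqa-pyupgrade --all-files
(test-pre-commit) $ time pre-commit run flake8 --all-files
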
@asottile
Member

very strange, timings do look elevated today according to my metrics -- let me look into whether something changed

matthewfeickert changed the title from "pre-commit.ci timing out when passes locally" to "pre-commit.ci timing out on nbqa-pyupgrade when passes locally" on Apr 27, 2021
matthewfeickert changed the title from "pre-commit.ci timing out on nbqa-pyupgrade when passes locally" back to "pre-commit.ci timing out when passes locally" on Apr 27, 2021
@matthewfeickert
Author

I doubt this matters much, but the three timeouts shown above are happening at different stages:

@matthewfeickert
Author

matthewfeickert commented Apr 27, 2021

@asottile
Member

yeah the queue makes sense, I was kicking off a bunch of runs at the same time while the hosts were cycling.

there were no code changes during the period that led to the higher timeouts; I suspect one of the hosts got a noisy neighbor in AWS.


I'll be putting in some automated alerts to catch this particular failure mode in the future -- thanks for the report!

I'm going to send a message to the mailing list to make sure others know about this and follow up with a postmortem once I'm comfortable that it is resolved

I'll be watching this closely over the next couple of hours to make sure that fixed it

I'll also be sending out a postmortem entry to the mailing list

@matthewfeickert
Author

Awesome. :) Many thanks for this report and also for being ⚡ fast in your feedback and help!

@asottile
Member

marking this all clear, run times have returned to normal after mitigation


postmortem

root cause

unknown

  • no code changes occurred before or after the incident or to mitigate the incident
  • observable host level metrics (cpu / io) were not elevated on any of the affected hosts

what went well

  • run-level metrics were extremely helpful for identifying the affected timeframe and validating the fix
  • host rotation was quick and easy (already scripted)
  • helpful issue created by @matthewfeickert alerting to the problem

what didn't go well

  • detection: slow and entirely manual
  • prevention: unknown what caused the actual root issue

follow-up

  • detection: add automated alerting for elevated timing (a rough sketch of such a check follows below)
  • prevention: investigate larger ec2 instance sizes for performance and to lessen "noisy neighbor" effects
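
For illustration, a minimal sketch of the kind of check such an automated alert could run; the durations.log input, the 300-second threshold, and the alert address are all placeholders rather than pre-commit.ci's actual setup:

# hypothetical input: durations.log, one run duration in whole seconds per line
threshold=300
p95=$(sort -n durations.log | awk '{a[NR]=$1} END {if (NR==0) {print 0} else {i=int(NR*0.95); if (i<1) i=1; print a[i]}}')
if [ "$p95" -gt "$threshold" ]; then
    # placeholder notification channel; swap in whatever alerting tool is actually in use
    echo "p95 pre-commit.ci run time ${p95}s exceeds ${threshold}s" | mail -s "timing alert" alerts@example.com
fi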

