All examples skipped after exception in before: suite hook in queue mode #196
Comments
Hi @amanfredi, I'm wondering how to reproduce it. I did:

    # spec_helper.rb
    KnapsackPro::Hooks::Queue.before_queue do |queue_id|
      raise 'a'
    end

Indeed, Knapsack Pro consumes tests from the Queue API, and does it fast. It does not execute the tests because RSpec could not start them due to the failure. The tests failed. The CI build should finish faster when tests are not executed, so I'm not sure why you experienced a timeout from CI.

Do you mean that you fixed the bug in the code and started a new build (attempted some kind of retry), and it was split using the cached distribution (fixed queue split), so that a single CI node got too many tests (because they were cached on the API side)? If you fixed a bug in the code, I would expect a new git commit hash, so the fixed queue split (cached distribution on the API side) would not be used.

Could you give more details on how to reproduce the problem? What did you do?
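The behavior discussed above (a node pulling tests from the queue without executing them) can be illustrated with a toy simulation. This is only a sketch under strong assumptions: `simulate`, the node names, and the timings are hypothetical, fetching is treated as instantaneous, and nothing here mirrors Knapsack Pro's actual implementation.

```ruby
# Toy model only: all names are hypothetical and do not mirror Knapsack Pro
# internals. Each node takes a batch from a shared queue whenever it is free.
# A healthy node spends time running the batch; a broken node (suite cannot
# start) spends none, so it is always the next node free to fetch.
def simulate(test_files, nodes:, batch_size:, seconds_per_file:, broken_nodes: [])
  queue = test_files.dup
  allocation = nodes.to_h { |node| [node, []] }
  busy_until = nodes.to_h { |node| [node, 0.0] }

  until queue.empty?
    # The node that becomes free first fetches the next batch.
    node = busy_until.min_by { |_name, time| time }.first
    batch = queue.shift(batch_size)
    allocation[node].concat(batch)
    # Broken nodes "finish" instantly because nothing is executed.
    busy_until[node] += batch.size * seconds_per_file unless broken_nodes.include?(node)
  end

  allocation
end

files = (1..40).map { |i| "spec/file_#{i}_spec.rb" }
nodes = %w[node_0 node_1 node_2 node_3]

healthy = simulate(files, nodes: nodes, batch_size: 2, seconds_per_file: 10)
one_broken = simulate(files, nodes: nodes, batch_size: 2, seconds_per_file: 10,
                      broken_nodes: ['node_0'])

# Print how many files each node was assigned in each scenario.
p healthy.transform_values(&:size)
p one_broken.transform_values(&:size)
```

In this model the healthy run assigns 10 files to each node, while the run with a broken `node_0` assigns all 40 files to that node. If that allocation is then cached and replayed on a retry, the retried node has to execute far more tests than it normally would, which is consistent with the timeout described in this issue.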
For future reference: to reproduce the problem, raise an exception outside of a test example.

    describe 'abc' do
      raise 'error'

      it do
        expect(true).to be true
      end
    end

When tests fail, new tests are fetched from the Queue API. All tests fail, so a lot of tests are fetched quickly from the Queue API. If you retry a CI node (and you use

Problem:

Possible solution. Ensure the DB is up and running.

Example for GitHub Actions:

    services:
      postgres:
        image: postgres:10.8
        env:
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: ""
          POSTGRES_DB: postgres
        ports:
          - 5432:5432
        # needed because the postgres container does not provide a healthcheck
        # tmpfs makes DB faster by using RAM
        options: >-
          --mount type=tmpfs,destination=/var/lib/postgresql/data
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

Maybe this will be helpful:

If the problem happens again, you can create a new commit. This starts a new queue, so the tests are distributed in a new order.
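The advice to ensure the DB is up and running can also be checked from Ruby before the suite loads. The following is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper (not part of Knapsack Pro or RSpec), and the host and timeout values are assumptions.

```ruby
require 'socket'

# Hypothetical helper: returns true as soon as a TCP connection to host:port
# succeeds, or false once `timeout` seconds have elapsed without one.
def wait_for_port(host, port, timeout: 30, interval: 0.5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    begin
      # Socket.tcp closes the socket after the block; we only need to know
      # that the connection could be established.
      Socket.tcp(host, port, connect_timeout: 1) { |_socket| return true }
    rescue SystemCallError, SocketError, IOError
      # Connection refused, timed out, or host not resolvable yet: retry.
    end
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Possible usage near the top of spec_helper.rb (host is an assumption; the
# port matches the service configuration above):
#   abort 'Postgres is not reachable' unless wait_for_port('localhost', 5432)
```

An open port only shows that something is listening, not that the database accepts queries, so this complements rather than replaces the container health check.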
We run 40 parallel CI runners using knapsack pro queue mode, which normally complete in about 10 minutes each.
I recently observed that after retrying certain failed test nodes, the test runner would consistently time out. I realized this was because the runner had fetched more tests from the knapsack queue than it could possibly execute, and that distribution was persisted with the fixed queue split & retry configuration. The underlying cause of this is that the first failure did not fail the entire run, but instead caused knapsack to continue fetching examples from the queue and then not actually execute them.
See the following log for examples: