Separate populating the queue from working off the queue #19

Closed
wvanbergen opened this issue Jul 14, 2017 · 1 comment

@wvanbergen
Contributor

I think I'd like to split up populating the queue, and working off the queue, so we can put a filter/order step in between. This allows us to remove tests we don't want to run, or order the list so the slowest ones get run first.

  1. Retrieve list of tests. [Implementations: Minitest discovery, File, ...]
  2. Annotate tests [e.g. Get statistics about every test case]
  3. Filter/order based on annotations.
  4. Populate to worker queue [Implementations: Memory, Redis]

Then, the existing worker code can work off this queue.
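
A rough sketch of the four steps above in plain Ruby. Every name here (`discover_tests`, `annotate`, `DURATIONS`, `filter_and_order`, `populate`) is hypothetical and only stands in for the real implementations; none of it is ci-queue's API, and the Array at the end is just a placeholder for the Memory/Redis queue.

```ruby
# Hypothetical stand-ins for illustration; not part of ci-queue.

# Step 1: retrieve the list of tests (here: a hard-coded list).
def discover_tests
  ["UserTest#test_login", "CartTest#test_checkout", "FlakyTest#test_skip_me"]
end

# Step 2: annotate each test, e.g. with a recorded duration in seconds.
DURATIONS = {
  "UserTest#test_login" => 0.4,
  "CartTest#test_checkout" => 3.2,
  "FlakyTest#test_skip_me" => 1.0,
}.freeze

def annotate(tests)
  tests.map { |id| { id: id, duration: DURATIONS.fetch(id, 0.0) } }
end

# Step 3: drop unwanted tests, then order the rest slowest first.
def filter_and_order(annotated, excluded: [])
  annotated
    .reject { |t| excluded.include?(t[:id]) }
    .sort_by { |t| -t[:duration] }
end

# Step 4: populate the worker queue (an Array standing in for Memory/Redis).
def populate(annotated)
  annotated.map { |t| t[:id] }
end

queue = populate(
  filter_and_order(annotate(discover_tests), excluded: ["FlakyTest#test_skip_me"])
)
```

The worker side would not need to change in this sketch, since it still receives a flat list of test identifiers.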

  • For this to work, we probably need to wrap every test case in a class, rather than representing it as a string, e.g. CI::Queue::Entry. This class would hold all annotations.
  • Annotations can come from the input list. This could be a more elaborate file format. For the minitest discovery, we could annotate test methods in the Ruby code.
  • Annotations can also come from external sources as a separate step. E.g. retrieve the duration data for all tests in the list from an external database.
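
Something like the following is what I have in mind for the entry class. This is only a sketch: it lives under a made-up `Sketch` namespace rather than `CI::Queue`, and the `annotate` method, the `:tags` and `:duration` keys, and the `external_durations` hash (standing in for an external database) are all assumptions.

```ruby
module Sketch
  # Hypothetical wrapper around a test identifier that carries annotations.
  class Entry
    attr_reader :id, :annotations

    def initialize(id, annotations = {})
      @id = id
      @annotations = annotations
    end

    # Returns a new entry with the extra annotations merged in.
    def annotate(extra)
      self.class.new(id, annotations.merge(extra))
    end

    # Queues that only understand strings can still use the identifier.
    def to_s
      id
    end
  end
end

# Annotations that came with the input list.
entries = [
  Sketch::Entry.new("UserTest#test_login", { tags: [:fast] }),
  Sketch::Entry.new("CartTest#test_checkout"),
]

# Annotations added afterwards from an external source (a Hash stands in
# for the external database here).
external_durations = { "CartTest#test_checkout" => 3.2 }
entries = entries.map do |entry|
  entry.annotate({ duration: external_durations.fetch(entry.id, 0.0) })
end

# Order slowest first based on the annotation.
ordered = entries.sort_by { |entry| -entry.annotations[:duration] }
```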

@DazWorrall @casperisfine what do you think?

@casperisfine
Contributor

It's already split off, here's how we initialize the queue right now:

  Minitest.queue = CI::Queue::Redis.new(
    TestGlobs.all_tests(seed: ENV['BUILDKITE_COMMIT']),
    build_id: build_id,
    worker_id: ENV.fetch('BUILDKITE_PARALLEL_JOB'),
    redis: CIRedis.connection,
    **queue_config,
  )

All queue implementations just stupidly take a list of strings (test identifiers). If we want to filter out parts of the tests, it can and should be done beforehand.

The only downside is that, for resiliency purposes, any worker can potentially be elected master, which means they all compute the list of tests, but only one actually gets to populate the shared queue with it.

So if we were to have a costly way to reduce the list it would be a bit wasteful.
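
To illustrate the waste, here is a small in-memory simulation. It assumes election is a simple first-caller-wins claim; `FakeSharedStore`, `elect`, and `populate` are made-up names, and the real Redis-backed implementation may well work differently.

```ruby
# In-memory stand-in for the shared store; a real setup would use Redis.
class FakeSharedStore
  attr_reader :master, :queue

  def initialize
    @lock = Mutex.new
    @master = nil
    @queue = []
  end

  # Atomically claims mastership; only the first caller wins.
  def elect(worker_id)
    @lock.synchronize do
      @master ||= worker_id
      @master == worker_id
    end
  end

  def populate(tests)
    @lock.synchronize { @queue.concat(tests) }
  end
end

store = FakeSharedStore.new
computations = 0
count_lock = Mutex.new

workers = %w[worker-0 worker-1 worker-2].map do |worker_id|
  Thread.new do
    # Every worker computes the (possibly expensive) list up front...
    tests = ["ATest#test_a", "BTest#test_b"]
    count_lock.synchronize { computations += 1 }
    # ...but only the elected master pushes it to the shared queue.
    store.populate(tests) if store.elect(worker_id)
  end
end
workers.each(&:join)
```

Here the list is computed three times and used once, which is harmless for a cheap glob but adds up if the reduction step is expensive.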

what do you think?

IMO if we want to filter out tests, it should be the responsibility of a dedicated service that can hold state, record the results of all test runs, and refine its heuristics based on that data.

e.g. curl https://example.com/commits/abcdef092332432/tests.txt.
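
On the client side that could look roughly like this. The response format (one test identifier per line) and the helper names `parse_test_list` and `fetch_test_list` are assumptions, and example.com is just the placeholder host from above, not a real service.

```ruby
require "net/http"
require "uri"

# Turns a newline-separated response body into a list of test identifiers,
# dropping blank lines. The one-identifier-per-line format is an assumption.
def parse_test_list(body)
  body.each_line.map(&:strip).reject(&:empty?)
end

# Fetches the test list for a commit from the hypothetical service.
def fetch_test_list(commit, base_url: "https://example.com")
  uri = URI("#{base_url}/commits/#{commit}/tests.txt")
  response = Net::HTTP.get_response(uri)
  raise "unexpected status #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  parse_test_list(response.body)
end
```

The resulting array of strings could then be handed to the queue constructor in place of the locally computed list.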
