
Use a queue for offline jobs #1240

Closed
Tracked by #6657
crowbot opened this issue Nov 27, 2013 · 7 comments

Comments


crowbot commented Nov 27, 2013

Possible job types to include:

Possible advantages:

  • Separation of concerns - web app interface versus more resource intensive processes
  • Closer to realtime updates of the index
  • Resilience to memory leaks, bloat
  • One coherent system rather than multiple cron-jobs and daemons

crowbot commented Jan 9, 2014

Not spawning a new Rails app each time a mail comes in could improve resource usage on smaller hosts.

@garethrees

Rails 4.2 (#2968) has a built-in framework for integrating background jobs.
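For context, a minimal ActiveJob under that framework looks something like this (the class name, queue name and argument are illustrative placeholders, not anything in our codebase):

```ruby
# Illustrative sketch of the ActiveJob API only - nothing here exists in
# Alaveteli yet. The job runs on whatever queue backend is configured.
class ExampleCleanupJob < ActiveJob::Base
  queue_as :default

  def perform(record_id)
    # Resource-intensive work happens here, outside the request cycle.
  end
end

# Enqueue from anywhere in the app; the backend picks it up later.
ExampleCleanupJob.perform_later(42)
```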

@garethrees

Rails 6.1 (#6011) has the ability to destroy associated records asynchronously which would help prevent timeouts when managing WDTK.


gbp commented Jun 28, 2022

To help third-party installs, @garethrees wanted to look at using a PG-backed queue rather than Sidekiq/Redis - this seems like a good option: https://github.com/bensheldon/good_job
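If we went with GoodJob, the switch is mostly configuration, since it stores jobs in the Postgres database we already run (sketch based on GoodJob's README):

```ruby
# Gemfile
gem "good_job"

# config/application.rb (or a per-environment config file)
config.active_job.queue_adapter = :good_job
```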

@garethrees

https://github.com/que-rb/que was also something I'd seen that seems to be pretty well maintained.

@garethrees

Our initial stab at this should be quite basic: the main aim is to get the job infrastructure in place and running, with one concrete ActiveJob implementation so that we actually gain some benefit from the setup work.

I think we have 4 main tasks.

  1. Decide on a QueueAdapter
  2. Add ActiveJob configuration (ensure we handle errors, etc)
  3. Add infrastructure to run the backend processor (unix daemon?)
  4. Add a concrete implementation of ActiveJob

2 and 3 don't need too much discussion at this point, so I'll focus on 1 and 4.
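For 2, it's worth noting that ActiveJob ships error-handling hooks (`retry_on`, `discard_on`) that cover a lot of what we'd otherwise hand-roll. A hedged sketch - the job class is a placeholder:

```ruby
# Placeholder job class showing ActiveJob's built-in error handling.
class ExampleIndexJob < ApplicationJob
  # Retry transient failures a few times with a delay between attempts.
  retry_on ActiveRecord::Deadlocked, wait: 5.seconds, attempts: 3

  # Drop jobs whose arguments no longer deserialize (e.g. record deleted).
  discard_on ActiveJob::DeserializationError

  def perform(record)
    # ...
  end
end
```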

To reiterate, the main focus is getting the ActiveJob infrastructure in place and running so that we can more easily add jobs in future; we're not going to migrate everything to a job in one shot.

1. Decide on a QueueAdapter

As previously mentioned, I'm keen to avoid additional infrastructure costs for re-users who don't have in-house sysadmins. Sidekiq requires Redis, which adds another thing to configure, secure, run and maintain.

It's also worth noting that Sidekiq does not process the queue serially. Is this a problem? IDK. Here's a situation:

  • Event 1: A request is sent
  • Event 2: A response is received
  • Event 3: A classification is made

Would it be bad if the order of indexing ended up being the following?

  • Event 1
  • Event 3
  • Event 2

Do either of the postgres-based backends (que or good_job) allow more flexibility here?
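To illustrate the ordering concern: a single worker draining one queue processes events strictly in enqueue order, which is the property Sidekiq's concurrent workers don't guarantee. A plain-Ruby sketch, not tied to any particular adapter:

```ruby
# A single worker thread draining one queue preserves enqueue order.
events    = ["request sent", "response received", "classification made"]
queue     = Queue.new
processed = []

worker = Thread.new do
  # Queue#pop returns nil once the queue is closed and empty.
  while (event = queue.pop)
    processed << event # stand-in for "index this event in Xapian"
  end
end

events.each { |e| queue << e }
queue.close
worker.join

processed # => ["request sent", "response received", "classification made"]
```

In ActiveJob terms this maps to routing ordering-sensitive jobs to one named queue and running that queue with a single worker thread; I believe both que and good_job expose per-queue concurrency settings, but we'd need to check the details.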

4. Add a concrete implementation of ActiveJob

Association Destruction

An easy-to-implement option would be to add dependent: :destroy_async for some key classes so that we get jobs flowing through the system. Are there records we destroy on a fairly regular basis that would benefit from this?
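As a sketch, the change is one option per association (the association here is illustrative; we'd need to pick candidates from the schema):

```ruby
class InfoRequest < ApplicationRecord
  # Before: dependent: :destroy runs inline, inside the web request.
  # After (Rails 6.1+): destruction is enqueued as an ActiveJob and
  # runs in the background, avoiding request timeouts.
  has_many :incoming_messages, dependent: :destroy_async
end
```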

Xapian Indexing

The ideal improvement would be to run ActsAsXapian indexing jobs via ActiveJob, but this has the potential to be a significant effort which may exceed the time we have available. If we explore this option, we should create an outline plan of how we'll achieve it before starting programming work. Of course this will require a little spiking, but we want to have an understanding of what we think it will take before committing to completing it.

We already have xapian_create_job, which we could rewrite to enqueue an ActiveJob instead.

I think the difficulty will be in how the record actually gets reindexed. All this logic is currently wrapped up in ActsAsXapian.update_index, which expects to iterate through the current ActsAsXapianJob records rather than having a single event to index. We do have xapian_index, which might be the thing to call from the job, but we need to be careful of the interaction with full indexing (i.e. "aborting incremental index update while full index rebuild happens").
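A very rough shape for that might be the following - everything here is hypothetical (XapianIndexJob doesn't exist, and the real xapian_create_job signature and guard logic would need checking):

```ruby
# Hypothetical sketch only: enqueue an ActiveJob instead of creating an
# ActsAsXapianJob row directly.
class XapianIndexJob < ApplicationJob
  def perform(model)
    # Must not run while a full index rebuild is in progress ("aborting
    # incremental index update while full index rebuild happens"), so a
    # guard/lock check belongs here before indexing.
    model.xapian_index
  end
end
```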

Expiring Events

This is all sounding quite complicated, so perhaps instead we move InfoRequest#reindex_request_events to be background-job based as that's something that often times out in a request cycle… something like:

class InfoRequest
  def reindex_request_events
    InfoRequest::ReindexRequestEventsJob.perform_later(self)
  end
end

class InfoRequest::ReindexRequestEventsJob < ApplicationJob
  def perform(info_request)
    info_request.info_request_events.find_each(&:xapian_mark_needs_index)
  end
end

@garethrees

Fixed by #7575. Moving specific functionality into background jobs can be ticketed independently.
