Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

300k batched email sends with Sidekiq take hours to finish #4193

Closed
theianjones opened this issue Jun 13, 2019 · 2 comments
Closed

300k batched email sends with Sidekiq take hours to finish #4193

theianjones opened this issue Jun 13, 2019 · 2 comments

Comments

@theianjones
Copy link

Ruby version: 2.4.1
Sidekiq: 5.2.2
Pro: 4.0.4

We are sending out roughly 300k ActionMailer emails through via Sidekiq::Batches. These batches take 2-3 hours to complete. When they are finished, the Sidekiq::Batch::Status still reports that is_complete is false and there are negative pending jobs. No errors are thrown but this whole process takes way too long. How can we speed this up?

# configs/initializers/sidekiq.rb

require 'sidekiq/pro/metrics'
Sidekiq::Client.reliable_push! unless Rails.env.test?
Sidekiq::Extensions.enable_delay!
Sidekiq::Pro.statsd = ->{ Statsd.new(ENV['STATSD_HOST'], ENV['STATSD_PORT']) }
Sidekiq.configure_client do |config|
  config.redis = { url: ENV[ENV['SIDEKIQ_REDIS_PROVIDER']] }
  config.client_middleware do |chain|
    chain.add EggheadEventClientMiddleware
  end
end
Sidekiq.configure_server do |config|
  config.redis = { url: ENV[ENV['SIDEKIQ_REDIS_PROVIDER']] }
  config.super_fetch!
  config.reliable_scheduler!
  config.client_middleware do |chain|
    chain.add EggheadEventClientMiddleware
  end
  config.server_middleware do |chain|
    require 'sidekiq/middleware/server/statsd'
    chain.add Sidekiq::Middleware::Server::Statsd
  end
  # config.server_middleware do |chain|
  #   chain.add EggheadEventServerMiddleware
  # end
  schedule_file = "config/schedule.yml"
  if File.exist?(schedule_file) && Rails.env.production? || Rails.env.staging?
    Sidekiq::Cron::Job.load_from_hash! YAML.load_file(schedule_file)
  end
end
production: &production
  url: postgres://...
  pool: 30
  prepared_statements: false
  encoding: unicode

We spawn 350k Sidekiq batches up at one time. It takes 30 threads a couple of hours to get through the whole lot.

We spawn the batches in BroadcastMessageSendWorker. message.contacts can contain up to 350k contacts.

    class BroadcastMessageSendWorker
      include Sidekiq::Worker
      def perform(message_guid)
        ActiveRecord::Base.connection_pool.with_connection do
          message = BroadcastMessage.find(message_guid)
          message.with_lock do
            return unless message.pending?
            message.pickup!
            if message.contacts.count == 0
              message.finish!
              return
            end
            batch = Sidekiq::Batch.new
            batch.on(:complete, self.class, 'guid' => message_guid)
            batch.jobs do
              # We can't use `uniq` or `DISTINCT` with find_in_batches because after 1000 records it
              # will start blowing up. Instead, use an in-memory `seen` index
              seen = Set.new({})
              message.contacts.select(:id).find_in_batches do |contact_batch|
                args = contact_batch.pluck(:id).map do |contact_id| 
                  next unless seen.add?(contact_id) # add? returns nil if the object is already in the set
                  [message_guid, contact_id]
                end
                Sidekiq::Client.push_bulk('class' => BroadcastMessageDeliverWorker, 'args' => args.compact)
              end
            end
            message.update(batch_id: batch.bid)
          end
        end
      end
      def on_complete(_, options)
        message = BroadcastMessage.find(options['guid'])
        message.finish! if message.sending?
      end
    end

Here is the BroadcastMessageDeliverWorker that we use to send the Action Mailer messages.

class BroadcastMessageDeliverWorker
  include Sidekiq::Worker
  sidekiq_options queue: 'broadcast_mailers', retry: 2, dead: false
  def perform(message_guid, contact_id)
    ActiveRecord::Base.connection_pool.with_connection do
      if Rails.env.staging?
        Rails.logger.error "Not running in staging"
        return
      end
      return unless valid_within_batch?
      BroadcastMessageMailer.send_message(message_guid, contact_id).deliver_now
    end
  end
end

Here is the status info of the last batch that went out. It looks like things aren’t completing correctly:

{
  :is_complete=>false,
  :bid=>"mU...",
  :total=>35493,
  :pending=>-7,
  :description=>"",
  :failures=>0,
  :created_at=>1560331127.951144,
  :fail_info=>[]
}
@mperham
Copy link
Collaborator

mperham commented Jun 13, 2019

First, a back of the envelope calculation: 360k emails. IME it takes ~1 second to send an email on average, one hour is 3600 seconds, 3600 emails/hour/thread or 100 threads to send those emails in one hour. If you say it's taking 30 threads about 3 hours to send the emails, that's about what I would expect in the real world. You can always profile the Worker code to look for hotspots but your code looks at first glance like it's doing all the right things: find_in_batches, pluck(:id), push_bulk, etc.

Things I would try:

  • double your dynos to 60 threads and seeing if that halves the time
  • double the concurrency within each dyno and see if that halves the time

As for the negative pending, that's a worry. Make sure you are shutting down your Sidekiqs cleanly when deploying, sending TERM with plenty of time to shut down and avoid Heroku's kill -9 on shutdown timeout:

https://github.com/mperham/sidekiq/wiki/Deployment#heroku

(I think you are on Heroku?) If you have a deployment recipe, post it and I'll review.

@mperham
Copy link
Collaborator

mperham commented Jul 9, 2019

Another customer has found that they get negative pending errors when they push too much data into Redis so that it starts evicting Batch data. Make sure you've enabled maxmemory-policy noeviction in your Redis instance so Redis does not silently break Sidekiq.

@mperham mperham closed this as completed Jul 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants