Memory issue when using barriers? #445

@travisbell


Hey @ioquatix,

I recently migrated one of our long-running Resque jobs to Async. After all this time I'm still amazed at how clean and easy it is to move IO-bound work to Async and see real, measurable gains (in this case, 4x the throughput!), but I quickly noticed we were running out of memory. This job had never had that problem before.

After digging into it a bit more, I noticed a difference specifically when using barriers. Before I continue, I should say that I'm not 100% sure I'm testing the right thing, so please be kind if I'm not.

Take these two methods (which are just proxies for the kind of work my job is doing) as examples:

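```ruby
require "async"
require "async/barrier"
require "async/semaphore"

# Runs 1,000 short sleeps through a semaphore (limit 8) whose parent is a
# barrier, printing live heap slots after each task is handed to the semaphore.
def barrier_test
  GC.disable
  items = 1_000.times.map { rand(0.01..0.05) }
  Barrier do |barrier|
    semaphore = Async::Semaphore.new(8, parent: barrier)
    items.each do |item|
      semaphore.async do
        sleep(item)
      end

      GC.start
      puts GC.stat(:heap_live_slots)
    end
  end
end

# Identical workload, but the semaphore's parent is the top-level Sync task.
def sync_test
  GC.disable
  items = 1_000.times.map { rand(0.01..0.05) }
  Sync do |task|
    semaphore = Async::Semaphore.new(8, parent: task)
    items.each do |item|
      semaphore.async do
        sleep(item)
      end

      GC.start
      puts GC.stat(:heap_live_slots)
    end
  end
end
```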

Now you'll notice an interesting difference: the `heap_live_slots` value for `barrier_test` grows without bound, getting bigger on every iteration, while the value for `sync_test` stays nice and flat (which I think(?) is expected).
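To compare the two runs with numbers rather than by eye, a small harness can sample `heap_live_slots` after a forced GC on each iteration and report how far it moved. This is a minimal sketch in plain Ruby with no Async involved: `sample_heap_growth` is a hypothetical helper, and the two workloads only simulate "something keeps a reference to every finished item" versus "each item is dropped when it finishes", to show what each pattern looks like in the numbers.

```ruby
# Hypothetical helper: yields once per iteration, forces a full GC, and
# records the number of live heap slots afterwards. Returns all samples.
def sample_heap_growth(iterations)
  samples = []
  iterations.times do |i|
    yield i
    GC.start
    samples << GC.stat(:heap_live_slots)
  end
  samples
end

# Simulated workload 1: a container holds on to every finished item
# (11 objects per iteration: one Array plus ten Objects).
retained = []
retaining = sample_heap_growth(1_000) { retained << Array.new(10) { Object.new } }

# Simulated workload 2: each item becomes garbage as soon as it is built.
releasing = sample_heap_growth(1_000) { Array.new(10) { Object.new } }

puts "retaining: #{retaining.first} -> #{retaining.last}"
puts "releasing: #{releasing.first} -> #{releasing.last}"
```

Because every sample is taken right after a collection, a steadily rising count means the objects are still reachable from somewhere; a flat count means they are being freed.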

Could this unbounded growth under barriers be the "issue" I'm seeing with my Resque job?

Thanks as always. 🙏🏼
