Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run cycle detector every N ms based on timer #2709

Merged
merged 2 commits into from Jun 20, 2018

Conversation

dipinhora
Copy link
Contributor

Prior to this change, the cycle detector would receive
block/unblock messages from actors whenever the actors logically
blocked or unblocked. This resulted in a large amount of overhead
for cycle detection due the large amount of block/unblock messages
that were repeatedly sent from an actor.

This commit changes things so that actors do not send block
messages to the cycle detector until the cycle detector asks an
actor if it is blocked or not. The logic works as follows:

  • When an actor is created, the cycle detector is notified so it
    is aware of the new actor.
  • When an actor logically blocks, it makes note of that as a flag
    (same as before) but it no longer sends a block message to the
    cycle detector immediately.
  • When the ASIO thread starts, it asks the cycle detector to
    create a timer for running the cycle detection check.
  • When the timer fires, the cycle detector asks every actor it
    knows of if it is blocked or not and then checks for any cycles
    and does the normal conf/ack processing.
  • If an actor is logically blocked and receives a message from
    the cycle detector asking if it is blocked, the actor sends a
    block message to the cycle detector and makes note of that fact
    that it sent a block message in a flag.
  • If an actor gets a normal/application message, it immediately
    sends an unblock message to the cycle detector. Same if the
    actor gets a GC acquire or release message.
  • If an actor gets a conf message, it sends an ack if it has
    already sent a block (and didn't send an unblock already).
  • When the cycle detector receives any messages from actors
    (created, block, unblock, ack, etc), it processes them
    immediately to update it's viewmap and adds actors for deferred
    cycle detection checking when the next time the timer is
    triggered. This allows for any actors that might have received
    new work inbetween blocking and cycle detection to send an
    unblock message (which would remove the actor from the viewmap
    deferred status) and so would not be considered as part of
    cycle detection (making cycle detection more efficient).

The end result is that there is almost no overhead to cycle
detection with this new approach.

Additional changes:

  • The --ponycdmin, --ponycdmax, and --ponycdconf parameters
    have been removed because they don't make sense any longer with
    this new approach. They have been replaced with
    --ponycdinterval to control how often the cycle detector
    runs it's detection (the interval determines how often the
    timer for detection fires).

There are currently two down sides to this new approach:

  • First, this new approach results in every new actor being
    registered with the cycle detector. Previously, actors were
    only registered with the cycle detector if they blocked. My
    naive understanding is that this is not a big deal because
    almost every actor would block at some point in a pony program
    as that is the only way to reach quiescence but I could be
    wrong about that.
  • Second, the ponyint_destroy would now definitely result in
    a program crash (unless --ponynoblock were used) due to the
    cycle detector having a reference to every actor. This
    could be avoided if ponyint_destroy could call
    ponyint_cycle_actor_destroyed but unfortunately
    ponyint_destroy does not have a ctx available as is required
    for ponyint_cycle_actor_destroyed.

@jemc jemc requested a review from sylvanc May 23, 2018 15:18
@winksaville
Copy link
Contributor

Are there measurements on the size of the "large amount of overhead" and the performance imporvements this change provides?

Does the timer run continuously, if so that would seem to be a problem for long running programs, especailly in a battery operated environment?

@SeanTAllen
Copy link
Member

So...

Second, the ponyint_destroy would now definitely result in
a program crash (unless --ponynoblock were used) due to the
cycle detector having a reference to every actor. This
could be avoided if ponyint_destroy could call
ponyint_cycle_actor_destroyed but unfortunately
ponyint_destroy does not have a ctx available as is required
for ponyint_cycle_actor_destroyed.

ponyint_destroy will crash a program? am i reading that correctly?

@dipinhora
Copy link
Contributor Author

@winksaville The overhead of the cycle detector can be quantified using the message-ubench example.

Before this change the performance is as follows:

Without --ponynoblock:

Dipins-MBP:dhp dipinhora$ /usr/bin/time -l ./message-ubench --ponyminthreads 9999 --pingers 1000 --report-count 10
# pingers 1000, report-interval 10, report-count 10, initial-pings 5
time,run-ns,rate
1527165067.207296000,999989000,18208961
1527165068.206935000,998713000,18389554
1527165069.205445000,997710000,18328885
1527165070.205199000,998745000,18347287
1527165071.203882000,997716000,18483239
1527165072.202323000,997261000,18143352
1527165073.202567000,999347000,18257816
1527165074.201814000,998236000,18136653
1527165075.201811000,998726000,18202724
1527165076.200623000,997945000,18260926
       10.00 real        39.82 user         0.05 sys
  13660160  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
      3332  page reclaims
         6  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         0  signals received
         0  voluntary context switches
      6526  involuntary context switches

With --ponynoblock:

Dipins-MBP:dhp dipinhora$ /usr/bin/time -l ./message-ubench --ponyminthreads 9999 --pingers 1000 --report-count 10 --ponynoblock
# pingers 1000, report-interval 10, report-count 10, initial-pings 5
time,run-ns,rate
1527165087.205905000,999039000,23946329
1527165088.205142000,998536000,24153367
1527165089.203449000,997578000,23720937
1527165090.203737000,999551000,23891998
1527165091.202563000,998096000,24017780
1527165092.202844000,999547000,23885066
1527165093.200750000,997218000,23990724
1527165094.201036000,999506000,24087362
1527165095.199901000,998139000,23507249
1527165096.198370000,997794000,23931942
        9.99 real        39.75 user         0.04 sys
   9277440  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
      2264  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         0  signals received
         0  voluntary context switches
      6700  involuntary context switches

There's about a 5 million message increase (on my mac) in the message-ubench metrics when --ponynoblock is used.

After this change the performance is as follows:

Without --ponynoblock:

Dipins-MBP:dhp dipinhora$ /usr/bin/time -l ./message-ubench --ponyminthreads 9999 --pingers 1000 --report-count 10
# pingers 1000, report-interval 10, report-count 10, initial-pings 5
time,run-ns,rate
1527165143.291448000,999781000,24037426
1527165144.290693000,998500000,22833212
1527165145.290356000,997249000,21950785
1527165146.289114000,998051000,24019057
1527165147.288410000,998569000,23993396
1527165148.288057000,998962000,24058392
1527165149.286837000,998075000,23848351
1527165150.286014000,998456000,23959039
1527165151.285138000,998389000,23958878
1527165152.283302000,997308000,24094481
       10.00 real        39.65 user         0.04 sys
  10022912  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
      2370  page reclaims
        66  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         0  signals received
         0  voluntary context switches
      6178  involuntary context switches

With --ponynoblock:

Dipins-MBP:dhp dipinhora$ /usr/bin/time -l ./message-ubench --ponyminthreads 9999 --pingers 1000 --report-count 10 --ponynoblock
# pingers 1000, report-interval 10, report-count 10, initial-pings 5
time,run-ns,rate
1527165158.972248000,999134000,24212641
1527165159.972421000,999380000,24178742
1527165160.972273000,999123000,24062581
1527165161.971532000,998545000,24040628
1527165162.970765000,998526000,24205803
1527165163.969960000,998470000,24241206
1527165164.967922000,997235000,24171910
1527165165.967880000,999236000,23983357
1527165166.967167000,998586000,23921745
1527165167.966355000,998486000,24165289
       10.00 real        39.79 user         0.04 sys
   9048064  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
      2200  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         0  signals received
         0  voluntary context switches
      4671  involuntary context switches

The performance for message-ubench is roughly the same (on my mac) both with and without --ponynoblock.


In regards to the timer, yes, it runs continuously for the duration of the program (unless --ponynoblock is used which results in no timer).

I'm not sure I follow how the timer running continuously is inherently a problem for any long running program regardless of operating environment. Can you expand on what you meant by that comment?

In regards to battery constrained environments, yes, the timer (and subsequent cycle detection processing) would add some additional overhead. However, it would be possible to manage the timer along with the dynamic scheduler scaling logic to ensure that when all scheduler threads go to sleep (not currently possible/enabled due to some unsquashed bugs) that the timer is cancelled because no work can be done by any actors and then recreated when the first scheduler thread is woken up. It might also be possible to do the same with 1 scheduler thread active (the current best case with dynamic scheduler scaling after all other threads are asleep) when that scheduler thread is quiescent during the steal loop (and pauses the cpu) but that would be a bit more complex to implement. NOTE: both of these proposed solutions would be additional work building on the current PR.

@dipinhora
Copy link
Contributor Author

@SeanTAllen Kind of. The current ponyint_destroy looks like the following (https://github.com/ponylang/ponyc/blob/master/src/libponyrt/actor/actor.c#L458-L464):

PONY_API void ponyint_destroy(pony_actor_t* actor)
{
  // This destroys an actor immediately. If any other actor has a reference to
  // this actor, the program will likely crash. The finaliser is not called.
  ponyint_actor_setpendingdestroy(actor);
  ponyint_actor_destroy(actor);
}

This PR doesn't actually change the logic in ponyint_destroy but only updates the comment details to reflect that the cycle detector will now always have a reference to every actor:

PONY_API void ponyint_destroy(pony_actor_t* actor)
{
  // This destroys an actor immediately.
  // This will DEFINITELY CAUSE THE PROGRAM TO CRASH because the
  // cycle detector has a reference to the actor.
  // The finaliser is not called.

  // This wouldn't be a problem if we could call the following
  // but we can't because we don't have a ctx
  // ponyint_cycle_actor_destroyed(ctx, actor);

  ponyint_actor_setpendingdestroy(actor);
  ponyint_actor_destroy(actor);
}

Calling ponyint_destroy will not itself cause a crash. However, the next time the cycle detector runs and does its cycle detection logic, the program will likely crash because the actor that was destroyed using ponyint_destroy will now be an invalid reference but the cycle detector will not be aware of that.

@jemc
Copy link
Member

jemc commented May 24, 2018

@dipinhora - What is the barrier to adding the ctx as a parameter to ponyint_destroy?

@winksaville
Copy link
Contributor

@dipinhora, my concern is running a timer continuously means a server in the cloud will be charged CPU usage fees even if no activity is occuring and in battery operated devices it would needlessly drain the battery.

To me, "polling" architectures just feel wrong, one of the things I love about message passing is that it's "reactive" and if there is no work to do the system should be able to go into a "deep sleep" mode.

@dipinhora
Copy link
Contributor Author

@jemc In theory one could call pony_ctx() to get the ctx however that expects this_scheduler to be non-NULL. See: https://github.com/ponylang/ponyc/blob/master/src/libponyrt/sched/scheduler.c#L1032

I'm not sure if ponyint_destroy is intended to always be called from a thread within pony or not. But assuming we're sure that it will always only be called by a pony created thread we can rely on pony_ctx() to get the ctx and then let the cycle detector know of the destroyed actor.

This still wouldn't make it completely safe though because it could be possible that another actor that is not the cycle detector has a reference to the actor being destroyed by ponyint_destroy. It would, however, make it as safe as it currently is (maybe a bit safer because any cycle detector held reference could be safely removed).

@dipinhora
Copy link
Contributor Author

@winksaville I don't disagree with your concern regarding the CPU usage charge or the battery drain.

However, looking at the current state of the pony runtime, there is always CPU usage (and thereby battery drain) because there is always at least 1 scheduler thread active and the largest CPU pause in quiescent via ponyint_cpu_core_pause is a maximum of about 30 ms before the CPU wakes again to check for work (see: https://github.com/ponylang/ponyc/blob/master/src/libponyrt/sched/cpu.c#L291-L342).

As a result, I don't think the addition of the timer for cycle detection is a significant concern in the overall scheme of things (given the current pony runtime state).

To optimize pony for battery operated devices and metered cloud environments would require some additional work on the runtime. The dynamic scheduler scaling was a step in the right direction but it is currently limited in not being able to suspend the last scheduler thread (so only the ASIO thread would remain awake). The next step would be to fix the bugs related to suspending the last scheduler thread to remove the last source of continuous CPU usage for programs that don't require it (i.e. any program where the input comes from an ASIO event so it cannot reach quiescence).

Assuming the runtime can suspend all scheduler threads, this PR would then create an unnecessary source of continuous work (that would be a waste of cycles because by definition no actors could become blocked if all scheduler threads were suspended). This could be managed by having the dynamic scheduler scaling logic cancel the cycled detector timer before suspending and then recreate it when the first scheduler thread is woken up as described in my previous comment. This would not result in any continuous or unnecessary work due to the cycle detector timer and would be battery friendly and CPU resource minimal.

In regards to polling architectures vs reactive architectures, the current implementation for the cycle detector is completely reactive (if no block messages are sent to it, it does no work; this is how --ponynoblock works). There is a significant overhead to this and it can be quantified using something like message-ubench numbers (more than 20% overhead based on the numbers for that application). My proposed solution is to switch to a polling architecture where actors don't send any messages to the cycle detector unless it requests it first. This removes almost all of the overhead related to the actor block/unblock message chatter of the current implementation. The only problem is that if no messages are sent to the cycle detector, it will not be scheduled and will never be able to send a request to all actors asking about their status. One solution to ensuring that the cycle detector gets scheduled is to use a timer as this PR does. Another possible solution is to force one of the scheduler threads to ensure that the cycle detector gets scheduled. I chose the timer because it seemed as though it would be more consistent and easily tunable if needed. Assuming tunability and/or consistency is not a concern, it would be relatively easy to switch from using the timer to having a scheduler thread ensure the cycle detector gets scheduled. This would also address your concerns and might be better than the timer but would create some additional overhead for whichever scheduler thread is responsible for ensuring the cycle detector is scheduled.

I'm open to suggestions to improve this PR and/or alternatives to solve the issue of cycle detection overhead and do not want to force this change in as is if others don't agree it's a worthwhile change.

@jemc
Copy link
Member

jemc commented May 24, 2018

@dipinhora - ponyint_destroy has the ponyint prefix, which means it is "internal" - it isn't meant to be called by anyone but the pony runtime or the compiler-generated code.

So it seems like we can freely change the function signature and/or the places where it is called without worrying about the question of whether some user will choose to call it with inappropriate arguments or in an inappropriate context. Does that make sense, or am I still missing something in what you're saying?

@dipinhora dipinhora force-pushed the make_psycho_detector_better branch from 3e9c3bb to c3f076b Compare May 25, 2018 12:20
@dipinhora
Copy link
Contributor Author

@jemc Good point. There are only two instances where ponyint_destroy is used. One is to destroy the cycle detector actor and the other to destroy the main actor created to run primitive finalisers.

Neither of these two instances should result in a message to the cycle detector about the actor being destroyed (the cycle detector doesn't need to know it's being destroyed and the temporary main actor is created/destroyed after the runtime and cycle detector have been terminated). However, just in case of any future uses of ponyint_destroy that might need to be tracked, I've updated ponyint_destroy to take a pony_ctx as an argument and to attempt to notify the cycle detector of the actor being destroyed.

@dipinhora
Copy link
Contributor Author

Anyone have any opinion on the use of the timer vs having a scheduler thread to ensure the cycle detector is run consistently?

I'm starting to wonder if that additional complexity of the timer is worth it or not.

My initial thought was that the timer would be a good solution but it adds some complexity in relation to @winksaville's concerns regarding cpu/battery usage. Switching to having a scheduler thread ensure the cycle detector runs would accomplish the same goal without the additional complexity of canceling and recreating the timer when suspending/resuming the last scheduler thread. NOTE: This would result in that scheduler thread (index == 0) having some additional overhead (to check to see if it's time to run the cycle detector or not using ponyint_cpu_tick and to send the message to the cycle detector telling it to run) and every other scheduler thread having an extra branch to evaluate to determine if it's scheduler with index == 0 or not (this is likely minimal due to branch prediction logic in processors).

@winksaville
Copy link
Contributor

My take is that even though using a timer probably won't make pony any worse in than it is now, it does add another place that would have to be changed in the future to make pony more efficient in its CPU utilization. So trying to use the scheduler thread to ensure the cycle dector runs and see what its cost might be seems worth trying.

@jemc
Copy link
Member

jemc commented May 25, 2018

@winksaville - regarding the point that this adds another place that would have to change to support low-power / long sleep systems, it's probably worth noting that @sylvanc wants to eventually remove the cycle detector in favor of another approach, and it's possible (likely?) that this effort would happen first anyway.

@winksaville
Copy link
Contributor

@dipinhora this change seems fairly large and in a critical area, is there test coverage data to minimize regressions?

@dipinhora dipinhora force-pushed the make_psycho_detector_better branch 4 times, most recently from ab2fe6e to 0bdeb36 Compare May 26, 2018 04:19
@dipinhora
Copy link
Contributor Author

@winksaville I've added a test (CodegenTest.CycleDetector) to test out that the cycle detector is functioning correctly. Unfortunately, the test works correctly most of the time on my OSX laptop but it seems that in CI it fares much worse. I don't think that is because the test is invalid but more that the cycle detector doesn't always detect the dead cycle of actors (I could be wrong about that and it could be the test that is broken). Additionally, Windows doesn't seem to like me using ponyint_cpu_core_pause in the new test. I'm open to other ideas for tests to add for coverage around this functionality as I'm unable to think of anything else.

I'd also added a commit to remove the cycle detector timer and to instead have scheduler thread 0 send messages to the cycle detector to ensure it is woken up on a regular interval. The performance of this version is comparable to the performance with the cycle detector timer.

I don't believe the test failure issue should hold up this PR (unless we determine there's a test logic error) as that seems to me to be unrelated to the changes to how cycle detection is triggered (the same test issue occurs if the test is added to master and run), however, we can delay it if others believe this is too large a change/risk without complete/functioning tests.

@jemc
Copy link
Member

jemc commented May 26, 2018

As Sean mentioned on the sync call this week, "@sylvanc knows the cycle detector better than anyone", so I'd prefer if he could take a look at this before we merge it.

@jemc
Copy link
Member

jemc commented May 26, 2018

Regarding the CI, well, I don't think it's practical to have failing CI tests on master, so if we know these tests aren't working consistently, but intend to follow up later to fix whatever is making them fail, we should mark the tests as disabled rather than leaving the CI in a non-green state.

@dipinhora
Copy link
Contributor Author

@jemc of course. I didn't mean to suggest that this PR get merged without @sylvanc's review. It was more me explaining why I feel the new test shouldn't necessarily hold up the change.

Also agreed regarding the failing CI test. I left it enabled so others could see the CI failures. I'll mark it as disabled and leave a note here about how it can be run manually instead.

@dipinhora dipinhora force-pushed the make_psycho_detector_better branch from 0bdeb36 to b9333aa Compare May 26, 2018 16:16
@dipinhora
Copy link
Contributor Author

I've marked CodegenTest.CycleDetector as disabled by renaming it to CodegenTest.DISABLED_CycleDetector per gtest guidelines. This ensures the test is compiled but not executed by default.

The test can manually be run by using one of the following commands:

./build/release/libponyc.tests --gtest_filter=CodegenTest.DISABLED_CycleDetector \
  --gtest_also_run_disabled_tests

or

./build/debug/libponyc.tests --gtest_filter=CodegenTest.DISABLED_CycleDetector \
  --gtest_also_run_disabled_tests

@dipinhora dipinhora force-pushed the make_psycho_detector_better branch 3 times, most recently from 81dbb35 to c76ea2b Compare May 28, 2018 01:14
@dipinhora
Copy link
Contributor Author

Took a closer look at the test failures. The test was failing because I missed some edge cases related to the non-timer version of triggering the cycle detector.

The logic has been fixed and the test has been re-enabled (and is passing!).

@dipinhora dipinhora force-pushed the make_psycho_detector_better branch from c76ea2b to 7f0a9b1 Compare June 6, 2018 13:31
@dipinhora
Copy link
Contributor Author

Rebased to resolve merge conflict.

@SeanTAllen SeanTAllen added needs discussion during sync changelog - changed Automatically add "Changed" CHANGELOG entry on merge labels Jun 12, 2018
@SeanTAllen
Copy link
Member

@dipinhora can you rebase this against master?

Prior to this change, the cycle detector would receive
block/unblock messages from actors whenever the actors logically
blocked or unblocked. This resulted in a large amount of overhead
for cycle detection due the large amount of block/unblock messages
that were repeatedly sent from an actor.

This commit changes things so that actors do not send block
messages to the cycle detector until the cycle detector asks an
actor if it is blocked or not. The logic works as follows:

* When an actor is created, the cycle detector is notified so it
  is aware of the new actor.
* When an actor logically blocks, it makes note of that as a flag
  (same as before) but it no longer sends a block message to the
  cycle detector immediately.
* When the ASIO thread starts, it asks the cycle detector to
  create a timer for running the cycle detection check.
* When the timer fires, the cycle detector asks every actor it
  knows of if it is blocked or not and then checks for any cycles
  and does the normal conf/ack processing.
* If an actor is logically blocked and receives a message from
  the cycle detector asking if it is blocked, the actor sends a
  block message to the cycle detector and makes note of that fact
  that it sent a block message in a flag.
* If an actor gets a normal/application message, it immediately
  sends an unblock message to the cycle detector. Same if the
  actor gets a GC acquire or release message.
* If an actor gets a conf message, it sends an ack if it has
  already sent a block (and didn't send an unblock already).
* When the cycle detector receives any messages from actors
  (created, block, unblock, ack, etc), it processes them
  immediately to update it's viewmap and adds actors for deferred
  cycle detection checking when the next time the timer is
  triggered. This allows for any actors that might have received
  new work inbetween blocking and cycle detection to send an
  unblock message (which would remove the actor from the viewmap
  deferred status) and so would not be considered as part of
  cycle detection (making cycle detection more efficient).

The end result is that there is almost no overhead to cycle
detection with this new approach.

Additional changes:

* The `--ponycdmin`, `--ponycdmax`, and `--ponycdconf` parameters
  have been removed because they don't make sense any longer with
  this new approach. They have been replaced with
  `--ponycdinterval` to control how often the cycle detector
  runs it's detection (the interval determines how often the
  timer for detection fires).
* A codegen test has been added to test the cycle detector

There are currently two down sides to this new approach:

* First, this new approach results in every new actor being
  registered with the cycle detector. Previously, actors were
  only registered with the cycle detector if they blocked. My
  naive understanding is that this is not a big deal because
  almost every actor would block at some point in a pony program
  as that is the only way to reach quiescence but I could be
  wrong about that.
@dipinhora dipinhora force-pushed the make_psycho_detector_better branch from 7f0a9b1 to d0ee40a Compare June 17, 2018 04:24
@dipinhora
Copy link
Contributor Author

Rebased to pick up latest changes on master.

@SeanTAllen SeanTAllen merged commit 0a0ef30 into ponylang:master Jun 20, 2018
ponylang-main added a commit that referenced this pull request Jun 20, 2018
@sylvanc
Copy link
Contributor

sylvanc commented Jun 20, 2018

This is great stuff @dipinhora. My only concern is that it can create a message storm when the cycle detector asks all actors if they are blocked. I think this can (in the future) be improved by asking only a sample of actors, in an incremental way. However, let's merge it now, and optimise it more down the road.

Great PR!

@dipinhora
Copy link
Contributor Author

@sylvanc Good point regarding the message storm. I'll modify it to be incremental.

@SeanTAllen I got emailed about your comments but I can't seem to find them in the UI. The following was your question:

+ if((sched->index == 0) && !ponyint_actor_getnoblock())
does the value of ponyint_actor_getnoblock change during the course of a run? if not, these cases should probably be flipped.

The value of ponyint_actor_getnoblock is set during startup. It does not change during the execution of a program run. I'll flip the conditional checks when I address the message storm issue.

@SeanTAllen
Copy link
Member

@dipinhora i deleted the comments because i figured it out.

@dipinhora
Copy link
Contributor Author

@SeanTAllen I'm confused. I'm not sure what you figured out since your comment was correct and that value doesn't change during the runtime of a program.

Are you saying that you figured out that the code should stay as it is?

dipinhora added a commit to dipinhora/ponyc that referenced this pull request Jun 20, 2018
In PR ponylang#2709, @sylvanc pointed out a concern regarding the cycle
detector creating a message storm due to how it would ask all
actors whether they are blocked or not on every interval.

This commit changes the logic so that the cycle detector only
asks set number of actors whether they are blocked or not. If
it is not able to ask all actors, it will resume iterating actors
on the next iteration.

This commit also addresses @SeanTAllen's concern regarding the
conditional check order using `ponyint_actor_getnoblock`.
@SeanTAllen
Copy link
Member

@dipinhora well then, shame on me!

@dipinhora
Copy link
Contributor Author

@sylvanc @SeanTAllen PR #2800 should address both of your concerns.

sylvanc pushed a commit that referenced this pull request Jun 27, 2018
In PR #2709, @sylvanc pointed out a concern regarding the cycle
detector creating a message storm due to how it would ask all
actors whether they are blocked or not on every interval.

This commit changes the logic so that the cycle detector only
asks set number of actors whether they are blocked or not. If
it is not able to ask all actors, it will resume iterating actors
on the next iteration.

This commit also addresses @SeanTAllen's concern regarding the
conditional check order using `ponyint_actor_getnoblock`.
@SeanTAllen SeanTAllen mentioned this pull request Apr 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog - changed Automatically add "Changed" CHANGELOG entry on merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants