
performance comparison with http-streams #98

Closed
larskuhtz opened this Issue Jan 17, 2015 · 33 comments

larskuhtz commented Jan 17, 2015

Here are some benchmark results comparing the performance of http-client with that of http-streams when making a large number of requests from a moderately large number of concurrent (but otherwise independent) threads:

https://gist.github.com/larskuhtz/d935b119f8b5790e2cda

I wonder what exactly the reasons are for the observed performance differences. I get similar results for POST requests with bodies of up to 64KB. In practice I observed http-streams doing about 3 times as many requests in a given time frame as http-client.

The performance differences between the http-streams benchmarks that use an MVar for storing the connection and the benchmarks that use an IORef seem to indicate that the use of mutable variables (for instance for managing connections) may play a role. (Interestingly, the performance of the IORef version is similar to the MVar version when the connection is written back to the IORef after each use, even if the connection wasn't reset.)
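
For illustration, here is a minimal sketch of the two connection-storage patterns being compared; Conn is a hypothetical stand-in for whatever connection type the client library provides:

import Control.Concurrent.MVar
import Data.IORef

data Conn = Conn   -- hypothetical placeholder for the library's connection type

-- MVar variant: take the connection out, use it, and put it back, so
-- concurrent users serialize on the MVar.
withConnMVar :: MVar Conn -> (Conn -> IO a) -> IO a
withConnMVar var act = modifyMVar var $ \conn -> do
    r <- act conn
    return (conn, r)

-- IORef variant: read the connection without locking; writing it back after
-- each use is the variation mentioned above that behaves like the MVar version.
withConnIORef :: IORef Conn -> (Conn -> IO a) -> IO a
withConnIORef ref act = do
    conn <- readIORef ref
    r <- act conn
    writeIORef ref conn
    return r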

snoyberg (Owner) commented Jan 17, 2015

I don't have time to look into this in depth right now. But one thing that jumps out at me is: you're creating a new manager for each benchmark run, which is not recommended usage of http-client, and should not be done in any real-life application. The initial http-streams benchmarks did the same and claimed a performance advantage over http-client, but once that was corrected http-client was in fact faster.
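
For reference, a minimal sketch of the recommended pattern, with the manager created once and shared across all requests (parseRequest is the modern name for the parsing function, the 2015-era API used parseUrl, and the URL below is just a placeholder):

import Control.Monad (forM_)
import Network.HTTP.Client (defaultManagerSettings, httpLbs, newManager,
                            parseRequest, responseStatus)

main :: IO ()
main = do
    mgr <- newManager defaultManagerSettings   -- create the Manager once
    req <- parseRequest "http://localhost:8080/"
    forM_ [1 :: Int .. 1000] $ \_ -> do        -- reuse it for every request
        resp <- httpLbs req mgr
        print (responseStatus resp)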

snoyberg (Owner) commented Jan 17, 2015

Actually, it looks like your tests are more complicated than that... anyway, my original statement still applies; I don't have time to dive into this now.

larskuhtz commented Jan 17, 2015

I ran tests where a global manager was initialized in main and given to the benchmark. The results were similar, so I removed that case in favor of the simpler setting where the manager is created in the benchmark. (There seems to be a limit on how many benchmarks can be included in a single criterion HTML report.)

larskuhtz commented Jan 20, 2015

When profiling an application that heavily uses http-client I noticed that almost 40% of the time is spent in calls to System.Timeout.timeout:

COST CENTRE                        MODULE                         %time %alloc

timeout                            Network.HTTP.Client.Response    19.3    4.2
timeout                            Network.HTTP.Client.Request     18.3    4.0
socketConnection                   Network.HTTP.Client.Connection  11.2   11.1
...

In the benchmarks I don't implement a timeout handler for http-streams [*]. I wonder if the usage of timeout could explain the observed differences in the performance of http-client and http-streams?

[*] In fact, when making HTTP connections from the backend of an HTTP server, we often wouldn't need a timeout handler, because there is already a global timeout handler for the request thread.

larskuhtz commented Jan 20, 2015

Explicitly setting managerResponseTimeout = Nothing almost doubled the number of requests per second in my application. It's not yet on a par with the version that uses http-streams, but much better.
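
Concretely, this is the setting I mean, assuming the http-client 0.4.x API used in this thread, where managerResponseTimeout is a Maybe Int of microseconds (newer releases use a ResponseTimeout type with responseTimeoutNone instead):

import Network.HTTP.Client (Manager, defaultManagerSettings,
                            managerResponseTimeout, newManager)

-- Manager with the per-request response timeout disabled.
mkManagerNoTimeout :: IO Manager
mkManagerNoTimeout = newManager defaultManagerSettings
    { managerResponseTimeout = Nothing }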

larskuhtz commented Jan 20, 2015

I wonder if it would be possible to use the same approach to timeout that warp uses, namely having a single global timeout handler (Network.Wai.Handler.Warp.Timeout)?
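
Roughly, the idea would look something like the following sketch: a single reaper thread scans registered deadlines once per tick instead of every request arming its own timer. This only illustrates the approach, it is not Warp's actual API, and it omits cancellation and deadline refreshing:

import Control.Concurrent (ThreadId, forkIO, threadDelay, throwTo)
import Control.Exception (Exception)
import Data.IORef (IORef, atomicModifyIORef')
import Data.List (partition)
import Data.Time.Clock (UTCTime, addUTCTime, getCurrentTime)

data RequestTimedOut = RequestTimedOut deriving Show
instance Exception RequestTimedOut

type Registry = IORef [(UTCTime, ThreadId)]

-- One global reaper thread checks all deadlines once per second.
startReaper :: Registry -> IO ()
startReaper reg = do
    _ <- forkIO loop
    return ()
  where
    loop = do
        threadDelay 1000000
        now <- getCurrentTime
        expired <- atomicModifyIORef' reg $ \entries ->
            let (dead, alive) = partition (\(deadline, _) -> deadline <= now) entries
            in (alive, dead)
        mapM_ (\(_, tid) -> throwTo tid RequestTimedOut) expired
        loop

-- Each request registers a deadline instead of arming its own timer.
registerDeadline :: Registry -> Double -> ThreadId -> IO ()
registerDeadline reg secs tid = do
    now <- getCurrentTime
    let deadline = addUTCTime (realToFrac secs) now
    atomicModifyIORef' reg (\es -> ((deadline, tid) : es, ()))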

snoyberg added a commit that referenced this issue Jan 21, 2015

snoyberg (Owner) commented Jan 21, 2015

I haven't tested this seriously yet, but I've pushed some code to the faster-timeout branch that, when using the multithreaded runtime, uses the event manager, which will hopefully be more efficient. Would you be able to test it out?

larskuhtz commented Jan 22, 2015

I'll give it a try probably tomorrow.

larskuhtz commented Jan 25, 2015

I linked the benchmarks against the faster-timeout branch. It seems to have little effect on the performance.

I updated https://gist.github.com/larskuhtz/d935b119f8b5790e2cda with some new results (using the faster-timeout branch) that include benchmarks for http-client with managerResponseTimeout = Nothing. That makes a big difference, which may indicate that the issue is not caused by the implementation of timeout in System.Timeout itself but sits deeper, somewhere in getSystemTimerManager from GHC.Event.

larskuhtz commented Jan 25, 2015

Here are some results from profiling some code that makes 50000 requests from 50 threads with the faster-timeout branch:

COST CENTRE                        MODULE                         %time %alloc

E.registerTimeout                  Network.HTTP.Client.Util        18.5    3.3
timeoutMT                          Network.HTTP.Client.Util        15.7    2.6
socketConnection                   Network.HTTP.Client.Connection  11.2   11.3
...

larskuhtz commented Jan 25, 2015

Here are two screenshots from ThreadScope. Both are from an application that uses http-client to make HTTP requests from 32 concurrent threads over 32 TCP connections with +RTS -N8.

The first screenshot is from a run with managerResponseTimeout = Just 50000000:

[ThreadScope screenshot: with timeout]

The second screenshot is from a run with managerResponseTimeout = Nothing:

[ThreadScope screenshot: without timeout]

The differences are quite obvious, though I have no idea yet how the usage of timeout could be causing these differences in behavior.

larskuhtz commented Jan 26, 2015

In the event logs for the run with a timeout I noticed a lot of context switching due to apparent chains of threads being blocked on black holes owned by threads on different caps.

I wonder if this could be related to GHC.Event.TimerManager.registerTimeout updating the timeout queue lazily. Even though the respective comment in the code disappeared in base-4.8, maybe https://ghc.haskell.org/trac/ghc/ticket/3838 (or something similar) is still an issue? But I don't know enough about the details of the GHC RTS to really understand what is going on.
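
For what it's worth, here is a tiny, hypothetical illustration of the kind of laziness that ticket is about (not the timer manager's actual code): atomicModifyIORef only installs a thunk, so the first reader to force it can black-hole it while threads on other capabilities wait, whereas atomicModifyIORef' forces the new value immediately:

import Data.IORef (IORef, atomicModifyIORef, atomicModifyIORef')

-- Lazy update: the counter accumulates a chain of unevaluated (+1) thunks.
lazyBump :: IORef Int -> IO ()
lazyBump ref = atomicModifyIORef ref (\x -> (x + 1, ()))

-- Strict update: the new value is forced before it becomes visible.
strictBump :: IORef Int -> IO ()
strictBump ref = atomicModifyIORef' ref (\x -> (x + 1, ()))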

snoyberg (Owner) commented Jan 26, 2015

Just for the record, I'm not ignoring this issue, but my travel schedule does not allow me time to focus on it right now.

larskuhtz commented Jan 27, 2015

I did some benchmarking with System.Timeout (which just wraps the functions from GHC.Event.TimerManager).

The registerTimeout function updates the timeout priority queue that is shared between all threads (on the same capability, I think) through an IORef. It seems to me that this queue should always be fairly small (about the number of concurrent threads), so the computational overhead of the update should be only moderate. To figure out the overhead due to the synchronization, I included scenarios where, instead of calling timeout, the threads globally synchronize by updating shared variables (MVars or IORefs). While one definitely has to pay a price for the synchronization, it seems to me that this is not enough to fully explain the observed overhead of timeout. But at this point I have no idea what else could be causing the overhead and whether it could be avoided.

I included all the code that I used for benchmarking this issue in this repository: https://github.com/alephcloud/hs-service-benchmark-tool

The criterion results are available through the repository homepage: http://alephcloud.github.io/hs-service-benchmark-tool/

gregwebs (Contributor) commented Jan 28, 2015

Perhaps @kazu-yamamoto is interested in looking at this timeout issue.

kazu-yamamoto (Contributor) commented Jan 29, 2015

I don't have time to dig into this issue now. But looking at the ThreadScope figures, I guess there is a global lock somewhere.

General note:

snoyberg (Owner) commented Feb 10, 2015

I've just pushed code to a new branch (98-alternate-timeout) which has a very lightly tested alternate implementation of timeout which, when I tested on my machine, seemed to help the performance of this benchmark (thank you for such a solid benchmark, by the way). Can you confirm my benchmark results? If in fact this is more efficient, I'll clean up the code a bit before releasing. It would also probably be worthwhile to report this as a bug to GHC.
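
For anyone curious about the general shape of such an alternative, here is a rough, hypothetical sketch of a forkIO-based timeout that bypasses the timer manager. It is not the code on the 98-alternate-timeout branch, and it ignores the corner cases that System.Timeout.timeout handles (nested timeouts, unique exceptions, masking):

{-# LANGUAGE ScopedTypeVariables #-}
import Control.Concurrent (forkIO, killThread, myThreadId, threadDelay, throwTo)
import Control.Exception (Exception, bracket, handle)

data WatchdogTimeout = WatchdogTimeout deriving Show
instance Exception WatchdogTimeout

-- Fork a watchdog thread that throws an asynchronous exception to the caller
-- after the given number of microseconds; kill it if the action finishes first.
altTimeout :: Int -> IO a -> IO (Maybe a)
altTimeout usecs action = do
    caller <- myThreadId
    handle (\(_ :: WatchdogTimeout) -> return Nothing) $
        bracket (forkIO $ threadDelay usecs >> throwTo caller WatchdogTimeout)
                killThread
                (\_ -> fmap Just action)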

snoyberg (Owner) commented Feb 16, 2015

@larskuhtz Have you had a chance to try out that branch?

larskuhtz commented Feb 16, 2015

@snoyberg I read through the code and it looks good; but I haven't yet tried it. I'll try to get to it early this week.

snoyberg referenced this issue Feb 17, 2015

Closed

Provide optimized timeout implementation #36

snoyberg (Owner) commented Feb 23, 2015

@larskuhtz Any update on this?

snoyberg (Owner) commented Mar 2, 2015

I've gone ahead and released this to Hackage. If the problem persists, please reopen.

snoyberg closed this Mar 2, 2015

larskuhtz commented Mar 2, 2015

@snoyberg sorry that it took me so long to respond. I ran the benchmarks with the above branch. The results looked somewhat better, but not to the degree that I would have expected. There is still a large penalty introduced by enabling the timeout. So I was wondering if I made a mistake when I did the build or changed something in the benchmark setup or environment (in the end I ran the tests on my laptop, which introduces quite a bit of noise).

Concretely, I saw about a 25% improvement in the criterion benchmarks for 8 or more threads in the cases with the timeout enabled. But even with that, the runtime for 16 or more threads is still about twice that of the case without a timeout.

The ThreadScope diagrams still show a low degree of parallel work. Here is a ThreadScope sample with the timeout set to a large value:

[ThreadScope screenshot: with a large timeout]

Here is the corresponding sample without a timeout:

[ThreadScope screenshot: without a timeout]

So next I'll check more carefully whether my build actually used the above branch before I do any further testing...

larskuhtz commented Mar 2, 2015

I repeated the tests with http-client 0.4.7.2. The results match what I described in the previous comment. Here are some criterion results:

[screenshot: criterion results]

(Note that the http-streams versions don't implement a timeout.)

snoyberg (Owner) commented Mar 3, 2015

I think it's clear now that the problem is in the timeout implementation. I'd suggest, as a next step, coming up with a benchmark that demonstrates the timeout slowdown on its own, and then we can talk to GHC HQ about trying to fix the RTS.

larskuhtz commented Mar 4, 2015

I did some benchmarks of just the timeout from System.Timeout. It exposes the same low level of parallel work in the ThreadScope diagrams.

The results are here: http://alephcloud.github.io/hs-service-benchmark-tool/

The benchmarks also compare the behavior of timeout with setups where all threads synchronize through MVars and IORefs. The results indicate that there is a performance penalty on top of the use of mutable synchronization variables in the implementation of timeout.

snoyberg (Owner) commented Mar 6, 2015

@kazu-yamamoto @AndreasVoellmy Do you think the information above from @larskuhtz would be useful as a GHC Trac ticket? Does it indicate a problem in the RTS?

AndreasVoellmy commented Mar 6, 2015

Hi @larskuhtz, thanks for raising this, and thanks everyone for chasing this down. I haven't read through all the details yet, but I intend to do so soon.

kazu-yamamoto (Contributor) commented Mar 19, 2015

Now I have read all comments and code carefully. My guess is that timeout causes huge contention on the priority search queue, which is not striped. Note that the IO manager uses a striped callback table: Array Int (MVar (IntTable [FdData])). (The IO manager has another striped structure: one IO manager is spawned for each core.)
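
To illustrate what "striped" means here, a hypothetical sketch in the style of the IO manager's callback table: one small table per stripe, each with its own lock, so threads on different capabilities rarely contend:

import Control.Concurrent (myThreadId, threadCapability)
import Control.Concurrent.MVar (MVar, modifyMVar_, newMVar)
import Data.Array (Array, bounds, listArray, (!))
import qualified Data.Map.Strict as Map

-- One table per stripe; each stripe has its own MVar lock.
type Striped v = Array Int (MVar (Map.Map Int v))

newStriped :: Int -> IO (Striped v)
newStriped n = fmap (listArray (0, n - 1)) (mapM (const (newMVar Map.empty)) [1 .. n])

-- Pick the stripe for the current capability, so contention stays local.
insertStriped :: Striped v -> Int -> v -> IO ()
insertStriped arr key val = do
    (cap, _) <- threadCapability =<< myThreadId
    let (lo, hi) = bounds arr
        stripe   = arr ! (lo + cap `mod` (hi - lo + 1))
    modifyMVar_ stripe (return . Map.insert key val)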

kazu-yamamoto (Contributor) commented Mar 19, 2015

But I don't understand why 84e054a does not improve the performance drastically.

larskuhtz commented Mar 19, 2015

@kazu-yamamoto I haven't looked at the more low-level aspects of the IO manager. Why would that contention affect only the benchmarks with a timeout, but not the benchmarks without? Does the excessive usage of timeout induce more expensive operations on that queue?

There is also a queue (GHC.Event.PSQ) used in GHC.Event.TimerManager to keep track of the timeouts. I tried to rule out the influence of that queue by comparing with a version of the benchmarks where all threads synchronize just through mutable variables instead of using a timeout. If the shared timeout queue were the problem, I would expect the latter benchmarks to suffer from the same performance drawbacks, which they don't.

kazu-yamamoto (Contributor) commented Mar 20, 2015

In your Timeout.hs, the number of contentions in the MVar and IORef versions is just 1000, while that of the timeout version is 1000 * 10000000. Note that System.Timeout.timeout itself has a critical section (i.e. the IORef to the PSQ).

larskuhtz commented Mar 20, 2015

Here is how I understand the code: there are exactly threadN * 1000 calls to timeout and to modifyMVar_:

import Control.Concurrent.MVar (modifyMVar_, newMVar)
import Control.Monad (replicateM_)
import Data.Time.Clock (getCurrentTime)
import System.Timeout (timeout)

-- ('forkN', 'threadN', and 'func' are defined elsewhere in the benchmark's
-- Timeout.hs; 'forkN k' presumably runs the given action in k threads and
-- waits for all of them.)

-- | Run 'threadN' threads which each call 'getCurrentTime'
-- 1000 times with a timeout of 10 seconds (which never triggers).
--
run2 :: Int -> IO ()
run2 n = forkN threadN $
    replicateM_ 1000 . timeout 10000000 . replicateM_ n $ getCurrentTime

-- | All threads share an 'MVar' that each thread updates 1000 times.
--
run3 :: Int -> IO ()
run3 n = do
    mvar <- newMVar [0 :: Int]
    forkN threadN $
        replicateM_ 1000 $ do
            modifyMVar_ mvar (return . func)
            replicateM_ n $ getCurrentTime

In GHC.Event.TimerManager, each call to timeout results in:

  • a call to registerTimeout, and in turn to editTimeout, which calls atomicModifyIORef' on the TimerManager,
  • (when the replicateM_ n $ getCurrentTime returns) a call to unregisterTimeout, which again calls editTimeout, and
  • a call to wakeManager, which triggers a call of step, which in turn calls atomicModifyIORef' on the TimerManager.

These three calls happen sequentially. However, I think the trigger of step is asynchronous and may thus overlap with subsequent timeout calls or timeout calls from other threads. Maybe this overlapping can pile up and cause too many context switches?
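
To make the register/unregister flow concrete, here is a hedged sketch of a timeout wrapper built directly on GHC.Event. It approximates rather than reproduces System.Timeout.timeout (which uses a unique exception per call and only takes this path on the threaded RTS), and the asynchronous wake-up of the manager loop in step is not visible here:

{-# LANGUAGE ScopedTypeVariables #-}
import Control.Concurrent (myThreadId)
import Control.Exception (Exception, bracket, handle, throwTo)
import GHC.Event (getSystemTimerManager, registerTimeout, unregisterTimeout)

data TMTimeout = TMTimeout deriving Show
instance Exception TMTimeout

tmTimeout :: Int -> IO a -> IO (Maybe a)
tmTimeout usecs action = do
    caller <- myThreadId
    tm <- getSystemTimerManager
    handle (\(_ :: TMTimeout) -> return Nothing) $
        bracket (registerTimeout tm usecs (throwTo caller TMTimeout))  -- register the timeout
                (unregisterTimeout tm)                                 -- unregister on exit
                (\_ -> fmap Just action)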

larskuhtz commented Mar 20, 2015

There is quite a bit of complexity involved in the implementation of I.poll (emBackend mgr) (Just timeout) (handleControlEvent mgr), which is called in step. This includes a call to poll and possibly a call to readControlMessage.
