
Add option to limit number of concurrent tasks. #2

Closed

Conversation

@alloy (Collaborator) commented Jun 7, 2013

As I’m in full preparation mode for WWDC, I did not have time to make this into a full-fledged patch yet, but I did want to open up discussion on it now.

This current patch limits the number of concurrent tasks to a given number and defaults to twice the number of CPU cores. (Somewhat arbitrary, but likely a good default.)

The reason for this is that GCD will happily spawn as many threads as needed, but it does not take other stress on the machine into account, e.g. memory usage. So if each task takes up a lot of memory, your system comes to a grinding halt before the job ever finishes.

I’m thinking that it would be nice to be able to set a global default, or if not specified, default to something like twice the CPU cores. It would also be nice to be able to specify that a job does not need to be limited, but it should probably be done per job (not global). I.e. the shorthand methods all check the global default value, whereas the verbose methods /need/ a value, which could be 0 meaning ‘no limit’.
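A minimal sketch of the semaphore-based throttle being proposed (hypothetical: `array` and `map` stand in for the real method internals, and this is not the actual patch):

```objc
// Sketch only: limit in-flight blocks with a counting semaphore.
// Default limit: twice the number of CPU cores; 0 would mean ‘no limit’.
NSUInteger limit = 2 * [[NSProcessInfo processInfo] processorCount];
dispatch_semaphore_t throttle = dispatch_semaphore_create(limit);
dispatch_group_t group = dispatch_group_create();
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

for (id object in array) {
    // Block until one of the `limit` slots frees up.
    dispatch_semaphore_wait(throttle, DISPATCH_TIME_FOREVER);
    dispatch_group_async(group, queue, ^{
        map(object); // the (potentially memory-hungry) task
        dispatch_semaphore_signal(throttle);
    });
}
dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
```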

@alloy (Collaborator, Author) commented Jun 7, 2013

Oh, and I only applied this to NSArray+map for now, but the same applies to all other methods.

@kastiglione (Owner)

Nice, it's good to get real-world results. I was led to believe that GCD was smarter than that about spawning threads. I didn't expect it to take memory use into consideration, but I did believe it factored CPU resources and availability into its decision making.

Did you have any alternatives to the term task? I ask because it does have other existing usages (NSTask). The first thought I had was threadCountLimit, but it's a bit of a lie since this library interfaces with GCD, not threads directly.

This is where a benchmark in the project would come in handy. Part of me feels that I'd prefer not to impose a default cap on the number of simultaneous invocations (2 x #cores), unless a benchmark showed it to be better (for non-memory-intensive tasks). I only say this because, all things being equal, I'd rather defer to GCD to handle concurrency, since that is its purpose for being. For tasks that take up large amounts of memory, maybe it should be up to the developer to call a method that explicitly limits the number of tasks. Even without this change, they could always use a semaphore in the calling code to achieve the limit. I'm just thinking aloud; I'm not opposed to a default limit.
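The caller-side workaround could look roughly like this (a sketch; it assumes `-cco_concurrentMap:` takes a block returning the mapped value, and `inputs` and `ExpensiveTransform` are placeholders):

```objc
// Caller-imposed limit, no library change required.
dispatch_semaphore_t throttle = dispatch_semaphore_create(4); // at most 4 tasks in flight
NSArray *results = [inputs cco_concurrentMap:^id (id object) {
    dispatch_semaphore_wait(throttle, DISPATCH_TIME_FOREVER);
    id result = ExpensiveTransform(object); // the memory-heavy work
    dispatch_semaphore_signal(throttle);
    return result;
}];
```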

On the other hand, perhaps the best way to handle this is to get rid of the vanilla methods (-cco_concurrentMap:, -cco_concurrentFilter:), and force the caller to specify the concurrency level. For example, the basic methods would become -cco_concurrentWithTaskLimit:map: and -cco_concurrentWithTaskLimit:filter:.

@alloy (Collaborator, Author) commented Jun 8, 2013

> Did you have any alternatives to the term task? I ask because it does have other existing usages (NSTask). The first thought I had was threadCountLimit, but it's a bit of a lie since this library interfaces with GCD, not threads directly.

After glancing at the GCD docs, ‘operation’ might be more appropriate.

> For tasks that take up large amounts of memory, maybe it should be up to the developer to call a method that explicitly limits the number of tasks.

On second thought, I believe you’re right.

> Even without this change, they could always use a semaphore in the calling code to achieve the limit.

I was also considering this, but, unless the code path that has no limit still imposes some performance penalty, I felt that the lib should acknowledge the potential issue and provide a solution.

> On the other hand, perhaps the best way to handle this is to get rid of the vanilla methods (-cco_concurrentMap:, -cco_concurrentFilter:), and force the caller to specify the concurrency level. For example, the basic methods would become -cco_concurrentWithTaskLimit:map: and -cco_concurrentWithTaskLimit:filter:.

Nah, I would keep the sugar and default to no limit, as you suggested. The good thing about a lib with not a lot of methods is that people with the memory issue will be able to easily find the method that allows you to limit the number of operations.

@kastiglione (Owner)

> After glancing at the GCD docs, ‘operation’ might be more appropriate.

Haha, I did too and saw that 'task' gets used quite frequently. The docs for dispatch_apply call the block a 'task block'. I'm fine with 'task'. There's also NSOperation; it's probably not possible to pick a term that doesn't have another meaning.

> As I’m in full preparation mode for WWDC, I did not have time to make this into a full fledged patch yet, but I did want to open up discussion on it now.

Do you want to finish this patch, or do you want someone else to? I only ask because I presume that managing CocoaPods would make delegation your M.O.

@kastiglione (Owner)

I was reading the man page for dispatch_apply and came across the section on "striding" for performance tuning. I was unfamiliar with striding; it reminds me of loop unrolling.

I wonder how it relates, if at all, to the task limits discussed here. It seems like striding could potentially address the performance troubles you ran into, by serializing execution in small doses, but it doesn't give as direct control as a semaphore-based limit. On first impression, it seems like it could be a worthwhile addition to the interface, but adding both a task limit and striding seems clunky.
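For reference, the striding pattern from the dispatch_apply man page looks roughly like this (`count`, `STRIDE`, and `process()` are illustrative):

```objc
// Each dispatched block handles STRIDE elements, amortizing dispatch overhead.
#define STRIDE 8
size_t count = 1000;
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

dispatch_apply(count / STRIDE, queue, ^(size_t chunk) {
    size_t start = chunk * STRIDE;
    for (size_t i = start; i < start + STRIDE; i++) {
        process(i);
    }
});
// Pick up any remainder serially when count is not a multiple of STRIDE.
for (size_t i = count - (count % STRIDE); i < count; i++) {
    process(i);
}
```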

Just mentioning it, no response required.

@alloy (Collaborator, Author) commented Jun 10, 2013

> Haha, I did too and saw that 'task' gets used quite frequently. The docs for dispatch_apply call the block a 'task block'. I'm fine with 'task'. There's also NSOperation; it's probably not possible to pick a term that doesn't have another meaning.

After posting my response, I realised that the better argument for ‘operation’ over ‘task’ is that the lib name uses the label ‘operations’ :)

> Do you want to finish this patch, or do you want someone else to? I only ask because I presume that managing CocoaPods would make delegation your M.O.

Hah, yeah that’s definitely true for the CP stuff, but in this case I’m really open for creating the patch. But if, in the meantime, someone else wants to implement it, I’m totally fine with that as well. I have no preference.

> I was reading the man page for dispatch_apply, and came across the section on "striding" for performance tuning. I was unfamiliar with striding, it reminds me of loop unrolling.

I think it’s better to wait for someone to come along that actually has a situation where dispatching operations is more costly than combining a couple of operations.

@kastiglione (Owner)

> Hah, yeah that’s definitely true for the CP stuff, but in this case I’m really open for creating the patch.

Great, I wasn't sure so I asked.

> I think it’s better to wait for someone to come along that actually has a situation where dispatching operations is more costly than combining a couple of operations.

Busted, but you're right, YAGNI, damnit!

@alloy (Collaborator, Author) commented Jun 25, 2013

Btw, in case you have access to the as of yet REDACTED APIs, check the memory related additions to GCD ;)

Which also raises the question: should we still add a built-in facility for this, or explain it in the docs or a sample?

@kastiglione (Owner)

I had skimmed right past those, thanks for the heads up. Interesting additions.

Honestly, I don't know the answer to that question. I'd want to run some experiments. I'm thinking maybe our API needs to offer a way for callers to dynamically throttle the internal dispatch_each. I don't know what that looks like…

@alloy (Collaborator, Author) commented Jul 17, 2013

Let’s close this for now and see if others come up with similar questions.

@alloy closed this Jul 17, 2013
@kastiglione (Owner)

I'll create an issue to track it, and mention the issue in the README. Thanks for highlighting this.

@kastiglione mentioned this pull request Jul 17, 2013