
Add option to limit number of concurrent tasks. #2

Closed

Conversation

@alloy (Collaborator) commented Jun 7, 2013

As I’m in full preparation mode for WWDC, I did not have time to make this into a full-fledged patch yet, but I did want to open up discussion on it now.

This current patch limits the number of concurrent tasks to a given number and defaults to twice the number of CPU cores. (Somewhat arbitrary, but likely a good default.)

The reason for this is that GCD will happily spawn as many threads as needed, but it does not take other stress on the machine into account, e.g. memory usage. So if each task takes up a lot of memory, your system comes to a grinding halt before the job ever finishes.

I’m thinking that it would be nice to be able to set a global default, or if not specified, default to something like twice the CPU cores. It would also be nice to be able to specify that a job does not need to be limited, but it should probably be done per job (not global). I.e. the shorthand methods all check the global default value, whereas the verbose methods /need/ a value, which could be 0 meaning ‘no limit’.
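A minimal sketch of the semaphore-based throttle being proposed (hypothetical: `array` and `map` stand in for the real method internals, and this is not the actual patch):

```objc
// Sketch only: limit in-flight blocks with a counting semaphore.
// Default limit: twice the number of CPU cores; 0 would mean ‘no limit’.
NSUInteger limit = 2 * [[NSProcessInfo processInfo] processorCount];
dispatch_semaphore_t throttle = dispatch_semaphore_create(limit);
dispatch_group_t group = dispatch_group_create();
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

for (id object in array) {
    // Block until one of the `limit` slots frees up.
    dispatch_semaphore_wait(throttle, DISPATCH_TIME_FOREVER);
    dispatch_group_async(group, queue, ^{
        map(object); // the (potentially memory-hungry) task
        dispatch_semaphore_signal(throttle);
    });
}
dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
```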

@alloy (Collaborator, Author) commented Jun 7, 2013

Oh, and I only applied this to NSArray+map for now, but the same applies to all other methods.

@kastiglione (Owner)

Nice, it's good to get real-world results. I was led to believe that GCD was smarter than that about spawning threads. I didn't expect it to take memory use into consideration, but I did believe it factored CPU resources and availability into its decision making.

Did you have any alternatives to the term task? I ask because it does have other existing usages (NSTask). The first thought I had was threadCountLimit, but it's a bit of a lie since this library interfaces with GCD, not threads directly.

This is where a benchmark in the project would come in handy. Part of me feels that I'd prefer not to impose a default cap on the number of simultaneous invocations (2 x #cores), unless a benchmark showed it to be better (for non-memory-intensive tasks). I only say this because, all things being equal, I'd rather defer to GCD to handle concurrency, since that is its purpose for being. For tasks that take up large amounts of memory, maybe it should be up to the developer to call a method that explicitly limits the number of tasks. Even without this change, they could always use a semaphore in the calling code to achieve the limit. I'm just thinking aloud; I'm not opposed to a default limit.
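The caller-side workaround could look roughly like this (a sketch; it assumes `-cco_concurrentMap:` takes a block returning the mapped value, and `inputs` and `ExpensiveTransform` are placeholders):

```objc
// Caller-imposed limit, no library change required.
dispatch_semaphore_t throttle = dispatch_semaphore_create(4); // at most 4 tasks in flight
NSArray *results = [inputs cco_concurrentMap:^id (id object) {
    dispatch_semaphore_wait(throttle, DISPATCH_TIME_FOREVER);
    id result = ExpensiveTransform(object); // the memory-heavy work
    dispatch_semaphore_signal(throttle);
    return result;
}];
```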

On the other hand, perhaps the best way to handle this is to get rid of the vanilla methods (-cco_concurrentMap:, -cco_concurrentFilter:), and force the caller to specify the concurrency level. For example, the basic methods would become -cco_concurrentWithTaskLimit:map: and -cco_concurrentWithTaskLimit:filter:.

@alloy (Collaborator, Author) commented Jun 8, 2013

> Did you have any alternatives to the term task? I ask because it does have other existing usages (NSTask). The first thought I had was threadCountLimit, but it's a bit of a lie since this library interfaces with GCD, not threads directly.

After glancing at the GCD docs, ‘operation’ might be more appropriate.

> For tasks that take up large amounts of memory, maybe it should be up to the developer to call a method that explicitly limits the number of tasks.

On second thought, I believe you’re right.

> Even without this change, they could always use a semaphore in the calling code to achieve the limit.

I was also considering this, but, unless the code path that has no limit still imposes some performance penalty, I felt that the lib should acknowledge the potential issue and provide a solution.

> On the other hand, perhaps the best way to handle this is to get rid of the vanilla methods (-cco_concurrentMap:, -cco_concurrentFilter:), and force the caller to specify the concurrency level. For example, the basic methods would become -cco_concurrentWithTaskLimit:map: and -cco_concurrentWithTaskLimit:filter:.

Nah, I would keep the sugar and default to no limit, as you suggested. The good thing about a lib with not a lot of methods is that people with the memory issue will be able to easily find the method that allows you to limit the number of operations.

@kastiglione (Owner)

> After glancing at the GCD docs, ‘operation’ might be more appropriate.

Haha, I did too and saw that 'task' gets used quite frequently. The docs for dispatch_apply call the block a 'task block'. I'm fine with 'task'. There's also NSOperation; it's probably not possible to pick a term that doesn't have another meaning.

> As I’m in full preparation mode for WWDC, I did not have time to make this into a full fledged patch yet, but I did want to open up discussion on it now.

Do you want to finish this patch, or do you want someone else to? I only ask because I presume that managing CocoaPods would make delegation your M.O.

@kastiglione (Owner)

I was reading the man page for dispatch_apply and came across the section on "striding" for performance tuning. I was unfamiliar with striding; it reminds me of loop unrolling.

I wonder how it relates, if at all, to the task limits discussed here. It seems like striding could potentially address the performance troubles you ran into, by serializing execution in small doses, but it doesn't give as direct control as a semaphore-based limit. On first impression, it seems like it could be a worthwhile addition to the interface, but adding both a task limit and striding seems clunky.
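For reference, the striding pattern from the dispatch_apply man page looks roughly like this (`count`, `STRIDE`, and `process()` are illustrative):

```objc
// Each dispatched block handles STRIDE elements, amortizing dispatch overhead.
#define STRIDE 8
size_t count = 1000;
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

dispatch_apply(count / STRIDE, queue, ^(size_t chunk) {
    size_t start = chunk * STRIDE;
    for (size_t i = start; i < start + STRIDE; i++) {
        process(i);
    }
});
// Pick up any remainder serially when count is not a multiple of STRIDE.
for (size_t i = count - (count % STRIDE); i < count; i++) {
    process(i);
}
```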

Just mentioning it, no response required.

@alloy (Collaborator, Author) commented Jun 10, 2013

> Haha, I did too and saw that 'task' gets used quite frequently. The docs for dispatch_apply call the block a 'task block'. I'm fine with 'task'. There's also NSOperation; it's probably not possible to pick a term that doesn't have another meaning.

After posting my response, I realised that the better argument for ‘operation’ over ‘task’ is that the lib name uses the label ‘operations’ :)

> Do you want to finish this patch, or do you want someone else to? I only ask because I presume that managing CocoaPods would make delegation your M.O.

Hah, yeah that’s definitely true for the CP stuff, but in this case I’m really open for creating the patch. But if, in the meantime, someone else wants to implement it, I’m totally fine with that as well. I have no preference.

> I was reading the man page for dispatch_apply, and came across the section on "striding" for performance tuning. I was unfamiliar with striding, it reminds me of loop unrolling.

I think it’s better to wait for someone to come along that actually has a situation where dispatching operations is more costly than combining a couple of operations.

@kastiglione (Owner)

> Hah, yeah that’s definitely true for the CP stuff, but in this case I’m really open for creating the patch.

Great, I wasn't sure so I asked.

> I think it’s better to wait for someone to come along that actually has a situation where dispatching operations is more costly than combining a couple of operations.

Busted, but you're right, YAGNI, damnit!

@alloy (Collaborator, Author) commented Jun 25, 2013

Btw, in case you have access to the as of yet REDACTED APIs, check the memory related additions to GCD ;)

Which also raises the question: should we still add a built-in facility for this, or explain it in the docs or a sample?

@kastiglione (Owner)

I had skimmed right past those, thanks for the heads up. Interesting additions.

Honestly, I don't know the answer to that question. I'd want to run some experiments. I'm thinking maybe our API needs to offer a way for callers to dynamically throttle the internal dispatch_each. I don't know what that looks like…

@alloy (Collaborator, Author) commented Jul 17, 2013

Let’s close this for now and see if others come up with similar questions.

@alloy closed this Jul 17, 2013
@kastiglione (Owner)

I'll create an issue to track it, and mention the issue in the README. Thanks for highlighting this.

@kastiglione mentioned this pull request Jul 17, 2013