deprecate existing weight APIs #111

Closed
nikomatsakis opened this issue Oct 19, 2016 · 5 comments
@nikomatsakis
Member

Per this comment, we should deprecate the existing weight APIs. I think we probably still want some room for manual tuning, but it should be simpler by default -- and if we are going to let people give fine-grained numbers, it should be tied to the length (like a sequential cutoff) rather than some abstract notion of weight. Ideally, though, we'd add these APIs along with representative benchmarks where they are needed.

@Eh2406
Contributor

Eh2406 commented Oct 28, 2016

As a quick test I took a work project and cargo-updated to 0.4.3. With weight_max it runs in 32s. Then I removed the weight_max and it runs in 48s (each the best of 3 runs, wall-clock time). Did I test this correctly? Is there any info I can provide so as to be helpful? Unfortunately I can't share the example. :-(

@cuviper
Member

cuviper commented Oct 28, 2016

@Eh2406 can you characterize the workload somehow? Is the work well balanced for each item, or skewed heavily? Maybe you can synthesize something that performs similarly?

For possible tunables related to length, I think we'll want both a minimum (don't bother splitting below this length) and a maximum (always split at least down to this length), where weight_max() corresponds to asking for a maximum length of 1.
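A minimal sketch (hypothetical, not rayon's actual splitting code) of what the maximum-length half of such a tunable could mean: keep halving a range until every piece is at most `max_len` long, so `max_len == 1` yields one job per item, matching the weight_max() behavior described above.

```rust
use std::ops::Range;

// Hypothetical sketch: split a range until every piece is at most
// `max_len` long. `max_len == 1` forces per-item jobs (like weight_max);
// a companion `min_len` would simply stop the splitting earlier.
fn split_by_len(range: Range<usize>, max_len: usize, out: &mut Vec<Range<usize>>) {
    assert!(max_len >= 1);
    let len = range.end - range.start;
    if len <= max_len {
        out.push(range);
    } else {
        let mid = range.start + len / 2;
        split_by_len(range.start..mid, max_len, out);
        split_by_len(mid..range.end, max_len, out);
    }
}
```

(For readers coming to this thread later: rayon's released API did grow knobs of exactly this shape, `with_min_len`/`with_max_len` on indexed parallel iterators.)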

@Eh2406
Contributor

Eh2406 commented Oct 28, 2016

So I am running on an Intel Core i5-5200U @ 2.2 GHz.
I isolated the first stage; the stages run sequentially, but each one calls into_par_iter.
The first stage actually looks good:
with weight_max: 61s, 56s, 51s
without: 46s, 58s, 59s
So let me try to isolate the second stage, as so far the new algorithm seems comparable.

Stage 2 is 1-2s either way, much of which is the linear overhead before the into_par_iter.
So going back to doing all stages together:
with: 43s, 36s, 35s
without: 37s, 37s, 39s

And if I replace into_par_iter with into_iter: 69s, 69s, 72s. So rayon is giving me a 2x improvement in speed, good for a 2-core machine. Is there a programmatic way to determine the number of cores, as opposed to the number of logical processors?
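On the cores question: std today (well after this thread, since Rust 1.59) exposes `std::thread::available_parallelism`, but it reports logical processors; distinguishing physical cores is not in std and needs an external crate such as `num_cpus` (`num_cpus::get_physical()`). A std-only sketch:

```rust
use std::thread;

// available_parallelism() counts logical processors (hyperthreads)
// usable by this process, not physical cores. Physical-core detection
// is not in std; the num_cpus crate offers num_cpus::get_physical().
fn logical_cpus() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}
```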

I don't know why all stages together are faster than just the first stage, but it looks like the version without weight_max is basically comparable. :-) Well done!

@clamydo

clamydo commented Jan 11, 2017

I really dislike the weights mechanism, since it relies on an arbitrary threshold. That makes it meaningless as a measure, since the expensiveness of a task can also vary arbitrarily. So I'm in favor of removing it.

However, I'd like to retain some control over how the work is split. So I'd propose simply introducing an optional way to specify the number of threads used.

Something like .par_iter().jobs(42).whatever().

It is minimally intrusive but offers precise manual control if needed.

EDIT: Just saw that this was considered via sequential_threshold(N) in #81. I agree that is the right way to do this, falling back to the adaptive scheme if not specified.
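For illustration, a std-only sketch (hypothetical; `jobs` is not a rayon API) of the semantics a fixed job-count knob like `.jobs(n)` might have: partition the input into `n` contiguous chunks and run one thread per chunk.

```rust
use std::thread;

// Hypothetical semantics for a `.jobs(n)` knob: split the slice into
// `jobs` contiguous chunks and sum each chunk on its own thread.
fn sum_with_jobs(data: &[u64], jobs: usize) -> u64 {
    assert!(jobs >= 1);
    let chunk_len = (data.len() + jobs - 1) / jobs; // ceiling division
    thread::scope(|s| {
        data.chunks(chunk_len.max(1))
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect::<Vec<_>>() // spawn all threads before joining any
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .sum()
    })
}
```

Note the contrast with an adaptive scheme: a fixed job count cannot rebalance when one chunk turns out much more expensive than the others, which is exactly the skew cuviper asked about above.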

@nikomatsakis
Member Author

I think @cuviper has some thoughts here and wanted to take a crack at this. I'm going to assign it to them and remove the Help-Wanted label for now.
