<execution>: Parallelize more algorithms #7

BillyONeal · 2019-09-05T19:55:09Z

Some strategies that can be used:

Just call parallel transform:

replace_copy
replace_copy_if
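
As a rough illustration of the "just call transform" idea, here is a minimal sketch. The helper name `replace_copy_via_transform` is made up for this example and is not part of the library; the serial `std::transform` call stands in for the execution-policy overload, which could be substituted because each output element depends only on the matching input element.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not a library function): replace_copy written as a
// per-element transform. No element depends on any other, so the serial
// std::transform below could be swapped for the execution-policy overload.
inline void replace_copy_via_transform(const std::vector<int>& in, std::vector<int>& out,
                                       int old_value, int new_value) {
    assert(out.size() >= in.size()); // precondition: room for every element
    std::transform(in.begin(), in.end(), out.begin(),
                   [=](int x) { return x == old_value ? new_value : x; });
}
```

`replace_copy_if` would look the same with the equality test replaced by a caller-supplied predicate.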

Scans:

copy_if
partition_copy
remove_copy
remove_copy_if
unique
unique_copy
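
One way to read "scans" here, sketched under the assumption that the output position of each kept element is obtained from a prefix sum over keep/drop flags. The helper `copy_if_via_scan` is hypothetical and runs serially; in a parallel version each of the three passes could be split across workers.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not a library function): copy_if as flag, scan, scatter.
template <class Pred>
std::vector<int> copy_if_via_scan(const std::vector<int>& in, Pred pred) {
    const std::size_t n = in.size();

    // Pass 1: flag each element independently (1 = keep, 0 = drop).
    std::vector<std::size_t> keep(n);
    for (std::size_t i = 0; i < n; ++i) {
        keep[i] = pred(in[i]) ? 1 : 0;
    }

    // Pass 2: an exclusive prefix sum turns the flags into output indices.
    std::vector<std::size_t> pos(n);
    std::exclusive_scan(keep.begin(), keep.end(), pos.begin(), std::size_t{0});

    // Pass 3: scatter each kept element to its precomputed slot.
    const std::size_t total = (n == 0) ? 0 : pos[n - 1] + keep[n - 1];
    std::vector<int> out(total);
    for (std::size_t i = 0; i < n; ++i) {
        if (keep[i] != 0) {
            assert(pos[i] < total);
            out[pos[i]] = in[i];
        }
    }
    return out;
}
```

Because the indices come from the scan, relative order is preserved, which is what `copy_if` and `remove_copy_if` require.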

Same as serial nth_element but call the parallel partition op for large N:

nth_element
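
A minimal quickselect sketch of that shape, assuming a plain middle-element pivot rather than whatever pivot selection a real implementation uses. The helper `nth_element_via_partition` is hypothetical; the two `std::partition` calls are the steps that could be replaced by the parallel partition for large ranges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a library function): quickselect built on std::partition.
inline void nth_element_via_partition(std::vector<int>& v, std::size_t n) {
    assert(n < v.size());
    std::size_t lo = 0;
    std::size_t hi = v.size();
    while (hi - lo > 1) {
        const int pivot = v[lo + (hi - lo) / 2];
        int* const first = v.data() + lo;
        int* const last = v.data() + hi;
        // Three-way split: [less than pivot | equal to pivot | greater than pivot].
        int* const mid1 = std::partition(first, last, [=](int x) { return x < pivot; });
        int* const mid2 = std::partition(mid1, last, [=](int x) { return !(pivot < x); });
        const std::size_t i1 = static_cast<std::size_t>(mid1 - v.data());
        const std::size_t i2 = static_cast<std::size_t>(mid2 - v.data());
        if (n < i1) {
            hi = i1;  // target is in the "less" part
        } else if (n < i2) {
            return;   // target falls in the run equal to the pivot: done
        } else {
            lo = i2;  // target is in the "greater" part
        }
    }
}
```

The pivot itself always lands in the middle run, so each iteration strictly shrinks the range and the loop terminates.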

Predicate tests (like all_of):

lexicographical_compare

Summary statistics (like find / find_end):

min_element
max_element
minmax_element
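
A sketch of how such a reduction might be structured, assuming the range is cut into fixed-size chunks whose local results are combined in chunk order. The helper `chunked_min_index` is hypothetical and serial; the per-chunk `std::min_element` calls are the independent work that could run on separate threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a library function): min_element as a chunked reduction.
// Returns the index of the first smallest element.
inline std::size_t chunked_min_index(const std::vector<int>& v, std::size_t chunk_size) {
    assert(!v.empty() && chunk_size > 0);
    std::size_t best = 0;
    for (std::size_t begin = 0; begin < v.size(); begin += chunk_size) {
        const std::size_t end = std::min(begin + chunk_size, v.size());
        // Local result for this chunk; chunks do not depend on one another.
        const int* const local = std::min_element(v.data() + begin, v.data() + end);
        const std::size_t idx = static_cast<std::size_t>(local - v.data());
        // Combine in chunk order. The strict < keeps the earliest minimum on ties,
        // matching what std::min_element returns for the whole range.
        if (v[idx] < v[best]) {
            best = idx;
        }
    }
    return best;
}
```

As with `find`, ties are resolved in favor of the earliest position, so the combine step has to respect chunk order.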

Divide and conquer:

inplace_merge
stable_partition

Divide range1 into chunks, binary search to find matching range2 chunks, scan:

merge
set_symmetric_difference
set_union
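
For `merge`, the chunk-and-binary-search step might look like the sketch below. The helper `chunked_merge` and its fixed `chunk_size` parameter are assumptions for illustration, and the chunks are processed serially here. For plain `merge` each chunk's output offset is just the sum of its two input offsets; the set operations would presumably need a scan over per-chunk output sizes, since those vary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a library function): merge by chunking range1 and
// binary-searching range2 for the matching split points.
inline std::vector<int> chunked_merge(const std::vector<int>& a, const std::vector<int>& b,
                                      std::size_t chunk_size) {
    assert(chunk_size > 0);
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    if (a.empty()) {
        return b;
    }
    std::vector<int> out(a.size() + b.size());

    // Split point in b that pairs with position i in a. Each call is independent.
    const auto split = [&](std::size_t i) -> std::size_t {
        if (i == 0) return 0;
        if (i == a.size()) return b.size();
        return static_cast<std::size_t>(std::lower_bound(b.begin(), b.end(), a[i]) - b.begin());
    };

    for (std::size_t a_begin = 0; a_begin < a.size(); a_begin += chunk_size) {
        const std::size_t a_end = std::min(a_begin + chunk_size, a.size());
        const std::size_t b_begin = split(a_begin);
        const std::size_t b_end = split(a_end);
        // Each chunk writes to a disjoint output slice starting at a_begin + b_begin.
        std::merge(a.data() + a_begin, a.data() + a_end,
                   b.data() + b_begin, b.data() + b_end,
                   out.data() + a_begin + b_begin);
    }
    return out;
}
```

Using `lower_bound` pushes elements of `b` that equal a chunk boundary into the later chunk, so equal elements from `a` still come first, as `std::merge` guarantees.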

Other:

includes


AlexGuteniev · 2020-08-09T16:40:02Z

Should generate and generate_n also be parallelized?

Here's how I try to compensate for the lack of a vectorized transform with unseq:

#include <algorithm>
#include <execution>
#include <vector>

std::vector<int> v_a(16);
std::vector<int> v_b(16);
std::vector<int> v_c(16);

void sum(int* c, int* a, int* b)
{
    std::generate_n(std::execution::unseq, c, 16, [=]() mutable {
        return *(a++) + *(b++);
    });
}

int main()
{
    sum(v_c.data(), v_a.data(), v_b.data());
}

CaseyCarter · 2020-08-09T19:48:22Z

> Should generate and generate_n also be parallelized?

Your sample program is a great demonstration of why generate and generate_n can't be meaningfully parallelized: the function object is almost always stateful, and evaluations almost always order-dependent. I suspect the writer of this program would be very unhappy if the library spawned 16 threads with their own copies of the lambda resulting in all elements of v_c being equal to v_a[0] + v_b[0].
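
The failure mode can be reproduced without any threads. The sketch below is a hypothetical stand-in for a naive parallel `generate_n`, not how any real implementation works: it hands each chunk a fresh copy of the function object, the way separate workers would receive their own copies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a naive parallel generate_n: every chunk gets its
// own copy of gen, so any state captured by value restarts for each chunk.
template <class Gen>
void chunked_generate_n(int* out, std::size_t n, std::size_t chunk_size, Gen gen) {
    assert(chunk_size > 0);
    for (std::size_t begin = 0; begin < n; begin += chunk_size) {
        Gen local = gen; // fresh copy, as a separate worker would have
        const std::size_t count = std::min(chunk_size, n - begin);
        std::generate_n(out + begin, count, local);
    }
}

// The stateful lambda from the sample program, run through the chunked version.
inline std::vector<int> chunked_sum(const std::vector<int>& a, const std::vector<int>& b,
                                    std::size_t chunk_size) {
    assert(a.size() == b.size());
    std::vector<int> c(a.size());
    const int* pa = a.data();
    const int* pb = b.data();
    chunked_generate_n(c.data(), c.size(), chunk_size,
                       [=]() mutable { return *(pa++) + *(pb++); });
    return c;
}
```

With one chunk covering the whole range the result is the expected element-wise sum. With a chunk size of 1, every element comes out as `a[0] + b[0]`, because each copy of the lambda starts from the original pointers.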

AlexGuteniev · 2020-08-10T03:12:15Z

Why does the execution policy parameter even exist for them?

BillyONeal · 2020-08-10T17:42:46Z

> Why does the execution policy parameter even exist for them?

Because the 'design' process for the parallel algorithms was 'if nobody can come up with a reason why it can't be parallelized, without looking at real implementations, in 10 minutes, it gets a parallel one'. Note that partial_sort has such an overload too even though it is a heap algorithm and none of the other heap algorithms got parallel overloads.

AlexGuteniev · 2020-08-10T17:52:48Z

I'm trying to make this work without #pragma in my code:

#include <vector>

std::vector<int> v_a(16);
std::vector<int> v_b(16);
std::vector<int> v_c(16);

void sum(int c[], int a[], int b[])
{
#pragma loop(ivdep)
	for (int i = 0; i < 16; i++)
	{
		c[i] = a[i] + b[i];
	}
}

int main()
{
	sum(v_c.data(), v_a.data(), v_b.data());
}

Note that without the pragma the compiler still emits a SIMD version, but branches to it conditionally at runtime.

Note that this is useless:

void sum(int * c, int* a, int * b)
{
   std::transform(std::execution::unseq, a, a + 16, b, c, std::plus{});
}

As well as this:

void sum(int* c, int* a, int* b)
{
    std::for_each(std::execution::unseq, c, c + 16, [=](int& v) {
        v = a[&v - c] + b[&v - c];
    });
}

How else can I say it?

BillyONeal · 2020-08-10T20:46:33Z

I don't believe we have a way to say that without a pragma at this time. It's really on the optimizer team if they want to consume the unseq signal and they have declined to do so as of yet.

BillyONeal self-assigned this Sep 5, 2019

StephanTLavavej added the enhancement Something can be improved label Sep 5, 2019

StephanTLavavej changed the title ~~Investigate parallelization of more algorithms~~ <execution>: Parallelize more algorithms Sep 5, 2019

StephanTLavavej added the performance Must go faster label Sep 19, 2019

BillyONeal removed their assignment Oct 28, 2019

StephanTLavavej removed the enhancement Something can be improved label Feb 6, 2020

MahmoudGSaleh mentioned this issue Jul 30, 2020

<fstream>: basic_filebuf doesn't comply with setbuf(0,0) requirement in the standard #1113

Open

AlexGuteniev mentioned this issue Feb 20, 2024

Vectorize more algorithms for x86 / x64 using SSE4.2 and/or AVX2 #4415

Open
