Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible anomaly running the mt_word_counter example #199

Closed
marioroy opened this issue Jun 20, 2023 · 4 comments
Closed

Possible anomaly running the mt_word_counter example #199

marioroy opened this issue Jun 20, 2023 · 4 comments

Comments

@marioroy
Copy link

marioroy commented Jun 20, 2023

I created a new gist C++ parallel chunking demonstrations, inspired by your mt_word_counter.cc example.

Once parallel processing is completed, the time to output takes mt_word_counter longer than expected. I'm not sure if this is a bug or nothing more than an anomaly. This anomaly caused me to create another chunking variant to factor out OpenMP. Both omp_word_counter and thr_word_counter work as expected, time wise.

There are less than 8,000 (or 7,652 to be exact) key-value pairs in the word_counts map table.

I also ran on two different Linux distributions.

8 threads: Clear Linux (left), Fedora 38 (right)

$ NUM_THREADS=8 ./mt_word_counter | cksum
# memory consumption ~ 1360 MB
populate arrays        0.993 secs    1.031 secs
compute word counts    5.142 secs    5.601 secs
output results         1.082 secs    1.100 secs
total time             7.218 secs    7.733 secs
2419766184 3124

$ NUM_THREADS=8 ./omp_word_counter | cksum
# memory consumption < 1 MB
compute word counts    4.575 secs    5.000 secs
output results         0.001 secs    0.001 secs
total time             4.577 secs    5.002 secs
2419766184 3124

$ NUM_THREADS=8 ./thr_word_counter | cksum
# memory consumption < 1 MB
compute word counts    4.574 secs    5.019 secs
output results         0.001 secs    0.001 secs
total time             4.576 secs    5.021 secs
2419766184 3124

24 threads: Clear Linux (left), Fedora 38 (right)

$ NUM_THREADS=24 ./mt_word_counter | cksum
# memory consumption ~ 1372 MB
populate arrays        1.082 secs    1.111 secs
compute word counts    2.574 secs    2.874 secs
output results         1.146 secs    1.177 secs
total time             4.803 secs    5.162 secs
2419766184 3124

$ NUM_THREADS=24 ./omp_word_counter | cksum
# memory consumption < 1 MB
compute word counts    2.209 secs    2.573 secs
output results         0.001 secs    0.001 secs
total time             2.210 secs    2.575 secs
2419766184 3124

$ NUM_THREADS=24 ./thr_word_counter | cksum
# memory consumption < 1 MB
compute word counts    2.210 secs    2.571 secs
output results         0.001 secs    0.001 secs
total time             2.212 secs    2.573 secs
2419766184 3124
@marioroy
Copy link
Author

marioroy commented Jun 20, 2023

The following workaround resolves the issue; output completing in 0.001 seconds.

97,98c97,98
<             [&word_counts](std::vector<std::string>&& lines) {
<                 for (auto& line : lines) {
---
>             [&word_counts, &lines_array, i] {
>                 for (auto& line : lines_array[i]) {
121,122c121
<             },
<             std::move(lines_array[i])
---
>             }

Ah, C++ boggles my mind :). Thank you, for the parallel hash map library.

@marioroy
Copy link
Author

marioroy commented Jun 20, 2023

Previously, seeing output in mt_word_counter take over 1 second, for less than 8,000 key-value pairs, puzzled me. This is now resolved in my gist.

8 threads: Clear Linux (left), Fedora 38 (right)

$ NUM_THREADS=8 ./mt_word_counter | cksum
# memory consumption ~ 1360 MB
populate arrays        0.991 secs    1.037 secs
compute word counts    4.847 secs    5.307 secs
output results         0.001 secs    0.001 secs
total time             5.839 secs    6.346 secs
2419766184 3124

$ NUM_THREADS=8 ./omp_word_counter | cksum
# memory consumption < 1 MB
compute word counts    4.575 secs    5.000 secs
output results         0.001 secs    0.001 secs
total time             4.577 secs    5.002 secs
2419766184 3124

$ NUM_THREADS=8 ./thr_word_counter | cksum
# memory consumption < 1 MB
compute word counts    4.574 secs    5.019 secs
output results         0.001 secs    0.001 secs
total time             4.576 secs    5.021 secs
2419766184 3124

24 threads: Clear Linux (left), Fedora 38 (right)

$ NUM_THREADS=24 ./mt_word_counter | cksum
# memory consumption ~ 1372 MB
populate arrays        1.056 secs    1.119 secs
compute word counts    2.285 secs    2.762 secs
output results         0.001 secs    0.001 secs
total time             3.344 secs    3.884 secs
2419766184 3124

$ NUM_THREADS=24 ./omp_word_counter | cksum
# memory consumption < 1 MB
compute word counts    2.209 secs    2.573 secs
output results         0.001 secs    0.001 secs
total time             2.210 secs    2.575 secs
2419766184 3124

$ NUM_THREADS=24 ./thr_word_counter | cksum
# memory consumption < 1 MB
compute word counts    2.210 secs    2.571 secs
output results         0.001 secs    0.001 secs
total time             2.212 secs    2.573 secs
2419766184 3124

@marioroy
Copy link
Author

marioroy commented Jun 20, 2023

I just realized that the individual threads can update a local map. Subsequently, the threads update the shared map after exiting the loop. This is something I tried in order to scale better.

The gist was updated to showcase both parallel_flat_hash_map_m (for shared map) and flat_hash_map (for local map). What I find interesting is not experiencing any extra IO time for the chunking demonstrations when comparing compute.

8 threads: Clear Linux (left), Fedora 38 (right)

$ NUM_THREADS=8 ./mt_word_counter | cksum
# memory consumption ~ 1360 MB
populate arrays        0.983 secs    1.042 secs
compute word counts    1.586 secs    1.924 secs
output results         0.001 secs    0.001 secs
total time             2.571 secs    2.968 secs
2419766184 3124

$ NUM_THREADS=8 ./omp_word_counter | cksum
# memory consumption < 1 MB
compute word counts    1.575 secs    1.799 secs
output results         0.001 secs    0.001 secs
total time             1.577 secs    1.801 secs
2419766184 3124

$ NUM_THREADS=8 ./thr_word_counter | cksum
# memory consumption < 1 MB
compute word counts    1.578 secs    1.779 secs
output results         0.001 secs    0.001 secs
total time             1.580 secs    1.781 secs
2419766184 3124

24 threads: Clear Linux (left), Fedora 38 (right)

$ NUM_THREADS=24 ./mt_word_counter | cksum
# memory consumption ~ 1372 MB
populate arrays        1.056 secs    1.114 secs
compute word counts    0.543 secs    0.623 secs
output results         0.001 secs    0.001 secs
total time             1.601 secs    1.740 secs
2419766184 3124

$ NUM_THREADS=24 ./omp_word_counter | cksum
# memory consumption < 1 MB
compute word counts    0.537 secs    0.606 secs
output results         0.001 secs    0.001 secs
total time             0.538 secs    0.608 secs
2419766184 3124

$ NUM_THREADS=24 ./thr_word_counter | cksum
# memory consumption < 1 MB
compute word counts    0.537 secs    0.597 secs
output results         0.001 secs    0.001 secs
total time             0.539 secs    0.599 secs
2419766184 3124

@marioroy
Copy link
Author

I'm closing the issue. I was able to work around the anomaly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant