Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The use of TBB spawn vs TBB enqueue in MPQC version of Madness code #357

Open
victor-anisimov opened this issue Aug 14, 2020 · 7 comments
Open

Comments

@victor-anisimov
Copy link

victor-anisimov commented Aug 14, 2020

While analyzing the use of TBB in MPQC version of Madness (commit c6ec0e7) code I noticed a difference between that code and the master code in thread.h file. Since MADNESS_CAN_USE_TBB_PRIORITY is undefined in MPQC build the latest (master) copy of Madness has the following piece of code that deals with tasks of different priority in thread.h

            tbb::task::enqueue(*task);

The MPQC version of Madness carries a slightly different code in that part of thread.h

        if(task->is_high_priority()) {
            tbb::task::spawn(*task);
        } else {
            tbb::task::enqueue(*task);
        }

This code obviously works with MPQC. Interestingly, If I replace the above 5 lines with just one line

            tbb::task::enqueue(*task);

MPQC still gives numerically correct results and even runs a bit faster.

What concerns me is the following. If I put

        if(task->is_high_priority())
          tbb::task::enqueue(*task, tbb::priority_high);
        else
            tbb::task::enqueue(*task);

instead of the original piece of code (5 lines) present in MPQC version of Madness, MPQC start producing numerically incorrect results. This is tested on ALCF Theta. I checked. There is nearly equal number of tasks marked with high priority as those not carrying that attribute (hence, having regular priority). Using "tbb::task::enqueue(*task, tbb::priority_high);" appears to be the correct way of handling tasks with high priority. Why use " tbb::task::spawn(*task);" instead of "tbb::task::enqueue(*task, tbb::priority_high);" if the task carries high-priority attribute? The command "tbb::task::spawn(*task);" appears to be not much different from the regular "tbb::task::enqueue(*task);" since using only the latter single line in place of 5 lines works. Perhaps my question should be rephrased - do we have to honor high priority attribute in this part of thread.h code? Would using just one line "tbb::task::enqueue(*task);" and deleting "tbb::task::spawn(*task);" be correct in MPQC version of Madness?

@evaleev
Copy link
Contributor

evaleev commented Aug 15, 2020

@victor-anisimov ideally we do need to have the ability to control task priority. PThread-based thread-pool implementation certainly does. The reason tbb::task::spawn was removed due to the apparent inability to guarantee forward progress under some circumstances ... see the commit msg of 6717bd3 ... by default the TBB task priority is now disabled to avoid these issues. but can be enabled manually.

P.S. I'm surprised that MPQC uses thread.h that is different from master ... really??? MPQC trunk follows MADNESS trunk quite closely.

@victor-anisimov
Copy link
Author

@evaleev Thank you for quick response! It is possible that I'm using not the latest version of MPQC/TiledArray.

What puzzles me is that running tasks, which have high-priority attribute on, by using function "tbb::task::enqueue(*task)", which ignores priority, gives correct numerical results in MPQC; however, using "tbb::task::enqueue(*task, tbb::priority_high)" to run tasks of high priority breaks the numerical correctness of CCSD job. The function "tbb::task::enqueue(*task)" should be interchangeable with function "tbb::task::enqueue(*task, tbb::priority_high)" but it does not appear so. Could it be that the attribute tbb::priority_high is not correctly set everywhere or function "tbb::task::enqueue(*task, tbb::priority_high)" simply fails to run a task when high-priority attribute is on? What would be the right approach to track the execution of high-priority tasks?

@robertjharrison
Copy link
Contributor

robertjharrison commented Aug 18, 2020 via email

@justusc
Copy link
Member

justusc commented Aug 18, 2020

@victor-anisimov High priority tasks in TiledArray are almost exclusively communication tasks. Everything else should be normal priority. There may be some changes that have been introduced since I worked on it, but that was my general approach. There may also be other components in MPQC that use high priority tasks.

As for the differences in numerical precision, TA was designed to eliminate barriers in tensor contractions. As a result, the order of the tile level contractions is not guaranteed. Thus you can get some numerical noise in your output from run to run. This is an unfortunate side effect of the break in order dependencies that were required to achieve high performance. This is often disconcerting to some people as they expect the output to be 100% consistent from run to run. However, this is no different than the effect you might see from changing the order of evaluation in a deterministic system. It can also be quite useful in showing you the degree that numerical noise plays in your result. Changing the task submission so that everything goes through the shared queue is probably making the order of evaluation more stable, thus your numerical results are more stable.

Having said that, it is entirely possible that there is a race condition that is causing the error in your result. Here again, submitting all tasks though the shared queue could be mask a race condition.

@evaleev It is possible to introduce a race condition with TA expressions. Specifically when using the in-place updates (for example += operations). There is some measure of protection against this for a series of in place updates where the tensor receiving the in place updates is on the left-hand side of the assignment operator. However, you can introduce a race condition if the tensor receiving the in place updates appears on the right hand side of the assignment. There needs to be a barrier in that scenario.

As for the deadlock problem, the most likely cause is that Future::get() is called inside a task function. Submitting tasks through the shared queue can certainly mask this kind of error.

@evaleev
Copy link
Contributor

evaleev commented Aug 18, 2020

@victor-anisimov could you clarify what you mean when you say "MPQC start producing numerically incorrect results" with high-priority TBB tasks? please provide more details. it could be just the effects of change in task schedule, but typically we see such results to be negligibly small.

@robertjharrison
Copy link
Contributor

robertjharrison commented Aug 18, 2020 via email

@victor-anisimov
Copy link
Author

Thank you for your input @justusc @evaleev @robertjharrison! I was wrong referring to numerical errors that I thought I had after enabling task priority

    if(task->is_high_priority())
        tbb::task::enqueue(*task, tbb::priority_high);
    else
        tbb::task::enqueue(*task);

What happens is that MPQC CCSD becomes very slow and the execution timeouts, but the CCSD energies are actually unaffected. I mistakenly attributed the code crash to numerical errors.

It appears that "tbb::task::enqueue(*task, tbb::priority_high)" does what it is designed for but it cannot be used with the current state of TiledArray/MPQC code.

Do we need anything from Intel TBB developers to improve handling of high-priority tasks? If so, supplying them with a stand-alone test to illustrate the issue would be helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants