Make AvxFftPlanner: Send+Sync #55

cassiersg · 2021-01-19T16:47:04Z

Why making FftPlanner: Send+Sync ?

Making FftPlanner: Send+Sync simplifies the usage of a single FftPlanner
in a large chunk of an multi-threaded application, hence benefitting of
the cache across threads.
This allows for instance to put the FftPlanner in a global variable
(e.g. using lazy_static!), which is may be useful even if the
application is actually not multi-threaded.

There is currently no performance cost in making FftPlanner: Send+Sync,
and other parts of the APIs already pay some cost in order to allow
multi-threading (e.g. FftPlanner.plan_ftt that returns an Arc).

Changes

make AvxFftPlanner: Send+Sync by adding a Send+Sync bound on the trait AvxPlannerInternalAPI declaration.
make FftPlannerScalar: Send+Sync by replacing the internal Rc<Recipe> by an arena (implemented as a simple Vec, but this can be easily changed to any of the arena crates).
add a unit test that FftPlanner: Send+Sync

ejmahler · 2021-01-19T20:17:15Z

I would have expected that send+sync doesn't matter for the planners, because all of their functions are &mut, so you have to wrap them in a mutex if you're gonna use them across threads.

Could you give an example of how you use this? I'm on board with supporting it, I just want to make sure I understand what the requirement is.

cassiersg · 2021-01-19T20:56:32Z

The main usage is indeed probably to wrap the planner in a Mutex. Let us consider the common pattern of multi-thread ref-counted sharing through Arc<Mutex<FftPlanner>>: std::sync::Mutex has the following impls:

impl<T: ?Sized + Send> Send for Mutex<T>
impl<T: ?Sized + Send> Sync for Mutex<T>

while for std::sync::Arc:

impl<T> Send for Arc<T> where  T: Send + Sync + ?Sized

and you want the Arc to be Sen to be able to send it to another thread.
Another common case would be to put a Mutex<FftPlanner> in a lazy_static, which requires the Mutex to be Sync.

This makes the case for FftPlanner: Send, but not for FftPlanner: Sync. As you mention, there is no current usage to Sync since the methods on FftPlanner use &mut self.
It is possible to remove the impl Sync as an API-stability future-proofing measure (the most robust technique I know of for this is to include a PhantomData<Cell<()>> field in FftPlanner), but I do not see a need for doing that.

ejmahler · 2021-01-19T21:58:25Z

If the use case is putting the planner inside a mutex, that's definitely something I'd like to support. The best way might to guarantee that might be to make a small test in tests/accuracy.rs that just declares FftPlannerScalar, FftPlannerAvx, and FftPlanner inside of mutexes, so that if they aren't send, the test fails to compile.

If there isn't a use case for sync, I'd rather not commit to it right now. I don't know what being send but not sync would even enable the planner to do, but it seems best not to tie our hands if we don't have to.

src/plan.rs

cassiersg · 2021-01-20T23:23:24Z

If the use case is putting the planner inside a mutex, that's definitely something I'd like to support. The best way might to guarantee that might be to make a small test in tests/accuracy.rs that just declares FftPlannerScalar, FftPlannerAvx, and FftPlanner inside of mutexes, so that if they aren't send, the test fails to compile.
This is unfortunately not enough since you can build a Mutex with a !Send type, but it is actually impossible to use it across threads (which is arguably quite useless). However, we can make a test that fails to compile if a type is !Send. I added such a test in plan.rs for FftPlanner.

If there isn't a use case for sync, I'd rather not commit to it right now. I don't know what being send but not sync would even enable the planner to do, but it seems best not to tie our hands if we don't have to.
The main types that are Send but not Sync in the standard library are Cell and RefCell. To prevent a type from implementing Sync, the best (but imo hacky) way I know is to add a field _non_sync: PhantomData<Cell<()>> to the struct. (I do not know if this is a common/recommended pattern in rust...).

src/plan.rs

ejmahler · 2021-01-22T20:52:29Z

I just realized one more thing - one thing we're losing with this new setup is the ability to dbg!() a recipe and get the full tree.

I think we should challenge our assumption that the overhead of Arc is a major component of FFT planning time. If it isn't, we could switch the Rc to an Arc, have a much smaller diff, and still get dbg!() support for debugging plans

HEnquist · 2021-01-22T20:59:59Z

I just realized one more thing - one thing we're losing with this new setup is the ability to dbg!() a recipe and get the full tree.

I think we should challenge our assumption that the overhead of Arc is a major component of FFT planning time. If it isn't, we could switch the Rc to an Arc, have a much smaller diff, and still get dbg!() support for debugging plans

Oh right, losing the ability of debug printing a full recipe is a major disadvantage! The main idea behind the recipes was to make debugging and testing easier.

cassiersg · 2021-01-22T22:22:49Z

The Arc cost is probably not too annoying. At least, it looks to me that since the actual Fft contains as many Arc's as the recipe does, putting Arc's everywhere in the Recipe is at most doubling the Fft construction cost.

I do not know the development history and I don't know the kind of issues that have to be often debugged, but IMHO the new Recipe code has the advantage of making explicit the invariant that there is only one Recipe for each len. That Recipes are explicitly composing Fft's of smaller len makes the Recipe non-recursive. This makes Recipe's correctness a much more local property: it should be correctly composing smaller len's. The correctness of the full FFt recursive build naturally results from all Recipes being locally correct.
An example in the way this simplifies the code is that we don't need the Recipe::len function anymore, which was somewhat redundant information.

Nevertheless, as I'm not very knowledgeable in this crate, I you are convinced that Arc'ing the Recipe's is the way to go, I'll be happy implement this in the PR.

ejmahler · 2021-01-22T22:54:10Z

You make a good point about the invariant. Let me think about it.

ejmahler · 2021-01-25T03:05:49Z

After thinking about, I realized something that will factor in: If you take a look at the "estimating planner" PR, it adds a cost() method to Recipe - the idea being that you can hypothetically construct several recipes and compare their costs, and choose the recipe with the lowest cost.

If we compare how easy that will be to implement with the index approach vs the Arc approach, i think the Arc approach will be much cleaner.

Based on that, I think we should go with the Arc approach.

ejmahler · 2021-01-25T03:08:22Z

The invariant is a problem, so i'm glad you pointed it out, and we should make sure to fix it - but that fix doesn't have to be a part of this PR

This is to make FftPlannerScalar: Send

This is done by making AvxPlannerInternalAPI: Send, which adds no effective additional constraint since the only implementer of this trait is AvxPlannerInternal, which is Send. A test is added to ensure that FftPlanner: Send. Why making FftPlanner: Send? ---------------------------------- Making FftPlanner: Send simplifies the usage of a single FftPlanner in a large chunk of an multi-threaded application, hence benefitting of the cache across threads. This allows for instance to use the FftPlanner in a Mutex, or in a global variable (e.g. using lazy_static!), which is may be useful even if the application is actually not multi-threaded. There is currently no performance cost in making FftPlanner: Send, and other parts of the APIs already pay some cost in order to allow multi-threading (e.g. FftPlanner.plan_ftt that returns an Arc).

cassiersg · 2021-01-26T22:36:06Z

@ejmahler This PR now implements the Arc strategy, which is indeed trivial. Let's take the easy path first.

I did not look at the "estimating planner" PR in depth, but as long as the invariant is valid, the len-as-id solution should work: you build multiple Recipe's compute their costs, and store in the HashMap only the one with the lower cost. The invariant means that you will later always want to use that one (and not the more costly one that you can destroy immediately).

For future reference, here is the commit with the approach based on using the len as the identifiers and storing all Recipe in an HashMap.

ejmahler reviewed Jan 19, 2021

View reviewed changes

src/plan.rs Outdated Show resolved Hide resolved

cassiersg commented Jan 20, 2021

View reviewed changes

src/plan.rs Outdated Show resolved Hide resolved

cassiersg force-pushed the send branch from fdb94a5 to 28cbff5 Compare January 21, 2021 10:58

cassiersg added 2 commits January 26, 2021 23:18

Use Arc instead of Rc for Recipe in FftPlannerScalar

b0225e7

This is to make FftPlannerScalar: Send

cassiersg force-pushed the send branch from 28cbff5 to 83f752e Compare January 26, 2021 22:21

ejmahler merged commit ad434ec into ejmahler:master Jan 27, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make AvxFftPlanner: Send+Sync #55

Make AvxFftPlanner: Send+Sync #55

cassiersg commented Jan 19, 2021

ejmahler commented Jan 19, 2021 •

edited

Loading

cassiersg commented Jan 19, 2021

ejmahler commented Jan 19, 2021

cassiersg commented Jan 20, 2021

ejmahler commented Jan 22, 2021

HEnquist commented Jan 22, 2021

cassiersg commented Jan 22, 2021

ejmahler commented Jan 22, 2021

ejmahler commented Jan 25, 2021

ejmahler commented Jan 25, 2021

cassiersg commented Jan 26, 2021

Make AvxFftPlanner: Send+Sync #55

Make AvxFftPlanner: Send+Sync #55

Conversation

cassiersg commented Jan 19, 2021

Why making FftPlanner: Send+Sync ?

Changes

ejmahler commented Jan 19, 2021 • edited Loading

cassiersg commented Jan 19, 2021

ejmahler commented Jan 19, 2021

cassiersg commented Jan 20, 2021

ejmahler commented Jan 22, 2021

HEnquist commented Jan 22, 2021

cassiersg commented Jan 22, 2021

ejmahler commented Jan 22, 2021

ejmahler commented Jan 25, 2021

ejmahler commented Jan 25, 2021

cassiersg commented Jan 26, 2021

ejmahler commented Jan 19, 2021 •

edited

Loading