Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make AvxFftPlanner: Send+Sync #55

Merged
merged 2 commits into from
Jan 27, 2021
Merged

Make AvxFftPlanner: Send+Sync #55

merged 2 commits into from
Jan 27, 2021

Conversation

cassiersg
Copy link
Contributor

Why making FftPlanner: Send+Sync ?

Making FftPlanner: Send+Sync simplifies the usage of a single FftPlanner
in a large chunk of an multi-threaded application, hence benefitting of
the cache across threads.
This allows for instance to put the FftPlanner in a global variable
(e.g. using lazy_static!), which is may be useful even if the
application is actually not multi-threaded.

There is currently no performance cost in making FftPlanner: Send+Sync,
and other parts of the APIs already pay some cost in order to allow
multi-threading (e.g. FftPlanner.plan_ftt that returns an Arc).

Changes

  • make AvxFftPlanner: Send+Sync by adding a Send+Sync bound on the trait AvxPlannerInternalAPI declaration.
  • make FftPlannerScalar: Send+Sync by replacing the internal Rc<Recipe> by an arena (implemented as a simple Vec, but this can be easily changed to any of the arena crates).
  • add a unit test that FftPlanner: Send+Sync

@ejmahler
Copy link
Owner

ejmahler commented Jan 19, 2021

I would have expected that send+sync doesn't matter for the planners, because all of their functions are &mut, so you have to wrap them in a mutex if you're gonna use them across threads.

Could you give an example of how you use this? I'm on board with supporting it, I just want to make sure I understand what the requirement is.

@cassiersg
Copy link
Contributor Author

The main usage is indeed probably to wrap the planner in a Mutex. Let us consider the common pattern of multi-thread ref-counted sharing through Arc<Mutex<FftPlanner>>: std::sync::Mutex has the following impls:

impl<T: ?Sized + Send> Send for Mutex<T>
impl<T: ?Sized + Send> Sync for Mutex<T>

while for std::sync::Arc:

impl<T> Send for Arc<T> where  T: Send + Sync + ?Sized

and you want the Arc to be Sen to be able to send it to another thread.
Another common case would be to put a Mutex<FftPlanner> in a lazy_static, which requires the Mutex to be Sync.

This makes the case for FftPlanner: Send, but not for FftPlanner: Sync. As you mention, there is no current usage to Sync since the methods on FftPlanner use &mut self.
It is possible to remove the impl Sync as an API-stability future-proofing measure (the most robust technique I know of for this is to include a PhantomData<Cell<()>> field in FftPlanner), but I do not see a need for doing that.

@ejmahler
Copy link
Owner

If the use case is putting the planner inside a mutex, that's definitely something I'd like to support. The best way might to guarantee that might be to make a small test in tests/accuracy.rs that just declares FftPlannerScalar, FftPlannerAvx, and FftPlanner inside of mutexes, so that if they aren't send, the test fails to compile.

If there isn't a use case for sync, I'd rather not commit to it right now. I don't know what being send but not sync would even enable the planner to do, but it seems best not to tie our hands if we don't have to.

src/plan.rs Outdated Show resolved Hide resolved
@cassiersg
Copy link
Contributor Author

If the use case is putting the planner inside a mutex, that's definitely something I'd like to support. The best way might to guarantee that might be to make a small test in tests/accuracy.rs that just declares FftPlannerScalar, FftPlannerAvx, and FftPlanner inside of mutexes, so that if they aren't send, the test fails to compile.
This is unfortunately not enough since you can build a Mutex with a !Send type, but it is actually impossible to use it across threads (which is arguably quite useless). However, we can make a test that fails to compile if a type is !Send. I added such a test in plan.rs for FftPlanner.

If there isn't a use case for sync, I'd rather not commit to it right now. I don't know what being send but not sync would even enable the planner to do, but it seems best not to tie our hands if we don't have to.
The main types that are Send but not Sync in the standard library are Cell and RefCell. To prevent a type from implementing Sync, the best (but imo hacky) way I know is to add a field _non_sync: PhantomData<Cell<()>> to the struct. (I do not know if this is a common/recommended pattern in rust...).

src/plan.rs Outdated Show resolved Hide resolved
@ejmahler
Copy link
Owner

I just realized one more thing - one thing we're losing with this new setup is the ability to dbg!() a recipe and get the full tree.

I think we should challenge our assumption that the overhead of Arc is a major component of FFT planning time. If it isn't, we could switch the Rc to an Arc, have a much smaller diff, and still get dbg!() support for debugging plans

@HEnquist
Copy link
Contributor

I just realized one more thing - one thing we're losing with this new setup is the ability to dbg!() a recipe and get the full tree.

I think we should challenge our assumption that the overhead of Arc is a major component of FFT planning time. If it isn't, we could switch the Rc to an Arc, have a much smaller diff, and still get dbg!() support for debugging plans

Oh right, losing the ability of debug printing a full recipe is a major disadvantage! The main idea behind the recipes was to make debugging and testing easier.

@cassiersg
Copy link
Contributor Author

The Arc cost is probably not too annoying. At least, it looks to me that since the actual Fft contains as many Arc's as the recipe does, putting Arc's everywhere in the Recipe is at most doubling the Fft construction cost.

I do not know the development history and I don't know the kind of issues that have to be often debugged, but IMHO the new Recipe code has the advantage of making explicit the invariant that there is only one Recipe for each len. That Recipes are explicitly composing Fft's of smaller len makes the Recipe non-recursive. This makes Recipe's correctness a much more local property: it should be correctly composing smaller len's. The correctness of the full FFt recursive build naturally results from all Recipes being locally correct.
An example in the way this simplifies the code is that we don't need the Recipe::len function anymore, which was somewhat redundant information.

Nevertheless, as I'm not very knowledgeable in this crate, I you are convinced that Arc'ing the Recipe's is the way to go, I'll be happy implement this in the PR.

@ejmahler
Copy link
Owner

You make a good point about the invariant. Let me think about it.

@ejmahler
Copy link
Owner

After thinking about, I realized something that will factor in: If you take a look at the "estimating planner" PR, it adds a cost() method to Recipe - the idea being that you can hypothetically construct several recipes and compare their costs, and choose the recipe with the lowest cost.

If we compare how easy that will be to implement with the index approach vs the Arc approach, i think the Arc approach will be much cleaner.

Based on that, I think we should go with the Arc approach.

@ejmahler
Copy link
Owner

The invariant is a problem, so i'm glad you pointed it out, and we should make sure to fix it - but that fix doesn't have to be a part of this PR

This is to make FftPlannerScalar: Send
This is done by making AvxPlannerInternalAPI: Send, which adds no
effective additional constraint since the only implementer of this trait
is AvxPlannerInternal, which is Send.

A test is added to ensure that FftPlanner: Send.

Why making FftPlanner: Send?
----------------------------------

Making FftPlanner: Send simplifies the usage of a single FftPlanner
in a large chunk of an multi-threaded application, hence benefitting of
the cache across threads.
This allows for instance to use the FftPlanner in a Mutex, or in a
global variable (e.g. using lazy_static!), which is may be useful even
if the application is actually not multi-threaded.

There is currently no performance cost in making FftPlanner: Send,
and other parts of the APIs already pay some cost in order to allow
multi-threading (e.g. FftPlanner.plan_ftt that returns an Arc).
@cassiersg
Copy link
Contributor Author

@ejmahler This PR now implements the Arc strategy, which is indeed trivial. Let's take the easy path first.

I did not look at the "estimating planner" PR in depth, but as long as the invariant is valid, the len-as-id solution should work: you build multiple Recipe's compute their costs, and store in the HashMap only the one with the lower cost. The invariant means that you will later always want to use that one (and not the more costly one that you can destroy immediately).

For future reference, here is the commit with the approach based on using the len as the identifiers and storing all Recipe in an HashMap.

@ejmahler ejmahler merged commit ad434ec into ejmahler:master Jan 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants