[Feature Request] DrawingAreas should be threadsafe. #56
Ideally this feature should work with #54. Now that I think about it, this might be really hard to implement efficiently, since it would probably require either:
... Alright, with both issues in mind, it might be worthwhile to do a performance analysis to see whether having multiple owned backing arrays (and copying / blitting when merging) is "on average better" than avoiding multi-threaded access altogether (but reducing allocations & copying).
Hi, that's a really nice suggestion and I think it would be really good to have. In particular, I think splitting the buffer based on drawing region may have the following issues:

1. As you mentioned, the memory region may not be consecutive. For example, if we have a region that handles from (100, 100) to (200, 200): since the image buffer is stored as a 2D array, it's not easy to take a reference like that.
2. The drawing areas may overlap, so there's no thread-safety guarantee.

Also, if we synchronize the access per pixel, that will be very expensive. So what I am thinking is: for sure we need to synchronize the threads that manipulate the image buffer. What we can improve is to use a buffer that caches the modifications in a drawing area, and later we acquire the lock once and put all the pixels in at the same time. In fact, I am wondering whether we could have a bitmap element. As long as it reduces the number of synchronized operations, it might have OK performance. Ideally, we should have a pixel-granularity lock, but that seems hard to build on top of the image crate. So, any thoughts?
I believe the Rayon crate has some of the needed functionality.
It really depends on what we want to do. If we just do the bitmap element, which allows background rendering and then pasting to the foreground, then not much needs to change. However, if we want to make a real thread-safety guarantee, there are many more things to think about.
I think how to implement this depends a bit on the overarching goals to address. In this scope, mine would be, in order of priority:
From that angle I would want:
This sounds like an interesting idea to hack on with different image sizes / graphs, to see how much time plotting usually takes today and how long image assembly would take, to get a ballpark figure.
Alright, I did a small experiment:
With grid
Thoughts:
Same experiment without the grid
Without the grid, 320x240
Even then, drawing into separate buffers and blitting once done seems totally worth it. However, it would be nice not to pay the extra cost of blitting + allocation unless I actually want to pay it, using the threading library of my choice.
Opened a new issue tracking that: #58. But your benchmark seems to be very interesting. Do you know any details about this? And yes, we can actually do something for parallel rendering just for BitMapBackend. We probably would have an API for BitMapBackend that allows splitting a backend object into a few different ones, each of which can draw independently. (But only split vertically?) Thoughts?
Hi @ralfbiedert, I just implemented the BitMapElement and tried your benchmark quickly; it sees a speedup, but lower than estimated. My guess is the thread synchronization overhead.

Check the dev branch if you want to try this. This is one of the new benchmarks I added:

```rust
#[bench]
fn sine_640_480_2_horizontal_subgraphs_draw_and_blit(b: &mut Bencher) {
    let dim = (640 as u32, 360 as u32);
    let mut buffer = vec![0; (dim.0 * dim.1 * 3) as usize];
    let mut root = BitMapBackend::with_buffer(&mut buffer, dim).into_drawing_area();
    let (mut left, mut right) = root.split_horizentally(320);
    b.iter(|| {
        let plots: Vec<_> = [left.dim_in_pixel(), right.dim_in_pixel()]
            .par_iter()
            .map(|(w, h)| {
                let mut element = plotters::element::BitMapElement::new((0, 0), (*w, *h));
                {
                    let left = element.as_bitmap_backend().into_drawing_area();
                    left.fill(&WHITE).unwrap();
                    let mut chart = ChartBuilder::on(&left)
                        .caption("y=x^2", ("Arial", 50).into_font())
                        .margin(5)
                        .x_label_area_size(30)
                        .y_label_area_size(30)
                        .build_ranged(-1f32..1f32, -0.1f32..1f32)
                        .unwrap();
                    chart.configure_mesh().draw().unwrap();
                    chart
                        .draw_series(LineSeries::new(
                            (-50..=50).map(|x| x as f32 / 50.0).map(|x| (x, x * x)),
                            &RED,
                        ))
                        .unwrap()
                        .label("y = x^2")
                        .legend(|(x, y)| Path::new(vec![(x, y), (x + 20, y)], &RED));
                    chart
                        .configure_series_labels()
                        .background_style(&WHITE.mix(0.8))
                        .border_style(&BLACK)
                        .draw()
                        .unwrap();
                }
                element
            })
            .collect();
        left.draw(&plots[0]).unwrap();
        right.draw(&plots[1]).unwrap();
        root.get_base_pixel()
    }); // `black_box` prevents `f` from being optimized away.
}
```
Hello @ralfbiedert, after working on this for a while, I just want to let you know some updates:

For the last update, I think the major problem is that partially mutably borrowing a vector seems to be unsafe anyway. As a crate maintainer, I think it's important not to leak the unsafe behavior into safe Rust code. So what I am thinking of is something like the code in my benchmark:

```rust
let mut buffer = vec![0u8; (W * H * 3) as usize];
let (upper, lower) = unsafe {
    let upper_addr = &mut buffer[0] as *mut u8;
    let lower_addr = &mut buffer[(W * H * 3 / 2) as usize] as *mut u8;
    (
        std::slice::from_raw_parts_mut(upper_addr, (W * H * 3 / 2) as usize),
        std::slice::from_raw_parts_mut(lower_addr, (W * H * 3 / 2) as usize),
    )
};
[upper, lower]
    .par_iter_mut()
    .for_each(|b| draw_plot(&BitMapBackend::with_buffer(*b, (W, H / 2)).into_drawing_area(), 2.0));
```

So would you mind sharing your thoughts on this? Thanks!
I just measured with my own benchmarks again, and can confirm drawing the graphs went from:
to
My actual app went from 3% to 1% CPU usage! This is awesome!
Although I think an "any layout" parallel story would be nice, I can see the overhead this might bring. Thanks for investigating!
I submitted a PR; I think this can be simplified to:

```rust
let (upper, lower) = buffer.split_at_mut((W * H * 3 / 2) as usize);
```
Glad you confirmed the performance improvement.
Thanks for the suggestion. Your PR just reminded me: I wasn't aware of `split_at_mut`. So I think my argument about introducing unsafe behavior into safe code doesn't apply here. In that case, I am really happy to have the feature.

Update: I worked on this. If you would like to try it, just pull the latest dev branch. The code looks like this:

```rust
let mut back = BitMapBackend::with_buffer(&mut buffer, (W, H));
back.split(&[H / 2])
    .into_par_iter()
    .for_each(|b| draw_plot(&b.into_drawing_area(), 2.0));
```
It seems we have investigated and implemented the things suggested under this issue. Closing it for now. Feel free to reopen it if there's anything else. Thanks!
Background
Similar to #55. In addition, it would be nice to make use of multiple cores when rendering real time sub-graphs.
When doing:
It would be nice if `upper` and `lower` could be used from separate threads to render both sub-graphs in parallel, since they should not share any overlapping area. See above: the goal is to improve multi-core utilization when doing software rendering of real-time graphs.
Additional Information
Plotters should not do the multithreading itself and should remain lightweight. Instead, as the API user, I want to be in charge of which threading library I use, e.g., when rendering sub-graphs.