Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate where netcdf-c breaks thread safety #43

Closed
magnusuMET opened this issue Dec 18, 2019 · 22 comments
Closed

Investigate where netcdf-c breaks thread safety #43

magnusuMET opened this issue Dec 18, 2019 · 22 comments

Comments

@magnusuMET
Copy link
Member

magnusuMET commented Dec 18, 2019

The global mutex is in no way ideal when reading/writing from multiple threads, and could be split into several (global/per file), or replaced by RWlocks.

This requires investigation into where netcdf does something thread unsafe, and limit the locking to this part. Should also investigate where HDF5 might be problematic

One could also integrate bindings to https://github.com/Parallel-NetCDF/PnetCDF, but this limits the formats to CLASSIC

Unidata/netcdf-c#1373

@magnusuMET
Copy link
Member Author

Constraints from Rust should also be taken into account when investigating (multiple reader, no writers), which could allow relaxing of the locking

@magnusuMET
Copy link
Member Author

magnusuMET commented Dec 23, 2019

There is currently no locking (in netcdf-c) of the global NCList, which contains the file-handles. One could either lock when opening/closing files, or try to upstream a RwLock protecting this structure.

A newer version of netcdf-c has the nc_initialize function which initializes all the dispatchers. This could be called once on initializing the library.

@magnusuMET
Copy link
Member Author

Caching of variables and attributes requires a separate lock when reading/writing variables/groups.

@gauteh
Copy link

gauteh commented Dec 30, 2019

Continuing from #42 which seems to overlap with this one. HDF5 does support single-writer / multiple-readers (e.g. http://docs.h5py.org/en/stable/swmr.html). So maybe this is something that could be supported on newer HDF5 based netcdf files. I saw somewhere that using hdf5 to write, and especially adding new variables, to a netcdf file is likely to make the file unreadable by netcdf. But reading should be fine.

@magnusuMET
Copy link
Member Author

Multiple readers should be supported out of the box, but some initialisation might be required on first access of the variable. I think we can have a lock surrounding opening/closing files and smaller locks on variable access

@magnusuMET
Copy link
Member Author

@gauteh Do you have a suitable benchmark we could use for testing parallell performance?

@gauteh
Copy link

gauteh commented Dec 30, 2019 via email

@gauteh
Copy link

gauteh commented Dec 30, 2019 via email

@magnusuMET
Copy link
Member Author

Are we talking about concurrent access from threads or from processes?

I believe we are mostly interested in threads

the difference between this issue and the parallel read issue

The parallel access requires an MPI communicator, and parallell instances of the same program. This is not exposed at all in the current form of the library

@gauteh
Copy link

gauteh commented Dec 30, 2019 via email

@magnusuMET
Copy link
Member Author

And to same file struct? Or one struct per thread to same netcdf file?

I am not really sure whether Variable should be Send/Sync yet. I'll need to dig into netcdf-c and see.

@gauteh
Copy link

gauteh commented Jan 7, 2020

Tried making a small example now using latest master. I'm not able to pass a Arc<netcdf::File> between threads anymore due to UnsafeCell on groups: https://github.com/gauteh/rust-netcdf/blob/thread-perf/tests/concurrency.rs

I also noticed you are working on some big changes in #51, which is probably good to get in before resolving this..?

   Compiling netcdf v0.3.1 (/home/gauteh/dev/rust-netcdf)
error[E0277]: `std::rc::Rc<std::cell::UnsafeCell<netcdf::group::Group>>` cannot be sent between threads safely
  --> tests/concurrency.rs:18:10
   |
18 |     pool.scope(move |s| {
   |          ^^^^^ `std::rc::Rc<std::cell::UnsafeCell<netcdf::group::Group>>` cannot be sent between threads safely
   |
   = help: within `netcdf::file::ReadOnlyFile`, the trait `std::marker::Send` is not implemented for `std::rc::Rc<std::cell::UnsafeCell<netcdf::group::Group>>`
   = note: required because it appears within the type `netcdf::file::File`
   = note: required because it appears within the type `netcdf::file::ReadOnlyFile`
   = note: required because of the requirements on the impl of `std::marker::Send` for `std::sync::Arc<netcdf::file::ReadOnlyFile>`
   = note: required because it appears within the type `[closure@tests/concurrency.rs:18:16: 31:6 f:std::sync::Arc<netcdf::file::ReadOnlyFile>]`

error[E0277]: `std::rc::Rc<std::cell::UnsafeCell<netcdf::group::Group>>` cannot be shared between threads safely
  --> tests/concurrency.rs:18:10
   |
18 |     pool.scope(move |s| {
   |          ^^^^^ `std::rc::Rc<std::cell::UnsafeCell<netcdf::group::Group>>` cannot be shared between threads safely
   |
   = help: within `netcdf::file::ReadOnlyFile`, the trait `std::marker::Sync` is not implemented for `std::rc::Rc<std::cell::UnsafeCell<netcdf::group::Group>>`
   = note: required because it appears within the type `netcdf::file::File`
   = note: required because it appears within the type `netcdf::file::ReadOnlyFile`
   = note: required because of the requirements on the impl of `std::marker::Send` for `std::sync::Arc<netcdf::file::ReadOnlyFile>`
   = note: required because it appears within the type `[closure@tests/concurrency.rs:18:16: 31:6 f:std::sync::Arc<netcdf::file::ReadOnlyFile>]`

error: aborting due to 2 previous errors

@magnusuMET
Copy link
Member Author

Yeah, I've pretty much redone entirely how this crate parses the netcdf file, which will remove the UnsafeCell and pull the crate to a one-to-one mapping towards netcdf. I don't think there is any strong opposition against merging, so I'll go ahead and do just that.

@gauteh
Copy link

gauteh commented Jan 7, 2020

Great!

@magnusuMET
Copy link
Member Author

Some observations from using a non-threadsafe HDF5 library: Segfaults and aborts, even when working on unrelated netCDF files. HDF5 should be considered highly unsafe for multithreading.

@gauteh
Copy link

gauteh commented Jan 14, 2020

I guess that settles the discussion on HDF5. I asked developers of Hyrax (an official opendap server) about their HDF5 interface. It seems to be more thread-safe, but I am not sure if this refers to the HDF5 reading library or possible an interface in-between. If somehow HDF5 can be made thread-safe w.r.t. at least reading that will put the performance much higher than netcdf-libs in other languages (depending on use case of course).

OPENDAP/hdf5_handler#13

@lwandrebeck
Copy link
Contributor

Maybe using https://github.com/aldanor/hdf5-rust could help ? Note that I haven’t used that crate, but readme says « provides thread-safe Rust bindings and high-level wrappers for the HDF5 library API. »
HTH.

@gauteh
Copy link

gauteh commented Jan 14, 2020

Maybe using https://github.com/aldanor/hdf5-rust could help ? Note that I haven’t used that crate, but readme says « provides thread-safe Rust bindings and high-level wrappers for the HDF5 library API. »
HTH.

It also relies on the official HDF5 library, but provides thread-safety through a global lock. This means that a single process can only sequentially use any of the (unsafe) functions from HDF5.

@magnusuMET
Copy link
Member Author

magnusuMET commented Jan 14, 2020

We have https://crates.io/crates/hdf5file, but quite a lot of work is neccessary to get this up to spec, and gain performance relative to a linked thread-safe hdf5

@magnusuMET
Copy link
Member Author

For CDF-1,2,5 we could realistically create a safe dispatcher, I already have a CDF-parser written in nom

@magnusuMET
Copy link
Member Author

Just to demonstrate the crash in netcdf (I have yet to repeat this in raw hdf): https://gist.github.com/magnusuMET/28a7991db0fcb5392b56573837aa7289

@magnusuMET
Copy link
Member Author

netcdf-c does not make any reasonable guarantee regarding thread safety, so a global mutex is needed. If multi-reading is necessary, consider using MPI (requires some additional support in this library), or copy the approach of dars

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants