Remove a few parking_lot dependencies for faster builds #969

jplatte · 2022-08-23T09:27:36Z

Unfortunately we still always pull in parking_lot 0.11 (sled, wasm-timer) which I suspect is not really needed, plus parking_lot 0.12 (but only on target_os = "linux" when checking / building benchmarks). That is, the overall reduction in dependencies is very small, but at least the parking_lot 0.12 dependency for wasm and macos through the examples is gone, so it should speed up those CI jobs a little bit.

gnunicorn

Has the std::sync::RwLock cought up to the speed and safety of parking_lot now? My understanding was that parking_lot is quite a bit faster at runtime and has certain checks (as in for deadlocks) that it can even do at compile time. (Honestly, I have never checked that myself, I was only told that). I am not convinced I want to trade in the compile time improvement over loosing the ability to check for deadlocks at compile time.

Either way, even if we decide to go this way, please let's refrain from doing unwrap() in any non-example non-test-code but always use expect(MSG) (ideally with a unique MSG) instead, stating the specific assumptions that are made why this should never fail. Reason being that these assumption might be broken by later changes, and if all you get is an unwrap-panic it is hard to figure out where that is (in particular in FFI) and why that breaks. A unique message explaining the underlying assumption is easier to find in the source code and understand why you might have violated that with your changes now.

codecov · 2022-08-23T09:51:53Z

Codecov Report

Merging #969 (75812ef) into main (9462061) will not change coverage.
The diff coverage is n/a.

❗ Current head 75812ef differs from pull request most recent head 0fdd4fc. Consider uploading reports for the commit 0fdd4fc to get more accurate results

@@           Coverage Diff           @@
##             main     #969   +/-   ##
=======================================
  Coverage   80.35%   80.35%           
=======================================
  Files         110      110           
  Lines       15809    15809           
=======================================
  Hits        12704    12704           
  Misses       3105     3105

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

jplatte · 2022-08-23T10:00:35Z

Has the std::sync::RwLock cought up to the speed and safety of parking_lot now?

Yes: https://blog.rust-lang.org/2022/06/30/Rust-1.62.0.html#thinner-faster-mutexes-on-linux

My understanding was that parking_lot is quite a bit faster at runtime and has certain checks (as in deadlocks) that it can prevent even at compile time from (honestly, I have never checked that, I was only told that). I am not convinced I want to trade in the compile time improvement over loosing the ability to check for deadlocks at compile time.

There's no extra compile-time guarantees AFAIK, the docs mention a few differences of which the most important one is probably that parking_lots Mutex doesn't have poisoning, i.e. when a thread panics while holding a mutex / rwlock guard, the mutex / rwlock will simply be unlocked.

Either way, even if we decide to go this way, please let's refrain from doing unwrap() in any non-example non-test-code but always use expect(MSG) (ideally with a unique MSG) instead, stating the specific assumptions that are made why this should never fail. Reason being that these assumption might be broken by later changes, and if all you get is an unwrap-panic it is hard to figure out where that is (in particular in FFI) and why that breaks. A unique message explaining the underlying assumption is easier to find in the source code and understand why you might have violated that with your changes now.

I disagree for this case, using .expect("RwLock isn't poisoned") is pointless noise IMO (and I'm not alone with that opinion). lock / read / write returning a Result instead of panicking when the underlying mutex / rwlock is poisoned is regarded by some parts of the community as one of the standard libraries biggest mistakes, as the vast majority of calls to these functions are followed by .unwrap(), and not just in sloppily-written code.

Hywan

Do you think using some expect(msg) instead of all those unwrap would help the user to diagnostic the error faster?

jplatte · 2022-08-23T13:02:45Z

No, because there is no useful detail we can add. .unwrap() will already print ~~"poisoned lock: another task failed inside"~~ PoisonError and we won't know anything extra.

gnunicorn · 2022-08-23T13:07:18Z

Fair enough on replacing parking_lot, but on the unwrap vs. expect you didn't address my main concern (and neither does the opinion you linked to): debugability. Afaics, in the ffi-scenario with a release-build lib, we won't get a nice source-line reference unwrapping of the panic, we will only see the debug print version of the error message (which just says PoinsonError on a stack-trace that doesn't even tell us which fn unwrap was called, nor which lock this is about). Instead if we had an .expect("Login can acquire client lock"), there is a unique error message that is easy to identify and start from.

using .expect("RwLock isn't poisoned") is pointless noise

Yes, totally agreed. But that's not what I asked for (nor the "" or "Unwrap failed" variants that where hinted at in the linked blog post). I was referring to actually descriptive and ideally unique pointers that guide the person reading the error to figure out what went wrong since they changed stuff. Ideally these errors never pop up, no question, but when they do they are hard to figure out.

In the current code base we are even holding a lock around callback we do outside of our own codebase into the external environment (the delegates), which means the thread failure might not even be in our side of what is called, leaving a poisened lock.

"Login can acquire client lock" makes it a lot more obvious which lock was poisoned and where to search for the problem.

Within a pure-rust environment I can accept the argument that the stack trace will tell you enough - though here we often return anyhow::Result<()> and could just do ? rather than panicking. But if the target is a staticlibs and/or embedded in other environments (like also wasm), you often have way to little to go with when a panic kills the process, and we get blamed if even the calling environment is at fault. This little context can get you a long way, when trying to figure it out.

No, because there is no useful detail we can add. .unwrap() will already print PoisonError and we won't know anything extra.

That depends on what you write. See above.

jplatte · 2022-08-23T13:11:38Z

Fair enough on replacing parking_lot, but on the unwrap vs. expect you didn't address my main concern (and neither does the opinion you linked to): debugability. Afaics, in the ffi-scenario with a release-build lib, we won't get a nice source-line reference unwrapping of the panic, we will only see the debug print version of the error message (which just says PoinsonError on a stack-trace that doesn't even tell us which fn unwrap was called, nor which lock this is about. Instead if we had an .expect("Login can acquire client lock"), there is a unique error message that is easy to identify and start from.

But that's not at all relevant. A poison error happens when some other task / thread first panics while holding the same log, and that's the error you will want to debug, not further tasks failing because the initial panic poisoned a lock.

"Login can acquire client lock" makes it a lot more obvious which lock was poisoned

Sure.

and where to search for the problem.

That's the thing – it doesn't (except through knowing which lock it was). The relevant panic message / location is always the one from the panic that caused the lock to be poisoned in the first place, not the message / location of the next thread trying to acquire the same lock and failing.

gnunicorn · 2022-08-23T13:13:54Z

except through knowing which lock it was

that's already worth a lot, if the code you look at is:

                .sync_with_callback(sync_settings, |sync_response| async {
                    if !state.read().unwrap().has_first_synced {
                        state.write().unwrap().has_first_synced = true;
                    }

                    if state.read().unwrap().should_stop_syncing {
                        state.write().unwrap().is_syncing = false;
                        return LoopCtrl::Break;
                    } else if !state.read().unwrap().is_syncing {
                        state.write().unwrap().is_syncing = true;
                    }

                    if let Some(delegate) = &*delegate.read().unwrap() {
                        delegate.did_receive_sync_update()
                    }

or

            *self.client.write().unwrap() = Some(client);
            *self.homeserver_details.write().unwrap() = Some(details);

gnunicorn · 2022-08-23T13:19:44Z

The relevant panic message / location is always the one from the panic that caused the lock to be poisoned in the first place, not the message / location of the next thread trying to acquire the same lock and failing.

Agreed. Ideally, that is true. But if that happens outside of rust, e.g. with delegate in the outer environment, and then they call the following function, the only thing they will tell us login panics with PoisonError. If we can point that back to them fucking up with the client or the delegate, that already helps.

I am saying in FFI or any other embedded env, we might not have the luxury of being informed about these properly as "we have dealt with the issue", leaving a dangling lock on our side and all we get are the breadcrumbs we leave ourselves...

jplatte · 2022-08-23T13:21:19Z

Agreed. Ideally, that is true.

No, not just ideally. There is no way non-Rust code can poison a Rust mutex.

gnunicorn · 2022-08-23T14:37:42Z

There is no way non-Rust code can poison a Rust mutex.

I honestly don't know the std Mutex enough and reading the code it isn't entirely clear to me, if that is true - the panicking-code isn't that obvious to follow for ffi-cases. I wonder where you get that confidence from. Is it because if the thread is failing outside of the FFI callback, the drop on MutexGuard is never called? In which case it means we'd not poisioned indeed, but the lock would deadlock, wouldn't it? Or because any death in the thread outside of Rust would not be handled by rust as a panic? But in that case, I don't understand, how we'd actually be calling drop either.

So, if I read this correctly, for neither parking_lot nor std::sync::RwLock are we able to recover from a failure of the thread caused within the delegate over ffi, is that correct? And in neither case do we actually panic/unwind, but for both we'd simply deadlock on the next attempt to lock?!?

If that is what it is, then ... okay... We should probably clearly document on the delegates that if they fail and cause the thread to unwind, we are not able to continue operation and will run into a halt without any error soon after. It's not something I'd necessarily expect to happen ...

But then yeah, if you really feel that strongly about having unwrap() rather than expect, and we can be fairly certain they will never be seen from ffi, sure, go for it. But we probably should document this much more terrible and annoying failure case.

jplatte requested a review from a team August 23, 2022 09:27

gnunicorn suggested changes Aug 23, 2022

View reviewed changes

Hywan reviewed Aug 23, 2022

View reviewed changes

gnunicorn approved these changes Aug 23, 2022

View reviewed changes

jplatte added 2 commits August 23, 2022 17:34

chore(bindings): Replace parking_lot RwLock by std RwLock

76198a2

chore: Reduce indirect dependencies of examples

0fdd4fc

jplatte force-pushed the jplatte/rm-parking-lot branch from 75812ef to 0fdd4fc Compare August 23, 2022 15:34

jplatte enabled auto-merge (rebase) August 23, 2022 15:34

jplatte merged commit 0b84624 into main Aug 23, 2022

jplatte deleted the jplatte/rm-parking-lot branch August 23, 2022 16:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove a few parking_lot dependencies for faster builds #969

Remove a few parking_lot dependencies for faster builds #969

jplatte commented Aug 23, 2022

gnunicorn left a comment •

edited

codecov bot commented Aug 23, 2022 •

edited

jplatte commented Aug 23, 2022

Hywan left a comment

jplatte commented Aug 23, 2022 •

edited

gnunicorn commented Aug 23, 2022 •

edited

jplatte commented Aug 23, 2022

gnunicorn commented Aug 23, 2022 •

edited

gnunicorn commented Aug 23, 2022 •

edited

jplatte commented Aug 23, 2022

gnunicorn commented Aug 23, 2022 •

edited

Remove a few parking_lot dependencies for faster builds #969

Remove a few parking_lot dependencies for faster builds #969

Conversation

jplatte commented Aug 23, 2022

gnunicorn left a comment • edited

Choose a reason for hiding this comment

codecov bot commented Aug 23, 2022 • edited

Codecov Report

jplatte commented Aug 23, 2022

Hywan left a comment

Choose a reason for hiding this comment

jplatte commented Aug 23, 2022 • edited

gnunicorn commented Aug 23, 2022 • edited

jplatte commented Aug 23, 2022

gnunicorn commented Aug 23, 2022 • edited

gnunicorn commented Aug 23, 2022 • edited

jplatte commented Aug 23, 2022

gnunicorn commented Aug 23, 2022 • edited

gnunicorn left a comment •

edited

codecov bot commented Aug 23, 2022 •

edited

jplatte commented Aug 23, 2022 •

edited

gnunicorn commented Aug 23, 2022 •

edited

gnunicorn commented Aug 23, 2022 •

edited

gnunicorn commented Aug 23, 2022 •

edited

gnunicorn commented Aug 23, 2022 •

edited