Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove a few parking_lot dependencies for faster builds #969

Merged
merged 2 commits into from Aug 23, 2022

Conversation

jplatte
Copy link
Collaborator

@jplatte jplatte commented Aug 23, 2022

Unfortunately we still always pull in parking_lot 0.11 (sled, wasm-timer) which I suspect is not really needed, plus parking_lot 0.12 (but only on target_os = "linux" when checking / building benchmarks). That is, the overall reduction in dependencies is very small, but at least the parking_lot 0.12 dependency for wasm and macos through the examples is gone, so it should speed up those CI jobs a little bit.

@jplatte jplatte requested a review from a team August 23, 2022 09:27
Copy link
Contributor

@gnunicorn gnunicorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has the std::sync::RwLock cought up to the speed and safety of parking_lot now? My understanding was that parking_lot is quite a bit faster at runtime and has certain checks (as in for deadlocks) that it can even do at compile time. (Honestly, I have never checked that myself, I was only told that). I am not convinced I want to trade in the compile time improvement over loosing the ability to check for deadlocks at compile time.

Either way, even if we decide to go this way, please let's refrain from doing unwrap() in any non-example non-test-code but always use expect(MSG) (ideally with a unique MSG) instead, stating the specific assumptions that are made why this should never fail. Reason being that these assumption might be broken by later changes, and if all you get is an unwrap-panic it is hard to figure out where that is (in particular in FFI) and why that breaks. A unique message explaining the underlying assumption is easier to find in the source code and understand why you might have violated that with your changes now.

@codecov
Copy link

codecov bot commented Aug 23, 2022

Codecov Report

Merging #969 (75812ef) into main (9462061) will not change coverage.
The diff coverage is n/a.

❗ Current head 75812ef differs from pull request most recent head 0fdd4fc. Consider uploading reports for the commit 0fdd4fc to get more accurate results

@@           Coverage Diff           @@
##             main     #969   +/-   ##
=======================================
  Coverage   80.35%   80.35%           
=======================================
  Files         110      110           
  Lines       15809    15809           
=======================================
  Hits        12704    12704           
  Misses       3105     3105           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@jplatte
Copy link
Collaborator Author

jplatte commented Aug 23, 2022

Has the std::sync::RwLock cought up to the speed and safety of parking_lot now?

Yes: https://blog.rust-lang.org/2022/06/30/Rust-1.62.0.html#thinner-faster-mutexes-on-linux

My understanding was that parking_lot is quite a bit faster at runtime and has certain checks (as in deadlocks) that it can prevent even at compile time from (honestly, I have never checked that, I was only told that). I am not convinced I want to trade in the compile time improvement over loosing the ability to check for deadlocks at compile time.

There's no extra compile-time guarantees AFAIK, the docs mention a few differences of which the most important one is probably that parking_lots Mutex doesn't have poisoning, i.e. when a thread panics while holding a mutex / rwlock guard, the mutex / rwlock will simply be unlocked.

Either way, even if we decide to go this way, please let's refrain from doing unwrap() in any non-example non-test-code but always use expect(MSG) (ideally with a unique MSG) instead, stating the specific assumptions that are made why this should never fail. Reason being that these assumption might be broken by later changes, and if all you get is an unwrap-panic it is hard to figure out where that is (in particular in FFI) and why that breaks. A unique message explaining the underlying assumption is easier to find in the source code and understand why you might have violated that with your changes now.

I disagree for this case, using .expect("RwLock isn't poisoned") is pointless noise IMO (and I'm not alone with that opinion). lock / read / write returning a Result instead of panicking when the underlying mutex / rwlock is poisoned is regarded by some parts of the community as one of the standard libraries biggest mistakes, as the vast majority of calls to these functions are followed by .unwrap(), and not just in sloppily-written code.

Copy link
Member

@Hywan Hywan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think using some expect(msg) instead of all those unwrap would help the user to diagnostic the error faster?

@jplatte
Copy link
Collaborator Author

jplatte commented Aug 23, 2022

No, because there is no useful detail we can add. .unwrap() will already print "poisoned lock: another task failed inside" PoisonError and we won't know anything extra.

@gnunicorn
Copy link
Contributor

gnunicorn commented Aug 23, 2022

Fair enough on replacing parking_lot, but on the unwrap vs. expect you didn't address my main concern (and neither does the opinion you linked to): debugability. Afaics, in the ffi-scenario with a release-build lib, we won't get a nice source-line reference unwrapping of the panic, we will only see the debug print version of the error message (which just says PoinsonError on a stack-trace that doesn't even tell us which fn unwrap was called, nor which lock this is about). Instead if we had an .expect("Login can acquire client lock"), there is a unique error message that is easy to identify and start from.

using .expect("RwLock isn't poisoned") is pointless noise

Yes, totally agreed. But that's not what I asked for (nor the "" or "Unwrap failed" variants that where hinted at in the linked blog post). I was referring to actually descriptive and ideally unique pointers that guide the person reading the error to figure out what went wrong since they changed stuff. Ideally these errors never pop up, no question, but when they do they are hard to figure out.

In the current code base we are even holding a lock around callback we do outside of our own codebase into the external environment (the delegates), which means the thread failure might not even be in our side of what is called, leaving a poisened lock.

"Login can acquire client lock" makes it a lot more obvious which lock was poisoned and where to search for the problem.

Within a pure-rust environment I can accept the argument that the stack trace will tell you enough - though here we often return anyhow::Result<()> and could just do ? rather than panicking. But if the target is a staticlibs and/or embedded in other environments (like also wasm), you often have way to little to go with when a panic kills the process, and we get blamed if even the calling environment is at fault. This little context can get you a long way, when trying to figure it out.

No, because there is no useful detail we can add. .unwrap() will already print PoisonError and we won't know anything extra.

That depends on what you write. See above.

@jplatte
Copy link
Collaborator Author

jplatte commented Aug 23, 2022

Fair enough on replacing parking_lot, but on the unwrap vs. expect you didn't address my main concern (and neither does the opinion you linked to): debugability. Afaics, in the ffi-scenario with a release-build lib, we won't get a nice source-line reference unwrapping of the panic, we will only see the debug print version of the error message (which just says PoinsonError on a stack-trace that doesn't even tell us which fn unwrap was called, nor which lock this is about. Instead if we had an .expect("Login can acquire client lock"), there is a unique error message that is easy to identify and start from.

But that's not at all relevant. A poison error happens when some other task / thread first panics while holding the same log, and that's the error you will want to debug, not further tasks failing because the initial panic poisoned a lock.

"Login can acquire client lock" makes it a lot more obvious which lock was poisoned

Sure.

and where to search for the problem.

That's the thing – it doesn't (except through knowing which lock it was). The relevant panic message / location is always the one from the panic that caused the lock to be poisoned in the first place, not the message / location of the next thread trying to acquire the same lock and failing.

@gnunicorn
Copy link
Contributor

gnunicorn commented Aug 23, 2022

except through knowing which lock it was

that's already worth a lot, if the code you look at is:

                .sync_with_callback(sync_settings, |sync_response| async {
                    if !state.read().unwrap().has_first_synced {
                        state.write().unwrap().has_first_synced = true;
                    }

                    if state.read().unwrap().should_stop_syncing {
                        state.write().unwrap().is_syncing = false;
                        return LoopCtrl::Break;
                    } else if !state.read().unwrap().is_syncing {
                        state.write().unwrap().is_syncing = true;
                    }

                    if let Some(delegate) = &*delegate.read().unwrap() {
                        delegate.did_receive_sync_update()
                    }

or

            *self.client.write().unwrap() = Some(client);
            *self.homeserver_details.write().unwrap() = Some(details);

@gnunicorn
Copy link
Contributor

gnunicorn commented Aug 23, 2022

The relevant panic message / location is always the one from the panic that caused the lock to be poisoned in the first place, not the message / location of the next thread trying to acquire the same lock and failing.

Agreed. Ideally, that is true. But if that happens outside of rust, e.g. with delegate in the outer environment, and then they call the following function, the only thing they will tell us login panics with PoisonError. If we can point that back to them fucking up with the client or the delegate, that already helps.

I am saying in FFI or any other embedded env, we might not have the luxury of being informed about these properly as "we have dealt with the issue", leaving a dangling lock on our side and all we get are the breadcrumbs we leave ourselves...

@jplatte
Copy link
Collaborator Author

jplatte commented Aug 23, 2022

Agreed. Ideally, that is true.

No, not just ideally. There is no way non-Rust code can poison a Rust mutex.

@gnunicorn
Copy link
Contributor

gnunicorn commented Aug 23, 2022

There is no way non-Rust code can poison a Rust mutex.

I honestly don't know the std Mutex enough and reading the code it isn't entirely clear to me, if that is true - the panicking-code isn't that obvious to follow for ffi-cases. I wonder where you get that confidence from. Is it because if the thread is failing outside of the FFI callback, the drop on MutexGuard is never called? In which case it means we'd not poisioned indeed, but the lock would deadlock, wouldn't it? Or because any death in the thread outside of Rust would not be handled by rust as a panic? But in that case, I don't understand, how we'd actually be calling drop either.

So, if I read this correctly, for neither parking_lot nor std::sync::RwLock are we able to recover from a failure of the thread caused within the delegate over ffi, is that correct? And in neither case do we actually panic/unwind, but for both we'd simply deadlock on the next attempt to lock?!?

If that is what it is, then ... okay... We should probably clearly document on the delegates that if they fail and cause the thread to unwind, we are not able to continue operation and will run into a halt without any error soon after. It's not something I'd necessarily expect to happen ...

But then yeah, if you really feel that strongly about having unwrap() rather than expect, and we can be fairly certain they will never be seen from ffi, sure, go for it. But we probably should document this much more terrible and annoying failure case.

@jplatte jplatte enabled auto-merge (rebase) August 23, 2022 15:34
@jplatte jplatte merged commit 0b84624 into main Aug 23, 2022
@jplatte jplatte deleted the jplatte/rm-parking-lot branch August 23, 2022 16:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants