Match other implementations #6

Merged
merged 2 commits on May 16, 2023

Conversation

@novacrazy (Contributor) commented May 8, 2023

  • Avoid inlining the destructor
  • Use a load specifically on the count instead of a fence; this is what std and triomphe do (see the sketch below).
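
To make the two bullets concrete, here is a minimal sketch of the proposed drop path. The names (MyArc, Inner, count, drop_slow) are illustrative stand-ins, not the crate's actual identifiers:

use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical inner allocation: the reference count plus the user data.
struct Inner<T> {
    count: AtomicUsize,
    data: T,
}

// Hypothetical Arc-like handle; names do not match the crate's real API.
struct MyArc<T> {
    ptr: NonNull<Inner<T>>,
}

impl<T> MyArc<T> {
    fn inner(&self) -> &Inner<T> {
        unsafe { self.ptr.as_ref() }
    }

    // First bullet: the destructor body is pulled out of `drop` and kept
    // out of line so the common "just decrement" path stays small.
    #[inline(never)]
    unsafe fn drop_slow(&mut self) {
        // Assumes the allocation was created via Box::into_raw(Box::new(..)).
        drop(Box::from_raw(self.ptr.as_ptr()));
    }
}

impl<T> Drop for MyArc<T> {
    fn drop(&mut self) {
        // Release: publish this handle's writes before giving up its count.
        if self.inner().count.fetch_sub(1, Ordering::Release) != 1 {
            return;
        }
        // Second bullet: an Acquire load on the count instead of a fence,
        // synchronizing with the Release decrements of the other handles.
        self.inner().count.load(Ordering::Acquire);
        unsafe { self.drop_slow() };
    }
}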

@fereidani (Owner)

Hi! Thanks for your contribution. The drop part looks good to me, but I think you are mistaken about the fence part, as the std library under the hood is using:

#[cfg(not(sanitize = "thread"))]
macro_rules! acquire {
    ($x:expr) => {
        atomic::fence(Acquire)
    };
}

#[cfg(sanitize = "thread")]
macro_rules! acquire {
    ($x:expr) => {
        $x.load(Acquire)
    };
}

which actually does not perform a load in a standard build without the thread sanitizer; it uses atomic::fence(Acquire), just like we do.
If you would like to add compatibility with the thread sanitizer like std does, you should include the cfg in your commit.

@novacrazy (Contributor, Author)

Unfortunately cfg(sanitize = "thread") is still unstable, so that's not an option.

According to rust-lang/rust#41714, as far as the actual counter is concerned the two are basically equivalent, but they use a fence just in case there are other (possibly incorrectly used) atomic operations nearby that relate to the inner data.

triomphe just uses a regular load as well: https://github.com/Manishearth/triomphe/blob/a51e576bf932d2e81de18f84e1bbaaeea3fbfba1/src/arc.rs#L619

Perhaps std is being a bit overly defensive in case the inner user data has done weird things?

@fereidani (Owner)

Based on my understanding of atomic operations, in our specific case of the drop operation, using fence(Acquire) should be sufficient. The fence ensures that reads and writes are not reordered across it, and we can treat the final fetch_sub as effectively upgraded from Release to AcqRel once the fence is reached.

If we use an atomic load where it is not needed, it will significantly impact the performance of the code, because atomic loads are not cheap.

According to the documentation in sync.rs of the standard library, the reason they use atomic loads under ThreadSanitizer is to avoid false positives:

// ThreadSanitizer does not support memory fences. To avoid false positive
// reports in Arc / Weak implementation use atomic loads for synchronization
// instead.

In my opinion, the current implementation of triomphe's drop is not optimal, and there is no justification for us to replicate the same approach.
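
As a sketch of the fence-based variant being argued for here (reusing the hypothetical names from the earlier sketch, not the crate's real API), only the synchronization step after the final decrement changes:

use std::sync::atomic::{fence, AtomicUsize, Ordering};

// Fence-based final-decrement path; `count` stands in for the hypothetical
// reference-count field from the earlier sketch.
fn release_one(count: &AtomicUsize) -> bool {
    // Release decrement, exactly as before.
    if count.fetch_sub(1, Ordering::Release) != 1 {
        return false; // other handles remain; nothing to free yet
    }
    // Acquire fence instead of an Acquire load: it synchronizes with the
    // Release decrements performed by the other handles, so everything they
    // wrote happens-before the deallocation that follows.
    fence(Ordering::Acquire);
    true // caller may now run drop_slow and free the allocation
}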

@fereidani (Owner)

Furthermore, I believe it would be better to avoid explicitly marking drop_slow as #[inline(never)] and instead let the optimizer decide whether inlining it is appropriate. For instance, if the type does not implement the Drop trait, inlining the call and simply freeing the pointer produces code of nearly the same size as an out-of-line call to drop_slow, yet it is much more efficient.

In general, I am opposed to forcing the compiler not to inline something when we are not certain that doing so is a sound strategy.

@novacrazy (Contributor, Author)

I could do:

if mem::needs_drop::<T>() {
    self.drop_slow();
}

so it never generates a call at all if the type doesn't need it. However, it's not like a function call is particularly expensive.

And regarding the memory fence, you're probably right. Perhaps triomphe chose what it did because of the thread sanitizer limitations.

@fereidani (Owner)

Unfortunately, that would not be a sound approach: the code would leak memory, because we still need to free the allocation holding the data and the counter even if the type does not implement the Drop trait. Just remove the #[inline(never)] and everything should be fine.
Actually, the call-and-return overhead is not negligible at the assembly level. By removing the inline(never), we put the compiler in charge of deciding whether inlining the function is cheap enough.
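
To make the leak concern concrete, here is a hedged sketch of what the slow path has to do once the count reaches zero, again using the hypothetical Inner layout from the earlier sketch: the heap allocation holding the counter and the data must be released even when T has no destructor.

use std::sync::atomic::AtomicUsize;

// Same hypothetical layout as in the earlier sketch.
struct Inner<T> {
    count: AtomicUsize,
    data: T,
}

// Slow path, assuming the allocation came from Box::into_raw(Box::new(..)).
// Dropping the Box runs T's destructor (if any) *and* frees the memory that
// holds both the counter and the data, so it must run even when T does not
// implement Drop; guarding it with mem::needs_drop::<T>() would leak the block.
unsafe fn drop_slow<T>(ptr: *mut Inner<T>) {
    drop(Box::from_raw(ptr));
}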

@novacrazy (Contributor, Author)

Oh, of course, my mistake. I had been thinking about how I did something very similar in generic-array to avoid unnecessary drop costs, but that was stack-allocated so it slipped my mind. Will make those changes shortly.

@fereidani (Owner)

No problem at all, it happens.
Nice, let me know when you are ready to run the tests.

@novacrazy (Contributor, Author)

There you go. The only real change now is pulling out the drop.

@fereidani (Owner)

Thanks, could you please edit your PR into a single commit with a title like "Adding drop_slow to Drop trait implementation"?
My bad, I should add a contribution guideline to the library ASAP.

@novacrazy (Contributor, Author)

You should be able to squash the commits from GitHub’s UI; it is one of the available merge modes.

@fereidani merged commit eabe4fc into fereidani:main on May 16, 2023