Match other implementations #6

Merged
merged 2 commits on May 16, 2023

Conversation

@novacrazy (Contributor) commented May 8, 2023

  • Avoid inlining the destructor
  • Use a load specifically on the count instead of a fence; this is what std and triomphe do (see the sketch below).
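
To make the two bullets concrete, here is a minimal sketch of the proposed drop path. The names (MyArc, Inner, count, drop_slow) are illustrative stand-ins, not the crate's actual identifiers:

use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical inner allocation: the reference count plus the user data.
struct Inner<T> {
    count: AtomicUsize,
    data: T,
}

// Hypothetical Arc-like handle; names do not match the crate's real API.
struct MyArc<T> {
    ptr: NonNull<Inner<T>>,
}

impl<T> MyArc<T> {
    fn inner(&self) -> &Inner<T> {
        unsafe { self.ptr.as_ref() }
    }

    // First bullet: the destructor body is pulled out of `drop` and kept
    // out of line so the common "just decrement" path stays small.
    #[inline(never)]
    unsafe fn drop_slow(&mut self) {
        // Assumes the allocation was created via Box::into_raw(Box::new(..)).
        drop(Box::from_raw(self.ptr.as_ptr()));
    }
}

impl<T> Drop for MyArc<T> {
    fn drop(&mut self) {
        // Release: publish this handle's writes before giving up its count.
        if self.inner().count.fetch_sub(1, Ordering::Release) != 1 {
            return;
        }
        // Second bullet: an Acquire load on the count instead of a fence,
        // synchronizing with the Release decrements of the other handles.
        self.inner().count.load(Ordering::Acquire);
        unsafe { self.drop_slow() };
    }
}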

@fereidani (Owner)

Hi! Thanks for your contribution. The drop part looks good to me, but I think you are mistaken about the fence part, as the std library under the hood is using:

#[cfg(not(sanitize = "thread"))]
macro_rules! acquire {
    ($x:expr) => {
        atomic::fence(Acquire)
    };
}

#[cfg(sanitize = "thread")]
macro_rules! acquire {
    ($x:expr) => {
        $x.load(Acquire)
    };
}

which actually does not perform a load in a standard build without the thread sanitizer; it uses atomic::fence(Acquire), just like we do.
If you would like to add compatibility with the thread sanitizer like std does, you should include the cfg in your commit.

@novacrazy (Contributor, Author)

Unfortunately cfg(sanitize = "thread") is still unstable, so that's not an option.

According to rust-lang/rust#41714, as far as the actual counter is concerned the two are basically equivalent, but they use a fence just in case there are other (possibly incorrectly used) atomic operations nearby that relate to the inner data.

triomphe just uses a regular load as well: https://github.com/Manishearth/triomphe/blob/a51e576bf932d2e81de18f84e1bbaaeea3fbfba1/src/arc.rs#L619

Perhaps std is being a bit overly defensive in case the inner user data has done weird things?

@fereidani (Owner)

Based on my understanding of atomic operations, in our specific case of the drop operation, using fence(Acquire) should be sufficient. The fence ensures that reads and writes are not reordered across it, and we can treat the final fetch_sub as effectively upgraded from Release to AcqRel once the fence is reached.

If we use an atomic load where it is not needed, it will significantly impact the performance of the code, because atomic loads are not cheap.

According to the documentation in sync.rs of the standard library, the reason they use atomic loads under ThreadSanitizer is to avoid false positives:

// ThreadSanitizer does not support memory fences. To avoid false positive
// reports in Arc / Weak implementation use atomic loads for synchronization
// instead.

In my opinion, the current implementation of triomphe's drop is not optimal, and there is no justification for us to replicate the same approach.
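
As a sketch of the fence-based variant being argued for here (reusing the hypothetical names from the earlier sketch, not the crate's real API), only the synchronization step after the final decrement changes:

use std::sync::atomic::{fence, AtomicUsize, Ordering};

// Fence-based final-decrement path; `count` stands in for the hypothetical
// reference-count field from the earlier sketch.
fn release_one(count: &AtomicUsize) -> bool {
    // Release decrement, exactly as before.
    if count.fetch_sub(1, Ordering::Release) != 1 {
        return false; // other handles remain; nothing to free yet
    }
    // Acquire fence instead of an Acquire load: it synchronizes with the
    // Release decrements performed by the other handles, so everything they
    // wrote happens-before the deallocation that follows.
    fence(Ordering::Acquire);
    true // caller may now run drop_slow and free the allocation
}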

@fereidani (Owner)

Furthermore, I believe it would be better to avoid explicitly marking drop_slow as #[inline(never)] and instead let the optimizer decide whether inlining it is appropriate. For instance, if the type does not implement the Drop trait, inlining the call and simply freeing the pointer produces code of nearly the same size as an out-of-line call to drop_slow, yet it is much more efficient.

In general, I am opposed to forcing the compiler not to inline something when we are not certain that doing so is a sound strategy.

@novacrazy (Contributor, Author)

I could do:

if mem::needs_drop::<T>() {
    self.drop_slow();
}

so it never generates a call at all if the type doesn't need it. However, it's not like a function call is particularly expensive.

And regarding the memory fence, you're probably right. Perhaps triomphe chose what it did because of the thread sanitizer limitations.

@fereidani (Owner)

Unfortunately, that would not be a sound approach: the code would leak memory, because we still need to free the allocation holding the data and the counter even if the type does not implement the Drop trait. Just remove the #[inline(never)] and everything should be fine.
Actually, the call-and-return overhead is not negligible at the assembly level. By removing the inline(never), we put the compiler in charge of deciding whether inlining the function is cheap enough.
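
To make the leak concern concrete, here is a hedged sketch of what the slow path has to do once the count reaches zero, again using the hypothetical Inner layout from the earlier sketch: the heap allocation holding the counter and the data must be released even when T has no destructor.

use std::sync::atomic::AtomicUsize;

// Same hypothetical layout as in the earlier sketch.
struct Inner<T> {
    count: AtomicUsize,
    data: T,
}

// Slow path, assuming the allocation came from Box::into_raw(Box::new(..)).
// Dropping the Box runs T's destructor (if any) *and* frees the memory that
// holds both the counter and the data, so it must run even when T does not
// implement Drop; guarding it with mem::needs_drop::<T>() would leak the block.
unsafe fn drop_slow<T>(ptr: *mut Inner<T>) {
    drop(Box::from_raw(ptr));
}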

@novacrazy (Contributor, Author)

Oh, of course, my mistake. I had been thinking about how I did something very similar in generic-array to avoid unnecessary drop costs, but that was stack-allocated so it slipped my mind. Will make those changes shortly.

@fereidani (Owner)

No problem at all, it happens.
Nice, let me know when you are ready to run the tests.

@novacrazy (Contributor, Author)

There you go. The only real change now is pulling out the drop.

@fereidani (Owner)

Thanks, could you please edit your PR into a single commit with a title like "Adding drop_slow to Drop trait implementation"?
My bad, I should add a contribution guideline to the library ASAP.

@novacrazy (Contributor, Author)

You should be able to squash the commits from GitHub’s UI; it is one of the available merge modes.

@fereidani merged commit eabe4fc into fereidani:main on May 16, 2023