Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_stack fail on linux/arm64 (clang) #45

Open
ivmai opened this issue Apr 2, 2020 · 13 comments
Open

test_stack fail on linux/arm64 (clang) #45

ivmai opened this issue Apr 2, 2020 · 13 comments

Comments

@ivmai
Copy link
Owner

ivmai commented Apr 2, 2020

Build: https://travis-ci.org/github/ivmai/libatomic_ops/jobs/670012675
Source: master (0cb2374)
Compiler: clang version 3.8.0-2ubuntu4
Target: Linux/arm64 (aarch64-unknown-linux-gnu)
How to reproduce: configure --enable-assertions && make -j check CFLAGS_EXTRA="-O3" CC=clang
Reproduction rate: ~1/20

@ivmai
Copy link
Owner Author

ivmai commented Jan 2, 2022

Not reproduced locally with clang-12 (RPi4, Ubuntu 21.04/arm64) on latest master (1668e46), ~100 runs.

@ivmai
Copy link
Owner Author

ivmai commented Jan 3, 2022

Reproduced on Travis CI (latest master, commit 1668e46): https://app.travis-ci.com/github/ivmai/libatomic_ops/jobs/554045328 (1st test run)

Compiler: clang version 3.8.0-2ubuntu4
Config: CONF_OPTIONS="--enable-assertions"

Output (test_stack):
Missing list element <...>

@ivmai
Copy link
Owner Author

ivmai commented Jan 3, 2022

Not reproduced w/o assertions or w/o (-O2 or -O3). Not reproduced with gcc. Not reproduced if AO_STACK_IS_LOCK_FREE.

@ivmai
Copy link
Owner Author

ivmai commented Feb 8, 2022

Reproduced on latest master (401f7b7)
Compiler: clang version 3.8.1-24 (tags/RELEASE_381/final)
Target: Linux/arm64 (aarch64-unknown-linux-gnu)
Host: gcc117 (farm)
Occurrence: ~1/45

@ivmai
Copy link
Owner Author

ivmai commented Feb 15, 2022

Build: https://app.travis-ci.com/github/ivmai/libatomic_ops/jobs/559848368
Source: master (6be0bc8)
Target: aarch64-unknown-linux-gnu
Compiler: clang version 3.8.0-2ubuntu4
Config: CFLAGS_EXTRA="-O3 -D N_EXPERIMENTS=10" CONF_OPTIONS="--enable-assertions"
Output (test_stack):
Failed - nothing to pop
Use almost-lock-free implementation

@ivmai
Copy link
Owner Author

ivmai commented Feb 16, 2022

Reproduced on gcc117 (farm).
Source: master (6be0bc8)
Compiler: clang version 3.8.1-24 (tags/RELEASE_381/final)
Target: Linux/arm64 (aarch64-unknown-linux-gnu)
How to: ./configure --enable-assertions && make -j check CFLAGS_EXTRA="-D N_EXPERIMENTS=2" CC=clang
Occurrence: ~1/10

Also reproduced with: clang -O3 -D N_EXPERIMENTS=2 -I src tests/test_stack.c src/*.c -lpthread
(if make -j check is running repeatedly in parallel)

Output (test_stack):
Use almost-lock-free implementation
Missing list element <...>

@ivmai
Copy link
Owner Author

ivmai commented Feb 16, 2022

Also, maybe the existing fix is not correct: 7ce7da4

@ivmai
Copy link
Owner Author

ivmai commented Feb 16, 2022

I've reviewed the code in test_stack.c - I don't see any bug with the test code.

@ivmai
Copy link
Owner Author

ivmai commented Feb 16, 2022

After preprocessing of atomic_ops_stack.c, the code looks like:

  void AO_stack_push_explicit_aux_release(volatile AO_t *list, AO_t *x, AO_stack_aux *a)
  {
    AO_t x_bits = (AO_t)x;
    AO_t next;
  retry:
    {
      AO_t entry1 = AO_load(&a->AO_stack_bl[0]);
      AO_t entry2 = AO_load(&a->AO_stack_bl[1]);
      if (entry1 == x_bits || entry2 == x_bits) {
        ++x_bits;
        if ((x_bits & 7) == 0)
          x_bits = (AO_t)x;
        goto retry;
      }
    }

    do {
        next = AO_load(list);
        *x = next;
    } while (!AO_compare_and_swap_release(list, next, x_bits));
  }

  AO_t *AO_stack_pop_explicit_aux_acquire(volatile AO_t *list, AO_stack_aux *a)
  {
    unsigned i;
    int j = 0;
    AO_t first;
    AO_t *first_ptr;
    AO_t next;

  retry:
    first = AO_load(list);
    if (0 == first) return 0;

    for (i = 0;;) {
      if (AO_compare_and_swap_acquire(a->AO_stack_bl+i, 0, first))
        break;
      if (++i >= 2) {
        i = 0;
        AO_pause(++j);
      }
    }
    assert(a->AO_stack_bl[i] == first);
    if (first != AO_load(list)) { /* or AO_load_acquire as WA for powerpc */
      AO_store_release(a->AO_stack_bl+i, 0);
      goto retry;
    }

    first_ptr = (AO_t *)(first & ~7);
    next = AO_load(first_ptr);
    if (!AO_compare_and_swap_release(list, first, next)) {
      AO_store_release(a->AO_stack_bl+i, 0);
      goto retry;
    }

    assert(*list != first);
    AO_store_release(a->AO_stack_bl+i, 0);
    return first_ptr;
  }

@ivmai
Copy link
Owner Author

ivmai commented Feb 16, 2022

Hello @hboehm,
Could you please give some hints regarding this test failure?
It relates to almost-non-blocking implementation. I don't see any issue in test_stack.c code itself.

Some questions (based on the preprocessed code above):

  1. stack_push: AO_t entry1 = AO_load(&a->AO_stack_bl[0]): is no acq barrier OK?
  2. stack_push: while (!AO_compare_and_swap_release(list, next, x_bits)): isn't it better to goto retry on CAS fail?
  3. stack_push: do { ... : what would happen if a long pause occurs between ensuring entry != x_bits and next = AO_load(list)?
  4. stack_pop: first = AO_load(list): is no acq barrier OK?
  5. Once I saw violation of assert(*list != first): is this possible provided the AO primitives are correct?

PS. I haven't looked into the generated code yet (unfortunately I managed to reproduced it only with clang 3.8).

@ivmai
Copy link
Owner Author

ivmai commented Feb 17, 2022

  if (AO_EXPECT_FALSE(first != AO_load_acquire(list)))
                   /* Workaround test failure on AIX, at least, by */

Removing this W/A (AO_load_acquire -> AO_load) in AO_stack_pop_explicit_aux_acquire leads to almost 100% violation in assert(*list != first).

ivmai added a commit that referenced this issue Feb 20, 2022
Issue #45 (libatomic_ops).

Enforce proper alignment of AO_stack_t.AO_pa to avoid the structure
value to cross the CPU cache line boundary.  A workaround for
almost-lock-free push/pop test failures on aarch64, at least.

* src/atomic_ops_stack.h [!AO_STACK_ATTR_ALLIGNED]
(AO_STACK_ATTR_ALLIGNED): Define.
* src/atomic_ops_stack.h (AO_stack_t.AO_pa): Add AO_STACK_ATTR_ALLIGNED
attribute.
@ivmai
Copy link
Owner Author

ivmai commented Feb 20, 2022

No test failure after W/A commit c664a61
But is there any better fix?

ivmai added a commit that referenced this issue Aug 11, 2022
(a cherry-pick of commit c664a61 from 'master')

Issue #45 (libatomic_ops).

Enforce proper alignment of AO_stack_t.AO_ptr to avoid the structure
value to cross the CPU cache line boundary.  A workaround for
almost-lock-free push/pop test failures on aarch64, at least.

* src/atomic_ops_stack.h [AO_USE_ALMOST_LOCK_FREE
&& !AO_STACK_ATTR_ALLIGNED] (AO_STACK_ATTR_ALLIGNED): Define.
* src/atomic_ops_stack.h [AO_USE_ALMOST_LOCK_FREE]
(AO_stack_t.AO_ptr): Add AO_STACK_ATTR_ALLIGNED attribute.
ivmai added a commit that referenced this issue Aug 12, 2022
(a cherry-pick of commit ed712f7 from 'release-7_6')

Issue #45 (libatomic_ops).

Enforce proper alignment of AO_stack_t.AO_ptr to avoid the structure
value to cross the CPU cache line boundary.  A workaround for
almost-lock-free push/pop test failures on aarch64, at least.

* src/atomic_ops_stack.h [AO_USE_ALMOST_LOCK_FREE
&& !AO_STACK_ATTR_ALLIGNED] (AO_STACK_ATTR_ALLIGNED): Define.
* src/atomic_ops_stack.h [AO_USE_ALMOST_LOCK_FREE]
(AO_stack_t.AO_ptr): Add AO_STACK_ATTR_ALLIGNED attribute.
ivmai added a commit that referenced this issue Aug 12, 2022
(a cherry-pick of commit ed712f7 from 'release-7_6')

Issue #45 (libatomic_ops).

Enforce proper alignment of AO_stack_t.AO_ptr to avoid the structure
value to cross the CPU cache line boundary.  A workaround for
almost-lock-free push/pop test failures on aarch64, at least.

* src/atomic_ops_stack.h [AO_USE_ALMOST_LOCK_FREE
&& !AO_STACK_ATTR_ALLIGNED] (AO_STACK_ATTR_ALLIGNED): Define.
* src/atomic_ops_stack.h [AO_USE_ALMOST_LOCK_FREE]
(AO_stack_t.AO_ptr): Add AO_STACK_ATTR_ALLIGNED attribute.
@ivmai
Copy link
Owner Author

ivmai commented May 21, 2024

Hello @hboehm,
Do you have any comment on this issue? I did not see the fail after the W/A. I could close the issue or leave it open for future investigation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant