Test failures on OSS platform #63

Closed
sathyaphoenix opened this issue Oct 13, 2021 Discussed in #61 · 5 comments

@sathyaphoenix
Contributor

Discussed in #61

Originally posted by vicvicg October 11, 2021
When running the CacheLib tests following the instructions (https://cachelib.org/docs/installation/testing), we get different test pass rates depending on the environment. Some test failures seem to be intermittent, and we haven't seen a 100% pass rate. Is there a recommended system setup and a subset of tests that we can use as acceptance criteria for code changes?

NvmCacheTests.ConcurrentFills failure:

I0930 19:26:45.051738  8699 BigHash.cpp:110] Reset BigHash
I0930 19:26:45.051754  8699 BlockCache.cpp:611] Reset block cache
/opt/workspace/cachelib/allocator/nvmcache/tests/NvmCacheTests.cpp:385: Failure
Expected: (hdl) != (nullptr), actual: nullptr vs (nullptr)
/opt/workspace/cachelib/allocator/nvmcache/tests/NvmCacheTests.cpp:385: Failure
Expected: (hdl) != (nullptr), actual: nullptr vs (nullptr)

Timer test failure: this looks like a poorly written test. It expects the measured duration to equal the requested sleep time exactly and does not account for the sleep overshooting.

Running main() from /opt/workspace/cachelib/external/googletest/googletest/src/gtest_main.cc
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from Util
[ RUN      ] Util.TimerTest
/opt/workspace/cachelib/common/tests/TimeTests.cpp:40: Failure
Expected equality of these values:
  timer.getDurationMs()
    Which is: 1487
  rnd
    Which is: 1484
[  FAILED  ] Util.TimerTest (1487 ms)
[----------] 1 test from Util (1487 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (1487 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Util.TimerTest

 1 FAILED TEST
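
The Util.TimerTest output above shows an exact-equality check failing by 3 ms (1487 measured against 1484 requested). A sleep only guarantees a lower bound on the elapsed time, so an equality check can fail on a loaded or virtualized host. The following is a minimal sketch, using only standard C++, of a tolerance-based comparison. The names `sleepAndMeasureMs` and `withinTolerance` and the choice of slack are hypothetical and for illustration only; they are not CacheLib code and not necessarily how the test was changed upstream.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical helper (not CacheLib code): sleep for `requestedMs` and return
// the elapsed time in whole milliseconds, measured on a monotonic clock.
inline int64_t sleepAndMeasureMs(int64_t requestedMs) {
  const auto start = std::chrono::steady_clock::now();
  std::this_thread::sleep_for(std::chrono::milliseconds(requestedMs));
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::milliseconds>(end - start)
      .count();
}

// Hypothetical check: accept any measurement that is at least the requested
// time and at most `slackMs` above it, instead of requiring equality.
inline bool withinTolerance(int64_t measuredMs,
                            int64_t requestedMs,
                            int64_t slackMs) {
  return measuredMs >= requestedMs && measuredMs <= requestedMs + slackMs;
}
```

With a slack of 50 ms, for example, the values from the log (1487 against 1484) would pass, while a measurement below the requested time would still be flagged. The right slack depends on the host and is an assumption to tune.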
@sathyaphoenix sathyaphoenix changed the title from "GTEST failures on OSS platform" to "Test failures on OSS platform" Oct 13, 2021
@sathyaphoenix sathyaphoenix self-assigned this Oct 13, 2021
@sathyaphoenix sathyaphoenix added the good first issue Good for newcomers label Oct 13, 2021
@vicvicg

vicvicg commented Oct 13, 2021

Here are the contents of the Dockerfile that we use to build containers to reproduce the failures mentioned above:
FROM registry.hub.docker.com/library/centos:8

RUN dnf install -y \
    sudo \
    git \
    tzdata \
    vim \
    gdb \
    clang

COPY contrib/prerequisites-centos8.sh prerequisites-centos8.sh
RUN sed 's/sudo //' -i prerequisites-centos8.sh
RUN ./prerequisites-centos8.sh

Docker run command:

docker run --rm --name vic --tmpfs /tmp -v /home/vic/cachlib/:/opt/workspace:z -e http_proxy=<…> -e https_proxy=<…> -it cachelib:centos-8 /bin/bash

@agordon
Contributor

agordon commented Oct 13, 2021

@vicvicg - thanks, I'm working on reproducing it locally. Does your docker environment enforce any additional restrictions (e.g. seccomp, apparmor, or similar)?

@vicvicg

vicvicg commented Oct 14, 2021

> @vicvicg - thanks, I'm working on reproducing it locally. Does your docker environment enforce any additional restrictions (e.g. seccomp, apparmor, or similar)?

@agordon: No, our docker environment doesn't enforce any additional restrictions.

@haowu14
Contributor

haowu14 commented Jul 7, 2022

Hi! We found that NvmCacheTests.ConcurrentFills fails under a certain race condition. We are working on a fix.

facebook-github-bot pushed a commit that referenced this issue Jul 20, 2022
Summary:
When an item is insertOrReplaced into the hybrid cache, an nvm.remove is scheduled.
If the remove is still in flight (not completed) when the same item is evicted from RAM, the nvm.put from DRAM eviction could get aborted.
This diff changes the unit test to account for this case.
We anticipate this solves a problem mentioned in #150 and #63.

Reviewed By: therealgymmy

Differential Revision: D37807056

fbshipit-source-id: ba27a31eb418b41b0e3223d1644830e38970387d
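
The race described in the commit summary above can be illustrated with a toy model. This is a sketch under stated assumptions, not CacheLib's implementation: the class `ToyNvm` and its methods `scheduleRemove`, `completeRemove`, `putOnEviction`, and `contains` are hypothetical names invented here, and the model is single-threaded, so the "in flight" state is driven by hand.

```cpp
#include <string>
#include <unordered_set>

// Toy model only: none of these names are CacheLib APIs.
class ToyNvm {
 public:
  // Models the nvm.remove scheduled by an insertOrReplace. The remove stays
  // "in flight" until completeRemove() is called for the same key.
  void scheduleRemove(const std::string& key) { inflightRemoves_.insert(key); }

  // Models the remove finishing: the key leaves NVM and is no longer in flight.
  void completeRemove(const std::string& key) {
    stored_.erase(key);
    inflightRemoves_.erase(key);
  }

  // Models the nvm.put issued when the item is evicted from RAM. Returns
  // false (put aborted) if a remove for the same key is still in flight.
  bool putOnEviction(const std::string& key) {
    if (inflightRemoves_.count(key) > 0) {
      return false;
    }
    stored_.insert(key);
    return true;
  }

  bool contains(const std::string& key) const { return stored_.count(key) > 0; }

 private:
  std::unordered_set<std::string> inflightRemoves_;
  std::unordered_set<std::string> stored_;
};
```

In this model, calling `putOnEviction("k")` between `scheduleRemove("k")` and `completeRemove("k")` returns false and leaves the key absent, so a later lookup misses. A test that accounts for this case would presumably treat such a miss as acceptable rather than asserting a non-null handle.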
@haowu14
Contributor

haowu14 commented Aug 22, 2022

@vicvicg Do you still see test failures in the build?

@haowu14 haowu14 closed this as completed Sep 21, 2022