Skip to content

Conversation

@xingbowang
Copy link
Contributor

Summary:

Some of the stress tests script run tests multiple times with TEST_TMPDIR set. When TEST_TMPDIR is set, the db directory is a fixed string. This caused the same DB directory was reused across db_crashtest.py script run. Each db_crashtest.py run would randomize some of the parameters. This caused different parameters to be used with same DB directory, violating some of the assumption such as use_put_entity_one_in parameter. This change added a suffix to DB directory, so that each db_crashtest.py script run would generate a unique DB directory, which prevents the issue from happening again.

Test Plan:

Stress test local run

TEST_TMPDIR=/tmp/aaa /usr/local/bin/python3 -u tools/db_crashtest.py --stress_cmd=./db_stress --cleanup_cmd='' --simple blackbox  --duration 15 --interval 10

>>> Running db_stress with pid=113810: ./db_stress ... --db=/tmp/aaa/rocksdb_crashtest_blackbox_6967584463401575611 ...

Reviewers:

Subscribers:

Tasks:

Tags:

Summary:

Some of the stress tests script run tests multiple times with
TEST_TMPDIR set. When TEST_TMPDIR is set, the db directory is a fixed
string. This caused the same DB directory was reused across
db_crashtest.py script run. Each db_crashtest.py run would randomize
some of the parameters. This caused different parameters to be used with
same DB directory, violating some of the assumption such as
use_put_entity_one_in parameter. This change added a suffix to DB
directory, so that each db_crashtest.py script run would generate a
unique DB directory, which prevents the issue from happening again.

Test Plan:

Stress test

Reviewers:

Subscribers:

Tasks:

Tags:
@meta-cla meta-cla bot added the CLA Signed label Jan 20, 2026
@meta-codesync
Copy link

meta-codesync bot commented Jan 20, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D91069655.

@xingbowang xingbowang marked this pull request as ready for review January 20, 2026 23:46
Copy link
Contributor

@hx235 hx235 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good finding—this improvement makes the stress test more resilient to directory-deletion errors.

@meta-codesync meta-codesync bot closed this in eb5e1a2 Jan 21, 2026
@meta-codesync
Copy link

meta-codesync bot commented Jan 21, 2026

@xingbowang merged this pull request in eb5e1a2.

pdillinger added a commit to pdillinger/rocksdb that referenced this pull request Jan 21, 2026
Summary: Clearing DB dir for crash test is currently a hodgepodge of
1. Caller of db_crashtest.py maybe tries to clear the dir
2. db_crashtest.py tries to clear the dir in get_dbname() (but ignoring
   failure)
3. db_crashtest.py passes --destroy_db_initially to some db_stress calls
   as needed
4. db_crashtest.py tries to clear the dir between some db_stress calls
5. db_crashtest.py tries to clear the dir after everything is done and
   successful (no artifacts to investigate or save) (but ignoring
   failure)
6. Try to add more uniqueness to the directory from facebook#14249

This change reverts or replaces 2, 4, 5, and 6 by doubling-down on
(expanding) 3 and a small variant of it:

* crash_test.mk passes --destroy_db_initially=1 so that the first run of
  db_stress clears the db dir.
* After each db_stress invocation, db_crashtest.py resets
  destroy_db_initially=0 so that the next invocation reuses the same DB,
  except in cases where there is an incompatibility that requires a
  fresh DB (from cases 3 and 4 above).
* On success, uses new `db_stress --destroy_db_and_exit` option to clean
  up the DB dir without needing a custom cleanup_cmd (now ignored)

Note that although case 1 is likely obsolete, it is out of control of an
open source PR.

Test Plan: some manual runs
meta-codesync bot pushed a commit that referenced this pull request Jan 22, 2026
Summary:
Clearing DB dir for crash test is currently a hodgepodge of
1. Caller of db_crashtest.py maybe tries to clear the dir
2. db_crashtest.py tries to clear the dir in get_dbname() (but ignoring failure)
3. db_crashtest.py passes --destroy_db_initially to some db_stress calls as needed
4. db_crashtest.py tries to clear the dir between some db_stress calls
5. db_crashtest.py tries to clear the dir after everything is done and successful (no artifacts to investigate or save) (but ignoring failure)
6. Try to add more uniqueness to the directory from #14249

This change reverts or replaces 2, 4, 5, and 6 by doubling-down on (expanding) 3 and a small variant of it:

* crash_test.mk passes --destroy_db_initially=1 so that the first run of db_stress clears the db dir.
* After each db_stress invocation, db_crashtest.py resets destroy_db_initially=0 so that the next invocation reuses the same DB, except in cases where there is an incompatibility that requires a fresh DB (from cases 3 and 4 above).
* On success, uses new `db_stress --destroy_db_and_exit` option to clean up the DB dir without needing a custom cleanup_cmd (now ignored)

Note that although case 1 is likely obsolete, it is out of control of an open source PR.

Pull Request resolved: #14254

Test Plan: some manual runs

Reviewed By: xingbowang

Differential Revision: D91164731

Pulled By: pdillinger

fbshipit-source-id: 0a66c8c0e130c9eeacc55af411a18a09bc9debdf
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants