Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIFO Compaction with TTL #2480

Closed
wants to merge 11 commits into from
Closed

Conversation

sagar0
Copy link
Contributor

@sagar0 sagar0 commented Jun 22, 2017

Introducing FIFO compactions with TTL.

FIFO compaction is based on size only which makes it tricky to enable in production as use cases can have organic growth. A user requested an option to drop files based on the time of their creation instead of the total size.

To address that request:

  • Added a new TTL option to FIFO compaction options.
  • Updated FIFO compaction score to take TTL into consideration.
  • Added a new table property, creation_time, to keep track of when the SST file is created.
  • Creation_time is set as below:
    • On Flush: Set to the time of flush.
    • On Compaction: Set to the max creation_time of all the files involved in the compaction.
    • On Repair and Recovery: Set to the time of repair/recovery.
    • Old files created prior to this code change will have a creation_time of 0.
  • FIFO compaction with TTL is enabled when ttl > 0. All files older than ttl will be deleted during compaction. i.e. if (file.creation_time < (current_time - ttl)) then delete(file). This will enable cases where you might want to delete all files older than, say, 1 day.
  • FIFO compaction will fall back to the prior way of deleting files based on size if:
    • the creation_time of all files involved in compaction is 0.
    • the total size (of all SST files combined) does not drop below compaction_options_fifo.max_table_files_size even if the files older than ttl are deleted.

This feature is not supported if max_open_files != -1 or with table formats other than Block-based.

Test Plan:
Added tests.

Benchmark results:
Base: FIFO with max size: 100MB ::

svemuri@dev15905 ~/rocksdb (fifo-compaction) $ TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=readwhilewriting --num=5000000 --threads=16 --compaction_style=2 --fifo_compaction_max_table_files_size_mb=100

readwhilewriting :       1.924 micros/op 519858 ops/sec;   13.6 MB/s (1176277 of 5000000 found)

With TTL (a low one for testing) ::

svemuri@dev15905 ~/rocksdb (fifo-compaction) $ TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=readwhilewriting --num=5000000 --threads=16 --compaction_style=2 --fifo_compaction_max_table_files_size_mb=100 --fifo_compaction_ttl=20

readwhilewriting :       1.902 micros/op 525817 ops/sec;   13.7 MB/s (1185057 of 5000000 found)

Example Log lines:

2017/06/26-15:17:24.609249 7fd5a45ff700 (Original Log Time 2017/06/26-15:17:24.609177) [db/compaction_picker.cc:1471] [default] FIFO compaction: picking file 40 with creation time 1498515423 for deletion
2017/06/26-15:17:24.609255 7fd5a45ff700 (Original Log Time 2017/06/26-15:17:24.609234) [db/db_impl_compaction_flush.cc:1541] [default] Deleted 1 files
...
2017/06/26-15:17:25.553185 7fd5a61a5800 [DEBUG] [db/db_impl_files.cc:309] [JOB 0] Delete /dev/shm/dbbench/000040.sst type=2 #40 -- OK
2017/06/26-15:17:25.553205 7fd5a61a5800 EVENT_LOG_v1 {"time_micros": 1498515445553199, "job": 0, "event": "table_file_deletion", "file_number": 40}

SST Files remaining in the dbbench dir, after db_bench execution completed:

svemuri@dev15905 ~/rocksdb (fifo-compaction)  $ ls -l /dev/shm//dbbench/*.sst
-rw-r--r--. 1 svemuri users 30749887 Jun 26 15:17 /dev/shm//dbbench/000042.sst
-rw-r--r--. 1 svemuri users 30768779 Jun 26 15:17 /dev/shm//dbbench/000044.sst
-rw-r--r--. 1 svemuri users 30757481 Jun 26 15:17 /dev/shm//dbbench/000046.sst

@facebook-github-bot
Copy link
Contributor

@sagar0 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

- Added a new TableProperty, creation_time, to keep track of when the
  SST file is created.
- Creation_time:
  - On Flush: Set to the time of flush.
  - On Compaction: Set to the max creation_time of all the files
    involved in the compaction.
- Added a new TTL option to FIFO compaction options.
Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure how it works, please clarify or provide unit tests.

include/rocksdb/advanced_options.h Outdated Show resolved Hide resolved
db/builder.cc Show resolved Hide resolved
Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few more comments. Glad we'll have this feature soon!

table/table_builder.h Show resolved Hide resolved
db/flush_job.cc Outdated Show resolved Hide resolved
@@ -62,6 +62,13 @@ struct CompactionOptionsFIFO {
// Default: 1GB
uint64_t max_table_files_size;

// Drop files older than TTL. TTL based deletion will take precedence over
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the semantics to me were a bit confusing -- consider:

  • if we exceed size limit and TTL of last file expired: then we drop the last file and don't necessarily go under size limit
  • if we exceed size limit and TTL not expired: we drop files from the end until we're under size limit

my preference would be in either case we pick a compaction such that, once it's finished, we satisfy all user-specified limits.

I'd also consider letting max_table_files_size == 0 disable the size-triggered file dropping now, similarly to what you did for ttl.

Copy link
Contributor Author

@sagar0 sagar0 Jun 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, and I too noticed it. I changed it now to fall back to size-based deletion when ttl-based deletion does not bring the total-size to be under the max_table_files_size threshold.

I am deferring the work of allowing max_table_files_size to be set to 0, for later. I will look into it in future, as that involves a lot more changes.

db/compaction.cc Outdated Show resolved Hide resolved
- Added unit tests.
- Split PickCompaction into PickTTLCompaction and PickSizeCompaction.
- Fixed a bug in BuildTable where time wasn't being passed to
  NewTableBuilder.
Also handled the return status of GetCurrentTime, so that a garbage
value is not used when return statu is not ok. time is set to 0 instead.
@facebook-github-bot
Copy link
Contributor

@sagar0 updated the pull request - view changes - changes since last import

@sagar0 sagar0 changed the title WIP: FIFO Compaction with TTL FIFO Compaction with TTL Jun 25, 2017
@facebook-github-bot
Copy link
Contributor

@sagar0 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@sagar0 updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@sagar0 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Also moved GetTotalFilesSize into anonymouse namespace from
FIFOCompactionPicker class.
@facebook-github-bot
Copy link
Contributor

@sagar0 updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@sagar0 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Since there is a potential chance that we could discard the files picked
in the main loop in PickTTLCompaction, move the log message to the right
place, to be right before creating the Compaction object.
@facebook-github-bot
Copy link
Contributor

@sagar0 updated the pull request - view changes - changes since last import

@sagar0
Copy link
Contributor Author

sagar0 commented Jun 26, 2017

Benchmark results:
Base: FIFO with max size: 100MB ::

svemuri@dev15905 ~/rocksdb (fifo-compaction) $ TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=readwhilewriting --num=5000000 --threads=16 --compaction_style=2 --fifo_compaction_max_table_files_size_mb=100

readwhilewriting :       1.924 micros/op 519858 ops/sec;   13.6 MB/s (1176277 of 5000000 found)

With TTL (a low one for testing) ::

svemuri@dev15905 ~/rocksdb (fifo-compaction) $ TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=readwhilewriting --num=5000000 --threads=16 --compaction_style=2 --fifo_compaction_max_table_files_size_mb=100 --fifo_compaction_ttl=20

readwhilewriting :       1.902 micros/op 525817 ops/sec;   13.7 MB/s (1185057 of 5000000 found)

Example Log lines:

2017/06/26-15:17:24.609249 7fd5a45ff700 (Original Log Time 2017/06/26-15:17:24.609177) [db/compaction_picker.cc:1471] [default] FIFO compaction: picking file 40 with creation time 1498515423 for deletion
2017/06/26-15:17:24.609255 7fd5a45ff700 (Original Log Time 2017/06/26-15:17:24.609234) [db/db_impl_compaction_flush.cc:1541] [default] Deleted 1 files
...
2017/06/26-15:17:25.553185 7fd5a61a5800 [DEBUG] [db/db_impl_files.cc:309] [JOB 0] Delete /dev/shm/dbbench/000040.sst type=2 #40 -- OK
2017/06/26-15:17:25.553205 7fd5a61a5800 EVENT_LOG_v1 {"time_micros": 1498515445553199, "job": 0, "event": "table_file_deletion", "file_number": 40}

SST Files remaining in the dbbench dir, after db_bench execution completed:

svemuri@dev15905 ~/rocksdb (fifo-compaction)  $ ls -l /dev/shm//dbbench/*.sst
-rw-r--r--. 1 svemuri users 30749887 Jun 26 15:17 /dev/shm//dbbench/000042.sst
-rw-r--r--. 1 svemuri users 30768779 Jun 26 15:17 /dev/shm//dbbench/000044.sst
-rw-r--r--. 1 svemuri users 30757481 Jun 26 15:17 /dev/shm//dbbench/000046.sst

Support FIFO-compaction-with-TTL only when max_open_files=-1, as the
creation_time embedded in every file's table properties need to be
consulted to figure out the files that need to be deleted. table_reader
embedded in a FileDescriptor could potentially get deleted if
max_open_files is not set to -1.

We could initialize the table_reader again to get the table properties
again, but it would involve a performance penalty due to reading new
blocks from disk. We could potentially support it in the future, but may
be not in the first version.
@facebook-github-bot
Copy link
Contributor

@sagar0 updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@sagar0 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ajkr
Copy link
Contributor

ajkr commented Jun 27, 2017

Sorry I wasn't able to get to reviewing today, I'll try again in the morning.

1. It is only supported with max_open_files = -1.
2. It is only supported with Block based table format.
@facebook-github-bot
Copy link
Contributor

@sagar0 updated the pull request - view changes - changes since last import

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! let's ship it today :)

db/compaction.cc Outdated Show resolved Hide resolved
db/compaction_job.cc Show resolved Hide resolved
db/compaction_picker.cc Show resolved Hide resolved
db/compaction_picker.cc Outdated Show resolved Hide resolved
db/db_test.cc Show resolved Hide resolved
Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry last comment - is there a way to implement the test without sleeping?

@sagar0
Copy link
Contributor Author

sagar0 commented Jun 27, 2017

Re. you comment about sleep ... I found that there is a MockEnv::FakeSleepForMicroseconds function, and I did try to use it instead of resorting to an actual sleep. But unfortunately I couldn't get it to work in this case. (I was trying to use a big enough ttl like 1 day, and use that fake sleep function to mimic a real world scenario.) I definitely plan to investigate some more in the future and make such a test work.

Addressing review comments:
1. Pick only continguous files for deletion in FIFO-with-TTL.
2. Add a check to make sure that GetTableProperties() does not return a
null pointer, before accessing fields fruther.
@facebook-github-bot
Copy link
Contributor

@sagar0 updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@sagar0 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

sagar0 added a commit that referenced this pull request Jun 28, 2017
Summary:
Introducing FIFO compactions with TTL.

FIFO compaction is based on size only which makes it tricky to enable in production as use cases can have organic growth. A user requested an option to drop files based on the time of their creation instead of the total size.

To address that request:
- Added a new TTL option to FIFO compaction options.
- Updated FIFO compaction score to take TTL into consideration.
- Added a new table property, creation_time, to keep track of when the SST file is created.
- Creation_time is set as below:
  - On Flush: Set to the time of flush.
  - On Compaction: Set to the max creation_time of all the files involved in the compaction.
  - On Repair and Recovery: Set to the time of repair/recovery.
  - Old files created prior to this code change will have a creation_time of 0.
- FIFO compaction with TTL is enabled when ttl > 0. All files older than ttl will be deleted during compaction. i.e. `if (file.creation_time < (current_time - ttl)) then delete(file)`. This will enable cases where you might want to delete all files older than, say, 1 day.
- FIFO compaction will fall back to the prior way of deleting files based on size if:
  - the creation_time of all files involved in compaction is 0.
  - the total size (of all SST files combined) does not drop below `compaction_options_fifo.max_table_files_size` even if the files older than ttl are deleted.

This feature is not supported if max_open_files != -1 or with table formats other than Block-based.

**Test Plan:**
Added tests.

**Benchmark results:**
Base: FIFO with max size: 100MB ::
```
svemuri@dev15905 ~/rocksdb (fifo-compaction) $ TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=readwhilewriting --num=5000000 --threads=16 --compaction_style=2 --fifo_compaction_max_table_files_size_mb=100

readwhilewriting :       1.924 micros/op 519858 ops/sec;   13.6 MB/s (1176277 of 5000000 found)
```

With TTL (a low one for testing) ::
```
svemuri@dev15905 ~/rocksdb (fifo-compaction) $ TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=readwhilewriting --num=5000000 --threads=16 --compaction_style=2 --fifo_compaction_max_table_files_size_mb=100 --fifo_compaction_ttl=20

readwhilewriting :       1.902 micros/op 525817 ops/sec;   13.7 MB/s (1185057 of 5000000 found)
```
Example Log lines:
```
2017/06/26-15:17:24.609249 7fd5a45ff700 (Original Log Time 2017/06/26-15:17:24.609177) [db/compaction_picker.cc:1471] [default] FIFO compaction: picking file 40 with creation time 1498515423 for deletion
2017/06/26-15:17:24.609255 7fd5a45ff700 (Original Log Time 2017/06/26-15:17:24.609234) [db/db_impl_compaction_flush.cc:1541] [default] Deleted 1 files
...
2017/06/26-15:17:25.553185 7fd5a61a5800 [DEBUG] [db/db_impl_files.cc:309] [JOB 0] Delete /dev/shm/dbbench/000040.sst type=2 #40 -- OK
2017/06/26-15:17:25.553205 7fd5a61a5800 EVENT_LOG_v1 {"time_micros": 1498515445553199, "job": 0, "event": "table_file_deletion", "file_number": 40}
```

SST Files remaining in the dbbench dir, after db_bench execution completed:
```
svemuri@dev15905 ~/rocksdb (fifo-compaction)  $ ls -l /dev/shm//dbbench/*.sst
-rw-r--r--. 1 svemuri users 30749887 Jun 26 15:17 /dev/shm//dbbench/000042.sst
-rw-r--r--. 1 svemuri users 30768779 Jun 26 15:17 /dev/shm//dbbench/000044.sst
-rw-r--r--. 1 svemuri users 30757481 Jun 26 15:17 /dev/shm//dbbench/000046.sst
```
Closes #2480

Differential Revision: D5305116

Pulled By: sagar0

fbshipit-source-id: 3e5cfcf5dd07ed2211b5b37492eb235b45139174
facebook-github-bot pushed a commit that referenced this pull request Jul 5, 2017
Summary:
Valgrind had false positive complaints about the initialization pattern for `GetCurrentTime()`'s argument in #2480. We can instead have the client initialize the time variable before calling `GetCurrentTime()`, and have `GetCurrentTime()` promise to only overwrite it in success case.
Closes #2526

Differential Revision: D5358689

Pulled By: ajkr

fbshipit-source-id: 857b189f24c19196f6bb299216f3e23e7bc4be42
facebook-github-bot pushed a commit that referenced this pull request Dec 7, 2017
Summary:
`GetCurrentTime()` is used to populate `creation_time` table property during flushes and compactions. It is safe to ignore `GetCurrentTime()` failures here but they should be logged.

(Note that `creation_time` property was introduced as part of TTL-based FIFO compaction in #2480.)

Tes Plan:
`make check`
Closes #3231

Differential Revision: D6501935

Pulled By: sagar0

fbshipit-source-id: 376adcf4ab801d3a43ec4453894b9a10909c8eb6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants