Add seqno to time mapping #10338

jay-zhuang · 2022-07-11T19:03:05Z

The mapping will be used by tiered storage to keep hot data from being
compacted to the cold tier (the last level).
Internally, this adds a seqno-to-time mapping. A periodic_task is scheduled
to record current_seqno -> current_time at a regular cadence. When a
memtable is flushed, the mapping information is stored in the SST file's
table properties. During compaction, the mapping information from the input
files is merged and used to get the approximate time of a sequence number.
That time determines whether a key was inserted recently, and a recently
inserted key (within preclude_last_level_data_seconds) is precluded from
the last level.

Test Plan: CI
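
A minimal C++ sketch of the flow described above, assuming a single in-memory vector of samples. SeqnoTimePair, SeqnoTimeSketch and its methods are hypothetical names, not RocksDB's actual classes. The periodic task appends (seqno, time) samples, and compaction asks for the largest sampled seqno at or before now - preclude_last_level_data_seconds: a key with a seqno at or below that value is known to be old, and every other key is treated as recent.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical, simplified model of the seqno -> time mapping described in the
// PR. SeqnoTimePair and SeqnoTimeSketch are illustrative names, not RocksDB API.
struct SeqnoTimePair {
  uint64_t seqno;  // latest sequence number when the sample was taken
  uint64_t time;   // wall-clock time of the sample, in seconds
};

class SeqnoTimeSketch {
 public:
  // Called by the periodic task to record current_seqno -> current_time.
  void Append(uint64_t seqno, uint64_t time) {
    assert(pairs_.empty() || seqno >= pairs_.back().seqno);
    if (!pairs_.empty() && pairs_.back().seqno == seqno) {
      // No writes since the previous sample: keep one pair, with the later time.
      pairs_.back().time = std::max(pairs_.back().time, time);
      return;
    }
    pairs_.push_back({seqno, time});
  }

  // Largest sampled seqno whose time is <= cutoff_time, or 0 if none.
  // Every key with a seqno <= the result already existed at cutoff_time.
  uint64_t LastSeqnoAtOrBefore(uint64_t cutoff_time) const {
    auto it = std::upper_bound(
        pairs_.begin(), pairs_.end(), cutoff_time,
        [](uint64_t t, const SeqnoTimePair& p) { return t < p.time; });
    return it == pairs_.begin() ? uint64_t{0} : std::prev(it)->seqno;
  }

  // A key is treated as recent (kept out of the last level) unless the
  // samples show it existed more than preclude_seconds before now.
  bool IsRecent(uint64_t key_seqno, uint64_t now,
                uint64_t preclude_seconds) const {
    if (now < preclude_seconds) return true;  // avoid unsigned underflow
    return key_seqno > LastSeqnoAtOrBefore(now - preclude_seconds);
  }

 private:
  std::vector<SeqnoTimePair> pairs_;  // ascending seqno; times assumed ascending
};
```

With samples (100, 1000), (200, 2000), (200, 2500) and (300, 3000), the two seqno-200 samples collapse into (200, 2500). At now = 3600 with a 1000-second window the cutoff is 2600, and the last sample at or before it has seqno 200, so a key with seqno 150 is eligible for the last level while a key with seqno 250 stays out of it. The check is deliberately conservative: a key is only called old when a sample proves it already existed before the cutoff.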

facebook-github-bot · 2022-07-13T04:24:56Z

@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2022-07-13T04:47:47Z

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

db/column_family.cc

db/compaction/compaction_iterator.cc

table/table_properties.cc

siying · 2022-07-12T18:27:58Z

table/sst_file_writer.cc

-      r->next_file_number);
+      TableFileCreationReason::kMisc, 0 /* oldest_key_time */,
+      0 /* file_creation_time */, "SST Writer" /* db_id */, r->db_session_id,
+      0 /* target_file_size */, r->next_file_number);


I guess if a user only ingests data through bulk loading, it should still work, since we are adding the smallest_seqno and oldest_ancester_time data points? It would be good to add a code comment about this case somewhere.

Yeah, but an ingested file has "oldest_ancester_time" set to 0 (kUnknownOldestAncesterTime), so even the smallest_seqno -> oldest_ancester_time pair is useless.
I'm not sure how we should handle the ingested-file use case; I'll add a TODO for that.

Hmm.... Maybe use file creation time in this case?

file_creation_time is also set to 0 for ingested SSTs. I'm not sure whether we should set it to the current time, so I'll leave it as is.

This feels like a bug. Bulk-loaded files also need to be periodically compacted for options.periodic_compaction_seconds.
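
To make the ingestion gap concrete, here is a small hypothetical helper (TimeForSmallestSeqno and kUnknownTime are illustrative names, not RocksDB API) that picks the time to pair with a file's smallest_seqno: oldest_ancester_time if known, otherwise file_creation_time as suggested in the review, otherwise nothing. For an ingested SST today, where both fields are 0, it yields no pair, which is the case the TODO is about.

```cpp
#include <cstdint>
#include <optional>

// Mirrors the value of kUnknownOldestAncesterTime mentioned in the thread (0);
// redefined here only so the sketch compiles on its own.
constexpr uint64_t kUnknownTime = 0;

// Hypothetical helper (not RocksDB API): choose the time to pair with a file's
// smallest_seqno. Prefer oldest_ancester_time, fall back to file_creation_time
// as suggested in the review, and give up if neither is known.
constexpr std::optional<uint64_t> TimeForSmallestSeqno(
    uint64_t oldest_ancester_time, uint64_t file_creation_time) {
  if (oldest_ancester_time != kUnknownTime) return oldest_ancester_time;
  if (file_creation_time != kUnknownTime) return file_creation_time;
  return std::nullopt;  // e.g. an ingested SST today: no usable pair
}
```

Setting file_creation_time to the ingestion time, as floated above, would make the second branch usable for ingested files, at the cost of treating all of their data as if it were written at ingestion.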

include/rocksdb/advanced_options.h

db/seqno_to_time_mapping.h

table/block_based/block_based_table_builder.cc

db/seqno_to_time_mapping.cc

siying · 2022-07-13T16:09:40Z

db/seqno_to_time_mapping.cc

+  for (const auto& it : copy) {
+    // If sequence number is the same, pick the one with larger time, which is
+    // more accurate than the older time.
+    if (it.seqno == prev.seqno) {


Since duplication should be very common (compaction input files from non-L0 levels tend to share the same source of seqno->time mapping), I wonder whether we should do Sort() or always maintain deduplicated order in Append().

Append() does maintain order and deduplicates (it is used for flush).
Sort() is mostly needed in compaction, which adds all the information together and sorts once. (Ideally we should do a k-way merge sort, but this keeps it simple and also makes it easier to add the smallest_seqno -> oldest_ancester_time pair.)
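
A minimal sketch of the compaction-side approach described in the reply, assuming the input files' pairs have already been concatenated into one vector: sort once by seqno, then collapse equal seqnos while keeping the larger time, matching the comment in the diff above. SortAndDedup is a hypothetical name, and the k-way merge mentioned as the ideal is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct SeqnoTimePair {
  uint64_t seqno;
  uint64_t time;
};

// Hypothetical sketch of the compaction-side merge: the pairs from all input
// files are sorted once by seqno, and pairs with the same seqno are collapsed,
// keeping the larger (more recent) time.
std::vector<SeqnoTimePair> SortAndDedup(std::vector<SeqnoTimePair> pairs) {
  std::sort(pairs.begin(), pairs.end(),
            [](const SeqnoTimePair& a, const SeqnoTimePair& b) {
              return a.seqno < b.seqno;
            });
  std::vector<SeqnoTimePair> out;
  out.reserve(pairs.size());
  for (const auto& p : pairs) {
    if (!out.empty() && out.back().seqno == p.seqno) {
      out.back().time = std::max(out.back().time, p.time);
    } else {
      assert(out.empty() || out.back().seqno < p.seqno);
      out.push_back(p);
    }
  }
  return out;
}
```

For example, the inputs (200, 20), (100, 10), (200, 25), (100, 12), (300, 30) reduce to (100, 12), (200, 25), (300, 30). A std::map keyed by seqno would deduplicate on insertion instead, at the cost of a heap allocation per node; with at most about 100 pairs per file, a single sort over a vector is likely cheap enough.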

My question is about the compaction case, which will have lots of duplication. For example, after an L1->L2 compaction, many entries in L2 files are the same, as they come from the same L1 file. When another L1->L2 compaction comes, these entries need to be deduplicated. So duplication is common here, and my understanding is that right now Sort() is critical to make sure we don't store duplicate data. Maybe we could even consider using a std::map?

L0 files typically should not have duplicated entries, as we assume they do not have seqno overlap.
On the other hand, the data set here is pretty small. The maximum is 100 entries, and for higher-level files it is typically pretty small. For example, if preclude_last_level_data_seconds is set to 3 days, a seqno->time pair is sampled about every hour; if the memtable life span is less than an hour, the map is actually empty for most L0 files.
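
The sampling interval behind the 3-day example can be checked with a little compile-time arithmetic. This assumes (the thread does not state the formula) that samples are spread evenly so that the window holds at most 100 pairs; kMaxSeqnoTimePairs and SampleIntervalSeconds are hypothetical names.

```cpp
#include <cstdint>

// Assumption (not stated in the thread): samples are spread evenly across the
// preclude window, so the interval is the window divided by the entry cap.
constexpr uint64_t kMaxSeqnoTimePairs = 100;  // "The maximum is 100"

constexpr uint64_t SampleIntervalSeconds(uint64_t window_seconds) {
  // Round up so the window never needs more than kMaxSeqnoTimePairs samples.
  return (window_seconds + kMaxSeqnoTimePairs - 1) / kMaxSeqnoTimePairs;
}

constexpr uint64_t kThreeDaysSeconds = 3 * 24 * 60 * 60;  // 259,200 s
static_assert(SampleIntervalSeconds(kThreeDaysSeconds) == 2592,
              "one sample about every 43 minutes");
```

Under that assumption a 3-day window gives one sample every 2,592 seconds (about 43 minutes), the same order as the hourly figure in the reply. A memtable that lives for less than one interval sees at most one sample, so many L0 files carry no pairs at all.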

facebook-github-bot · 2022-07-14T01:17:53Z

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-07-14T01:50:53Z

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-07-14T04:54:35Z

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-07-14T04:55:06Z

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-07-14T04:59:28Z

@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2022-07-14T16:29:07Z

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

siying · 2022-07-14T17:18:51Z

db/db_impl/db_impl.cc

+      min_first_mem_seqno =
+          std::min(cfd->GetFirstMemtableSequenceNumber(), min_first_mem_seqno);
+    }
+  }


Is min_first_mem_seqno only a hint for the binary search? If so, it feels too complex and might not necessarily be cheaper. Some applications might have thousands of CFs. Looping through all CFs should generally be avoided if possible, especially while holding the DB mutex (I know it is now outside, but still).

That makes sense. I'm going to remove that part for now, as it's just an optimization to reduce memory usage. Memory usage is already capped at 1.6K per DB instance (100 * (8+8) bytes), or 16K in the worst case (multiple CFs with different settings). I don't think it's worth doing.
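
The 1.6K figure follows from two 64-bit fields per pair. A compile-time check of that arithmetic, with a hypothetical SeqnoTimePair struct standing in for the real one; it counts only the pair payload, not any container overhead.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative layout: two 64-bit fields per pair (seqno + time), matching
// the "100 * (8+8)" estimate.
struct SeqnoTimePair {
  uint64_t seqno;  // 8 bytes
  uint64_t time;   // 8 bytes
};

constexpr std::size_t kMaxPairs = 100;
constexpr std::size_t kMaxBytes = kMaxPairs * sizeof(SeqnoTimePair);

// Holds on mainstream ABIs (no padding between two uint64_t members).
static_assert(sizeof(SeqnoTimePair) == 16, "8 + 8 bytes per pair");
static_assert(kMaxBytes == 1600, "about 1.6K per mapping");
```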

Oh, I get it: it is for retention. I think there is another counter for that. In that case, it would be better to update this number in the flush/compaction path, where we already loop through all CFs for other purposes, just like VersionSet::PreComputeMinLogNumberWithUnflushedData(). The information could then be used by other components too. We might even have it precalculated already, though I didn't find it.

Of course we don't have to do it here.
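
For context on what the removed min_first_mem_seqno logic was for: a retention step could drop pairs that no unflushed memtable can still need. Below is a minimal sketch under that assumption; TrimBefore is a hypothetical name, and the exact retention rule in RocksDB may differ. It keeps the last pair at or below the cutoff so that seqnos just above the cutoff still map to a time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct SeqnoTimePair {
  uint64_t seqno;
  uint64_t time;
};

// Hypothetical retention step. min_first_mem_seqno is the smallest first
// sequence number across all live memtables (all column families). Pairs
// below it are assumed to be no longer needed by any future flush, except the
// last one at or below it, which still gives a time for seqnos just above it.
void TrimBefore(std::vector<SeqnoTimePair>& pairs,
                uint64_t min_first_mem_seqno) {
  assert(std::is_sorted(pairs.begin(), pairs.end(),
                        [](const SeqnoTimePair& a, const SeqnoTimePair& b) {
                          return a.seqno < b.seqno;
                        }));
  // First pair whose seqno is strictly greater than the cutoff.
  auto it = std::upper_bound(
      pairs.begin(), pairs.end(), min_first_mem_seqno,
      [](uint64_t s, const SeqnoTimePair& p) { return s < p.seqno; });
  if (it == pairs.begin()) return;  // no pair at or below the cutoff
  --it;                             // keep the last pair <= cutoff
  pairs.erase(pairs.begin(), it);
}
```

Computing min_first_mem_seqno itself is the part the review suggests moving into the flush/compaction path, similar to VersionSet::PreComputeMinLogNumberWithUnflushedData(), rather than looping over every CF on each periodic task.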

db/db_impl/db_impl.cc

table/block_based/block_based_table_builder.cc

facebook-github-bot · 2022-07-15T00:42:37Z

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-07-15T00:50:25Z

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-07-15T00:50:41Z

@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2022-07-15T01:01:34Z

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-07-15T01:02:08Z

@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2022-07-15T01:51:10Z

@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2022-07-15T02:46:51Z

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-07-15T02:47:18Z

@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot added the CLA Signed label Jul 11, 2022

jay-zhuang requested a review from siying July 12, 2022 05:19

jay-zhuang marked this pull request as ready for review July 13, 2022 04:24

siying reviewed Jul 13, 2022

jay-zhuang changed the title ~~[Draft] Add seqno to time mapping~~ Add seqno to time mapping Jul 13, 2022

siying reviewed Jul 13, 2022

jay-zhuang force-pushed the non_last_time branch from 0b5cca3 to 0577d3f July 14, 2022 04:54

siying approved these changes Jul 14, 2022

jay-zhuang added 5 commits July 14, 2022 18:01

make format · e7c4d49

Fix windows build · 67d8fe8

Review feedbacks · 5c45f4d

Add unsuported for plaintable · 0006484

jay-zhuang force-pushed the non_last_time branch from 7609f5b to 0006484 July 15, 2022 01:01

Make internal build happy · 51c44e8

facebook-github-bot closed this in a3acf2e Jul 15, 2022
