Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add GetStatsHistory to retrieve stats snapshots #4748

Closed
wants to merge 17 commits into from

Conversation

miasantreble
Copy link
Contributor

@miasantreble miasantreble commented Dec 5, 2018

This PR adds public GetStatsHistory API to retrieve stats history in the form of an std map. The key of the map is the timestamp in microseconds when the stats snapshot is taken, the value is another std map from stats name to stats value (stored in std string). Two DBOptions are introduced: stats_persist_period_sec (default 10 minutes) controls the intervals between two snapshots are taken; max_stats_history_count (default 10) controls the max number of history snapshots to keep in memory. RocksDB will stop collecting stats snapshots if stats_persist_period_sec is set to 0.

(This PR is the in-memory part of #4535)

@siying
Copy link
Contributor

siying commented Dec 5, 2018

Where is the public API? I don't see a change to db.h

db/db_impl.cc Outdated
DBImpl::FindStatsBetween(uint64_t start_time, uint64_t end_time) {
std::map<uint64_t, std::map<std::string, std::string> > stats_history;
// first dump whatever is in-memory that satisfies the time range
for (const auto& stats : stats_history_) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is stats_history_ thread safe? Is it safe to access it while another thread might be updating it?

db/db_impl.cc Outdated
if (cfd->initialized()) {
cfd->internal_stats()->GetMapProperty(
*cf_property_info, DB::Properties::kCFStatsNoFileHistogram,
&stats_map);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused. Will stats from different CFs overwrite the same entry in the map?

db/db_impl.h Outdated
@@ -1565,6 +1579,9 @@ class DBImpl : public DB {
return Env::WLTH_SHORT;
}

std::map<uint64_t, std::map<std::string, std::string>>
FindStatsBetween(uint64_t start_time, uint64_t end_time);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this API needs more people to review. I am concerned to it. It assumes that all results need to be copied to memory. This makes memory budget hard to control. Yes we can add one more argument about maximum records to report, but this makes the function harder to understand. Another possible way is to make it an iterator interface, so that we only need to page in the records people are currently reading.
It's worth asking MyRocks people about how they plan to turn this interface into a virtual SQL table and allow arbitrary SQL queries, and we should enable them to do it through our API.

Copy link
Contributor Author

@miasantreble miasantreble Dec 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now we have discussed two general approaches to storing and retrieving stats history.

  • in-memory: implemented in this PR. Basic idea is to periodically store stats into a map and query through a new API.
    • pros: no backward compatibility issue
    • cons: increased memory consumption/limited length of stats history (unable to query history before the last DB restart)
  • persist to disk: implemented in Persistent stats: periodically persist RocksDB stats  #4535. Basic idea is to implicitly create a new column family to store stats history, and have the new
    GetStatsHistory API create iterator to query history
    • pros: limited memory cost, supports queries over lifetime of the DB
    • cons: additional downgrade operational cost (to manage the new column family), complication with managing additional column family

The original plan was to first roll out in-memory solution to get user engagement with the new API, and then roll out persist to disk when user are comfortable with the potential operational cost of downgrading or find a better solution for smooth downgrade.
The same set of API will be used throughout the process. In the first stage the API will only query in-memory data structure and in the second stage it will create iterators to query on disk data.
@yoshinorim do you mind taking a look and let us know your preference? especially with MyRocks integration?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the roll out plan makes sense. From high level point of view, memory consumption is probably the biggest concern.

  • Capping memory usage if Stats History consumes too much (capping the configurable number of entries or memory size. The latter is easier to tune for users)
  • Reporting rough memory size used by this feature

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miasantreble I am lost in your comment. I've never suggested any "new API". Both in-memory and persistent-to-disk solution should use the same API. The benefit of having the in-memory version at all is that people can adapt to it, and then later naturally migrate to the persistence version without any API change.

Copy link
Contributor Author

@miasantreble miasantreble Dec 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siying thanks for pointing out the ambiguity, I've updated the comment above. Also how do you like the idea of adding an option to control the amount of memory stats history consumes? I can replace max_stats_history_count with something like stats_history_memory_budget and default to 128K(~15 stat history entries) maybe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If holding the stats in-memory is a temporary feature I'd be careful about adding an option especially for it.

In the persistent version of the stats, will there be any API to control how much storage is used for stats? Or how much time we retain stats? If so, perhaps we can use that same API to control memory usage when we have the in-memory implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajkr maybe we can add another API StartCollectStatsHistory(GetStatsOptions&) to start the background stats collection threads where we can specify the memory limit in GetStatsOptions, and to retrieve most recent stats history, still use GetStatsHistory. Basically we can put all new options in GetStatsOptions to avoid polluting the general options space. How does that sound?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siying I gave some thought about the iterator interface. My main concern is that the stats history data structure (current a std::map) will be constantly updated (inserted and erased), and iterator interface will need to either 1) delay purging expired data to ensure iterator will have a consistent view of the data, or 2) create possible race condition with iterator reads and data purging. Yoshi pointed to me a similar case in MyRocks where we need to create virtual table using RocksDB data structure: https://github.com/facebook/mysql-5.6/blob/60ab18528d63709cc82364d5b9b81ecbfe8d86e8/storage/rocksdb/rdb_i_s.cc#L1148
MyRocks calls GetPropertiesOfAllTables which takes a pointer to a std::unordered_map<std::string, std::shared_ptr<const TableProperties>>. We can do similar things here to save some parameter copying cost

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siying I gave some thought about the iterator interface. My main concern is that the stats history data structure (current a std::map) will be constantly updated (inserted and erased), and iterator interface will need to either 1) delay purging expired data to ensure iterator will have a consistent view of the data, or 2) create possible race condition with iterator reads and data purging. Yoshi pointed to me a similar case in MyRocks where we need to create virtual table using RocksDB data structure: https://github.com/facebook/mysql-5.6/blob/60ab18528d63709cc82364d5b9b81ecbfe8d86e8/storage/rocksdb/rdb_i_s.cc#L1148

Is the in-memory history stats data a permanent feature? If not, I think we could perhaps prioritize API design over consideration of specific (in-memory) data structures, e.g. std::map. Once we have an interface, we can find a in-memory data structure that meets the requirement of the API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miasantreble Let's put the in-memory case aside for just a moment. If we only consider the persistent case, what kind of function will you use?

@miasantreble
Copy link
Contributor Author

sample CF and DB stats returned by GetStatsHistory with commit 088b8d9:

time = 1544221786796536, key = pikachu.compaction.L0.AvgSec, value = 0.000971
time = 1544221786796536, key = pikachu.compaction.L0.CompCount, value = 1.000000
time = 1544221786796536, key = pikachu.compaction.L0.CompSec, value = 0.000971
time = 1544221786796536, key = pikachu.compaction.L0.CompactedFiles, value = 0.000000
time = 1544221786796536, key = default.io_stalls.stop_for_pending_compaction_bytes, value = 0
time = 1544221786796536, key = default.io_stalls.total_slowdown, value = 0
time = 1544221786796536, key = default.io_stalls.total_stop, value = 0
time = 1544221786796536, key = interval_num_keys_written, value = 0
time = 1544221786796536, key = interval_seconds_up, value = 0.000211
time = 1544221786796536, key = interval_stall_micros_readable, value = 00:00:0.000 H:M:S

db/db_impl.cc Outdated
}
}

void DBImpl::PersistStats() {
TEST_SYNC_POINT("DBImpl::PersistStats:1");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to use a informative name for test sync point?

db/db_impl.cc Outdated
}
{
InstrumentedMutexLock l(&mutex_);
default_cf_internal_stats_->GetMapProperty(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We collect different stats for default cf?

db/db_impl.h Outdated
@@ -713,6 +714,10 @@ class DBImpl : public DB {
static Status CreateAndNewDirectory(Env* env, const std::string& dirname,
std::unique_ptr<Directory>* directory);

// return a map of DBStats and CFstats, specify time window etc in stats_opts
virtual Status GetStatsHistory(GetStatsOptions& stats_opts,
std::map<uint64_t, std::map<std::string, std::string>>& stats_history) override;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pointer type for out arg?

// allow caller to pass a mutex and temporarily unlock the mutex during
// the process of cancel. Useful when function_ also need to lock the same
// mutex
void cancel(InstrumentedMutex* mutex = nullptr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cancel function looks a little strange with an extra mutex parameter. Is it necessary?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @riversand963 . Passing in a mutex, which just to be unlocked is a little bit strange.

@miasantreble miasantreble force-pushed the get-stats-history branch 2 times, most recently from 855328f to 634c071 Compare January 24, 2019 21:43
db/db_impl.cc Outdated
}
}

Status DBImpl::GetStatsHistory(GetStatsOptions& stats_opts,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that start and end time should be argument, not an option. It's just like in Get(), key is not an option.

AggregationOps aggr_ops;
};

class StatsHistoryIterator {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to stay in the public headers.


// Position at the first key in the source. The iterator is Valid()
// after this call iff the source is not empty.
void SeekToFirst();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this?

// the returned slice is valid only until the next modification of
// the iterator.
// REQUIRES: Valid()
uint64_t key(); // stats history creation timestamp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

key(), value() is confusing to me. Why not return StatsRecord, StatSlice or something like that, which is a struct and contain timestamp and stats?

Status status();

// Advance the iterator to any valid timestamp >= time_start
void AdvanceIteratorByTime(uint64_t start_time, uint64_t end_time);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need it at all?

// No copying allowed
StatsHistoryIterator(const StatsHistoryIterator&);
void operator=(const StatsHistoryIterator&);
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to have the same class to serve both of persistent stats or stats in memory? If you want to do separate ones, maybe it's should be an interface and the in memory one is one implementation.

// list of counters to query, call GetSupportedStatsCounters to get full list
std::vector<Tickers> tickers;
// aggregation operation like avg, sum, min, max, count, percentiles, etc
AggregationOps aggr_ops;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is it for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for user to specify what kind of aggregation operation to apply on the data, for example, average, sum, min/max, p90, etc. it's not implemented or used anywhere in the initial version.
One thing that's not clear is whether to have two separate accessor for raw StatsRecord and aggregated stats. For example, GetStatsHistory is called for 1 hour interval, certain counters, aggregation operation of average and p90. What is the best way to return the data? I think we can have a StatsRecord() to return the raw data together with timestamp and a AggregatedStats() to return a single aggregated result? Do we need to support aggregation operations that returns more than a single floating point value? How about aggregation operations that takes multiple counters?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is not implemented or used. I suggest we remove it for now. We can always add it back later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siying got a question for tickers: what form of ticker should users provide?
Currently each ticker have two forms, one is of type enum and another is std::string. For example, Tickers::BLOCK_CACHE_MISS or "rocksdb.block.cache.miss", and the mapping is maintained in TickersNameMap. Is it more user friendly to ask users to provide the latter and add lookup functions to return the enum type to do the filtering before returning the results?

db/db_impl.cc Outdated
*cf_property_info, DB::Properties::kCFFileHistogram, &cf_stats_map);
// add column family name to the key string
for (const auto& cf_stats : cf_stats_map) {
std::string key_with_cf_name = cfd->GetName() + "." + cf_stats.first;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"." is not a good delimiter for column family. There are a lot of "." in the name of stats.

db/db_impl.cc Outdated
// add column family name to the key string
for (const auto& cf_stats : cf_stats_map) {
std::string key_with_cf_name = cfd->GetName() + "." + cf_stats.first;
stats_map[key_with_cf_name] = cf_stats.second;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I right that the stats are accumulative counters? I believe we want the counters for the current window.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stats_map is a single level std::map that stores the stats kv pairs. In this loop we augment the stats name to include the column family name. Once stats_map is populated, it will be added to stats_history_ (two level std::map) as the value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InternalStats->GetMapProperty returns all cumulative counters in the format of map<string, string>. Of all the counters, only one is not in floating point form: write_stall_micros_readable and there is a related write_stall_micros so skipping it will not cause loss of information. Shall convert the map to map<string, dobule> to make it easier to calculate the delta?

db/db_impl.cc Outdated
@@ -646,8 +646,21 @@ void DBImpl::StartTimedTasks() {
}
}

size_t DBImpl::GetStatsHistorySize() const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call it "Estiamte", not "Get".

db/db_impl.cc Outdated
size_total += sizeof(stats.first) + sizeof(stats.second);
if (stats.second.size() != 0) {
auto s = stats.second.begin();
size_total += (sizeof(s->first) + sizeof(s->second)) * stats.second.size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we go though every time slice, and for each, estimating the memory by using the first one in the map, times map size. I think a much more accurate estimation will be the other around: do a more accurate estimate of the memory consumption for one time slice, and times number of time slices.

db/db_impl.cc Outdated
it = stats_history_.erase(it);
}
}
bool GC_needed = GetStatsHistorySize() >
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GC => "gc"

db/db_impl.cc Outdated
it = stats_history_.erase(it);
}
}
bool GC_needed = GetStatsHistorySize() >
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling "GC" may be a little bit confusing. Maybe just call it "purge".

@@ -48,8 +44,8 @@ void InMemoryStatsHistoryIterator::AdvanceIteratorByTime(uint64_t start_time, ui
found = db_impl_->FindStatsByTime(start_time, end_time, &new_time, &stats_map);
}
if (found) {
value_.swap(stats_map);
key_ = new_time;
stats_map_.swap(stats_map);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we directly pass &stats_map_ to FindStatsByTime()?

@@ -3834,10 +3833,7 @@ Status VersionSet::ReduceNumberOfLevels(const std::string& dbname,
ColumnFamilyDescriptor dummy_descriptor(kDefaultColumnFamilyName,
ColumnFamilyOptions(*options));
dummy.push_back(dummy_descriptor);
{
InstrumentedMutexLock l(&dummy_mutex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the change for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to revert some old change that is no longer needed

// if not zero, periodically take stats snapshots and store in memory, the
// memory size for stats snapshots is capped at stats_history_buffer_size
// Default: 1MB
unsigned int stats_history_buffer_size = 1024*1024;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1024 * 1024

Should be size_t.

std::vector<Tickers> tickers;
// aggregation operation like avg, sum, min, max, count, percentiles, etc
AggregationOps aggr_ops;
std::vector<std::string> tickers;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not used. Can we remove it?


// Return the value for the current entry. The underlying storage for
// the returned slice is valid only until the next modification of
// Return the current stats history as an std::map. The underlying storage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should mention it is map from what to what.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miasantreble has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@miasantreble has updated the pull request. Re-import the pull request

@miasantreble
Copy link
Contributor Author

miasantreble commented Feb 7, 2019

GetStatsHistory output with the latest commit e428280
https://gist.github.com/miasantreble/465de00c2432f5c675d73978eefabea5
The test does two batches of operations (to populate statistics) and sleep for 4 seconds after each batch. stats_persist_period_sec (sample interval in seconds) is set to 1 second
At end of test, GetStatsHistory returns 8 stats slices and only two of them contains non-zero data, which is consistent with the expectation
The test can be found here

@facebook-github-bot
Copy link
Contributor

@miasantreble has updated the pull request. Re-import the pull request

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miasantreble has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@miasantreble has updated the pull request. Re-import the pull request

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miasantreble has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@miasantreble has updated the pull request. Re-import the pull request

@miasantreble
Copy link
Contributor Author

@riversand963 thanks for your review, I think all you comments have been addressed, do you mind taking another look to see if there is any other issues?

// Wait for stats persist to finish
for (; mock_time < 20; ++mock_time) {
dbfull()->TEST_WaitForPersistStatsRun(
[&] { mock_env->set_current_time(mock_time); });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here TEST_WaitForPersistStatsRun() is called again. Did you say that it may hang?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the only occasion where hang is observed is when first turning off persistent stats thread through SetOptions and then reenable it, other than that it works fine.

db_->CompactRange(CompactRangeOptions(), nullptr, nullptr);
for (; mock_time < 10; ++mock_time) {
dbfull()->TEST_WaitForPersistStatsRun(
[&] { mock_env->set_current_time(mock_time); });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you say that you can't run TEST_WaitForPersistStatsRun() twice? Here it is 10 times.

size_t stats_history_size = dbfull()->TEST_EstiamteStatsHistorySize();
ASSERT_GE(slice_count, 9);
ASSERT_GE(stats_history_size, 12000);
// capping memory cost at 2000 bytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it says 2000, but it set to be 12000.

}
delete stats_iter;
size_t stats_history_size_reopen = dbfull()->TEST_EstiamteStatsHistorySize();
ASSERT_LE(slice_count, 2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment to explain why it is <=2 ? Is it that we know size of a slice will be at at least 4000?

@facebook-github-bot
Copy link
Contributor

@miasantreble has updated the pull request. Re-import the pull request

Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @miasantreble. I've taken another pass and left some comments.

HISTORY.md Outdated
@@ -7,10 +7,11 @@
* For users of dictionary compression with ZSTD v0.7.0+, we now reuse the same digested dictionary when compressing each of an SST file's data blocks for faster compression speeds.
* For all users of dictionary compression who set `cache_index_and_filter_blocks == true`, we now store dictionary data used for decompression in the block cache for better control over memory usage. For users of ZSTD v1.1.4+ who compile with -DZSTD_STATIC_LINKING_ONLY, this includes a digested dictionary, which is used to increase decompression speed.
* Add support for block checksums verification for external SST files before ingestion.
* Introduced stats history which periodically saves Statistics snapshots and added `GetStatsHistory` API to retrieve these snapshots.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduced -> Introduce

db/db_impl.cc Show resolved Hide resolved
db/db_impl.cc Show resolved Hide resolved
db/db_impl.cc Show resolved Hide resolved
db/db_impl.cc Outdated Show resolved Hide resolved
db/db_impl.h Outdated Show resolved Hide resolved
db/db_impl.h Outdated
uint64_t start_time, uint64_t end_time,
StatsHistoryIterator** stats_iterator) override;

// return list of currently supported stats counters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there supposed to be a function here, or should the comment be removed?

Copy link
Contributor Author

@miasantreble miasantreble Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will remove

monitoring/statistics.cc Show resolved Hide resolved
db/db_impl.cc Outdated Show resolved Hide resolved
db/db_impl.cc Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@miasantreble has updated the pull request. Re-import the pull request

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miasantreble has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@miasantreble
Copy link
Contributor Author

all review comments have been addressed, @siying @riversand963 PTAL?

db/db_impl.cc Show resolved Hide resolved
db/db_impl.cc Outdated
return new_iterator->status();
std::unique_ptr<StatsHistoryIterator>* stats_iterator) {
if (!stats_iterator) {
return Status::InvalidArgument();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass a string to InvalidArgument to provide more info.

db/db_options_test.cc Outdated Show resolved Hide resolved
monitoring/statistics.cc Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@miasantreble has updated the pull request. Re-import the pull request

db/db_impl.cc Outdated
bool DBImpl::FindStatsByTime(uint64_t start_time, uint64_t end_time,
uint64_t* new_time,
std::map<std::string, uint64_t>* stats_map) {
assert(new_time && stats_map);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually use one assertion for each condition. but this is minor here, and will leave this up to you.

db/db_impl.cc Outdated
stats_history_[now_micros] = stats_delta;
}
stats_slice_initialized_ = true;
stats_slice_ = stats_map;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use std::vector's swap?

db/db_impl.h Outdated
@@ -724,6 +728,17 @@ class DBImpl : public DB {
static Status CreateAndNewDirectory(Env* env, const std::string& dirname,
std::unique_ptr<Directory>* directory);

// Given a time window, return an iterator for accessing stats history
virtual Status GetStatsHistory(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: remove virtual. Use only override. Follow the updated, enforced code convention.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disregard this

// found in the LICENSE file. See the AUTHORS file for names of contributors.

#include "db/in_memory_stats_history.h"
#include "db/db_impl.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: sort the includes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interestingly this was suggested by make format but I will sort by lexicographic order

AdvanceIteratorByTime(start_time_, end_time_);
}
virtual ~InMemoryStatsHistoryIterator() override;
virtual bool Valid() const override;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nits: remove virtual and keep override to indicate that these functions do override virtual functions in their base class, i.e. StatsHistoryIterator with the same signature. https://stackoverflow.com/questions/39932391/virtual-override-or-both-c. This also helps avoid potential Phabricator warnings/errors.

Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Can you make sure you address the comments from @siying ?
Regarding the virtual vs. override thing, maybe you can first rebase your changes on top of the latest master, then see if this rule has been applied to neighboring functions. If so, then maybe a good idea to change in your PR as well. Otherwise it's OK to leave it there as is.

@siying
Copy link
Contributor

siying commented Feb 20, 2019

I remember some lint tool will fail if "virtual ... override" is used. I'm surprised that we didn't see any.

@facebook-github-bot
Copy link
Contributor

@miasantreble has updated the pull request. Re-import the pull request

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miasantreble has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@miasantreble miasantreble deleted the get-stats-history branch February 22, 2019 18:15
facebook-github-bot pushed a commit that referenced this pull request Jun 17, 2019
Summary:
This PR continues the work in #4748 and #4535 by adding a new DBOption `persist_stats_to_disk` which instructs RocksDB to persist stats history to RocksDB itself. When statistics is enabled, and  both options `stats_persist_period_sec` and `persist_stats_to_disk` are set, RocksDB will periodically write stats to a built-in column family in the following form: key -> (timestamp in microseconds)#(stats name), value -> stats value. The existing API `GetStatsHistory` will detect the current value of `persist_stats_to_disk` and either read from in-memory data structure or from the hidden column family on disk.
Pull Request resolved: #5046

Differential Revision: D15863138

Pulled By: miasantreble

fbshipit-source-id: bb82abdb3f2ca581aa42531734ac799f113e931b
vagogte pushed a commit to vagogte/rocksdb that referenced this pull request Jun 18, 2019
Summary:
This PR continues the work in facebook#4748 and facebook#4535 by adding a new DBOption `persist_stats_to_disk` which instructs RocksDB to persist stats history to RocksDB itself. When statistics is enabled, and  both options `stats_persist_period_sec` and `persist_stats_to_disk` are set, RocksDB will periodically write stats to a built-in column family in the following form: key -> (timestamp in microseconds)#(stats name), value -> stats value. The existing API `GetStatsHistory` will detect the current value of `persist_stats_to_disk` and either read from in-memory data structure or from the hidden column family on disk.
Pull Request resolved: facebook#5046

Differential Revision: D15863138

Pulled By: miasantreble

fbshipit-source-id: bb82abdb3f2ca581aa42531734ac799f113e931b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants