EP Stats

1 Getting Started

For introductory information on stats within membase, start with the membase wiki stats page.

2 Stats Definitions

2.1 Toplevel Stats

| Stat | Description |
|------+-------------|
| ep_version | Version number of ep_engine. |
| ep_storage_age | Seconds since most recently stored object was initially queued. |
| ep_storage_age_highwat | ep_storage_age high water mark. |
| ep_min_data_age | Minimum data age setting. |
| ep_queue_age_cap | Queue age cap setting. |
| ep_max_txn_size | Max number of updates per transaction. |
| ep_data_age | Seconds since most recently stored object was modified. |
| ep_data_age_highwat | ep_data_age high water mark. |
| ep_too_young | Number of times an object was not stored due to being too young. |
| ep_too_old | Number of times an object was stored after being dirty too long. |
| ep_total_enqueued | Total number of items queued for persistence. |
| ep_total_new_items | Total number of persisted new items. |
| ep_total_del_items | Total number of persisted deletions. |
| ep_total_persisted | Total number of items persisted. |
| ep_item_flush_failed | Number of times an item failed to flush due to storage errors. |
| ep_item_commit_failed | Number of times a transaction failed to commit due to storage errors. |
| ep_item_begin_failed | Number of times a transaction failed to start due to storage errors. |
| ep_expired | Number of times an item was expired. |
| ep_item_flush_expired | Number of times an item was not flushed because it had expired. |
| ep_queue_size | Number of items queued for storage. |
| ep_flusher_todo | Number of items remaining to be written. |
| ep_flusher_state | Current state of the flusher thread. |
| ep_commit_num | Total number of write commits. |
| ep_commit_time | Number of seconds of most recent commit. |
| ep_commit_time_total | Cumulative seconds spent committing. |
| ep_vbucket_del | Number of vbucket deletion events. |
| ep_vbucket_del_fail | Number of failed vbucket deletion events. |
| ep_vbucket_del_max_walltime | Max wall time (µs) spent deleting a vbucket. |
| ep_vbucket_del_total_walltime | Total wall time (µs) spent deleting vbuckets. |
| ep_vbucket_del_avg_walltime | Avg wall time (µs) spent deleting a vbucket. |
| ep_flush_preempts | Number of early flush exits for read requests. |
| ep_flush_duration | Number of seconds of most recent flush. |
| ep_flush_duration_total | Cumulative seconds spent flushing. |
| ep_flush_duration_highwat | ep_flush_duration high water mark. |
| curr_items | Number of items in active vbuckets. |
| curr_items_tot | Number of current items including those not active (replica, dead and pending states). |
| ep_kv_size | Memory used to store keys and values. |
| ep_overhead | Extra memory used by rep queues, etc. |
| ep_max_data_size | Max amount of data allowed in memory. |
| ep_mem_low_wat | Low water mark for auto-evictions. |
| ep_mem_high_wat | High water mark for auto-evictions. |
| ep_total_cache_size | The total size of all items in the cache. |
| ep_oom_errors | Number of times unrecoverable OOMs happened while processing operations. |
| ep_tmp_oom_errors | Number of times temporary OOMs happened while processing operations. |
| ep_bg_fetched | Number of items fetched from disk. |
| ep_tap_bg_fetched | Number of tap disk fetches. |
| ep_tap_bg_fetch_requeued | Number of times a tap bg fetch task is requeued. |
| ep_num_pager_runs | Number of times we ran pager loops to seek additional memory. |
| ep_num_expiry_pager_runs | Number of times we ran expiry pager loops to purge expired items from memory/disk. |
| ep_num_value_ejects | Number of times item values got ejected from memory to disk. |
| ep_num_eject_replicas | Number of times replica item values got ejected from memory to disk. |
| ep_num_eject_failures | Number of items that could not be ejected. |
| ep_num_not_my_vbuckets | Number of times a Not My VBucket exception happened during runtime. |
| ep_warmup_thread | Warmup thread status. |
| ep_warmed_up | Number of items warmed up. |
| ep_warmup_dups | Duplicates encountered during warmup. |
| ep_warmup_oom | OOMs encountered during warmup. |
| ep_warmup_time | Time (µs) spent warming data. |
| ep_tap_keepalive | Tap keepalive time. |
| ep_dbname | DB path. |
| ep_dbinit | Number of seconds to initialize DB. |
| ep_dbshards | Number of shards for db store. |
| ep_db_strategy | SQLite db strategy. |
| ep_warmup | true if warmup is enabled. |
| ep_io_num_read | Number of io read operations. |
| ep_io_num_write | Number of io write operations. |
| ep_io_read_bytes | Number of bytes read (key + values). |
| ep_io_write_bytes | Number of bytes written (key + values). |
| ep_pending_ops | Number of ops awaiting pending vbuckets. |
| ep_pending_ops_total | Total blocked pending ops since reset. |
| ep_pending_ops_max | Max ops seen awaiting 1 pending vbucket. |
| ep_pending_ops_max_duration | Max time (µs) used waiting on pending vbuckets. |
| ep_bg_num_samples | The number of samples included in the avg. |
| ep_bg_min_wait | The shortest time (µs) in the wait queue. |
| ep_bg_max_wait | The longest time (µs) in the wait queue. |
| ep_bg_wait_avg | The average wait time (µs) for an item before it is serviced by the dispatcher. |
| ep_bg_min_load | The shortest load time (µs). |
| ep_bg_max_load | The longest load time (µs). |
| ep_bg_load_avg | The average time (µs) for an item to be loaded from the persistence layer. |
| ep_num_non_resident | The number of non-resident items. |
| ep_num_active_non_resident | Number of non-resident items in active vbuckets. |
| ep_store_max_concurrency | Maximum allowed concurrency at the storage layer. |
| ep_store_max_readers | Maximum number of concurrent read-only storage threads. |
| ep_store_max_readwrite | Maximum number of concurrent read/write storage threads. |
| ep_db_cleaner_status | Status of database cleaner that cleans up invalid items with old vbucket versions. |

2.2 Tap stats

| ep_tap_total_queue | Sum of tap queue sizes on the current tap queues. |
| ep_tap_total_fetched | Sum of all tap messages sent. |
| ep_tap_bg_max_pending | The maximum number of bg jobs a tap connection may have. |
| ep_tap_bg_fetched | Number of tap disk fetches. |
| ep_tap_bg_fetch_requeued | Number of times a tap bg fetch task is requeued. |
| ep_tap_fg_fetched | Number of tap memory fetches. |
| ep_tap_deletes | Number of tap deletion messages sent. |
| ep_tap_throttled | Number of tap messages refused due to throttling. |
| ep_tap_keepalive | How long to keep tap connection state after client disconnect. |
| ep_tap_count | Number of tap connections. |
| ep_tap_bg_num_samples | The number of tap bg fetch samples included in the avg. |
| ep_tap_bg_min_wait | The shortest time (µs) for a tap item before it is serviced by the dispatcher. |
| ep_tap_bg_max_wait | The longest time (µs) for a tap item before it is serviced by the dispatcher. |
| ep_tap_bg_wait_avg | The average wait time (µs) for a tap item before it is serviced by the dispatcher. |
| ep_tap_bg_min_load | The shortest time (µs) for a tap item to be loaded from the persistence layer. |
| ep_tap_bg_max_load | The longest time (µs) for a tap item to be loaded from the persistence layer. |
| ep_tap_bg_load_avg | The average time (µs) for a tap item to be loaded from the persistence layer. |
| ep_tap_noop_interval | The number of seconds between noops added to an idle connection. |
| ep_tap_backoff_period | The number of seconds the tap connection should back off after receiving ETMPFAIL. |

2.2.1 Per Tap Client Stats

Each stat begins with ep_tapq: followed by a unique client_id and another colon. For example, if your client is named slave1, the qlen stat would be ep_tapq:slave1:qlen.

| qlen | Queue size for the given client_id. |
| qlen_high_pri | High priority tap queue items. |
| qlen_low_pri | Low priority tap queue items. |
| vb_filters | Size of connection vbucket filter set. |
| vb_filter | The content of the vbucket filter. |
| rec_fetched | Tap messages sent to the client. |
| rec_skipped | Number of messages skipped due to tap reconnect with a different filter. |
| idle | True if this connection is idle. |
| empty | True if this connection has no items. |
| complete | True if backfill is complete. |
| has_item | True when there is a bg fetched item ready. |
| has_queued_item | True when there is a key ready to be looked up (may become fg or bg item). |
| bg_wait_for_result | True if the max number of background operations is started. |
| bg_queue_size | Number of bg fetches enqueued for this connection. |
| bg_queued | Number of background fetches enqueued. |
| bg_result_size | Number of ready background results. |
| bg_results | Number of background results ready. |
| bg_jobs_issued | Number of background jobs started. |
| bg_jobs_completed | Number of background jobs completed. |
| bg_backlog_size | Number of items pending bg fetch. |
| flags | Connection flags set by the client. |
| connected | true if this client is connected. |
| pending_disconnect | true if we’re hanging up on this client. |
| paused | true if this client is blocked. |
| pending_backfill | true if we’re still backfilling keys for this connection. |
| pending_disk_backfill | true if we’re still backfilling keys from disk for this connection. |
| reconnects | Number of reconnects from this client. |
| disconnects | Number of disconnects from this client. |
| backfill_age | The age of the start of the backfill. |
| ack_seqno | The current tap ACK sequence number. |
| recv_ack_seqno | Last received tap ACK sequence number. |
| ack_log_size | Tap ACK backlog size. |
| ack_window_full | true if our tap ACK window is full. |
| expires | When this ACK backlog expires. |
| num_tap_nack | The number of negative tap acks received. |
| num_tap_tmpfail_survivors | The number of items rescheduled due to a temporary nack. |
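
The ep_tapq:client_id:stat naming scheme described above can be handled with a pair of small helpers. This is only a sketch: the function names are hypothetical, and it assumes that the stat name itself never contains a colon (so a client_id containing colons would still split correctly on the last one).

```python
TAPQ_PREFIX = "ep_tapq:"


def tap_stat_key(client_id, stat):
    """Build the full stat key for one per-client tap stat."""
    return f"{TAPQ_PREFIX}{client_id}:{stat}"


def parse_tap_stat_key(key):
    """Split a key into (client_id, stat).

    Returns None when the key is not a per-client tap stat.
    Assumption: the stat name is everything after the LAST colon.
    """
    if not key.startswith(TAPQ_PREFIX):
        return None
    rest = key[len(TAPQ_PREFIX):]
    client_id, sep, stat = rest.rpartition(":")
    if not sep or not client_id or not stat:
        return None
    return client_id, stat


if __name__ == "__main__":
    print(tap_stat_key("slave1", "qlen"))
    print(parse_tap_stat_key("ep_tapq:slave1:qlen"))
```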

2.3 Timing Stats

Timing stats provide histogram data from high resolution timers over various operations within the system.

2.3.1 General Form

As this data is multi-dimensional, some parsing may be required for machine processing. It’s somewhat human readable, but the stats script mentioned in the Getting Started section above will do fancier formatting for you.

Consider the following sample stats:

STAT disk_insert_8,16 9488
STAT disk_insert_16,32 290
STAT disk_insert_32,64 73
STAT disk_insert_64,128 86
STAT disk_insert_128,256 48
STAT disk_insert_256,512 2
STAT disk_insert_512,1024 12
STAT disk_insert_1024,2048 1

This tells you that disk_insert took 8-16µs 9,488 times, 16-32µs 290 times, and so on.

The same stats displayed through the stats CLI tool would look like this:

disk_insert (10008 total)
   8us - 16us    : ( 94.80%) 9488 ###########################################
   16us - 32us   : ( 97.70%)  290 #
   32us - 64us   : ( 98.43%)   73
   64us - 128us  : ( 99.29%)   86
   128us - 256us : ( 99.77%)   48
   256us - 512us : ( 99.79%)    2
   512us - 1ms   : ( 99.91%)   12
   1ms - 2ms     : ( 99.92%)    1
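
For machine processing, the raw STAT lines can be parsed into buckets with a few lines of Python. The sketch below is not the stats CLI tool itself; the function names are hypothetical. It assumes each line has the form STAT name_low,high count with bounds in µs, and it computes a running (cumulative) percentage, which is what the percentage column in the CLI output above appears to show. Note that the eight sample lines sum to 10,000 while the CLI output reports 10,008, so the sample presumably omits a few buckets and the percentages computed here differ slightly.

```python
import re

SAMPLE = """\
STAT disk_insert_8,16 9488
STAT disk_insert_16,32 290
STAT disk_insert_32,64 73
STAT disk_insert_64,128 86
STAT disk_insert_128,256 48
STAT disk_insert_256,512 2
STAT disk_insert_512,1024 12
STAT disk_insert_1024,2048 1
"""

# Assumed line shape: "STAT <name>_<low>,<high> <count>"
_LINE = re.compile(r"^STAT (\w+)_(\d+),(\d+) (\d+)$")


def parse_timings(text):
    """Return {histogram name: sorted list of (low_us, high_us, count)}."""
    histos = {}
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if m:
            name = m.group(1)
            bucket = (int(m.group(2)), int(m.group(3)), int(m.group(4)))
            histos.setdefault(name, []).append(bucket)
    for buckets in histos.values():
        buckets.sort()
    return histos


def cumulative(buckets):
    """Return (total, rows) where each row adds a running percentage."""
    total = sum(count for _, _, count in buckets)
    rows, running = [], 0
    for low, high, count in buckets:
        running += count
        pct = 100.0 * running / total if total else 0.0
        rows.append((low, high, count, pct))
    return total, rows


if __name__ == "__main__":
    for name, buckets in parse_timings(SAMPLE).items():
        total, rows = cumulative(buckets)
        print(f"{name} ({total} total)")
        for low, high, count, pct in rows:
            print(f"   {low}us - {high}us : ({pct:6.2f}%) {count}")
```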

2.3.2 Available Stats

The following histograms are available from “timings” in the above form to describe when time was spent doing various things:

| bg_wait | bg fetches waiting in the dispatcher queue |
| bg_load | bg fetches waiting for disk |
| bg_tap_wait | tap bg fetches waiting in the dispatcher queue |
| bg_tap_load | tap bg fetches waiting for disk |
| pending_ops | client connections blocked for operations in pending vbuckets |
| storage_age | analogous to ep_storage_age in main stats |
| data_age | analogous to ep_data_age in main stats |
| get_cmd | servicing get requests |
| store_cmd | servicing store requests |
| arith_cmd | servicing incr/decr requests |
| get_vb_cmd | servicing vbucket status requests |
| set_vb_cmd | servicing vbucket set state commands |
| del_vb_cmd | servicing vbucket deletion commands |
| tap_vb_set | servicing tap vbucket set state commands |
| tap_mutation | servicing tap mutations |
| notify_io | waking blocked connections |
| disk_insert | waiting for disk to store a new item |
| disk_update | waiting for disk to modify an existing item |
| disk_del | waiting for disk to delete an item |
| disk_vb_del | waiting for disk to delete a vbucket |
| disk_vb_chunk_del | waiting for disk to delete a vbucket chunk |
| disk_commit | waiting for a commit after a batch of updates |
| disk_invalid_item_del | waiting for disk to delete a chunk of invalid items with the old vbucket version |

2.4 Hash Stats

Hash stats provide information on your per-vbucket hash tables.

Requesting these stats does affect performance, so don’t do it too regularly, but it’s useful for debugging certain types of performance issues. For example, if your hash table is tuned to have too few buckets for the data load within it, the max_depth will be too large and performance will suffer.

Each stat is prefixed with vb_ followed by a number, a colon, then the individual stat name.

For example, the stat representing the size of the hash table for vbucket 0 is vb_0:size.

| state | The current state of this vbucket |
| size | Number of hash buckets |
| locks | Number of locks covering hash table operations |
| min_depth | Minimum number of items found in a bucket |
| max_depth | Maximum number of items found in a bucket |
| reported | Number of items this hash table reports having |
| counted | Number of items found while walking the table |
| resized | Number of times the hash table resized. |
| mem_size | Running sum of memory used by each item. |
| mem_size_counted | Counted sum of current memory used by each item. |
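
Because every hash stat carries the vb_N: prefix, a flat stats dictionary can be regrouped per vbucket before inspecting max_depth. The following is a sketch under stated assumptions: the stats are already available as a dict of strings, the helper names are hypothetical, and the depth_limit threshold is an arbitrary illustration rather than a recommended value.

```python
def group_hash_stats(stats):
    """Regroup {"vb_0:size": "3079", ...} into {0: {"size": "3079", ...}}."""
    per_vb = {}
    for key, value in stats.items():
        prefix, sep, name = key.partition(":")
        if not sep or not prefix.startswith("vb_"):
            continue  # not a per-vbucket hash stat
        vb_id = prefix[len("vb_"):]
        if not vb_id.isdigit():
            continue
        per_vb.setdefault(int(vb_id), {})[name] = value
    return per_vb


def deep_vbuckets(per_vb, depth_limit):
    """Return the sorted ids of vbuckets whose max_depth exceeds depth_limit."""
    return sorted(
        vb for vb, s in per_vb.items()
        if int(s.get("max_depth", 0)) > depth_limit
    )


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    sample = {
        "vb_0:size": "3079", "vb_0:max_depth": "4",
        "vb_1:size": "3079", "vb_1:max_depth": "250",
    }
    grouped = group_hash_stats(sample)
    print(deep_vbuckets(grouped, depth_limit=100))
```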

3 Details

3.1 Ages

The difference between ep_storage_age and ep_data_age is somewhat subtle, but when you consider that a given record may be updated multiple times before hitting persistence, it starts to be clearer.

ep_data_age is how old the data we actually wrote is.

ep_storage_age is how long the object has been waiting to be persisted.
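
The distinction can be made concrete with two timestamps per object. This is an illustrative model only, not the engine's data structure: the DirtyItem class and its field names are hypothetical, and times are plain seconds.

```python
from dataclasses import dataclass


@dataclass
class DirtyItem:
    queued_at: float    # when the object was first queued for persistence
    modified_at: float  # when the object was last modified


def storage_age(item, now):
    """How long the object has been waiting to be persisted."""
    return now - item.queued_at


def data_age(item, now):
    """How old the data that would actually be written is."""
    return now - item.modified_at


if __name__ == "__main__":
    # Queued 300 seconds ago, then updated again 15 seconds ago.
    item = DirtyItem(queued_at=700.0, modified_at=985.0)
    now = 1000.0
    print(storage_age(item, now), data_age(item, now))
```

An object that is updated repeatedly while it waits keeps its original queue time, so its storage age keeps growing while its data age resets on every update.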

3.2 Too Young

ep_too_young is incremented every time an object is encountered whose data age is more recent than is allowable for the persistence layer.

For example, if an object that was queued five minutes ago is picked off the todo queue and found to have been updated fifteen seconds ago, it will not be stored, ep_too_young will be incremented, and the key will go back on the input queue.

3.3 Too Old

ep_too_old is incremented every time an object is encountered whose queue age exceeds the ep_queue_age_cap setting.

ep_queue_age_cap generally exists as a safety net to keep the ep_min_data_age setting from blocking persistence altogether.
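
Taken together, the too-young and too-old rules amount to a small decision made for each object taken off the queue. The sketch below is one reading of the descriptions above, not the engine's actual code: the function name is hypothetical, and the assumption that the queue age cap is checked first (so that it can override the minimum data age) follows from its role as a safety net.

```python
def flush_decision(queue_age, data_age, min_data_age, queue_age_cap, counters):
    """Decide whether a dirty object is stored now or requeued.

    All ages are in seconds. counters is a dict holding the
    ep_too_old and ep_too_young counts.
    """
    if queue_age > queue_age_cap:
        # Assumed precedence: the cap wins, so the object is stored
        # even if its data is still young.
        counters["ep_too_old"] += 1
        return "store"
    if data_age < min_data_age:
        counters["ep_too_young"] += 1
        return "requeue"
    return "store"


if __name__ == "__main__":
    counters = {"ep_too_old": 0, "ep_too_young": 0}
    # Hypothetical settings: min data age 60s, queue age cap 900s.
    print(flush_decision(300, 15, 60, 900, counters))   # young data, under the cap
    print(flush_decision(1000, 15, 60, 900, counters))  # young data, over the cap
    print(counters)
```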

3.4 Warming Up

Opening the data store is broken into three distinct phases:

3.4.1 Initializing

During the initialization phase, the server is not accepting connections or otherwise functional. This phase is often quick, but after a server crash it can take some time to recover the underlying storage.

This time is made available via the ep_dbinit stat.

3.4.2 Warming Up

After initialization, warmup begins. At this point, the server is capable of taking new writes and responding to reads. However, only records that have been pulled out of the storage or have been updated from other clients will be available for request.

(Note that records read from persistence will not overwrite newer records captured from the network.)

During this phase, ep_warmup_thread will report running and ep_warmed_up will be increasing as records are being read.

3.4.3 Complete

Once complete, ep_warmed_up will stop increasing and ep_warmup_thread will report complete.
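
A monitoring script could map these stats onto the three phases roughly as follows. This is a sketch with assumptions: the function name is hypothetical, stats is a dict of stat name to string value fetched by some other means, and None stands for "could not connect", which is how the initialization phase would look from the outside since the server is not yet accepting connections.

```python
def warmup_phase(stats):
    """Classify the server's startup phase from a stats dict (or None)."""
    if stats is None:
        # No connection possible yet: assumed to mean initialization.
        return "initializing"
    state = stats.get("ep_warmup_thread")
    if state == "running":
        return "warming up"
    if state == "complete":
        return "complete"
    return "unknown"


if __name__ == "__main__":
    print(warmup_phase(None))
    print(warmup_phase({"ep_warmup_thread": "running", "ep_warmed_up": "1200"}))
    print(warmup_phase({"ep_warmup_thread": "complete", "ep_warmed_up": "5000"}))
```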