EP Stats

1 Getting Started

For introductory information on stats within membase, start with the membase wiki stats page.

2 Stats Definitions

2.1 Toplevel Stats

| Stat | Description |
|------+-------------|
| ep_version | Version number of ep_engine. |
| ep_storage_age | Seconds since most recently stored object was initially queued. |
| ep_storage_age_highwat | ep_storage_age high water mark |
| ep_min_data_age | Minimum data age setting. |
| ep_queue_age_cap | Queue age cap setting. |
| ep_max_txn_size | Max number of updates per transaction. |
| ep_data_age | Seconds since most recently stored object was modified. |
| ep_data_age_highwat | ep_data_age high water mark |
| ep_too_young | Number of times an object was not stored due to being too young. |
| ep_too_old | Number of times an object was stored after being dirty too long. |
| ep_total_enqueued | Total number of items queued for persistence. |
| ep_total_new_items | Total number of persisted new items. |
| ep_total_del_items | Total number of persisted deletions. |
| ep_total_persisted | Total number of items persisted. |
| ep_item_flush_failed | Number of times an item failed to flush due to storage errors. |
| ep_item_commit_failed | Number of times a transaction failed to commit due to storage errors. |
| ep_item_begin_failed | Number of times a transaction failed to start due to storage errors. |
| ep_expired | Number of times an item was expired. |
| ep_item_flush_expired | Number of times an item was not flushed due to the expiry of the item |
| ep_queue_size | Number of items queued for storage. |
| ep_flusher_todo | Number of items remaining to be written. |
| ep_flusher_state | Current state of the flusher thread. |
| ep_commit_num | Total number of write commits. |
| ep_commit_time | Number of seconds of most recent commit. |
| ep_commit_time_total | Cumulative seconds spent committing. |
| ep_vbucket_del | Number of vbucket deletion events. |
| ep_vbucket_del_fail | Number of failed vbucket deletion events. |
| ep_vbucket_del_max_walltime | Max wall time (µs) spent by deleting a vbucket |
| ep_vbucket_del_total_walltime | Total wall time (µs) spent by deleting vbuckets |
| ep_vbucket_del_avg_walltime | Avg wall time (µs) spent by deleting a vbucket |
| ep_flush_preempts | Num of flush early exits for read reqs. |
| ep_flush_duration | Number of seconds of most recent flush. |
| ep_flush_duration_total | Cumulative seconds spent flushing. |
| ep_flush_duration_highwat | ep_flush_duration high water mark. |
| ep_flush_all | True if disk flush_all is scheduled |
| curr_items | Num items in active vbuckets. |
| curr_items_tot | Num current items including those not active (replica, dead and pending states) |
| ep_kv_size | Memory used to store keys and values. |
| ep_overhead | Extra memory used by rep queues, etc. |
| ep_max_data_size | Max amount of data allowed in memory. |
| ep_mem_low_wat | Low water mark for auto-evictions. |
| ep_mem_high_wat | High water mark for auto-evictions. |
| ep_total_cache_size | The total size of all items in the cache |
| ep_oom_errors | Number of times unrecoverable OOMs happened while processing operations |
| ep_tmp_oom_errors | Number of times temporary OOMs happened while processing operations |
| ep_bg_fetched | Number of items fetched from disk. |
| ep_tap_bg_fetched | Number of tap disk fetches |
| ep_tap_bg_fetch_requeued | Number of times a tap bg fetch task is requeued. |
| ep_num_pager_runs | Number of times we ran pager loops to seek additional memory. |
| ep_num_expiry_pager_runs | Number of times we ran expiry pager loops to purge expired items from memory/disk |
| ep_num_checkpoint_remover_runs | Number of times we ran checkpoint remover to remove closed unreferenced checkpoints. |
| ep_items_rm_from_checkpoints | Number of items removed from closed unreferenced checkpoints. |
| ep_num_value_ejects | Number of times item values got ejected from memory to disk |
| ep_num_eject_replicas | Number of times replica item values got ejected from memory to disk |
| ep_num_eject_failures | Number of items that could not be ejected |
| ep_num_not_my_vbuckets | Number of times Not My VBucket exception happened during runtime |
| ep_warmup_thread | Warmup thread status. |
| ep_warmed_up | Number of items warmed up. |
| ep_warmup_dups | Duplicates encountered during warmup. |
| ep_warmup_oom | OOMs encountered during warmup. |
| ep_warmup_time | Time (µs) spent by warming data. |
| ep_tap_keepalive | Tap keepalive time. |
| ep_dbname | DB path. |
| ep_dbinit | Number of seconds to initialize DB. |
| ep_dbshards | Number of shards for db store |
| ep_db_strategy | SQLite db strategy |
| ep_warmup | true if warmup is enabled. |
| ep_io_num_read | Number of io read operations |
| ep_io_num_write | Number of io write operations |
| ep_io_read_bytes | Number of bytes read (key + values) |
| ep_io_write_bytes | Number of bytes written (key + values) |
| ep_pending_ops | Number of ops awaiting pending vbuckets |
| ep_pending_ops_total | Total blocked pending ops since reset |
| ep_pending_ops_max | Max ops seen awaiting 1 pending vbucket |
| ep_pending_ops_max_duration | Max time (µs) used waiting on pending vbuckets |
| ep_bg_num_samples | The number of samples included in the avg |
| ep_bg_min_wait | The shortest time (µs) in the wait queue |
| ep_bg_max_wait | The longest time (µs) in the wait queue |
| ep_bg_wait_avg | The average wait time (µs) for an item before it is serviced by the dispatcher |
| ep_bg_min_load | The shortest load time (µs) |
| ep_bg_max_load | The longest load time (µs) |
| ep_bg_load_avg | The average time (µs) for an item to be loaded from the persistence layer |
| ep_num_non_resident | The number of non-resident items |
| ep_num_active_non_resident | Number of non-resident items in active vbuckets. |
| ep_store_max_concurrency | Maximum allowed concurrency at the storage layer. |
| ep_store_max_readers | Maximum number of concurrent read-only storage threads. |
| ep_store_max_readwrite | Maximum number of concurrent read/write storage threads. |
| ep_db_cleaner_status | Status of database cleaner that cleans up invalid items with old vbucket versions |
| ep_bg_wait | The total elapsed time for the wait queue |
| ep_bg_load | The total elapsed time for items to be loaded from the persistence layer |
| ep_latency_get_cmd | The total elapsed time for get command |
| ep_latency_store_cmd | The total elapsed time for store command |
| ep_latency_arith_cmd | The total elapsed time for arith command |
| ep_onlineupdate | True if engine is in online updated mode |
| ep_onlineupdate_revert_add | Number of reverted newly added items |
| ep_onlineupdate_revert_delete | Number of reverted deleted items |
| ep_onlineupdate_revert_update | Number of reverted updated items |

2.2 vBucket total stats

| Stat | Description |
|------+-------------|
| ep_vb_total | Total vBuckets (count) |
| curr_items_tot | Total number of items |
| curr_items | Number of active items in memory |
| vb_dead_num | Number of dead vBuckets |
| ep_diskqueue_items | Total items in disk queue |
| ep_diskqueue_memory | Total memory used in disk queue |
| ep_diskqueue_fill | Total enqueued items on disk queue |
| ep_diskqueue_drain | Total drained items on disk queue |
| ep_diskqueue_pendingWrites | Total bytes of pending writes |

2.2.1 Active vBucket class stats

| Stat | Description |
|------+-------------|
| vb_active_num | Number of active vBuckets |
| vb_active_curr_items | Number of in memory items |
| vb_active_num_non_resident | Number of non-resident items |
| vb_active_perc_mem_resident | % memory resident |
| vb_active_eject | Number of times item values got ejected |
| vb_active_ht_memory | Memory used to store keys and values |
| vb_active_itm_memory | Total item memory |
| vb_active_ops_create | Number of create operations |
| vb_active_ops_update | Number of update operations |
| vb_active_ops_delete | Number of delete operations |
| vb_active_ops_reject | Number of rejected operations |
| vb_active_queue_size | Active items in disk queue |
| vb_active_queue_memory | Memory used for disk queue |
| vb_active_queue_age | Sum of disk queue item age in milliseconds |
| vb_active_queue_pending | Total bytes of pending writes |
| vb_active_queue_fill | Total enqueued items |
| vb_active_queue_drain | Total drained items |

2.2.2 Replica vBucket stats

| Stat | Description |
|------+-------------|
| vb_replica_num | Number of replica vBuckets |
| vb_replica_curr_items | Number of in memory items |
| vb_replica_num_non_resident | Number of non-resident items |
| vb_replica_perc_mem_resident | % memory resident |
| vb_replica_eject | Number of times item values got ejected |
| vb_replica_ht_memory | Memory used to store keys and values |
| vb_replica_itm_memory | Total item memory |
| vb_replica_ops_create | Number of create operations |
| vb_replica_ops_update | Number of update operations |
| vb_replica_ops_delete | Number of delete operations |
| vb_replica_ops_reject | Number of rejected operations |
| vb_replica_queue_size | Replica items in disk queue |
| vb_replica_queue_memory | Memory used for disk queue |
| vb_replica_queue_age | Sum of disk queue item age in milliseconds |
| vb_replica_queue_pending | Total bytes of pending writes |
| vb_replica_queue_fill | Total enqueued items |
| vb_replica_queue_drain | Total drained items |

2.2.3 Pending vBucket stats

| Stat | Description |
|------+-------------|
| vb_pending_num | Number of pending vBuckets |
| vb_pending_curr_items | Number of in memory items |
| vb_pending_num_non_resident | Number of non-resident items |
| vb_pending_perc_mem_resident | % memory resident |
| vb_pending_eject | Number of times item values got ejected |
| vb_pending_ht_memory | Memory used to store keys and values |
| vb_pending_itm_memory | Total item memory |
| vb_pending_ops_create | Number of create operations |
| vb_pending_ops_update | Number of update operations |
| vb_pending_ops_delete | Number of delete operations |
| vb_pending_ops_reject | Number of rejected operations |
| vb_pending_queue_size | Pending items in disk queue |
| vb_pending_queue_memory | Memory used for disk queue |
| vb_pending_queue_age | Sum of disk queue item age in milliseconds |
| vb_pending_queue_pending | Total bytes of pending writes |
| vb_pending_queue_fill | Total enqueued items |
| vb_pending_queue_drain | Total drained items |

2.3 Tap stats

| ep_tap_total_queue | Sum of tap queue sizes on the current tap queues |
| ep_tap_total_fetched | Sum of all tap messages sent |
| ep_tap_bg_max_pending | The maximum number of bg jobs a tap connection may have |
| ep_tap_bg_fetched | Number of tap disk fetches |
| ep_tap_bg_fetch_requeued | Number of times a tap bg fetch task is requeued. |
| ep_tap_fg_fetched | Number of tap memory fetches |
| ep_tap_deletes | Number of tap deletion messages sent |
| ep_tap_throttled | Number of tap messages refused due to throttling. |
| ep_tap_keepalive | How long to keep tap connection state after client disconnect. |
| ep_tap_count | Number of tap connections. |
| ep_tap_bg_num_samples | The number of tap bg fetch samples included in the avg |
| ep_tap_bg_min_wait | The shortest time (µs) for a tap item before it is serviced by the dispatcher |
| ep_tap_bg_max_wait | The longest time (µs) for a tap item before it is serviced by the dispatcher |
| ep_tap_bg_wait_avg | The average wait time (µs) for a tap item before it is serviced by the dispatcher |
| ep_tap_bg_min_load | The shortest time (µs) for a tap item to be loaded from the persistence layer |
| ep_tap_bg_max_load | The longest time (µs) for a tap item to be loaded from the persistence layer |
| ep_tap_bg_load_avg | The average time (µs) for a tap item to be loaded from the persistence layer |
| ep_tap_noop_interval | The number of secs between a noop is added to an idle connection |
| ep_tap_backoff_period | The number of seconds the tap connection should back off after receiving ETMPFAIL |
| ep_tap_queue_fill | Total enqueued items |
| ep_tap_queue_drain | Total drained items |
| ep_tap_queue_backoff | Total back-off items |
| ep_tap_queue_backfill | Number of backfill remaining |
| ep_tap_queue_itemondisk | Number of items remaining on disk |

2.3.1 Per Tap Client Stats

Each stat begins with ep_tapq: followed by a unique client_id and another colon. For example, if your client is named slave1, the qlen stat would be ep_tapq:slave1:qlen.
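The naming rule above can be sketched as a small parser. This is only an illustration: the function name parse_tap_stat is hypothetical, and it assumes that stat names themselves never contain a colon, so the last colon separates the client_id from the stat.

```python
def parse_tap_stat(key):
    """Split 'ep_tapq:<client_id>:<stat>' into (client_id, stat).

    Returns None for keys that are not per-client tap stats.
    Assumption: the stat name contains no colon, so we split on the last one.
    """
    prefix = "ep_tapq:"
    if not key.startswith(prefix):
        return None
    client_id, sep, stat = key[len(prefix):].rpartition(":")
    if not sep or not client_id or not stat:
        return None
    return client_id, stat
```

For the example in the text, parse_tap_stat("ep_tapq:slave1:qlen") yields the client slave1 and the stat qlen.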

| type | The kind of tap connection (producer or consumer) | PC |
| created | Creation time for the tap connection | PC |
| supports_ack | true if the connection uses acks | PC |
| connected | true if this client is connected | PC |
| disconnects | Number of disconnects from this client. | PC |
| qlen | Queue size for the given client_id. | P |
| qlen_high_pri | High priority tap queue items. | P |
| qlen_low_pri | Low priority tap queue items. | P |
| vb_filters | Size of connection vbucket filter set. | P |
| vb_filter | The content of the vbucket filter | P |
| rec_fetched | Tap messages sent to the client. | P |
| rec_skipped | Number of messages skipped due to tap reconnect with a different filter | P |
| idle | True if this connection is idle. | P |
| empty | True if this connection has no items. | P |
| complete | True if backfill is complete. | P |
| has_item | True when there is a bg fetched item ready. | P |
| has_queued_item | True when there is a key ready to be looked up (may become fg or bg item) | P |
| bg_wait_for_result | True if the max number of background operations is started | P |
| bg_queue_size | Number of bg fetches enqueued for this connection. | P |
| bg_queued | Number of background fetches enqueued. | P |
| bg_result_size | Number of ready background results. | P |
| bg_results | Number of background results ready. | P |
| bg_jobs_issued | Number of background jobs started. | P |
| bg_jobs_completed | Number of background jobs completed. | P |
| bg_backlog_size | Number of items pending bg fetch. | P |
| flags | Connection flags set by the client. | P |
| pending_disconnect | true if we’re hanging up on this client | P |
| paused | true if this client is blocked | P |
| pending_backfill | true if we’re still backfilling keys for this connection | P |
| pending_disk_backfill | true if we’re still backfilling keys from disk for this connection | P |
| backfill_completed | true if all items from backfill are successfully transmitted to the client | P |
| reconnects | Number of reconnects from this client. | P |
| backfill_age | The age of the start of the backfill. | P |
| ack_seqno | The current tap ACK sequence number. | P |
| recv_ack_seqno | Last received tap ACK sequence number. | P |
| ack_log_size | Tap ACK backlog size. | P |
| ack_window_full | true if our tap ACK window is full. | P |
| expires | When this ACK backlog expires. | P |
| num_tap_nack | The number of negative tap acks received | P |
| num_tap_tmpfail_survivors | The number of items rescheduled due to a temporary nack. | P |
| queue_memory | Memory used for tap queue | P |
| queue_fill | Total queued items | P |
| queue_drain | Total drained items | P |
| queue_backoff | Total back-off items | P |
| queue_backfillremaining | Number of backfill remaining | P |
| queue_itemondisk | Number of items remaining on disk | P |
| total_backlog_size | Num of remaining items for replication | P |
| num_delete | Number of delete operations consumed | C |
| num_delete_failed | Number of failed delete operations | C |
| num_flush | Number of flush operations | C |
| num_flush_failed | Number of failed flush operations | C |
| num_mutation | Number of mutation operations | C |
| num_mutation_failed | Number of failed mutation operations | C |
| num_opaque | Number of opaque operations consumed | C |
| num_opaque_failed | Number of failed opaque operations | C |
| num_vbucket_set | Number of vbucket set operations | C |
| num_vbucket_set_failed | Number of failed vbucket set operations | C |
| num_unknown | Number of unknown operations | C |

2.4 Tap Aggregated Stats

Aggregated tap stats allow named tap connections to be logically grouped and aggregated together by prefixes.

For example, if all of your tap connections started with rebalance_ or replication_, you could call stats tapagg _ to request stats grouped by everything before the first _ character, giving you a set for rebalance and a set for replication.

2.4.1 Results

| [prefix]:count | Number of connections matching this prefix |
| [prefix]:qlen | Total length of queues with this prefix |
| [prefix]:backfill_remaining | Number of items needing to be backfilled |
| [prefix]:backoff | Total number of backoff events |
| [prefix]:drain | Total number of items drained |
| [prefix]:fill | Total number of items filled |
| [prefix]:itemondisk | Number of items remaining on disk |
| [prefix]:total_backlog_size | Num of remaining items for replication |
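
The grouping rule can be illustrated with a short client-side sketch. This is not the server's implementation; the function name group_by_prefix and the input shape (a mapping of connection name to queue length) are assumptions made for the example, and only the count and qlen results are modelled.

```python
from collections import defaultdict


def group_by_prefix(qlens, sep="_"):
    """Aggregate per-connection queue lengths by the text before the first `sep`.

    `qlens` maps a tap connection name to its queue length (hypothetical input).
    """
    totals = defaultdict(lambda: {"count": 0, "qlen": 0})
    for name, qlen in qlens.items():
        prefix = name.split(sep, 1)[0]  # everything before the first separator
        totals[prefix]["count"] += 1
        totals[prefix]["qlen"] += qlen
    return dict(totals)
```

With connections named rebalance_1, rebalance_2 and replication_a, this produces one group for rebalance and one for replication, mirroring the [prefix]:count and [prefix]:qlen results.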

2.5 Timing Stats

Timing stats provide histogram data from high resolution timers over various operations within the system.

2.5.1 General Form

As this data is multi-dimensional, some parsing may be required for machine processing. It’s somewhat human readable, but the stats script mentioned in the Getting Started section above will do fancier formatting for you.

Consider the following sample stats:

STAT disk_insert_8,16 9488
STAT disk_insert_16,32 290
STAT disk_insert_32,64 73
STAT disk_insert_64,128 86
STAT disk_insert_128,256 48
STAT disk_insert_256,512 2
STAT disk_insert_512,1024 12
STAT disk_insert_1024,2048 1

This tells you that disk_insert took 8-16µs 9,488 times, 16-32µs 290 times, and so on.

The same stats displayed through the stats CLI tool would look like this:

disk_insert (10008 total)
   8us - 16us    : ( 94.80%) 9488 ###########################################
   16us - 32us   : ( 97.70%)  290 #
   32us - 64us   : ( 98.43%)   73
   64us - 128us  : ( 99.29%)   86
   128us - 256us : ( 99.77%)   48
   256us - 512us : ( 99.79%)    2
   512us - 1ms   : ( 99.91%)   12
   1ms - 2ms     : ( 99.92%)    1
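
For machine processing, the raw form can be parsed along these lines. This is a minimal sketch, assuming every histogram line has the shape `STAT <name>_<low>,<high> <count>` with bounds in µs as in the sample above; the function names are hypothetical and this is not the stats CLI tool's own code.

```python
import re

# Assumed line shape: "STAT <name>_<low>,<high> <count>"
STAT_RE = re.compile(r"^STAT (\w+?)_(\d+),(\d+) (\d+)$")

SAMPLE = [
    "STAT disk_insert_8,16 9488",
    "STAT disk_insert_16,32 290",
    "STAT disk_insert_32,64 73",
    "STAT disk_insert_64,128 86",
    "STAT disk_insert_128,256 48",
    "STAT disk_insert_256,512 2",
    "STAT disk_insert_512,1024 12",
    "STAT disk_insert_1024,2048 1",
]


def parse_timings(lines):
    """Return {name: [(low_us, high_us, count), ...]} from raw STAT lines."""
    histograms = {}
    for line in lines:
        m = STAT_RE.match(line.strip())
        if m is None:
            continue  # not a histogram bucket line
        name = m.group(1)
        bucket = (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        histograms.setdefault(name, []).append(bucket)
    return histograms


def cumulative_percentages(buckets):
    """Return [(low_us, high_us, count, cumulative_percent), ...] sorted by low bound."""
    total = sum(count for _, _, count in buckets)
    rows, running = [], 0
    for low, high, count in sorted(buckets):
        running += count
        pct = 100.0 * running / total if total else 0.0
        rows.append((low, high, count, pct))
    return rows
```

Note that the eight sample lines sum to 10,000 operations, whereas the CLI output above reports 10008 in total, so percentages computed from SAMPLE alone differ slightly from the ones the CLI shows.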

2.5.2 Available Stats

The following histograms are available from “timings” in the above form to describe when time was spent doing various things:

| bg_wait | bg fetches waiting in the dispatcher queue |
| bg_load | bg fetches waiting for disk |
| bg_tap_wait | tap bg fetches waiting in the dispatcher queue |
| bg_tap_load | tap bg fetches waiting for disk |
| pending_ops | client connections blocked for operations in pending vbuckets. |
| storage_age | Analogous to ep_storage_age in main stats. |
| data_age | Analogous to ep_data_age in main stats. |
| get_cmd | servicing get requests |
| store_cmd | servicing store requests |
| arith_cmd | servicing incr/decr requests |
| get_vb_cmd | servicing vbucket status requests |
| set_vb_cmd | servicing vbucket set state commands |
| del_vb_cmd | servicing vbucket deletion commands |
| tap_vb_set | servicing tap vbucket set state commands |
| tap_vb_reset | servicing tap vbucket reset commands |
| tap_mutation | servicing tap mutations |
| notify_io | waking blocked connections |
| disk_insert | waiting for disk to store a new item |
| disk_update | waiting for disk to modify an existing item |
| disk_del | waiting for disk to delete an item |
| disk_vb_del | waiting for disk to delete a vbucket |
| disk_vb_chunk_del | waiting for disk to delete a vbucket chunk |
| disk_commit | waiting for a commit after a batch of updates |
| disk_invalid_item_del | waiting for disk to delete a chunk of invalid items with the old vbucket version |

2.6 Hash Stats

Hash stats provide information on your per-vbucket hash tables.

Requesting these stats does affect performance, so don’t request them too frequently. They are useful, however, for debugging certain types of performance issues. For example, if your hash table is tuned to have too few buckets for the data load within it, the max_depth will be too large and performance will suffer.

Each stat is prefixed with vb_ followed by a number, a colon, then the individual stat name.

For example, the stat representing the size of the hash table for vbucket 0 is vb_0:size.

| state | The current state of this vbucket |
| size | Number of hash buckets |
| locks | Number of locks covering hash table operations |
| min_depth | Minimum number of items found in a bucket |
| max_depth | Maximum number of items found in a bucket |
| reported | Number of items this hash table reports having |
| counted | Number of items found while walking the table |
| resized | Number of times the hash table resized. |
| mem_size | Running sum of memory used by each item. |
| mem_size_counted | Counted sum of current memory used by each item. |
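
As an illustration of the max_depth check described above, the following sketch scans a dictionary of hash stats and reports vbuckets whose deepest bucket exceeds a limit. The function name and the default threshold of 10 are assumptions chosen for the example, not recommended values.

```python
def deep_vbuckets(stats, max_allowed_depth=10):
    """Return {vbucket_id: max_depth} for vbuckets exceeding the threshold.

    `stats` maps keys such as 'vb_0:max_depth' to string or integer values.
    The threshold is a hypothetical value, not a tuning recommendation.
    """
    deep = {}
    for key, value in stats.items():
        name, sep, stat = key.partition(":")
        if not sep or stat != "max_depth":
            continue
        if not (name.startswith("vb_") and name[3:].isdigit()):
            continue
        depth = int(value)
        if depth > max_allowed_depth:
            deep[int(name[3:])] = depth
    return deep
```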

2.7 Checkpoint Stats

Checkpoint stats provide detailed information on the per-vbucket checkpoint data structure.

As with hash stats, requesting these stats has some impact on performance, so please do not poll them from the server frequently. Each stat is prefixed with vb_ followed by a number, a colon, and then the individual stat name.

| open_checkpoint_id | ID of the current open checkpoint |
| num_tap_cursors | Number of referencing TAP cursors |
| num_checkpoint_items | Number of total items in a checkpoint data structure |
| num_checkpoints | Number of checkpoints in a checkpoint data structure |

3 Details

3.1 Ages

The difference between ep_storage_age and ep_data_age is somewhat subtle, but when you consider that a given record may be updated multiple times before hitting persistence, it starts to be clearer.

ep_data_age is how old the data we actually wrote is.

ep_storage_age is how long the object has been waiting to be persisted.
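
A small worked example may make the distinction concrete. The sketch below assumes three timestamps in seconds (when the object was first queued, when it was last modified, and when it was stored); the function and parameter names are hypothetical.

```python
def ages_at_store(queued_at, last_modified_at, stored_at):
    """Compute both ages for an object at the moment it is persisted.

    All arguments are timestamps in seconds (hypothetical inputs).
    """
    return {
        # time since the object was initially queued
        "ep_storage_age": stored_at - queued_at,
        # time since the data that was actually written was last modified
        "ep_data_age": stored_at - last_modified_at,
    }
```

An object queued at t=0, updated several times with the last update at t=285, and stored at t=300 would have a storage age of 300 seconds but a data age of only 15 seconds.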

3.2 Too Young

ep_too_young is incremented every time an object is encountered whose data age is more recent than is allowable for the persistence layer.

For example, if an object that was queued five minutes ago is picked off the todo queue and found to have been updated fifteen seconds ago, it will not be stored, ep_too_young will be incremented, and the key will go back on the input queue.

3.3 Too Old

ep_too_old is incremented every time an object is encountered whose queue age exceeds the ep_queue_age_cap setting.

ep_queue_age_cap generally exists as a safety net to prevent the ep_min_data_age setting from preventing persistence altogether.
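
The interplay of the two settings can be sketched as a decision function. This is an illustration of the rules described in sections 3.2 and 3.3, not the flusher's actual code; the function name, the return values, and the exact boundary comparisons are assumptions.

```python
def persistence_decision(queue_age, data_age, min_data_age, queue_age_cap):
    """Decide what happens to an object picked off the todo queue.

    Returns (action, counter), where counter is the stat that would be
    incremented, or None. All values are in seconds.
    Assumption: the queue age cap is checked first, so it can act as a
    safety net over the minimum data age.
    """
    if queue_age > queue_age_cap:
        return ("store", "ep_too_old")
    if data_age < min_data_age:
        return ("requeue", "ep_too_young")
    return ("store", None)
```

Using the example from section 3.2 with hypothetical settings of 60 seconds for ep_min_data_age and 900 seconds for ep_queue_age_cap, an object queued 300 seconds ago and updated 15 seconds ago is requeued; once its queue age passes 900 seconds it is stored regardless of how recently it was updated.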

3.4 Warming Up

Opening the data store is broken into three distinct phases:

3.4.1 Initializing

During the initialization phase, the server is not accepting connections or otherwise functional. This is often quick, but after a server crash it can take some time to recover the underlying storage.

This time is made available via the ep_dbinit stat.

3.4.2 Warming Up

After initialization, warmup begins. At this point, the server is capable of taking new writes and responding to reads. However, only records that have been pulled out of the storage or have been updated from other clients will be available for request.

(Note that records read from persistence will not overwrite new records captured from the network.)

During this phase, ep_warmup_thread will report running and ep_warmed_up will be increasing as records are being read.

3.4.3 Complete

Once complete, ep_warmed_up will stop increasing and ep_warmup_thread will report complete.
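
The three phases can be told apart from a monitoring script roughly as follows. This is a sketch under stated assumptions: `stats` is a dictionary of toplevel stats, or None when the server could not be reached, and the function name and returned labels are hypothetical. An unreachable server may of course be down for other reasons than initialization.

```python
def warmup_phase(stats):
    """Classify the data store phase from a stats snapshot.

    `stats` is a dict of toplevel stats, or None if no connection could be
    made (assumed here to mean the server is still initializing).
    """
    if stats is None:
        return "initializing"
    state = stats.get("ep_warmup_thread")
    if state == "running":
        return "warming up"
    if state == "complete":
        return "complete"
    return "unknown"
```

While the phase is "warming up", successive snapshots should show ep_warmed_up increasing.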