For introductory information on stats within membase, start with the membase wiki stats page.
Stat | Description |
---|---|
ep_version | Version number of ep_engine. |
ep_storage_age | Seconds since most recently |
stored object was initially queued. | |
ep_storage_age_highwat | ep_storage_age high water mark |
ep_min_data_age | Minimum data age setting. |
ep_queue_age_cap | Queue age cap setting. |
ep_max_txn_size | Max number of updates per transaction. |
ep_data_age | Seconds since most recently |
stored object was modified. | |
ep_data_age_highwat | ep_data_age high water mark |
ep_too_young | Number of times an object was |
not stored due to being too young. | |
ep_too_old | Number of times an object was |
stored after being dirty too long. | |
ep_total_enqueued | Total number of items queued for |
persistence. | |
ep_total_new_items | Total number of persisted new items. |
ep_total_del_items | Total number of persisted deletions. |
ep_total_persisted | Total number of items persisted. |
ep_item_flush_failed | Number of times an item failed to flush |
due to storage errors. | |
ep_item_commit_failed | Number of times a transaction failed to |
commit due to storage errors. | |
ep_item_begin_failed | Number of times a transaction failed to |
start due to storage errors. | |
ep_expired | Number of times an item was expired. |
ep_item_flush_expired | Number of times an item is not flushed |
due to the expiry of the item | |
ep_queue_size | Number of items queued for storage. |
ep_flusher_todo | Number of items remaining to be written. |
ep_flusher_state | Current state of the flusher thread. |
ep_commit_num | Total number of write commits. |
ep_commit_time | Number of seconds of most recent commit. |
ep_commit_time_total | Cumulative seconds spent committing. |
ep_vbucket_del | Number of vbucket deletion events. |
ep_vbucket_del_fail | Number of failed vbucket deletion events. |
ep_vbucket_del_max_walltime | Max wall time (µs) spent by deleting |
a vbucket | |
ep_vbucket_del_total_walltime | Total wall time (µs) spent by deleting |
vbuckets | |
ep_vbucket_del_avg_walltime | Avg wall time (µs) spent by deleting |
a vbucket | |
ep_flush_preempts | Num of flush early exits for read reqs. |
ep_flush_duration | Number of seconds of most recent flush. |
ep_flush_duration_total | Cumulative seconds spent flushing. |
ep_flush_duration_highwat | ep_flush_duration high water mark. |
curr_items | Num items in active vbuckets. |
curr_items_tot | Num current items including those not |
active (replica, dead and pending states) | |
ep_kv_size | Memory used to store keys and values. |
ep_overhead | Extra memory used by rep queues, etc.. |
ep_max_data_size | Max amount of data allowed in memory. |
ep_mem_low_wat | Low water mark for auto-evictions. |
ep_mem_high_wat | High water mark for auto-evictions. |
ep_total_cache_size | The total size of all items in the cache |
ep_oom_errors | Number of times unrecoverable OOMs |
happened while processing operations | |
ep_tmp_oom_errors | Number of times temporary OOMs |
happened while processing operations | |
ep_bg_fetched | Number of items fetched from disk. |
ep_tap_bg_fetched | Number of tap disk fetches |
ep_tap_bg_fetch_requeued | Number of times a tap bg fetch task is |
requeued. | |
ep_num_pager_runs | Number of times we ran pager loops |
to seek additional memory. | |
ep_num_expiry_pager_runs | Number of times we ran expiry pager loops |
to purge expired items from memory/disk | |
ep_num_value_ejects | Number of times item values got ejected |
from memory to disk | |
ep_num_eject_replicas | Number of times replica item values got |
ejected from memory to disk | |
ep_num_eject_failures | Number of items that could not be ejected |
ep_num_not_my_vbuckets | Number of times Not My VBucket exception |
happened during runtime | |
ep_warmup_thread | Warmup thread status. |
ep_warmed_up | Number of items warmed up. |
ep_warmup_dups | Duplicates encountered during warmup. |
ep_warmup_oom | OOMs encountered during warmup. |
ep_warmup_time | Time (µs) spent by warming data. |
ep_tap_keepalive | Tap keepalive time. |
ep_dbname | DB path. |
ep_dbinit | Number of seconds to initialize DB. |
ep_dbshards | Number of shards for db store |
ep_db_strategy | SQLite db strategy |
ep_warmup | true if warmup is enabled. |
ep_io_num_read | Number of io read operations |
ep_io_num_write | Number of io write operations |
ep_io_read_bytes | Number of bytes read (key + values) |
ep_io_write_bytes | Number of bytes written (key + values) |
ep_pending_ops | Number of ops awaiting pending vbuckets |
ep_pending_ops_total | Total blocked pending ops since reset |
ep_pending_ops_max | Max ops seen awaiting 1 pending vbucket |
ep_pending_ops_max_duration | Max time (µs) used waiting on pending |
vbuckets | |
ep_bg_num_samples | The number of samples included in the avg |
ep_bg_min_wait | The shortest time (µs) in the wait queue |
ep_bg_max_wait | The longest time (µs) in the wait queue |
ep_bg_wait_avg | The average wait time (µs) for an item |
before it is serviced by the dispatcher | |
ep_bg_min_load | The shortest load time (µs) |
ep_bg_max_load | The longest load time (µs) |
ep_bg_load_avg | The average time (µs) for an item to be |
loaded from the persistence layer | |
ep_num_non_resident | The number of non-resident items |
ep_num_active_non_resident | Number of non-resident items in active |
vbuckets. | |
ep_store_max_concurrency | Maximum allowed concurrency at the storage |
layer. | |
ep_store_max_readers | Maximum number of concurrent read-only. |
storage threads. | |
ep_store_max_readwrite | Maximum number of concurrent read/write |
storage threads. | |
ep_db_cleaner_status | Status of database cleaner that cleans up |
invalid items with old vbucket versions |
ep_tap_total_queue | Sum of tap queue sizes on the current |
tap queues | |
ep_tap_total_fetched | Sum of all tap messages sent |
ep_tap_bg_max_pending | The maximum number of bg jobs a tap |
connection may have | |
ep_tap_bg_fetched | Number of tap disk fetches |
ep_tap_bg_fetch_requeued | Number of times a tap bg fetch task is |
requeued. | |
ep_tap_fg_fetched | Number of tap memory fetches |
ep_tap_deletes | Number of tap deletion messages sent |
ep_tap_throttled | Number of tap messages refused due to |
throttling. | |
ep_tap_keepalive | How long to keep tap connection state |
after client disconnect. | |
ep_tap_count | Number of tap connections. |
ep_tap_bg_num_samples | The number of tap bg fetch samples |
included in the avg | |
ep_tap_bg_min_wait | The shortest time (µs) for a tap item |
before it is serviced by the dispatcher | |
ep_tap_bg_max_wait | The longest time (µs) for a tap item |
before it is serviced by the dispatcher | |
ep_tap_bg_wait_avg | The average wait time (µs) for a tap item |
before it is serviced by the dispatcher | |
ep_tap_bg_min_load | The shortest time (µs) for a tap item to |
be loaded from the persistence layer | |
ep_tap_bg_max_load | The longest time (µs) for a tap item to |
be loaded from the persistence layer | |
ep_tap_bg_load_avg | The average time (µs) for a tap item to |
be loaded from the persistence layer | |
ep_tap_noop_interval | The number of secs between a noop is added |
to an idle connection | |
ep_tap_backoff_period | The number of seconds the tap connection |
should back off after receiving ETMPFAIL |
Each stat begins with ep_tapq:
followed by a unique client_id and
another colon. For example, if your client is named, slave1
, the
qlen
stat would be ep_tapq:slave1:qlen
.
qlen | Queue size for the given client_id. |
qlen_high_pri | High priority tap queue items. |
qlen_low_pri | Low priority tap queue items. |
vb_filters | Size of connection vbucket filter set. |
vb_filter | The content of the vbucket filter |
rec_fetched | Tap messages sent to the client. |
rec_skipped | Number of messages skipped due to |
tap reconnect with a different filter | |
idle | True if this connection is idle. |
empty | True if this connection has no items. |
complete | True if backfill is complete. |
has_item | True when there is a bg fetched item |
ready. | |
has_queued_item | True when there is a key ready to be |
looked up (may become fg or bg item) | |
bg_wait_for_result | True if the max number of background |
operations is started | |
bg_queue_size | Number of bg fetches enqueued for this |
connection. | |
bg_queued | Number of background fetches enqueued. |
bg_result_size | Number of ready background results. |
bg_results | Number of background results ready. |
bg_jobs_issued | Number of background jobs started. |
bg_jobs_completed | Number of background jobs completed. |
bg_backlog_size | Number of items pending bg fetch. |
flags | Connection flags set by the client. |
connected | true if this client is connected |
pending_disconnect | true if we’re hanging up on this client |
paused | true if this client is blocked |
pending_backfill | true if we’re still backfilling keys |
for this connection | |
pending_disk_backfill | true if we’re still backfilling keys |
from disk for this connection | |
reconnects | Number of reconnects from this client. |
disconnects | Number of disconnects from this client. |
backfill_age | The age of the start of the backfill. |
ack_seqno | The current tap ACK sequence number. |
recv_ack_seqno | Last receive tap ACK sequence number. |
ack_log_size | Tap ACK backlog size. |
ack_window_full | true if our tap ACK window is full. |
expires | When this ACK backlog expires. |
num_tap_nack | The number of negative tap acks received |
num_tap_tmpfail_survivors | The number of items rescheduled due to |
a temporary nack. |
Timing stats provide histogram data from high resolution timers over various operations within the system.
As this data is multi-dimensional, some parsing may be required for
machine processing. It’s somewhat human readable, but the stats
script mentioned in the Getting Started section above will do fancier
formatting for you.
Consider the following sample stats:
STAT disk_insert_8,16 9488 STAT disk_insert_16,32 290 STAT disk_insert_32,64 73 STAT disk_insert_64,128 86 STAT disk_insert_128,256 48 STAT disk_insert_256,512 2 STAT disk_insert_512,1024 12 STAT disk_insert_1024,2048 1
This tells you that disk_insert
took 8-16µs 9,488 times, 16-32µs
290 times, and so on.
The same stats displayed through the stats
CLI tool would look like
this:
disk_insert (10008 total) 8us - 16us : ( 94.80%) 9488 ########################################### 16us - 32us : ( 97.70%) 290 # 32us - 64us : ( 98.43%) 73 64us - 128us : ( 99.29%) 86 128us - 256us : ( 99.77%) 48 256us - 512us : ( 99.79%) 2 512us - 1ms : ( 99.91%) 12 1ms - 2ms : ( 99.92%) 1
The following histograms are available from “timings” in the above form to describe when time was spent doing various things:
bg_wait | bg fetches waiting in the dispatcher queue |
bg_load | bg fetches waiting for disk |
bg_tap_wait | tap bg fetches waiting in the dispatcher queue |
bg_tap_laod | tap bg fetches waiting for disk |
pending_ops | client connections blocked for operations |
in pending vbuckets. | |
storage_age | Analogous to ep_storage_age in main stats. |
data_age | Analogous to ep_data_age in main stats. |
get_cmd | servicing get requests |
store_cmd | servicing store requests |
arith_cmd | servicing incr/decr requests |
get_vb_cmd | servicing vbucket status requests |
set_vb_cmd | servicing vbucket set state commands |
del_vb_cmd | servicing vbucket deletion commands |
tap_vb_set | servicing tap vbucket set state commands |
tap_mutation | servicing tap mutations |
notify_io | waking blocked connections |
disk_insert | waiting for disk to store a new item |
disk_update | waiting for disk to modify an existing item |
disk_del | waiting for disk to delete an item |
disk_vb_del | waiting for disk to delete a vbucket |
disk_vb_chunk_del | waiting for disk to delete a vbucket chunk |
disk_commit | waiting for a commit after a batch of updates |
disk_invalid_item_del | Waiting for disk to delete a chunk of invalid |
items with the old vbucket version |
Hash stats provide information on your per-vbucket hash tables.
Requesting these stats does affect performance, so don’t do it too
regularly, but it’s useful for debugging certain types of performance
issues. For example, if your hash table is tuned to have too few
buckets for the data load within it, the max_depth
will be too large
and performance will suffer.
Each stat is prefixed with vb_
followed by a number, a colon, then
the individual stat name.
For example, the stat representing the size of the hash table for
vbucket 0 is vb_0:size
.
state | The current state of this vbucket |
size | Number of hash buckets |
locks | Number of locks covering hash table operations |
min_depth | Minimum number of items found in a bucket |
max_depth | Maximum number of items found in a bucket |
reported | Number of items this hash table reports having |
counted | Number of items found while walking the table |
resized | Number of times the hash table resized. |
mem_size | Running sum of memory used by each item. |
mem_size_counted | Counted sum of current memory used by each item. |
The difference between ep_storage_age
and ep_data_age
is somewhat
subtle, but when you consider that a given record may be updated
multiple times before hitting persistence, it starts to be clearer.
ep_data_age
is how old the data we actually wrote is.
ep_storage_age
is how long the object has been waiting to be
persisted.
ep_too_young
is incremented every time an object is encountered
whose data age
is more recent than is allowable for the persistence
layer.
For example, if an object that was queued five minutes ago is picked
off the todo
queue and found to have been updated fifteen seconds
ago, it will not be stored, ep_too_young
will be incremented, and
the key will go back on the input queue.
ep_too_old
is incremented every time an object is encountered whose
queue age
exceeds the ep_queue_age_cap
setting.
ep_queue_age_cap
generally exists as a safety net to prevent the
ep_min_data_age
setting from preventing persistence altogether.
Opening the data store is broken into three distinct phases:
During the initialization phase, the server is not accepting connections or otherwise functional. This is often quick, but in a server crash can take some time to perform recovery of the underlying storage.
This time is made available via the ep_dbinit
stat.
After initialization, warmup begins. At this point, the server is capable of taking new writes and responding to reads. However, only records that have been pulled out of the storage or have been updated from other clients will be available for request.
(note that records read from persistence will not overwrite new records captured from the network)
During this phase, ep_warmup_thread
will report running
and
ep_warmed_up
will be increasing as records are being read.
Once complete, ep_warmed_up
will stop increasing and
ep_warmup_thread
will report complete
.