Stats Collector

The Stats Collector utility consists of a series of Redis-based counting mechanisms that allow a program to do distributed counting over particular time periods.

There are many useful types of keys within Redis, and the Stats Collector allows you to use the following styles of keys:

  • Integer values
  • Unique values
  • HyperLogLog values
  • Bitmap values

You can also specify the style of time window you wish to do your collection in. The Stats Collector currently supports the following use cases:

  • Sliding window integer counter
  • Step based counter for all other types

The sliding window counter allows you to determine "How many hits have occurred in the last X seconds?". This is useful if you wish to know how many hits your API has had in the last hour, or how many times you have handled a particular exception in the past day.
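The sliding window idea can be illustrated without Redis. The following is a minimal in-memory sketch, not the library's implementation: the class name `SlidingWindowSketch` is hypothetical, timestamps are passed in explicitly, and hits are assumed to arrive in non-decreasing time order.

```python
from collections import deque


class SlidingWindowSketch(object):
    """In-memory stand-in for a 'hits in the last X seconds' counter."""

    def __init__(self, window):
        self.window = window   # window length in seconds
        self.hits = deque()    # timestamps of recorded hits, oldest first

    def increment(self, now):
        # record one hit at time `now` (seconds)
        self.hits.append(now)

    def value(self, now):
        # drop hits that fell out of the window, then count the rest
        cutoff = now - self.window
        while self.hits and self.hits[0] <= cutoff:
            self.hits.popleft()
        return len(self.hits)


sketch = SlidingWindowSketch(window=60)
for t in (0, 10, 50):
    sketch.increment(t)
print(sketch.value(now=55))   # all three hits are inside the last 60 seconds
print(sketch.value(now=65))   # the hit at t=0 has expired
```

The real counter keeps its state in Redis so that many processes can share it; the sketch only shows how old hits stop counting once they are older than the window.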

The step based counter allows you to collect counts in rounded time range chunks. This allows you to collect counts of things in meaningful time ranges, like from 9-10 am, 10-11 am, 11 am-12 pm, etc. The counter incrementally steps through the day, mapping your counter to the aggregated key for your desired time range. If you wanted to collect in 15 minute chunks, the counter steps through any particular hour from :00-:15, :15-:30, :30-:45, and :45-:00. This applies to all time ranges available. When using the step style counters, you can also specify the number of previous steps to keep.
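The rounding described above can be sketched as plain arithmetic on epoch seconds. This is an illustration only: the helper names `step_start` and `step_key` are hypothetical, and the key layout is an assumption modeled on the `'counter:2016-01-31_19:00:00'` key shown in the Usage section.

```python
from datetime import datetime, timezone


def step_start(epoch_seconds, window):
    # round a timestamp down to the beginning of its step
    return epoch_seconds - (epoch_seconds % window)


def step_key(prefix, epoch_seconds, window):
    # assumed key layout: '<prefix>:<UTC start of step>'
    start = step_start(epoch_seconds, window)
    stamp = datetime.fromtimestamp(start, tz=timezone.utc)
    return '{}:{}'.format(prefix, stamp.strftime('%Y-%m-%d_%H:%M:%S'))


# 2016-01-31 19:42:10 UTC as epoch seconds
ts = int(datetime(2016, 1, 31, 19, 42, 10, tzinfo=timezone.utc).timestamp())

print(step_key('counter', ts, 3600))   # one hour steps
print(step_key('counter', ts, 900))    # 15 minute steps
```

With one hour steps the timestamp falls into the step that starts at 19:00:00; with 15 minute steps it falls into the step that starts at 19:30:00.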

Note

The step based counter does not map to the same key once all possible steps have been accounted for. 9:00-9:15 am is not the same thing as 10:00-10:15 am, and 9-10 am on Monday is not the same thing as 9-10 am on Tuesday (or next Monday). All steps have a unique key associated with them.

You should use the following static class methods to generate your counter objects.

These convenience variables are provided for setting up your collection windows. Note that some are duplicates that differ in name only.

var SECONDS_1_MINUTE

The number of seconds in 1 minute

var SECONDS_15_MINUTE

The number of seconds in 15 minutes

var SECONDS_30_MINUTE

The number of seconds in 30 minutes

var SECONDS_1_HOUR

The number of seconds in 1 hour

var SECONDS_2_HOUR

The number of seconds in 2 hours

var SECONDS_4_HOUR

The number of seconds in 4 hours

var SECONDS_6_HOUR

The number of seconds in 6 hours

var SECONDS_12_HOUR

The number of seconds in 12 hours

var SECONDS_24_HOUR

The number of seconds in 24 hours

var SECONDS_48_HOUR

The number of seconds in 48 hours

var SECONDS_1_DAY

The number of seconds in 1 day

var SECONDS_2_DAY

The number of seconds in 2 days

var SECONDS_3_DAY

The number of seconds in 3 days

var SECONDS_7_DAY

The number of seconds in 7 days

var SECONDS_1_WEEK

The number of seconds in 1 week

var SECONDS_30_DAY

The number of seconds in 30 days
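As a quick sanity check of what these windows amount to, the values can be recomputed by hand. The sketch below uses hypothetical local names (`MINUTE`, `HOUR`, `DAY`, `windows`) rather than importing the real constants, so it runs without the library installed.

```python
# Local arithmetic only; the real constants live on StatsCollector.
MINUTE = 60
HOUR = 60 * MINUTE
DAY = 24 * HOUR

windows = {
    'SECONDS_1_MINUTE': MINUTE,
    'SECONDS_15_MINUTE': 15 * MINUTE,
    'SECONDS_1_HOUR': HOUR,
    'SECONDS_24_HOUR': 24 * HOUR,
    'SECONDS_1_DAY': DAY,
    'SECONDS_7_DAY': 7 * DAY,
    'SECONDS_1_WEEK': 7 * DAY,
    'SECONDS_30_DAY': 30 * DAY,
}

# pairs that describe the same duration under two names
print(windows['SECONDS_24_HOUR'] == windows['SECONDS_1_DAY'])
print(windows['SECONDS_7_DAY'] == windows['SECONDS_1_WEEK'])
```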

get_time_window(redis_conn=None, host='localhost', port=6379, key='time_window_counter', cycle_time=5, start_time=None, window=SECONDS_1_HOUR, roll=True, keep_max=12)

Generates a new TimeWindow counter. Useful for collecting the number of hits generated between certain times.

param redis_conn

A premade redis connection (overrides host and port)

param str host

the redis host

param int port

the redis port

param str key

the key for your stats collection

param int cycle_time

how often to check for expiring counts

param int start_time

the time to start valid collection

param int window

how long to collect data for in seconds (if rolling)

param bool roll

Roll the window after it expires, to continue collecting on a new date based key.

param int keep_max

If rolling the static window, the max number of prior windows to keep

returns

A TimeWindow counter object.

get_rolling_time_window(redis_conn=None, host='localhost', port=6379, key='rolling_time_window_counter', cycle_time=5, window=SECONDS_1_HOUR)

Generates a new RollingTimeWindow. Useful for collecting data about the number of hits in the past X seconds.

param redis_conn

A premade redis connection (overrides host and port)

param str host

the redis host

param int port

the redis port

param str key

the key for your stats collection

param int cycle_time

how often to check for expiring counts

param int window

the number of seconds behind now() to keep data for

returns

A RollingTimeWindow counter object.

get_counter(redis_conn=None, host='localhost', port=6379, key='counter', cycle_time=5, start_time=None, window=SECONDS_1_HOUR, roll=True, keep_max=12, start_at=0)

Generate a new Counter. Useful for generic distributed counters

param redis_conn

A premade redis connection (overrides host and port)

param str host

the redis host

param int port

the redis port

param str key

the key for your stats collection

param int cycle_time

how often to check for expiring counts

param int start_time

the time to start valid collection

param int window

how long to collect data for in seconds (if rolling)

param bool roll

Roll the window after it expires, to continue collecting on a new date based key.

param int keep_max

If rolling the static window, the max number of prior windows to keep

param int start_at

The integer to start counting at

returns

A Counter object.

get_unique_counter(redis_conn=None, host='localhost', port=6379, key='unique_counter', cycle_time=5, start_time=None, window=SECONDS_1_HOUR, roll=True, keep_max=12)

Generate a new UniqueCounter. Useful for exactly counting unique objects

param redis_conn

A premade redis connection (overrides host and port)

param str host

the redis host

param int port

the redis port

param str key

the key for your stats collection

param int cycle_time

how often to check for expiring counts

param int start_time

the time to start valid collection

param int window

how long to collect data for in seconds (if rolling)

param bool roll

Roll the window after it expires, to continue collecting on a new date based key.

param int keep_max

If rolling the static window, the max number of prior windows to keep

returns

A UniqueCounter object.

get_hll_counter(redis_conn=None, host='localhost', port=6379, key='hyperloglog_counter', cycle_time=5, start_time=None, window=SECONDS_1_HOUR, roll=True, keep_max=12)

Generate a new HyperLogLogCounter. Useful for approximating extremely large counts of unique items

param redis_conn

A premade redis connection (overrides host and port)

param str host

the redis host

param int port

the redis port

param str key

the key for your stats collection

param int cycle_time

how often to check for expiring counts

param int start_time

the time to start valid collection

param int window

how long to collect data for in seconds (if rolling)

param bool roll

Roll the window after it expires, to continue collecting on a new date based key.

param int keep_max

If rolling the static window, the max number of prior windows to keep

returns

A HyperLogLogCounter object.

get_bitmap_counter(redis_conn=None, host='localhost', port=6379, key='bitmap_counter', cycle_time=5, start_time=None, window=SECONDS_1_HOUR, roll=True, keep_max=12)

Generate a new BitMapCounter. Useful for creating different bitsets about users/items that have unique indices.

param redis_conn

A premade redis connection (overrides host and port)

param str host

the redis host

param int port

the redis port

param str key

the key for your stats collection

param int cycle_time

how often to check for expiring counts

param int start_time

the time to start valid collection

param int window

how long to collect data for in seconds (if rolling)

param bool roll

Roll the window after it expires, to continue collecting on a new date based key.

param int keep_max

If rolling the static window, the max number of prior windows to keep

returns

A BitMapCounter object.

Each of the above methods generates a counter object that works in slightly different ways.

class TimeWindow

increment()

Increments the counter by 1.

value()

returns

The value of the counter

get_key()

returns

The string of the key being used

delete_key()

Deletes the key being used from Redis

class RollingTimeWindow

increment()

Increments the counter by 1.

value()

returns

The value of the counter

get_key()

returns

The string of the key being used

delete_key()

Deletes the key being used from Redis

class Counter

increment()

Increments the counter by 1.

value()

returns

The value of the counter

get_key()

returns

The string of the key being used

delete_key()

Deletes the key being used from Redis

class UniqueCounter

increment(item)

Tries to increment the counter by 1, if the item is unique

param item

the potentially unique item

value()

returns

The value of the counter

get_key()

returns

The string of the key being used

delete_key()

Deletes the key being used from Redis
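The "increment only if unique" behavior can be pictured with an ordinary Python set. This is an in-memory sketch under that assumption, not the library's code, and the class name `UniqueSketch` is hypothetical.

```python
class UniqueSketch(object):
    """In-memory stand-in that counts each distinct item exactly once."""

    def __init__(self):
        self.seen = set()

    def increment(self, item):
        # adding an item that is already present leaves the count unchanged
        self.seen.add(item)

    def value(self):
        return len(self.seen)


visitors = UniqueSketch()
for user in ('alice', 'bob', 'alice'):
    visitors.increment(user)
print(visitors.value())   # 'alice' is only counted once
```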

class HyperLogLogCounter

increment(item)

Tries to increment the counter by 1, if the item is unique

param item

the potentially unique item

value()

returns

The value of the counter

get_key()

returns

The string of the key being used

delete_key()

Deletes the key being used from Redis

class BitMapCounter

increment(index)

Sets the bit at the particular index to 1

param index

the index of the bit to set

value()

returns

The number of bits set to 1 in the key

get_key()

returns

The string of the key being used

delete_key()

Deletes the key being used from Redis
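The bitmap semantics (set a bit by index, then count the bits set to 1) can be mimicked with a plain Python integer. This is a sketch of the idea only, with a hypothetical class name `BitmapSketch`; the library stores its bits in Redis.

```python
class BitmapSketch(object):
    """In-memory stand-in: one bit per index, value() counts the set bits."""

    def __init__(self):
        self.bits = 0

    def increment(self, index):
        # setting an already-set bit changes nothing
        self.bits |= (1 << index)

    def value(self):
        # number of bits currently set to 1
        return bin(self.bits).count('1')


active_users = BitmapSketch()
for user_index in (3, 3, 10):
    active_users.increment(user_index)
print(active_users.value())   # indices 3 and 10 are set
```

This is why the bitmap style suits users or items that already have unique integer indices: repeated activity by the same index never inflates the count.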

Usage

To use any counter, you should import the StatsCollector and use one of the static methods to generate your counting object. From there you can call increment() to increment the counter and value() to get the current count of the Redis key being used.

>>> from scutils.stats_collector import StatsCollector
>>> counter = StatsCollector.get_counter(host='scdev')
>>> counter.increment()
>>> counter.increment()
>>> counter.increment()
>>> counter.value()
3
>>> counter.get_key()
'counter:2016-01-31_19:00:00'

The key generated by the counter is based on the UTC time of the machine it is running on. Note that since the default window time range is SECONDS_1_HOUR, the counter rounded the key down to the start of the current hour.

Warning

When doing multi-threaded or multi-process counting on the same key, all counters operating on that key should be created with the same counter style and the same parameters to avoid unintended behavior.

Example

In this example we are going to count the number of times a user presses the Space bar while our program continuously runs.

Note

You will need the py-getch module from pip to run this example. pip install py-getch

import argparse
from getch import getch
from time import time
from scutils.stats_collector import StatsCollector

# set up arg parser
parser = argparse.ArgumentParser(
    description='Example key press stats collector.\n')
parser.add_argument('-rw', '--rolling-window', action='store_true',
                    required=False, help="Use a RollingTimeWindow counter",
                    default=False)
parser.add_argument('-r', '--redis-host', action='store', required=True,
                    help="The Redis host ip")
parser.add_argument('-p', '--redis-port', action='store', default='6379',
                    help="The Redis port")

args = vars(parser.parse_args())

the_window = StatsCollector.SECONDS_1_MINUTE

if args['rolling_window']:
    counter = StatsCollector.get_rolling_time_window(host=args['redis_host'],
                                                     port=args['redis_port'],
                                                     window=the_window,
                                                     cycle_time=1)
else:
    counter = StatsCollector.get_time_window(host=args['redis_host'],
                                             port=args['redis_port'],
                                             window=the_window,
                                             keep_max=3)

print("Kill this program by pressing `ENTER` when done")

# round the current time down to the start of its window
the_time = int(time())
floor_time = the_time % the_window
final_time = the_time - floor_time

pressed_enter = False
while not pressed_enter:
    print("The current counter value is " + str(counter.value()))
    key = getch()

    if key == '\r':
        pressed_enter = True
    elif key == ' ':
        counter.increment()

    if not args['rolling_window']:
        # detect when the clock has crossed into a new window
        new_time = int(time())
        floor_time = new_time % the_window
        new_final_time = new_time - floor_time

        if new_final_time != final_time:
            print("The counter window will roll soon")
            final_time = new_final_time

print("The final counter value is " + str(counter.value()))
counter.delete_key()

This code creates either a TimeWindow counter or a RollingTimeWindow counter to collect the number of space bar presses that occur while the program is running (press Enter to exit). With these two different settings, you can view the count for a specific minute or the count from the last 60 seconds.

Save the above code snippet, or use the example at utils/examples/example_sc.py. When running this example you will get similar results to the following.

$ python example_sc.py -r scdev
Kill this program by pressing `ENTER` when done
The current counter value is 0
The current counter value is 1
The current counter value is 2
The current counter value is 3
The current counter value is 4
The current counter value is 5
The current counter value is 6
The current counter value is 7
The final counter value is 7

It is fairly straightforward to increment the counter and to get the current value, and with only a bit of code tweaking you could use the other counters that the StatsCollector provides.
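One way to do that tweaking is to pick the factory method by name. The sketch below is an assumption-laden illustration: the `FACTORY_NAMES` mapping, its short style names, and the `make_counter` helper are all hypothetical, while the factory method names are the ones documented above. The collector class is passed in as an argument so the helper itself does not need Redis or the library to be importable.

```python
# Hypothetical short names mapped to the documented factory methods.
FACTORY_NAMES = {
    'time_window': 'get_time_window',
    'rolling': 'get_rolling_time_window',
    'counter': 'get_counter',
    'unique': 'get_unique_counter',
    'hll': 'get_hll_counter',
    'bitmap': 'get_bitmap_counter',
}


def make_counter(collector_cls, style, **kwargs):
    """Look up the factory for `style` on collector_cls and call it."""
    if style not in FACTORY_NAMES:
        raise ValueError('unknown counter style: ' + style)
    factory = getattr(collector_cls, FACTORY_NAMES[style])
    # kwargs are forwarded untouched, e.g. host=..., port=..., window=...
    return factory(**kwargs)

# Intended use, assuming scutils is installed and Redis is reachable:
#   from scutils.stats_collector import StatsCollector
#   counter = make_counter(StatsCollector, 'unique', host='scdev')
```

Keep in mind that the unique, HyperLogLog, and bitmap counters take an argument to increment(), so the key press loop would need to pass an item or index instead of calling increment() with no arguments.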