Benchmarkit


Benchmark and analyze functions' execution time and results over the course of development.

Features

  • No boilerplate code
  • Saves history and additional info
  • Saves function output and parameters to benchmark data science tasks
  • Easy to analyze results
  • Disables the garbage collector during benchmarking (see the sketch after this list)
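
That last point matters more than it sounds: garbage-collection pauses can land inside a timing loop and skew the numbers. A minimal sketch of the idea using only the standard library (illustrative only, not benchmarkit's actual implementation; time_without_gc is a hypothetical helper):

import gc
import time


def time_without_gc(func, num_iters=100):
    """Illustration only: time `func` with the garbage collector disabled."""
    gc_was_enabled = gc.isenabled()
    gc.disable()  # avoid GC pauses during the timed loop
    try:
        start = time.perf_counter()
        for _ in range(num_iters):
            func()
        return (time.perf_counter() - start) / num_iters
    finally:
        if gc_was_enabled:
            gc.enable()  # restore the collector afterwards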

Motivation

  • I need to benchmark the execution time of my functions
  • I don't want to memorize and write boilerplate code
  • I want to compare results with previous runs from before changes were introduced
  • I don't want to manually write down results somewhere
  • I want to know the exact commits of my previous runs from months ago
  • I want to benchmark the accuracy, precision, and recall of my models and keep track of hyperparameters

Installation

pip install benchmarkit

Usage

Benchmark execution times

Put the @benchmark decorator over a function containing the piece of code that should be timed:

from benchmarkit import benchmark, benchmark_run

N = 10000
seq_list = list(range(N))
seq_set = set(range(N))

SAVE_PATH = '/tmp/benchmark_time.jsonl'


@benchmark(num_iters=100, save_params=True, save_output=False)
def search_in_list(num_items=N):
    return num_items - 1 in seq_list


@benchmark(num_iters=100, save_params=True, save_output=False)
def search_in_set(num_items=N):
    return num_items - 1 in seq_set

  • num_iters - how many times to repeat the benchmarked function. Default 1
  • save_params - save the parameters passed to the benchmarked function in the file with the benchmark results. In the example above, num_items will be saved. Default False
  • save_output - save the benchmarked function's output. The function should return a dict {'name': value}. Default False. See the example below on how to benchmark model results.

Run benchmark:

benchmark_results = benchmark_run(
    [search_in_list, search_in_set],
    SAVE_PATH,
    comment='initial benchmark search',
    rows_limit=10,
    extra_fields=['num_items'],
    metric='mean_time',
    bigger_is_better=False,
)  

  • functions - a function or list of functions decorated with @benchmark
  • save_file - path to the file where results are saved
  • comment - comment to save alongside the results
  • rows_limit - limit the number of table rows in the console output. Default 10
  • extra_fields - extra fields to include in the console output
  • metric - the metric used for comparison. Default mean_time
  • bigger_is_better - whether a bigger value of the metric indicates a better result. Should be False for time benchmarks and True for model accuracy. Default False

Prints a table to the terminal and returns a list of dictionaries with the data for the last run.
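
For programmatic use, the returned list can be inspected directly. A small sketch, assuming each entry is a dict carrying the columns shown in the console table, such as 'name' and 'mean_time' (the exact keys are an assumption, not documented above):

# Assumed keys: 'name' and 'mean_time' (taken from the console columns).
fastest = min(benchmark_results, key=lambda row: row['mean_time'])
print(fastest['name'], fastest['mean_time'])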

(screenshot: benchmark time output)

Change N = 1000000 and rerun:

(screenshot: benchmark time output after increasing N)

The same can be run from the command line:

benchmark_run test_data/time/benchmark_functions.py --save_dir /tmp/ --comment "million items" --extra_fields num_items
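
The file passed to the CLI is presumably just a module that defines @benchmark-decorated functions, with no benchmark_run call (the CLI performs the run itself). A minimal sketch of such a file, mirroring the example above:

# test_data/time/benchmark_functions.py (sketch)
from benchmarkit import benchmark

N = 1000000
seq_list = list(range(N))
seq_set = set(range(N))


@benchmark(num_iters=100, save_params=True)
def search_in_list(num_items=N):
    return num_items - 1 in seq_list


@benchmark(num_iters=100, save_params=True)
def search_in_set(num_items=N):
    return num_items - 1 in seq_set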

Benchmark model results

from benchmarkit import benchmark, benchmark_run
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

MODEL_BENCHMARK_SAVE_FILE = '/tmp/benchmark_model.jsonl'

x, y = load_iris(return_X_y=True)

@benchmark(save_params=True, save_output=True)
def log_regression(C=1.0, fit_intercept=True):
    clf = LogisticRegression(
        random_state=0, 
        solver='lbfgs', 
        multi_class='multinomial', 
        C=C,
        fit_intercept=fit_intercept,
    )
    clf.fit(x, y)
    score = clf.score(x, y)
    return {'score': score}
    
model_benchmark_results = benchmark_run(
    log_regression,
    MODEL_BENCHMARK_SAVE_FILE,
    comment='baseline model',
    extra_fields=['C', 'fit_intercept'],
    metric='score',
    bigger_is_better=True,
)

(screenshot: benchmark model output)

Change the hyperparameter to C=0.5 and rerun. Output:

(screenshot: benchmark model output with C=0.5)

The same can be run from the command line:

benchmark_run file_with_benchmark.py --save_dir /tmp/ --comment "stronger regularization" --extra_fields C fit_intercept --metric score --bigger_is_better

Analyze results from the file

from benchmarkit import benchmark_analyze

SAVE_PATH = '/tmp/benchmark_time.jsonl'

benchmark_df = benchmark_analyze(
    SAVE_PATH,
    func_name=None, 
    rows_limit=10,
    metric='mean_time',
    bigger_is_better=False,
    extra_fields=['num_items'],
)

  • input_path - path to a .jsonl file or a directory of .jsonl files with benchmark results
  • func_name - display statistics for a particular function. If None, all functions stored in the file are displayed. Default None
  • rows_limit - limit the number of table rows in the console output. Default 10
  • metric - the metric used for comparison. Default mean_time
  • bigger_is_better - whether a bigger value of the metric indicates a better result. Should be False for time benchmarks and True for model accuracy. Default False
  • extra_fields - extra fields to include in the console output

Prints a table to the terminal and returns a pandas DataFrame.
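
The DataFrame can then be sliced and aggregated with the usual pandas tools. A small sketch, assuming one row per run with columns such as func_name, mean_time, and the num_items extra field (column names are inferred from the console output, not a documented schema):

# Assumed columns: 'func_name', 'mean_time', 'num_items'.
summary = (
    benchmark_df
    .groupby(['func_name', 'num_items'])['mean_time']
    .mean()
    .sort_values()
)
print(summary)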

(screenshot: benchmark analyze output)

The same can be run from the command line:

benchmark_analyze /tmp/benchmark_time.jsonl --extra_fields num_items

Other examples
