# AutoToken

Right-sizing resource allocations for big-data queries, particularly in serverless environments, is critical for improving infrastructure operational efficiency, capacity availability, and query performance predictability, and for reducing unnecessary wait times. For more details, see the paper:

> **AutoToken: Predicting Peak Parallelism for Big Data Analytics at Microsoft**
> Rathijit Sen, Alekh Jindal, Hiren Patel, Shi Qiao. VLDB 2020.

AutoToken is a simple and effective predictor for estimating the peak resource usage of recurring analytical jobs. It uses multiple signatures to identify recurring job templates and learns simple, per-signature models with the goal of reducing over-allocations for future instances of those jobs. AutoToken is computationally light for both training and scoring, is easily deployable at scale, and is integrated with the Peregrine workload optimization infrastructure.
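To make the per-signature idea concrete, here is a minimal, hypothetical sketch (not the paper's actual model): it groups past job instances by signature and remembers the maximum observed peak token usage per template, falling back to a default allocation for unseen signatures. The class and method names are illustrative assumptions.

```python
from collections import defaultdict


class PerSignaturePeakPredictor:
    """Hypothetical sketch of a per-signature peak predictor: for each
    recurring job template (identified by a signature), track the maximum
    peak token usage observed so far and use it for future instances."""

    def __init__(self):
        self.peak_by_signature = defaultdict(int)

    def train(self, signature, observed_peak_tokens):
        # Keep a running maximum of observed peaks for this template.
        self.peak_by_signature[signature] = max(
            self.peak_by_signature[signature], observed_peak_tokens
        )

    def predict(self, signature, default_allocation):
        # Unseen templates fall back to the default allocation.
        if signature in self.peak_by_signature:
            return self.peak_by_signature[signature]
        return default_allocation


predictor = PerSignaturePeakPredictor()
for peak in [40, 55, 48]:
    predictor.train("daily_report_v2", peak)

print(predictor.predict("daily_report_v2", 100))  # 55
print(predictor.predict("unseen_job", 100))       # 100
```

A max-of-history rule is one of the simplest models that never under-allocates relative to past observations; the actual per-signature models in AutoToken may differ.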

AutoToken Video

## Dataset Simulator

This directory includes a dataset simulator to synthesize datasets of arbitrary size for AutoToken. It contains the following sub-directories.

| Folder | Contents |
| --- | --- |
| `datagen` | Scripts for analyzing distributions of feature values for an input dataset and generating synthetic datasets of a desired size with similar distributions. |
| `distributions` | Distributions from which training and testing datasets can be generated for the AutoToken models. |
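As a rough illustration of the simulator's approach (fit feature-value distributions from an input dataset, then sample a synthetic dataset of any size with a similar distribution), here is a hedged sketch; the function names are assumptions and the real scripts may work differently.

```python
import random


def fit_empirical(values):
    """Record the empirical distribution of a feature column:
    distinct values and their relative frequencies."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    total = len(values)
    items = sorted(counts)
    weights = [counts[v] / total for v in items]
    return items, weights


def synthesize(items, weights, n, seed=0):
    """Sample a synthetic dataset of size n whose value distribution
    approximates the fitted empirical distribution."""
    rng = random.Random(seed)
    return rng.choices(items, weights=weights, k=n)


observed = [1, 1, 2, 3, 3, 3]          # toy feature column
items, weights = fit_empirical(observed)
sample = synthesize(items, weights, 1000)
```

Sampling with replacement from the fitted frequencies lets the synthetic dataset be arbitrarily larger than the input while keeping per-value proportions similar.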