# AutoToken

Right-sizing resource allocations for big-data queries, particularly in serverless environments, is critical for improving infrastructure operational efficiency, capacity availability, and query performance predictability, and for reducing unnecessary wait times. For more details, see the paper:

> **AutoToken: Predicting Peak Parallelism for Big Data Analytics at Microsoft**
> Rathijit Sen, Alekh Jindal, Hiren Patel, Shi Qiao. VLDB 2020.

AutoToken is a simple and effective predictor for estimating the peak resource usage of recurring analytical jobs. It uses multiple signatures to identify recurring job templates and learns simple, per-signature models with the goal of reducing over-allocations for future instances of those jobs. AutoToken is computationally light for both training and scoring, is easily deployable at scale, and is integrated with the Peregrine workload optimization infrastructure.
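To make the per-signature idea concrete, here is a minimal, hypothetical sketch (not the paper's actual model): it groups past job instances by signature and remembers the maximum observed peak token usage per template, falling back to a default allocation for unseen signatures. The class and method names are illustrative assumptions.

```python
from collections import defaultdict


class PerSignaturePeakPredictor:
    """Hypothetical sketch of a per-signature peak predictor: for each
    recurring job template (identified by a signature), track the maximum
    peak token usage observed so far and use it for future instances."""

    def __init__(self):
        self.peak_by_signature = defaultdict(int)

    def train(self, signature, observed_peak_tokens):
        # Keep a running maximum of observed peaks for this template.
        self.peak_by_signature[signature] = max(
            self.peak_by_signature[signature], observed_peak_tokens
        )

    def predict(self, signature, default_allocation):
        # Unseen templates fall back to the default allocation.
        if signature in self.peak_by_signature:
            return self.peak_by_signature[signature]
        return default_allocation


predictor = PerSignaturePeakPredictor()
for peak in [40, 55, 48]:
    predictor.train("daily_report_v2", peak)

print(predictor.predict("daily_report_v2", 100))  # 55
print(predictor.predict("unseen_job", 100))       # 100
```

A max-of-history rule is one of the simplest models that never under-allocates relative to past observations; the actual per-signature models in AutoToken may differ.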

AutoToken Video

## Dataset Simulator

This directory includes a dataset simulator to synthesize datasets of arbitrary size for AutoToken. It contains the following sub-directories.

| Folder | Contents |
| --- | --- |
| `datagen` | Scripts for analyzing distributions of feature values for an input dataset and generating synthetic datasets of a desired size with similar distributions. |
| `distributions` | Distributions from which training and testing datasets can be generated for the AutoToken models. |
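As a rough illustration of the simulator's approach (fit feature-value distributions from an input dataset, then sample a synthetic dataset of any size with a similar distribution), here is a hedged sketch; the function names are assumptions and the real scripts may work differently.

```python
import random


def fit_empirical(values):
    """Record the empirical distribution of a feature column:
    distinct values and their relative frequencies."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    total = len(values)
    items = sorted(counts)
    weights = [counts[v] / total for v in items]
    return items, weights


def synthesize(items, weights, n, seed=0):
    """Sample a synthetic dataset of size n whose value distribution
    approximates the fitted empirical distribution."""
    rng = random.Random(seed)
    return rng.choices(items, weights=weights, k=n)


observed = [1, 1, 2, 3, 3, 3]          # toy feature column
items, weights = fit_empirical(observed)
sample = synthesize(items, weights, 1000)
```

Sampling with replacement from the fitted frequencies lets the synthetic dataset be arbitrarily larger than the input while keeping per-value proportions similar.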