Neuromorphic vision systems process asynchronous event streams and offer transformative potential for low-latency real-time applications like robotics. However, their evaluation remains tethered to methodologies derived from RGB imaging. These traditional approaches convert continuous event streams into fixed-rate frames and ignore perception latency, creating a critical gap between benchmarks and real-world performance. We introduce the STream-based lAtency-awaRe Evaluation (STARE) framework, designed to align with the intrinsic continuity of event-driven perception. STARE integrates two core components: Continuous Sampling and Latency-Aware Evaluation. Continuous Sampling processes new events immediately after the prior inference cycle to maximize throughput. Latency-Aware Evaluation quantifies latency-induced performance loss by matching high-frequency ground truth to the latest perception model output. To enable rigorous validation of STARE, we developed ESOT500, a dataset with 500 Hz annotations that captures high-dynamic object motion without temporal aliasing. Experiments reveal that perception latency severely degrades online performance, with accuracy dropping by over 50% compared to the traditional framework. This finding is further confirmed by our event-driven robotic ping-pong experiments, with a 55% increase in latency leading to complete task failure. We further propose two model enhancement strategies to mitigate this degradation: Asynchronous Tracking and Context-Aware Sampling. Asynchronous Tracking boosts throughput using a dual lightweight-heavyweight architecture. Context-Aware Sampling adapts input based on target-specific event density. Together, these strategies reduce latency-induced accuracy loss by 61% while increasing model speed by 78%. Our work establishes a new paradigm that prioritizes temporal congruence in neuromorphic system evaluation, bridging the gap between theoretical potential and real-world deployment.
STARE is an abbreviation for STream-based lAtency-awaRe Evaluation.
Please refer to the paper for more details.
STream-based lAtency-awaRe Evaluation (STARE) for event-driven perception. a, Traditional RGB perception pipeline. Visual information is captured as discrete, fixed-rate frames. The perception model processes frames sequentially, introducing perceptual delays as inference is gated by the arrival of each new frame (i.e., waiting for the next frame to start processing). b, Frame-based, latency-ignored evaluation of event vision. Mirroring RGB paradigms, the continuous event stream is preprocessed into fixed-rate event frames. Model outputs (orange bounding box) are evaluated against ground truth within the corresponding input frame, ignoring the impact of perception latency on real-time accuracy. c, The proposed STream-based lAtency-awaRe Evaluation (STARE) framework. STARE operates directly on the continuous event stream: inference initiates immediately after the prior cycle concludes (at timestamps $t_i$, $t_{i+1}$, $t_{i+2}$). Latency-Aware Evaluation matches each high-frequency ground truth (e.g., at $t_j$) to the latest model prediction, directly penalizing stale outputs. d, Example of event-driven perception under STARE. Compared to the frame-based approach in (b), STARE’s Continuous Sampling enables higher throughput, reducing temporal misalignment between model predictions (blue bounding box) and ground truth (green bounding box).
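The core of Latency-Aware Evaluation is the matching rule illustrated in panel c: each ground-truth timestamp is compared against the most recent prediction already available at that time, so stale outputs are penalized. A minimal sketch of this rule in Python follows; the function and variable names are our own illustration, not the repository's API.

import bisect

def latency_aware_match(gt_times, pred_times, preds, init_pred):
    """For each ground-truth timestamp, return the latest prediction whose
    output time (inference finish time) is not later than it. pred_times
    must be sorted; init_pred (e.g., the initial box) covers timestamps
    before the first output arrives."""
    matched = []
    for t in gt_times:
        i = bisect.bisect_right(pred_times, t) - 1  # latest output with time <= t
        matched.append(preds[i] if i >= 0 else init_pred)
    return matched

With 500 Hz ground truth and outputs arriving only every few tens of milliseconds, several consecutive ground-truth samples are matched to the same (possibly stale) box, which is exactly the latency penalty STARE measures.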
To enable rigorous validation of STARE, we present ESOT500, a new dataset for event-based visual object tracking, a classical perception task that places high demands on the real-time capabilities of models. ESOT500 features high-frequency (500 Hz) and time-aligned annotations.
Please refer to the paper for more details.
The ESOT500 dataset for high-dynamic event-driven perception. a, Representative event stream samples from the low-resolution ESOT500-L subset (346 × 260, top row) and high-resolution ESOT500-H subset (1280 × 720, bottom row), showcasing diverse high-dynamic scenarios (e.g., flag waving, bicycle riding, car driving, cap shaking, football playing, monkey swinging, fan rotating, pigeon taking off, bottle spinning, breakdancing). b, Scene category distribution for ESOT500-L (left) and ESOT500-H (right), quantifying the percentage of sequences across attributes like object similarity, background complexity, deformation, occlusion, motion speed, and indoor/outdoor setting. c, Comparison of event-based object tracking datasets. ESOT500-L and ESOT500-H stand out with 500 Hz time-aligned annotations, high resolution (up to 1280×720 for ESOT500-H), and diverse scene coverage, addressing gaps in prior datasets (e.g., low annotation frequency, lack of time-aligned labels).
Temporal aliasing in event-driven perception and ESOT500’s solution. a, Conceptual illustration of temporal aliasing. The green solid line denotes the true continuous object state over time; black dots are low-frequency periodic annotations; the yellow dashed line is the trajectory reconstructed from sparse annotations. For simple motion (top), reconstruction approximates the truth, but for complex motion (bottom), low-frequency sampling causes distorted reconstruction (temporal aliasing). b, Reconstruction Error (RE, mean ± STD) for ESOT500-H (blue) and ESOT500-L (orange) as a function of annotation frequency. RE is substantial at low frequencies and gradually decreases when approaching 500 Hz, validating ESOT500’s ability to mitigate aliasing via high-frequency annotation. c, RE curves for representative sequences in ESOT500-L (airplane, book, cap, bike, umbrella). d, RE curves for representative sequences in ESOT500-H (bird, bottle, badminton racket, skate player, fan). e, Visual comparison of high-frequency ground truth annotations (green boxes) and sparse 20 Hz interpolated boxes (yellow) for three objects (fan, bottle, badminton racket). Interpolated boxes deviate significantly from ground truth, emphasizing the need for dense annotation to address temporal aliasing.
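The Reconstruction Error in panel b can be read as the gap between a trajectory interpolated from sparse annotations and the dense 500 Hz ground truth. The sketch below illustrates one way to compute such a measure; the paper's exact metric may differ, so the mean absolute deviation of linearly interpolated box parameters used here is only an assumed stand-in.

import numpy as np

def reconstruction_error(gt_boxes, gt_freq=500, low_freq=20):
    """gt_boxes: (N, 4) array of [x, y, w, h] boxes annotated at gt_freq Hz.
    Downsample to low_freq Hz, linearly interpolate back to gt_freq Hz,
    and report the mean absolute deviation from the dense ground truth."""
    step = int(round(gt_freq / low_freq))
    sparse_idx = np.arange(0, len(gt_boxes), step)
    dense_idx = np.arange(len(gt_boxes))
    recon = np.stack(
        [np.interp(dense_idx, sparse_idx, gt_boxes[sparse_idx, k]) for k in range(4)],
        axis=1,
    )
    return np.abs(recon - gt_boxes).mean()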
To quantify the impact of perception latency on event-driven perception, we leverage the STARE framework to evaluate various visual object tracking models. This evaluation centers on the ESOT500 dataset, tailored for high-dynamic, low-latency assessment, and extends to external benchmarks to verify generalizability.
To examine the real-world impact of perception latency, we further conducted an event-driven robotic ping-pong experiment.
Please refer to the paper for more details.
Impact of perception latency on event-driven trackers across datasets and hardware. a, Tracking accuracy (AUC) under STARE (-S, solid lines) vs. traditional frame-based evaluation (-F, dashed lines) on ESOT500-L. M_FETV2 stands for Mamba FETrackV2. Larger markers indicate faster inference. STARE reveals the impact of perception latency on accuracy that traditional methods hide. Accuracy generally peaks at an optimal sampling window size under STARE. b, Same as (a) but on ESOT500-H, validating STARE’s consistency across dataset resolutions. The unimodal AUC vs. sampling window size trend persists. c, STARE performance on FE108: AUC vs. sampling window size for diverse trackers. The consistent unimodal trend across datasets (vs. ESOT500) highlights STARE’s generalizability to external benchmarks. d, STARE performance on VisEvent: AUC vs. sampling window size. Results mirror (a–c), reinforcing the unimodal trend. e, Ablation of sampling methods with HDETrack: Continuous Sampling (STARE, “Cont.”) vs. fixed-rate sampling from preprocessed frames (“Prep.”), using EventFrame (EF) and VoxelGrid (VG) representations. Continuous Sampling outperforms, leveraging the temporal continuity of the event stream. f–h, Performance ranking reversals under STARE at 2 ms (f), 20 ms (g), and 50 ms (h) sampling window sizes. Faster models (e.g., DiMP50, MixFormer) outperform slower counterparts that score higher under traditional evaluation, demonstrating how STARE rewards throughput. i, Accuracy degradation with simulated inference latency (speed multiplier < 1 slows inference). All models show monotonic AUC drops, quantifying latency’s direct impact. j–l, STARE performance on hardware with varying configurations: (j) RTX 3090 (high-power), (k) RTX 3080Ti (lower-power, accuracy drop vs. 3090), (l) RTX 3080Ti with parallel task contention (further degradation), illustrating how different hardware configurations affect latency.
Real-world robotic ping-pong experiment. a, Perception-action loop experimental setup. An event camera captures the ping-pong ball trajectory (event stream, red/blue points) at 1 MHz. An upstream tracker (orange) runs at tens of Hz, outputting bounding boxes (blue) to a downstream robotic policy (gray). The policy queries for target positions at 2 kHz and sends control actions to the robotic arm at 200 Hz. b, An example of the robot successfully returning a ping-pong ball. c, Robotic ping-pong task success rate across perception models. Metrics include: ESOT500-L performance (AUC, latency-ignored vs. latency-aware), model speed (Hz, mean, min/max, STD), and success rate (successful hits / 20 trials). Underlined values indicate column maxima. MixFormer (high speed) achieves the highest success rate (7/20), while frame-based variants (e.g., MixFormer∗) and high-offline-accuracy/low-speed models (e.g., KeepTrack) show reduced success, validating latency’s critical role in real-time tasks.
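Because the stages of this loop run at very different rates (1 MHz events, tens-of-Hz tracker outputs, 2 kHz policy queries, 200 Hz control), the policy always consumes the latest available bounding box instead of waiting for a new one. Below is a minimal, purely illustrative sketch of that query-the-latest pattern; none of the names correspond to the actual robot code.

import threading, time

class LatestBoxBuffer:
    """Shared buffer: the tracker overwrites it at its own pace, while the
    policy reads it at 2 kHz without ever blocking on the tracker."""
    def __init__(self):
        self._lock = threading.Lock()
        self._box, self._stamp = None, 0.0

    def publish(self, box):              # called by the tracker (tens of Hz)
        with self._lock:
            self._box, self._stamp = box, time.monotonic()

    def query(self):                     # called by the policy (2 kHz)
        with self._lock:
            return self._box, time.monotonic() - self._stamp  # box and its age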
We derive two model enhancement strategies: Asynchronous Tracking and Context-Aware Sampling.
Please refer to the paper for more details.
Model enhancement strategies: Asynchronous Tracking and Context-Aware Sampling. a, Architecture of Asynchronous Tracking. A slow, high-fidelity base model (orange) performs full inference on event segments, generating initial bounding boxes and sharing features with a fast residual model (green). The residual model recursively updates predictions using shared features and new events, producing high-frequency outputs between base model cycles, leveraging temporal continuity of event stream to boost throughput. b, Qualitative example of Context-Aware Sampling in sparse-event scenarios. Top row: Baseline model fails to localize the target (red box) as event density drops. Bottom row: Enhanced model detects sparse events, enters an inactive state, and reuses the last correct prediction (dashed green box) until dense events trigger accurate inference, preventing error accumulation. c, Context-Aware Sampling preventing target loss during prolonged inactivity. Top row: Baseline tracker accumulates errors over time and loses the target. Bottom row: Enhanced tracker uses a timer to force reactivation after prolonged inactivity, re-localizing the target before drift causes target loss, balancing efficiency and accuracy.
Quantitative evaluation of model enhancement strategies under STARE. a, Accuracy (AUC) of OSTrack-based (68) variants across sampling window sizes. Curves compare: Baseline (green), +Predictive Motion Extrapolation (+Pred, orange), +Context-Aware Sampling (+C, light blue), +Asynchronous Tracking (trained on 500 Hz annotations, yellow), and +Asynchronous Tracking (trained on 20 Hz annotations, gray). Asynchronous Tracking (500 Hz) combined with Context-Aware Sampling (+Async+C, dark blue) consistently outperforms the other strategies. b, Motion dynamism vs. Asynchronous Tracking effectiveness. Blue solid line: high-dynamism score (left y-axis, negative performance gain from Predictive Motion Extrapolation, also defined as the unpredictability score) for ESOT500-L sequences. Orange dashed line: accuracy improvement (right y-axis) from Asynchronous Tracking. Higher dynamism poses a greater challenge. c, Context-Aware Sampling performance in sparse-event scenarios. Blue line: baseline accuracy (left y-axis). Orange line: accuracy with Context-Aware Sampling (left y-axis). Green dashed line: Sparsity Rate (right y-axis, percentage of model inactivity). Context-Aware Sampling demonstrates robustness in low-motion contexts.
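The qualitative behavior in panels b and c of the previous figure boils down to a simple gating rule: skip inference when target-region event density is too low, reuse the last prediction, and force a reactivation after prolonged inactivity so drift cannot accumulate. The sketch below is our own schematic reading of that rule with hypothetical thresholds and helper names; the actual implementation lives in the enhanced trackers of this repository.

def count_events_inside(events, box):
    """Count events (x, y, t, p) falling inside an [x, y, w, h] box."""
    x, y, w, h = box
    return sum(1 for ex, ey, *_ in events if x <= ex <= x + w and y <= ey <= y + h)

def context_aware_step(tracker, events, last_box, state,
                       density_thresh=50, max_inactive=10):
    """One step of a context-aware sampling loop (illustrative only).
    events: events accumulated since the previous step;
    state['inactive']: number of consecutive skipped inferences."""
    if count_events_inside(events, last_box) < density_thresh \
            and state['inactive'] < max_inactive:
        state['inactive'] += 1       # sparse events: stay inactive and
        return last_box              # reuse the last correct prediction
    state['inactive'] = 0            # dense events, or the inactivity timer forces reactivation:
    return tracker.infer(events, last_box)  # run full inference to re-localize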
The code is based on PyTracking and other similar frameworks. These frameworks are designed for visual object tracking, the main task we adopted to validate STARE in our work.
To support more perception tasks and make it easier for users to integrate them into STARE, we developed a streamlined STARE framework repository. Please refer to STARE (streamlined).
The code is compatible with Linux systems equipped with NVIDIA GPUs. The software versions of the base experimental environment used for testing are:
- Ubuntu 20.04
- Python 3.8
- CUDA 11.3
- PyTorch 1.10.0
For more detailed information about Python dependencies, their versions, and other details, please refer to the exported requirements file lib/stare_conda_env.yml.
1. Download
- Please refer to Demo if you want a quick start.
- Download ESOT500 from our [Hugging Face] datasets repository. The compressed dataset files are about 13.4 GB and 25.7 GB in size; downloading them at a speed of 3 MB/s takes approximately 3.7 hours.
|-- ESOT500
    |-- ESOT500-L
    |   |-- aedat4
    |   |   |-- sequence_name1.aedat4
    |   |   |-- sequence_name2.aedat4
    |   |   :
    |   |-- anno_t
    |   |   |-- sequence_name1.txt
    |   |   |-- sequence_name2.txt
    |   |   :
    |   |-- [{FPS}_w{window}ms]   # For frame-based latency-free evaluation; requires pre-slice preprocessing.
    |   |   :
    |   |-- test.txt
    |   |-- train.txt
    |   |-- [more splits].txt
    |-- ESOT500-H
        :   # Similar to ESOT500-L
- The aedat4 directory contains the raw event data (event stream and corresponding RGB frames); DV is recommended for visualization and dv-python for processing in Python (a minimal reading sketch is shown after this list).
- You can find the metadata file at data/esot500_metadata.json, or download it from our dataset page on [Hugging Face].
- We also provide checkpoint files for several trackers on [Hugging Face].
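For quick inspection of the raw .aedat4 files mentioned above, a minimal dv-python reading sketch is shown below. It assumes dv-python is installed (pip install dv) and that the file contains an 'events' stream; adjust the path to your own download location.

import numpy as np
from dv import AedatFile

# Load all events from one ESOT500 sequence into a structured numpy array.
with AedatFile('ESOT500-L/aedat4/sequence_name1.aedat4') as f:
    events = np.hstack([packet for packet in f['events'].numpy()])

print(events.shape, events.dtype.names)  # fields typically include timestamp, x, y, polarity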
2. Preparation for Frame-Based Latency-Free Evaluation
For frame-based latency-free evaluation, you need to perform a pre-slice preprocessing step, as described in the original paper. Run the following Python command:
python [/PATH/TO/STARE]/lib/event_utils_new/esot500_preprocess.py --path_to_data [/PATH/TO/ESOT500] --fps [FPS] --window [MS]
The arguments FPS and MS should follow the FPS/window combinations below, as shown in Table 2 of the paper:
| FPS \ window (ms) | 2 | 50 | 100 | 150 |
|---|---|---|---|---|
| 500 | 500/2 | 500/50 | 500/100 | 500/150 |
| 250 | 250/2 | 250/50 | 250/100 | 250/150 |
| 20 | 20/2 | 20/50 | 20/100 | 20/150 |
3. Preparation for STARE
To prepare data for STARE experiments, you need to do the following:
python [/PATH/TO/STARE]/lib/event_utils_new/esot500_preprocess.py --path_to_data [/PATH/TO/ESOT500] --fps 500 --window 2
ln -s [/PATH/TO/ESOT500]/500_w2ms [/PATH/TO/ESOT500]/500
As mentioned at the beginning of the Usage section, the code is based on PyTracking and other similar frameworks.
Below are the instructions to configure and run the tracker under PyTracking.
1. Create a virtual environment and install required libraries.
You can use the requirements file lib/stare_conda_env.yml we exported to build the environment. The entire installation process takes about 0.5h to 1h, depending on the network environment.
cd [/PATH/TO/STARE]
conda env create -f ./lib/stare_conda_env.yml --verbose --debug
conda activate stare
Alternatively, since our code is mainly built on PyTracking, you can also refer to lib/pytracking/INSTALL.md for detailed installation and configuration:
conda create -n stare python=3.8
conda activate stare
[pip/conda install ...]
3. Prepare the dataset.
ln -s [/PATH/TO/ESOT500] ./data/ESOT500
3. Go to the working directory of pytracking.
cd ./lib/pytracking
4. Set environment for pytracking.
python -c "from pytracking.evaluation.environment import create_default_local_file; create_default_local_file()"
python -c "from ltr.admin.environment import create_default_local_file; create_default_local_file()"
5. Modify the dataset path settings.esot500_dir in the generated environment setting files.
- for training: ltr/admin/local.py
- for testing: pytracking/evaluation/local.py
- place the pre-trained tracker checkpoint files directly in settings.network_path (see the example below)
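For reference, the relevant edits in the generated local.py files amount to setting the dataset and checkpoint paths. The snippet below is an assumed example (your paths will differ), not a verbatim copy of the generated file.

# pytracking/evaluation/local.py (and analogously ltr/admin/local.py)
settings.esot500_dir = '/your/path/to/ESOT500'       # ESOT500 dataset root
settings.network_path = '/your/path/to/stare_ckpts'  # pre-trained tracker checkpoints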
6. Run frame-based evaluation demo.
(Experiment settings are in the folders pytracking/experiments and pytracking/stream_settings.)
# pre-slice the '20_w50ms' (fps=20 & window=50ms) subset
python [/PATH/TO/STARE]/lib/event_utils_new/esot500_preprocess.py --path_to_data [/PATH/TO/ESOT500] --fps 20 --window 50
# run three trackers (atom, dimp18, and kys) with the 'fps=20 & window=50ms' setting
python pytracking/run_experiment.py exp_frame fast_test_offline
Note:
The details of fast_test_offline setting are as follows:
def fast_test_offline():
    trackers = trackerlist('atom', 'default', range(1)) + \
               trackerlist('dimp', 'dimp18', range(1)) + \
               trackerlist('kys', 'default', range(1))
    dataset = get_dataset('esot_20_50')
    return trackers, dataset
- range(1) means run_id=0, and the tracking results will be saved in pytracking/output/tracking_results/{tracker_name}/{tracker_params}_{run_id}/, e.g. pytracking/output/tracking_results/atom/default_000/
- if you set run_id=None, the tracking results will be saved in pytracking/output/tracking_results/{tracker_name}/{tracker_params}/, e.g. pytracking/output/tracking_results/atom/default/
- you can change the paths by modifying the relevant variables in local.py
7. Run stream-based latency-aware evaluation demo.
(Experiment settings are in the folders pytracking/experiments and pytracking/stream_settings.)
# prepare data for STARE experiments (If you have done this before, you can skip this step.)
python [/PATH/TO/STARE]/lib/event_utils_new/esot500_preprocess.py --path_to_data [/PATH/TO/ESOT500] --fps 500 --window 2
ln -s [/PATH/TO/ESOT500]/500_w2ms [/PATH/TO/ESOT500]/500
# run three trackers (atom, dimp18, and kys) with the 'real streaming & window=20ms' setting
python pytracking/run_experiment_streaming.py exp_stare fast_test_stare
# align the prediction with GT timestamp
python eval/streaming_eval_v3.py exp_stare fast_test_stare
The instructions given are for real-time testing on your own hardware.
If you want to reproduce the results in our paper, please refer to pytracking/stream_settings/s100.
Note:
The details of fast_test_stare setting are as follows:
trackers_fast_test = trackerlist('atom', 'default') + \
                     trackerlist('dimp', 'dimp18') + \
                     trackerlist('kys', 'default')

def fast_test_stare():
    trackers = trackers_fast_test
    dataset = get_dataset('esot500s')
    stream_setting_id = 100  # Default streaming setting, for real-time testing on your own hardware.
    stream_setting = load_stream_setting(f's{stream_setting_id}')
    return trackers, dataset, stream_setting
- currently, the default run_id is None and stream_setting_id=100; the tracking results will eventually be saved in pytracking/output/tracking_results_rt_final/{tracker_name}/{tracker_params}/{stream_setting_id}/, e.g. pytracking/output/tracking_results_rt_final/atom/default/100/
- if you set run_id=0, the tracking results will be saved in pytracking/output/tracking_results_rt_final/{tracker_name}/{tracker_params}_{run_id}/{stream_setting_id}/, e.g. pytracking/output/tracking_results_rt_final/atom/default_000/100/
- you can change the paths by modifying the relevant variables in local.py
8. To evaluate the results, use pytracking/analysis/analysis_results_demo.ipynb.
You can also refer to it to write analysis scripts in your own style.
Note: For tracker enhancement, please see the following section.
These trackers use a framework similar to PyTracking but are not fully integrated into it. Here we take OSTrack and pred_OSTrack as examples to illustrate the usage, including that of the enhancement strategies.
1. Go to the working directory.
cd lib/sotas/[OSTrack or pred_OSTrack]
2. Activate the virtual environment.
conda activate stare
3. Install the missing libraries.
(If you use the requirements file we provide, you can skip this step.)
[pip/conda install ...]
In fact, if you have PyTracking installed, you can simply run the subsequent scripts and install any missing packages reported in the errors. Only a few dependencies differ, and installing them takes a few minutes.
4. Set environment for the tracker.
python -c "from lib.test.evaluation.environment import create_default_local_file; create_default_local_file()"
python -c "from lib.train.admin.environment import create_default_local_file; create_default_local_file()"
5. Modify the dataset path settings.esot500_dir in the generated environment setting files.
- for training: lib/train/admin/local.py
- for testing: lib/test/evaluation/local.py
- place the pre-trained tracker checkpoints in settings.network_path
6. Run frame-based evaluation demo.
python tracking/test.py ostrack esot500mix --dataset_name esot_20_50
As with the trackers under PyTracking, the results are saved by default in lib/test/tracking_results.
Note:
- This does not work for pred_OSTrack.
- For the available dataset_name values, refer to the experiment results listed in our paper.
7. Run stream-based latency-aware evaluation demo without predictive module.
python tracking/test_streaming.py ostrack esot500_baseline s100 --dataset_name esot500s [--runid 66 --use_aas]
python tracking/streaming_eval_v4.py ostrack esot500_baseline s100 --dataset_name esot500s [--runid 66]
As with the trackers under PyTracking, the results are saved by default in lib/test/tracking_results_rt_final.
Note:
- the --use_aas option is currently only available for OSTrack and pred_OSTrack.
- you can change the relevant parameters in streaming_eval_v4.py to fit your own style.
8. Run stream-based latency-aware evaluation demo with predictive module.
# under pred_OSTrack dir
python tracking/test_streaming.py ostrack pred_esot500_4step s100 --dataset_name esot500s --pred_next 1 [--runid 66 --use_aas]
python tracking/streaming_predspeed.py ostrack pred_esot500_4step s100 [--runid 66]
As with the trackers under PyTracking, the results are saved by default in lib/test/tracking_results_rt_final.
Note:
- the --pred_next 1 option is currently only available for pred_OSTrack.
- you can change the relevant parameters in streaming_predspeed.py to fit your own style.
9. To evaluate the results, use lib/test/analysis/analysis_results_demo.ipynb.
You can also refer to it to write analysis scripts in your own style.
Note: The entire process of downloading and running takes approximately 10 minutes.
1. Preparation of Tracker Checkpoints and Sample Data
- Download the dimp18 tracker checkpoints and place them in the settings.network_path directory (as detailed in Usage/trackers-under-pytracking).
- Download the demo sequence airplane5 (which belongs to the test split of the ESOT500 dataset) and place all the downloaded directories/files in the ../data/ESOT500 directory (as detailed in Usage/trackers-under-pytracking).
2. Go to the working directory of pytracking.
cd lib/pytracking
3. Streaming-Based Latency-Aware Tracking
- Run python pytracking/run_tracker_streaming.py to track the target object using dimp18.
4. Visualization
- Then run python pytracking/visualize_stare_result.py to visualize the tracking results.
We also provide a bash script to reproduce almost all the results mentioned in the paper. To use the script, you first need to build the stare conda environment and download the ESOT500 dataset and all the tracker checkpoints we provide, then execute the following commands:
cd [your/path/to/STARE]
export ESOT500_DIR='/your/path/to/ESOT500'
export STARE_CKPTS_DIR='/your/path/to/stare_ckpts'
bash lib/stare.sh 2>&1 | tee stare.log
After this, you can use the corresponding Jupyter notebook (PyTracking trackers: pytracking/analysis/analysis_results_all.ipynb; SOTA trackers: lib/test/analysis/analysis_results_demo.ipynb) to evaluate the results.
If you encounter any issues while using our code or dataset, please feel free to contact us.
- The released code is under the GPL-3.0 license, following PyTracking.
- The released dataset is under CC-BY 4.0 license.
- The benchmark is built on top of the great PyTracking library.
- Thanks to the great works including Stark, MixFormer, OSTrack, and Event-tracking.