Meylis Matiyev edited this page Dec 12, 2022 · 3 revisions

Introduction

Applications that run on unstructured or semi-structured data spend a considerable amount of their execution time parsing that data. Sparser addresses this bottleneck by filtering the raw data before it is parsed, so that records which cannot match a query are never parsed at all.

There are two key observations that reinforce the idea of filtering:

  • High selectivity: the authors of the Sparser paper show that most queries are highly selective, that is, they match only a small fraction of the records. Skipping the large portion of the data that cannot match brings a real performance gain.
  • Modern hardware: vectorized instructions on modern hardware can be used to make filtering/parsing faster. This observation is irrelevant to environments like the JVM, where (at least currently) you don't have low-level control of the hardware.
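
To make the filter-before-parse idea concrete, here is a minimal sketch in Python. It is not Sparser's implementation: the helper names (`raw_filter`, `filtered_query`) and the sample records are hypothetical, and a plain substring search stands in for Sparser's vectorized filters. The point it illustrates is that the raw filter may let false positives through (they are removed after parsing) but lets most non-matching records skip the parser entirely.

```python
import json

def raw_filter(record: bytes, needle: bytes) -> bool:
    """Cheap pre-filter on raw bytes (hypothetical stand-in for a Sparser filter)."""
    return needle in record

def filtered_query(records, needle, predicate):
    """Parse only the records that pass the raw filter, then apply the exact predicate."""
    results = []
    parsed_count = 0
    for raw in records:
        if not raw_filter(raw, needle):
            continue  # rejected without parsing
        parsed_count += 1
        obj = json.loads(raw)
        if predicate(obj):  # exact check removes false positives
            results.append(obj)
    return results, parsed_count

# Illustrative data only.
records = [
    b'{"user": "alice", "lang": "en"}',
    b'{"user": "bob", "lang": "tr"}',
    b'{"user": "carol", "text": "I speak tr and en"}',
]

matches, parsed = filtered_query(records, b"tr", lambda o: o.get("lang") == "tr")
print(matches, parsed)
```

In this sketch the first record is never parsed, the third is a false positive of the raw filter that the exact predicate discards, and only the second is returned.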

Limitations

Limitations on predicate support:

  • Doesn't support equality for data types whose values can be encoded in different ways. For example, numeric equality is not supported in JSON, because the same number can be written as both "3.4" and "34e-1".
  • Doesn't support inequality for string values (???).
  • The Key-Value Match filter is only valid for data formats such as JSON, where keys explicitly exist in each record.
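
The first and third limitations can be illustrated with a short Python sketch. It is an illustration under assumptions rather than Sparser's code: `raw_contains` is a hypothetical substring check standing in for a raw filter, and the records are made up. It shows that two JSON encodings of the same number parse to equal values while a byte-level search for one encoding misses the other, and that a key-value pattern has nothing to match in a format without explicit keys.

```python
import json

def raw_contains(record: bytes, needle: bytes) -> bool:
    """Hypothetical raw filter: plain substring search on unparsed bytes."""
    return needle in record

# Two encodings of the same JSON number.
rec_a = b'{"price": 3.4}'
rec_b = b'{"price": 34e-1}'

same_after_parse = json.loads(rec_a)["price"] == json.loads(rec_b)["price"]
hit_a = raw_contains(rec_a, b"3.4")
hit_b = raw_contains(rec_b, b"3.4")  # a false negative for price == 3.4

# A key-value pattern needs the key to appear in the record itself.
json_rec = b'{"user": "bob", "lang": "tr"}'
csv_rec = b"bob,tr"  # same data, but the key "lang" is not in the record
kv_needle = b'"lang": "tr"'
kv_hit_json = raw_contains(json_rec, kv_needle)
kv_hit_csv = raw_contains(csv_rec, kv_needle)

print(same_after_parse, hit_a, hit_b, kv_hit_json, kv_hit_csv)
```

A raw filter must never produce false negatives, which is why an equality predicate like this one cannot safely be pushed down to the byte level.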