Some experiments in grafting a Berkeley Packet Filter (BPF) onto a Column-oriented Database and providing an SQL like interface to the user.
Of course this results in some constraints.
Treat the data set as a spreadsheet where each column is an integer metric (use a map for strings) and represented by a separate file. Each row is a record and its metrics are found at the same location in each file, thus the files all have the same length and record count.
Preflight
sudo apt-get install build-essential
git clone https://github.com/jimdigriz/bpf-sql.git
cd bpf-sql
ln -f -s programs/filter-by-hour.c program.c
make help
make
Data Preparation
The on-disk format used for each column file is just a raw list of 64bit signed integers stored in big-endian format, so when converting your own data set you can use a tab separated input and some Perl to generate each metric file (named metricX.bin):
cat input_data | perl -ane 'foreach (0..$#F) { open($f[$_], ">", "metric".$_.".bin") if ($.==1); print {$f[$_]} pack "q>", $F[$_] }'
Generating Fake Data
Random
The following will generate you a column of data:
SIZE=8 # 1=8bit, 2=16bit, 4=32bit, 8=64bit
TYPE=d # d=integer, u=unsigned int
NREC=10**8 # number of records you want (100m)
od -v -An -w$SIZE -t $TYPE$SIZE -N $((SIZE*NREC)) /dev/urandom
Engine
The engine closely resembles the BPF filtering engine described in the BSD BPF manpage which is closely modelled after the Steven McCanne's and Van Jacobson's BPF paper.
Element Description
A 64 bit wide accumulator
X 64 bit wide X register
M[] BPF_MEMWORDS x 64 bit wide misc registers aka "scratch
memory store", addressable from 0 BPF_MEMWORDS-1
C[] NCOL x 64bit wide read-only column registers that
have the current row record data (akin to BPF's P[])
G Non-accessible global storage containing results
R[] (nkeys+width) x 64bit wide results registers (see below)
The following instructions have had their action slightly amended:
BPF_LD+BPF_ABS A <- C[k]
BPF_LD+BPF_IND A <- C[X + k]
BPF_RET Return code unused, for now always zero
N.B. all ALU operations are as 64bit signed integer math. Any unsigned 64bit integers operations as signed 64bit integer is usually safe in that you should be able to cast them back correctly, though only as long as you use the arithmetic operators wisely (ie. avoid JMP_JG[TE] and BPF_NEG)
The following load/store/find BPF_REC (record) instructions have been added:
BPF_LD+BPF_REC A <- R[k]
BPF_ST+BPF_REC R[k] <- A
BPF_MISC+BPF_LDR R <- G
BPF_MISC+BPF_STR G <- R
R Register Usage
The R[] register is special as it stores records loaded in and from G, and is the set of registers you use to get data into G. The first nkeys registers group together to form an unique tuple that is used to reference a single record whilst the following remaining width registers make up the data portion.
For example, if nkeys is 2 and width is 3, then the following plays out as such:
- set the registers
R[0]=7andR[1]=2 - call
BPF_MISC+BPF_LDRto fetch the record, as it does not exist, it is created and initialised to zero - set
R[2]=-10, setR[3]=3andR[4]=18 - call
BPF_MISC+BPF_STRto store the record, as it exists, it is updated - call
BPF_RET
This will in the output create the resulting row:
7,2,-10,3,18
Where 7,2 make up your key, and -10,3,18 is the result data associated to it.
Notes
- if
BPF_MISC+BPF_LDRcannot find the record, then the data portion ofRis initialised to zero - if you call
BPF_RETbefore callingBPF_MISC+BPF_STR, your changes are discarded - records cannot be deleted once created
- you can create and update several records for a single run of your program over
C[]
TODO
In roughly order of importance:
- only create records in
GwhenBPF_MISC+BPF_STRis called, not whenBPF_MISC+BPF_LDRis - would it be helpful to be able to distingush between record existing or not, and how that would be done
- I need a way better data structure
- tool to generate mock data to experiment with
- add stepping debugging support
- improve the profiling visibility
- frequency analysis
- element distinctness/uniqueness
- intersection analysis (Venn)
- INT32-C. Ensure that operations on signed integers do not result in overflow - maybe look to OS X's checkint(3)
- alternative engine primitives, BPF not well suited due to all the indirect pointer dereferencing everywhere maybe?
- SQL to BPF converter
- BPF checker to simplify the engine
- BPF optimiser
- steroids:
- What Every Programmer Should Know About Memory (and What Every Computer Scientist Should Know About Floating Point Arithmetic)
malloc()tuningposix_madvise()- GCC Optimization's
- Profile Guided Optimisations (PGO) - using
-fprofile-generateand-fprofile-use __builtin_prefetch- Auto-vectorization with gcc 4.7
- investigate Blosc and its c-blosc library
- support an approximation 'turbo' Zipfian mode and use sketches: