Semantic Query Support
Mainstream file systems lack native semantic query support. Users who want to run semantic queries have to index the file system metadata or extended attributes themselves.
For example, to answer the query "give me the list of files created in the last week", you may have to run a brute-force search over the whole file system. If the file system supported native semantic queries, you could simply issue something like "search whole_fs 'range: ctime in LAST_WEEK'". A semantic query engine answers such a query by searching the 'ctime' indexes in a distributed fashion.
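To make the contrast concrete, here is a minimal single-machine sketch in Python. It is not PomegranateFS code: `brute_force_range` and `CtimeIndex` are hypothetical names, the file list is in-memory sample data, and the distribution of indexes across servers is not modelled at all.

```python
import bisect

# Hypothetical sample metadata: (path, ctime in seconds).
FILES = [
    ("/a/report.txt", 100),
    ("/a/photo.png", 250),
    ("/b/notes.md", 300),
    ("/b/old.log", 20),
]

def brute_force_range(files, lo, hi):
    """Scan every entry; cost grows with the total number of files."""
    return sorted(path for path, ctime in files if lo <= ctime <= hi)

class CtimeIndex:
    """A sorted (ctime, path) list standing in for a 'ctime' index."""

    def __init__(self, files):
        self._entries = sorted((ctime, path) for path, ctime in files)
        self._keys = [ctime for ctime, _ in self._entries]

    def range(self, lo, hi):
        """Binary-search both bounds, then return only the matching slice."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_right(self._keys, hi)
        return sorted(path for _, path in self._entries[start:end])

index = CtimeIndex(FILES)
print(index.range(200, 400))  # ['/a/photo.png', '/b/notes.md']
```

Both functions return the same files; the difference is that the index touches only the matching slice, while the scan visits every entry.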
Native semantic query in PomegranateFS has the following features:
- Stream indexer which can be reconfigured at any time;
- User-defined indexer which can index many different standard or extended attributes;
- Some predefined analysis operators which provide statistical information;
- Integrated framework built into the file system to index files automatically.
There is no standard for semantic queries in file systems, so we have to build our own 'standard'. We decided to reuse the POSIX interface of extended attributes. In detail, we build a special (operational) namespace "pfs" in the extended attributes. Operations in this namespace are transformed into semantic queries automatically.
| Class | Column | Operation | Other region | Note or Example |
|---|---|---|---|---|
| native | [0-5] | read | .offset.len | If len == -1, read whole content. |
| | | write | [.len] | Length is optional. |
| | | lookup | {return column info} triple: "itbid.len.offset" | |
| dt | ignore | create | .type.where.priority.local_path | |
| | | cat | | |
| | | clear | | |
| branch | ignore | create | .name.tag.level.op_list | |
| | | delete | .name | |
| tag | [0-5] | set | .B.kv_list | |
| | | delete | .B.key | |
| | | update | .B.key.value | |
| | | test | .B.key | |
| | | search | .B.dbname.prefix.search_expr | |
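As a rough illustration of how the fields in the table might combine into one extended-attribute name, the Python sketch below joins them with dots. The concatenation order is an assumption (the table lists the fields but not how they are assembled), and `pfs_xattr_name` is a hypothetical helper, not part of PomegranateFS.

```python
def pfs_xattr_name(cls, operation, column=0, other=()):
    """Compose an attribute name in the "pfs" namespace.

    ASSUMPTION: fields are joined as
    pfs.<class>.<column>.<operation>.<other region fields...>.
    For the dt and branch classes the table marks the column as ignored,
    so this sketch simply emits the default 0 there.
    """
    if cls not in {"native", "dt", "branch", "tag"}:
        raise ValueError(f"unknown class: {cls!r}")
    if not 0 <= column <= 5:
        raise ValueError("column must be in [0-5]")
    parts = ["pfs", cls, str(column), operation]
    parts.extend(str(field) for field in other)
    return ".".join(parts)

# Read the whole content of column 1 (len == -1 means "whole content").
print(pfs_xattr_name("native", "read", column=1, other=(0, -1)))
# Set a kv_list on branch "hello_branch".
print(pfs_xattr_name("tag", "set", other=("B:hello_branch", "type=png;")))
```

On Linux, a name built this way could then be handed to an extended-attribute call such as Python's `os.getxattr` or `os.setxattr`; whether an additional prefix is required depends on the deployment and is not stated here.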
Some of the atomic placeholders (B, search_expr, etc.) are defined in the following table:
| Name | Definition | Examples |
|---|---|---|
| op_list | `filter:id:rid:[l\|r]:reg`<br>`sum:id:rid:[l\|r]:reg:[left\|right\|all\|match]`<br>`count:id:rid:[l\|r]:reg:[left\|right\|all\|match]`<br>`avg:id:rid:[l\|r]:reg:[left\|right\|all\|match]`<br>`max:id:rid:[l\|r]:reg:[left\|right\|all\|match]`<br>`min:id:rid:[l\|r]:reg:[left\|right\|all\|match]`<br>`knn:id:rid:[l\|r]:reg:[left\|right\|all\|match]:[linear\|xlinear]:center:+/-distance`<br>`groupby:id:rid:[l\|r]:reg:[left\|right\|all\|match]:sum/avg/max/min/count`<br>`indexer:id:rid:[l\|r]:[plain\|bdb]:dbname:prefix` | `filter:1:0:l:.*;`<br>`count:2:0:r:.*:all;`<br>`avg:3:1:l:.*:all;`<br>`sum:4:1:r:.*:all;`<br>`max:5:2:l:.*:all;`<br>`min:6:2:r:.*:all;`<br>`knn:7:3:l:.*:linear:100:+-10;`<br>`groupby:8:3:r:.*:all:sum/avg/max/min;`<br>`indexer:9:4:l:bdb:DB:00;` |
| kv_list | `key=value;key2=value2;...` | `type=png;@ctime=10000;` |
| B | `B[:branch_name[:key1[:key2[...]]]]` | `B:hello_branch:type:@ctime` |
| search_expr | `[r\|p]: key [=<>] value [&\|] key2 [=<>] value2`<br>r => range query; p => point query | `r: type=png & @ctime > 100` |
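The placeholder grammars above are small enough to tokenize by hand. The following Python sketch parses `kv_list`, `B` and `search_expr` strings as they appear in the Examples column. It is only an illustration: the function names are hypothetical, values are kept as strings, and because the table does not say how `&` and `|` are grouped, the sketch returns the connectives in order without evaluating them.

```python
import re

def parse_kv_list(text):
    """Parse 'key=value;key2=value2;' into a dict of strings."""
    pairs = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate the trailing ';'
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed kv pair: {item!r}")
        pairs[key] = value
    return pairs

def parse_b(text):
    """Parse 'B[:branch_name[:key1[:key2...]]]' into (branch, keys)."""
    parts = text.split(":")
    if parts[0] != "B":
        raise ValueError("placeholder must start with 'B'")
    branch = parts[1] if len(parts) > 1 else None
    return branch, parts[2:]

# One "key op value" term; op is one of =, <, >.
_TERM = re.compile(r"\s*([^\s=<>&|]+)\s*([=<>])\s*([^\s=<>&|]+)\s*")

def parse_search_expr(text):
    """Parse '[r|p]: key op value [&|] ...' into (kind, terms, connectives)."""
    kind, sep, body = text.partition(":")
    kind = kind.strip()
    if not sep or kind not in ("r", "p"):
        raise ValueError("expression must start with 'r:' or 'p:'")
    # Split on & and | while keeping the connectives at odd positions.
    tokens = re.split(r"([&|])", body)
    terms, connectives = [], []
    for i, token in enumerate(tokens):
        if i % 2:
            connectives.append(token)
            continue
        match = _TERM.fullmatch(token)
        if match is None:
            raise ValueError(f"malformed term: {token!r}")
        terms.append(match.groups())
    return ("range" if kind == "r" else "point"), terms, connectives

print(parse_kv_list("type=png;@ctime=10000;"))
print(parse_b("B:hello_branch:type:@ctime"))
print(parse_search_expr("r: type=png & @ctime > 100"))
```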
Analyzing, Indexing and Searching Streams of File System triggered Events