Semantic Query Support

Features

Mainstream file systems lack native semantic query support. Users who want to run semantic queries have to index the file system metadata or extended attributes themselves.

For example, to answer the query "give me a list of the files that were created in the last week", you may have to run a brute-force search over the whole file system. If the file system supported native semantic queries, you could instead issue a query such as "search whole_fs 'range: ctime in LAST_WEEK'". A semantic query engine answers this query with a distributed search over the 'ctime' indexes.
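To make the cost of the brute-force approach concrete, here is a minimal Python sketch of it, using only the standard library. The function names are hypothetical and this is not PomegranateFS code. Note that on Unix `st_ctime` is the inode change time rather than a true creation time, so the sketch only approximates "created in the last week".

```python
import os
import time

WEEK_SECONDS = 7 * 24 * 60 * 60


def files_changed_since(root, cutoff):
    """Brute force: walk the whole tree and stat every file."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_ctime >= cutoff:
                matches.append(path)
    return sorted(matches)


def files_from_last_week(root, now=None):
    """Return files whose ctime falls within the last seven days."""
    now = time.time() if now is None else now
    return files_changed_since(root, now - WEEK_SECONDS)
```

Every call touches every file under `root`, so the cost grows with the size of the file system rather than with the size of the result. That is the work a 'ctime' index is meant to avoid.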

Native semantic query in PomegranateFS has the following features:

Stream indexer which can be re-configured at any time;
User defined indexers which can index many different standard or extended attributes;
Some predefined analysis operators which provide statistical information;
Integrated framework built into the file system to index files automatically.

Query Interface

There is no standard for semantic queries in file systems, so we have to build our own 'standard'. Basically, we decided to reuse the POSIX interface of extended attributes. In detail, we build a special (operational) namespace "pfs" in the extended attributes. Operations in this namespace are transformed into semantic queries automatically.

Class	Column	Operation	Other region	Note or Example
native	[0-5]	read	.offset.len	If len == -1, read whole content.
		write	[.len]	Length is optional.
		lookup		{return column info} triple: "itbid.len.offset"
dt	ignore	create	.type.where.priority.local_path
		cat
		clear
branch	ignore	create	.name.tag.level.op_list
		delete	.name
tag	[0-5]	set	.B.kv_list
		delete	.B.key
		update	.B.key.value
		test	.B.key
		search	.B.dbname.prefix.search_expr
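
The table lists the fields of an operation but does not spell out how they are combined into one attribute name. As a rough sketch, assuming the fields are joined with dots behind the "pfs" namespace (an inference from the leading dots in the "Other region" column, not a documented format), a helper could look like this. The function name `pfs_attr_name` is hypothetical.

```python
def pfs_attr_name(cls, column, operation, *region):
    """Join the table fields into one dotted name (assumed format).

    cls       -- "native", "dt", "branch" or "tag"
    column    -- column number; the table marks it as ignored for
                 "dt" and "branch", whether it may be omitted is unknown
    operation -- e.g. "read", "set", "search"
    region    -- the "Other region" fields, in table order
    """
    parts = ["pfs", cls, str(column), operation]
    parts.extend(str(field) for field in region)
    return ".".join(parts)


# Read the whole content of column 0: offset 0, len -1.
read_all = pfs_attr_name("native", 0, "read", 0, -1)

# Tag a file in branch "hello_branch" with a kv_list.
set_tag = pfs_attr_name("tag", 1, "set", "B:hello_branch",
                        "type=png;@ctime=10000;")
```

Under that assumption `read_all` is `pfs.native.0.read.0.-1`. In Python on Linux, such a name would be passed to `os.getxattr` or `os.setxattr`; which of the two each operation expects is not stated here.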

Some of the atomic placeholders (B, search_expr, etc.) are defined in the following table:

Name	Definition	Examples
op_list	filter:id:rid:[l|r]:reg sum:id:rid:[l|r]:reg:[left|right|all|match] count:id:rid:[l|r]:reg:[left|right|all|match] avg:id:rid:[l|r]:reg:[left|right|all|match] max:id:rid:[l|r]:reg:[left|right|all|match] min:id:rid:[l|r]:reg:[left|right|all|match] knn:id:rid:[l|r]:reg:[left|right|all|match]:[linear|xlinear]:center:+/-distance groupby:id:rid:[l|r]:reg:[left|right|all|match]:sum/avg/max/min/count indexer:id:rid:[l|r]:[plain|bdb]:dbname:prefix	filter:1:0:l:.; count:2:0:r:.:all; avg:3:1:l:.:all; sum:4:1:r:.:all; max:5:2:l:.:all; min:6:2:r:.:all; knn:7:3:l:.:linear:100:+-10; groupby:8:3:r:.:all:sum/avg/max/min; indexer:9:4:l:bdb:DB:00;
kv_list	key=value;key2=value2;...	type=png;@ctime=10000;
B	B[:branch_name[:key1[:key2[...]]]]	B:hello_branch:type:@ctime
search_expr	[r|p]: key [=<>] value [&|] key2 [=<>] value2 (r => range query; p => point query)	r: type=png & @ctime > 100
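
The placeholder grammars above are small enough to parse by hand. The following Python sketch shows one way to read kv_list, B and search_expr strings. It is based only on the definitions and examples in the table, not on the PomegranateFS parser. All function names are hypothetical, and values containing ';', ':', '&' or '|' are not handled.

```python
import re


def parse_kv_list(text):
    """Parse 'key=value;key2=value2;' into a dict; empty items are skipped."""
    pairs = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed kv item: {item!r}")
        pairs[key.strip()] = value.strip()
    return pairs


def parse_b(text):
    """Parse 'B[:branch_name[:key1[:key2...]]]' into (branch_name, keys)."""
    head, *rest = text.split(":")
    if head != "B":
        raise ValueError(f"expected leading 'B': {text!r}")
    branch = rest[0] if rest else None
    return branch, rest[1:]


# One condition: key, one of = < >, then the value.
_COND = re.compile(r"^\s*([^\s=<>]+)\s*([=<>])\s*(.+?)\s*$")


def parse_search_expr(text):
    """Parse '[r|p]: key op value [&|] key2 op value2 ...'."""
    kind, sep, body = text.partition(":")
    kind = kind.strip()
    if not sep or kind not in ("r", "p"):
        raise ValueError(f"expected 'r:' or 'p:' prefix: {text!r}")
    # Splitting on a captured group keeps the '&' / '|' connectives.
    tokens = re.split(r"\s*([&|])\s*", body)
    conditions = []
    for token in tokens[0::2]:
        match = _COND.match(token)
        if match is None:
            raise ValueError(f"malformed condition: {token!r}")
        conditions.append(match.groups())
    return {
        "kind": "range" if kind == "r" else "point",
        "conditions": conditions,
        "connectives": tokens[1::2],
    }
```

For the example in the table, `parse_search_expr("r: type=png & @ctime > 100")` yields a range query with the two conditions `("type", "=", "png")` and `("@ctime", ">", "100")` joined by `&`. Values are kept as strings because the table does not say how they are typed.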
