We've proposed the flexible group spatial keyword query and algorithms to process three variants of the query in the spatial textual domain:

- The group nearest neighbor with keywords (GNNK) query, which finds the data object that optimizes the aggregate cost function for the whole group Q of n query objects,
- The subgroup nearest neighbor with keywords (SGNNK) query, which finds the optimal subgroup of query objects and the data object that optimizes the aggregate cost function for a given subgroup size m (m ≤ n), and
- The multiple subgroup nearest neighbor with keywords query, which finds the optimal subgroups and corresponding data objects for each subgroup size in the range [m, n].
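The exact aggregate cost model is defined in the paper; as a minimal sketch only, the class and method names below are hypothetical and show just how SUM and MAX aggregation (the two functions the evaluation script accepts) combine per-query-object costs into a group cost:

```java
import java.util.List;

// Illustrative sketch (names are assumptions, not the repository's classes):
// each query object contributes a cost for a candidate data object, and the
// group cost is the SUM or MAX of those per-object costs.
public class AggregateCost {

    public enum Function { SUM, MAX }

    // Aggregate the per-query-object costs of one candidate data object.
    public static double aggregate(List<Double> costs, Function f) {
        double result = (f == Function.SUM) ? 0.0 : Double.NEGATIVE_INFINITY;
        for (double c : costs) {
            result = (f == Function.SUM) ? result + c : Math.max(result, c);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Double> costs = List.of(1.5, 3.0, 2.0);
        System.out.println(aggregate(costs, Function.SUM)); // 6.5
        System.out.println(aggregate(costs, Function.MAX)); // 3.0
    }
}
```

The query variants then differ in which set of query objects (the whole group, one subgroup of size m, or every subgroup size in [m, n]) the aggregation runs over.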
We've designed query processing algorithms based on branch-and-bound and best-first paradigms and conducted extensive experiments with two real datasets to show the efficiency of the proposed algorithms.
The publication is available at Springer and at arxiv.org.
- Location file: Each line contains the location of a data object. Example:

  ```
  0,-81.804885,24.550558
  1,-73.985495,40.740067
  2,-71.047843,42.33719
  3,-0.384016,39.474441
  4,-109.4995,38.737861
  ...
  ```

  The first column contains the object ID, and the second and third columns contain the longitude and latitude of the data object. In our dataset, `loc.txt` is the location file.
- Keyword file: Each line contains the keywords of a data object. Example:

  ```
  0,0,1,2,3,4,5,6,7,8,9
  1,10,11,12
  2,13,14,15
  3,16,17,18,19,20
  4,21,22
  ```

  The first column contains the object ID, and the subsequent columns contain the keywords (as integer IDs). In our dataset, `words.txt` is the keyword file.
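The two file formats above are simple CSV; a minimal parsing sketch (class and method names are illustrative, not the repository's actual code) could look like:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative parser for the two input files; not the repository's classes.
public class InputParser {

    // Parse one line of the location file: "id,coordinate1,coordinate2".
    public static double[] parseLocation(String line) {
        String[] parts = line.split(",");
        // parts[0] is the object ID; parts[1] and parts[2] are coordinates.
        return new double[] { Double.parseDouble(parts[1]),
                              Double.parseDouble(parts[2]) };
    }

    // Parse one line of the keyword file: "id,kw1,kw2,...".
    public static List<Integer> parseKeywords(String line) {
        String[] parts = line.split(",");
        List<Integer> keywords = new ArrayList<>();
        for (int i = 1; i < parts.length; i++) {
            keywords.add(Integer.parseInt(parts[i]));
        }
        return keywords;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseLocation("0,-81.804885,24.550558")));
        System.out.println(parseKeywords("1,10,11,12"));
    }
}
```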
- Run `WeightCompute` to generate a weighted keyword file:

  ```
  WeightCompute words.txt wwords.txt
  ```

  This will create a file named `wwords.txt` where each keyword has a weight (generated by a language model). It will also print the maximum weight and the number of unique keywords in the keyword file (which you might need later to change the parameter file).
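The actual weights come from the language model inside `WeightCompute`; purely as an illustration of what a keyword-weighting pass over `words.txt` looks like, here is an assumed IDF-style frequency scheme (not the repository's actual weighting):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: assigns each keyword a weight from its document
// frequency. This is an assumed scheme, not WeightCompute's language model.
public class KeywordWeights {

    // lines: the keyword file contents, one "id,kw1,kw2,..." record per line.
    public static Map<Integer, Double> weights(List<String> lines) {
        Map<Integer, Integer> freq = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            for (int i = 1; i < parts.length; i++) {
                freq.merge(Integer.parseInt(parts[i]), 1, Integer::sum);
            }
        }
        Map<Integer, Double> w = new HashMap<>();
        for (Map.Entry<Integer, Integer> e : freq.entrySet()) {
            // Rarer keywords get higher weights.
            w.put(e.getKey(), Math.log((double) lines.size() / e.getValue()));
        }
        return w;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("0,1,2", "1,2,3");
        // Keyword 2 appears in every record, so its weight is log(1) = 0.
        System.out.println(weights(lines).get(2)); // 0.0
    }
}
```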
- The source file `utils.Parameters` contains some variables specific to the dataset. Edit these parameters if necessary.
- Run the `evaluate.bash` script with the following parameters:
  - Input directory (which should contain the `loc.txt` and `wwords.txt` files)
  - Output directory
  - Aggregate function name (`MAX` for max and `SUM` for sum)

  Example:

  ```
  ./evaluate.bash ../annk-data/yelp/ ~/Dropbox/Thesis/Results/yelp/max MAX
  ```

  This will create a set of output files (with the `.dat` extension) in the output directory. Each file has three columns: the first is the query parameter, the second is the time or I/O spent by the proposed algorithm, and the third is the time or I/O spent by the baseline algorithm. Inside the `evaluate.bash` file, we generate query files for GNNK (`gnnk.txt`) and SGNNK (`sgnnk.txt`) using `test.QueryGenerator`, and then run `test.Main` to measure the CPU time and I/O spent in the experiment. You might need to edit `evaluate.bash`.
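If you want to post-process the `.dat` files yourself, a minimal sketch of reading one row is below. It assumes the three columns are whitespace-separated (the usual gnuplot convention); adjust the split if the files use another delimiter:

```java
// Illustrative reader for one .dat row; assumes whitespace-separated columns.
public class DatLine {

    // Returns {queryParameter, proposedCost, baselineCost}.
    public static double[] parse(String line) {
        String[] cols = line.trim().split("\\s+");
        return new double[] { Double.parseDouble(cols[0]),
                              Double.parseDouble(cols[1]),
                              Double.parseDouble(cols[2]) };
    }

    // Ratio of baseline cost to proposed cost; a value above 1 means the
    // proposed algorithm spent less time or I/O than the baseline.
    public static double speedup(double[] row) {
        return row[2] / row[1];
    }

    public static void main(String[] args) {
        double[] row = parse("4 10.0 25.0"); // hypothetical sample row
        System.out.println(speedup(row)); // 2.5
    }
}
```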
- Run `generate-graph.bash`. The input and output directories are hardcoded inside the file (sorry!). This runs `plot.gpl` over the `.dat` files and generates `.tex` and `.eps` files as output. You can use these files in LaTeX to show the graphs.