rafi-kamal/Aggregate-Spatio-Textual-Query

We've proposed the flexible group spatial keyword query and algorithms to process three variants of the query in the spatial textual domain:

  1. The group nearest neighbor with keywords query, which finds the data object that optimizes the aggregate cost function for the whole group Q of n query objects,
  2. The subgroup nearest neighbor with keywords query, which finds the optimal subgroup of query objects and the data object that optimizes the aggregate cost function for a given subgroup size m (m ≤ n), and
  3. The multiple subgroup nearest neighbor with keywords query, which finds the optimal subgroups and corresponding data objects for each subgroup size in the range [m, n].

We've designed query processing algorithms based on branch-and-bound and best-first paradigms and conducted extensive experiments with two real datasets to show the efficiency of the proposed algorithms.
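
The cost function itself is defined in the paper rather than in this README. In the spatio-textual query literature it is typically a weighted combination of a query object's spatial distance to the data object and their textual dissimilarity, aggregated over the group with SUM or MAX (the two aggregate functions used below). The following Java sketch only illustrates that idea; the parameter alpha and all names here are assumptions, not this repository's API:

    import java.util.List;
    import java.util.Set;

    // Hypothetical sketch of an aggregate spatio-textual cost; the actual
    // cost function is defined in the paper, not in this README.
    final class CostSketch {
        // Combined cost of serving query q from data object p: a weighted
        // sum of spatial distance and textual dissimilarity.
        // alpha in [0, 1] trades off the two components (assumed parameter).
        static double cost(double[] qLoc, Set<Integer> qKeywords,
                           double[] pLoc, Set<Integer> pKeywords, double alpha) {
            double dx = qLoc[0] - pLoc[0];
            double dy = qLoc[1] - pLoc[1];
            double spatial = Math.sqrt(dx * dx + dy * dy);
            // Textual dissimilarity: fraction of query keywords missing from p.
            long matched = qKeywords.stream().filter(pKeywords::contains).count();
            double textual = 1.0 - (double) matched / Math.max(1, qKeywords.size());
            return alpha * spatial + (1 - alpha) * textual;
        }

        // SUM or MAX aggregate over the per-query costs of the group,
        // as selected by the aggregate-function parameter below.
        static double aggregate(List<Double> costs, boolean useMax) {
            return useMax
                    ? costs.stream().mapToDouble(Double::doubleValue).max().orElse(0)
                    : costs.stream().mapToDouble(Double::doubleValue).sum();
        }
    }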

The publication is available at Springer and at arxiv.org.

Input File Formats

  • Location File: Each line contains a location of a data object. Example:

     0,-81.804885,24.550558
     1,-73.985495,40.740067
     2,-71.047843,42.33719
     3,-0.384016,39.474441
     4,-109.4995,38.737861
     ...
    

    The first column contains the object ID, and the second and third columns contain the longitude and latitude of the data object (as the sample values show, longitude comes first). In our dataset, loc.txt is the location file.

  • Keyword File: Each line contains the keywords of a data object. Example:

     0,0,1,2,3,4,5,6,7,8,9
     1,10,11,12
     2,13,14,15
     3,16,17,18,19,20
     4,21,22
    

    The first column contains the object ID and the subsequent columns contain the keyword IDs. In our dataset, words.txt is the keyword file. A minimal sketch for parsing both files is given after this list.
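
Both input files are plain comma-separated text with the object ID in the first column, so they are easy to load. The following sketch (with assumed type and method names; it is not code from this repository) parses them into maps keyed by object ID:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical loader for loc.txt / words.txt in the formats shown above.
    final class InputReader {
        // id -> {longitude, latitude} from the location file.
        static Map<Integer, double[]> readLocations(Path locFile) throws IOException {
            Map<Integer, double[]> locations = new HashMap<>();
            for (String line : Files.readAllLines(locFile)) {
                String[] cols = line.split(",");
                locations.put(Integer.parseInt(cols[0]),
                        new double[] { Double.parseDouble(cols[1]),
                                       Double.parseDouble(cols[2]) });
            }
            return locations;
        }

        // id -> list of keyword IDs from the keyword file.
        static Map<Integer, List<Integer>> readKeywords(Path wordFile) throws IOException {
            Map<Integer, List<Integer>> keywords = new HashMap<>();
            for (String line : Files.readAllLines(wordFile)) {
                String[] cols = line.split(",");
                List<Integer> words = new ArrayList<>();
                for (int i = 1; i < cols.length; i++) {
                    words.add(Integer.parseInt(cols[i]));
                }
                keywords.put(Integer.parseInt(cols[0]), words);
            }
            return keywords;
        }
    }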

Running the Program

  • Run WeightCompute to generate a weighted keyword file.

    WeightCompute words.txt wwords.txt

    This will create a file named wwords.txt in which each keyword has a weight (generated by a language model). It will also print the maximum weight and the number of unique keywords in the keyword file (you may need these later to update the parameter file). A toy stand-in for this step is sketched after this list.

  • The source file utils.Parameters contains some variables specific to the dataset. Edit these parameters if necessary.

  • Run evaluate.bash file with the following parameters:

    • Input directory (which should contain the loc.txt and wwords.txt file)
    • Output directory
    • Aggregate Function Name (MAX for max and SUM for sum)

    Example: ./evaluate.bash ../annk-data/yelp/ ~/Dropbox/Thesis/Results/yelp/max MAX. This will create a set of output files (with the .dat extension) in the output directory. Each file has three columns: the query parameter, the time or I/O spent by the proposed algorithm, and the time or I/O spent by the baseline algorithm.

    Inside the evaluate.bash file, we generate query files for GNNK (gnnk.txt) and SGNNK (sgnnk.txt) using test.QueryGenerator and then run test.Main to measure the CPU time and I/O spent in the experiment. You may need to edit evaluate.bash.

  • Run generate-graph.bash. The input and output directories are hardcoded inside the file (Sorry!). This runs plot.gpl over the .dat files and generates .tex and .eps files as output, which you can use in LaTeX to display the graphs.
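
As promised above, here is a toy stand-in for the WeightCompute step. The README only says the weights are generated by a language model, so the formula below is purely an assumption: it weights each keyword by its unigram probability, i.e. its share of all keyword occurrences in words.txt, and reports the maximum weight and the number of unique keywords like the real tool does. The actual formula and the exact wwords.txt layout may differ:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.HashMap;
    import java.util.Map;

    // Toy stand-in for WeightCompute (hypothetical): weights each keyword by
    // its unigram language-model probability over the whole collection. The
    // real tool's formula and output layout are not documented in this README.
    final class WeightSketch {
        public static void main(String[] args) throws IOException {
            Map<Integer, Long> counts = new HashMap<>();
            long total = 0;
            for (String line : Files.readAllLines(Path.of(args[0]))) {
                String[] cols = line.split(",");
                for (int i = 1; i < cols.length; i++) { // skip the object ID
                    counts.merge(Integer.parseInt(cols[i]), 1L, Long::sum);
                    total++;
                }
            }
            double maxWeight = 0;
            for (Map.Entry<Integer, Long> e : counts.entrySet()) {
                double weight = (double) e.getValue() / total; // unigram probability
                maxWeight = Math.max(maxWeight, weight);
                System.out.println(e.getKey() + "," + weight);
            }
            // Like WeightCompute, report the figures needed for utils.Parameters.
            System.out.println("max weight: " + maxWeight
                    + ", unique keywords: " + counts.size());
        }
    }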
