rafi-kamal/Aggregate-Spatio-Textual-Query

We've proposed the flexible group spatial keyword query and algorithms to process three variants of the query in the spatial textual domain:

  1. The group nearest neighbor with keywords query, which finds the data object that optimizes the aggregate cost function for the whole group Q of n query objects,
  2. The subgroup nearest neighbor with keywords query, which finds the optimal subgroup of query objects and the data object that optimizes the aggregate cost function for a given subgroup size m (m ≤ n), and
  3. The multiple subgroup nearest neighbor with keywords query, which finds the optimal subgroups and corresponding data objects for each subgroup size in the range [m, n].

We've designed query processing algorithms based on branch-and-bound and best-first paradigms and conducted extensive experiments with two real datasets to show the efficiency of the proposed algorithms.
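
The cost function itself is defined in the paper rather than in this README. In the spatio-textual query literature it is typically a weighted combination of a query object's spatial distance to the data object and their textual dissimilarity, aggregated over the group with SUM or MAX (the two aggregate functions used below). The following Java sketch only illustrates that idea; the parameter alpha and all names here are assumptions, not this repository's API:

    import java.util.List;
    import java.util.Set;

    // Hypothetical sketch of an aggregate spatio-textual cost; the actual
    // cost function is defined in the paper, not in this README.
    final class CostSketch {
        // Combined cost of serving query q from data object p: a weighted
        // sum of spatial distance and textual dissimilarity.
        // alpha in [0, 1] trades off the two components (assumed parameter).
        static double cost(double[] qLoc, Set<Integer> qKeywords,
                           double[] pLoc, Set<Integer> pKeywords, double alpha) {
            double dx = qLoc[0] - pLoc[0];
            double dy = qLoc[1] - pLoc[1];
            double spatial = Math.sqrt(dx * dx + dy * dy);
            // Textual dissimilarity: fraction of query keywords missing from p.
            long matched = qKeywords.stream().filter(pKeywords::contains).count();
            double textual = 1.0 - (double) matched / Math.max(1, qKeywords.size());
            return alpha * spatial + (1 - alpha) * textual;
        }

        // SUM or MAX aggregate over the per-query costs of the group,
        // as selected by the aggregate-function parameter below.
        static double aggregate(List<Double> costs, boolean useMax) {
            return useMax
                    ? costs.stream().mapToDouble(Double::doubleValue).max().orElse(0)
                    : costs.stream().mapToDouble(Double::doubleValue).sum();
        }
    }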

The publication is available at Springer and at arxiv.org.

Input File Formats

  • Location File: Each line contains a location of a data object. Example:

     0,-81.804885,24.550558
     1,-73.985495,40.740067
     2,-71.047843,42.33719
     3,-0.384016,39.474441
     4,-109.4995,38.737861
     ...
    

    The first column contains the object ID, and the second and third columns contain the longitude and latitude of the data object (as the sample values show, longitude comes first). In our dataset, loc.txt is the location file.

  • Keyword File: Each line contains the keywords of a data object. Example:

     0,0,1,2,3,4,5,6,7,8,9
     1,10,11,12
     2,13,14,15
     3,16,17,18,19,20
     4,21,22
    

    The first column contains the object ID and the subsequent columns contain the keyword IDs. In our dataset, words.txt is the keyword file. A minimal sketch for parsing both files is given after this list.
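
Both input files are plain comma-separated text with the object ID in the first column, so they are easy to load. The following sketch (with assumed type and method names; it is not code from this repository) parses them into maps keyed by object ID:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical loader for loc.txt / words.txt in the formats shown above.
    final class InputReader {
        // id -> {longitude, latitude} from the location file.
        static Map<Integer, double[]> readLocations(Path locFile) throws IOException {
            Map<Integer, double[]> locations = new HashMap<>();
            for (String line : Files.readAllLines(locFile)) {
                String[] cols = line.split(",");
                locations.put(Integer.parseInt(cols[0]),
                        new double[] { Double.parseDouble(cols[1]),
                                       Double.parseDouble(cols[2]) });
            }
            return locations;
        }

        // id -> list of keyword IDs from the keyword file.
        static Map<Integer, List<Integer>> readKeywords(Path wordFile) throws IOException {
            Map<Integer, List<Integer>> keywords = new HashMap<>();
            for (String line : Files.readAllLines(wordFile)) {
                String[] cols = line.split(",");
                List<Integer> words = new ArrayList<>();
                for (int i = 1; i < cols.length; i++) {
                    words.add(Integer.parseInt(cols[i]));
                }
                keywords.put(Integer.parseInt(cols[0]), words);
            }
            return keywords;
        }
    }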

Running the Program

  • Run WeightCompute to generate a weighted keyword file.

    WeightCompute words.txt wwords.txt

    This will create a file named wwords.txt in which each keyword has a weight (generated by a language model). It will also print the maximum weight and the number of unique keywords in the keyword file (you may need these later to update the parameter file). A toy stand-in for this step is sketched after this list.

  • The source file utils.Parameters contains some variables specific to the dataset. Edit these parameters if necessary.

  • Run evaluate.bash file with the following parameters:

    • Input directory (which should contain the loc.txt and wwords.txt file)
    • Output directory
    • Aggregate Function Name (MAX for max and SUM for sum)

    Example: ./evaluate.bash ../annk-data/yelp/ ~/Dropbox/Thesis/Results/yelp/max MAX. This will create a set of output files (with the .dat extension) in the output directory. Each file has three columns: the query parameter, the time or I/O spent by the proposed algorithm, and the time or I/O spent by the baseline algorithm.

    Inside the evaluate.bash file, we generate query files for GNNK (gnnk.txt) and SGNNK (sgnnk.txt) using test.QueryGenerator and then run test.Main to measure the CPU time and I/O spent in the experiment. You may need to edit evaluate.bash.

  • Run generate-graph.bash. The input and output directories are hardcoded inside the file (Sorry!). This runs plot.gpl over the .dat files and generates .tex and .eps files as output, which you can use in LaTeX to display the graphs.
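
As promised above, here is a toy stand-in for the WeightCompute step. The README only says the weights are generated by a language model, so the formula below is purely an assumption: it weights each keyword by its unigram probability, i.e. its share of all keyword occurrences in words.txt, and reports the maximum weight and the number of unique keywords like the real tool does. The actual formula and the exact wwords.txt layout may differ:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.HashMap;
    import java.util.Map;

    // Toy stand-in for WeightCompute (hypothetical): weights each keyword by
    // its unigram language-model probability over the whole collection. The
    // real tool's formula and output layout are not documented in this README.
    final class WeightSketch {
        public static void main(String[] args) throws IOException {
            Map<Integer, Long> counts = new HashMap<>();
            long total = 0;
            for (String line : Files.readAllLines(Path.of(args[0]))) {
                String[] cols = line.split(",");
                for (int i = 1; i < cols.length; i++) { // skip the object ID
                    counts.merge(Integer.parseInt(cols[i]), 1L, Long::sum);
                    total++;
                }
            }
            double maxWeight = 0;
            for (Map.Entry<Integer, Long> e : counts.entrySet()) {
                double weight = (double) e.getValue() / total; // unigram probability
                maxWeight = Math.max(maxWeight, weight);
                System.out.println(e.getKey() + "," + weight);
            }
            // Like WeightCompute, report the figures needed for utils.Parameters.
            System.out.println("max weight: " + maxWeight
                    + ", unique keywords: " + counts.size());
        }
    }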
