|
|
@@ -0,0 +1,398 @@ |
|
|
<?xml version="1.0" encoding="UTF-8"?> |
|
|
<!-- |
|
|
~ Hibernate, Relational Persistence for Idiomatic Java |
|
|
~ |
|
|
~ Copyright (c) 2010, Red Hat, Inc. and/or its affiliates or third-party contributors as |
|
|
~ indicated by the @author tags or express copyright attribution |
|
|
~ statements applied by the authors. All third-party contributions are |
|
|
~ distributed under license by Red Hat, Inc. |
|
|
~ |
|
|
~ This copyrighted material is made available to anyone wishing to use, modify, |
|
|
~ copy, or redistribute it subject to the terms and conditions of the GNU |
|
|
~ Lesser General Public License, as published by the Free Software Foundation. |
|
|
~ |
|
|
~ This program is distributed in the hope that it will be useful, |
|
|
~ but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY |
|
|
~ or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License |
|
|
~ for more details. |
|
|
~ |
|
|
~ You should have received a copy of the GNU Lesser General Public License |
|
|
~ along with this distribution; if not, write to: |
|
|
~ Free Software Foundation, Inc. |
|
|
~ 51 Franklin Street, Fifth Floor |
|
|
~ Boston, MA 02110-1301 USA |
|
|
--> |
|
|
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" |
|
|
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [ |
|
|
<!ENTITY % BOOK_ENTITIES SYSTEM "../hsearch.ent"> |
|
|
%BOOK_ENTITIES; |
|
|
]> |
|
|
<chapter id="spatial"> |
|
|
<title>Spatial</title> |
|
|
|
|
|
<para>The spatial functionalities of Hibernate enable indexed entities with |
|
|
latitude and longitude fields to be spatialy indexed then searched and |
|
|
filtered by defining a search discus, providing a center and a radius.</para> |
|
|
<para>Typical usage is to find nearby hotels (between 500m and 5 km) in a 5 |
|
|
millions worldwide data set.</para> |
|
|
|
|
|
<section id="spatial-goals"> |
|
|
<title>Goals</title> |
|
|
|
|
|
<para>The spatial support of Hibernate Search has a few goals :</para> |
|
|
|
|
|
<itemizedlist> |
|
|
<listitem> |
|
|
<para>Enable spatial search on entities (find entities within x km |
|
|
from location (latitude, longitude) on earth)</para> |
|
|
</listitem> |
|
|
<listitem> |
|
|
<para>Provide simple method for spatial index via annotations</para> |
|
|
</listitem> |
|
|
<listitem> |
|
|
<para>Provide simple methods for querying (QueryBuilder + DSL)</para> |
|
|
</listitem> |
|
|
<listitem> |
|
|
<para>Hide geographical complexity</para> |
|
|
</listitem> |
|
|
</itemizedlist> |
|
|
</section> |
|
|
|
|
|
<section id="spatial-strategy"> |
|
|
<title>Choosing an indexing strategy</title> |
|
|
|
|
|
<para>There are two strategy you can choose in HS spatial to index the |
|
|
latitude,longitude) properties of your entities :</para> |
|
|
|
|
|
<itemizedlist> |
|
|
<listitem> |
|
|
<para>like numbers formatted for range queries</para> |
|
|
</listitem> |
|
|
<listitem> |
|
|
<para>in a grid (Quad-Tree) for two stage spatial queries</para> |
|
|
</listitem> |
|
|
</itemizedlist> |
|
|
|
|
|
<section id="spatial-strategy-range"> |
|
|
<title>Double range queries</title> |
|
|
|
|
|
<para>Pros :</para> |
|
|
<itemizedlist> |
|
|
<listitem> |
|
|
<para>Quick on small data sets (< 100k entities)</para> |
|
|
</listitem> |
|
|
<listitem> |
|
|
<para>Straigh forward to debug/analyze</para> |
|
|
</listitem> |
|
|
<listitem> |
|
|
<para>Index size is moderate</para> |
|
|
</listitem> |
|
|
</itemizedlist> |
|
|
|
|
|
<para>Cons :</para> |
|
|
<itemizedlist> |
|
|
<listitem> |
|
|
<para>Poor performance on large data sets</para> |
|
|
</listitem> |
|
|
<listitem> |
|
|
<para>Poor performance on worldwide data set (when data on US |
|
|
/ Europe / Asia collide because of same latitude)</para> |
|
|
</listitem> |
|
|
</itemizedlist> |
|
|
</section> |
|
|
|
|
|
<section id="spatial-strategy-grid"> |
|
|
<title>Two stages spatial queries on grid</title> |
|
|
|
|
|
<para>Pros :</para> |
|
|
<itemizedlist> |
|
|
<listitem> |
|
|
<para>Good performance even with large data sets</para> |
|
|
</listitem> |
|
|
<listitem> |
|
|
<para>World wide data distribution independant</para> |
|
|
</listitem> |
|
|
</itemizedlist> |
|
|
|
|
|
<para>Cons :</para> |
|
|
<itemizedlist> |
|
|
<listitem> |
|
|
<para>Index size is big (de normalization)</para> |
|
|
</listitem> |
|
|
<listitem> |
|
|
<para>Index size is big (de normalization)</para> |
|
|
</listitem> |
|
|
</itemizedlist> |
|
|
|
|
|
</section> |
|
|
</section> |
|
|
|
|
|
<section id="spatial-range-queries"> |
|
|
<title>Range query indexing and querying</title> |
|
|
|
|
|
<para>To index your entities for range querying you :</para> |
|
|
<itemizedlist> |
|
|
<listitem> |
|
|
<para>have two double properties latitude and longitude marked with @Field</para> |
|
|
</listitem> |
|
|
<listitem> |
|
|
<para>implement the Coordinates interface </para> |
|
|
</listitem> |
|
|
<listitem> |
|
|
<para>add the @Spatial annotation at class level</para> |
|
|
</listitem> |
|
|
</itemizedlist> |
|
|
|
|
|
<example> |
|
|
<title>Coordinates interface</title> |
|
|
|
|
|
<programlisting language="JAVA" role="JAVA">public interface Coordinates { |
|
|
double getLatitude(); |
|
|
double getLongitude(); |
|
|
} |
|
|
</programlisting> |
|
|
</example> |
|
|
|
|
|
<example> |
|
|
<title>SimpleHotel class sample</title> |
|
|
|
|
|
<programlisting language="JAVA" role="JAVA">@Entity |
|
|
@Indexed |
|
|
@Spatial |
|
|
public class SimpleHotel implements Coordinates { |
|
|
@Field |
|
|
double latitude; |
|
|
@Field |
|
|
double longitude; |
|
|
[..]</programlisting> |
|
|
</example> |
|
|
|
|
|
<para>To query your entities, you :</para> |
|
|
<itemizedlist> |
|
|
<listitem> |
|
|
<para>get a queryBuilder for your entity</para> |
|
|
</listitem> |
|
|
<listitem> |
|
|
<para>pass it to SpatialQueryBuilder.buildSimpleSpatialQuery() |
|
|
with your search center and radius</para> |
|
|
</listitem> |
|
|
<listitem> |
|
|
<para>eventually combine the resulting Qiuery with other filters</para> |
|
|
</listitem> |
|
|
<listitem> |
|
|
<para>call the createFullTextQuery() and use the results</para> |
|
|
</listitem> |
|
|
</itemizedlist> |
|
|
|
|
|
<example> |
|
|
<title>SimpleHotel class sample</title> |
|
|
|
|
|
<programlisting language="JAVA" role="JAVA">QueryBuilder queryBuilder = fullTextSession.getSearchFactory().buildQueryBuilder().forEntity( SimpleHotel.class ).get(); |
|
|
|
|
|
org.apache.lucene.search.Query luceneQuery = SpatialQueryBuilder.buildSimpleSpatialQuery( |
|
|
centerLatitude, centerLongitude, radius, |
|
|
queryBuilder, |
|
|
SimpleHotel.class |
|
|
);</programlisting> |
|
|
</example> |
|
|
|
|
|
<para>Sample code is at SpatialIndexingTest.testSimpleSpatialAnnotationOnClassLevel() |
|
|
and in SimpleHotel class in tests.</para> |
|
|
|
|
|
</section> |
|
|
|
|
|
<section id="spatial-two-stages-grid"> |
|
|
<title>Two stages spatial queries with grid</title> |
|
|
|
|
|
<para>To index your entities you :</para> |
|
|
<itemizedlist> |
|
|
<listitem> |
|
|
<para>implement the Coordinates interface </para> |
|
|
</listitem> |
|
|
<listitem> |
|
|
<para>add the @Spatial annotation at class level with a name for referencing the grid |
|
|
field in the index and with gridmode set to true : @Spatial(name = "location", gridMode = true)</para> |
|
|
</listitem> |
|
|
</itemizedlist> |
|
|
|
|
|
<example> |
|
|
<title>Querying range queries indexed entities</title> |
|
|
|
|
|
<programlisting language="JAVA" role="JAVA">@Entity |
|
|
@Indexed |
|
|
@Spatial(name = "location", gridMode = true) |
|
|
public class Hotel implements Coordinates { |
|
|
double latitude; |
|
|
double longitude; |
|
|
[...] |
|
|
</programlisting> |
|
|
</example> |
|
|
|
|
|
<para>To query your entities, you :</para> |
|
|
<itemizedlist> |
|
|
<listitem> |
|
|
<para>call SpatialQueryBuilder.buildSpatialQuery with your center, radius and the name |
|
|
of the grid fields (location passed above)</para> |
|
|
</listitem> |
|
|
<listitem> |
|
|
<para>eventually combine the resulting Qiuery with other filters</para> |
|
|
</listitem> |
|
|
<listitem> |
|
|
<para>call the createFullTextQuery() and use the results</para> |
|
|
</listitem> |
|
|
</itemizedlist> |
|
|
|
|
|
<example> |
|
|
<title>Querying grid indexed entities</title> |
|
|
|
|
|
<programlisting language="JAVA" role="JAVA"> org.apache.lucene.search.Query luceneQuery = SpatialQueryBuilder.buildSpatialQuery( |
|
|
centerLatitude, |
|
|
centerLongitude, |
|
|
50, |
|
|
"location" |
|
|
); |
|
|
org.hibernate.Query hibQuery = fullTextSession.createFullTextQuery( luceneQuery, Hotel.class ); |
|
|
List results = hibQuery.list();</programlisting> |
|
|
</example> |
|
|
|
|
|
<para>Sample code is at SpatialIndexingTest.testSpatialAnnotationOnClassLevel() and in Hotel class.</para> |
|
|
|
|
|
</section> |
|
|
|
|
|
<section id="spatial-behind -curtain"> |
|
|
<title>Grid - Behind the curtain</title> |
|
|
|
|
|
<para>Here comes the gory details of the grid based indexation.</para> |
|
|
|
|
|
<section> |
|
|
<title>At indexation level</title> |
|
|
|
|
|
<para>When Hibernate search index the entity annotated with @Spatial, it instantiate a SpatialFieldBridge |
|
|
to transform the latitude and longitude fields accessed via the Coordiantes interface to the multiple index |
|
|
fields stored in the Lucene index.</para> |
|
|
|
|
|
<para>Principle of the spatial index : the spatial index used in Hibernate Search is a QuadTree type |
|
|
(http://en.wikipedia.org/wiki/Quadtree).</para> |
|
|
|
|
|
<para>To make computation in a flat coordinates system the latitude and longitude field values will be projected |
|
|
with a sinusoidal projection (http://en.wikipedia.org/wiki/Sinusoidal_projection). |
|
|
Origin values space is :</para> |
|
|
<para>[-90 -> +90],]-180 -> 180]</para> |
|
|
<para>for latitude,longitude coordinates and projected space is :</para> |
|
|
<para>]-pi -> +pi],[-pi/2 -> +pi/2]</para> |
|
|
<para>for cartesian x,y coordinates (beware of fields order inversion : x is longitude and y is latitude).</para> |
|
|
|
|
|
<para>The index is divided into n levels labeled from 0 to n-1.</para> |
|
|
|
|
|
<para>At the 0 level the projected space is the whole earth. |
|
|
At the 1 level the projected space is devided into 4 rectangles ( called boxes as in bouding box):</para> |
|
|
|
|
|
<para>[-pi,-pi/2]->[0,0], [-pi,0]->[0,+pi/2], [0,-pi/2]->[+pi,0] and [0,0]->[+pi,+pi/2]</para> |
|
|
|
|
|
<para>At level n+1 each box of level n is divided into 4 new boxes and so on. The numbers of boxes at a |
|
|
given level is 4^n.</para> |
|
|
|
|
|
<para>Each box is given an id, in this format : [Box index on the X axis]|[Box index on the Y axis] |
|
|
To calculate the index of a box on an axis we divide the axis range in 2^n slots and find the slot the box |
|
|
belongs to. |
|
|
At the n level the indexes on an axis are from -(2^n)/2 to (2^n)/2. |
|
|
For instance, the 5th level has 4^5 = 1024 boxes with 32 indexes on each axis (32x32 is 1024) and the box of |
|
|
Id "0|8" is covering the [0,8/32*pi/2]->[1/32*pi,9/32*pi/2] rectangle is projected space.</para> |
|
|
|
|
|
<para>Beware ! The boxes are rectangles in projected space but the related area on earth is not !</para> |
|
|
|
|
|
<para>Now that we have all these boxes at all these levels will be indexing points "into" them.</para> |
|
|
|
|
|
<para>For a point (lat,long) we calculate its projection (x,y) and then we calculate for each level of the |
|
|
spatial index, the ids of the boxes it belongs to.</para> |
|
|
|
|
|
<para>At each level the point is in one and only one box. For points on the edges the box are considered exclusive |
|
|
n the left side and inclusive on the right i-e ]start,end] (the points are normalized before projection to |
|
|
[-90,+90],]-180,+180]).</para> |
|
|
|
|
|
<para>We store in the lucene document corresponding to the entity to index one field for each level of the grid. |
|
|
The field is named : [spatial index fields name]_HSSI_[n] where [spatial index fields name] is the name given |
|
|
either by the parameter at class level annotation or derived from the name of the spatial annoted method of |
|
|
he entitiy, HSSI stands for Hibernate Search Spatial Index and n is the level of the grid.</para> |
|
|
|
|
|
<para>We also store the latitude and longitude in numeric field under [spatial index fields name]_HSSI_Latitude |
|
|
and [spatial index fields name]_HSSI_Longitude fields. |
|
|
They will be used to filter precisly results by distance in the second stage of the search.</para> |
|
|
</section> |
|
|
|
|
|
<section> |
|
|
<title>At search level</title> |
|
|
|
|
|
<para>Now that we have all these fields, what are they here for ?</para> |
|
|
|
|
|
<para>When you ask for a spatial search by providing a search discus (center+radius) we will calculate the boxes |
|
|
ids that do cover the search discus in the projected space, fetch all the documents that belong to there boxes |
|
|
(thus narrowing the number of documents for which we will have to calculate distance to the center) and then |
|
|
filter this subset with a real distance calculation. This is called two level spatial filtering.</para> |
|
|
|
|
|
<section> |
|
|
<title>Step 1 : compute the best grid level for the search discus</title> |
|
|
|
|
|
<para>For a given search radius there is an optimal grid level where the number of boxes to retrieve |
|
|
hall be minimal without bringing back to many documents (level 0 has only 1 box but retrieve all documents). |
|
|
he optimal grid level is maximum level where the boxes width (on the X axis) is larger than the search area. |
|
|
Near the equator line where projection deformation is minimal, this will lead to the retrieval of at most |
|
|
4 boxes. Towards the poles with deformation, it may bring more boxes but as the sinusoidal projection has a |
|
|
Tissot's indicatrix quite simple (see http://en.wikipedia.org/wiki/Sinusoidal_projection) in populated area, |
|
|
the overhead is minimal. (This should be tested, sampled and benchmarked).</para> |
|
|
|
|
|
</section> |
|
|
|
|
|
<section> |
|
|
<title>Step 2 : compute the corresponding covering boCellIds at that level</title> |
|
|
|
|
|
<para>Now we have the optimal level chosen, we can compute the ids of the boxes covering the search |
|
|
iscus (which is not a discus in projected space anymore).</para> |
|
|
|
|
|
<para>This is done by org.hibernate.search.spatial.impl.GridHelper.getGridCellsIds(Point center, double |
|
|
radius, int gridLevel)</para> |
|
|
|
|
|
<para>It will calculate the bouding box of the search discus and then call |
|
|
org.hibernate.search.spatial.impl.GridHelper.getGridCellsIds(Point lowerLeft, Point upperRight, int gridLevel) |
|
|
that will do the actual computation. If the bouding box crosses the merdian line it will cut the search in two |
|
|
and make two calls to getGridCellsIds(Point lowerLeft, Point upperRight, int gridLevel) with left and right |
|
|
parts of the bouding box.</para> |
|
|
|
|
|
<para>There are some geo related hacks (search radius too large, search radius crossing the poles) that are |
|
|
handled in bounding box computations done by Rectangle.fromBoundingCircle(Point center, double radius) (see |
|
|
http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates for reference on those subjects).</para> |
|
|
|
|
|
<para>The GridHelper.getGridCellsIds(Point lowerLeft, Point upperRight, int gridLevel) project the defining |
|
|
points of the bouding box and compute the boxes they belong to. It returns all the box Ids between the lower |
|
|
eft to the upper right corners, thus covering the area.</para> |
|
|
|
|
|
</section> |
|
|
|
|
|
<section> |
|
|
<title>Step 3 : Lucene index lookup</title> |
|
|
|
|
|
<para>The Query is build with theses Ids to lookup for documents having a [spatial index fields name]_HSSI_[n] |
|
|
(n the level found at Step 1) field valued with one of the ids of Step 2. </para> |
|
|
|
|
|
<para>See org.hibernate.search.spatial.impl.GridFilter implentation.</para> |
|
|
|
|
|
<para>This Query will return all documents in the boxes covering the projected bouding box of the search |
|
|
discus. So it is too large and need refining. But we have narrowed the distance calculation problems to |
|
|
a subet of our datas.</para> |
|
|
|
|
|
</section> |
|
|
|
|
|
<section> |
|
|
<title>Step 4 : refine </title> |
|
|
|
|
|
<para>A distance calulation filter is set after the Lucene index lookup query of Step 3 to exclude false |
|
|
candidate from the result list.</para> |
|
|
|
|
|
<para>See SpatialQueryBuilderFromPoint.buildSpatialQuery(Point center, double radius, String fieldName)</para> |
|
|
|
|
|
</section> |
|
|
|
|
|
</section> |
|
|
</section> |
|
|
</chapter> |
|
|
|