Configuring an Index

Pivotal GemFire allows Indexes (or Indices) to be created on Region data to improve the performance of OQL queries.

In Spring Data GemFire (SDG), Indexes are declared with the index element:

<gfe:index id="myIndex" expression="someField" from="/SomeRegion" type="HASH"/>

In Spring Data GemFire’s XML schema (a.k.a. SDG namespace), Index bean declarations are not bound to a Region, unlike GemFire’s native cache.xml. Rather, they are top-level elements just like <gfe:cache>. This allows a developer to declare any number of Indexes on any Region whether they were just created or already exist, a significant improvement over GemFire’s native cache.xml format.

An Index must have a name. A developer may give the Index an explicit name using the name attribute; otherwise, the bean name (i.e. the value of the id attribute) of the Index bean definition is used as the Index name.

The expression and from clause form the main components of an Index, identifying the data to index (i.e. the Region identified in the from clause) along with the criteria (i.e. the expression) used to index the data. The expression should be based on the application domain object fields used in the predicates of the application-defined OQL queries that query and look up the objects stored in the Region.

For example, if I have a Customer that has a lastName property…​

@Region("Customers")
class Customer {

  @Id
  Long id;

  String lastName;
  String firstName;

  ...
}

And, I also have an application-defined SDG Repository to query for Customers…

interface CustomerRepository extends GemfireRepository<Customer, Long> {

  Customer findByLastName(String lastName);

  ...
}

Then, the SDG Repository finder/query method would result in the following OQL statement being executed…

SELECT * FROM /Customers c WHERE c.lastName = $1

Therefore, I might want to create an Index like so…​

<gfe:index id="myIndex" name="CustomersLastNameIndex" expression="lastName" from="/Customers" type="HASH"/>

The from clause must refer to a valid, existing Region and is how an Index gets applied to a Region. This is not specific to Spring Data GemFire; it is a feature of Pivotal GemFire.

The Index type may be one of three enumerated values defined by Spring Data GemFire’s IndexType enumeration: FUNCTIONAL, HASH and PRIMARY_KEY.

Each of the enumerated values corresponds to one of the QueryService create[|Key|Hash]Index methods invoked when the actual Index is to be created (or "defined"; more on "defining" Indexes below). For instance, if the IndexType is PRIMARY_KEY, then QueryService.createKeyIndex(..) is invoked to create a KEY Index.

The default is FUNCTIONAL and results in one of the QueryService.createIndex(..) methods being invoked.
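As a minimal sketch of the three types, the declarations below assume the /Customers Region and Customer fields from the example above. The bean ids are hypothetical, and the type values are assumed to match the IndexType constants named above; check the XML schema for the values accepted by your SDG version.

```xml
<!-- FUNCTIONAL is the default, so the type attribute is omitted here;
     this results in a QueryService.createIndex(..) call -->
<gfe:index id="customersFirstNameIndex" expression="firstName" from="/Customers"/>

<!-- HASH Index on the field used by the findByLastName(..) query -->
<gfe:index id="customersLastNameIndex" expression="lastName" from="/Customers" type="HASH"/>

<!-- PRIMARY_KEY Index; this results in a QueryService.createKeyIndex(..) call -->
<gfe:index id="customersIdIndex" expression="id" from="/Customers" type="PRIMARY_KEY"/>
```

These fragments are assumed to sit inside a Spring XML configuration file that already declares the gfe namespace and the Customers Region.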

See the Spring Data GemFire XML schema for a full set of options.

For more information on Indexing in Pivotal GemFire, see Working with Indexes in Pivotal GemFire’s User Guide.

Defining Indexes

In addition to creating Indexes upfront as Index bean definitions are processed by Spring Data GemFire on Spring container initialization, you may also define all of your application Indexes prior to creating them by using the define attribute, like so…​

<gfe:index id="myDefinedIndex" expression="someField" from="/SomeRegion" define="true"/>

When define is set to true (it defaults to false), the Index is not created immediately. All "defined" Indexes are created at once when the Spring ApplicationContext is "refreshed", that is, when a ContextRefreshedEvent is published by the Spring container. Spring Data GemFire registers itself as an ApplicationListener listening for the ContextRefreshedEvent. When the event is fired, Spring Data GemFire calls QueryService.createDefinedIndexes().

Defining Indexes and creating them all at once helps promote speed and efficiency when creating Indexes.

See Creating Multiple Indexes at Once for more details.
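For illustration, a configuration that defines several Indexes and lets SDG create them together might look like the following sketch. The bean ids are hypothetical, and the Region and fields are borrowed from the Customer example earlier in this section.

```xml
<!-- Both Indexes are only "defined" while the bean definitions are processed;
     neither is created at this point -->
<gfe:index id="customersLastNameIndex" expression="lastName" from="/Customers" define="true"/>
<gfe:index id="customersFirstNameIndex" expression="firstName" from="/Customers" define="true"/>

<!-- When the ContextRefreshedEvent is published, SDG calls
     QueryService.createDefinedIndexes() and both Indexes are created at once -->
```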

IgnoreIfExists and Override

Two Spring Data GemFire Index configuration options warrant special mention here: ignoreIfExists and override.

These options correspond to the ignore-if-exists and override attributes on the <gfe:index> element in Spring Data GemFire’s XML schema, respectively.

Warning
Make sure you absolutely understand what you are doing before using either of these options. These options can affect the performance and/or resources (e.g. memory) consumed by your application at runtime. As such, both of these options are disabled (i.e. set to false) in SDG by default.
Note
These options are only available in Spring Data GemFire and exist to work around known limitations with Pivotal GemFire; there are no equivalent options or functionality available in GemFire itself.

Each option significantly differs in behavior and entirely depends on the type of GemFire Index Exception thrown. This also means that neither option has any effect if a GemFire Index-type Exception is not thrown. These options are meant to specifically handle GemFire IndexExistsExceptions and IndexNameConflictExceptions, which can occur for various, sometimes obscure reasons. But, in general…​

  • An IndexExistsException is thrown when there exists another Index with the same definition but different name when attempting to create an Index.

  • An IndexNameConflictException is thrown when there exists another Index with the same name but possibly different definition when attempting to create an Index.

Spring Data GemFire’s default behavior is to fail-fast, always! So, neither Index Exception will be "handled" by default; these Index Exceptions are simply wrapped in an SDG GemfireIndexException and rethrown. If you wish for Spring Data GemFire to handle them for you, then you can set either of these Index bean definition options.

IgnoreIfExists always takes precedence over Override, primarily because it uses fewer resources, given that it returns the "existing" Index in both exceptional cases.

IgnoreIfExists Behavior

When an IndexExistsException is thrown and ignoreIfExists is set to true (or <gfe:index ignore-if-exists="true">), then the Index that would have been created by this Index bean definition / declaration will be "ignored", and the "existing" Index will be returned.

There is very little consequence in returning the "existing" Index since the Index "definition" is the same, as deemed by GemFire itself, not SDG.

However, this also means that no Index with the “name” specified in your Index bean definition / declaration will "actually" exist from GemFire’s perspective either (i.e. with QueryService.getIndexes()). Therefore, you should be careful when writing OQL query statements that use Query Hints, especially Hints that refer to the application Index being "ignored". Those Query Hints will need to be changed.

Now, when an IndexNameConflictException is thrown and ignoreIfExists is set to true (or <gfe:index ignore-if-exists="true">), then the Index that would have been created by this Index bean definition / declaration will also be "ignored", and the "existing" Index will be returned, just like when an IndexExistsException is thrown.

However, there is more risk in returning the "existing" Index and "ignoring" the application’s definition of the Index when an IndexNameConflictException is thrown since, for an IndexNameConflictException, while the "names" of the conflicting Indexes are the same, the "definitions" could very well be different! This could have implications for OQL queries specific to the application, where you would presume the Indexes were defined specifically with the application data access patterns and queries in mind. However, if like-named Indexes differ in definition, this might not be the case. So, make sure you verify.

Note
SDG makes a best effort to inform the user when the Index being ignored is significantly different in its definition from the "existing" Index. However, in order for SDG to accomplish this, it must be able to "find" the existing Index, which is looked up using the GemFire API (the only means available).
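As a short sketch of the configuration described above, the following declaration opts in to the ignoreIfExists behavior. The bean id, Index name, Region and field are reused from the Customer example earlier and are illustrative only.

```xml
<!-- If GemFire throws an IndexExistsException or IndexNameConflictException
     while creating this Index, SDG returns the "existing" Index instead of failing -->
<gfe:index id="myIndex" name="CustomersLastNameIndex" expression="lastName" from="/Customers"
           type="HASH" ignore-if-exists="true"/>
```

With this setting, the Index actually in use may carry a different name (or, on a name conflict, a different definition) than the one declared here, so any Query Hints that refer to CustomersLastNameIndex are worth re-checking.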

Override Behavior

When an IndexExistsException is thrown and override is set to true (or <gfe:index override="true">), then the Index is effectively "renamed". Remember, IndexExistsExceptions are thrown when multiple Indexes exist, all having the same "definition" but different "names".

Spring Data GemFire can only accomplish this using GemFire’s API, by first "removing" the "existing" Index and then "recreating" the Index with the new name. It is possible that either the remove or subsequent create invocation could fail. There is no way to execute both actions atomically and rollback this joint operation if either fails.

However, if it succeeds, then you have the same problem as before with the "ignoreIfExists" option. Any existing OQL query statement using "Query Hints" referring to the old Index by name must be changed.

Now, when an IndexNameConflictException is thrown and override is set to true (or <gfe:index override="true">), then potentially the "existing" Index will be "re-defined". I say "potentially", because it is possible for the "like-named", "existing" Index to have exactly the same definition and name when an IndexNameConflictException is thrown.

If so, SDG is smart and will just return the "existing" Index as is, even on override. There is no harm in this since both the "name" and the "definition" are exactly the same. Of course, SDG can only accomplish this when SDG is able to "find" the "existing" Index, which is dependent on GemFire’s APIs. If SDG cannot find it, no Index is changed and an SDG GemfireIndexException is thrown wrapping the IndexNameConflictException.

However, when the "definition" of the "existing" Index is different, then SDG will attempt to "recreate" the Index using the Index definition specified in the Index bean definition / declaration. Make sure this is what you want and make sure the Index definition matches your expectations and application requirements.
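The override option is enabled the same way as ignore-if-exists. The sketch below reuses the illustrative Customer Index from earlier in this section; the bean id and Index name are examples, not required values.

```xml
<!-- On an IndexExistsException, SDG removes the "existing" Index and recreates it
     under the name given here; on an IndexNameConflictException, SDG recreates the
     Index with this definition if the "existing" definition differs.
     The remove-then-create sequence is not atomic. -->
<gfe:index id="myIndex" name="CustomersLastNameIndex" expression="lastName" from="/Customers"
           type="HASH" override="true"/>
```

Because IgnoreIfExists takes precedence over Override, setting both attributes to true on the same element would be expected to return the "existing" Index rather than replace it.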

How do IndexNameConflictExceptions actually happen?

It is probably not all that uncommon for IndexExistsExceptions to be thrown, especially when multiple configuration sources are used to configure GemFire (e.g. Spring Data GemFire, GemFire Cluster Config, maybe GemFire native cache.xml, the API, etc.). You should definitely prefer one configuration method and stick with it.

However, when does an IndexNameConflictException get thrown?

One particular case is an Index defined on a PARTITION Region (PR). When an Index is defined on a PARTITION Region (e.g. "X"), GemFire distributes the Index definition (and name) to other peer members in the cluster that also host the same PARTITION Region (i.e. "X"). The distribution of this Index definition to and subsequent creation of this Index by peer members on a "need-to-know" basis (i.e. those hosting the same PR) is performed asynchronously.

During this window of time, it is possible that these "pending" PR Indexes will not be identifiable by GemFire, such as with a call to QueryService.getIndexes() or with QueryService.getIndexes(:Region), or even with QueryService.getIndex(:Region, indexName:String).

As such, the only way for SDG or other GemFire cache client applications (not involving Spring) to know for sure is to attempt to create the Index. If it fails with either an IndexNameConflictException or an IndexExistsException, then you will know. This is because QueryService Index creation waits on "pending" Index definitions, whereas the other GemFire API calls do not.

In any case, SDG makes a best effort to inform the user of what has happened or is happening, along with the corrective action. Given that all GemFire QueryService.createIndex(..) methods are synchronous, "blocking" operations, the state of GemFire should be consistent and accessible after either of these Index-type Exceptions is thrown, in which case SDG can inspect the state of the system and respond accordingly, based on the user’s desired configuration.

In all other cases, SDG will simply fail-fast!