This repository has been archived by the owner. It is now read-only.

Support Secondary Indexes #4

Closed
jtaylor-sfdc opened this Issue Jan 25, 2013 · 19 comments

Comments

10 participants
@jtaylor-sfdc
Contributor

jtaylor-sfdc commented Jan 25, 2013

Allow users to create indexes through a new CREATE INDEX DDL command and then behind the scenes build multiple projections of the table (i.e. a copy of the table using re-ordered or different row key columns). Phoenix will take care of maintaining the indexes when DML commands are issued and will choose the best table to use at query time.
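The core idea - an index as a second table whose row key is the indexed column values followed by the data row's primary key - can be sketched as follows. This is an illustrative Python sketch only (Phoenix itself is Java); the function and field names are hypothetical.

```python
# A secondary index modeled as a projection of the data table: the index
# row key is the indexed values first (so rows sort by them), then the
# data row's PK columns (so duplicate indexed values stay distinct).

def index_row_key(data_row, indexed_cols, pk_cols):
    """Build the index table's row key for one data row."""
    parts = [data_row[c] for c in indexed_cols]   # indexed values lead
    parts += [data_row[c] for c in pk_cols]       # trailing data-row PK
    return "\x00".join(parts)                     # zero byte as separator

row = {"emp_id": "e001", "emp_name": "Tom", "dept_no": "d001"}
key = index_row_key(row, indexed_cols=["emp_name"], pk_cols=["emp_id"])
# key sorts by emp_name; the trailing emp_id keeps two "Tom" rows distinct
```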

@ghost ghost assigned jtaylor-sfdc Jan 25, 2013

@guotielong

guotielong commented Feb 22, 2013

When will this be available?

@testn

testn commented Mar 13, 2013

Can you say a bit more about how you plan to implement this functionality? From what I read at http://hadoop-hbase.blogspot.com/2012/10/musings-on-secondary-indexes.html, it looks like it is not easy to maintain a globally consistent index. I noticed that Jesse did some work with Culvert. Is that what you planned to incorporate?

@jtaylor-sfdc

Contributor

jtaylor-sfdc commented Mar 13, 2013

It'll be a phased approach:
Phase 1 will include:

  1. SQL command for creating/dropping an index. An index will just be another HBase table with the row key being the indexed column values plus primary row pk.
  2. Time permitting, we'll support defining partially covered indexes, but at a minimum completely covered indexes (essentially multiple projections of the same table with a different row key).
  3. Query processor enhancements to choose the best table to use based on the query (i.e. primary versus index table).
  4. Incremental index maintenance as writes are done to the primary table. This will not be guaranteed consistent yet, so the client will need to deal with failures, either through retry logic or by invalidating the index and rebuilding it (time permitting, from a given known-good timestamp).

We've got some immutable-data use cases where Map/Reduce jobs generate a Phoenix-compatible HFile and then the HFile is handed off to HBase. The above works fine for this scenario, since there are no incremental updates (i.e. the guaranteed-consistency issue doesn't matter). We'd just generate one primary table HFile plus one HFile per index.

Phase 2 would include guaranteed consistent index maintenance. Jesse, Lars, and I have been talking about how to do this. We did some perf testing of their initial idea, but the batched gets to get back the data rows limits its usefulness (requiring a maximum selectivity of maybe 2-3%). Instead, Jesse and Lars have come up with a way of guaranteeing consistency at write time so that the reads aren't required to join back to the data table. They're still hashing through it a bit, but it seems to be gelling. Maybe we can get them to blog about it? @jyates @lhofhansl
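Since phase 1 leaves failure handling to the client (point 4 above), that client-side retry logic might look roughly like this. An illustrative Python sketch; the function names are hypothetical, not Phoenix API.

```python
import time

def commit_with_retry(commit, max_attempts=3, backoff_s=0.0):
    """Retry a batch commit; on repeated failure the caller may fall back
    to invalidating the index and rebuilding it from a known-good point."""
    for attempt in range(1, max_attempts + 1):
        try:
            commit()
            return attempt          # number of attempts it took
        except IOError:
            if attempt == max_attempts:
                raise               # exhausted: surface to caller
            time.sleep(backoff_s * attempt)
```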

@HaisenLee

HaisenLee commented Mar 14, 2013

Here is how our indexing currently works. The index row key consists of four parts:

  1. the start key of the indexed data table's region
  2. the identifier of the index (an integer; the index identifiers of the columns are kept sorted in the index metadata table)
  3. the value of the indexed column (the value's length is fixed; string data is converted to a long via MD5)
  4. the row key of the indexed data row

For a multi-column index, the 2nd and 3rd parts are repeated.

We use the index metadata API to create and delete indexes. We use the region observer's postPut method to build the index, its preDelete method to delete index data, and its postSplit method to ensure the index table and data table split at the same time and the index data is updated. We use the master observer's postBalance method to keep a data region and its related index region on the same region server.

When a batch of data is put or deleted, we use MapReduce to build or delete indexes in a client-server fashion.

We query data with an index via the endpoint approach: based on the query, we scan the index table to get the row keys of the data table, then batch Gets with those row keys to obtain the query result.

May we join in the secondary index development? Thanks!
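The four-part row key described above could be composed roughly like this. An illustrative Python sketch of the scheme, not the actual implementation; all names are hypothetical.

```python
import hashlib
import struct

def md5_as_long(value):
    """Fixed-length stand-in for a variable-length string value,
    per the MD5-to-long conversion described above."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return struct.unpack(">q", digest[:8])[0]

def index_row_key(region_start_key, parts, data_row_key):
    """parts is a list of (index_id, column_value) pairs; the id/value
    pair repeats for a multi-column index."""
    key = bytearray(region_start_key)            # 1: region start key
    for index_id, value in parts:
        key += struct.pack(">i", index_id)       # 2: integer index identifier
        key += struct.pack(">q", md5_as_long(value))  # 3: fixed-length value
    key += data_row_key                          # 4: data row's key
    return bytes(key)
```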

@jtaylor-sfdc

Contributor

jtaylor-sfdc commented Mar 16, 2013

Thank you, @HaisenLee, for your offer to help with secondary indexing. We're already pretty far into the implementation of phase 1, and phase 2 requires HBase core functionality that Jesse and Lars are working on. Would you be interested in helping with any of the other open issues?

@HaisenLee

HaisenLee commented Mar 16, 2013

@jtaylor-sfdc Sure, I'd be pleased to help with the open issues.

@Terry-Shi

Terry-Shi commented Apr 30, 2013

Hi @jtaylor-sfdc,
I'm a little curious why the index table's row key needs to contain the primary row key.
Let's say there is a table called "EMP" with columns such as emp_name, age, dept_no ...
and we want to create an index on emp_name;
assume the index table is called IDX_EMP_emp_name.
I think the mapping from EMP {e001, Tom, 31, d001} to IDX_EMP_emp_name {Tom, e001} should be pretty clear.
I know there could be more than one "Tom" in EMP; because of that, we could save the primary row key as a qualifier name instead of as a value.

Any thoughts?

@jtaylor-sfdc

Contributor

jtaylor-sfdc commented May 29, 2013

Here are the client-side changes to add secondary index support:

  1. Modify grammar to allow
    CREATE INDEX <index_name>
    ON <table_name> (<column_ref> [ASC|DESC], ...)
    INCLUDE (<column_ref>...) fam_properties

    Make sure you reuse the rules we already have to parse table_name optionally including the schema name, column_ref as optionally including the column family name, and fam_properties to pass through properties for the HBase table and column families.
  2. Add a CreateIndexCompiler, similar to CreateTableCompiler, and invoke it from PhoenixStatement. Also, add an ExecutableDropIndexStatement, similar to ExecutableDropTableStatement. I think the drop index can just set the INDEX_STATE of the metadata for the index to DROPPED - no need to put Delete markers in the index table. We can just enforce that index names must be unique (i.e. you can't reuse index names after deletion).
  3. The CreateTableCompiler would call through to a createIndex call in MetaDataClient that
    • Calls through to connectionQueryServices to create the hbase table that'll store the index. You'd create each column family referenced in any column_ref (follow the same model as the metaDataClient.createTable for this). Make sure to support the SALT_BUCKETS property as this will be particularly useful for indexes.
    • Upserts the metadata of the index as a Phoenix table in a similar way to how we create a regular table. You'll need to add a new PTableType of INDEX to differentiate it. Also, add a new nullable column, INDEXED_TABLE, on SYSTEM.TABLE, only for the table row of an index table, that stores the name of the table being indexed. Also, add an INDEX_STATE column that can track the index life-cycle: CREATED (initially), ACTIVE (after being populated), INACTIVE (if/when it shouldn't be used, maybe because it's invalid), and DROPPED.
    • To ensure that the index metadata is in the same region as the table, use the schema name and table name from the data table for the index metadata rows. Add a nullable INDEX_NAME column at the end of the PK to store the index_name so that the index rows are distinct from the table metadata rows (if we want to avoid updating existing table metadata, we'll need to account for the lack of a null byte at the end - I'll take care of this). See (5) below for why, but we want to ensure the index metadata is contiguous with the table metadata.
    • The PK columns of the index are the columns being indexed, plus the PK of the data row as a trailing VARBINARY column. Use the same types as from the data table, but the column modifier will be gotten from the index DDL statement.
    • The other columns will be the ones from the INCLUDE part of the DDL statement.
    • Add a new MetaDataProtocol createIndex and dropIndex method. The implementation can reuse the code for building a table. It should also invalidate the data table row, so that clients get sent the updated PTable with the index on it.
  4. Create a PTableIndex, derived from PTable, for the client-side cache. Add a list of indexes to PTable and update the serialization/deserialization logic.
  5. Modify the private MetaDataEndPointImpl.getTable to pick up the index metadata along with the table metadata. The rows will be contiguous. The PTableIndex list will get passed through the PTable constructor.
  6. Modify PostDDLCompiler to populate the index initially. You can just use an UPSERT SELECT statement for this. Run it asynchronously and mark the index metadata row as ACTIVE when done.

This should take care of the parsing, compiling, and caching of the index metadata. For the usage of it, we'll need to do the following:

  1. In QueryCompiler, instead of building a single QueryPlan, you'll want to build one for each ACTIVE index of the PTable as well (you can skip this if the query is fully qualified against the main data table). You'd basically replace the ColumnResolver that's created based on the FROM data_table with a ColumnResolver that represents the index table, as if the query were written as FROM index_table. You'll need to take into account whether the query uses any columns that are not included in the index and, for now, rule out using that index (since we're not joining back from the index table to the data table yet). You could just catch a ColumnNotFoundException, ignore it, and move on to the next index table, since that's what would happen. You'll likely want to create a new StatementContext for each iteration, since there's shared state there.
  2. You'd pick the QueryPlan that has the longest start/stop row key defined. It would be good to form some abstraction around this by passing the QueryPlan list to a QueryOptimizer class that chooses the "best" one.
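The selection rule in (2) - prefer the candidate plan whose scan binds the most start/stop row key bytes - can be sketched as follows. Illustrative Python only; Plan and choose_best are hypothetical stand-ins, not Phoenix classes.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    """Stand-in for a QueryPlan: the table it scans plus the key range
    the query's filters were able to bind."""
    table: str
    start_key: bytes
    stop_key: bytes

def choose_best(plans):
    """Pick the plan with the longest combined start/stop row key,
    i.e. the most selective scan."""
    return max(plans, key=lambda p: len(p.start_key) + len(p.stop_key))
```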

This should take care of the usage part of things. For the index maintenance, talk with @jyates. He's got the plumbing all worked out. Basically, there's an interface you need to implement where, given the Put/Delete list from a data table, you return the Put/Delete list for an index table. You'll also need to send over a bit of metadata in the Put/Delete operation to indicate which column qualifiers on the data table must be retrieved to build the Put/Delete list for the index. You'll likely need to send over the RowKeySchema too. Make sure that you delegate to a separate Phoenix class to figure out the list of mutations given a Put/Delete on the main table. The reason is that I'd like to provide a method in PhoenixRuntime, similar to getUncommittedData, that gets the List<KeyValue> for each index table. This will provide a way of generating an HFile for map/reduce jobs for the index tables that'll be consistent with the data table.
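The maintenance contract described here - given the mutations applied to a data row, derive the mutations for the index table - might look roughly like this. An illustrative Python sketch; Phoenix's actual interface is Java, and every name below is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Mutation:
    """Stand-in for an HBase Put or Delete on an index row."""
    kind: str       # "PUT" or "DELETE"
    row_key: str

def index_key(row, indexed_cols, pk):
    return "\x00".join([row[c] for c in indexed_cols] + [pk])

def build_index_mutations(pk, old_row, new_row, indexed_cols):
    """old_row is the prior state of the data row (None on first insert);
    new_row is None when the data row is being deleted. An update deletes
    the stale index row and puts the new one."""
    muts = []
    if old_row is not None:
        muts.append(Mutation("DELETE", index_key(old_row, indexed_cols, pk)))
    if new_row is not None:
        muts.append(Mutation("PUT", index_key(new_row, indexed_cols, pk)))
    return muts
```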

@ghost ghost assigned tonyhuang May 29, 2013

@rodrigopr

rodrigopr commented May 29, 2013

Seems great, @jtaylor-sfdc. Is it in a branch/fork already? Looking forward to seeing how it was implemented.

@jyates

Contributor

jyates commented May 29, 2013

For the maintenance side, I'm working on getting a patch into HBase (HBASE-8636) so indexing can support a compressed WAL (should be in 0.94.9). After that goes in, I'll send up a pull request to Phoenix.

It's nothing too fancy, though - there is no magic. It's just hacking the WAL entries to get durability, but otherwise it only has a passing adherence to ACID - it only meets the HBase expectations. Coming soon - promise!

@jtaylor-sfdc

Contributor

jtaylor-sfdc commented May 29, 2013

Request to @jyates - for the compressed WAL patch, can you make your index code not have a hard dependency on it? I'd like folks to be able to use the indexing with 0.94.4 and above. We can detect if it's pre-0.94.9 and throw if an index is attempted to be created on an HBase table that has the compressed WAL enabled.

@jyates

Contributor

jyates commented May 29, 2013

Yeah, I think we can do that - shouldn't be too hard.

@jtaylor-sfdc

Contributor

jtaylor-sfdc commented Jun 3, 2013

A couple more additions, @tonyhuang. Instead of using a VARBINARY, just use the data row PK columns as they are. That way you can just remove any that are already in the index and you won't have any duplication.
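The deduplication suggested above - index PK = indexed columns, then the data table's PK columns minus any already among them - amounts to a one-liner. Illustrative Python; the function name is hypothetical.

```python
def index_pk_columns(indexed_cols, data_pk_cols):
    """Indexed columns first, then only the data-table PK columns that
    aren't already present, so no column appears twice in the index PK."""
    return indexed_cols + [c for c in data_pk_cols if c not in indexed_cols]
```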

@tonyhuang

Contributor

tonyhuang commented Jun 3, 2013

Got you.

@jtaylor-sfdc

Contributor

jtaylor-sfdc commented Jun 4, 2013

One other consideration when deciding on the "right" query plan to choose: you'll want to consider the ORDER BY clause as well. If we're ordering by the leading indexed columns, we'll definitely want to use that index, even if there's no start/stop row key formed for that index.
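Folding the ORDER BY into plan scoring could look like this. An illustrative Python sketch under the assumption that an avoided sort outweighs a modest bound key prefix; all names and the weight are hypothetical.

```python
def plan_score(key_bytes_bound, leading_cols, order_by):
    """Score a candidate plan: bytes of row key bound by the query's
    filters, plus a large bonus if the plan's leading key columns match
    the requested ordering (no client-side sort needed)."""
    matches_order = list(order_by) == leading_cols[: len(order_by)]
    return key_bytes_bound + (1000 if matches_order else 0)
```

With this scoring, an index whose leading column matches the ORDER BY wins over the data table even when the index plan has no start/stop key at all.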

jtaylor-sfdc added a commit that referenced this issue Jun 14, 2013

Merge pull request #263 from jyates/hbase-index
Adding hbase-index contrib/ project - support #4
@jtaylor-sfdc

Contributor

jtaylor-sfdc commented Jul 13, 2013

@mujtabachohan, @jyates, @simontoens, @elilevine, @anoopsjohn, @ramkrish86, @maryannxue, @ivarley @lhofhansl @ryang-sfdc @srau Phase 1 of secondary indexing is ready for testing. This basically includes:

  • Creating a composite, covered index and populating it based on the existing table data. This still needs some work to improve performance (see #281 and #282).
  • Building the index as the table is built. No incremental maintenance yet (but we're close - see #336), so you have to have a use case where only new rows are inserted as opposed to being updated in-place (which I believe covers your use cases, @simontoens, @elilevine, @ryang-sfdc).
  • Failure scenarios must be handled by the client. When a series of upserts are committed, both the table and index updates will be sent as a batch to the server. If any failures occur, an exception is thrown and it's up to the client to retry the batch until it succeeds.
  • Using the index for your data table query when the query processor deems that it'll be faster. We never join back from the index table to the data table, but instead rely on covered indexes. If your index doesn't contain all the columns referenced in a query, it won't be used.
  • Cascade deleting the index table when the data table is dropped.
  • Support for dropping an index individually.
  • Allowing the index to be disabled/enabled.
  • Support for accessing the index metadata through the standard JDBC [DatabaseMetaData#getIndexInfo](http://docs.oracle.com/javase/6/docs/api/java/sql/DatabaseMetaData.html#getIndexInfo(java.lang.String, java.lang.String, java.lang.String, boolean, boolean))
  • Showing whether or not an index will be used in the explain plan.

If you want a little example, take a look at our tests here and here.

I think at a minimum we need more testing prior to releasing (#321). Nice to have would be #281, #337, and #336, but I'd be ok with a release that just hardens what's there now, since it helps in our initial use cases. Also, we have a lot of other great stuff that I'd like to get out there in a release as well.

Thoughts?

@simontoens

Contributor

simontoens commented Jul 13, 2013

Hey James - that's awesome. We will give this a try!

@apurtell

Contributor

apurtell commented Jul 15, 2013

@jtaylor-sfdc Looks like a sensor network simulation would be a good test case for what's in place now.

Do you guys have a framework for larger-scale integration tests, or any thoughts on one? I've done some recent work with benchpress. For testing secondary indexing and other features on clusters with some test-data heft and realistic application simulations, please consider opening issues for brainstorming - I will as well.

@jtaylor-sfdc

Contributor

jtaylor-sfdc commented Aug 19, 2013

Support for secondary indexes over tables with immutable rows is in the 2.0.0 release. The incremental maintenance piece coming shortly will be tracked by #336, so I'm closing this issue.
