New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Secondary index #401

Open
doublex opened this Issue Sep 24, 2015 · 56 comments

Comments

Projects
None yet
@doublex

doublex commented Sep 24, 2015

Is scylladb a "newsql" database like cockroachdb, clustrix, ...?
More precise:
Is there support for a secondary index?
If yes, is the secondary index stored on the same machine as the record? Is it necessary to ask all the machines to query a secondary index?

@dorlaor

This comment has been minimized.

Show comment
Hide comment
@dorlaor

dorlaor Sep 24, 2015

Contributor

On Thu, Sep 24, 2015 at 1:14 PM, doublex notifications@github.com wrote:

Is scylladb a "newsql" database like cockroachdb, clustrix, ...?
More precise:
Is there support for a secondary index?
If yes, is the secondary index stored on the same machine as the record?
Is it necessary to ask all the machines to query a secondary index?

We have a half-ready secondary index code which follow's Cassandra one.
We'll complete it but more interested in the materialized view version of
Cassandra.

The nice thing is that our performance allows one to use more views and
really untangle the value
of indexes.


Reply to this email directly or view it on GitHub
#401.

Contributor

dorlaor commented Sep 24, 2015

On Thu, Sep 24, 2015 at 1:14 PM, doublex notifications@github.com wrote:

Is scylladb a "newsql" database like cockroachdb, clustrix, ...?
More precise:
Is there support for a secondary index?
If yes, is the secondary index stored on the same machine as the record?
Is it necessary to ask all the machines to query a secondary index?

We have a half-ready secondary index code which follow's Cassandra one.
We'll complete it but more interested in the materialized view version of
Cassandra.

The nice thing is that our performance allows one to use more views and
really untangle the value
of indexes.


Reply to this email directly or view it on GitHub
#401.

@sfescape

This comment has been minimized.

Show comment
Hide comment
@sfescape

sfescape Sep 25, 2015

I think secondary indices have strong value (without the performance hit) wrt lookups in wide rows. Cases where you know the partition key but want to look up rows without including the clustering key.

sfescape commented Sep 25, 2015

I think secondary indices have strong value (without the performance hit) wrt lookups in wide rows. Cases where you know the partition key but want to look up rows without including the clustering key.

@nyh

This comment has been minimized.

Show comment
Hide comment
@nyh

nyh Oct 6, 2015

Contributor

I stumbled upon this paper: http://researcher.watson.ibm.com/researcher/files/us-wtan/DiffIndex-EDBT14-CR.pdf which might be relevant if we ever want to use a different algorithm for secondary indexes than Cassandra's.

Contributor

nyh commented Oct 6, 2015

I stumbled upon this paper: http://researcher.watson.ibm.com/researcher/files/us-wtan/DiffIndex-EDBT14-CR.pdf which might be relevant if we ever want to use a different algorithm for secondary indexes than Cassandra's.

@doublex

This comment has been minimized.

Show comment
Hide comment
@doublex

doublex Nov 11, 2015

All the DBs on the market have some flaws: either they are based on "garbage collection" (e.g. cockroachdb, hbase, ...) or without a sorted distributed index (cassandra, riak, ...)

doublex commented Nov 11, 2015

All the DBs on the market have some flaws: either they are based on "garbage collection" (e.g. cockroachdb, hbase, ...) or without a sorted distributed index (cassandra, riak, ...)

@duarten

This comment has been minimized.

Show comment
Hide comment
@duarten

duarten Apr 3, 2016

Member

A relevant paper on a more scalable approach to secondary indexes: https://ramcloud.atlassian.net/wiki/download/attachments/6848671/slik.pdf

Member

duarten commented Apr 3, 2016

A relevant paper on a more scalable approach to secondary indexes: https://ramcloud.atlassian.net/wiki/download/attachments/6848671/slik.pdf

@tzach tzach modified the milestones: 1.1, Yellowfin Apr 3, 2016

@tzach tzach added the CQL label Apr 3, 2016

@slivne slivne added the high label Apr 4, 2016

@tzach

This comment has been minimized.

Show comment
Hide comment
Contributor

tzach commented Apr 6, 2016

@doublex

This comment has been minimized.

Show comment
Hide comment
@doublex

doublex Apr 17, 2016

The index should be a distributed, sorted map - otherwise it could not be used for "range requests".

doublex commented Apr 17, 2016

The index should be a distributed, sorted map - otherwise it could not be used for "range requests".

@nyh

This comment has been minimized.

Show comment
Hide comment
@nyh

nyh Apr 17, 2016

Contributor

@doublex can you please expand on what you mean?

I think that SASI's method is particularly well-suited for range requests which result in a large number of results - e.g., find all items with price between $1 and $10. If we expect the results' price field to fall into that range, but do not expect the results to be sorted by price, SASI can return the result sorted by token (murmur3 hash of the item's original partition key) and therefore instead of querying all the nodes at once, it can query just the node with the lowest tokens, get a bunch of results, and only query the next node if it didn't get enough results from the first node.

However, in different use cases, it's less clear to me how SASI is useful... If the search only turns up a small number of results, all nodes will need to be queried. If each node has a very large number of vnodes (in scylla, we have 256 vnodes per node!), we can get very few results from each vnode and even getting token order might require querying many nodes. And if we need to sort the results by some meaningful order, not random token order, again we will need to query all nodes and collect all the results.

Contributor

nyh commented Apr 17, 2016

@doublex can you please expand on what you mean?

I think that SASI's method is particularly well-suited for range requests which result in a large number of results - e.g., find all items with price between $1 and $10. If we expect the results' price field to fall into that range, but do not expect the results to be sorted by price, SASI can return the result sorted by token (murmur3 hash of the item's original partition key) and therefore instead of querying all the nodes at once, it can query just the node with the lowest tokens, get a bunch of results, and only query the next node if it didn't get enough results from the first node.

However, in different use cases, it's less clear to me how SASI is useful... If the search only turns up a small number of results, all nodes will need to be queried. If each node has a very large number of vnodes (in scylla, we have 256 vnodes per node!), we can get very few results from each vnode and even getting token order might require querying many nodes. And if we need to sort the results by some meaningful order, not random token order, again we will need to query all nodes and collect all the results.

@doublex

This comment has been minimized.

Show comment
Hide comment
@doublex

doublex Apr 18, 2016

If the index is stored on the same machine ("machine local"), all the nodes have to be queried. If the key is a distributed sorted map, you only ask the node which is responsible for that range.
More precise: A "machine local"-index does not scale with an increasing number of nodes.

doublex commented Apr 18, 2016

If the index is stored on the same machine ("machine local"), all the nodes have to be queried. If the key is a distributed sorted map, you only ask the node which is responsible for that range.
More precise: A "machine local"-index does not scale with an increasing number of nodes.

@nyh

This comment has been minimized.

Show comment
Hide comment
@nyh

nyh Apr 18, 2016

Contributor

TL;DR I think you want Materialized Views

@doublex, "machine local" is the way Secondary Index work in Cassandra. So I assume you don't want that implementation.
The "distributed map" which you are looking for is the way that "Materialized Views" work in Cassandra, and we have a separate issue (#1141) to support that too. The partitions of a Materialized View will be sorted according the specified sorting order - it is possible to sort them in string order (and thereby enable efficient range queries using a single node) but usually in Cassandra users prefer to sort partitions in hash order (murmur3) which gives up on range queries but gives better protection against "hotspots" (nodes which get a significantly busier key range). Arguably, because we have so many vnodes (256 per node), this might give us good enough protection against hotspots, that ordering partitions in natural order might be acceptable, and this will give you the range query feature you were hoping for.

Contributor

nyh commented Apr 18, 2016

TL;DR I think you want Materialized Views

@doublex, "machine local" is the way Secondary Index work in Cassandra. So I assume you don't want that implementation.
The "distributed map" which you are looking for is the way that "Materialized Views" work in Cassandra, and we have a separate issue (#1141) to support that too. The partitions of a Materialized View will be sorted according the specified sorting order - it is possible to sort them in string order (and thereby enable efficient range queries using a single node) but usually in Cassandra users prefer to sort partitions in hash order (murmur3) which gives up on range queries but gives better protection against "hotspots" (nodes which get a significantly busier key range). Arguably, because we have so many vnodes (256 per node), this might give us good enough protection against hotspots, that ordering partitions in natural order might be acceptable, and this will give you the range query feature you were hoping for.

@doublex

This comment has been minimized.

Show comment
Hide comment
@doublex

doublex Apr 18, 2016

It would be great to have one distributed database which
a) supports range requests (on secondary index)
b) scales with sherd count
c) not based on "garbage collection"
To my knowledge such a database does not exist

doublex commented Apr 18, 2016

It would be great to have one distributed database which
a) supports range requests (on secondary index)
b) scales with sherd count
c) not based on "garbage collection"
To my knowledge such a database does not exist

@tzach

This comment has been minimized.

Show comment
Hide comment
@tzach

tzach Apr 18, 2016

Contributor

@doublex have you looked into DynamoDB Local Secondary Index?

It true to his name as secondary, similar to Cassandra clustering key.

Contributor

tzach commented Apr 18, 2016

@doublex have you looked into DynamoDB Local Secondary Index?

It true to his name as secondary, similar to Cassandra clustering key.

@tzach tzach modified the milestones: 1.2, 1.1 May 2, 2016

@doublex

This comment has been minimized.

Show comment
Hide comment
@doublex

doublex May 10, 2016

I am so stunned. I can hardly wait for 1.1 or 1.2.
A fast database (written in C++) which supports range requests on a secondary index and scales with shard count is a "missing link".
Hopefully scylladb saves us.

doublex commented May 10, 2016

I am so stunned. I can hardly wait for 1.1 or 1.2.
A fast database (written in C++) which supports range requests on a secondary index and scales with shard count is a "missing link".
Hopefully scylladb saves us.

@dahankzter

This comment has been minimized.

Show comment
Hide comment
@dahankzter

dahankzter May 10, 2016

So how does the current implementation of primary indexes really work?
If I have a PK(a,b) and execute a query select * from T where a='value' will the index kick in?

dahankzter commented May 10, 2016

So how does the current implementation of primary indexes really work?
If I have a PK(a,b) and execute a query select * from T where a='value' will the index kick in?

@avikivity

This comment has been minimized.

Show comment
Hide comment
@avikivity

avikivity May 10, 2016

Contributor

Yes, that case will be very fast.

Contributor

avikivity commented May 10, 2016

Yes, that case will be very fast.

@doublex

This comment has been minimized.

Show comment
Hide comment
@doublex

doublex May 10, 2016

A good compilation of different secondary index designs:
https://raw.githubusercontent.com/cockroachdb/cockroach/c3bc53a463f0a7036dcea06ac6d109d3ef2aa4fe/resources/doc/scan-efficiency.png

Currently the only distributed database with secondary index (that scales with shard count) are "HBase" (bloatware, ...) and "Hypertable" (fat client, ...), maybe CockroachDB in future (mark-and-sweep garbage collector, ...)

doublex commented May 10, 2016

A good compilation of different secondary index designs:
https://raw.githubusercontent.com/cockroachdb/cockroach/c3bc53a463f0a7036dcea06ac6d109d3ef2aa4fe/resources/doc/scan-efficiency.png

Currently the only distributed database with secondary index (that scales with shard count) are "HBase" (bloatware, ...) and "Hypertable" (fat client, ...), maybe CockroachDB in future (mark-and-sweep garbage collector, ...)

@doublex

This comment has been minimized.

Show comment
Hide comment
@doublex

doublex Nov 6, 2016

Ahhh - another database written in a language with the "stop the world"-issue during garbage collection

doublex commented Nov 6, 2016

Ahhh - another database written in a language with the "stop the world"-issue during garbage collection

@manigandham

This comment has been minimized.

Show comment
Hide comment
@manigandham

manigandham Nov 6, 2016

@tzach - The video link is probably the most clear description but there is a concept of hybrid replexes that aid in recovery of either the original partitioned or the secondary indexed data with some trade-offs in the recovery taking longer or shorter.

You're right though that if it's necessary to sustain the exact same latencies through a failure then the best option is to just have an entire replicated copy of the data, in which case that is what MVs do since they just act as differently keyed tables. This is basically what HyperDex does too, Replex is an approach to lessen the storage requirements and improve some recovery characteristics that I feel is interesting if MVs are not the foundation for all secondary index types.

@doublex - Perhaps, but the language is just an implementation detail, the same way Cassandra the wide-column store is implemented in both Java or C++ with Scylla.

manigandham commented Nov 6, 2016

@tzach - The video link is probably the most clear description but there is a concept of hybrid replexes that aid in recovery of either the original partitioned or the secondary indexed data with some trade-offs in the recovery taking longer or shorter.

You're right though that if it's necessary to sustain the exact same latencies through a failure then the best option is to just have an entire replicated copy of the data, in which case that is what MVs do since they just act as differently keyed tables. This is basically what HyperDex does too, Replex is an approach to lessen the storage requirements and improve some recovery characteristics that I feel is interesting if MVs are not the foundation for all secondary index types.

@doublex - Perhaps, but the language is just an implementation detail, the same way Cassandra the wide-column store is implemented in both Java or C++ with Scylla.

@tzach

This comment has been minimized.

Show comment
Hide comment
@tzach

tzach Mar 23, 2017

Contributor

related: Secondary index support for key-value pairs in CQL3 maps https://issues.apache.org/jira/browse/CASSANDRA-8473

Contributor

tzach commented Mar 23, 2017

related: Secondary index support for key-value pairs in CQL3 maps https://issues.apache.org/jira/browse/CASSANDRA-8473

@tzach

This comment has been minimized.

Show comment
Hide comment
@tzach

tzach Mar 26, 2017

Contributor

related:

Contributor

tzach commented Mar 26, 2017

related:

@jvsinclair

This comment has been minimized.

Show comment
Hide comment
@jvsinclair

jvsinclair Sep 8, 2017

What is the Status of Secondary Indexes? Will they ever be supported? This is the main issue keeping our company from switching from Cassandra.

jvsinclair commented Sep 8, 2017

What is the Status of Secondary Indexes? Will they ever be supported? This is the main issue keeping our company from switching from Cassandra.

@dorlaor

This comment has been minimized.

Show comment
Hide comment
@dorlaor

dorlaor Sep 8, 2017

Contributor

SI (Secondary indexes) is in progress. @penberg just recently sent an important patchset which makes it 70% there and there isn't much to do in terms of coding. Since our implementation relies on our Materialized View feature, we need it to reach general availability maturity level. It's in experimental mode today and misses couple of important pieces like repair and some more.

I think it's fair to say that SI will be committed as experimental in Oct and get GAed through the rest of the year.

Contributor

dorlaor commented Sep 8, 2017

SI (Secondary indexes) is in progress. @penberg just recently sent an important patchset which makes it 70% there and there isn't much to do in terms of coding. Since our implementation relies on our Materialized View feature, we need it to reach general availability maturity level. It's in experimental mode today and misses couple of important pieces like repair and some more.

I think it's fair to say that SI will be committed as experimental in Oct and get GAed through the rest of the year.

@doublex

This comment has been minimized.

Show comment
Hide comment
@doublex

doublex Sep 14, 2017

@jvsinclair
We are also waiting for "2i" (switching from "Berkeley DB")

A little joke:

aroundcorner

doublex commented Sep 14, 2017

@jvsinclair
We are also waiting for "2i" (switching from "Berkeley DB")

A little joke:

aroundcorner

@penberg penberg assigned penberg and unassigned nyh Sep 15, 2017

@penberg

This comment has been minimized.

Show comment
Hide comment
@penberg

penberg Sep 15, 2017

Contributor

You can find our working prototype of SI over MV here:

https://github.com/penberg/scylla/commits/penberg/cql-2i-queries/wip

The prototype creates a materialized view under the hood on CREATE INDEX and SELECT statement restrictions on indexed columns use the backing view. It will still take a little while for the code to be merged to master, after which it will be available in our nightly build. The target for experimental version is still Scylla 2.2 and GA once all the issues are ironed out. The SI implementation relies on MV so that also needs to be GA for that to happen.

It will still take some time, but hopefully not as long as @doublex's picture suggests.

Contributor

penberg commented Sep 15, 2017

You can find our working prototype of SI over MV here:

https://github.com/penberg/scylla/commits/penberg/cql-2i-queries/wip

The prototype creates a materialized view under the hood on CREATE INDEX and SELECT statement restrictions on indexed columns use the backing view. It will still take a little while for the code to be merged to master, after which it will be available in our nightly build. The target for experimental version is still Scylla 2.2 and GA once all the issues are ironed out. The SI implementation relies on MV so that also needs to be GA for that to happen.

It will still take some time, but hopefully not as long as @doublex's picture suggests.

@jvsinclair

This comment has been minimized.

Show comment
Hide comment
@jvsinclair

jvsinclair Sep 19, 2017

@penberg that is great news. If you want help testing this when your ready let me know.

jvsinclair commented Sep 19, 2017

@penberg that is great news. If you want help testing this when your ready let me know.

@penberg

This comment has been minimized.

Show comment
Hide comment
@penberg

penberg Nov 4, 2017

Contributor

First experimental version of CQL indexed queries landed in master in commit d8e0b47. There's also a blog post on the feature, detailing how we're implementing SI for Scylla. The main limitation right now is that it only works on regular columns and only indexes new data as per MV implementation limitations.

Contributor

penberg commented Nov 4, 2017

First experimental version of CQL indexed queries landed in master in commit d8e0b47. There's also a blog post on the feature, detailing how we're implementing SI for Scylla. The main limitation right now is that it only works on regular columns and only indexes new data as per MV implementation limitations.

@ndelvalle

This comment has been minimized.

Show comment
Hide comment
@ndelvalle

ndelvalle Aug 1, 2018

Is this implemented on version 2.2? I'm trying to create an index as shown in the blog post using version 2.2.0-0.20180705.240b9f122 but I get the following error:

InvalidRequest: Error from server: code=2200 [Invalid query] message="Index support is not enabled"

ndelvalle commented Aug 1, 2018

Is this implemented on version 2.2? I'm trying to create an index as shown in the blog post using version 2.2.0-0.20180705.240b9f122 but I get the following error:

InvalidRequest: Error from server: code=2200 [Invalid query] message="Index support is not enabled"
@dorlaor

This comment has been minimized.

Show comment
Hide comment
@dorlaor

dorlaor Aug 1, 2018

Contributor
Contributor

dorlaor commented Aug 1, 2018

@tzach

This comment has been minimized.

Show comment
Hide comment
Contributor

tzach commented Aug 8, 2018

@ndelvalle

This comment has been minimized.

Show comment
Hide comment
@ndelvalle

ndelvalle Aug 13, 2018

Cool, thanks @tzach!
Is there a roadmap or something where I can check when I would be able to use this safely? I want to build my models using it, but I don't want to use it on production.
Thanks in advance!

ndelvalle commented Aug 13, 2018

Cool, thanks @tzach!
Is there a roadmap or something where I can check when I would be able to use this safely? I want to build my models using it, but I don't want to use it on production.
Thanks in advance!

@tzach

This comment has been minimized.

Show comment
Hide comment
@tzach

tzach Aug 13, 2018

Contributor

Sec index will be feature complete on the next release, Scylla 2.4, and production ready (after more testing) in the release after.

Contributor

tzach commented Aug 13, 2018

Sec index will be feature complete on the next release, Scylla 2.4, and production ready (after more testing) in the release after.

@LuaiKamel

This comment has been minimized.

Show comment
Hide comment
@LuaiKamel

LuaiKamel Aug 28, 2018

@tzach do you mean the full secondary index features(LIKE SASI OPERATION in CUSTOM INDEX which enable full text search) will be supported in 2.4?

LuaiKamel commented Aug 28, 2018

@tzach do you mean the full secondary index features(LIKE SASI OPERATION in CUSTOM INDEX which enable full text search) will be supported in 2.4?

@LuaiKamel

This comment has been minimized.

Show comment
Hide comment
@LuaiKamel

LuaiKamel Aug 28, 2018

@dorlaor @tzach you guys should check this paper regarding efficient secondary index on LSM based storage (scylla like)

LuaiKamel commented Aug 28, 2018

@dorlaor @tzach you guys should check this paper regarding efficient secondary index on LSM based storage (scylla like)

@nyh

This comment has been minimized.

Show comment
Hide comment
@nyh

nyh Aug 28, 2018

Contributor

@LuaiKamel the secondary index feature being developed for Scylla 2.4 is meant to be (more-or-less) compatible with Cassandra's original secondary index feature, not with SASI. It will not support, at least not initially, full text search, prefix or substring search, or numeric range searches.

Thinking ahead, adding support in our implementation for search-engine-style full text search should be fairly straightforward (it requires tokenizing and stemming of each string and indexing all the resulting words separately) but support for efficient prefix/suffix/substring/range searches will require additional changes, like using an order-preserving partitioner to allow efficiently scanning a range of index partitions.

Contributor

nyh commented Aug 28, 2018

@LuaiKamel the secondary index feature being developed for Scylla 2.4 is meant to be (more-or-less) compatible with Cassandra's original secondary index feature, not with SASI. It will not support, at least not initially, full text search, prefix or substring search, or numeric range searches.

Thinking ahead, adding support in our implementation for search-engine-style full text search should be fairly straightforward (it requires tokenizing and stemming of each string and indexing all the resulting words separately) but support for efficient prefix/suffix/substring/range searches will require additional changes, like using an order-preserving partitioner to allow efficiently scanning a range of index partitions.

@tzach

This comment has been minimized.

Show comment
Hide comment
@tzach

tzach Aug 28, 2018

Contributor

@nyh @LuaiKamel there is a SASI ticket #2203 for further discussion

Contributor

tzach commented Aug 28, 2018

@nyh @LuaiKamel there is a SASI ticket #2203 for further discussion

@doublex

This comment has been minimized.

Show comment
Hide comment
@doublex

doublex Sep 8, 2018

@nyh
Is the secondary index implemented as "distributed hash table" or as "distributed heap table" (=sorted)?

doublex commented Sep 8, 2018

@nyh
Is the secondary index implemented as "distributed hash table" or as "distributed heap table" (=sorted)?

@nyh

This comment has been minimized.

Show comment
Hide comment
@nyh

nyh Sep 12, 2018

Contributor

@doublex in our implementation the secondary index is a materialized view, i.e., an ordinary Scylla table, and as such it is what I guess you called a "distributed hash table". The keys held by each node are sorted, but the sort order is a token (by default - a hash function, the so-called Murmur3 hash of the key), so walking on these keys in that order is not helpful for implementing a range search (e.g., give me all keys alphabetically between the strings "cat" and "dog").
We could use a different partitioner - what determines the tokens - not the Mumur3 partitioner but something which preserves order. But we haven't yet tried doing that. One big problem is that doing this can also have negative implications on load balancing of nodes since if the data isn't distributed randomly between nodes, the load may be skewed more to some of the nodes...

Contributor

nyh commented Sep 12, 2018

@doublex in our implementation the secondary index is a materialized view, i.e., an ordinary Scylla table, and as such it is what I guess you called a "distributed hash table". The keys held by each node are sorted, but the sort order is a token (by default - a hash function, the so-called Murmur3 hash of the key), so walking on these keys in that order is not helpful for implementing a range search (e.g., give me all keys alphabetically between the strings "cat" and "dog").
We could use a different partitioner - what determines the tokens - not the Mumur3 partitioner but something which preserves order. But we haven't yet tried doing that. One big problem is that doing this can also have negative implications on load balancing of nodes since if the data isn't distributed randomly between nodes, the load may be skewed more to some of the nodes...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment