Does each Elassandra node only have index information for its own data, and are there any benchmarks? #35

krishna81m · 2016-07-13T02:50:12Z

Does each Elassandra node only have index information for its own data, i.e., does it index only the data it owns according to the node's assigned partition keys/tokens?

We had a very bad experience with our old DSE Cassandra 1.2 + Solr setup: every Solr query fanned out to all nodes. Each node held index information only for its own data, so when you index non-partition-key columns, the indexed data has to be searched on all nodes. The more nodes you have, the higher the I/O and the slower the calls.

I could not find any documentation on its throughput, nor any benchmarks or comparisons with other Cassandra/Solr indexing solutions.
http://elassandra.readthedocs.io/en/latest/integration.html

ddorian · 2016-07-13T10:07:42Z

Yes, each node only indexes the data it already has locally.

Looking at the DataStax docs, it IS currently possible to specify which node to query.

Just like in regular Cassandra, when you filter by the partition key(s), or by the _routing field in Elasticsearch, you hit the minimum number of nodes.

If you don't use the partition key or _routing, you hit all nodes.
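To make the difference concrete, here is a toy model of which nodes a search can touch. It is a minimal sketch, assuming one token per node (no vnodes) and SimpleStrategy-style placement (the node owning the partition token plus the next RF - 1 nodes clockwise). The MD5-based `token_for` is a hypothetical stand-in for Cassandra's Murmur3 partitioner, and none of these names come from Elassandra itself.

```python
import bisect
import hashlib
from typing import List, Optional


def token_for(partition_key: str, ring_size: int = 2**32) -> int:
    # Hypothetical stand-in for Cassandra's Murmur3Partitioner:
    # any stable hash onto the ring is enough for this illustration.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % ring_size


def replicas_for(token: int, node_tokens: List[int], rf: int) -> List[int]:
    """Indexes of nodes holding `token`, given node tokens sorted ascending.

    The first replica is the first node whose token is >= the partition token
    (wrapping to node 0 past the end); the others are the next rf - 1 nodes.
    """
    n = len(node_tokens)
    first = bisect.bisect_left(node_tokens, token) % n
    return [(first + i) % n for i in range(min(rf, n))]


def nodes_to_query(node_tokens: List[int], rf: int,
                   partition_key: Optional[str] = None) -> List[int]:
    if partition_key is None:
        # No partition key / _routing: every node holds part of the index.
        return list(range(len(node_tokens)))
    # Routed search: any single one of these replicas can answer.
    return replicas_for(token_for(partition_key), node_tokens, rf)
```

With 4 nodes and RF = 2, an unrouted search touches all 4 nodes, while a routed one only needs one of the 2 replicas of that key.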

krishna81m · 2016-07-13T20:03:21Z

So, is it possible to execute a single query that uses both the partition or routing key (to pick the right Cassandra node) and the search index on that specific node? That seems like a good compromise, and most Cassandra table designs are built around partition keys anyway. Are there any example queries?

ddorian · 2016-07-13T20:09:54Z

Yes, you can search through the Elasticsearch REST API and use _routing as you normally would. I'm not sure where Solr comes in; it isn't included.
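A minimal sketch of such a combined request, assuming a single-column partition key whose value is used as the routing value, an index named after the keyspace, and the default HTTP port 9200. The names `my_keyspace`, `description` and `user42` are hypothetical placeholders. The `routing` URL parameter and the `match` query are standard Elasticsearch search syntax; the sketch only builds the request rather than sending it.

```python
import json
from urllib.parse import urlencode


def routed_search_request(host: str, index: str, routing_value: str,
                          field: str, text: str):
    """Return (url, body) for a search that is restricted by routing
    (partition key) and filtered on a non-key indexed field."""
    url = f"http://{host}:9200/{index}/_search?" + urlencode({"routing": routing_value})
    body = json.dumps({"query": {"match": {field: text}}})
    return url, body


url, body = routed_search_request("localhost", "my_keyspace", "user42",
                                  "description", "cassandra")
```

The resulting URL and JSON body can then be sent as a normal `_search` request with any HTTP client.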

krishna81m · 2016-07-13T20:14:36Z

Sorry, I meant Elasticsearch. Wouldn't the REST API be much slower than native protocol calls?

ddorian · 2016-07-13T20:27:37Z

For now you have to use the REST API, since it's not yet possible to run a search query from the CQL interface. See #14.

krishna81m · 2016-07-13T20:41:23Z

Thanks so much! What about benchmarks against other indexing solutions?

vroyer · 2016-07-13T22:20:14Z

ddorian is right.

You can also set preference=_only_local in your query string to search only on a specific node, or even specify your token_ranges to restrict the search to the nodes hosting those ranges.
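As a minimal sketch, a node-local search is an ordinary URI search sent to the node you want, with `preference=_only_local` added. The host `10.0.0.1`, index `my_keyspace` and query `status:active` are hypothetical placeholders; `q` and `preference` are standard Elasticsearch search parameters. The token_ranges option is not shown because its exact syntax is not given here.

```python
from urllib.parse import urlencode


def local_search_url(node_host: str, index: str, query_string: str) -> str:
    # Send this to the node whose local data you want to search;
    # _only_local tells Elasticsearch not to fan out to other nodes.
    params = {"q": query_string, "preference": "_only_local"}
    return f"http://{node_host}:9200/{index}/_search?{urlencode(params)}"


url = local_search_url("10.0.0.1", "my_keyspace", "status:active")
```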

In addition, the search across the Cassandra ring is managed by the class set in the index.default_search_strategy index setting. By default, this search strategy is org.elasticsearch.cassandra.cluster.routing.PrimaryFirstSearchStrategy, which distributes a search to all alive nodes in the datacenter. If a node is unavailable, its primary token ranges are served by available replicas. This strategy setting is specific to Elassandra.
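The primary-first behaviour can be modelled in a few lines. This is not Elassandra's PrimaryFirstSearchStrategy code, only a toy sketch assuming one token range per node (no vnodes) and SimpleStrategy-style placement, where range `r` is primarily owned by node `r` and replicated on the next RF - 1 nodes. The function name `primary_first_plan` is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def primary_first_plan(n_nodes: int, rf: int,
                       alive: Set[int]) -> Tuple[Dict[int, List[int]], List[int]]:
    """Assign each token range to its primary node if alive, otherwise to the
    first alive replica. Returns (node -> ranges it searches, unavailable ranges)."""
    plan: Dict[int, List[int]] = {}
    unavailable: List[int] = []
    for r in range(n_nodes):
        replicas = [(r + i) % n_nodes for i in range(rf)]  # replicas[0] is the primary
        server = next((node for node in replicas if node in alive), None)
        if server is None:
            unavailable.append(r)  # every replica of this range is down
        else:
            plan.setdefault(server, []).append(r)
    return plan, unavailable
```

With 4 nodes, RF = 2 and node 1 down, node 2 searches both its own primary range and range 1, so the whole ring is still covered.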

Another strategy could be to search on N / RF nodes (where N is the number of nodes in your Elassandra datacenter and RF its replication factor) to cover 100% of the ring with a minimum number of nodes. To do that, you just have to subclass AbstractSearchStrategy.
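The node-selection arithmetic such a subclass might perform can be sketched as follows. This does not use the AbstractSearchStrategy API; it assumes one token range per node and SimpleStrategy-style placement, so node `k` holds ranges `k - RF + 1 .. k`. Picking every RF-th node then covers the ring with ceil(N / RF) nodes, which is exactly N / RF when N is a multiple of RF. Both function names are hypothetical.

```python
from typing import List, Set


def minimal_covering_nodes(n_nodes: int, rf: int) -> List[int]:
    """Greedily pick nodes so that every token range is held by at least one."""
    if not 1 <= rf <= n_nodes:
        raise ValueError("need 1 <= rf <= n_nodes")
    chosen: List[int] = []
    r = 0  # first range not yet covered
    while r < n_nodes:
        # Node (r + rf - 1) holds replicas of ranges r .. r + rf - 1 (mod N).
        chosen.append((r + rf - 1) % n_nodes)
        r += rf
    return chosen


def ranges_held_by(nodes: List[int], n_nodes: int, rf: int) -> Set[int]:
    """Token ranges replicated on the given nodes under the same placement."""
    held: Set[int] = set()
    for k in nodes:
        held.update((k - i) % n_nodes for i in range(rf))
    return held
```

For 6 nodes with RF = 3 this selects nodes 2 and 5; for 7 nodes it needs a third node (node 1) to cover the leftover range.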

For search performance, there is a Cassandra overhead compared to regular Elasticsearch. Fetching a field or the _source requires a Cassandra read, served from the caches according to your Cassandra configuration (the key cache and off-heap row cache may help). BTW, I still need to run some benchmarks.
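If fetching fields or the _source is what triggers the Cassandra read, a search that only needs matching document ids and scores can ask Elasticsearch not to return the source. A minimal sketch, assuming (not verified here) that Elassandra skips the Cassandra read when `"_source": false` is set; `description` is a hypothetical field name, while `_source` and `size` are standard Elasticsearch request-body options.

```python
import json


def ids_only_search_body(field: str, text: str, size: int = 10) -> str:
    return json.dumps({
        "query": {"match": {field: text}},
        # Do not return _source: per the comment above, fetching it
        # would require a Cassandra read for every hit.
        "_source": False,
        "size": size,
    })


body = ids_only_search_body("description", "cassandra")
```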

krishna81m · 2016-07-13T23:05:10Z

Thanks!

vroyer · 2016-07-23T14:30:16Z

In addition, you can use the Elasticsearch binary protocol to query Elassandra with the Java Elasticsearch client API, or even with the Elasticsearch JDBC driver sql4es; see http://doc.elassandra.io/en/latest/integration.html#jdbc-driver-sql4es-elassandra

This driver translates SQL into an Elasticsearch binary request.

DBarthe added the question label Jul 4, 2017

vroyer closed this as completed Aug 22, 2017
