
Does each Elassandra node only have index information for its own node, and are there any benchmarks? #35

Closed
krishna81m opened this issue Jul 13, 2016 · 9 comments

krishna81m commented Jul 13, 2016

Does each Elassandra node only have index information for its own data, i.e., does it index only the data it owns through the node's assigned partition keys/tokens?

We had a very bad experience with our old DSE Cassandra 1.2 + Solr setup: any Solr query would fan out to all nodes, because each node only held index information for its own data, so when you index non-partition-key columns, the indexed data has to be searched on every node. The more nodes you have, the higher the I/O and the slower the calls.

I could not find any documentation on throughput, nor any benchmarks or comparisons with other Cassandra Solr indexes:
http://elassandra.readthedocs.io/en/latest/integration.html

ddorian commented Jul 13, 2016

Yes, each node only indexes the data it already has locally.

Looking at the DataStax docs, it IS currently possible to specify which node to query.

Just like in normal Cassandra, when you filter by partition key(s), or by the _routing field in Elasticsearch, you hit the minimum number of nodes.

If you don't use the partition key or _routing, you hit all nodes.
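A toy model of the difference, assuming a simplified 6-node ring where each partition key hashes to one token and is replicated on the next RF nodes clockwise (SimpleStrategy-style). The node names, tokens, RF and the SHA-256 stand-in for Cassandra's Murmur3 partitioner are all hypothetical; this only illustrates why a routed query touches one replica while an unrouted one fans out to every node.

```python
import bisect
import hashlib
from typing import List, Optional

# Hypothetical ring: each node owns the token range ending at its token.
NODES = ["n1", "n2", "n3", "n4", "n5", "n6"]
TOKENS = [100, 200, 300, 400, 500, 600]
RING_SIZE = 600
RF = 3  # replication factor

def token_for(partition_key: str) -> int:
    # Stand-in for the Murmur3 partitioner: any stable hash works for this sketch.
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE

def replicas(token: int) -> List[str]:
    # The first node whose token is >= the key's token is the primary;
    # the next RF-1 nodes clockwise hold the replicas.
    i = bisect.bisect_left(TOKENS, token) % len(NODES)
    return [NODES[(i + k) % len(NODES)] for k in range(RF)]

def nodes_to_query(partition_key: Optional[str]) -> List[str]:
    if partition_key is None:
        # No partition key / _routing: every node holds part of the index.
        return list(NODES)
    # Routed query: any single replica of that token can answer.
    return replicas(token_for(partition_key))[:1]
```

`nodes_to_query("user42")` returns a single node, while `nodes_to_query(None)` returns all six.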

krishna81m (author)

So is it possible to execute a single query that uses both the partition or routing keys (to pick the right Cassandra node) and the Solr indexes on that specific node? That seems like a good compromise, and most Cassandra table designs are built that way. Are there any example queries?

ddorian commented Jul 13, 2016

Yes, you can search through the Elasticsearch REST API and use _routing as you normally would. I'm not sure what Solr has to do with it; Solr is not included.
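A minimal sketch of a routed search request, built with the standard library but not sent. It uses the standard Elasticsearch `routing` query-string parameter on `_search`; the host `localhost:9200`, the index name `twitter`, the field `message` and the assumption that the routing value is the Cassandra partition key are all hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: Elassandra's HTTP endpoint on localhost:9200.
BASE_URL = "http://localhost:9200"

def routed_search(index: str, routing: str, query: dict) -> Request:
    """Build (but do not send) a search request restricted by a routing value."""
    params = urlencode({"routing": routing})
    url = f"{BASE_URL}/{index}/_search?{params}"
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")

# Hypothetical index "twitter" routed by the partition key "user42".
req = routed_search("twitter", "user42", {"match": {"message": "elassandra"}})
# urllib.request.urlopen(req) would send it to a running node.
```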

krishna81m (author)

Sorry, I meant Elasticsearch. Wouldn't the REST API be much slower than native-protocol calls?

ddorian commented Jul 13, 2016

For now you have to use the REST API, since it is not yet possible to run a search query from the CQL interface. See #14.

krishna81m (author)

Thanks so much! What about benchmarks against other indexes?

vroyer (collaborator) commented Jul 13, 2016

ddorian is right.

You can also set preference=_only_local in your query string to search only on a specific node, or even specify your token_ranges to restrict the search to the nodes hosting those ranges.
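A minimal sketch of the `preference=_only_local` case. Because the receiving node answers from its own shards, the URL must target the node whose data you want searched. The node address, index name and query string are hypothetical, and the Elassandra-specific token_ranges option is deliberately not shown, since its exact syntax is not given here.

```python
from urllib.parse import urlencode

def local_only_search_url(node: str, index: str, q: str) -> str:
    # preference=_only_local restricts the search to shards on the receiving node,
    # so the request has to be sent to that node directly.
    params = urlencode({"preference": "_only_local", "q": q})
    return f"http://{node}:9200/{index}/_search?{params}"

# Hypothetical node 10.0.0.3 and index "twitter".
url = local_only_search_url("10.0.0.3", "twitter", "message:elassandra")
```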

In addition, the search across the Cassandra ring is managed by the index.default_search_strategy class. By default, this search strategy is org.elasticsearch.cassandra.cluster.routing.PrimaryFirstSearchStrategy, which distributes a search to all alive nodes in the datacenter. If a node is unavailable, its primary token ranges are served by an available replica. This strategy class is a new index setting in Elassandra.
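A toy Python model of the primary-first behaviour described above, not the actual PrimaryFirstSearchStrategy code. It assumes one token range per node (no vnodes), SimpleStrategy-style placement on the next RF-1 nodes clockwise, and names each range after its primary owner; the function name and node names are hypothetical.

```python
from typing import Dict, List, Set

def primary_first_plan(ring: List[str], alive: Set[str], rf: int) -> Dict[str, List[str]]:
    """Every alive node searches its own primary range; a dead node's primary
    range is handed to the first alive replica clockwise."""
    plan: Dict[str, List[str]] = {n: [] for n in ring if n in alive}
    for i, owner in enumerate(ring):
        # Replicas of owner's primary range: owner plus the next rf-1 nodes.
        candidates = [ring[(i + k) % len(ring)] for k in range(rf)]
        server = next((n for n in candidates if n in alive), None)
        if server is None:
            raise RuntimeError(f"range of {owner} has no alive replica")
        plan[server].append(owner)
    return plan

# n3 is down, so its primary range falls back to its next replica, n4.
plan = primary_first_plan(["n1", "n2", "n3", "n4"], alive={"n1", "n2", "n4"}, rf=2)
# {'n1': ['n1'], 'n2': ['n2'], 'n4': ['n3', 'n4']}
```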

Another strategy could be to search on N / RF nodes (where N is the number of nodes in your Elassandra datacenter and RF its replication factor) to cover 100% of the ring with the minimum number of nodes. To do that, you just have to subclass AbstractSearchStrategy.
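The N / RF arithmetic can be checked with a toy Python model. It is not an AbstractSearchStrategy subclass and assumes a single datacenter, all nodes alive, one token range per node and SimpleStrategy-style placement, where node j holds the primary ranges of itself and of the RF-1 nodes before it. Under those assumptions, querying every RF-th node covers the ring with ceil(N / RF) nodes (exactly N / RF when RF divides N). The function name is hypothetical.

```python
import math
from typing import Dict, List, Set

def minimal_cover(ring: List[str], rf: int) -> Dict[str, List[str]]:
    """Pick every rf-th node so each primary range is searched exactly once."""
    n = len(ring)
    if not 1 <= rf <= n:
        raise ValueError("rf must be between 1 and the number of nodes")
    plan: Dict[str, List[str]] = {}
    covered: Set[str] = set()
    # Start at index rf-1 so the first chosen node covers ranges 0..rf-1.
    for j in range(rf - 1, rf - 1 + math.ceil(n / rf) * rf, rf):
        node = ring[j % n]
        # Node j replicates its own range and the rf-1 ranges preceding it.
        ranges = [ring[(j - k) % n] for k in range(rf) if ring[(j - k) % n] not in covered]
        covered.update(ranges)
        plan[node] = ranges
    if len(covered) != n:
        raise RuntimeError("ring not fully covered")
    return plan

# N=6, RF=3: two nodes (n3 and n6) cover the whole ring.
plan = minimal_cover(["n1", "n2", "n3", "n4", "n5", "n6"], rf=3)
```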

As for search performance, there is a Cassandra overhead compared to regular Elasticsearch: fetching a field or the _source requires a Cassandra read, which uses the caches according to your Cassandra configuration (the key cache and the off-heap row cache may help). By the way, I still have to run some benchmarks.

krishna81m (author)

Thanks!

vroyer (collaborator) commented Jul 23, 2016

In addition, you can use the Elasticsearch binary protocol to query Elassandra with the Java Elasticsearch client API, or even with the Elasticsearch JDBC driver sql4es; see http://doc.elassandra.io/en/latest/integration.html#jdbc-driver-sql4es-elassandra

This driver translates SQL to an elasticsearch binary request.
