Does each Elassandra node only have index information for its own node, and are there any benchmarks? #35
Comments
Yes, each node only indexes the data that it already has locally. Looking at the DataStax docs, it IS currently possible to specify which node to query. Just like in normal Cassandra, when you filter by partition key(s), or by the _routing field in ES, you will hit a minimum number of nodes. If you don't use partition/_routing, you will hit all nodes.
So, is it possible to execute a single query that uses both partition or routing keys (to pick the right Cassandra node) and the SOLR indices on that specific node? That seems to be a good compromise, and it matches how most Cassandra tables are designed! Are there any such example queries?
Yes, you can search using the Elasticsearch REST API and use _routing like you normally do. I don't know what SOLR has to do with it; it's not included.
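For illustration, a routed search like the one described above could be built as follows. This is only a minimal sketch: the host `http://localhost:9200`, the index name `my_index`, the routing value `user42` and the `status` field are all made-up placeholders, and the helper `build_routed_search` is hypothetical. It relies only on the standard `routing` query-string parameter of the Elasticsearch search API and builds the request without sending it.

```python
import json
from urllib.parse import urlencode


def build_routed_search(base_url, index, routing_value, query):
    """Return (url, body) for a search limited by a routing value.

    The routing value plays the role of the Cassandra partition key,
    so the search should only reach the nodes that own that key.
    """
    params = urlencode({"routing": routing_value})
    url = f"{base_url.rstrip('/')}/{index}/_search?{params}"
    body = json.dumps({"query": query})
    return url, body


# Placeholder values, not taken from the thread.
url, body = build_routed_search(
    "http://localhost:9200",
    "my_index",
    "user42",
    {"term": {"status": "active"}},
)
print(url)
print(body)
```

The resulting URL and JSON body can then be sent with any HTTP client as a normal search request.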
Sorry, I meant Elasticsearch. Wouldn't the REST API be much slower than native protocol calls?
For now you have to use the REST API, since it's not yet possible to make the search query from the CQL interface. See here: #14
Thanks so much! What about any benchmarks with other indices?
ddorian is right. You can also set preference=_only_local in your query string to search only on a specific node, or even specify your token_ranges to restrict the search to the nodes hosting these ranges.

The search on the Cassandra ring is managed by an index.default_search_strategy class. By default, this search strategy is org.elasticsearch.cassandra.cluster.routing.PrimaryFirstSearchStrategy, which distributes a search to all alive nodes in the datacenter. If a node is unavailable, its primary token ranges are served by an available replica. This strategy class is a new index setting in Elassandra. Another strategy could be to search on N / RF nodes (where N is the number of nodes in your Elassandra datacenter and RF is its replication factor) to cover 100% of the ring with a minimum number of nodes. To do that, you just have to subclass AbstractSearchStrategy.

For search performance, there is a Cassandra overhead compared to a regular Elasticsearch. Fetching a field or the _source requires a Cassandra read, which uses the cache according to your Cassandra configuration (the key cache and the off-heap row cache may help). BTW, I still have to do some benchmarks.
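As a rough sketch of two points from the comment above: the first helper builds a search URL carrying preference=_only_local, and the second computes the N / RF node count mentioned for an alternative strategy. Both function names, the host and the index name are hypothetical placeholders. Rounding N / RF up is an assumption for the case where N is not a multiple of RF, and the result is an idealized lower bound, since whether that few nodes can really cover the ring depends on how token ranges and replicas are placed.

```python
import math
from urllib.parse import urlencode


def local_only_search_url(base_url, index):
    """Build a search URL that asks for local-only execution."""
    params = urlencode({"preference": "_only_local"})
    return f"{base_url.rstrip('/')}/{index}/_search?{params}"


def min_nodes_to_cover_ring(n_nodes, replication_factor):
    """Idealized N / RF node count, rounded up (assumption)."""
    if n_nodes < 1 or replication_factor < 1:
        raise ValueError("n_nodes and replication_factor must be >= 1")
    return math.ceil(n_nodes / replication_factor)


# Placeholder host and index name, not taken from the thread.
print(local_only_search_url("http://localhost:9200", "my_index"))
# A 6-node datacenter with RF = 3 would need at least 2 nodes.
print(min_nodes_to_cover_ring(6, 3))
```

By comparison, the default strategy described above queries all alive nodes, so in the 6-node example it would contact 6 nodes instead of 2.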
Thanks!
In addition, you can use the Elasticsearch binary protocol to query Elassandra with the Java Elasticsearch client API, or even the Elasticsearch JDBC driver sql4es; see http://doc.elassandra.io/en/latest/integration.html#jdbc-driver-sql4es-elassandra This driver translates SQL into an Elasticsearch binary request.
Does each Elassandra node only have index information for its own node, i.e., does it index only the data it owns according to the node's assigned partition keys/tokens?
We had a very bad experience with our old DSE Cassandra 1.2 + SOLR setup: any SOLR query would fan out to all nodes, because each node contains only its own index information, and when you have an index on non-partition-key columns, the indexed data "has to be" searched on all nodes. The more nodes, the higher the IO and the slower the calls.
I could not find any documentation on how it fares on throughput, or any benchmarks or comparisons with other Cassandra SOLR indices out there.
http://elassandra.readthedocs.io/en/latest/integration.html