Describe internals, how es segments/source/fields are stored/merged and search-speed compared to normal-es #24
Comments
Now that I reread it, it says "elasticsearch on top of cassandra". Could you add more description of the internals?
Hi. Yes, _source is stored in Cassandra: each field is a Cassandra column, stored in SSTables. Lucene files are still managed by the original Elasticsearch code, and search features remain unchanged. During the fetch phase of a search, the requested fields are retrieved from the underlying Cassandra table through a CQL request. If you update a Cassandra cell, a secondary index rebuilds the document from the updated row and indexes it in Elasticsearch, so search itself is unchanged. To avoid duplicate results when the Cassandra replication factor is greater than 1, every document is indexed with a token field (the murmur3 hash of the partition key), and a token filter is added to every search request. This filter is computed from the routing table of the coordinator node; see token_ranges. Hope this helps.
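To make the token filter idea concrete, here is a minimal sketch in Python. It is not the Elassandra code: the three-node ring, the token values, the `replicas` and `search` helpers, and the assumption that every node is up (so each node answers only for its primary range) are all made up for illustration. Tokens are given as plain integers rather than computed with murmur3.

```python
from bisect import bisect_left

# Hypothetical ring: each node's token marks the END of its primary range.
RING = [(-3000, "node1"), (0, "node2"), (3000, "node3")]
TOKENS = [t for t, _ in RING]

# Hypothetical documents, mapped to a precomputed token
# (a stand-in for murmur3 of the partition key).
DOCS = {"a": -4000, "b": -10, "c": 1500, "d": 5000}


def replicas(token, rf=2):
    """Return the nodes holding a token: the primary owner first,
    then the next nodes clockwise on the ring."""
    first = bisect_left(TOKENS, token)  # first node whose token >= token
    return [RING[(first + k) % len(RING)][1] for k in range(rf)]


def local_docs(node):
    """Documents indexed locally on a node (primary and replica copies)."""
    return {doc for doc, tok in DOCS.items() if node in replicas(tok)}


def search(token_filter):
    """Query every node; with the filter on, a node only returns
    documents whose token falls in its primary range."""
    hits = []
    for _, node in RING:
        for doc in sorted(local_docs(node)):
            if not token_filter or replicas(DOCS[doc])[0] == node:
                hits.append(doc)
    return hits
```

With a replication factor of 2, `search(False)` returns every document twice, while `search(True)` returns each document exactly once, because the per-node token ranges cover the ring without overlapping.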
Do you do something like what DataStax Solr does to remove duplicates in global queries?
I mean, you say you do, but on this page https://github.com/vroyer/elassandra/blob/master/cross-datacenter-replication.md you say:
And isn't ElasticHQ doing just a count there? Doesn't the count have automatic token filtering? Thanks
ElasticHQ issues a stats request and gets shard information, including the size on disk of primary shards, so no filtering is involved. Of course, this is erroneous in Elassandra, because a shard may contain data for both primary and non-primary token ranges.
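A small Python sketch of why unfiltered per-shard figures overstate the data. The shard contents and both helper functions are hypothetical, and document counts stand in for the stats figures; the only point is that, with a replication factor of 2, each document sits in two shards.

```python
# Hypothetical shard contents with replication factor 2:
# every document id is present on two nodes.
SHARDS = {
    "node1": {"a", "c", "d"},
    "node2": {"a", "b", "d"},
    "node3": {"b", "c"},
}


def stats_doc_count(shards):
    """Sum per-shard counts with no token filtering applied."""
    return sum(len(docs) for docs in shards.values())


def distinct_doc_count(shards):
    """Count each document once, whichever shards hold it."""
    return len(set().union(*shards.values()))
```

Here `stats_doc_count(SHARDS)` gives 8 while `distinct_doc_count(SHARDS)` gives 4, the same factor of 2 as the replication factor.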
Is _source also saved in Elasticsearch? If so, is it possible to disable that and store it only in Cassandra?