
Hardware sizing calculation [Question] #27

Closed
m1sta opened this issue Jun 8, 2016 · 4 comments

Comments


m1sta commented Jun 8, 2016

Has Elassandra been tested at scale? Are you able to provide any guidance to help calculate hardware requirements based on data set sizes?

vroyer (Collaborator) commented Jun 29, 2016

Hi,
You should have at least 32 GB of RAM per node, and configure the Elassandra heap to half the total RAM, up to a maximum of 30.5 GB. But if you are not aggregating on analyzed string fields, you can probably use a smaller heap to leave more memory for the file system cache.
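As a rough illustration of that rule of thumb (the 30.5 GB ceiling is commonly chosen to keep JVM compressed oops), here is a minimal sizing sketch in Python; the node RAM values are hypothetical inputs:

```python
# Heap sizing per the rule of thumb above: half of total RAM, capped at
# 30.5 GB, leaving the remainder to the OS file system cache.
def recommended_heap_gb(total_ram_gb: float, heap_cap_gb: float = 30.5) -> float:
    if total_ram_gb < 32:
        raise ValueError("the advice above assumes at least 32 GB of RAM per node")
    return min(total_ram_gb / 2, heap_cap_gb)

for ram_gb in (32, 64, 128):  # hypothetical node sizes
    print(f"{ram_gb} GB RAM -> {recommended_heap_gb(ram_gb)} GB heap")
```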

For writes, Elassandra's write throughput is roughly half that of plain Cassandra if you index all columns. In this scenario, the Cassandra and Elasticsearch files will be roughly the same size: for example, 100 GB of Cassandra data generates about 100 GB of Elasticsearch index files if you map all columns. But again, write throughput and Elasticsearch disk usage will depend on your Elasticsearch mapping.
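A back-of-the-envelope capacity sketch of those ratios; the 1:1 index-to-data ratio and the halved write throughput only apply when every column is mapped, so both are assumptions to adjust for your actual mapping:

```python
# Disk and write-throughput estimate from the rules of thumb above:
# with all columns mapped, Elasticsearch index files are roughly the
# same size as the Cassandra data and write throughput is roughly halved.
def estimate_footprint(cassandra_data_gb: float,
                       index_to_data_ratio: float = 1.0,
                       write_throughput_factor: float = 0.5) -> dict:
    es_index_gb = cassandra_data_gb * index_to_data_ratio
    return {
        "cassandra_gb": cassandra_data_gb,
        "elasticsearch_gb": es_index_gb,
        "total_disk_gb": cassandra_data_gb + es_index_gb,
        "relative_write_throughput": write_throughput_factor,
    }

# The example from the comment: 100 GB of Cassandra data, all columns mapped.
print(estimate_footprint(100))
```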

In your Cassandra schema, avoid huge indexed wide rows, because locking on wide rows can dramatically affect write performance.

For good search performance, keep shards under 50 GB each, and use a partitioned index to split a Cassandra table across more than one shard per node; see http://doc.elassandra.io/en/latest/mapping.html#partitioned-index.
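A small sketch of the shard arithmetic implied here; the 50 GB target comes from the comment, while the table size and node count are hypothetical inputs. When the result works out to more than one shard per node, that is the case the partitioned-index documentation linked above addresses:

```python
import math

# How many shards keep each one under ~50 GB, and how many shards per
# node that implies for a given cluster size.
def shards_for_table(table_gb: float, nodes: int, max_shard_gb: float = 50.0) -> dict:
    shards = max(nodes, math.ceil(table_gb / max_shard_gb))
    return {
        "shards": shards,
        "shards_per_node": math.ceil(shards / nodes),
        "avg_shard_gb": round(table_gb / shards, 1),
    }

# Hypothetical example: a 600 GB table on a 6-node cluster.
print(shards_for_table(600, nodes=6))  # 12 shards, 2 per node, ~50 GB each
```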

Finally, depending on your hardware, you can store a few TB of data per node and add more nodes to improve performance (index/search throughput) and/or handle a larger volume of data. Keep in mind that Elassandra should require fewer nodes than separate Elasticsearch + Cassandra clusters providing the same service.
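And a matching node-count sketch; the per-node capacity of a couple of TB follows the "few TB per node" guidance above, while the replication factor is an extra assumption the comment does not mention:

```python
import math

# Minimum node count for a data set, assuming each node comfortably holds
# a couple of TB (Cassandra data plus Elasticsearch index files) and the
# keyspace uses the given Cassandra replication factor.
def nodes_needed(raw_data_tb: float,
                 per_node_tb: float = 2.0,
                 replication_factor: int = 3) -> int:
    total_tb = raw_data_tb * replication_factor
    return max(replication_factor, math.ceil(total_tb / per_node_tb))

# Hypothetical example: 10 TB of raw data, RF=3, ~2 TB usable per node.
print(nodes_needed(10))  # -> 15 nodes
```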

Thanks,
Vincent


ddorian commented Jul 6, 2016

@vroyer

Why do you say fewer nodes compared to Cassandra + Elasticsearch (assuming you don't store "_source" in Elasticsearch)? Or is that from lowering per-shard overhead, since you can put everything in one shard while in normal ES you have to over-provision the number of shards?

@RayWhitmer

"For write, elassandra write throughput is roughly half the throughput of cassandra if you index all columns."

It would be nice to be able to count on highly-optimized Cassandra write efficiency for significant operations on large data sets. Cassandra is UPSERT optimised, e.g. update a single property without reading the existing record. If a single property change is appended in Cassandra, then will Elassandra be smart enough to only reindex the one property, or will it reindex all properties for much less than half the throughput on single-property updates?
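For illustration, a minimal sketch of the single-column write being described, using the Python cassandra-driver; the keyspace, table, and column names are hypothetical:

```python
# A single-column Cassandra UPSERT: only the 'email' cell is written and
# the existing row is never read. The question is whether Elassandra
# re-indexes just this column or the whole Elasticsearch document.
from cassandra.cluster import Cluster  # pip install cassandra-driver

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")  # hypothetical keyspace

session.execute(
    "UPDATE users SET email = %s WHERE user_id = %s",  # hypothetical table/columns
    ("new@example.com", 42),
)
```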


ddorian commented Aug 17, 2016

@RayWhitmer it will reindex the full document.
