Python toolkit for Cassandra Time-Series Data
With simple changes to the schema, this can be adapted to a variety of time-series data. Feel free to ping me for guidance after you go through the tutorial below.
- pycass.py: This is the base code that sets up basic and advanced time-series schemas, reads, writes, etc.
- cass_worker.py: This is higher-level code that works on top of pycass.py. It contains a basic function for batch writes, a safe connection method for the rare case where the Cassandra connection fails, and an example of how to format data to fit the schema.
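As a rough illustration of the two helpers described for cass_worker.py, the sketch below shows one way a retrying connection and fixed-size batching might look. The names `safe_connect` and `chunked` and the `connect` callable are hypothetical and are not taken from this repository; `connect` stands in for whatever actually opens the Cassandra session.

```python
import time
from itertools import islice


def safe_connect(connect, retries=3, delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or `retries` attempts are used up.

    `connect` is a hypothetical zero-argument callable that returns a session.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return connect()
        except Exception as exc:  # real code would catch the driver's specific errors
            last_error = exc
            if attempt < retries:
                sleep(delay * attempt)  # linear back-off between attempts
    raise ConnectionError(
        "could not connect after {} attempts".format(retries)
    ) from last_error


def chunked(rows, size):
    """Yield successive lists of at most `size` rows, e.g. one list per batch write."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Passing the connection step in as a callable keeps the retry logic independent of any particular driver, so it can be exercised without a running cluster.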
Due to multiple requests, I have prepared a tutorial on using Cassandra for advanced time-series data.
The blog posts go from basic to advanced. I have also included some Cassandra CQL basics that will help with schema design and key indexing.
http://planetcassandra.org/blog/post/getting-started-with-time-series-data-modeling/
http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/
http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1
http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra
(The last one is the most relevant for large-scale time-series logging from varying sources, but build up to it.)
http://planetcassandra.org/blog/post/datastax-developer-blog-cql3-for-cassandra-experts/
http://www.datastax.com/documentation/cql/3.0/cql/ddl/ddl_compound_keys_c.html
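The compound-key idea covered in the CQL links above can be sketched roughly as follows. The table name, column names, and the choice of a per-day bucket are illustrative assumptions, not the schema that pycass.py actually creates.

```python
from datetime import datetime, timezone

# Hypothetical CQL: (source_id, day) together form the partition key, so each
# source gets one partition per day; event_time is the clustering column.
CREATE_EVENTS_TABLE = """
CREATE TABLE IF NOT EXISTS events (
    source_id text,
    day text,
    event_time timestamp,
    value double,
    PRIMARY KEY ((source_id, day), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC)
"""


def day_bucket(ts):
    """Return the UTC day (YYYY-MM-DD) for a timezone-aware datetime."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def to_row(source_id, ts, value):
    """Format one reading as a tuple in the column order of the table above."""
    return (source_id, day_bucket(ts), ts, float(value))
```

Bucketing by day keeps any single partition from growing without bound when a source logs continuously; a coarser or finer bucket may suit other write rates.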
If you have any questions, find me on LinkedIn (Pete Perlegos) and reference the topic.