shorrockin edited this page Sep 14, 2010 · 25 revisions

Cascal Introduction

Cascal is a simple Cassandra library written in Scala that provides a consistent and simple way to interact with Cassandra.

Cascal aims to:

  • Provide a way to use the Thrift library in a manner more conducive to the Scala language.
  • Ensure that no Cassandra-specific Thrift libraries need to be learned or used.
  • Create a type-safe model that mimics the Cassandra system while providing various abstractions on top.
  • Provide built-in support for connection pooling.
  • Provide a simple means to convert Cascal results (list/get) into Scala domain objects.
  • Be 100% usable with Maven: no need to hunt down jars.

Installation Instructions

To get Cascal you can either use Maven, or get the sources directly from GitHub and build it yourself. To use Maven, first add the shorrockin.com Maven repository, then add the dependency. The XML for this is as follows:


  <dependencies>
    <dependency>
      <groupId>com.shorrockin</groupId>
      <artifactId>cascal</artifactId>
      <version>1.3-SNAPSHOT</version>
    </dependency>
  </dependencies>

  <repositories>
    <repository>
      <id>shorrockin.com</id>
      <name>Shorrockin Repository</name>
      <url>http://maven.shorrockin.com/</url>
    </repository>
  </repositories>

Once you’ve added this to your pom.xml file you should be ready to start using Cascal.

Usage Instructions

Using Cascal is quite simple: you create a path of immutable structures to model what you want to work with, then use this object with a Cascal session to perform a variety of operations.

Creating Cascal Paths

The Cascal model mirrors the Cassandra data model, with several supertypes on top to identify features common across constructs. While familiarity with Cassandra will help with Cascal usage, it is not required. The terms used within this document are:

  • Keyspace: if you think of Cassandra as a 4- or 5-dimensional map, the Keyspace is the first dimension of this map.
  • ColumnFamily: continuing to think of Cassandra as a map, the ColumnFamily is the 2nd dimension of the map.
    • StandardColumnFamily: column families come in two types. A standard column family is used when you wish to model a 4-dimensional map, that is, a map which follows Keyspace → StandardColumnFamily → StandardKey → Column.
    • SuperColumnFamily: the second type of column family. A super column family is used when you wish to model a 5-dimensional map, that is, a map which follows Keyspace → SuperColumnFamily → SuperKey → SuperColumn → Column.
  • Key: a key is the 3rd dimension of our map. It is a string identifier which maps to either a Column or a SuperColumn, depending on the family in which it's used.
    • StandardKey: a specialization of Key that is used in a StandardColumnFamily.
    • SuperKey: a specialization of Key that is used in a SuperColumnFamily.
  • Column: the 4th (if used in a StandardColumnFamily) or 5th (if used in a SuperColumnFamily) dimension of the map. Contains a name, value, and timestamp.
  • SuperColumn: if using a SuperColumnFamily, the super column is the 4th dimension of the map. Contains many columns, ordered according to the Cassandra ordering configuration.
  • ColumnContainer: a categorization type which denotes that the object holds some type of Column; it is either a Key (standard or super) or a SuperColumn.
  • Gettable: a categorization type which denotes that an object may be used in the Session.get() method. It is either a Column or a SuperColumn. (This name is provisional; a better one is being sought.)
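To make the "nested map" view above concrete, here is a minimal sketch in plain Scala that models both family types as nested immutable Maps. It is not Cascal code: the object NestedMapModel and its helpers are hypothetical, column names and values are Strings rather than byte arrays, and timestamps and column ordering are ignored.

```scala
// Hypothetical sketch, NOT Cascal's model: Cassandra viewed as nested immutable maps.
// Names and values are Strings for readability; Cassandra stores them as bytes.
object NestedMapModel {
  // Keyspace -> StandardColumnFamily -> StandardKey -> column name -> value
  type StandardData = Map[String, Map[String, Map[String, Map[String, String]]]]
  // Keyspace -> SuperColumnFamily -> SuperKey -> SuperColumn -> column name -> value
  type SuperData = Map[String, Map[String, Map[String, Map[String, Map[String, String]]]]]

  val standard: StandardData =
    Map("Keyspace" -> Map("StandardCF" -> Map("StandardKey" -> Map("Column-Name" -> "Column-Value"))))

  val superData: SuperData =
    Map("Keyspace" -> Map("SuperCF" -> Map("SuperKey" ->
      Map("SuperColumn" -> Map("Column-Name" -> "Column-Value")))))

  // Four lookups reach a column value in a standard family...
  def standardValue(ks: String, cf: String, key: String, col: String): Option[String] =
    standard.get(ks).flatMap(_.get(cf)).flatMap(_.get(key)).flatMap(_.get(col))

  // ...while a super family needs five.
  def superValue(ks: String, cf: String, key: String, sc: String, col: String): Option[String] =
    superData.get(ks).flatMap(_.get(cf)).flatMap(_.get(key)).flatMap(_.get(sc)).flatMap(_.get(col))
}
```

A missing dimension at any level simply yields None, which is also why Cascal's get returns an Option.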

[Figure: a visual representation of the Cascal data model described above (image not available)]

While this looks complicated, much of it becomes obvious once you start using Cascal and think of Cassandra as a large hash. All paths are created using the \ method (or \\ when selecting a super column family) and must begin with a Keyspace.

For example, the following Scala REPL session shows how to create various objects in Cascal:


scala> import com.shorrockin.cascal.utils.Conversions._
import com.shorrockin.cascal.utils.Conversions._

scala> println("Keyspace" \ "StandardCF")
Keyspace(value = Keyspace) \ StandardColumnFamily(value = StandardCF)

scala> println("Keyspace" \\ "SuperCF")
Keyspace(value = Keyspace) \\ SuperColumnFamily(value = SuperCF)

scala> println("Keyspace" \ "StandardCF" \ "StandardKey")
Keyspace(value = Keyspace) \ StandardColumnFamily(value = StandardCF) \ StandardKey(value = StandardKey)

scala> println("Keyspace" \\ "SuperCF" \ "SuperKey")
Keyspace(value = Keyspace) \\ SuperColumnFamily(value = SuperCF) \ SuperKey(value = SuperKey)

scala> println("Keyspace" \ "StandardCF" \ "StandardKey" \ ("Column-Name", "Column-Value"))
Keyspace(value = Keyspace) \ StandardColumnFamily(value = StandardCF) \ StandardKey(value = StandardKey) \ Column(name = [B@1377711, value = [B@6a00ca, time = 1270832018305)

scala> println("Keyspace" \ "StandardCF" \ "StandardKey" \ "Column-Name" \ "Column-Value")
Keyspace(value = Keyspace) \ StandardColumnFamily(value = StandardCF) \ StandardKey(value = StandardKey) \ Column(name = [B@9bb457, value = [B@5ce0fe, time = 1270832067493)

scala> import com.shorrockin.cascal.utils.UUID
import com.shorrockin.cascal.utils.UUID

scala> val superKey = "Keyspace" \\ "SuperCF" \ "SuperKey"
superKey: com.shorrockin.cascal.model.SuperKey = Keyspace(value = Keyspace) \\ SuperColumnFamily(value = SuperCF) \ SuperKey(value = SuperKey)

scala> println(superKey \ UUID())
Keyspace(value = Keyspace) \\ SuperColumnFamily(value = SuperCF) \ SuperKey(value = SuperKey) \ SuperColumn(value = [B@a553e2)

scala> println(superKey \ UUID() \ ("Column-Name", "Column-Value"))
Keyspace(value = Keyspace) \\ SuperColumnFamily(value = SuperCF) \ SuperKey(value = SuperKey) \ SuperColumn(value = [B@130661d) \ Column(name = [B@80370d, value = [B@19e3bdd, time = 1270832238230)

In this example we use the Conversions object to implicitly convert a String into a Keyspace, as well as to implicitly convert strings into bytes for the objects which require them. We also use the UUID object to easily create UUIDs which can be used as Cassandra TimeUUIDs.
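A TimeUUID is a version-1 (time-based) UUID as defined in RFC 4122, whose 60-bit timestamp counts 100-nanosecond intervals since 1582-10-15. The sketch below is not how Cascal's UUID object is implemented; the object TimeUuidSketch is hypothetical and only shows how a millisecond timestamp is packed into, and recovered from, a java.util.UUID. It fills the clock sequence and node with random bits, whereas real generators typically also add a sub-millisecond counter so that two UUIDs created in the same millisecond differ.

```scala
import java.util.UUID
import scala.util.Random

// Hypothetical helper, NOT Cascal's UUID object: packs a millisecond timestamp
// into an RFC 4122 version-1 (time-based) UUID.
object TimeUuidSketch {
  // 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
  val UuidEpochOffset: Long = 0x01B21DD213814000L

  def fromMillis(millis: Long, clockSeq: Int, node: Long): UUID = {
    val ts      = millis * 10000L + UuidEpochOffset   // 60-bit count of 100-ns intervals
    val timeLow = ts & 0xFFFFFFFFL                     // low 32 bits
    val timeMid = (ts >>> 32) & 0xFFFFL                // next 16 bits
    val timeHi  = ((ts >>> 48) & 0x0FFFL) | 0x1000L    // top 12 bits plus version 1
    val msb     = (timeLow << 32) | (timeMid << 16) | timeHi
    // Variant bits (binary 10), a 14-bit clock sequence, then a 48-bit node id.
    val lsb     = ((0x8000L | (clockSeq & 0x3FFF)) << 48) | (node & 0xFFFFFFFFFFFFL)
    new UUID(msb, lsb)
  }

  // Random clock sequence and node; a production generator would be more careful.
  def now(): UUID =
    fromMillis(System.currentTimeMillis(), Random.nextInt(0x4000), Random.nextLong())

  // Inverse of fromMillis; UUID.timestamp() throws for non-version-1 UUIDs.
  def toMillis(u: UUID): Long = (u.timestamp() - UuidEpochOffset) / 10000L
}
```

Because the timestamp sits in the UUID itself, Cassandra can sort TimeUUID column names chronologically.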

Once we have a keyspace, all sub-values are chained together using the \ method. The returned object always contains a reference to the object which created it; parent objects, however, do not have references to their children. The toString method, defined on all Cascal path objects, displays the entire structure of how a path component was created.
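The chaining rules described above can be illustrated with a small, self-contained model. This is a hypothetical sketch, not Cascal's implementation: PathSketch and its classes only reproduce the behaviour this page describes, namely an implicit String-to-Keyspace conversion, \ and \\ returning progressively more specific immutable types, each child holding a reference to its parent, and a toString that prints the whole path in the same format as the REPL output above.

```scala
import scala.language.implicitConversions

// Hypothetical sketch, NOT Cascal's real classes: each path element is an
// immutable case class that stores its parent (never its children).
object PathSketch {
  case class Keyspace(value: String) {
    def \(family: String): StandardFamily = StandardFamily(this, family) // standard family
    def \\(family: String): SuperFamily   = SuperFamily(this, family)    // super family
    override def toString: String = "Keyspace(value = " + value + ")"
  }

  case class StandardFamily(keyspace: Keyspace, value: String) {
    def \(key: String): StandardKey = StandardKey(this, key)
    override def toString: String = keyspace.toString + " \\ StandardColumnFamily(value = " + value + ")"
  }

  case class SuperFamily(keyspace: Keyspace, value: String) {
    def \(key: String): SuperKey = SuperKey(this, key)
    override def toString: String = keyspace.toString + " \\\\ SuperColumnFamily(value = " + value + ")"
  }

  case class StandardKey(family: StandardFamily, value: String) {
    override def toString: String = family.toString + " \\ StandardKey(value = " + value + ")"
  }

  case class SuperKey(family: SuperFamily, value: String) {
    override def toString: String = family.toString + " \\ SuperKey(value = " + value + ")"
  }

  // Lets a plain String start a path, in the spirit of Conversions._
  implicit def stringToKeyspace(s: String): Keyspace = Keyspace(s)

  // The static types differ, so a SuperKey can never be used where a StandardKey is expected.
  val standardKey: StandardKey = "Keyspace" \ "StandardCF" \ "StandardKey"
  val superKey: SuperKey       = "Keyspace" \\ "SuperCF" \ "SuperKey"
}
```

Walking upwards (standardKey.family.keyspace) is always possible, while nothing links a keyspace down to the keys built from it.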

Creating A Cascal Session

A Cascal session holds a connection to the Cassandra database. It can either be created explicitly, or obtained through the provided SessionPool. For all but the most basic examples it is recommended that you use the SessionPool.

The session pool takes in a sequence of Host objects, a pool parameter object (PoolParams) and a consistency level. Once you have a reference to a SessionPool you can call its borrow method, passing in a function, to execute some Cassandra logic.


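  import com.shorrockin.cascal.utils.Conversions._ // implicit String -> Keyspace / bytes conversions
  // Host, PoolParams, ExhaustionPolicy, SessionPool and Consistency must also be imported from Cascal.

  val hosts  = Host("localhost", 9160, 250) :: Nil // 9160 is Cassandra's default Thrift port
  val params = new PoolParams(10, ExhaustionPolicy.Fail, 500L, 6, 2)
  val pool   = new SessionPool(hosts, params, Consistency.One)

  pool.borrow { session =>
    println("Count Value: " + session.count("Test" \ "Standard" \ "1"))
  }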
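The borrow method follows the loan pattern: the pool hands a session to your function and takes it back afterwards, even if the function throws. The following is a minimal sketch of that pattern, not Cascal's SessionPool; SimplePool and its constructor parameters are hypothetical, it ignores hosts, timeouts and consistency, and it simply throws when every session is in use.

```scala
import scala.collection.mutable

// Hypothetical sketch, NOT Cascal's SessionPool: a fixed-size pool that lends a
// session to a function and always returns it afterwards (the loan pattern).
class SimplePool[S](create: () => S, maxSize: Int) {
  private val idle    = mutable.Queue.empty[S]
  private var created = 0

  def borrow[T](f: S => T): T = {
    val session = checkOut()
    try f(session)            // run the caller's logic
    finally checkIn(session)  // returned even if f throws
  }

  private def checkOut(): S = synchronized {
    if (idle.nonEmpty) idle.dequeue()                    // reuse an idle session
    else if (created < maxSize) { created += 1; create() } // or open a new one
    else throw new IllegalStateException("pool exhausted") // this sketch just fails
  }

  private def checkIn(session: S): Unit = synchronized { idle.enqueue(session) }

  def createdCount: Int = synchronized { created }
  def idleCount: Int    = synchronized { idle.size }
}
```

The caller never sees checkOut or checkIn, which is what makes the borrow { session => ... } style hard to misuse.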

Using A Cascal Session

Cascal sessions support a variety of operations, including:

  • get: takes in a column (with only name populated) or a super column.
    • column returns a column.
    • super column returns a sequence of columns.
  • insert: takes in and inserts a column.
  • count: counts the number of entries in the specified column container.
  • remove: removes the specified column container, or column.
  • list: performs different types of lists (slices) based on the object provided, examples are:
    • lists the contents of the specified column container using the specified predicate.
    • lists the contents of a sequence of keys
    • lists the contents of a range of keys
  • batch: performs a batch function. Takes in a sequence of Operations. Operations are either:
    • Insert: takes in a column to insert.
    • Delete: deletes from a column container the columns matching the specified predicate.
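The insert, count and batch semantics listed above can be mimicked in memory to show how Insert and Delete operations compose. This is a hypothetical sketch, not Cascal's Session: BatchSketch covers a single standard column family, uses Strings instead of byte arrays, models the delete predicate as a plain list of column names, and applies operations in list order.

```scala
// Hypothetical in-memory sketch, NOT Cascal's Session: mimics insert, count and
// batch (with Insert/Delete operations) for one standard column family.
object BatchSketch {
  type Row    = Map[String, String] // column name -> column value
  type Family = Map[String, Row]    // key -> row

  sealed trait Operation
  final case class Insert(key: String, column: String, value: String) extends Operation
  // Deletes the named columns from a key, i.e. a column-name predicate.
  final case class Delete(key: String, columns: Seq[String]) extends Operation

  def applyOp(family: Family, op: Operation): Family = op match {
    case Insert(k, c, v) =>
      family.updated(k, family.getOrElse(k, Map.empty[String, String]).updated(c, v))
    case Delete(k, cols) =>
      family.get(k) match {
        case Some(row) => family.updated(k, row -- cols)
        case None      => family
      }
  }

  // Applies the operations in list order and returns the new family.
  def batch(family: Family, ops: Seq[Operation]): Family = ops.foldLeft(family)(applyOp)

  def count(family: Family, key: String): Int = family.get(key).map(_.size).getOrElse(0)
}
```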

All methods have multiple signatures which let you control, at various levels of detail (predicate, consistency, etc.), what you want to retrieve.

See the Scaladoc for more information on the various method signatures and objects used in Session.

Cascal Examples

The following are some examples of how to use Cascal. Most of them assume a session already exists and the Conversions object has been imported for implicit conversion of string → bytes and string → column family:

Insertion


  session.insert("Test" \ "Standard" \ "Key" \ ("ColumnName", "ColumnValue")) // standard column family
  session.insert("Test" \\ "Super" \ "Key" \ "SuperColumn" \ ("ColumnName", "ColumnValue")) // super column family

Get


  // returns Option[Column]
  session.get("Test" \ "Standard" \ "Key" \ "ColumnName")  // standard column family
  session.get("Test" \\ "Super" \ "Key" \ "SuperColumn" \ "ColumnName")  // super column family

  // returns Option[Seq[Column]]
  session.get("Test" \\ "Super" \ "Key" \ "SuperColumn") // super column family

Count


  session.count("Test" \ "Standard" \ "Key") // standard column family
  session.count("Test" \\ "Super" \ "Key") // super column family
  session.count("Test" \\ "Super" \ "Key" \ "SuperColumn") // super column family

Remove


  session.remove("Test" \ "Standard" \ "Key" \ "Column") // standard column family
  session.remove("Test" \ "Standard" \ "Key") // standard column family
  session.remove("Test" \\ "Super" \ "Key") // super column family
  session.remove("Test" \\ "Super" \ "Key" \ "SuperColumn" \ "Column") // super column family
  session.remove("Test" \\ "Super" \ "Key" \ "SuperColumn") // super column family

List


  val key = "Test" \ "Standard" \ "Key"
  session.list(key) // returns Seq[Column]
  session.list(key, RangePredicate("Column-1", "Column-3"))
  session.list(key, ColumnPredicate(List("Column-1", "Column-3")))

  val superKey = "Test" \\ "Super" \ "Key"
  session.list(superKey) // returns Map[SuperColumn, Seq[Column]]

  val superCol = "Test" \\ "Super" \ "Key" \ "SuperColumn"
  session.list(superCol) // returns Seq[Column]

  val family = "Test" \ "Standard"
  session.list(family, KeyRange("Key1", "Key3", 100))
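The three kinds of list above differ only in how they select columns or keys. The in-memory approximation below is a hypothetical sketch, not Cascal's RangePredicate, ColumnPredicate or KeyRange: SliceSketch compares names as plain Strings rather than with the family's comparator, treats both column and key ranges as inclusive at both ends (as Cassandra slice ranges are), and assumes keys are stored in sorted order.

```scala
import scala.collection.immutable.SortedMap

// Hypothetical in-memory sketch, NOT Cascal's predicate classes: it only
// illustrates what each kind of list returns.
object SliceSketch {
  type Row = SortedMap[String, String] // column name -> value, sorted by name

  // Like a range predicate: columns whose names lie between start and finish, inclusive.
  def rangeSlice(row: Row, start: String, finish: String): Seq[(String, String)] =
    row.filter { case (name, _) => name.compareTo(start) >= 0 && name.compareTo(finish) <= 0 }.toSeq

  // Like a column predicate: only the named columns, skipping names that don't exist.
  def columnSlice(row: Row, names: Seq[String]): Seq[(String, String)] =
    names.flatMap(n => row.get(n).map(v => n -> v))

  // Like a key range: keys between start and end (inclusive), at most `limit` of them.
  def keyRange(family: SortedMap[String, Row], start: String, end: String, limit: Int): Seq[String] =
    family.keys.filter(k => k.compareTo(start) >= 0 && k.compareTo(end) <= 0).take(limit).toSeq
}
```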

Batch Insert


    val key  = "Test" \ "Standard" \ UUID()
    val col1 = key \ ("Column-1", "Value-1")
    val col2 = key \ ("Column-2", "Value-2")
    val col3 = key \ ("Column-3", "Value-3")

    session.batch(Insert(col1) :: Insert(col2) :: Insert(col3) :: Nil)

Batch Delete


    val key  = "Test" \ "Standard" \ UUID()
    val col3 = key \ ("Column-3", "Value-3")

    session.batch(Delete(key, ColumnPredicate("Column-1" :: "Column-2" :: Nil)) :: Insert(col3) :: Nil)

Cascal Object Mapping

Cascal currently contains a very simple object mapping framework (available in 1.1-SNAPSHOT and later) that you can optionally use to take over much of the heavy lifting involved in converting the results of a list or get request into a Scala object, for some common cases. This framework has a few restrictions:

  • mapped classes must have all values provided as constructor parameters (case classes work well)
  • mapped columns must have a column name that is defined as a String.

This is done by annotating the constructor of a Scala case class (case classes are not required, but they are the most common choice) with information about how the various constructor parameters should be retrieved. All annotations are in the com.shorrockin.cascal.serialization.annotations package and include:

  • @Keyspace: must exist as an annotation on the class level. Has a value equal to the name of the keyspace that this class is mapped to.
  • @Family: must exist as an annotation on the class level. Has a value equal to the name of the family that this class is mapped to.
  • @Super: if a class is mapped to a super column family this annotation will be provided on the class level. It does not take in any parameters.
  • @Key: if defined on a constructor parameter it indicates that the parameter provided is mapped to the Key value. May only exist on a single constructor parameter. Must be of type String.
  • @SuperColumn: if defined on a constructor parameter it indicates that the parameter provided is mapped to the super column value. May only exist on a single constructor parameter. The type will be automatically converted from an Array[Byte] into the type provided using the Serializers in the Converter.
  • @Value: if defined on a constructor parameter it indicates that the parameter provided is mapped to a column value. The name of the column that this is mapped to is passed into this annotation as the value. May exist on multiple constructor parameters. The type will be automatically converted from an Array[Byte] into the type provided using the Serializers in the Converter.
  • @Optional: if defined on a constructor parameter it indicates that the parameter provided is mapped to a column value which may or may not exist. Must be used on parameters of type scala.Option. The name of the column, as well as the type (due to the JVM's type erasure we are unable to determine the type through reflection), are provided as input values to this annotation. May exist on multiple constructor parameters. The type will be automatically converted from an Array[Byte] into the type provided using the Serializers in the Converter.

Next we’ll provide some examples on how this works. First we’ll examine a standard column family type of map:

 
import com.shorrockin.cascal.serialization.annotations._
import java.util.Date

@Keyspace("Example")
@Family("StandardCF")
case class MyObject(@Key val key:String,
                    @Value("State") val state:String,
                    @Value("Created") val created:Date)

In this example we create an object whose constructor parameters are populated from the key value, as well as from the values of the columns named “State” and “Created”. For example, suppose we inserted into this family like so:

 
val stateColumn   = Insert("Example" \ "StandardCF" \ "Key" \ "State" \ "this is my state")
val createdColumn = Insert("Example" \ "StandardCF" \ "Key" \ "Created" \ new Date)
session.batch(stateColumn :: createdColumn :: Nil)

You could then retrieve this value and map it into an instance of MyObject like so:

 
val results  = session.list("Example" \ "StandardCF" \ "Key")
val myObject = Converter[MyObject](results)

This is dramatically easier than iterating over the columns yourself, mapping the byte arrays into values and working out how to create your object. To perform similar functions on super column families, add the @Super annotation at the class level. Doing so lets you map the value of a super column into one of your constructor parameters. An example may look like:

 
@Keyspace("Example")
@Family("SuperCF")
@Super
case class MyObject(@Key val key:String,
                    @SuperColumn val id:UUID, 
                    @Value("State") val state:String,
                    @Value("Created") val created:Date)

If you run into a situation where a column may or may not be present, you can replace your usage of @Value with @Optional. In doing so, the parameter must be of type scala.Option. If the column is present in the Seq[Column] passed into the converter, the parameter will be set to Some; otherwise it will be set to None. When using @Optional you must provide the column type in addition to the column name, because (due to type erasure) reflection is not able to infer the type parameter of Option. An example may look like:

 
@Keyspace("Example")
@Family("StandardCF")
case class MyObject(@Key val key:String, 
                    @Value("State") val state:String,
                    @Optional(column = "Created", as = classOf[Date]) val created:Option[Date])
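The difference between a required and an optional column boils down to Option handling when the raw columns are mapped. The following is a hypothetical sketch, not Cascal's Converter: OptionalSketch decodes values as UTF-8 Strings, and in this sketch a missing required column throws, while a missing optional column becomes None.

```scala
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical sketch, NOT Cascal's Converter: contrasts a required column
// (like @Value) with an optional one (like @Optional) when building a case class.
object OptionalSketch {
  type Columns = Map[String, Array[Byte]] // column name -> raw bytes

  case class Record(key: String, state: String, created: Option[String])

  // In this sketch a missing required column is an error.
  def required(cols: Columns, name: String): String =
    new String(cols.getOrElse(name, throw new NoSuchElementException("missing column " + name)), UTF_8)

  // A missing optional column simply becomes None.
  def optional(cols: Columns, name: String): Option[String] =
    cols.get(name).map(bytes => new String(bytes, UTF_8))

  def toRecord(key: String, cols: Columns): Record =
    Record(key, required(cols, "State"), optional(cols, "Created"))
}
```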

When using the Converter class there are a number of default serializers to convert objects to and from byte arrays. These include:

  • String
  • UUID
  • Int
  • Long
  • Boolean
  • Float
  • Double
  • Date

However, if you require an additional type (one that doesn't make sense to request as an official patch), you can simply create your own instance of Converter by passing in a Map[Class, Serializer]. If you go down this route, you can obtain the current default Serializers from the Serializer#Default field.
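As a rough illustration of what such a map might contain, the sketch below builds a Class-keyed registry of byte-array serializers and extends it with a custom type. The Serializer trait, Defaults, Point, PointSerializer and serializerFor are all hypothetical stand-ins, not Cascal's Serializer or Converter API; consult the Scaladoc for the real signatures.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical sketch, NOT Cascal's Serializer/Converter API: a registry keyed by
// Class that converts values to and from byte arrays, extended with a custom type.
object SerializerSketch {
  trait Serializer[A] {
    def toBytes(a: A): Array[Byte]
    def fromBytes(bytes: Array[Byte]): A
  }

  object StringSerializer extends Serializer[String] {
    def toBytes(s: String): Array[Byte]   = s.getBytes(UTF_8)
    def fromBytes(b: Array[Byte]): String = new String(b, UTF_8)
  }

  object LongSerializer extends Serializer[Long] {
    def toBytes(l: Long): Array[Byte]   = ByteBuffer.allocate(8).putLong(l).array()
    def fromBytes(b: Array[Byte]): Long = ByteBuffer.wrap(b).getLong()
  }

  // Stand-in for a set of default serializers.
  val Defaults: Map[Class[_], Serializer[_]] = Map(
    classOf[String] -> StringSerializer,
    classOf[Long]   -> LongSerializer)

  // A custom type that the defaults don't cover.
  case class Point(x: Int, y: Int)

  object PointSerializer extends Serializer[Point] {
    def toBytes(p: Point): Array[Byte] = ByteBuffer.allocate(8).putInt(p.x).putInt(p.y).array()
    def fromBytes(b: Array[Byte]): Point = {
      val buf = ByteBuffer.wrap(b)
      Point(buf.getInt(), buf.getInt()) // read x, then y
    }
  }

  // Extend the defaults rather than replacing them.
  val Custom: Map[Class[_], Serializer[_]] = Defaults + (classOf[Point] -> PointSerializer)

  def serializerFor[A](cls: Class[A]): Serializer[A] = Custom(cls).asInstanceOf[Serializer[A]]
}
```

Starting from the defaults and adding entries keeps every built-in type working while adding support for the new one.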