joswlv/Spark2CassandraBulkLoad

Spark2CassandraBulkLoad

Spark Library for Bulk Loading into Cassandra

This project is based on Spark2Cassandra.

It updates that utility to newer Spark and Cassandra versions.

Features

  1. Convert an RDD or DataFrame to SSTable files.
  2. Stream the SSTable files to the Cassandra nodes.

Requirements

Spark2CassandraBulkLoad supports Spark 2.x and above.

| Spark2CassandraBulkLoad Version | Spark Cassandra connector Version | Cassandra Java Driver Version | JDK Version |
| --- | --- | --- | --- |
| 1.X.X | [2.0, 2.5) | [,4.0) | 1.8+ |
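
The version columns use interval notation: `[2.0, 2.5)` reads as "2.0 or newer, but older than 2.5", and `[,4.0)` reads as "anything older than 4.0". The sketch below shows one way to check a version string against such a half-open range. `VersionRange` and its methods are hypothetical helpers written for this illustration; they are not part of Spark2CassandraBulkLoad, and they only handle purely numeric versions such as `2.4.3`.

```scala
// Hypothetical helper for illustration only; not part of this library.
object VersionRange {
  // Parse "2.4.3" into Seq(2, 4, 3). Suffixes such as "-SNAPSHOT" are not handled.
  def parse(version: String): Seq[Int] =
    version.split('.').toSeq.map(_.toInt)

  // Compare two versions component by component, padding the shorter one with zeros.
  def compare(a: Seq[Int], b: Seq[Int]): Int = {
    val n = math.max(a.length, b.length)
    a.padTo(n, 0).zip(b.padTo(n, 0))
      .map { case (x, y) => Integer.compare(x, y) }
      .find(_ != 0)
      .getOrElse(0)
  }

  // Half-open range [lower, upper). A bound of None means "unbounded on that side".
  def inRange(version: String, lower: Option[String], upper: Option[String]): Boolean = {
    val v = parse(version)
    lower.forall(l => compare(v, parse(l)) >= 0) &&
    upper.forall(u => compare(v, parse(u)) < 0)
  }
}
```

For example, `VersionRange.inRange("2.4.3", Some("2.0"), Some("2.5"))` is true, while `VersionRange.inRange("2.5.0", Some("2.0"), Some("2.5"))` is false because the upper bound is exclusive.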

Downloads

SBT

libraryDependencies += "com.joswlv.spark.cassandra.bulk" %% "Spark2CassandraBulkLoad" % "1.0.3"

Maven (JCenter)

<dependency>
	<groupId>com.joswlv.spark.cassandra.bulk</groupId>
	<artifactId>Spark2CassandraBulkLoad</artifactId>
	<version>1.0.3</version>
</dependency>

Gradle

compile 'com.joswlv.spark.cassandra.bulk:Spark2CassandraBulkLoad:1.0.3'

Usage

Bulk Loading into Cassandra

// Import the following to have access to the `bulkLoadToCass()` function for RDDs or DataFrames.
import com.joswlv.spark.cassandra.bulk.rdd._
import com.joswlv.spark.cassandra.bulk.sql._

// For an RDD: specify the `keyspaceName` and the `tableName` to write to.
rdd.bulkLoadToCass(
  keyspaceName = "keyspaceName",
  tableName = "tableName"
)

// For a DataFrame: specify the `keyspaceName` and the `tableName` to write to.
df.bulkLoadToCass(
  keyspaceName = "keyspaceName",
  tableName = "tableName"
)
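
The snippets above assume that `rdd` and `df` already exist. A minimal end-to-end sketch for the DataFrame case might look like the following. Several details are assumptions rather than documented behavior of this library: that the Cassandra contact point is read from the Spark Cassandra connector setting `spark.cassandra.connection.host`, that the target table already exists, and that the DataFrame column names match the table's columns. The keyspace `example_ks`, the table `users`, and the sample rows are placeholders.

```scala
import org.apache.spark.sql.SparkSession
// Brings `bulkLoadToCass()` into scope for DataFrames.
import com.joswlv.spark.cassandra.bulk.sql._

object BulkLoadExample {
  def main(args: Array[String]): Unit = {
    // Assumption: the Cassandra host is taken from the connector's
    // `spark.cassandra.connection.host` setting. The master URL is expected
    // to be supplied by spark-submit.
    val spark = SparkSession.builder()
      .appName("bulk-load-example")
      .config("spark.cassandra.connection.host", "127.0.0.1")
      .getOrCreate()

    import spark.implicits._

    // Placeholder data. Assumption: `example_ks.users` already exists
    // with columns named `id` and `name`.
    val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

    df.bulkLoadToCass(
      keyspaceName = "example_ks",
      tableName = "users"
    )

    spark.stop()
  }
}
```

For an RDD, the same pattern should apply after importing `com.joswlv.spark.cassandra.bulk.rdd._` instead.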