Migrate data to Scylla using Spark, normally from Cassandra

Building

Make sure sbt is installed on your machine, and run build.sh.

Configuring the Migrator

Create a config.yaml for your migration using the template config.yaml.example in the repository root. Read the comments throughout the template carefully.

Running on a live Spark cluster

The Scylla Migrator is built against Spark 2.3.1, so you'll need to run that version on your cluster.

After running build.sh, copy the jar from ./target/scala-2.11/scylla-migrator-assembly-0.0.1.jar and the config.yaml you've created to the Spark master server.

Then, run this command on the Spark master server:

spark-submit --class com.scylladb.migrator.Migrator \
  --master spark://<spark-master-hostname>:7077 \
  --conf spark.scylla.config=<path to config.yaml> \
  <path to scylla-migrator-assembly-0.0.1.jar>
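A wrong path to either file only shows up after spark-submit has started, so it can help to check both before submitting. The following is a minimal sketch of such a wrapper, not part of the migrator itself: the function names `preflight` and `submit_migration` and their argument order are hypothetical, chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper functions (not shipped with the migrator).

# preflight CONFIG JAR
# Returns 0 if both files exist, 1 otherwise.
preflight() {
  for f in "$1" "$2"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      return 1
    fi
  done
  return 0
}

# submit_migration MASTER_HOST CONFIG JAR
# Runs the same spark-submit command as above, after checking the inputs.
submit_migration() {
  preflight "$2" "$3" || return 1
  spark-submit --class com.scylladb.migrator.Migrator \
    --master "spark://$1:7077" \
    --conf "spark.scylla.config=$2" \
    "$3"
}
```

After sourcing these functions, a call would look like `submit_migration <spark-master-hostname> ./config.yaml ./scylla-migrator-assembly-0.0.1.jar`.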

Running locally

To run in the local Docker-based setup:

  1. First start the environment:
docker-compose up -d
  2. Launch cqlsh in Cassandra's container and create a keyspace and a table with some data:
docker-compose exec cassandra cqlsh
<create stuff>
  3. Launch cqlsh in Scylla's container and create the destination keyspace and table with the same schema as the source table:
docker-compose exec scylla cqlsh
<create stuff>
  4. Edit the config.yaml file; note the comments throughout.

  5. Run build.sh.

  6. Then, launch spark-submit in the master's container to run the job:

docker-compose exec spark-master spark-submit --class com.scylladb.migrator.Migrator \
  --master spark://spark-master:7077 \
  --conf spark.driver.host=spark-master \
  --conf spark.scylla.config=/app/config.yaml \
  /jars/scylla-migrator-assembly-0.0.1.jar

The spark-master container mounts the ./target/scala-2.11 dir on /jars and the repository root on /app. To update the jar with new code, just run build.sh and then run spark-submit again.
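For a quick trial run, the two `<create stuff>` placeholders in the steps above could be filled in non-interactively with `cqlsh -e`. This is only a sketch under assumptions: the keyspace `demo`, the table `items`, its columns and the helper function names are all hypothetical, and the `-T` flag is used so that `docker-compose exec` does not allocate a TTY.

```shell
#!/bin/sh
# Hypothetical schema and sample row for a trial migration.
# The keyspace, table and column names are illustrative only.
SCHEMA_CQL="
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE IF NOT EXISTS demo.items (id int PRIMARY KEY, name text);
"
SEED_CQL="INSERT INTO demo.items (id, name) VALUES (1, 'first');"

# create_schema SERVICE
# SERVICE is the docker-compose service name: cassandra or scylla.
create_schema() {
  docker-compose exec -T "$1" cqlsh -e "$SCHEMA_CQL"
}

# seed_source
# Inserts one sample row into the source (Cassandra) table.
seed_source() {
  docker-compose exec -T cassandra cqlsh -e "$SEED_CQL"
}
```

Calling `create_schema cassandra`, then `seed_source`, then `create_schema scylla` gives both sides the same schema, as the steps require. The keyspace and table names in config.yaml would then need to match whatever names are used here.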
