
# Configuration Reference

## Cassandra Authentication Parameters

All parameters should be prefixed with `spark.cassandra.`

| Property Name | Default | Description |
|---------------|---------|-------------|
| auth.conf.factory | DefaultAuthConfFactory | Name of a Scala module or class implementing AuthConfFactory providing custom authentication configuration |
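
As a minimal sketch, a custom factory can be supplied through SparkConf; the factory class name below is hypothetical:

```scala
import org.apache.spark.SparkConf

// com.example.MyAuthConfFactory is a hypothetical class implementing AuthConfFactory.
val conf = new SparkConf()
  .set("spark.cassandra.auth.conf.factory", "com.example.MyAuthConfFactory")
```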

## Cassandra Connection Parameters

All parameters should be prefixed with `spark.cassandra.`

| Property Name | Default | Description |
|---------------|---------|-------------|
| connection.compression | | Compression to use (LZ4, SNAPPY or NONE) |
| connection.connections_per_executor_max | None | Maximum number of connections per host set on each executor JVM. Will be updated to DefaultParallelism / Executors for Spark commands. Defaults to 1 if not specified and not in a Spark environment |
| connection.factory | DefaultConnectionFactory | Name of a Scala module or class implementing CassandraConnectionFactory providing connections to the Cassandra cluster |
| connection.host | localhost | Contact point to connect to the Cassandra cluster. A comma-separated list may also be used ("127.0.0.1,192.168.0.1") |
| connection.keep_alive_ms | 5000 | Period of time to keep unused connections open |
| connection.local_dc | None | The local DC to connect to (other nodes will be ignored) |
| connection.port | 9042 | Cassandra native connection port |
| connection.reconnection_delay_ms.max | 60000 | Maximum period of time to wait before reconnecting to a dead node |
| connection.reconnection_delay_ms.min | 1000 | Minimum period of time to wait before reconnecting to a dead node |
| connection.timeout_ms | 5000 | Maximum period of time to attempt connecting to a node |
| query.retry.count | 10 | Number of times to retry a timed-out query |
| read.timeout_ms | 120000 | Maximum period of time to wait for a read to return |
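
These properties are typically set on the SparkConf used to build the SparkContext. A sketch, with placeholder contact points and data center name:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Contact points and DC name are placeholders; adjust for your cluster.
val conf = new SparkConf()
  .setAppName("cassandra-example")
  .setMaster("local[*]") // with spark-submit, omit this and pass --master instead
  .set("spark.cassandra.connection.host", "127.0.0.1,192.168.0.1")
  .set("spark.cassandra.connection.port", "9042")
  .set("spark.cassandra.connection.local_dc", "DC1")
  .set("spark.cassandra.connection.timeout_ms", "10000")

val sc = new SparkContext(conf)
```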

## Cassandra DataFrame Source Parameters

All parameters should be prefixed with `spark.cassandra.`

| Property Name | Default | Description |
|---------------|---------|-------------|
| sql.pushdown.additionalClasses | | A comma-separated list of classes to be used (in order) to apply additional pushdown rules for Cassandra DataFrames. Classes must implement CassandraPredicateRules |
| table.size.in.bytes | None | Used internally by DataFrames; will be updated in a future release to retrieve the size from Cassandra. Can be set manually for now |
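
A sketch of reading a Cassandra table as a DataFrame, assuming a SparkSession-based (Spark 2.x+) setup; the keyspace and table names are illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cassandra-dataframe-example")
  .master("local[*]")
  .config("spark.cassandra.connection.host", "127.0.0.1")
  .getOrCreate()

// "test" and "kv" are illustrative keyspace/table names.
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "test", "table" -> "kv"))
  .load()

df.show()
```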

## Cassandra SQL Context Options

All parameters should be prefixed with `spark.cassandra.`

| Property Name | Default | Description |
|---------------|---------|-------------|
| sql.cluster | default | Sets the default Cluster to inherit configuration from |
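
For instance, the default cluster can be pointed at a named configuration set; the cluster name below is illustrative:

```scala
import org.apache.spark.SparkConf

// "ClusterA" is an illustrative cluster name.
val conf = new SparkConf()
  .set("spark.cassandra.sql.cluster", "ClusterA")
```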

## Cassandra SSL Connection Options

All parameters should be prefixed with `spark.cassandra.`

| Property Name | Default | Description |
|---------------|---------|-------------|
| connection.ssl.clientAuth.enabled | false | Enable 2-way secure connection to Cassandra cluster |
| connection.ssl.enabled | false | Enable secure connection to Cassandra cluster |
| connection.ssl.enabledAlgorithms | Set(TLS_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA) | SSL cipher suites |
| connection.ssl.keyStore.password | None | Key store password |
| connection.ssl.keyStore.path | None | Path for the key store being used |
| connection.ssl.keyStore.type | JKS | Key store type |
| connection.ssl.protocol | TLS | SSL protocol |
| connection.ssl.trustStore.password | None | Trust store password |
| connection.ssl.trustStore.path | None | Path for the trust store being used |
| connection.ssl.trustStore.type | JKS | Trust store type |
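
A sketch enabling one-way SSL with a trust store; the path and password are placeholders:

```scala
import org.apache.spark.SparkConf

// Trust store path and password are placeholders.
val conf = new SparkConf()
  .set("spark.cassandra.connection.ssl.enabled", "true")
  .set("spark.cassandra.connection.ssl.trustStore.path", "/path/to/truststore.jks")
  .set("spark.cassandra.connection.ssl.trustStore.password", "changeit")
  .set("spark.cassandra.connection.ssl.trustStore.type", "JKS")
```

For 2-way authentication, also set connection.ssl.clientAuth.enabled to true and supply the keyStore.* properties.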

## Custom Cassandra Type Parameters (Expert Use Only)

All parameters should be prefixed with `spark.cassandra.`

| Property Name | Default | Description |
|---------------|---------|-------------|
| dev.customFromDriver | None | Provides an additional class implementing CustomDriverConverter for clients that need to read non-standard primitive Cassandra types. If your Cassandra implementation uses a Java driver which can read DataType.custom(), you may need this. If you are using OSS Cassandra, this should never be used |
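
If needed, the converter is supplied by class name; the class below is hypothetical:

```scala
import org.apache.spark.SparkConf

// com.example.MyCustomDriverConverter is a hypothetical CustomDriverConverter implementation.
val conf = new SparkConf()
  .set("spark.cassandra.dev.customFromDriver", "com.example.MyCustomDriverConverter")
```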

## Read Tuning Parameters

All parameters should be prefixed with `spark.cassandra.`

| Property Name | Default | Description |
|---------------|---------|-------------|
| input.consistency.level | LOCAL_ONE | Consistency level to use when reading |
| input.fetch.size_in_rows | 1000 | Number of CQL rows fetched per driver request |
| input.join.throughput_query_per_sec | 2147483647 | **Deprecated.** Please use input.reads_per_sec. Maximum read throughput allowed per single core in queries/s while joining an RDD with a Cassandra table |
| input.metrics | true | Sets whether to record connector-specific metrics on read |
| input.reads_per_sec | 2147483647 | Sets max requests per core per second for joinWithCassandraTable and some Enterprise integrations |
| input.split.size_in_mb | 64 | Approximate amount of data to be fetched into a Spark partition. The minimum number of resulting Spark partitions is 1 + 2 * SparkContext.defaultParallelism |
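
A sketch of common read-tuning settings; the values are illustrative starting points, not recommendations:

```scala
import org.apache.spark.SparkConf

// Illustrative values; tune per workload.
val conf = new SparkConf()
  .set("spark.cassandra.input.consistency.level", "LOCAL_QUORUM")
  .set("spark.cassandra.input.fetch.size_in_rows", "2000")
  .set("spark.cassandra.input.split.size_in_mb", "128")
```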

## Write Tuning Parameters

All parameters should be prefixed with `spark.cassandra.`

| Property Name | Default | Description |
|---------------|---------|-------------|
| output.batch.grouping.buffer.size | 1000 | How many batches per single Spark task can be stored in memory before sending to Cassandra |
| output.batch.grouping.key | Partition | Determines how insert statements are grouped into batches. Available values: none (a batch may contain any statements), replica_set (a batch may contain only statements to be written to the same replica set), partition (a batch may contain only statements for rows sharing the same partition key value) |
| output.batch.size.bytes | 1024 | Maximum total size of the batch in bytes. Overridden by spark.cassandra.output.batch.size.rows |
| output.batch.size.rows | None | Number of rows per single batch. The default is 'auto', which means the connector adjusts the number of rows based on the amount of data in each row |
| output.concurrent.writes | 5 | Maximum number of batches executed in parallel by a single Spark task |
| output.consistency.level | LOCAL_QUORUM | Consistency level for writing |
| output.ifNotExists | false | If true, the INSERT is not performed when a row with the same primary key already exists. Using this feature incurs a performance hit |
| output.ignoreNulls | false | In Cassandra >= 2.2, null values can be left as unset in bound statements. Setting this to true causes all null values to be left as unset rather than bound. For finer control, see the CassandraOption class |
| output.metrics | true | Sets whether to record connector-specific metrics on write |
| output.throughput_mb_per_sec | 2.147483647E9 | Maximum write throughput allowed per single core in MB/s (floating-point values allowed). On long (8+ hour) runs, limit this to 70% of the max throughput seen on a smaller job for stability |
| output.timestamp | 0 | Timestamp (microseconds since epoch) of the write. A value of 0 means the time the write occurred is used |
| output.ttl | 0 | Time To Live (TTL) assigned to writes to Cassandra. A value of 0 means no TTL |
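
A sketch of common write-tuning settings; the values are illustrative, not recommendations:

```scala
import org.apache.spark.SparkConf

// Illustrative values; tune per workload and cluster capacity.
val conf = new SparkConf()
  .set("spark.cassandra.output.batch.size.rows", "100")
  .set("spark.cassandra.output.concurrent.writes", "8")
  .set("spark.cassandra.output.consistency.level", "LOCAL_QUORUM")
  .set("spark.cassandra.output.throughput_mb_per_sec", "50")
```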