Datastax config #10

Merged
merged 3 commits into from

2 participants

@breerly

Adds a default recipe that optionally allows DataStax to be configured. The original config is left intact and can still be used by the tarball recipe.

@michaelklishin michaelklishin merged commit 572fa76 into from
@michaelklishin

Thank you!

@michaelklishin

It would be nice to document these new attributes in the README.

@breerly breerly deleted the branch
Commits on Mar 28, 2013
  1. more default attributes

    breerly authored
Commits on Mar 29, 2013
  1. more defaults

    breerly authored
Commits on Apr 2, 2013
  1. allow datastax to be configured

    breerly authored
13 attributes/default.rb
@@ -3,7 +3,6 @@
default[:cassandra] = {
:cluster_name => "Test Cluster",
:initial_token => "",
- :seeds => "127.0.0.1",
:version => cassandra_version,
:tarball => {
:url => "http://www.eu.apache.org/dist/cassandra/#{cassandra_version}/apache-cassandra-#{cassandra_version}-bin.tar.gz",
@@ -24,7 +23,15 @@
:conf_dir => "/etc/cassandra/",
# commit log, data directory, saved caches and so on are all stored under the data root. MK.
:data_root_dir => "/var/lib/cassandra/",
+ :commitlog_dir => "/var/lib/cassandra/",
:log_dir => "/var/log/cassandra/",
- :listen_address => "localhost",
- :rpc_address => "localhost"
+ :listen_address => node[:ipaddress],
+ :rpc_address => node[:ipaddress],
+ :max_heap_size => nil,
+ :heap_new_size => nil,
+ :vnodes => false,
+ :seeds => [],
+ :concurrent_reads => 32,
+ :concurrent_writes => 32,
+ :snitch => 'SimpleSnitch'
}
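The attribute change above drops the old string-valued `:seeds` default, re-adds `:seeds` as an array, and introduces new tunables (heap sizes, vnodes, concurrency, snitch). A minimal plain-Ruby sketch (not run under Chef) of how a role or wrapper cookbook might override these defaults — the override values here are hypothetical examples, not part of this pull request:

```ruby
# The subset of new cookbook defaults added in this diff.
defaults = {
  :max_heap_size     => nil,
  :heap_new_size     => nil,
  :vnodes            => false,
  :seeds             => [],
  :concurrent_reads  => 32,
  :concurrent_writes => 32,
  :snitch            => 'SimpleSnitch'
}

# Hypothetical per-environment overrides (illustrative values only).
overrides = {
  :max_heap_size => "4G",
  :heap_new_size => "800M",
  :seeds         => ["10.0.0.1", "10.0.0.2"],
  :snitch        => 'GossipingPropertyFileSnitch'
}

# Chef merges attribute precedence levels for real; Hash#merge stands in here.
effective = defaults.merge(overrides)
puts effective[:seeds].join(",")   # => "10.0.0.1,10.0.0.2"
```

Note that `:seeds` is now an array; the `cassandra.yaml` template joins it with commas, so a single-node setup would set it to `["127.0.0.1"]` rather than the old `"127.0.0.1"` string.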
11 recipes/default.rb
@@ -0,0 +1,11 @@
+include_recipe "cassandra::datastax"
+
+%w(cassandra.yaml cassandra-env.sh).each do |f|
+ template File.join(node["cassandra"]["conf_dir"], f) do
+ source "cassandra/#{f}.erb"
+ owner node["cassandra"]["user"]
+ group node["cassandra"]["user"]
+ mode 0644
+ notifies :restart, resources(:service => "cassandra")
+ end
+end
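The new default recipe loops over the two config files, rendering each template into the configured conf dir. A small plain-Ruby sketch of the path and template-source names that loop computes — `conf_dir` mirrors the cookbook default shown in the attributes diff:

```ruby
conf_dir = "/etc/cassandra/"            # cookbook default for node["cassandra"]["conf_dir"]
files = %w(cassandra.yaml cassandra-env.sh)

# Mirrors the recipe's template resource: destination path and ERB source name.
rendered = files.map do |f|
  { :path => File.join(conf_dir, f), :source => "cassandra/#{f}.erb" }
end

rendered.each { |r| puts "#{r[:source]} -> #{r[:path]}" }
```

`File.join` collapses the trailing slash on `conf_dir`, so the destinations come out as `/etc/cassandra/cassandra.yaml` and `/etc/cassandra/cassandra-env.sh`.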
122 templates/default/cassandra-env.sh.erb
@@ -14,6 +14,78 @@
# See the License for the specific language governing permissions and
# limitations under the License.
+calculate_heap_sizes()
+{
+ case "`uname`" in
+ Linux)
+ system_memory_in_mb=`free -m | awk '/Mem:/ {print $2}'`
+ system_cpu_cores=`egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo`
+ ;;
+ FreeBSD)
+ system_memory_in_bytes=`sysctl hw.physmem | awk '{print $2}'`
+ system_memory_in_mb=`expr $system_memory_in_bytes / 1024 / 1024`
+ system_cpu_cores=`sysctl hw.ncpu | awk '{print $2}'`
+ ;;
+ SunOS)
+ system_memory_in_mb=`prtconf | awk '/Memory size:/ {print $3}'`
+ system_cpu_cores=`psrinfo | wc -l`
+ ;;
+ Darwin)
+ system_memory_in_bytes=`sysctl hw.memsize | awk '{print $2}'`
+ system_memory_in_mb=`expr $system_memory_in_bytes / 1024 / 1024`
+ system_cpu_cores=`sysctl hw.ncpu | awk '{print $2}'`
+ ;;
+ *)
+ # assume reasonable defaults for e.g. a modern desktop or
+ # cheap server
+ system_memory_in_mb="2048"
+ system_cpu_cores="2"
+ ;;
+ esac
+
+ # some systems like the raspberry pi don't report cores, use at least 1
+ if [ "$system_cpu_cores" -lt "1" ]
+ then
+ system_cpu_cores="1"
+ fi
+
+ # set max heap size based on the following
+ # max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB))
+ # calculate 1/2 ram and cap to 1024MB
+ # calculate 1/4 ram and cap to 8192MB
+ # pick the max
+ half_system_memory_in_mb=`expr $system_memory_in_mb / 2`
+ quarter_system_memory_in_mb=`expr $half_system_memory_in_mb / 2`
+ if [ "$half_system_memory_in_mb" -gt "1024" ]
+ then
+ half_system_memory_in_mb="1024"
+ fi
+ if [ "$quarter_system_memory_in_mb" -gt "8192" ]
+ then
+ quarter_system_memory_in_mb="8192"
+ fi
+ if [ "$half_system_memory_in_mb" -gt "$quarter_system_memory_in_mb" ]
+ then
+ max_heap_size_in_mb="$half_system_memory_in_mb"
+ else
+ max_heap_size_in_mb="$quarter_system_memory_in_mb"
+ fi
+ MAX_HEAP_SIZE="${max_heap_size_in_mb}M"
+
+ # Young gen: min(max_sensible_per_modern_cpu_core * num_cores, 1/4 * heap size)
+ max_sensible_yg_per_core_in_mb="100"
+ max_sensible_yg_in_mb=`expr $max_sensible_yg_per_core_in_mb "*" $system_cpu_cores`
+
+ desired_yg_in_mb=`expr $max_heap_size_in_mb / 4`
+
+ if [ "$desired_yg_in_mb" -gt "$max_sensible_yg_in_mb" ]
+ then
+ HEAP_NEWSIZE="${max_sensible_yg_in_mb}M"
+ else
+ HEAP_NEWSIZE="${desired_yg_in_mb}M"
+ fi
+}
+
# Determine the sort of JVM we'll be running on.
java_ver_output=`"${JAVA:-java}" -version 2>&1`
@@ -43,11 +115,11 @@ esac
# Override these to set the amount of memory to allocate to the JVM at
-# start-up. For production use you almost certainly want to adjust
-# this for your environment. MAX_HEAP_SIZE is the total amount of
-# memory dedicated to the Java heap; HEAP_NEWSIZE refers to the size
-# of the young generation. Both MAX_HEAP_SIZE and HEAP_NEWSIZE should
-# be either set or not (if you set one, set the other).
+# start-up. For production use you may wish to adjust this for your
+# environment. MAX_HEAP_SIZE is the total amount of memory dedicated
+# to the Java heap; HEAP_NEWSIZE refers to the size of the young
+# generation. Both MAX_HEAP_SIZE and HEAP_NEWSIZE should be either set
+# or not (if you set one, set the other).
#
# The main trade-off for the young generation is that the larger it
# is, the longer GC pause times will be. The shorter it is, the more
@@ -57,8 +129,22 @@ esac
# times. If in doubt, and if you do not particularly want to tweak, go with
# 100 MB per physical CPU core.
-MAX_HEAP_SIZE="256M"
-HEAP_NEWSIZE="128M"
+<% if node[:cassandra][:max_heap_size] && node[:cassandra][:heap_new_size] %>
+MAX_HEAP_SIZE="<%=node[:cassandra][:max_heap_size]%>"
+HEAP_NEWSIZE="<%=node[:cassandra][:heap_new_size]%>"
+<% else %>
+#MAX_HEAP_SIZE="4G"
+#HEAP_NEWSIZE="800M"
+<% end %>
+
+if [ "x$MAX_HEAP_SIZE" = "x" ] && [ "x$HEAP_NEWSIZE" = "x" ]; then
+ calculate_heap_sizes
+else
+ if [ "x$MAX_HEAP_SIZE" = "x" ] || [ "x$HEAP_NEWSIZE" = "x" ]; then
+ echo "please set or unset MAX_HEAP_SIZE and HEAP_NEWSIZE in pairs (see cassandra-env.sh)"
+ exit 1
+ fi
+fi
# Specifies the default port over which Cassandra will be available for
# JMX connections.
@@ -101,19 +187,15 @@ if [ "x$CASSANDRA_HEAPDUMP_DIR" != "x" ]; then
fi
-startswith () [ "${1#$2}" != "$1" ]
+startswith() { [ "${1#$2}" != "$1" ]; }
if [ "`uname`" = "Linux" ] ; then
# reduce the per-thread stack size to minimize the impact of Thrift
# thread-per-client. (Best practice is for client connections to
# be pooled anyway.) Only do so on Linux where it is known to be
# supported.
- if startswith "$JVM_VERSION" '1.7.'
- then
- JVM_OPTS="$JVM_OPTS -Xss160k"
- else
- JVM_OPTS="$JVM_OPTS -Xss128k"
- fi
+ # u34 and greater need 180k
+ JVM_OPTS="$JVM_OPTS -Xss180k"
fi
echo "xss = $JVM_OPTS"
@@ -125,6 +207,10 @@ JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
+# note: bash evals '1.7.x' as > '1.7' so this is really a >= 1.7 jvm check
+if [ "$JVM_VERSION" \> "1.7" ] ; then
+ JVM_OPTS="$JVM_OPTS -XX:+UseCondCardMark"
+fi
# GC logging options -- uncomment to enable
# JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
@@ -135,6 +221,12 @@ JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
# JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
# JVM_OPTS="$JVM_OPTS -XX:PrintFLSStatistics=1"
# JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc-`date +%s`.log"
+# If you are using JDK 6u34 7u2 or later you can enable GC log rotation
+# don't stick the date in the log name if rotation is on.
+# JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
+# JVM_OPTS="$JVM_OPTS -XX:+UseGCLogFileRotation"
+# JVM_OPTS="$JVM_OPTS -XX:NumberOfGCLogFiles=10"
+# JVM_OPTS="$JVM_OPTS -XX:GCLogFileSize=10M"
# uncomment to have Cassandra JVM listen for remote debuggers/profilers on port 1414
# JVM_OPTS="$JVM_OPTS -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=1414"
@@ -156,4 +248,4 @@ JVM_OPTS="$JVM_OPTS -Djava.net.preferIPv4Stack=true"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
-JVM_OPTS="$JVM_OPTS $JVM_EXTRA_OPTS"
+JVM_OPTS="$JVM_OPTS $JVM_EXTRA_OPTS"
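The `calculate_heap_sizes` shell function added in this diff picks `MAX_HEAP_SIZE = max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8192MB))` and `HEAP_NEWSIZE = min(100MB * cores, 1/4 heap)`. A Ruby transcription of that arithmetic, useful for checking expected values on paper — it mirrors the shell's integer division and is not a replacement for the script:

```ruby
def heap_sizes(system_memory_in_mb, system_cpu_cores)
  # Some systems (e.g. Raspberry Pi) report zero cores; use at least 1.
  cores = [system_cpu_cores, 1].max

  half    = [system_memory_in_mb / 2, 1024].min   # 1/2 RAM, capped at 1024 MB
  quarter = [system_memory_in_mb / 4, 8192].min   # 1/4 RAM, capped at 8192 MB
  max_heap = [half, quarter].max

  # Young generation: min(100 MB per core, 1/4 of the heap).
  new_size = [100 * cores, max_heap / 4].min

  ["#{max_heap}M", "#{new_size}M"]
end

puts heap_sizes(8192, 4).inspect   # => ["2048M", "400M"]
```

For an 8 GB / 4-core box: half is capped to 1024, quarter is 2048, so the heap is 2048M; the young gen is min(400, 512) = 400M.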
96 templates/default/cassandra.yaml.erb
@@ -1,4 +1,4 @@
-# Cassandra storage config YAML
+# Cassandra storage config YAML
# NOTE:
# See http://wiki.apache.org/cassandra/StorageConfiguration for
@@ -19,9 +19,12 @@ cluster_name: '<%= node[:cassandra][:cluster_name] %>'
#
# Specifying initial_token will override this setting.
#
-# If you already have a cluster with 1 token per node, and wish to migrate to
+# If you already have a cluster with 1 token per node, and wish to migrate to
# multiple tokens per node, see http://wiki.apache.org/cassandra/Operations
# num_tokens: 256
+<% if node[:cassandra][:vnodes] %>
+num_tokens: <%=node[:cassandra][:vnodes]%>
+<% end %>
# If you haven't specified num_tokens, or have set it to the default of 1 then
# you should always specify InitialToken when setting up a production
@@ -34,7 +37,11 @@ cluster_name: '<%= node[:cassandra][:cluster_name] %>'
# the heaviest-loaded existing node. If there is no load information
# available, such as is the case with a new cluster, it will pick
# a random token, which will lead to hot spots.
+<% if node[:cassandra][:vnodes] %>
+initial_token:
+<% else %>
initial_token: <%= node[:cassandra][:initial_token] %>
+<% end %>
# See http://wiki.apache.org/cassandra/HintedHandoff
hinted_handoff_enabled: true
@@ -54,18 +61,37 @@ max_hints_delivery_threads: 2
# Defaults to: false
# populate_io_cache_on_flush: false
-# authentication backend, implementing IAuthenticator; used to identify users
+# Authentication backend, implementing IAuthenticator; used to identify users
+# Out of the box, Cassandra provides org.apache.cassandra.auth.{AllowAllAuthenticator,
+# PasswordAuthenticator}.
+#
+# - AllowAllAuthenticator performs no checks - set it to disable authentication.
+# - PasswordAuthenticator relies on username/password pairs to authenticate
+# users. It keeps usernames and hashed passwords in system_auth.credentials table.
+# Please increase system_auth keyspace replication factor if you use this authenticator.
authenticator: org.apache.cassandra.auth.AllowAllAuthenticator
-# authorization backend, implementing IAuthorizer; used to limit access/provide permissions
+# Authorization backend, implementing IAuthorizer; used to limit access/provide permissions
+# Out of the box, Cassandra provides org.apache.cassandra.auth.{AllowAllAuthorizer,
+# CassandraAuthorizer}.
+#
+# - AllowAllAuthorizer allows any action to any user - set it to disable authorization.
+# - CassandraAuthorizer stores permissions in system_auth.permissions table. Please
+# increase system_auth keyspace replication factor if you use this authorizer.
authorizer: org.apache.cassandra.auth.AllowAllAuthorizer
+# Validity period for permissions cache (fetching permissions can be an
+# expensive operation depending on the authorizer, CassandraAuthorizer is
+# one example). Defaults to 2000, set to 0 to disable.
+# Will be disabled automatically for AllowAllAuthorizer.
+permissions_validity_in_ms: 2000
+
# The partitioner is responsible for distributing rows (by key) across
# nodes in the cluster. Any IPartitioner may be used, including your
# own as long as it is on the classpath. Out of the box, Cassandra
# provides org.apache.cassandra.dht.{Murmur3Partitioner, RandomPartitioner
# ByteOrderedPartitioner, OrderPreservingPartitioner (deprecated)}.
-#
+#
# - RandomPartitioner distributes rows across the cluster evenly by md5.
# This is the default prior to 1.2 and is retained for compatibility.
# - Murmur3Partitioner is similar to RandomPartioner but uses Murmur3_128
@@ -88,7 +114,7 @@ data_file_directories:
- <%= File.join(node.cassandra.data_root_dir, 'data') %>
# commit log
-commitlog_directory: <%= File.join(node.cassandra.data_root_dir, 'commitlog') %>
+commitlog_directory: <%= File.join(node.cassandra.commitlog_dir, 'commitlog') %>
# policy for data disk failures:
# stop: shut down gossip and Thrift, leaving the node effectively dead, but
@@ -168,7 +194,7 @@ row_cache_provider: SerializingCacheProvider
# saved caches
saved_caches_directory: <%= File.join(node.cassandra.data_root_dir, 'saved_caches') %>
-# commitlog_sync may be either "periodic" or "batch."
+# commitlog_sync may be either "periodic" or "batch."
# When in batch mode, Cassandra won't ack writes until the commit log
# has been fsynced to disk. It will wait up to
# commitlog_sync_batch_window_in_ms milliseconds for other writes, before
@@ -185,8 +211,8 @@ commitlog_sync_period_in_ms: 10000
# The size of the individual commitlog file segments. A commitlog
# segment may be archived, deleted, or recycled once all the data
-# in it (potentally from each columnfamily in the system) has been
-# flushed to sstables.
+# in it (potentally from each columnfamily in the system) has been
+# flushed to sstables.
#
# The default size is 32, which is almost always fine, but if you are
# archiving commitlog segments (see commitlog_archiving.properties),
@@ -197,7 +223,7 @@ commitlog_segment_size_in_mb: 32
# any class that implements the SeedProvider interface and has a
# constructor that takes a Map<String, String> of parameters will do.
seed_provider:
- # Addresses of hosts that are deemed contact points.
+ # Addresses of hosts that are deemed contact points.
# Cassandra nodes use this list of hosts to find each other and learn
# the topology of the ring. You must change this if you are running
# multiple nodes!
@@ -205,11 +231,12 @@ seed_provider:
parameters:
# seeds is actually a comma-delimited list of addresses.
# Ex: "<ip1>,<ip2>,<ip3>"
- - seeds: "<%= node[:cassandra][:seeds] %>"
+ #- seeds: "127.0.0.1"
+ - seeds: "<%= node[:cassandra][:seeds].join(",") %>"
# emergency pressure valve: each time heap usage after a full (CMS)
# garbage collection is above this fraction of the max, Cassandra will
-# flush the largest memtables.
+# flush the largest memtables.
#
# Set to 1.0 to disable. Setting this lower than
# CMSInitiatingOccupancyFraction is not likely to be useful.
@@ -225,8 +252,8 @@ flush_largest_memtables_at: 0.75
# Cassandra will reduce cache maximum _capacity_ to the given fraction
# of the current _size_. Should usually be set substantially above
# flush_largest_memtables_at, since that will have less long-term
-# impact on the system.
-#
+# impact on the system.
+#
# Set to 1.0 to disable. Setting this lower than
# CMSInitiatingOccupancyFraction is not likely to be useful.
reduce_cache_sizes_at: 0.85
@@ -241,8 +268,8 @@ reduce_cache_capacity_to: 0.6
# On the other hand, since writes are almost never IO bound, the ideal
# number of "concurrent_writes" is dependent on the number of cores in
# your system; (8 * number_of_cores) is a good rule of thumb.
-concurrent_reads: 32
-concurrent_writes: 32
+concurrent_reads: <%= node[:cassandra][:concurrent_reads] %>
+concurrent_writes: <%= node[:cassandra][:concurrent_writes] %>
# Total memory to use for memtables. Cassandra will flush the largest
# memtable when this much memory is used.
@@ -289,7 +316,7 @@ ssl_storage_port: 7001
# Address to bind to and tell other Cassandra nodes to connect to. You
# _must_ change this if you want multiple nodes to be able to
# communicate!
-#
+#
# Leaving it blank leaves it up to InetAddress.getLocalHost(). This
# will always do the Right Thing *if* the node is properly configured
# (hostname, name resolution, etc), and the Right Thing is to use the
@@ -308,7 +335,7 @@ listen_address: <%= node[:cassandra][:listen_address] %>
# transport is considered beta.
# Please note that the address on which the native transport is bound is the
# same as the rpc_address. The port however is different and specified below.
-start_native_transport: true
+start_native_transport: false
# port for the CQL native transport to listen for clients on
native_transport_port: 9042
# The minimum and maximum threads for handling requests when the native
@@ -324,10 +351,10 @@ start_rpc: true
# The address to bind the Thrift RPC service to -- clients connect
# here. Unlike ListenAddress above, you *can* specify 0.0.0.0 here if
# you want Thrift to listen on all interfaces.
-#
+#
# Leaving this blank has the same effect it does for ListenAddress,
# (i.e. it will be based on the configured hostname of the node).
-rpc_address: <%= node.cassandra.rpc_address %>
+rpc_address: <%= node[:cassandra][:rpc_address] %>
# port for Thrift to listen for clients on
rpc_port: 9160
@@ -370,6 +397,18 @@ rpc_server_type: sync
# rpc_send_buff_size_in_bytes:
# rpc_recv_buff_size_in_bytes:
+# Uncomment to set socket buffer size for internode communication
+# Note that when setting this, the buffer size is limited by net.core.wmem_max
+# and when not setting it it is defined by net.ipv4.tcp_wmem
+# See:
+# /proc/sys/net/core/wmem_max
+# /proc/sys/net/core/rmem_max
+# /proc/sys/net/ipv4/tcp_wmem
+# /proc/sys/net/ipv4/tcp_wmem
+# and: man tcp
+# internode_send_buff_size_in_bytes:
+# internode_recv_buff_size_in_bytes:
+
# Frame size for thrift (maximum field length).
thrift_framed_transport_size_in_mb: 15
@@ -390,7 +429,7 @@ incremental_backups: false
snapshot_before_compaction: false
# Whether or not a snapshot is taken of the data before keyspace truncation
-# or dropping of column families. The STRONGLY advised default of true
+# or dropping of column families. The STRONGLY advised default of true
# should be used to provide data safety. If you set this flag to false, you will
# lose data on truncation or drop.
auto_snapshot: true
@@ -424,7 +463,7 @@ in_memory_compaction_limit_in_mb: 64
# Multi-threaded compaction. When enabled, each compaction will use
# up to one thread per core, plus one thread per sstable being merged.
-# This is usually only useful for SSD-based hardware: otherwise,
+# This is usually only useful for SSD-based hardware: otherwise,
# your concern is usually to get compaction to do LESS i/o (see:
# compaction_throughput_mb_per_sec), not more.
multithreaded_compaction: false
@@ -530,11 +569,11 @@ cross_node_timeout: false
#
# You can use a custom Snitch by setting this to the full class name
# of the snitch, which will be assumed to be on your classpath.
-endpoint_snitch: SimpleSnitch
+endpoint_snitch: <%= node[:cassandra][:snitch] %>
# controls how often to perform the more expensive part of host score
# calculation
-dynamic_snitch_update_interval_in_ms: 100
+dynamic_snitch_update_interval_in_ms: 100
# controls how often to reset all host scores, allowing a bad host to
# possibly recover
dynamic_snitch_reset_interval_in_ms: 600000
@@ -564,7 +603,7 @@ request_scheduler: org.apache.cassandra.scheduler.NoScheduler
# NoScheduler - Has no options
# RoundRobin
# - throttle_limit -- The throttle_limit is the number of in-flight
-# requests per client. Requests beyond
+# requests per client. Requests beyond
# that limit are queued up until
# running requests can complete.
# The value of 80 here is twice the number of
@@ -631,12 +670,15 @@ client_encryption_options:
enabled: false
keystore: conf/.keystore
keystore_password: cassandra
+ # require_client_auth: false
+ # Set trustore and truststore_password if require_client_auth is true
+ # truststore: conf/.truststore
+ # truststore_password: cassandra
# More advanced defaults below:
# protocol: TLS
# algorithm: SunX509
# store_type: JKS
# cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA]
- # require_client_auth: false
# internode_compression controls whether traffic between nodes is
# compressed.
@@ -649,4 +691,4 @@ internode_compression: all
# Disabling it will result in larger (but fewer) network packets being sent,
# reducing overhead from the TCP protocol itself, at the cost of increasing
# latency if you block for cross-datacenter responses.
-inter_dc_tcp_nodelay: true
+inter_dc_tcp_nodelay: true
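Two of the template changes above are easy to get wrong when setting attributes: `num_tokens` is only emitted when `:vnodes` is truthy, and `:seeds` must now be an array (the template joins it with commas). A minimal ERB check outside Chef — `node` here is a plain nested Hash standing in for Chef's node object, and the `vnodes => 256` value is an illustrative assumption:

```ruby
require "erb"

node = { :cassandra => { :seeds => ["10.0.0.1", "10.0.0.2"], :vnodes => 256 } }

# The two changed fragments from cassandra.yaml.erb, reproduced verbatim.
fragment = <<~TEMPLATE
  <% if node[:cassandra][:vnodes] %>
  num_tokens: <%= node[:cassandra][:vnodes] %>
  <% end %>
  - seeds: "<%= node[:cassandra][:seeds].join(",") %>"
TEMPLATE

puts ERB.new(fragment).result(binding)
```

With `:vnodes` left at its default of `false`, the `num_tokens` line is omitted entirely and Cassandra falls back to `initial_token` behavior.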