In [2]:
import cassandra
from cassandra.cluster import Cluster

In [3]:
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# CAP Theorem

The CAP theorem, also named Brewer's theorem after computer scientist Eric Brewer, states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:[1][2][3]

- Consistency: Every read receives the most recent write or an error

- Availability: Every request receives a (non-error) response – without the guarantee that it contains the most recent write

- Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes

In particular, the CAP theorem implies that in the presence of a network partition, one has to choose between consistency and availability. Note that consistency as defined in the CAP theorem is quite different from the consistency guaranteed in ACID database transactions[4]. 
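
To make the trade-off concrete, here is a toy sketch of a replica that has been cut off from its peer and has missed a write. It is not Cassandra code: the `Replica` class, its `mode` parameter, and `simulate` are invented for illustration. In "CP" mode the replica returns an error rather than possibly stale data; in "AP" mode it answers with whatever it has.

```python
class StaleReadError(Exception):
    """Raised when a replica refuses to answer because it may be out of date."""


class Replica:
    # Hypothetical toy replica; 'mode' is an illustrative knob, not a Cassandra setting.
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP"
        self.value = None
        self.partitioned = False  # True when cut off from the other replica

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Consistency over availability: fail rather than risk a stale answer.
            raise StaleReadError("cannot confirm the latest write during a partition")
        # Availability over consistency: answer with the local, possibly stale, value.
        return self.value


def simulate(mode):
    replica = Replica(mode)
    replica.value = "v1"        # both replicas agree on v1
    replica.partitioned = True  # a later write of "v2" never reaches this replica
    try:
        return replica.read()
    except StaleReadError:
        return "error"


print(simulate("CP"))  # the consistent replica refuses to answer
print(simulate("AP"))  # the available replica serves the old value
```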

# Replication

A keyspace is created with a replication strategy. For development work, the SimpleStrategy class is acceptable. For production work, use the NetworkTopologyStrategy class. Changing the strategy of an existing keyspace takes two steps: alter the keyspace, then run a repair so that existing data is placed according to the new settings. To change how replicas are distributed across datacenters when data is already present, add a new datacenter, add data to the nodes in the new datacenter, and then remove the nodes from the old datacenter.

In [7]:
session.execute("CREATE KEYSPACE IF NOT EXISTS acme_co WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1}")

<cassandra.cluster.ResultSet at 0x1ff19f4c898>
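
One way to confirm which strategy a keyspace ended up with is to read it back from the schema tables. The sketch below assumes Cassandra 3.0 or later, where keyspace metadata is stored in `system_schema.keyspaces`; `get_replication` is a helper name introduced here, not part of the driver.

```python
def get_replication(session, keyspace_name):
    """Return a keyspace's replication settings as a dict, or None if it does not exist."""
    # Assumes Cassandra 3.0+ (system_schema.keyspaces). %s is the driver's
    # placeholder for parameters in non-prepared statements.
    row = session.execute(
        "SELECT replication FROM system_schema.keyspaces WHERE keyspace_name = %s",
        (keyspace_name,),
    ).one()
    return dict(row.replication) if row is not None else None
```

Calling `get_replication(session, "acme_co")` after the cell above should return a mapping that includes the strategy class and the replication factor.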

## NetworkTopologyStrategy

Use NetworkTopologyStrategy when you have (or plan to have) your cluster deployed across multiple datacenters. This strategy specifies how many replicas you want in each datacenter.

NetworkTopologyStrategy places replicas in the same datacenter by walking the ring clockwise until reaching the first node in another rack. NetworkTopologyStrategy attempts to place replicas on distinct racks because nodes in the same rack (or similar physical grouping) often fail at the same time due to power, cooling, or network issues.
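
The rack-aware walk can be illustrated with a simplified, single-datacenter model. This is a teaching sketch, not Cassandra's actual placement code: the node list, the tokens, and the `place_replicas` function are invented for illustration, and real placement also has to deal with virtual nodes and multiple datacenters.

```python
from bisect import bisect_left


def place_replicas(ring, key_token, replication_factor):
    """Pick replica nodes for a key. ring is a list of (token, node, rack) sorted by token."""
    tokens = [token for token, _, _ in ring]
    # First node at or after the key's token, wrapping around the ring.
    start = bisect_left(tokens, key_token) % len(ring)
    walk = [ring[(start + i) % len(ring)] for i in range(len(ring))]

    chosen, used_racks, skipped = [], set(), []
    for _, node, rack in walk:
        if len(chosen) == replication_factor:
            break
        if rack in used_racks:
            skipped.append(node)  # same rack as an existing replica: defer it
        else:
            chosen.append(node)
            used_racks.add(rack)
    # Fewer racks than replicas: fall back to the deferred nodes in ring order.
    chosen.extend(skipped[: replication_factor - len(chosen)])
    return chosen


# Hypothetical four-node ring with two racks.
ring = [
    (0, "n1", "rack1"),
    (25, "n2", "rack1"),
    (50, "n3", "rack2"),
    (75, "n4", "rack2"),
]

print(place_replicas(ring, key_token=10, replication_factor=2))
```

With a key token of 10, the walk starts at `n2` and then takes `n3`, the first node on a different rack. Asking for three replicas exhausts the distinct racks, so the third replica falls back to a node on a rack that is already in use.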

When deciding how many replicas to configure in each datacenter, the two primary considerations are (1) being able to satisfy reads locally, without incurring cross data-center latency, and (2) failure scenarios. The two most common ways to configure multiple datacenter clusters are:

- Two replicas in each datacenter: This configuration tolerates the failure of a single node per replication group and still allows local reads at a consistency level of ONE.

- Three replicas in each datacenter: This configuration tolerates either the failure of one node per replication group at a strong consistency level of LOCAL_QUORUM or multiple node failures per datacenter using consistency level ONE.
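
The failure tolerances described above follow from how a quorum is sized: a quorum is a majority of the replicas, that is, floor(RF / 2) + 1. The helpers below are a small illustrative calculation; the function names are assumptions made for this example, not driver APIs.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor, required_replicas):
    """Replica nodes that can be down while required_replicas can still respond."""
    return replication_factor - required_replicas


for rf in (2, 3):
    print(
        f"RF={rf}: LOCAL_QUORUM needs {quorum(rf)} replicas, "
        f"tolerates {tolerated_failures(rf, quorum(rf))} down; "
        f"ONE tolerates {tolerated_failures(rf, 1)} down"
    )
```

With two replicas, a quorum requires both of them, so only consistency level ONE survives a node failure. With three replicas, LOCAL_QUORUM needs two, which leaves room for one failed node.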

Asymmetrical replication groupings are also possible. For example, you can have three replicas in one datacenter to serve real-time application requests and use a single replica elsewhere for running analytics.


In [None]:
session.execute("""
    ALTER KEYSPACE acme_co WITH REPLICATION =
    {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2}
""")

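* `dc1` and `dc2` stand for your datacenter names; use the names your cluster actually reports (for example, in the output of `nodetool status`).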
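
When the datacenter names and replica counts come from configuration rather than being hard-coded, the statement can be assembled from a dict. This is a sketch under stated assumptions: `build_alter_keyspace` is a hypothetical helper, and the name checks are deliberately conservative because the values are interpolated into the CQL text rather than passed as bind parameters.

```python
import re


def build_alter_keyspace(keyspace, replicas_per_dc):
    """Build an ALTER KEYSPACE statement for NetworkTopologyStrategy.

    replicas_per_dc maps datacenter name to replica count, e.g. {"dc1": 3, "dc2": 2}.
    """
    # Illustrative validation only; adjust the patterns to your naming rules.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", keyspace):
        raise ValueError(f"unexpected keyspace name: {keyspace!r}")
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, count in replicas_per_dc.items():
        if not re.fullmatch(r"[A-Za-z0-9_\-]+", dc):
            raise ValueError(f"unexpected datacenter name: {dc!r}")
        options.append(f"'{dc}': {int(count)}")
    return f"ALTER KEYSPACE {keyspace} WITH REPLICATION = {{{', '.join(options)}}}"


print(build_alter_keyspace("acme_co", {"dc1": 3, "dc2": 2}))
```

The resulting string can be passed to `session.execute(...)` in the same way as the hand-written statement above.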

In [8]:
session.set_keyspace("acme_co")