# Concurrency Control

## Remote Procedure Calls

Problem: it's hard to guarantee that an RPC executes exactly once
* Request can get dropped (no call at all)
* Reply message can be dropped (Caller might call again)
* Called process fails before executing
* Called process fails after executing
    * How's the caller supposed to tell the difference?
* Request might get duplicated, so the called function executes twice

### Semantics
* At most once (Java RMI)
    * Retransmit request if you don't get a reply
    * Executor filters out duplicate requests
    * Retransmit reply if you get a duplicate request
* At least once (Sun RPC)
    * Retransmit request if you don't get a reply
    * Executor doesn't filter out duplicate requests
    * Re-execute if you get a duplicate request
* Maybe, i.e. best-effort (CORBA)
    * Only transmit request once
    * No duplicate requests possible
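
A minimal sketch of the at-most-once executor in Python, assuming each request carries a unique, client-chosen ID (the `request_id` parameter and the `AtMostOnceServer` class are hypothetical, not Java RMI's API). The executor caches the reply per ID, so a retransmitted request gets the old reply instead of running again.

```python
from typing import Any, Callable, Dict


class AtMostOnceServer:
    """Executor side of at-most-once RPC (sketch).

    Assumes every request carries a unique, client-chosen request_id (hypothetical).
    """

    def __init__(self, handler: Callable[[Any], Any]) -> None:
        self.handler = handler
        self.replies: Dict[str, Any] = {}  # request_id -> reply already computed
        self.executions = 0                # how many times handler actually ran

    def handle(self, request_id: str, arg: Any) -> Any:
        if request_id in self.replies:
            # Duplicate request (caller retransmitted): resend the reply, don't re-execute.
            return self.replies[request_id]
        self.executions += 1
        reply = self.handler(arg)
        self.replies[request_id] = reply
        return reply


balance = {"alice": 100}


def withdraw(amount: int) -> int:
    balance["alice"] -= amount
    return balance["alice"]


server = AtMostOnceServer(withdraw)
print(server.handle("req-1", 30))  # 70
print(server.handle("req-1", 30))  # 70 again: retransmitted request is not re-executed
print(server.executions)           # 1
```

Skipping the `replies` lookup gives the at-least-once behavior instead: every retransmitted `withdraw` would run again.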

![RPC Semantics Table](img/RPC_Semantics.png)

### Idempotent Operations
* Operations that can be repeated multiple times with the same effect as executing them once (e.g., `x = 5` is idempotent; `x = x + 1` is not)
* Can be used with at-least-once semantics
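
A small sketch (hypothetical function names) of why idempotence matters under at-least-once semantics: the same request is executed three times, as can happen when replies are lost and the caller retransmits.

```python
def set_balance(account: dict, value: int) -> None:
    """Idempotent: running it once or many times leaves the same state."""
    account["balance"] = value


def deposit(account: dict, amount: int) -> None:
    """Not idempotent: every (re-)execution changes the state again."""
    account["balance"] += amount


def at_least_once(op, *args, attempts: int) -> None:
    """Simulate at-least-once delivery: the server ends up executing the request `attempts` times."""
    for _ in range(attempts):
        op(*args)


a = {"balance": 100}
at_least_once(set_balance, a, 150, attempts=3)
print(a["balance"])  # 150, same as executing once

b = {"balance": 100}
at_least_once(deposit, b, 50, attempts=3)
print(b["balance"])  # 250, not the intended 150
```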

### Implementation
![RPC implementation method](RPC_model.png)
* Client-side
    * caller(): Calling function
    * Client stub: same function signature as callee()
        * Allows the same caller() code for local (LPC) and remote (RPC) calls
    * Communication Module: Forwards requests and replies to the appropriate hosts
* Server-side
    * Dispatcher: Routes requests to server stubs
    * Server stub: Calls callee() locally and hands its return value back to be sent as the reply
    * callee(): Function to be remotely called
    
* Programmer writes caller() and callee() functions
* Remaining code is automatically generated by framework
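
A toy, single-process sketch of how these pieces fit together. It assumes JSON as the wire format and replaces the network with a direct function call; real frameworks generate the stubs and send bytes between hosts, so treat all names here as illustrative.

```python
import json
from typing import Any, Callable, Dict, List


# ---------- Server side ----------
def callee(a: int, b: int) -> int:
    """Function to be remotely called (written by the programmer)."""
    return a + b


def callee_server_stub(args: List[Any]) -> Any:
    """Server stub: calls callee() locally with the already-unmarshalled arguments."""
    return callee(*args)


# Dispatcher table: procedure name -> server stub.
DISPATCH: Dict[str, Callable[[List[Any]], Any]] = {"callee": callee_server_stub}


def dispatcher(request: bytes) -> bytes:
    """Routes a request to the right server stub and marshals the reply."""
    msg = json.loads(request)
    result = DISPATCH[msg["proc"]](msg["args"])
    return json.dumps({"result": result}).encode()


# ---------- Client side ----------
def communication_module(request: bytes) -> bytes:
    """Stand-in for the network: hands the request straight to the server's dispatcher."""
    return dispatcher(request)


def callee_client_stub(a: int, b: int) -> int:
    """Client stub: same signature as callee(); marshals, sends, and unmarshals."""
    request = json.dumps({"proc": "callee", "args": [a, b]}).encode()
    reply = communication_module(request)
    return json.loads(reply)["result"]


def caller() -> int:
    # A generated client would bind the stub to the name callee,
    # so this line would read exactly like a local call.
    return callee_client_stub(2, 3)


print(caller())  # 5
```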

### Marshalling
* Different architectures have different internal data representations
* Little vs Big endian
    * Little: Low-value bytes stored in lowest memory addresses
    * Big: High-value bytes stored in lowest memory addresses
* RPC Middleware maintains a common data representation (CDR)
* *Marshalling* is the conversion from platform-dependent formats to the CDR format
* *Unmarshalling* is going from CDR format back to the platform's native format
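
A minimal marshalling sketch using Python's `struct` module, assuming a made-up CDR in which every integer is sent as a 4-byte big-endian signed value. The bytes on the wire are the same whatever the host's native byte order.

```python
import struct
import sys
from typing import List


def marshal_ints(values: List[int]) -> bytes:
    """Native ints -> assumed CDR: consecutive 4-byte big-endian signed ints."""
    return struct.pack(f">{len(values)}i", *values)


def unmarshal_ints(data: bytes) -> List[int]:
    """Assumed CDR bytes -> native ints."""
    count = len(data) // 4
    return list(struct.unpack(f">{count}i", data))


wire = marshal_ints([1, 258])
print(sys.byteorder)         # e.g. 'little' on x86-64; the wire format doesn't depend on it
print(wire.hex())            # 0000000100000102
print(unmarshal_ints(wire))  # [1, 258]

# For contrast, the little-endian layout of 258 (0x00000102):
print(struct.pack("<i", 258).hex())  # 02010000
```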

## Transactions

Ultimately you have to commit all changes or abort and commit nothing.
We define the boundaries of this choice in *transactions*.
E.g., a client writing to a server accumulates a few changes, then applies all of them at once...or none at all.

Remember **ACID**?

__atomicity__ All or nothing principle: a transaction should either i) complete successfully, so its effects are recorded in the server objects; or ii) the transaction has no effect at all.

__consistency__ If the server starts in a consistent state, the transaction leaves the server in a consistent state.

* E.g., a bank transfer between servers: a withdrawal from one account should be reflected as a deposit in another

__isolation__ Need a transaction to be indivisible (atomic) from the point of view of other transactions.

* No access to intermediate results/states of other transactions
    * Other transactions only see the state before or after my transaction has been applied.
    * Implicit notion of causality here - how do you decide which state you see, pre or post?
* Free from interference by operations of other transactions
    
__durability__ Effects of a successful (committed) transaction are permanent, even across server crashes

Remember: you also want to be able to support as many simultaneous transactions as possible.

### What could go wrong?

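1. Lost Update Problem
    * One transaction's update overwrites another's
        * $T_{1}$ and $T_{2}$ both read the same initial value of $x$
        * $T_{2}$ writes $x = x + 2$
        * $T_{1}$ then writes $x = x + 1$, based on its now-stale read
        * Final value of $x$ is...$x + 1$, not $x + 3$: $T_{2}$'s update is lost
1. Inconsistent Retrieval Problem
    * Reading in the middle of another transaction's writes
        * E.g., summing all account balances while a transfer has withdrawn from one account but not yet deposited into the other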
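
A small simulation of the lost update above (the helper `run` is hypothetical): each transaction reads `x` into a private copy and later writes back its copy plus its increment. Starting from `x = 10`, a serial order applies both updates, while the interleaving in the example loses $T_{2}$'s.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, str]  # (transaction name, "read" or "write")


def run(schedule: List[Step], x0: int, increments: Dict[str, int]) -> int:
    """Execute an interleaving of read/write steps on a shared x and return the final x."""
    x = x0
    local: Dict[str, int] = {}  # each transaction's private copy of x
    for txn, op in schedule:
        if op == "read":
            local[txn] = x
        else:
            # write: computed from the value this transaction read earlier
            x = local[txn] + increments[txn]
    return x


inc = {"T1": 1, "T2": 2}
serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]
lost = [("T1", "read"), ("T2", "read"), ("T2", "write"), ("T1", "write")]
print(run(serial, 10, inc))  # 13: both updates applied
print(run(lost, 10, inc))    # 11: T2's +2 was overwritten
```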
    

## Serial Equivalence

Again, we want to run as many concurrent transactions per second (TPS) as possible.
But, we don't want to drop ACID...compliance.

An *interleaving*, we'll call it *O*, of transaction operations is *serially equivalent* iff there exists some ordering, *O'*, of all the transactions that:

* Provides the same final state (and the same read results) as the original interleaving *O*
* *But* runs the transactions in a batch: each one executes consecutively, start to finish, with no interleaving

Read that again. We're saying we can call an interleaving of transactions equivalent to a *serial ordering* of the same transactions if they both provide the same end-result.

### Checking for Serial Equivalence
* We say an operation has an *effect* on...
    * *Server objects* if it is a write
    * *Client objects* if it is a read of a returned value
* Operations are called *conflicting* if their *combined effect* depends on their execution order
    * read(x) and write(x) => CONFLICTING
    * write(x) and read(x) => CONFLICTING
    * write(x) and write(x) => CONFLICTING
    * read(x) and read(x) => NOT CONFLICTING
    * read(x) and write(y) => NOT CONFLICTING
* Two transactions are serially equivalent iff all pairs of conflicting operations between them (one operation from each) are executed in the same order, e.g. always $T_{1}$'s before $T_{2}$'s, for every object they both access
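
A sketch of this check for two transactions, assuming a schedule is a list of `(transaction, "r" or "w", object)` tuples (a hypothetical encoding). It records, for each conflicting pair, which transaction went first, and fails if one pair goes $T_{1}$ first while another goes $T_{2}$ first. With three or more transactions this pairwise test is not enough; you'd look for a cycle in the full precedence graph instead.

```python
from typing import List, Set, Tuple

Op = Tuple[str, str, str]  # (transaction, "r" or "w", object)


def conflicting(a: Op, b: Op) -> bool:
    """Different transactions, same object, and at least one of them writes."""
    return a[0] != b[0] and a[2] == b[2] and "w" in (a[1], b[1])


def serially_equivalent(schedule: List[Op]) -> bool:
    """True iff every conflicting pair is ordered the same way (exact for two transactions)."""
    orders: Set[Tuple[str, str]] = set()
    for i, first in enumerate(schedule):
        for second in schedule[i + 1:]:
            if conflicting(first, second):
                orders.add((first[0], second[0]))  # (earlier txn, later txn)
    # Serially equivalent unless some pair of transactions is ordered both ways.
    return not any((later, earlier) in orders for earlier, later in orders)


ok = [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "x"),
      ("T1", "w", "y"), ("T2", "r", "y")]
lost_update = [("T1", "r", "x"), ("T2", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x")]
print(serially_equivalent(ok))           # True: always T1 before T2
print(serially_equivalent(lost_update))  # False: T1 first on one pair, T2 first on another
```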

Example of conflicting operations:
![Example of conflicting operations](img/serial_equivalence_conflict_example.png)

#### Handling Conflicts
* As you prepare to commit a transaction T
    * Check for serial equivalence with all other overlapping transactions 
    * If not serially equivalent
        * Abort T
        * Roll back T's updates
            * This has to happen before T commits; once committed, __durability__ means its effects can't be removed

But we're still wasting work if we're aborting transactions. Can we avoid creating conflicting transactions in the first place?

## Pessimistic Concurrency

__Pessimistic__ Prevent transactions from accessing the same object; locking

__Optimistic__ Allow transactions to write, but check later (maybe at commit time). 

### Being Pessimistic: Exclusive Locking
* Every object, O, has a lock
* Only one transaction can hold O's lock at any time
* Transactions must `lock(O)` before accessing O. Once you have the lock, you can read/write O until you release it with `unlock(O)`.

Real-life operations are often read-heavy.
Exclusive locking lowers concurrency, because even non-conflicting read-read operations on the same object have to wait for each other.
So, we'll differentiate with Lock modes:
* `read_lock(O)`: multiple transactions allowed to read, but read only.
* `write_lock(O)`: exclusive lock, no other transactions allowed to read or write.
    * Can "promote" a read-lock to a write-lock; blocks if other transactions have read-locks as well.
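
A non-blocking sketch of the two lock modes for a single object (the `ObjectLock` class is hypothetical): instead of blocking, each `try_*` method returns `False` where a real lock manager would make the transaction wait.

```python
from typing import Optional, Set


class ObjectLock:
    """Read/write lock modes for one object O (non-blocking sketch).

    try_* methods return True if the lock is granted, False if the caller would have to wait.
    """

    def __init__(self) -> None:
        self.readers: Set[str] = set()
        self.writer: Optional[str] = None

    def try_read_lock(self, txn: str) -> bool:
        if self.writer not in (None, txn):
            return False              # another transaction holds the exclusive lock
        if self.writer != txn:
            self.readers.add(txn)     # shared mode: any number of readers
        return True

    def try_write_lock(self, txn: str) -> bool:
        if self.writer not in (None, txn):
            return False
        if self.readers - {txn}:
            return False              # promotion blocks while others hold read locks
        self.readers.discard(txn)     # promote our read lock (if any) to a write lock
        self.writer = txn
        return True

    def unlock(self, txn: str) -> None:
        self.readers.discard(txn)
        if self.writer == txn:
            self.writer = None


o = ObjectLock()
print(o.try_read_lock("T1"), o.try_read_lock("T2"))  # True True: read-read is fine
print(o.try_write_lock("T1"))  # False: T2 still reads O
o.unlock("T2")
print(o.try_write_lock("T1"))  # True: read lock promoted
print(o.try_read_lock("T2"))   # False: T1 holds O exclusively
```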

### Two-phase locking
Rule: A transaction cannot acquire (or promote) any locks after it has started releasing locks.

Transactions have two phases
1. Growing: only acquiring or promoting locks
1. Shrinking: Only releasing locks.
    * Strict two-phase locking: Releasing only occurs at commit (or abort) of the transaction.

**Safe**: 2PL guarantees serial equivalence, because any two transactions' conflicting operations are always ordered the same way.
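
A sketch of the two-phase rule for one transaction (`TwoPhaseTxn` is a hypothetical helper that only tracks phases; it doesn't decide which transaction gets a lock).

```python
from typing import Set


class TwoPhaseTxn:
    """Enforces the two-phase rule for one transaction (hypothetical helper).

    It only tracks phases; granting or blocking locks between transactions is the
    lock manager's job and is not modeled here.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.held: Set[str] = set()
        self.shrinking = False  # flips to True at the first release

    def acquire(self, obj: str) -> None:
        if self.shrinking:
            raise RuntimeError(f"{self.name}: can't lock {obj!r} after releasing a lock")
        self.held.add(obj)      # growing phase

    def release(self, obj: str) -> None:
        self.shrinking = True   # shrinking phase starts
        self.held.discard(obj)

    def commit(self) -> None:
        # Strict 2PL: hold every lock until commit, then release them all.
        self.shrinking = True
        for obj in list(self.held):
            self.release(obj)


t1 = TwoPhaseTxn("T1")
t1.acquire("x")
t1.acquire("y")
t1.release("x")
try:
    t1.acquire("z")
except RuntimeError as err:
    print(err)  # T1: can't lock 'z' after releasing a lock
```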

But, now we've exposed ourselves to a risk of deadlocks. Two "growing" transactions may have mutually-desired locks, and neither can release until it finishes growing.

### Deadlocks
Three *necessary* conditions to have a deadlock.
*Necessary*: if you have a deadlock, all three conditions are present. But their presence doesn't *mean* you have a deadlock.

1. Some objects are accessed in exclusive lock modes
    * e.g., write locks
1. Transactions holding locks cannot be preempted
    * Can't force a transaction to release a lock
1. Circular wait (cycle) in the wait-for graph
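
Condition 3 can be checked mechanically. A minimal sketch, assuming the wait-for graph is a dict mapping each transaction to the transactions it's waiting on; a depth-first search that runs into a transaction still on its own path has found a cycle.

```python
from typing import Dict, List, Set


def has_cycle(wait_for: Dict[str, List[str]]) -> bool:
    """Edge T -> U means transaction T waits for a lock that U holds."""
    on_path: Set[str] = set()  # transactions on the current DFS path
    done: Set[str] = set()     # transactions already fully explored

    def visit(t: str) -> bool:
        on_path.add(t)
        for u in wait_for.get(t, []):
            if u in on_path:
                return True                  # back edge: circular wait
            if u not in done and visit(u):
                return True
        on_path.discard(t)
        done.add(t)
        return False

    return any(t not in done and visit(t) for t in list(wait_for))


print(has_cycle({"T1": ["T2"], "T2": ["T1"]}))  # True: deadlock
print(has_cycle({"T1": ["T2"], "T2": ["T3"]}))  # False: T3 can finish, then T2, then T1
```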

#### Fighting Deadlocks
1. Lock timeout
    * Abort transaction if lock can't be acquired in a timely fashion
2. Deadlock Detection:
    * Track the Wait-for graph
        * Global snapshot algorithm, anybody? Chandy-Lamport, hmmm?
    * If you find a cycle, abort one or more transactions

Either way, you still have deadlocks for some period of time.
So, try number 3:

3. Deadlock Prevention: Violate one or more necessary deadlock conditions
    1. Some objects are accessed in exclusive lock modes
        * Allow read-only access to objects
    1. Transactions holding locks cannot be preempted
        * Allow special-case preemption
    1. Circular wait (cycle) in the wait-for graph
        * Lock all objects in the beginning; if any fail, abort the transaction
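
A sketch of the last prevention rule, assuming exclusive locks only and modeling each object's lock as a Python `threading.Lock` (the helper name is hypothetical). The transaction either gets every lock up front or gives back the ones it took and aborts, so it never waits while holding locks and no cycle can form.

```python
from __future__ import annotations

import threading
from typing import Dict, Iterable, List


def lock_all_or_abort(locks: Dict[str, threading.Lock], objects: Iterable[str]) -> bool:
    """Grab every lock the transaction needs up front, never waiting.

    Returns True if all locks were acquired, False if the transaction must abort.
    """
    taken: List[str] = []
    for obj in objects:
        if locks[obj].acquire(blocking=False):
            taken.append(obj)
        else:
            for held in taken:  # give back what we got, then abort
                locks[held].release()
            return False
    return True


locks = {"x": threading.Lock(), "y": threading.Lock()}
print(lock_all_or_abort(locks, ["x", "y"]))  # True: T1 now holds x and y
print(lock_all_or_abort(locks, ["y", "x"]))  # False: T2 aborts instead of waiting
```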

## Optimistic Concurrency Control
Let 'em all transact, and sort 'em out later.

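Can get higher transactions/second than pessimistic locking, since transactions don't wait on locks.
Preferred when you expect conflicts to be rare, since every conflict costs an abort and rollback.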

### Basic Approach
* Write and read objects at will
* Check for serial equivalence at commit time
* If you have to abort, roll back as needed
    * If that abort contaminates other transactions, abort those too!
        * a.k.a. *cascading aborts*
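
One common way to do the commit-time check is *backward validation* (a technique the notes don't name, so treat this as one illustration, not the only option): a transaction may commit only if no transaction that committed while it was running wrote an object it read. The sketch assumes each transaction records its read and write sets.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Txn:
    name: str
    read_set: Set[str] = field(default_factory=set)
    write_set: Set[str] = field(default_factory=set)


def validate(txn: Txn, committed_since_start: List[Txn]) -> bool:
    """Backward validation: fail if any overlapping, already-committed transaction
    wrote something txn read (txn may have seen a stale value)."""
    return all(not (txn.read_set & other.write_set) for other in committed_since_start)


t1 = Txn("T1", read_set={"x"}, write_set={"x"})  # committed while T2 and T3 were running
t2 = Txn("T2", read_set={"x", "y"}, write_set={"y"})
t3 = Txn("T3", read_set={"z"}, write_set={"z"})
print(validate(t2, [t1]))  # False: T2 may have read a stale x, so abort
print(validate(t3, [t1]))  # True
```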

### A more timely approach: Timestamp ordering
* Assign each transaction an ID, T_ID
* T_ID determines *serialization order* position
* Check the following two properties for each T
    1. T's write to object O is allowed iff every other transaction that has read or written O has a lower T_ID than T
    1. T's read of object O is allowed iff O was last written by T itself or by a transaction with a lower T_ID than T
* Maintain read & write timestamps for each object
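
A sketch of the two checks using a per-object read timestamp and write timestamp (hypothetical names; it writes values in place and ignores tentative versions and commits). It assumes T_IDs start at 1, so 0 means "nobody yet".

```python
from dataclasses import dataclass


class Abort(Exception):
    """Raised when timestamp ordering rejects an operation."""


@dataclass
class TOObject:
    value: int = 0
    read_ts: int = 0   # highest T_ID that has read this object (0 = none)
    write_ts: int = 0  # T_ID of the last transaction that wrote it (0 = none)


def to_read(obj: TOObject, tid: int) -> int:
    # Allowed only if O was last written by T itself or by a lower T_ID.
    if obj.write_ts > tid:
        raise Abort(f"T{tid} reads too late: already written by T{obj.write_ts}")
    obj.read_ts = max(obj.read_ts, tid)
    return obj.value


def to_write(obj: TOObject, tid: int, value: int) -> None:
    # Allowed only if every other reader/writer of O has a lower T_ID.
    if obj.read_ts > tid or obj.write_ts > tid:
        raise Abort(f"T{tid} writes too late")
    obj.value, obj.write_ts = value, tid


x = TOObject()
to_write(x, 2, 5)     # T2 writes x = 5
print(to_read(x, 3))  # 5: T3 > T2, so T3 may read
try:
    to_write(x, 1, 9)  # T1 < T3, but T3 already read x
except Abort as e:
    print("abort:", e)
```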

### Multi-version Concurrency Control
* For each object
    * Maintain a per-transaction version of the object
        * called a *tentative* version
    * And a committed object version
* Each tentative version has a timestamp
        * Some maintain a read & write timestamp per tentative version
* On a read or write, find the "immediately previous" tentative version to operate upon
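
A sketch of the version lookup for one object, assuming versions are tagged with the writer's timestamp (hypothetical `MultiVersionObject` class; it skips commit status and per-version read timestamps). A read by transaction $T$ returns the version with the largest write timestamp not greater than $T$'s.

```python
import bisect
from typing import List


class MultiVersionObject:
    """One object with many versions, kept sorted by write timestamp (sketch).

    Simplifications: no commit status, no per-version read timestamps, timestamps >= 0.
    """

    def __init__(self, initial: int) -> None:
        self.stamps: List[int] = [0]        # write timestamps, ascending
        self.values: List[int] = [initial]  # values[i] was written at stamps[i]

    def write(self, ts: int, value: int) -> None:
        i = bisect.bisect_left(self.stamps, ts)
        if i < len(self.stamps) and self.stamps[i] == ts:
            self.values[i] = value          # same transaction writing again
        else:
            self.stamps.insert(i, ts)
            self.values.insert(i, value)

    def read(self, ts: int) -> int:
        # "Immediately previous" version: largest write timestamp <= ts.
        i = bisect.bisect_right(self.stamps, ts) - 1
        return self.values[i]


x = MultiVersionObject(10)
x.write(5, 20)
x.write(9, 30)
print(x.read(3), x.read(7), x.read(12))  # 10 20 30
```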

#### Eventual Consistency in Key-Value stores
* Very similar to the consistency mechanisms in key-value stores like Cassandra, Riak, and DynamoDB
* But they're not transactional systems, so it's still different. 
    * No serial equivalence.
* Cassandra, DynamoDB
    * Notion of *Last-write wins*
        * Overwrite if write is newer than object's timestamp
        * Unsynchronized clocks may cause older writes to appear newer than they are, so watch out
* Riak
    * Vector clocks
        * Implements causal ordering
        * Detects whether 
            1. incoming write is newer than current value, and 
            1. if they conflict
                * Keep both as *sibling values*, left for the client to resolve
        * Can get very big; as many entries as number of clients
            * Size-based pruning
                * Maintain a cap
            * Time-based pruning
                * Get rid of really old entries
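
A sketch of the vector-clock comparison these checks rely on, assuming each clock is a dict from client ID to counter, with missing entries meaning 0 (this illustrates the idea; it is not Riak's code). The "concurrent" case is the one that produces siblings.

```python
from typing import Dict

VC = Dict[str, int]  # client id -> counter


def compare(a: VC, b: VC) -> str:
    """Return 'equal', 'older', 'newer', or 'concurrent' for clock a relative to clock b."""
    keys = a.keys() | b.keys()
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "older"
    if b_le_a:
        return "newer"
    return "concurrent"  # conflicting writes: keep both values as siblings


print(compare({"c1": 2, "c2": 1}, {"c1": 1, "c2": 1}))  # newer
print(compare({"c1": 1}, {"c1": 1, "c2": 1}))           # older
print(compare({"c1": 2}, {"c2": 1}))                    # concurrent
```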

## Replication Control

__replication control__ how to handle operations (or transactions) when objects are stored at *multiple servers*, with or without replication.

__replication__ an object has identical copies, each maintained by a separate server; copies are called "replicas". We may also use "replica" to refer to the server storing the object's replica.

### Why Replicate?
* Fault-tolerance
    * With $k$ replicas, we can tolerate $k-1$ failures
* Load balancing
    * Spread the load across all $k$ servers, so each handles roughly $1/k$ of it
* Basically, higher availability
    * Say each server is down a fraction $f$ of the time; say, 0.05 (5%)
    * With no replication, availability of an object is $1 - f$, or 95% in our case
    * With $k$ replicas, availability increases to $1-f^{k}$ (assuming replicas fail independently)
        * With three replicas, we go to $1 - 0.05^{3} = 0.999875$
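
The same arithmetic as a quick sketch, assuming replicas fail independently:

```python
def availability(f: float, k: int) -> float:
    """Probability that at least one of k replicas is up, if each is down a fraction f of the time."""
    return 1 - f ** k


for k in (1, 2, 3):
    print(k, round(availability(0.05, k), 6))
# 1 0.95
# 2 0.9975
# 3 0.999875
```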
        
![Table of Availability Probabilities](img/nines_availability.png)

### Challenges
* Replication transparency
    * Clients interact with servers as if there's only one copy of each object
* Replication Consistency
    * All clients see a single consistent copy of data, in spite of replication
    * For transactions, this is part of ACID compliance
    
#### Transparency
Typically we use a *front-end*, such as a CDN or web server cluster.
![Replication transparency via front-ends](img/replicant_transparency.png)

#### Consistency
We need a way to forward updates from front-ends to the replica group

* Passive: use a primary replica (master)
* Active: treat all replicas identically

Both rely on __Replicated State Machines__
* Each replica's code runs the same "state machine" (version of program)
    * The state machine must be deterministic: the same sequence of inputs generates the same outputs and end state at every replica.
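
A sketch (hypothetical `Replica` class) of a deterministic state machine: replicas that apply the same updates in the same order end in the same state, but the same updates in a different order can diverge, which is why update ordering matters for both passive and active replication.

```python
from typing import List, Tuple

Update = Tuple[str, int]  # (operation, argument)


class Replica:
    """Deterministic state machine: its state depends only on the sequence of updates."""

    def __init__(self) -> None:
        self.balance = 100

    def apply(self, update: Update) -> None:
        op, arg = update
        if op == "deposit":
            self.balance += arg
        elif op == "apply_interest":  # arg is a percentage
            self.balance += self.balance * arg // 100


log: List[Update] = [("deposit", 50), ("apply_interest", 10)]

r1, r2, r3 = Replica(), Replica(), Replica()
for u in log:
    r1.apply(u)
    r2.apply(u)
for u in reversed(log):  # r3 sees the same updates in a different order
    r3.apply(u)

print(r1.balance, r2.balance, r3.balance)  # 165 165 160
```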
   
##### Passive Replication
* Master is elected leader
* Master imposes a "total ordering" on all updates
    
##### Active Replication
* Front-end multicasts all updates to all replicas
* The multicast ordering used determines which consistency guarantees the replicas get
    * FIFO
    * Causal
    * Total
    * Hybrid
* Using Total or Hybrid (\*-Total) ordering + Replicated State Machines approach => All replicas reflect the same sequence of updates to the object
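
One common way to get a total order is a *sequencer* that numbers every update (an assumption here; the notes don't prescribe a mechanism). Each replica holds back updates that arrive early and delivers strictly by sequence number, so all replicas apply the same sequence even when the network reorders messages.

```python
import heapq
from typing import List, Tuple

Msg = Tuple[int, str]  # (global sequence number, update)


class Sequencer:
    """Hands out consecutive global sequence numbers for multicast updates."""

    def __init__(self) -> None:
        self.next_seq = 0

    def stamp(self, update: str) -> Msg:
        msg = (self.next_seq, update)
        self.next_seq += 1
        return msg


class TotalOrderReplica:
    """Delivers updates strictly in sequence order, holding back early arrivals."""

    def __init__(self) -> None:
        self.expected = 0
        self.holdback: List[Msg] = []  # min-heap keyed on sequence number
        self.delivered: List[str] = []

    def receive(self, msg: Msg) -> None:
        heapq.heappush(self.holdback, msg)
        while self.holdback and self.holdback[0][0] == self.expected:
            _, update = heapq.heappop(self.holdback)
            self.delivered.append(update)  # apply to the state machine here
            self.expected += 1


seq = Sequencer()
a, b, c = seq.stamp("deposit 50"), seq.stamp("interest 10%"), seq.stamp("withdraw 20")
r1, r2 = TotalOrderReplica(), TotalOrderReplica()
for m in (a, b, c):
    r1.receive(m)
for m in (c, a, b):  # network delivers to r2 in a different order
    r2.receive(m)
print(r1.delivered == r2.delivered)  # True
```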

#### Handling Failures
* Virtual Synchrony: Maintaining consistent world-views across all servers
* Virtual Synchrony + Total Ordering: 
    * All replicas see all failures/joins/leaves and all multicasts in the same order
    * Could also use causal (or FIFO) ordering if application can tolerate it
    
#### Transactions

We want to be able to act as if we're interfacing with a single set of objects.

__One-copy-serializable__ The effect of concurrent transactions on a replicated database is equivalent to a serial execution of those same transactions on a single, non-replicated database.

### Distributed Transactions

Transactions across distributed servers are a little trickier; not only do we need to check for conflicting operations on a single object, we also need to ensure we deconflict for all objects across all servers.

We want to commit to all servers or none at all; *consensus*!
Here, it's called the *Atomic Commit Problem*

#### One-phase commit
Use a Coordinator server to manage the transaction's commit.
* T's commit request goes to the Coordinator
* Coordinator tells all servers to commit or abort
* __Problems__
    * Individual servers have no say; any server(s) may have a corrupted object, not allowing them to commit (while all other servers do! Violates consistency)
    * Servers crash! May never receive/complete commit
        * Violates consistency, durability

#### Two-phase commit
Most widely-used model.

![Two-phase commit model](img/two-phase_commit.png)

Still uses a Coordinator server

1. Coordinator sends PREPARE message
1. Servers save their tentative updates to disk
    1. Data is now durable, in the event of failure
    1. Each server responds with YES or NO
1. Coordinator...
    1. Aborts if any NO messages are received, or if it times out before receiving all votes
    1. Else, sends COMMIT message to all servers
        1. Servers make the updates saved on disk permanent (committed)
        1. Servers respond OK
1. Coordinator responds with ACK
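
The coordinator's decision rule boils down to a few lines. A sketch, assuming votes arrive as a dict from server name to `"YES"`/`"NO"`, with a missing entry standing for a vote that never arrived before the timeout:

```python
from typing import Dict, List


def coordinator_decision(servers: List[str], votes: Dict[str, str]) -> str:
    """COMMIT only if every server's vote arrived and was YES; otherwise ABORT.

    A server missing from `votes` models a vote that was lost or timed out.
    """
    if all(votes.get(s) == "YES" for s in servers):
        return "COMMIT"
    return "ABORT"


servers = ["S1", "S2", "S3"]
print(coordinator_decision(servers, {"S1": "YES", "S2": "YES", "S3": "YES"}))  # COMMIT
print(coordinator_decision(servers, {"S1": "YES", "S2": "NO", "S3": "YES"}))   # ABORT
print(coordinator_decision(servers, {"S1": "YES", "S2": "YES"}))               # ABORT: S3 timed out
```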
    
##### Failures
* Server $S_{i}$ votes NO
    * $S_{i}$ can immediately abort transaction. Coordinator would've aborted anyway!
* $S_{i}$ votes YES
    * $S_{i}$ must still wait for COMMIT
* $S_{i}$ crashes
    * $S_{i}$ can read its first-phase (prepared) updates back from disk upon recovery
* Coordinator crashes
    * Coordinator keeps logs of all decisions and received/sent messages
* Message losses
    * PREPARE 
        * Can timeout and ABORT the transaction early or re-send PREPARE
    * Dropping a YES/NO message, Coordinator must ABORT. Pessimistic, but safe.
    * Dropping COMMIT or ABORT
        * Servers can poll the Coordinator (pull)
        
#### Paxos
Since Atomic Commit is a consensus problem, we can use a consensus algorithm.
*However*, we need to make sure everyone ABORTs if any server sends NO

##### Ordering updates
* A server proposes a received update for the next sequence number
* Group reaches consensus. Or doesn't, in which case a new message is proposed.
