# Introduction to NoSQL databases

This document introduces the concept and usage of NoSQL databases.

|![nosql](../img/nosql-geek-and-poke.jpg)|
|:---:|
|NoSQL comic [Geek and Poke](https://geekandpoke.typepad.com/geekandpoke/2011/01/nosql.html)|

## What is NoSQL

The term NoSQL refers to a variety of database systems that *do not* follow all the rules of a relational (SQL) database system, such as a fixed table schema or multi-row ACID transactions.

NoSQL is usually read as *not only SQL*, or simply as *non-relational*.

Examples of NoSQL systems:
- Apache Cassandra: e-commerce, recommendation systems
- MongoDB: text-based content
- Redis: queues, messaging

*Note:* Each of these systems can serve most of the applications listed; only its *main* use case is shown.

## Key features of NoSQL  

1. (Horizontal) scalability: distribute operations over many nodes

2. Redundancy: replicate and distribute data over many nodes

3. Flexibility:
   - looser restrictions on concurrent access (weaker transaction guarantees) than most relational database systems
   - ability to add new attributes (columns) dynamically, without a schema migration
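
Scalability and redundancy can be sketched together with a toy consistent-hashing ring, one common way to spread keys over nodes and to keep each key on more than one node. The `HashRing` class and the node names are hypothetical, and real systems (Cassandra, for example) add refinements such as virtual nodes; this is only a minimal illustration of the idea.

```python
import hashlib
from bisect import bisect_right


class HashRing:
    """Toy consistent-hash ring that assigns each key to `replicas` nodes."""

    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        # Each node sits on the ring at the position given by the hash of its name.
        self.ring = sorted((self._hash(node), node) for node in nodes)
        self.positions = [pos for pos, _ in self.ring]

    @staticmethod
    def _hash(text):
        return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)

    def nodes_for(self, key):
        """Nodes holding `key`: the first node clockwise from the key's hash
        (the primary), followed by the next distinct nodes as replicas."""
        start = bisect_right(self.positions, self._hash(key)) % len(self.ring)
        count = min(self.replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(1000)]
    small = HashRing(["node-a", "node-b", "node-c"])
    large = HashRing(["node-a", "node-b", "node-c", "node-d"])
    print(small.nodes_for("user:42"))  # a primary node plus one replica
    moved = [k for k in keys if small.nodes_for(k)[0] != large.nodes_for(k)[0]]
    # Every key whose primary changed now lives on the new node, node-d.
    print(len(moved), all(large.nodes_for(k)[0] == "node-d" for k in moved))
```

Adding a fourth node only reassigns the keys that fall into the new node's slice of the ring; all other keys keep their owners, which is what makes scaling out cheap in this scheme.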

## CAP Theorem 
The CAP theorem (Fox and Brewer, 1999; Brewer, 2012) describes a trade-off among three desirable properties of a distributed data system:

- consistency (C), equivalent to having a single up-to-date copy of the data;

- high availability (A) of that data (for updates); and

- tolerance to network partitions (P).

At a high level, a distributed system can guarantee at most **two** of the three properties at the same time.

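*Note:* The three properties are not strictly all-or-nothing. Brewer (2012) argues that they lie on spectra, and that the trade-off between consistency and availability only has to be made while a partition is actually occurring.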
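
To make the consistency/availability choice concrete, here is a toy store with two replicas (class and method names are hypothetical) that runs in either a CP or an AP mode. During a simulated partition, CP mode rejects writes so the replicas never diverge, while AP mode keeps accepting writes and lets the replicas converge afterwards with last-writer-wins. A single shared counter stands in for write timestamps, a simplification that real partitioned replicas could not rely on.

```python
import itertools


class Unavailable(Exception):
    """Raised when a CP-mode store refuses a write during a partition."""


class TwoReplicaStore:
    """Toy store with two replicas, r1 and r2 (illustrative only)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        # Each replica maps key -> (version, value).
        self.replicas = {"r1": {}, "r2": {}}
        self.partitioned = False
        # A single counter stands in for write timestamps (a simplification).
        self._clock = itertools.count(1)

    def write(self, via, key, value):
        entry = (next(self._clock), value)
        if not self.partitioned:
            # No partition: replicate synchronously to both replicas.
            for replica in self.replicas.values():
                replica[key] = entry
        elif self.mode == "CP":
            # Choose consistency: refuse the write rather than let replicas diverge.
            raise Unavailable(f"{via} cannot reach its peer; write rejected")
        else:
            # Choose availability: accept the write on the reachable replica only.
            self.replicas[via][key] = entry

    def read(self, via, key):
        entry = self.replicas[via].get(key)
        return None if entry is None else entry[1]

    def heal(self):
        """End the partition; both replicas keep the newest version (last-writer-wins)."""
        self.partitioned = False
        r1, r2 = self.replicas["r1"], self.replicas["r2"]
        for key in set(r1) | set(r2):
            newest = max(r1.get(key, (0, None)), r2.get(key, (0, None)))
            r1[key] = r2[key] = newest


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        store = TwoReplicaStore(mode)
        store.write("r1", "post", "v1")
        store.partitioned = True
        try:
            store.write("r1", "post", "v2")
        except Unavailable as err:
            print(mode, "write failed:", err)
        print(mode, "read from r2 during partition:", store.read("r2", "post"))
        store.heal()
        print(mode, "read from r2 after healing:", store.read("r2", "post"))
```

In CP mode the second write fails and both replicas keep `v1`; in AP mode the write succeeds, `r2` serves the stale `v1` during the partition and `v2` once the partition heals.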

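## ACID and BASE properties

Relational databases typically guarantee ACID transactions (Atomicity, Consistency, Isolation, Durability), while many NoSQL databases follow the looser BASE model (Basically Available, Soft state, Eventually consistent). The table contrasts the two approaches property by property.

|![acid-vs-base](../img/acid-vs-base.png)|
|:---:|
|ACID and BASE example schema (luminousmen)|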

| **Property**       | **ACID** (relational DBs, e.g. bank transactions) | **BASE** (NoSQL DBs, e.g. social media post edits) |
|--------------------|-----------------------------------------------|------------------------------------------------|
| **Atomicity**      | Transactions are all-or-nothing.              | Often limited to a single record; multi-record updates may be applied partially. |
| **Consistency**    | Every transaction moves the data from one valid state to another. | Eventually consistent; replicas may temporarily disagree. |
| **Isolation**      | Concurrent transactions do not see each other's intermediate results. | Concurrent operations may interleave and observe intermediate states. |
| **Durability**     | Data is permanently saved once a transaction commits. | Availability is prioritized; durability is often tunable (e.g., how many replicas must confirm a write). |
| **Availability**   | May be reduced to uphold transaction rules (e.g., requests blocking on locks). | Highly available; no strict transaction requirements. |
| **Scalability**    | Harder to scale out across nodes because of strict transaction rules. | Highly scalable; designed for distributed deployment. |
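
The atomicity row can be illustrated with Python's built-in `sqlite3` module, used here as a stand-in for a relational database. A transfer runs inside one transaction; if the debit would violate the `CHECK (balance >= 0)` constraint, the earlier credit is rolled back too. For contrast, `transfer_without_transaction` is a hypothetical naive store with no multi-key transactions, which leaves a partial update behind. Table, account, and function names are illustrative assumptions.

```python
import sqlite3


def make_bank():
    """Create an in-memory accounts table; balances may never go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """ACID-style transfer: both updates commit together or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK failed; the credit above was rolled back
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


def transfer_without_transaction(store, src, dst, amount):
    """Naive transfer on a plain dict with no multi-key transaction (hypothetical)."""
    store[dst] += amount  # the first write is applied immediately...
    if store[src] < amount:
        raise ValueError("insufficient funds")  # ...and nothing undoes it
    store[src] -= amount


if __name__ == "__main__":
    conn = make_bank()
    print(transfer(conn, "alice", "bob", 30), balances(conn))   # True {'alice': 70, 'bob': 80}
    print(transfer(conn, "bob", "alice", 500), balances(conn))  # False {'alice': 70, 'bob': 80}

    store = {"alice": 70, "bob": 80}
    try:
        transfer_without_transaction(store, "bob", "alice", 500)
    except ValueError:
        pass
    print(store)  # {'alice': 570, 'bob': 80}: a partial update
```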

## Main forms of NoSQL database systems [ch.5, Reis, 2022]

1. Document databases
   - Storing data in JSON-like documents.
   - e.g., MongoDB and Couchbase.

2. Key-value stores 
   - Storing data as a collection of key-value pairs.
   - e.g., Redis and DynamoDB.

3. Graph databases
   - Storing data as vertices and edges.
   - e.g., Neo4j and Amazon Neptune.

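4. Column-family (wide-column) stores
   - Storing data in rows grouped into partitions, where each row can have its own dynamic set of columns (not the same as columnar analytical storage).
   - Designed to scale storage and write throughput horizontally.
   - e.g., Apache Cassandra and ScyllaDB.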
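
The four data models can be compared by expressing similar data in each one. In this sketch plain Python dicts and lists stand in for the stores; none of it is the API of MongoDB, Redis, Neo4j, or Cassandra, and all names and data are hypothetical. It also shows the flexibility point from earlier: two user documents carry different fields without any schema change.

```python
import json

# 1. Document store: self-describing records; fields can differ between documents.
users = {
    "u1": {"name": "Ada", "email": "ada@example.com", "tags": ["admin"]},
    "u2": {"name": "Lin", "city": "Oslo"},  # new field added without a schema change
}

# 2. Key-value store: values are opaque and reachable only through their key.
kv = {
    "user:u1": json.dumps(users["u1"]),
    "session:abc123": "u1",
}

# 3. Graph store: vertices plus labelled, directed edges.
vertices = {"u1": {"name": "Ada"}, "u2": {"name": "Lin"}, "p1": {"title": "Hello"}}
edges = [("u1", "FOLLOWS", "u2"), ("u2", "LIKES", "p1"), ("u1", "LIKES", "p1")]

# 4. Wide-column store: partition key -> clustering key -> sparse set of columns.
posts_by_user = {
    "u1": {
        "2024-01-02T09:30": {"text": "second post", "likes": 3},
        "2024-01-01T10:00": {"text": "first post"},  # this row has fewer columns
    },
}


def find_documents(docs, **criteria):
    """Document-style query: ids of documents whose fields match every criterion."""
    return sorted(
        doc_id for doc_id, doc in docs.items()
        if all(doc.get(name) == value for name, value in criteria.items())
    )


def neighbours(edges, vertex, label):
    """Graph-style traversal: follow outgoing edges with the given label."""
    return [dst for src, lbl, dst in edges if src == vertex and lbl == label]


def read_partition(table, partition_key):
    """Wide-column-style read: one partition, rows ordered by clustering key."""
    rows = table.get(partition_key, {})
    return [rows[ck] for ck in sorted(rows)]


if __name__ == "__main__":
    print(find_documents(users, city="Oslo"))                         # ['u2']
    print(json.loads(kv["user:" + kv["session:abc123"]])["name"])     # Ada
    print([vertices[v]["name"] for v in neighbours(edges, "u1", "FOLLOWS")])  # ['Lin']
    print(read_partition(posts_by_user, "u1"))
```

Each access pattern matches its model: documents are filtered by field, key-value entries are fetched only by key, graphs are traversed along edges, and a wide-column partition is read as an ordered run of rows.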

# References
Fox, A., & Brewer, E. A. (1999, March). Harvest, yield, and scalable tolerant systems. In *Proceedings of the Seventh Workshop on Hot Topics in Operating Systems* (pp. 174-178). IEEE.  
Brewer, E. (2012). CAP twelve years later: How the "rules" have changed. *Computer*, 45(2), 23-29.  