# Introduction to NoSQL databases

This document introduces the concept and usage of NoSQL databases.

```{figure} ../img/nosql-geek-and-poke.jpg
---
width: 70%
name: nosql
---
NoSQL comic [Geek and Poke](https://geekandpoke.typepad.com/geekandpoke/2011/01/nosql.html)
```

## What is NoSQL

The term NoSQL refers to a variety of database systems that *do not* follow all the rules in a relational (SQL) database system.

The name NoSQL is usually read as *not only SQL* or *non-relational*.

Examples of NoSQL systems:
- Apache Cassandra: commerce, recommendation systems
- MongoDB: text-based content
- Redis: queues, messaging

*Note:* Many of these systems can handle most of the applications listed; only the *main* use case of each is shown.

## Key features of NoSQL  
1. (Horizontal) scalability: distribute operations over many nodes
2. Redundancy: replicate and distribute data over many nodes
3. Flexibility:
   - looser restriction on concurrency than most relational database systems
   - ability to add new features/attributes (columns) dynamically
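
The first two features can be sketched in a few lines of Python. This is a minimal illustration, not how any particular NoSQL system places data: the node names, the replication factor, and the helpers `replicas_for`, `put`, and `get` are all assumptions made up for this example. Each key is hashed to a primary node (scalability) and copied to the next node as well (redundancy), so a read still succeeds when one node is down.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
REPLICATION_FACTOR = 2  # assumption: each key is stored on two nodes


def replicas_for(key, nodes=NODES, rf=REPLICATION_FACTOR):
    """Return the nodes that should hold `key`, primary first."""
    # Use a stable hash so every client maps a key to the same nodes.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    # Primary node plus the next rf - 1 nodes, wrapping around the list.
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]


# Simulated cluster: one in-memory dict per node.
cluster = {name: {} for name in NODES}


def put(key, value):
    """Write the value to every replica of the key."""
    for node in replicas_for(key):
        cluster[node][key] = value


def get(key, down=frozenset()):
    """Read from the first replica that is not marked as down."""
    for node in replicas_for(key):
        if node not in down:
            return cluster[node].get(key)
    raise RuntimeError("all replicas unavailable")


put("user:42", {"name": "Ada"})
primary = replicas_for("user:42")[0]
# The primary is treated as failed, yet the second replica answers.
print(get("user:42", down={primary}))
```

Simple modulo placement like this moves most keys when a node is added or removed; production systems typically use more elaborate schemes such as consistent hashing to limit that reshuffling.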

## CAP Theorem 
The CAP theorem ({cite:t}`fox1999harvest`, {cite:t}`brewer2012cap`) concerns the following three properties of a distributed data system:

- consistency (C), equivalent to having a single up-to-date copy of the data;
- high availability (A) of that data (for updates); and
- tolerance to network partitions (P).

At a high level, a distributed system can guarantee at most **two** of the three properties at the same time.

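*Note:* In practice the three properties are in tension along a spectrum rather than being all-or-nothing: a system can trade some consistency for more availability, and vice versa.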
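
The trade-off can be made concrete with a toy model. The sketch below is an illustration only: the class `TwoNodeStore` and its `"CP"` / `"AP"` modes are hypothetical names, and real systems handle partitions in far more nuanced ways. While the two replicas cannot communicate, the store must either refuse writes (keeping the copies consistent but giving up availability) or accept them locally (staying available but letting the copies diverge).

```python
class Replica:
    """One copy of the data."""

    def __init__(self):
        self.data = {}


class TwoNodeStore:
    """Toy store with two replicas and a switchable network partition."""

    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.replicas = [Replica(), Replica()]
        self.partitioned = False

    def write(self, node, key, value):
        """Try to write via replica `node`; return True if accepted."""
        if not self.partitioned:
            # Healthy network: update both copies.
            for replica in self.replicas:
                replica.data[key] = value
            return True
        if self.mode == "CP":
            # Refuse the write rather than let the copies diverge.
            return False
        # AP: accept the write locally; the other replica is now stale.
        self.replicas[node].data[key] = value
        return True

    def read(self, node, key):
        return self.replicas[node].data.get(key)


for mode in ("CP", "AP"):
    store = TwoNodeStore(mode)
    store.write(0, "x", 1)       # replicated to both nodes
    store.partitioned = True     # the network splits
    accepted = store.write(0, "x", 2)
    print(mode, accepted, store.read(0, "x"), store.read(1, "x"))
# prints "CP False 1 1" and then "AP True 2 1"
```

In the CP run both replicas still agree on the old value but the update was rejected; in the AP run the update was accepted but a read from the second replica returns stale data.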

## ACID and BASE properties

::::{grid}
:gutter: 2

:::{grid-item-card} ACID
- Atomicity: All operations in a single transaction succeed or fail together.
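- Consistency: Every transaction leaves the data in a valid state, so simultaneous reads all see consistent data.
- Isolation: Concurrent transactions do not interfere with each other; a transaction that accesses a record waits until the previous transaction on that record is complete.
- Durability: Once a transaction is committed, its changes persist, even after a system failure.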
:::

:::{grid-item-card} BASE
- Basically Available: Database is accessible at (almost) all times. One user does not wait for other transactions to complete before a read/update.
- Soft state: The state of the data may change over time without external triggers, as updates propagate (e.g., an edit to a social media post).
- Eventually consistent: A data record becomes consistent across all copies once all concurrent updates have been applied.
:::
::::


### Summary table from [AWS Concepts](https://aws.amazon.com/compare/the-difference-between-acid-and-base-database/)

|    | ACID | BASE |
| :- |  :-  |  :-  |
| Scale | Scales vertically. | Scales horizontally. |
| Flexibility | Less flexible. Blocks specific records from other applications when processing. | More flexible. Allows multiple applications to update the same record simultaneously. |
| Performance | Performance decreases when processing large volumes of data. | Capable of handling large, unstructured data with high throughput. |
| Synchronization | Yes. Adds delay when synchronizing. | No synchronization at the database level. |
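
The contrast can be illustrated with a short, hedged sketch. The first half uses Python's built-in `sqlite3` module to show ACID-style atomicity: a transfer either applies both updates or neither. The second half imitates BASE-style eventual consistency with a last-write-wins merge of two replicas. The table layout, the `transfer`, `balances`, and `merge_lww` helpers, and the integer timestamps are all assumptions for this example, and last-write-wins is only one of several reconciliation strategies.

```python
import sqlite3

# --- ACID: both updates of a transfer commit together or not at all ---
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; return False if the transfer is rejected."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            # Violates the CHECK constraint if src cannot cover the amount.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


print(transfer(conn, "alice", "bob", 500), balances(conn))  # rejected, nothing changed
print(transfer(conn, "alice", "bob", 30), balances(conn))   # accepted, both rows updated


# --- BASE: replicas accept writes independently and converge later ---
def merge_lww(a, b):
    """Merge two replicas, keeping the entry with the newer timestamp per key."""
    merged = dict(a)
    for key, (ts, value) in b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


# Each value is a (logical timestamp, content) pair.
replica_1 = {"post:1": (1, "Hello")}
replica_2 = {"post:1": (2, "Hello, world")}  # a later edit accepted on another node

# Soft state: the replicas disagree until they exchange updates.
converged = merge_lww(replica_1, replica_2)
print(converged)
```

In the rejected transfer, the credit to `bob` had already been executed when the debit failed, and the rollback undid it. In the BASE half there is no such coordination: readers of `replica_1` see the old text until the merge has run.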

## Main forms of NoSQL database systems [ch.5, {cite:t}`reis2022fundamentals`]
1. Document databases
   - Storing data in JSON-like documents.
   - e.g., MongoDB and Couchbase.
2. Key-value stores 
   - Storing data as a collection of key-value pairs.
   - e.g., Redis and DynamoDB.
3. Graph databases
   - Storing data as vertices and edges.
   - e.g., Neo4j and Amazon Neptune.
4. Column-family (wide-column) stores
   - Storing data by column family rather than as fixed-schema rows; each row can have its own set of columns.
   - Scalable data storage.
   - e.g., Apache Cassandra and ScyllaDB.
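
To compare the four forms side by side, the sketch below models the same small, made-up dataset with plain Python structures. It is only an analogy for the data models: it does not use the API of any of the systems named above, and all identifiers and values are invented for illustration.

```python
import json

# 1. Document: one self-contained, JSON-like record per entity.
document = {
    "_id": "u1",
    "name": "Ada",
    "tags": ["admin"],
    "address": {"city": "London"},
}
# New attributes can be added to a single document at any time.
document["nickname"] = "ada"

# 2. Key-value: an opaque value that is looked up by its key only.
key_value = {"user:u1": json.dumps(document)}
name_from_kv = json.loads(key_value["user:u1"])["name"]

# 3. Graph: vertices plus labelled edges between them.
vertices = {"u1": {"name": "Ada"}, "u2": {"name": "Bob"}}
edges = [("u1", "FOLLOWS", "u2")]
followed_by_u1 = [
    vertices[dst]["name"]
    for src, label, dst in edges
    if src == "u1" and label == "FOLLOWS"
]

# 4. Wide-column: row key -> column family -> columns.
#    Rows in the same table need not share the same columns.
wide_column = {
    "u1": {
        "profile": {"name": "Ada", "city": "London"},
        "activity": {"last_login": "2024-01-01"},
    },
    "u2": {"profile": {"name": "Bob"}},
}
city_of_u2 = wide_column["u2"]["profile"].get("city")  # missing column

print(name_from_kv, followed_by_u1, city_of_u2)
```

Which form fits best depends on the access pattern: lookups by key favour key-value stores, nested records favour documents, relationship traversals favour graphs, and sparse rows with many optional attributes favour wide-column stores.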

```{bibliography}
:filter: docname in docnames
```