## Introduction to NoSQL    

### What is a NoSQL Database?

A NoSQL database is a non-relational database. Because some NoSQL databases accept commands written in SQL, "non-relational" is a more accurate term than "NoSQL".

**Types and examples of NoSQL databases**   


There have been various approaches to classify NoSQL databases, each
with different categories and subcategories, some of which overlap. What
follows is a basic classification by data model, with examples:

-   **Column**: Accumulo, Cassandra, Druid, HBase, Vertica.   
-   **Document**: Apache CouchDB, ArangoDB, BaseX,
    Clusterpoint, Couchbase, Cosmos DB, IBM Domino, MarkLogic,
    MongoDB, OrientDB, Qizx, RethinkDB     
-   **Key-value**: Aerospike, Apache Ignite, ArangoDB,
    Couchbase, Dynamo, FairCom c-treeACE, FoundationDB,
    InfinityDB, MemcacheDB, MUMPS, Oracle NoSQL Database,
    OrientDB, Redis, Riak, Berkeley DB, SDBM/Flat File dbm,
    ZooKeeper    
-   **Graph**: AllegroGraph, ArangoDB, InfiniteGraph, Apache
    Giraph, MarkLogic, Neo4J, OrientDB, Virtuoso   
-   **Multi-model**: Apache Ignite, ArangoDB, Couchbase,
    FoundationDB, InfinityDB, MarkLogic, OrientDB   

For a more detailed classification, see the free book _NoSQL Databases_ [http://www.christof-strauch.de/nosqldbs](http://www.christof-strauch.de/nosqldbs) by Christof Strauch.   

### Key-value store

Key-value (KV) stores use the associative array (also known as a map
or dictionary) as their fundamental data model. In this model, data is
represented as a collection of key-value pairs, such that each possible
key appears at most once in the collection.   

The key-value model is one of the simplest non-trivial data models, and
richer data models are often implemented as an extension of it. The
key-value model can be extended to a discretely ordered model that
maintains keys in lexicographic order. This extension is
computationally powerful, in that it can efficiently retrieve selective
key *ranges*.   
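To make the ordered extension concrete, here is a minimal in-memory sketch. The class `OrderedKVStore` and its key naming scheme (`user:001`, and so on) are hypothetical and chosen only for illustration; real stores typically use structures such as B-trees or LSM-trees rather than a sorted Python list.

```python
import bisect


class OrderedKVStore:
    """Toy in-memory key-value store that keeps keys in lexicographic order."""

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value

    def put(self, key, value):
        # Insert the key into the sorted list only the first time it is seen,
        # so each key appears at most once in the collection.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key, default=None):
        return self._values.get(key, default)

    def range(self, start, end):
        """Return (key, value) pairs with start <= key < end."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


store = OrderedKVStore()
store.put("user:002", "Grace")
store.put("user:001", "Ada")
store.put("order:900", "book")
store.put("user:001", "Ada Lovelace")  # overwrites; the key stays unique

# ";" sorts immediately after ":" in ASCII, so this selects every "user:" key.
user_rows = store.range("user:", "user;")
```

Because the keys are kept sorted, the range query needs only two binary searches and a slice instead of a scan over every key.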

Key-value stores can use consistency models ranging from eventual
consistency to serializability. Some databases support ordering of
keys. Storage media also vary: some stores keep data in memory (RAM),
while others use solid-state drives (SSDs) or rotating hard disk
drives (HDDs).

Examples include ArangoDB, InfinityDB, Riak, Oracle NoSQL Database,
Redis, and dbm.

* A KV-DB is essentially a lookup table that often uses hashing to speed up retrieval.  
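* Because every lookup goes through a single key, KV-DBs are comparatively easy to partition across servers and typically offer fast reads and writes.    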


### Document store

The central concept of a document store is the notion of a "document".
While each document-oriented database implementation differs on the
details of this definition, in general, they all assume that documents
encapsulate and encode data (or information) in some standard formats or
encodings. Encodings in use include XML, YAML, and JSON as well as
binary forms like BSON. Documents are addressed in the database via a
unique *key* that represents that document. One of the other defining
characteristics of a document-oriented database is that in addition to
the key lookup performed by a key-value store, the database also offers
an API or query language that retrieves documents based on their
contents.

Different implementations offer different ways of organizing and/or
grouping documents:

-   Collections
-   Tags
-   Non-visible metadata
-   Directory hierarchies

Compared to relational databases, for example, collections could be
considered analogous to tables and documents analogous to records. But
they are different: every record in a table has the same sequence of
fields, while documents in a collection may have fields that are
completely different.
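The difference between key lookup and content-based retrieval can be sketched with plain Python dictionaries. The `people` collection, its fields, and the `find` helper are invented for illustration and do not correspond to the API of any particular document database.

```python
# A hypothetical "people" collection: documents are plain dicts, and
# documents in the same collection need not share the same fields.
people = {
    "p1": {"name": "Alice", "languages": ["English", "French"]},
    "p2": {"name": "Bob", "employer": {"name": "Acme", "role": "engineer"}},
    "p3": {"name": "Carol", "languages": ["Swedish", "English"], "city": "Oslo"},
}


def find(collection, predicate):
    """Return the keys of all documents whose contents satisfy predicate."""
    return [key for key, doc in collection.items() if predicate(doc)]


# Key lookup, exactly as in a key-value store.
alice = people["p1"]

# Content-based query: who lists English among their languages?
english_speakers = find(people, lambda doc: "English" in doc.get("languages", []))

# Queries can also reach into nested structure.
acme_staff = find(people, lambda doc: doc.get("employer", {}).get("name") == "Acme")
```

Note that the query has to tolerate missing fields (`doc.get(...)`), since nothing forces every document in the collection to have them.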

* Document databases such as MongoDB and CouchDB are very similar to columnar databases but allow for much deeper nesting of information.  
* Performance is often an issue with these databases.  
* XML Stores are an example of a document database.  
* Document databases often use JavaScript as the native query language, with data exchanged between the client and the server as JSON objects.   


### Columnar databases   

* Columnar databases, such as HBase, are similar to key-value databases in that they store keys with information.
* Rather than storing a single value, a columnar database stores multiple pieces of information – similar to a record.  
* The columns do not have to be of the same data type.  
* Unlike relational schemas, column-based stores do not require a pre-structured table.   
* Each record consists of one or more columns containing the information, and the set of columns can differ from record to record.   
* Columnar databases allow very large, unstructured data sets to be managed.   
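The points above can be sketched as a mapping from a row key to a per-row dictionary of columns. This is a toy model only; the `table`, `put`, and `get` names and the sensor data are hypothetical and do not mirror the HBase API.

```python
# Row key -> {column name -> value}. Rows need not share columns, and
# column values may be of different types.
table = {}


def put(row_key, column, value):
    """Store one column value, creating the row on first use (no schema needed)."""
    table.setdefault(row_key, {})[column] = value


def get(row_key, column, default=None):
    """Fetch one column of one row, or default if the row or column is absent."""
    return table.get(row_key, {}).get(column, default)


put("sensor-1", "temperature", 21.5)  # float
put("sensor-1", "unit", "C")          # string
put("sensor-2", "humidity", 40)       # int, and a column sensor-1 lacks
put("sensor-2", "location", "roof")

# The set of columns is discovered from the data rather than declared up front.
columns_seen = sorted({col for row in table.values() for col in row})
```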


### Graph

This kind of database is designed for data whose relations are well
represented as a graph consisting of elements interconnected with a
finite number of relations between them. The type of data could be
social relations, public transport links, road maps, network topologies,
etc.

* Graph databases such as Neo4J store data as “graphs”: nodes connected by edges that represent relationships.     
* These databases work best when the data forms a “network” and “deep connections” must be tracked, as in social networks.    

### ACID database transactions
 
ACID stands for atomicity, consistency, isolation, and durability. The
characteristics of these four properties, as defined by Reuter and
Härder, are as follows:

#### Atomicity

Atomicity requires that each transaction be "all or nothing": if one
part of the transaction fails, then the entire transaction fails, and
the database state is left unchanged. An atomic system must guarantee
atomicity in each and every situation, including power failures, errors
and crashes. To the outside world, a committed transaction appears (by
its effects on the database) to be indivisible ("atomic"), and an
aborted transaction does not happen.
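As a small illustration, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `accounts` table, the `transfer` function, and the `fail_midway` flag are hypothetical; the flag simply simulates a failure between the two halves of a transfer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst as a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            if fail_midway:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


failed = transfer(conn, "alice", "bob", 30, fail_midway=True)
after_failure = balances(conn)  # the debit was rolled back
succeeded = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)  # both updates applied together
```

After the failed attempt, the debit from the first `UPDATE` is not visible: the transaction either applies both updates or neither.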

#### Consistency

The consistency property ensures that any transaction will bring the
database from one valid state to another. Any data written to the
database must be valid according to all defined rules, including
constraints, cascades, triggers, and any combination thereof. This
does not guarantee correctness of the transaction in all ways the
application programmer might have wanted (that is the responsibility of
application-level code), but merely that any programming errors cannot
result in the violation of any defined rules.
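One kind of "defined rule" is a declared constraint. The following sketch, again using `sqlite3` with a hypothetical `accounts` table, shows a `CHECK` constraint rejecting a write that would make a balance negative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.commit()

try:
    with conn:
        # This would leave the balance at -50, violating the CHECK constraint.
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'alice'")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

balance = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'"
).fetchone()[0]
```

The database only enforces the rules it has been told about. Whether a withdrawal of 150 made sense in the first place remains an application-level question.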

#### Isolation

The isolation property ensures that the concurrent execution of
transactions results in a system state that would be obtained if
transactions were executed sequentially, i.e., one after the other.
Providing isolation is the main goal of concurrency control. Depending
on the concurrency control method (i.e., whether it uses strict, as
opposed to relaxed, serializability), the effects of an incomplete
transaction might not even be visible to another transaction.

#### Durability

The durability property ensures that once a transaction has been
committed, it will remain so, even in the event of power loss,
crashes, or errors. In a relational database, for instance, once a
group of SQL statements executes, the results need to be stored
permanently (even if the database crashes immediately thereafter). To
defend against power loss, transactions (or their effects) must be
recorded in non-volatile memory.

ACID provides principles governing how changes are applied to a database. In very simplified terms, it states:

(A) when you do something to change a database the change should work or fail as a whole  
(C) the database should remain consistent  
(I) if other things are going on at the same time they shouldn't be able to see things mid-update  
(D) if the system blows up (hardware or software) the database needs to be able to pick itself back up   


Relational databases usually guarantee ACID properties, which govern how reliably transactions (both reads and writes) are processed. MySQL and PostgreSQL are examples of databases that provide these properties as a selling point.

Many NoSQL databases trade off full ACID compliance for other properties, such as high availability or speed.  


### CAP theorem  

In theoretical computer science, the **CAP theorem**, also named
**Brewer's theorem** after computer scientist Eric Brewer, states
that it is impossible for a distributed data store to simultaneously
provide more than two out of the following three guarantees:

-   *Consistency*: Every read receives the most recent write or an error.
-   *Availability*: Every request receives a (non-error) response, without a guarantee that it contains the most recent write.
-   *Partition tolerance*: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
 
In other words, the CAP theorem states that in the presence of a network
partition, one has to choose between consistency and availability. Note
that consistency as defined in the CAP theorem is quite different from
the consistency guaranteed in ACID database transactions.
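The choice forced by a partition can be illustrated with a deliberately tiny simulation. The `TwoNodeStore` class, its `mode` values, and the `demo` function are invented for this sketch; real systems have many nodes and far subtler behavior.

```python
class TwoNodeStore:
    """Toy store with two replicas. Writes land on replica A and are
    copied to replica B only while the network between them is healthy."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a_value = None
        self.b_value = None
        self.partitioned = False

    def write(self, value):
        self.a_value = value
        if not self.partitioned:
            self.b_value = value  # replication succeeds

    def read_from_b(self):
        if self.partitioned and self.mode == "CP":
            # B cannot confirm it has the latest write, so it gives up
            # availability and returns an error instead of stale data.
            raise RuntimeError("unavailable during partition")
        # In AP mode B always answers, but the answer may be stale.
        return self.b_value


def demo(mode):
    store = TwoNodeStore(mode)
    store.write("v1")         # replicated to both nodes
    store.partitioned = True  # network partition begins
    store.write("v2")         # reaches only replica A
    try:
        return store.read_from_b()
    except RuntimeError:
        return "error"


ap_result = demo("AP")  # stale value: available but not consistent
cp_result = demo("CP")  # error: consistent but not available
```

Neither mode can return `"v2"` from replica B while the partition lasts, which is the trade-off the theorem describes.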


See _Please stop calling databases CP or AP_ [https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html](https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html), where “CP” means consistent but not available under network partitions and “AP” means available but not consistent under network partitions.  

### SQL vs NoSQL Databases

**Reasons to use a SQL Database**

_ACID compliance_ 

You need to ensure ACID compliance (Atomicity, Consistency, Isolation, Durability). Generally, NoSQL databases sacrifice ACID compliance for flexibility and processing speed.

_Stable structured data_ 

If your data is structured and unchanging, the time invested in modeling and normalizing your database and in creating purely relational tables is worthwhile, because it allows you to ensure ACID compliance.  


**Reasons to use a NoSQL Database**


* Storing large volumes of data that often have little to no structure.  

* Making the most of cloud computing and storage. 

Cloud-based storage is an excellent cost-saving solution, but requires data to be easily spread across multiple servers to scale up.


* Rapid development.   

NoSQL is naturally suited to Agile development, quick iterations, or frequent updates to the data structure/schema.  


## GOTO 2012 • Introduction to NoSQL • Martin Fowler

![GOTO 2012 • Introduction to NoSQL • Martin Fowler](http://nikbearbrown.com/YouTube/MachineLearning/IMG/GOTO_2012_Introduction_to_NoSQL_Martin_Fowler.png)


GOTO 2012 • Introduction to NoSQL • Martin Fowler [https://youtu.be/qI_g07C_Q5I](https://youtu.be/qI_g07C_Q5I)



Update:  October 23, 2017