![](https://raw.githubusercontent.com/matiasbattocchia/clases-aprendizaje-automatico/master/nosql/img/meli.png)

# Introduction to NoSQL Databases 2

## Replication

See more: https://docs.mongodb.com/manual/replication

A *replica set* in MongoDB is a group of `mongod` processes that maintain the same *dataset*. Replica sets provide redundancy and fault tolerance, and they are the basis of all production *deployments*.

![](img/replica_compass.png)

### Primary node

![](img/replica_set.svg)

The **primary node** receives all write operations and, by default, all read operations; a replica set can have only one primary.

### Secondary nodes

![](img/replica_secondary.svg)

The **secondary nodes** replicate the primary's operations on their own datasets so that they mirror the primary's data.

If the primary becomes unavailable, a secondary will start an election to take its place.

### Fault tolerance

![](img/replica_failover.svg)

When a primary does not communicate with the other nodes within a given period (10 seconds by default), a secondary calls an election to nominate itself as the new primary.

The *replica set* **cannot process write operations** until the election completes.
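An election only succeeds when a majority of the voting members can reach each other. The following is a minimal sketch of that rule, not MongoDB's election protocol; the function name `can_elect_primary` is made up for illustration.

```python
def can_elect_primary(voting_members: int, reachable_members: int) -> bool:
    """Return True if the reachable members form a strict majority of the voters."""
    majority = voting_members // 2 + 1
    return reachable_members >= majority


# A 3-member replica set that loses its primary still has 2 of 3 voters.
print(can_elect_primary(3, 2))  # True
# A 4-member set split 2/2 by a network partition has no majority on either side.
print(can_elect_primary(4, 2))  # False
```

This is one reason replica sets are usually deployed with an odd number of voting members: a fourth member adds cost without letting the set survive one more failure.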

### Write concern

![](img/replica_concern.svg)

Write operations require an **acknowledgment of persistence**. The level of acknowledgment is configurable; when it is set to "majority", a write is confirmed once it has propagated to a majority of the nodes.
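The acknowledgment rule can be sketched in plain Python. This is a simplified model under stated assumptions: every node counts towards the majority, and the function name `is_write_acknowledged` is hypothetical rather than part of any driver.

```python
def is_write_acknowledged(acks: int, nodes: int, w) -> bool:
    """Decide whether a write is confirmed, given how many nodes have applied it.

    `w` is either the string "majority" or the number of nodes that must
    acknowledge the write.
    """
    required = nodes // 2 + 1 if w == "majority" else int(w)
    return acks >= required


# With 3 nodes, "majority" needs 2 acknowledgments.
print(is_write_acknowledged(acks=1, nodes=3, w="majority"))  # False
print(is_write_acknowledged(acks=2, nodes=3, w="majority"))  # True
# w=1 is satisfied as soon as the primary has applied the write.
print(is_write_acknowledged(acks=1, nodes=3, w=1))  # True
```

In PyMongo, the same setting is expressed with `WriteConcern(w="majority")`.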

### Read preference

![](img/eventual_consistency.svg)

Optionally, reads from secondary nodes can be enabled to increase read capacity.

In that case MongoDB becomes **eventually consistent**: clients may read stale data.
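A toy simulation makes the stale read visible. It assumes a single primary, a single secondary, and an explicit `replicate()` step standing in for asynchronous replication; none of these names come from MongoDB.

```python
class Node:
    """A node holding a simple key-value dataset."""

    def __init__(self):
        self.data = {}


primary, secondary = Node(), Node()
pending = []  # operations written to the primary but not yet applied on the secondary


def write(key, value):
    """Apply a write on the primary and queue it for replication."""
    primary.data[key] = value
    pending.append((key, value))


def replicate():
    """Apply all queued operations on the secondary, in order."""
    while pending:
        key, value = pending.pop(0)
        secondary.data[key] = value


write("stock", 10)
replicate()
write("stock", 9)

stale = secondary.data["stock"]  # read before replication catches up
replicate()
fresh = secondary.data["stock"]  # read after replication

print(stale, fresh)  # 10 9
```

The secondary eventually returns the same value as the primary, but a read issued between the write and its replication sees the old one.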

### Geographic distribution

![](img/replica_zones.svg)

Distributing nodes across different data centers improves redundancy and fault tolerance.

![](img/replica_read_preference.svg)

Clients can be configured to prefer reading from secondary nodes to reduce latency.

## Sharding

See more: https://docs.mongodb.com/manual/sharding

Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high-throughput operations. This is horizontal scalability.

### Sharded cluster

![](img/sharding_cluster.svg)

A sharded cluster consists of the following components:

* Shard: Each shard contains a subset of the sharded data. Each shard can be deployed as a replica set.
* Router (`mongos`): The `mongos` acts as a query router, providing an interface between client applications and the sharded cluster.
* Config servers: Config servers store metadata and configuration settings for the cluster.

### Connection

Clients should never connect to a single shard in order to perform read or write operations.

Unsharded collections are stored on a primary shard. Each database has its own primary shard.



## Shard keys



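```python
# Built-in hash of a string; values differ between Python processes
# unless PYTHONHASHSEED is fixed.
print(hash('1'))  # e.g. -1644486365914666577
print(hash('2'))  # e.g. 5573024216774967732
```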
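Python's built-in `hash` is randomized per process for strings, so it cannot be used to place data consistently across machines. The sketch below uses a stable hash instead and maps each key to a shard. It is an illustration only: the helper names are made up, and the modulo assignment is a simplification, since MongoDB's hashed sharding splits the range of hashed values into chunks rather than taking a modulo.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Hash a key to a signed 64-bit integer that is identical in every process."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard index for the key (simplified modulo assignment)."""
    return stable_hash(key) % num_shards


for key in ["1", "2", "3", "4"]:
    print(key, stable_hash(key), shard_for(key, 3))
```

Consecutive keys such as `'1'` and `'2'` get unrelated hash values, which is what spreads sequential inserts across shards.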

### Sharding strategy

#### By range

![](img/sharding_range.svg)

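Shard keys whose values are "close" are more likely to reside on the same chunk. This allows for targeted operations, because a query over a range of shard key values only needs to reach the shards that hold the matching chunks.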
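Range-based placement can be sketched with a sorted list of split points and a binary search. The split points and function names below are hypothetical; each chunk includes its lower bound and excludes its upper bound.

```python
import bisect

# Hypothetical split points defining 4 chunks:
# chunk 0: (-inf, 100), chunk 1: [100, 200), chunk 2: [200, 300), chunk 3: [300, +inf)
SPLIT_POINTS = [100, 200, 300]


def chunk_for(key: int) -> int:
    """Return the index of the chunk whose range contains the key."""
    return bisect.bisect_right(SPLIT_POINTS, key)


def chunks_for_range(low: int, high: int) -> list:
    """Return the chunks that a query for low <= key <= high must visit."""
    return list(range(chunk_for(low), chunk_for(high) + 1))


print(chunk_for(150), chunk_for(151))  # 1 1  (close keys share a chunk)
print(chunks_for_range(120, 180))      # [1]  (one chunk is enough)
print(chunks_for_range(150, 250))      # [1, 2]
```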

#### By hash

![](img/sharding_hash.svg)

### Chunks

MongoDB partitions sharded data into chunks. Each chunk has an inclusive lower and exclusive upper range based on the shard key.

To keep chunks evenly distributed across all shards in the cluster, a balancer runs in the background and migrates chunks between shards.
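The idea of balancing can be sketched as repeatedly moving one chunk from the most loaded shard to the least loaded one. This is a toy policy based on chunk counts, not MongoDB's actual balancer; the function name and the `threshold` parameter are assumptions made for the example.

```python
def balance(chunks_per_shard: dict, threshold: int = 1) -> list:
    """Return the list of (source, destination) migrations that evens out the counts.

    Migration stops once the difference between the most and least loaded
    shards is at most `threshold` (never less than 1, so the loop terminates).
    """
    threshold = max(threshold, 1)
    counts = dict(chunks_per_shard)  # work on a copy
    migrations = []
    while True:
        busiest = max(counts, key=counts.get)
        idlest = min(counts, key=counts.get)
        if counts[busiest] - counts[idlest] <= threshold:
            return migrations
        counts[busiest] -= 1
        counts[idlest] += 1
        migrations.append((busiest, idlest))


moves = balance({"shard0": 6, "shard1": 0, "shard2": 0})
print(len(moves))  # 4
print(moves)
```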

### Shard Key Index

MongoDB uses the shard key to distribute the collection's documents across shards. The shard key consists of a field or multiple fields in the documents.

You select the shard key when sharding a collection.

A document's shard key value determines its distribution across the shards.

The choice of shard key affects the performance, efficiency, and scalability of a sharded cluster.


### Targeted Operations vs. Broadcast Operations

Generally, the fastest queries in a sharded environment are those that `mongos` routes to a single shard, using the shard key.

For queries that don't include the shard key, mongos must query all shards.
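The routing decision can be sketched as follows. The shard key `user_id`, the shard names, and the modulo placement are all hypothetical; the point is only the difference between a targeted and a broadcast query.

```python
SHARD_KEY = "user_id"  # hypothetical shard key
SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def shard_of(value: int) -> str:
    """Toy placement rule: pick the shard from the shard key value."""
    return SHARDS[value % len(SHARDS)]


def route(query: dict) -> list:
    """Return the shards a router would have to contact for an equality query."""
    if SHARD_KEY in query:
        return [shard_of(query[SHARD_KEY])]  # targeted: a single shard
    return list(SHARDS)  # broadcast: every shard


print(route({"user_id": 4}))                # ['shard1']
print(route({"email": "ana@example.com"}))  # ['shard0', 'shard1', 'shard2']
```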


## CAP theorem

![](img/cap_parts.png)

Source: https://cryptographics.info/cryptographics/blockchain/cap-theorem

![](img/cap_triangle.png)

### MongoDB

* By default, all read and write operations go to the primary node (**strong consistency**).
* It handles network failures by storing the same data on secondary nodes (**partition tolerance**).
* It compromises **availability** while a new primary node is being elected.

The internet's *Domain Name System* (DNS) is a good example of a highly available, eventually consistent system.

## Characteristics of NoSQL databases

Distributed nature
* Horizontal scalability (storage and compute)
* Partition tolerance
* Availability (only some databases)

Flexible data models
* Faster queries (data that is accessed together is stored together)
* Ease of development

### Possible drawbacks

* Lack of transactions across multiple records.
* Duplication of information.

## Types of NoSQL databases

Source: https://www.mongodb.com/nosql-explained

Over time, four major types of NoSQL databases emerged: document databases, key-value databases, wide-column stores, and graph databases. Let’s examine each type.

* **Document databases** store data in documents similar to JSON (JavaScript Object Notation) objects. Each document contains pairs of fields and values. The values can typically be a variety of types, including strings, numbers, booleans, arrays, or objects, and their structures typically align with the objects developers are working with in code. Because of their variety of field value types and powerful query languages, document databases are great for a wide variety of use cases and can be used as a general-purpose database. They can scale out horizontally to accommodate large data volumes. MongoDB is consistently ranked as the world's most popular NoSQL database according to DB-Engines and is an example of a document database.

* **Key-value databases** are a simpler type of database where each item contains keys and values. A value can typically only be retrieved by referencing its key, so learning how to query for a specific key-value pair is typically simple. Key-value databases are great for use cases where you need to store large amounts of data but you don't need to perform complex queries to retrieve it. Common use cases include storing user preferences or caching. Redis and DynamoDB are popular key-value databases.

* **Wide-column stores** store data in tables, rows, and dynamic columns. Wide-column stores provide a lot of flexibility over relational databases because each row is not required to have the same columns. Many consider wide-column stores to be two-dimensional key-value databases. Wide-column stores are great for when you need to store large amounts of data and you can predict what your query patterns will be. Wide-column stores are commonly used for storing Internet of Things data and user profile data. Cassandra and HBase are two of the most popular wide-column stores.

* **Graph databases** store data in nodes and edges. Nodes typically store information about people, places, and things, while edges store information about the relationships between the nodes. Graph databases excel in use cases where you need to traverse relationships to look for patterns, such as social networks, fraud detection, and recommendation engines. Neo4j and JanusGraph are examples of graph databases.