## De una base de datos relacional a una única tabla de DynamoDB

De todas las sesiones que he visto en AWS re:Invent 2018, mi favorita es sin duda esta desconcertante descarga de conocimientos NoSQL del tecnólogo principal de AWS y mago certificado del espacio exterior Rick Houlihan.

Rick destapa una caja de Pandora que muchos de los que diseñamos tablas de DynamoDB intentamos evitar: el hecho de que DynamoDB no es solo un almacén de valores clave para simples búsquedas de elementos. Si se diseña correctamente, una única tabla de DynamoDB puede gestionar los patrones de acceso de una base de datos relacional legítima de varias tablas sin sudar la gota gorda.

Esa pequeña frase "diseñada adecuadamente" es la advertencia, por supuesto. El vídeo de Rick, y la documentación relacionada en la que sospecho que ha participado, están repletos de consejos sobre cómo construir una tabla de DynamoDB que iguale el rendimiento de consulta de su base de datos relacional a una escala horizontal arbitraria. Sin embargo, no voy a mentir, es un material pesado, especialmente para nosotros, los magos del espacio exterior no certificados.

Por eso, en este artículo, quiero explicar paso a paso algunas consideraciones sobre el diseño de tablas individuales de DynamoDB. No cubriremos todos los patrones de diseño posibles, pero espero que empiece a hacerse una idea de los posibles casos de uso y las inevitables compensaciones. Concluiremos con la pregunta definitiva: ¿es todo esto una buena idea cuando las bases de datos relacionales siguen estando ahí?

### De RDB a DynamoDB: un ejemplo práctico
¿Qué base de datos relacional deberíamos dinamizar? Decidí utilizar el ejemplo más SQL que se me ocurrió: Northwind, la clásica base de datos relacional utilizada para enseñar el producto Microsoft Access allá por los años 90.
Aquí está el ERD completo de Northwind. No es enorme, pero es al menos tan complejo como los requisitos de datos de muchos microservicios modernos que quizá desee respaldar con DynamoDB.

### Patrones de búsqueda

* Obtener empleado por ID de empleado
* Obtener los subordinados directos de un empleado
* Obtener productos descatalogados
* Listar todos los pedidos de un producto determinado
* Obtener los 25 pedidos más recientes
* Obtener expedidores (*shippers*) por nombre
* Obtener clientes por nombre de contacto
* Listar todos los productos incluidos en un pedido
* Obtener proveedores (*suppliers*) por país y región

In [1]:
import pandas as pd
import numpy as np
from spdynamodb import DynamoTable
import json
from decimal import Decimal
from time import sleep
import math
import boto3

In [3]:
dt = DynamoTable()
try:
    dt.select_table('northwind')
    print(dt)
except:
    dt.create_table(
        table_name='northwind',
        partition_key='PK',
        partition_key_type='S',
        sort_key="SK",
        sort_key_type="S",
        provisioned=True,
        rcu=1,
        wcu=1        
)

In [4]:
dt.create_global_secondary_index(
    att_name="GSI1-PK",
    att_type="S",
    sort_index="GSI1-SK",
    sort_type="S",
    i_name="GSI1"
)

### Employees table

In [5]:
df_employees = pd.read_csv('northwind-data/employees.csv')
df_employees['PK'] = "EMPLOYEE#" + df_employees['employeeID'].astype(str)
df_employees['SK'] = "EMPLOYEE#" + df_employees['employeeID'].astype(str)

def func_map(x):
    return {
        'Address': {
            'City': x['city'],
            'Country': x['country'],
            'PostalCode': x['postalCode'],
            'Region': x['region'],
            'Street': x['address']
        },
        'Phone': x['homePhone'],
        'Extension': x['extension']
    }

df_employees['address_map'] = df_employees.apply(lambda x: func_map(x), axis=1)
df_employees.drop(columns=['titleOfCourtesy', 'photo', 'photoPath', 'city', 'region', 'country', 'postalCode', 'address', 'homePhone', 'extension'], inplace=True)
df_employees.rename(columns={'address_map': 'address'}, inplace=True)
df_employees['birthDate'] = pd.to_datetime(df_employees['birthDate']).astype(str)
df_employees['hireDate'] = pd.to_datetime(df_employees['birthDate']).astype(str)

def func_map_gsi_pk(x):
    if math.isnan(x['reportsTo']):
        return "null"
    else:
        return "EMPLOYEE#" + str(int(x['reportsTo']))

def func_map_gsi_sk(x):
    if math.isnan(x['reportsTo']):
        return "null"
    else:
        return "EMPLOYEE#" + str(int(x['employeeID']))

df_employees['GSI1-PK'] = df_employees.apply(lambda x: func_map_gsi_pk(x), axis=1)
df_employees['GSI1-SK'] = df_employees.apply(lambda x: func_map_gsi_sk(x), axis=1)
# Replace nan with None
df_employees.replace({np.nan: None}, inplace=True)
df_employees['EntityType'] = "employee"
df_employees.head()

Unnamed: 0,employeeID,lastName,firstName,title,birthDate,hireDate,notes,reportsTo,PK,SK,address,GSI1-PK,GSI1-SK,EntityType
0,1,Davolio,Nancy,Sales Representative,1948-12-08,1948-12-08,Education includes a BA in psychology from Col...,2.0,EMPLOYEE#1,EMPLOYEE#1,"{'Address': {'City': 'Seattle', 'Country': 'US...",EMPLOYEE#2,EMPLOYEE#1,employee
1,2,Fuller,Andrew,Vice President Sales,1952-02-19,1952-02-19,Andrew received his BTS commercial in 1974 and...,,EMPLOYEE#2,EMPLOYEE#2,"{'Address': {'City': 'Tacoma', 'Country': 'USA...",,,employee
2,3,Leverling,Janet,Sales Representative,1963-08-30,1963-08-30,Janet has a BS degree in chemistry from Boston...,2.0,EMPLOYEE#3,EMPLOYEE#3,"{'Address': {'City': 'Kirkland', 'Country': 'U...",EMPLOYEE#2,EMPLOYEE#3,employee
3,4,Peacock,Margaret,Sales Representative,1937-09-19,1937-09-19,Margaret holds a BA in English literature from...,2.0,EMPLOYEE#4,EMPLOYEE#4,"{'Address': {'City': 'Redmond', 'Country': 'US...",EMPLOYEE#2,EMPLOYEE#4,employee
4,5,Buchanan,Steven,Sales Manager,1955-03-04,1955-03-04,Steven Buchanan graduated from St. Andrews Uni...,2.0,EMPLOYEE#5,EMPLOYEE#5,"{'Address': {'City': 'London', 'Country': 'UK'...",EMPLOYEE#2,EMPLOYEE#5,employee


In [6]:
dt.batch_pandas(dataframe=df_employees)

### Shippers table

In [33]:
df_shippers = pd.read_csv('northwind-data/shippers.csv')
df_shippers['PK'] = "SHIPPER"
df_shippers['SK'] = "SHIPPER#" + df_shippers['shipperID'].astype(str)
df_shippers['GSI1-PK'] = "SHIPPER"
df_shippers['GSI1-SK'] = "SHIPPER#" + df_shippers['companyName'].astype(str) + "#" + df_shippers['shipperID'].astype(str)
df_shippers['EntityType'] = "shipper"
df_shippers.head()

Unnamed: 0,shipperID,companyName,phone,PK,SK,GSI1-PK,GSI1-SK,EntityType
0,1,Speedy Express,(503) 555-9831,SHIPPER,SHIPPER#1,SHIPPER,SHIPPER#Speedy Express#1,shipper
1,2,United Package,(503) 555-3199,SHIPPER,SHIPPER#2,SHIPPER,SHIPPER#United Package#2,shipper
2,3,Federal Shipping,(503) 555-9931,SHIPPER,SHIPPER#3,SHIPPER,SHIPPER#Federal Shipping#3,shipper


In [34]:
dt.batch_pandas(dataframe=df_shippers)

### Orders Table

In [10]:
df_orders = pd.read_csv('northwind-data/orders.csv')
df_orders['PK'] = "ORDER#" + df_orders['orderID'].astype(str)
df_orders['SK'] = "CUSTOMER#" + df_orders['customerID'].astype(str)
df_orders['EntityType'] = "order"

company_shippers = df_shippers['companyName'].to_dict()
df_orders['shipVia'] = df_orders['shipVia'].map(company_shippers)
df_orders['orderDate'] = pd.to_datetime(df_orders['orderDate']).astype(str)
df_orders['requiredDate'] = pd.to_datetime(df_orders['requiredDate']).astype(str)
df_orders['shippedDate'] = pd.to_datetime(df_orders['shippedDate']).astype(str)
df_orders['GSI1-PK'] = "ORDER"
df_orders['GSI1-SK'] = "ORDERDATE#" + pd.to_datetime(df_orders['orderDate'], format='%Y-%m-%d').astype(str) + "#" + df_orders['orderID'].astype(str)
df_orders.replace({np.nan: None}, inplace=True)
df_orders.head()

Unnamed: 0,orderID,customerID,employeeID,orderDate,requiredDate,shippedDate,shipVia,freight,shipName,shipAddress,shipCity,shipRegion,shipPostalCode,shipCountry,PK,SK,EntityType,GSI1-PK,GSI1-SK
0,10248,VINET,5,1996-07-04,1996-08-01,1996-07-16,,32.38,Vins et alcools Chevalier,59 rue de l'Abbaye,Reims,,51100,France,ORDER#10248,CUSTOMER#VINET,order,ORDER,ORDERDATE#1996-07-04#10248
1,10249,TOMSP,6,1996-07-05,1996-08-16,1996-07-10,United Package,11.61,Toms Spezialitäten,Luisenstr. 48,Münster,,44087,Germany,ORDER#10249,CUSTOMER#TOMSP,order,ORDER,ORDERDATE#1996-07-05#10249
2,10250,HANAR,4,1996-07-08,1996-08-05,1996-07-12,Federal Shipping,65.83,Hanari Carnes,Rua do Paço 67,Rio de Janeiro,RJ,05454-876,Brazil,ORDER#10250,CUSTOMER#HANAR,order,ORDER,ORDERDATE#1996-07-08#10250
3,10251,VICTE,3,1996-07-08,1996-08-05,1996-07-15,United Package,41.34,Victuailles en stock,2 rue du Commerce,Lyon,,69004,France,ORDER#10251,CUSTOMER#VICTE,order,ORDER,ORDERDATE#1996-07-08#10251
4,10252,SUPRD,4,1996-07-09,1996-08-06,1996-07-11,Federal Shipping,51.3,Suprêmes délices,Boulevard Tirou 255,Charleroi,,B-6000,Belgium,ORDER#10252,CUSTOMER#SUPRD,order,ORDER,ORDERDATE#1996-07-09#10252


In [12]:
dt.batch_pandas(dataframe=df_orders.sample(50))

### Orders Detail Table

In [35]:
# Order Details Table
df_order_details = pd.read_csv('northwind-data/order_details.csv')
df_order_details['PK'] = "ORDER#" + df_order_details['orderID'].astype(str)
df_order_details['SK'] = "PRODUCT#" + df_order_details['productID'].astype(str)
df_order_details['GSI1-PK'] = "PRODUCT#" + df_order_details['productID'].astype(str)
df_order_details['GSI1-SK'] = "ORDER#" + df_order_details['orderID'].astype(str)
df_order_details['EntityType'] = "orderItem"
df_order_details.drop(columns=['orderID', 'productID'], inplace=True)
df_orders.replace({np.nan: None}, inplace=True)
df_order_details.head()

Unnamed: 0,unitPrice,quantity,discount,PK,SK,GSI1-PK,GSI1-SK,EntityType
0,14.0,12,0.0,ORDER#10248,PRODUCT#11,PRODUCT#11,ORDER#10248,orderItem
1,9.8,10,0.0,ORDER#10248,PRODUCT#42,PRODUCT#42,ORDER#10248,orderItem
2,34.8,5,0.0,ORDER#10248,PRODUCT#72,PRODUCT#72,ORDER#10248,orderItem
3,18.6,9,0.0,ORDER#10249,PRODUCT#14,PRODUCT#14,ORDER#10249,orderItem
4,42.4,40,0.0,ORDER#10249,PRODUCT#51,PRODUCT#51,ORDER#10249,orderItem


In [37]:
dt.batch_pandas(dataframe=df_order_details)

### Customers table

In [17]:
df_customers = pd.read_csv('northwind-data/customers.csv')
df_customers['PK'] = "CUSTOMER"
df_customers['SK'] = "CUSTOMER#" + df_customers['customerID'].astype(str)
df_customers['GSI1-PK'] = "CUSTOMER"
df_customers['GSI1-SK'] = "CUSTOMER#" + df_customers['contactName'].astype(str) + "#" + df_customers['customerID'].astype(str)

def func_map(x):
    return {
        'Address': {
            'City': x['city'],
            'Country': x['country'],
            'PostalCode': x['postalCode'],
            'Region': x['region'],
            'Street': x['address']
        },
        'Fax': x['fax'],
        'Phone': x['phone']
    }
df_customers['Address'] = df_customers.apply(lambda x: func_map(x), axis=1)
df_customers['EntityType'] = "customer"
df_customers.drop(['city', 'country', 'postalCode', 'region', 'address', 'fax', 'phone'], axis=1, inplace=True)
df_orders.replace({np.nan: None}, inplace=True)
df_customers.head()

Unnamed: 0,customerID,companyName,contactName,contactTitle,PK,SK,GSI1-PK,GSI1-SK,Address,EntityType
0,ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,CUSTOMER,CUSTOMER#ALFKI,CUSTOMER,CUSTOMER#Maria Anders#ALFKI,"{'Address': {'City': 'Berlin', 'Country': 'Ger...",customer
1,ANATR,Ana Trujillo Emparedados y helados,Ana Trujillo,Owner,CUSTOMER,CUSTOMER#ANATR,CUSTOMER,CUSTOMER#Ana Trujillo#ANATR,"{'Address': {'City': 'México D.F.', 'Country':...",customer
2,ANTON,Antonio Moreno Taquería,Antonio Moreno,Owner,CUSTOMER,CUSTOMER#ANTON,CUSTOMER,CUSTOMER#Antonio Moreno#ANTON,"{'Address': {'City': 'México D.F.', 'Country':...",customer
3,AROUT,Around the Horn,Thomas Hardy,Sales Representative,CUSTOMER,CUSTOMER#AROUT,CUSTOMER,CUSTOMER#Thomas Hardy#AROUT,"{'Address': {'City': 'London', 'Country': 'UK'...",customer
4,BERGS,Berglunds snabbköp,Christina Berglund,Order Administrator,CUSTOMER,CUSTOMER#BERGS,CUSTOMER,CUSTOMER#Christina Berglund#BERGS,"{'Address': {'City': 'Luleå', 'Country': 'Swed...",customer


In [19]:
dt.batch_pandas(dataframe=df_customers)

### Products table

In [21]:
df_products = pd.read_csv('northwind-data/products.csv')
df_products['PK'] = "PRODUCT"
df_products['SK'] = "PRODUCT#" + df_products['productID'].astype(str)
df_products['GSI1-PK'] = "PRODUCT"
df_products['GSI1-SK'] = "PRODUCT#DIS-" + df_products['discontinued'].astype(str) + "#" + df_products['productID'].astype(str)
df_products['discontinued'] = df_products['discontinued'].astype(bool)
df_products.replace({np.nan: None}, inplace=True)
df_products.head()

Unnamed: 0,productID,productName,supplierID,categoryID,quantityPerUnit,unitPrice,unitsInStock,unitsOnOrder,reorderLevel,discontinued,PK,SK,GSI1-PK,GSI1-SK
0,1,Chai,1,1,10 boxes x 20 bags,18.0,39,0,10,False,PRODUCT,PRODUCT#1,PRODUCT,PRODUCT#DIS-0#1
1,2,Chang,1,1,24 - 12 oz bottles,19.0,17,40,25,False,PRODUCT,PRODUCT#2,PRODUCT,PRODUCT#DIS-0#2
2,3,Aniseed Syrup,1,2,12 - 550 ml bottles,10.0,13,70,25,False,PRODUCT,PRODUCT#3,PRODUCT,PRODUCT#DIS-0#3
3,4,Chef Anton's Cajun Seasoning,2,2,48 - 6 oz jars,22.0,53,0,0,False,PRODUCT,PRODUCT#4,PRODUCT,PRODUCT#DIS-0#4
4,5,Chef Anton's Gumbo Mix,2,2,36 boxes,21.35,0,0,0,True,PRODUCT,PRODUCT#5,PRODUCT,PRODUCT#DIS-1#5


In [24]:
dt.batch_pandas(dataframe=df_products.sample(50))

### Suppliers table

In [38]:
df_suppliers = pd.read_csv('northwind-data/suppliers.csv')
df_orders.replace({np.nan: None}, inplace=True)
df_suppliers['PK'] = "SUPPLIER"
df_suppliers['SK'] = "SUPPLIER#" + df_suppliers['supplierID'].astype(str)
df_suppliers['GSI1-PK'] = "SUPPLIER"
df_suppliers['GSI1-SK'] = "SUPPLIER#" + df_suppliers['country'].astype(str) + "#" + df_suppliers['city'].astype(str) + "#" + df_suppliers['region'].astype(str)
df_suppliers['EntityType'] = "supplier"
df_suppliers.head()

Unnamed: 0,supplierID,companyName,contactName,contactTitle,address,city,region,postalCode,country,phone,fax,homePage,PK,SK,GSI1-PK,GSI1-SK,EntityType
0,1,Exotic Liquids,Charlotte Cooper,Purchasing Manager,49 Gilbert St.,London,,EC1 4SD,UK,(171) 555-2222,,,SUPPLIER,SUPPLIER#1,SUPPLIER,SUPPLIER#UK#London#nan,supplier
1,2,New Orleans Cajun Delights,Shelley Burke,Order Administrator,P.O. Box 78934,New Orleans,LA,70117,USA,(100) 555-4822,,#CAJUN.HTM#,SUPPLIER,SUPPLIER#2,SUPPLIER,SUPPLIER#USA#New Orleans#LA,supplier
2,3,Grandma Kelly's Homestead,Regina Murphy,Sales Representative,707 Oxford Rd.,Ann Arbor,MI,48104,USA,(313) 555-5735,(313) 555-3349,,SUPPLIER,SUPPLIER#3,SUPPLIER,SUPPLIER#USA#Ann Arbor#MI,supplier
3,4,Tokyo Traders,Yoshi Nagase,Marketing Manager,9-8 Sekimai Musashino-shi,Tokyo,,100,Japan,(03) 3555-5011,,,SUPPLIER,SUPPLIER#4,SUPPLIER,SUPPLIER#Japan#Tokyo#nan,supplier
4,5,Cooperativa de Quesos 'Las Cabras',Antonio del Valle Saavedra,Export Administrator,Calle del Rosal 4,Oviedo,Asturias,33007,Spain,(98) 598 76 54,,,SUPPLIER,SUPPLIER#5,SUPPLIER,SUPPLIER#Spain#Oviedo#Asturias,supplier


In [27]:
dt.batch_pandas(dataframe=df_suppliers)

In [44]:
# Orders Table
df_orders = pd.read_csv('northwind-data/orders.csv')
df_orders['PK'] = "O#" + df_orders['orderID'].astype(str)
df_orders['SK'] = "C#" + df_orders['customerID'].astype(str)
df_orders['EntityType'] = "order"
df_orders.drop(['orderID', 'customerID'], axis=1, inplace=True)

company_shippers = df_shippers['companyName'].to_dict()
df_orders['shipVia'] = df_orders['shipVia'].map(company_shippers)

In [82]:
# Categories table
df_categories = pd.read_csv('northwind-data/categories.csv')
df_categories['categoryID'] = "CATEGORIES#" + df_categories['categoryID'].astype(str)
df_categories

Unnamed: 0,categoryID,categoryName,description,picture
0,CATEGORIES#1,Beverages,Soft drinks coffees teas beers and ales,0x151C2F00020000000D000E0014002100FFFFFFFF4269...
1,CATEGORIES#2,Condiments,Sweet and savory sauces relishes spreads and s...,0x151C2F00020000000D000E0014002100FFFFFFFF4269...
2,CATEGORIES#3,Confections,Desserts candies and sweet breads,0x151C2F00020000000D000E0014002100FFFFFFFF4269...
3,CATEGORIES#4,Dairy Products,Cheeses,0x151C2F00020000000D000E0014002100FFFFFFFF4269...
4,CATEGORIES#5,Grains/Cereals,Breads crackers pasta and cereal,0x151C2F00020000000D000E0014002100FFFFFFFF4269...
5,CATEGORIES#6,Meat/Poultry,Prepared meats,0x151C2F00020000000D000E0014002100FFFFFFFF4269...
6,CATEGORIES#7,Produce,Dried fruit and bean curd,0x151C2F00020000000D000E0014002100FFFFFFFF4269...
7,CATEGORIES#8,Seafood,Seaweed and fish,0x151C2F00020000000D000E0014002100FFFFFFFF4269...


In [3]:
# Categories table
df_categories = pd.read_csv('northwind-data/categories.csv')
df_categories['categoryID'] = "CATEGORIES#" + df_categories['categoryID'].astype(str)
# Order Details table
df_order_details = pd.read_csv('northwind-data/order_details.csv')
df_order_details['orderID'] = "ORDERS#" + df_order_details['orderID'].astype(str)
# Orders table
df_orders = pd.read_csv('northwind-data/orders.csv')
df_orders['orderID'] = "ORDERS#" + df_orders['orderID'].astype(str)


In [8]:
df_customers

Unnamed: 0,customerID,companyName,contactName,contactTitle,PK,Address,MK
0,ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,C#ALFKI,"{'Address': {'City': 'Berlin', 'Country': 'Ger...",customer
1,ANATR,Ana Trujillo Emparedados y helados,Ana Trujillo,Owner,C#ANATR,"{'Address': {'City': 'México D.F.', 'Country':...",customer
2,ANTON,Antonio Moreno Taquería,Antonio Moreno,Owner,C#ANTON,"{'Address': {'City': 'México D.F.', 'Country':...",customer
3,AROUT,Around the Horn,Thomas Hardy,Sales Representative,C#AROUT,"{'Address': {'City': 'London', 'Country': 'UK'...",customer
4,BERGS,Berglunds snabbköp,Christina Berglund,Order Administrator,C#BERGS,"{'Address': {'City': 'Luleå', 'Country': 'Swed...",customer
...,...,...,...,...,...,...,...
86,WARTH,Wartian Herkku,Pirkko Koskitalo,Accounting Manager,C#WARTH,"{'Address': {'City': 'Oulu', 'Country': 'Finla...",customer
87,WELLI,Wellington Importadora,Paula Parente,Sales Manager,C#WELLI,"{'Address': {'City': 'Resende', 'Country': 'Br...",customer
88,WHITC,White Clover Markets,Karl Jablonski,Owner,C#WHITC,"{'Address': {'City': 'Seattle', 'Country': 'US...",customer
89,WILMK,Wilman Kala,Matti Karttunen,Owner/Marketing Assistant,C#WILMK,"{'Address': {'City': 'Helsinki', 'Country': 'F...",customer


In [7]:
df_categories.rename(columns={'categoryID': 'PK', 'categoryName': 'SK', 'description': 'DATA'}, inplace=True)
#dt.batch_pandas(dataframe=df_categories)

NameError: name 'dt' is not defined

In [53]:
def convert_key(row):
    if row['region'] == 'nan':
        return str(row['country']).upper() + '#' \
            + str(row['region']).upper() + '#' \
            + str(row['city']).upper() + '#' \
            + str(row['address']).upper()
    else:
        return str(row['country']).upper() + '#' \
            + str(row['city']).upper() + '#' \
            + str(row['address']).upper()
            
df_customers['DATA'] = df_customers.apply(lambda x: convert_key(x), axis=1)
df_customers.rename(columns={'customerID': 'PK', 'contactName': 'SK'}, inplace=True)
dt.batch_pandas(dataframe=df_customers)