## KSUID (K-Sortable Unique Identifiers)

KSUID is short for K-Sortable Unique IDentifier. It is a kind of globally unique identifier, similar to an RFC 4122 UUID, built from the ground up to sort "naturally" by generation timestamp without any special logic.

It was originally developed by [Segment](https://segment.com/) and is now open source. Segment uses it to uniquely identify user events on its platform.

### Unique, sortable IDs

A common need is IDs that are both unique and sortable. This comes up when you need a unique identifier for an item (ideally one that is URL-friendly) but also want to be able to sort a group of those items chronologically.

There are a few options in this space, but I prefer the KSUID implementation from the folks at Segment. A KSUID is a unique identifier prefixed with a timestamp that also contains enough randomness to make collisions very unlikely. In total you get a 27-character string with more random bits than a UUIDv4 (128 versus 122) that still sorts lexicographically by creation time, at one-second resolution.
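To make that layout concrete, here is a minimal standard-library sketch of the format as Segment's specification describes it: a 4-byte big-endian count of seconds since a custom epoch (Unix time 1400000000), followed by a 16-byte random payload, with the resulting 20 bytes base62-encoded and left-padded to 27 characters. It is an illustration, not the svix-ksuid code used in the cells below; `base62_encode` and `make_ksuid` are hypothetical helper names.

```python
import os
import time

KSUID_EPOCH = 1_400_000_000   # custom epoch: 2014-05-13T16:53:20Z in Unix seconds
PAYLOAD_BYTES = 16            # 128 bits of randomness
ENCODED_LENGTH = 27           # fixed width of the base62 string
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(raw: bytes) -> str:
    """Encode bytes as a fixed-width, zero-padded base62 string."""
    number = int.from_bytes(raw, "big")
    chars = []
    while number:
        number, remainder = divmod(number, 62)
        chars.append(ALPHABET[remainder])
    # Left-pad with '0' so every ID has the same width and sorts correctly
    return "".join(reversed(chars)).rjust(ENCODED_LENGTH, "0")


def make_ksuid(unix_seconds=None, payload=None) -> str:
    """Build a KSUID-style string: 4-byte timestamp followed by a 16-byte payload."""
    if unix_seconds is None:
        unix_seconds = int(time.time())
    if payload is None:
        payload = os.urandom(PAYLOAD_BYTES)
    timestamp = (unix_seconds - KSUID_EPOCH).to_bytes(4, "big")
    return base62_encode(timestamp + payload)


# An ID minted a minute later sorts after the earlier one as a plain string
older = make_ksuid(1_700_000_000)
newer = make_ksuid(1_700_000_060)
print(older, newer, sorted([newer, older]) == [older, newer])
```

Because the base62 alphabet is in ASCII order and every ID has the same width, comparing two strings gives the same result as comparing the underlying 160-bit numbers, whose most significant bytes are the timestamp.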

**Why use a KSUID?**

* Sortable by date and time
* 128 bits of random data
* Portable representations that sort lexicographically
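The "128 bits of random data" point can be quantified with the standard birthday approximation, `p ≈ 1 - exp(-n(n-1) / 2N)` with `N = 2^bits`. Two KSUIDs can only collide if they share the same one-second timestamp, so for a KSUID `n` is the number of IDs minted within a single second, whereas for a UUIDv4 (122 random bits) `n` counts every ID ever generated. `collision_probability` is a hypothetical helper for that estimate, not part of any KSUID library.

```python
import math


def collision_probability(n: int, random_bits: int) -> float:
    """Birthday-bound estimate of the chance that n random IDs contain a duplicate."""
    space = 2.0 ** random_bits
    # 1 - exp(-n(n-1) / 2N), computed with expm1 so tiny probabilities stay accurate
    return -math.expm1(-n * (n - 1) / (2.0 * space))


n = 1_000_000_000  # one billion IDs
print(f"KSUID, all within one second: {collision_probability(n, 128):.2e}")
print(f"UUIDv4, in total:             {collision_probability(n, 122):.2e}")
```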

In [1]:
#!pip install svix-ksuid
from ksuid import Ksuid
from ksuid import KsuidMs

In [2]:
import boto3
from botocore.exceptions import ClientError
from spdynamodb import DynamoTable
import json
from decimal import Decimal
from datetime import datetime, timedelta
import time
import random

In [None]:
ksuid = Ksuid()
ksuidms = KsuidMs()

In [None]:
print(f"Base62: {ksuid}")
print(f"Datetime: {ksuid.datetime}")
print(f"Timestamp: {ksuid.timestamp}")
print(f"Payload: {ksuid.payload}")

In [None]:
print(f"Base62: {ksuidms}")
print(f"Datetime: {ksuidms.datetime}")
print(f"Timestamp: {ksuidms.timestamp}")
print(f"Payload: {ksuidms.payload}")

In [None]:
text = '2ScwNkkSeNpOwYxOUsYbr7t16qP'
ksuid_1 = Ksuid.from_base62(data=text)
print(f'Datetime: {ksuid_1.datetime}')
print(f'Timestamp: {ksuid_1.timestamp}')
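`Ksuid.from_base62` reverses the encoding: it turns the 27-character string back into a 20-byte value whose first 4 bytes are the timestamp. The standard-library sketch below does the same by hand to show where `datetime`, `timestamp` and `payload` come from. `decode_ksuid` is a hypothetical name, and it assumes the standard second-resolution layout, so it is not meant for `KsuidMs` values, whose finer-grained timestamp may be laid out differently.

```python
from datetime import datetime, timezone

KSUID_EPOCH = 1_400_000_000
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def decode_ksuid(text: str):
    """Split a 27-char base62 KSUID into its UTC datetime and 16-byte payload."""
    if len(text) != 27:
        raise ValueError("a KSUID string must be 27 characters long")
    number = 0
    for char in text:
        number = number * 62 + ALPHABET.index(char)  # ValueError on invalid chars
    raw = number.to_bytes(20, "big")  # 4-byte timestamp + 16-byte payload
    seconds = int.from_bytes(raw[:4], "big") + KSUID_EPOCH
    return datetime.fromtimestamp(seconds, tz=timezone.utc), raw[4:]


created, payload = decode_ksuid("2ScwNkkSeNpOwYxOUsYbr7t16qP")
print(created.isoformat(), payload.hex())
```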

### DynamoDB example

In [4]:
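dt = DynamoTable()
try:
    # Reuse the table if it already exists...
    dt.select_table('SampleSessionTable')
    print(dt)
except Exception:
    # ...otherwise create it with a string partition key named PK
    dt.create_table(
        table_name='SampleSessionTable',
        partition_key='PK',
        partition_key_type='S'
    )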

Table created successfully!
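`DynamoTable` comes from the third-party `spdynamodb` package, which hides the underlying request. For readers working with plain boto3, a roughly equivalent setup is sketched below. It assumes on-demand billing (`PAY_PER_REQUEST`), which may not match what `spdynamodb` uses, and `create_session_table` is a hypothetical helper; catching `ResourceInUseException` takes the place of the `try`/`except` around `select_table` used above.

```python
def create_session_table(client, table_name="SampleSessionTable"):
    """Create (or reuse) a table keyed by a string partition key PK, then wait for it."""
    try:
        client.create_table(
            TableName=table_name,
            KeySchema=[{"AttributeName": "PK", "KeyType": "HASH"}],
            AttributeDefinitions=[{"AttributeName": "PK", "AttributeType": "S"}],
            BillingMode="PAY_PER_REQUEST",  # assumption: on-demand capacity
        )
    except client.exceptions.ResourceInUseException:
        pass  # the table already exists, so just wait for it below
    # Block until DynamoDB reports the table as ACTIVE
    client.get_waiter("table_exists").wait(TableName=table_name)
```

Usage would be `create_session_table(boto3.client('dynamodb', region_name='us-east-1'))`.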


In [5]:
names = ['Alex DeBrie', 'James Dean', 'Laura Dern', 'Oliver Stone', 'Marlon Brando', 'Jane Fonda', 'Jack Nicholson', 'Meryl Streep', 'Robert DeNiro', 'Sigourney Weaver', 'Dustin Hoffman', 'Faye Dunaway', 'Harrison Ford', 'Katharine Hepburn', 'Al Pacino', 'Bette Davis', 'Gene Hackman', 'Ingrid Bergman', 'Anthony Hopkins', 'Greta Garbo', 'Tom Hanks', 'Jodie Foster', 'Cary Grant', 'Vivien Leigh', 'Denzel Washington', 'Grace Kelly', 'Daniel Day-Lewis', 'Sharon Stone', 'Sidney Poitier', 'Nicole Kidman', 'Spencer Tracy', 'Julia Roberts', 'James Stewart', 'Diane Keaton', 'Sean Penn', 'Natalie Portman', 'Gregory Peck', 'Cate Blanchett', 'Laurence Olivier', 'Julianne Moore', 'Humphrey Bogart', 'Kate Winslet', 'Peter OToole', 'Helen Mirren', 'Clark Gable', 'Charlize Theron', 'Paul Newman', 'Audrey Hepburn', 'Denzel Washington', 'Greta Garbo', 'Tom Hanks', 'Jodie Foster', 'Cary Grant', 'Vivien Leigh', 'Denzel Washington', 'Grace Kelly', 'Daniel Day-Lewis', 'Sharon Stone', 'Sidney Poitier', 'Nicole Kidman', 'Spencer Tracy', 'Julia Roberts', 'James Stewart', 'Diane Keaton', 'Sean Penn']
user = ['a_debrie', 'j_dean', 'l_dern', 'o_stone', 'm_brando', 'j_fonda', 'j_nicholson', 'm_streep', 'r_deniro', 's_weaver', 'd_hoffman', 'f_dunaway', 'h_ford', 'k_hepburn', 'a_pacino', 'b_davis', 'g_hackman', 'i_bergman', 'a_hopkins', 'g_garbo', 't_hanks', 'j_foster', 'c_grant', 'v_leigh', 'd_washington', 'g_kelly', 'd_day-lewis', 's_stone', 's_poitier', 'n_kidman', 's_tracy', 'j_roberts', 'j_stewart', 'd_keaton', 's_penn', 'n_portman', 'g_peck', 'c_blanchett', 'l_olivier', 'j_moore', 'h_bogart', 'k_winslet', 'p_otoole', 'h_mirren', 'c_gable', 'c_theron', 'p_newman', 'a_hepburn', 'd_washington', 'g_garbo', 't_hanks', 'j_foster', 'c_grant', 'v_leigh', 'd_washington', 'g_kelly', 'd_day-lewis', 's_stone', 's_poitier', 'n_kidman', 's_tracy', 'j_roberts', 'j_stewart', 'd_keaton', 's_penn']
all_items = []

# Generate a pool of random dates in 2023 to use as session creation times
start_date = datetime(2023, 1, 1, 0, 0, 0)
end_date = datetime(2023, 12, 31, 23, 59, 59)
rand_date = [start_date + (end_date - start_date) * random.random() for _ in range(1400)]

# names and user are parallel lists: the same index describes the same person
for user_name, full_name in zip(user, names):
    ksuid = Ksuid()
    r_date = random.choice(rand_date)
    r_date_expire = r_date + timedelta(days=7)  # sessions last seven days
    item = {
        'PK': str(ksuid),  # the KSUID is both the key and the session token
        'UserName': user_name,
        'Name': full_name,
        'SessionToken': str(ksuid),
        'CreatedAt': r_date.strftime("%Y-%m-%d %H:%M:%S"),
        'ExpiredAt': r_date_expire.strftime("%Y-%m-%d %H:%M:%S")
    }
    all_items.append(item)

In [6]:
# Save to json file
with open('session_sample.json', 'w') as outfile:
    json.dump(all_items, outfile, indent=4)

# Write to DynamoDB table
dt.load_json('session_sample.json')

Data loaded successfully from session_sample.json.
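`dt.load_json` is also a `spdynamodb` helper whose internals are not shown here. With plain boto3, a similar bulk load can go through the resource API's `Table.batch_writer()`, which buffers `put_item` calls, sends them as `BatchWriteItem` requests and resends unprocessed items. The sketch assumes the JSON file holds a list of plain dictionaries like `all_items`; `load_items` is a hypothetical name.

```python
import json


def load_items(table, path="session_sample.json"):
    """Bulk-write the items stored in a JSON array file through a boto3 Table resource."""
    with open(path) as infile:
        items = json.load(infile)
    # batch_writer groups put_item calls into BatchWriteItem requests and
    # resends unprocessed items automatically
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
    return len(items)
```

The resource API takes plain Python values rather than `{'S': ...}` descriptors, which is why the JSON rows can be passed through unchanged; numeric attributes would need to be `Decimal` instances, since boto3 rejects `float`. Usage would be `load_items(boto3.resource('dynamodb', region_name='us-east-1').Table('SampleSessionTable'))`.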


### Create a global secondary index

In [15]:
dt.create_global_secondary_index(
    att_name="UserName",
    att_type="S",
    i_name="UserNameIndex"
)

# Poll the index status until DynamoDB finishes building it
start = time.time()
status = dt.check_status_gsi()
if status == 'CREATING':
    print("Global secondary index is being created, this may take a few minutes...")
    while status == 'CREATING':
        time.sleep(30)
        status = dt.check_status_gsi()
minutes = (time.time() - start) / 60
print("Global secondary index created. Time elapsed: {0:.2f} minutes".format(minutes))

Global secondary index is being created, this may take a few minutes...
Global secondary index created. Time elapsed: 10.11 minutes
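The polling loop above relies on `dt.check_status_gsi()`, another `spdynamodb` helper. With boto3 the same information is available from `describe_table`, which lists each global secondary index with an `IndexStatus` such as `CREATING` or `ACTIVE`. The sketch below polls that field and adds a timeout so the loop cannot spin forever; `wait_for_gsi` and its default interval and timeout are assumptions.

```python
import time


def wait_for_gsi(client, table_name, index_name, poll_seconds=30, timeout_seconds=1800):
    """Poll describe_table until the named GSI is ACTIVE; give up after timeout_seconds."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        table = client.describe_table(TableName=table_name)["Table"]
        statuses = {
            gsi["IndexName"]: gsi["IndexStatus"]
            for gsi in table.get("GlobalSecondaryIndexes", [])
        }
        if statuses.get(index_name) == "ACTIVE":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{index_name} still {statuses.get(index_name)!r} after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)
```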


In [33]:
# Query
dynamodb_client = boto3.client('dynamodb', region_name='us-east-1')
user_name = 's_stone'

response = dynamodb_client.query(
    TableName='SampleSessionTable',
    IndexName='UserNameIndex',
    KeyConditionExpression='UserName = :user',
    ExpressionAttributeValues={
        ':user': {
            'S': user_name
        }
    }
)
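A single `Query` call returns at most 1 MB of data; when more matches exist, the response carries a `LastEvaluatedKey` that must be passed back as `ExclusiveStartKey`. For one user's sessions this rarely matters, but a small generator makes the loop explicit. `query_all` is a hypothetical helper; boto3 also ships a built-in paginator via `dynamodb_client.get_paginator('query')`.

```python
def query_all(client, **query_kwargs):
    """Yield every item from a DynamoDB Query, following LastEvaluatedKey across pages."""
    kwargs = dict(query_kwargs)
    while True:
        page = client.query(**kwargs)
        yield from page.get("Items", [])
        # DynamoDB omits LastEvaluatedKey on the final page
        if "LastEvaluatedKey" not in page:
            return
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]


# Usage (assumes the client and index from the cells above):
# items = list(query_all(
#     dynamodb_client,
#     TableName='SampleSessionTable',
#     IndexName='UserNameIndex',
#     KeyConditionExpression='UserName = :user',
#     ExpressionAttributeValues={':user': {'S': user_name}},
# ))
```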

#### Check tokens

In [34]:
utc_now = datetime.utcnow()
# Cutoff for CreatedAt: sessions created more than 7 days ago have expired
date_expire = utc_now - timedelta(days=7)
date_expire = date_expire.strftime("%Y-%m-%d %H:%M:%S")

In [35]:
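# ExpiredAt already includes the 7-day lifetime, so compare it with the current time
now_str = datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S")
print(user_name, "\n====================")
for item in response['Items']:
    if item['ExpiredAt']['S'] < now_str:
        print(item['PK']['S'], item['ExpiredAt']['S'], "- Expired")
    else:
        print(item['PK']['S'], item['ExpiredAt']['S'])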

s_stone 
2Sg1eNrVLHPFLpvOVcBvVsEH7yI 2023-08-04 11:44:45
2Sg1eS1fKUpwir5iTDxCZGvNZBP 2023-11-25 04:50:27
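The check works on plain strings because `%Y-%m-%d %H:%M:%S` timestamps are zero-padded and fixed-width, so lexicographic order matches chronological order. Since `ExpiredAt` already includes the seven-day lifetime, a token is expired as soon as that value is earlier than the current time. A minimal sketch of that rule as a standalone function (the name `is_expired` is hypothetical):

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def is_expired(item, now):
    """Return True when a low-level DynamoDB item's ExpiredAt lies before `now`."""
    # Zero-padded, fixed-width timestamps compare chronologically as plain strings
    return item["ExpiredAt"]["S"] < now.strftime(TIME_FORMAT)


now = datetime(2023, 8, 10, 12, 0, 0)
old = {"ExpiredAt": {"S": (now - timedelta(days=1)).strftime(TIME_FORMAT)}}
fresh = {"ExpiredAt": {"S": (now + timedelta(days=1)).strftime(TIME_FORMAT)}}
print(is_expired(old, now), is_expired(fresh, now))  # True False
```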


#### Check session

In [20]:
user_name = 's_stone'
token = '2Sg1eNrVLHPFLpvOVcBvVsEH7yI'
now_str = datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S")

response = dynamodb_client.get_item(
    TableName=dt.table_name,
    Key={
        'PK': {
            'S': token
        }
    }
)

# The token must exist, belong to this user and not be past its ExpiredAt time
item = response.get('Item')
if item is None:
    print("Token not found in database")
elif item['UserName']['S'] != user_name:
    print("User name does not match the token")
elif item['ExpiredAt']['S'] <= now_str:
    print("Token expired")
else:
    print("Success!!")

Success!!
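`get_item` is eventually consistent by default, so a token written a moment earlier might not be visible yet. The sketch below wraps the session check in a function that requests a strongly consistent read with `ConsistentRead=True`. `check_session` and its parameters are hypothetical, and `now_str` is assumed to use the same `%Y-%m-%d %H:%M:%S` format as `ExpiredAt`.

```python
def check_session(client, table_name, user_name, token, now_str):
    """Look up a session token and report whether it is valid for user_name."""
    response = client.get_item(
        TableName=table_name,
        Key={"PK": {"S": token}},
        ConsistentRead=True,  # see writes that completed just before this read
    )
    item = response.get("Item")
    if item is None:
        return "Token not found in database"
    if item["UserName"]["S"] != user_name:
        return "User name does not match the token"
    if item["ExpiredAt"]["S"] <= now_str:
        return "Token expired"
    return "Success!!"
```

Usage would be `check_session(dynamodb_client, dt.table_name, 's_stone', token, datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S"))`.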


#### Delete all tokens for a user

In [21]:
dynamodb_client = boto3.client('dynamodb', region_name='us-east-1')
user_name = 's_penn'

response = dynamodb_client.query(
    TableName='SampleSessionTable',
    IndexName='UserNameIndex',
    KeyConditionExpression='UserName = :user',
    ExpressionAttributeValues={
        ':user': {
            'S': user_name
        }
    }
)

In [22]:
for item in response['Items']:
    dynamodb_client.delete_item(
        TableName=dt.table_name,
        Key={
            'PK': {
                "S": item['PK']['S']
            }
        }
    )
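Deleting one item per `delete_item` call costs a network round trip per token. `BatchWriteItem` accepts up to 25 put or delete requests per call and reports anything it could not process in `UnprocessedItems`, which the caller must resend. A hedged sketch, with `delete_in_batches` as a hypothetical helper and a simple exponential backoff chosen as an assumption:

```python
import time


def delete_in_batches(client, table_name, keys, batch_size=25):
    """Delete items by primary key with BatchWriteItem (at most 25 requests per call)."""
    requested = 0
    for start in range(0, len(keys), batch_size):
        chunk = keys[start:start + batch_size]
        request = {table_name: [{"DeleteRequest": {"Key": key}} for key in chunk]}
        delay = 0.05  # assumption: initial backoff for unprocessed items, in seconds
        while request:
            response = client.batch_write_item(RequestItems=request)
            # Anything DynamoDB could not handle (e.g. when throttled) comes back here
            request = response.get("UnprocessedItems") or {}
            if request:
                time.sleep(delay)
                delay = min(delay * 2, 2.0)
        requested += len(chunk)
    return requested  # delete requests sent; missing keys are not an error
```

Usage would be `delete_in_batches(dynamodb_client, dt.table_name, [{'PK': item['PK']} for item in response['Items']])`.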

In [23]:
response = dynamodb_client.query(
    TableName='SampleSessionTable',
    IndexName='UserNameIndex',
    KeyConditionExpression='UserName = :user',
    ExpressionAttributeValues={
        ':user': {
            'S': user_name
        }
    }
)

print("Tokens from", user_name, "-", len(response['Items']))

#### Delete all expired tokens

In [11]:
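params = {
    "TableName": dt.table_name,
    # Match sessions created before the cutoff computed earlier (date_expire)
    "FilterExpression": "#created < :cutoff",
    "ExpressionAttributeNames": {"#created": "CreatedAt"},
    "ExpressionAttributeValues": {":cutoff": {"S": date_expire}},
    # Read at most 100 items per page; the filter is applied after reading
    "Limit": 100
}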

In [13]:
count = 0
scan_kwargs = dict(params)  # copy so re-running the cell starts from the first page

while True:
    page = dynamodb_client.scan(**scan_kwargs)
    items = page['Items']
    if items:
        print("Found {0} expired items in this page".format(len(items)))
    for item in items:
        try:
            dynamodb_client.delete_item(
                TableName=dt.table_name,
                Key={'PK': item['PK']}
            )
            count += 1
        except ClientError as error:
            print(f"Something went wrong while deleting item {item['PK']['S']}")
            print(error.response['Error'])
    # A page can contain zero matches and still be followed by more pages,
    # so stop only when DynamoDB no longer returns LastEvaluatedKey
    if 'LastEvaluatedKey' not in page:
        break
    scan_kwargs['ExclusiveStartKey'] = page['LastEvaluatedKey']

if count == 0:
    print("No items to delete.")
else:
    print("Total items deleted:", count)

No items to delete.


#### Create new token

In [39]:
user_name = 's_stone'

response = dynamodb_client.query(
    TableName='SampleSessionTable',
    IndexName='UserNameIndex',
    KeyConditionExpression='UserName = :user',
    ExpressionAttributeValues={
        ':user': {
            'S': user_name
        }
    }
)
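# This picks a token that already exists, so the conditional put_item below
# is expected to fail with ConditionalCheckFailedException
token = response['Items'][0]['PK']['S']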

In [41]:
r_date = random.choice(rand_date)
r_date_expire = r_date + timedelta(days=7)
try:
    response = dynamodb_client.put_item(
        TableName=dt.table_name,
        Item={
            'PK': {
                'S': token
            },
            'CreatedAt': {
                'S': r_date.strftime("%Y-%m-%d %H:%M:%S")
            },
            'ExpiredAt': {
                'S': r_date_expire.strftime("%Y-%m-%d %H:%M:%S")
            },
            'UserName': {
                'S': 's_penn'
            },
            'Name': {
                'S': 'Sean Penn'
            }
        },
        ConditionExpression='attribute_not_exists(PK)'
    ) 
except ClientError as error:
    if error.response['Error']['Code'] == 'ConditionalCheckFailedException':
        print("Token already exists")
    else:
        print(f"Something went wrong while creating token {token}: {error}")
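The cell above writes a token that already belongs to `s_stone`, so the `attribute_not_exists(PK)` condition makes `put_item` fail with `ConditionalCheckFailedException`. For a genuinely new session, a fresh KSUID would be used and the timestamps taken from the current time. A minimal sketch that builds the `put_item` arguments; `build_session_put` is a hypothetical helper, and passing `str(Ksuid())` as the token is the assumed usage.

```python
from datetime import datetime, timedelta, timezone

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_session_put(table_name, token, user_name, name, now=None, ttl=timedelta(days=7)):
    """Return put_item kwargs for a new session that must not overwrite an existing token."""
    if now is None:
        now = datetime.now(timezone.utc)
    return {
        "TableName": table_name,
        "Item": {
            "PK": {"S": token},
            "UserName": {"S": user_name},
            "Name": {"S": name},
            "CreatedAt": {"S": now.strftime(TIME_FORMAT)},
            "ExpiredAt": {"S": (now + ttl).strftime(TIME_FORMAT)},
        },
        # Fail instead of silently replacing a session that already uses this token
        "ConditionExpression": "attribute_not_exists(PK)",
    }
```

Usage would be `dynamodb_client.put_item(**build_session_put(dt.table_name, str(Ksuid()), 's_penn', 'Sean Penn'))`, keeping the same `ClientError` handling as above.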
        