# Tokenize Data with Transform Secrets Engine

Learn how the Transform secrets engine's data tokenization works to provide
  maximum resistance to data being compromised.

> **NOTE:** The Transform secrets engine requires a [Vault Enterprise Advanced Data
Protection (ADP)](https://www.hashicorp.com/products/vault/pricing/) license. To
explore Vault Enterprise features, you can [sign up for a free 30-day
trial](http://vaultproject.io/trial).

## Challenge

When encrypting sensitive data, preservation of the original data format or
length may be required to meet certain industry standards such as
[HIPAA](https://www.hhs.gov/hipaa/index.html) or
[PCI](https://www.pcisecuritystandards.org/). To fulfill this requirement, the
transform secrets engine performs [format preserving encryption
(FPE)](/tutorials/vault/transform?in=vault/adp).

Some organizations care more about the irreversibility of the tokenized data
than about preserving the original data format. For them, the transform secrets
engine's FPE transformation may not satisfy their governance, risk and
compliance (GRC) strategy, because FPE relies on reversible cryptography.

Tokenization replaces sensitive data with unique values (tokens) that are unrelated to the original value. Because a token cannot be used to derive the plaintext, this approach satisfies the PCI-DSS guidance.

## Transit vs Transform Matrix

| FEATURE | TRANSIT | TRANSFORM FPE | TRANSFORM MASKING | TRANSFORM TOKENIZATION |
| --- | :-: | :-: | :-: | :-: |
| - | Two Way | Two Way | One Way | One Way (by default) |
| Stateful | No | No | No | Yes <br>(internal/external) |
| Format Preserved | No | Yes | Yes | No |
| Custom Metadata | No | No | No | Yes |
| Algorithm(s) | Multiple | NIST FF3-1 | N/A (pseudonymous) | AES256-GCM96 |
| Key Rotation | Yes | N/A | N/A | Yes |
| Deduplication | Optional <br>(w/ convergent encryption) | Optional <br>(w/ supplied tweak) | Yes | No |
| Batch Input Support | Yes | Yes | Yes | Yes |
| Entropy Augmentation Support | Yes | Yes | Yes | Yes |

## Solution

The Transform secrets engine provides a data transformation method to
**tokenize** sensitive data stored outside of Vault. Tokenization replaces
sensitive data with unique values (tokens) that are unrelated to the original
value in any algorithmic sense. Because a token cannot be used to derive the
plaintext, this approach satisfies the PCI-DSS guidance.
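
The idea that a token is "unrelated to the original value in any algorithmic sense" can be illustrated with a toy token store. The sketch below is not how Vault implements tokenization (Vault encrypts what it persists, and this sketch stores the plaintext as-is); the `tokenize` and `detokenize` helpers and the temporary directory are assumptions made for this example only.

```shell
# Toy token store: one file per token inside a temporary directory.
# Illustration only. A real system would encrypt the stored value.
TOKEN_DIR=$(mktemp -d)

tokenize() {
  # $1 = plaintext. Prints a random 32-character hex token.
  # The token comes from /dev/urandom, so it is not derived from the plaintext.
  token=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
  printf '%s' "$1" > "$TOKEN_DIR/$token"
  printf '%s\n' "$token"
}

detokenize() {
  # $1 = token. Prints the stored plaintext, or fails if the token is unknown.
  [ -f "$TOKEN_DIR/$1" ] || return 1
  cat "$TOKEN_DIR/$1"
}

token=$(tokenize "1111-2222-3333-4444")
echo "token:   $token"
echo "decoded: $(detokenize "$token")"
```

Without access to the store, the token reveals nothing about the card number. This is also why tokenization is stateful: the mapping has to be persisted somewhere.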

<img alt=Tokenization src=https://learn.hashicorp.com/img/vault/vault-tokenization-1.png width=640>

#### Characteristics of the tokenization transformation:

- **Non-reversible identification:** Protect data pursuant to requirements for
  data irreversibility (PCI-DSS, GDPR, etc.)

- **Integrated Metadata:** Supports metadata for identifying data type and
  purpose

- **Extreme scale and performance:** Supports managing billions of tokens
  efficiently across clouds as well as on-premises

## Prerequisites

To perform the tasks described in this tutorial you need to have:

* A running Vault Enterprise **v1.6** or later with an Advanced Data Protection module license
  * See [Start Vault Server](./100-Setup-Vault.ipynb)
  * If you cloned this repo, place the license file at `hc_demos-jupyter/HashiStack/vault/config/vault.hclic`
* Docker

Customize the values for `VAULT_ADDR` and `VAULT_TOKEN` if needed.

In [None]:
export CONSUL_DC=west CONSUL_DC_2=east
export COMPOSE_PROJECT_NAME=hashi
export COMPOSE_FILE=docker-compose.yml:docker-compose-hashi.yml:docker-compose-hashi-dev.yml:docker-compose-proxy.yml

export VAULT_ADDR=http://127.0.0.1:8200
export VAULT_TOKEN=root

In [None]:
# Restart Vault Cluster
pushd ../HashiStack
docker-compose \
  up --force-recreate -d \
  vault_s1
popd

Confirm Vault Enterprise is up and licensed.

In [None]:
vault status
vault read sys/license

In [None]:
ls ../HashiStack/vault/config/vault.hclic

### Policy requirements

> **NOTE:** For the purpose of this tutorial, you can use the `root` token to work
with Vault. However, it is recommended that root tokens are used only for
initial setup or in emergencies. As a best practice, use tokens with an
appropriate set of policies based on your role in the organization.

To perform all tasks demonstrated in this tutorial, your policy must include the
following permissions:

```hcl
# Work with transform secrets engine
path "transform/*" {
  capabilities = [ "create", "read", "update", "delete", "list" ]
}

# Enable secrets engine
path "sys/mounts/*" {
  capabilities = [ "create", "read", "update", "delete", "list" ]
}

# List enabled secrets engine
path "sys/mounts" {
  capabilities = [ "read", "list" ]
}
```
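
One possible way to apply these permissions is to save them to a file and upload the file as a named policy. The file path and the policy name `transform-tutorial` below are placeholders chosen for this sketch; the upload step runs only when the `vault` CLI is found.

```shell
# Placeholder names for this example only.
POLICY_FILE=/tmp/transform-tutorial.hcl
POLICY_NAME=transform-tutorial

# Write the policy shown above to a file.
cat > "$POLICY_FILE" <<'EOF'
# Work with transform secrets engine
path "transform/*" {
  capabilities = [ "create", "read", "update", "delete", "list" ]
}

# Enable secrets engine
path "sys/mounts/*" {
  capabilities = [ "create", "read", "update", "delete", "list" ]
}

# List enabled secrets engine
path "sys/mounts" {
  capabilities = [ "read", "list" ]
}
EOF

# Upload the policy only when the Vault CLI is available.
if command -v vault >/dev/null 2>&1; then
  vault policy write "$POLICY_NAME" "$POLICY_FILE" \
    || echo "could not upload policy $POLICY_NAME" >&2
fi
```

A token limited to this policy could then be created with `vault token create -policy=transform-tutorial`.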

If you are not familiar with policies, complete the
[policies](https://learn.hashicorp.com/tutorials/vault/policies) tutorial.

## Setup the Transform secrets engine

Create a role named `mobile-pay`, which is attached to the `credit-card`
transformation. The tokenized value has a fixed maximum time-to-live (TTL) of 24
hours.

Sample flow

<img src=https://upload.wikimedia.org/wikipedia/commons/thumb/3/3e/How_mobile_payment_tokenization_works.png/640px-How_mobile_payment_tokenization_works.png >

## Setup

### Enable the Transform secrets engine

In [None]:
vault secrets enable transform || true

### Roles, Transformations, Templates and Alphabets

### Create a Role

Create a role named `mobile-pay` which is attached to transformation named `credit-card`.  

<img src=https://learn.hashicorp.com/img/vault/vault-tokenization-2.png width=640 >

In [None]:
vault write transform/role/mobile-pay transformations=credit-card

The role is created, but the `credit-card` transformation does not exist yet.

### Create a transformation

Create a transformation named `credit-card`, which sets the generated token's time-to-live (TTL) to 24 hours.

In [None]:
vault write transform/transformations/tokenization/credit-card \
  max_ttl=24h \
  allowed_roles=mobile-pay

* The `max_ttl` is an optional parameter that controls how long a generated token stays valid.
* **NOTE:** Set the `allowed_roles` parameter to a wildcard (`*`) to allow all roles, or use a trailing glob for pattern matching (e.g. `mobile-*`).
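
To get a feel for which role names a trailing glob such as `mobile-*` would cover, the matching can be approximated with a shell `case` pattern. This is an illustration only: `role_allowed` is a hypothetical helper, and shell patterns accept more syntax (`?`, `[...]`, a `*` in the middle) than the trailing glob described above, so it is not a substitute for Vault's own check.

```shell
role_allowed() {
  # $1 = role name
  # $2 = allowed_roles entry: an exact name, "*", or a "prefix-*" glob
  case "$1" in
    $2) return 0 ;;   # unquoted so the shell treats it as a pattern
    *)  return 1 ;;
  esac
}

for role in mobile-pay mobile-wallet global-id; do
  if role_allowed "$role" "mobile-*"; then
    echo "$role: allowed"
  else
    echo "$role: denied"
  fi
done
```

With the pattern `mobile-*`, the two `mobile-` roles match and `global-id` does not.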

Display details about the `credit-card` transformation.

In [None]:
vault read transform/transformations/tokenization/credit-card

Sample Output
```
Key              Value
---              -----
allowed_roles    [mobile-pay]
mapping_mode     default
max_ttl          24h
stores           [builtin/internal]
templates        <nil>
type             tokenization
```

Notice that the `type` is set to `tokenization`.

### Transform secrets

After the secrets engine is configured, you can use the role to encode and decode input values.

In [None]:
vault write transform/encode/mobile-pay value=1111-2222-3333-4444 \
  ttl=8h \
  metadata="type=Amex" \
  metadata="organization=HashiCorp"

In [None]:
vault write -format=json transform/encode/mobile-pay value=1111-2222-3333-4444 \
  transformation=credit-card \
  ttl=8h \
  metadata="type=Amex" \
  metadata="organization=HashiCorp" \
  metadata="Purpose=Travel" \
  | tee /tmp/tokenization.out

The `ttl` value is an optional parameter. Remember that `max_ttl` was set to 24 hours when you created the `credit-card` transformation. You can supply a shorter `ttl` at encode time so that the token expires sooner.

The output displays the encoded value.

```shell
Key              Value
---              -----
encoded_value    eRwUjS2L9dnBpuvRKGPvEq3399sm41GGXikoh1sNKivXxeyrej9vp2quuCULqSPpz7UTLgmtM
```

### Decode some input value using the /decode endpoint with a named role:

Set the generated token value in a `MY_ENCODED_CCN` environment variable for testing.

In [None]:
export MY_ENCODED_CCN=$(jq -r .data.encoded_value /tmp/tokenization.out)

Retrieve the metadata of the token.

In [None]:
vault write transform/metadata/mobile-pay value=$MY_ENCODED_CCN

Notice that `expiration_time` is displayed. Because you supplied `ttl=8h` at encode time instead of relying on the 24-hour `max_ttl`, the token expires 8 hours after it was created.

Validate the token value.

In [None]:
vault write transform/validate/mobile-pay value=$MY_ENCODED_CCN transformation=credit-card

Check whether the credit card number has already been tokenized.

In [None]:
vault write transform/tokenized/mobile-pay value=1111-2222-3333-4444 transformation=credit-card

Retrieve the original plaintext credit card value.

In [None]:
vault write transform/decode/mobile-pay value=$MY_ENCODED_CCN

Sample Output

```shell
Key              Value
---              -----
decoded_value    1111-2222-3333-4444
```
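
The encode and decode steps can be combined into a small round-trip check. This is a minimal sketch: `roundtrip_check` is a hypothetical helper, and it assumes the `mobile-pay` role and `credit-card` transformation created earlier. It uses the CLI's `-field` flag to print only the requested field of the response.

```shell
roundtrip_check() {
  # $1 = role, $2 = transformation, $3 = plaintext to test
  rt_token=$(vault write -field=encoded_value "transform/encode/$1" \
    transformation="$2" value="$3") || return 1
  rt_plain=$(vault write -field=decoded_value "transform/decode/$1" \
    transformation="$2" value="$rt_token") || return 1

  if [ "$rt_plain" = "$3" ]; then
    echo "round trip OK"
  else
    echo "round trip FAILED" >&2
    return 1
  fi
}

# Run the check only when the Vault CLI is available.
if command -v vault >/dev/null 2>&1; then
  roundtrip_check mobile-pay credit-card 1111-2222-3333-4444 \
    || echo "round trip check did not pass" >&2
fi
```

Each call creates a new token, so running the check repeatedly adds entries to the token store.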

### Setup external token storage

Unlike format preserving encryption (FPE) transformation, tokenization is a
stateful procedure to facilitate mapping between tokens and various
cryptographic values (one way HMAC of the token, encrypted metadata, etc.)
including the encrypted plaintext itself which must be persisted.

At scale, this could put a lot of additional load on the Vault's storage
backend. To avoid this, you have an option to use external storage to persist
data for tokenization transformation.

> **NOTE:** Currently, PostgreSQL and MySQL are supported as external storage
for tokenization.

To demonstrate, run a PostgreSQL database in a Docker container. Create a new
transformation named `passport`, which uses this PostgreSQL database as its
storage rather than Vault's storage backend.

<img src="https://learn.hashicorp.com/img/vault/vault-tokenization-3.png" width=640 >

Run [PostgreSQL Docker image](https://hub.docker.com/_/postgres) in a
container.

Start a `postgres` instance which listens to port `5432`, and the superuser
(`root`) password is set to `rootpassword`.

In [None]:
export POSTGRES_USER=root
export POSTGRES_PASSWORD=rootpassword

pushd ../HashiStack && \
docker-compose \
  up --force-recreate -d \
  db && \
popd

Verify that the postgres container is running.

In [None]:
docker ps --format "table {{.ID}}\t{{.Image}}\t{{.Names}}\t{{.Ports}}"

```
CONTAINER ID        IMAGE            ...         PORTS                    NAMES
befcf913da91        postgres         ...         0.0.0.0:5432->5432/tcp   postgres
```

1.  Create a new role, "global-id".

In [None]:
vault write transform/role/global-id transformations=passport

```
Success! Data written to: transform/role/global-id
```

2.  Create a store which points to the PostgreSQL database.

In [None]:
vault write transform/stores/postgres \
      type=sql \
      driver=postgres \
      supported_transformations=tokenization \
      connection_string="postgresql://{{username}}:{{password}}@db/root?sslmode=disable" \
      username=root \
      password=rootpassword

In [None]:
vault read transform/stores/postgres
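
The `{{username}}` and `{{password}}` placeholders in `connection_string` are filled in from the `username` and `password` parameters. The sketch below only previews the resulting string; `render_connection_string` is a hypothetical helper, and the `sed` substitution is an assumption about the end result rather than Vault's actual templating code.

```shell
render_connection_string() {
  # $1 = template, $2 = username, $3 = password
  # Note: credentials containing "|", "&" or "\" would need escaping for sed.
  printf '%s\n' "$1" | sed -e "s|{{username}}|$2|" -e "s|{{password}}|$3|"
}

template='postgresql://{{username}}:{{password}}@db/root?sslmode=disable'
render_connection_string "$template" root rootpassword
```

Previewing the string this way can help spot a wrong host or database name before creating the store.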

3.  Create a schema in postgres to store tokenization artifacts.

In [None]:
vault write transform/stores/postgres/schema transformation_type=tokenization \
       username=root password=rootpassword

4.  Create a new transformation named `passport`, which points to the postgres
    store.

In [None]:
vault write transform/transformations/tokenization/passport \
       allowed_roles=global-id stores=postgres

5. Verify that there are no entries via the `postgres` container.

In [None]:
docker exec -it postgres psql -U root -c "select * from tokens;"

```shell
storage_token | key_version | ciphertext | encrypted_metadata | fingerprint | expiration_time
---------------+-------------+------------+--------------------+-------------+-----------------
(0 rows)
```

6. Encode some test data.

In [None]:
vault write transform/encode/global-id \
       transformation=passport \
       value="123456789"

**Example output:**

```plaintext
Key              Value
---              -----
encoded_value    Q4tYgFXHxUS3PnQLiUnyH2JfGeEZQDFXMMaFXLU6MZfiix1tjqwgNX
```

7.  From the postgres container, check the data entry.

In [None]:
docker exec -it postgres psql -U root -c "select * from tokens;"

**Example output:**

```shell
 storage_token        | key_version |       ciphertext          | encrypted_metadata | ...
--------------------------+-------------+---------------------------+--------------------+-...
\x128aa3c24699...snip... |           1 | \x1ee7cc3505e31...snip... |                    | ...
(1 row)
```

As you encode more data, the number of rows in the table grows.
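
To watch the table grow without reading the full rows, you can query only the row count. This sketch assumes the container name `postgres` and the `root` database user from the steps above; `token_count` is a hypothetical helper.

```shell
token_count() {
  # Prints the number of rows in the tokens table.
  # -t prints tuples only and -A disables alignment, so the output is a bare number.
  docker exec postgres psql -U root -t -A -c "select count(*) from tokens;"
}

# Query the count only when Docker is available.
if command -v docker >/dev/null 2>&1; then
  token_count || echo "could not query the tokens table" >&2
fi
```

Run it before and after another `vault write transform/encode/global-id` call to compare the counts.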

## Summary

The Transform secrets engine provides a tokenization transformation, which
replaces sensitive data with unique values (tokens) that are unrelated to the
original value in any algorithmic sense. This can help organizations meet
certain industry standards.

If retaining the original data format is important, refer to the [Transform
Secrets Engine](https://learn.hashicorp.com/tutorials/vault/transform) to learn about the format preserving
encryption (FPE) transformation.

### Help and Reference

- [Transform Secrets Engine (API)](https://www.vaultproject.io/api-docs/secret/transform)
- [Transform Secrets Engine](https://www.vaultproject.io/docs/secrets/transform)

## Clean Up

In [None]:
docker stop postgres

In [None]:
pushd ../HashiStack && \
docker-compose down
popd

In [None]:
docker ps --format "table {{.ID}}\t{{.Image}}\t{{.Names}}\t{{.Ports}}"

## End