## Overview

Using Gretel's transformation tools, we walk through an example of replacing a field value with a fake entity derived from the original value. The transformations are computed from the source value, so they are deterministic and require no lookup table or database. This demonstration runs as a notebook, but the same pipeline can be deployed into a variety of data stacks.

The transformers require a seed value. Reusing the same seed helps ensure that a given input value always maps to the same fake value.
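
To make the idea of a seeded, table-free mapping concrete, here is a minimal sketch using only the Python standard library. It is not Gretel's implementation: the `fake_constant` helper, the `FAKE_NAMES` pool, and the choice of SHA-256 are all assumptions made for illustration. The sketch hashes the seed together with the source value and uses the digest to pick a replacement, so nothing has to be stored between runs.

```python
import hashlib
from typing import Sequence

# Hypothetical candidate pool, for illustration only.
FAKE_NAMES = ["Alex Rivera", "Jordan Lee", "Sam Patel", "Morgan Chen", "Taylor Brooks"]


def fake_constant(value: str, seed: int, choices: Sequence[str]) -> str:
    """Pick one entry of `choices`, determined only by (seed, value)."""
    # Hash the seed and the source value together so the result depends on both.
    digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
    # Turn the first 8 bytes of the digest into an index into the pool.
    index = int.from_bytes(digest[:8], "big") % len(choices)
    return choices[index]


first = fake_constant("Seamus O'Toole", 8675309, FAKE_NAMES)
second = fake_constant("Seamus O'Toole", 8675309, FAKE_NAMES)
print(first == second)  # True: same seed and same input give the same output
```

With a small pool like this one, two different inputs can map to the same fake value, and changing the seed can change which replacement a given input receives.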

In [1]:
!pip install -Uqq gretel-client




In [2]:
from gretel_client.transformers import DataPath, DataTransformPipeline
from gretel_client.transformers import FakeConstantConfig

SEED = 8675309

SOURCE = [
    {
        "activity": "Wedding Crasher",
        "guest": "Seamus O'Toole",
        "location": "Washington DC",
    },
    {
        "activity": "Wedding Crasher",
        "guest": "Bobby O'Shea",
        "location": "Baltimore"
    },
]

# Deterministically replace field values with new, fake names and cities.
guest_xf = FakeConstantConfig(seed=SEED, fake_method="name")
location_xf = FakeConstantConfig(seed=SEED, fake_method="city")

paths = [
    DataPath(input="guest", xforms=[guest_xf]),
    DataPath(input="location", xforms=[location_xf]),
    # Pass all remaining fields (here, "activity") through unchanged.
    DataPath(input="*"),
]

pipe = DataTransformPipeline(paths)

# Transform each source record through the pipeline.
results = [pipe.transform_record(record) for record in SOURCE]

assert results == [
    {
        "activity": "Wedding Crasher",
        "guest": "Sean Johnson",
        "location": "Smithtown"
    },
    {
        "activity": "Wedding Crasher",
        "guest": "Christopher Obrien",
        "location": "Katiebury",
    },
]

print(results)

[{'activity': 'Wedding Crasher', 'guest': 'Sean Johnson', 'location': 'Smithtown'}, {'activity': 'Wedding Crasher', 'guest': 'Christopher Obrien', 'location': 'Katiebury'}]
