# Gretel Transformers Walkthrough

Welcome to the Gretel Transformers walkthrough! In this tutorial we will build a data pipeline that applies a variety of transformations to your data.

This tutorial assumes you have already uploaded data to Gretel.

Let's get started!

## Configuration

- If using Google Colab, we recommend you change to a GPU runtime.

- Input your Gretel URI String

- Create your Gretel Synthetic Configuration Template
  - See [our documentation](https://gretel-synthetics.readthedocs.io/en/stable/api/config.html) for additional config options

In [None]:
from pathlib import Path
import getpass
import os
import pprint

pp = pprint.PrettyPrinter(indent=3)
gretel_uri = os.getenv("GRETEL_URI") or getpass.getpass("Your Gretel URI")

## Create a Gretel Project Instance

In the code below, we use the gretel-client to create a project instance, which supplies the records we will transform.

In [None]:
# !pip install gretel-client --upgrade
# !pip install "gretel-client[fpe]==0.7.0.rc2"

In [None]:
from gretel_client import project_from_uri
from gretel_client.demo_helpers import RandomTransformerPipeline

project = project_from_uri(gretel_uri)
random_pipeline = RandomTransformerPipeline(project)

In [None]:
# We can see how many records we've ingested and how many fields we've discovered, just to show the
# project is active.
print(f'Total Records Received: {project.record_count}\n')
print(f'Total Fields Discovered: {project.field_count}')

## Choose Entity types to transform

Gretel supports a range of transformations for numeric and string data.  Below we leverage methods of the Project class to find some representative fields and apply sample transformers to them.  First, let's go looking for some common entity types.


In [None]:
random_pipeline.detect_entities()

## Changing identifying entities with string transformations

Now we start building up our pipeline. We define transformers and then specify the fields they act on with a data path. A list of these data paths makes up the pipeline. Let's start with some transformations we might want to apply to identifiers: redact them, encrypt them, fake them, or just drop them. The helper chooses among these at random.


In [None]:
random_pipeline.build_anonymizing_transforms()
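
To make the identifier transformations above more concrete, here is a minimal pure-Python sketch of three of them: redacting, replacing with a stable token, and dropping. It does not use the Gretel transformer classes; the function names, the `SECRET` value, and the sample record are all hypothetical and only illustrate the idea.

```python
import hashlib
import hmac

SECRET = b"example-secret"  # hypothetical key, for illustration only


def redact(value: str, char: str = "X") -> str:
    """Replace every alphanumeric character, keeping punctuation and length."""
    return "".join(char if c.isalnum() else c for c in value)


def pseudonymize(value: str, secret: bytes = SECRET) -> str:
    """Map a value to a stable token with a keyed hash (one-way, not encryption)."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12]


def drop_fields(record: dict, fields: set) -> dict:
    """Return a copy of the record without the named fields."""
    return {k: v for k, v in record.items() if k not in fields}


# Hypothetical sample record
record = {"email": "ada@example.com", "ssn": "123-45-6789", "city": "Paris"}

cleaned = drop_fields(record, {"ssn"})
cleaned["email"] = redact(cleaned["email"])
print(cleaned)
```

Note that the keyed hash in this sketch cannot be reversed, unlike the encrypting transformers that the restore pipeline later in this walkthrough relies on.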

## Rounding numeric latitudes and longitudes

Let's keep going. We will reuse some of the same transformers for locations stored as strings. For numeric latitude and longitude fields, we round the values instead.


In [None]:
random_pipeline.build_location_transforms()
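
As a rough illustration of what rounding does to coordinates, the sketch below rounds numeric latitude and longitude values with Python's built-in `round`. The function name, the number of decimal places, and the sample record are assumptions for this example, not the settings the helper above uses.

```python
def round_coordinate(value: float, places: int = 2) -> float:
    """Round a coordinate to a fixed number of decimal places.

    One degree of latitude is roughly 111 km, so two decimal places
    coarsen a position to about a kilometre.
    """
    return round(value, places)


# Hypothetical sample record
record = {"latitude": 37.774929, "longitude": -122.419416}

rounded = {field: round_coordinate(value) for field, value in record.items()}
print(rounded)
```

Fewer decimal places hide more of the exact location but keep less geographic detail for analysis.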

## Shifting date values

Finally, in addition to the transformers above, you can also shift dates.  Here we keep it simple, but there are options to modify the shift based on another input field and you can specify other formats.


In [None]:
random_pipeline.build_date_shift_transforms()
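
The idea of shifting a date by an amount derived from another field can be sketched in plain Python. This is an illustration under stated assumptions, not the Gretel implementation: `SECRET`, `DATE_FORMAT`, the 30-day range, and the function names are all hypothetical.

```python
import hashlib
import hmac
from datetime import datetime, timedelta

SECRET = b"example-secret"  # hypothetical key, for illustration only
DATE_FORMAT = "%Y-%m-%d"    # assumed input format


def shift_days_for(tweak: str, max_days: int = 30, secret: bytes = SECRET) -> int:
    """Derive a repeatable offset in [-max_days, max_days] from another field."""
    digest = hmac.new(secret, tweak.encode("utf-8"), hashlib.sha256).digest()
    span = 2 * max_days + 1
    return int.from_bytes(digest[:4], "big") % span - max_days


def shift_date(value: str, days: int, fmt: str = DATE_FORMAT) -> str:
    """Parse a date string, move it by `days`, and format it back."""
    return (datetime.strptime(value, fmt) + timedelta(days=days)).strftime(fmt)


# Hypothetical sample record
record = {"user_id": "u-1001", "signup_date": "2020-03-15"}

offset = shift_days_for(record["user_id"])
shifted = shift_date(record["signup_date"], offset)
restored = shift_date(shifted, -offset)  # the same key and field undo the shift
print(shifted, restored)
```

Because the offset depends only on the key and the chosen field, every date belonging to the same `user_id` moves by the same amount, so intervals between that user's events are preserved.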

## The pipeline in action, er, book with a pipeline animal

Now we can create our data pipeline.  We will run some sample records through it.


In [None]:
from gretel_client.transformers import DataTransformPipeline, DataRestorePipeline, DataPath

# Add one last catch-all data path, which passes any untransformed field through the
# pipeline unchanged. If you have a noisy dataset, you could make this another DropConfig.
random_pipeline.data_paths.append(DataPath(input="*"))

# Make the pipe
pipe = DataTransformPipeline(random_pipeline.data_paths)

# Bonus trick for the end
restore_pipe = DataRestorePipeline(random_pipeline.data_paths)
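
To show why the order of data paths and the final `"*"` catch-all matter, here is a small stand-in written with the standard library's `fnmatch`. It is a sketch only: the `transform_record` function, the (pattern, transform) tuples, and the rule that unmatched fields are dropped are assumptions for this example, not the behavior of `DataTransformPipeline`.

```python
from fnmatch import fnmatchcase


def transform_record(record: dict, paths: list) -> dict:
    """Apply the first matching (pattern, transform) pair to each field.

    A transform of None passes the value through unchanged. In this sketch,
    fields that match no pattern are left out of the result.
    """
    out = {}
    for field, value in record.items():
        for pattern, transform in paths:
            if fnmatchcase(field, pattern):
                out[field] = value if transform is None else transform(value)
                break
    return out


# Hypothetical data paths, checked in order
paths = [
    ("email", lambda v: "REDACTED"),
    ("lat*", lambda v: round(v, 1)),
    ("*", None),  # catch-all: keep everything else as-is
]

record = {"email": "ada@example.com", "latitude": 48.8566, "city": "Paris"}
print(transform_record(record, paths))
```

Without the last `("*", None)` entry, `city` would not appear in the output of this sketch, which is the role the catch-all plays.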


In [None]:
# Sample records from your project
records = project.sample()
pp.pprint(records)

In [None]:
# Those same records transformed
transformed_records = [pipe.transform_record(rec) for rec in records]
pp.pprint(transformed_records)

In [None]:
# Recover anything that used the SECRET
restored_records = [restore_pipe.transform_record(rec) for rec in transformed_records]
pp.pprint(restored_records)