<a target="_parent" href="https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/docs/notebooks/transform/transform_relational_database.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Transform a Database with Gretel Relational

This notebook uses [Gretel Relational Transform](https://docs.gretel.ai/reference/relational) to redact Personally Identifiable Information (PII) in a sample telecommunications database. Gretel Workflows allow you to connect directly to your data source to extract training data, train and generate data using one or more Gretel Models, and (optionally) write your generated data back to your data destination.

Try running the example below and compare the transformed data with the real-world data. With this sample database, the notebook takes approximately 5 minutes to run.

To run this notebook, you will need [your API key from the Gretel Console](https://console.gretel.ai/users/me/key).

**Telecom Database Schema**

<img src="https://gretel-blueprints-pub.s3.us-west-2.amazonaws.com/rdb/telecom_db.png"  width="70%" height="70%">

## Getting Started
These cells install `gretel-client`, import the required modules, define helper functions, and then prompt you to enter your API key to log into Gretel.

In [None]:
%%capture
# Install required packages (the %%capture cell magic must be the first line of the cell)
!pip install -Uqq gretel-client

In [None]:
# Imports

import pandas as pd
import yaml
import time

from gretel_client import configure_session
from gretel_client import create_or_get_unique_project
from gretel_client.config import get_session_config
from gretel_client.rest_v1.api.connections_api import ConnectionsApi
from gretel_client.rest_v1.api.workflows_api import WorkflowsApi
from gretel_client.rest_v1.models import (
    CreateConnectionRequest,
    CreateWorkflowRunRequest,
    CreateWorkflowRequest,
)
from gretel_client.workflows.logs import print_logs_for_workflow_run

In [None]:
# @title Helper functions
# Helpers for running workflows from the notebook


def run_workflow(config: str):
    """Create a workflow and a workflow run from the given YAML config. Blocks and
    prints log lines until the workflow run reaches a terminal state.

    Args:
        config: The workflow config to run.
    """
    print("Validating actions in the config...")
    config_dict = yaml.safe_load(config)

    for action in config_dict["actions"]:
        print(f"Validating action {action['name']}")
        response = workflow_api.validate_workflow_action(action)
        print(f"Validation response: {response}")

    workflow = workflow_api.create_workflow(
        CreateWorkflowRequest(project_id=project.project_guid, config=config_dict, name=config_dict["name"])
    )

    workflow_run = workflow_api.create_workflow_run(
        CreateWorkflowRunRequest(workflow_id=workflow.id)
    )

    print(f"workflow: {workflow.id}")
    print(f"workflow run id: {workflow_run.id}")

    print_logs_for_workflow_run(workflow_run.id, session)

In [None]:
# Log into Gretel
configure_session(api_key="prompt", cache="yes", validate=True)

## Designate Project for your Relational Workflow

In [None]:
session = get_session_config()
connection_api = session.get_v1_api(ConnectionsApi)
workflow_api = session.get_v1_api(WorkflowsApi)

project = create_or_get_unique_project(name="Transform-Telecom-Database")

## Configure and Run your Relational Workflow
Gretel Workflows provide an easy-to-use, config-driven API for automating and operationalizing Gretel. A Gretel Workflow is built from actions that are composed into a pipeline for processing data with Gretel. To learn more, check out [our docs](https://docs.gretel.ai/reference/workflows).

### Define Source Data via Connector
Gretel Workflows work hand-in-hand with our connectors, allowing you to connect directly to the data that will be transformed. The first step in any workflow is a `read` action where the training data is extracted from your chosen connection.

For this example, we are using a sample MySQL source connection to read our input telecom database. To transform your own database, you can [create a connection in the Gretel Console](https://console.gretel.ai/connections) and replace the `input_connection_uid` parameter below with your own connection UID.

In [None]:
input_connection_uid = "sample_mysql_telecom" # @param {type:"string"}
connection_type = connection_api.get_connection(input_connection_uid).dict()['type']

### Define Workflow configuration

Workflows are defined using a YAML config that specifies the data connections and models used in a sequence of actions.

In this example, the workflow is composed of the following actions:
1. `mysql_source`, configured to extract a database via a MySQL connection.
2. `gretel_tabular`, which redacts PII in the extracted database using Gretel Transform.

While not included in this notebook, you can also chain different models together based on specific use cases or privacy needs. You can also use a destination action to write model outputs to a destination connection, and schedule workflows to run on a recurring basis using triggers.

To learn more about how to define Workflow configs, check out [our config syntax docs](https://docs.gretel.ai/reference/workflows/config-syntax).
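
The chaining described above relies on each action naming the action it reads from. As a rough local sanity check (not a substitute for Gretel's own validation), the sketch below walks an already-parsed config and reports any `input` that does not refer to an earlier action. The function name `check_action_inputs` and the example dictionary are hypothetical, and the sketch assumes `input` holds a single action name.

```python
from typing import Any, Dict, List


def check_action_inputs(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the action chain (empty if none)."""
    problems: List[str] = []
    seen = set()  # names of the actions defined so far
    for action in config.get("actions", []):
        name = action.get("name")
        upstream = action.get("input")  # assumption: a single action name, or absent
        if name in seen:
            problems.append(f"duplicate action name: {name!r}")
        if upstream is not None and upstream not in seen:
            problems.append(f"action {name!r} reads from unknown action {upstream!r}")
        seen.add(name)
    return problems


# Hypothetical parsed config mirroring the two actions listed above.
example_config = {
    "name": "my-mysql-workflow",
    "actions": [
        {"name": "extract", "type": "mysql_source", "connection": "sample_mysql_telecom"},
        {"name": "model-train-run", "type": "gretel_tabular", "input": "extract"},
    ],
}

print(check_action_inputs(example_config))
```

An empty list means every `input` points at an action defined earlier in the config.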

In [None]:
workflow_config = f"""\
name: my-{connection_type}-workflow

actions:
  - name: extract
    type: {connection_type}_source
    connection: {input_connection_uid}
    config:
      sync:
        mode: full

  - name: model-train-run
    type: gretel_tabular
    input: extract
    config:
      project_id: {project.project_guid}
      train:
        model: "transform/transform_v2"
        dataset: "{{outputs.extract.dataset}}"
      run:
        num_records_multiplier: 1.0

"""
print(workflow_config)
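
Because the config above is a Python f-string, single braces are filled in by Python while doubled braces are emitted as literal single braces. The minimal sketch below shows that behavior in isolation; `render_snippet` is a hypothetical helper and `"mysql"` is a placeholder value. The rendered `{outputs.extract.dataset}` appears to be the form the workflow uses to reference the output of the `extract` action.

```python
def render_snippet(connection_type: str) -> str:
    """Render a two-line fragment the same way the workflow config is built."""
    # {connection_type} is substituted by Python; {{...}} becomes a literal {...}.
    return f"""\
type: {connection_type}_source
dataset: "{{outputs.extract.dataset}}"
"""


snippet = render_snippet("mysql")  # placeholder value for illustration
print(snippet)
```

Forgetting to double the braces would make Python try to evaluate `outputs.extract.dataset` as an expression and raise a `NameError`.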

### Run Workflow

In [None]:
run_workflow(workflow_config)

## View Results

In [None]:
# @title Outputs
# @markdown Download output artifacts by clicking the link:
output_url = project.get_artifact_link(project.artifacts[-1]['key'])
print(output_url)

In [None]:
# @markdown Or view the results within the notebook by running this cell.
import urllib.request
urllib.request.urlretrieve(project.get_artifact_link(project.artifacts[-1]['key']), "/content/workflow-output.tar.gz")
# Decompress, then extract. Without -z, GNU tar detects any remaining compression itself.
!gunzip /content/workflow-output.tar.gz
!tar -xvf /content/workflow-output.tar
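
If you prefer to stay in Python instead of shelling out, the extraction step can be sketched with the standard library's `tarfile` module, which detects gzip compression on its own. The helper name `extract_archive` is hypothetical, and this is only one way to do it.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_archive(archive_path: str, dest_dir: str) -> List[str]:
    """Extract a tar archive (compressed or not) and return its member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r:*" lets tarfile detect gzip/bz2/xz compression transparently.
    with tarfile.open(archive_path, mode="r:*") as archive:
        names = archive.getnames()
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: reject unsafe members
            # such as absolute paths or links pointing outside dest.
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)
    return names
```

In Colab this could be called as `extract_archive("/content/workflow-output.tar.gz", "/content")`, pointing at whichever archive file is actually on disk.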

In [None]:
#@title Compare Source and Transformed Table from Database
table = "location" #@param {type:"string"}
from IPython.display import display, HTML

source_table = pd.read_csv(f"https://gretel-blueprints-pub.s3.amazonaws.com/rdb/{table}.csv").head(10)
trans_table = pd.read_csv(f"/content/transformed_{table}.csv").head(10)

print("\033[1mSource Table:\033[0m")
display(source_table)
print("\n\n\033[1mTransformed Table:\033[0m")
display(trans_table)
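
Beyond eyeballing the two tables, a rough per-column summary can show where the transform changed values. The sketch below is dependency-free and works on lists of row dictionaries; the function name `changed_fraction` and the sample rows are made up for illustration, and it assumes both tables list the same records in the same order. Missing values (`NaN`) compare as unequal, so they count as changed.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def changed_fraction(source_rows: List[Row], transformed_rows: List[Row]) -> Dict[str, float]:
    """For each shared column, return the fraction of rows whose value differs."""
    if len(source_rows) != len(transformed_rows):
        raise ValueError("both tables must have the same number of rows")
    if not source_rows:
        return {}
    # Only compare columns present in both tables.
    shared = [col for col in source_rows[0] if col in transformed_rows[0]]
    total = len(source_rows)
    return {
        col: sum(s[col] != t[col] for s, t in zip(source_rows, transformed_rows)) / total
        for col in shared
    }


# Made-up rows for illustration; real column names depend on the chosen table.
source = [{"id": 1, "city": "Austin"}, {"id": 2, "city": "Boston"}]
transformed = [{"id": 1, "city": "Lakeview"}, {"id": 2, "city": "Boston"}]
print(changed_fraction(source, transformed))
```

With the DataFrames from the cell above, the rows could be obtained via `source_table.to_dict("records")` and `trans_table.to_dict("records")`.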