# Mailshake transformation script #
This transformation script reads all opens and clicks from your Mailshake campaigns and writes a simplified CSV for each to the `etl-output` folder. It is intended as a starting point for manipulating the data.

In [1]:
import gluestick as gs
import pandas as pd
import os

Let's establish the standard hotglue input/output directories

In [2]:
# standard directory for hotglue
ROOT_DIR = os.environ.get("ROOT_DIR", ".")
INPUT_DIR = f"{ROOT_DIR}/sync-output"
OUTPUT_DIR = f"{ROOT_DIR}/etl-output"

Let's start by reading the data. 

We will use the [gluestick](https://pypi.org/project/gluestick/) package to read the raw data in the input folder into a dictionary of pandas dataframes using the `read_csv_folder` function.

In [53]:
# Read input data
input_data = gs.read_csv_folder(INPUT_DIR)
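
To make the shape of `input_data` concrete, here is a rough pandas-only sketch of the same idea: load every CSV in a folder into a dictionary of dataframes. The helper `read_csv_folder_sketch` and the demo files are hypothetical. This is not gluestick's implementation, and the real function may derive its dictionary keys differently or accept extra options.

```python
import os
import tempfile

import pandas as pd


def read_csv_folder_sketch(path):
    """Hypothetical stand-in: map each CSV's base file name to a DataFrame."""
    frames = {}
    for entry in sorted(os.listdir(path)):
        name, ext = os.path.splitext(entry)
        if ext.lower() == ".csv":
            frames[name] = pd.read_csv(os.path.join(path, entry))
    return frames


# Demo on a throwaway folder with two tiny CSV files (made-up data)
demo_dir = tempfile.mkdtemp()
pd.DataFrame({"id": [1, 2]}).to_csv(os.path.join(demo_dir, "clicks.csv"), index=False)
pd.DataFrame({"id": [3]}).to_csv(os.path.join(demo_dir, "opens.csv"), index=False)

demo_data = read_csv_folder_sketch(demo_dir)
print(sorted(demo_data))
```

Each value in the dictionary is an ordinary dataframe, which is why the cells below can index `input_data["clicks"]` and `input_data["opens"]` directly.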

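Let's parse the JSON objects (campaign + recipient) into flat columns and rename the fields we need: `recipient.id` becomes `target`, `campaign.title` becomes `key`, `actionDate` becomes `date`, and `id` becomes `value`.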

In [70]:
clicks = (input_data["clicks"]
        .pipe(gs.explode_json_to_rows, "campaign", max_level=1)
        .pipe(gs.explode_json_to_rows, "recipient", max_level=1)
        .pipe(lambda x: x.rename(columns={'recipient.id': 'target', 'campaign.title': 'key', 'actionDate': 'date', 'id': 'value'})))

opens = (input_data["opens"]
        .pipe(gs.explode_json_to_rows, "campaign", max_level=1)
        .pipe(gs.explode_json_to_rows, "recipient", max_level=1)
        .pipe(lambda x: x.rename(columns={'recipient.id': 'target', 'campaign.title': 'key', 'actionDate': 'date', 'id': 'value'})))
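
The flattening step above can be pictured with plain pandas. The sketch below is an illustration only: `flatten_json_column` is a hypothetical helper, not how `gs.explode_json_to_rows` is implemented, and the sample rows (including the `emailAddress` field) are made up.

```python
import json

import pandas as pd

# Made-up rows shaped like the raw data: JSON objects stored as strings
raw = pd.DataFrame({
    "id": [1, 2],
    "actionDate": ["2020-05-01T12:00:00", "2020-05-02T08:30:00"],
    "campaign": ['{"id": 10, "title": "Spring launch"}',
                 '{"id": 10, "title": "Spring launch"}'],
    "recipient": ['{"id": 100, "emailAddress": "a@example.com"}',
                  '{"id": 101, "emailAddress": "b@example.com"}'],
})


def flatten_json_column(df, column):
    """Hypothetical helper: expand a JSON string column into '<column>.<field>' columns."""
    parsed = df[column].apply(json.loads)
    flat = pd.json_normalize(parsed.tolist(), max_level=1).add_prefix(f"{column}.")
    flat.index = df.index  # keep rows aligned with the source frame
    return df.drop(columns=[column]).join(flat)


flat = (raw
        .pipe(flatten_json_column, "campaign")
        .pipe(flatten_json_column, "recipient")
        .rename(columns={"recipient.id": "target", "campaign.title": "key",
                         "actionDate": "date", "id": "value"}))
print(flat[["value", "date", "key", "target"]])
```

After this step every event is a single flat row, so the later cells can work with just `value`, `date`, `key`, and `target`.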

Let's get rid of any duplicate data, such as the same recipient opening or clicking an email from the same campaign more than once

In [72]:
# Use 'datetime64[ns]': newer pandas releases reject the unitless 'datetime64' dtype
clicks = clicks.astype({'date': 'datetime64[ns]'}).drop_duplicates(subset=['key', 'target'])[['value', 'date', 'key']]
opens = opens.astype({'date': 'datetime64[ns]'}).drop_duplicates(subset=['key', 'target'])[['value', 'date', 'key']]
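
As a small illustration of what the deduplication keeps, consider the toy events below (made-up data, not Mailshake output). `drop_duplicates` retains the first row for each `(key, target)` pair in row order.

```python
import pandas as pd

# Made-up events: recipient 100 interacts with "Spring launch" twice
events = pd.DataFrame({
    "value": [1, 2, 3, 4],
    "date": ["2020-05-01", "2020-05-03", "2020-05-04", "2020-06-10"],
    "key": ["Spring launch", "Spring launch", "Spring launch", "Summer sale"],
    "target": [100, 100, 101, 100],
})

# One row per campaign/recipient pair; the first occurrence wins
deduped = events.drop_duplicates(subset=["key", "target"])
print(deduped["value"].tolist())  # [1, 3, 4]
```

One consequence of this choice is that a recipient who interacts with the same campaign in two different months is only counted in the month of the first row encountered.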

We want to monitor month-over-month change in campaign interaction, so let's convert each date to a monthly period and count the events per campaign and month

In [73]:
clicks['date'] = clicks['date'].dt.to_period('M')
opens['date'] = opens['date'].dt.to_period('M')

In [74]:
clicks = clicks.groupby(by=['key', 'date']).count()
opens = opens.groupby(by=['key', 'date']).count()
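
The two cells above can be traced on a toy frame (made-up data). The extra `change` column is an assumption added for illustration and is not part of the original script; it shows one way the monthly counts could be turned into a month-over-month difference per campaign.

```python
import pandas as pd

# Made-up, already deduplicated events for a single campaign
events = pd.DataFrame({
    "value": [1, 3, 4, 5, 6],
    "date": pd.to_datetime(["2020-05-01", "2020-05-04",
                            "2020-06-10", "2020-06-11", "2020-06-20"]),
    "key": ["Spring launch"] * 5,
})

# Same steps as above: bucket by month, then count events per campaign and month
events["date"] = events["date"].dt.to_period("M")
monthly = events.groupby(by=["key", "date"]).count()

# Illustrative addition: difference from the previous month within each campaign
monthly["change"] = monthly.groupby(level="key")["value"].diff()
print(monthly)
```

The first month of each campaign has no previous month to compare against, so its `change` is `NaN`.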

In [76]:
clicks.to_csv(f"{OUTPUT_DIR}/Clicks.csv")
opens.to_csv(f"{OUTPUT_DIR}/Opens.csv")