# Medium transformation script #
This transformation script converts the Medium CSV generated by a Chrome extension into a format recognizable by Databox.

In [1]:
import gluestick as gs
import pandas as pd
import os

Let's establish the standard hotglue input/output directories.

In [2]:
# standard directory for hotglue
ROOT_DIR = os.environ.get("ROOT_DIR", ".")
INPUT_DIR = f"{ROOT_DIR}/sync-output"
OUTPUT_DIR = f"{ROOT_DIR}/etl-output"

Let's start by reading the data. 

We will use the [gluestick](https://pypi.org/project/gluestick/) package to read the raw data in the input folder into a dictionary of pandas dataframes using the `read_csv_folder` function.

In [3]:
# Read input data
input_data = gs.read_csv_folder(INPUT_DIR)
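
For readers unfamiliar with gluestick, here is a rough pandas-only approximation of what `read_csv_folder` does in this notebook: it builds a dictionary with one dataframe per CSV file. This is an illustrative sketch, not gluestick's actual implementation. The helper name `read_csv_folder_sketch` is hypothetical, and keying each dataframe by the file name without its extension is an assumption inferred from the `input_data["Medium"]` lookup used later.

```python
import glob
import os

import pandas as pd


def read_csv_folder_sketch(path):
    """Approximate gs.read_csv_folder: map each CSV's base name to a dataframe."""
    frames = {}
    for csv_path in sorted(glob.glob(os.path.join(path, "*.csv"))):
        # "sync-output/Medium.csv" -> "Medium" (assumed key convention)
        name = os.path.splitext(os.path.basename(csv_path))[0]
        frames[name] = pd.read_csv(csv_path)
    return frames
```

Under these assumptions, `read_csv_folder_sketch(INPUT_DIR)["Medium"]` would hold the rows of `Medium.csv`.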

Let's split the Medium data into views and reads, aggregate each by month, and write the output with a normalized key.

In [32]:
# Copy each slice so later column assignments do not trigger SettingWithCopyWarning
views = input_data["Medium"][['Views', 'Date']].copy()
reads = input_data["Medium"][['Reads', 'Date']].copy()

In [33]:
# pandas 2.x rejects the unit-less 'datetime64' dtype, so specify nanoseconds
views = views.astype({'Date': 'datetime64[ns]'})
reads = reads.astype({'Date': 'datetime64[ns]'})

In [34]:
views['Date'] = views['Date'].dt.to_period('M')
reads['Date'] = reads['Date'].dt.to_period('M')

In [35]:
views = views.groupby(by=['Date']).sum()
reads = reads.groupby(by=['Date']).sum()
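
To make the monthly aggregation concrete, here is a small standalone sketch of the same `to_period('M')` and `groupby` steps. The daily counts are made up for illustration and do not come from the Medium export.

```python
import pandas as pd

# Made-up daily counts, for illustration only
daily = pd.DataFrame({
    "Views": [10, 5, 7],
    "Date": ["2020-01-03", "2020-01-20", "2020-02-01"],
})

# Parse the strings as dates, then truncate each date to its month
daily["Date"] = pd.to_datetime(daily["Date"]).dt.to_period("M")

# Sum all rows that fall in the same month
monthly = daily.groupby(by=["Date"]).sum()
print(monthly)
```

The two January rows collapse into a single `2020-01` row with 15 views, while `2020-02` keeps its 7 views.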

In [36]:
views['key'] = '-'
reads['key'] = '-'

In [37]:
views = views.reset_index()
reads = reads.reset_index()

In [38]:
views = views.rename(columns={'Date': 'date', 'Views': 'value'})
reads = reads.rename(columns={'Date': 'date', 'Reads': 'value'})

In [42]:
views.to_csv(f"{OUTPUT_DIR}/MediumViews.csv", index=False)
reads.to_csv(f"{OUTPUT_DIR}/MediumReads.csv", index=False)
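
Since views and reads go through identical steps, the cells above could be folded into one function. The following is a minimal sketch of that idea, not part of the original script. The name `to_monthly_metric` is hypothetical, and it uses `pd.to_datetime` in place of `astype` for the date conversion.

```python
import pandas as pd


def to_monthly_metric(df, value_column):
    """Aggregate one column by month into date/value/key rows (hypothetical helper)."""
    # Work on a copy so the input dataframe is left untouched
    metric = df[[value_column, "Date"]].copy()
    # Truncate each date to its month, then sum the values per month
    metric["Date"] = pd.to_datetime(metric["Date"]).dt.to_period("M")
    metric = metric.groupby(by=["Date"]).sum()
    # Same constant key as in the cells above
    metric["key"] = "-"
    metric = metric.reset_index()
    return metric.rename(columns={"Date": "date", value_column: "value"})
```

With this helper, `to_monthly_metric(input_data["Medium"], "Views")` would produce the same dataframe that is written to `MediumViews.csv`, and passing `"Reads"` would produce the one for `MediumReads.csv`.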