# Tealium EventDB data analysis

We have already analyzed our real-time data, but what about the historical raw data? Follow the steps below to see how this can be done on Databricks.

In this simple analysis, we'll load the raw event data from Tealium EventDB and visualize a "conversion funnel" for the ecommerce checkout.

## Loading data from CSV

Tealium EventDB stores data in AWS Redshift, so we first need to load it. While it is possible to connect Databricks directly to AWS, we'll take the easy route for now and read a CSV file instead.

```python
# pandas works out of the box, but it runs on a single node and doesn't use Spark.
# If you need performance, use native PySpark or Koalas (https://github.com/databricks/koalas),
# which ships with Spark 3.2+ as pyspark.pandas.
import pandas as pd

# Load the raw event data
df = pd.read_csv("https://raw.githubusercontent.com/stanasiukcom/ptc-tealium-azure/master/data/ptc_events.csv")
```
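
If you only need the columns used in this analysis, you can ask pandas to read just those. The sketch below is an assumption-based variant, not the original notebook's code: `load_events`, `REQUIRED_COLUMNS` and the sample rows are hypothetical, and only the column names come from this notebook. In the notebook, you would pass the GitHub URL above instead of the in-memory sample.

```python
import io

import pandas as pd

# Hypothetical sample rows: only the column names come from this notebook,
# the values are made up for illustration.
SAMPLE_CSV = """eventid,udo_form_name,udo_field_name,udo_field_index
e1,billing,first_name,1
e2,billing,last_name,2
e3,billing,first_name,1
e4,shipping,street,1
"""

# Columns the checkout-funnel analysis relies on
REQUIRED_COLUMNS = ["eventid", "udo_form_name", "udo_field_name", "udo_field_index"]


def load_events(source) -> pd.DataFrame:
    """Read only the required columns from a CSV path, URL or file-like object."""
    # With usecols, pandas skips every other column and raises ValueError
    # if any of the listed columns is missing from the file.
    return pd.read_csv(source, usecols=REQUIRED_COLUMNS)


events = load_events(io.StringIO(SAMPLE_CSV))
print(events.shape)  # (4, 4)
```

Failing early on a missing column is usually easier to debug than a `KeyError` several cells later.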

```python
# Inspect our data set
df.head()
```
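
Beyond `head()`, two checks help before building the funnel: which forms occur in the data, and whether `udo_field_index` is numeric. The funnel is later sorted by that index, and if the column were read as text, `"10"` would sort before `"2"`. The sketch below uses a small hypothetical `sample` frame (deliberately storing the index as text) so it runs on its own; in the notebook you would apply the same calls to `df`.

```python
import pandas as pd

# Hypothetical sample rows; in the notebook, inspect the df loaded above instead.
sample = pd.DataFrame({
    "eventid": ["e1", "e2", "e3", "e4"],
    "udo_form_name": ["billing", "billing", "shipping", None],
    # Stored as text on purpose, as can happen when a CSV column has stray values
    "udo_field_index": pd.Series(["1", "10", "2", None], dtype=object),
})

# How many events belong to each form (dropna=False also counts events without a form)
form_counts = sample["udo_form_name"].value_counts(dropna=False)
print(form_counts)

# Coerce the field index to numbers so sorting follows the real form order;
# values that cannot be parsed become NaN instead of raising an error.
sample["udo_field_index"] = pd.to_numeric(sample["udo_field_index"], errors="coerce")
print(sample["udo_field_index"].sort_values().tolist())
```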

## Analyzing the checkout funnel

```python
# Keep only billing form events and the columns needed for the funnel
df_billing = df.loc[df['udo_form_name'] == 'billing', ['eventid', 'udo_form_name', 'udo_field_name', 'udo_field_index']]
df_billing
```
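
Filtering on `billing` gives one funnel. If you want the same counts for every form at once, a `groupby` over form name and field index is one option. This is a sketch rather than part of the original analysis; the `sample_events` rows and the `occurrences` column name are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows with the column names used in this notebook
sample_events = pd.DataFrame({
    "eventid": ["e1", "e2", "e3", "e4", "e5", "e6"],
    "udo_form_name": ["billing", "billing", "billing", "shipping", "shipping", "payment"],
    "udo_field_index": [1, 2, 1, 1, 2, 1],
})

# One row per (form, field) pair with its interaction count,
# ordered by form and then by the field's position in that form.
field_counts = (
    sample_events.groupby(["udo_form_name", "udo_field_index"])
    .size()
    .reset_index(name="occurrences")
    .sort_values(["udo_form_name", "udo_field_index"])
)
print(field_counts)

# The billing funnel is then just one slice of the combined table
billing = field_counts[field_counts["udo_form_name"] == "billing"]
```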

```python
# Count interactions per field
df_billing_counts = df_billing['udo_field_index'].value_counts().reset_index()
df_billing_counts.columns = ['field_index', 'occurrences']

# Sort the fields according to their actual position in the form
df_billing_counts = df_billing_counts.sort_values(by=['field_index'])

df_billing_counts
```
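
Raw counts show the shape of the funnel; percentages make the drop-off easier to read. The sketch below assumes a table shaped like `df_billing_counts` (already sorted by `field_index`) and adds two hypothetical columns: the share relative to the first field, and the share kept from the previous field. Because `value_counts` counts interactions rather than unique visitors, these figures approximate drop-off rather than true per-user conversion.

```python
import pandas as pd

# Hypothetical counts in the shape of df_billing_counts
sample_counts = pd.DataFrame({"field_index": [1, 2, 3, 4], "occurrences": [200, 150, 120, 60]})


def add_funnel_rates(counts: pd.DataFrame) -> pd.DataFrame:
    """Add funnel percentages, assuming rows are already sorted by field_index."""
    out = counts.copy()
    first = out["occurrences"].iloc[0]
    # Percentage of the first field's interactions that reach each field
    out["pct_of_first"] = out["occurrences"] * 100 / first
    # Percentage kept from the immediately preceding field (NaN for the first row)
    out["pct_of_previous"] = out["occurrences"] * 100 / out["occurrences"].shift(1)
    return out


funnel = add_funnel_rates(sample_counts)
print(funnel)
```

The step-to-step column points at the field where most users give up, which is usually the first thing to investigate.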

```python
# Display a "conversion funnel": render the table with Databricks' display()
# and switch the output to a bar chart
display(df_billing_counts)
```
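
`display()` is provided by the Databricks notebook environment, so the call above won't chart anything in a plain Python session. For a quick look elsewhere, a text rendering of the same table is one lightweight option. The `text_funnel` helper and the `sample_counts` values below are hypothetical; each bar is scaled so the largest count fills `width` characters.

```python
import pandas as pd

# Hypothetical counts in the shape of df_billing_counts
sample_counts = pd.DataFrame({"field_index": [1, 2, 3], "occurrences": [200, 150, 50]})


def text_funnel(counts: pd.DataFrame, width: int = 40) -> str:
    """Render one '#' bar per field, scaled so the largest count fills `width`."""
    # tolist() yields plain Python ints, which keeps the arithmetic simple
    indices = counts["field_index"].tolist()
    totals = counts["occurrences"].tolist()
    peak = max(totals)
    lines = []
    for idx, n in zip(indices, totals):
        bar = "#" * (n * width // peak)  # integer scaling, rounded down
        lines.append(f"field {idx:>2} | {bar} {n}")
    return "\n".join(lines)


print(text_funnel(sample_counts))
```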