# Adobe Analytics - Analytics Data Feed

This script combines the hit data and the lookup tables of an Adobe Analytics Data Feed into a single clickstream data file.

This script assumes that each lookup file has two fields, a key and a value, and replaces the original column values in hit_data with the matching value from the lookup table. Values with no match in the lookup table are left unchanged.
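A minimal sketch of that replacement on toy data, assuming integer keys like those in the lookup files. The keys and labels below are made up for illustration. It shows why the replacement is restricted to matching rows: `Series.map` on its own turns every unmatched key into `NaN`.

```python
import pandas as pd

# Hypothetical lookup table: key -> label (illustrative values only)
lookup = pd.Series({1: "Chrome", 2: "Firefox"})

# Hypothetical hit_data column; 99 has no entry in the lookup table
browser = pd.Series([1, 2, 99])

# map() alone replaces unmatched keys with NaN
mapped_only = browser.map(lookup)

# Restricting the replacement to rows whose key exists keeps 99 as-is
mask = browser.isin(lookup.index)
replaced = browser.astype(object)
replaced.loc[mask] = browser[mask].map(lookup)

print(mapped_only.tolist())  # ['Chrome', 'Firefox', nan]
print(replaced.tolist())     # ['Chrome', 'Firefox', 99]
```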

You can add more lookup variables/tables by editing the lookup_variables dictionary. You may need to adjust the paths to the data files.

<h2>Documentation</h2>

Unzip the Adobe Analytics clickstream data feed into a folder, copy this script to that folder, and move all the .tsv files into a <strong>data/</strong> subfolder.
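A small pre-flight check, sketched under the assumption that the script runs from the folder that contains data/. The helper name `missing_data_files` is hypothetical and not part of the original script; it lists which expected files are absent before pandas tries to read them.

```python
import os

def missing_data_files(base_dir, filenames):
    """Return the expected files that are not present in base_dir/data/ (hypothetical helper)."""
    data_dir = os.path.join(base_dir, "data")
    return [name for name in filenames
            if not os.path.isfile(os.path.join(data_dir, name))]

# Files the script reads directly; extend this list with the lookup files you use
required = ["column_headers.tsv", "hit_data.tsv", "referrer_type.tsv"]
missing = missing_data_files(".", required)
if missing:
    print("Missing from data/: " + ", ".join(missing))
```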

Run this script with Python 2.7.

<ul><li><a href="https://marketing.adobe.com/resources/help/en_US/reference/analytics-data-feed.html">Analytics Data Feed</a></li>
    <li><a href="https://marketing.adobe.com/resources/help/en_US/reference/datafeeds-reference.html">Data Column Reference</a></li>
    <li><a href="https://github.com/joaolcorreia/Adobe-Analytics-Clickstream-Data-Feed">This script on GitHub</a></li>
</ul>


Author: <a href="https://joaocorreia.io">João Correia</a>


---

In [None]:
import pandas as pd

### Open the column headers

In [None]:
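# column_headers.tsv is a single tab-separated row of column names; keep them as a list
column_headers = pd.read_csv("data/column_headers.tsv", sep="\t").columns.tolist()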

### Open the events table with the column headers

In [None]:
hit_data = pd.read_csv("data/hit_data.tsv", sep="\t", names=column_headers, low_memory=False)
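To illustrate how the two files fit together: column_headers.tsv holds one tab-separated row of column names, while hit_data.tsv has no header row, so the names are supplied through `names=`. The sketch below uses made-up in-memory contents in place of the real files.

```python
import io
import pandas as pd

# Stand-ins for data/column_headers.tsv and data/hit_data.tsv (made-up contents)
headers_tsv = io.StringIO("hitid_high\thitid_low\tbrowser\n")
hits_tsv = io.StringIO("100\t1\t2\n100\t2\t99\n")

# The header row becomes the DataFrame's columns; only their names are needed
names = pd.read_csv(headers_tsv, sep="\t").columns.tolist()

# hit_data has no header row, so every line is a hit
hits = pd.read_csv(hits_tsv, sep="\t", names=names)
print(hits.shape)  # (2, 3)
```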

## List of lookup files

### Define all the lookup columns and their lookup files

In [None]:
lookup_variables = {
    # hit_data column: {'file': lookup table in data/}
    'connection_type': {'file': 'connection_type.tsv'},
    'browser': {'file': 'browser.tsv'},
    'color': {'file': 'color_depth.tsv'},
    'country': {'file': 'country.tsv'},
    'javascript': {'file': 'javascript_version.tsv'},
    'language': {'file': 'languages.tsv'},
    'os': {'file': 'operating_systems.tsv'},
    'plugins': {'file': 'plugins.tsv'},
    # referrer_type.csv is the two-column, tab-separated file written by the quick fix below
    'ref_type': {'file': 'referrer_type.csv'},
    'resolution': {'file': 'resolution.tsv'},
    'search_engine': {'file': 'search_engines.tsv'},
}
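Each key in lookup_variables must be a column name from column_headers.tsv, and each file must exist in data/. A hedged sketch of a check you could run before processing; `check_lookup_config` is a hypothetical helper, not part of the original script.

```python
import os

def check_lookup_config(lookup_variables, columns, data_dir="data"):
    """Return a list of problems in the lookup configuration (hypothetical helper)."""
    problems = []
    for column, settings in sorted(lookup_variables.items()):
        # The key must name a column of hit_data
        if column not in columns:
            problems.append("unknown column: " + column)
        # The lookup file must be present in the data folder
        path = os.path.join(data_dir, settings["file"])
        if not os.path.isfile(path):
            problems.append("missing file: " + path)
    return problems
```

For example, call `check_lookup_config(lookup_variables, hit_data.columns)` after loading hit_data and print any problems it returns.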



### Quick Fixes

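#### referrer_type.tsv has three columns
This file has three columns: a key, a label, and a lowercased classification. We discard the last column, which carries less information than the label, and save the first two as a tab-separated file named referrer_type.csv.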

In [None]:
referrer_type = pd.read_csv("data/referrer_type.tsv", sep="\t", usecols=[0,1],names=['key','value'])
referrer_type.to_csv("data/referrer_type.csv",sep="\t",index=False, header=False)
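The same trimming on a small in-memory sample, assuming the three-column layout described above (key, label, lowercased classification). The rows are illustrative, not copied from a real feed.

```python
import io
import pandas as pd

# Illustrative three-column referrer lookup: key, label, lowercased classification
raw = io.StringIO("1\tInside Your Site\tinternal\n6\tTyped/Bookmarked\ttyped\n")

# Keep only the first two columns and name them key/value
referrer_type = pd.read_csv(raw, sep="\t", usecols=[0, 1], names=["key", "value"])

# Serialize the two-column table as tab-separated text without index or header
out = referrer_type.to_csv(sep="\t", index=False, header=False)
print(out)
```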

### Lookup Function

In [None]:
# Lookup and replace each lookup column value
# We assume lookup tables with two columns, key and value

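def process_column(column_name, datafile):
    # Load the lookup table as a Series of values indexed by key
    lookup = pd.read_csv("data/" + datafile, sep="\t", names=["key", "value"], index_col="key")["value"]
    # Only rows whose key exists in the lookup table are replaced; the rest keep their original value
    mask = hit_data[column_name].isin(lookup.index)
    mapped = hit_data.loc[mask, column_name].map(lookup)
    # Cast to object first so text labels can be stored in a numeric column without dtype errors
    hit_data[column_name] = hit_data[column_name].astype(object)
    hit_data.loc[mask, column_name] = mapped
    return "processed: " + datafile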
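process_column reads and modifies the global hit_data. If you prefer to pass the DataFrame and data folder explicitly, for example to reuse the logic in other scripts or tests, a variant could look like this. The name `apply_lookup` and its `data_dir` parameter are hypothetical.

```python
import os
import pandas as pd

def apply_lookup(df, column_name, datafile, data_dir="data"):
    """Replace keys in df[column_name] with labels from a two-column key/value TSV (hypothetical variant)."""
    path = os.path.join(data_dir, datafile)
    # Series indexed by key, holding the label for each key
    lookup = pd.read_csv(path, sep="\t", names=["key", "value"], index_col="key")["value"]
    mask = df[column_name].isin(lookup.index)
    mapped = df.loc[mask, column_name].map(lookup)
    # Work on an object copy so labels can replace numeric keys; unmatched keys stay as-is
    result = df[column_name].astype(object)
    result.loc[mask] = mapped
    df[column_name] = result
    return df
```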

### Process each lookup column

In [None]:
for key, value in lookup_variables.items():
    print(process_column(key, value['file']))

### Save to a tab-separated file

In [None]:
hit_data.to_csv("clickstream.tsv",sep="\t",index=False)
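An optional sanity check after saving, sketched on a small made-up DataFrame written to a temporary folder instead of clickstream.tsv: reading the file back with the same separator should give the same number of rows and columns.

```python
import os
import tempfile
import pandas as pd

# Made-up stand-in for the processed hit_data
processed = pd.DataFrame({"browser": ["Chrome", "Firefox"], "country": ["Portugal", "Spain"]})

path = os.path.join(tempfile.mkdtemp(), "clickstream.tsv")
processed.to_csv(path, sep="\t", index=False)

# Read the file back and compare its shape with what was written
reloaded = pd.read_csv(path, sep="\t")
assert reloaded.shape == processed.shape
print(reloaded.shape)  # (2, 2)
```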