<h3> The Problem </h3>

My background before I entered the Masters in Business Analytics program was Seattle University was in event production and I continue to produce events to this day. Ticketing Providers take a large percentage of every transaction and I sought out alternate options to avoid what are often exporbitant fees.
While I was impressed by Linktree's 'Request Payment' Feature from a Patron/UX perspective, there were some disadvantages. As someone who throws events often with hundreds of attendees, this information wasn't provided
in a centralized database anywhere either in an online portal or as an exportable csv (at least at the time of this project). While using Linktree, which take a miniscule fee of 0.5% for Pro Plan Transactions would save at a minimumum hundreds of dollars in ticketing fees, this was a major issue that would have resulted in hours and hours of manual data entry.

<h3> The Solution </h3>
Every time a user submitted a payment, I received an email notification. While the information was buried inside the body of the email, it was there.
Google's Takeout feature provides an MBOX File containing all email information. From here, I converted this MBOX File to a CSV, but all the important information was contained in just a single cell on each row
and this information was structured in a way that didn't allow for easier parsing of information through a program such as Excel.

Having this information structured in a more coherent manner would allow for easier tracking of financial information for the event and allowed for a simpler list that could be used at the door when checking
patrons in. In addition, providing a streamlined CSV containing information such as the day of the week of a ticket purchase and the time would provide a suitable foundation for conducting further analysis. 

In [1]:
import pandas as pd

data = pd.read_csv(r'C:\Users\nickb\Documents\Research_TheNight\Great Day Night For It\Sep42020_Great Day For It\TicketExport1.csv')

In [2]:
data.head(1)

Unnamed: 0,Date,Subject,Message
0,27/08/2022 12:18,Request received! via Linktree,Linktree ( https://url1741.linktr.ee/ls/click?...


In [3]:
# Extracting Only Linktree Payment Emails from Email Export
data = data[data['Subject'] == 'Request received! via Linktree']

# Dropping Subject Colunmn
data = data.drop(columns=['Subject'])

# Extracting Tier Value
data['Ticket Tier'] = data['Message'].str.split('Tier', 1).str[1].str[1]

# Creating Email Column
data['Email'] = data['Message'].str.split('From', 1).str[1:]

import re as re

def find_email(text):
    email = re.findall(r'[\w\.-]+@[\w\.-]+',str(text))
    return ",".join(email)

data['Email'] = data['Email'].apply(lambda x: find_email(x))

# Removes first n character
data['Email'] = data['Email'].str[1:]

# Stripping Linktree Emails, so only users emails remain
data['Email'] = data['Email'].map(lambda x: x.rstrip(',support@linktr.ee'))

# Creating Name Column
data['Ticket Name'] = data['Message'].str.split('Special instructions or details', 1).str[0:2].str[1]

# Creating Amount Paid Column
data['Amount Paid'] = data['Message'].str.split('$', 1).str[0:2].str[1]
data['Amount Paid'] = data['Amount Paid'].str.split('\(USD').str[0]

# Converting Date Formats
data['Date'] = pd.to_datetime(data['Date'])
data['Date Purchased'] = data['Date'].dt.date
data['Time Purchased'] = data['Date'].dt.time
data['DOW Purchased'] = data['Date'].dt.day_name()

# Sorting in Chronological Order
data.sort_values(by=['Date Purchased', 'Time Purchased'])

# Found missing value.Lynn Nakamura had a missing value, this assigns it there
data.loc[data['Date'] == '2022-08-22 22:43:00','Amount Paid'] = 25.67

data["Amount Paid"] = pd.to_numeric(data['Amount Paid'], errors='ignore')

# Gets Rid of New Lines, Back Slashes, etc.
data = data.replace('\r','', regex=True)
data = data.replace('\n','', regex=True)
data = data.replace('-','', regex=True)

# Removes View at linktree information cluttering column
data['Ticket Name'] = data['Ticket Name'].str.split('View').str[0]

data['Email'] = data['Email'].str.split(',').str[0]



In [42]:
data.head(1)

Unnamed: 0,Date,Message,Ticket Tier,Email,Ticket Name,Amount Paid,Date Purchased,Time Purchased,DOW Purchased
0,2022-08-27 12:18:00,Linktree ( https://url1741.linktr.ee/ls/click?...,4,julianfrawley@gmail.com,Julian Frawleyjulianfrawley@gmail.com,30.0,2022-08-27,12:18:00,Saturday


While the solution isn't perfect (see 'Ticket Name'), it was a massive improvement on what came before.

In [136]:
# Exporting CSV
data.to_csv (r'C:\Users\nickb\Documents\Research_TheNight\Sep42020_Great Day For It\PowderTicketingAug16\powder_ticketing_afterprocessing.csv', index = False, header=True)