In [1]:
import os
import pandas as pd
import numpy as np
import json


#pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

## Overview

The Houston Rockets collect a wide range of data including ticket transactions, retail sales, and fan surveys. However, this data comes from various sources with differing formats, making it difficult to truly understand who our fans are and track how they’re interacting with the Houston Rockets.

Use the available data sources to create a unified database table (i.e., a single table that our Business Intelligence & Innovation team could leverage to build fan segments and determine their behaviors).

**Requirements**

Using your programming method of choice, create a unified database table that could be used as the basis for dashboards and reporting on fan segments and their behaviors.

At minimum, the database table should include the following:

- A unique identifier for each fan as the primary key
- Fan identifiers from each data source
- Fields containing contact information for each fan (email, phone number, and zip code)

The following calculated fields:
- Number of ticket transactions
- Number of retail transactions
- Number of survey responses

At least four additional calculated fields. For example:
- Average ticket price for each fan
- Fan total spend


Project is available on Github, or similar SVN service, with a README on how to locally view and/or run your project. If a private repo, which we would encourage, please add @mkamla as a collaborator when your project is ready for review.
Timely completion of the project. Preferably no more than 7 days from delivery of project details.

**Evaluation Criteria**

Aside from adherence to the requirements, below are specific aspects that will be evaluated:

- Inclusion of supporting files, documentation and scripts used to generate the unified table
- Thoughtful consideration to datapoints that are relevant to a sports, entertainment and/or event business

**Bonus Points**
Additional consideration will be given to projects that include details about your methodology or approach, insights uncovered, supplemental tables and creativity in incorporating external resources that are additive to the project requirements and may reside outside the scope of this document.

The difference between ordinary and extraordinary is a little extra.



## Acquistion

In [263]:
# Import JSON file
json_data = pd.read_json('retail.json', orient='columns')

# Import CSV files
survey_data = pd.read_csv('surveys.csv')
ticket_data = pd.read_csv('tickets.csv')


# Print the first few rows of each data frame

print("JSON - retail data:")
print(json_data.head())

print("CSV - survey data:")
print(survey_data.head())

print("CSV - ticket data :")
print(ticket_data.head())

JSON - retail data:
                                              retail
0  {'transaction_id': 1, 'email': 'user18@rockets...
1  {'transaction_id': 2, 'email': 'user142@rocket...
2  {'transaction_id': 3, 'email': 'user182@rocket...
3  {'transaction_id': 4, 'email': 'user492@rocket...
4  {'transaction_id': 5, 'email': 'user101@rocket...
CSV - survey data:
   Submission ID                                          Attribute            Value
0              1                                           phone_no     290-551-1299
1              1                                           event_id             3220
2              1             how_satisfied_were_you_with_this_event                2
3              1  how_satisfied_were_you_with_your_retail_experi...                3
4              1  how_likely_are_you_to_attend_this_event_in_the...  5 - Very Likely
CSV - ticket data :
   transaction_id  account_no                email    zip      phone_no  section  row  qty  total_price  event_id

In [3]:
print("Info - retail data :")
json_data.info()

print("Info - survey data :")
survey_data.info()

print("Info - ticket data :")
ticket_data.info()

Info - retail data :
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   retail  2000 non-null   object
dtypes: object(1)
memory usage: 15.8+ KB
Info - survey data :
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12000 entries, 0 to 11999
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Submission ID  12000 non-null  int64 
 1   Attribute      12000 non-null  object
 2   Value          12000 non-null  object
dtypes: int64(1), object(2)
memory usage: 281.4+ KB
Info - ticket data :
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   transaction_id  10000 non-null  int64 
 1   account_no      10000 non-null  object
 2   email           10

**Takeaways**

- Retail JSON 

    - No nulls.
    - Currently 2000 rows and 1 column.

     **Things to Do**

    - Need to convert to Dataframe with the following fields:
        - transaction_id, email, account_no , product_type, quantity, unit_price , shipping cost. 
        - Dataframe will have 2000 rows with 7 columns.
        

- Ticket data

    - No nulls
    - Dataframe is 10,000 rows and 11 columns

- Survey data

    - No nulls
    - Currently 12,000 rows and 3 columns.

    **Things to Do**
    
    - Need to pivot data on 'Submission ID' index and 'Attribute' being columns. Final shape will be determine after transforming the data table.
    - normalize some data fields

**MVP**

- Master Fan Dataframe

    - need Unique ID for each fan as primary key
    - ID for each data table 
    - Contact information for each fan
        - email
        - phone number
        - zip code
    - Fields calculating 
        - Number of ticket transactions
        - Number of retail transactions
        - Number of surveys completed
    - Additional fields (Need 4)
        - Avg ticket price per fan
        - Fan overall total (ticket + retail)
        - Mode of seating (section) attendence (Maybe) per fan
        - last additional field to be determine.



## Prepare the data

### Retail Data

In [4]:
# Read JSON data from file
with open('retail.json') as f:
    json_data = json.load(f)

# Extract data from JSON key 'retail'
data = json_data["retail"]

# Create DataFrame from JSON data
retail_data = pd.DataFrame(json_data["retail"], columns=["transaction_id",
                                 "email",
                                 "account_no",
                                 "product_type",
                                 "quantity",
                                 "unit_price",
                                 "shipping_cost"])

retail_data.head()


Unnamed: 0,transaction_id,email,account_no,product_type,quantity,unit_price,shipping_cost
0,1,user18@rockets.com,E894194JJ481,Jersey,2,96,5.76
1,2,user142@rockets.com,G684186GK636,Misc,5,9,1.35
2,3,user182@rockets.com,X898402TO472,Jersey,3,98,8.82
3,4,user492@rockets.com,R226999ZA574,Jersey,4,104,12.48
4,5,user101@rockets.com,Q640255YC818,Jersey,3,98,8.82


**To Do**

- Create new column of for transaction total
- rename account_no as retail_account_no . This will act as my identifier for fan in retail_data table. Will merge column in master fan data frame
- possibly merge fan unique identifier from unified database table after creation.

In [5]:
# Add new column with transaction total calculation
retail_data["transaction_total"] = retail_data["quantity"] * retail_data["unit_price"] + retail_data["shipping_cost"]
# Rename the 'account_no' column as 'retail_account_no'
retail_data = retail_data.rename(columns={'account_no': 'retail_account_no'})

retail_data

Unnamed: 0,transaction_id,email,retail_account_no,product_type,quantity,unit_price,shipping_cost,transaction_total
0,1,user18@rockets.com,E894194JJ481,Jersey,2,96,5.76,197.76
1,2,user142@rockets.com,G684186GK636,Misc,5,9,1.35,46.35
2,3,user182@rockets.com,X898402TO472,Jersey,3,98,8.82,302.82
3,4,user492@rockets.com,R226999ZA574,Jersey,4,104,12.48,428.48
4,5,user101@rockets.com,Q640255YC818,Jersey,3,98,8.82,302.82
...,...,...,...,...,...,...,...,...
1995,1996,user88@rockets.com,H383584PU325,Hat,2,24,1.44,49.44
1996,1997,user410@rockets.com,M618220JQ428,Misc,6,7,1.26,43.26
1997,1998,user326@rockets.com,L452536ZY996,Jersey,2,104,6.24,214.24
1998,1999,user193@rockets.com,U673743FK544,Misc,6,9,1.62,55.62


In [6]:
retail_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   transaction_id     2000 non-null   int64  
 1   email              2000 non-null   object 
 2   retail_account_no  2000 non-null   object 
 3   product_type       2000 non-null   object 
 4   quantity           2000 non-null   int64  
 5   unit_price         2000 non-null   int64  
 6   shipping_cost      2000 non-null   float64
 7   transaction_total  2000 non-null   float64
dtypes: float64(2), int64(3), object(3)
memory usage: 125.1+ KB


In [7]:
retail_data.describe()

Unnamed: 0,transaction_id,quantity,unit_price,shipping_cost,transaction_total
count,2000.0,2000.0,2000.0,2000.0,2000.0
mean,1000.5,3.485,41.2935,3.4487,115.6337
std,577.494589,2.251291,37.405988,3.114414,109.284326
min,1.0,1.0,3.0,1.0,4.0
25%,500.75,2.0,15.0,1.2,41.2
50%,1000.5,3.0,24.0,2.4,82.4
75%,1500.25,4.0,81.25,4.05,139.05
max,2000.0,10.0,120.0,14.4,494.4


**Takeaways**

- Converted the JSON object to an appropriate dataframe by extracting data from the 'retail' key.
- added new calculation column with the sum of transaction adding subtotal (quanitity * unit_price) and shipping cost.
- renamed 'account_no' column as 'retail_account_no' .

### Ticket Data

In [8]:
ticket_data.head()

Unnamed: 0,transaction_id,account_no,email,zip,phone_no,section,row,qty,total_price,event_id,channel
0,1,A87144476G,user400@rockets.com,77066,280-379-5220,109,9,1,200,3223,Web
1,2,A66578188Z,user141@rockets.com,76673,490-491-8071,101,10,4,800,3221,Box Office
2,3,A11689958W,user98@rockets.com,77031,244-805-9413,100,18,8,1600,3237,Box Office
3,4,A47432461Z,user213@rockets.com,76136,826-458-9773,400,7,1,50,3240,Web
4,5,A80089942I,user472@rockets.com,75559,803-733-6051,414,17,1,25,3215,Box Office


In [9]:
# descriptive statistics of dataframe
ticket_data.describe()

Unnamed: 0,transaction_id,zip,section,row,qty,total_price,event_id
count,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0
mean,5000.5,74865.5226,265.1336,10.5038,4.5127,452.0935,3230.9293
std,2886.89568,9468.587983,150.170437,5.740756,2.30386,439.832019,11.791845
min,1.0,2622.0,100.0,1.0,1.0,10.0,3211.0
25%,2500.75,75757.0,115.0,6.0,2.0,100.0,3221.0
50%,5000.5,77017.0,400.0,11.0,5.0,300.0,3231.0
75%,7500.25,77384.0,415.0,16.0,7.0,750.0,3241.0
max,10000.0,78662.0,430.0,20.0,8.0,1600.0,3251.0


In [10]:
# shows records with 'zip' that have a length of four characters
ticket_data[ticket_data['zip'].astype(str).str.len() == 4]

Unnamed: 0,transaction_id,account_no,email,zip,phone_no,section,row,qty,total_price,event_id,channel
554,555,A50479336F,user482@rockets.com,2622,481-211-1175,406,10,5,250,3241,Box Office
734,735,A50479336F,user482@rockets.com,2622,481-211-1175,101,3,6,1200,3211,Web
1137,1138,A50479336F,user482@rockets.com,2622,481-211-1175,123,7,1,150,3212,Web
1144,1145,A92662306D,user15@rockets.com,2622,767-897-2261,412,10,1,25,3232,Box Office
1247,1248,A50479336F,user482@rockets.com,2622,481-211-1175,110,16,6,1200,3231,BackOffice
1886,1887,A92662306D,user15@rockets.com,2622,767-897-2261,429,3,3,30,3215,Box Office
2451,2452,A50479336F,user482@rockets.com,2622,481-211-1175,403,4,3,150,3227,Box Office
3092,3093,A50479336F,user482@rockets.com,2622,481-211-1175,100,14,1,200,3230,Web
3763,3764,A92662306D,user15@rockets.com,2622,767-897-2261,103,13,7,1400,3233,Box Office
3798,3799,A92662306D,user15@rockets.com,2622,767-897-2261,111,3,4,680,3221,BackOffice


In [11]:
# shows phone numbers that is associated with zip 2622.
pd.DataFrame({'phone_no': ticket_data[ticket_data['zip'] == 2622]['phone_no'].unique()})


Unnamed: 0,phone_no
0,481-211-1175
1,767-897-2261


In [12]:
# verification of email is only associated with zip 2622
ticket_data[ticket_data['email'] == 'user482@rockets.com']

Unnamed: 0,transaction_id,account_no,email,zip,phone_no,section,row,qty,total_price,event_id,channel
554,555,A50479336F,user482@rockets.com,2622,481-211-1175,406,10,5,250,3241,Box Office
734,735,A50479336F,user482@rockets.com,2622,481-211-1175,101,3,6,1200,3211,Web
1137,1138,A50479336F,user482@rockets.com,2622,481-211-1175,123,7,1,150,3212,Web
1247,1248,A50479336F,user482@rockets.com,2622,481-211-1175,110,16,6,1200,3231,BackOffice
2451,2452,A50479336F,user482@rockets.com,2622,481-211-1175,403,4,3,150,3227,Box Office
3092,3093,A50479336F,user482@rockets.com,2622,481-211-1175,100,14,1,200,3230,Web
4890,4891,A50479336F,user482@rockets.com,2622,481-211-1175,402,2,8,400,3222,BackOffice
5385,5386,A50479336F,user482@rockets.com,2622,481-211-1175,113,19,6,1020,3233,BackOffice
5982,5983,A50479336F,user482@rockets.com,2622,481-211-1175,118,10,6,900,3225,Box Office
6092,6093,A50479336F,user482@rockets.com,2622,481-211-1175,117,10,7,1050,3244,BackOffice


In [13]:
# verification of email is only associated with zip 2622
ticket_data[ticket_data['email'] == 'user15@rockets.com']

Unnamed: 0,transaction_id,account_no,email,zip,phone_no,section,row,qty,total_price,event_id,channel
1144,1145,A92662306D,user15@rockets.com,2622,767-897-2261,412,10,1,25,3232,Box Office
1886,1887,A92662306D,user15@rockets.com,2622,767-897-2261,429,3,3,30,3215,Box Office
3763,3764,A92662306D,user15@rockets.com,2622,767-897-2261,103,13,7,1400,3233,Box Office
3798,3799,A92662306D,user15@rockets.com,2622,767-897-2261,111,3,4,680,3221,BackOffice
4297,4298,A92662306D,user15@rockets.com,2622,767-897-2261,114,6,7,1190,3224,Web
5667,5668,A92662306D,user15@rockets.com,2622,767-897-2261,425,14,6,90,3218,Box Office
6548,6549,A92662306D,user15@rockets.com,2622,767-897-2261,403,7,3,150,3218,Box Office
7218,7219,A92662306D,user15@rockets.com,2622,767-897-2261,422,16,7,105,3231,Box Office
7328,7329,A92662306D,user15@rockets.com,2622,767-897-2261,430,15,5,50,3223,Web
8135,8136,A92662306D,user15@rockets.com,2622,767-897-2261,120,5,5,750,3214,BackOffice


After reviewing descriptive statistics:

Zip has a min of 2622. It is unusual for a valid US zipcode to have four characters. After research of zip, it is asscociated with various countries. 

Decided to research each unique phone number associated with zip code.

- 481 area code is associated with Houston, TX.
- 767 area code is associated with the entire island nation of Dominica (The Commonweath of Dominica)

After futher data validation,  fan accounts with emails user15@rockets.com and user482@rockets.com are only associated with zip code 2622.

- These two observations considered anomalies. Best course of action would be to reach out to each fan for data verification of zip code and possible country.

In [14]:
# Rename the 'account_no' column as 'ticketing_account_no'
ticket_data = ticket_data.rename(columns={'account_no': 'ticketing_account_no'})

ticket_data

Unnamed: 0,transaction_id,ticketing_account_no,email,zip,phone_no,section,row,qty,total_price,event_id,channel
0,1,A87144476G,user400@rockets.com,77066,280-379-5220,109,9,1,200,3223,Web
1,2,A66578188Z,user141@rockets.com,76673,490-491-8071,101,10,4,800,3221,Box Office
2,3,A11689958W,user98@rockets.com,77031,244-805-9413,100,18,8,1600,3237,Box Office
3,4,A47432461Z,user213@rockets.com,76136,826-458-9773,400,7,1,50,3240,Web
4,5,A80089942I,user472@rockets.com,75559,803-733-6051,414,17,1,25,3215,Box Office
...,...,...,...,...,...,...,...,...,...,...,...
9995,9996,A98035804M,user332@rockets.com,77616,524-512-1663,422,17,6,90,3244,Web
9996,9997,A62759828F,user146@rockets.com,76853,862-357-3734,400,15,8,400,3221,Web
9997,9998,A96104538T,user222@rockets.com,77086,840-386-8705,115,11,3,510,3236,Box Office
9998,9999,A38147058N,user495@rockets.com,76135,290-551-1299,403,18,7,350,3238,BackOffice


**Takeaways**

- Discovered two anomalies in the records due to unsual zip code 2622 . Fan accounts with emails user15@rockets.com and user482@rockets.com are only associated with zip code 2622. Will leave records as is for data integrity. Recommendation is to reach out to account holders for verification of information and update if neccessary.

- Renamed column 'account_no' as 'ticketing_account_no'

### Survey data 

In [264]:
survey_data.head(20)

Unnamed: 0,Submission ID,Attribute,Value
0,1,phone_no,290-551-1299
1,1,event_id,3220
2,1,how_satisfied_were_you_with_this_event,2
3,1,how_satisfied_were_you_with_your_retail_experi...,3
4,1,how_likely_are_you_to_attend_this_event_in_the...,5 - Very Likely
5,1,what_is_your_birthdate,33939
6,1,what_is_your_household_income,"Less than $50,000"
7,1,what_is_your_highest_level_of_education_that_y...,Associate's Degree
8,2,phone_no,663-795-4865
9,2,event_id,3242


**Things To Do**

- Pivot dataframe
- normalize necessary columns for uniformity
- merge 'email' into dataframe and possibly unique fan id after creation.

In [265]:
survey_data = survey_data.pivot(index= 'Submission ID', columns= 'Attribute')

In [266]:
survey_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1500 entries, 1 to 1500
Data columns (total 8 columns):
 #   Column                                                                     Non-Null Count  Dtype 
---  ------                                                                     --------------  ----- 
 0   (Value, event_id)                                                          1500 non-null   object
 1   (Value, how_likely_are_you_to_attend_this_event_in_the_future)             1500 non-null   object
 2   (Value, how_satisfied_were_you_with_this_event)                            1500 non-null   object
 3   (Value, how_satisfied_were_you_with_your_retail_experience_at_this_event)  1500 non-null   object
 4   (Value, phone_no)                                                          1500 non-null   object
 5   (Value, what_is_your_birthdate)                                            1500 non-null   object
 6   (Value, what_is_your_highest_level_of_education_that_you_have_att

In [268]:
survey_data.head(5)

Unnamed: 0_level_0,Value,Value,Value,Value,Value,Value,Value,Value
Attribute,event_id,how_likely_are_you_to_attend_this_event_in_the_future,how_satisfied_were_you_with_this_event,how_satisfied_were_you_with_your_retail_experience_at_this_event,phone_no,what_is_your_birthdate,what_is_your_highest_level_of_education_that_you_have_attained,what_is_your_household_income
Submission ID,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2
1,3220,5 - Very Likely,2,3,290-551-1299,33939,Associate's Degree,"Less than $50,000"
2,3242,1 - Very Unlikely,5 - Very Satisfied,3,663-795-4865,21535,Vocational School,"Less than $50,000"
3,3217,3,4,2,674-251-1148,35693,Graduate Degree,"$100,000 - $149,000"
4,3215,4,3,3,728-127-6014,37384,Vocational School,"Less than $50,000"
5,3237,3,5 - Very Satisfied,3,238-199-2712,22531,Vocational School,"$250,00 or more"


In [269]:
survey_data.columns

MultiIndex([('Value',                                                         'event_id'),
            ('Value',            'how_likely_are_you_to_attend_this_event_in_the_future'),
            ('Value',                           'how_satisfied_were_you_with_this_event'),
            ('Value', 'how_satisfied_were_you_with_your_retail_experience_at_this_event'),
            ('Value',                                                         'phone_no'),
            ('Value',                                           'what_is_your_birthdate'),
            ('Value',   'what_is_your_highest_level_of_education_that_you_have_attained'),
            ('Value',                                    'what_is_your_household_income')],
           names=[None, 'Attribute'])

In [27]:
# Get value counts for a specific column
column_name = 'how_likely_are_you_to_attend_this_event_in_the_future'
survey_data[column_name].value_counts()

KeyError: 'how_likely_are_you_to_attend_this_event_in_the_future'

**Takeaways**

- Pivot dataframe to have a shape of 1500 rows and 8 columns. Original shape was 12,000 rows and 3 columns.
- 

**To Do**

- Comeback to wrangling data since having issues with MultiIndex object

### Create Acquire function

In [14]:
def import_data(json_file, csv_file1, csv_file2):
    # Import JSON file into a dataframe
    with open(json_file) as f:
        json_data = json.load(f)
    retail_data = pd.DataFrame(json_data["retail"], columns=["transaction_id", "email", "account_no", "product_type", "quantity", "unit_price", "shipping_cost"])

    # Import CSV files into dataframes
    survey_data = pd.read_csv(csv_file1)
    ticket_data = pd.read_csv(csv_file2)
    
    return retail_data, survey_data, ticket_data

In this code, the import_data function takes three file paths as input: json_file, csv_file1, and csv_file2. It reads the JSON file and creates a DataFrame retail_data from the specified JSON data. It also imports the two CSV files and creates separate DataFrames survey_data and ticket_data. Finally, the function returns these three dataframes.

In [20]:
# Usage of function
json_file = 'retail.json'
csv_file1 = 'surveys.csv'
csv_file2 = 'tickets.csv'

retail_data, survey_data, ticket_data = import_data(json_file, csv_file1, csv_file2)

In [16]:
retail_data

Unnamed: 0,transaction_id,email,account_no,product_type,quantity,unit_price,shipping_cost
0,1,user18@rockets.com,E894194JJ481,Jersey,2,96,5.76
1,2,user142@rockets.com,G684186GK636,Misc,5,9,1.35
2,3,user182@rockets.com,X898402TO472,Jersey,3,98,8.82
3,4,user492@rockets.com,R226999ZA574,Jersey,4,104,12.48
4,5,user101@rockets.com,Q640255YC818,Jersey,3,98,8.82
...,...,...,...,...,...,...,...
1995,1996,user88@rockets.com,H383584PU325,Hat,2,24,1.44
1996,1997,user410@rockets.com,M618220JQ428,Misc,6,7,1.26
1997,1998,user326@rockets.com,L452536ZY996,Jersey,2,104,6.24
1998,1999,user193@rockets.com,U673743FK544,Misc,6,9,1.62


In [17]:
ticket_data

Unnamed: 0,transaction_id,account_no,email,zip,phone_no,section,row,qty,total_price,event_id,channel
0,1,A87144476G,user400@rockets.com,77066,280-379-5220,109,9,1,200,3223,Web
1,2,A66578188Z,user141@rockets.com,76673,490-491-8071,101,10,4,800,3221,Box Office
2,3,A11689958W,user98@rockets.com,77031,244-805-9413,100,18,8,1600,3237,Box Office
3,4,A47432461Z,user213@rockets.com,76136,826-458-9773,400,7,1,50,3240,Web
4,5,A80089942I,user472@rockets.com,75559,803-733-6051,414,17,1,25,3215,Box Office
...,...,...,...,...,...,...,...,...,...,...,...
9995,9996,A98035804M,user332@rockets.com,77616,524-512-1663,422,17,6,90,3244,Web
9996,9997,A62759828F,user146@rockets.com,76853,862-357-3734,400,15,8,400,3221,Web
9997,9998,A96104538T,user222@rockets.com,77086,840-386-8705,115,11,3,510,3236,Box Office
9998,9999,A38147058N,user495@rockets.com,76135,290-551-1299,403,18,7,350,3238,BackOffice


In [18]:
survey_data

Unnamed: 0,Submission ID,Attribute,Value
0,1,phone_no,290-551-1299
1,1,event_id,3220
2,1,how_satisfied_were_you_with_this_event,2
3,1,how_satisfied_were_you_with_your_retail_experi...,3
4,1,how_likely_are_you_to_attend_this_event_in_the...,5 - Very Likely
...,...,...,...
11995,1500,how_satisfied_were_you_with_your_retail_experi...,1 - Very Dissatisfied
11996,1500,how_likely_are_you_to_attend_this_event_in_the...,4
11997,1500,what_is_your_birthdate,16911
11998,1500,what_is_your_household_income,"$100,000 - $149,000"


**Takeaways**
 - Created acquire function that imports all data tables from files.

### Create prepare function of source data tables

In [None]:
# Wrangle Retail_data

# Add new column with transaction total calculation
retail_data["transaction_total"] = retail_data["quantity"] * retail_data["unit_price"] + retail_data["shipping_cost"]
# Rename the 'account_no' column as 'retail_account_no'
retail_data = retail_data.rename(columns={'account_no': 'retail_account_no'})

retail_data

## Prepare master dataframe

In [233]:
# Concatenate the 'email' columns from ticket_data and retail_data dataframes
emails = pd.concat([retail_data['email'], ticket_data['email']])

# Create a new dataframe with unique email records
new_df = pd.DataFrame({'email': emails.unique()})

# Display the new dataframe
new_df

Unnamed: 0,email
0,user18@rockets.com
1,user142@rockets.com
2,user182@rockets.com
3,user492@rockets.com
4,user101@rockets.com
...,...
495,user21@rockets.com
496,user262@rockets.com
497,user164@rockets.com
498,user26@rockets.com


In [234]:
# Merge phone_no, zip, retail_account_no, and ticketing_account_no columns from ticket_data and retail_data based on email

new_df = new_df.merge(ticket_data[['email', 'phone_no', 'zip', 'ticketing_account_no']], on='email', how='left')

new_df = new_df.merge(retail_data[['email',  'retail_account_no']], on='email', how='left')

new_df.head(5)

Unnamed: 0,email,phone_no,zip,ticketing_account_no,retail_account_no
0,user18@rockets.com,645-680-2091,77012,A37155507A,E894194JJ481
1,user18@rockets.com,645-680-2091,77012,A37155507A,E894194JJ481
2,user18@rockets.com,645-680-2091,77012,A37155507A,E894194JJ481
3,user18@rockets.com,645-680-2091,77012,A37155507A,E894194JJ481
4,user18@rockets.com,645-680-2091,77012,A37155507A,E894194JJ481


In [77]:
new_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 40185 entries, 0 to 40184
Data columns (total 5 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   email                 40185 non-null  object
 1   phone_no              40185 non-null  object
 2   zip                   40185 non-null  int64 
 3   ticketing_account_no  40185 non-null  object
 4   retail_account_no     40080 non-null  object
dtypes: int64(1), object(4)
memory usage: 1.8+ MB


**Takeaways**

- Dataframe has 5 null values in retail_account_no after merge of dataframes

In [117]:
# Show rows with nulls in retail_account_no
new_df[new_df.isnull().any(axis=1)]

Unnamed: 0,email,phone_no,zip,ticketing_account_no,retail_account_no
40080,user21@rockets.com,276-299-4595,75017,A83268502Y,
40081,user21@rockets.com,276-299-4595,75017,A83268502Y,
40082,user21@rockets.com,276-299-4595,75017,A83268502Y,
40083,user21@rockets.com,276-299-4595,75017,A83268502Y,
40084,user21@rockets.com,276-299-4595,75017,A83268502Y,
...,...,...,...,...,...
40180,user419@rockets.com,279-976-7138,77088,A25263675U,
40181,user419@rockets.com,279-976-7138,77088,A25263675U,
40182,user419@rockets.com,279-976-7138,77088,A25263675U,
40183,user419@rockets.com,279-976-7138,77088,A25263675U,


In [235]:
# Fill null values in the specified column with 'N0RAcc0unt'. Value represent fan does not have retail account.
new_df['retail_account_no'] = new_df['retail_account_no'].fillna('N0RAcc0unt')

In [81]:
# Show rows with nulls in retail_account_no
new_df[new_df.isnull().any(axis=1)]

Unnamed: 0,email,phone_no,zip,ticketing_account_no,retail_account_no


In [236]:
# Drop duplicates based on the email column
new_df = new_df.drop_duplicates(subset=['email'])

In [186]:
# Verification of count of " N0RAcc0unt " in column
new_df['retail_account_no'].value_counts()

N0RAcc0unt      5
V599165AM177    1
W248949LJ628    1
A843898GQ574    1
D237311HB771    1
               ..
F288696BQ421    1
J463594PX468    1
Y197729TC489    1
P361796ZT793    1
I726546OI406    1
Name: retail_account_no, Length: 496, dtype: int64

In [119]:
# show records of fans with no retail account
new_df[new_df['retail_account_no'] == 'N0RAcc0unt']

Unnamed: 0,email,phone_no,zip,ticketing_account_no,retail_account_no


In [237]:
# Reset the index of the dataframe
new_df = new_df.reset_index(drop=True)

new_df

Unnamed: 0,email,phone_no,zip,ticketing_account_no,retail_account_no
0,user18@rockets.com,645-680-2091,77012,A37155507A,E894194JJ481
1,user142@rockets.com,477-236-9428,77011,A53787758X,G684186GK636
2,user182@rockets.com,379-724-3829,77459,A53911439N,X898402TO472
3,user492@rockets.com,585-912-9278,77388,A62517740E,R226999ZA574
4,user101@rockets.com,226-568-7645,77388,A82383061F,Q640255YC818
...,...,...,...,...,...
495,user21@rockets.com,276-299-4595,75017,A83268502Y,N0RAcc0unt
496,user262@rockets.com,517-744-7036,77588,A39532138K,N0RAcc0unt
497,user164@rockets.com,608-312-1646,75759,A85827627A,N0RAcc0unt
498,user26@rockets.com,475-564-4177,77014,A58044447J,N0RAcc0unt


In [238]:
# Create a unique ID column using 'clutch' and location in the dataframe
new_df['unique_id'] = 'clutch_' + (new_df.index + 1).astype(str)

new_df

Unnamed: 0,email,phone_no,zip,ticketing_account_no,retail_account_no,unique_id
0,user18@rockets.com,645-680-2091,77012,A37155507A,E894194JJ481,clutch_1
1,user142@rockets.com,477-236-9428,77011,A53787758X,G684186GK636,clutch_2
2,user182@rockets.com,379-724-3829,77459,A53911439N,X898402TO472,clutch_3
3,user492@rockets.com,585-912-9278,77388,A62517740E,R226999ZA574,clutch_4
4,user101@rockets.com,226-568-7645,77388,A82383061F,Q640255YC818,clutch_5
...,...,...,...,...,...,...
495,user21@rockets.com,276-299-4595,75017,A83268502Y,N0RAcc0unt,clutch_496
496,user262@rockets.com,517-744-7036,77588,A39532138K,N0RAcc0unt,clutch_497
497,user164@rockets.com,608-312-1646,75759,A85827627A,N0RAcc0unt,clutch_498
498,user26@rockets.com,475-564-4177,77014,A58044447J,N0RAcc0unt,clutch_499


In [239]:
# Merge 'new_df' with the sum of 'transaction_total' per 'retail_account_no' from the 'retail_data' DataFrame, and merge the sum of 'total_price' per 'ticketing_account_no' from the 'ticket_data' DataFrame
new_df = new_df.merge(retail_data.groupby('retail_account_no')['transaction_total'].sum().reset_index().rename(columns={'transaction_total': 'retail_spent_sum'}), on='retail_account_no', how='left') \
                .merge(ticket_data.groupby('ticketing_account_no')['total_price'].sum().reset_index().rename(columns={'total_price': 'ticket_spent_sum'}), on='ticketing_account_no', how='left')


In [243]:
# Replace NaN with 0 in column
new_df['retail_spent_sum'].fillna(0, inplace=True)

In [245]:
# Create new column with sum of two columns
new_df['overall_sum'] = new_df['retail_spent_sum'] + new_df['ticket_spent_sum']

In [248]:
# Get sum of 'qty' per 'ticketing_account_no' and merge it into 'new_df' as 'total_qty' column
new_df = new_df.merge(ticket_data.groupby('ticketing_account_no')['qty'].sum().reset_index().rename(columns={'qty': 'total_tickets_purchased'}), on='ticketing_account_no', how='left') 
# Get average of 'ticket_spent_sum' per 'ticketing_account_no' and merge it into 'new_df' as 'avg_per_ticket' column
new_df['avg_per_ticket'] = new_df['ticket_spent_sum']/ new_df['total_tickets_purchased']


In [251]:
# Updates column to round to 2 decimal places
new_df['avg_per_ticket'] = new_df['avg_per_ticket'].round(2)

In [254]:
# # Group by 'ticketing_account_no', find mode of 'section', merge and create 'favorite_section' column in 'new_df'
new_df = new_df.merge(ticket_data.groupby('ticketing_account_no')['section'].agg(lambda x: x.mode().iat[0]).reset_index().rename(columns={'section': 'favorite_section'}), on='ticketing_account_no', how='left')


In [255]:
new_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 500 entries, 0 to 499
Data columns (total 12 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   email                    500 non-null    object 
 1   phone_no                 500 non-null    object 
 2   zip                      500 non-null    int64  
 3   ticketing_account_no     500 non-null    object 
 4   retail_account_no        500 non-null    object 
 5   unique_id                500 non-null    object 
 6   retail_spent_sum         500 non-null    float64
 7   ticket_spent_sum         500 non-null    int64  
 8   overall_sum              500 non-null    float64
 9   total_tickets_purchased  500 non-null    int64  
 10  avg_per_ticket           500 non-null    float64
 11  favorite_section         500 non-null    int64  
dtypes: float64(3), int64(4), object(5)
memory usage: 50.8+ KB


In [256]:
new_df

Unnamed: 0,email,phone_no,zip,ticketing_account_no,retail_account_no,unique_id,retail_spent_sum,ticket_spent_sum,overall_sum,total_tickets_purchased,avg_per_ticket,favorite_section
0,user18@rockets.com,645-680-2091,77012,A37155507A,E894194JJ481,clutch_1,383.16,4685,5068.16,50,93.70,118
1,user142@rockets.com,477-236-9428,77011,A53787758X,G684186GK636,clutch_2,775.59,8460,9235.59,64,132.19,100
2,user182@rockets.com,379-724-3829,77459,A53911439N,X898402TO472,clutch_3,478.42,8655,9133.42,103,84.03,120
3,user492@rockets.com,585-912-9278,77388,A62517740E,R226999ZA574,clutch_4,1434.79,5310,6744.79,51,104.12,103
4,user101@rockets.com,226-568-7645,77388,A82383061F,Q640255YC818,clutch_5,572.68,13340,13912.68,132,101.06,121
...,...,...,...,...,...,...,...,...,...,...,...,...
495,user21@rockets.com,276-299-4595,75017,A83268502Y,N0RAcc0unt,clutch_496,0.00,8670,8670.00,80,108.38,105
496,user262@rockets.com,517-744-7036,77588,A39532138K,N0RAcc0unt,clutch_497,0.00,12160,12160.00,106,114.72,408
497,user164@rockets.com,608-312-1646,75759,A85827627A,N0RAcc0unt,clutch_498,0.00,15895,15895.00,158,100.60,109
498,user26@rockets.com,475-564-4177,77014,A58044447J,N0RAcc0unt,clutch_499,0.00,7930,7930.00,81,97.90,105


## prepare function for master dataframe

In [None]:
# Concatenate the 'email' columns from ticket_data and retail_data dataframes
emails = pd.concat([retail_data['email'], ticket_data['email']])

# Create a new dataframe with unique email records
new_df = pd.DataFrame({'email': emails.unique()})

# Merge phone_no, zip, retail_account_no, and ticketing_account_no columns from ticket_data and retail_data based on email

new_df = new_df.merge(ticket_data[['email', 'phone_no', 'zip', 'ticketing_account_no']], on='email', how='left')

new_df = new_df.merge(retail_data[['email',  'retail_account_no']], on='email', how='left')

# Fill null values in the specified column with 'N0RAcc0unt'. Value represent fan does not have retail account.
new_df['retail_account_no'] = new_df['retail_account_no'].fillna('N0RAcc0unt')

# Drop duplicates based on the email column
new_df = new_df.drop_duplicates(subset=['email'])

# Reset the index of the dataframe
new_df = new_df.reset_index(drop=True)

# Create a unique ID column using 'clutch' and location in the dataframe
new_df['unique_id'] = 'clutch_' + (new_df.index + 1).astype(str)

# Merge 'new_df' with the sum of 'transaction_total' per 'retail_account_no' from the 'retail_data' DataFrame, and merge the sum of 'total_price' per 'ticketing_account_no' from the 'ticket_data' DataFrame
new_df = new_df.merge(retail_data.groupby('retail_account_no')['transaction_total'].sum().reset_index().rename(columns={'transaction_total': 'retail_spent_sum'}), on='retail_account_no', how='left') \
                .merge(ticket_data.groupby('ticketing_account_no')['total_price'].sum().reset_index().rename(columns={'total_price': 'ticket_spent_sum'}), on='ticketing_account_no', how='left')


# Replace NaN with 0 in column
new_df['retail_spent_sum'].fillna(0, inplace=True)

# Create new column with sum of two columns
new_df['overall_sum'] = new_df['retail_spent_sum'] + new_df['ticket_spent_sum']

# Get sum of 'qty' per 'ticketing_account_no' and merge it into 'new_df' as 'total_qty' column
new_df = new_df.merge(ticket_data.groupby('ticketing_account_no')['qty'].sum().reset_index().rename(columns={'qty': 'total_tickets_purchased'}), on='ticketing_account_no', how='left') 
# Get average of 'ticket_spent_sum' per 'ticketing_account_no' and merge it into 'new_df' as 'avg_per_ticket' column
new_df['avg_per_ticket'] = new_df['ticket_spent_sum']/ new_df['total_tickets_purchased']


# Updates column to round to 2 decimal places
new_df['avg_per_ticket'] = new_df['avg_per_ticket'].round(2)

# # Group by 'ticketing_account_no', find mode of 'section', merge and create 'favorite_section' column in 'new_df'
new_df = new_df.merge(ticket_data.groupby('ticketing_account_no')['section'].agg(lambda x: x.mode().iat[0]).reset_index().rename(columns={'section': 'favorite_section'}), on='ticketing_account_no', how='left')


## Number of ticket transactions per fan

In [219]:
ticket_data.head()

Unnamed: 0,transaction_id,ticketing_account_no,email,zip,phone_no,section,row,qty,total_price,event_id,channel
0,1,A87144476G,user400@rockets.com,77066,280-379-5220,109,9,1,200,3223,Web
1,2,A66578188Z,user141@rockets.com,76673,490-491-8071,101,10,4,800,3221,Box Office
2,3,A11689958W,user98@rockets.com,77031,244-805-9413,100,18,8,1600,3237,Box Office
3,4,A47432461Z,user213@rockets.com,76136,826-458-9773,400,7,1,50,3240,Web
4,5,A80089942I,user472@rockets.com,75559,803-733-6051,414,17,1,25,3215,Box Office


In [105]:
ticket_data.groupby('ticketing_account_no').size()

ticketing_account_no
A10151818T    21
A10198659Y    19
A10266428Y    27
A10295405G    24
A10507821G    16
              ..
A99669635A     9
A99696613F    20
A99788444F    19
A99795977T    19
A99941524R    24
Length: 500, dtype: int64

In [260]:
ticket_data['ticketing_account_no'].value_counts()

A11118594C    36
A13295293F    33
A33098988R    33
A21469596P    33
A61704102Z    33
              ..
A69182849N    10
A54472348P    10
A99669635A     9
A62125724P     8
A26374241O     8
Name: ticketing_account_no, Length: 500, dtype: int64

In [122]:
# Create a new column in 'summary_df' with the transaction counts
new_df['ticket_transaction_count'] = ticket_data['ticketing_account_no'].map(ticket_data.groupby('ticketing_account_no').size())

In [104]:
pd.DataFrame(ticket_data.groupby('ticketing_account_no').size())

Unnamed: 0_level_0,0
ticketing_account_no,Unnamed: 1_level_1
A10151818T,21
A10198659Y,19
A10266428Y,27
A10295405G,24
A10507821G,16
...,...
A99669635A,9
A99696613F,20
A99788444F,19
A99795977T,19


In [103]:
len(ticket_data[ticket_data['ticketing_account_no'] == 'A10151818T'])

21

## Number of retail transactions per fan

In [125]:
# Create a new column in 'summary_df' with the transaction counts
new_df['retail_transaction_count'] = retail_data['retail_account_no'].map(retail_data.groupby('retail_account_no').size())

In [126]:
new_df

Unnamed: 0,email,phone_no,zip,ticketing_account_no,retail_account_no,unique_id,ticket_transaction_count,retail_transaction_count
0,user18@rockets.com,645-680-2091,77012,A37155507A,E894194JJ481,clutch_1,12,4
1,user142@rockets.com,477-236-9428,77011,A53787758X,G684186GK636,clutch_2,17,3
2,user182@rockets.com,379-724-3829,77459,A53911439N,X898402TO472,clutch_3,20,4
3,user492@rockets.com,585-912-9278,77388,A62517740E,R226999ZA574,clutch_4,27,5
4,user101@rockets.com,226-568-7645,77388,A82383061F,Q640255YC818,clutch_5,20,4
...,...,...,...,...,...,...,...,...
495,user21@rockets.com,276-299-4595,75017,A83268502Y,,clutch_496,23,8
496,user262@rockets.com,517-744-7036,77588,A39532138K,,clutch_497,21,4
497,user164@rockets.com,608-312-1646,75759,A85827627A,,clutch_498,26,3
498,user26@rockets.com,475-564-4177,77014,A58044447J,,clutch_499,24,5


In [106]:
new_df[new_df['ticketing_account_no'] == 'A10151818T']

Unnamed: 0,email,phone_no,zip,ticketing_account_no,retail_account_no,unique_id,ticket_transaction_count
361,user342@rockets.com,705-179-8425,76865,A10151818T,B312576AQ425,clutch_362,21


In [107]:
retail_data

Unnamed: 0,transaction_id,email,retail_account_no,product_type,quantity,unit_price,shipping_cost,transaction_total
0,1,user18@rockets.com,E894194JJ481,Jersey,2,96,5.76,197.76
1,2,user142@rockets.com,G684186GK636,Misc,5,9,1.35,46.35
2,3,user182@rockets.com,X898402TO472,Jersey,3,98,8.82,302.82
3,4,user492@rockets.com,R226999ZA574,Jersey,4,104,12.48,428.48
4,5,user101@rockets.com,Q640255YC818,Jersey,3,98,8.82,302.82
...,...,...,...,...,...,...,...,...
1995,1996,user88@rockets.com,H383584PU325,Hat,2,24,1.44,49.44
1996,1997,user410@rockets.com,M618220JQ428,Misc,6,7,1.26,43.26
1997,1998,user326@rockets.com,L452536ZY996,Jersey,2,104,6.24,214.24
1998,1999,user193@rockets.com,U673743FK544,Misc,6,9,1.62,55.62


In [128]:
new_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 8 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   email                     500 non-null    object
 1   phone_no                  500 non-null    object
 2   zip                       500 non-null    int64 
 3   ticketing_account_no      500 non-null    object
 4   retail_account_no         495 non-null    object
 5   unique_id                 500 non-null    object
 6   ticket_transaction_count  500 non-null    int64 
 7   retail_transaction_count  500 non-null    int64 
dtypes: int64(3), object(5)
memory usage: 31.4+ KB


In [129]:
# Show rows with nulls in retail_account_no
new_df[new_df.isnull().any(axis=1)]

Unnamed: 0,email,phone_no,zip,ticketing_account_no,retail_account_no,unique_id,ticket_transaction_count,retail_transaction_count
495,user21@rockets.com,276-299-4595,75017,A83268502Y,,clutch_496,23,8
496,user262@rockets.com,517-744-7036,77588,A39532138K,,clutch_497,21,4
497,user164@rockets.com,608-312-1646,75759,A85827627A,,clutch_498,26,3
498,user26@rockets.com,475-564-4177,77014,A58044447J,,clutch_499,24,5
499,user419@rockets.com,279-976-7138,77088,A25263675U,,clutch_500,19,5


In [130]:
retail_data[retail_data['email'] == 'user21@rockets.com']

Unnamed: 0,transaction_id,email,retail_account_no,product_type,quantity,unit_price,shipping_cost,transaction_total


In [112]:
new_df[new_df['retail_account_no'] == 'N0RAcc0unt']

Unnamed: 0,email,phone_no,zip,ticketing_account_no,retail_account_no,unique_id,ticket_transaction_count,retail_transaction_count
495,user21@rockets.com,276-299-4595,75017,A83268502Y,N0RAcc0unt,clutch_496,23,8
496,user262@rockets.com,517-744-7036,77588,A39532138K,N0RAcc0unt,clutch_497,21,4
497,user164@rockets.com,608-312-1646,75759,A85827627A,N0RAcc0unt,clutch_498,26,3
498,user26@rockets.com,475-564-4177,77014,A58044447J,N0RAcc0unt,clutch_499,24,5
499,user419@rockets.com,279-976-7138,77088,A25263675U,N0RAcc0unt,clutch_500,19,5


## Number of survey responses per fan

In [262]:
survey_data.head(30)

Unnamed: 0_level_0,Value,Value,Value,Value,Value,Value,Value,Value
Attribute,event_id,how_likely_are_you_to_attend_this_event_in_the_future,how_satisfied_were_you_with_this_event,how_satisfied_were_you_with_your_retail_experience_at_this_event,phone_no,what_is_your_birthdate,what_is_your_highest_level_of_education_that_you_have_attained,what_is_your_household_income
Submission ID,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2
1,3220,5 - Very Likely,2,3,290-551-1299,33939,Associate's Degree,"Less than $50,000"
2,3242,1 - Very Unlikely,5 - Very Satisfied,3,663-795-4865,21535,Vocational School,"Less than $50,000"
3,3217,3,4,2,674-251-1148,35693,Graduate Degree,"$100,000 - $149,000"
4,3215,4,3,3,728-127-6014,37384,Vocational School,"Less than $50,000"
5,3237,3,5 - Very Satisfied,3,238-199-2712,22531,Vocational School,"$250,00 or more"
6,3245,4,2,2,219-685-5588,15017,Vocational School,"$100,000 - $149,000"
7,3211,4,5 - Very Satisfied,5 - Very Satisfied,481-518-8887,30514,High school,"$150,000 - $199,000"
8,3218,3,4,4,412-417-3174,36167,Graduate Degree,"$100,000 - $149,000"
9,3220,5 - Very Likely,2,3,286-242-7937,21380,Bachelor's Degree,"$50,000 - $99,999"
10,3233,2,2,2,891-415-5232,25272,Graduate Degree,"Less than $50,000"


In [194]:
new_df

Unnamed: 0,transaction_id,ticketing_account_no,email,zip,phone_no,section,row,qty,total_price,event_id,channel
0,1,A87144476G,user400@rockets.com,77066,280-379-5220,109,9,1,200,3223,Web
1,2,A66578188Z,user141@rockets.com,76673,490-491-8071,101,10,4,800,3221,Box Office
2,3,A11689958W,user98@rockets.com,77031,244-805-9413,100,18,8,1600,3237,Box Office
3,4,A47432461Z,user213@rockets.com,76136,826-458-9773,400,7,1,50,3240,Web
4,5,A80089942I,user472@rockets.com,75559,803-733-6051,414,17,1,25,3215,Box Office
5,6,A17048992R,user458@rockets.com,77842,564-790-3863,127,20,2,300,3211,Web
6,7,A53621260D,user329@rockets.com,78640,331-192-2512,409,15,7,350,3238,Web
7,8,A39019755T,user391@rockets.com,38587,252-878-3695,427,8,6,60,3223,Box Office
8,9,A20318281V,user444@rockets.com,77201,758-207-1067,425,3,7,105,3214,BackOffice
9,10,A25487937E,user185@rockets.com,77507,884-924-4272,106,9,3,600,3213,BackOffice
