
# Introduction

This notebook walks through a few examples of cleaning data and performing an exploratory data analysis on a synthetic dataset.

In [1]:
# Import libraries

import pandas as pd
print('Imported Successfully')

Imported Successfully


The dataset was manually edited to include a few examples of unclean data so that it can be used to practice data cleaning.

In [2]:
# Load the dataframe and view its shape

customers = pd.read_csv('/kaggle/input/synthetic-customer-data/customers - cleaning.csv')
customers.shape

(777, 10)

In [3]:
# Preview dataframe

customers

Unnamed: 0,Customer ID,Name,Email,Phone Number,Street Address,City,State,Zip Code,Sales Rep,Subscription
0,1,Kathryn Williams,kathrynwilliams@samplemail.com,(550) 269-8345,6122 Debra Court,Stewartville,Hawaii,55169,Patricia Escobar,Standard
1,2,Maxwell Meza,maxwellmeza@samplemail.com,(712) 706-8059,740 Bean Station,Lake Aprilton,Maine,9073,Thomas Murray,Standard
2,3,Jamie Crawford,jamiecrawford@samplemail.com,(278) 738-1122,65114 Tracy Track Suite 604,South Sarahbury,North Dakota,51657,Amber Taylor,Plus
3,4,Raven Hernandez,ravenhernandez@samplemail.com,(376) 539-3142,45083 Cunningham Drive,Jasonstad,Hawaii,13042,Ashley Collins,Standard
4,5,Robert Brown,robertbrown@samplemail.com,(368) 263-0915,3965 Jay Ford,New Ann,New Hampshire,50080,Samantha Horton,Plus
...,...,...,...,...,...,...,...,...,...,...
772,769,Nancy Cole,nancycole@samplemail.com,(134) 653-8529,735 Sutton Square,Lake Meganville,Pennsylvania,47331,Ashley Collins,Plus
773,770,Samuel Barrett,samuelbarrett@samplemail.com,(289) 571-9595,53731 Fitzgerald Keys,South David,North Carolina,22817,Bradley Todd,Standard
774,771,Alex Hayes,alexhayes@samplemail.com,(123) 679-2096,313 Deborah Prairie,Port Joy,California,66171,Samantha Graham,Standard
775,772,Roberto Kennedy,robertokennedy@samplemail.com,(852) 816-8525,48959 Kim Field,South Emily,Texas,94465,Ashley Collins,Standard


# Cleaning the Data

First, let's look for duplicates.

In [4]:
# See duplicates

duplicates = customers[customers.duplicated()]
print('Number of duplicates:', customers.duplicated().sum())
duplicates

Number of duplicates: 4


Unnamed: 0,Customer ID,Name,Email,Phone Number,Street Address,City,State,Zip Code,Sales Rep,Subscription
78,78,Daisy Webb,daisywebb@samplemail.com,(941) 622-4690,1137 West Mission,Jerryport,Oklahoma,27259,Tara Peck,Plus
169,168,Jennifer Jones,jenniferjones@samplemail.com,(812) 644-5861,31820 Donald Extension Apt. 976,Lake Suzannefurt,Illinois,98408,Richard Moore,Premium
227,225,William Gonzalez,williamgonzalez@samplemail.com,(337) 497-9812,5950 Cooper Trail Suite 952,Ryanborough,Nevada,81451,Matthew Hunt,Plus
432,429,Jason Davis,jasondavis@samplemail.com,(852) 493-9234,93693 Mark Spur,Alexandriaton,New Jersey,90023,Benjamin Lara,Premium


In [5]:
# Drop duplicates

customers = customers.drop_duplicates()
customers = customers.reset_index(drop=True)
print('Number of duplicates:', customers.duplicated().sum())

Number of duplicates: 0
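Before dropping anything, it can help to see every copy of a duplicated row, not just the later ones. The sketch below uses `duplicated(keep=False)` for that. The `toy` frame and its rows are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data: row 2 is an exact copy of row 0
toy = pd.DataFrame({
    'Customer ID': [1, 2, 1, 3],
    'Name': ['Ann Lee', 'Bo Park', 'Ann Lee', 'Cy Diaz'],
})

# keep=False flags every row that belongs to a duplicate group
all_copies = toy[toy.duplicated(keep=False)]
print(all_copies)

# drop_duplicates keeps the first occurrence by default
deduped = toy.drop_duplicates().reset_index(drop=True)
print(len(deduped))
```

Passing `subset=[...]` to either method would restrict the comparison to the listed columns, which is useful when only some columns identify a record.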


Now let's see if there are any missing values.

In [6]:
# See number of null values

customers.isna().sum()

Customer ID       0
Name              0
Email             0
Phone Number      2
Street Address    2
City              0
State             0
Zip Code          0
Sales Rep         0
Subscription      0
dtype: int64

Two phone numbers and two street addresses are missing. One option is to flag these rows so that whoever maintains the data can contact the customers and fill in the gaps. For now, I will simply replace the missing values with 'n/a'.

In [7]:
# Fill in empty values with 'n/a'

customers = customers.fillna('n/a')
customers.isna().sum()

Customer ID       0
Name              0
Email             0
Phone Number      0
Street Address    0
City              0
State             0
Zip Code          0
Sales Rep         0
Subscription      0
dtype: int64
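If the incomplete rows are going to be handed to whoever maintains the data, one possible approach is to pull them out before filling anything. This is a minimal sketch on a made-up `toy` frame (the names and values are hypothetical). It also shows `fillna` with a dictionary, which fills only the named columns.

```python
import pandas as pd

# Hypothetical toy data with one missing phone number and one missing address
toy = pd.DataFrame({
    'Name': ['Ann Lee', 'Bo Park', 'Cy Diaz'],
    'Phone Number': ['(555) 010-0001', None, '(555) 010-0003'],
    'Street Address': ['1 Main St', '2 Oak Ave', None],
})

# Rows with at least one missing value, for someone to follow up on
needs_follow_up = toy[toy.isna().any(axis=1)]
print(needs_follow_up['Name'].tolist())

# Fill only the columns known to have gaps, leaving other columns untouched
filled = toy.fillna({'Phone Number': 'n/a', 'Street Address': 'n/a'})
print(filled.isna().sum().sum())
```

Filling by column avoids writing a string placeholder into a numeric column by accident.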

In [8]:
# View column names

customers.columns

Index(['Customer ID', 'Name', 'Email', 'Phone Number', 'Street Address',
       'City', 'State', 'Zip Code', 'Sales Rep', 'Subscription'],
      dtype='object')

Let's look at the number of unique values in each column to see if we can find some potential errors.

In [9]:
# See number of unique values for each column

customers.nunique()

Customer ID       773
Name              770
Email             770
Phone Number      772
Street Address    772
City              750
State              56
Zip Code          769
Sales Rep          30
Subscription        3
dtype: int64

In [10]:
# See list of states

customers['State'].unique()

array(['Hawaii', 'Maine', 'North Dakota', 'New Hampshire', 'Alabama',
       'Washington', 'South Carolina', 'Ohio', 'New Mexico', 'Arkansas',
       'West Virginia', 'Vermont', 'Pennsylvania', 'Wisconsin', 'Iowa',
       'Colorado', 'Tennessee', 'Florida', 'Connecticut', 'Utah',
       'Nebraska', 'Alaska', 'Minnesota', 'Texas', 'New York', 'Arizona',
       'Virginia', 'North Carolina', 'Oklahoma', 'Rhode Island',
       'Maryland', 'Missouri', 'Idaho', 'Illinois', 'Kansas',
       'California', 'South Dakota', 'Louisiana', 'Kentucky', 'Montana',
       'Massachusetts', 'Mississippi', 'Michigan', 'New Jersey',
       'Indiana', 'Delaware', ' Idaho', 'Oregon', 'Wyoming', 'Georgia',
       'Nevada', 'NY', 'Illinois ', 'NV', ' Missouri', 'Texas '],
      dtype=object)

There are more than fifty unique values because some states are abbreviated and others have leading or trailing whitespace.

In [11]:
# Get rid of trailing and leading spaces in state names

customers['State'] = customers['State'].str.strip()
print('Number of unique states:', customers['State'].nunique())

Number of unique states: 52


In [12]:
# Replace abbreviations with full state names (whole-value match, not substring)

customers['State'] = customers['State'].replace({'NY': 'New York', 'NV': 'Nevada'})
print('Number of unique states:', customers['State'].nunique())

Number of unique states: 50
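The two cleaning steps can also be chained and then checked against a set of expected names. This is a sketch on a made-up Series. The `abbreviations` mapping and the `expected` set are assumptions that cover only the values in this toy example, not a complete list of states.

```python
import pandas as pd

# Hypothetical toy column with stray whitespace and abbreviations
states = pd.Series(['Idaho', ' Idaho', 'NY', 'Texas ', 'NV', 'New York'])

# Assumed mapping: only the two abbreviations used in this toy example
abbreviations = {'NY': 'New York', 'NV': 'Nevada'}

# Strip whitespace, then replace whole values that match a key exactly
cleaned = states.str.strip().replace(abbreviations)
print(sorted(cleaned.unique()))

# Anything outside the expected set would show up here
expected = {'Idaho', 'New York', 'Texas', 'Nevada'}
unexpected = set(cleaned) - expected
print(unexpected)
```

`Series.replace` with a dictionary matches entire values, so a name that merely contains the letters "NY" would be left alone, whereas `Series.str.replace` substitutes substrings.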


# Merging Dataframes

Now that the data is clean, let's look at how to merge different dataframes.

In [13]:
# Import other datasets

salesreps = pd.read_csv('/kaggle/input/synthetic-customer-data/salesreps.csv')
subs = pd.read_csv('/kaggle/input/synthetic-customer-data/subscriptions.csv')

print(salesreps.shape)
print(subs.shape)

(30, 3)
(3, 3)


In [14]:
# View salesreps

salesreps

Unnamed: 0,Employee ID,Name,Email
0,1,Ricky Baker,rickybaker@company.com
1,2,Ashley Nguyen,ashleynguyen@company.com
2,3,Danielle Summers,daniellesummers@company.com
3,4,Samantha Horton,samanthahorton@company.com
4,5,Paige Jones,paigejones@company.com
5,6,Tara Peck,tarapeck@company.com
6,7,Jennifer Sanchez,jennifersanchez@company.com
7,8,Mark Lindsey,marklindsey@company.com
8,9,Taylor Mason,taylormason@company.com
9,10,Karen Beasley,karenbeasley@company.com


In [15]:
# View subs

subs

Unnamed: 0,Subscription ID,Subscription,Cost
0,1,Standard,$99
1,2,Plus,$199
2,3,Premium,$499


In [16]:
# Rename columns in the salesreps dataframe to be consistent with the columns in the customers dataframe

salesreps = salesreps.rename(columns={'Name': 'Sales Rep', 'Email': 'Sales Rep Email'})
salesreps

Unnamed: 0,Employee ID,Sales Rep,Sales Rep Email
0,1,Ricky Baker,rickybaker@company.com
1,2,Ashley Nguyen,ashleynguyen@company.com
2,3,Danielle Summers,daniellesummers@company.com
3,4,Samantha Horton,samanthahorton@company.com
4,5,Paige Jones,paigejones@company.com
5,6,Tara Peck,tarapeck@company.com
6,7,Jennifer Sanchez,jennifersanchez@company.com
7,8,Mark Lindsey,marklindsey@company.com
8,9,Taylor Mason,taylormason@company.com
9,10,Karen Beasley,karenbeasley@company.com


In [17]:
# Add salesrep emails and employee ID to the customer dataframe

customers = pd.merge(customers, salesreps, on='Sales Rep', how='left')
customers

Unnamed: 0,Customer ID,Name,Email,Phone Number,Street Address,City,State,Zip Code,Sales Rep,Subscription,Employee ID,Sales Rep Email
0,1,Kathryn Williams,kathrynwilliams@samplemail.com,(550) 269-8345,6122 Debra Court,Stewartville,Hawaii,55169,Patricia Escobar,Standard,13,patriciaescobar@company.com
1,2,Maxwell Meza,maxwellmeza@samplemail.com,(712) 706-8059,740 Bean Station,Lake Aprilton,Maine,9073,Thomas Murray,Standard,14,thomasmurray@company.com
2,3,Jamie Crawford,jamiecrawford@samplemail.com,(278) 738-1122,65114 Tracy Track Suite 604,South Sarahbury,North Dakota,51657,Amber Taylor,Plus,20,ambertaylor@company.com
3,4,Raven Hernandez,ravenhernandez@samplemail.com,(376) 539-3142,45083 Cunningham Drive,Jasonstad,Hawaii,13042,Ashley Collins,Standard,12,ashleycollins@company.com
4,5,Robert Brown,robertbrown@samplemail.com,(368) 263-0915,3965 Jay Ford,New Ann,New Hampshire,50080,Samantha Horton,Plus,4,samanthahorton@company.com
...,...,...,...,...,...,...,...,...,...,...,...,...
768,769,Nancy Cole,nancycole@samplemail.com,(134) 653-8529,735 Sutton Square,Lake Meganville,Pennsylvania,47331,Ashley Collins,Plus,12,ashleycollins@company.com
769,770,Samuel Barrett,samuelbarrett@samplemail.com,(289) 571-9595,53731 Fitzgerald Keys,South David,North Carolina,22817,Bradley Todd,Standard,15,bradleytodd@company.com
770,771,Alex Hayes,alexhayes@samplemail.com,(123) 679-2096,313 Deborah Prairie,Port Joy,California,66171,Samantha Graham,Standard,18,samanthagraham@company.com
771,772,Roberto Kennedy,robertokennedy@samplemail.com,(852) 816-8525,48959 Kim Field,South Emily,Texas,94465,Ashley Collins,Standard,12,ashleycollins@company.com
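A left merge leaves NaN wherever a customer's sales rep has no match, and it adds extra rows if a rep appears more than once in the lookup table. The sketch below shows one way to guard against both with the `validate` and `indicator` parameters of `merge`. The `cust` and `reps` frames are hypothetical stand-ins, and 'Dana Fox' is an invented name with no match.

```python
import pandas as pd

# Hypothetical toy frames; 'Dana Fox' has no matching row in reps
cust = pd.DataFrame({
    'Name': ['Ann Lee', 'Bo Park', 'Cy Diaz'],
    'Sales Rep': ['Tara Peck', 'Ricky Baker', 'Dana Fox'],
})
reps = pd.DataFrame({
    'Sales Rep': ['Tara Peck', 'Ricky Baker'],
    'Employee ID': [6, 1],
})

# validate raises MergeError if 'Sales Rep' is not unique in reps;
# indicator adds a '_merge' column showing where each row found a match
merged = cust.merge(reps, on='Sales Rep', how='left',
                    validate='many_to_one', indicator=True)

# Customers whose sales rep was not found in the lookup table
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched['Name'].tolist())

# A many-to-one left merge should not change the number of rows
print(len(merged) == len(cust))
```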


In [18]:
# Add subscription price but not ID

customers = customers.merge(subs, on='Subscription', how='left')
customers = customers.drop(columns='Subscription ID')
customers

Unnamed: 0,Customer ID,Name,Email,Phone Number,Street Address,City,State,Zip Code,Sales Rep,Subscription,Employee ID,Sales Rep Email,Cost
0,1,Kathryn Williams,kathrynwilliams@samplemail.com,(550) 269-8345,6122 Debra Court,Stewartville,Hawaii,55169,Patricia Escobar,Standard,13,patriciaescobar@company.com,$99
1,2,Maxwell Meza,maxwellmeza@samplemail.com,(712) 706-8059,740 Bean Station,Lake Aprilton,Maine,9073,Thomas Murray,Standard,14,thomasmurray@company.com,$99
2,3,Jamie Crawford,jamiecrawford@samplemail.com,(278) 738-1122,65114 Tracy Track Suite 604,South Sarahbury,North Dakota,51657,Amber Taylor,Plus,20,ambertaylor@company.com,$199
3,4,Raven Hernandez,ravenhernandez@samplemail.com,(376) 539-3142,45083 Cunningham Drive,Jasonstad,Hawaii,13042,Ashley Collins,Standard,12,ashleycollins@company.com,$99
4,5,Robert Brown,robertbrown@samplemail.com,(368) 263-0915,3965 Jay Ford,New Ann,New Hampshire,50080,Samantha Horton,Plus,4,samanthahorton@company.com,$199
...,...,...,...,...,...,...,...,...,...,...,...,...,...
768,769,Nancy Cole,nancycole@samplemail.com,(134) 653-8529,735 Sutton Square,Lake Meganville,Pennsylvania,47331,Ashley Collins,Plus,12,ashleycollins@company.com,$199
769,770,Samuel Barrett,samuelbarrett@samplemail.com,(289) 571-9595,53731 Fitzgerald Keys,South David,North Carolina,22817,Bradley Todd,Standard,15,bradleytodd@company.com,$99
770,771,Alex Hayes,alexhayes@samplemail.com,(123) 679-2096,313 Deborah Prairie,Port Joy,California,66171,Samantha Graham,Standard,18,samanthagraham@company.com,$99
771,772,Roberto Kennedy,robertokennedy@samplemail.com,(852) 816-8525,48959 Kim Field,South Emily,Texas,94465,Ashley Collins,Standard,12,ashleycollins@company.com,$99


In [19]:
# Set index to customer ID

customers = customers.set_index('Customer ID')
customers

Customer ID,Name,Email,Phone Number,Street Address,City,State,Zip Code,Sales Rep,Subscription,Employee ID,Sales Rep Email,Cost
1,Kathryn Williams,kathrynwilliams@samplemail.com,(550) 269-8345,6122 Debra Court,Stewartville,Hawaii,55169,Patricia Escobar,Standard,13,patriciaescobar@company.com,$99
2,Maxwell Meza,maxwellmeza@samplemail.com,(712) 706-8059,740 Bean Station,Lake Aprilton,Maine,9073,Thomas Murray,Standard,14,thomasmurray@company.com,$99
3,Jamie Crawford,jamiecrawford@samplemail.com,(278) 738-1122,65114 Tracy Track Suite 604,South Sarahbury,North Dakota,51657,Amber Taylor,Plus,20,ambertaylor@company.com,$199
4,Raven Hernandez,ravenhernandez@samplemail.com,(376) 539-3142,45083 Cunningham Drive,Jasonstad,Hawaii,13042,Ashley Collins,Standard,12,ashleycollins@company.com,$99
5,Robert Brown,robertbrown@samplemail.com,(368) 263-0915,3965 Jay Ford,New Ann,New Hampshire,50080,Samantha Horton,Plus,4,samanthahorton@company.com,$199
...,...,...,...,...,...,...,...,...,...,...,...,...
769,Nancy Cole,nancycole@samplemail.com,(134) 653-8529,735 Sutton Square,Lake Meganville,Pennsylvania,47331,Ashley Collins,Plus,12,ashleycollins@company.com,$199
770,Samuel Barrett,samuelbarrett@samplemail.com,(289) 571-9595,53731 Fitzgerald Keys,South David,North Carolina,22817,Bradley Todd,Standard,15,bradleytodd@company.com,$99
771,Alex Hayes,alexhayes@samplemail.com,(123) 679-2096,313 Deborah Prairie,Port Joy,California,66171,Samantha Graham,Standard,18,samanthagraham@company.com,$99
772,Roberto Kennedy,robertokennedy@samplemail.com,(852) 816-8525,48959 Kim Field,South Emily,Texas,94465,Ashley Collins,Standard,12,ashleycollins@company.com,$99


Let's make this dataframe a little bit more readable.

In [20]:
# Reorder the columns, drop employee ID

customers = customers[['Name', 'Email', 'Phone Number', 'Street Address', 'City', 'State', 'Zip Code', 'Sales Rep', 'Sales Rep Email', 'Subscription', 'Cost']]
customers

Customer ID,Name,Email,Phone Number,Street Address,City,State,Zip Code,Sales Rep,Sales Rep Email,Subscription,Cost
1,Kathryn Williams,kathrynwilliams@samplemail.com,(550) 269-8345,6122 Debra Court,Stewartville,Hawaii,55169,Patricia Escobar,patriciaescobar@company.com,Standard,$99
2,Maxwell Meza,maxwellmeza@samplemail.com,(712) 706-8059,740 Bean Station,Lake Aprilton,Maine,9073,Thomas Murray,thomasmurray@company.com,Standard,$99
3,Jamie Crawford,jamiecrawford@samplemail.com,(278) 738-1122,65114 Tracy Track Suite 604,South Sarahbury,North Dakota,51657,Amber Taylor,ambertaylor@company.com,Plus,$199
4,Raven Hernandez,ravenhernandez@samplemail.com,(376) 539-3142,45083 Cunningham Drive,Jasonstad,Hawaii,13042,Ashley Collins,ashleycollins@company.com,Standard,$99
5,Robert Brown,robertbrown@samplemail.com,(368) 263-0915,3965 Jay Ford,New Ann,New Hampshire,50080,Samantha Horton,samanthahorton@company.com,Plus,$199
...,...,...,...,...,...,...,...,...,...,...,...
769,Nancy Cole,nancycole@samplemail.com,(134) 653-8529,735 Sutton Square,Lake Meganville,Pennsylvania,47331,Ashley Collins,ashleycollins@company.com,Plus,$199
770,Samuel Barrett,samuelbarrett@samplemail.com,(289) 571-9595,53731 Fitzgerald Keys,South David,North Carolina,22817,Bradley Todd,bradleytodd@company.com,Standard,$99
771,Alex Hayes,alexhayes@samplemail.com,(123) 679-2096,313 Deborah Prairie,Port Joy,California,66171,Samantha Graham,samanthagraham@company.com,Standard,$99
772,Roberto Kennedy,robertokennedy@samplemail.com,(852) 816-8525,48959 Kim Field,South Emily,Texas,94465,Ashley Collins,ashleycollins@company.com,Standard,$99


# Analyzing the Data

What financial information could we potentially get from customer data?

In [21]:
# Convert cost to an integer to perform revenue calculations
# regex=False treats '$' as a literal character; assign returns a new dataframe,
# which avoids pandas' SettingWithCopyWarning on the column-subset frame

cost = customers['Cost'].str.replace('$', '', regex=False).astype(int)
customers = customers.assign(Cost=cost)
customers


Customer ID,Name,Email,Phone Number,Street Address,City,State,Zip Code,Sales Rep,Sales Rep Email,Subscription,Cost
1,Kathryn Williams,kathrynwilliams@samplemail.com,(550) 269-8345,6122 Debra Court,Stewartville,Hawaii,55169,Patricia Escobar,patriciaescobar@company.com,Standard,99
2,Maxwell Meza,maxwellmeza@samplemail.com,(712) 706-8059,740 Bean Station,Lake Aprilton,Maine,9073,Thomas Murray,thomasmurray@company.com,Standard,99
3,Jamie Crawford,jamiecrawford@samplemail.com,(278) 738-1122,65114 Tracy Track Suite 604,South Sarahbury,North Dakota,51657,Amber Taylor,ambertaylor@company.com,Plus,199
4,Raven Hernandez,ravenhernandez@samplemail.com,(376) 539-3142,45083 Cunningham Drive,Jasonstad,Hawaii,13042,Ashley Collins,ashleycollins@company.com,Standard,99
5,Robert Brown,robertbrown@samplemail.com,(368) 263-0915,3965 Jay Ford,New Ann,New Hampshire,50080,Samantha Horton,samanthahorton@company.com,Plus,199
...,...,...,...,...,...,...,...,...,...,...,...
769,Nancy Cole,nancycole@samplemail.com,(134) 653-8529,735 Sutton Square,Lake Meganville,Pennsylvania,47331,Ashley Collins,ashleycollins@company.com,Plus,199
770,Samuel Barrett,samuelbarrett@samplemail.com,(289) 571-9595,53731 Fitzgerald Keys,South David,North Carolina,22817,Bradley Todd,bradleytodd@company.com,Standard,99
771,Alex Hayes,alexhayes@samplemail.com,(123) 679-2096,313 Deborah Prairie,Port Joy,California,66171,Samantha Graham,samanthagraham@company.com,Standard,99
772,Roberto Kennedy,robertokennedy@samplemail.com,(852) 816-8525,48959 Kim Field,South Emily,Texas,94465,Ashley Collins,ashleycollins@company.com,Standard,99
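The prices in this dataset are simple, but currency strings in other data may contain thousands separators or entries that are not numbers at all. A hedged sketch of a more tolerant conversion follows, using `pd.to_numeric` with `errors='coerce'`. The `prices` values are invented for illustration.

```python
import pandas as pd

# Hypothetical price strings; the last one cannot be parsed as a number
prices = pd.Series(['$99', '$1,299', 'free'])

# Remove dollar signs and thousands separators, then convert;
# errors='coerce' turns unparseable entries into NaN instead of raising
stripped = prices.str.replace(r'[$,]', '', regex=True)
numeric = pd.to_numeric(stripped, errors='coerce')
print(numeric)

# Count how many entries failed to convert
print(int(numeric.isna().sum()))
```

Because NaN is a float, the result here is a float column. Converting to `int` is only possible once no missing values remain.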


In [22]:
# Calculate revenue from the state of Louisiana

LA = customers[customers['State'] == 'Louisiana']
print('Total revenue from the state of Louisiana: $', LA['Cost'].sum())
LA

Total revenue from the state of Louisiana: $ 2688


Customer ID,Name,Email,Phone Number,Street Address,City,State,Zip Code,Sales Rep,Sales Rep Email,Subscription,Cost
71,Christopher Collins,christophercollins@samplemail.com,(143) 452-8346,72624 Matthew Estate Suite 803,Port Annettehaven,Louisiana,5285,Richard Moore,richardmoore@company.com,Plus,199
269,David Oneill,davidoneill@samplemail.com,(601) 256-3942,877 Crystal Burg,Louisport,Louisiana,67619,Stephen Frey,stephenfrey@company.com,Plus,199
330,Tonya Martinez,tonyamartinez@samplemail.com,(136) 734-6774,63854 Jason Springs Apt. 747,South Theresa,Louisiana,19852,Jeffrey Williams,jeffreywilliams@company.com,Premium,499
351,Caitlin Buchanan,caitlinbuchanan@samplemail.com,(434) 619-4642,6952 Morris Spring Apt. 723,Sharonmouth,Louisiana,37093,Ricky Baker,rickybaker@company.com,Plus,199
355,Rita Jordan,ritajordan@samplemail.com,(674) 192-8769,663 Andrade Street,New Donnamouth,Louisiana,68291,Angela Durham,angeladurham@company.com,Plus,199
440,Karl Morris,karlmorris@samplemail.com,(653) 751-8821,337 Ashlee Ports Apt. 973,Port Melanie,Louisiana,49558,Ricky Baker,rickybaker@company.com,Plus,199
458,Todd Andrews,toddandrews@samplemail.com,(047) 380-7700,,Stevensmouth,Louisiana,70890,Tara Peck,tarapeck@company.com,Plus,199
484,Jane Gonzales,janegonzales@samplemail.com,(418) 690-8230,1836 Shelby Groves,Bassport,Louisiana,25282,Christopher Reeves,christopherreeves@company.com,Standard,99
490,Darren Orr,darrenorr@samplemail.com,(937) 711-2464,2958 Phillips Haven,West Jessicastad,Louisiana,25674,Jennifer Sanchez,jennifersanchez@company.com,Standard,99
565,Jerry Lopez,jerrylopez@samplemail.com,(189) 505-9975,01828 Brown Field,Reyesfort,Louisiana,48340,Angela Durham,angeladurham@company.com,Plus,199


In [23]:
# Calculate revenue from customers assigned to Bradley Todd

BT = customers[customers['Sales Rep'] == 'Bradley Todd']
print('Total revenue from customers assigned to Bradley Todd: $', BT['Cost'].sum())
BT

Total revenue from customers assigned to Bradley Todd: $ 5583


Customer ID,Name,Email,Phone Number,Street Address,City,State,Zip Code,Sales Rep,Sales Rep Email,Subscription,Cost
50,Kenneth Owen,kennethowen@samplemail.com,(612) 151-0808,40721 Castro Throughway Apt. 638,North Nancystad,Rhode Island,96120,Bradley Todd,bradleytodd@company.com,Standard,99
86,James Wilson,jameswilson@samplemail.com,(997) 421-7483,407 Diane Meadow,Harperfurt,South Dakota,70914,Bradley Todd,bradleytodd@company.com,Premium,499
132,Jenna Gonzales,jennagonzales@samplemail.com,(285) 361-3692,599 Browning Isle Suite 770,Hendersonburgh,Texas,86611,Bradley Todd,bradleytodd@company.com,Standard,99
140,Jerry Francis,jerryfrancis@samplemail.com,(957) 161-5252,39749 Julie Highway,Joyceshire,Iowa,73709,Bradley Todd,bradleytodd@company.com,Plus,199
190,Judy Sanchez,judysanchez@samplemail.com,(958) 969-1034,7908 Seth Springs Suite 857,Port Ashleyburgh,Massachusetts,44961,Bradley Todd,bradleytodd@company.com,Premium,499
210,Kurt Stone,kurtstone@samplemail.com,(412) 909-0636,54945 Briana Trail Suite 878,Cunninghammouth,New York,64413,Bradley Todd,bradleytodd@company.com,Premium,499
308,Michael Dawson,michaeldawson@samplemail.com,(924) 171-1368,720 White Forest,North Yvonne,Tennessee,54524,Bradley Todd,bradleytodd@company.com,Premium,499
378,Christopher Cantrell,christophercantrell@samplemail.com,(785) 707-3457,3617 Bryce Wall Suite 159,Millerside,Maryland,86350,Bradley Todd,bradleytodd@company.com,Plus,199
483,Justin Baker,justinbaker@samplemail.com,(913) 674-0693,949 Jamie Mountain,Whitneymouth,Ohio,69358,Bradley Todd,bradleytodd@company.com,Premium,499
511,Tammy Davis,tammydavis@samplemail.com,(053) 070-0025,55845 Barbara Viaduct Apt. 482,Lake Erica,Hawaii,30378,Bradley Todd,bradleytodd@company.com,Standard,99


These are just two simple examples of valuable information that could be gained from detailed and cleaned data.
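Filtering one state or one sales rep at a time works, but `groupby` can produce the same totals for every group in one step. This is a minimal sketch on a made-up `toy` frame that stands in for the cleaned customers dataframe. The rows are hypothetical, so the totals are not the dataset's real figures.

```python
import pandas as pd

# Hypothetical toy data standing in for the cleaned customers frame
toy = pd.DataFrame({
    'State': ['Louisiana', 'Texas', 'Louisiana', 'Texas', 'Ohio'],
    'Sales Rep': ['Tara Peck', 'Bradley Todd', 'Ricky Baker',
                  'Bradley Todd', 'Tara Peck'],
    'Cost': [199, 99, 499, 499, 99],
})

# Total revenue per state, largest first
by_state = toy.groupby('State')['Cost'].sum().sort_values(ascending=False)
print(by_state)

# Customer count and total revenue per sales rep
by_rep = toy.groupby('Sales Rep')['Cost'].agg(['count', 'sum'])
print(by_rep)
```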

# Conclusion

I hope this was helpful. There are, of course, many more ways to manipulate and examine this data and draw conclusions from it. Feel free to create and share a notebook of your own to get some practice with this synthetic dataset.