# Exploratory Data Analysis of Customer Data


## Overview for the EDA Notebook on E-commerce Customer Data

This notebook will perform Exploratory Data Analysis (EDA) on a dataset containing detailed customer information from an e-commerce platform. The dataset includes the following features:

- **Customer ID**: Unique identifier for each customer.
- **Personal Information**: First name, last name, username, email, gender, and birthdate.
- **Device Information**: Type of device used (iOS/Android), device ID, and device version.
- **Geographical Data**: Home location coordinates (latitude and longitude), location name, and home country.
- **Customer Journey**: First join date of the customer on the platform.

## Objectives:

- **Demographics Analysis**: Analyze customer demographics including gender, age distribution, and geographical location.
- **Device Usage**: Identify device preferences across different customer segments.
- **Behavioral Patterns**: Explore customer onboarding dates to understand growth trends over time.
- **Location Insights**: Visualize customer distribution across regions using geospatial data.

This EDA will provide insights into customer behaviors, preferences, and potential trends, which can help in better understanding the platform's user base.


### Data understanding

In [59]:
# All the packages needed for the EDA
import os
import sys

import pandas as pd

In [60]:
customer_df = pd.read_csv('customer.csv')
customer_df.head()


| | customer_id | first_name | last_name | username | email | gender | birthdate | device_type | device_id | device_version | home_location_lat | home_location_long | home_location | home_country | first_join_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2870 | Lala | Maryati | 671a0865-ac4e-4dc4-9c4f-c286a1176f7e | 671a0865_ac4e_4dc4_9c4f_c286a1176f7e@startupca... | F | 1996-06-14 | iOS | c9c0de76-0a6c-4ac2-843f-65264ab9fe63 | iPhone; CPU iPhone OS 14_2_1 like Mac OS X | -1.043345 | 101.360523 | Sumatera Barat | Indonesia | 2019-07-21 |
| 1 | 8193 | Maimunah | Laksmiwati | 83be2ba7-8133-48a4-bbcb-b46a2762473f | 83be2ba7_8133_48a4_bbcb_b46a2762473f@zakyfound... | F | 1993-08-16 | Android | fb331c3d-f42e-40fe-afe2-b4b73a8a6e25 | Android 2.2.1 | -6.212489 | 106.81885 | Jakarta Raya | Indonesia | 2017-07-16 |
| 2 | 7279 | Bakiman | Simanjuntak | 3250e5a3-1d23-4675-a647-3281879d42be | 3250e5a3_1d23_4675_a647_3281879d42be@startupca... | M | 1989-01-23 | iOS | d13dde0a-6ae1-43c3-83a7-11bbb922730b | iPad; CPU iPad OS 4_2_1 like Mac OS X | -8.631607 | 116.428436 | Nusa Tenggara Barat | Indonesia | 2020-08-23 |
| 3 | 88813 | Cahyadi | Maheswara | df797edf-b465-4a80-973b-9fbb612260c2 | df797edf_b465_4a80_973b_9fbb612260c2@zakyfound... | M | 1991-01-05 | iOS | f4c18515-c5be-419f-8142-f037be47c9cd | iPad; CPU iPad OS 14_2 like Mac OS X | 1.299332 | 115.774934 | Kalimantan Timur | Indonesia | 2021-10-03 |
| 4 | 82542 | Irnanto | Wijaya | 36ab08e1-03de-42a8-9e3b-59528c798824 | 36ab08e1_03de_42a8_9e3b_59528c798824@startupca... | M | 2000-07-15 | iOS | e46e4c36-4630-4736-8fcf-663db29ca3b0 | iPhone; CPU iPhone OS 10_3_3 like Mac OS X | -2.980807 | 114.924675 | Kalimantan Selatan | Indonesia | 2021-04-11 |
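The `device_version` values in the preview mix two formats (an iOS user-agent fragment and a plain `Android x.y.z` string), so comparing versions across devices needs some parsing first. The sketch below is one possible approach, not part of the original notebook: `parse_os_version` and the `os_version` / `os_major` columns are hypothetical names, and the regular expression is only known to cover the formats visible in the five rows above.

```python
import re

import pandas as pd

# Values copied from the head() preview; the full column may contain
# formats that this pattern does not cover.
sample = pd.DataFrame({
    "device_type": ["iOS", "Android", "iOS"],
    "device_version": [
        "iPhone; CPU iPhone OS 14_2_1 like Mac OS X",
        "Android 2.2.1",
        "iPad; CPU iPad OS 4_2_1 like Mac OS X",
    ],
})


def parse_os_version(device_version: str):
    """Return the OS version as a dotted string, or None if none is found."""
    match = re.search(r"(?:OS|Android)\s+(\d+(?:[._]\d+)*)", device_version)
    if match is None:
        return None
    # iOS strings separate version parts with underscores; normalise to dots.
    return match.group(1).replace("_", ".")


sample["os_version"] = sample["device_version"].map(parse_os_version)
# The integer cast assumes every row was parsed; unparsed rows (None)
# would need to be handled before this step on the full dataset.
sample["os_major"] = sample["os_version"].str.split(".").str[0].astype(int)

print(sample[["device_type", "os_version", "os_major"]])
```

On the real `customer_df`, it would be worth counting how many rows return `None` before relying on the derived columns.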


In [61]:

customer_df.shape

(100000, 15)

In [62]:
customer_df.dtypes

customer_id             int64
first_name             object
last_name              object
username               object
email                  object
gender                 object
birthdate              object
device_type            object
device_id              object
device_version         object
home_location_lat     float64
home_location_long    float64
home_location          object
home_country           object
first_join_date        object
dtype: object
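Both `birthdate` and `first_join_date` are loaded as `object`, so they need to be parsed before any age or growth analysis. The following is a minimal sketch, assuming the `YYYY-MM-DD` format seen in the preview holds for every row; the small `sample` frame stands in for `customer_df`, and `age_at_join` and `signups_per_year` are illustrative names rather than anything defined in the notebook.

```python
import pandas as pd

# Dates copied from the first three preview rows.
sample = pd.DataFrame({
    "birthdate": ["1996-06-14", "1993-08-16", "1989-01-23"],
    "first_join_date": ["2019-07-21", "2017-07-16", "2020-08-23"],
})

# Parse the text columns into datetime64 values.
for col in ["birthdate", "first_join_date"]:
    sample[col] = pd.to_datetime(sample[col], format="%Y-%m-%d")

birth = sample["birthdate"]
joined = sample["first_join_date"]

# Age in whole years at sign-up: subtract one year when the birthday
# had not yet occurred in the year the customer joined.
before_birthday = (joined.dt.month < birth.dt.month) | (
    (joined.dt.month == birth.dt.month) & (joined.dt.day < birth.dt.day)
)
sample["age_at_join"] = joined.dt.year - birth.dt.year - before_birthday.astype(int)

# Number of sign-ups per calendar year, in chronological order.
signups_per_year = joined.dt.year.value_counts().sort_index()

print(sample)
print(signups_per_year)
```

Age at sign-up is used here instead of current age so that the result does not depend on the day the notebook is run.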

In [63]:
customer_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 15 columns):
 #   Column              Non-Null Count   Dtype  
---  ------              --------------   -----  
 0   customer_id         100000 non-null  int64  
 1   first_name          100000 non-null  object 
 2   last_name           100000 non-null  object 
 3   username            100000 non-null  object 
 4   email               100000 non-null  object 
 5   gender              100000 non-null  object 
 6   birthdate           100000 non-null  object 
 7   device_type         100000 non-null  object 
 8   device_id           100000 non-null  object 
 9   device_version      100000 non-null  object 
 10  home_location_lat   100000 non-null  float64
 11  home_location_long  100000 non-null  float64
 12  home_location       100000 non-null  object 
 13  home_country        100000 non-null  object 
 14  first_join_date     100000 non-null  object 
dtypes: float64(2), int64(1), object(12)
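`info()` shows that no column contains nulls, but it does not reveal duplicated rows, repeated identifiers, or strings that are present but blank. A possible follow-up check is sketched below. The helper `quality_report` is hypothetical, and the `sample` frame is deliberately constructed toy data (one repeated row and blank country values) to show what the report would flag; it does not reflect the real dataset.

```python
import pandas as pd

# Toy data with intentional problems; NOT taken from customer.csv.
sample = pd.DataFrame({
    "customer_id": [2870, 8193, 7279, 7279],
    "first_name": ["Lala", "Maimunah", "Bakiman", "Bakiman"],
    "home_country": ["Indonesia", "Indonesia", " ", " "],
})


def quality_report(df: pd.DataFrame, id_col: str, text_cols: list) -> dict:
    """Summarise basic data-quality counts for a DataFrame."""
    # Count values that are empty once surrounding whitespace is removed.
    blank = {
        col: int(df[col].astype(str).str.strip().eq("").sum())
        for col in text_cols
    }
    return {
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        "duplicate_ids": int(df[id_col].duplicated().sum()),
        "blank_strings": blank,
    }


report = quality_report(sample, "customer_id", ["first_name", "home_country"])
print(report)
```

Applied to the notebook's data, the call would look like `quality_report(customer_df, "customer_id", ["first_name", "last_name", "home_location"])`.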

In [64]:
customer_df.describe()

| | customer_id | home_location_lat | home_location_long |
|---|---|---|---|
| count | 100000.0 | 100000.0 | 100000.0 |
| mean | 50000.5 | -5.10639 | 110.936081 |
| std | 28867.657797 | 3.088183 | 6.343363 |
| min | 1.0 | -10.845002 | 95.275319 |
| 25% | 25000.75 | -7.37265 | 106.860628 |
| 50% | 50000.5 | -6.240087 | 110.16201 |
| 75% | 75000.25 | -3.092254 | 113.171187 |
| max | 100000.0 | 5.818355 | 140.993119 |
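By default `describe()` only summarises numeric columns, and the statistics for `customer_id` are not meaningful because it is an identifier. The categorical columns can be summarised with counts instead, and the coordinates can be checked against the valid latitude and longitude ranges. The sketch below uses values from the five preview rows as a stand-in for `customer_df`; the variable names are illustrative.

```python
import pandas as pd

# Values copied from the head() preview.
sample = pd.DataFrame({
    "gender": ["F", "F", "M", "M", "M"],
    "device_type": ["iOS", "Android", "iOS", "iOS", "iOS"],
    "home_location_lat": [-1.043345, -6.212489, -8.631607, 1.299332, -2.980807],
    "home_location_long": [101.360523, 106.81885, 116.428436, 115.774934, 114.924675],
})

# Share of customers per gender (fractions summing to 1).
gender_share = sample["gender"].value_counts(normalize=True)

# Device preference broken down by gender.
device_by_gender = pd.crosstab(sample["gender"], sample["device_type"])

# Sanity check: latitude must lie in [-90, 90], longitude in [-180, 180].
valid_coords = (
    sample["home_location_lat"].between(-90, 90)
    & sample["home_location_long"].between(-180, 180)
)

print(gender_share)
print(device_by_gender)
print("rows with invalid coordinates:", int((~valid_coords).sum()))
```

For the object columns as a whole, `customer_df.describe(include="object")` gives the count, number of unique values, most frequent value, and its frequency.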


### Data cleaning 

In [68]:
# customer_df.to_csv('customers.csv', index= False)
# processed_customer_df = pd.read_csv('customers.csv', index_col= False)

In [67]:
# processed_customer_df.head()

In [87]:
# customer_df.astype({'home_location_lat': float, 'home_location': str ,'first_name': str, 'last_name': str, 'device_type': str})
# Selecting several columns requires a list inside the brackets;
# a bare comma-separated key is looked up as a single tuple label and raises KeyError.
text_cols = ['home_location', 'first_name', 'last_name', 'device_type']
customer_df[text_cols] = customer_df[text_cols].astype(str)

In [86]:
customer_df.info()
# customer_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 15 columns):
 #   Column              Non-Null Count   Dtype  
---  ------              --------------   -----  
 0   customer_id         100000 non-null  int64  
 1   first_name          100000 non-null  object 
 2   last_name           100000 non-null  object 
 3   username            100000 non-null  object 
 4   email               100000 non-null  object 
 5   gender              100000 non-null  object 
 6   birthdate           100000 non-null  object 
 7   device_type         100000 non-null  object 
 8   device_id           100000 non-null  object 
 9   device_version      100000 non-null  object 
 10  home_location_lat   100000 non-null  float64
 11  home_location_long  100000 non-null  float64
 12  home_location       100000 non-null  object 
 13  home_country        100000 non-null  object 
 14  first_join_date     100000 non-null  object 
dtypes: float64(2), int64(1), object(12)
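The `info()` output above still lists every text column as `object`. If more specific dtypes are wanted, one option is to pass a column-to-dtype mapping to `DataFrame.astype` and assign the result back, using pandas' `"string"` dtype for free text and `"category"` for low-cardinality columns. This is a sketch under the assumption that `gender` and `device_type` take only a handful of distinct values, as the preview suggests; `sample` and `typed` are illustrative names.

```python
import pandas as pd

# Values copied from the first three preview rows.
sample = pd.DataFrame({
    "first_name": ["Lala", "Maimunah", "Bakiman"],
    "gender": ["F", "F", "M"],
    "device_type": ["iOS", "Android", "iOS"],
    "home_location": ["Sumatera Barat", "Jakarta Raya", "Nusa Tenggara Barat"],
})

# astype returns a new DataFrame, so the result must be assigned.
typed = sample.astype({
    "first_name": "string",
    "home_location": "string",
    "gender": "category",
    "device_type": "category",
})

print(typed.dtypes)
print(typed["device_type"].cat.categories.tolist())
```

Categorical columns generally use less memory than `object` columns when the number of distinct values is small, which can matter for a 100,000-row frame.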