# Exploratory Data Analysis of Transaction Data


## Overview for the EDA Notebook on E-commerce Transaction Data

This notebook performs exploratory data analysis (EDA) on a dataset of transactions from an e-commerce platform. The dataset includes the following features:


- **created_at**: The timestamp when the transaction was created.
- **customer_id**: Unique identifier for each customer.
- **booking_id**: Unique identifier for each booking or order.
- **session_id**: Unique identifier for the user session during which the transaction occurred.
- **product_metadata**: A list of per-product records for the order, stored as a string; each record includes the product ID, quantity, and item price.
- **payment_method**: The method used for payment (e.g., Debit Card, Credit Card, OVO).
- **payment_status**: The status of the payment (e.g., Success, Failed).
- **promo_amount**: The amount of promotional discount applied to the transaction.
- **promo_code**: The promotional code used, if any.
- **shipment_fee**: The fee charged for shipping the products.
- **shipment_date_limit**: The promised latest delivery date for the shipment.
- **shipment_location_lat**: Latitude coordinate of the shipment delivery location.
- **shipment_location_long**: Longitude coordinate of the shipment delivery location.
- **total_amount**: The total monetary amount of the transaction after discounts and fees.

### Objectives

- **Transaction Analysis**: Examine the overall transaction patterns, including total sales over time and peak transaction periods.
- **Customer Behavior**: Analyze customer purchasing behaviors, such as frequency of purchases, average transaction values, and repeat customers.
- **Payment Methods**: Identify the most popular payment methods and assess their success rates.
- **Promotion Effectiveness**: Evaluate the impact of promotional codes on sales and customer acquisition.
- **Geographical Insights**: Utilize shipment location data to map out key delivery areas and potential regions for market expansion.
- **Product Trends**: Analyze the product metadata to identify top-selling products, popular combinations, and inventory turnover rates.
- **Shipping Performance**: Assess shipping fees and delivery times to understand their influence on customer satisfaction and total sales.
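
As an illustration of the payment-method objective above, here is a minimal sketch of how popularity and success rate per method might be computed. It assumes `payment_status` takes the values `Success` and `Failed` as described in the feature list; the sample rows and the `summary` name are made up for the example and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample rows; the real notebook would use transaction_df instead.
sample = pd.DataFrame({
    "payment_method": ["Debit Card", "Credit Card", "OVO", "Credit Card", "OVO"],
    "payment_status": ["Success", "Success", "Failed", "Success", "Success"],
})

# Flag successful payments, then count transactions and average the flag per method.
summary = (
    sample.assign(is_success=sample["payment_status"].eq("Success"))
    .groupby("payment_method")["is_success"]
    .agg(transactions="count", success_rate="mean")
    .sort_values("transactions", ascending=False)
)
print(summary)
```

Averaging a boolean flag gives the share of successful payments directly, which keeps popularity and success rate in a single table.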

### Expected Outcomes

This EDA aims to provide actionable insights into the transactional dynamics of the e-commerce platform, supporting strategic decision-making in the following areas:

- **Marketing Strategies**: Optimize promotional campaigns and discounts based on their effectiveness in driving sales.
- **Customer Relationship Management**: Enhance customer retention strategies by understanding purchasing patterns and identifying loyal customers.
- **Payment Processing**: Streamline payment methods by focusing on those preferred by customers and ensuring high success rates.
- **Logistics and Supply Chain**: Improve shipping efficiency and reduce costs by analyzing shipment fees, delivery times, and geographic distribution.
- **Product Management**: Inform inventory planning and product development by identifying best-selling items and market demand trends.
- **Business Growth**: Leverage geographical insights to target new markets and tailor services to regional customer needs.



### Data Understanding 

In [1]:
import os
import sys

import matplotlib.pyplot as plt
import pandas as pd

In [2]:
transaction_df = pd.read_csv('transactions.csv')
transaction_df.head()

| | created_at | customer_id | booking_id | session_id | product_metadata | payment_method | payment_status | promo_amount | promo_code | shipment_fee | shipment_date_limit | shipment_location_lat | shipment_location_long | total_amount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-07-29T15:22:01.458193Z | 5868 | 186e2bee-0637-4710-8981-50c2d737bc42 | 3abaa6ce-e320-4e51-9469-d9f3fa328e86 | [{'product_id': 54728, 'quantity': 1, 'item_pr... | Debit Card | Success | 1415 | WEEKENDSERU | 10000 | 2018-08-03T05:07:24.812676Z | -8.227893 | 111.969107 | 199832 |
| 1 | 2018-07-30T12:40:22.365620Z | 4774 | caadb57b-e808-4f94-9e96-8a7d4c9898db | 2ee5ead1-f13e-4759-92df-7ff48475e970 | [{'product_id': 16193, 'quantity': 1, 'item_pr... | Credit Card | Success | 0 | | 10000 | 2018-08-03T01:29:03.415705Z | 3.01347 | 107.802514 | 155526 |
| 2 | 2018-09-15T11:51:17.365620Z | 4774 | 6000fffb-9c1a-4f4a-9296-bc8f6b622b50 | 93325fb6-eb00-4268-bb0e-6471795a0ad0 | [{'product_id': 53686, 'quantity': 4, 'item_pr... | OVO | Success | 0 | | 10000 | 2018-09-18T08:41:49.422380Z | -2.579428 | 115.743885 | 550696 |
| 3 | 2018-11-01T11:23:48.365620Z | 4774 | f5e530a7-4350-4cd1-a3bc-525b5037bcab | bcad5a61-1b67-448d-8ff4-781d67bc56e4 | [{'product_id': 20228, 'quantity': 1, 'item_pr... | Credit Card | Success | 0 | | 0 | 2018-11-05T17:42:27.954235Z | -3.602334 | 120.363824 | 271012 |
| 4 | 2018-12-18T11:20:30.365620Z | 4774 | 0efc0594-dbbf-4f9a-b0b0-a488cfddf8a2 | df1042ab-13e6-4072-b9d2-64a81974c51a | [{'product_id': 55220, 'quantity': 1, 'item_pr... | Credit Card | Success | 0 | | 0 | 2018-12-23T17:24:07.361785Z | -3.602334 | 120.363824 | 198753 |
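
The preview suggests that `product_metadata` holds a stringified Python list of dictionaries (single-quoted keys), so `json.loads` would likely reject it. Below is a minimal sketch of unpacking it into one row per purchased item, assuming every value is a valid Python literal. The sample rows, the `explode_products` helper, and the `item_price` key name (truncated in the preview) are illustrative assumptions, not confirmed by the data shown.

```python
import ast

import pandas as pd

# Toy rows mimicking the preview; 'item_price' and its values are assumed.
sample = pd.DataFrame({
    "booking_id": ["b1", "b2"],
    "product_metadata": [
        "[{'product_id': 54728, 'quantity': 1, 'item_price': 1000}]",
        "[{'product_id': 16193, 'quantity': 2, 'item_price': 500}, "
        "{'product_id': 53686, 'quantity': 4, 'item_price': 250}]",
    ],
})


def explode_products(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per purchased item, keyed by booking_id (hypothetical helper)."""
    # Safely evaluate each string into a list of dicts (no arbitrary code execution).
    parsed = df["product_metadata"].apply(ast.literal_eval)
    # One row per list element, with a fresh 0..n-1 index.
    exploded = df[["booking_id"]].assign(item=parsed).explode("item", ignore_index=True)
    # Expand each dict into its own columns.
    items = pd.json_normalize(exploded["item"].tolist())
    return pd.concat([exploded[["booking_id"]], items], axis=1)


items_df = explode_products(sample)
print(items_df)
```

`ast.literal_eval` is used rather than `eval` because it only accepts literals, which is safer for data read from a file.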


In [4]:
transaction_df.dtypes

created_at                 object
customer_id                 int64
booking_id                 object
session_id                 object
product_metadata           object
payment_method             object
payment_status             object
promo_amount                int64
promo_code                 object
shipment_fee                int64
shipment_date_limit        object
shipment_location_lat     float64
shipment_location_long    float64
total_amount                int64
dtype: object
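
The `dtypes` output shows `created_at` and `shipment_date_limit` stored as `object` (strings), which would limit time-based analysis. Below is a minimal sketch of converting them to timezone-aware datetimes, using two rows copied from the preview. The `delivery_window` column is a name introduced here for illustration only.

```python
import pandas as pd

# Two rows copied from the head() preview.
sample = pd.DataFrame({
    "created_at": ["2018-07-29T15:22:01.458193Z", "2018-07-30T12:40:22.365620Z"],
    "shipment_date_limit": ["2018-08-03T05:07:24.812676Z", "2018-08-03T01:29:03.415705Z"],
})

# Parse the ISO 8601 strings; utc=True yields timezone-aware UTC timestamps.
for col in ["created_at", "shipment_date_limit"]:
    sample[col] = pd.to_datetime(sample[col], utc=True)

# Time between order creation and the promised latest delivery date.
sample["delivery_window"] = sample["shipment_date_limit"] - sample["created_at"]
print(sample.dtypes)
```

The same loop could be applied to `transaction_df` before any resampling or grouping by date.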

In [5]:
transaction_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 852584 entries, 0 to 852583
Data columns (total 14 columns):
 #   Column                  Non-Null Count   Dtype  
---  ------                  --------------   -----  
 0   created_at              852584 non-null  object 
 1   customer_id             852584 non-null  int64  
 2   booking_id              852584 non-null  object 
 3   session_id              852584 non-null  object 
 4   product_metadata        852584 non-null  object 
 5   payment_method          852584 non-null  object 
 6   payment_status          852584 non-null  object 
 7   promo_amount            852584 non-null  int64  
 8   promo_code              326536 non-null  object 
 9   shipment_fee            852584 non-null  int64  
 10  shipment_date_limit     852584 non-null  object 
 11  shipment_location_lat   852584 non-null  float64
 12  shipment_location_long  852584 non-null  float64
 13  total_amount            852584 non-null  int64  
dtypes: float64(2), int64