This project aims to gain consumer insight into a multi-category online store through Exploratory Data Analysis (EDA). Several methods are used to examine the data from different angles and to critically validate the analysis.
E-commerce is a business model that lets firms and individuals buy and sell goods over the internet.
Behavioral data is data generated by, or in response to, a customer’s engagement with a business.
The dataset contains behavioral data, collected by the Open CDP project, from a large multi-category online store in October 2019. Each row in the file represents one event. Every event relates a product to a user, so the events form a many-to-many relation between products and users. Source Data
o event_time: Time when the event happened (in UTC).
o event_type: Kind of event, such as view, cart, or purchase.
o product_id: ID of a product
o category_id: Product's category ID
o category_code: Product's category taxonomy (code name), when one could be assigned. Usually present for meaningful categories and omitted for various kinds of accessories.
o brand: Lowercase string of the brand name. Can be missing.
o price: Price of the product as a float. Always present.
o user_id: Permanent user ID.
o user_session: Temporary session ID of the user. It stays the same throughout a session and changes every time the user comes back to the online store after a long pause.
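The column layout above can be loaded with pandas roughly as follows. This is a minimal sketch, not the project's actual loading code: the helper name `load_events` and the in-memory sample rows are made up for illustration, and it assumes that timestamps carry a trailing " UTC" suffix.

```python
import io
import pandas as pd

# Tiny in-memory sample using the columns described above (all values are made up).
SAMPLE_CSV = """event_time,event_type,product_id,category_id,category_code,brand,price,user_id,user_session
2019-10-01 00:00:00 UTC,view,1001,2001,electronics.smartphone,samsung,250.00,1,s1
2019-10-01 00:05:00 UTC,cart,1001,2001,electronics.smartphone,samsung,250.00,1,s1
2019-10-02 10:00:00 UTC,purchase,1002,2002,,,12.50,2,s2
"""

def load_events(source):
    """Read an events CSV and parse event_time as a timezone-aware UTC timestamp."""
    df = pd.read_csv(source)
    # Assumption: timestamps look like "2019-10-01 00:00:00 UTC".
    # Drop the suffix, then let pandas localise the values to UTC.
    cleaned = df["event_time"].str.replace(" UTC", "", regex=False)
    df["event_time"] = pd.to_datetime(cleaned, utc=True)
    return df

events = load_events(io.StringIO(SAMPLE_CSV))
```

For the real file, a path would be passed in place of the `io.StringIO` object.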
- How do visitors/customers behave towards the goods available in the online store in October 2019?
- Which items were viewed most by visitors/customers in October 2019?
- Which brand of goods was purchased most by visitors/customers in October 2019?
- What are the maximum, minimum, and average prices of goods in the online store in October 2019?
- How do visitors behave at a specific time of the day?
- What is the best product to sell at a specific time of the day?
- How does the behavior of repeat users differ from that of new users?
Missing values appear as NaN in several rows and columns.
The 'category_code' column has the most missing values (1,127,141), followed by 'brand' (524,374). These rows are not deleted, so that the price distribution can be assessed accurately.
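A per-column count of missing values like the one reported here can be obtained with `isna().sum()`. The sketch below uses a small made-up frame, not the real data, and keeps every row so that price statistics are still computed over all events.

```python
import numpy as np
import pandas as pd

# Made-up sample; only the column names follow the dataset description.
events = pd.DataFrame({
    "category_code": ["electronics.smartphone", np.nan, np.nan, "appliances.kitchen.kettle"],
    "brand": ["samsung", np.nan, "apple", "bosch"],
    "price": [250.0, 12.5, 900.0, 30.0],
})

# Count NaN per column, largest first.
missing = events.isna().sum().sort_values(ascending=False)

# Rows with missing category_code or brand are kept, so all prices remain available.
prices_kept = events["price"].count()
```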
In general, visitors to the online shop tend to view goods or add items to the cart without necessarily buying them. Visitors/customers who bought goods in October 2019 totaled 58,402 (1.63%).
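One way to compute a buyer share like this is sketched below. It assumes the percentage is taken over distinct users, which the text does not state explicitly; the sample events are made up.

```python
import pandas as pd

# Made-up sample events.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 3, 3, 4],
    "event_type": ["view", "cart", "purchase", "view", "view", "cart", "view"],
})

# Distinct users overall, and distinct users with at least one purchase.
total_users = events["user_id"].nunique()
buyers = events.loc[events["event_type"] == "purchase", "user_id"].nunique()

# Assumption: the share is buyers divided by all distinct users.
buyer_share = buyers / total_users
```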
The most viewed category in the online shop in October 2019 was electronics.smartphone, with 2,058,075 views.
Samsung was the brand most in demand among customers in October 2019, with 18,409 purchases.
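Rankings of this kind can be produced by filtering on `event_type` and counting values. The following is a sketch on made-up rows, not the project's actual code.

```python
import pandas as pd

# Made-up sample events.
events = pd.DataFrame({
    "event_type": ["view", "view", "view", "purchase", "purchase", "purchase"],
    "category_code": [
        "electronics.smartphone", "electronics.smartphone", "electronics.audio.headphone",
        "electronics.smartphone", "electronics.smartphone", "electronics.audio.headphone",
    ],
    "brand": ["samsung", "apple", "sony", "samsung", "samsung", "sony"],
})

# Views per category and purchases per brand; value_counts skips NaN by default.
views_by_category = events.loc[events["event_type"] == "view", "category_code"].value_counts()
purchases_by_brand = events.loc[events["event_type"] == "purchase", "brand"].value_counts()

most_viewed_category = views_by_category.idxmax()
most_purchased_brand = purchases_by_brand.idxmax()
```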
The highest item price in the Online Store in October 2019 was 2574.07.
The lowest item price in the Online Store in October 2019 was 0.79.
The average price of goods in the Online Store in October 2019 was 300.0685.
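The three price figures can be computed in one call with `agg`. The sketch below uses made-up prices. It is an assumption that the reported figures were computed over event rows; computing them over distinct products gives a different mean, so both variants are shown.

```python
import pandas as pd

# Made-up sample: product 1 appears in two events.
events = pd.DataFrame({
    "product_id": [1, 1, 2, 3],
    "price": [10.0, 10.0, 20.0, 60.0],
})

# Statistics over all event rows (popular products weigh more).
price_stats = events["price"].agg(["max", "min", "mean"])

# Alternative: one row per product, so each product counts once.
per_product_stats = events.drop_duplicates("product_id")["price"].agg(["max", "min", "mean"])
```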
Most consumers shop on Wednesday. They typically start viewing products on Tuesday, then spend time considering whether to buy and comparing one product with another. The final decision to buy is mostly made on Wednesday. The e-commerce business team can therefore run a campaign to convince customers and intensify it on Wednesday.
electronics.smartphone was the category_code most in demand at a specific time by customers in October 2019, with 13,333 purchases on Wednesday.
Samsung was the brand most in demand at a specific time by customers in October 2019, with 6,269 purchases on Wednesday.
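A weekday breakdown like the one above can be sketched by deriving the day name from `event_time` and counting purchases per day. The sample rows are made up, and the column name `weekday` is introduced only for this illustration.

```python
import pandas as pd

# Made-up sample; 2019-10-01 was a Tuesday.
events = pd.DataFrame({
    "event_time": pd.to_datetime(
        ["2019-10-01 09:00", "2019-10-02 10:00", "2019-10-02 15:00", "2019-10-03 12:00"],
        utc=True,
    ),
    "event_type": ["view", "purchase", "purchase", "purchase"],
    "brand": ["samsung", "samsung", "samsung", "apple"],
})

# Keep purchases only and label each with its weekday name.
purchases = events[events["event_type"] == "purchase"].copy()
purchases["weekday"] = purchases["event_time"].dt.day_name()

# Purchases per weekday, then the top brand on the busiest day.
purchases_by_day = purchases["weekday"].value_counts()
busiest_day = purchases_by_day.idxmax()
brands_on_busiest_day = purchases.loc[purchases["weekday"] == busiest_day, "brand"].value_counts()
```

The same pattern applies to category_code, and to hours of the day by using `dt.hour` instead of `dt.day_name()`.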
Disclaimer:
A user is defined here as a customer who carted or purchased a product (such events are labelled "not_view").
New customer: not_view_count is exactly one
Repeat customer: not_view_count is more than one
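The definitions above can be sketched as follows. The name `not_view_count` comes from the text. The sample data and the `customer_type` name are made up, and treating every non-view event as "not_view" is an assumption.

```python
import pandas as pd

# Made-up sample events.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2, 3],
    "event_type": ["view", "purchase", "cart", "purchase", "view", "view"],
})

# Assumption: "not_view" covers every event that is not a view (cart or purchase).
not_view = events[events["event_type"] != "view"]

# Number of not_view events per user; users with only views do not appear.
not_view_count = not_view.groupby("user_id").size()

# Exactly one not_view event means new, more than one means repeat.
customer_type = not_view_count.map(lambda n: "new" if n == 1 else "repeat")
```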
The difference between new users and repeat users is that new users tend to purchase a product right away rather than adding it to their cart first. This is because new users open the e-commerce site with the sole aim of purchasing a product. Repeat users, by contrast, often open the site just to see the latest products. If they like a product, they tend to add it to the cart first, then consider whether to buy it and compare it with other products. If the product meets their expectations, they buy it.
Most new users buy electronics smartphone products. The e-commerce business team can run promotions for this product category to attract more new users.
Most repeat users also buy electronics smartphone products, while electronics audio headphones trail by a very large margin. The e-commerce business team can bundle electronics smartphone products with electronics audio headphones to increase profits from product categories that still have few orders.