**Data Collection Section of the Report**

**Introduction:**
This section describes the data collection process for Project 1, which focuses on Sales Trends and Customer Insights in the e-commerce business. The project belongs to the Descriptive Analysis phase, which aims to give a clear picture of past events and trends in the e-commerce dataset. Specifically, we gather order-level data and derive recency from it, a key input to RFM (Recency, Frequency, Monetary) analysis.

**About the Dataset:**
The dataset used for this analysis is `bigquery-public-data.thelook_ecommerce`. It is stored on Google BigQuery, a powerful cloud-based data warehouse. The dataset contains various tables that provide detailed information about e-commerce transactions, including orders, order items, and user profiles. Here are some key tables from the dataset:

- `orders`: Contains information about individual orders, including order IDs, order dates, and user IDs.
- `order_items`: Provides details about the items within each order, including product IDs, sale prices, and associated order IDs.
- `users`: Contains user profiles, including gender, age, and other demographic information.

**Data Collection Query:**
To perform the analysis for Project 1, we utilized a SQL query to extract relevant data from the dataset. The query involves the calculation of recency, a key component for understanding customer behavior and retention. Below is the query used for data collection:


In [1]:
import os
from google.cloud import bigquery

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "C:/Users/ramdh/AppData/Roaming/gcloud/application_default_credentials.json"
os.environ["GOOGLE_CLOUD_PROJECT"] = "fluted-gateway-306407"


client = bigquery.Client()

In [2]:
# BigQuery SQL query
sql_query = """
WITH LastPurchaseDates AS (
  SELECT
    user_id,
    MAX(created_at) AS last_purchase_date
  FROM
    `bigquery-public-data.thelook_ecommerce.orders`
  GROUP BY
    user_id
)

SELECT
  o.order_id,
  o.created_at AS order_date,
  o.num_of_item AS num_of_item,
  o.status AS order_status,
  oi.product_id,
  oi.sale_price,
  u.gender,
  u.age,
  DATE_DIFF(o.created_at, lpd.last_purchase_date, DAY) AS recency_days
FROM
  `bigquery-public-data.thelook_ecommerce.orders` AS o
JOIN
  `bigquery-public-data.thelook_ecommerce.order_items` AS oi
ON
  o.order_id = oi.order_id
JOIN
  `bigquery-public-data.thelook_ecommerce.users` AS u
ON
  o.user_id = u.id
LEFT JOIN
  LastPurchaseDates AS lpd
ON
  o.user_id = lpd.user_id
"""

# Run the query
query_job = client.query(sql_query)

# Fetch the results
results = query_job.result()


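# Convert the results to a pandas DataFrame
df = results.to_dataframe()

# Save to disk. Calling to_csv() with no path only returns the CSV text as a string;
# the filename below is a placeholder, not the one used in the original run.
df.to_csv("orders_recency.csv")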

**Explanation of the Query:**
- We begin by creating a Common Table Expression (CTE) named `LastPurchaseDates` to find the last purchase date for each user. This is essential for calculating recency.
- The main query joins multiple tables, including `orders`, `order_items`, and `users`, to gather relevant information about each order, such as order date, product details, and user demographics.
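- Within the main query, we use the `DATE_DIFF` function to compute the number of days between each order date and the user's last purchase date. Because the order date is the first argument, the result is the order date minus the last purchase date: `0` for a user's most recent order and negative for every earlier order (for example, `-779` means the order was placed 779 days before that user's last purchase). The value therefore describes how far each order sits from the customer's latest activity, not how long ago the customer last purchased relative to today.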
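
For RFM, recency is more commonly measured once per customer, as the number of days between the last purchase and a fixed reference date. The block below is a minimal pandas sketch of that variant, not the method used in the query above. The toy `orders` frame, the `user_id` column (which the query does not select), and the choice of the latest order date as the reference date are all assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the query output (hypothetical values).
# user_id is included for illustration; the query above does not select it.
orders = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "order_date": pd.to_datetime([
        "2023-01-10", "2023-06-01", "2022-11-20",
        "2021-03-05", "2022-08-15", "2023-09-01",
    ], utc=True),
})

# Assumption: the reference (snapshot) date is the latest order date in the data.
snapshot = orders["order_date"].max()

# Last purchase per user, then whole days between it and the snapshot.
last_purchase = orders.groupby("user_id")["order_date"].max()
recency_days = (snapshot - last_purchase).dt.days.rename("recency_days")

print(recency_days)
```

With this definition every value is zero or positive, and a larger number means a longer gap since the customer's last order.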

**Purpose of Recency Calculation:**
Recency is a crucial metric for RFM analysis. It helps us understand how recently customers have interacted with the e-commerce platform. By calculating recency, we can segment customers based on their activity levels, identify loyal and inactive customers, and tailor marketing strategies to re-engage dormant customers. This analysis enables the e-commerce business to optimize customer retention and maximize revenue.
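
As a rough illustration of recency-based segmentation, the sketch below buckets per-customer recency values with `pandas.cut`. The recency values, the cut-offs (30 and 180 days), and the segment names are hypothetical choices for this example and are not derived from the dataset.

```python
import pandas as pd

# Hypothetical per-user recency in days, indexed by a made-up user id.
recency_days = pd.Series(
    [3, 45, 120, 400, 0, 200],
    index=[101, 102, 103, 104, 105, 106],
    name="recency_days",
)

# Illustrative cut-offs: 0-30 days active, 31-180 at risk, over 180 dormant.
bins = [-1, 30, 180, float("inf")]
labels = ["active", "at_risk", "dormant"]

# Intervals are right-closed, so 30 falls in "active" and 180 in "at_risk".
segment = pd.cut(recency_days, bins=bins, labels=labels)

print(segment.value_counts())
```

In practice the thresholds would be chosen from the observed recency distribution or from business rules, for instance by using quantiles instead of fixed day counts.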

The subsequent sections of this report present the findings derived from this data and give a narrative account of past sales trends and customer behavior.

In [5]:
df

Unnamed: 0,order_id,order_date,num_of_item,order_status,product_id,sale_price,gender,age,recency_days
0,47594,2022-05-07 06:56:00+00:00,1,Shipped,13606,2.50,F,17,0
1,43908,2021-01-12 14:33:00+00:00,1,Complete,13606,2.50,F,25,-779
2,45959,2023-04-14 16:28:00+00:00,3,Complete,13606,2.50,F,54,0
3,60468,2023-05-28 14:59:00+00:00,4,Complete,13606,2.50,F,55,0
4,77057,2021-09-02 07:05:00+00:00,2,Complete,13606,2.50,F,17,0
...,...,...,...,...,...,...,...,...,...
181222,111053,2023-01-03 12:22:00+00:00,1,Returned,3449,9.82,F,16,0
181223,104251,2023-03-22 17:56:00+00:00,1,Cancelled,3449,9.82,F,62,0
181224,1583,2023-09-11 10:09:00+00:00,1,Processing,3449,9.82,F,22,0
181225,3856,2023-05-28 06:24:00+00:00,1,Processing,3449,9.82,F,60,0


todo:
- add shipping days
- add user_id

In [6]:
df['recency_days'].describe()

count      181227.0
mean    -109.315317
std      232.187386
min         -1639.0
25%           -91.0
50%             0.0
75%             0.0
max             0.0
Name: recency_days, dtype: Float64


There are several possible reasons why the number of users of a product or service may increase while purchase frequency and average order value (AOV) remain flat. Some of the most likely explanations include:

- New users are not as engaged as existing users. New users may be less likely to use a product or service frequently or to spend a lot of money on it, especially if they are still trying it out or learning how to use it.
- The product or service is no longer meeting the needs of existing users. If the product or service is not evolving or improving, existing users may become bored or frustrated and start to use it less often or to spend less money on it.
- There is increased competition. If there are other products or services that are similar to yours and that are offering better value or a better user experience, then your users may be switching to those products or services.
- The overall economy is struggling. If people have less money to spend, they may cut back on non-essential goods and services.
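
To check whether this pattern is actually present, the three quantities can be computed side by side for each period: distinct users, orders per user (frequency), and revenue per order (AOV). The sketch below assumes a hypothetical order-level table with one row per order; the column names and values are invented for illustration and do not come from the dataset.

```python
import pandas as pd

# Hypothetical order-level data: one row per order with its total value.
orders = pd.DataFrame({
    "user_id":     [1, 2, 1, 2, 3, 4],
    "year":        [2022, 2022, 2023, 2023, 2023, 2023],
    "order_value": [40.0, 60.0, 50.0, 50.0, 45.0, 55.0],
})

# Per-year totals: distinct users, number of orders, and revenue.
summary = orders.groupby("year").agg(
    users=("user_id", "nunique"),
    orders=("user_id", "size"),
    revenue=("order_value", "sum"),
)

# Frequency = orders per user; AOV = revenue per order.
summary["orders_per_user"] = summary["orders"] / summary["users"]
summary["aov"] = summary["revenue"] / summary["orders"]

print(summary)
```

In this toy data the user count doubles from 2022 to 2023 while orders per user and AOV stay the same, which is the situation described above. Splitting the same summary by new versus returning users would be one way to test the first explanation.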