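Why a separate `estimated_delivery_date` column was created
- The estimated delivery date is always filled in, even for orders that were never approved (`order_approved_at` is NULL) or were canceled.
- It can therefore be read as a value generated **at order creation time**. It is an attribute of the order, not a status event, so it is kept as its own column instead of being melted into the status timeline.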
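The observation above can be checked directly against `orders`. The sketch below is a minimal example that assumes the column names of `olist_orders_dataset.csv`. `count_missing_estimates` is a hypothetical helper, and the three-row `toy` frame is made-up data, not rows from the dataset.

```python
import pandas as pd


def count_missing_estimates(orders: pd.DataFrame) -> tuple[int, int]:
    """Hypothetical helper: (rows checked, rows missing an estimated delivery date)."""
    never_approved = orders["order_approved_at"].isna()  # approval timestamp is NULL
    canceled = orders["order_status"] == "canceled"
    subset = orders[never_approved | canceled]
    n_missing = int(subset["order_estimated_delivery_date"].isna().sum())
    return len(subset), n_missing


# Made-up rows for illustration only
toy = pd.DataFrame({
    "order_id": ["a", "b", "c"],
    "order_status": ["delivered", "canceled", "created"],
    "order_approved_at": ["2017-01-01 10:00:00", "2017-01-02 11:00:00", None],
    "order_estimated_delivery_date": ["2017-01-20", "2017-01-25", "2017-01-30"],
})

print(count_missing_estimates(toy))  # (2, 0)
```

If the observation holds, calling `count_missing_estimates(orders)` on the real data should return 0 as its second value.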

In [1]:
import pandas as pd
import os

In [2]:
REDEFINED_DIR = "../../downloads/olist_redefined"
os.makedirs(REDEFINED_DIR, exist_ok=True)

In [3]:
orders = pd.read_csv("../../downloads/olist/olist_orders_dataset.csv")

In [4]:
estd_date = orders[['order_id', 'order_estimated_delivery_date']].drop_duplicates()
estd_date = estd_date.rename(columns={'order_estimated_delivery_date': 'estimated_delivery_date'})

In [5]:
melted_orders = pd.melt(
    orders,
    id_vars=['order_id', 'customer_id'],  # identifier columns kept on every row
    value_vars=['order_purchase_timestamp', 'order_approved_at',
                'order_delivered_carrier_date', 'order_delivered_customer_date',
                ],  # columns to unpivot: one per status event (the estimated delivery date is merged back later)
    var_name='status',
    value_name='timestamp'  # each value is the time the order reached that status
)

# Strip prefix and suffix: 'order_purchase_timestamp' -> 'purchase', 'order_delivered_carrier_date' -> 'delivered_carrier'
melted_orders['status'] = melted_orders['status'].apply(lambda x: "_".join(x.split("_")[1:-1]))
melted_orders['timestamp'] = pd.to_datetime(melted_orders['timestamp'])
# Drop events with no recorded timestamp (NaT), e.g. orders that were never approved or never delivered
melted_orders = melted_orders.dropna().sort_values(['order_id', 'timestamp'])
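To show what the melt does to individual orders, here is the same wide-to-long transformation applied to two made-up orders (hypothetical data in the Olist column layout). Order `o2` was never approved, so after `dropna()` only its `purchase` event remains.

```python
import pandas as pd

# Made-up orders in the same wide layout as olist_orders_dataset.csv
wide = pd.DataFrame({
    "order_id": ["o1", "o2"],
    "customer_id": ["c1", "c2"],
    "order_purchase_timestamp": ["2017-09-13 08:00:00", "2017-09-14 10:00:00"],
    "order_approved_at": ["2017-09-13 09:00:00", None],  # o2 was never approved
    "order_delivered_carrier_date": ["2017-09-15 12:00:00", None],
    "order_delivered_customer_date": ["2017-09-18 17:00:00", None],
})

event_cols = [
    "order_purchase_timestamp", "order_approved_at",
    "order_delivered_carrier_date", "order_delivered_customer_date",
]
long_df = wide.melt(
    id_vars=["order_id", "customer_id"],
    value_vars=event_cols,
    var_name="status",
    value_name="timestamp",
)
# Same renaming rule as the notebook: drop the "order_" prefix and the last token
long_df["status"] = long_df["status"].apply(lambda x: "_".join(x.split("_")[1:-1]))
long_df["timestamp"] = pd.to_datetime(long_df["timestamp"])  # None becomes NaT
long_df = long_df.dropna(subset=["timestamp"]).sort_values(["order_id", "timestamp"])
print(long_df[["order_id", "status", "timestamp"]])
```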

In [6]:
melted_orders.shape

(392856, 4)
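The row count follows from the melt-then-`dropna()` logic. Each order contributes one row per non-null event timestamp, so the long table has 4 × (number of orders) − (number of NULL cells in the four event columns) rows. The following is a sketch of that check. `expected_long_rows` is a hypothetical helper, and the frame contains made-up data.

```python
import pandas as pd

EVENT_COLS = [
    "order_purchase_timestamp", "order_approved_at",
    "order_delivered_carrier_date", "order_delivered_customer_date",
]


def expected_long_rows(wide: pd.DataFrame) -> int:
    """Hypothetical helper: rows left after melting EVENT_COLS and dropping NULLs."""
    # Each non-null event cell becomes exactly one row of the long table
    return int(wide[EVENT_COLS].notna().sum().sum())


# Made-up orders: first fully delivered, second shipped but not delivered, third never approved
toy = pd.DataFrame({
    "order_purchase_timestamp": ["2017-01-01 10:00:00", "2017-02-01 10:00:00", "2017-03-01 10:00:00"],
    "order_approved_at": ["2017-01-01 11:00:00", "2017-02-01 11:00:00", None],
    "order_delivered_carrier_date": ["2017-01-03 09:00:00", "2017-02-03 09:00:00", None],
    "order_delivered_customer_date": ["2017-01-07 15:00:00", None, None],
})

n_nulls = int(toy[EVENT_COLS].isna().sum().sum())
print(expected_long_rows(toy), len(EVENT_COLS) * len(toy) - n_nulls)  # 8 8
```

Applied to `orders`, `expected_long_rows(orders)` should equal `melted_orders.shape[0]`.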

In [7]:
melted_orders.head()

| | order_id | customer_id | status | timestamp |
|---|---|---|---|---|
| 85267 | 00010242fe8c5a6d1ba2dd792cb16214 | 3ce436f183e68e07877b285a838db11a | purchase | 2017-09-13 08:59:02 |
| 184708 | 00010242fe8c5a6d1ba2dd792cb16214 | 3ce436f183e68e07877b285a838db11a | approved | 2017-09-13 09:45:35 |
| 284149 | 00010242fe8c5a6d1ba2dd792cb16214 | 3ce436f183e68e07877b285a838db11a | delivered_carrier | 2017-09-19 18:34:16 |
| 383590 | 00010242fe8c5a6d1ba2dd792cb16214 | 3ce436f183e68e07877b285a838db11a | delivered_customer | 2017-09-20 23:43:48 |
| 71853 | 00018f77f2f0320c557190d7a144bdd3 | f6dd3ec061db4e3987629fe6b26e5cce | purchase | 2017-04-26 10:53:06 |

In [8]:
result = melted_orders.merge(estd_date, on='order_id', how='left')
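Because `estd_date` has one row per `order_id`, this left merge is many-to-one. pandas can enforce that with the `validate` argument and can flag unmatched keys with `indicator=True`. The sketch below uses small made-up frames in place of `melted_orders` and `estd_date`:

```python
import pandas as pd
from pandas.errors import MergeError

# Made-up stand-ins for melted_orders and estd_date
events = pd.DataFrame({
    "order_id": ["o1", "o1", "o2"],
    "status": ["purchase", "approved", "purchase"],
    "timestamp": pd.to_datetime(["2017-09-13 08:00:00", "2017-09-13 09:00:00", "2017-09-14 10:00:00"]),
})
estd = pd.DataFrame({
    "order_id": ["o1", "o2"],
    "estimated_delivery_date": ["2017-09-29 00:00:00", "2017-10-02 00:00:00"],
})

# validate="many_to_one" raises MergeError if estd had a duplicate order_id
merged = events.merge(estd, on="order_id", how="left", validate="many_to_one", indicator=True)
# "_merge" is "both" when an event row found its order's estimate
all_matched = bool((merged["_merge"] == "both").all())
merged = merged.drop(columns="_merge")

# A duplicated key on the right-hand side is rejected instead of silently duplicating events
try:
    events.merge(pd.concat([estd, estd.head(1)]), on="order_id", how="left", validate="many_to_one")
    duplicate_rejected = False
except MergeError:
    duplicate_rejected = True

print(all_matched, duplicate_rejected, len(merged))  # True True 3
```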

In [9]:
result.sort_values(['timestamp']).to_csv(f"{REDEFINED_DIR}/order_status.tsv", index=False, sep='\t')
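When the TSV is loaded in a later notebook, its datetime columns come back as plain strings unless they are parsed explicitly. The following is a minimal round-trip sketch. It writes a made-up two-row frame to a temporary directory instead of `REDEFINED_DIR`.

```python
import os
import tempfile

import pandas as pd

# Made-up frame with the same columns as `result`
result = pd.DataFrame({
    "order_id": ["o1", "o1"],
    "customer_id": ["c1", "c1"],
    "status": ["purchase", "approved"],
    "timestamp": pd.to_datetime(["2017-09-13 08:00:00", "2017-09-13 09:00:00"]),
    "estimated_delivery_date": ["2017-09-29 00:00:00", "2017-09-29 00:00:00"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "order_status.tsv")
    result.sort_values(["timestamp"]).to_csv(path, index=False, sep="\t")
    # parse_dates turns both date columns back into datetime64 instead of object (str)
    loaded = pd.read_csv(path, sep="\t", parse_dates=["timestamp", "estimated_delivery_date"])

print(loaded.dtypes)
```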