# 🚩 DataFrame Date/Time Types
## Key Topics
1. Datetime data types
2. Formatting & Parsing
3. Time Delta
4. Datetime index
5. Shifting & Aggregating
## Objectives
1. Understand nuances of the datetime types in base Python and Pandas
2. Apply custom date formats and extract datetime components from datetime data
3. Access portions of time series data and offset time series for period-by-period comparison
4. Create custom time periods and reshape data to fit periods of interest

In [1]:
import pandas as pd
import numpy as np

## 1. Times in Python and Pandas
### > Python
- Date Components + Time Components

In [2]:
from datetime import datetime

now = datetime.now()
now

datetime.datetime(2023, 8, 2, 18, 18, 1, 828975)

In [3]:
type(now)

datetime.datetime
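
To make the "date components + time components" idea concrete, here is a minimal sketch that splits a base Python datetime into its two halves. The fixed timestamp is an assumption chosen to match the output shown above, so the result is reproducible (datetime.now() changes on every run).

```python
from datetime import datetime

# Fixed timestamp (assumed for illustration) instead of datetime.now()
ts = datetime(2023, 8, 2, 18, 18, 1, 828975)

date_part = ts.date()  # date components: year, month, day
time_part = ts.time()  # time components: hour, minute, second, microsecond

print(date_part)  # 2023-08-02
print(time_part)  # 18:18:01.828975

# Individual components are plain integer attributes
print(ts.year, ts.month, ts.day)
print(ts.hour, ts.minute, ts.second, ts.microsecond)
```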

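### > Pandas
- astype method
    - Converts strings to datetimes, but raises an error if any value cannot be recognized as a date by Pandas
- to_datetime method (RECOMMENDED)
    1. errors='coerce': values that cannot be parsed become NaT instead of raising an error
    2. infer_datetime_format=True: infers the format from the data (deprecated since pandas 2.0, where this inference is the default behavior)
    3. format: an explicit format string built from the date and time codes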
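
The difference between the two methods can be sketched on a small series. The sample values below are hypothetical (they are not from the retail data) and include one string that is not a date, which is the case where astype fails and to_datetime with errors='coerce' does not.

```python
import pandas as pd

# Hypothetical sample: the last value is not a valid date
raw = pd.Series(["2016-01-01", "2016-01-02", "not a date"])

# astype raises because one value cannot be parsed
try:
    raw.astype("datetime64[ns]")
    astype_failed = False
except (ValueError, TypeError):
    astype_failed = True
print("astype failed:", astype_failed)

# errors='coerce' replaces the unparseable value with NaT
coerced = pd.to_datetime(raw, errors="coerce")
print(coerced)

# An explicit format removes any guessing about how the strings are laid out
explicit = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")
print(explicit.isna().sum(), "value(s) could not be parsed")
```

Counting the NaT values after coercion is a quick way to see how many rows failed to parse before deciding whether to drop or repair them.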

In [4]:
retail = pd.read_csv('./data/retail/retail_2016_2017.csv')
retail.head()

Unnamed: 0,id,date,store_nbr,family,sales,onpromotion
0,1945944,2016-01-01,1,AUTOMOTIVE,0.0,0
1,1945945,2016-01-01,1,BABY CARE,0.0,0
2,1945946,2016-01-01,1,BEAUTY,0.0,0
3,1945947,2016-01-01,1,BEVERAGES,0.0,0
4,1945948,2016-01-01,1,BOOKS,0.0,0


In [5]:
retail.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1054944 entries, 0 to 1054943
Data columns (total 6 columns):
 #   Column       Non-Null Count    Dtype  
---  ------       --------------    -----  
 0   id           1054944 non-null  int64  
 1   date         1054944 non-null  object 
 2   store_nbr    1054944 non-null  int64  
 3   family       1054944 non-null  object 
 4   sales        1054944 non-null  float64
 5   onpromotion  1054944 non-null  int64  
dtypes: float64(1), int64(3), object(2)
memory usage: 167.8 MB


In [6]:
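retail['date'] = pd.to_datetime(
    retail['date'],
    errors='coerce',  # unparseable values become NaT instead of raising
    infer_datetime_format=True  # deprecated since pandas 2.0 (inference is now the default)
)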

retail.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1054944 entries, 0 to 1054943
Data columns (total 6 columns):
 #   Column       Non-Null Count    Dtype         
---  ------       --------------    -----         
 0   id           1054944 non-null  int64         
 1   date         1054944 non-null  datetime64[ns]
 2   store_nbr    1054944 non-null  int64         
 3   family       1054944 non-null  object        
 4   sales        1054944 non-null  float64       
 5   onpromotion  1054944 non-null  int64         
dtypes: datetime64[ns](1), float64(1), int64(3), object(1)
memory usage: 108.4 MB


In [7]:
retail.head()

Unnamed: 0,id,date,store_nbr,family,sales,onpromotion
0,1945944,2016-01-01,1,AUTOMOTIVE,0.0,0
1,1945945,2016-01-01,1,BABY CARE,0.0,0
2,1945946,2016-01-01,1,BEAUTY,0.0,0
3,1945947,2016-01-01,1,BEVERAGES,0.0,0
4,1945948,2016-01-01,1,BOOKS,0.0,0


### > Date Codes
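- %D (date as %m/%d/%y; a C-library shorthand that may not be available on every platform)
- %Y (4-digit year)
- %y (2-digit year)
- %m (month; number)
- %B (month; full name)
- %b (month; abbreviated name)
- %d (day of month; number)
- %w (day of week; number)
    - 0-6 (Sunday-Saturday)
- %A (day of week; full name)
- %U (week of year; Sunday as the first day of the week)
- %j (day of year)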
### > Time Codes
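- %T (time as %H:%M:%S; a C-library shorthand that may not be available on every platform)
- %H (hour; 24-hour clock)
- %I (hour; 12-hour clock)
- %p (AM or PM)
- %M (minute)
- %S (second)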
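
As a quick reference, the codes listed above can be tried against a single fixed timestamp. This is a minimal sketch using base Python's strftime; the timestamp is an assumption chosen for illustration, and the platform-dependent shorthands %D and %T are left out on purpose. Name-based codes such as %A, %B, %b, and %p depend on the active locale.

```python
from datetime import datetime

# Assumed sample timestamp: 2016-01-01 (a Friday) at 14:05:09
ts = datetime(2016, 1, 1, 14, 5, 9)

date_codes = ["%Y", "%y", "%m", "%B", "%b", "%d", "%w", "%A", "%U", "%j"]
time_codes = ["%H", "%I", "%p", "%M", "%S"]

# Format the same timestamp once per code to see what each code produces
formatted = {code: ts.strftime(code) for code in date_codes + time_codes}

for code, text in formatted.items():
    print(code, "->", text)
```

Note that %U is 00 here: days before the first Sunday of the year fall in week 0.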
### > dt.strftime method
- Returns strings (object dtype), not datetimes

In [8]:
retail = pd.read_csv('./data/retail/retail_2016_2017.csv', parse_dates=['date'])
retail.head()

Unnamed: 0,id,date,store_nbr,family,sales,onpromotion
0,1945944,2016-01-01,1,AUTOMOTIVE,0.0,0
1,1945945,2016-01-01,1,BABY CARE,0.0,0
2,1945946,2016-01-01,1,BEAUTY,0.0,0
3,1945947,2016-01-01,1,BEVERAGES,0.0,0
4,1945948,2016-01-01,1,BOOKS,0.0,0


In [9]:
retail.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1054944 entries, 0 to 1054943
Data columns (total 6 columns):
 #   Column       Non-Null Count    Dtype         
---  ------       --------------    -----         
 0   id           1054944 non-null  int64         
 1   date         1054944 non-null  datetime64[ns]
 2   store_nbr    1054944 non-null  int64         
 3   family       1054944 non-null  object        
 4   sales        1054944 non-null  float64       
 5   onpromotion  1054944 non-null  int64         
dtypes: datetime64[ns](1), float64(1), int64(3), object(1)
memory usage: 48.3+ MB


In [10]:
retail['date'].dt.strftime('%Y').head()

0    2016
1    2016
2    2016
3    2016
4    2016
Name: date, dtype: object

In [11]:
retail['date'].dt.strftime('%y-%b-%d').head()

0    16-Jan-01
1    16-Jan-01
2    16-Jan-01
3    16-Jan-01
4    16-Jan-01
Name: date, dtype: object
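
Because dt.strftime returns strings, the result no longer behaves like a datetime column. The sketch below uses a small hypothetical series (not the retail data) to show that the .dt accessor is gone after formatting, and that the same format string can be passed to to_datetime to parse the labels back.

```python
import pandas as pd

# Hypothetical sample dates
dates = pd.Series(pd.to_datetime(["2016-01-01", "2016-02-15", "2017-12-31"]))

# Formatting produces plain Python strings
labels = dates.dt.strftime("%y-%b-%d")
print(labels)

# String values do not support the .dt accessor
try:
    labels.dt.year
    has_dt = True
except AttributeError:
    has_dt = False
print("labels support .dt:", has_dt)

# The same format string parses the labels back into datetimes
parsed_back = pd.to_datetime(labels, format="%y-%b-%d")
print(parsed_back)
```

A practical consequence is that formatting is best left to the last step, after any date arithmetic, filtering, or sorting has been done on the datetime column.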

### > Extract datetime components
- dt accessor
    - date / time → object
    - year, month, dayofweek / hour, minute, second → int
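
The return types listed above can be checked directly. This is a minimal sketch on a hypothetical two-row series that includes a time of day, since the retail dates carry no time component. One detail worth knowing: dt.dayofweek counts from Monday=0 to Sunday=6, which differs from the %w format code (Sunday=0).

```python
import pandas as pd
from datetime import date, time

# Hypothetical timestamps with both date and time parts
stamps = pd.Series(pd.to_datetime(["2016-01-01 08:30:00", "2017-06-15 21:45:10"]))

parts = pd.DataFrame({
    "date": stamps.dt.date,            # datetime.date objects (object dtype)
    "time": stamps.dt.time,            # datetime.time objects (object dtype)
    "year": stamps.dt.year,
    "month": stamps.dt.month,
    "dayofweek": stamps.dt.dayofweek,  # Monday=0 ... Sunday=6
    "hour": stamps.dt.hour,
    "minute": stamps.dt.minute,
    "second": stamps.dt.second,
})

print(parts)
print(parts.dtypes)
```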

In [12]:
retail.head(2)

Unnamed: 0,id,date,store_nbr,family,sales,onpromotion
0,1945944,2016-01-01,1,AUTOMOTIVE,0.0,0
1,1945945,2016-01-01,1,BABY CARE,0.0,0


In [13]:
retail.assign(
    year=retail['date'].dt.year,
    month=retail['date'].dt.month,
    day=retail['date'].dt.day
).head()

Unnamed: 0,id,date,store_nbr,family,sales,onpromotion,year,month,day
0,1945944,2016-01-01,1,AUTOMOTIVE,0.0,0,2016,1,1
1,1945945,2016-01-01,1,BABY CARE,0.0,0,2016,1,1
2,1945946,2016-01-01,1,BEAUTY,0.0,0,2016,1,1
3,1945947,2016-01-01,1,BEVERAGES,0.0,0,2016,1,1
4,1945948,2016-01-01,1,BOOKS,0.0,0,2016,1,1
