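# Working with Time Series Data in pandas
### - Index: the difference between ordinary tabular data and time series data in pandas
### - Ordinary table: the index can hold arbitrary values.
### - Time series data: the index is the following class -> DatetimeIndex (timestamps)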

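## DatetimeIndex
### - DatetimeIndex: an index for time series data in timestamp form, where each value is recorded at a specific moment. A timestamp index does not require the data to be evenly spaced.
### - Ways to create a DatetimeIndex
###      - the pd.to_datetime function
###      - the pd.date_range function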
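
The point that a DatetimeIndex need not be evenly spaced can be checked with a small sketch. The variable names (`irregular`, `regular`, `gaps`) and the dates are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

# Irregularly spaced timestamps: a gap of 3 days, then a gap of 6 days.
irregular = pd.to_datetime(["2016-01-01", "2016-01-04", "2016-01-10"])
# Evenly spaced daily timestamps, for comparison.
regular = pd.date_range("2016-01-01", periods=3)

# Both are DatetimeIndex objects; only the generated range carries a frequency.
print(type(irregular).__name__, irregular.freq)
print(type(regular).__name__, regular.freqstr)

# Differences between consecutive timestamps, in days.
gaps = list((irregular[1:] - irregular[:-1]).days)
print(gaps)
```

Here `to_datetime` simply converts whatever dates it is given, while `date_range` generates the dates from a rule, which is why only the second index has a `freq`.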

## to_datetime

In [2]:
import pandas as pd
import numpy as np

In [3]:
date_str = ["2016, 1, 1", "2016, 1, 4", "2016, 1, 5", "2016, 1, 6"]
idx = pd.to_datetime(date_str)
idx

DatetimeIndex(['2016-01-01', '2016-01-04', '2016-01-05', '2016-01-06'], dtype='datetime64[ns]', freq=None)

In [4]:
np.random.seed(0)
s = pd.Series(np.random.randn(4), index=idx)
s

2016-01-01    1.764052
2016-01-04    0.400157
2016-01-05    0.978738
2016-01-06    2.240893
dtype: float64
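
The strings above are not in a standard date layout, so the parser has to guess how to read them. One possible variation, shown here as a sketch rather than as what the notebook does, is to pass an explicit `format` so that the parsing does not depend on inference. The format string is an assumption chosen to match the sample strings.

```python
import pandas as pd

date_str = ["2016, 1, 1", "2016, 1, 4", "2016, 1, 5", "2016, 1, 6"]

# Spell out the layout explicitly: year, month, day separated by ", ".
idx = pd.to_datetime(date_str, format="%Y, %m, %d")

# Render the parsed dates as ISO strings to check the result.
labels = list(idx.strftime("%Y-%m-%d"))
print(labels)
```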

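## date_range
#### - Given a start date and an end date, or a start date and a number of periods, it generates a date/time index covering that range
#### - The freq argument sets the frequency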

In [5]:
pd.date_range("2016-4-1", "2016-4-30")

DatetimeIndex(['2016-04-01', '2016-04-02', '2016-04-03', '2016-04-04',
               '2016-04-05', '2016-04-06', '2016-04-07', '2016-04-08',
               '2016-04-09', '2016-04-10', '2016-04-11', '2016-04-12',
               '2016-04-13', '2016-04-14', '2016-04-15', '2016-04-16',
               '2016-04-17', '2016-04-18', '2016-04-19', '2016-04-20',
               '2016-04-21', '2016-04-22', '2016-04-23', '2016-04-24',
               '2016-04-25', '2016-04-26', '2016-04-27', '2016-04-28',
               '2016-04-29', '2016-04-30'],
              dtype='datetime64[ns]', freq='D')

In [7]:
pd.date_range(start="2016-4-1", periods=30)

DatetimeIndex(['2016-04-01', '2016-04-02', '2016-04-03', '2016-04-04',
               '2016-04-05', '2016-04-06', '2016-04-07', '2016-04-08',
               '2016-04-09', '2016-04-10', '2016-04-11', '2016-04-12',
               '2016-04-13', '2016-04-14', '2016-04-15', '2016-04-16',
               '2016-04-17', '2016-04-18', '2016-04-19', '2016-04-20',
               '2016-04-21', '2016-04-22', '2016-04-23', '2016-04-24',
               '2016-04-25', '2016-04-26', '2016-04-27', '2016-04-28',
               '2016-04-29', '2016-04-30'],
              dtype='datetime64[ns]', freq='D')

In [9]:
pd.date_range("2016-4-1", "2016-4-30", freq="B")

DatetimeIndex(['2016-04-01', '2016-04-04', '2016-04-05', '2016-04-06',
               '2016-04-07', '2016-04-08', '2016-04-11', '2016-04-12',
               '2016-04-13', '2016-04-14', '2016-04-15', '2016-04-18',
               '2016-04-19', '2016-04-20', '2016-04-21', '2016-04-22',
               '2016-04-25', '2016-04-26', '2016-04-27', '2016-04-28',
               '2016-04-29'],
              dtype='datetime64[ns]', freq='B')

In [10]:
pd.date_range("2016-4-1", "2016-12-31", freq="MS")

DatetimeIndex(['2016-04-01', '2016-05-01', '2016-06-01', '2016-07-01',
               '2016-08-01', '2016-09-01', '2016-10-01', '2016-11-01',
               '2016-12-01'],
              dtype='datetime64[ns]', freq='MS')

In [11]:
pd.date_range("2016-4-1", "2016-12-31", freq="M")

DatetimeIndex(['2016-04-30', '2016-05-31', '2016-06-30', '2016-07-31',
               '2016-08-31', '2016-09-30', '2016-10-31', '2016-11-30',
               '2016-12-31'],
              dtype='datetime64[ns]', freq='M')

In [12]:
pd.date_range("2016-4-1","2016-12-31", freq="BMS")

DatetimeIndex(['2016-04-01', '2016-05-02', '2016-06-01', '2016-07-01',
               '2016-08-01', '2016-09-01', '2016-10-03', '2016-11-01',
               '2016-12-01'],
              dtype='datetime64[ns]', freq='BMS')

In [13]:
pd.date_range("2016-4-1", "2016-12-31", freq="BM")

DatetimeIndex(['2016-04-29', '2016-05-31', '2016-06-30', '2016-07-29',
               '2016-08-31', '2016-09-30', '2016-10-31', '2016-11-30',
               '2016-12-30'],
              dtype='datetime64[ns]', freq='BM')

In [14]:
pd.date_range("2016-1-1", "2016-12-31", freq="W-MON")

DatetimeIndex(['2016-01-04', '2016-01-11', '2016-01-18', '2016-01-25',
               '2016-02-01', '2016-02-08', '2016-02-15', '2016-02-22',
               '2016-02-29', '2016-03-07', '2016-03-14', '2016-03-21',
               '2016-03-28', '2016-04-04', '2016-04-11', '2016-04-18',
               '2016-04-25', '2016-05-02', '2016-05-09', '2016-05-16',
               '2016-05-23', '2016-05-30', '2016-06-06', '2016-06-13',
               '2016-06-20', '2016-06-27', '2016-07-04', '2016-07-11',
               '2016-07-18', '2016-07-25', '2016-08-01', '2016-08-08',
               '2016-08-15', '2016-08-22', '2016-08-29', '2016-09-05',
               '2016-09-12', '2016-09-19', '2016-09-26', '2016-10-03',
               '2016-10-10', '2016-10-17', '2016-10-24', '2016-10-31',
               '2016-11-07', '2016-11-14', '2016-11-21', '2016-11-28',
               '2016-12-05', '2016-12-12', '2016-12-19', '2016-12-26'],
              dtype='datetime64[ns]', freq='W-MON')

In [15]:
pd.date_range("2016-1-1", "2016-12-31", freq="WOM-2THU")

DatetimeIndex(['2016-01-14', '2016-02-11', '2016-03-10', '2016-04-14',
               '2016-05-12', '2016-06-09', '2016-07-14', '2016-08-11',
               '2016-09-08', '2016-10-13', '2016-11-10', '2016-12-08'],
              dtype='datetime64[ns]', freq='WOM-2THU')

In [16]:
pd.date_range("2016-1-1", "2016-12-31", freq="Q-DEC")

DatetimeIndex(['2016-03-31', '2016-06-30', '2016-09-30', '2016-12-31'], dtype='datetime64[ns]', freq='Q-DEC')
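
The `freq` argument also accepts offset objects from `pd.offsets` in place of string aliases. In recent pandas releases several of the string aliases used above (for example "M" and "BM") have been deprecated in favor of new spellings, so the sketch below uses offset objects as one way to write the same ranges without relying on alias names. The variable names are illustrative only.

```python
import pandas as pd

# Business days in April 2016 (same range as freq="B" above).
business_days = pd.date_range("2016-04-01", "2016-04-30", freq=pd.offsets.BDay())

# First and last calendar day of each month, April through December 2016.
month_starts = pd.date_range("2016-04-01", "2016-12-31", freq=pd.offsets.MonthBegin())
month_ends = pd.date_range("2016-04-01", "2016-12-31", freq=pd.offsets.MonthEnd())

# Every Monday in 2016 (weekday=0 means Monday).
mondays = pd.date_range("2016-01-01", "2016-12-31", freq=pd.offsets.Week(weekday=0))

print(len(business_days), len(month_starts), len(month_ends), len(mondays))
```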

## The shift operation
#### Shifting by date

In [17]:
ts = pd.Series(np.random.randn(4), index=pd.date_range("2000-1-1", periods=4, freq="M"))
ts

2000-01-31    1.867558
2000-02-29   -0.977278
2000-03-31    0.950088
2000-04-30   -0.151357
Freq: M, dtype: float64

In [18]:
ts.shift(1)

2000-01-31         NaN
2000-02-29    1.867558
2000-03-31   -0.977278
2000-04-30    0.950088
Freq: M, dtype: float64

In [19]:
ts.shift(-1)

2000-01-31   -0.977278
2000-02-29    0.950088
2000-03-31   -0.151357
2000-04-30         NaN
Freq: M, dtype: float64

In [20]:
ts.shift(1, freq="M")

2000-02-29    1.867558
2000-03-31   -0.977278
2000-04-30    0.950088
2000-05-31   -0.151357
Freq: M, dtype: float64

In [21]:
ts.shift(1, freq="W")

2000-02-06    1.867558
2000-03-05   -0.977278
2000-04-02    0.950088
2000-05-07   -0.151357
Freq: WOM-1SUN, dtype: float64
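
The outputs above show two different behaviors: without `freq`, `shift` moves the values and keeps the index, and with `freq` it moves the index and keeps the values. A minimal sketch with fixed values makes the contrast easy to verify. The series contents and the use of a `pd.offsets.MonthEnd()` object instead of the "M" alias are choices made for this illustration.

```python
import pandas as pd

month_end = pd.offsets.MonthEnd()
idx = pd.date_range("2000-01-31", periods=4, freq=month_end)
ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# No freq: the values slide down one row, the index stays put,
# and the first row becomes NaN.
moved_values = ts.shift(1)

# With freq: every timestamp moves forward one month end,
# and the values stay attached to their rows.
moved_index = ts.shift(1, freq=month_end)

print(moved_values)
print(moved_index)
```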

## Resampling
### - up-sampling: the interval becomes shorter (higher frequency)
### - down-sampling: the interval becomes longer (lower frequency)
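
The two directions can be sketched with small fixed series. Down-sampling merges several rows into one, so it needs an aggregation such as `mean`. Up-sampling creates new rows that have no data yet, so they appear as NaN until they are filled. The data and variable names below are made up for illustration.

```python
import pandas as pd

# Down-sampling: 14 daily values (starting on a Monday) into weekly means.
daily = pd.Series(
    range(14),
    index=pd.date_range("2000-01-03", periods=14, freq="D"),
    dtype="float64",
)
weekly_mean = daily.resample("W").mean()

# Up-sampling: two per-minute values onto a 30-second grid.
# asfreq() leaves the newly created slot empty (NaN).
minutely = pd.Series(
    [1.0, 2.0],
    index=pd.date_range("2000-01-01", periods=2, freq="min"),
)
half_minute = minutely.resample("30s").asfreq()

print(weekly_mean)
print(half_minute)
```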

In [22]:
ts = pd.Series(np.random.randn(100), index=pd.date_range("2000-1-1", periods=100, freq="D"))
ts.tail(20)

2000-03-21   -1.070753
2000-03-22    1.054452
2000-03-23   -0.403177
2000-03-24    1.222445
2000-03-25    0.208275
2000-03-26    0.976639
2000-03-27    0.356366
2000-03-28    0.706573
2000-03-29    0.010500
2000-03-30    1.785870
2000-03-31    0.126912
2000-04-01    0.401989
2000-04-02    1.883151
2000-04-03   -1.347759
2000-04-04   -1.270485
2000-04-05    0.969397
2000-04-06   -1.173123
2000-04-07    1.943621
2000-04-08   -0.413619
2000-04-09   -0.747455
Freq: D, dtype: float64

In [23]:
ts.resample('W').mean()

2000-01-02    0.153690
2000-01-09    0.678949
2000-01-16   -0.360469
2000-01-23    0.547293
2000-01-30   -0.035616
2000-02-06   -0.489050
2000-02-13   -0.464083
2000-02-20   -0.222374
2000-02-27   -0.594077
2000-03-05   -0.003614
2000-03-12   -0.460333
2000-03-19    0.461145
2000-03-26    0.258279
2000-04-02    0.753052
2000-04-09   -0.291346
Freq: W-SUN, dtype: float64

In [24]:
ts.resample('M').first()

2000-01-31   -0.103219
2000-02-29   -0.302303
2000-03-31   -0.907298
2000-04-30    0.401989
Freq: M, dtype: float64

In [25]:
ts = pd.Series(np.random.randn(60), index=pd.date_range("2000-1-1", periods=60, freq="T"))
ts.head(20)

2000-01-01 00:00:00    1.922942
2000-01-01 00:01:00    1.480515
2000-01-01 00:02:00    1.867559
2000-01-01 00:03:00    0.906045
2000-01-01 00:04:00   -0.861226
2000-01-01 00:05:00    1.910065
2000-01-01 00:06:00   -0.268003
2000-01-01 00:07:00    0.802456
2000-01-01 00:08:00    0.947252
2000-01-01 00:09:00   -0.155010
2000-01-01 00:10:00    0.614079
2000-01-01 00:11:00    0.922207
2000-01-01 00:12:00    0.376426
2000-01-01 00:13:00   -1.099401
2000-01-01 00:14:00    0.298238
2000-01-01 00:15:00    1.326386
2000-01-01 00:16:00   -0.694568
2000-01-01 00:17:00   -0.149635
2000-01-01 00:18:00   -0.435154
2000-01-01 00:19:00    1.849264
Freq: T, dtype: float64

In [26]:
ts.resample('10min').sum()

2000-01-01 00:00:00    8.552595
2000-01-01 00:10:00    3.007843
2000-01-01 00:20:00    0.615467
2000-01-01 00:30:00    2.584603
2000-01-01 00:40:00   -2.418811
2000-01-01 00:50:00   -2.042876
Freq: 10T, dtype: float64

In [27]:
ts.resample('10min', closed="right").sum()

1999-12-31 23:50:00    1.922942
2000-01-01 00:00:00    7.243732
2000-01-01 00:10:00    3.066058
2000-01-01 00:20:00    0.339179
2000-01-01 00:30:00    0.872688
2000-01-01 00:40:00   -2.250372
2000-01-01 00:50:00   -0.895407
Freq: 10T, dtype: float64
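
The extra first row in the `closed="right"` output comes from how the bin edges are assigned. By default each 10-minute bin includes its left edge, so 00:00 belongs to the bin starting at 00:00. With `closed="right"` each bin includes its right edge instead, so 00:00 falls into the bin that started at 23:50 on the previous day. A sketch with a series of ones (an illustrative choice, so that each sum equals a row count) shows this:

```python
import pandas as pd

# 20 one-minute rows, each with value 1, so a sum counts the rows in a bin.
ones = pd.Series(1, index=pd.date_range("2000-01-01", periods=20, freq="min"))

# Bins [00:00, 00:10) and [00:10, 00:20).
left_closed = ones.resample("10min").sum()

# Bins (23:50, 00:00], (00:00, 00:10] and (00:10, 00:20].
right_closed = ones.resample("10min", closed="right").sum()

print(left_closed)
print(right_closed)
```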

In [28]:
ts.resample('5min').ohlc()

                         open      high       low     close
2000-01-01 00:00:00  1.922942  1.922942 -0.861226 -0.861226
2000-01-01 00:05:00  1.910065  1.910065 -0.268003 -0.155010
2000-01-01 00:10:00  0.614079  0.922207 -1.099401  0.298238
2000-01-01 00:15:00  1.326386  1.849264 -0.694568  1.849264
2000-01-01 00:20:00  0.672295  0.672295 -0.769916 -0.674333
2000-01-01 00:25:00  0.031831  0.676433 -0.635846 -0.208299
2000-01-01 00:30:00  0.396007  0.439392 -1.491258  0.166673
2000-01-01 00:35:00  0.635031  2.383145 -0.912822  1.117016
2000-01-01 00:40:00 -1.315907  1.713343 -1.315907 -0.744755
2000-01-01 00:45:00 -0.826439  1.126636 -1.079932 -1.079932
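
`ohlc()` reports four values per bin: the first value (open), the maximum (high), the minimum (low) and the last value (close). A sketch with six hand-picked values, chosen only for illustration, makes each column easy to check by eye:

```python
import pandas as pd

prices = pd.Series(
    [1.0, 3.0, 2.0, 5.0, 4.0, 6.0],
    index=pd.date_range("2000-01-01", periods=6, freq="min"),
)

# Two 3-minute bins: [1, 3, 2] and [5, 4, 6].
bars = prices.resample("3min").ohlc()
print(bars)
```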


In [29]:
ts.resample('30s').ffill().head(20)

2000-01-01 00:00:00    1.922942
2000-01-01 00:00:30    1.922942
2000-01-01 00:01:00    1.480515
2000-01-01 00:01:30    1.480515
2000-01-01 00:02:00    1.867559
2000-01-01 00:02:30    1.867559
2000-01-01 00:03:00    0.906045
2000-01-01 00:03:30    0.906045
2000-01-01 00:04:00   -0.861226
2000-01-01 00:04:30   -0.861226
2000-01-01 00:05:00    1.910065
2000-01-01 00:05:30    1.910065
2000-01-01 00:06:00   -0.268003
2000-01-01 00:06:30   -0.268003
2000-01-01 00:07:00    0.802456
2000-01-01 00:07:30    0.802456
2000-01-01 00:08:00    0.947252
2000-01-01 00:08:30    0.947252
2000-01-01 00:09:00   -0.155010
2000-01-01 00:09:30   -0.155010
Freq: 30S, dtype: float64

In [30]:
ts.resample('30s').bfill().head(20)

2000-01-01 00:00:00    1.922942
2000-01-01 00:00:30    1.480515
2000-01-01 00:01:00    1.480515
2000-01-01 00:01:30    1.867559
2000-01-01 00:02:00    1.867559
2000-01-01 00:02:30    0.906045
2000-01-01 00:03:00    0.906045
2000-01-01 00:03:30   -0.861226
2000-01-01 00:04:00   -0.861226
2000-01-01 00:04:30    1.910065
2000-01-01 00:05:00    1.910065
2000-01-01 00:05:30   -0.268003
2000-01-01 00:06:00   -0.268003
2000-01-01 00:06:30    0.802456
2000-01-01 00:07:00    0.802456
2000-01-01 00:07:30    0.947252
2000-01-01 00:08:00    0.947252
2000-01-01 00:08:30   -0.155010
2000-01-01 00:09:00   -0.155010
2000-01-01 00:09:30    0.614079
Freq: 30S, dtype: float64
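
The two outputs above differ only in where the new 30-second rows take their value from: `ffill` copies the previous observation forward, and `bfill` copies the next observation backward. A short sketch with three illustrative values shows the two side by side:

```python
import pandas as pd

s = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range("2000-01-01", periods=3, freq="min"),
)

# Up-sample from 1 minute to 30 seconds and fill the new rows.
forward = s.resample("30s").ffill()   # new rows take the previous value
backward = s.resample("30s").bfill()  # new rows take the next value

print(forward)
print(backward)
```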