# Hands-on time series analysis
### Part 1: Understanding the Python datetime library

<div style="text-align: right"> <b>Author : Kwang Myung Yu</b></div> 

<div style="text-align: right"> Initial upload: 2020.07.20 </div> 
<div style="text-align: right"> Last update: 2024.10.30 </div> 

This is a basic tutorial on time series analysis. In this tutorial, you will learn how to read, modify, visualize, and analyze time series data.

### 0. Library import

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
import warnings; warnings.filterwarnings('ignore')
plt.style.use('ggplot')
%matplotlib inline

### 1. datetime library basics

Creating a date object

In [2]:
my_birthday = datetime.datetime(1980, 5, 28)

In [3]:
# From here on, the name `datetime` refers to the class, not the module
from datetime import datetime

In [4]:
my_birthday = datetime(1980, 5, 28)

In [5]:
my_birthday

datetime.datetime(1980, 5, 28, 0, 0)

Checking the data type

In [6]:
type(my_birthday)

datetime.datetime

Getting the current date and time (1)

In [7]:
today = datetime.today()

In [8]:
today

datetime.datetime(2024, 10, 30, 8, 38, 59, 979298)

- The fields appear in the order year, month, day, hour, minute, second, microsecond.

Getting the current date and time (2)

In [9]:
today = datetime.now()
today

datetime.datetime(2024, 10, 30, 8, 38, 59, 987157)

In [10]:
print(today)

2024-10-30 08:38:59.987157


- When printed, the value is shown in the form year-month-day hour:minute:second.
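Both `datetime.today()` and `datetime.now()` return a naive local time when called without arguments. The sketch below illustrates one difference that is not shown in the cells above: `now()` also accepts a timezone argument. The variable names are only illustrative.

```python
from datetime import datetime, timezone

# With no argument, now() returns a naive local time, just like today()
local_now = datetime.now()
print(local_now.tzinfo)   # None: no timezone is attached

# With a tz argument, now() returns a timezone-aware datetime
utc_now = datetime.now(timezone.utc)
print(utc_now.tzinfo)     # UTC
```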

### 2. datetime library applications

strftime: formatting a date as a string

In [11]:
today = today.strftime('%Y-%m-%d')
print(today)

2024-10-30


In [12]:
type(today)

str

In [13]:
today = datetime.now()
nowTime = today.strftime('%H:%M:%S')
print(nowTime)

08:39:00


In [14]:
today = datetime.now()
todayTime = today.strftime('%Y-%m-%d %H:%M:%S')
print(todayTime)

2024-10-30 08:39:00
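Because `datetime.now()` changes on every run, it can help to check format codes against a fixed value. The following is a small sketch using an arbitrary fixed datetime (the value and variable names are assumptions chosen for illustration).

```python
from datetime import datetime

# A fixed value so the formatted strings are the same on every run
fixed = datetime(2024, 10, 30, 8, 39, 0)

date_str = fixed.strftime('%Y-%m-%d')         # '2024-10-30'
compact = fixed.strftime('%Y%m%d_%H%M%S')     # '20241030_083900'
day_of_year = fixed.strftime('%j')            # '304'
iso = fixed.isoformat()                       # '2024-10-30T08:39:00'

print(date_str, compact, day_of_year, iso)
```

The compact form is one option for building file names, since it contains no spaces or colons.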


strptime: parsing a string into a datetime

In [15]:
my_birthday = '1980-05-28 14:30:00'
my_birthday = datetime.strptime(my_birthday, '%Y-%m-%d %H:%M:%S')
print(type(my_birthday))
print(my_birthday)

<class 'datetime.datetime'>
1980-05-28 14:30:00
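`strptime` raises `ValueError` when the string does not match the format. A minimal sketch of handling that case is shown below; `parse_or_none` is a hypothetical helper written for this example, not part of the datetime library.

```python
from datetime import datetime

def parse_or_none(text, fmt='%Y-%m-%d %H:%M:%S'):
    """Hypothetical helper: return a datetime, or None if text does not match fmt."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        # strptime raises ValueError on any mismatch between text and fmt
        return None

good = parse_or_none('1980-05-28 14:30:00')
bad = parse_or_none('1980/05/28')   # slashes and no time part: does not match

print(good)   # 1980-05-28 14:30:00
print(bad)    # None
```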


Extracting only the date or only the time

In [16]:
today = datetime.now()
print(today)

2024-10-30 08:39:00.039145


In [17]:
date = today.date()
print(date)

2024-10-30


In [18]:
nowTime = today.time()
print(nowTime)

08:39:00.039145


In [19]:
# Combine the date and the time into one datetime
dateTime = datetime.combine(date, nowTime)
print(dateTime)

2024-10-30 08:39:00.039145
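`datetime.combine` is not limited to parts taken from the same object. As a sketch with arbitrary example values, a date can be paired with any `time`, for instance to get the start of a day.

```python
from datetime import date, time, datetime

d = date(2024, 10, 30)

# time.min is midnight, so this gives the first instant of the day
start_of_day = datetime.combine(d, time.min)

# Pair the same date with an arbitrary time of day
afternoon = datetime.combine(d, time(14, 30))

print(start_of_day)   # 2024-10-30 00:00:00
print(afternoon)      # 2024-10-30 14:30:00
```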


Accessing the year, month, and day

In [20]:
today = datetime.now()
print(today)

2024-10-30 08:39:00.053950


In [21]:
print(today.year)
print(today.month)
print(today.day)

2024
10
30
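The time fields and the day of the week can be read in the same way. The sketch below uses a fixed datetime (an arbitrary example value) so the results do not depend on when it is run.

```python
from datetime import datetime

fixed = datetime(2024, 10, 30, 8, 39, 0)

print(fixed.hour, fixed.minute, fixed.second)   # 8 39 0

# weekday() counts from Monday=0; isoweekday() counts from Monday=1
print(fixed.weekday())      # 2 (Wednesday)
print(fixed.isoweekday())   # 3 (Wednesday)
```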


Date arithmetic

In [22]:
from datetime import timedelta

In [23]:
today = datetime.now().date()
print(today)

2024-10-30


In [24]:
tomorrow = today + timedelta(days=1)
print(tomorrow)

2024-10-31


- Besides days, timedelta also accepts arguments such as weeks, hours, minutes, and seconds.
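As a short sketch of the other `timedelta` arguments, the example below adds a mixed offset to a fixed datetime and then subtracts two datetimes to get a `timedelta` back. The values are arbitrary.

```python
from datetime import datetime, timedelta

start = datetime(2024, 10, 30, 8, 0, 0)

# Several units can be combined in one timedelta
later = start + timedelta(weeks=1, hours=3, minutes=30)
print(later)                     # 2024-11-06 11:30:00

# Subtracting two datetimes yields a timedelta
elapsed = later - start
print(elapsed.days)              # 7
print(elapsed.total_seconds())   # 617400.0
```

Note that `elapsed.seconds` holds only the part left over after whole days, while `total_seconds()` covers the full duration.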

### 3. datetime index
Setting the index of a pandas DataFrame that holds time series data to a datetime type makes analysis easier.
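One way a datetime index helps is that rows can be selected by date strings. The following is a minimal sketch using a toy Series (the data and names are assumptions for illustration).

```python
import numpy as np
import pandas as pd

# Toy data: 60 daily values starting on 2020-07-01
idx = pd.date_range('2020-07-01', periods=60, freq='D')
values = pd.Series(np.arange(60), index=idx)

# A partial date string selects every row in that month
july = values.loc['2020-07']

# A label slice on dates includes both end points
three_days = values.loc['2020-07-10':'2020-07-12']

print(len(july), len(three_days))   # 31 3
```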

In [25]:
dates = [datetime(2020, 7, 19), datetime(2020, 7, 19)+timedelta(days= 1)]

In [26]:
dates

[datetime.datetime(2020, 7, 19, 0, 0), datetime.datetime(2020, 7, 20, 0, 0)]

Creating a pandas DatetimeIndex

In [27]:
dt_index = pd.DatetimeIndex(dates)

In [28]:
dt_index

DatetimeIndex(['2020-07-19', '2020-07-20'], dtype='datetime64[ns]', freq=None)
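A `DatetimeIndex` can also be built from date strings with `pd.to_datetime`. The sketch below compares that route with the list of `datetime` objects used above; the variable names are only illustrative.

```python
from datetime import datetime, timedelta
import pandas as pd

dates = [datetime(2020, 7, 19), datetime(2020, 7, 19) + timedelta(days=1)]
from_objects = pd.DatetimeIndex(dates)

# Parsing a list of strings also returns a DatetimeIndex
from_strings = pd.to_datetime(['2020-07-19', '2020-07-20'])

print(type(from_strings).__name__)                 # DatetimeIndex
print(list(from_strings) == list(from_objects))    # True
```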

Creating a DatetimeIndex over a date range

In [29]:
dates = [datetime(2020, 7, 19), datetime(2020, 7, 19)+timedelta(days= 20)]
dates

[datetime.datetime(2020, 7, 19, 0, 0), datetime.datetime(2020, 8, 8, 0, 0)]

In [30]:
dt_index = pd.date_range(dates[0], dates[1], freq='D')

In [31]:
dt_index

DatetimeIndex(['2020-07-19', '2020-07-20', '2020-07-21', '2020-07-22',
               '2020-07-23', '2020-07-24', '2020-07-25', '2020-07-26',
               '2020-07-27', '2020-07-28', '2020-07-29', '2020-07-30',
               '2020-07-31', '2020-08-01', '2020-08-02', '2020-08-03',
               '2020-08-04', '2020-08-05', '2020-08-06', '2020-08-07',
               '2020-08-08'],
              dtype='datetime64[ns]', freq='D')
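Instead of an end date, `pd.date_range` can take a start date and a number of periods. As a sketch, the two calls below should describe the same 21 days.

```python
import pandas as pd

# Start and end date, one entry per day
by_end = pd.date_range('2020-07-19', '2020-08-08', freq='D')

# Start date and a count of periods instead of an end date
by_periods = pd.date_range('2020-07-19', periods=21, freq='D')

print(len(by_end), len(by_periods))     # 21 21
print((by_end == by_periods).all())     # True
```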

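The main `freq` aliases are listed below. Note that pandas 2.2 and later deprecate some of these spellings in favor of `min` (for `T`), `h` (for `H`), `ME` (for `M`), `BME` (for `BM`), and `QE-JAN` / `QE-DEC` (for `Q-JAN` / `Q-DEC`).
- s: second
- T: minute
- H: hour
- D: calendar day
- B: business day (weekdays only)
- W: weekly, anchored on Sunday
- W-MON: weekly, anchored on Monday
- M: last calendar day of each month
- MS: first calendar day of each month
- BM: last business day of each month
- BMS: first business day of each month
- WOM-2THU: second Thursday of each month
- Q-JAN: quarter end with the year ending in January, that is, the last day of January, April, July, and October
- Q-DEC: quarter end with the year ending in December, that is, the last day of March, June, September, and December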
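A few of the aliases listed above can be tried directly with `pd.date_range`. The sketch below uses only aliases that keep the same spelling across recent pandas versions; the date ranges are arbitrary.

```python
import pandas as pd

# Business days (Monday to Friday) in July 2020
business_days = pd.date_range('2020-07-01', '2020-07-31', freq='B')

# Every Monday in July 2020
mondays = pd.date_range('2020-07-01', '2020-07-31', freq='W-MON')

# First day of each month in the second half of 2020
month_starts = pd.date_range('2020-07-01', '2020-12-31', freq='MS')

# Second Thursday of each month from July to September 2020
second_thursdays = pd.date_range('2020-07-01', '2020-09-30', freq='WOM-2THU')

print(len(business_days))   # 23
print(list(mondays.day))    # [6, 13, 20, 27]
print(len(month_starts))    # 6
print(second_thursdays)
```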

### 4. Building a time series dataset

In [32]:
dates = [datetime(2020, 7, 1), datetime(2020, 7, 1)+timedelta(days= 30)]
dt_index = pd.date_range(dates[0], dates[1], freq='D')

Creating a Series

In [33]:
series = pd.Series(np.random.randn(len(dt_index)), index=dt_index)

In [34]:
series.head()

2020-07-01   -0.212141
2020-07-02    0.813211
2020-07-03   -0.561535
2020-07-04    2.299846
2020-07-05   -1.986172
Freq: D, dtype: float64
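Because `np.random.randn` draws new values on every run, the numbers printed above will differ each time. If reproducible values are wanted, one option is a seeded generator, sketched below. The seed value 0 is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

dt_index = pd.date_range('2020-07-01', '2020-07-31', freq='D')

# A seeded generator produces the same sequence on every run
rng = np.random.default_rng(seed=0)
series = pd.Series(rng.standard_normal(len(dt_index)), index=dt_index)

# Re-creating the generator with the same seed reproduces the values
rng_again = np.random.default_rng(seed=0)
series_again = pd.Series(rng_again.standard_normal(len(dt_index)), index=dt_index)

print(series.equals(series_again))   # True
```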

Creating a DataFrame

In [35]:
columns = ['A', 'B', 'C', 'D']
data = np.random.randn(len(dt_index), len(columns))

In [36]:
df = pd.DataFrame(data = data, index=dt_index, columns=columns)

In [37]:
df.head()

                   A         B         C         D
2020-07-01  0.013541 -1.628359 -0.988472 -0.964649
2020-07-02  0.544616  0.459260 -1.824049  0.340942
2020-07-03  1.272145  0.459343 -0.808619  1.343583
2020-07-04  0.069917 -0.959500 -0.580901 -1.421207
2020-07-05  0.171612 -1.525901 -0.250458  1.316084
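With a datetime index in place, a DataFrame can be aggregated by calendar period. The following is a minimal sketch of a weekly mean using `resample`; it uses deterministic toy data instead of random values so the result is easy to check by hand.

```python
import numpy as np
import pandas as pd

dt_index = pd.date_range('2020-07-01', '2020-07-31', freq='D')
columns = ['A', 'B', 'C', 'D']

# Toy data: 0, 1, 2, ... filled row by row (31 rows x 4 columns)
data = np.arange(len(dt_index) * len(columns)).reshape(len(dt_index), len(columns))
df = pd.DataFrame(data=data, index=dt_index, columns=columns)

# 'W' groups rows into weeks ending on Sunday and labels each by that Sunday
weekly_mean = df.resample('W').mean()

print(weekly_mean.shape)      # (5, 4)
print(weekly_mean.index[0])   # 2020-07-05 00:00:00
```

The first week covers 1 July to 5 July only, because the data starts on a Wednesday.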


Inspecting the DataFrame

In [38]:
df.index

DatetimeIndex(['2020-07-01', '2020-07-02', '2020-07-03', '2020-07-04',
               '2020-07-05', '2020-07-06', '2020-07-07', '2020-07-08',
               '2020-07-09', '2020-07-10', '2020-07-11', '2020-07-12',
               '2020-07-13', '2020-07-14', '2020-07-15', '2020-07-16',
               '2020-07-17', '2020-07-18', '2020-07-19', '2020-07-20',
               '2020-07-21', '2020-07-22', '2020-07-23', '2020-07-24',
               '2020-07-25', '2020-07-26', '2020-07-27', '2020-07-28',
               '2020-07-29', '2020-07-30', '2020-07-31'],
              dtype='datetime64[ns]', freq='D')

In [39]:
df.index.min()

Timestamp('2020-07-01 00:00:00')

In [40]:
df.index.max()

Timestamp('2020-07-31 00:00:00')
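The minimum and maximum of the index can be subtracted to get the time span covered by the data. A short sketch, rebuilding the same July 2020 index so the block stands alone:

```python
import pandas as pd

dt_index = pd.date_range('2020-07-01', '2020-07-31', freq='D')

# Subtracting two Timestamps gives a Timedelta
span = dt_index.max() - dt_index.min()
print(span.days)                            # 30

# 31 daily rows cover a span of 30 days
print(len(dt_index))                        # 31

# Check that the dates are sorted in ascending order
print(dt_index.is_monotonic_increasing)     # True
```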