# Working with Time Series

## Dates and Times in Python


### Native Python dates and times: ``datetime`` and ``dateutil``

In [4]:
from dateutil import parser

In [5]:
date = parser.parse('2020-11-11')

In [6]:
date.strftime('%A')

'Wednesday'

In [7]:
date.strftime('%a')

'Wed'

In [8]:
date.weekday()

2
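The same weekday lookups can be done with the standard library alone when the date format is known in advance. The following is a minimal sketch, not part of the original notebook; the variable names (`parsed`, `day_name`, and so on) are illustrative, and the weekday name assumes the default English locale.

```python
from datetime import datetime

# Parse with an explicit format instead of letting dateutil infer it.
parsed = datetime.strptime('2020-11-11', '%Y-%m-%d')

day_name = parsed.strftime('%A')   # full weekday name (locale dependent)
day_index = parsed.weekday()       # Monday is 0, Sunday is 6
iso_index = parsed.isoweekday()    # Monday is 1, Sunday is 7

print(day_name, day_index, iso_index)
```

`strptime` raises a `ValueError` if the string does not match the format, which can be preferable to the more permissive guessing done by `dateutil.parser.parse`.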

[strftime section](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior) 

[datetime documentation](https://docs.python.org/3/library/datetime.html)

[dateutil's online documentation](http://labix.org/python-dateutil)


### Typed arrays of times: NumPy's ``datetime64``

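To complement Python's built-in datetime, NumPy provides the ``datetime64`` dtype, which encodes dates as 64-bit integers so that arrays of dates can be stored compactly and processed efficiently.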

In [9]:
import numpy as np

In [10]:
date = np.array('2020-11-11', dtype = np.datetime64)

In [11]:
date

array('2020-11-11', dtype='datetime64[D]')

In [12]:
date + np.arange(11+1)

array(['2020-11-11', '2020-11-12', '2020-11-13', '2020-11-14',
       '2020-11-15', '2020-11-16', '2020-11-17', '2020-11-18',
       '2020-11-19', '2020-11-20', '2020-11-21', '2020-11-22'],
      dtype='datetime64[D]')

In [13]:
np.datetime64('2020-10-22 13:23:58', 'ns')  # nanosecond precision

numpy.datetime64('2020-10-22T13:23:58.000000000')

|Code    | Meaning     | Time span (relative) | Time span (absolute)   |
|--------|-------------|----------------------|------------------------|
| ``Y``  | Year        | ± 9.2e18 years       | [9.2e18 BC, 9.2e18 AD] |
| ``M``  | Month       | ± 7.6e17 years       | [7.6e17 BC, 7.6e17 AD] |
| ``W``  | Week        | ± 1.7e17 years       | [1.7e17 BC, 1.7e17 AD] |
| ``D``  | Day         | ± 2.5e16 years       | [2.5e16 BC, 2.5e16 AD] |
| ``h``  | Hour        | ± 1.0e15 years       | [1.0e15 BC, 1.0e15 AD] |
| ``m``  | Minute      | ± 1.7e13 years       | [1.7e13 BC, 1.7e13 AD] |
| ``s``  | Second      | ± 2.9e11 years       | [2.9e11 BC, 2.9e11 AD] |
| ``ms`` | Millisecond | ± 2.9e8 years        | [ 2.9e8 BC, 2.9e8 AD]  |
| ``us`` | Microsecond | ± 2.9e5 years        | [290301 BC, 294241 AD] |
| ``ns`` | Nanosecond  | ± 292 years          | [ 1678 AD, 2262 AD]    |
| ``ps`` | Picosecond  | ± 106 days           | [ 1969 AD, 1970 AD]    |
| ``fs`` | Femtosecond | ± 2.6 hours          | [ 1969 AD, 1970 AD]    |
| ``as`` | Attosecond  | ± 9.2 seconds        | [ 1969 AD, 1970 AD]    |
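The trade-off between resolution and time span follows from the storage format: a ``datetime64`` value is a signed 64-bit count of units relative to the 1970 epoch. As a rough check of the day and nanosecond rows, the sketch below converts the largest 64-bit integer into years; the mean Gregorian year length of 365.2425 days is an assumption made here for the arithmetic.

```python
import numpy as np

# Largest count a signed 64-bit integer can hold.
max_ticks = np.iinfo(np.int64).max

days_per_year = 365.2425                      # mean Gregorian year (assumption)
seconds_per_year = days_per_year * 24 * 3600

# Span in years when one tick is a day, and when one tick is a nanosecond.
span_day_years = max_ticks / days_per_year
span_ns_years = max_ticks / 1e9 / seconds_per_year

print(f"{span_day_years:.1e}")   # on the order of 1e16 years
print(round(span_ns_years))      # a few hundred years
```

Finer units therefore buy precision at the cost of range, which is why nanosecond timestamps cannot represent dates far from 1970.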

### Dates and times in pandas: best of both worlds



In [14]:
# pandas offers the most convenient date handling

In [15]:
import pandas as pd

In [16]:
dates = ['2020, 1,4', '2020, 1,7', '2020, 1,8', '2020, 1,22']

In [17]:
idx = pd.to_datetime(dates)

In [18]:
idx[0].strftime('%A')

'Saturday'

In [19]:
date

array('2020-11-11', dtype='datetime64[D]')

In [20]:
date + pd.to_timedelta(np.arange(11+1), 'D')  # to_timedelta builds the offsets used for date arithmetic

DatetimeIndex(['2020-11-11', '2020-11-12', '2020-11-13', '2020-11-14',
               '2020-11-15', '2020-11-16', '2020-11-17', '2020-11-18',
               '2020-11-19', '2020-11-20', '2020-11-21', '2020-11-22'],
              dtype='datetime64[ns]', freq=None)

In [21]:
date + pd.to_timedelta(np.arange(11+1), 'm')  # 'm' steps in one-minute increments

DatetimeIndex(['2020-11-11 00:00:00', '2020-11-11 00:01:00',
               '2020-11-11 00:02:00', '2020-11-11 00:03:00',
               '2020-11-11 00:04:00', '2020-11-11 00:05:00',
               '2020-11-11 00:06:00', '2020-11-11 00:07:00',
               '2020-11-11 00:08:00', '2020-11-11 00:09:00',
               '2020-11-11 00:10:00', '2020-11-11 00:11:00'],
              dtype='datetime64[ns]', freq=None)

In [22]:
date + pd.to_timedelta(np.arange(11+1), 'W') 

DatetimeIndex(['2020-11-11', '2020-11-18', '2020-11-25', '2020-12-02',
               '2020-12-09', '2020-12-16', '2020-12-23', '2020-12-30',
               '2021-01-06', '2021-01-13', '2021-01-20', '2021-01-27'],
              dtype='datetime64[ns]', freq=None)

## Pandas Time Series: Indexing by Time



In [24]:
import pandas as pd

In [31]:
date_index = pd.DatetimeIndex(['2020-11-04', '2020-11-08', '2020-11-22','2021-01-01'])

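In [41]:
# Attach the dates as the index so the Series can be sliced by time
data = pd.Series([0, 1, 2, 3], index=date_index)

In [42]:
data['2020-11-04':]

2020-11-04    0
2020-11-08    1
2020-11-22    2
2021-01-01    3
dtype: int64

In [33]:
data['2020-11-10':'2021-02-01']

2020-11-22    2
2021-01-01    3
dtype: int64

In [40]:
data['2020']  # passing only a year selects every entry from that year

2020-11-04    0
2020-11-08    1
2020-11-22    2
dtype: int64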

### Regular sequences: ``pd.date_range()``



In [35]:
pd.date_range('2020-11-01', periods = 8, freq = 'H') 

DatetimeIndex(['2020-11-01 00:00:00', '2020-11-01 01:00:00',
               '2020-11-01 02:00:00', '2020-11-01 03:00:00',
               '2020-11-01 04:00:00', '2020-11-01 05:00:00',
               '2020-11-01 06:00:00', '2020-11-01 07:00:00'],
              dtype='datetime64[ns]', freq='H')

In [36]:
pd.date_range('2020-11-01', periods = 8, freq = 'M') 

DatetimeIndex(['2020-11-30', '2020-12-31', '2021-01-31', '2021-02-28',
               '2021-03-31', '2021-04-30', '2021-05-31', '2021-06-30'],
              dtype='datetime64[ns]', freq='M')

In [37]:
pd.date_range('2020-11-01', periods=8, freq='MS')  # 'MS' anchors each date to the start of the month

DatetimeIndex(['2020-11-01', '2020-12-01', '2021-01-01', '2021-02-01',
               '2021-03-01', '2021-04-01', '2021-05-01', '2021-06-01'],
              dtype='datetime64[ns]', freq='MS')
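Besides a start date and a number of periods, ``pd.date_range()`` also accepts an explicit end date. The sketch below is an illustration with hypothetical variable names (`days`, `last_three`); it relies on the default daily frequency.

```python
import pandas as pd

# With only a start and an end, the frequency defaults to one calendar day
# and both endpoints are included.
days = pd.date_range('2020-11-01', '2020-11-08')

# An end date combined with periods counts backwards from the end.
last_three = pd.date_range(end='2020-11-08', periods=3)

print(days)
print(last_three)
```

Specifying two of start, end, and periods is enough when a frequency is given or implied.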

## Frequencies and Offsets

| Code   | Description         | Code   | Description          |
|--------|---------------------|--------|----------------------|
| ``D``  | Calendar day        | ``B``  | Business day         |
| ``W``  | Weekly              |        |                      |
| ``M``  | Month end           | ``BM`` | Business month end   |
| ``Q``  | Quarter end         | ``BQ`` | Business quarter end |
| ``A``  | Year end            | ``BA`` | Business year end    |
| ``H``  | Hours               | ``BH`` | Business hours       |
| ``T``  | Minutes             |        |                      |
| ``S``  | Seconds             |        |                      |
| ``L``  | Milliseconds        |        |                      |
| ``U``  | Microseconds        |        |                      |
| ``N``  | Nanoseconds         |        |                      |
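As a small illustration of how these codes are used, the sketch below passes two of them to ``pd.date_range()``. It sticks to ``B`` and ``D``, which, to my knowledge, are spelled the same across pandas versions; some other codes in the table have newer spellings in recent releases, so check the documentation for the installed version. The variable names are illustrative.

```python
import pandas as pd

# 2020-11-01 is a Sunday, so the first business day is Monday 2020-11-02.
business = pd.date_range('2020-11-01', periods=5, freq='B')

# A numeric prefix multiplies the base frequency: every second calendar day.
every_other_day = pd.date_range('2020-11-01', periods=4, freq='2D')

print(business)
print(every_other_day)
```

The business-day range skips the weekend entirely, while the prefixed code simply widens the step.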

In [39]:
pd.timedelta_range(0, periods=9, freq='2H30T')  # handy for building an index of elapsed times

TimedeltaIndex(['00:00:00', '02:30:00', '05:00:00', '07:30:00', '10:00:00',
                '12:30:00', '15:00:00', '17:30:00', '20:00:00'],
               dtype='timedelta64[ns]', freq='150T')
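An equivalent set of offsets can be built from plain integers, which avoids frequency strings altogether. This is a sketch under the assumption that 150-minute steps are wanted, as above; `steps` and `stamps` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Nine offsets spaced 150 minutes (2 hours 30 minutes) apart.
steps = pd.to_timedelta(np.arange(9) * 150, unit='min')

# Adding the offsets to a timestamp yields a DatetimeIndex.
stamps = pd.Timestamp('2020-11-01') + steps

print(steps)
print(stamps)
```

Adding a ``TimedeltaIndex`` to a single timestamp is a convenient way to turn elapsed times into absolute ones.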