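## Dates and Times

* Timestamp: a fixed point in time -> pd.Timestamp
* Period: a fixed span of time, such as March 2016, or the whole of 2015 when reporting that year's sales -> pd.Period
* Interval: defined by a start time and an end time; a period is a special case of an interval

**What dates and times are used for in Pandas**

* Analyzing financial data, such as stock trading data
* Analyzing server logs

### Python datetime

The Python standard library provides date and time handling. It is the foundation for everything that follows.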

In [82]:
from datetime import datetime
from datetime import timedelta
import pandas as pd
import numpy as np
import os

In [2]:
now = datetime.now()
now

datetime.datetime(2020, 5, 13, 23, 55, 12, 756768)

In [3]:
now.year,now.month,now.day

(2020, 5, 13)

### Time deltas

In [4]:
date1 = datetime(2016,3,20)
date2 = datetime(2020,3,20)
delta = date2 - date1
delta

datetime.timedelta(days=1461)

In [5]:
delta.days

1461

In [6]:
delta.total_seconds()

126230400.0

In [7]:
date2 + delta

datetime.datetime(2024, 3, 20, 0, 0)

In [8]:
date2 + timedelta(5.2)

datetime.datetime(2020, 3, 25, 4, 48)
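
The last cell passes a fractional number to `timedelta`. As a minimal sketch using only the standard library, the following shows how that value is interpreted: the first positional argument is days, and the fraction is stored as seconds. The variable names `start` and `step` are chosen here for illustration.

```python
from datetime import datetime, timedelta

start = datetime(2020, 3, 20)
step = timedelta(5.2)  # the first positional argument is days, so this is 5.2 days

# 0.2 days = 4.8 hours; timedelta stores this as whole days plus seconds
print(step.days, step.seconds)      # 5 17280
print(start + step)                 # 2020-03-25 04:48:00
print(step.total_seconds() / 3600)  # 124.8
```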

### Converting between strings and datetime

In [9]:
date = datetime(2016,3,20,8,30)
date

datetime.datetime(2016, 3, 20, 8, 30)

In [10]:
str(date)

'2016-03-20 08:30:00'

In [11]:
# %S (uppercase) is the seconds directive; lowercase %s is not a standard one
str_vle = date.strftime('%Y-%m-%d %H:%M:%S')
str_vle_time = str_vle[:19]
str_vle_time

'2016-03-20 08:30:00'

In [12]:
datetime.strptime(str_vle_time,'%Y-%m-%d %H:%M:%S')
# datetime.strptime('2016-03-20 09:30', '%Y-%m-%d %H:%M')

datetime.datetime(2016, 3, 20, 8, 30)
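
As a small self-contained sketch of the round trip between `datetime` and strings, the example below formats a value, parses it back with the same format string, and shows that a string that does not match the format raises `ValueError`. The names `FMT`, `moment`, `restored`, and `mismatch_rejected` are illustrative only.

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S'  # %M is minutes, %S is seconds

moment = datetime(2016, 3, 20, 8, 30)
text = moment.strftime(FMT)              # datetime -> string
restored = datetime.strptime(text, FMT)  # string -> datetime

print(text)                # 2016-03-20 08:30:00
print(restored == moment)  # True

# A string that does not follow FMT is rejected rather than guessed at
try:
    datetime.strptime('2016/03/20 08:30', FMT)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)   # True
```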

## Time series in Pandas
Pandas uses Timestamp to represent a point in time.

In [13]:
dates = [datetime(2016, 3, 1), datetime(2016, 3, 2), datetime(2016, 3, 3), datetime(2016, 3, 4)]
s = pd.Series(np.random.randn(4),index=dates)
s

2016-03-01    1.471523
2016-03-02    1.029107
2016-03-03   -0.338832
2016-03-04   -1.333642
dtype: float64

In [14]:
type(s.index)

pandas.core.indexes.datetimes.DatetimeIndex

In [15]:
type(s.index[0])

pandas._libs.tslibs.timestamps.Timestamp
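
A short sketch of why the `DatetimeIndex` matters in practice: once the index holds timestamps, rows can be selected with date strings. Fixed values replace the random ones so the result is predictable. The names `one_day` and `window` are illustrative.

```python
from datetime import datetime
import pandas as pd

dates = [datetime(2016, 3, 1), datetime(2016, 3, 2), datetime(2016, 3, 3)]
s = pd.Series([10.0, 20.0, 30.0], index=dates)

# A list of datetime objects is converted to a DatetimeIndex of Timestamps
print(type(s.index).__name__)     # DatetimeIndex
print(type(s.index[0]).__name__)  # Timestamp

# Date strings can be used as labels; slices include both endpoints
one_day = s.loc['2016-03-02']
window = s.loc['2016-03-02':'2016-03-03']
print(one_day)
print(window)
```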

## Date ranges

### Generating date ranges

In [16]:
pd.date_range('20150303','20150331')

DatetimeIndex(['2015-03-03', '2015-03-04', '2015-03-05', '2015-03-06',
               '2015-03-07', '2015-03-08', '2015-03-09', '2015-03-10',
               '2015-03-11', '2015-03-12', '2015-03-13', '2015-03-14',
               '2015-03-15', '2015-03-16', '2015-03-17', '2015-03-18',
               '2015-03-19', '2015-03-20', '2015-03-21', '2015-03-22',
               '2015-03-23', '2015-03-24', '2015-03-25', '2015-03-26',
               '2015-03-27', '2015-03-28', '2015-03-29', '2015-03-30',
               '2015-03-31'],
              dtype='datetime64[ns]', freq='D')

In [17]:
pd.date_range(start='20160320',periods=10)

DatetimeIndex(['2016-03-20', '2016-03-21', '2016-03-22', '2016-03-23',
               '2016-03-24', '2016-03-25', '2016-03-26', '2016-03-27',
               '2016-03-28', '2016-03-29'],
              dtype='datetime64[ns]', freq='D')

In [18]:
# Normalize the timestamps to midnight
pd.date_range(start='2015-03-03 15:33:12',periods=10,normalize=True)

DatetimeIndex(['2015-03-03', '2015-03-04', '2015-03-05', '2015-03-06',
               '2015-03-07', '2015-03-08', '2015-03-09', '2015-03-10',
               '2015-03-11', '2015-03-12'],
              dtype='datetime64[ns]', freq='D')

### Time frequencies

In [19]:
pd.date_range(start='2015-02-01',periods=10,freq='W')

DatetimeIndex(['2015-02-01', '2015-02-08', '2015-02-15', '2015-02-22',
               '2015-03-01', '2015-03-08', '2015-03-15', '2015-03-22',
               '2015-03-29', '2015-04-05'],
              dtype='datetime64[ns]', freq='W-SUN')

In [20]:
pd.date_range(start='2015-02-01',periods=10,freq='M')


DatetimeIndex(['2015-02-28', '2015-03-31', '2015-04-30', '2015-05-31',
               '2015-06-30', '2015-07-31', '2015-08-31', '2015-09-30',
               '2015-10-31', '2015-11-30'],
              dtype='datetime64[ns]', freq='M')

In [21]:
# Index made up of the last business day of each month
pd.date_range(start='2015-02-01',periods=10,freq='BM')


DatetimeIndex(['2015-02-27', '2015-03-31', '2015-04-30', '2015-05-29',
               '2015-06-30', '2015-07-31', '2015-08-31', '2015-09-30',
               '2015-10-30', '2015-11-30'],
              dtype='datetime64[ns]', freq='BM')

In [22]:
# hour
pd.date_range(start='2015-02-01',periods=10,freq='4H')

DatetimeIndex(['2015-02-01 00:00:00', '2015-02-01 04:00:00',
               '2015-02-01 08:00:00', '2015-02-01 12:00:00',
               '2015-02-01 16:00:00', '2015-02-01 20:00:00',
               '2015-02-02 00:00:00', '2015-02-02 04:00:00',
               '2015-02-02 08:00:00', '2015-02-02 12:00:00'],
              dtype='datetime64[ns]', freq='4H')
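
The frequency can also be given as an offset object instead of a string alias. The sketch below repeats the cells above with `pd.offsets.MonthEnd`, `pd.offsets.BMonthEnd`, and `pd.offsets.Hour`. Using offset objects here is a choice made for this example, on the assumption that they are less sensitive to alias spellings, which differ between pandas versions.

```python
import pandas as pd

start = '2015-02-01'

weekly = pd.date_range(start=start, periods=3, freq='W-SUN')
month_end = pd.date_range(start=start, periods=3, freq=pd.offsets.MonthEnd())
bmonth_end = pd.date_range(start=start, periods=3, freq=pd.offsets.BMonthEnd())
four_hourly = pd.date_range(start=start, periods=3, freq=pd.offsets.Hour(4))

print(list(weekly.strftime('%Y-%m-%d')))       # ['2015-02-01', '2015-02-08', '2015-02-15']
print(list(month_end.strftime('%Y-%m-%d')))    # ['2015-02-28', '2015-03-31', '2015-04-30']
print(list(bmonth_end.strftime('%Y-%m-%d')))   # ['2015-02-27', '2015-03-31', '2015-04-30']
print(list(four_hourly.strftime('%H:%M')))     # ['00:00', '04:00', '08:00']
```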

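## Periods and period arithmetic
pd.Period represents a span of time, such as a day, a month, or several months. For example, when reporting sales per month, a period is the natural unit.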

In [23]:
p1 = pd.Period(2010)
p1

Period('2010', 'A-DEC')

In [24]:
p2 = p1 + 2
p2

Period('2012', 'A-DEC')

In [25]:
p2 - p1

<2 * YearEnds: month=12>

In [26]:
p1 = pd.Period(2016,freq='M')
p1

Period('2016-01', 'M')

In [27]:
p1 + 4

Period('2016-05', 'M')
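
A minimal sketch of what a monthly period carries beyond its label: integer arithmetic moves it in whole months, and `start_time` / `end_time` expose the span it covers. The names `jan` and `may` are illustrative. The `.n` attribute assumes a pandas version where subtracting two periods returns an offset, as in the output above.

```python
import pandas as pd

jan = pd.Period('2016-01', freq='M')
may = jan + 4  # integers are counted in units of the period's frequency

print(may)                   # 2016-05
print((may - jan).n)         # 4
print(jan.start_time)        # 2016-01-01 00:00:00
print(jan.end_time.date())   # 2016-01-31
```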

### Period ranges

In [28]:
pd.period_range(start='2015-01',periods=12,freq='M')

PeriodIndex(['2015-01', '2015-02', '2015-03', '2015-04', '2015-05', '2015-06',
             '2015-07', '2015-08', '2015-09', '2015-10', '2015-11', '2015-12'],
            dtype='period[M]', freq='M')

In [29]:
pd.period_range(start='2015-01',end='2016-01',freq='M')

PeriodIndex(['2015-01', '2015-02', '2015-03', '2015-04', '2015-05', '2015-06',
             '2015-07', '2015-08', '2015-09', '2015-10', '2015-11', '2015-12',
             '2016-01'],
            dtype='period[M]', freq='M')

In [30]:
# Build directly from strings
index = pd.PeriodIndex(['2015Q1','2015Q2','2015Q3','2015Q4'],freq='Q-DEC')
index

PeriodIndex(['2015Q1', '2015Q2', '2015Q3', '2015Q4'], dtype='period[Q-DEC]', freq='Q-DEC')

### Converting period frequencies
asfreq

* A-DEC: annual periods ending in December
* A-NOV: annual periods ending in November
* Q-DEC: quarterly periods whose year ends in December

In [31]:
p = pd.Period('2015',freq='A-DEC')
p

Period('2015', 'A-DEC')

In [32]:
p.asfreq('M',how='start')

Period('2015-01', 'M')

In [33]:
p.asfreq('M',how='end')

Period('2015-12', 'M')

In [34]:
p.asfreq('A-DEC')

Period('2015', 'A-DEC')

In [35]:
# Annual period whose year ends in March (a fiscal year)
p.asfreq('A-MAR')

Period('2016', 'A-MAR')

### Quarterly frequencies
Pandas supports 12 quarterly frequencies, from Q-JAN to Q-DEC.

In [36]:
p = pd.Period('2016Q4',freq='Q-JAN')
p

Period('2016Q4', 'Q-JAN')

In [37]:
# In a fiscal year ending in January, 2016Q4 covers 2015-11-01 through 2016-01-31
p.asfreq('D', how='start'), p.asfreq('D', how='end')

(Period('2015-11-01', 'D'), Period('2016-01-31', 'D'))
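
To make the fiscal-quarter boundaries concrete, this sketch expands the same `Q-JAN` quarter into its first day, last day, and the months it contains, using `asfreq` with `how='start'` and `how='end'`. The variable names are illustrative.

```python
import pandas as pd

q = pd.Period('2016Q4', freq='Q-JAN')

first_day = q.asfreq('D', how='start')
last_day = q.asfreq('D', how='end')

# The months covered by the quarter, from its first month to its last
months = pd.period_range(q.asfreq('M', how='start'),
                         q.asfreq('M', how='end'), freq='M')

print(first_day, last_day)        # 2015-11-01 2016-01-31
print([str(m) for m in months])   # ['2015-11', '2015-12', '2016-01']
```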

In [38]:
# Get the timestamp for 4 PM on the second-to-last business day of the quarter
p4pm = (p.asfreq('B', how='end') - 1).asfreq('T', 'start') + 16 * 60
p4pm

Period('2016-01-28 16:00', 'T')

In [39]:
# Convert to a timestamp
p4pm.to_timestamp()


Timestamp('2016-01-28 16:00:00')
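
The same timestamp can be reached by working with timestamps and business-day offsets instead of business-day and minute periods. This is an alternative sketch, not the approach used in the cell above. It assumes `pd.offsets.BDay` with the default Monday-to-Friday calendar and no holidays.

```python
import pandas as pd

quarter = pd.Period('2016Q4', freq='Q-JAN')

# Last calendar day of the quarter as a midnight timestamp (2016-01-31, a Sunday)
last_day = quarter.asfreq('D', how='end').to_timestamp()

# Roll back to the last business day, step back one more, then add 16 hours
last_bday = pd.offsets.BDay().rollback(last_day)
target = last_bday - pd.offsets.BDay(1) + pd.Timedelta(hours=16)

print(last_bday)  # 2016-01-29 00:00:00
print(target)     # 2016-01-28 16:00:00
```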

### Converting between Timestamp and Period

In [40]:
ts = pd.Series(np.random.randn(5), index = pd.date_range('2016-01-01', periods=5, freq='M'))
ts

2016-01-31    0.262207
2016-02-29    0.196236
2016-03-31   -0.652523
2016-04-30    0.640330
2016-05-31   -0.246301
Freq: M, dtype: float64

In [41]:
ts.to_period()

2016-01    0.262207
2016-02    0.196236
2016-03   -0.652523
2016-04    0.640330
2016-05   -0.246301
Freq: M, dtype: float64

In [42]:
ts = pd.Series(np.random.randn(5), index = pd.date_range('2016-12-29', periods=5, freq='D'))
ts

2016-12-29    1.876762
2016-12-30   -1.857878
2016-12-31    0.480146
2017-01-01    0.614659
2017-01-02    1.917813
Freq: D, dtype: float64

In [43]:
pts = ts.to_period(freq='M')
pts

2016-12    1.876762
2016-12   -1.857878
2016-12    0.480146
2017-01    0.614659
2017-01    1.917813
Freq: M, dtype: float64

In [44]:
pts.groupby(by=pts.index).sum()
# pts.groupby(level=0).sum()

2016-12    0.499029
2017-01    2.532472
Freq: M, dtype: float64
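
A deterministic version of the conversion above, with fixed integers in place of random values so the monthly totals can be checked by hand. It also shows the reverse direction with `to_timestamp`, which by default places each period at its start. The names `monthly` and `back` are illustrative.

```python
import pandas as pd

ts = pd.Series([1, 2, 3, 4, 5],
               index=pd.date_range('2016-12-29', periods=5, freq='D'))

# Timestamp -> Period: several days collapse onto the same monthly label
pts = ts.to_period('M')
monthly = pts.groupby(level=0).sum()
print(monthly.tolist())   # [6, 9]  (1+2+3 for 2016-12, 4+5 for 2017-01)

# Period -> Timestamp: each month is mapped to its first day
back = monthly.to_timestamp()
print(list(back.index.strftime('%Y-%m-%d')))   # ['2016-12-01', '2017-01-01']
```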

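### Resampling

* High frequency -> low frequency: downsampling, e.g. converting 5-minute stock trading data into daily data
* Low frequency -> high frequency: upsampling
* Other resampling: e.g. converting weekly-on-Wednesday data (W-WED) to weekly-on-Friday data (W-FRI)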


In [45]:
ts = pd.Series(np.random.randint(0, 50, 60), index=pd.date_range('2016-04-25 09:30', periods=60, freq='T'))
ts

2016-04-25 09:30:00    32
2016-04-25 09:31:00    37
2016-04-25 09:32:00     3
2016-04-25 09:33:00    46
2016-04-25 09:34:00    41
2016-04-25 09:35:00    43
2016-04-25 09:36:00     6
2016-04-25 09:37:00    37
2016-04-25 09:38:00    30
2016-04-25 09:39:00    42
2016-04-25 09:40:00    42
2016-04-25 09:41:00    15
2016-04-25 09:42:00    32
2016-04-25 09:43:00     7
2016-04-25 09:44:00    24
2016-04-25 09:45:00    45
2016-04-25 09:46:00     7
2016-04-25 09:47:00    17
2016-04-25 09:48:00    13
2016-04-25 09:49:00    34
2016-04-25 09:50:00    25
2016-04-25 09:51:00     4
2016-04-25 09:52:00    26
2016-04-25 09:53:00    31
2016-04-25 09:54:00    24
2016-04-25 09:55:00    41
2016-04-25 09:56:00    23
2016-04-25 09:57:00     5
2016-04-25 09:58:00    24
2016-04-25 09:59:00    18
2016-04-25 10:00:00    40
2016-04-25 10:01:00    40
2016-04-25 10:02:00     3
2016-04-25 10:03:00    47
2016-04-25 10:04:00    33
2016-04-25 10:05:00    14
2016-04-25 10:06:00     3
2016-04-25 10:07:00     6
2016-04-25 1

In [46]:
# Minutes 0-4 form the first group
ts.resample('5min').sum()

2016-04-25 09:30:00    159
2016-04-25 09:35:00    158
2016-04-25 09:40:00    120
2016-04-25 09:45:00    116
2016-04-25 09:50:00    110
2016-04-25 09:55:00    111
2016-04-25 10:00:00    163
2016-04-25 10:05:00    106
2016-04-25 10:10:00    125
2016-04-25 10:15:00    107
2016-04-25 10:20:00    163
2016-04-25 10:25:00    218
Freq: 5T, dtype: int64

In [47]:
# Minutes 0-4 form the first group
# Label each bin with its right edge (the end time)
ts.resample('5min',label='right').sum()

2016-04-25 09:35:00    159
2016-04-25 09:40:00    158
2016-04-25 09:45:00    120
2016-04-25 09:50:00    116
2016-04-25 09:55:00    110
2016-04-25 10:00:00    111
2016-04-25 10:05:00    163
2016-04-25 10:10:00    106
2016-04-25 10:15:00    125
2016-04-25 10:20:00    107
2016-04-25 10:25:00    163
2016-04-25 10:30:00    218
Freq: 5T, dtype: int64
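
The effect of `label` is easier to see on a tiny series with known values. In this sketch ten one-minute values 0..9 are summed into two 5-minute bins. `label='right'` changes only the timestamp attached to each bin, not which rows fall into it. The names `left` and `right` are illustrative.

```python
import pandas as pd

idx = pd.date_range('2016-04-25 09:30', periods=10, freq='min')
ts = pd.Series(range(10), index=idx)

left = ts.resample('5min').sum()                  # bins labeled by their start
right = ts.resample('5min', label='right').sum()  # same bins, labeled by their end

print(left.tolist())    # [10, 35]
print(right.tolist())   # [10, 35]
print(left.index[0].strftime('%H:%M'), right.index[0].strftime('%H:%M'))   # 09:30 09:35
```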

### OHLC resampling

Specific to financial data: Open/High/Low/Close

In [48]:
ts.resample('5min').ohlc()

Unnamed: 0,open,high,low,close
2016-04-25 09:30:00,32,46,3,41
2016-04-25 09:35:00,43,43,6,42
2016-04-25 09:40:00,42,42,7,24
2016-04-25 09:45:00,45,45,7,34
2016-04-25 09:50:00,25,31,4,24
2016-04-25 09:55:00,41,41,5,18
2016-04-25 10:00:00,40,47,3,33
2016-04-25 10:05:00,14,46,3,46
2016-04-25 10:10:00,35,44,3,44
2016-04-25 10:15:00,31,41,5,5


In [49]:
### Resampling with groupby
ts = pd.Series(np.random.randint(0, 50, 100), index=pd.date_range('2016-03-01', periods=100, freq='D'))
ts

2016-03-01    36
2016-03-02    27
2016-03-03    28
2016-03-04    12
2016-03-05    34
              ..
2016-06-04    33
2016-06-05    34
2016-06-06    38
2016-06-07     0
2016-06-08    29
Freq: D, Length: 100, dtype: int64

In [50]:
# x is each index label (a Timestamp); equivalent to grouping by ts.index.month
ts.groupby(lambda x: x.month).sum()

3    835
4    699
5    875
6    253
dtype: int64

In [51]:
ts.groupby(ts.index.to_period('M')).sum()
# ts.groupby(ts.index.to_period('M')).sum().index

2016-03    835
2016-04    699
2016-05    875
2016-06    253
Freq: M, dtype: int64

### Upsampling and interpolation

In [56]:
# Weekly data, sampled every Friday
df = pd.DataFrame(np.random.randint(1, 50, 2), index=pd.date_range('2016-04-22', periods=2, freq='W-FRI'))
df

Unnamed: 0,0
2016-04-22,13
2016-04-29,29


In [57]:
df.resample('D', fill_method='ffill', limit=3)

TypeError: resample() got an unexpected keyword argument 'fill_method'

In [62]:
# df.resample('W-MON', fill_method='ffill',limit=3)
df.resample('D').ffill(limit=3)

Unnamed: 0,0
2016-04-22,13.0
2016-04-23,13.0
2016-04-24,13.0
2016-04-25,13.0
2016-04-26,
2016-04-27,
2016-04-28,
2016-04-29,29.0


In [63]:
# Weekly data, sampled every Monday
# df.resample('W-MON', fill_method='ffill')
df.resample('W-MON').ffill()

Unnamed: 0,0
2016-04-25,13
2016-05-02,29
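
Forward fill is one way to fill the gaps created by upsampling; linear interpolation is another. The sketch below applies both to the same two weekly values, 13 and 29, taken from the table above. Calling `asfreq()` and then `interpolate()` is one possible way to interpolate and is shown here as an assumption, not as the notebook's method.

```python
import pandas as pd

weekly = pd.Series([13.0, 29.0],
                   index=pd.to_datetime(['2016-04-22', '2016-04-29']))

# Forward fill at most 3 days; the remaining gap stays NaN
filled = weekly.resample('D').ffill(limit=3)
print(int(filled.isna().sum()))   # 3

# Linear interpolation: each day moves 16/7 of the way from 13 to 29
interpolated = weekly.resample('D').asfreq().interpolate()
print(round(interpolated.loc['2016-04-25'], 3))   # 19.857
```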


In [66]:
df.resample('W-MON')

<pandas.core.resample.DatetimeIndexResampler object at 0x11779b310>

### Resampling periods

In [68]:
df = pd.DataFrame(np.random.randint(2, 30, (24, 4)),
                  index=pd.period_range('2015-01', '2016-12', freq='M'),
                  columns=list('ABCD'))
df


Unnamed: 0,A,B,C,D
2015-01,8,22,17,17
2015-02,2,23,21,5
2015-03,23,26,15,9
2015-04,22,3,26,13
2015-05,9,18,14,19
2015-06,10,16,25,4
2015-07,9,16,28,2
2015-08,5,24,18,7
2015-09,29,12,19,7
2015-10,23,4,3,21


In [70]:
adf = df.resample('A-DEC').mean()
adf

Unnamed: 0,A,B,C,D
2015,15.833333,16.833333,17.75,12.333333
2016,18.833333,15.833333,16.333333,13.25


In [71]:
df.resample('A-MAY').mean()

Unnamed: 0,A,B,C,D
2015,12.8,18.4,18.6,12.6
2016,18.416667,17.0,17.25,12.416667
2017,18.714286,13.714286,15.571429,13.571429


In [73]:
# Upsampling
adf.resample('Q-DEC').ffill()

Unnamed: 0,A,B,C,D
2015Q1,15.833333,16.833333,17.75,12.333333
2015Q2,15.833333,16.833333,17.75,12.333333
2015Q3,15.833333,16.833333,17.75,12.333333
2015Q4,15.833333,16.833333,17.75,12.333333
2016Q1,18.833333,15.833333,16.333333,13.25
2016Q2,18.833333,15.833333,16.333333,13.25
2016Q3,18.833333,15.833333,16.333333,13.25
2016Q4,18.833333,15.833333,16.333333,13.25
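
Downsampling period-indexed data can also be written as a `groupby` on the converted index. This mirrors the groupby approach used earlier for timestamps. The sketch maps each month to its quarter with `PeriodIndex.asfreq` and averages within each quarter. The values 1..12 are made up so the means are easy to verify.

```python
import pandas as pd

monthly = pd.Series(range(1, 13),
                    index=pd.period_range('2015-01', periods=12, freq='M'),
                    dtype='float64')

# Map every month to the quarter that contains it, then average per quarter
quarterly = monthly.groupby(monthly.index.asfreq('Q-DEC')).mean()

print([str(p) for p in quarterly.index])   # ['2015Q1', '2015Q2', '2015Q3', '2015Q4']
print(quarterly.tolist())                  # [2.0, 5.0, 8.0, 11.0]
```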


### Performance

In [74]:
n = 10000
ts = pd.Series(np.random.randn(n),index=pd.date_range('2020-01-01',periods=n,freq='10ms'))
len(ts)


10000

In [75]:
%timeit ts.resample('10min').ohlc()

2.27 ms ± 536 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [76]:
ts.resample('D').ohlc()

Unnamed: 0,open,high,low,close
2020-01-01,0.177078,3.91372,-4.219765,1.011024


## Reading a date series from a file

In [85]:
os.chdir(r'/Users/joey/PycharmProjects/mywork')

In [86]:
df = pd.read_csv('Pandas/002001.csv', index_col='Date')
df

Date,Open,High,Low,Close,Volume,Adj Close
2015-12-22,16.86,17.13,16.48,16.95,13519900,16.95
2015-12-21,16.31,17.0,16.2,16.85,14132200,16.85
2015-12-18,16.59,16.7,16.21,16.31,10524300,16.31
2015-12-17,16.28,16.75,16.16,16.6,12326500,16.6
2015-12-16,16.23,16.42,16.05,16.28,8026000,16.28
2015-12-15,16.06,16.31,15.95,16.18,6647500,16.18
2015-12-14,15.6,16.06,15.45,16.06,8355200,16.06
2015-12-11,15.5,15.8,15.41,15.62,7243500,15.62
2015-12-10,15.99,16.05,15.51,15.56,7654900,15.56
2015-12-09,16.0,16.19,15.8,15.83,7926900,15.83


In [87]:
df.index

Index(['2015-12-22', '2015-12-21', '2015-12-18', '2015-12-17', '2015-12-16',
       '2015-12-15', '2015-12-14', '2015-12-11', '2015-12-10', '2015-12-09',
       '2015-12-08', '2015-12-07', '2015-12-04', '2015-12-03', '2015-12-02',
       '2015-12-01', '2015-11-30', '2015-11-27', '2015-11-26', '2015-11-25',
       '2015-11-24', '2015-11-23', '2015-11-20', '2015-11-19', '2015-11-18',
       '2015-11-17', '2015-11-16', '2015-11-13', '2015-11-12', '2015-11-11',
       '2015-11-10', '2015-11-09', '2015-11-06', '2015-11-05', '2015-11-04',
       '2015-11-03', '2015-11-02', '2015-10-30', '2015-10-29', '2015-10-28',
       '2015-10-27', '2015-10-26', '2015-10-23', '2015-10-22', '2015-10-21',
       '2015-10-20', '2015-10-19', '2015-10-16', '2015-10-15', '2015-10-14',
       '2015-10-13', '2015-10-12', '2015-10-09', '2015-10-08', '2015-10-07',
       '2015-10-06', '2015-10-05', '2015-10-02', '2015-10-01'],
      dtype='object', name='Date')

In [88]:
df = pd.read_csv('Pandas/002001.csv', index_col='Date', parse_dates=True)
df.index

DatetimeIndex(['2015-12-22', '2015-12-21', '2015-12-18', '2015-12-17',
               '2015-12-16', '2015-12-15', '2015-12-14', '2015-12-11',
               '2015-12-10', '2015-12-09', '2015-12-08', '2015-12-07',
               '2015-12-04', '2015-12-03', '2015-12-02', '2015-12-01',
               '2015-11-30', '2015-11-27', '2015-11-26', '2015-11-25',
               '2015-11-24', '2015-11-23', '2015-11-20', '2015-11-19',
               '2015-11-18', '2015-11-17', '2015-11-16', '2015-11-13',
               '2015-11-12', '2015-11-11', '2015-11-10', '2015-11-09',
               '2015-11-06', '2015-11-05', '2015-11-04', '2015-11-03',
               '2015-11-02', '2015-10-30', '2015-10-29', '2015-10-28',
               '2015-10-27', '2015-10-26', '2015-10-23', '2015-10-22',
               '2015-10-21', '2015-10-20', '2015-10-19', '2015-10-16',
               '2015-10-15', '2015-10-14', '2015-10-13', '2015-10-12',
               '2015-10-09', '2015-10-08', '2015-10-07', '2015-10-06',
      

In [90]:
wdf = df['Adj Close'].resample('W-FRI').ohlc()
# wdf = df['Adj Close'].resample('W-FRI', how='ohlc')
wdf

Date,open,high,low,close
2015-10-02,13.41,13.41,13.41,13.41
2015-10-09,13.41,14.75,13.41,14.62
2015-10-16,15.3,15.3,14.73,15.25
2015-10-23,15.03,15.22,14.26,15.2
2015-10-30,15.18,15.3,15.02,15.22
2015-11-06,14.74,15.86,14.62,15.86
2015-11-13,16.02,16.59,15.95,15.95
2015-11-20,16.21,16.22,15.75,16.08
2015-11-27,16.05,16.94,15.54,15.54
2015-12-04,15.7,16.62,15.7,16.62


In [91]:
wdf['Volume'] = df['Volume'].resample('W-FRI').sum()
wdf



Date,open,high,low,close,Volume
2015-10-02,13.41,13.41,13.41,13.41,0
2015-10-09,13.41,14.75,13.41,14.62,42135700
2015-10-16,15.3,15.3,14.73,15.25,73234300
2015-10-23,15.03,15.22,14.26,15.2,69848500
2015-10-30,15.18,15.3,15.02,15.22,64253700
2015-11-06,14.74,15.86,14.62,15.86,67429500
2015-11-13,16.02,16.59,15.95,15.95,87379300
2015-11-20,16.21,16.22,15.75,16.08,41097000
2015-11-27,16.05,16.94,15.54,15.54,64976300
2015-12-04,15.7,16.62,15.7,16.62,53552100
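
The CSV file itself is not included here, so the following sketch reproduces the same steps on three rows copied from the table above and read from an in-memory string. It assumes the file has at least the `Date`, `Adj Close`, and `Volume` columns. The index is sorted first because the rows are in descending date order and open/close depend on row order.

```python
import io
import pandas as pd

# Three rows copied from the table above (a stand-in for the real file)
CSV_TEXT = """Date,Adj Close,Volume
2015-12-22,16.95,13519900
2015-12-21,16.85,14132200
2015-12-18,16.31,10524300
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), index_col='Date', parse_dates=True)
df = df.sort_index()  # oldest first, so 'open' is the earliest price of the week

weekly = df['Adj Close'].resample('W-FRI').ohlc()
weekly['Volume'] = df['Volume'].resample('W-FRI').sum()
print(weekly)
```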


### Custom date parsing function

In [92]:
def date_parser(s):
    s = '2016/' + s
    d = datetime.strptime(s,'%Y/%m/%d')
    return d
df = pd.read_csv('Pandas/custom_date.csv',parse_dates=True,index_col='Date',date_parser=date_parser)
df

Date,Price
2016-01-01,10.2
2016-01-02,10.4
2016-01-03,10.5
2016-01-04,10.8
2016-01-05,11.2
2016-01-06,10.6


In [93]:
df.index



DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
               '2016-01-05', '2016-01-06'],
              dtype='datetime64[ns]', name='Date', freq=None)
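
As far as I know, the `date_parser` argument is deprecated in pandas 2.x, so an alternative is sketched here: read the column as plain strings and convert it afterwards with `pd.to_datetime` and an explicit format. The contents of `custom_date.csv` are not shown in the notebook. The month/day layout of the `Date` column below is an assumption inferred from the parser function above, and the prices are taken from the printed table.

```python
import io
import pandas as pd

# Assumed file layout: dates stored as month/day without a year
CSV_TEXT = """Date,Price
1/1,10.2
1/2,10.4
1/3,10.5
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), dtype={'Date': str})

# Prepend the year, then parse with an explicit format
df['Date'] = pd.to_datetime('2016/' + df['Date'], format='%Y/%m/%d')
df = df.set_index('Date')
print(df.index)
```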