# Time Series

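- Time series data is an important form of structured data. Anything that is observed or measured at multiple points in time forms a time series. Many time series are fixed-frequency, meaning that data points occur at regular intervals according to some rule (for example, every 15 seconds). Time series can also be irregular. How a time series is interpreted depends on the application. The main kinds are:
    - Timestamps: specific instants in time.
    - Fixed periods, such as January 2007 or the full year 2010.
    - Intervals, indicated by a start and an end timestamp. A period can be thought of as a special case of an interval.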
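
The three kinds listed above each have a pandas counterpart. The following is a minimal sketch, assuming pandas is installed; the specific dates are illustrative only.

```python
import pandas as pd

# A timestamp: one specific instant.
ts = pd.Timestamp("2020-02-11 10:04:30")

# A fixed period: the whole of January 2007.
p = pd.Period("2007-01", freq="M")

# An interval: a start and an end timestamp (closed on the left here).
iv = pd.Interval(pd.Timestamp("2020-01-01"), pd.Timestamp("2020-02-01"), closed="left")

print(ts.year, ts.month, ts.day)
print(p.start_time, p.end_time)           # first and last instant of the period
print(pd.Timestamp("2020-01-15") in iv)   # membership test against the interval
```

A Period knows its own start and end, which is why it can be viewed as a special case of an interval.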


In [1]:
import time
# Timestamp: seconds elapsed since the Unix epoch (1970-01-01 00:00:00 UTC)
time.time()

1581477251.930363

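## 1. Date and Time Data Types and Tools
- The Python standard library includes data types for date and time data, as well as calendar-related functionality.
- The datetime, time, and calendar modules are the main ones used.
- datetime.datetime (often imported as just datetime) is the most widely used type.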

In [2]:
import numpy as np
import pandas as pd
from datetime import datetime

In [3]:
# The current moment
now = datetime.now()
now

datetime.datetime(2020, 2, 12, 11, 14, 12, 420623)

In [4]:
# Get the year, month, and day
now.year, now.month, now.day

(2020, 2, 12)

In [5]:
# The difference between two datetimes is a timedelta
d = datetime(2020, 2, 11) - datetime(2018, 9, 6, 10, 15, 30)
d

datetime.timedelta(522, 49470)

In [6]:
# Number of whole days
d.days

522

In [7]:
# Number of seconds left over after the whole days
d.seconds

49470

In [8]:
# timedelta represents the difference between two datetime objects (stored as days, seconds, and microseconds)
from datetime import timedelta

In [9]:
start = datetime(2020, 2, 11)

In [10]:
start + timedelta(12)*2

datetime.datetime(2020, 3, 6, 0, 0)

### Converting between strings and datetime
- datetime to string
    - datetime(...).strftime('format string')
- String to datetime
    - datetime.strptime(string, 'format string')
    - parse(string), which requires importing dateutil.parser

In [11]:
stamp = datetime(2020, 2, 11, 10, 4, 30)

In [12]:
# Call str() directly on the datetime
str(stamp)

'2020-02-11 10:04:30'

In [13]:
# .strftime('format string')
stamp.strftime('%Y-%m-%d %H:%M:%S')

'2020-02-11 10:04:30'

In [14]:
v = '2018-9-6'

In [15]:
datetime.strptime(v,'%Y-%m-%d')

datetime.datetime(2018, 9, 6, 0, 0)

In [16]:
d = ['2/8/2017', '12/15/2016']

In [17]:
# Parse several dates at once
[datetime.strptime(x, '%m/%d/%Y') for x in d]  # The separators in the format must match the string: with '/' in the data, '-' will not work

[datetime.datetime(2017, 2, 8, 0, 0), datetime.datetime(2016, 12, 15, 0, 0)]

In [18]:
# dateutil can parse a string without an explicit format, but it has to be imported
from dateutil.parser import parse
parse('2011-01-03')

datetime.datetime(2011, 1, 3, 0, 0)

In [19]:
# The day comes first
parse('6/12/2020', dayfirst=True)

datetime.datetime(2020, 12, 6, 0, 0)

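### pandas generally works with arrays of dates, whether they are used as an axis index or as a column of a DataFrame.
- The to_datetime method parses many different kinds of date representations. Standard date formats such as ISO 8601 are parsed very quickly.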

In [20]:
d = ['2018-9-6 12:15:30', '2016-6-30 8:20:40']

In [21]:
# Returns a DatetimeIndex (an index of timestamps)
pd.to_datetime(d)

DatetimeIndex(['2018-09-06 12:15:30', '2016-06-30 08:20:40'], dtype='datetime64[ns]', freq=None)

In [22]:
# NaT (Not a Time) is the missing-value marker for timestamp data
idx = pd.to_datetime(d + [''])
idx

DatetimeIndex(['2018-09-06 12:15:30', '2016-06-30 08:20:40', 'NaT'], dtype='datetime64[ns]', freq=None)

In [23]:
pd.isnull(idx)

array([False, False,  True])
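
When the layout of the strings is known in advance, passing an explicit format avoids guessing, and errors='coerce' turns unparseable entries into NaT instead of raising. The following is a small sketch; the input list is made up for illustration.

```python
import pandas as pd

raw = ["2018-09-06 12:15:30", "2016-06-30 08:20:40", "not a date"]

# An explicit format skips format inference; errors="coerce" maps bad input to NaT.
idx = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S", errors="coerce")

print(idx)
print(idx.isna())  # True only for the entry that could not be parsed
```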

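## 2. Time Series Basics
- The most basic kind of time series object in pandas is a Series indexed by timestamps, which are often represented outside pandas as Python strings or datetime objects.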

In [24]:
import numpy as np
import pandas as pd
from datetime import datetime

In [25]:
dates = [datetime(2020, 2, 11), datetime(2020, 2, 12),
         datetime(2020, 2, 13), datetime(2020, 2, 14),
         datetime(2020, 2, 15), datetime(2020, 2, 16)]

In [26]:
ts = pd.Series(np.random.randn(6), index=dates)
ts

2020-02-11    0.880966
2020-02-12    0.849202
2020-02-13    1.755187
2020-02-14    0.952199
2020-02-15   -0.060903
2020-02-16   -1.522155
dtype: float64

In [27]:
# Timestamp index
# pandas stores timestamps using NumPy's datetime64 data type at nanosecond resolution
ts.index

DatetimeIndex(['2020-02-11', '2020-02-12', '2020-02-13', '2020-02-14',
               '2020-02-15', '2020-02-16'],
              dtype='datetime64[ns]', freq=None)

In [28]:
ts.index.dtype

dtype('<M8[ns]')

In [29]:
ts[::2]

2020-02-11    0.880966
2020-02-13    1.755187
2020-02-15   -0.060903
dtype: float64

In [30]:
ts + ts[::2]

2020-02-11    1.761932
2020-02-12         NaN
2020-02-13    3.510375
2020-02-14         NaN
2020-02-15   -0.121806
2020-02-16         NaN
dtype: float64

In [31]:
s = ts.index[0]
s

Timestamp('2020-02-11 00:00:00')

### Indexing, Selection, Subsetting

In [32]:
stamp = ts.index[2]
ts[stamp]

1.7551874737226236

In [33]:
ts[ts.index[2]]

1.7551874737226236

In [34]:
ts['2/13/2020']

1.7551874737226236

In [35]:
ts['20200213']

1.7551874737226236

In [36]:
# pd.date_range('start date', periods=number of periods)
ts1 = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2020', periods=1000))
ts1

2020-01-01   -1.435394
2020-01-02   -0.269245
2020-01-03   -1.668743
2020-01-04    0.965735
2020-01-05   -0.126917
2020-01-06    1.324376
2020-01-07   -0.372500
2020-01-08    0.314928
2020-01-09    0.883962
2020-01-10    2.347653
2020-01-11    1.245779
2020-01-12    1.455056
2020-01-13    1.188898
2020-01-14   -1.206531
2020-01-15    0.664966
2020-01-16    1.450537
2020-01-17   -0.134413
2020-01-18    1.017629
2020-01-19    0.185370
2020-01-20    1.862067
2020-01-21   -0.541994
2020-01-22    0.302626
2020-01-23   -1.592782
2020-01-24    1.785318
2020-01-25   -0.066156
2020-01-26   -0.187910
2020-01-27   -0.111399
2020-01-28    0.351521
2020-01-29    0.223974
2020-01-30    0.641828
                ...   
2022-08-28   -2.563082
2022-08-29   -0.206080
2022-08-30    0.515714
2022-08-31   -0.407934
2022-09-01    0.037548
2022-09-02    0.990364
2022-09-03    0.087685
2022-09-04   -0.962819
2022-09-05   -0.179523
2022-09-06    1.063908
2022-09-07   -0.929042
2022-09-08   -1.697845
2022-09-09 

In [37]:
ts1['2020']

2020-01-01   -1.435394
2020-01-02   -0.269245
2020-01-03   -1.668743
2020-01-04    0.965735
2020-01-05   -0.126917
2020-01-06    1.324376
2020-01-07   -0.372500
2020-01-08    0.314928
2020-01-09    0.883962
2020-01-10    2.347653
2020-01-11    1.245779
2020-01-12    1.455056
2020-01-13    1.188898
2020-01-14   -1.206531
2020-01-15    0.664966
2020-01-16    1.450537
2020-01-17   -0.134413
2020-01-18    1.017629
2020-01-19    0.185370
2020-01-20    1.862067
2020-01-21   -0.541994
2020-01-22    0.302626
2020-01-23   -1.592782
2020-01-24    1.785318
2020-01-25   -0.066156
2020-01-26   -0.187910
2020-01-27   -0.111399
2020-01-28    0.351521
2020-01-29    0.223974
2020-01-30    0.641828
                ...   
2020-12-02    0.948579
2020-12-03   -1.133421
2020-12-04    0.583608
2020-12-05    0.563024
2020-12-06    0.383747
2020-12-07   -0.945768
2020-12-08    1.298131
2020-12-09   -1.165604
2020-12-10   -0.389600
2020-12-11   -0.113803
2020-12-12   -0.248254
2020-12-13    0.755189
2020-12-14 

In [38]:
ts1['2020 02']

2020-02-01    0.108266
2020-02-02    0.142087
2020-02-03   -0.497728
2020-02-04    0.348886
2020-02-05   -0.666528
2020-02-06    0.121098
2020-02-07    0.465156
2020-02-08   -0.128819
2020-02-09   -2.140887
2020-02-10    0.786518
2020-02-11   -0.851378
2020-02-12   -2.259813
2020-02-13   -0.012933
2020-02-14    1.068314
2020-02-15    0.146971
2020-02-16    1.110203
2020-02-17    0.901196
2020-02-18    0.626660
2020-02-19    1.851583
2020-02-20   -1.661206
2020-02-21    0.306660
2020-02-22    1.241001
2020-02-23   -0.500679
2020-02-24    1.345824
2020-02-25    1.257090
2020-02-26    0.240580
2020-02-27   -0.887878
2020-02-28    1.337424
2020-02-29    0.611826
Freq: D, dtype: float64

In [39]:
ts

2020-02-11    0.880966
2020-02-12    0.849202
2020-02-13    1.755187
2020-02-14    0.952199
2020-02-15   -0.060903
2020-02-16   -1.522155
dtype: float64

In [40]:
ts[datetime(2020, 2, 12):]

2020-02-12    0.849202
2020-02-13    1.755187
2020-02-14    0.952199
2020-02-15   -0.060903
2020-02-16   -1.522155
dtype: float64

In [41]:
ts['1/6/2020':'2020-02-14']

2020-02-11    0.880966
2020-02-12    0.849202
2020-02-13    1.755187
2020-02-14    0.952199
dtype: float64

In [42]:
# Assigning through a timestamp slice modifies the original data in place
ts['1/6/2020':'2020-02-14'] = 1
ts

2020-02-11    1.000000
2020-02-12    1.000000
2020-02-13    1.000000
2020-02-14    1.000000
2020-02-15   -0.060903
2020-02-16   -1.522155
dtype: float64

In [43]:
# Series.truncate() drops the data before or after the given dates
ts.truncate(after='20200213')

2020-02-11    1.0
2020-02-12    1.0
2020-02-13    1.0
dtype: float64

### Time Series with Duplicate Indices

In [44]:
dates = pd.DatetimeIndex(['2/11/2020', '2/13/2020', '2/12/2020', '2/12/2020', '2/11/2020'])
dates

DatetimeIndex(['2020-02-11', '2020-02-13', '2020-02-12', '2020-02-12',
               '2020-02-11'],
              dtype='datetime64[ns]', freq=None)

In [45]:
ts2 = pd.Series(np.arange(5), index=dates)
ts2

2020-02-11    0
2020-02-13    1
2020-02-12    2
2020-02-12    3
2020-02-11    4
dtype: int64

In [46]:
ts2.index.is_unique

False

In [47]:
ts2['2/13/2020']

2020-02-13    1
dtype: int64

In [48]:
ts2['2/11/2020']

2020-02-11    0
2020-02-11    4
dtype: int64

In [49]:
# Aggregate entries that share a timestamp by grouping on the index (level=0)
ts2.groupby(level=0).mean()

2020-02-11    2.0
2020-02-12    2.5
2020-02-13    1.0
dtype: float64

In [50]:
ts2.groupby(level=0).count()

2020-02-11    2
2020-02-12    2
2020-02-13    1
dtype: int64

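## 3. Date Ranges, Frequencies, and Shifting
- Generic time series in pandas are assumed to be irregular; that is, they have no fixed frequency. For many applications this is sufficient.
- However, it is often desirable to work relative to a fixed frequency, such as daily, monthly, or every 15 minutes, even if that means introducing missing values into the time series.
- Fortunately, pandas has a full suite of standard time series frequencies and tools for resampling, inferring frequencies, and generating fixed-frequency date ranges.
- For example, the earlier time series can be converted to a fixed daily frequency simply by calling resample.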
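
The conversion to a fixed daily frequency can be sketched as follows. The series here is a made-up irregular one, not the ts object from the earlier cells, and mean() is just one possible aggregation.

```python
import pandas as pd

# An irregular series: observations on scattered dates only.
idx = pd.to_datetime(["2020-02-11", "2020-02-13", "2020-02-16"])
irregular = pd.Series([1.0, 2.0, 3.0], index=idx)

# Resample to one row per calendar day; days with no observation become NaN.
daily = irregular.resample("D").mean()
print(daily)
```

The result covers every day from the first to the last observation, which is how the missing values mentioned above are introduced.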

### Generating Date Ranges
- pandas.date_range generates a DatetimeIndex of a given length according to a particular frequency
    - *pd.date_range(start=None, end=None, periods=None, freq='D')*

In [51]:
index = pd.date_range('2020-04-01', '2020-06-01')
index

DatetimeIndex(['2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04',
               '2020-04-05', '2020-04-06', '2020-04-07', '2020-04-08',
               '2020-04-09', '2020-04-10', '2020-04-11', '2020-04-12',
               '2020-04-13', '2020-04-14', '2020-04-15', '2020-04-16',
               '2020-04-17', '2020-04-18', '2020-04-19', '2020-04-20',
               '2020-04-21', '2020-04-22', '2020-04-23', '2020-04-24',
               '2020-04-25', '2020-04-26', '2020-04-27', '2020-04-28',
               '2020-04-29', '2020-04-30', '2020-05-01', '2020-05-02',
               '2020-05-03', '2020-05-04', '2020-05-05', '2020-05-06',
               '2020-05-07', '2020-05-08', '2020-05-09', '2020-05-10',
               '2020-05-11', '2020-05-12', '2020-05-13', '2020-05-14',
               '2020-05-15', '2020-05-16', '2020-05-17', '2020-05-18',
               '2020-05-19', '2020-05-20', '2020-05-21', '2020-05-22',
               '2020-05-23', '2020-05-24', '2020-05-25', '2020-05-26',
      

In [52]:
pd.date_range('2020-04-01', periods=32)

DatetimeIndex(['2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04',
               '2020-04-05', '2020-04-06', '2020-04-07', '2020-04-08',
               '2020-04-09', '2020-04-10', '2020-04-11', '2020-04-12',
               '2020-04-13', '2020-04-14', '2020-04-15', '2020-04-16',
               '2020-04-17', '2020-04-18', '2020-04-19', '2020-04-20',
               '2020-04-21', '2020-04-22', '2020-04-23', '2020-04-24',
               '2020-04-25', '2020-04-26', '2020-04-27', '2020-04-28',
               '2020-04-29', '2020-04-30', '2020-05-01', '2020-05-02'],
              dtype='datetime64[ns]', freq='D')

In [53]:
# Count backward from the end date
pd.date_range(end='2020-04-01', periods=32)

DatetimeIndex(['2020-03-01', '2020-03-02', '2020-03-03', '2020-03-04',
               '2020-03-05', '2020-03-06', '2020-03-07', '2020-03-08',
               '2020-03-09', '2020-03-10', '2020-03-11', '2020-03-12',
               '2020-03-13', '2020-03-14', '2020-03-15', '2020-03-16',
               '2020-03-17', '2020-03-18', '2020-03-19', '2020-03-20',
               '2020-03-21', '2020-03-22', '2020-03-23', '2020-03-24',
               '2020-03-25', '2020-03-26', '2020-03-27', '2020-03-28',
               '2020-03-29', '2020-03-30', '2020-03-31', '2020-04-01'],
              dtype='datetime64[ns]', freq='D')

In [54]:
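# Add or change the frequency with freq
# Last business day of each month (weekends excluded); pandas 2.2 and later spell this alias 'BME'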
pd.date_range('2020-01-01', '2020-12-31', freq='BM')

DatetimeIndex(['2020-01-31', '2020-02-28', '2020-03-31', '2020-04-30',
               '2020-05-29', '2020-06-30', '2020-07-31', '2020-08-31',
               '2020-09-30', '2020-10-30', '2020-11-30', '2020-12-31'],
              dtype='datetime64[ns]', freq='BM')

In [55]:
# First business day of each month
pd.date_range('2020-01-01', '2020-12-31', freq='BMS')

DatetimeIndex(['2020-01-01', '2020-02-03', '2020-03-02', '2020-04-01',
               '2020-05-01', '2020-06-01', '2020-07-01', '2020-08-03',
               '2020-09-01', '2020-10-01', '2020-11-02', '2020-12-01'],
              dtype='datetime64[ns]', freq='BMS')

In [56]:
# Last calendar day of each month; pandas 2.2 and later spell this alias 'ME'
pd.date_range('2020-01-01', '2020-12-31', freq='M')

DatetimeIndex(['2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30',
               '2020-05-31', '2020-06-30', '2020-07-31', '2020-08-31',
               '2020-09-30', '2020-10-31', '2020-11-30', '2020-12-31'],
              dtype='datetime64[ns]', freq='M')

In [57]:
# First calendar day of each month
pd.date_range('2020-01-01', '2020-12-31', freq='MS')

DatetimeIndex(['2020-01-01', '2020-02-01', '2020-03-01', '2020-04-01',
               '2020-05-01', '2020-06-01', '2020-07-01', '2020-08-01',
               '2020-09-01', '2020-10-01', '2020-11-01', '2020-12-01'],
              dtype='datetime64[ns]', freq='MS')

In [58]:
# Every business day
pd.date_range('2020-01-01', '2020-12-31', freq='B')

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-06',
               '2020-01-07', '2020-01-08', '2020-01-09', '2020-01-10',
               '2020-01-13', '2020-01-14',
               ...
               '2020-12-18', '2020-12-21', '2020-12-22', '2020-12-23',
               '2020-12-24', '2020-12-25', '2020-12-28', '2020-12-29',
               '2020-12-30', '2020-12-31'],
              dtype='datetime64[ns]', length=262, freq='B')

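### Frequencies and Date Offsets
- Frequencies in pandas are composed of a base frequency and a multiplier.
- Base frequencies are typically referred to by a string alias, like "M" for monthly or "H" for hourly.
- For each base frequency, there is a corresponding object called a date offset.
- For example, hourly frequency can be represented with the Hour class.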

In [59]:
from pandas.tseries.offsets import Hour, Minute

In [60]:
hour = Hour()
hour

<Hour>

In [61]:
h = Hour(4)
h

<4 * Hours>

In [62]:
# In most cases there is no need to import the offset classes; pass a string alias instead:
pd.date_range('2020-01-01', '2020-01-02', freq='4H')

DatetimeIndex(['2020-01-01 00:00:00', '2020-01-01 04:00:00',
               '2020-01-01 08:00:00', '2020-01-01 12:00:00',
               '2020-01-01 16:00:00', '2020-01-01 20:00:00',
               '2020-01-02 00:00:00'],
              dtype='datetime64[ns]', freq='4H')

In [63]:
pd.date_range('2020-01-01', '2020-01-02', freq='2h30min')

DatetimeIndex(['2020-01-01 00:00:00', '2020-01-01 02:30:00',
               '2020-01-01 05:00:00', '2020-01-01 07:30:00',
               '2020-01-01 10:00:00', '2020-01-01 12:30:00',
               '2020-01-01 15:00:00', '2020-01-01 17:30:00',
               '2020-01-01 20:00:00', '2020-01-01 22:30:00'],
              dtype='datetime64[ns]', freq='150T')
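
Offset objects can also be used in arithmetic, which is one reason to import them explicitly. The following is a minimal sketch; the MonthEnd import and the variable names are additions for illustration and do not appear in the cells above.

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute, MonthEnd

start = pd.Timestamp("2020-02-11")

# Fixed-length offsets can be added together, then applied to a timestamp.
combined = Hour(2) + Minute(30)
later = start + combined

# Anchored offsets roll the date forward to the next anchor point.
month_end = start + MonthEnd()

print(later)      # two and a half hours after midnight on 2020-02-11
print(month_end)  # last calendar day of February 2020
```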

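### Shifting Data
- Shifting refers to moving data backward or forward through time.
- Both Series and DataFrame have a shift method for doing naive shifts forward or backward, leaving the index unmodified.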

In [64]:
ts = pd.Series(np.random.randn(4),
                index=pd.date_range('1/1/2020', periods=4, freq='M'))
ts

2020-01-31    0.274858
2020-02-29    0.053229
2020-03-31   -1.145568
2020-04-30    0.342589
Freq: M, dtype: float64

In [65]:
# Shift the data down by 2 positions (the timestamp index is unchanged)
ts.shift(2)

2020-01-31         NaN
2020-02-29         NaN
2020-03-31    0.274858
2020-04-30    0.053229
Freq: M, dtype: float64

In [66]:
# Shift the data up by 2 positions
ts.shift(-2)

2020-01-31   -1.145568
2020-02-29    0.342589
2020-03-31         NaN
2020-04-30         NaN
Freq: M, dtype: float64

In [67]:
# Passing freq shifts the timestamps rather than the data
ts.shift(2, freq='M')

2020-03-31    0.274858
2020-04-30    0.053229
2020-05-31   -1.145568
2020-06-30    0.342589
Freq: M, dtype: float64
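
A common use of shift is computing the change from one observation to the next. The following is a small sketch with made-up prices; the names prices and returns are illustrative.

```python
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0, 108.9],
                   index=pd.date_range("2020-01-01", periods=4, freq="D"))

# Divide each value by the previous one; the first row has no predecessor, so it is NaN.
returns = prices / prices.shift(1) - 1
print(returns)
```

Because shift leaves the index untouched, the division aligns each value with the one before it.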

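## 4. Resampling (Key Topic)
- Resampling refers to the process of converting a time series from one frequency to another.
- Aggregating higher frequency data to lower frequency is called downsampling, while converting lower frequency to higher frequency is called upsampling.
- Not all resampling falls into either of these categories.
- For example, converting W-WED (weekly on Wednesday) to W-FRI is neither upsampling nor downsampling.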
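
The W-WED to W-FRI case can be sketched as follows. The data is made up, and taking last() within each new weekly bin is one possible choice of aggregation, not the only one.

```python
import pandas as pd

# Three weekly observations stamped on Wednesdays.
wed = pd.Series([0, 1, 2],
                index=pd.date_range("2020-01-01", periods=3, freq="W-WED"))

# Re-anchor the weeks so that each bin ends on Friday.
fri = wed.resample("W-FRI").last()
print(fri)
```

The number of rows stays the same; only the anchor day of each week changes.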

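#### pandas objects are equipped with a resample method, the workhorse for all frequency conversion. resample has an API similar to groupby: you call resample to group the data, then call an aggregation function.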


In [68]:
t = pd.DataFrame(np.random.uniform(10,50,(100,1)),index=pd.date_range('20200101',periods=100))
t

Unnamed: 0,0
2020-01-01,12.391886
2020-01-02,49.212466
2020-01-03,17.241950
2020-01-04,44.753837
2020-01-05,15.209509
2020-01-06,15.268854
2020-01-07,30.787211
2020-01-08,49.341461
2020-01-09,33.526050
2020-01-10,48.464232


In [69]:
# resample
# Downsampling
t.resample('M').mean()

Unnamed: 0,0
2020-01-31,32.351213
2020-02-29,33.987774
2020-03-31,29.728867
2020-04-30,21.824353


In [70]:
t.resample('10D').mean()

Unnamed: 0,0
2020-01-01,31.619746
2020-01-11,36.521644
2020-01-21,30.557369
2020-01-31,30.552109
2020-02-10,38.327768
2020-02-20,31.274669
2020-03-01,31.726373
2020-03-11,27.108589
2020-03-21,31.073818
2020-03-31,21.892625


In [71]:
t.resample('10D').count()

Unnamed: 0,0
2020-01-01,10
2020-01-11,10
2020-01-21,10
2020-01-31,10
2020-02-10,10
2020-02-20,10
2020-03-01,10
2020-03-11,10
2020-03-21,10
2020-03-31,10
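
Since resample behaves like groupby, several aggregations can be computed in one pass with agg. The following is a minimal sketch on made-up minute-level data.

```python
import numpy as np
import pandas as pd

# Twelve one-minute observations holding the values 0..11.
idx = pd.date_range("2020-01-01 00:00", periods=12, freq="min")
s = pd.Series(np.arange(12), index=idx)

# Each 5-minute bin acts as a group; agg applies every listed function to it.
five = s.resample("5min").agg(["sum", "mean", "max"])
print(five)
```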


In [72]:
# Upsampling
frame = pd.DataFrame(np.random.randn(2, 4),
                    index=pd.date_range('1/1/2020', periods=2, freq='W-MON'),
                    columns=['上海', '北京', '深圳', '广州'])
frame

Unnamed: 0,上海,北京,深圳,广州
2020-01-06,0.063727,-0.337403,-0.154973,1.711486
2020-01-13,-0.488623,-0.263856,0.628482,-0.53242


In [73]:
# Convert to a higher frequency
frame.resample('D').asfreq()

Unnamed: 0,上海,北京,深圳,广州
2020-01-06,0.063727,-0.337403,-0.154973,1.711486
2020-01-07,,,,
2020-01-08,,,,
2020-01-09,,,,
2020-01-10,,,,
2020-01-11,,,,
2020-01-12,,,,
2020-01-13,-0.488623,-0.263856,0.628482,-0.53242


In [74]:
# Forward-fill the gaps, limited to 2 rows after each observation
frame.resample('D').ffill(limit=2)

Unnamed: 0,上海,北京,深圳,广州
2020-01-06,0.063727,-0.337403,-0.154973,1.711486
2020-01-07,0.063727,-0.337403,-0.154973,1.711486
2020-01-08,0.063727,-0.337403,-0.154973,1.711486
2020-01-09,,,,
2020-01-10,,,,
2020-01-11,,,,
2020-01-12,,,,
2020-01-13,-0.488623,-0.263856,0.628482,-0.53242


# Application Example

In [75]:
import numpy as np
import pandas as pd

In [76]:
df = pd.read_csv('911.csv')
df

Unnamed: 0,lat,lng,desc,zip,title,timeStamp,twp,addr,e
0,40.297876,-75.581294,REINDEER CT & DEAD END; NEW HANOVER; Station ...,19525.0,EMS: BACK PAINS/INJURY,2015-12-10 17:10:52,NEW HANOVER,REINDEER CT & DEAD END,1
1,40.258061,-75.264680,BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP...,19446.0,EMS: DIABETIC EMERGENCY,2015-12-10 17:29:21,HATFIELD TOWNSHIP,BRIAR PATH & WHITEMARSH LN,1
2,40.121182,-75.351975,HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...,19401.0,Fire: GAS-ODOR/LEAK,2015-12-10 14:39:21,NORRISTOWN,HAWS AVE,1
3,40.116153,-75.343513,AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...,19401.0,EMS: CARDIAC EMERGENCY,2015-12-10 16:47:36,NORRISTOWN,AIRY ST & SWEDE ST,1
4,40.251492,-75.603350,CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...,,EMS: DIZZINESS,2015-12-10 16:56:52,LOWER POTTSGROVE,CHERRYWOOD CT & DEAD END,1
5,40.253473,-75.283245,CANNON AVE & W 9TH ST; LANSDALE; Station 345;...,19446.0,EMS: HEAD INJURY,2015-12-10 15:39:04,LANSDALE,CANNON AVE & W 9TH ST,1
6,40.182111,-75.127795,LAUREL AVE & OAKDALE AVE; HORSHAM; Station 35...,19044.0,EMS: NAUSEA/VOMITING,2015-12-10 16:46:48,HORSHAM,LAUREL AVE & OAKDALE AVE,1
7,40.217286,-75.405182,COLLEGEVILLE RD & LYWISKI RD; SKIPPACK; Stati...,19426.0,EMS: RESPIRATORY EMERGENCY,2015-12-10 16:17:05,SKIPPACK,COLLEGEVILLE RD & LYWISKI RD,1
8,40.289027,-75.399590,MAIN ST & OLD SUMNEYTOWN PIKE; LOWER SALFORD;...,19438.0,EMS: SYNCOPAL EPISODE,2015-12-10 16:51:42,LOWER SALFORD,MAIN ST & OLD SUMNEYTOWN PIKE,1
9,40.102398,-75.291458,BLUEROUTE & RAMP I476 NB TO CHEMICAL RD; PLYM...,19462.0,Traffic: VEHICLE ACCIDENT -,2015-12-10 17:35:41,PLYMOUTH,BLUEROUTE & RAMP I476 NB TO CHEMICAL RD,1


In [77]:
# Convert the strings to timestamps
df['timeStamp'] = pd.to_datetime(df['timeStamp'])
df.head()

Unnamed: 0,lat,lng,desc,zip,title,timeStamp,twp,addr,e
0,40.297876,-75.581294,REINDEER CT & DEAD END; NEW HANOVER; Station ...,19525.0,EMS: BACK PAINS/INJURY,2015-12-10 17:10:52,NEW HANOVER,REINDEER CT & DEAD END,1
1,40.258061,-75.26468,BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP...,19446.0,EMS: DIABETIC EMERGENCY,2015-12-10 17:29:21,HATFIELD TOWNSHIP,BRIAR PATH & WHITEMARSH LN,1
2,40.121182,-75.351975,HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...,19401.0,Fire: GAS-ODOR/LEAK,2015-12-10 14:39:21,NORRISTOWN,HAWS AVE,1
3,40.116153,-75.343513,AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...,19401.0,EMS: CARDIAC EMERGENCY,2015-12-10 16:47:36,NORRISTOWN,AIRY ST & SWEDE ST,1
4,40.251492,-75.60335,CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...,,EMS: DIZZINESS,2015-12-10 16:56:52,LOWER POTTSGROVE,CHERRYWOOD CT & DEAD END,1


In [78]:
# Use the timestamps as the index
df.set_index('timeStamp', inplace=True)
df.head()

timeStamp,lat,lng,desc,zip,title,twp,addr,e
2015-12-10 17:10:52,40.297876,-75.581294,REINDEER CT & DEAD END; NEW HANOVER; Station ...,19525.0,EMS: BACK PAINS/INJURY,NEW HANOVER,REINDEER CT & DEAD END,1
2015-12-10 17:29:21,40.258061,-75.26468,BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP...,19446.0,EMS: DIABETIC EMERGENCY,HATFIELD TOWNSHIP,BRIAR PATH & WHITEMARSH LN,1
2015-12-10 14:39:21,40.121182,-75.351975,HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...,19401.0,Fire: GAS-ODOR/LEAK,NORRISTOWN,HAWS AVE,1
2015-12-10 16:47:36,40.116153,-75.343513,AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...,19401.0,EMS: CARDIAC EMERGENCY,NORRISTOWN,AIRY ST & SWEDE ST,1
2015-12-10 16:56:52,40.251492,-75.60335,CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...,,EMS: DIZZINESS,LOWER POTTSGROVE,CHERRYWOOD CT & DEAD END,1


In [79]:
# Select the rows for one month of one year (row selection by partial date string goes through .loc)
df.loc['2015-12']

timeStamp,lat,lng,desc,zip,title,twp,addr,e
2015-12-10 17:10:52,40.297876,-75.581294,REINDEER CT & DEAD END; NEW HANOVER; Station ...,19525.0,EMS: BACK PAINS/INJURY,NEW HANOVER,REINDEER CT & DEAD END,1
2015-12-10 17:29:21,40.258061,-75.264680,BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP...,19446.0,EMS: DIABETIC EMERGENCY,HATFIELD TOWNSHIP,BRIAR PATH & WHITEMARSH LN,1
2015-12-10 14:39:21,40.121182,-75.351975,HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...,19401.0,Fire: GAS-ODOR/LEAK,NORRISTOWN,HAWS AVE,1
2015-12-10 16:47:36,40.116153,-75.343513,AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...,19401.0,EMS: CARDIAC EMERGENCY,NORRISTOWN,AIRY ST & SWEDE ST,1
2015-12-10 16:56:52,40.251492,-75.603350,CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...,,EMS: DIZZINESS,LOWER POTTSGROVE,CHERRYWOOD CT & DEAD END,1
2015-12-10 15:39:04,40.253473,-75.283245,CANNON AVE & W 9TH ST; LANSDALE; Station 345;...,19446.0,EMS: HEAD INJURY,LANSDALE,CANNON AVE & W 9TH ST,1
2015-12-10 16:46:48,40.182111,-75.127795,LAUREL AVE & OAKDALE AVE; HORSHAM; Station 35...,19044.0,EMS: NAUSEA/VOMITING,HORSHAM,LAUREL AVE & OAKDALE AVE,1
2015-12-10 16:17:05,40.217286,-75.405182,COLLEGEVILLE RD & LYWISKI RD; SKIPPACK; Stati...,19426.0,EMS: RESPIRATORY EMERGENCY,SKIPPACK,COLLEGEVILLE RD & LYWISKI RD,1
2015-12-10 16:51:42,40.289027,-75.399590,MAIN ST & OLD SUMNEYTOWN PIKE; LOWER SALFORD;...,19438.0,EMS: SYNCOPAL EPISODE,LOWER SALFORD,MAIN ST & OLD SUMNEYTOWN PIKE,1
2015-12-10 17:35:41,40.102398,-75.291458,BLUEROUTE & RAMP I476 NB TO CHEMICAL RD; PLYM...,19462.0,Traffic: VEHICLE ACCIDENT -,PLYMOUTH,BLUEROUTE & RAMP I476 NB TO CHEMICAL RD,1


In [80]:
# Count the records per month
df.resample("M").count()

timeStamp,lat,lng,desc,zip,title,twp,addr,e
2015-12-31,7916,7916,7916,6902,7916,7911,7916,7916
2016-01-31,13096,13096,13096,11512,13096,13094,13096,13096
2016-02-29,11396,11396,11396,9926,11396,11395,11396,11396
2016-03-31,11059,11059,11059,9754,11059,11052,11059,11059
2016-04-30,11287,11287,11287,9897,11287,11284,11287,11287
2016-05-31,11374,11374,11374,9938,11374,11371,11374,11374
2016-06-30,11732,11732,11732,10205,11732,11726,11732,11732
2016-07-31,12088,12088,12088,10626,12088,12086,12088,12088
2016-08-31,11904,11904,11904,10381,11904,11902,11904,11904
2016-09-30,11669,11669,11669,10174,11669,11666,11669,11669
