# Working with Time Series in Python

## 1. DatetimeIndex
### 1.1 Creating a DatetimeIndex
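1. A `DatetimeIndex` is pandas' time-series index. To create one, pandas provides the `date_range` function. It takes a start date, a frequency (`freq`), and either a number of periods (`periods`) or an end date.
2. The `to_datetime` function converts strings (`object` dtype) to the datetime type. Alternatively, `read_csv()` can do this while loading: `index_col` selects the index column and `parse_dates` lists the columns to convert to datetime. The first example below converts explicitly with `to_datetime`, and the multi-ticker example in section 2 uses `parse_dates`.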

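```python
import pandas as pd
import numpy as np
pd.options.plotting.backend = "plotly"   # requires the plotly package

# Create DatetimeIndex objects with pandas -----------------------
di_1 = pd.date_range("2022-09-04", periods=5, freq="W")      # start date + number of periods
di_2 = pd.date_range("2022-09-04", "2022-09-30", freq="D")   # start date + end date
# Build a time-series DataFrame indexed by di_1
df_ti_1 = pd.DataFrame(
    data=[183, 562, 18, 97, 49],
    columns=["visitors"],
    index=di_1,
)
df_ti_1


# Change column dtypes; to_datetime converts the string (object) column to datetime64
msft = pd.read_csv("data/MSFT.csv")   # forward slash avoids the invalid "\M" escape sequence
# Assign whole columns: with pandas >= 2.0, msft.loc[:, "Date"] = ... sets values in place
# and can leave the column as object dtype instead of datetime64
msft["Date"] = pd.to_datetime(msft["Date"])
msft["Volume"] = msft["Volume"].astype("int64")
# msft.info()
msft
```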

| | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 1986-03-13 | 0.088542 | 0.101563 | 0.088542 | 0.097222 | 0.062205 | 1031788800 |
| 1 | 1986-03-14 | 0.097222 | 0.102431 | 0.097222 | 0.100694 | 0.064427 | 308160000 |
| 2 | 1986-03-17 | 0.100694 | 0.103299 | 0.100694 | 0.102431 | 0.065537 | 133171200 |
| 3 | 1986-03-18 | 0.102431 | 0.103299 | 0.098958 | 0.099826 | 0.063871 | 67766400 |
| 4 | 1986-03-19 | 0.099826 | 0.100694 | 0.097222 | 0.098090 | 0.062760 | 47894400 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 8617 | 2020-05-20 | 184.809998 | 185.850006 | 183.940002 | 185.660004 | 185.660004 | 31261300 |
| 8618 | 2020-05-21 | 185.399994 | 186.669998 | 183.289993 | 183.429993 | 183.429993 | 29119500 |
| 8619 | 2020-05-22 | 183.190002 | 184.460007 | 182.539993 | 183.509995 | 183.509995 | 20826900 |
| 8620 | 2020-05-26 | 186.339996 | 186.500000 | 181.100006 | 181.570007 | 181.570007 | 36073600 |
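
The sketch below contrasts the two ways of calling `date_range` and the two ways of getting a datetime index from a CSV. It reads a small in-memory CSV through `io.StringIO`, so it runs without the data files. The CSV rows are hypothetical sample data standing in for `data/MSFT.csv`.

```python
import io

import pandas as pd

# date_range: start + periods, or start + end (both ends inclusive)
weekly = pd.date_range("2022-09-04", periods=5, freq="W")    # five consecutive Sundays
daily = pd.date_range("2022-09-04", "2022-09-30", freq="D")  # every day from start to end

# Hypothetical CSV content, standing in for data/MSFT.csv
csv_text = """Date,Close
2020-05-21,183.43
2020-05-22,183.51
2020-05-26,181.57
"""

# Option A: read the dates as strings, then convert explicitly with to_datetime
df_a = pd.read_csv(io.StringIO(csv_text))
df_a["Date"] = pd.to_datetime(df_a["Date"])
df_a = df_a.set_index("Date")

# Option B: let read_csv parse the column and use it as the index in one step
df_b = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(len(weekly), len(daily))   # 5 27
print(type(df_b.index).__name__)
```

Both options end with a `DatetimeIndex`. `parse_dates` saves a step when the file is loaded once. Explicit `to_datetime` is handy when the raw column needs cleaning first.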


### 1.2 Filtering a time-series DataFrame & handling time zones
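1. If a DataFrame has a datetime index, filtering is easy. Pass a string to select by year, month, or day, for example `"2020-01"` for January 2020.
2. `pd.DateOffset` represents a time offset, such as a number of hours or minutes.
3. `tz_localize` assigns a time zone to timestamps that have none. To convert timestamps that already carry a time zone into another zone, use `tz_convert`.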

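```python
msft = msft.set_index("Date")   # run only once: afterwards "Date" is the index, not a column
# Partial-string slice by month; .show() is needed because a plotly figure only renders
# automatically when it is the last expression in the cell
msft.loc["1987-01":"2000-06", "High"].plot().show()
msft_close = msft.loc[:, ["Adj Close"]].copy()
msft_close.index = msft_close.index + pd.DateOffset(hours=6)   # add 6 hours to every timestamp
msft_close = msft_close.tz_localize("America/New_York")         # attach a time zone (no conversion)
# msft_close.info()
msft_close

# Select the January 2020 prices
msft.loc["2020-01", :]
msft_202001 = msft.loc["2020-01", "Low":"Volume"]
```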
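
Here is a self-contained sketch of the three ideas in this subsection. It uses a hypothetical business-day series instead of the MSFT data. It also shows how `tz_localize` differs from `tz_convert`: the first attaches a zone and keeps the wall-clock time, and the second keeps the instant and changes the zone.

```python
import numpy as np
import pandas as pd

# Hypothetical daily values on business days (stand-in for the msft DataFrame)
idx = pd.date_range("2019-12-20", "2020-02-10", freq="B")
prices = pd.DataFrame({"Close": np.arange(len(idx), dtype=float)}, index=idx)

# Partial-string indexing on a DatetimeIndex
jan = prices.loc["2020-01"]                    # every row in January 2020
span = prices.loc["2019-12-30":"2020-01-03"]   # label slice, both ends inclusive

# Shift every timestamp by 6 hours (the index is still naive, i.e. has no time zone)
shifted = prices.index + pd.DateOffset(hours=6)

# tz_localize attaches a zone to naive timestamps; the wall-clock time stays 06:00
ny = shifted.tz_localize("America/New_York")
# tz_convert expresses the same instants in another zone
utc = ny.tz_convert("UTC")

print(len(jan), len(span))   # 23 5
print(ny[0], utc[0])         # 2019-12-20 06:00:00-05:00 2019-12-20 11:00:00+00:00
```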

## 2. Time-series operations
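1. Computing rates of change:
   1) The `shift` method moves the values down by one row. Every column except the index holds values. A positive argument shifts down and a negative one shifts up.
   2) The built-in `pct_change` method computes, by default, the percentage change relative to the previous row.
2. Computing correlations:
   1) Use `concat` to combine several DataFrames. pandas automatically aligns each time series on its dates.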
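
A minimal sketch with made-up numbers. It checks that dividing by the shifted series agrees with `pct_change`, and shows how `concat(axis=1)` aligns two series whose dates differ.

```python
import pandas as pd

# Hypothetical daily values
s = pd.Series([100.0, 110.0, 99.0],
              index=pd.date_range("2020-01-01", periods=3, freq="D"))

# shift(1) moves values down one row, so each row is divided by the previous one
manual = s / s.shift(1) - 1
builtin = s.pct_change()          # the same calculation, built in
print(manual.round(4).tolist())   # [nan, 0.1, -0.1]
print(s.shift(-1).tolist())       # [110.0, 99.0, nan]  (a negative shift moves values up)

# Two series with different dates: concat(axis=1) aligns them on the index (outer join)
a = pd.Series([1.0, 2.0, 3.0],
              index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]), name="A")
b = pd.Series([10.0, 30.0],
              index=pd.to_datetime(["2020-01-01", "2020-01-03"]), name="B")
combined = pd.concat([a, b], axis=1)
print(combined)            # column B is NaN on 2020-01-02
print(combined.dropna())   # keeps only the dates present in both series
```

Because `concat` keeps the union of dates by default, a `dropna()` afterwards is what restricts the result to dates on which every series has a value.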

```python
# The two calculations below return the same result
msft_202001_rate = msft_202001 / msft_202001.shift(1) - 1
msft_202001_pct_change = msft_202001.pct_change()

# Join several time-series DataFrames with pd.concat
df_list = []
tickers = ["AAPL", "AMZN", "GOOGL", "MSFT"]
for tik in tickers:
    df_tik = pd.read_csv(
        f"data/{tik}.csv",              # forward slash avoids the invalid "\{" escape
        parse_dates=["Date"],           # columns to convert to datetime
        usecols=["Date", "Adj Close"],
        index_col="Date",
    )
    # rename returns a new DataFrame instead of modifying df_tik, so assign the result back
    df_tik = df_tik.rename(columns={"Adj Close": tik})
    df_list.append(df_tik)
adj_close = pd.concat(df_list, axis=1)   # align all series on the date index in one call
adj_close = adj_close.dropna()           # drop dates on which any ticker has no price

# Take a one-year window and rebase it so every series starts at 100
adj_close_sample = adj_close.loc["2019-06":"2020-05", :]
rebased_price = adj_close_sample / adj_close_sample.iloc[0, :] * 100
rebased_price.head(2)
rebased_price.plot().show()

adj_close
```

| Date | AAPL | AMZN | GOOGL | MSFT |
|---|---|---|---|---|
| 2004-08-19 | 1.898969 | 38.630001 | 50.220219 | 17.505459 |
| 2004-08-20 | 1.904534 | 39.509998 | 54.209209 | 17.557100 |
| 2004-08-23 | 1.921849 | 39.450001 | 54.754753 | 17.634779 |
| 2004-08-24 | 1.975645 | 39.049999 | 52.487488 | 17.634779 |
| 2004-08-25 | 2.043664 | 40.299999 | 53.053055 | 17.835468 |
| ... | ... | ... | ... | ... |
| 2020-05-20 | 319.230011 | 2497.939941 | 1409.160034 | 185.660004 |
| 2020-05-21 | 316.850006 | 2446.739990 | 1406.750000 | 183.429993 |
| 2020-05-22 | 318.890015 | 2436.879883 | 1413.239990 | 183.509995 |
| 2020-05-26 | 316.730011 | 2421.860107 | 1421.369995 | 181.570007 |
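
Section 2 lists correlation, but the cell above only aligns the prices. A common next step is to compute the correlation of daily returns with `pct_change` and `DataFrame.corr`. With the real data this would be `adj_close.pct_change().corr()`. The sketch below runs on synthetic returns: the tickers `AAA`/`BBB` and the random data are hypothetical. It correlates returns rather than price levels because trending price series tend to look correlated even when their day-to-day moves are not.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns: both hypothetical tickers share a common component
rng = np.random.default_rng(0)
n = 250
common = rng.normal(0, 0.01, n)
r_a = common + rng.normal(0, 0.002, n)
r_b = common + rng.normal(0, 0.01, n)

idx = pd.date_range("2020-01-01", periods=n, freq="B")
prices = pd.DataFrame(
    {"AAA": 100 * (1 + r_a).cumprod(), "BBB": 50 * (1 + r_b).cumprod()},
    index=idx,
)

# Daily returns; the first row is NaN because it has no previous row
returns = prices.pct_change().dropna()
corr = returns.corr()   # Pearson correlation matrix, one row/column per ticker
print(corr.round(2))
```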


## 3. Rolling up and drilling down a time series (resampling)
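1. Upsampling raises the frequency of a time series, and downsampling lowers it. To turn a daily series into a monthly one, use the `resample` method, which takes the target frequency as a string (for example `"M"` for month end).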

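```python
# Sum the daily prices within each calendar month (labels are month ends);
# pandas >= 2.2 warns that "M" is deprecated in favour of "ME"
adj_close.resample("M").sum().head()
```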

| Date | AAPL | AMZN | GOOGL | MSFT |
|---|---|---|---|---|
| 2004-08-31 | 18.254467 | 353.479999 | 474.154156 | 159.056304 |
| 2004-09-30 | 47.638091 | 851.509998 | 1190.075067 | 371.845358 |
| 2004-10-31 | 57.60969 | 803.580005 | 1610.535538 | 382.34592 |
| 2004-11-30 | 74.620643 | 805.43 | 1865.565536 | 402.220236 |
| 2004-12-31 | 87.742759 | 894.600002 | 2001.471457 | 429.697268 |
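
The cell above sums the daily prices in each month, which mainly illustrates the API. For prices, `last()` (the month-end value) or `mean()` is usually the more meaningful aggregation. The sketch below uses a hypothetical counter series to show downsampling with both methods. It also shows upsampling with `asfreq`, which leaves the new dates as NaN, and with `ffill`, which carries the last value forward. It picks `"ME"` or `"M"` depending on the installed pandas version, since pandas 2.2 renamed the month-end alias.

```python
import pandas as pd

# pandas 2.2 renamed the month-end alias from "M" to "ME"; use whichever this version expects
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
MONTH_END = "ME" if (major, minor) >= (2, 2) else "M"

# Hypothetical daily series: 0, 1, 2, ... for Jan-Mar 2020
idx = pd.date_range("2020-01-01", "2020-03-31", freq="D")
daily = pd.Series(range(len(idx)), index=idx, dtype=float)

# Downsampling (daily -> monthly): choose how each month is summarised
monthly_last = daily.resample(MONTH_END).last()   # value on the last day of each month
monthly_mean = daily.resample(MONTH_END).mean()   # average over the month
print(monthly_last.tolist())   # [30.0, 59.0, 90.0]
print(monthly_mean.tolist())   # [15.0, 45.0, 75.0]

# Upsampling (monthly -> daily): the new dates have no value until filled
gaps = monthly_last.resample("D").asfreq()    # NaN between the month ends
filled = monthly_last.resample("D").ffill()   # carry the previous month-end value forward
print(gaps.isna().sum(), len(gaps))   # 58 61
```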
