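The previous section covered basic NumPy operations: reading data and computing the mean, variance, maximum and minimum. In this section I continue with commonly used NumPy functions to carry out a more in-depth stock analysis.
The stock analysis is only an example, so don't dwell on the financial concepts; the point is to get familiar with NumPy operations using the data at hand.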

In [10]:
import numpy as np

## 1. Stock returns

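The simple return is the rate of change between two consecutive prices, while the log return is the difference between consecutive prices after taking the logarithm of every price. Because the difference of two logarithms equals the logarithm of the ratio of the two prices, **the log return also measures the rate of change of the price.**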
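A quick numerical check of this identity, as a minimal Python 3 sketch on the first three closing prices of the `cp` array printed later in this section: the difference of logs equals the log of the price ratio, and each log return equals log(1 + simple return), which is why the two kinds of return are close for small price moves.

```python
import numpy as np

# First three closing prices from the cp array printed later in this section
prices = np.array([336.1, 339.32, 345.03])

simple = np.diff(prices) / prices[:-1]        # (p[t] - p[t-1]) / p[t-1]
log_diff = np.diff(np.log(prices))            # log p[t] - log p[t-1]
log_ratio = np.log(prices[1:] / prices[:-1])  # log(p[t] / p[t-1])

print(np.allclose(log_diff, log_ratio))         # True: the same quantity
print(np.allclose(log_diff, np.log1p(simple)))  # True: log return = log(1 + simple return)
print(simple, log_diff)
```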

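Investors are most interested in the variance or standard deviation of the returns, because it represents the size of the investment risk.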

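### 1.1 Computing simple returns
The diff function computes discrete differences (each element minus the one before it). To get the returns, divide these differences by the previous day's price.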
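A minimal Python 3 sketch of what `diff` and the slice `cp[:-1]` do, using the first four closing prices from the output below: `diff` returns an array one element shorter than its input, and dividing by every price except the last lines each change up with the previous day's price. `prices[1:] / prices[:-1] - 1` is an equivalent way to write the same return.

```python
import numpy as np

prices = np.array([336.1, 339.32, 345.03, 344.32])

diffs = np.diff(prices)               # prices[1:] - prices[:-1], one element shorter
returns = diffs / prices[:-1]         # divide each change by the previous day's price
same = prices[1:] / prices[:-1] - 1   # equivalent formulation

print(diffs)
print(returns)
```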

In [11]:
#cp means closing price
cp = np.loadtxt('data.csv', delimiter=',', usecols=(6,), unpack=True)
cp_diff = np.diff(cp)
rate_of_returns = cp_diff / cp[:-1]

In [12]:
cp

array([ 336.1 ,  339.32,  345.03,  344.32,  343.44,  346.5 ,  351.88,
        355.2 ,  358.16,  354.54,  356.85,  359.18,  359.9 ,  363.13,
        358.3 ,  350.56,  338.61,  342.62,  342.88,  348.16,  353.21,
        349.31,  352.12,  359.56,  360.  ,  355.36,  355.76,  352.47,
        346.67,  351.99])

In [13]:
#drop the last element
#i.e. from the first element up to the second-to-last
cp[:-1]

array([ 336.1 ,  339.32,  345.03,  344.32,  343.44,  346.5 ,  351.88,
        355.2 ,  358.16,  354.54,  356.85,  359.18,  359.9 ,  363.13,
        358.3 ,  350.56,  338.61,  342.62,  342.88,  348.16,  353.21,
        349.31,  352.12,  359.56,  360.  ,  355.36,  355.76,  352.47,
        346.67])

In [14]:
rate_of_returns

array([ 0.00958048,  0.01682777, -0.00205779, -0.00255576,  0.00890985,
        0.0155267 ,  0.00943503,  0.00833333, -0.01010721,  0.00651548,
        0.00652935,  0.00200457,  0.00897472, -0.01330102, -0.02160201,
       -0.03408832,  0.01184253,  0.00075886,  0.01539897,  0.01450483,
       -0.01104159,  0.00804443,  0.02112916,  0.00122372, -0.01288889,
        0.00112562, -0.00924781, -0.0164553 ,  0.01534601])

In [15]:
print "standard deviation = ", np.std(rate_of_returns)

standard deviation =  0.0129221344368


### 1.2 Log returns

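Before taking logarithms, check that the data contain no zeros or negative values, so that the input lies within the domain of the logarithm.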
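`np.log` does not raise on invalid input: it returns `-inf` for 0 and `nan` for negative values, emitting only a `RuntimeWarning`, so bad prices can slip through silently. A minimal Python 3 sketch of an explicit check, wrapped in a hypothetical helper named `log_returns` (not a NumPy function):

```python
import numpy as np

def log_returns(prices):
    """Hypothetical helper: log returns, with a domain check before np.log."""
    prices = np.asarray(prices, dtype=float)
    # prices > 0 is False for zeros, negatives and NaN, so all three are rejected
    if not np.all(prices > 0):
        raise ValueError("log returns require strictly positive prices")
    return np.diff(np.log(prices))

print(log_returns([336.1, 339.32, 345.03]))
```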

In [16]:
log_rate_of_returns = np.diff(np.log(cp))

In [17]:
log_rate_of_returns

array([ 0.00953488,  0.01668775, -0.00205991, -0.00255903,  0.00887039,
        0.01540739,  0.0093908 ,  0.0082988 , -0.01015864,  0.00649435,
        0.00650813,  0.00200256,  0.00893468, -0.01339027, -0.02183875,
       -0.03468287,  0.01177296,  0.00075857,  0.01528161,  0.01440064,
       -0.011103  ,  0.00801225,  0.02090904,  0.00122297, -0.01297267,
        0.00112499, -0.00929083, -0.01659219,  0.01522945])

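To find the days with positive returns, use the where function, which returns the indices of the array elements that satisfy a given condition.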
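With a single condition, `np.where` returns a tuple holding one index array per dimension, which is why the printed result is wrapped in `(array([...]),)`. A small Python 3 sketch on the first five log returns shown above; `np.flatnonzero` and boolean-mask indexing are two common alternatives, included for comparison.

```python
import numpy as np

# First five log returns from the output above
r = np.array([0.00953488, 0.01668775, -0.00205991, -0.00255903, 0.00887039])

result = np.where(r > 0)        # tuple with one index array per dimension
indices = result[0]             # the index array for this 1-D input
also = np.flatnonzero(r > 0)    # same indices, already 1-D
positive = r[r > 0]             # boolean mask selects the values themselves

print(result)
print(indices, also, positive)
```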

In [18]:
pos_return_indices = np.where(log_rate_of_returns > 0)
print 'Indices with positive returns', pos_return_indices

Indices with positive returns (array([ 0,  1,  4,  5,  6,  7,  9, 10, 11, 12, 16, 17, 18, 19, 21, 22, 23,
       25, 28]),)


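### 1.3 Computing volatility
Volatility is a measure of price variation. **Here, the annual volatility is the standard deviation of the log returns divided by their mean, then divided by the square root of the reciprocal of the number of trading days in a year, which is usually taken to be 252.**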
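Written as a formula, the definition above is (std / mean) / sqrt(1/252), which equals (std / mean) * sqrt(252). A minimal Python 3 sketch that recomputes the annual figure from the 30 closing prices printed earlier in this section. For context it also computes std * sqrt(252), the more common annualized-volatility convention in finance; that second line is an addition for comparison, not part of the procedure in the text.

```python
import numpy as np

# The 30 closing prices printed earlier in this section
cp = np.array([336.1, 339.32, 345.03, 344.32, 343.44, 346.5, 351.88,
               355.2, 358.16, 354.54, 356.85, 359.18, 359.9, 363.13,
               358.3, 350.56, 338.61, 342.62, 342.88, 348.16, 353.21,
               349.31, 352.12, 359.56, 360.0, 355.36, 355.76, 352.47,
               346.67, 351.99])
log_r = np.diff(np.log(cp))

# Formula as defined in this section: (std / mean) / sqrt(1 / 252)
section_style = np.std(log_r) / np.mean(log_r) / np.sqrt(1.0 / 252.0)

# Common finance convention, added here for comparison only: std * sqrt(252)
conventional = np.std(log_r) * np.sqrt(252.0)

print(section_style, conventional)
```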

In [19]:
annual_volatility = np.std(log_rate_of_returns) / np.mean(log_rate_of_returns)
annual_volatility = annual_volatility / np.sqrt(1. / 252.)
print annual_volatility

129.274789911


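The division inside sqrt must use floating-point numbers to get the correct result: in Python 2, 1/252 is integer division and evaluates to 0.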
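For readers on Python 3: there `/` always performs true division and `//` performs floor division, and Python 2's `int / int` behaved like `//`. A minimal sketch of the pitfall:

```python
import numpy as np

true_div = 1 / 252    # Python 3: a float, about 0.00397
floor_div = 1 // 252  # floor division: 0, which is what Python 2 gave for 1 / 252

print(np.sqrt(true_div))   # about 0.063
print(np.sqrt(floor_div))  # 0.0, so the volatility formula would divide by zero
```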

In [20]:
print "Monthly volatility", annual_volatility * np.sqrt(1. / 12.)

Monthly volatility 37.3184173773


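## 2. Date analysis
We read in the closing prices, split the data by day of the week, compute the average price for each day, and finally find which day of the week has the highest average closing price.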

**NumPy is oriented toward floating-point computation, so dates need special handling.**

The loadtxt function has a dedicated parameter, converters: a dictionary that maps column indices to conversion functions.
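A minimal, self-contained Python 3 sketch of the `converters` mechanism, reading three rows from an in-memory CSV (`io.StringIO`) instead of `data.csv`. The two-column date,close layout is an assumption made to keep the example small; the dates and prices are the first three rows of the data in this section. Depending on the NumPy version and the `encoding` argument, a converter may receive `bytes` rather than `str`, so this hypothetical `weekday_from_str` handles both.

```python
import datetime
import io

import numpy as np

def weekday_from_str(s):
    # Converters may receive bytes or str depending on NumPy version and encoding
    if isinstance(s, bytes):
        s = s.decode("ascii")
    return datetime.datetime.strptime(s, "%d-%m-%Y").date().weekday()

# Assumed two-column layout: date,close
csv_text = "28-01-2011,336.1\n31-01-2011,339.32\n01-02-2011,345.03\n"

days, close = np.loadtxt(io.StringIO(csv_text), delimiter=",",
                         converters={0: weekday_from_str}, unpack=True)
print(days)   # weekday codes as floats: Friday, Monday, Tuesday
print(close)
```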

In [21]:
#converter function
import datetime
def datestr2num(s):
    return datetime.datetime.strptime(s, "%d-%m-%Y").date().weekday()

#The string is first parsed into a datetime object using the format "%d-%m-%Y", then converted to a date object, and finally the weekday method returns a number
#0 means Monday, 6 means Sunday

In [22]:
#cp means closing price
dates, cp = np.loadtxt('data.csv', delimiter=',', usecols=(1,6), converters={1:datestr2num}, unpack=True)
print "Dates = ", dates

Dates =  [ 4.  0.  1.  2.  3.  4.  0.  1.  2.  3.  4.  0.  1.  2.  3.  4.  1.  2.
  3.  4.  0.  1.  2.  3.  4.  0.  1.  2.  3.  4.]


In [23]:
#holds the average closing price for each weekday (Monday to Friday)
averages = np.zeros(5)

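**The where function returns the indices of all array elements that satisfy the given condition; the take function uses those indices to fetch the corresponding elements from an array.**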
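Passing the tuple returned by `np.where` straight to `np.take` is why each `prices` array in the loop output is printed with double brackets: the tuple is converted to a 2-D index array of shape (1, n). A small Python 3 sketch on the first seven weekday codes and closing prices from the data above, comparing that with a 1-D index array and with boolean-mask indexing:

```python
import numpy as np

# First seven weekday codes and closing prices from the data above
dates = np.array([4., 0., 1., 2., 3., 4., 0.])
cp = np.array([336.1, 339.32, 345.03, 344.32, 343.44, 346.5, 351.88])

idx = np.where(dates == 0)        # tuple: (array([1, 6]),)
prices_2d = np.take(cp, idx)      # tuple becomes a (1, 2) index array, result shape (1, 2)
prices_1d = np.take(cp, idx[0])   # plain 1-D index array, result shape (2,)
prices_mask = cp[dates == 0]      # boolean indexing gives the same 1-D values

print(prices_2d.shape, prices_1d.shape, prices_mask)
```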

In [24]:
for i in xrange(5):
    indices = np.where(dates == i)
    prices = np.take(cp, indices)
    avg = prices.mean()
    print "Day", i, "prices", prices, "Average", avg
    averages[i] = avg

Day 0 prices [[ 339.32  351.88  359.18  353.21  355.36]] Average 351.79
Day 1 prices [[ 345.03  355.2   359.9   338.61  349.31  355.76]] Average 350.635
Day 2 prices [[ 344.32  358.16  363.13  342.62  352.12  352.47]] Average 352.136666667
Day 3 prices [[ 343.44  354.54  358.3   342.88  359.56  346.67]] Average 350.898333333
Day 4 prices [[ 336.1   346.5   356.85  350.56  348.16  360.    351.99]] Average 350.022857143
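The loop scans the whole array once per weekday. A vectorized alternative (an option, not the method used in the text) is `np.bincount` with `weights`, which sums the prices for every weekday code in one call. This Python 3 sketch hard-codes the 30 weekday codes and closing prices printed above; `bincount` needs non-negative integers, so the float codes produced by `loadtxt` are cast with `astype(int)`.

```python
import numpy as np

# Weekday codes and closing prices printed above (codes as floats, like loadtxt returns)
dates = np.array([4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 1, 2,
                  3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4], dtype=float)
cp = np.array([336.1, 339.32, 345.03, 344.32, 343.44, 346.5, 351.88,
               355.2, 358.16, 354.54, 356.85, 359.18, 359.9, 363.13,
               358.3, 350.56, 338.61, 342.62, 342.88, 348.16, 353.21,
               349.31, 352.12, 359.56, 360.0, 355.36, 355.76, 352.47,
               346.67, 351.99])

codes = dates.astype(int)
sums = np.bincount(codes, weights=cp, minlength=5)  # total closing price per weekday
counts = np.bincount(codes, minlength=5)            # trading days per weekday
averages = sums / counts

print(averages)
```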


In [25]:
#find which weekday has the highest average closing price and which has the lowest
top = averages.max()
bottom = averages.min()
print "Highest average", top
print "Top day of the week", np.argmax(averages)
print "Lowest average", bottom
print "Bottom day of the week", np.argmin(averages)

Highest average 352.136666667
Top day of the week 2
Lowest average 350.022857143
Bottom day of the week 4


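**argmin returns the index of the smallest element in the averages array; argmax likewise returns the index of the largest.** Since 0 means Monday, Wednesday (2) has the highest average closing price and Friday (4) the lowest.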
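A small Python 3 sketch that turns the indices from `argmax`/`argmin` into weekday names, using the averages printed above; the `names` list is an illustrative addition that follows the `weekday()` convention (0 is Monday). If several elements tie for the extreme value, both functions return the index of the first occurrence.

```python
import numpy as np

# Per-weekday averages printed above
averages = np.array([351.79, 350.635, 352.136666667, 350.898333333, 350.022857143])

# Illustrative labels, indexed like datetime.date.weekday() (0 = Monday)
names = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

top_day = names[np.argmax(averages)]
bottom_day = names[np.argmin(averages)]
print(top_day, bottom_day)
```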

## 3. Weekly summary

In [35]:
dates, op, hp, lp, cp = np.loadtxt('data.csv', delimiter=',', usecols=(1,3,4,5,6), converters={1: datestr2num}, unpack=True)
cp = cp[:16]
dates = dates[:16]
#find the first Monday
first_monday = np.ravel(np.where(dates == 0))[0]
print "The first Monday index is ",first_monday

The first Monday index is  1


In [36]:
#find the last Friday
last_friday = np.ravel(np.where(dates == 4))[-1]
print "The last Friday index is ", last_friday

The last Friday index is  15
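`np.ravel` is used above because `np.where` returns a tuple; ravel flattens it into a plain 1-D index array. `np.flatnonzero` does the same in one call. A minimal Python 3 sketch on the first 16 weekday codes from the data:

```python
import numpy as np

# First 16 weekday codes from the data (0 = Monday, 4 = Friday)
dates = np.array([4., 0., 1., 2., 3., 4., 0., 1., 2., 3., 4., 0., 1., 2., 3., 4.])

mondays = np.ravel(np.where(dates == 0))  # tuple -> flat 1-D index array
fridays = np.flatnonzero(dates == 4)      # same idea in a single call

first_monday = mondays[0]
last_friday = fridays[-1]
print(first_monday, last_friday)
```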


In [37]:
#indices of every trading day in the three weeks
weeks_indices = np.arange(first_monday, last_friday+1)
print "Weeks indices initial ",weeks_indices

Weeks indices initial  [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]


In [38]:
#split the array into 3 sub-arrays of 5 elements each (one per week)
weeks_indices = np.split(weeks_indices, 3)
print "Weeks indices after split ", weeks_indices

Weeks indices after split  [array([1, 2, 3, 4, 5]), array([ 6,  7,  8,  9, 10]), array([11, 12, 13, 14, 15])]
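`np.split` with an integer requires the array to divide into equal parts; otherwise it raises `ValueError`. That works here because the range from the first Monday to the last Friday holds exactly 15 days. A minimal Python 3 sketch contrasting it with `np.array_split`, which accepts uneven sizes:

```python
import numpy as np

idx = np.arange(1, 16)          # 15 day indices, as above
weeks = np.split(idx, 3)        # three equal parts of 5

try:
    np.split(np.arange(1, 15), 3)   # 14 elements cannot form 3 equal parts
except ValueError as err:
    print("np.split failed:", err)

uneven = np.array_split(np.arange(1, 15), 3)  # parts of size 5, 5, 4
print([len(part) for part in uneven])
```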


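**apply_along_axis calls a user-defined function on each 1-D slice of an array along a given axis.** When calling apply_along_axis, we pass our own function summarize, the number of the axis (dimension) to work along, and the extra arguments to forward to the function.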
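A minimal Python 3 sketch of that calling convention, with toy values (0.0, 10.0, ..., 150.0) standing in for prices: with axis 1, the function receives one row at a time, and any extra positional arguments after the array are passed through to it. The function name `first_and_last` is hypothetical.

```python
import numpy as np

# Three rows of day indices, shaped like weeks_indices after np.split
weeks = np.array([[1, 2, 3, 4, 5],
                  [6, 7, 8, 9, 10],
                  [11, 12, 13, 14, 15]])
values = np.arange(16) * 10.0   # toy values, not the stock data

calls = []  # records each 1-D slice the function receives

def first_and_last(row, data):
    calls.append(row.copy())
    return data[row[0]], data[row[-1]]

out = np.apply_along_axis(first_and_last, 1, weeks, values)
print(len(calls))  # one call per row
print(out)
```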

In [68]:
#indices is one row of weeks_indices: the 5 day indices of a single week
def summarize(indices, open, high, low, close):
    monday_open = open[indices[0]]
    week_high = np.max(np.take(high, indices))
    week_low = np.min(np.take(low, indices))
    friday_close = close[indices[-1]]
    return ("AAPL", monday_open, week_high, week_low, friday_close)

In [69]:
#axis=1: summarize is applied to each row of weeks_indices, i.e. to each week
#three rows in, three rows of results out
week_summary = np.apply_along_axis(summarize, 1, weeks_indices, op, hp, lp, cp)
print "Week summary", week_summary

Week summary [['AAPL' '335.8' '346.7' '334.3' '346.5']
 ['AAPL' '347.8' '360.0' '347.6' '356.8']
 ['AAPL' '356.7' '364.9' '349.5' '350.5']]
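In the printed week summary every field is a string: the tuple returned by `summarize` mixes the ticker with numbers, so NumPy stores the whole result as a string array, and some values lost digits (week 2's Friday close is `cp[10]` = 356.85 but appears as '356.8'; the exact string width depends on the NumPy version). A minimal Python 3 sketch of a numeric-only variant keeps a float result. It uses a hypothetical `close_summary` over closing prices only, since the open/high/low columns are not printed in this section.

```python
import numpy as np

# First 16 closing prices from the data in this section
cp = np.array([336.1, 339.32, 345.03, 344.32, 343.44, 346.5, 351.88,
               355.2, 358.16, 354.54, 356.85, 359.18, 359.9, 363.13,
               358.3, 350.56])
weeks_indices = np.split(np.arange(1, 16), 3)

def close_summary(indices, close):
    # Hypothetical numeric-only variant: no ticker string, so the result stays float
    week = np.take(close, indices)
    return close[indices[0]], week.max(), week.min(), close[indices[-1]]

summary = np.apply_along_axis(close_summary, 1, weeks_indices, cp)
mixed = np.asarray(("AAPL", 356.85))  # mixing a str with a float yields a string array

print(summary.dtype, mixed.dtype)
print(summary)
```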
