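NumPy is an open-source Python extension library for scientific computing that gives Python high-performance array and matrix operations. It brings true multidimensional arrays to Python, along with a rich library of functions for working with them. Its common mathematical functions are vectorized, so they operate directly on whole arrays: loops that would otherwise run at the Python level are executed in C instead, which speeds programs up considerably.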

NumPy http://www.numpy.org/  

Pandas http://pandas.pydata.org/  

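pandas relies heavily on NumPy arrays to implement its Series and DataFrame objects, and NumPy also supports slicing and vectorized operations, so it is worth getting to know NumPy before learning pandas.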

+ An array object of arbitrary dimension (ndarray, n-dimensional array object)
+ A universal function object (ufunc, universal function object)
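
As a minimal sketch of how these two objects fit together, the snippet below applies the ufunc `np.sqrt` to an ndarray. The array values are arbitrary and chosen only for illustration.

```python
import numpy as np

# An ndarray holding four values (arbitrary, for illustration only)
values = np.array([1.0, 4.0, 9.0, 16.0])

# np.sqrt is a ufunc: it is applied to every element without a Python loop
roots = np.sqrt(values)

print(type(values).__name__)   # ndarray
print(type(np.sqrt).__name__)  # ufunc
print(roots)
```

The same pattern holds for other ufuncs such as `np.add` or `np.exp`: they take arrays in and return arrays out.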


In [1]:
import numpy as np

In [4]:
def squares(values):
    result = []
    for v in values:
        result.append(v * v)
    return result

to_square = range(10000)
squares(to_square)[0:10]
#%timeit squares(to_square)


[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [3]:
array_to_square = np.arange(0, 10000)
# vectorized operation
%timeit array_to_square ** 2

The slowest run took 9.10 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.67 µs per loop
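
Outside IPython, where `%timeit` is not available, a similar comparison can be sketched with the standard `timeit` module. The repetition count of 20 is an arbitrary choice, and the measured times depend entirely on the machine, so no figures are quoted here.

```python
import timeit
import numpy as np

def squares(values):
    # Pure-Python loop, equivalent to the squares() function above
    return [v * v for v in values]

to_square = range(10000)
array_to_square = np.arange(0, 10000)

# Time 20 runs of each approach; absolute numbers vary by machine
loop_time = timeit.timeit(lambda: squares(to_square), number=20)
vector_time = timeit.timeit(lambda: array_to_square ** 2, number=20)
print("loop:", loop_time, "vectorized:", vector_time)

# Both approaches produce the same values
same = squares(to_square) == (array_to_square ** 2).tolist()
print(same)
```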


# ndarray

The core of NumPy is the "ndarray" (n-dimensional array) data structure. Its key features:

+ Contiguous memory allocation
+ Vectorized operations
+ Boolean selection
+ Slicing (sliceability)

#### ndarray basics

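+ ndarray.ndim: the number of axes (dimensions) of the array. Older NumPy documentation calls this number the rank.
+ ndarray.shape: the dimensions of the array, given as a tuple of integers with the size along each axis. For a matrix with n rows and m columns, shape is (n, m). The length of this tuple is the rank, that is, the ndim attribute.
+ ndarray.size: the total number of elements, equal to the product of the entries of shape.
+ ndarray.dtype: an object describing the type of the elements in the array. A dtype can be created or specified using standard Python types; NumPy also provides its own data types.
+ ndarray.itemsize: the size in bytes of each element. For example, an array of float64 elements has itemsize 8 (=64/8), and an array of int32 elements has itemsize 4 (=32/8).
+ ndarray.data: the buffer containing the actual array elements. This attribute is rarely needed, because elements are normally accessed by indexing.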
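
The attributes above can be checked on a small array. This sketch fixes the dtype explicitly so that the results do not depend on the platform's default integer size; the array contents are arbitrary.

```python
import numpy as np

# A 2x3 array with an explicit dtype (values are arbitrary)
m = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)

print(m.ndim)      # 2 axes
print(m.shape)     # (2, 3): 2 rows, 3 columns
print(m.size)      # 6 = 2 * 3
print(m.dtype)     # float64
print(m.itemsize)  # 8 bytes = 64 bits / 8
print(m.nbytes)    # 48 = size * itemsize
```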


In [4]:
x=np.array([1,2,3,4,5])
x

array([1, 2, 3, 4, 5])

In [5]:
type(x)

numpy.ndarray

In [6]:
x.ndim  # one dimension

1

In [7]:
x.shape  # 5 elements along the single axis

(5,)

In [8]:
x.size  # total number of elements

5

In [9]:
x.itemsize

4

In [10]:
x.data

<memory at 0x0000000005573888>

In [11]:
y=np.array([[1,2,3,4,5],[6,7,8,9,10]])
y

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [12]:
y.ndim  # 2 dimensions (rows and columns)

2

In [13]:
y.size

10

In [14]:
y.shape

(2, 5)

In [15]:
x1 = np.array([1, 2, 3, 4.0, 5.0])
x1.dtype,x.dtype

(dtype('float64'), dtype('int32'))

#### create ndarray

In [16]:
x2=np.array([1]*5)
x2

array([1, 1, 1, 1, 1])

In [17]:
np.zeros(10)

array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

In [18]:
np.array(range(10))

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [19]:
range?

In [20]:
np.array(range(0,10,2))

array([0, 2, 4, 6, 8])

In [21]:
np.array(range(10,0,-1))

array([10,  9,  8,  7,  6,  5,  4,  3,  2,  1])

In [22]:
np.linspace(0,10,5)

array([  0. ,   2.5,   5. ,   7.5,  10. ])

In [5]:
np.linspace?

In [24]:
# vectorized operations
print(x)
print(x1)
x+x1

[1 2 3 4 5]
[ 1.  2.  3.  4.  5.]


array([  2.,   4.,   6.,   8.,  10.])

In [25]:
x2 = np.arange(0, 12).reshape(4, 3)  # change the shape to 4 rows and 3 columns
x2

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [26]:
np.size(x2)

12

In [27]:
np.size(x2,0)  # the second argument is the axis; axis 0 counts rows

4

In [28]:
np.size(x2,1)  # axis 1 counts columns, so the result is 3

3

In [29]:
np.zeros((2,3))

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [30]:
np.ones((2,3,4),dtype=int)

array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]])

In [31]:
np.empty((5,6))  # uninitialized memory, so the contents are arbitrary

array([[  2.39607303e-316,   9.53546696e-322,   0.00000000e+000,
          0.00000000e+000,   1.90979621e-313,   1.16097020e-028],
       [  9.72161570e-072,   6.37360804e-062,   4.67009157e-062,
          4.96216861e+180,   8.37174974e-144,   1.08977929e-071],
       [  8.67584057e+010,   1.48475428e-076,   9.98288485e-043,
          3.59751658e+252,   8.93185432e+271,   4.76484771e+180],
       [  4.50622287e-144,   1.16071308e-028,   3.54541035e-057,
          1.35043750e-066,   1.52643834e+030,   1.44628172e+030],
       [  1.14428494e+243,   1.71050206e+256,   5.49109388e-143,
          4.82337433e+228,   5.18315232e-144,   5.80812045e+294]])

In [32]:
np.random.randint(0,10,3)  # 3 random integers in [0, 10)

array([5, 1, 2])

In [33]:
np.random.random(10)  # 10 random floats in [0, 1)

array([ 0.12989386,  0.59428616,  0.64542946,  0.61067592,  0.21600184,
        0.09520857,  0.3065091 ,  0.14342125,  0.38271001,  0.28484175])

In [6]:
np.random?

# Selecting array elements

In [35]:
x

array([1, 2, 3, 4, 5])

In [36]:
x[0],x[3]

(1, 4)

In [37]:
y

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [38]:
y[1,3]  # second row, fourth column

9

In [39]:
y[0,]

array([1, 2, 3, 4, 5])

In [40]:
y[:,1]

array([2, 7])

In [41]:
y[:,1:4]  # slice: columns 1 through 3 of every row

array([[2, 3, 4],
       [7, 8, 9]])

# boolean selection

In [42]:
x

array([1, 2, 3, 4, 5])

In [43]:
x<2

array([ True, False, False, False, False], dtype=bool)

In [44]:
x<2 or x>4

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [45]:
(x<2) | (x>4)

array([ True, False, False, False,  True], dtype=bool)

In [46]:
mask=x<3
mask

array([ True,  True, False, False, False], dtype=bool)

In [47]:
x[mask]   # select all elements less than 3 using a Boolean mask

array([1, 2])

In [48]:
np.sum(x<3)

2

In [49]:
y

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [50]:
x==y[0,]

array([ True,  True,  True,  True,  True], dtype=bool)

In [51]:
a1 = np.arange(9).reshape(3, 3)
a2 = np.arange(9, 0 , -1).reshape(3, 3)
a1 < a2

array([[ True,  True,  True],
       [ True,  True, False],
       [False, False, False]], dtype=bool)
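
Python's `or` and `and` need a single truth value, which is why `x<2 or x>4` raises a `ValueError` above. The sketch below shows the elementwise operators `|`, `&` and `~` on the same kind of mask; the variable names `outside` and `inside` are only illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Parentheses are required: | and & bind more tightly than comparisons
outside = (x < 2) | (x > 4)
inside = (x >= 2) & (x <= 4)

print(x[outside])            # elements below 2 or above 4
print(x[inside])             # elements from 2 to 4
print(x[~outside])           # ~ negates a mask, so this equals x[inside]
print(np.where(outside)[0])  # positions where the mask is True
```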

# slice

Slice syntax is start:end:step; the element at end is not included.

In [8]:
x=np.arange(0,10)
x[3:9]    # [a:b] is a slice; [row, col] indexes rows and columns

array([3, 4, 5, 6, 7, 8])

In [53]:
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [54]:
x[::2]  # the whole range with a step of 2

array([0, 2, 4, 6, 8])

In [55]:
x[[0,2]]

array([0, 2])

In [56]:
x[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [57]:
x[:6]

array([0, 1, 2, 3, 4, 5])

In [58]:
x[2:]

array([2, 3, 4, 5, 6, 7, 8, 9])

In [59]:
y=np.arange(0,16).reshape(4,4)
y

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [60]:
y[:,1:3]  # rows go before the comma, columns after it

array([[ 1,  2],
       [ 5,  6],
       [ 9, 10],
       [13, 14]])

In [61]:
y[1:3,2:4]

array([[ 6,  7],
       [10, 11]])

In [62]:
y[[1,3],:]  # rows 1 and 3 (the second and fourth rows), all columns; a list picks exactly the listed indices

array([[ 4,  5,  6,  7],
       [12, 13, 14, 15]])
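
One difference between the two indexing styles above is worth a quick check: a basic slice returns a view of the original data, while indexing with a list returns a copy. The following is a small sketch using `np.shares_memory`; the values written are arbitrary.

```python
import numpy as np

x = np.arange(10)

sliced = x[2:5]        # basic slice: a view onto x's memory
picked = x[[2, 3, 4]]  # list index (fancy indexing): an independent copy

sliced[0] = 100        # writes through to x
picked[1] = -1         # does not affect x

print(x)
print(np.shares_memory(x, sliced))
print(np.shares_memory(x, picked))
```

Calling `.copy()` on a slice is the usual way to get an independent array when the write-through behavior is not wanted.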

# reshape: changing shape

In [9]:
x=np.arange(0,9)
y=x.reshape(3,3)
y

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [85]:
y.reshape(9)  # a flat shape holding 9 values

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [66]:
y  # note that y itself is unchanged

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [64]:
y.ravel()

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [67]:
reshaped = y.reshape(np.size(y))
raveled = y.ravel()

reshaped[2] = 1000
raveled[5] = 2000
y

array([[   0,    1, 1000],
       [   3,    4, 2000],
       [   6,    7,    8]])

In [68]:
reshaped

array([   0,    1, 1000,    3,    4, 2000,    6,    7,    8])

In [111]:
y = np.arange(0, 9).reshape(3,3)
flattened = y.flatten()

flattened[0] = 1000
flattened

array([1000,    1,    2,    3,    4,    5,    6,    7,    8])

In [112]:
y

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [113]:
flattened.shape = (3, 3)
flattened

array([[1000,    1,    2],
       [   3,    4,    5],
       [   6,    7,    8]])

In [114]:
flattened.T

array([[1000,    3,    6],
       [   1,    4,    7],
       [   2,    5,    8]])

http://stackoverflow.com/questions/33116936/differences-between-x-ravel-and-x-reshapes0s1s2-when-number-of-axes-known
http://www.python-course.eu/matrix_arithmetic.php
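
The cells above suggest that `reshape` and `ravel` can return views that write through to the original, while `flatten` returns a copy. A rough way to confirm this is `np.shares_memory`, sketched here on the same 3x3 array:

```python
import numpy as np

y = np.arange(9).reshape(3, 3)

# For a C-contiguous array, reshape and ravel can reuse the same memory
print(np.shares_memory(y, y.reshape(9)))
print(np.shares_memory(y, y.ravel()))

# flatten always returns a copy
print(np.shares_memory(y, y.flatten()))

# The transpose is not C-contiguous, so ravel() has to copy here
print(np.shares_memory(y, y.T.ravel()))
```

Because `ravel` only returns a view when the memory layout allows it, code that must never modify the original is safer with `flatten` or an explicit `.copy()`.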

# Combining arrays

In [70]:
a = np.arange(9).reshape(3, 3)
b = (a + 1) * 10  # vectorized: add 1 to every element, then multiply by 10
a

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [116]:
b

array([[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]])

In [117]:
np.hstack((a, b))

array([[ 0,  1,  2, 10, 20, 30],
       [ 3,  4,  5, 40, 50, 60],
       [ 6,  7,  8, 70, 80, 90]])

In [118]:
np.vstack((a, b))

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]])

In [119]:
np.concatenate((a, b), axis = 0)  # concatenate; axis 0 adds rows

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]])

In [120]:
np.concatenate((a, b), axis = 1)

array([[ 0,  1,  2, 10, 20, 30],
       [ 3,  4,  5, 40, 50, 60],
       [ 6,  7,  8, 70, 80, 90]])

In [71]:
np.dstack((a, b))

array([[[ 0, 10],
        [ 1, 20],
        [ 2, 30]],

       [[ 3, 40],
        [ 4, 50],
        [ 5, 60]],

       [[ 6, 70],
        [ 7, 80],
        [ 8, 90]]])

In [122]:
one_d_a = np.arange(5)
one_d_b = (one_d_a + 1) * 10
np.column_stack((one_d_a, one_d_b))

array([[ 0, 10],
       [ 1, 20],
       [ 2, 30],
       [ 3, 40],
       [ 4, 50]])

In [123]:
np.row_stack((one_d_a, one_d_b))  # row_stack is an alias of vstack

array([[ 0,  1,  2,  3,  4],
       [10, 20, 30, 40, 50]])
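
The stacking helpers above are closely related to `concatenate`. As a sketch using the same `a` and `b`, the checks below compare each helper with an explicit-axis call; `np.stack` is brought in only for the comparison with `dstack` and does not appear in the cells above.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
b = (a + 1) * 10

# vstack and hstack match concatenate along axis 0 and axis 1 for 2-D inputs
same_v = np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))
same_h = np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))

# dstack joins 2-D inputs along a new third axis, like np.stack with axis=2
same_d = np.array_equal(np.dstack((a, b)), np.stack((a, b), axis=2))

print(same_v, same_h, same_d)
print(np.dstack((a, b)).shape)
```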

# matrix

+ Addition
+ Subtraction
+ Multiplication
+ Inner product
+ Outer product

## Arithmetic operations

+ +
+ -
+ *
+ /
+ **
+ %

In [91]:
x = np.array([1,5,2])
y = np.array([7,4,1])
x + y

array([8, 9, 3])

![](./img/vector_addition.png)

In [92]:
x * y

array([ 7, 20,  2])

In [93]:
x - y

array([-6,  1,  1])

![](./img/vector_subtraction.png)

In [94]:
x / y

array([ 0.14285714,  1.25      ,  2.        ])

In [95]:
x % y

array([1, 1, 0], dtype=int32)

# matrix

http://baike.baidu.com/item/%E5%90%91%E9%87%8F%E7%A7%AF?fromtitle=cross+product&type=syn

## scalar product/dot product

$$\vec{a}\cdot\vec{b}=|\vec{a}||\vec{b}|\cos\angle(\vec{a},\vec{b})$$

$$\vec{a}\cdot\vec{b}=a_{1}b_{1}+a_{2}b_{2}+a_{3}b_{3}$$

A reference for some commonly used [LaTeX](http://www.mohu.org/info/symbols/symbols.htm) symbols

In [96]:
x = np.array([1,2,3])
y = np.array([-7,8,9])
np.dot(x,y)

36

In [98]:
dot = np.dot(x,y)
x_modulus = np.sqrt((x*x).sum())
y_modulus = np.sqrt((y*y).sum())
cos_angle = dot / x_modulus / y_modulus # cosine of angle between x and y
angle = np.arccos(cos_angle)
print("angle=",angle)
print(angle * 360 / 2 / np.pi) # angle in degrees
x_modulus*y_modulus*cos_angle

angle= 0.808233789011
46.3083849702


36.0

## matrix

In [99]:
x = np.array( ((2,3), (3, 5)) )
y = np.array( ((1,2), (5, -1)) )
print( x * y)
x = np.matrix( ((2,3), (3, 5)) )
y = np.matrix( ((1,2), (5, -1)) )
print(x * y)

[[ 2  6]
 [15 -5]]
[[17  1]
 [28  1]]


![](./img/matrix_product2.jpeg)

In [100]:
x = np.array( ((2,3), (3, 5)) )
y = np.matrix( ((1,2), (5, -1)) )
np.dot(x,y)

matrix([[17,  1],
        [28,  1]])

In [101]:
np.mat(x) * np.mat(y)

matrix([[17,  1],
        [28,  1]])
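
The NumPy documentation no longer recommends `np.matrix` for new code. A possible alternative, sketched here with the same numbers, is to keep plain arrays and use the `@` operator (available since Python 3.5) for the matrix product:

```python
import numpy as np

x = np.array([[2, 3], [3, 5]])
y = np.array([[1, 2], [5, -1]])

print(x * y)         # elementwise product
print(x @ y)         # matrix product
print(np.dot(x, y))  # same matrix product for 2-D arrays
```

With plain arrays, `*` always means elementwise multiplication, which avoids the ambiguity between the two `print` results shown earlier.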

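Suppose four people, Tom, Mike, Jason and Jack, bought three kinds of food:  
Tom: 100 g of A, 175 g of B, 210 g of C  
Mike: 90 g of A, 160 g of B, 150 g of C  
Jason: 200 g of A, 50 g of B, 100 g of C  
Jack: 120 g of A, 310 g of C (no B)  

A: 2.98 per 100 g  
B: 3.90 per 100 g  
C: 1.99 per 100 g  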

In [9]:
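# grams of A, B and C bought by each person (one row per person)
NumPersons = np.array([[100,175,210],[90,160,150],[200,50,100],[120,0,310]])
Price_per_100_g = np.array([2.98,3.90,1.99])
# grams times price per 100 g, so the result is 100 times the amount owed
Price_in_Cent = np.dot(NumPersons,Price_per_100_g)
Price_in_Euro = Price_in_Cent / 100  # a scalar divisor applies to every element
Price_in_Euro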

array([ 13.984,  11.907,   9.9  ,   9.745])

## cross product

$$\vec{a}\times\vec{b}=|\vec{a}||\vec{b}|\sin\angle(\vec{a},\vec{b})\,\vec{n}$$

![](./img/Cross_product_vector.png)

Here $\vec{n}$ is a unit vector perpendicular to the plane spanned by $\vec{a}$ and $\vec{b}$; its direction is given by the right-hand rule.

In [102]:
x = np.array([0,0,1])
y = np.array([0,1,0])

np.cross(x,y)

array([-1,  0,  0])

In [103]:
np.cross(y,x)

array([1, 0, 0])
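
Two properties of the cross product can be checked numerically: the result is perpendicular to both inputs, and its length equals the product of the input lengths and the sine of the angle between them. The sketch below uses `np.linalg.norm` for vector lengths; the input vectors are arbitrary.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([-7.0, 8.0, 9.0])
c = np.cross(a, b)

# c is perpendicular to both inputs, so both dot products are zero
print(np.dot(c, a), np.dot(c, b))

# Length of the cross product versus |a| |b| sin(angle)
norm_a = np.linalg.norm(a)
norm_b = np.linalg.norm(b)
cos_angle = np.dot(a, b) / (norm_a * norm_b)
sin_angle = np.sqrt(1.0 - cos_angle ** 2)
print(np.linalg.norm(c), norm_a * norm_b * sin_angle)
```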

# Universal functions

In [73]:
m = np.arange(10, 19).reshape(3, 3)
print (m)
print ("{0} min of the entire matrix".format(m.min()))
print ("{0} max of entire matrix".format(m.max()))
print ("{0} position of the min value".format(m.argmin()))
print ("{0} position of the max value".format(m.argmax()))
print ("{0} mins down each column".format(m.min(axis = 0)))
print ("{0} mins across each row".format(m.min(axis = 1)))
print ("{0} maxs down each column".format(m.max(axis = 0)))
print ("{0} maxs across each row".format(m.max(axis = 1)))

[[10 11 12]
 [13 14 15]
 [16 17 18]]
10 min of the entire matrix
18 max of entire matrix
0 position of the min value
8 position of the max value
[10 11 12] mins down each column
[10 13 16] mins across each row
[16 17 18] maxs down each column
[12 15 18] maxs across each row


In [74]:
a = np.arange(1,10)
a.mean(), a.std(), a.var()

(5.0, 2.5819888974716112, 6.666666666666667)

In [128]:
a.sum(), a.prod()  # sum is the total of all elements; prod is their product

(45, 362880)

In [75]:
a

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [129]:
a.cumsum(), a.cumprod()

(array([ 1,  3,  6, 10, 15, 21, 28, 36, 45], dtype=int32),
 array([     1,      2,      6,     24,    120,    720,   5040,  40320,
        362880], dtype=int32))

In [77]:
a = np.arange(10)   # any vs. all: "at least one" vs. "every"; useful for Boolean filtering
(a < 5).all()  # are all elements < 5? (.any() would return True here)

False

In [76]:
a<5

array([ True,  True,  True,  True, False, False, False, False, False], dtype=bool)
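
To make the difference between `any` and `all` concrete, here is a small sketch on `np.arange(10)`; the name `mask` is only illustrative.

```python
import numpy as np

a = np.arange(10)
mask = a < 5

print(mask.any())        # True: at least one element is below 5
print(mask.all())        # False: not every element is below 5
print(mask.sum())        # 5: True counts as 1 when summed
print((a >= 0).all())    # True: every element is non-negative
```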

+ Array creation: arange, array, copy, empty, empty_like, eye, fromfile, fromfunction, identity, linspace, logspace, mgrid, ogrid, ones, ones_like, r_, zeros, zeros_like
+ Manipulation: array_split, column_stack, concatenate, diagonal, dsplit, dstack, hsplit, hstack, item, newaxis, ravel, repeat, reshape, resize, squeeze, swapaxes, take, transpose, vsplit, vstack
+ all, any, nonzero, where
+ argmax, argmin, argsort, max, min, ptp, searchsorted, sort
+ choose, compress, cumprod, cumsum, inner, fill, imag, prod, put, putmask, real, sum
+ Basic statistics: cov, mean, std, var
+ Basic linear algebra: cross, dot, outer, svd, vdot
