# NumPy Scientific Computing  

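NumPy is the core library for scientific computing in Python. Python's scientific computing packages, such as pandas and scikit-learn, are built on top of NumPy. NumPy is very powerful, and SciPy, which builds on it, provides Python's extensive mathematical and numerical computing capabilities through a large and rich set of functions.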

<img src="http://wx1.sinaimg.cn/mw690/d409b13egy1fo90o5jtpqj211i09wjuc.jpg" width = "500" height = "300" alt="图片名称" align=center />

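For this course, we focus on the following topics:

1. Defining and using arrays
2. Indexing and selecting array elements
3. Array computations
4. Linear algebra computations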

## Arrays

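An array stores a grid of values that all have the same type, and it is indexed by a tuple of non-negative integers. The number of dimensions is the array's rank, and its shape is a tuple giving the size along each dimension.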
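A quick way to inspect these properties, sketched with small arrays chosen only for illustration: `ndim` gives the rank, `shape` the size along each dimension, and `dtype` the single element type shared by every entry.

```python
import numpy as np

# Arbitrary 2-D example: 2 rows, 3 columns, all integers
x = np.array([[1, 2, 3], [4, 5, 6]])

print(x.ndim)   # rank: number of dimensions
print(x.shape)  # size along each dimension
print(x.dtype)  # one element type for the whole array (platform-dependent integer width)

# Mixing ints and floats upcasts every element to a common type
y = np.array([1, 2.5, 3])
print(y.dtype)
```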

We can create an array from a Python list and access its elements with square-bracket indexing.

In [2]:
import numpy as np 
a = np.array([1,3,4,6,10])

print(a)

print(a.size)
print(a.shape)
print(a[2])

[ 1  3  4  6 10]
5
(5,)
4


In [4]:
# 2-D array

b = np.array([[1,2,3,4],[5,6,7,8]])
print(b.shape)
#b[0,1]

(2, 4)


In [11]:

b = np.array([[[1,2,3,4],[5,6,7,8]]])

In [12]:
print(b.shape)

(1, 2, 4)


## Creating Arrays
NumPy provides built-in functions for creating some special arrays.

In [5]:
np.zeros(3)

array([0., 0., 0.])

In [8]:
b = np.ones([3,3])
print(b)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


In [9]:
b.shape

(3, 3)

In [10]:
np.zeros_like(b)

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [16]:
np.eye(3)

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
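These constructors accept a shape (an int for 1-D, a tuple or list for n-D) and an optional `dtype`. The sketch below also uses `np.full` and `np.arange`, which do not appear in the cells above but are standard NumPy constructors; the shapes and values are arbitrary.

```python
import numpy as np

z = np.zeros((2, 3))          # float64 zeros by default
o = np.ones(4, dtype=int)     # request an integer dtype explicitly
f = np.full((2, 2), 7)        # every element set to 7
r = np.arange(0, 10, 2)       # like range(): start, stop (exclusive), step
i = np.eye(3)                 # 3x3 identity matrix

print(z.shape, z.dtype)
print(o)
print(f)
print(r)
# zeros_like copies the shape (and dtype) of its argument
print(np.array_equal(np.zeros_like(i), np.zeros((3, 3))))
```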

## Common Array Attributes and Methods

* Statistics
* Sorting
* Finding indices by value
* Conditional search
* shape

In [13]:
a = np.random.rand(3,4)
a.shape

(3, 4)

In [14]:
a

array([[0.22115568, 0.85374181, 0.0971923 , 0.7749461 ],
       [0.94083223, 0.02671383, 0.73202248, 0.18316195],
       [0.97417192, 0.9745558 , 0.03574694, 0.00215869]])

In [20]:
a.size

12

In [37]:
len(a)

3

In [20]:
a.sum(axis = 1)

array([1.94703589, 1.88273049, 1.98663336])

In [18]:
np.sum(a)           # sum of all elements
np.sum(a,axis = 1)  # sum of each row
np.sum(a,axis = 0)  # sum of each column; only this last result is displayed

array([2.13615983, 1.85501145, 0.86496173, 0.96026674])
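To make the `axis` argument concrete: `axis=0` collapses the rows and yields one value per column, while `axis=1` collapses the columns and yields one value per row. The sketch below checks this against plain Python loops on a small fixed array chosen for illustration.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = np.sum(m, axis=0)  # collapse axis 0 -> one value per column
row_sums = np.sum(m, axis=1)  # collapse axis 1 -> one value per row

# The same results computed with explicit loops
n_rows, n_cols = m.shape
loop_cols = [sum(int(m[r, c]) for r in range(n_rows)) for c in range(n_cols)]
loop_rows = [sum(int(m[r, c]) for c in range(n_cols)) for r in range(n_rows)]

print(col_sums, loop_cols)
print(row_sums, loop_rows)
```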

In [25]:
np.mean(a)
np.std(a)

0.27852909235886786

In [43]:
a

array([[0.22115568, 0.85374181, 0.0971923 , 0.7749461 ],
       [0.94083223, 0.02671383, 0.73202248, 0.18316195],
       [0.97417192, 0.9745558 , 0.03574694, 0.00215869]])

In [22]:
# Sort along axis 0 (each column independently)
np.sort(a,axis = 0)

array([[0.22115568, 0.02671383, 0.03574694, 0.00215869],
       [0.94083223, 0.85374181, 0.0971923 , 0.18316195],
       [0.97417192, 0.9745558 , 0.73202248, 0.7749461 ]])

In [28]:
# Returns the indices that would sort this array.
a.argsort()

array([[2, 0, 3, 1],
       [1, 3, 2, 0],
       [3, 2, 0, 1]], dtype=int64)
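`argsort` returns positions rather than values. Feeding those positions back through `np.take_along_axis` (available in NumPy 1.15 and later) reproduces `np.sort` along the same axis, which is a handy check on what the index array means. The array here is a small hand-picked example.

```python
import numpy as np

v = np.array([[0.3, 0.9, 0.1],
              [0.8, 0.2, 0.5]])

idx = v.argsort(axis=1)                       # per-row sorting positions
rebuilt = np.take_along_axis(v, idx, axis=1)  # gather values in that order

print(idx)
print(np.array_equal(rebuilt, np.sort(v, axis=1)))

# Descending order: reverse the ascending indices along the last axis
desc = idx[:, ::-1]
print(np.take_along_axis(v, desc, axis=1))
```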

In [None]:
# Row 0 in ascending order: 0.0971923, 0.22115568, 0.7749461, 0.85374181  ->  indices [2, 0, 3, 1]

In [23]:
a

array([[0.22115568, 0.85374181, 0.0971923 , 0.7749461 ],
       [0.94083223, 0.02671383, 0.73202248, 0.18316195],
       [0.97417192, 0.9745558 , 0.03574694, 0.00215869]])

In [26]:
# Returns the indices of the maximum values along an axis.
np.argmax(a,axis =0)

array([2, 2, 1, 0], dtype=int64)

In [50]:
np.max(a,axis = 1)

array([0.85374181, 0.94083223, 0.9745558 ])

In [36]:
a

array([[0.22115568, 0.85374181, 0.0971923 , 0.7749461 ],
       [0.94083223, 0.02671383, 0.73202248, 0.18316195],
       [0.97417192, 0.9745558 , 0.03574694, 0.00215869]])

In [37]:
# Return elements, either from `x` or `y`, depending on `condition`.
# If only `condition` is given, return ``condition.nonzero()``
np.where(a>0.5)

(array([0, 0, 1, 1, 2, 2], dtype=int64),
 array([1, 3, 0, 2, 0, 1], dtype=int64))

In [30]:
b = np.array([1,2,3,4,5,6,7])

In [33]:
b[np.where(b >4)]

array([5, 6, 7])
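As the docstring quoted above says, `np.where` can also take three arguments and choose elementwise between two sources. A minimal sketch on the same kind of 1-D array:

```python
import numpy as np

b = np.array([1, 2, 3, 4, 5, 6, 7])

# Keep values greater than 4, replace the rest with 0
clipped = np.where(b > 4, b, 0)

# With one argument, np.where returns a tuple of index arrays (one per dimension)
(positions,) = np.where(b > 4)

print(clipped)
print(positions)
print(b[b > 4])  # boolean indexing selects the same values as b[np.where(b > 4)]
```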

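## Changing Shape
An array's shape is determined by its axes and the number of elements along each one. It is represented as a tuple of integers, where each integer gives the number of elements along the corresponding dimension.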

In [38]:
import numpy as np
a = np.random.randint(1,100,size =(5,6))
a.shape

(5, 6)

In [39]:
a

array([[16, 24, 15, 98, 11, 46],
       [68, 14, 96,  4, 84, 23],
       [52, 19, 52, 92,  8, 23],
       [99, 44, 93, 19, 47, 43],
       [ 2, 33, 68, 63,  8, 84]])

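An array's shape can be changed in many ways. For example, each of the three methods below returns an array with a new shape and leaves the shape of the original array unchanged (although ravel(), reshape() and .T may return views that share data with it). reshape is used especially often in practice, because different operations require arrays with different dimensions.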

In [40]:
a.ravel()

array([16, 24, 15, 98, 11, 46, 68, 14, 96,  4, 84, 23, 52, 19, 52, 92,  8,
       23, 99, 44, 93, 19, 47, 43,  2, 33, 68, 63,  8, 84])

In [41]:
a.reshape(10,3)

array([[16, 24, 15],
       [98, 11, 46],
       [68, 14, 96],
       [ 4, 84, 23],
       [52, 19, 52],
       [92,  8, 23],
       [99, 44, 93],
       [19, 47, 43],
       [ 2, 33, 68],
       [63,  8, 84]])

In [42]:
a.T 

array([[16, 68, 52, 99,  2],
       [24, 14, 19, 44, 33],
       [15, 96, 52, 93, 68],
       [98,  4, 92, 19, 63],
       [11, 84,  8, 47,  8],
       [46, 23, 23, 43, 84]])

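Both ravel() and flatten() reduce a multi-dimensional array to one dimension. flatten() always returns a new copy, so modifying it does not affect the original array, whereas ravel() returns a view whenever possible.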

In [43]:
a.flatten()

array([16, 24, 15, 98, 11, 46, 68, 14, 96,  4, 84, 23, 52, 19, 52, 92,  8,
       23, 99, 44, 93, 19, 47, 43,  2, 33, 68, 63,  8, 84])
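The ravel()/flatten() difference matters once you write to the result. In this sketch (a small array built with `np.arange`, an illustrative choice), writing to flatten()'s output leaves the original alone, while writing through ravel()'s output changes it, because for a contiguous array ravel() returns a view. `np.shares_memory` confirms which result shares data.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)   # contiguous 2x3 array

flat_copy = m.flatten()          # always a new array
flat_copy[0] = 100
print(m[0, 0])                   # original unchanged

flat_view = m.ravel()            # a view here, because m is contiguous
flat_view[0] = 100
print(m[0, 0])                   # original changed through the view

print(np.shares_memory(m, flat_view), np.shares_memory(m, flat_copy))
```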

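If one dimension in a reshape is set to -1, its size is computed automatically. As shown below, a has 30 elements in total; once the number of rows is fixed at 10, -1 works out that 3 columns are needed to hold all of them.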

In [45]:
a.reshape(10,-1)

array([[16, 24, 15],
       [98, 11, 46],
       [68, 14, 96],
       [ 4, 84, 23],
       [52, 19, 52],
       [92,  8, 23],
       [99, 44, 93],
       [19, 47, 43],
       [ 2, 33, 68],
       [63,  8, 84]])
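A sketch of the same inference on a stand-in 30-element array: -1 can be placed in any position, only one dimension may be -1, and the remaining sizes must divide the total, otherwise `reshape` raises a `ValueError`.

```python
import numpy as np

arr = np.arange(30)             # 30 elements, like the 5x6 array above

print(arr.reshape(3, -1).shape)   # -1 inferred as 10
print(arr.reshape(-1, 5).shape)   # -1 inferred as 6

# 30 is not divisible by 7, so no valid size exists for the -1 dimension
reshape_failed = False
try:
    arr.reshape(7, -1)
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```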

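## Random Numbers

NumPy can generate random numbers according to given distributions. Random numbers are used frequently later on, in probability theory and data mining.

Official documentation: [RANDOM](https://docs.scipy.org/doc/numpy/reference/routines.random.html)

Commonly used functions: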

* rand(d0, d1, ..., dn)	Random values in a given shape.
* randn(d0, d1, ..., dn)	Return a sample (or samples) from the “standard normal” distribution.
* randint(low[, high, size, dtype])	Return random integers from low (inclusive) to high (exclusive).
* random([size])	Return random floats in the half-open interval [0.0, 1.0).
* sample([size])	Return random floats in the half-open interval [0.0, 1.0).
* choice(a[, size, replace, p])	Generates a random sample from a given 1-D array
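These functions draw from a global generator, so their outputs change on every run (the outputs below will differ from yours). A sketch of making runs reproducible with `np.random.seed`; the seed value 42 is arbitrary. It also shows `np.random.default_rng` (NumPy 1.17+), the newer generator-object API, which is not used elsewhere in this notebook.

```python
import numpy as np

np.random.seed(42)             # arbitrary seed for reproducibility
first = np.random.rand(3)

np.random.seed(42)             # same seed -> same sequence
second = np.random.rand(3)

print(np.array_equal(first, second))

# Generator API: an explicit generator object instead of global state
rng = np.random.default_rng(42)
ints = rng.integers(1, 10, size=(2, 3))  # like randint: low inclusive, high exclusive
floats = rng.random((2, 2))              # like random: values in [0.0, 1.0)
print(ints)
print(floats)
```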

In [54]:
np.random.rand(10)
np.random.rand(3,4)

array([[0.57971505, 0.48815695, 0.77907681, 0.50798552],
       [0.1393136 , 0.70995196, 0.11821931, 0.63467837],
       [0.0696019 , 0.17863175, 0.04099122, 0.04504422]])

In [55]:
np.random.randn(100)

array([ 5.51409848e-01,  4.22360041e-01,  1.71040523e+00, -7.04540001e-01,
        1.13518031e+00, -1.46299472e+00, -2.56903806e+00, -2.65053469e-01,
        2.39246210e+00,  1.25045402e+00,  2.58955558e-02, -5.77686234e-01,
        6.18851283e-01, -1.14703856e+00,  7.29162925e-02, -6.01940000e-01,
       -1.24232346e+00, -1.74068012e-01,  6.53268860e-01, -6.70071969e-02,
       -4.13291707e-01, -3.57338719e-01, -1.87471196e+00,  2.28190754e-01,
       -5.31114876e-01,  6.76067986e-01,  2.36430150e+00, -1.01725304e+00,
       -7.91914758e-01,  1.61012062e+00,  6.11852173e-01, -2.98429820e-01,
       -5.63288570e-01,  1.26884981e+00,  3.52867719e-01, -6.14166366e-01,
        1.95929832e-01, -2.70721292e-01,  5.47679986e-01,  8.31818396e-02,
       -5.44712773e-01, -3.47575131e-01, -1.96149073e-01, -1.02551975e+00,
        2.73810997e-01, -1.35373086e+00, -1.67351920e+00,  9.67888609e-01,
       -2.34989043e+00, -2.45327148e-01,  5.72575933e-01, -8.49816923e-01,
       -4.02756936e-01,  

In [75]:
np.random.randint(10)
np.random.randint(1,10,size = (3,4))

array([[2, 4, 1, 9],
       [2, 3, 3, 5],
       [7, 1, 3, 6]])

In [90]:
np.random.random((2,2))

array([[0.60225819, 0.14280808],
       [0.34875274, 0.36513415]])

In [75]:
np.random.choice(10,(3,4))

array([[5, 7, 2, 6],
       [1, 5, 3, 1],
       [8, 1, 8, 7]])

In [76]:
np.random.choice([1,4,5,7.08],(3,4))

array([[ 5.  ,  7.08,  7.08,  4.  ],
       [ 4.  ,  4.  ,  7.08,  4.  ],
       [ 7.08,  7.08,  7.08,  7.08]])

In [92]:
?np.random.choice
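`choice` also accepts `replace` and `p`. A sketch with an arbitrary seed and made-up probabilities: `replace=False` draws without repetition, and `p` assigns a probability to each candidate (the probabilities must sum to 1).

```python
import numpy as np

np.random.seed(0)   # arbitrary seed so the run is repeatable

# Five distinct values from 0..9 (no repeats because replace=False)
unique_draw = np.random.choice(10, size=5, replace=False)

# Weighted draw: 7.08 is given probability 0.7, the others 0.1 each
weighted = np.random.choice([1, 4, 5, 7.08], size=1000, p=[0.1, 0.1, 0.1, 0.7])

print(unique_draw)
print(np.mean(weighted == 7.08))   # fraction of 7.08 draws, should be close to 0.7
```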

## Array Indexing

**Slicing** works like slicing a Python list, but since arrays can be multi-dimensional, we specify a slice for each dimension.

In [93]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]]) # 2-D array, shape = (3, 4)

In [96]:
a[1:3,0:1]
#a[:,:1]

array([[5],
       [9]])
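One way slicing differs from Python lists: a NumPy slice is a view of the original data, not a copy. This sketch rebuilds the same 3x4 array and shows that writing to a slice changes the original; `.copy()` gives an independent array.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

sub = a[1:3, 0:1]       # same slice as above: shape (2, 1)
sub[0, 0] = 99          # writes through to a[1, 0]
print(a[1, 0])

safe = a[1:3, 0:1].copy()
safe[0, 0] = -1         # modifies only the copy, not a
print(a[1, 0])
```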

**Integer array indexing**

In [80]:
a[[1,2],[0,1]]

array([ 5, 10])
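Integer array indexing pairs the index lists position by position, so `a[[1, 2], [0, 1]]` picks `a[1, 0]` and `a[2, 1]`. The sketch checks that equivalence and shows that, unlike slicing, the result is a copy.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

fancy = a[[1, 2], [0, 1]]                # pairs (1, 0) and (2, 1)
manual = np.array([a[1, 0], a[2, 1]])
print(np.array_equal(fancy, manual))

fancy[0] = 0                             # integer indexing returned a copy
print(a[1, 0])                           # original element is untouched

# A single index list selects whole rows
print(a[[0, 2]])
```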

**Boolean indexing**

In [82]:
a >4
a[a>4]

array([ 5,  6,  7,  8,  9, 10, 11, 12])
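Boolean masks can be combined elementwise with `&` (and), `|` (or) and `~` (not). Each comparison needs parentheses, because these operators bind more tightly than `>` and `<`. A short sketch on the same 3x4 array:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

mask = (a > 4) & (a % 2 == 0)     # greater than 4 AND even
print(a[mask])

outside = a[(a < 3) | (a > 10)]   # less than 3 OR greater than 10
print(outside)

a[~(a > 4)] = 0                   # assign through a mask: zero out values <= 4
print(a)
```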

**Indexing illustrated**


<img src="http://wx1.sinaimg.cn/mw690/d409b13egy1fo90ob733bj218m0hyjz3.jpg" width = "600" height = "400" alt="图片名称" align=left />



<img src="http://wx2.sinaimg.cn/mw690/d409b13egy1fo90ohhyecj21020kaage.jpg" width = "600" height = "400" alt="图片名称" align=left />



## Array Math

In [84]:
a = np.random.random([3,4])
b = np.random.random([3,4])

In [85]:
a

array([[ 0.76004347,  0.4258463 ,  0.08326275,  0.93285095],
       [ 0.87100438,  0.89512213,  0.66405053,  0.37225536],
       [ 0.59545034,  0.41663924,  0.51195997,  0.77346328]])

In [33]:
a + 2

array([[ 2.76004347,  2.4258463 ,  2.08326275,  2.93285095],
       [ 2.87100438,  2.89512213,  2.66405053,  2.37225536],
       [ 2.59545034,  2.41663924,  2.51195997,  2.77346328]])

In [86]:
a * 10

array([[ 7.60043471,  4.25846302,  0.83262751,  9.32850955],
       [ 8.71004379,  8.95122127,  6.64050534,  3.72255362],
       [ 5.95450342,  4.16639238,  5.11959969,  7.73463281]])

In [87]:
b

array([[ 0.56674767,  0.83059901,  0.08406071,  0.60134785],
       [ 0.68305575,  0.85945331,  0.50625002,  0.65044408],
       [ 0.00539243,  0.39640508,  0.43254736,  0.94011285]])

In [88]:
# Elementwise
a  + b
a - b
a * b
a / b

array([[   1.34106147,    0.51269782,    0.99050729,    1.5512668 ],
       [   1.27515856,    1.04150175,    1.31170472,    0.57230955],
       [ 110.42337422,    1.05104414,    1.18359286,    0.82273451]])

In [40]:
# Elementwise
np.add(a,b)
np.subtract(a,b)
np.multiply(a,b)
np.divide(a,b)

array([[   1.34106147,    0.51269782,    0.99050729,    1.5512668 ],
       [   1.27515856,    1.04150175,    1.31170472,    0.57230955],
       [ 110.42337422,    1.05104414,    1.18359286,    0.82273451]])
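The scalar cases above (`a + 2`, `a * 10`) are the simplest case of broadcasting: NumPy stretches the smaller operand to match the larger one. A sketch with an illustrative 3x4 array, a row vector of shape (4,) and a column vector of shape (3, 1); it also confirms that operator and function forms give the same result.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)          # 3x4 example array
row = np.array([10, 20, 30, 40])         # shape (4,)   -> repeated for each of the 3 rows
col = np.array([[100], [200], [300]])    # shape (3, 1) -> repeated for each of the 4 columns

print(m + row)
print(m + col)

# Operator and ufunc forms are interchangeable
print(np.array_equal(m * row, np.multiply(m, row)))
```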

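`*` performs an elementwise computation, not matrix multiplication. Use the dot function for the matrix product (inner products of rows and columns).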

In [89]:
# shape(a) = 3*4  shape(b.T) = 4*3
a.dot(b.T) # (3*4) * (4*3) = 3 * 3
np.dot(a,b.T)

array([[ 1.35242743,  1.53406623,  1.08590637],
       [ 1.51680278,  1.94256711,  0.99672314],
       [ 1.19168644,  1.52708211,  1.11695854]])
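For 2-D arrays, the `@` operator (`np.matmul`, Python 3.5+ and NumPy 1.10+) computes the same matrix product as `dot`. A sketch with fresh random inputs of the same shapes as above, checking that the three forms agree and that mismatched inner dimensions raise an error:

```python
import numpy as np

a = np.random.random([3, 4])
b = np.random.random([3, 4])

p1 = a.dot(b.T)
p2 = np.dot(a, b.T)
p3 = a @ b.T           # matrix product operator

print(p3.shape)        # (3, 4) x (4, 3) -> (3, 3)
print(np.allclose(p1, p2), np.allclose(p1, p3))

# a @ b fails: the inner dimensions (4 and 3) do not match
shape_error = False
try:
    a @ b
except ValueError:
    shape_error = True
print(shape_error)
```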

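## Linear Algebra
NumPy and SciPy can both perform linear algebra computations, but we have not yet covered the necessary linear algebra background, so this topic is moved to the linear algebra theory chapter.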