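## 1. Vectorized operations vs. Python loops

NumPy implements vectorized operations, which are much faster than a Python for loop when the same computation is repeated over many elements.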

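A vectorized operation acts on a whole array or matrix at once instead of looping over its elements. For example, to compute the reciprocal of every element of an array `array`, write `1 / array` rather than iterating over the elements one by one.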
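As a minimal sketch of the reciprocal example (the array values here are made up for illustration), the loop form and the vectorized form should give the same result:

```python
import numpy as np

# Illustrative values, not taken from the notebook
array = np.array([1.0, 2.0, 4.0, 8.0])

# Loop version: visit each element explicitly
loop_result = np.empty_like(array)
for i in range(len(array)):
    loop_result[i] = 1 / array[i]

# Vectorized version: one expression over the whole array
vec_result = 1 / array

print(np.array_equal(loop_result, vec_result))  # True
```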

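Vectorization is fast because NumPy implements the underlying operations in compiled code (C and Fortran), which avoids the per-element overhead of the Python interpreter.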

In [1]:
import numpy as np

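Define a function that loops over an array and computes the square root of each element, then compare the speed of the for loop with the vectorized call. (Despite its name, `cal_squared` computes square roots, not squares.)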

In [2]:
def cal_squared(arr):
    n = len(arr)
    out = np.zeros(n)
    for i in range(n):
        out[i] = np.sqrt(arr[i])
    return out

In [3]:
arr = np.random.randint(1, 100, 1000000)

In [4]:
%timeit res = cal_squared(arr)

1.07 s ± 12.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [5]:
%timeit res2 = np.sqrt(arr)

5.5 ms ± 285 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


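Computing the square roots of one million random integers takes about 1 second with the for loop and about 5 milliseconds with the vectorized call, which is nearly 200 times faster.

## 2. Computing descriptive statistics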

In [6]:
def print_summary_stats(arr):
    print("sum: ", np.sum(arr))
    print("product: ", np.prod(arr))  # np.prod replaces the deprecated np.product
    print("mean: ", np.mean(arr))
    print("std: ", np.std(arr))
    print("min: ", np.min(arr))
    print("max: ", np.max(arr))
    print("median: ", np.median(arr))
    print("25% percentile: ", np.percentile(arr, 25))  # q is on a 0-100 scale
    print("75% percentile: ", np.percentile(arr, 75))

In [7]:
arr = np.random.randint(1, 20, 100)
arr

array([12, 18, 10, 11, 19,  9,  5, 10, 18,  5,  6,  1, 17, 11,  8,  3,  1,
        2,  9, 11,  6,  5,  6, 18, 18,  3, 16,  3, 19,  3, 12, 17, 13, 11,
       19, 10,  1,  1,  7, 13,  9, 18,  2, 11, 16, 13, 11,  5, 11, 15,  9,
       11,  3,  6,  4, 16, 12,  8,  4,  1, 13,  3,  5,  9,  4, 19,  2, 18,
       17,  1,  8,  6,  3,  3, 14, 14,  6,  6, 13,  3,  1,  3, 10, 12,  8,
       10,  8,  2, 13, 16, 15, 18,  6,  9, 12,  1, 10,  7,  8,  3])

In [8]:
print_summary_stats(arr)

sum:  911
product:  0
mean:  9.11
std:  5.477033868801617
min:  1
max:  19
median:  9.0
25% percentile:  4.0
75% percentile:  13.0

The product prints 0 because the true product of these 100 integers overflows the fixed-width integer dtype and wraps around.
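Note that np.percentile expects q on a 0-100 scale, while np.quantile expects a fraction between 0 and 1. The following sketch, using made-up data (the integers 1 to 101), illustrates the difference:

```python
import numpy as np

# Illustrative data: 1, 2, ..., 101
data = np.arange(1, 102)

p25 = np.percentile(data, 25)      # 25th percentile, q on a 0-100 scale
q25 = np.quantile(data, 0.25)      # same statistic, q on a 0-1 scale
tiny = np.percentile(data, 0.25)   # the 0.25th percentile, close to the minimum

print(p25, q25, tiny)  # 26.0 26.0 1.25
```

Passing 0.25 to np.percentile asks for the 0.25th percentile, which is almost always near the minimum of the data.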


In [9]:
arr = np.array([1, -1, 0, 2, -2])

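print(np.any(arr > 0))  # True if any element satisfies the condition
print(np.all(arr > 0))  # True only if every element satisfies the condition
print(np.where(arr > 0))  # tuple of index arrays for the elements that satisfy the condition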

True
False
(array([0, 3]),)
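The same condition can also be used as a boolean mask. A short sketch, reusing the array from the cell above, that selects the matching values and shows the three-argument form of np.where:

```python
import numpy as np

arr = np.array([1, -1, 0, 2, -2])

mask = arr > 0                     # boolean array, one entry per element
idx = np.where(mask)[0]            # indices where the mask is True
positives = arr[mask]              # boolean indexing selects the matching values
clipped = np.where(mask, arr, 0)   # keep positive values, replace the rest with 0

print(idx)        # [0 3]
print(positives)  # [1 2]
print(clipped)    # [1 0 0 2 0]
```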


## 3. Fast sorting

In [10]:
arr = np.random.randint(1, 100, 10)
arr

array([81, 25, 42, 32, 35,  9, 31, 79, 26, 19])

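Use the np.sort function or the arr.sort method to sort an array. np.sort returns a sorted copy, while arr.sort sorts the array in place; both default to kind='quicksort'.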

In [11]:
np.sort(arr)

array([ 9, 19, 25, 26, 31, 32, 35, 42, 79, 81])

In [16]:
arr.sort()  # sorts the array in place
arr

array([ 9, 19, 25, 26, 31, 32, 35, 42, 79, 81])
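The difference between the two calls is easy to miss, so here is a small sketch using the same values as above: np.sort leaves its input untouched, while arr.sort modifies the array and returns None.

```python
import numpy as np

arr = np.array([81, 25, 42, 32, 35, 9, 31, 79, 26, 19])

sorted_copy = np.sort(arr)   # returns a new sorted array
print(arr[0])                # 81: the original is unchanged

result = arr.sort()          # sorts in place and returns None
print(result)                # None
print(np.array_equal(arr, sorted_copy))  # True
```

Writing `arr = arr.sort()` is therefore a common mistake, since it rebinds `arr` to None.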

np.sort can sort a multidimensional array along a chosen axis.

In [18]:
M = np.random.randint(1, 100, (6,3))
M

array([[73, 10, 49],
       [18, 99, 79],
       [12, 20, 52],
       [11, 91, 45],
       [93,  1, 84],
       [15, 81, 55]])

In [19]:
np.sort(M, axis=0)  # sort along axis 0: each column is sorted independently

array([[11,  1, 45],
       [12, 10, 49],
       [15, 20, 52],
       [18, 81, 55],
       [73, 91, 79],
       [93, 99, 84]])

In [20]:
np.sort(M, axis=1)  # sort along axis 1: each row is sorted independently

array([[10, 49, 73],
       [18, 79, 99],
       [12, 20, 52],
       [11, 45, 91],
       [ 1, 84, 93],
       [15, 55, 81]])

To find the k smallest elements without fully sorting the array, use np.partition.

In [22]:
arr2 = np.random.randint(1, 100, 10)
arr2

array([92, 10, 55,  2, 35, 55, 43, 24, 99, 76])

In [25]:
np.partition(arr2, 3)  # the 3 smallest elements come first; order within each side is arbitrary

array([10,  2, 24, 35, 43, 55, 55, 76, 99, 92])
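As a quick check of what np.partition guarantees, the sketch below reuses the values of arr2 shown above (the variable k is introduced here for illustration). The element at index k lands in its final sorted position, and the k elements before it are the k smallest, in no particular order.

```python
import numpy as np

arr2 = np.array([92, 10, 55, 2, 35, 55, 43, 24, 99, 76])
k = 3  # illustrative choice, matching the call above

part = np.partition(arr2, k)
smallest_k = np.sort(part[:k])   # the k smallest values, sorted afterwards

print(smallest_k)                     # [ 2 10 24]
print(part[k] == np.sort(arr2)[k])    # True
```

If the k smallest values are needed in order, sorting only the first k elements of the partitioned array is enough.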