# 04 - Numpy

- Creating numpy arrays
- Indexing and slicing arrays
- Basic array operations
- Array arithmetic

In [None]:
import numpy as np

## Creating arrays

In [None]:
# from a sequence (here a list)
np.array([0, 1, 2, 3, 4])

In [None]:
# from an iterable
np.array(range(10))

In [None]:
# with numpy's built-in arange
np.arange(10)
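`np.arange` mirrors Python's `range(start, stop, step)`. A minimal sketch of the variants (values chosen for illustration): `stop` is excluded, and unlike `range`, a float step is allowed.

```python
import numpy as np

# np.arange(start, stop, step): like range(), stop is excluded
a = np.arange(2, 10, 3)
print(a)  # [2 5 8]

# np.array(range(n)) and np.arange(n) build the same integer array
same = np.array_equal(np.array(range(10)), np.arange(10))
print(same)  # True

# unlike range(), np.arange also accepts a float step
f = np.arange(0, 1, 0.25)
print(f)  # 0.0, 0.25, 0.5, 0.75 (stop is excluded)
```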

In [None]:
# np.linspace(start, stop, num): num evenly spaced values from start to stop, stop included
np.linspace(0, 10, 11)
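How the spacing of `np.linspace` is determined, as a small sketch with illustrative values: with the default `endpoint=True` the step is `(stop - start) / (num - 1)`; `endpoint=False` and `retstep=True` are existing keyword arguments of `np.linspace`.

```python
import numpy as np

# By default both endpoints are included, so the spacing is (stop - start) / (num - 1)
a = np.linspace(0, 1, 5)                   # 0.0, 0.25, 0.5, 0.75, 1.0
# endpoint=False excludes stop; the spacing becomes (stop - start) / num
b = np.linspace(0, 1, 5, endpoint=False)   # approximately 0.0, 0.2, 0.4, 0.6, 0.8
# retstep=True also returns the spacing that was used
c, step = np.linspace(0, 10, 11, retstep=True)
print(a, b, c, step, sep="\n")
```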

In [None]:
np.zeros(shape=(3, 4))

In [None]:
np.ones(shape=(3, 4))
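`np.zeros` and `np.ones` return floating-point arrays by default. A short sketch of choosing the dtype and of two related constructors, `np.full` and `np.ones_like` (shapes and fill value are arbitrary examples):

```python
import numpy as np

z = np.zeros(shape=(3, 4))
print(z.dtype)                 # float64: zeros/ones produce floats unless told otherwise
zi = np.zeros((2, 2), dtype=int)
full7 = np.full((2, 3), 7)     # every element set to a chosen value
o = np.ones_like(zi)           # same shape and dtype as zi, filled with 1
print(zi, full7, o, sep="\n")
```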

In [None]:
# random numbers
np.random.binomial(n=10, p=0.5, size=(2,3))

In [None]:
x = np.random.default_rng(2025)
x.binomial(10, 0.5, size=(2, 3))

In [None]:
np.random.normal(loc=0, scale=1, size=(3, 3))

In [None]:
np.random.standard_normal(size=(3, 3))
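In `normal(loc, scale, size)`, `loc` is the mean and `scale` the standard deviation; `standard_normal` is the special case `loc=0, scale=1`. A rough check using the `Generator` from `np.random.default_rng` (as in the cell above); the sample size of 10,000 is an arbitrary choice, so the printed statistics are only approximately equal to the parameters.

```python
import numpy as np

rng = np.random.default_rng(2025)
# loc is the mean and scale the standard deviation of the normal distribution
samples = rng.normal(loc=5, scale=2, size=10_000)
print(samples.mean(), samples.std())          # close to 5 and 2
# standard_normal is the special case loc=0, scale=1
std_samples = rng.standard_normal(size=10_000)
print(std_samples.mean(), std_samples.std())  # close to 0 and 1
```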

--------------

🙋**Exercise**

Run each of the two snippets below several times. How do the results differ, and why?
```python
# code snippet 1
np.random.binomial(n=10, p=0.5, size=(2,3))
```

```python
# code snippet 2
x = np.random.default_rng(2025)
x.binomial(10, 0.5, size=(2, 3))
```
--------------

## Subsetting (indexing & slicing)

In [None]:
(x := np.arange(12))

In [None]:
x.reshape(3, 4)

In [None]:
(x := x.reshape(3, 4))

In [None]:
# x[row, col]
x[1, 2]

In [None]:
x.dtype

In [None]:
x[1, ]  # row 1; same as x[1] or x[1, :]

In [None]:
x[, 1]  # SyntaxError: unlike R, the row index cannot be left empty

In [None]:
# Indexing and slicing rows and columns of a numpy array works like list indexing/slicing, one index per axis
x[:, 1]

In [None]:
x[:2, 1:]

In [None]:
# boolean indexing
x > 5

In [None]:
x[x > 5]
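Several conditions can be combined in a boolean index with `&` (and), `|` (or) and `~` (not). Python's `and`/`or` do not work element-wise on arrays, and each comparison needs parentheses because `&` and `|` bind more tightly than `>` or `<`. A sketch on the same 3x4 array:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
# parentheses are required: & and | bind more tightly than comparison operators
mid = x[(x > 2) & (x < 7)]      # elements between 2 and 7 (exclusive)
ends = x[(x < 2) | (x > 9)]     # elements below 2 or above 9
odd = x[~(x % 2 == 0)]          # ~ negates a boolean array
# the result is always 1-D, in row-major order
print(mid, ends, odd, sep="\n")
```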

In [None]:
x

In [None]:
# fancy indexing
x[np.arange(3), np.arange(3)]

In [None]:
x[[0, 0, 1, 2], [1, 1, 2, 3]]
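With fancy indexing, the row list and the column list are paired element by element, so `x[[0, 0, 1, 2], [1, 1, 2, 3]]` picks the elements at `(0, 1), (0, 1), (1, 2), (2, 3)`. A sketch spelling this out with an explicit loop on a fresh copy of the 3x4 array:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
rows = [0, 0, 1, 2]
cols = [1, 1, 2, 3]
fancy = x[rows, cols]
# equivalent loop: one element per (row, col) pair
manual = np.array([x[r, c] for r, c in zip(rows, cols)])
# pairing (0, 0), (1, 1), (2, 2) gives the main diagonal
diag = x[np.arange(3), np.arange(3)]
print(fancy, manual, diag, sep="\n")
```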

In [None]:
# Indexing and slicing can be combined with assignment
x[:2, 1:] = 0
x
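When the result of indexing is stored in a new variable, it matters whether it shares memory with the original array: basic slicing returns a view, while boolean and fancy indexing return a copy. A sketch that checks this with `np.shares_memory`:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
s = x[:2, 1:]          # basic slicing returns a view: s shares memory with x
s[0, 0] = 100          # so this also changes x[0, 1]
c = x[:2, 1:].copy()   # .copy() gives an independent array
c[0, 0] = -5           # x is not affected
b = x[x > 5]           # boolean (and fancy) indexing returns a copy
b[:] = -1              # x is not affected either
print(x)
```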

--------------

🙋**Exercise**

What does this code output?

```python
x = np.arange(12).reshape(3, 4)
x[::2, ::-1]
```

---------------

## Basic operations

In [None]:
x = np.arange(3*4).reshape(3, 4)
x

In [None]:
x = x.astype(float)
x

In [None]:
x.dtype
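`astype` returns a new array and leaves the original untouched. Converting floats to integers truncates toward zero rather than rounding; a sketch with illustrative values (note that `np.round` rounds halves to the nearest even number):

```python
import numpy as np

f = np.array([1.7, -1.7, 2.5])
i = f.astype(int)             # truncates toward zero: 1, -1, 2
r = np.round(f).astype(int)   # round first for the nearest integer: 2, -2, 2 (2.5 rounds to even)
print(f.dtype, i, r)          # f itself is still float64
```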

In [None]:
x.reshape(-1, 2, 2)
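# Dimensions map to brackets: the leftmost dimension corresponds to the outermost brackets
# "-1" means this size is not given explicitly; numpy infers it from the total number of elements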
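A sketch of how numpy fills in `-1` for the 12-element array: the remaining size is `x.size` divided by the product of the given sizes, and only one `-1` is allowed. If the sizes do not divide evenly, `reshape` raises `ValueError`.

```python
import numpy as np

x = np.arange(3 * 4).astype(float).reshape(3, 4)   # 12 elements
r3 = x.reshape(-1, 2, 2)   # numpy infers 12 / (2 * 2) = 3, so the shape is (3, 2, 2)
r2 = x.reshape(2, -1)      # 12 / 2 = 6, so the shape is (2, 6)
flat = x.reshape(-1)       # a single -1 flattens to shape (12,)
print(r3.shape, r2.shape, flat.shape)
print(r3[0])               # the outermost block: [[0., 1.], [2., 3.]]

# The new shape must hold exactly x.size elements, otherwise reshape raises ValueError
try:
    x.reshape(5, -1)
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)
```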

In [None]:
x.size

In [None]:
x.shape

In [None]:
x.T  # or x.transpose()

In [None]:
x

In [None]:
np.concatenate((x, x), axis=0)  # axis=0: stack vertically; the two copies of x become different rows

In [None]:
np.concatenate((x, x), axis=1)  # axis=1: join horizontally; the two copies of x become different columns

In [None]:
np.vstack((x, x))

In [None]:
np.hstack((x, x))
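For 2-D arrays, `np.vstack` and `np.hstack` give the same result as `np.concatenate` along axis 0 and axis 1. They differ for 1-D inputs, where `vstack` first treats each 1-D array as a row. A sketch checking both points:

```python
import numpy as np

x = np.arange(12).reshape(3, 4).astype(float)
# For 2-D arrays, vstack/hstack are shorthands for concatenate along axis 0/1
v = np.vstack((x, x))   # shape (6, 4)
h = np.hstack((x, x))   # shape (3, 8)
same_v = np.array_equal(v, np.concatenate((x, x), axis=0))
same_h = np.array_equal(h, np.concatenate((x, x), axis=1))

# They differ for 1-D inputs: vstack treats each 1-D array as a row
r = np.array([1, 2, 3])
vstack_1d = np.vstack((r, r))       # shape (2, 3)
cat_1d = np.concatenate((r, r))     # shape (6,)
print(same_v, same_h, vstack_1d.shape, cat_1d.shape)
```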

In [None]:
np.hsplit(np.hstack((x, x)), 2)

--------------

🙋**Exercise**

What is the inverse operation of `np.vstack`? Guess the corresponding function name and try it out.

--------------

## Array arithmetic

In [None]:
x = np.arange(100_000)
y = np.random.normal(size=100_000)
# How do we compute the sum of the element-wise products of x and y?

In [None]:
x.shape, y.shape

In [None]:
z = 0
for i in range(x.size):
    z += x[i] * y[i]
z

In [None]:
np.sum(x * y)

In [None]:
np.dot(x, y)
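On small, hand-checkable arrays (values made up for illustration), all of these approaches give the same number. For 1-D arrays `np.dot` computes the inner product, and the `@` operator does the same; for 2-D arrays both perform matrix multiplication.

```python
import numpy as np

x = np.arange(5)
y = np.array([0.5, -1.0, 2.0, 0.0, 1.0])
loop = sum(xi * yi for xi, yi in zip(x, y))   # plain Python loop
a = np.sum(x * y)                              # element-wise product, then sum
b = np.dot(x, y)                               # inner product of two 1-D arrays
c = x @ y                                      # @ is the same inner product here
print(loop, a, b, c)                           # all 7.0

# for 2-D arrays, np.dot and @ are matrix multiplication
A = np.arange(6).reshape(2, 3)
print(A @ A.T)
```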

Which approach is fastest?

In [None]:
%%timeit -n 10 -r 10
z = 0
for i in range(x.size):
    z += x[i] * y[i]

In [None]:
%timeit -n1000 -r10 np.sum(x * y)

In [None]:
%timeit -n1000 -r10 np.dot(x, y)

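**Takeaway**: numpy optimizes common operations, so the same computation runs much faster with numpy's built-in (vectorized) functions than with a plain Python loop.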

Further reading: https://numpy.org/doc/stable/reference/ufuncs.html#ufuncs
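Functions such as `np.sqrt` and `np.add` (the function behind `+`) are ufuncs: they work element by element. Binary ufuncs also provide methods such as `reduce` and `accumulate`. A small sketch with illustrative values:

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0])
roots = np.sqrt(a)                   # element-wise: 1.0, 2.0, 3.0
plus_one = np.add(a, 1)              # the ufunc behind a + 1
total = np.add.reduce(a)             # applies add across the array: 14.0, same as a.sum()
running = np.multiply.accumulate(a)  # running product: 1.0, 4.0, 36.0
print(roots, plus_one, total, running, sep="\n")
```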

### Broadcasting in operations between two arrays

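Basic operations between two arrays are usually applied element by element. When the two arrays have different shapes, broadcasting first brings them to a common shape. The basic rules are:

- If the arrays have different numbers of dimensions, the one with fewer dimensions is padded with size-1 dimensions on the left; e.g. a 3-element vector becomes a 1x3 array.
- Then, along each dimension (rows, columns, ...), the sizes must either match or one of them must be 1. A size-1 dimension is stretched (conceptually copied) to match the other array. If the sizes differ and neither is 1, numpy raises an error.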

https://numpy.org/doc/stable/user/basics.broadcasting.html
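The two rules can be written as a short function that only computes the resulting shape. `broadcast_shape` is a hypothetical helper written for this sketch, not a numpy function; numpy's own `np.broadcast_shapes` (available since NumPy 1.20) gives the same answers for these cases.

```python
import numpy as np

def broadcast_shape(s1, s2):
    """Hypothetical teaching helper: the shape produced by broadcasting two shapes."""
    # Rule 1: pad the shorter shape with 1s on the left
    n = max(len(s1), len(s2))
    s1 = (1,) * (n - len(s1)) + tuple(s1)
    s2 = (1,) * (n - len(s2)) + tuple(s2)
    out = []
    for d1, d2 in zip(s1, s2):
        # Rule 2: sizes must match, or one of them must be 1 (and gets stretched)
        if d1 == d2 or d2 == 1:
            out.append(d1)
        elif d1 == 1:
            out.append(d2)
        else:
            raise ValueError(f"incompatible sizes {d1} and {d2}")
    return tuple(out)

print(broadcast_shape((4, 3), (3,)))   # (4, 3)
print(broadcast_shape((4, 1), (3,)))   # (4, 3)
print(broadcast_shape((), (3,)))       # (3,): a scalar has shape ()
try:
    broadcast_shape((4, 3), (4,))
except ValueError as err:
    print(err)                         # incompatible sizes 3 and 4
```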

![](https://numpy.org/doc/stable/_images/broadcasting_1.png)

In [None]:
a = np.array([1, 2, 3])
b = 2
a * b

In [None]:
# The last step of the cell above, spelled out
b2 = [2, 2, 2]  # scalar 2 -> length-3 array
a * b2          # element-wise multiply

![](https://numpy.org/doc/stable/_images/broadcasting_2.png)

In [None]:
a = np.array([
    [ 0.,  0.,  0.],
    [10., 10., 10.],
    [20., 20., 20.],
    [30., 30., 30.]])

b = np.array([1, 2, 3])
a, b

In [None]:
a + b

In [None]:
# The last step of the cell above, spelled out

b2 = b.reshape(1, 3)          # 3 -> 1x3
b3 = np.vstack([b2]*4)        # 1x3 -> 4x3
a + b3                        # elementwise add

# Remember list multiplication? [b2]*4 repeats the list four times
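numpy can show the stretched version of `b` directly: `np.broadcast_to` returns a read-only view with the target shape, without copying data, while `np.tile` builds an explicit copy like `np.vstack([b2]*4)` above. A sketch comparing them:

```python
import numpy as np

a = np.array([[0., 0., 0.], [10., 10., 10.], [20., 20., 20.], [30., 30., 30.]])
b = np.array([1, 2, 3])
# np.broadcast_to builds the stretched 4x3 version of b as a read-only view
b_big = np.broadcast_to(b, a.shape)
print(b_big)
same_as_tile = np.array_equal(b_big, np.tile(b, (4, 1)))   # same values as an explicit copy
same_sum = np.array_equal(a + b, a + b_big)                # broadcasting gives the same sum
```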

--------------

🙋**Exercise**

Using the `x`, `y` and `z` below, how can you obtain the matrix `a` from this example through broadcasting?
```python
x = np.array([0, 1, 2, 3])
y = np.ones(3)
z = 10
```

--------------

![](https://numpy.org/doc/stable/_images/broadcasting_4.png)

In [None]:
a = np.array([[0], [10], [20], [30]])
b = np.array([1, 2, 3])
a, b

In [None]:
a + b

--------------

🙋**Exercise**

Following the earlier examples and the broadcasting rules, write out step by step how broadcasting works in this example.

--------------

## Further reading

- [Linear algebra functions in numpy](https://numpy.org/doc/stable/reference/routines.linalg.html)
- [String arrays](https://numpy.org/doc/stable/user/basics.strings.html)

--------------

🙋**Exercise**

Explain why `a` and `b` end up exactly equal in the example below. Which dtype would avoid this?
```python
a = np.array(['a', 'b', 'c'])
b = np.array(['a', 'b', 'd'])
b[2] = 'cat'
a == b
```

---------------

## Homework

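Study the second module of Andrew Ng's course [Neural Networks and Deep Learning](https://www.coursera.org/learn/neural-networks-deep-learning?specialization=deep-learning#modules) on your own. Based on the course material, use numpy to implement gradient descent for the parameters of a Logistic regression model. Use the following X and Y as training data: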

```python
import numpy as np
import pandas as pd
import seaborn as sns

iris = sns.load_dataset('iris')
iris = iris[iris['species'].isin(['setosa', 'virginica'])]
X = iris.iloc[:,0:4].to_numpy().T
Y = (iris['species'] == 'setosa').to_numpy().astype(int)
```
Tip: if `load_dataset` fails, download the data file directly from [here](https://raw.githubusercontent.com/mwaskom/seaborn-data/refs/heads/master/iris.csv) or from the homework folder of this course.