<a href="https://colab.research.google.com/github/mrdbourke/pytorch-deep-learning/blob/main/00_pytorch_fundamentals.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> 

[View Source Code](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/00_pytorch_fundamentals.ipynb) | [View Slides](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/slides/00_pytorch_and_deep_learning_fundamentals.pdf) | [Watch Video Walkthrough](https://youtu.be/Z_ikDlimN6A?t=76) 

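# 00. PyTorch Fundamentals

## What is PyTorch?

[PyTorch](https://pytorch.org/) is an open source machine learning and deep learning framework.

## What can PyTorch be used for?

PyTorch lets you manipulate and process data and write machine learning algorithms using Python code.

## Who uses PyTorch?

Many large technology companies such as [Meta (Facebook)](https://ai.facebook.com/blog/pytorch-builds-the-future-of-ai-and-machine-learning-at-facebook/), Tesla and Microsoft, as well as AI research companies such as [OpenAI](https://openai.com/blog/openai-pytorch/), use PyTorch.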

![pytorch being used across industry and research](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/00-pytorch-being-used-across-research-and-industry.png)

For example, Andrej Karpathy (head of AI at Tesla) has given several talks ([PyTorch DevCon 2019](https://youtu.be/oBklltKXtDE), [Tesla AI Day 2021](https://youtu.be/j0z4FweCy4M?t=2904)) about how Tesla use PyTorch to power their self-driving computer vision models.

PyTorch is also used in other industries such as agriculture to [power computer vision on tractors](https://medium.com/pytorch/ai-for-ag-production-machine-learning-for-agriculture-e8cfdb9849a1).

## Why use PyTorch?

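Machine learning researchers love using PyTorch. As of February 2022, PyTorch is the [most used deep learning framework on Papers With Code](https://paperswithcode.com/trends), a website that tracks machine learning research papers and the GitHub repositories attached to them.

PyTorch also takes care of many things behind the scenes, such as GPU acceleration,
so you can focus on manipulating data and writing algorithms.

And if companies such as Tesla and Meta (Facebook) use it to build the models they deploy to power hundreds of applications, drive thousands of cars and deliver content to billions of people, it's clearly capable on the development front too.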


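## What we're going to cover in this module

This course is broken down into different sections (notebooks).

Each notebook covers important ideas and concepts within PyTorch.

Subsequent notebooks build upon knowledge from the previous one (numbering starts at 00, 01, 02 and goes to whatever it ends up going to).

This notebook deals with the basic building block of machine learning and deep learning, the tensor.

Specifically, we're going to cover:

| **Topic** | **Contents** |
| ----- | ----- |
| **Introduction to tensors** | Tensors are the basic building block of all of machine learning and deep learning. |
| **Creating tensors** | Tensors can represent almost any kind of data (images, words, tables of numbers). |
| **Getting information from tensors** | If you can put information into a tensor, you'll want to get it out too. |
| **Manipulating tensors** | Machine learning algorithms (like neural networks) involve manipulating tensors in many different ways such as adding, multiplying, combining. |
| **Dealing with tensor shapes** | One of the most common issues in machine learning is dealing with shape mismatches (trying to mix wrong shaped tensors with other tensors). |
| **Indexing on tensors** | If you've indexed on a Python list or NumPy array, it's very similar with tensors, except they can have far more dimensions. |
| **Mixing PyTorch tensors and NumPy** | PyTorch plays with tensors ([`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html)), NumPy likes arrays ([`np.ndarray`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html)); sometimes you'll want to convert between the two. |
| **Reproducibility** | Machine learning is very experimental and since it uses a lot of *randomness* to work, sometimes you'll want that *randomness* to not be so random. |
| **Running tensors on GPU** | GPUs (Graphics Processing Units) make your code faster, and PyTorch makes it easy to run your code on GPUs. |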

## Where can you get help?

All of the materials for this course [live on GitHub](https://github.com/mrdbourke/pytorch-deep-learning).

If you run into trouble, you can ask a question on the [Discussions page](https://github.com/mrdbourke/pytorch-deep-learning/discussions).

There's also the [PyTorch developer forums](https://discuss.pytorch.org/), a very helpful place for all things PyTorch.

## Importing PyTorch
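> **Note:** Before running any of the code in this notebook, you should have gone through the [PyTorch setup steps](https://pytorch.org/get-started/locally/).
>
> However, **if you're running on Google Colab**, everything should work (Google Colab comes with PyTorch and other libraries installed).

Let's start by importing PyTorch and checking the version we're using.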

In [1]:
import torch
torch.__version__

'1.10.2'

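Wonderful, it looks like we've got PyTorch 1.10.0+.

These materials are compatible with PyTorch 1.10.0+. If your version is far higher than that, you might notice some inconsistencies.

If you do have any issues, please post on the course [GitHub Discussions page](https://github.com/mrdbourke/pytorch-deep-learning/discussions).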

## Introduction to tensors 

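Now we've imported PyTorch, it's time to learn about tensors.

Tensors are the fundamental building block of machine learning.

Their job is to represent data in a numerical way.

For example, you could represent an image as a tensor with shape `[3, 224, 224]` which would mean `[colour_channels, height, width]`, as in the image has `3` colour channels (red, green, blue), a height of `224` pixels and a width of `224` pixels.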

![example of going from an input image to a tensor representation of the image, image gets broken down into 3 colour channels as well as numbers to represent the height and width](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/00-tensor-shape-example-of-image.png)

In tensor-speak (the language used to describe tensors), the tensor would have three dimensions, one each for `colour_channels`, `height` and `width`.

Let's learn more about tensors by coding them.


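### Creating tensors

PyTorch loves tensors. So much so there's a whole documentation page dedicated to the [`torch.Tensor`](https://pytorch.org/docs/stable/generated/torch.tensor.html#torch.tensor) class.

Your first piece of homework is to [read through the documentation on `torch.Tensor`](https://pytorch.org/docs/stable/generated/torch.tensor.html#torch.tensor) for 10 minutes. But you can get to that later.

Let's code.

The first thing we're going to create is a **scalar**.

A scalar is a single number and in tensor-speak it's a zero dimension tensor.

> **Note:** That's a trend for this course. We'll focus on writing specific code. But often I'll set exercises which involve reading and getting familiar with the PyTorch documentation. Because after all, once you've finished this course, you'll no doubt want to learn more. And the documentation is where you'll find it.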

In [2]:
# Scalar
scalar = torch.tensor(7)
scalar

tensor(7)

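See how the above printed out `tensor(7)`?

That means although `scalar` is a single number, it's of type `torch.Tensor`.

We can check the number of dimensions of a tensor using the `ndim` attribute.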

In [3]:
scalar.ndim

0

What if we wanted to retrieve the number from the tensor?

As in, turn it from a `torch.Tensor` into a Python integer?

To do so we can use the `item()` method.

In [4]:
# Get the Python number within a tensor (only works with one-element tensors)
scalar.item()

7

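Okay, now let's see a **vector**.

A vector is a single dimension tensor but can contain many numbers.

As in, you could have a vector `[3, 2]` to describe `[bedrooms, bathrooms]` in your house. Or you could have `[3, 2, 2]` to describe `[bedrooms, bathrooms, car_parks]` in your house.

The important trend here is that a vector is flexible in what it can represent (the same goes for tensors).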

In [5]:
# Vector
vector = torch.tensor([7, 7])
vector

tensor([7, 7])

Wonderful, `vector` now contains two 7's, my favourite number.

How many dimensions do you think it'll have?

In [6]:
# Check the number of dimensions of vector
vector.ndim

1

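Hmm, that's strange, `vector` contains two numbers but only has a single dimension.

I'll let you in on a trick.

You can tell the number of dimensions a tensor in PyTorch has by counting the square brackets on the outside (`[`), and you only need to count one side.

How many square brackets does `vector` have on the outside?

Another important concept for tensors is their `shape` attribute. The shape tells you how the elements inside them are arranged.

Let's check out the shape of `vector`.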

In [7]:
# Check shape of vector
vector.shape

torch.Size([2])

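The above returns `torch.Size([2])` which means our vector has a shape of `[2]`. This is because of the two elements we placed inside the square brackets (`[7, 7]`).

Let's now see a **matrix**.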

In [8]:
# Matrix
MATRIX = torch.tensor([[7, 8], 
                       [9, 10]])
MATRIX

tensor([[ 7,  8],
        [ 9, 10]])

More numbers! A matrix is as flexible as a vector, except it has two dimensions.



In [9]:
# Check number of dimensions
MATRIX.ndim

2

`MATRIX` has two dimensions (you can count the number of square brackets on the left-hand side).

What `shape` do you think it will have?

In [10]:
MATRIX.shape

torch.Size([2, 2])

We get the output `torch.Size([2, 2])` because `MATRIX` is two elements deep and two elements wide.

How about we create a **tensor**?

In [11]:
# Tensor
TENSOR = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 4, 5]]])
TENSOR

tensor([[[1, 2, 3],
         [3, 6, 9],
         [2, 4, 5]]])

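What a nice looking tensor.

I want to stress that tensors can represent almost anything.

The one we just created could be the sales numbers for steak and almond butter (two of my favourite foods).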

![a simple tensor in google sheets showing day of week, steak sales and almond butter sales](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/00_simple_tensor.png)

How many dimensions do you think this tensor has? (hint: use the square bracket counting trick)

In [12]:
# Check number of dimensions for TENSOR
TENSOR.ndim

3

And what about its shape?

In [13]:
# Check shape of TENSOR
TENSOR.shape

torch.Size([1, 3, 3])

Alright, it outputs `torch.Size([1, 3, 3])`.

The dimensions go outer to inner.

That means there's 1 dimension of 3 by 3.
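
To tie `ndim` and `shape` together, here is a minimal sketch reusing the `TENSOR` values from above. It assumes nothing beyond standard `torch.Tensor` attributes: the number of dimensions is the length of the shape, and multiplying the shape entries gives the total number of elements.

```python
import torch

# Same values as TENSOR above
TENSOR = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 4, 5]]])

# ndim is simply the length of shape
print(TENSOR.ndim, len(TENSOR.shape))

# 1 * 3 * 3 = 9 elements in total
print(TENSOR.numel())

# Indexing into the outer dimension leaves the inner 3 by 3 matrix
print(TENSOR[0].shape)
```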

![example of different tensor dimensions](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/00-pytorch-different-tensor-dimensions.png)

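> **Note:** You might've noticed me using lowercase letters for `scalar` and `vector` and uppercase letters for `MATRIX` and `TENSOR`. This was on purpose. In practice, you'll often see scalars and vectors denoted as lowercase letters such as `y` or `a`, and matrices and tensors denoted as uppercase letters such as `X` or `W`.
>
> You also might notice the names matrix and tensor used interchangeably. This is common. Since in PyTorch you're often dealing with `torch.Tensor`s (hence the tensor name), the shape and dimensions of what's inside dictate what it actually is.

Let's summarise.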

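| Name | What is it? | Number of dimensions | Lower or upper (usually/example) |
| ----- | ----- | ----- | ----- |
| **scalar** | a single number | 0 | Lower (`a`) |
| **vector** | a number with direction (e.g. wind speed with direction) but can also hold many other numbers | 1 | Lower (`y`) |
| **matrix** | a 2-dimensional array of numbers | 2 | Upper (`Q`) |
| **tensor** | an n-dimensional array of numbers | can be any number; a 0-dimension tensor is a scalar, a 1-dimension tensor is a vector | Upper (`X`) |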

![scalar vector matrix tensor and what they look like](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/00-scalar-vector-matrix-tensor.png)

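### Random tensors

We've now created a few tensors.

Machine learning models usually manipulate tensors and learn from them.

But when building models, it's rare you'll create tensors by hand (like what we've been doing).

Instead, a model usually starts learning from random tensors.

In essence:

`Start with random numbers -> look at data -> update random numbers -> look at data -> update random numbers...`

You can define how the model starts (initialization), looks at data (representation) and updates its numbers (optimization).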
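
The loop above can be sketched on a toy problem. This is only an illustration under stated assumptions, not how the course trains models later on: the data, the single `weight`, the `learning_rate` value and the hand-written gradient of the mean squared error are all hypothetical choices made for this example.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so the sketch is repeatable

# Toy data: the pattern to discover is y = 2 * x
x = torch.arange(0., 5.)
y = 2 * x

# Start with a random number
weight = torch.rand(1)

learning_rate = 0.01  # illustrative value
for step in range(200):
    prediction = weight * x                        # look at data
    gradient = (2 * (prediction - y) * x).mean()   # slope of the mean squared error w.r.t. weight
    weight = weight - learning_rate * gradient     # update the number

print(weight)  # ends up very close to 2
```

The random starting value barely matters here: each update shrinks the gap between `weight` and 2, which is the "update random numbers" step in miniature.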



For now, let's see how to create a tensor of random numbers.

We can do so using [`torch.rand()`](https://pytorch.org/docs/stable/generated/torch.rand.html) and passing in the `size` parameter.

In [38]:
# Create a random tensor of size (3, 4)
random_tensor = torch.rand(size=(3, 4))
random_tensor, random_tensor.dtype

(tensor([[0.3630, 0.7051, 0.0136, 0.0967],
         [0.4488, 0.5798, 0.6672, 0.1789],
         [0.4929, 0.0945, 0.6258, 0.0679]]),
 torch.float32)

The `size` parameter can be adjusted to get a tensor of whatever shape you want.

For example, the common image shape of `[224, 224, 3]` (`[height, width, color_channels]`).

In [39]:
# Create a random tensor of size (224, 224, 3)
random_image_size_tensor = torch.rand(size=(224, 224, 3))
random_image_size_tensor.shape, random_image_size_tensor.ndim

(torch.Size([224, 224, 3]), 3)

### Zeros and ones

Sometimes you'll just want to fill tensors with zeros or ones.

This happens a lot with masking (like masking some of the values in one tensor with zeros to let a model know not to learn them).

Let's create a tensor full of zeros with [`torch.zeros()`](https://pytorch.org/docs/stable/generated/torch.zeros.html).

Again, the `size` parameter sets the shape.

In [40]:
# Create a tensor of all zeros
zeros = torch.zeros(size=(3, 4))
zeros, zeros.dtype

(tensor([[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]),
 torch.float32)

We can do the same to create a tensor of all ones, using [`torch.ones()`](https://pytorch.org/docs/stable/generated/torch.ones.html) instead.

In [41]:
# Create a tensor of all ones
ones = torch.ones(size=(3, 4))
ones, ones.dtype

(tensor([[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]),
 torch.float32)
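
As a rough sketch of the masking idea mentioned above (the `values` and `mask` tensors are made up for illustration), multiplying element-wise by a tensor of ones with a few zeros keeps some values and blanks out the rest.

```python
import torch

values = torch.tensor([[1., 2., 3.],
                       [4., 5., 6.]])

# Start from all ones (keep everything), then zero out the middle column
mask = torch.ones(size=(2, 3))
mask[:, 1] = 0

# Element-wise multiplication: anything multiplied by 0 is masked out
masked = values * mask
print(masked)
```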

### Creating a range and tensors like

Sometimes you might want a range of numbers, such as 1 to 10 or 0 to 100.

You can use `torch.arange(start, end, step)` to do so.

Where:
* `start` = start of range (e.g. 0)
* `end` = end of range, not included in the output (e.g. 10)
* `step` = the gap between consecutive values (e.g. 1)

> **Note:** In Python, you can use `range()` to create a range. However in PyTorch, `torch.range()` is deprecated and may be removed in the future.

In [42]:
# Use torch.arange(), torch.range() is deprecated 
zero_to_ten_deprecated = torch.range(0, 10) # Note: this may return an error in the future

# Create a range of values 0 to 10
zero_to_ten = torch.arange(start=0, end=10, step=1)
zero_to_ten


tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

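Sometimes you might want one tensor with the same shape as another tensor.

For example, a tensor of all zeros with the same shape as a previous tensor.

To do so you can use [`torch.zeros_like(input)`](https://pytorch.org/docs/stable/generated/torch.zeros_like.html) or [`torch.ones_like(input)`](https://pytorch.org/docs/1.9.1/generated/torch.ones_like.html), which return a tensor filled with zeros or ones respectively in the same shape as the `input`.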

In [43]:
# Can also create a tensor of zeros similar to another tensor
ten_zeros = torch.zeros_like(input=zero_to_ten) # will have same shape
ten_zeros

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
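
Two details are easy to miss here, so a small sketch may help (variable names are only illustrative): `end` is excluded from `torch.arange()`, and the `*_like` functions copy the datatype of `input` as well as its shape, which is why the zeros above print without decimal points.

```python
import torch

# end=10 is excluded, so a step of 2 stops at 8
evens = torch.arange(start=0, end=10, step=2)
print(evens)

# zeros_like keeps both the shape and the integer dtype of its input
zeros_int = torch.zeros_like(input=evens)
print(zeros_int.shape, zeros_int.dtype)

# ones_like on a float tensor gives float ones
ones_float = torch.ones_like(input=torch.rand(2, 3))
print(ones_float.shape, ones_float.dtype)
```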

### Tensor datatypes

There are many different [tensor datatypes available in PyTorch](https://pytorch.org/docs/stable/tensors.html#data-types).

Some are specific to the CPU and some to the GPU.

Getting to know which is which can take some time.

Generally if you see `torch.cuda` anywhere, the tensor is being used on the GPU (Nvidia GPUs use a computing toolkit called CUDA).

The most common type (and generally the default) is `torch.float32` or `torch.float`.

This is referred to as "32-bit floating point".

But there's also 16-bit floating point (`torch.float16` or `torch.half`) and 64-bit floating point (`torch.float64` or `torch.double`).

And there are also 8-bit, 16-bit, 32-bit and 64-bit integers.

Plus more!

> **Note:** An integer is a flat round number like `7` whereas a float has a decimal `7.0`.

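The reason for all of these is to do with **precision in computing**.

The higher the precision value (8, 16, 32), the more detail used to express a number.

But the more detail there is, the more compute each calculation needs.

So lower precision datatypes are generally faster to compute on but less accurate.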

> **Resources:** 
  * See the [PyTorch documentation for a list of all available tensor datatypes](https://pytorch.org/docs/stable/tensors.html#data-types).
  * Read the [Wikipedia page for an overview of what precision in computing](https://en.wikipedia.org/wiki/Precision_(computer_science)) is.
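
To make the trade-off concrete, here is a small sketch (the `sizes` and `errors` dictionaries are just for this example) that stores the value `0.1` at three floating point precisions and compares the bytes used per element with how far the stored value lands from `0.1`.

```python
import torch

sizes = {}
errors = {}
for dtype in (torch.float16, torch.float32, torch.float64):
    t = torch.tensor([0.1], dtype=dtype)
    sizes[dtype] = t.element_size()      # bytes used to store one element
    errors[dtype] = abs(t.item() - 0.1)  # distance from the Python float 0.1
    print(f"{dtype}: {sizes[dtype]} bytes per element, error {errors[dtype]}")
```

Fewer bytes per element means less memory and less work per operation, at the cost of a larger rounding error.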

Let's see how to create some tensors with specific datatypes. We can do so using the `dtype` parameter.

In [44]:
# Default datatype for tensors is float32
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # defaults to None, which infers the datatype from the data (torch.float32 for Python floats)
                               device=None, # defaults to None, which uses the current default device (CPU unless changed)
                               requires_grad=False) # if True, operations performed on the tensor are recorded 

float_32_tensor.shape, float_32_tensor.dtype, float_32_tensor.device

(torch.Size([3]), torch.float32, device(type='cpu'))

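Aside from shape issues (tensor shapes don't match up), two of the other most common issues you'll come across in PyTorch are datatype and device issues.

For example, one of the tensors is `torch.float32` and the other is `torch.float16` (PyTorch often likes tensors to be the same format).

Or one of your tensors is on the CPU and the other is on the GPU (PyTorch likes calculations between tensors to be on the same device).

For now let's create a tensor with `dtype=torch.float16`.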

In [45]:
float_16_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=torch.float16) # torch.half would also work

float_16_tensor.dtype

torch.float16
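
As a hedged illustration of what mixing datatypes can look like (the names `a32`, `a16`, `product` and `same` are invented for this sketch): element-wise operations between different float types are promoted to the wider type, but some operations are stricter, so casting explicitly is a safe habit.

```python
import torch

a32 = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float32)
a16 = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float16)

# Element-wise multiplication promotes the result to the wider datatype
product = a32 * a16
print(product, product.dtype)

# Casting first makes the intended datatype explicit
same = a32 * a16.type(torch.float32)
print(same.dtype)
```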

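## Getting information from tensors

Once you've created tensors, you'll want to get information from them.

The three most common attributes you'll want to find out about tensors are:
* `shape` - what shape is the tensor? (some operations require specific shape rules)
* `dtype` - what datatype are the elements within the tensor stored in?
* `device` - what device is the tensor stored on? (usually GPU or CPU)

Let's try them out: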

In [46]:
# Create a tensor
some_tensor = torch.rand(3, 4)

# Find out details about it
print(some_tensor)
print(f"Shape of tensor: {some_tensor.shape}")
print(f"Datatype of tensor: {some_tensor.dtype}")
print(f"Device tensor is stored on: {some_tensor.device}") # will default to CPU

tensor([[0.1381, 0.3072, 0.6008, 0.9146],
        [0.2293, 0.3471, 0.4247, 0.0779],
        [0.4762, 0.0860, 0.6812, 0.5242]])
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu
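
Since these three attributes get checked so often, one option is to wrap them in a small helper. The `describe()` function below is hypothetical (it is not part of PyTorch), just a convenience sketch for debugging.

```python
import torch

def describe(t: torch.Tensor) -> str:
    """Summarise the shape, datatype and device of a tensor in one line."""
    return f"shape={tuple(t.shape)} | dtype={t.dtype} | device={t.device}"

some_tensor = torch.rand(3, 4)
summary = describe(some_tensor)
print(summary)
```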


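> **Note:** When you run into issues in PyTorch, it's very often to do with one of the three attributes above. So when an error message shows up, check these three attributes, that is "what, what, where": 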
  * "*what shape are my tensors? what datatype are they and where are they stored? what shape, what datatype, where where where*"

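## Manipulating tensors (tensor operations)

In deep learning, data (images, text, video, audio, protein structures, etc) gets represented as tensors.

A model learns by investigating those tensors and performing a series of operations (could be 1,000,000s+) on them to learn the patterns in the data.

These operations include:
* Addition
* Subtraction
* Multiplication (element-wise)
* Division
* Matrix multiplication

There are a few more here and there, but these are the basic building blocks.

### Basic operations

Let's start with a few of the fundamental operations: addition (`+`), subtraction (`-`) and multiplication (`*`).

They work just as you think they would.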

In [47]:
# Create a tensor of values and add a number to it
tensor = torch.tensor([1, 2, 3])
tensor + 10

tensor([11, 12, 13])

In [48]:
# Multiply it by 10
tensor * 10

tensor([10, 20, 30])

Notice how the tensor values above didn't end up being `tensor([110, 120, 130])`. This is because the values inside the tensor don't change unless they're reassigned.

In [49]:
# Tensors don't change unless reassigned
tensor

tensor([1, 2, 3])

Let's try subtraction, and this time we'll reassign the `tensor` variable.

In [50]:
# Subtract and reassign
tensor = tensor - 10
tensor

tensor([-9, -8, -7])

In [27]:
# Add and reassign
tensor = tensor + 10
tensor

tensor([1, 2, 3])

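PyTorch also has built-in functions like [`torch.mul()`](https://pytorch.org/docs/stable/generated/torch.mul.html#torch.mul) (short for multiplication) and [`torch.add()`](https://pytorch.org/docs/stable/generated/torch.add.html) to perform these basic operations (`torch.multiply()`, used below, is an alias of `torch.mul()`).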

In [28]:
# Can also use torch functions
torch.multiply(tensor, 10)

tensor([10, 20, 30])

In [29]:
# Original tensor is still unchanged 
tensor

tensor([1, 2, 3])

However, it's more common to use the operator symbols like `*` instead of `torch.mul()`.

In [30]:
# Element-wise multiplication (each element multiplies its equivalent, index 0->0, 1->1, 2->2)
print(tensor, "*", tensor)
print("Equals:", tensor * tensor)

tensor([1, 2, 3]) * tensor([1, 2, 3])
Equals: tensor([1, 4, 9])
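
To check that the operator symbols and the built-in functions really are interchangeable, here is a quick sketch using `torch.equal()` (the `*_matches` names are only for illustration).

```python
import torch

tensor = torch.tensor([1, 2, 3])

# Each operator has a function counterpart that returns the same values
mul_matches = torch.equal(tensor * 10, torch.mul(tensor, 10))
alias_matches = torch.equal(tensor * 10, torch.multiply(tensor, 10))
add_matches = torch.equal(tensor + 10, torch.add(tensor, 10))
sub_matches = torch.equal(tensor - 10, torch.sub(tensor, 10))

print(mul_matches, alias_matches, add_matches, sub_matches)
```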


### Matrix multiplication (is all you need)

One of the most common operations in deep learning is [matrix multiplication](https://www.mathsisfun.com/algebra/matrix-multiplying.html).

PyTorch implements matrix multiplication in the [`torch.matmul()`](https://pytorch.org/docs/stable/generated/torch.matmul.html) method.

The two main rules for matrix multiplication to remember are:
1. The **inner dimensions** must match:
  * `(3, 2) @ (3, 2)` won't work
  * `(2, 3) @ (3, 2)` will work
  * `(3, 2) @ (2, 3)` will work
2. The resulting matrix has the shape of the **outer dimensions**:
 * `(2, 3) @ (3, 2)` -> `(2, 2)`
 * `(3, 2) @ (2, 3)` -> `(3, 3)`

> **Note:** "`@`" in Python is the symbol for matrix multiplication.

> **Resource:** You can see all of the rules for matrix multiplication using `torch.matmul()` [in the PyTorch documentation](https://pytorch.org/docs/stable/generated/torch.matmul.html).
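
The two rules can be checked directly on random tensors. This is a minimal sketch (the names `a`, `b` and `mismatch_raised` are arbitrary) that only looks at shapes, not values.

```python
import torch

a = torch.rand(2, 3)
b = torch.rand(3, 2)

# Inner dimensions match (3 and 3), result takes the outer dimensions
print((a @ b).shape)  # (2, 3) @ (3, 2) -> (2, 2)
print((b @ a).shape)  # (3, 2) @ (2, 3) -> (3, 3)

# Inner dimensions don't match (2 and 3), so PyTorch raises an error
mismatch_raised = False
try:
    b @ b  # (3, 2) @ (3, 2)
except RuntimeError as error:
    mismatch_raised = True
    print(f"Shape error: {error}")
```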

Let's get started!

In [51]:
import torch
tensor = torch.tensor([1, 2, 3])
tensor.shape

torch.Size([3])

The difference between element-wise multiplication and matrix multiplication is whether the values get added together.

For our `tensor` variable with values `[1, 2, 3]`:

| Operation | Calculation | Code |
| ----- | ----- | ----- |
| **Element-wise multiplication** | `[1*1, 2*2, 3*3]` = `[1, 4, 9]` | `tensor * tensor` |
| **Matrix multiplication** | `[1*1 + 2*2 + 3*3]` = `[14]` | `tensor.matmul(tensor)` |


In [52]:
# Element-wise matrix multiplication
tensor * tensor

tensor([1, 4, 9])

In [53]:
# Matrix multiplication
torch.matmul(tensor, tensor)

tensor(14)

In [54]:
# Can also use the "@" symbol for matrix multiplication, though not recommended
tensor @ tensor

tensor(14)

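You can do matrix multiplication by hand, but it's strongly discouraged.
The built-in `torch.matmul()` method is generally faster, although on a tensor as tiny as this three-element one a single timing is dominated by overhead and noise, so the two numbers below are not a reliable comparison.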

In [55]:
%%time
# Matrix multiplication by hand 
# (avoid doing operations with for loops at all cost, they are computationally expensive)
value = 0
for i in range(len(tensor)):
  value += tensor[i] * tensor[i]
value

Wall time: 517 µs


tensor(14)

In [56]:
%%time
torch.matmul(tensor, tensor)

Wall time: 997 µs


tensor(14)
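
The gap becomes much clearer on a larger tensor. The sketch below is only indicative: the size of 10,000 elements is an arbitrary choice and the timings will vary from machine to machine, so no expected numbers are given.

```python
import time
import torch

torch.manual_seed(0)
big_tensor = torch.rand(10_000)

# By hand: one Python-level multiply and add per element
start = time.perf_counter()
loop_value = 0.0
for i in range(len(big_tensor)):
    loop_value += (big_tensor[i] * big_tensor[i]).item()
loop_seconds = time.perf_counter() - start

# Built-in: a single call that does all of the work internally
start = time.perf_counter()
matmul_value = torch.matmul(big_tensor, big_tensor).item()
matmul_seconds = time.perf_counter() - start

print(f"for loop: {loop_seconds:.6f} s | torch.matmul: {matmul_seconds:.6f} s")
print(f"results agree: {abs(loop_value - matmul_value) < 1e-2 * matmul_value}")
```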

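## One of the most common errors in deep learning (shape errors)

Because of the two matrix multiplication rules above, you'll get an error whenever the shapes don't match.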

In [57]:
# Shapes need to be in the right way  
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11], 
                         [9, 12]], dtype=torch.float32)

torch.matmul(tensor_A, tensor_B) # (this will error)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)

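We can fix the error by making the inner dimensions of `tensor_A` and `tensor_B` match.

One of the ways to do this is with a **transpose** (switch the dimensions of a given tensor).

You can perform transposes in PyTorch using either: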
* `torch.transpose(input, dim0, dim1)` - where `input` is the desired tensor to transpose and `dim0` and `dim1` are the dimensions to be swapped.
* `tensor.T` - where `tensor` is the desired tensor to transpose.

Let's try the latter.

In [58]:
# View tensor_A and tensor_B
print(tensor_A)
print(tensor_B)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7., 10.],
        [ 8., 11.],
        [ 9., 12.]])


In [59]:
# View tensor_A and tensor_B.T
print(tensor_A)
print(tensor_B.T)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7.,  8.,  9.],
        [10., 11., 12.]])


In [60]:
# The operation works when tensor_B is transposed
print(f"Original shapes: tensor_A = {tensor_A.shape}, tensor_B = {tensor_B.shape}\n")
print(f"New shapes: tensor_A = {tensor_A.shape} (same as above), tensor_B.T = {tensor_B.T.shape}\n")
print(f"Multiplying: {tensor_A.shape} * {tensor_B.T.shape} <- inner dimensions match\n")
print("Output:\n")
output = torch.matmul(tensor_A, tensor_B.T)
print(output) 
print(f"\nOutput shape: {output.shape}")

Original shapes: tensor_A = torch.Size([3, 2]), tensor_B = torch.Size([3, 2])

New shapes: tensor_A = torch.Size([3, 2]) (same as above), tensor_B.T = torch.Size([2, 3])

Multiplying: torch.Size([3, 2]) * torch.Size([2, 3]) <- inner dimensions match

Output:

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])

Output shape: torch.Size([3, 3])


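You can also use [`torch.mm()`](https://pytorch.org/docs/stable/generated/torch.mm.html), which performs the same matrix multiplication as `torch.matmul()` when both inputs are 2-dimensional (unlike `torch.matmul()`, it doesn't broadcast).

In [62]:
# torch.mm does matrix multiplication for two 2-D tensors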
torch.mm(tensor_A, tensor_B.T)

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])
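
For completeness, here is a short sketch of the other transpose method listed earlier, `torch.transpose()`, along with the alternative fix of transposing `tensor_A` instead of `tensor_B`. Which tensor to transpose depends on what the rows and columns mean in your data, so treat this as an illustration of the shape rules only.

```python
import torch

tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

# torch.transpose with dims 0 and 1 gives the same result as .T for a 2-D tensor
same_transpose = torch.equal(torch.transpose(tensor_B, 0, 1), tensor_B.T)
print(same_transpose)

# Transposing tensor_A instead: (2, 3) @ (3, 2) -> (2, 2)
output = torch.matmul(tensor_A.T, tensor_B)
print(output, output.shape)
```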

Without the transpose, the rules of matrix multiplication aren't fulfilled and we get an error like above.

How about a visual? 

![visual demo of matrix multiplication](https://github.com/mrdbourke/pytorch-deep-learning/raw/main/images/00-matrix-multiply-crop.gif)

You can create your own matrix multiplication visuals like this at http://matrixmultiplication.xyz/.

> **Note:** A matrix multiplication like this is also referred to as the [**dot product**](https://www.mathsisfun.com/algebra/vectors-dot-product.html) of two matrices.



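Neural networks are full of matrix multiplications and dot products.

The [`torch.nn.Linear()`](https://pytorch.org/docs/1.9.1/generated/torch.nn.Linear.html) module (we'll see this in action later on), also known as a feed-forward layer or fully connected layer, implements a matrix multiplication between an input `x` and a weights matrix `A`.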

$$
y = x\cdot{A^T} + b
$$

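Where:
* `x` is the input to the layer (deep learning is often a stack of layers like `torch.nn.Linear()` on top of each other).
* `A` is the weights matrix of the layer. It usually starts out as random numbers that get adjusted as the network learns from data (notice the "`T`", that's because the weights matrix gets transposed).
  * **Note:** You might also often see `W` or another letter like `X` used to showcase the weights matrix.
* `b` is the bias term.
* `y` is the output (a manipulation of the input in the hopes to discover patterns in it).

This is a linear function (often written in the form $y = mx+b$), and its graph is a straight line.

Let's play around with a linear layer. Try changing the values of `in_features` and `out_features` below and see what happens.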

In [65]:
# Since the linear layer starts with a random weights matrix, let's make it reproducible (more on this later)
torch.manual_seed(42)
# This uses matrix multiplication
linear = torch.nn.Linear(in_features=2, # in_features = matches inner dimension of input 
                         out_features=6) # out_features = describes outer value 
x = tensor_A
output = linear(x)
print(f"Input shape: {x.shape}\n")
print(f"Output:\n{output}\n\nOutput shape: {output.shape}")

Input shape: torch.Size([3, 2])

Output:
tensor([[2.2368, 1.2292, 0.4714, 0.3864, 0.1309, 0.9838],
        [4.4919, 2.1970, 0.4469, 0.5285, 0.3401, 2.4777],
        [6.7469, 3.1648, 0.4224, 0.6705, 0.5493, 3.9716]],
       grad_fn=<AddmmBackward0>)

Output shape: torch.Size([3, 6])
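
To connect the layer back to the formula $y = x\cdot{A^T} + b$, the sketch below recomputes the output by hand from the layer's `weight` and `bias` attributes and compares it with calling the layer. The name `manual_output` and the small tolerance passed to `torch.allclose()` are choices made for this example.

```python
import torch

torch.manual_seed(42)
linear = torch.nn.Linear(in_features=2, out_features=6)

x = torch.tensor([[1., 2.],
                  [3., 4.],
                  [5., 6.]])

# linear.weight has shape (out_features, in_features) = (6, 2), hence the transpose
manual_output = x @ linear.weight.T + linear.bias

# Matches the layer's own output up to floating point rounding
matches = torch.allclose(linear(x), manual_output, atol=1e-6)
print(linear.weight.shape, manual_output.shape, matches)
```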


> **Question:** What happens if you change `in_features` from 2 to 3 above? Does it error? How could you change the shape of the input (`x`) to accommodate the error? Hint: what did we have to do to `tensor_B` above?

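If you've never done it before, matrix multiplication can be a confusing topic at first.

But once you're familiar with it, you'll start to notice it's everywhere.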

Remember, matrix multiplication is all you need.

![matrix multiplication is all you need](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/00_matrix_multiplication_is_all_you_need.jpeg)

*When you start digging into neural network layers and building your own, you'll find matrix multiplications everywhere. **Source:** https://marksaroufim.substack.com/p/working-class-deep-learner*

### Finding the min, max, mean, sum, etc (aggregation)

Now let's run through a few ways to aggregate tensors (go from more values to less values).

First we'll create a tensor and then find its max, min, mean and sum.



In [69]:
# Create a tensor
x = torch.arange(0, 100, 10)
x

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

Now let's perform some aggregation.

In [67]:
print(f"Minimum: {x.min()}")
print(f"Maximum: {x.max()}")
# print(f"Mean: {x.mean()}") # this will error
print(f"Mean: {x.type(torch.float32).mean()}") # won't work without float datatype
print(f"Sum: {x.sum()}")

Minimum: 0
Maximum: 90
Mean: 45.0
Sum: 450


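> **Note:** Some methods such as `torch.mean()` require tensors to be in `torch.float32` (the most common) or another specific datatype, otherwise the operation will fail. 

You can also do the same as above with the equivalent `torch` functions.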

In [None]:
torch.max(x), torch.min(x), torch.mean(x.type(torch.float32)), torch.sum(x)
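
To see the datatype requirement in action, here is a small sketch that calls `mean()` on the integer tensor inside a `try`/`except` block, then repeats it after converting to `torch.float32`. The exact error message depends on your PyTorch version, so it's printed rather than quoted.

```python
import torch

x = torch.arange(0, 100, 10)
print(x.dtype)  # integer datatype by default for integer inputs

# mean() needs a floating point (or complex) datatype
try:
    x.mean()
except RuntimeError as error:
    print(f"Error: {error}")

# Converting first makes it work
mean_value = x.type(torch.float32).mean()
print(mean_value)
```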

### Positional min/max

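You can also find the index of a tensor where the max or minimum occurs with [`torch.argmax()`](https://pytorch.org/docs/stable/generated/torch.argmax.html) and [`torch.argmin()`](https://pytorch.org/docs/stable/generated/torch.argmin.html) respectively.

This is helpful when you only need the position of the highest (or lowest) value and not the value itself (we'll see this in a later section when using the [softmax activation function](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html)).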

In [70]:
# Create a tensor
tensor = torch.arange(10, 100, 10)
print(f"Tensor: {tensor}")

# Returns index of max and min values
print(f"Index where max value occurs: {tensor.argmax()}")
print(f"Index where min value occurs: {tensor.argmin()}")

Tensor: tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])
Index where max value occurs: 8
Index where min value occurs: 0
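
The index and the value are two views of the same information: indexing the tensor with the result of `argmax()` gives back the maximum. A brief sketch (`max_index` is just an illustrative name):

```python
import torch

tensor = torch.arange(10, 100, 10)

# Position of the largest value
max_index = tensor.argmax()

# Using that position as an index recovers the largest value itself
print(max_index, tensor[max_index], tensor.max())
```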


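### Change tensor datatype

As mentioned, a common issue is having your tensors in the wrong datatypes.

If one tensor is in `torch.float64` and another is in `torch.float32`, you might run into some errors.

But there's a fix.

You can change the datatype of a tensor using [`torch.Tensor.type(dtype=None)`](https://pytorch.org/docs/stable/generated/torch.Tensor.type.html) where the `dtype` parameter is the datatype you'd like to use.

First we'll create a tensor and check its datatype (the default is `torch.float32`).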

In [71]:
# Create a tensor and check its datatype
tensor = torch.arange(10., 100., 10.)
tensor.dtype

torch.float32

Now we'll create another tensor the same as before but change its datatype to `torch.float16`.



In [73]:
# Create a float16 tensor
tensor_float16 = tensor.type(torch.float16)
tensor_float16

tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)

And we can do something similar to make a `torch.int8` tensor.

In [74]:
# Create a int8 tensor
tensor_int8 = tensor.type(torch.int8)
tensor_int8

tensor([10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=torch.int8)

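> **Note:** Different datatypes can be confusing to begin with. But think of it like this: the different types offer different precision (e.g. 32, 16, 8), and the lower the precision, the less storage is needed and the faster the computation. Mobile-based neural networks often operate with 8-bit integers, which are smaller and faster to run but less accurate than their float32 counterparts. For more on this, read up about [precision in computing](https://en.wikipedia.org/wiki/Precision_(computer_science)).

> **Exercise:** There's much more on tensors in the [`torch.Tensor` documentation](https://pytorch.org/docs/stable/tensors.html). I'd recommend spending 10 minutes scrolling through it.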

### Reshaping, stacking, squeezing and unsqueezing

Often you'll want to change the shape of a tensor without changing the values stored inside it.

Here are some methods for doing so:

| Method | One-line description |
| ----- | ----- |
| [`torch.reshape(input, shape)`](https://pytorch.org/docs/stable/generated/torch.reshape.html#torch.reshape) | Reshapes `input` to `shape` (if compatible), can also use `torch.Tensor.reshape()`. |
| [`torch.Tensor.view(shape)`](https://pytorch.org/docs/stable/generated/torch.Tensor.view.html) | Returns a view of the original tensor in a different `shape` but shares the same data as the original tensor. |
| [`torch.stack(tensors, dim=0)`](https://pytorch.org/docs/1.9.1/generated/torch.stack.html) | Concatenates a sequence of `tensors` along a new dimension (`dim`), all `tensors` must be same size. |
| [`torch.squeeze(input)`](https://pytorch.org/docs/stable/generated/torch.squeeze.html) | Squeezes `input` to remove all the dimensions with value `1`. |
| [`torch.unsqueeze(input, dim)`](https://pytorch.org/docs/1.9.1/generated/torch.unsqueeze.html) | Returns `input` with a dimension value of `1` added at `dim`. | 
| [`torch.permute(input, dims)`](https://pytorch.org/docs/stable/generated/torch.permute.html) | Returns a *view* of the original `input` with its dimensions permuted (rearranged) to `dims`. | 

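Why do any of these?

Because deep learning models (neural networks) are all about manipulating tensors in some way. And because of the rules of matrix multiplication, if you've got shape mismatches, you'll run into errors. These methods help you make sure the right elements of your tensors are mixing with the right elements of other tensors.

Let's try them out.
First, we'll create a tensor.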

In [75]:
# Create a tensor
import torch
x = torch.arange(1., 8.)
x, x.shape

(tensor([1., 2., 3., 4., 5., 6., 7.]), torch.Size([7]))

Now let's add an extra dimension with `torch.reshape()`.

In [76]:
# Add an extra dimension
x_reshaped = x.reshape(1, 7)
x_reshaped, x_reshaped.shape

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

We can also change the view with `torch.Tensor.view()`.

In [77]:
# Change view (keeps same data as original but changes view)
# See more: https://stackoverflow.com/a/54507446/7900723
z = x.view(1, 7)
z, z.shape

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

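Remember though, changing the view of a tensor with `torch.Tensor.view()` only creates a new view of the *same* tensor.
So changing the view changes the original tensor too.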

In [78]:
# Changing z changes x
z[:, 0] = 5
z, x

(tensor([[5., 2., 3., 4., 5., 6., 7.]]), tensor([5., 2., 3., 4., 5., 6., 7.]))
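
One way to confirm that a view shares its data is to compare memory addresses with `data_ptr()`. And if you want an independent copy instead, one option is to `clone()` before taking the view. A short sketch (variable names are illustrative):

```python
import torch

x = torch.arange(1., 8.)
z = x.view(1, 7)

# Both tensors point at the same underlying memory
shares_memory = z.data_ptr() == x.data_ptr()
print(shares_memory)

# Cloning first gives a separate copy, so x is left untouched
independent = x.clone().view(1, 7)
independent[:, 0] = 5
print(independent, x)
```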

If you want to stack tensors on top of each other, you can use `torch.stack()`.

In [79]:
# Stack tensors on top of each other
x_stacked = torch.stack([x, x, x, x], dim=0) # try changing dim to dim=1 and see what happens
x_stacked

tensor([[5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.]])
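
If you'd like to check your answer to the `dim=1` suggestion in the comment above, here is a sketch comparing the two options on a fresh tensor (`rows` and `cols` are made-up names).

```python
import torch

x = torch.arange(1., 8.)

# dim=0: each copy of x becomes a row -> shape (4, 7)
rows = torch.stack([x, x, x, x], dim=0)

# dim=1: each copy of x becomes a column -> shape (7, 4)
cols = torch.stack([x, x, x, x], dim=1)

print(rows.shape, cols.shape)
```

In this case stacking along `dim=1` gives the transpose of stacking along `dim=0`.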

How about removing all single dimensions from a tensor?

To do so you can use `torch.squeeze()` (I remember this as *squeezing* the tensor to only have dimensions over 1).

In [80]:
print(f"Previous tensor: {x_reshaped}")
print(f"Previous shape: {x_reshaped.shape}")

# Remove extra dimension from x_reshaped
x_squeezed = x_reshaped.squeeze()
print(f"\nNew tensor: {x_squeezed}")
print(f"New shape: {x_squeezed.shape}")

Previous tensor: tensor([[5., 2., 3., 4., 5., 6., 7.]])
Previous shape: torch.Size([1, 7])

New tensor: tensor([5., 2., 3., 4., 5., 6., 7.])
New shape: torch.Size([7])


And to do the reverse of `torch.squeeze()` you can use `torch.unsqueeze()` to add a dimension of size 1.

In [81]:
print(f"Previous tensor: {x_squeezed}")
print(f"Previous shape: {x_squeezed.shape}")

## Add an extra dimension with unsqueeze
x_unsqueezed = x_squeezed.unsqueeze(dim=0)
print(f"\nNew tensor: {x_unsqueezed}")
print(f"New shape: {x_unsqueezed.shape}")

Previous tensor: tensor([5., 2., 3., 4., 5., 6., 7.])
Previous shape: torch.Size([7])

New tensor: tensor([[5., 2., 3., 4., 5., 6., 7.]])
New shape: torch.Size([1, 7])


You can also rearrange the order of the axes with `torch.permute(input, dims)`, where the `input` gets turned into a *view* with new `dims`.

In [82]:
# Create tensor with specific shape
x_original = torch.rand(size=(224, 224, 3))

# Permute the original tensor to rearrange the axis order
x_permuted = x_original.permute(2, 0, 1) # shifts axis 0->1, 1->2, 2->0

print(f"Previous shape: {x_original.shape}")
print(f"New shape: {x_permuted.shape}")

Previous shape: torch.Size([224, 224, 3])
New shape: torch.Size([3, 224, 224])


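> **Note**: Because permuting returns a *view* (shares the same data as the original), the values in the permuted tensor will be the same as the original tensor and if you change the values in the view, it will change the values of the original.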
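
The shared data can be demonstrated in a few lines. In this sketch the value `42.0` is arbitrary: writing it into the original tensor makes it show up at the matching position of the permuted view.

```python
import torch

x_original = torch.rand(size=(224, 224, 3))
x_permuted = x_original.permute(2, 0, 1)

# Write into the original tensor...
x_original[0, 0, 0] = 42.0

# ...and the permuted view sees the same value at the corresponding position
print(x_original[0, 0, 0], x_permuted[0, 0, 0])
```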

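## Indexing (selecting data from tensors)

Sometimes you'll want to select specific data from tensors (for example, only the first column or second row).

To do so, you can use indexing.

If you've ever done indexing on Python lists or NumPy arrays, indexing in PyTorch with tensors is very similar.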

In [83]:
# Create a tensor 
import torch
x = torch.arange(1, 10).reshape(1, 3, 3)
x, x.shape

(tensor([[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]),
 torch.Size([1, 3, 3]))

Indexing values goes outer dimension -> inner dimension (check out the square brackets).

In [84]:
# Let's index bracket by bracket
print(f"First square bracket:\n{x[0]}") 
print(f"Second square bracket: {x[0][0]}") 
print(f"Third square bracket: {x[0][0][0]}")

First square bracket:
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
Second square bracket: tensor([1, 2, 3])
Third square bracket: 1


You can also use `:` to specify "all values in this dimension" and then use a comma (`,`) to move on to the next dimension.

In [85]:
# Get all values of 0th dimension and the 0 index of 1st dimension
x[:, 0]

tensor([[1, 2, 3]])

In [86]:
# Get all values of 0th & 1st dimensions but only index 1 of 2nd dimension
x[:, :, 1]

tensor([[2, 5, 8]])

In [87]:
# Get all values of the 0 dimension but only the 1 index value of the 1st and 2nd dimension
x[:, 1, 1]

tensor([5])

In [88]:
# Get index 0 of 0th and 1st dimension and all values of 2nd dimension 
x[0, 0, :] # same as x[0][0]

tensor([1, 2, 3])
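
A few more selections on the same tensor, as a practice sketch (the variable names are only for illustration): the last column, the middle row and the single bottom-right value.

```python
import torch

x = torch.arange(1, 10).reshape(1, 3, 3)

# All of the 0th and 1st dimensions, index 2 of the 2nd dimension (last column)
last_column = x[:, :, 2]

# All of the 0th dimension, index 1 of the 1st dimension (middle row)
middle_row = x[:, 1, :]

# One index per dimension returns a single element
bottom_right = x[0, 2, 2]

print(last_column, middle_row, bottom_right)
```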

Indexing can be quite confusing to begin with, especially with larger tensors (I still have to try indexing multiple times to get it right). But with a bit of practice and following the data explorer's motto (***visualize, visualize, visualize***), you'll start to get the hang of it.

## PyTorch tensors & NumPy

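Since NumPy is a very popular scientific computing library, PyTorch has functionality to interact with it nicely.  

The two main methods you'll want to use for going from NumPy to PyTorch (and back again) are: 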
* [`torch.from_numpy(ndarray)`](https://pytorch.org/docs/stable/generated/torch.from_numpy.html) - NumPy array -> PyTorch tensor. 
* [`torch.Tensor.numpy()`](https://pytorch.org/docs/stable/generated/torch.Tensor.numpy.html) - PyTorch tensor -> NumPy array.

Let's try them out.

In [89]:
# NumPy array to tensor
import torch
import numpy as np
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)
array, tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

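> **Note:** By default, NumPy arrays are created with the datatype `float64` and if you convert one to a PyTorch tensor, it'll keep the same datatype (as above). 
>
> However, many PyTorch calculations default to using `float32`. 
> 
> So if you want to convert your NumPy array (float64) -> PyTorch tensor (float64) -> PyTorch tensor (float32), you can use `tensor = torch.from_numpy(array).type(torch.float32)`.

Because `array = array + 1` below creates a new array and rebinds the name `array`, the tensor (which still refers to the original data) stays the same. Note that `torch.from_numpy()` shares memory with the source array, so an in-place change such as `array += 1` would show up in the tensor too.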

In [90]:
# Change the array, keep the tensor
array = array + 1
array, tensor

(array([2., 3., 4., 5., 6., 7., 8.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

And if you want to go from PyTorch tensor to NumPy array, you can call `tensor.numpy()`.

In [91]:
# Tensor to NumPy array
tensor = torch.ones(7) # create a tensor of ones with dtype=float32
numpy_tensor = tensor.numpy() # will be dtype=float32 unless changed
tensor, numpy_tensor

(tensor([1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))

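Likewise, `tensor.numpy()` shares memory with the tensor (for tensors on the CPU). `numpy_tensor` stays the same below only because `tensor = tensor + 1` creates a new tensor; an in-place operation such as `tensor.add_(1)` would change `numpy_tensor` as well.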

In [92]:
# Change the tensor, keep the array the same
tensor = tensor + 1
tensor, numpy_tensor

(tensor([2., 2., 2., 2., 2., 2., 2.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))
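
Going the other direction, the same memory-sharing caveat applies. The sketch below uses illustrative names (`shared_view`, `independent_copy`) and NumPy's `.copy()` as one possible way to detach the array from the tensor:

```python
import torch

tensor = torch.ones(7)
shared_view = tensor.numpy()              # shares memory with the CPU tensor
independent_copy = tensor.numpy().copy()  # separate memory, unaffected by later changes

tensor.add_(1)  # in-place add, modifies the shared memory

print(shared_view)       # follows the tensor
print(independent_copy)  # keeps the original ones
```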

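## Reproducibility (trying to take the random out of random)

Tensors created with functions such as `torch.rand()` are random, so they come out different on every run. If you'd rather get the same values each time, you can set the random seed manually.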


As you learn more about neural networks and machine learning, you'll start to discover how much randomness plays a part.

Well, pseudorandomness that is. Because after all, as they're designed, computers are fundamentally deterministic (each step is predictable), so the randomness they create is simulated randomness (though there is debate on this too, but since I'm not a computer scientist, I'll let you find out more yourself).

How does this relate to neural networks and deep learning then?

We've discussed how neural networks start with random numbers to describe patterns in data (these numbers are poor descriptions) and try to improve those random numbers using tensor operations (and a few other things we haven't discussed yet) to better describe patterns in data.

In short: 

``start with random numbers -> tensor operations -> try to make better (again and again and again)``

Although randomness is nice and powerful, sometimes you'd like there to be a little less randomness.

Why?

So you can perform repeatable experiments.

For example, you create an algorithm capable of achieving X performance.

And then your friend tries it out to verify you're not crazy.

How could they do such a thing?

That's where **reproducibility** comes in.

In other words, can you get the same (or very similar) results on your computer running the same code as I get on mine?

Let's see a brief example of reproducibility in PyTorch.

We'll start by creating two random tensors. Since they're random, you'd expect them to be different, right?

In [94]:
import torch

# Create two random tensors
random_tensor_A = torch.rand(3, 4)
random_tensor_B = torch.rand(3, 4)

print(f"Tensor A:\n{random_tensor_A}\n")
print(f"Tensor B:\n{random_tensor_B}\n")
print(f"Does Tensor A equal Tensor B? (anywhere)")
random_tensor_A == random_tensor_B

Tensor A:
tensor([[0.4563, 0.9719, 0.3968, 0.1496],
        [0.4743, 0.9973, 0.4436, 0.9726],
        [0.5194, 0.5337, 0.7050, 0.3362]])

Tensor B:
tensor([[0.7891, 0.1694, 0.1800, 0.7177],
        [0.6988, 0.5510, 0.2485, 0.8518],
        [0.0963, 0.1338, 0.2741, 0.6142]])

Does Tensor A equal Tensor B? (anywhere)


tensor([[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]])

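Just as you might've expected, the two random tensors come out with different values.

But what if you wanted to create two random tensors with the *same* values?

That's where [`torch.manual_seed(seed)`](https://pytorch.org/docs/stable/generated/torch.manual_seed.html) comes in, where `seed` is an integer (like `42` but it could be anything) that flavours the randomness.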

Let's try it out by creating some more *flavoured* random tensors.

In [95]:
import torch

# Set the random seed
RANDOM_SEED = 42 # try changing this to different values and see what happens to the numbers below
torch.manual_seed(seed=RANDOM_SEED)
random_tensor_C = torch.rand(3, 4)

# Reset the seed before the next rand() call to replay the same sequence of numbers
# Without this, random_tensor_D would be different to random_tensor_C
torch.manual_seed(seed=RANDOM_SEED) # try commenting this line out and seeing what happens
random_tensor_D = torch.rand(3, 4)

print(f"Tensor C:\n{random_tensor_C}\n")
print(f"Tensor D:\n{random_tensor_D}\n")
print(f"Does Tensor C equal Tensor D? (anywhere)")
random_tensor_C == random_tensor_D

Tensor C:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Tensor D:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Does Tensor C equal Tensor D? (anywhere)


tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])

Nice!

It looks like setting the seed worked.
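
Resetting the global seed before every call is one way to get matching tensors. Another option, sketched here as an alternative rather than what this notebook uses, is to give each stream of random numbers its own `torch.Generator` object. The names `gen_a`, `gen_b`, `tensor_e`, `tensor_f` and `tensor_g` are made up for this example.

```python
import torch

# Two independent generators seeded with the same value
gen_a = torch.Generator().manual_seed(42)
gen_b = torch.Generator().manual_seed(42)

tensor_e = torch.rand(3, 4, generator=gen_a)  # first draw from gen_a
tensor_f = torch.rand(3, 4, generator=gen_b)  # first draw from gen_b
tensor_g = torch.rand(3, 4, generator=gen_a)  # second draw from gen_a

print(torch.equal(tensor_e, tensor_f))  # same seed, same position in the sequence
print(torch.equal(tensor_e, tensor_g))  # same seed, later position in the sequence
```

The first comparison matches because both generators start from the same seed, while the second draw from `gen_a` continues the sequence and so produces new values. This is the same reason the seed had to be reset before creating `random_tensor_D` above.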

> **Resource:** What we've just covered only scratches the surface of reproducibility in PyTorch. For more on reproducibility in general and random seeds, I'd check out:
> * [The PyTorch reproducibility documentation](https://pytorch.org/docs/stable/notes/randomness.html) (a good exercise would be to read through this for 10 minutes and even if you don't understand it now, being aware of it is important).
> * [The Wikipedia random seed page](https://en.wikipedia.org/wiki/Random_seed) (this'll give a good overview of random seeds and pseudorandomness in general).

## Running tensors on GPUs (and making faster computations)

Deep learning algorithms require a lot of numerical operations.

And by default these operations are often done on a CPU (central processing unit).

However, there's another common piece of hardware called a GPU (graphics processing unit), which is often much faster at performing the specific types of operations neural networks need (matrix multiplications) than CPUs.

Your computer might have one.

If so, you should look to use it whenever you can to train neural networks because chances are it'll speed up the training time dramatically.

There are a few ways to first get access to a GPU and secondly get PyTorch to use the GPU.

> **Note:** When I reference "GPU" throughout this course, I'm referencing an [NVIDIA GPU with CUDA](https://developer.nvidia.com/cuda-gpus) enabled (CUDA is a computing platform and API that allows GPUs to be used for general purpose computing and not just graphics) unless otherwise specified.




### 1. Getting a GPU

You may already know what's going on when I say GPU. But if not, there are a few ways to get access to one.

| **Method** | **Difficulty to setup** | **Pros** | **Cons** | **How to setup** |
| ----- | ----- | ----- | ----- | ----- |
| Google Colab | Easy | Free to use, almost zero setup required, can share work with others as easy as a link | Doesn't save your data outputs, limited compute, subject to timeouts | [Follow the Google Colab Guide](https://colab.research.google.com/notebooks/gpu.ipynb) |
| Use your own | Medium | Run everything locally on your own machine | GPUs aren't free, require upfront cost | Follow the [PyTorch installation guidelines](https://pytorch.org/get-started/locally/) |
| Cloud computing (AWS, GCP, Azure) | Medium-Hard | Small upfront cost, access to almost infinite compute | Can get expensive if running continually, takes some time to set up right | Follow the [PyTorch installation guidelines](https://pytorch.org/get-started/cloud-partners/) |

There are more options for using GPUs but the above three will suffice for now.

Personally, I use a combination of Google Colab and my own personal computer for small scale experiments (and creating this course) and go to cloud resources when I need more compute power.

> **Resource:** If you're looking to purchase a GPU of your own but not sure what to get, [Tim Dettmers has an excellent guide](https://timdettmers.com/2020/09/07/which-gpu-for-deep-learning/).

To check if you've got access to an NVIDIA GPU, you can run `!nvidia-smi`, where the `!` (also called bang) means "run this on the command line".



In [99]:
!nvidia-smi

'nvidia-smi' is not recognized as an internal or external command,
operable program or batch file.


If you don't have a Nvidia GPU accessible, the above will output something like:

```
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
```

In that case, go back up and follow the install steps.

If you do have a GPU, the line above will output something like:

```
Wed Jan 19 22:09:08 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.46       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    27W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```



### 2. Getting PyTorch to run on the GPU

Once you've got a GPU ready to access, the next step is getting PyTorch to use it for storing data (tensors) and computing on data (performing operations on tensors).

To do so, you can use the [`torch.cuda`](https://pytorch.org/docs/stable/cuda.html) package.

Rather than talk about it, let's try it out.

You can test if PyTorch has access to a GPU using [`torch.cuda.is_available()`](https://pytorch.org/docs/stable/generated/torch.cuda.is_available.html#torch.cuda.is_available).


In [107]:
# Check for GPU
import torch
torch.cuda.is_available()

True

If the above outputs `True`, PyTorch can see and use the GPU. If it outputs `False`, it can't see the GPU and in that case, you'll have to go back through the installation steps.

Now, let's say you wanted to set up your code so it ran on the CPU *or* the GPU if it was available.

That way, if you or someone decides to run your code, it'll work regardless of the computing device they're using. 

Let's create a `device` variable to store what kind of device is available.

In [108]:
# Set device type
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

If the above output `"cuda"` it means we can set all of our PyTorch code to use the available CUDA device (a GPU) and if it output `"cpu"`, our PyTorch code will stick with the CPU.

> **Note:** In PyTorch, it's best practice to write [**device agnostic code**](https://pytorch.org/docs/master/notes/cuda.html#device-agnostic-code). This means code that'll run on CPU (always available) or GPU (if available).
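
As a minimal sketch of what device agnostic code can look like in practice, the example below creates tensors directly on whichever device was detected by passing `device=` to the creation functions. The tensor names are only illustrative.

```python
import torch

# Pick the GPU if PyTorch can see one, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Creation functions accept a `device` argument, so no separate move is needed
tensor_on_device = torch.rand(3, 4, device=device)
ones_on_device = torch.ones(3, 4, device=device)

# Operations between tensors on the same device stay on that device
result = tensor_on_device + ones_on_device
print(result.device)
```

The same script runs unchanged on a laptop without a GPU and on a GPU machine, because only the `device` string differs.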

If you want to do faster computing you can use a GPU but if you want to do *much* faster computing, you can use multiple GPUs.

You can count the number of GPUs PyTorch has access to using [`torch.cuda.device_count()`](https://pytorch.org/docs/stable/generated/torch.cuda.device_count.html#torch.cuda.device_count).

In [109]:
# Count number of devices
torch.cuda.device_count()

1

Knowing the number of GPUs PyTorch has access to is helpful in case you want to run a specific process on one GPU and another process on another (PyTorch also has features to let you run a process across *all* GPUs).
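
If you'd like to see which GPUs those counts refer to, a short sketch using `torch.cuda.get_device_name()` is below. The `gpu_names` list is just a name chosen for this example, and on a machine without CUDA the list is simply empty.

```python
import torch

# Collect the name of every CUDA device PyTorch can see (empty list if there are none)
gpu_names = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]

if gpu_names:
    for index, name in enumerate(gpu_names):
        print(f"cuda:{index} -> {name}")
else:
    print("No CUDA GPUs visible to PyTorch, code will run on the CPU.")
```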

### 3. Putting tensors (and models) on the GPU

You can put tensors (and models, we'll see this later) on a specific device by calling [`to(device)`](https://pytorch.org/docs/stable/generated/torch.Tensor.to.html) on them. Where `device` is the target device you'd like the tensor (or model) to go to.

Why do this?

GPUs offer far faster numerical computing than CPUs do and if a GPU isn't available, because of our **device agnostic code** (see above), it'll run on the CPU.

> **Note:** Putting a tensor on the GPU using `to(device)` (e.g. `some_tensor.to(device)`) returns a copy of that tensor on the target device, i.e. the same data will then exist on both CPU and GPU. To overwrite tensors, reassign them:
>
> `some_tensor = some_tensor.to(device)`

Let's try creating a tensor and putting it on the GPU (if it's available).

In [110]:
# Create tensor (default on CPU)
tensor = torch.tensor([1, 2, 3])

# Tensor not on GPU
print(tensor, tensor.device)

# Move tensor to GPU (if available)
tensor_on_gpu = tensor.to(device)
tensor_on_gpu

tensor([1, 2, 3]) cpu


tensor([1, 2, 3], device='cuda:0')

If you have a GPU available, the above code will output something like:

```
tensor([1, 2, 3]) cpu
tensor([1, 2, 3], device='cuda:0')
```

Notice the second tensor has `device='cuda:0'`, this means it's stored on the 0th GPU available (GPUs are 0 indexed, if two GPUs were available, they'd be `'cuda:0'` and `'cuda:1'` respectively, up to `'cuda:n'`).
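
One detail worth seeing for yourself is what `to(device)` returns when no move is needed. The sketch below (with the illustrative name `moved`) relies on the documented behaviour that `Tensor.to()` returns the tensor itself when it already has the requested device and dtype, and a copy otherwise.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

tensor = torch.tensor([1, 2, 3])  # created on the CPU by default
moved = tensor.to(device)         # copy on the GPU, or the same object if device is "cpu"

print(tensor.device)    # the original stays on the CPU either way
print(moved.device)     # cpu or cuda:0 depending on the machine
print(moved is tensor)  # True only when no move was needed
```

This is why reassigning (`some_tensor = some_tensor.to(device)`) is safe in device agnostic code: on a CPU-only machine it costs nothing, and on a GPU machine it swaps the name over to the GPU copy.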



### 4. Moving tensors back to the CPU

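What if we wanted to move the tensor back to the CPU?

For example, you'll want to do this if you want to convert the tensor to a NumPy array (NumPy does not use the GPU).

Let's try calling [`torch.Tensor.numpy()`](https://pytorch.org/docs/stable/generated/torch.Tensor.numpy.html) directly on our `tensor_on_gpu` (this will error).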

In [111]:
# If tensor is on GPU, can't transform it to NumPy (this will error)
tensor_on_gpu.numpy()

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

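Instead, to get the tensor back to the CPU and usable with NumPy, you can use [`Tensor.cpu()`](https://pytorch.org/docs/stable/generated/torch.Tensor.cpu.html).

This copies the tensor to CPU memory so it's usable with NumPy.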

In [115]:
# Instead, copy the tensor back to cpu
tensor_back_on_cpu = tensor_on_gpu.cpu().numpy()
tensor_back_on_cpu

array([1, 2, 3], dtype=int64)

The above returns a copy of the GPU tensor in CPU memory, so the original tensor is still on the GPU.

In [114]:
tensor_on_gpu

tensor([1, 2, 3], device='cuda:0')
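
To keep the conversion device agnostic, the `.cpu().numpy()` pattern can be wrapped in a small helper. The function name `to_numpy` is hypothetical (it is not part of PyTorch), and this sketch only covers tensors that don't track gradients, which is all we've created so far.

```python
import numpy as np
import torch


def to_numpy(tensor: torch.Tensor) -> np.ndarray:
    """Hypothetical helper: return a NumPy array for a tensor on any device."""
    # .cpu() copies a GPU tensor to CPU memory and is effectively free for a CPU tensor
    return tensor.cpu().numpy()


device = "cuda" if torch.cuda.is_available() else "cpu"
tensor_on_device = torch.tensor([1, 2, 3]).to(device)

array = to_numpy(tensor_on_device)
print(type(array), array)
```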

## Exercises

All of the exercises are focused on practicing the code above.

You should be able to complete them by referencing each section or by following the resource(s) linked.

**Resources:**

* [Exercise template notebook for 00](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/extras/exercises/00_pytorch_fundamentals_exercises.ipynb).
* [Example solutions notebook for 00](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/extras/solutions/00_pytorch_fundamentals_exercise_solutions.ipynb) (try the exercises *before* looking at this).

1. Documentation reading - A big part of deep learning (and learning to code in general) is getting familiar with the documentation of a certain framework you're using. We'll be using the PyTorch documentation a lot throughout the rest of this course. So I'd recommend spending 10-minutes reading the following (it's okay if you don't get some things for now, the focus is not yet full understanding, it's awareness). See the documentation on [`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html#torch-tensor) and for [`torch.cuda`](https://pytorch.org/docs/master/notes/cuda.html#cuda-semantics).
2. Create a random tensor with shape `(7, 7)`.
3. Perform a matrix multiplication on the tensor from 2 with another random tensor with shape `(1, 7)` (hint: you may have to transpose the second tensor).
4. Set the random seed to `0` and do exercises 2 & 3 over again.
5. Speaking of random seeds, we saw how to set it with `torch.manual_seed()` but is there a GPU equivalent? (hint: you'll need to look into the documentation for `torch.cuda` for this one). If there is, set the GPU random seed to `1234`.
6. Create two random tensors of shape `(2, 3)` and send them both to the GPU (you'll need access to a GPU for this). Set `torch.manual_seed(1234)` when creating the tensors (this doesn't have to be the GPU random seed).
7. Perform a matrix multiplication on the tensors you created in 6 (again, you may have to adjust the shapes of one of the tensors).
8. Find the maximum and minimum values of the output of 7.
9. Find the maximum and minimum index values of the output of 7.
10. Make a random tensor with shape `(1, 1, 1, 10)` and then create a new tensor with all the `1` dimensions removed to be left with a tensor of shape `(10)`. Set the seed to `7` when you create it and print out the first tensor and its shape as well as the second tensor and its shape.

## Extra-curriculum

* Spend 1-hour going through the [PyTorch basics tutorial](https://pytorch.org/tutorials/beginner/basics/intro.html) (I'd recommend the [Quickstart](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html) and [Tensors](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html) sections).
* To learn more on how a tensor can represent data, see this video: [What's a tensor?](https://youtu.be/f5liqUk0ZTw)