<a href="https://colab.research.google.com/github/hemu2014/python-ML/blob/main/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0%E5%88%9D%E7%BA%A7-%E8%AF%86%E5%88%AB%E6%95%B0%E5%AD%97/site/zh-cn/tutorials/quickstart/beginner.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##### Copyright 2019 The TensorFlow Authors.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# TensorFlow 2.0 quickstart for beginners

<table class="tfo-notebook-buttons" align="left">
  <td>     <a target="_blank" href="https://tensorflow.google.cn/tutorials/quickstart/beginner"><img src="https://tensorflow.google.cn/images/tf_logo_32px.png">View on TensorFlow.org</a>   </td>
  <td><a target="_blank" href="https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/tutorials/quickstart/beginner.ipynb"><img src="https://tensorflow.google.cn/images/colab_logo_32px.png">Run in Google Colab</a></td>
  <td>     <a target="_blank" href="https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/tutorials/quickstart/beginner.ipynb"><img src="https://tensorflow.google.cn/images/GitHub-Mark-32px.png">View source on GitHub</a>   </td>
  <td>     <a href="https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/zh-cn/tutorials/quickstart/beginner.ipynb"><img src="https://tensorflow.google.cn/images/download_logo_32px.png">Download notebook</a>   </td>
</table>

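This short introduction uses [Keras](https://tensorflow.google.cn/guide/keras/overview) to:

1. Load a prebuilt dataset.
2. Build a neural network machine learning model that classifies images.
3. Train this neural network.
4. Evaluate the accuracy of the model.

This tutorial is a [Google Colaboratory](https://colab.research.google.com/notebooks/welcome.ipynb) notebook. Python programs run directly in the browser, which is a great way to learn TensorFlow. To follow this tutorial, click the button at the top of this page to run the notebook in Google Colab.

1. In Colab, connect to a Python runtime: at the top right of the menu bar, select *CONNECT*.
2. To run all the code in the notebook, select **Runtime** &gt; **Run all**. To run the code cells one at a time, hover over each cell and select the **Run cell** icon.

![Run cell icon](https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/tutorials/quickstart/images/beginner/run_cell_icon.png?raw=1)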

## Set up TensorFlow

Import TensorFlow into your program to get started:

In [1]:
import tensorflow as tf

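If you are working in your own development environment rather than [Colab](https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/quickstart/beginner.ipynb), see the [install guide](https://tensorflow.google.cn/install) for setting up TensorFlow for development.

Note: If you are using your own development environment, make sure you have upgraded to the latest `pip` to install the TensorFlow 2 package. See the [install guide](https://tensorflow.google.cn/install) for details.

## Load a dataset

Load and prepare the [MNIST dataset](http://yann.lecun.com/exdb/mnist/). The pixel values of the images range from 0 through 255. Dividing by `255.0` scales these values to a range of 0 to 1 and also converts the sample data from integers to floating-point numbers.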
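As a minimal sketch of what that division does, the snippet below applies it to a tiny made-up 2x2 "image" in plain Python instead of the real MNIST arrays. The names `raw_image` and `scaled_image` are hypothetical and exist only for this illustration.

```python
# Hypothetical 2x2 grayscale image with integer pixel values in [0, 255].
raw_image = [[0, 64], [128, 255]]

# Dividing by the float 255.0 rescales every pixel into [0, 1]
# and, as a side effect, turns the integers into floats.
scaled_image = [[pixel / 255.0 for pixel in row] for row in raw_image]

print(scaled_image)
```

The notebook cell does the same thing element-wise on the whole NumPy array at once.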

In [2]:
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


## Build a machine learning model

Build a `tf.keras.Sequential` model:



The cell below defines a sequential model with the Keras API in TensorFlow. The model classifies images from the MNIST dataset of handwritten digits. Layer by layer:

* `tf.keras.models.Sequential`: creates a `Sequential` model, a linear stack of layers in which each layer passes its output to the next. It is a simple and common way to build a neural network.
* `tf.keras.layers.Flatten(input_shape=(28, 28))`: the first layer. It reshapes each 28x28 pixel image into a single 784-element vector, so that the `Dense` layer that follows sees each image as one feature vector.
* `tf.keras.layers.Dense(128, activation='relu')`: a fully connected layer with 128 units; every unit is connected to all 784 values from the previous layer. The `relu` activation introduces non-linearity, which lets the model learn more complex patterns.
* `tf.keras.layers.Dropout(0.2)`: randomly sets 20% of its inputs to 0 during training and is inactive at inference. This helps prevent overfitting, where the model does well on training data but poorly on unseen data.
* `tf.keras.layers.Dense(10)`: the final layer, with 10 units, one per digit class (0-9). It outputs a vector of 10 raw scores (logits or log-odds). A softmax function is typically applied afterwards to convert these scores into probabilities.

In summary, the network is a flattened 28x28 input, a 128-unit hidden layer with `relu`, a dropout layer for regularization, and a 10-unit output layer for classification.
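To make the layer-by-layer description concrete, here is a rough pure-Python sketch of the forward pass through `Flatten`, `Dense(128, relu)` and `Dense(10)`. It is not how Keras implements these layers; the helper names (`flatten`, `dense`, `relu`) and the random weights are assumptions made for illustration, and dropout is left out because it is inactive outside training.

```python
import random

def flatten(image):
    """Turn a 2-D grid (list of rows) into one flat list, like Flatten."""
    return [pixel for row in image for pixel in row]

def relu(value):
    """Rectified linear unit: negative values become 0."""
    return max(0.0, value)

def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: one weighted sum (plus bias) per output unit."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(activation(total) if activation else total)
    return outputs

random.seed(0)

# Hypothetical 28x28 image with pixel values already scaled to [0, 1).
image = [[random.random() for _ in range(28)] for _ in range(28)]

# Hypothetical small random weights; a real model learns these in training.
hidden_w = [[random.uniform(-0.05, 0.05) for _ in range(784)] for _ in range(128)]
hidden_b = [0.0] * 128
out_w = [[random.uniform(-0.05, 0.05) for _ in range(128)] for _ in range(10)]
out_b = [0.0] * 10

flat = flatten(image)                                       # 784 values
hidden = dense(flat, hidden_w, hidden_b, activation=relu)   # 128 values
logits = dense(hidden, out_w, out_b)                        # 10 raw scores

print(len(flat), len(hidden), len(logits))
```

The three lengths correspond to the 784-element input vector, the 128 hidden units and the 10 output logits described above.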



In [3]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])


[`Sequential`](https://tensorflow.google.cn/guide/keras/sequential_model) is useful for stacking layers where each layer has one input [tensor](https://tensorflow.google.cn/guide/tensor) and one output tensor. Layers are functions with a known mathematical structure that can be reused and have trainable variables. Most TensorFlow models are composed of layers. This model uses the [`Flatten`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Flatten), [`Dense`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Dense), and [`Dropout`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Dropout) layers.
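Of the three layer types, `Dropout` is probably the least intuitive. The following is a minimal pure-Python sketch of the idea, assuming the "inverted dropout" scheme in which surviving values are scaled up by `1 / (1 - rate)` during training. The function name `dropout` and its `training` and `rng` parameters are hypothetical, not the Keras API.

```python
import random

def dropout(inputs, rate, training, rng):
    """Inverted dropout: zero a random subset and rescale the survivors."""
    if not training:
        # At inference time the layer passes its input through unchanged.
        return list(inputs)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else x * keep_scale for x in inputs]

rng = random.Random(42)
activations = [1.0] * 10000

# During training, roughly `rate` of the values are set to 0.
dropped = dropout(activations, rate=0.2, training=True, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
print(round(zero_fraction, 2))

# Outside training, nothing is dropped.
unchanged = dropout(activations, rate=0.2, training=False, rng=rng)
```

Scaling the kept values keeps the expected sum of the activations the same whether or not dropout is active.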

For each example, the model returns a vector of [logits](https://developers.google.com/machine-learning/glossary#logits) or [log-odds](https://developers.google.com/machine-learning/glossary#log-odds) scores, one for each class.



The next cell runs the model, which has not been trained yet, on the first training image and displays the result:

* `x_train[:1]`: selects the first image from the training set `x_train` as a batch of one, with shape `(1, 28, 28)`.
* `model(x_train[:1])`: feeds that image through the model. The output is a TensorFlow tensor of 10 logits, one raw score per digit class (0-9). Because the weights still have their random initial values, these scores are not yet meaningful.
* `.numpy()`: converts the tensor to a NumPy array, a common format for numerical data in Python that is easy to inspect and manipulate.
* `predictions`: a bare variable name on the last line of a Jupyter notebook cell displays its value, here an array of 10 numbers.



In [4]:
predictions = model(x_train[:1]).numpy()
predictions

array([[-0.0447921 , -0.48021442,  0.3377464 ,  0.3623109 ,  0.01883672,
         0.5271586 ,  0.32786715,  0.7365732 , -0.2867936 , -0.45823872]],
      dtype=float32)

The `tf.nn.softmax` function converts these logits to *probabilities* for each class:



`tf.nn.softmax(predictions).numpy()` takes the raw model output (the logits) and converts it into probabilities:

* `predictions`: the model output from the previous cell. These are raw numerical scores, not probabilities; a higher score means the model favors that class more strongly.
* `tf.nn.softmax`: applies the softmax operation, which maps the scores to values that all lie between 0 and 1 and sum to 1, that is, a probability distribution over the 10 classes.
* `.numpy()`: converts the resulting tensor into a NumPy array so it is easy to display and work with.

As an illustration, suppose the model produced the scores [-2, 5, 1, 0, -3, 2, 4, -1, 3, 0] for the digits 0 through 9. Softmax turns them into approximately [0.001, 0.630, 0.012, 0.004, 0.000, 0.031, 0.232, 0.002, 0.085, 0.004]: the model assigns about a 63% chance to the digit "1", about 23% to the digit "6", and so on.



In [5]:
tf.nn.softmax(predictions).numpy()

array([[0.07977451, 0.05161342, 0.11694954, 0.11985791, 0.08501544,
        0.14133808, 0.11579985, 0.17426364, 0.06262738, 0.05276022]],
      dtype=float32)
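The conversion can be checked by hand. The sketch below is a plain-Python softmax (an illustration of the formula, not TensorFlow's implementation) applied to the logits printed by the earlier cell; the function name `softmax` here is local to this example.

```python
import math

def softmax(logits):
    """Exponentiate each score and normalize so the results sum to 1."""
    largest = max(logits)
    # Subtracting the largest logit does not change the result
    # but keeps the exponentials in a safe numeric range.
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits copied from the output of `predictions` above.
logits = [-0.0447921, -0.48021442, 0.3377464, 0.3623109, 0.01883672,
          0.5271586, 0.32786715, 0.7365732, -0.2867936, -0.45823872]

probabilities = softmax(logits)
print([round(p, 4) for p in probabilities])
print(sum(probabilities))
```

Up to floating-point rounding, the values agree with the array returned by `tf.nn.softmax(predictions).numpy()` above, with the largest probability at index 7.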

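Note: It is possible to bake the `tf.nn.softmax` function into the activation function for the last layer of the network. While this can make the model output more directly interpretable, this approach is discouraged because it is impossible to provide an exact and numerically stable loss calculation for all models when using a softmax output.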
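One way to see the kind of numerical problem the note refers to is the following sketch, which uses made-up extreme logits. It compares taking the log of a softmax output with computing the log-probability directly from the logits. The helper names and the values are assumptions for illustration; this is not the code Keras runs internally.

```python
import math

def softmax(logits):
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(logits):
    """Log-probabilities computed directly from logits (log-sum-exp trick)."""
    largest = max(logits)
    log_sum_exp = largest + math.log(sum(math.exp(z - largest) for z in logits))
    return [z - log_sum_exp for z in logits]

# Hypothetical logits where the true class (index 1) gets a very low score.
logits = [0.0, -800.0]
true_class = 1

# Route 1: softmax first, then log. The probability underflows to 0.0,
# so the log is undefined and the loss information is lost.
probs = softmax(logits)
try:
    naive_loss = -math.log(probs[true_class])
except ValueError:
    naive_loss = float("inf")

# Route 2: work from the logits directly. The loss stays finite.
stable_loss = -log_softmax(logits)[true_class]

print(probs[true_class], naive_loss, stable_loss)
```

Keeping the last layer as logits and passing `from_logits=True` to the loss lets the loss take the second route.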

Define a loss function for training using `losses.SparseCategoricalCrossentropy`:

In [6]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

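The loss function takes a vector of ground truth values and a vector of logits and returns a scalar loss for each example. This loss is equal to the negative log probability of the true class: the loss is zero if the model is sure of the correct class.

This untrained model gives probabilities close to random (1/10 for each class), so the initial loss should be close to `-tf.math.log(1/10) ~= 2.3`.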



`loss_fn(y_train[:1], predictions).numpy()` computes the loss for the first training example with the loss function defined above:

* `y_train[:1]`: the first element of `y_train`, which holds the correct label for each training image. This is the true label of the first training image.
* `predictions`: the model's raw output (logits) for that same image, computed earlier.
* `loss_fn(...)`: an instance of `tf.keras.losses.SparseCategoricalCrossentropy`, a loss for multi-class classification with integer labels. It takes the true labels and the predictions and measures how poorly the predictions match the labels; lower is better.
* `.numpy()`: converts the resulting tensor into a NumPy value for display or further use in Python.

The loss is the key quantity during training: training adjusts the model's parameters to minimize it, which corresponds to the model making better predictions.



In [7]:
loss_fn(y_train[:1], predictions).numpy()

np.float32(1.9566005)
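The value above can be reproduced by hand as the negative log probability of the true class. The sketch below does this in plain Python from the logits printed earlier. Note the assumption: the notebook never prints `y_train[0]`, so the label `5` used here is assumed; it is the value that reproduces the loss shown above.

```python
import math

# Logits copied from the output of `predictions` earlier in the notebook.
logits = [-0.0447921, -0.48021442, 0.3377464, 0.3623109, 0.01883672,
          0.5271586, 0.32786715, 0.7365732, -0.2867936, -0.45823872]

# Assumed true label of the first training image (not shown in the notebook).
true_label = 5

# -log(softmax(logits)[true_label]) == log(sum(exp(logits))) - logits[true_label]
log_sum_exp = math.log(sum(math.exp(z) for z in logits))
loss = log_sum_exp - logits[true_label]

# Loss of a model that assigns exactly 1/10 to every class.
uniform_loss = -math.log(1 / 10)

print(round(loss, 4), round(uniform_loss, 4))
```

The hand-computed loss is close to the `2.3` expected for a model that guesses uniformly, as the text above anticipates for an untrained network.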

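Before you start training, configure and compile the model using Keras `Model.compile`. Set the [`optimizer`](https://tensorflow.google.cn/api_docs/python/tf/keras/optimizers) class to `adam`, set the `loss` to the `loss_fn` function you defined earlier, and specify a metric to be evaluated for the model by setting the `metrics` parameter to `accuracy`.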



`model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])` configures the model for training:

* `model.compile(...)`: calls the model's `compile` method, which sets up the training process.
* `optimizer='adam'`: selects the Adam optimizer, a widely used adaptive optimization algorithm that adjusts the model's weights during training.
* `loss=loss_fn`: sets the loss function used for training. Here `loss_fn` is `tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)`, which suits multi-class classification.
* `metrics=['accuracy']`: tracks accuracy, the fraction of examples classified correctly, during training and evaluation.

Together, these settings determine how the model learns and updates its weights.



In [8]:
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

## Train and evaluate your model

Use the `Model.fit` method to adjust your model parameters and minimize the loss:



`model.fit(x_train, y_train, epochs=5)` trains the model:

* `model.fit(...)`: calls the model's `fit` method, the primary way to train a model in Keras. It adjusts the model's internal parameters so that its predictions improve.
* `x_train`: the training images the model learns from.
* `y_train`: the labels for the training images, that is, the correct answer for each image in `x_train`. If an image shows the handwritten digit "3", the corresponding entry in `y_train` is 3.
* `epochs=5`: the number of training epochs. One epoch is a complete pass over the training set, so the model sees the whole training set 5 times.

The aim is to adjust the parameters so that the model also predicts the digits in new, unseen images accurately.



In [9]:
model.fit(x_train, y_train, epochs=5)

Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.8512 - loss: 0.4987
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 11s 6ms/step - accuracy: 0.9559 - loss: 0.1497
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 17s 4ms/step - accuracy: 0.9679 - loss: 0.1070
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 11s 5ms/step - accuracy: 0.9736 - loss: 0.0872
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.9774 - loss: 0.0736


<keras.src.callbacks.history.History at 0x7c189c82f910>
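The `1875` in the progress output above is the number of batches per epoch. A short arithmetic sketch, assuming the standard MNIST split sizes (60,000 training and 10,000 test images) and the Keras default batch size of 32, none of which is stated explicitly in this notebook:

```python
import math

num_train_examples = 60000  # assumed size of the MNIST training set
num_test_examples = 10000   # assumed size of the MNIST test set
batch_size = 32             # assumed: Keras default when batch_size is not passed

# One step processes one batch; a partial final batch still counts as a step.
train_steps = math.ceil(num_train_examples / batch_size)
test_steps = math.ceil(num_test_examples / batch_size)

print(train_steps, test_steps)
```

Under these assumptions the result is 1875 steps per training epoch and 313 steps for a pass over the test set, which matches the step counts shown in the notebook's progress output.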

The `Model.evaluate` method checks the model's performance, usually on a [validation set](https://developers.google.com/machine-learning/glossary#validation-set) or [test set](https://developers.google.com/machine-learning/glossary#test-set).



`model.evaluate(x_test, y_test, verbose=2)` measures how well the trained model performs on data it did not see during training, the test set:

* `model.evaluate(...)`: calls the model's `evaluate` method, which compares the model's predictions with the correct answers and returns the loss and the compiled metrics.
* `x_test`: the test images, which the model has never encountered. They give a realistic estimate of performance on unseen data.
* `y_test`: the correct labels for the images in `x_test`, the digits (0-9) that the images represent.
* `verbose=2`: controls how much output is shown. With `verbose=2`, evaluation prints a single summary line with the loss and metrics; `verbose=1` shows a progress bar instead.

Put simply, this is a final exam for the model on questions (`x_test`) it has not seen, graded against the answer key (`y_test`). The result indicates whether the model has learned to generalize from the training data and how well it is likely to perform in practice.



In [11]:
model.evaluate(x_test,  y_test, verbose=1)

313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9750 - loss: 0.0845


[0.07189472764730453, 0.9782000184059143]

In [10]:
model.evaluate(x_test,  y_test, verbose=2)

313/313 - 1s - 3ms/step - accuracy: 0.9782 - loss: 0.0719


[0.07189472764730453, 0.9782000184059143]
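The second number returned by `Model.evaluate` is the accuracy: the fraction of examples whose highest-scoring class equals the true label. The following is a minimal plain-Python sketch of that computation on made-up data; the helper names and the tiny 3-class `logit_rows` are hypothetical and stand in for the model's real 10-class outputs.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(logit_rows, labels):
    """Fraction of rows whose highest-scoring class matches the label."""
    correct = sum(1 for row, label in zip(logit_rows, labels)
                  if argmax(row) == label)
    return correct / len(labels)

# Hypothetical logits for four examples and three classes.
logit_rows = [
    [2.0, 0.1, -1.0],   # predicted class 0
    [0.3, 1.5, 0.2],    # predicted class 1
    [0.0, 0.2, 3.1],    # predicted class 2
    [1.2, 0.9, 0.1],    # predicted class 0
]
labels = [0, 1, 2, 1]   # the last example is misclassified

print(accuracy(logit_rows, labels))
```

Because softmax preserves the ordering of the scores, taking the argmax of the logits gives the same predicted class as taking the argmax of the probabilities.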

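The image classifier is now trained to about 98% accuracy on this dataset. To learn more, read the [TensorFlow tutorials](https://tensorflow.google.cn/tutorials/).

If you want your model to return a probability, you can wrap the trained model and attach the softmax to it: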



The next cell creates a new model, `probability_model`, by stacking two components with `tf.keras.Sequential`:

* `model`: the trained model from earlier, which has already learned to recognize handwritten digits from MNIST. It is reused as-is as the first stage of `probability_model`.
* `tf.keras.layers.Softmax()`: a softmax layer placed on top of the trained model. It converts the logits that `model` outputs into probabilities.

On its own, the trained model outputs one raw score per digit (0-9), and these scores cannot be read directly as probabilities. With the softmax layer appended, `probability_model` outputs one probability per digit, summing to 1, which makes it easier to see how confident the model is in its prediction.



In [12]:
probability_model = tf.keras.Sequential([
  model,
  tf.keras.layers.Softmax()
])



`probability_model(x_test[:5])` uses the `probability_model` defined above to predict class probabilities for the first 5 images in `x_test`:

* `x_test[:5]`: selects the first 5 test images. The test set was not used during training and is reserved for evaluating the model on unseen data.
* `probability_model(...)`: runs those images through the trained model followed by the softmax layer, producing for each image 10 probabilities, one for each digit class (0-9).

These probabilities are useful in two ways:

* Evaluating performance: they show how confident the model is on unseen data. If the model assigns a high probability to the correct digit for most of the 5 images, that suggests it is performing well.
* Making decisions: in an application, the probabilities can drive decisions, for example classifying an image automatically when the model assigns a high probability to one digit.



In [13]:
probability_model(x_test[:5])

<tf.Tensor: shape=(5, 10), dtype=float32, numpy=
array([[1.4097507e-08, 1.2330316e-10, 1.0518686e-06, 1.6375237e-05,
        4.0629262e-12, 1.1858759e-07, 1.1978489e-15, 9.9998164e-01,
        1.5857450e-08, 8.4896220e-07],
       [5.6619944e-09, 5.8851412e-05, 9.9992466e-01, 2.8975305e-06,
        6.0194922e-15, 1.1718380e-05, 6.1723320e-08, 1.0554468e-14,
        1.7324928e-06, 7.6257074e-16],
       [1.3362143e-07, 9.9960762e-01, 3.9587103e-05, 1.3031118e-05,
        2.8080909e-05, 1.3133340e-06, 7.5568219e-06, 2.4732243e-04,
        5.4696611e-05, 7.2432010e-07],
       [9.9911422e-01, 7.3258121e-08, 4.9981417e-04, 9.3554860e-07,
        7.1571179e-05, 2.2504746e-05, 2.1756640e-04, 3.4446595e-05,
        6.2740469e-06, 3.2556422e-05],
       [3.3515838e-07, 6.6569514e-09, 7.2725425e-06, 4.7795923e-09,
        9.9915171e-01, 1.9920884e-07, 5.0967246e-06, 3.5152949e-05,
        6.1489817e-07, 7.9959573e-04]], dtype=float32)>

Okay, let's explain the line of code probability_model(x_test[:5]):
好的，让我们解释代码行

```
# probability_model(x_test[:5])
```

This line of code is using the probability_model (which you defined earlier) to predict the probabilities for the first 5 images in the x_test dataset.
这行代码正在使用你之前定义的 probability_model 来预测 x_test 数据集中前 5 张图片的概率。

Here's a breakdown:  下面是详细说明：

* x_test[:5]: This selects the first 5 images from the x_test dataset. The x_test dataset typically contains images that were not used during the training process and are reserved for evaluating the model's performance on unseen data.
这部分代码从 x_test 数据集中选择了前 5 张图像。 x_test 数据集通常包含在训练过程中未使用的图像，并保留用于评估模型在未见数据上的性能。

* probability_model(...): This part uses the probability_model to make predictions on the selected images (x_test[:5]). Remember that the probability_model was created by taking your original trained model (model) and adding a Softmax layer on top of it. This Softmax layer converts the model's raw outputs (logits) into probabilities.
这部分代码使用 probability_model 对选定的图像（ x_test[:5] ）进行预测。请记住， probability_model 是通过将您的原始训练模型（ model ）添加一个 Softmax 层创建的。这个 Softmax 层将模型的原始输出（logits）转换为概率。

###In simpler terms:  简单来说：

* You're taking the first 5 images from your test dataset (x_test[:5]).
您正在从测试数据集中取前 5 张图片（ x_test[:5] ）。
* You're feeding these images to your probability_model, which is designed to output probabilities.
您将这些图片输入到您的 probability_model 中，该模型旨在输出概率。
* The probability_model processes the images and produces a set of probabilities for each image, indicating the likelihood of each image belonging to each of the 10 possible digit classes (0-9).
第 0 号处理图像并为每个图像生成一组概率，表示每个图像属于 10 个可能的数字类别（0-9）的可能性。

###Why is this useful?  为什么这很有用？

* Evaluating performance: You can use these probabilities to see how confident your model is about its predictions on unseen data. For example, if the model assigns a high probability to the correct digit for most of the 5 images, that is a quick sign the model is performing well, although 5 images are too few to measure accuracy.
* Making decisions: In real-world applications, you might act on these probabilities. For example, if the model predicts a high probability for a certain digit, you might use that information to classify the image automatically.

In essence, probability_model(x_test[:5]) is a way to get your model's predictions, in the form of probabilities, for a small sample of your test data. This helps you understand how your model is likely to perform on real-world data and lets you use those probabilities for further analysis or decision-making.
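
As a rough illustration of the decision-making use described above, the sketch below accepts a prediction only when its probability clears a threshold. The function name `decide`, the 0.8 threshold, and both probability lists are hypothetical values chosen for the example; they do not come from the tutorial.

```python
def decide(probabilities, threshold=0.8):
    """Return the predicted digit, or None when the model is not confident enough."""
    # Index of the largest probability; for MNIST the index is the digit itself.
    best_digit = max(range(len(probabilities)), key=lambda d: probabilities[d])
    if probabilities[best_digit] >= threshold:
        return best_digit
    # Hypothetical convention: None means "hand this image to a human reviewer".
    return None

# Made-up probability vectors for two images (each sums to 1).
confident = [0.02, 0.01, 0.03, 0.90, 0.01, 0.01, 0.005, 0.005, 0.01, 0.0]
uncertain = [0.05, 0.05, 0.05, 0.40, 0.05, 0.05, 0.05, 0.05, 0.20, 0.05]

print(decide(confident))
print(decide(uncertain))
```

A suitable threshold depends on the application: a higher value defers more images for review, while a lower value accepts more predictions at the cost of more mistakes.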

### How to return the highest probability and its corresponding digit

Let's work out how to get the highest probability and the corresponding digit:
```
import numpy as np

# 'predictions' holds the output of probability_model for the first 5 test images
predictions = probability_model(x_test[:5])

# Loop through each prediction (for each of the 5 images)
for image_prediction in predictions:
  # Get the index of the highest probability using np.argmax
  predicted_digit = np.argmax(image_prediction)
  
  # Get the highest probability value
  highest_probability = image_prediction[predicted_digit]
  
  print(f"Predicted digit: {predicted_digit}, Highest Probability: {highest_probability}")
```
Here's a breakdown:

* import numpy as np: We import the numpy library for numerical operations, specifically to use the argmax function.
* predictions = probability_model(x_test[:5]): This line (which you likely already have in your code) gets the predictions from your probability_model for the first 5 images in your test dataset and stores them in a variable called predictions.
* for image_prediction in predictions: This loop iterates through each individual prediction within predictions. Each image_prediction holds the probabilities for a single image.
* predicted_digit = np.argmax(image_prediction): This is the key line. np.argmax finds the index of the maximum value within image_prediction. Since the array contains one probability per digit (0-9), the index is the predicted digit.
* highest_probability = image_prediction[predicted_digit]: This line retrieves the probability value of the predicted digit using the index found in the previous step.
* print(f"Predicted digit: {predicted_digit}, Highest Probability: {highest_probability}"): This line prints the predicted digit and its probability in a readable format.

In simpler terms:

For each of the 5 images, the code finds the digit to which the model assigns the highest probability, then prints both the digit and that probability.

Example:

If, for a particular image, probability_model outputs the following probabilities: [0.02, 0.01, 0.03, 0.90, 0.01, 0.01, 0.005, 0.005, 0.01, 0.0]

np.argmax would return 3 (because the highest probability is at index 3). The code would then print a line such as "Predicted digit: 3, Highest Probability: 0.9", indicating that the model is most confident that the image represents the digit 3, with a probability of 90%.
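
The same lookup can also be done for a whole batch at once instead of looping. The sketch below uses a small hand-written NumPy array as a stand-in for the output of probability_model; the array values and the `true_labels` array are assumptions made for illustration. With a real eager tensor, calling `.numpy()` on it first gives an equivalent NumPy array.

```python
import numpy as np

# Stand-in for probability_model output: two rows of ten class probabilities.
predictions = np.array([
    [0.02, 0.01, 0.03, 0.90, 0.01, 0.01, 0.005, 0.005, 0.01, 0.0],
    [0.0, 0.0, 0.05, 0.0, 0.0, 0.0, 0.0, 0.95, 0.0, 0.0],
])
# Hypothetical ground-truth labels, playing the role of y_test[:2].
true_labels = np.array([3, 7])

# axis=1 takes the argmax / max across the ten classes of each row.
predicted_digits = np.argmax(predictions, axis=1)
highest_probabilities = np.max(predictions, axis=1)

# Count how many predictions match the labels.
num_correct = int(np.sum(predicted_digits == true_labels))

for digit, prob in zip(predicted_digits, highest_probabilities):
    print(f"Predicted digit: {digit}, Highest Probability: {prob:.2f}")
print(f"Correct: {num_correct} of {len(true_labels)}")
```

The `:.2f` format specifier is what keeps the trailing zero in values such as 0.90; without it, Python prints the float as 0.9.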

In [20]:
import numpy as np

# Get the predictions for the first 5 test images
predictions = probability_model(x_test[:5])
# Loop through each prediction (one per image)
for i, image_prediction in enumerate(predictions):
  print(f"Test number: {i + 1}")
  # Show the raw probabilities and their type
  print(image_prediction)
  print(type(image_prediction))
  # Get the index of the highest probability using np.argmax
  predicted_digit = np.argmax(image_prediction) # the index is the corresponding digit
  print(predicted_digit)
  # Get the highest probability value
  highest_probability = image_prediction[predicted_digit]
  print(f"Image {i + 1}:")
  print(f"Actual digit: {y_test[i]}")
  print(f"Predicted digit: {predicted_digit}, Highest Probability: {highest_probability}")
  print("---------")

Test number: 1
tf.Tensor(
[1.4097507e-08 1.2330316e-10 1.0518686e-06 1.6375237e-05 4.0629262e-12
 1.1858759e-07 1.1978489e-15 9.9998164e-01 1.5857450e-08 8.4896220e-07], shape=(10,), dtype=float32)
<class 'tensorflow.python.framework.ops.EagerTensor'>
7
Image 1:
Actual digit: 7
Predicted digit: 7, Highest Probability: 0.9999816417694092
---------
Test number: 2
tf.Tensor(
[5.6619944e-09 5.8851412e-05 9.9992466e-01 2.8975305e-06 6.0194922e-15
 1.1718380e-05 6.1723320e-08 1.0554468e-14 1.7324928e-06 7.6257074e-16], shape=(10,), dtype=float32)
<class 'tensorflow.python.framework.ops.EagerTensor'>
2
Image 2:
Actual digit: 2
Predicted digit: 2, Highest Probability: 0.9999246597290039
---------
Test number: 3
tf.Tensor(
[1.3362143e-07 9.9960762e-01 3.9587103e-05 1.3031118e-05 2.8080909e-05
 1.3133340e-06 7.5568219e-06 2.4732243e-04 5.4696611e-05 7.2432010e-07], shape=(10,), dtype=float32)
<class 'tensorflow.python.framework.ops.EagerTensor'>
1
Image 3:
Actual digit: 1
Predicted digit: 1, Highest Probabilit

### The list index is the corresponding digit

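## Conclusion

Congratulations! You have trained a machine learning model on a prebuilt dataset using the [Keras](https://tensorflow.google.cn/guide/keras/overview) API.

For more examples of using Keras, check out the [tutorials](https://tensorflow.google.cn/tutorials/keras/). To learn more about building models with Keras, read the [guides](https://tensorflow.google.cn/guide/keras). If you want to learn more about loading and preparing data, see the tutorials on [image data loading](https://tensorflow.google.cn/tutorials/load_data/images) or [CSV data loading](https://tensorflow.google.cn/tutorials/load_data/csv).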
