<a href="https://colab.research.google.com/github/smmurdock/Learn-TensorFlow/blob/main/02_neural_network_classification_with_tf.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 02. Neural Network Classification with TensorFlow

Let's look at how we can approach a classification problem.

A classification problem involves predicting whether something is one thing or another.

For example, you might want to:
* Predict whether or not someone has heart disease based on their health parameters. This is called **binary classification** since there are only two options.
* Decide whether a photo is of food, a person, or a dog. This is called **multi-class classification** since there are more than two options.
* Predict which categories should be assigned to a Wikipedia article. This is called **multi-label classification** since a single article could have more than one category assigned.
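
To make the three problem types concrete, here's a minimal sketch of what the labels might look like in each case. The class names and sample values are made up for illustration; they don't come from a real dataset.

```python
# Hypothetical class lists, invented for illustration only
photo_classes = ["food", "person", "dog"]
article_categories = ["science", "history", "sports"]

# Binary classification: one label per sample, either 0 or 1
binary_label = 1  # e.g. 1 = "has heart disease", 0 = "does not"

# Multi-class classification: exactly one class per sample
multiclass_label = photo_classes.index("dog")  # integer class index
one_hot_label = [1 if i == multiclass_label else 0
                 for i in range(len(photo_classes))]

# Multi-label classification: any number of categories per sample
assigned = {"science", "history"}
multi_hot_label = [1 if category in assigned else 0
                   for category in article_categories]

print(binary_label)     # 1
print(one_hot_label)    # [0, 0, 1]
print(multi_hot_label)  # [1, 1, 0]
```

Notice the one-hot label always contains a single `1`, while the multi-hot label can contain several.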

In this notebook, we're going to work through a number of different classification problems with TensorFlow. In other words, we'll take a set of inputs and predict which class that set of inputs belongs to.

## What we're going to cover

Specifically, we're going to cover the following with TensorFlow:
* Architecture of a classification model
* Input shapes and output shapes
  * `X`: features/data (input)
  * `y`: labels (outputs)
    * "What class do the inputs belong to?"
* Creating custom data to view and fit
* Steps in modeling for binary and multi-class classification
  * Creating a model
  * Compiling a model
    * Defining a loss function
    * Setting up an optimizer
      * Finding the best learning rate
    * Creating evaluation metrics
  * Fitting a model (getting it to find patterns in our data)
  * Improving a model
* The power of non-linearity
* Evaluating classification models
  * Visualizing the model ("visualize, visualize, visualize")
  * Looking at training curves
  * Comparing predictions to ground truth (using our evaluation metrics)

## Typical architecture of a classification neural network

The word _typical_ is deliberate.

That's because the architecture of a classification neural network can vary widely depending on the problem you're working on.

However, there are some fundamentals all deep neural networks contain:

* An input layer.
* Some hidden layers.
* An output layer.

Much of the rest is up to the data analyst creating the model.

The following are some standard values you'll often use in your classification neural networks:

Hyperparameter | Binary Classification | Multiclass Classification
:-- | :-- | :--
Input layer shape | Same as number of features (e.g. 5 for age, sex, height, weight, smoking status in heart disease prediction) | Same as binary classification
Hidden layer(s) | Problem specific, minimum = 1, maximum = unlimited | Same as binary classification
Neurons per hidden layer | Problem specific, generally 10 to 100 | Same as binary classification
Output layer shape | 1 (one class or the other) | 1 per class (e.g. 3 for food, person, or dog photo)
Hidden activation | Usually ReLU (rectified linear unit) | Same as binary classification
Output activation | Sigmoid | Softmax
Loss function | Cross entropy (`tf.keras.losses.BinaryCrossentropy` in TensorFlow) | Cross entropy (`tf.keras.losses.CategoricalCrossentropy` in TensorFlow)
Optimizer | SGD (stochastic gradient descent), Adam | Same as binary classification

Table 1: Typical architecture of a classification network. Source: Adapted from page 295 of _Hands-On Machine Learning with Scikit-Learn, Keras, & TensorFlow_ by Aurélien Géron.
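
As a rough sketch of how the values in Table 1 could translate into code, here's one way to build a binary and a multi-class model with `tf.keras`. The feature count, class count, single hidden layer, and 10 hidden neurons are assumptions picked from the ranges in the table, not tuned values.

```python
import tensorflow as tf

NUM_FEATURES = 5  # assumed: e.g. age, sex, height, weight, smoking status
NUM_CLASSES = 3   # assumed: a hypothetical three-class problem

# Binary classification: 1 output neuron + sigmoid + binary cross entropy
binary_model = tf.keras.Sequential([
    tf.keras.Input(shape=(NUM_FEATURES,)),           # input layer shape = number of features
    tf.keras.layers.Dense(10, activation="relu"),    # hidden layer (ReLU)
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output layer
])
binary_model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(),
    metrics=["accuracy"],
)

# Multi-class classification: 1 output neuron per class + softmax
multiclass_model = tf.keras.Sequential([
    tf.keras.Input(shape=(NUM_FEATURES,)),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
multiclass_model.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(),  # expects one-hot labels
    optimizer=tf.keras.optimizers.Adam(),
    metrics=["accuracy"],
)

print(binary_model.output_shape)      # (None, 1)
print(multiclass_model.output_shape)  # (None, 3)
```

The only differences between the two models are the output layer (shape and activation) and the loss function, which mirrors the table. If your multi-class labels are integers rather than one-hot vectors, `tf.keras.losses.SparseCategoricalCrossentropy` is the usual choice instead.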

In [1]:
import tensorflow as tf
print(tf.__version__)

import datetime
print(f"Notebook last run (end-to-end): {datetime.datetime.now()}")

2.19.0
Notebook last run (end-to-end): 2025-09-29 18:45:11.931923
