## Multi-Class Neural Networks

Earlier, you encountered binary classification models that pick between two possible choices, such as whether:

- A given email is spam or not spam.
- A given tumor is malignant or benign.

In this module, we'll investigate **multi-class** classification, which can pick from multiple possibilities. For example:

- Is this dog a beagle, a basset hound, or a bloodhound?
- Is this flower a Siberian Iris, Dutch Iris, Blue Flag Iris, or Dwarf Bearded Iris?
- Is that plane a Boeing 747, Airbus 320, Boeing 777, or Embraer 190?
- Is this an image of an apple, bear, candy, dog, or egg?

Some real-world multi-class problems entail choosing from millions of separate classes. For example, consider a multi-class classification model that can identify the image of just about anything.

## Multi-Class Neural Networks: One vs. All

**One vs. all** provides a way to **leverage binary classification**. Given a classification problem with N possible solutions, a one-vs.-all solution consists of N separate binary classifiers—one binary classifier for each possible outcome. During training, the model runs through a sequence of binary classifiers, training each to answer a separate classification question. For example, given a picture of a dog, five different recognizers might be trained, four seeing the image as a negative example (not a dog) and one seeing the image as a positive example (a dog). That is:

1. Is this image an apple? No.
2. Is this image a bear? No.
3. Is this image candy? No.
4. Is this image a dog? Yes.
5. Is this image an egg? No.

This approach is fairly reasonable when the total number of classes is small, but becomes increasingly inefficient as the number of classes rises.
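The dog example above can be sketched in a few lines of Python. This is a minimal illustration, not a full training loop: the helper names (`one_vs_all_targets`, `predict`) and the raw scores are hypothetical, and we assume each binary classifier outputs a raw score that a sigmoid turns into a probability.

```python
import math

CLASSES = ["apple", "bear", "candy", "dog", "egg"]

def one_vs_all_targets(label, classes=CLASSES):
    """Turn one multi-class label into one binary target per classifier."""
    return {c: 1 if c == label else 0 for c in classes}

def sigmoid(z):
    """Logistic function: maps a raw score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(scores):
    """Pick the class whose binary classifier is most confident.

    `scores` maps each class name to the raw score from that class's classifier.
    """
    probabilities = {c: sigmoid(z) for c, z in scores.items()}
    best = max(probabilities, key=probabilities.get)
    return best, probabilities

# During training, a dog image is a positive example for one classifier only.
targets = one_vs_all_targets("dog")

# Hypothetical raw scores from five already-trained binary classifiers.
example_scores = {"apple": -3.0, "bear": -1.0, "candy": -4.0, "dog": 2.5, "egg": -2.0}
best_class, probs = predict(example_scores)
print(targets)
print(best_class)
```

Note that every one of the N classifiers must be evaluated for each example, which is why the cost grows with the number of classes.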

We can create a significantly more efficient one-vs.-all model with a deep neural network in which each output node represents a different class. The following figure suggests this approach:

![](img/17-1.png)

## Multi-Class Neural Networks: Softmax

Recall that [logistic regression](https://developers.google.com/machine-learning/crash-course/logistic-regression/video-lecture?hl=en) produces a decimal between 0 and 1.0. For example, a logistic regression output of 0.8 from an email classifier suggests an 80% chance of an email being spam and a 20% chance of it being not spam. Clearly, the sum of the probabilities of an email being either spam or not spam is 1.0.

**Softmax** extends this idea into a multi-class world. That is, Softmax assigns decimal probabilities to each class in a **multi-class problem**. Those decimal probabilities must add up to 1.0. This additional constraint helps training converge more quickly than it otherwise would.
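To make the "add up to 1.0" constraint concrete, here is a minimal sketch of Softmax applied to a list of raw scores. The scores are made up for illustration, and subtracting the maximum score is a common numerical-stability trick that we add here; it does not change the result.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1.0."""
    # Subtracting the max leaves the result unchanged but avoids overflow in exp().
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for apple, bear, candy, dog, egg.
raw_scores = [-1.0, 0.5, -2.0, 3.0, 0.0]
probs = softmax(raw_scores)
print([round(p, 3) for p in probs])
print(sum(probs))
```

The largest raw score always receives the largest probability, and raising one class's probability necessarily lowers the others.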

For example, returning to the image analysis we saw in Figure 1, Softmax might produce the following likelihoods of an image belonging to a particular class:

![](img/17-2.png)

Softmax is implemented through a neural network layer just before the output layer. The Softmax layer must have the same number of nodes as the output layer.

![](img/17-3.png)

The Softmax equation is as follows:
$$ p(y=j|x) = \frac{e^{(W^T_jX+b_j)}}{\sum_{k \in K}{e^{(W^T_kX+b_k)}}} $$

Note that this formula basically extends the formula for logistic regression into multiple classes.
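
As a quick check of that claim, consider the two-class case, $K = \{0, 1\}$. The shorthand $z_k = W^T_kX + b_k$ for the score of class $k$ is introduced here for brevity. Dividing the numerator and denominator by $e^{z_1}$ gives:

$$ p(y=1|x) = \frac{e^{z_1}}{e^{z_0}+e^{z_1}} = \frac{1}{1+e^{-(z_1-z_0)}} $$

This is the logistic (sigmoid) function applied to the score difference $z_1 - z_0 = (W_1 - W_0)^TX + (b_1 - b_0)$, so two-class Softmax behaves like logistic regression with weights $W_1 - W_0$ and bias $b_1 - b_0$.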


### Softmax Options

Consider the following variants of Softmax:

- **Full Softmax** is the Softmax we've been discussing; that is, Softmax calculates a probability for every possible class.
- **Candidate sampling** means that Softmax calculates a probability for all the positive labels but only for a random sample of negative labels. For example, if we are interested in determining whether an input image is a beagle or a bloodhound, we don't have to provide probabilities for every non-doggy example.

Full Softmax is fairly cheap when the number of classes is small but becomes prohibitively expensive when the number of classes climbs. Candidate sampling can improve efficiency in problems having a large number of classes.
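The following is a simplified sketch of the candidate sampling idea, not a production implementation. The names `sampled_softmax` and `toy_score` are hypothetical, negatives are drawn uniformly at random, and the correction for the sampling distribution that real implementations typically apply is omitted.

```python
import math
import random

def sampled_softmax(score_fn, num_classes, positive, num_negatives, rng):
    """Softmax over the positive class plus a random sample of negative classes."""
    # Draw distinct negative class ids without enumerating every class.
    negatives = set()
    while len(negatives) < num_negatives:
        candidate = rng.randrange(num_classes)
        if candidate != positive:
            negatives.add(candidate)

    # Only the sampled candidates are ever scored.
    candidates = [positive] + sorted(negatives)
    scores = {c: score_fn(c) for c in candidates}

    shift = max(scores.values())  # numerical stability
    exps = {c: math.exp(s - shift) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

def toy_score(class_id):
    """Hypothetical stand-in for a model's raw score for one class."""
    return 5.0 if class_id == 42 else math.sin(class_id)

rng = random.Random(0)
probs = sampled_softmax(toy_score, num_classes=1_000_000, positive=42,
                        num_negatives=20, rng=rng)
print(len(probs))  # number of classes actually scored
```

Here only 21 of the one million classes are scored, whereas full Softmax would need a score for every class.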

### One Label vs. Many Labels

Softmax assumes that each example is a member of exactly one class. Some examples, however, can simultaneously be a member of multiple classes. For such examples:

- You may not use Softmax.
- You must rely on multiple logistic regressions.

For example, suppose your examples are images containing exactly one item, a piece of fruit. Softmax can determine the likelihood of that one item being a pear, an orange, an apple, and so on. If your examples are images containing all sorts of things, such as bowls of different kinds of fruit, then you'll have to use multiple logistic regressions instead.
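
The multi-label case can be sketched as one independent logistic (sigmoid) output per class. The raw scores below are made up, and the 0.5 decision threshold is an assumption chosen for illustration; in practice the threshold is a tuning choice.

```python
import math

def sigmoid(z):
    """Logistic function: maps a raw score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def multi_label_predict(scores, threshold=0.5):
    """Score each class independently, so several classes can be present at once."""
    probabilities = {c: sigmoid(z) for c, z in scores.items()}
    present = [c for c, p in probabilities.items() if p >= threshold]
    return probabilities, present

# Hypothetical raw scores for an image of a fruit bowl.
bowl_scores = {"pear": 2.0, "orange": 1.5, "apple": -3.0, "banana": 0.8}
probs, present = multi_label_predict(bowl_scores)
print(present)
print(sum(probs.values()))
```

Unlike Softmax outputs, these probabilities are not constrained to sum to 1.0, which is what lets several fruits be detected in the same image.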