# Softmax Regression 
## Multiclass Classification
**Generalization of logistic regression** to $N$ classes, with $j = 1, \dots, N$:


$Z_j = W_j \cdot X + b_j$


$a_j = \frac{e^{Z_j}}{\sum_{k=1}^{N} e^{Z_k}} = P(y = j \mid X)$


For example, when classifying the digits 0 to 9 ($N = 10$, with the classes indexed 0 through 9 to match the digits):

$a_7 = \frac{e^{Z_7}}{\sum_{k=0}^{9} e^{Z_k}} = P(y = 7 \mid X)$
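The softmax formula above can be sketched in plain Python. This is a minimal illustration rather than a library implementation; the function name `softmax` and the logit values are made up for the example, and the list is 0-indexed.

```python
import math

def softmax(z):
    """Return softmax probabilities for a list of logits z."""
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 4-class problem (not from the notes).
z = [2.0, 1.0, 0.1, -1.0]
a = softmax(z)

print(a)       # one probability per class, largest for the largest logit
print(sum(a))  # approximately 1.0
```

Every output is positive and the outputs sum to 1, which is what lets $a_j$ be read as $P(y = j \mid X)$.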


### Loss function

$\text{loss}(a_1, a_2, \dots, a_N, y) = -\log(a_j)$ if $y = j$
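As a small worked example of this loss, the sketch below evaluates $-\log(a_j)$ for one example. The helper name `softmax_loss` and the probabilities are assumptions for illustration, and the class index is 0-based in the code while the formulas count from 1.

```python
import math

def softmax_loss(a, y):
    """Loss for one example: -log of the probability assigned to the true class y."""
    return -math.log(a[y])

# Hypothetical softmax outputs for a 3-class problem.
a = [0.7, 0.2, 0.1]

print(softmax_loss(a, 0))  # about 0.357: the true class got high probability
print(softmax_loss(a, 2))  # about 2.303: the true class got low probability
```

Only the probability of the true class enters the loss, and the loss grows as that probability shrinks toward zero.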

### NN with Softmax Output

$[X] \rightarrow [\text{ReLU } a^{[1]}] \rightarrow [\text{ReLU } a^{[2]}] \rightarrow [\text{Softmax } a^{[3]}] \rightarrow [y]$

### Loss function

```python
import tensorflow as tf
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
```
With `from_logits=True`, the loss is computed directly from the logits $Z_j$ instead of from the intermediate softmax outputs $a_j$, which reduces numerical round-off error.

Note: Use `activation='linear'` in the output layer so that the model outputs logits rather than probabilities.
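To see why working from logits helps, here is a plain-Python sketch that compares two ways of computing the same loss. It illustrates the general idea (the log-sum-exp rearrangement) and is not a description of how TensorFlow implements it internally; the function names and logit values are hypothetical.

```python
import math

def loss_via_softmax(z, y):
    """Compute the softmax probability first, then take -log of it."""
    exps = [math.exp(v) for v in z]
    return -math.log(exps[y] / sum(exps))

def loss_from_logits(z, y):
    """Same quantity rearranged as log(sum_k e^{z_k}) - z_y, shifted by max(z)."""
    m = max(z)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
    return log_sum_exp - z[y]

z_small = [2.0, 1.0, 0.1]
print(loss_via_softmax(z_small, 0))
print(loss_from_logits(z_small, 0))  # same value up to rounding

z_large = [1000.0, 0.0, -1000.0]
print(loss_from_logits(z_large, 0))  # 0.0
# loss_via_softmax(z_large, 0) raises OverflowError, because math.exp(1000.0) overflows.
```

For moderate logits the two versions agree; for extreme logits only the rearranged version stays finite.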


### SparseCategoricalCrossentropy or CategoricalCrossentropy
TensorFlow supports two formats for target values, and the choice of loss determines which one is expected.
- SparseCategoricalCrossentropy: expects the target to be an integer corresponding to the class index. For example, if there are 10 potential target values, y would be between 0 and 9.
- CategoricalCrossentropy: expects the target value of an example to be one-hot encoded, where the value at the target index is 1 and the other N-1 entries are 0. For example, with 10 potential target values and a target of 2, y would be [0,0,1,0,0,0,0,0,0,0].
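The two target formats describe the same label and lead to the same loss value. The sketch below shows this with plain-Python stand-ins; `sparse_cce`, `categorical_cce`, and `one_hot` are hypothetical helpers written for this example, not the TensorFlow classes, and the probabilities are invented.

```python
import math

def one_hot(index, n):
    """Encode an integer class index as a one-hot list of length n."""
    return [1 if k == index else 0 for k in range(n)]

def sparse_cce(a, y_index):
    """Loss when the target is an integer class index."""
    return -math.log(a[y_index])

def categorical_cce(a, y_one_hot):
    """Loss when the target is one-hot encoded."""
    return -sum(t * math.log(p) for t, p in zip(y_one_hot, a))

# Hypothetical predicted probabilities for 10 classes (they sum to 1).
a = [0.05, 0.05, 0.6, 0.05, 0.05, 0.05, 0.05, 0.04, 0.03, 0.03]

y_sparse = 2
y_one_hot = one_hot(y_sparse, 10)

print(y_one_hot)                      # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(sparse_cce(a, y_sparse))        # about 0.511
print(categorical_cce(a, y_one_hot))  # same value
```

In the one-hot version every term except the one at the target index is multiplied by 0, so the sum reduces to the single term used by the sparse version.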

### Prediction
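Because the output layer is linear, the model returns logits. Apply `tf.nn.softmax(logits)` to convert them to class probabilities; the predicted class is the index of the largest probability.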
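For the multiclass model, a prediction is made by turning the logits into probabilities with softmax and picking the most probable class. The following is a plain-Python sketch of that step with made-up logits; `softmax` here is a local helper, not the TensorFlow function.

```python
import math

def softmax(z):
    """Softmax with the maximum subtracted first to avoid overflow."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits from a linear output layer with 4 units.
logits = [-1.2, 0.3, 4.1, 0.8]

probs = softmax(logits)
predicted_class = max(range(len(probs)), key=lambda k: probs[k])

print(probs)
print(predicted_class)  # 2
```

Softmax preserves the ordering of the logits, so the argmax of the logits gives the same class; the softmax step is needed only when the probabilities themselves are of interest.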

## Multilabel Classification

$[X] \rightarrow [\text{ReLU } a^{[1]}] \rightarrow [\text{ReLU } a^{[2]}] \rightarrow [\text{Sigmoid } a^{[3]}] \rightarrow [y_1, y_2, \dots, y_K]$
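In the multilabel setting each of the $K$ outputs gets its own sigmoid, so the outputs are independent yes/no probabilities and need not sum to 1. Here is a minimal sketch, assuming three hypothetical labels, invented logits, and a decision threshold of 0.5 (the threshold is an assumption, not something the notes specify).

```python
import math

def sigmoid(z):
    """Logistic function applied to a single logit."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logits for K = 3 independent labels.
logits = [2.0, -1.0, 0.5]

probs = [sigmoid(v) for v in logits]
labels = [1 if p >= 0.5 else 0 for p in probs]

print(probs)   # independent probabilities, one per label
print(labels)  # [1, 0, 1]
```

Unlike softmax, several labels can be active at once for the same input, as in the output above.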

