### Multi-Class Classification
Having explored a binary classification problem with linear and non-linear architectures, we now shift to a multi-class problem, where the model must choose among more than two classes.

The multi-class data will be synthetic, generated with scikit-learn's `make_blobs()` function. The general flow is as follows:
1. Make the artificial data and convert to tensors
2. Visualize the data
3. Define the model architecture
4. Train the model
5. Adjust hyperparameters as necessary

In [None]:
# First, let's explore the make_blobs() function. According to the documentation,
# make_blobs is designed for creating artificial multiclass data by generating isotropic Gaussian clusters
# of points. The data is quite literally "blobs" of points around a "center" in R^n space. Each blob
# (center) corresponds to one class, so the number of centers sets the number of classes.

from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt
import torch
from torch import nn

n_points = 50
X_blob, y_blob = make_blobs([n_points, n_points], # array length = num blobs, n_points = points per blob
                            n_features=3,
                            centers=None, random_state=42) # returns coordinates of points (X) and its blob membership

fig = plt.figure()
ax = fig.add_subplot(projection='3d') # create the 3D axes first so the title attaches to them
ax.set_title('Two blobs with 3 features (x, y, z)')
ax.scatter(X_blob[:,0], X_blob[:,1], X_blob[:, 2], c=y_blob)

In [None]:
# Now let's standardize this a little for the actual model. We'll define the constant values that will be used when creating
# the architecture (so things only need to be updated in one place). This could probably eventually be refactored into a dataclass.

NUM_CLASSES = 4 # The number of blobs in the dataset (one class per blob)
CLUSTER_POINTS = 100 # The number of points in each blob
NUM_FEATURES = 2 # The dimension of the data, i.e. of the points in the blobs (the above example is 3D)
CLUSTER_STD_DEV = 1.0 # The spread of each blob (a larger spread makes classification more difficult!)
RANDOM_SEED = 42
TRAIN_TEST_RATIO = 0.2 # Fraction of the full dataset held out for testing

In [None]:
# Now let's create our training data and move to tensors
X_blob, y_blob = make_blobs([CLUSTER_POINTS] * NUM_CLASSES, # one entry per blob, each with CLUSTER_POINTS points
                            n_features=NUM_FEATURES,
                            centers=None,
                            random_state=RANDOM_SEED,
                            cluster_std=CLUSTER_STD_DEV)

X_blob = torch.from_numpy(X_blob).type(torch.float)
y_blob = torch.from_numpy(y_blob).type(torch.long) # class indices must be integers (torch.long) for CrossEntropyLoss

X_train, X_test, y_train, y_test = train_test_split(
    X_blob,
    y_blob,
    test_size=TRAIN_TEST_RATIO, # Ratio of test data to use from full dataset; Training is the complement
    random_state=RANDOM_SEED,
)

In [None]:
# Now let's inspect our dataset to make sure it looks as expected
print(X_train[0:9]) # expect coordinates from R^(NUM_FEATURES)
print(y_train[0:14]) # expect values from 0-(NUM_CLASSES -1)
print(f'X test fraction: {len(X_test)/len(X_blob)}, y test fraction: {len(y_test)/len(y_blob)}') # should be ~TRAIN_TEST_RATIO
for obj in [X_train, X_test, y_train, y_test]: # expecting torch.float32 for X and torch.int64 (long) for y
    print(obj.dtype)

In [None]:
# Now that the dataset properties look good, let's visualize it!
fig = plt.figure()
base_title = f'{NUM_CLASSES} blobs with {NUM_FEATURES} features'
if NUM_FEATURES >= 3:
    ax = fig.add_subplot(projection='3d') # create the 3D axes first so the title attaches to them
    ax.set_title(base_title + ' (first 3 dims.)')
    ax.scatter(X_train[:,0], X_train[:,1], X_train[:, 2], c=y_train)
elif NUM_FEATURES == 2:
    plt.title(base_title)
    plt.scatter(X_train[:,0], X_train[:,1], c=y_train)

In [None]:
# Now let's (again) create a linear model first to see how well it performs on the dataset.
# Unlike before, we want the architecture to be somewhat modular; we'll leave the number of hidden
# layers fixed, but the number of hidden units and input dims will be modular.

class LinearBlobModel(nn.Module):
    def __init__(self, num_in_features: int, num_hidden_units: int, num_out_features: int):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(in_features=num_in_features, out_features=num_hidden_units), # Note that these are LINEAR layers with no non-linear activation between them, so the stack can only learn linear decision boundaries
            nn.Linear(in_features=num_hidden_units, out_features=num_hidden_units),
            nn.Linear(in_features=num_hidden_units, out_features=num_out_features)
        )

    def forward(self, x) -> torch.Tensor:
        return self.network(x)

lin_model_0 = LinearBlobModel(NUM_FEATURES, 8, NUM_CLASSES) 
lin_model_0
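
To make the "linear layers only" point concrete, here is a minimal sketch in plain Python (no PyTorch) suggesting why stacking linear layers without an activation adds no expressive power: two stacked layers produce the same output as one merged layer. The weights are made up for illustration, and the helpers `matvec`, `matmul`, and `add` are hypothetical names, not part of any library.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Matrix product: (A @ B)[i][j] = sum_k A[i][k] * B[k][j]."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

# Made-up weights and biases: layer 1 maps 2 -> 3 features, layer 2 maps 3 -> 2.
W1, b1 = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]], [0.1, 0.2, 0.3]
W2, b2 = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.5]], [0.0, -0.5]
x = [0.7, -0.3]  # one made-up input point

# Apply the two layers one after the other: W2 (W1 x + b1) + b2
two_layer = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# Merge them into a single layer: (W2 W1) x + (W2 b1 + b2)
W_merged = matmul(W2, W1)
b_merged = add(matvec(W2, b1), b2)
one_layer = add(matvec(W_merged, x), b_merged)

print(two_layer)
print(one_layer)
```

Both printed vectors should agree up to floating-point rounding. Extra linear layers add parameters but not the ability to bend a decision boundary.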

In [None]:
# Now that the model is created, let's see what the output looks like
lin_model_0.eval() # switch to evaluation mode before running inference
with torch.inference_mode():
    untrained_logits = lin_model_0(X_train)
untrained_logits[0:4]

Looks like the model outputs a vector of logits for each training point, with one position per class. The logit at each position reflects how confident the model is that the training sample belongs to that class. We can convert these logits to probabilities with the softmax function.
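
As a rough illustration of what softmax does to one row of logits, here is a minimal standard-library sketch. The logit values are made up, and the `softmax` helper is a hypothetical stand-in for the per-row computation that `torch.softmax(..., dim=1)` performs.

```python
import math

def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    # Subtracting the max logit avoids overflow and does not change the result.
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

example_logits = [2.0, 1.0, 0.1, -1.0]  # made-up logits for one sample, 4 classes
example_probs = softmax(example_logits)

print([round(p, 3) for p in example_probs])
print(sum(example_probs))  # should be 1 up to floating-point rounding
```

The largest logit always maps to the largest probability, so the ranking of the classes is unchanged.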

In [None]:
untrained_preds = torch.softmax(untrained_logits, dim=1)
print(untrained_preds[0:4])
print(f'Predicted class for training sample 0: {torch.argmax(untrained_preds[0])}') # argmax over all classes, whatever NUM_CLASSES is
print(f'Actual class for training sample 0: {y_train[0]}')


Now every value lies between 0 and 1, and the values for a given training sample sum to 1. In the printed example, the class at index 2 has the highest probability, so the model predicts that training sample 0 belongs to class 2. The model gets it wrong, which is expected since it's untrained.
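
Because softmax preserves the ordering of the logits, the predicted class can be read from the logits directly. The sketch below checks this on made-up values using only the standard library. `softmax_row` and `argmax` are hypothetical helpers, and the labels are invented for the example.

```python
import math

def softmax_row(row):
    """Softmax over a single row of logits."""
    max_logit = max(row)
    exps = [math.exp(z - max_logit) for z in row]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

# Made-up logits for two samples and four classes, plus made-up true labels.
toy_logits = [[0.3, -1.2, 2.1, 0.0],
              [1.5, 0.2, -0.7, 0.9]]
toy_labels = [2, 3]

preds_from_logits = [argmax(row) for row in toy_logits]
preds_from_probs = [argmax(softmax_row(row)) for row in toy_logits]

# Fraction of samples whose predicted class matches the label.
accuracy = sum(p == t for p, t in zip(preds_from_logits, toy_labels)) / len(toy_labels)

print(preds_from_logits, preds_from_probs, accuracy)
```

Softmax is still needed when the probabilities themselves matter, for example when reporting how confident the model is.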

In [None]:
# With the linear blob model created, let's again define the optimizer and loss function.
# Since we're dealing with multi-class classification now, we need a loss function that handles more than two classes.
# CrossEntropyLoss, it's your time! It generalizes BCE to multiple classes.
optimizer = torch.optim.SGD(params=lin_model_0.parameters(), lr=0.2)
loss_fn = torch.nn.CrossEntropyLoss() # docs say this loss fn expects raw LOGITS (it applies log-softmax internally), unlike BCELoss, which expects probabilities.
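
To show roughly what cross-entropy computes from logits, here is a minimal standard-library sketch on made-up values. For each sample it takes the negative log of the softmax probability assigned to the true class, then averages over the batch. With integer class targets and the default mean reduction, `nn.CrossEntropyLoss` computes this same quantity. `cross_entropy_from_logits` is a hypothetical helper, not a PyTorch function.

```python
import math

def cross_entropy_from_logits(logits, target_index):
    """Negative log of the softmax probability of the true class."""
    # log-sum-exp with the max subtracted for numerical stability
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    # -log(softmax(logits)[target]) = log_sum_exp - logits[target]
    return log_sum_exp - logits[target_index]

# Made-up batch: two samples, four classes, integer class labels.
batch_logits = [[2.0, 0.5, -1.0, 0.0],   # fairly confident in class 0
                [0.0, 0.0, 0.0, 0.0]]    # no preference at all
batch_targets = [0, 3]

per_sample = [cross_entropy_from_logits(l, t)
              for l, t in zip(batch_logits, batch_targets)]
mean_loss = sum(per_sample) / len(per_sample)

print(per_sample)
print(mean_loss)
```

The second sample has equal logits for all four classes, so its loss is log(4). The first sample favors the correct class and therefore has a lower loss.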