**a. Problem formulation:**

Identify handwritten digits (0-9) in grayscale images with a resolution of 28 x 28 pixels.

**b. How train and test data are sampled:**
- The train and test data are already split at a 6:1 ratio: around 60,000 images for training and 10,000 for testing.

**c. Feature transformation process:**

*CNN:*
- Pixel values are normalized from the range [0, 255] to the range [0, 1].
- Images are two-dimensional (width, height), so a third dimension (channel) of size 1 is added, because the CNN implementation expects three-dimensional inputs.
- To match the format of the output layer, the categorical digit labels are converted into a binary matrix using one-hot encoding.
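
The three CNN preprocessing steps above can be sketched in NumPy as follows. This is a minimal illustration, not the project's actual code: the function name `preprocess_for_cnn` and the random stand-in data are assumptions made for the example.

```python
import numpy as np

def preprocess_for_cnn(images, labels, num_classes=10):
    """images: uint8 array (N, 28, 28); labels: int array (N,)."""
    # Scale pixel intensities from [0, 255] to [0, 1].
    x = images.astype("float32") / 255.0
    # Append a channel axis of size 1: (N, 28, 28) -> (N, 28, 28, 1).
    x = np.expand_dims(x, axis=-1)
    # One-hot encode: row i has a 1 in column labels[i] and 0 elsewhere.
    y = np.eye(num_classes, dtype="float32")[labels]
    return x, y

# Stand-in data with the same shape and dtype as MNIST (not real digits).
rng = np.random.default_rng(0)
demo_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
demo_labels = np.array([3, 0, 9, 3])
x, y = preprocess_for_cnn(demo_images, demo_labels)
print(x.shape, y.shape)  # (4, 28, 28, 1) (4, 10)
```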

*ANN:*
- Edges are detected with the Sobel operator (edges are an important feature in images).
- Pixel values are normalized from the range [0, 255] to the range [0, 1].
- Images are flattened into a 1D vector.
- To match the format of the output layer, the categorical digit labels are converted into a binary matrix using one-hot encoding.
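
A rough NumPy sketch of the ANN feature pipeline (Sobel edges, normalization, flattening) is shown below. The helper names are hypothetical. The report does not say how the Sobel output is brought back into [0, 1], so dividing each edge map by its own maximum is an assumption made here (raw Sobel magnitudes can exceed 255). Border pixels are handled by replicating the image edge, which is also an assumption.

```python
import numpy as np

def sobel_magnitude(image):
    """Gradient magnitude of one 2D image using the 3x3 Sobel kernels."""
    img = image.astype("float32")
    p = np.pad(img, 1, mode="edge")  # replicate borders so the output stays 28x28
    # Horizontal gradient: right column minus left column, weights 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    # Vertical gradient: bottom row minus top row, weights 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.hypot(gx, gy)

def preprocess_for_ann(images):
    """images: uint8 array (N, 28, 28) -> float32 array (N, 784) in [0, 1]."""
    edges = np.stack([sobel_magnitude(im) for im in images])
    # Assumption: rescale each edge map by its own maximum value.
    peak = edges.reshape(len(edges), -1).max(axis=1).reshape(-1, 1, 1)
    edges = edges / np.maximum(peak, 1e-8)
    # Flatten each 28x28 edge map into a 784-element vector.
    return edges.reshape(len(edges), -1).astype("float32")

# Synthetic test image: black left half, white right half (one vertical edge).
demo = np.zeros((1, 28, 28), dtype=np.uint8)
demo[0, :, 14:] = 255
flat = preprocess_for_ann(demo)
print(flat.shape)  # (1, 784)
```

On the synthetic image, only the two columns on either side of the black/white boundary respond, which is the behaviour expected from an edge detector.
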

**d. Architectures:**

*CNN:*
- Input layer: Accepts a three-dimensional image with a single channel.
- First convolutional layer: 32 filters, kernel size of 3x3, ReLU as activation function. Outputs a 3D shape of (26, 26, 32).
- First max-pooling layer: Downsamples the feature map with a 2x2 window by taking the max value. Outputs a shape of (13, 13, 32).
- Second convolutional layer: 64 filters, kernel size of 3x3, ReLU as activation function. Outputs a shape of (11, 11, 64).
- Second max-pooling layer: Downsamples the feature map with a 2x2 window. Outputs a shape of (5, 5, 64).
- Flatten layer: Flattens the output of the second max-pooling layer into a 1D vector. Output size = 1600.
- Fully connected layer: Connects the flattened vector to 128 neurones, with ReLU as activation function and L2 regularization with a penalty of 0.001 to reduce overfitting. Output size = 128.
- Dropout layer: Randomly disables 50% of the neurones during training, which also helps reduce overfitting.
- Output layer: Fully connects the output of the dropout layer to 10 neurones representing the output classes (digits 0-9).
- Hyperparameters:
	- Optimizer: Adam with a learning rate of 0.001
	- Loss Function: Categorical cross-entropy (multi-class classification).
	- Batch Size: 32
	- Epochs: 10
	- Regularization: L2 in the fully-connected layer with 0.001 for penalty factor.
	- Dropout rate: 0.5
	- Callbacks: one stops training if no improvement is observed over 3 epochs, and another reduces the learning rate if needed.
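
The layer output shapes listed above can be checked with a few lines of arithmetic. The sketch below assumes stride-1 convolutions without padding and non-overlapping 2x2 pooling, which is what the reported shapes (28 to 26 to 13 to 11 to 5) imply. The parameter counts additionally assume every layer has a bias term, which the report does not state.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

size, channels = 28, 1
params = {}

# Conv 3x3, 32 filters: (kernel area * input channels + 1 bias) per filter.
params["conv1"] = (3 * 3 * channels + 1) * 32
size, channels = conv_out(size, 3), 32      # 28 -> 26, 32 channels
size = conv_out(size, 2, stride=2)          # max-pool 2x2: 26 -> 13

# Conv 3x3, 64 filters on 32 input channels.
params["conv2"] = (3 * 3 * channels + 1) * 64
size, channels = conv_out(size, 3), 64      # 13 -> 11, 64 channels
size = conv_out(size, 2, stride=2)          # max-pool 2x2: 11 -> 5 (last row/column dropped)

flat = size * size * channels               # 5 * 5 * 64 = 1600
params["dense"] = flat * 128 + 128          # weights + biases
params["output"] = 128 * 10 + 10
total = sum(params.values())

print(flat)    # 1600
print(params)
print(total)   # 225034
```

Under these assumptions, the fully connected layer accounts for most of the trainable parameters, which is consistent with applying L2 regularization and dropout at that point in the network.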

*ANN:*
- Input layer: Its shape matches the flattened input (784 values for a 28 x 28 image) and is detected automatically.
- First hidden layer: Dense layer, 128 neurones, ReLU as activation function and L2 as regularizer. Followed by a dropout layer with a rate of 30% to reduce overfitting.
- Second hidden layer: Dense layer, 64 neurones, ReLU as activation function and L2 as regularizer. Followed by a dropout layer with a rate of 30%.
- Third hidden layer: Dense layer, 32 neurones, ReLU as activation function and L2 as regularizer. Followed by a dropout layer with a rate of 30%.
- Output layer: Dense layer, 10 neurones, with softmax as activation function to convert the raw outputs into a probability distribution over the 10 digit classes.
- Hyperparameters:
	- Optimizer: Adam with an adaptive learning rate.
	- Loss Function: Categorical cross-entropy (multi-class classification).
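
To make the softmax output and the categorical cross-entropy loss concrete, here is a small NumPy sketch. The logits are made-up values for illustration, and the function names are hypothetical. A deep-learning framework would normally provide both operations.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_cross_entropy(probs, one_hot, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    return float(-np.mean(np.sum(one_hot * np.log(probs + eps), axis=1)))

# Hypothetical raw scores for two samples over the 10 digit classes.
logits = np.array([[2.0, 1.0, 0.1] + [0.0] * 7,
                   [0.0] * 10])
labels = np.eye(10)[[0, 4]]          # true classes: 0 and 4, one-hot encoded

probs = softmax(logits)
loss = categorical_cross_entropy(probs, labels)
print(probs.sum(axis=1))             # each row sums to 1
print(loss)
```

The second sample has identical scores for all classes, so softmax assigns each class a probability of 0.1 and that sample contributes -log(0.1), about 2.30, to the loss.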

**e. Evaluation metrics:**

*CNN*
- Training and validation loss: Used to monitor overfitting (training loss decreases while validation loss increases) and underfitting (both training and validation loss remain high).
- Training accuracy: Measures the performance of the model on the training data.
- Test accuracy: Measures the performance of the model on unseen data.
- F1-score: Because this is a multi-class classification problem, the F1-score gives a reliable measure of performance even when classes are imbalanced.
- Confusion Matrix: Provides a detailed view of model performance at the class level.
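
As an illustration of how the confusion matrix and F1-score relate, the sketch below builds a confusion matrix from predicted labels and derives a macro-averaged F1 from it. The report does not say which F1 averaging was used, so macro averaging is an assumption, and the tiny three-class label arrays are invented for the example.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=10):
    """cm[i, j] = number of samples with true class i predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def macro_f1(cm):
    """Unweighted mean of the per-class F1 scores."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp   # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp   # belonged to the class, but missed
    denom = 2 * tp + fp + fn
    # Convention assumed here: a class with no true or predicted samples gets F1 = 0.
    f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return float(f1.mean())

# Invented labels for a 3-class toy problem.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)
print(macro_f1(cm))
```

Off-diagonal entries of the matrix show which pairs of classes are confused with each other, while the F1-score condenses precision and recall per class into a single number.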

*ANN*
- Training and Validation Loss: Loss stabilizes at 0.40 for both, with no signs of overfitting or underfitting.
- Training Accuracy: Reaches 92%, showing strong performance on training data.
- Validation Accuracy: Achieves 90%, indicating good generalization to unseen data.
- F1-Score: Likely high, as the confusion matrix shows strong class-wise performance with minimal imbalance effects.
- Confusion Matrix: High accuracy across classes, with minor misclassifications (e.g., between 8 and 9).

**3. Results:**

*CNN*
- Loss and accuracy change significantly after the first epoch, then improve gradually until the 10th epoch, reaching an accuracy and loss of (0.9858, 0.1058) for training and (0.9891, 0.0892) for validation. No improvement in performance is observed after the 9th epoch.

*ANN*

- The loss and accuracy significantly improve after the first few epochs, with both metrics gradually stabilizing as training progresses. By the 18th epoch, the training accuracy and loss reach approximately (0.92, 0.40), while the validation accuracy and loss are around (0.90, 0.40). The model shows consistent performance across training and validation datasets, with no major overfitting or divergence. We observe minimal improvement in accuracy and loss after the 15th epoch, indicating convergence in performance.

**4. In addition to the results discussed in the previous section:**

- Some classes are harder to identify than others because they resemble one another, e.g. 8 and 9.
- Real-world images may differ from the MNIST dataset, for example through noise or different colors, which may affect the result even after conversion to grayscale.
