This project implements a classic single-layer neural network for multiclass classification, following the Winner-Takes-All (WTA) approach. The key idea is that each language has a characteristic distribution of letter frequencies, and the model uses this distribution as its classification feature.
The model architecture:

- Single layer (no hidden layers).
- One perceptron per class: each perceptron is responsible for detecting one language.
- Each perceptron computes its own net value.
- The final predicted class is determined by selecting the perceptron with the maximum net output.
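A minimal sketch of this structure in Python might look like the following (the class and method names are illustrative, not taken from the project's source; the net value is computed as the weighted sum of the inputs minus the threshold):

```python
import random

class Perceptron:
    """One perceptron per language: 26 letter-frequency weights plus a threshold."""

    def __init__(self, n_inputs=26):
        # Random initial weights, normalized to unit length (as described above).
        w = [random.uniform(-1.0, 1.0) for _ in range(n_inputs)]
        norm = sum(v * v for v in w) ** 0.5
        self.weights = [v / norm for v in w]
        self.threshold = random.uniform(-1.0, 1.0)

    def net(self, x):
        """Net value: weighted sum of the input vector minus the threshold."""
        return sum(w * xi for w, xi in zip(self.weights, x)) - self.threshold
```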
Load text files for multiple languages written in the Latin alphabet.
Generate a normalized vector of letter occurrences (26 letters A–Z).
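For illustration, such a feature vector could be built like this (a sketch; here the vector is normalized to unit length, though dividing by the total letter count would serve the same purpose):

```python
def letter_frequencies(text):
    """Return a 26-element vector of A-Z letter counts, normalized to unit length."""
    counts = [0] * 26
    for ch in text.lower():
        if 'a' <= ch <= 'z':
            counts[ord(ch) - ord('a')] += 1
    norm = sum(c * c for c in counts) ** 0.5
    return [c / norm for c in counts] if norm > 0 else counts
```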
Randomly shuffle and split the dataset into training and test sets (e.g., 80/20 split).
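A shuffle-and-split helper might look like this (the 0.8 ratio mirrors the 80/20 example above):

```python
import random

def train_test_split(samples, train_ratio=0.8):
    """Shuffle (feature_vector, label) pairs and split them into training and test sets."""
    data = list(samples)
    random.shuffle(data)
    cut = int(len(data) * train_ratio)
    return data[:cut], data[cut:]
```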
Train each perceptron independently with the delta rule, updating both its weights and its threshold. Initial weights are randomized and normalized.
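Building on the `Perceptron` sketch above, one training pass for a single language's perceptron could look roughly like this (target 1 for texts of that language and 0 otherwise; the learning rate and binary output are assumptions, not details from the project):

```python
def train_epoch(perceptron, samples, positive_label, learning_rate=0.1):
    """One pass of the delta rule for a single perceptron.

    samples: iterable of (feature_vector, label) pairs.
    positive_label: the language this perceptron should detect.
    """
    for x, label in samples:
        target = 1 if label == positive_label else 0
        output = 1 if perceptron.net(x) >= 0 else 0
        error = target - output
        if error != 0:
            # Delta rule: move the weights toward (or away from) the input vector.
            perceptron.weights = [w + learning_rate * error * xi
                                  for w, xi in zip(perceptron.weights, x)]
            # The threshold moves in the opposite direction of the weights.
            perceptron.threshold -= learning_rate * error
```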
For new inputs, compute net values for all perceptrons and assign the label of the perceptron with the highest net.
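Prediction then reduces to an argmax over the per-language net values (again a sketch reusing names from the snippets above):

```python
def predict(perceptrons, x):
    """Winner-Takes-All: return the label whose perceptron yields the highest net value.

    perceptrons: dict mapping language label -> Perceptron.
    """
    return max(perceptrons, key=lambda label: perceptrons[label].net(x))
```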
The model evaluates accuracy on both the training and test sets.
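Accuracy on either split can be measured with a small helper such as the following, using the `predict` sketch above:

```python
def accuracy(perceptrons, samples):
    """Fraction of (feature_vector, label) pairs classified correctly."""
    correct = sum(1 for x, label in samples if predict(perceptrons, x) == label)
    return correct / len(samples) if samples else 0.0
```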
After training, a simple terminal interface lets you classify custom input text.
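The terminal loop itself can be as simple as reading a line, converting it to a letter-frequency vector, and printing the winner (a sketch reusing the helpers above; the actual prompts in the project may differ):

```python
def interactive_loop(perceptrons):
    """Classify user-supplied text until an empty line is entered."""
    while True:
        text = input("Enter text to classify (empty line to quit): ")
        if not text.strip():
            break
        print("Predicted language:", predict(perceptrons, letter_frequencies(text)))
```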
Sample text data is included in the text directory. The files contain text in several languages (English, Polish, and Spanish). The program expects text files in this format for training and classification.
Important: When classifying text manually, longer inputs give better results, because they let the model capture the language's letter distribution more reliably.