#Digit Recognition Using Random Forest Classifier

##Introduction
This project demonstrates the use of a Random Forest Classifier to recognize handwritten digits. The dataset stores each digit image as a row of pixel values, and the goal is to classify each image into one of the 10 digit classes (0-9). Random Forest is an ensemble learning method that combines the predictions of many decision trees, which typically gives better accuracy than a single tree.

##1. Loading the Dataset
We start by importing the necessary libraries and loading the dataset, which is stored in a CSV file. The dataset's shape and the first row are displayed to get an initial understanding of the data.

In [17]:
# Importing necessary libraries
import pandas as pd

# Loading the dataset
dataset = pd.read_csv('trainset.csv')

# Display the shape of the dataset and the first row
print(dataset.shape)
print(dataset.head(1))

(42000, 255)
   label  pixel0  pixel1  pixel2  pixel3  pixel4  pixel5  pixel6  pixel7  \
0      1       0       0       0       0       0       0       0       0   

   pixel8  ...  pixel244  pixel245  pixel246  pixel247  pixel248  pixel249  \
0       0  ...        77         0         0         0         0         0   

   pixel250  pixel251  pixel252  pixel253  
0         0         0         0         0  

[1 rows x 255 columns]
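
Beyond the shape and the first row, a few sanity checks can be useful before training. The sketch below uses a hypothetical helper, `summarize_digits`, which is not part of the original notebook, and runs it on a three-row in-memory stand-in rather than `trainset.csv`. It assumes the same layout: one `label` column followed by pixel columns.

```python
import io
import pandas as pd

def summarize_digits(df, label_col="label"):
    """Collect basic facts about a digit table (hypothetical helper)."""
    pixels = df.drop(columns=[label_col])
    return {
        "n_rows": len(df),
        "n_pixel_columns": pixels.shape[1],
        # How many examples each digit has, keyed by digit
        "label_counts": df[label_col].value_counts().sort_index().to_dict(),
        "pixel_min": int(pixels.to_numpy().min()),
        "pixel_max": int(pixels.to_numpy().max()),
        "n_missing": int(df.isna().sum().sum()),
    }

# Tiny stand-in for trainset.csv (made-up values, for illustration only)
tiny_csv = io.StringIO("label,pixel0,pixel1,pixel2\n1,0,77,0\n3,0,0,255\n1,12,0,0\n")
tiny = pd.read_csv(tiny_csv)

summary = summarize_digits(tiny)
print(summary)
```

On the real data, `summarize_digits(dataset)` would show whether the ten digit classes are roughly balanced and whether any values are missing.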


##2. Data Preprocessing
The dataset is divided into features (`x`) and the target variable (`y`). In this case, `x` represents the pixel values of the images, and `y` represents the corresponding digit labels.

In [7]:
# Splitting the dataset into features (x) and target variable (y)
x = dataset.iloc[:, 1:].values
y = dataset.iloc[:, 0].values
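
The positional split above relies on the label being the first column. As an alternative sketch, the same split can be done by column name, which keeps working if the column order changes. The small `demo` frame is a made-up stand-in for `dataset`.

```python
import pandas as pd

# Made-up stand-in with the same layout as the dataset: label first, then pixels
demo = pd.DataFrame({"label": [1, 3, 1], "pixel0": [0, 0, 12], "pixel1": [77, 0, 0]})

# Positional split, as in the cell above
x_pos = demo.iloc[:, 1:].values
y_pos = demo.iloc[:, 0].values

# Name-based split: does not depend on where the label column sits
x_named = demo.drop(columns="label").to_numpy()
y_named = demo["label"].to_numpy()

print(x_named.shape, y_named.shape)
```

Both approaches produce a 2D feature array and a 1D label array, which is the shape scikit-learn estimators expect.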

##3. Splitting the Data
To evaluate the model's performance, we split the data into training and testing sets. We use 75% of the data for training and reserve 25% for testing.

In [8]:
# Splitting the data into training and testing sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# Displaying the shapes of the training and testing sets
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)

(31500, 254)
(31500,)
(10500, 254)
(10500,)
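
A plain random split does not guarantee that every digit appears in the same proportion in both sets. As a sketch of one possible refinement (not used in the original notebook), `train_test_split` accepts a `stratify` argument that preserves class proportions. The arrays below are synthetic, with 20 examples per digit.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 200 samples, 8 fake pixel features, 20 samples per digit
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 256, size=(200, 8))
y_demo = np.repeat(np.arange(10), 20)

# stratify=y_demo keeps the per-class ratio identical in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo
)

train_counts = np.bincount(y_tr, minlength=10)
test_counts = np.bincount(y_te, minlength=10)
print(train_counts)
print(test_counts)
```

With the notebook's data this would correspond to passing `stratify=y` in the split call above.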


##4. Model Training
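We import the `RandomForestClassifier` from the `sklearn.ensemble` module and fit it on the training data with its default settings. Random Forest is chosen because it handles high-dimensional inputs such as pixel features well, and because averaging the votes of many randomized trees makes it less prone to overfitting than a single decision tree.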

In [9]:
# Importing and fitting the RandomForestClassifier
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier()
classifier.fit(x_train, y_train)
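
Because `RandomForestClassifier()` is created without a `random_state`, each run builds a slightly different forest. The sketch below shows how fixing the seed makes training repeatable. It uses scikit-learn's bundled `load_digits` data as a stand-in, since `trainset.csv` is not available here, and `n_estimators=50` is an arbitrary choice to keep the example fast.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in dataset bundled with scikit-learn (8x8 digit images), not trainset.csv
X_demo, y_demo = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

def fit_forest(seed):
    """Train a forest with an explicit tree count and a fixed seed."""
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    return model.fit(X_tr, y_tr)

first = fit_forest(0)
second = fit_forest(0)

# Two forests trained with the same seed should agree on every test sample
same_predictions = bool((first.predict(X_te) == second.predict(X_te)).all())
print(len(first.estimators_), same_predictions)
```

In the notebook, the equivalent change would be `RandomForestClassifier(random_state=0)`, which makes the reported accuracy reproducible.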

##5. Making Predictions
With the trained model, we predict the digit labels for the test set. This will allow us to compare the predicted labels with the actual labels and assess the model's accuracy.

In [11]:
# Predicting the test set results
y_pred = classifier.predict(x_test)
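
`predict` returns only the winning label. When it helps to know how sure the forest is, `predict_proba` returns the class probabilities averaged over the trees. The sketch below again uses `load_digits` as a stand-in dataset, so the numbers it prints do not describe the notebook's model.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in dataset bundled with scikit-learn, not trainset.csv
X_demo, y_demo = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

proba = model.predict_proba(X_te)   # shape: (n_samples, n_classes)
labels = model.predict(X_te)
confidence = proba.max(axis=1)      # probability assigned to the predicted class

# The least confident predictions are natural candidates for manual inspection
least_confident = np.argsort(confidence)[:5]
for i in least_confident:
    print(f'Predicted : {labels[i]} | Confidence : {confidence[i]:.2f}')
```

For the notebook's model the corresponding call would be `classifier.predict_proba(x_test)`.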


##6. Model Evaluation
To evaluate the model's performance, we randomly select five predictions and display them alongside their actual labels. This provides a quick spot check of how well the model is performing, although five samples are too few to estimate accuracy. The accuracy score, computed in the final cell, gives the percentage of test digits that were classified correctly.

In [15]:
# Displaying 5 random predictions with their actual values
from random import randint
for _ in range(5):
  # randint is inclusive on both ends, so the last valid index is len(y_pred) - 1
  index = randint(0, len(y_pred) - 1)
  print(f'Actual : {y_test[index]} | Predicted : {y_pred[index]}')



Actual : 3 | Predicted : 3
Actual : 3 | Predicted : 3
Actual : 4 | Predicted : 4
Actual : 8 | Predicted : 3
Actual : 1 | Predicted : 1
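
A spot check shows individual hits and misses, but a confusion matrix summarizes which digits get mistaken for which. As a small sketch, the five pairs printed above are fed to `confusion_matrix` from `sklearn.metrics`. Rows are actual labels and columns are predicted labels, in the order given by `labels`.

```python
from sklearn.metrics import confusion_matrix

# The five (actual, predicted) pairs from the sample output above
actual = [3, 3, 4, 8, 1]
predicted = [3, 3, 4, 3, 1]
labels = [1, 3, 4, 8]  # digits that occur in this small sample

# cm[i, j] counts samples whose actual label is labels[i] and predicted label is labels[j]
cm = confusion_matrix(actual, predicted, labels=labels)
print(cm)
```

The single off-diagonal entry is the 8 that was predicted as 3. On the full test set, `confusion_matrix(y_test, y_pred)` would give the same view for all ten digits.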


##7. Conclusion
The Random Forest Classifier achieved an accuracy of approximately 76% on the 10,500-image test set, which is a solid baseline for a model trained with default settings. To further improve accuracy, additional steps such as hyperparameter tuning, feature engineering, and experimenting with different algorithms could be explored.
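
As one way the hyperparameter tuning mentioned above might look, the sketch below runs a small grid search with `GridSearchCV`. It was not part of the original experiment: the data is scikit-learn's bundled `load_digits` set used as a stand-in, and the grid values are illustrative assumptions rather than recommended settings.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Stand-in dataset bundled with scikit-learn, not trainset.csv
X_demo, y_demo = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

# Illustrative grid only: 2 x 2 = 4 candidate settings
param_grid = {"n_estimators": [25, 50], "max_depth": [None, 10]}

# 3-fold cross-validation on the training set picks the best combination
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="accuracy"
)
search.fit(X_tr, y_tr)

print(search.best_params_)
print(f'Test accuracy : {search.score(X_te, y_te) * 100:.1f}')
```

The test set is touched only once, after the search, so the final score is not biased by the tuning itself.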



In [16]:
# Calculating and printing the accuracy of the model
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)*100
# The built-in round works whether accuracy is a Python float or a NumPy scalar
print(f'Accuracy : {round(accuracy, 0)}')

Accuracy : 76.0
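
To make the metric concrete, accuracy is the number of matching labels divided by the total number of labels. The sketch below computes it by hand for the five sample pairs shown in the evaluation section. `accuracy_percent` is a hypothetical helper written for illustration, not a scikit-learn function.

```python
def accuracy_percent(actual, predicted):
    """Percentage of positions where the predicted label equals the actual one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return 100 * correct / len(actual)

# The five (actual, predicted) pairs from the sample output
actual = [3, 3, 4, 8, 1]
predicted = [3, 3, 4, 3, 1]

print(f'Accuracy : {accuracy_percent(actual, predicted)}')  # 4 of 5 match -> 80.0
```

The 80% here reflects only five samples, which is why it differs from the 76% measured on the full test set.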


###`Github: hellopavi`