Model Info
This notebook is an example of a machine learning pipeline for digit recognition.
It trains a logistic regression model to classify handwritten digits from the MNIST dataset. Digit classifiers of this kind have several practical applications:
    -Forms Processing: Automatic recognition and digitization of handwritten numeric fields in forms and documents.
    -Document Scanning: Helping scanners and software recognize and digitize handwritten digits.
    -Bank Forms: Digitization and processing of handwritten amounts and account numbers on banking forms.


1 - Preparing the Data:
    The first cell imports the libraries needed for machine learning, data handling, and visualization:
    scikit-learn (sklearn): A machine learning library that provides tools for data preprocessing, model training, and evaluation.
    pandas: A library for data manipulation and analysis.
    numpy: A library for numerical arrays.
    matplotlib.pyplot and seaborn: For plotting and visualization.
    The MNIST dataset itself is fetched from OpenML in the next step. It contains images of handwritten digits.

In [2]:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

2 - Fetching and Extracting Data
    fetch_openml('mnist_784', version=1): Fetches the MNIST dataset, which consists of 70,000 images of handwritten digits (0-9), each of size 28x28 pixels.
    X: Contains the image data (features), one row per image, each a flat array of 784 (28x28) pixel values.
    Y: Contains the corresponding labels (targets), which are the digits (0-9).

In [4]:
# Fetching mnist data
dtaset = fetch_openml('mnist_784', version=1)


In [None]:
# Extracting data and target
X = dtaset['data']
Y = dtaset['target'] 
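
One detail worth checking after extraction: as far as I know, fetch_openml returns the MNIST labels as strings ('0' to '9') rather than integers. The sketch below uses a small made-up Series as a stand-in for the real target column (the names `labels` and `labels_int` are illustrative, not part of the notebook) to show one way to convert such labels to integers.

```python
import pandas as pd

# Hypothetical stand-in for dtaset['target']: digit labels stored as strings
labels = pd.Series(['5', '0', '4', '1', '9'])

# Convert the string labels to integers so they can be compared numerically
labels_int = labels.astype(int)

print(labels_int.tolist())
```

The classifier works with either representation, so this conversion is optional. It mainly matters when labels are compared against integer values later on.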

3 - Splitting the Data
    The dataset is split into training and testing sets using an 80-20 split. The train_test_split function is used for this purpose.
    The training set (x_train, y_train) is used to train the model, and the testing set (x_test, y_test) is used to evaluate its performance.

In [None]:
# Split the dataset into training and testing sets (80% / 20%)
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=45)
# Convert the DataFrames to NumPy arrays so rows can be indexed by position
x_train = x_train.values
x_test = x_test.values
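
To make the 80-20 split concrete without downloading MNIST, the sketch below runs train_test_split on a small synthetic array. The names `features` and `targets` are placeholders, and the `stratify` argument is an optional addition (not used in the notebook) that keeps the class proportions equal in both subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the image data: 100 samples of 784 pixel values
features = np.zeros((100, 784))
targets = np.arange(100) % 10  # ten balanced classes, 0-9

# Same test_size and random_state as the notebook; stratify is an assumption
f_train, f_test, t_train, t_test = train_test_split(
    features, targets, test_size=0.2, random_state=45, stratify=targets
)

print(f_train.shape, f_test.shape)
```

With 100 samples, 80 rows go to the training set and 20 to the test set. On the full 70,000-image dataset the same ratio gives 56,000 and 14,000.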

4 - Reshaping and Visualizing Data:
    An image from the training set is selected and reshaped to 28x28 pixels for visualization.
    The reshaped image is displayed using matplotlib.
    plt.imshow(some_digit_image, cmap='gray'): Displays the image using a grayscale colormap.
    plt.axis('off'): Hides the axes for a cleaner image display.
    plt.show(): Renders the image.

In [None]:
# Reshape one training image to 28x28 and plot it
some_digit = x_train[36010]
some_digit_image = some_digit.reshape(28, 28)

plt.imshow(some_digit_image, cmap='gray')
plt.axis('off')
plt.show()
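
The reshape step can be checked in isolation. The sketch below uses a made-up array of 784 consecutive integers instead of real pixel data to show how a flat row maps onto a 28x28 grid, and that flattening the grid recovers the original row.

```python
import numpy as np

# Stand-in for one row of pixel values: 784 consecutive integers
flat = np.arange(784)

# Row-major reshape: the first 28 values become the top row of the image
image = flat.reshape(28, 28)

# Flattening again recovers the original 784-element row
restored = image.reshape(-1)

print(image.shape, restored.shape)
```

Because NumPy fills rows first, element 28 of the flat array lands at row 1, column 0 of the image.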

5 - Training the Logistic Regression Model
    A logistic regression model is instantiated with max_iter=1000, which raises the solver's iteration limit above the scikit-learn default of 100.
    The model is then fitted on the training data (x_train, y_train).

In [None]:
# Train the model
model = LogisticRegression(max_iter=1000)
model.fit(x_train, y_train)
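
Raw MNIST pixel values range from 0 to 255, and logistic regression solvers often converge faster on standardized features. The sketch below is one possible variation, not the notebook's method: it puts a StandardScaler in front of the classifier using a pipeline. To stay runnable without a download it uses scikit-learn's small bundled digits dataset (8x8 images from load_digits) as a stand-in for MNIST, and the names `scaled_model`, `d_train`, and `l_train` are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small bundled dataset (8x8 digit images) used here in place of MNIST
digits = load_digits()
d_train, d_test, l_train, l_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=45
)

# Scale each pixel to zero mean and unit variance, then fit the classifier
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(d_train, l_train)

# Accuracy on the held-out 20%
test_accuracy = scaled_model.score(d_test, l_test)
print(f"Held-out accuracy: {test_accuracy:.3f}")
```

Wrapping both steps in a pipeline means the scaler is fitted only on the data passed to fit, which avoids leaking test-set statistics into training.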

6 - Making Predictions:
    The trained model makes a prediction on the selected image.
    The predicted digit is printed.

In [None]:
# Predict the digit shown in the selected image
prediction = model.predict([some_digit])
print('The predicted digit is', prediction[0])
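
Besides the predicted label, a fitted LogisticRegression can report a probability for each class through predict_proba. The sketch below illustrates this on the bundled load_digits dataset rather than MNIST, so it runs without a download. The names `clf`, `sample`, and `best` are placeholders.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

# Small bundled dataset (8x8 digit images) used here in place of MNIST
digits = load_digits()
clf = LogisticRegression(max_iter=5000)
clf.fit(digits.data, digits.target)

sample = digits.data[0]

# predict expects a 2D input, hence the list around the single sample
predicted = clf.predict([sample])            # array with one label
probabilities = clf.predict_proba([sample])  # one row, one column per class

# The predicted label is the class with the highest probability
best = clf.classes_[np.argmax(probabilities[0])]
print(predicted[0], best, probabilities[0].round(3))
```

Inspecting the probabilities is a quick way to see how confident the model is about a given image, which the label alone does not show.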

7 - Evaluating the Model:
    The accuracy of the model is estimated using 3-fold cross-validation on the training set. The cross_val_score function performs this evaluation and returns one accuracy score per fold.

In [None]:
# Estimate accuracy with 3-fold cross-validation on the training set
cross_val_score(model, x_train, y_train, cv=3, scoring='accuracy')
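
Cross-validation only uses the training set, so the held-out test set from the split step can still serve as a final check. The sketch below shows one way to summarize the fold scores and then score the test set. It is an assumption about how the evaluation could be completed, and it runs on the bundled load_digits dataset instead of MNIST. The names `scores` and `test_acc` are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Small bundled dataset (8x8 digit images) used here in place of MNIST
digits = load_digits()
d_train, d_test, l_train, l_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=45
)

clf = LogisticRegression(max_iter=5000)

# One accuracy score per fold, computed on the training data only
scores = cross_val_score(clf, d_train, l_train, cv=3, scoring='accuracy')
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Fit on the full training set, then score the untouched test set once
clf.fit(d_train, l_train)
test_acc = accuracy_score(l_test, clf.predict(d_test))
print(f"Test accuracy: {test_acc:.3f}")
```

Note that cross_val_score fits clones of the estimator, so the model must still be fitted explicitly before predicting on the test set.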