## MonReader (Computer Vision)

#### Background:

##### MonReader is a new mobile document digitization experience for the blind, for researchers and for everyone else in need for fully automatic, highly fast and high-quality document scanning in bulk. It is composed of a mobile app and all the user needs to do is flip pages and everything is handled by MonReader: it detects page flips from low-resolution camera preview and takes a high-resolution picture of the document, recognizing its corners and crops it accordingly, and it dewarps the cropped document to obtain a bird's eye view, sharpens the contrast between the text and the background and finally recognizes the text with formatting kept intact, being further corrected by MonReader's ML powered redactor.

#### Data Description:

##### We collected page flipping video from smart phones and labelled them as flipping and not flipping.

##### We clipped the videos as short videos and labelled them as flipping or not flipping. The extracted frames are then saved to disk in a sequential order with the following naming structure: VideoID_FrameNu

#### Goal(s):

##### Predict if the page is being flipped using a single image.

#### Success Metrics:

##### Evaluate model performance based on F1 score, the higher the better.

#### Bonus(es):

##### Predict if a given sequence of images contains an action of flipping.

##### First, import necessary libraries.

In [31]:
import pandas as pd
import numpy as np
import torch # PyTorch package
from torch import nn, optim
from torch.utils.data import Dataset, DataLoader
from torchvision import *
from torchsummary import summary
from pytorchtools import EarlyStopping
import time
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import splitfolders
import os
import re
import PIL
from PIL import Image
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
matplotlib.use("Agg")
import warnings
warnings.filterwarnings('ignore')

#importing functions and parameters files
from functions import Net, file_screener
from params import bs, num_epochs, device, patience, lr

##### Let's split the training folder into training and validation folders, load and transform the data.

In [2]:
#only run one time
#splitfolders.ratio("images/training", output="training-validation",
#    seed=1337, ratio=(.8, .2), group_prefix=None, move=False) #

In [3]:
transform = transforms.Compose([
    transforms.CenterCrop(1080),
    transforms.Resize(224),
    transforms.ToTensor()])

In [4]:
#loading the data
train_data = datasets.ImageFolder('training-validation/train', transform = transform)
val_data = datasets.ImageFolder('training-validation/val', transform = transform)
test_data = datasets.ImageFolder('images/testing', transform = transform)
print(train_data.classes)

['flip', 'notflip']


##### We can see that there are two classes, flip and notflip.

In [5]:
# initialize the train, validation, and test data loaders
trainloader = torch.utils.data.DataLoader(train_data, shuffle = True, batch_size = bs)
valloader = torch.utils.data.DataLoader(val_data, batch_size = bs)
testloader = torch.utils.data.DataLoader(test_data, batch_size = bs)

In [6]:
# calculate steps per epoch for training and validation set
trainSteps = len(trainloader.dataset) // bs
valSteps = len(valloader.dataset) // bs

In [7]:
train_inputs, train_classes = next(iter(trainloader))
print(f'train input size: {train_inputs.shape}, train class size: {train_classes.shape}')
val_inputs, val_classes = next(iter(valloader))
print(f'val input size: {val_inputs.shape}, val class size: {val_classes.shape}')
test_inputs, test_classes = next(iter(testloader))
print(f'test input size: {test_inputs.shape}, test class size: {test_classes.shape}')

train input size: torch.Size([16, 3, 224, 224]), train class size: torch.Size([16])
val input size: torch.Size([16, 3, 224, 224]), val class size: torch.Size([16])
test input size: torch.Size([16, 3, 224, 224]), test class size: torch.Size([16])


In [8]:
# visualizing the images
plt.figure(figsize=(20, 10))
for i in range(5):
    plt.subplot(1, 5, i+1)
    plt.imshow(train_inputs[i].permute(1, 2, 0))
    plt.title([train_data.classes[train_classes[i]]])

In [9]:
# measure how long training is going to take
print("[INFO] training the network...")
startTime = time.time()

[INFO] training the network...


##### We created a CNN model to use for image prediction using the Binary Cross Entropy as our loss function and Adam as our optimizer.

In [10]:
#calling our model
model = Net()

# initialize our optimizer and loss function
# specify loss function
lossFn = nn.BCELoss() 
# specify optimizer
opt = optim.Adam(model.parameters(), lr=lr)

##### We are implementing early stopping while training the model in order to prevent overfitting. Early stopping keeps track of the validation loss and is used to stop the training, if the loss stops decreasing.

In [11]:
#Train the model using early stopping
def train_model(model, bs, patience, num_epochs):

	# to store training history
	train_loss = []
	train_acc = []
	val_loss = []
	val_acc = []

	# initialize the early_stopping object
	early_stopping = EarlyStopping(patience=patience, verbose=True)

	#training loop
	# loop over our epochs
	for e in range(0, num_epochs):
		# set the model in training mode
		model.train()
		# initialize the total training and validation loss
		totalTrainLoss = 0
		totalValLoss = 0
		# initialize the number of correct predictions in the training
		# and validation step
		trainCorrect = 0
		valCorrect = 0
		# loop over the training set
		for (x, y) in trainloader:
			# send the input to the device
			(x, y) = (x.to(device), y.to(device))
			# clear the gradients of all optimized variables
			opt.zero_grad()
			# perform a forward pass and calculate the training loss
			pred = model(x)
			loss = lossFn(pred, y.type(torch.float32).unsqueeze(1))
			# zero out the gradients, perform the backpropagation step,
			# and update the weights
			loss.backward()
			opt.step()
			# add the loss to the total training loss so far and
			# calculate the number of correct predictions
			totalTrainLoss += loss
			trainCorrect += (torch.round(pred) == y).type(torch.float).mean().item()

		# Zeroing gradient, performing backpropagation, updating weights of the model
		# set the model in evaluation mode
		with torch.no_grad():
			model.eval() # prep model for evaluation
			# loop over the validation set
			for (x, y) in valloader:
				# send the input to the device
				(x, y) = (x.to(device), y.to(device))
				# make the predictions and calculate the validation loss
				pred = model(x)
				totalValLoss += lossFn(pred, y.type(torch.float32).unsqueeze(1))
				# calculate the number of correct predictions
				valCorrect += (torch.round(pred) == y).type(torch.float).mean().item()
	
		# calculate the average training and validation loss
		avgTrainLoss = totalTrainLoss / trainSteps
		avgValLoss = totalValLoss / valSteps
		
		# calculate the training and validation accuracy
		avgtrainCorrect = trainCorrect / len(trainloader)
		avgvalCorrect = valCorrect / len(valloader)
		
		# update our training history
		train_loss.append(avgTrainLoss.cpu().detach().numpy())
		train_acc.append(avgtrainCorrect)
		val_loss.append(avgValLoss.cpu().detach().numpy())
		val_acc.append(avgvalCorrect)
		
		# print the model training and validation information
		print("[INFO] Epoch: {}/{}".format(e + 1, num_epochs))
		print("Train loss: {:.6f}, Train accuracy: {:.4f}".format(avgTrainLoss, avgtrainCorrect))
		print("Valid loss: {:.6f}, Valid accuracy: {:.4f}\n".format(avgValLoss, avgvalCorrect))

		# early_stopping needs the validation loss to check if it has decresed, 
		# and if it has, it will make a checkpoint of the current model
		early_stopping(avgValLoss, model)
        
		if early_stopping.early_stop:
			print("Early stopping")
			break
	
	# load the last checkpoint with the best model
	model.load_state_dict(torch.load('checkpoint.pt'))
	
	return  model, train_loss, train_acc, val_loss, val_acc

In [13]:
model, AvgTrainLoss, avgtrainCorrect, avgValLoss, avgvalCorrect = train_model(model, bs, patience, num_epochs)

[INFO] Epoch: 1/50
Train loss: 0.701250, Train accuracy: 0.5156
Valid loss: 0.716323, Valid accuracy: 0.5146

Validation loss decreased (inf --> 0.716323).  Saving model ...
[INFO] Epoch: 2/50
Train loss: 0.672559, Train accuracy: 0.4917
Valid loss: 0.582616, Valid accuracy: 0.7659

Validation loss decreased (0.716323 --> 0.582616).  Saving model ...
[INFO] Epoch: 3/50
Train loss: 0.342422, Train accuracy: 0.5254
Valid loss: 0.318296, Valid accuracy: 0.8668

Validation loss decreased (0.582616 --> 0.318296).  Saving model ...
[INFO] Epoch: 4/50
Train loss: 0.177804, Train accuracy: 0.5238
Valid loss: 0.198784, Valid accuracy: 0.9184

Validation loss decreased (0.318296 --> 0.198784).  Saving model ...
[INFO] Epoch: 5/50
Train loss: 0.115631, Train accuracy: 0.5321
Valid loss: 0.136663, Valid accuracy: 0.9413

Validation loss decreased (0.198784 --> 0.136663).  Saving model ...
[INFO] Epoch: 6/50
Train loss: 0.048369, Train accuracy: 0.5309
Valid loss: 0.174245, Valid accuracy: 0.9057



##### The model stopped training after the 26th epoch.

In [14]:
# visualize the loss as the network trained
fig = plt.figure(figsize=(10,8))
plt.plot(range(1,len(AvgTrainLoss)+1),AvgTrainLoss, label='Training Loss')
plt.plot(range(1,len(avgValLoss)+1),avgValLoss,label='Validation Loss')

# find position of lowest validation loss
minposs = avgValLoss.index(min(avgValLoss))+1 
plt.axvline(minposs, linestyle='--', color='r',label='Early Stopping Checkpoint')

plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.ylim(0, 0.5) # consistent scale
plt.xlim(0, len(AvgTrainLoss)+1) # consistent scale
plt.grid(True)
plt.legend()
plt.tight_layout()
plt.show()
fig.savefig('loss_plot.png', bbox_inches='tight')

In [15]:
# finish measuring how long training took
endTime = time.time()
print("[INFO] total time taken to train the model: {:.2f}s".format(endTime - startTime))

[INFO] total time taken to train the model: 5053.38s


##### We will now test the model.

In [16]:
def test_model(model, lossFn, testloader):
  model.eval()
  test_loss = 0
  correct = 0
  pred_list, true_list = [], []

  with torch.no_grad():
    for (x, y) in testloader:
      if len(x.shape) == 3:
        x = torch.unsqueeze(x,0)
      out = model(x).flatten()
      if type(x)!=type(y):
        y=torch.Tensor([y])
      test_loss += lossFn(out, y.type(torch.float32))
      correct += torch.round(out).eq(y).sum()
      pred_list.append(torch.round(out))
      true_list.append(y.type(torch.float32))

      # Print every 100 iterations
      if (i + 1) % 100 == 0:
        print(f"Iteration {i+1}/{len(testloader)}")
      
    test_loss /= len(testloader)
    print('\nTest set: Avg. loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
            test_loss, correct, len(testloader),
            100. * correct / len(testloader)))
    
  return pred_list, true_list

predictions, labels = test_model(model, lossFn, test_data)


Test set: Avg. loss: 0.0356, Accuracy: 592/597 (99%)



In [17]:
# generate a classification report
print(classification_report([i.item() for i in predictions], [i.item() for i in labels]))

              precision    recall  f1-score   support

         0.0       0.99      1.00      0.99       287
         1.0       1.00      0.99      0.99       310

    accuracy                           0.99       597
   macro avg       0.99      0.99      0.99       597
weighted avg       0.99      0.99      0.99       597



##### We got a F1-score of 99%.

#### Classifying Sequence of Images

##### In order to classify sequences of images, we first divide the files by name (ID).

In [18]:
#To write a list of file names

#Define the path to the folder
#for loop for folders inside all folders
folder_path = "images/training"

# Get all files
all_files = []
for fold in os.listdir(folder_path):
    all_files.extend(os.listdir(os.path.join(folder_path, fold)))

# Get the IDs
all_files = pd.Series(all_files)
unique_ids = all_files.str.split('_', n=1, expand=True)[0].unique()

# Dictionary to store counts of files for each unique ID
file_counts = {id_: 0 for id_ in unique_ids}

# Count files for each unique ID
for file_name in all_files:
    file_id = file_name.split('_', 1)[0]
    if file_id in file_counts:
        file_counts[file_id] += 1

##### We will now train the model by iterating over each sequence (ID).

In [27]:
#calling our model
seqmodel = Net()

# Iterate over the items of the dictionary
for id_, count in file_counts.items():
    print(f'Processing ID: {id_}, Count: {count}')

    # Create a DataLoader for the current ID
    seq_data = datasets.ImageFolder('images/training', transform=transform, is_valid_file=file_screener)
    seq_data.samples = [(sample, label) for sample, label in seq_data.samples if f'{id_}_' in sample]
    seq_loader = DataLoader(seq_data, batch_size=1, shuffle=False)

    # set the model in training mode
    seqmodel.train()
    # Iterate over the data loader
    for images, labels in seq_loader:
        opt.zero_grad()
        outputs = seqmodel(images)
        loss = lossFn(outputs, labels.unsqueeze(1).type(torch.float32))
        loss.backward()
        opt.step()

Processing ID: 0001, Count: 40
Processing ID: 0002, Count: 31
Processing ID: 0003, Count: 48
Processing ID: 0004, Count: 46
Processing ID: 0005, Count: 50
Processing ID: 0006, Count: 41
Processing ID: 0007, Count: 51
Processing ID: 0008, Count: 48
Processing ID: 0009, Count: 43
Processing ID: 0010, Count: 48
Processing ID: 0011, Count: 48
Processing ID: 0012, Count: 17
Processing ID: 0013, Count: 22
Processing ID: 0014, Count: 12
Processing ID: 0015, Count: 49
Processing ID: 0016, Count: 44
Processing ID: 0017, Count: 17
Processing ID: 0018, Count: 21
Processing ID: 0019, Count: 40
Processing ID: 0020, Count: 31
Processing ID: 0021, Count: 23
Processing ID: 0022, Count: 16
Processing ID: 0023, Count: 39
Processing ID: 0024, Count: 30
Processing ID: 0025, Count: 34
Processing ID: 0026, Count: 36
Processing ID: 0027, Count: 30
Processing ID: 0028, Count: 53
Processing ID: 0029, Count: 37
Processing ID: 0030, Count: 52
Processing ID: 0031, Count: 34
Processing ID: 0032, Count: 32
Processi

##### We will now test the model over sequences of images. In order to test accuracy, if the maximum probability of a sequence exceeeds 0.5, the sequence will be labeled as 1.

In [28]:
#Testing for sequence of images.
#To write a list of file names

#Define the path to the folder
#for loop for folders inside all folders
folder_path_test = "images/testing"

# Get all files
test_files = []
for fold in os.listdir(folder_path_test):
    test_files.extend(os.listdir(os.path.join(folder_path_test, fold)))

# Get the IDs
test_files = pd.Series(test_files)
test_unique_ids = test_files.str.split('_', n=1, expand=True)[0].unique()

# Dictionary to store counts of files for each unique ID
testfile_counts = {id_: 0 for id_ in test_unique_ids}

# Count files for each unique ID
for file_name in test_files:
    test_file_id = file_name.split('_', 1)[0]
    if test_file_id in testfile_counts:
        testfile_counts[test_file_id] += 1

In [30]:
# Initialize lists to collect true and predicted labels for all IDs
all_true_labels = []
all_predicted_labels = []

# Iterate over each unique ID
for id_ in test_unique_ids:
    print(f'Processing Test ID: {id_}')

    # Create a DataLoader for the current ID
    test_seq_data = datasets.ImageFolder('images/testing', transform=transform, is_valid_file=file_screener)
    test_seq_data.samples = [(sample, label) for sample, label in test_seq_data.samples if f'{id_}_' in sample]
    test_seq_loader = DataLoader(test_seq_data, batch_size=1, shuffle=False)

    true_labels = []
    predicted_labels = []

    # Iterate over the DataLoader for the current ID
    for images, labels in test_seq_loader:
        with torch.no_grad():
            if len(images.shape) == 3:
                images = torch.unsqueeze(images, 0)
            outputs = seqmodel(images).flatten()

            # Append true and predicted labels
            true_labels.extend(labels.cpu().numpy())
            predicted_labels.extend(torch.round(outputs).cpu().numpy())

    # Calculate the predicted label as the maximum probability
    predicted_label = 1 if np.max(outputs.numpy()) > 0.5 else 0
    
    # Append the predicted label for the current ID
    all_predicted_labels.append(predicted_label)

    # Check if there is at least one positive label in the ground truth labels
    true_label = 1 if np.sum(true_labels) > 0 else 0

    # Append the true label for the current ID
    all_true_labels.append(true_label)

# Convert the lists of true and predicted labels to numpy arrays
all_true_labels = np.array(all_true_labels)
all_predicted_labels = np.array(all_predicted_labels)

# Calculate the overall accuracy
overall_accuracy = np.sum(all_true_labels == all_predicted_labels) / len(all_true_labels) * 100

# Print the overall accuracy
print(f'Overall Test Accuracy: {overall_accuracy:.2f}%')

# Generate and print the overall classification report
overall_report = classification_report(all_true_labels, all_predicted_labels)
print("Overall Classification Report:")
print(overall_report)


Processing Test ID: 0001
Processing Test ID: 0002
Processing Test ID: 0003
Processing Test ID: 0004
Processing Test ID: 0005
Processing Test ID: 0006
Processing Test ID: 0007
Processing Test ID: 0008
Processing Test ID: 0009
Processing Test ID: 0010
Processing Test ID: 0011
Processing Test ID: 0012
Processing Test ID: 0013
Processing Test ID: 0014
Processing Test ID: 0015
Processing Test ID: 0016
Processing Test ID: 0017
Processing Test ID: 0018
Processing Test ID: 0019
Processing Test ID: 0020
Processing Test ID: 0021
Processing Test ID: 0022
Processing Test ID: 0024
Processing Test ID: 0026
Processing Test ID: 0027
Processing Test ID: 0028
Processing Test ID: 0029
Processing Test ID: 0030
Processing Test ID: 0031
Processing Test ID: 0032
Processing Test ID: 0033
Processing Test ID: 0034
Processing Test ID: 0035
Processing Test ID: 0036
Processing Test ID: 0037
Processing Test ID: 0038
Processing Test ID: 0039
Processing Test ID: 0040
Processing Test ID: 0041
Processing Test ID: 0042


##### The overall test accuracy for the sequences of images was found to be 80%.

### Conclusion

##### In this project, we've successfully trained a conventional CNN model achieving a 99% accuracy in distinguishing between flipped and non-flipped images, and an 80% accuracy in categorizing sequences of images. The insights gained from image and video analyses conducted in this project hold significant relevance across a wide range of image classification tasks, offering valuable contributions to various domains requiring such classification capabilities.