# Captcha Recognition Model
By Luiz Bossetto -- CS387 Final Project

## Introduction
This project is an approach to solving text-based Captcha images using deep learning concepts related to image classification. This notebook contains all the steps taken to tackle this challenge head-on, from data preprocessing to model development, training, and evaluation. This notebook aims to demonstrate the effectiveness of deep learning in overcoming CAPTCHA challenges.

###  Motivation
While working on a paper for a History class that I am taking this semester, I had a frustrating experience accessing an academic paper hosted on a particular website. Every time I refreshed the browser tab, a security system would block my access and make me solve a text-based captcha puzzle. What made this frustrating was that I had to solve the puzzle again every time, even if I had solved one a minute earlier. So I thought to myself: "Why not build something that can help me with this?" This is how I came up with my final project idea.

### What topics should you know before diving into the project?
* Knowing the following topics will enhance your experience understanding and going through the project:
    * Basic knowledge of PyTorch: how to define a custom dataset, how to define a Convolutional Neural Network (CNN).
    * Familiarity with deep-learning-related topics such as data normalization, character segmentation, dataset splitting, and the steps it takes to train a model.
    * Understanding of binary thresholding, bitwise operations in images, and morphological operations (used in noise reduction) 
    

## Methodology

### Step by step
I have decided to approach this challenge by breaking the problem down into smaller steps:

1. Setup - All requirements will be installed in the notebook.  
2. Data Preprocessing - Given a dataset of captcha images, the pipeline passes each one through a noise reduction filter, which removes the noise present in the image.
3. Data Segmentation - After the images have passed through the noise reduction filter, the pipeline crops the characters ("areas of interest") and stores them in a folder for character classification.
4. Data Setup - Load all cropped images through a custom dataset class, which can then be used with PyTorch's tools from <code>torch.utils.data</code>.
5. Training - The model will be trained on the characters cropped from the denoised images. This is a character classification problem with 35 possible classes.
6. Evaluation - The model will be evaluated on both the training set and the test set, which are created by a random split of the data to avoid any possible bias. In this step, I aim to find the hyperparameters that maximize performance.

### 1. Setup

For this project, a few python modules were necessary:

* OpenCV (cv2) -- OpenCV is a library of programming functions mainly for real-time computer vision. This will be used in the data preprocessing and segmentation step.
* Numpy -- NumPy is a library that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions. This will be used in the data preprocessing and character segmentation step.
* Matplotlib -- Matplotlib is a plotting library. It provides an API for embedding plots into applications. This will be used in the model evaluation step.
* Torch -- PyTorch is a machine learning library based on the Torch library, used for applications such as computer vision and natural language processing. This will be used as the main framework throughout the whole project.
* PIL (Python Imaging Library) -- PIL is a library that adds support for opening, manipulating, and saving many different image file formats. This will be used in the data preprocessing and data setup steps.
* os -- The os module provides a portable way of using operating-system-dependent functionality. In particular, this will be used in the data preprocessing step to store data in the right folders. <br><br>


In [1]:
!pip install opencv-python
!pip install numpy
!pip install matplotlib

Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable


In [2]:
import torch
from torch.utils.data import DataLoader,Dataset, random_split
from torch import optim
import torchvision.transforms as transforms
from torchvision.io import read_image
import torch.nn as nn
from PIL import Image
import cv2
import numpy as np
import matplotlib.pyplot as plt # plotting data after evaluation
import os # used for path and image storage


For training, using the GPU is better than the CPU due to parallelism: a device's ability to run several calculations simultaneously. By using the GPU, I could train the model using several examples at a time. This makes the process more efficient.

In [3]:
# Set device to point to a GPU if we have one, CPU otherwise.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda:0


### 2. Data Preprocessing

The first part of the model involves preprocessing the images in the dataset. This is an important step because this task involves noisy images, and the easier it is for the model to deal with it, the better.

This step contains important methods for data preprocessing. They include: 

### <code>remove_noise(image_path, save_path)</code> 
This is the function that will be responsible for removing the noise in the captcha images using functionalities from the cv2 module. <br>

#### How it works
1. The filter first converts the image to grayscale so that only pixel intensity has to be handled.
2. Turns the grayscale image into a NumPy array so that binary thresholding (binarization) can be applied.
3. Applies inverted binary thresholding with Otsu's method, which turns the image into a black-and-white image with white characters on a black background.
4. Applies a morphological opening with a 3x3 kernel to remove small specks of noise from the image.
5. Inverts the image twice through bitwise operations, so that the final image keeps white characters on a black background. Every pixel is now either 0 or 255; the rescaling to the range between 0 and 1 happens later, when <code>transforms.ToTensor()</code> is applied in the data setup step.
6. After this process, the new image is saved to disk.

It takes two parameters: <br>
* <code>image_path</code>: path to the captcha image file to be denoised.
* <code>save_path</code>: path where the denoised image file will be saved.



In [4]:
def remove_noise(image_path, save_path):
    # Open the image using Pillow
    image = Image.open(image_path)

    # Convert the image to grayscale
    gray_image = image.convert('L')

    # Convert PIL image to numpy array
    np_image = np.array(gray_image)

    # Apply binary thresholding
    _, binary_image = cv2.threshold(np_image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Apply morphological operations
    kernel = np.ones((3, 3), np.uint8)
    opening = cv2.morphologyEx(binary_image, cv2.MORPH_OPEN, kernel, iterations=1)

    # Invert the binary image (dark characters on a white background)
    inverted_image = cv2.bitwise_not(opening)

    # Wrap the inverted array in a PIL image
    inverted_image_pil = Image.fromarray(inverted_image)

    # Invert the colors again to have black background and white letters
    # (the two inversions cancel out, so the result matches `opening`)
    inverted_image_pil = inverted_image_pil.convert('L')
    inverted_image = np.array(inverted_image_pil)
    inverted_image = cv2.bitwise_not(inverted_image)

    # Save the final image in output directory
    final_image = Image.fromarray(inverted_image)
    final_image.save(save_path)
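
To illustrate why the opening step removes small specks but keeps the characters, here is a minimal NumPy-only sketch of a 3x3 opening (erosion followed by dilation). The helpers `erode3x3`, `dilate3x3`, and `open3x3` are hypothetical names written for this illustration; they are not how `cv2.morphologyEx` is implemented, and they treat everything outside the image as black, which may differ from OpenCV's border handling.

```python
import numpy as np

def erode3x3(img):
    """Keep a pixel white only if its whole 3x3 neighbourhood is white."""
    padded = np.pad(img, 1, constant_values=0)  # assumption: black border
    out = np.ones_like(img)
    for dy in range(3):
        for dx in range(3):
            out &= padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def dilate3x3(img):
    """Turn a pixel white if any pixel in its 3x3 neighbourhood is white."""
    padded = np.pad(img, 1, constant_values=0)
    out = np.zeros_like(img)
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def open3x3(img):
    """Morphological opening: erosion followed by dilation."""
    return dilate3x3(erode3x3(img))

# Toy binary image (1 = white): one noise pixel and one 4x4 "character".
demo = np.zeros((8, 8), dtype=np.uint8)
demo[1, 1] = 1
demo[3:7, 3:7] = 1

opened = open3x3(demo)
print(opened)
```

The isolated pixel disappears because it cannot survive the erosion, while the 4x4 block shrinks to 2x2 and is then grown back to its original size by the dilation.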


### 3. Data Segmentation
This step involves image segmentation. The images generated by the <code>remove_noise()</code> method will be transferred to a method that will be responsible for identifying the areas of interest, or characters.
<br>

### <code>resize_image(image, new_width, new_height)</code> 
This function resizes the characters cropped from the denoised images generated by the <code>remove_noise()</code> function. It is part of the data normalization process: every image passed to the network is resized to the same standard size. <br>

#### How it works
1. It takes images of different shapes and resizes them to a new <code>new_width</code> x <code>new_height</code> image.
2. Applies binary thresholding to ensure black and white pixels only. While working on this method, it could be seen that the resized image would not be pure black and white, so binary thresholding was added to prevent this.
3. Returns a resized image. 

It takes three parameters: <br>
* <code>image</code>: the image to resize, as a NumPy array.
* <code>new_width</code>: the width of the resized image, in pixels.
* <code>new_height</code>: the height of the resized image, in pixels.



In [5]:
# Resize the image
def resize_image(image, new_width, new_height):
    # Resize the image
    resized_image = cv2.resize(image, (new_width, new_height))

    # Apply binary thresholding to ensure black and white pixels only
    _, binary_image = cv2.threshold(resized_image, 127, 255, cv2.THRESH_BINARY)

    # Return the resized, binarized image
    return binary_image
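
The reason the resized image is not pure black and white is interpolation: `cv2.resize` uses bilinear interpolation by default, which blends neighbouring pixels along edges. The sketch below reproduces that effect in one dimension with NumPy only. It is an analogy under simplified assumptions (a single row, `np.interp` instead of OpenCV's resampling), not a reproduction of what `cv2.resize` computes.

```python
import numpy as np

# One row of a binary image with a hard black-to-white edge.
row = np.array([0, 0, 255, 255], dtype=np.float64)

# Stretch the row from 4 to 10 samples with linear interpolation.
old_x = np.arange(row.size)
new_x = np.linspace(0, row.size - 1, 10)
resized = np.rint(np.interp(new_x, old_x, row)).astype(np.uint8)

# Same rule as cv2.THRESH_BINARY with thresh=127 and maxval=255:
# pixels above the threshold become 255, everything else becomes 0.
binary = np.where(resized > 127, 255, 0).astype(np.uint8)

print("after resizing:    ", resized)
print("after thresholding:", binary)
```

The resized row contains gray values along the edge, and the threshold maps every pixel back to either 0 or 255.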

<br>

### <code>save_contours_as_images(image_path, output_directory, image_id)</code> 
This is the function that will be responsible for the selection of areas of interest in a given denoised image generated by the <code>remove_noise()</code> function.

#### How it works
1. It takes a denoised image and crops the contours identified by the <code>cv2.findContours()</code> function.
2. Since every connected region of white pixels in the image is considered a contour by this <code>cv2</code> method, the function has to decide which contours are characters and which are just leftover noise. This is done by checking the dimensions of each contour's bounding box. Since the noise follows a similar dimensional pattern (at most about 5x5 pixels), an <code>if</code> statement that checks the width and height of each contour is enough to separate the characters from the noise.
3. Saves each cropped character under a file name that contains its label, for later label organization.

It takes three parameters: <br>
* <code>image_path</code>: path to the denoised image file.
* <code>output_directory</code>: where the cropped characters will be stored (folder).
* <code>image_id</code>: an ID for the newly generated character image. This is important to prevent duplicate file names: if two images had the same name, the newly generated one would replace the one that already exists in the folder.



In [6]:
def save_contours_as_images(image_path, output_directory, image_id):
    # Load the image in grayscale
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)

    # Threshold the image to obtain binary image
    _, binary_image = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY)

    # Find contours in the binary image
    contours, _ = cv2.findContours(binary_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # Sort contours based on x-coordinate.
    # The reason for this is so the images can be correctly labeled to prevent mislabeled data in the future.
    contours = sorted(contours, key=lambda contour: cv2.boundingRect(contour)[0])

    # Create output directory if it doesn't exist
    os.makedirs(output_directory, exist_ok=True)

    counter = 0 # keep track of how many characters have been saved
    
    # extract label from image file's name (file name without folder and extension);
    # os.path uses the path separator of the current OS, so this is not tied to "\\"
    label = os.path.splitext(os.path.basename(image_path))[0]
    image_name = label.split("--")[0]
    char_labels = list(label.split("_")[0])
    
    for i, contour in enumerate(contours):
        # Get bounding box for each contour
        x, y, w, h = cv2.boundingRect(contour)

        # if 4 characters have been extracted, break.
        if counter > 3:
            break

        # Check if contour is too small (possibly noise)
        if w > 5 and h > 5:
            # Add some padding around the character bounding box
            padding = 10
            x_padding = max(0, x - padding)
            y_padding = max(0, y - padding)
            w_padding = min(image.shape[1], w + 2 * padding)
            h_padding = min(image.shape[0], h + 2 * padding)

            # Create a black canvas with padded dimensions
            padded_image = np.zeros((h_padding, w_padding), dtype=np.uint8)

            # Calculate coordinates to place the character in the center
            x_offset = (w_padding - w) // 2
            y_offset = (h_padding - h) // 2

            # Copy the character region from the original image to padded image
            padded_image[y_offset:y_offset+h, x_offset:x_offset+w] = image[y:y+h, x:x+w]

            # Resize the padded image
            resized_image = resize_image(padded_image, 100, 100)

            # Save the resized image as a separate image
            character_filename = os.path.join(output_directory, f'{image_name}_{char_labels[counter]}--{image_id}.png')
            cv2.imwrite(character_filename, resized_image)
            counter += 1 # adding one to the counter variable means that the method has found one of the four characters.
            image_id += 1
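
The file naming scheme used above can be sketched as a round trip: build the name of each cropped character, then parse the label back out the way the dataset class does later. The helper names `make_character_filename` and `parse_character_filename`, and the sample file name `AB3D_17--denoised`, are hypothetical and only serve this illustration.

```python
import os

def make_character_filename(parent_name, char_label, image_id):
    """Build '{parent}_{char}--{id}.png', the pattern used for cropped characters."""
    return f"{parent_name}_{char_label}--{image_id}.png"

def parse_character_filename(path):
    """Recover (label, id) from a cropped-character file name."""
    stem = os.path.splitext(os.path.basename(path))[0]
    label, image_id = stem.split("_")[-1].split("--")
    return label.strip(), int(image_id)

# Hypothetical denoised image name: captcha text "AB3D", 17th file in the folder.
denoised_stem = "AB3D_17--denoised"
image_name = denoised_stem.split("--")[0]        # parent image name
char_labels = list(denoised_stem.split("_")[0])  # one label per character

names = [make_character_filename(image_name, c, 68 + i)
         for i, c in enumerate(char_labels)]
parsed = [parse_character_filename(os.path.join("cropped_grayscale", n))
          for n in names]

print(names)
print(parsed)
```

Because the character label is always the last `_`-separated part before `--`, the parent image name can contain underscores without breaking the parsing.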

### Passing images to the filter (new image generation)

This code snippet will run the dataset through the noise reduction filter, and store the outputs in a folder.

In [12]:
# Generate denoised images. Do not run this code snippet if folder and data already exists.
folder = 'captcha_images'
output_folder = 'denoised_images'

# Get list of all files in the folder
file_list = os.listdir(folder)

# check if output directory exists
if not os.path.exists(output_folder):
    os.makedirs(output_folder)

# Iterate through every file in the folder
for i, filename in enumerate(file_list):    
    # Check if the file is an image (you can add more image extensions if needed)
    if filename.lower().endswith(('.png')):
        # Construct the full path to the image
        image_path = os.path.join(folder, filename)
        
        # Save path for the denoised image
        # original image -> denoised image (now named after its label)
        label = filename.split('-')[0]
        save_filename = f'{label}_{i}--denoised.png'
        save_path = os.path.join(output_folder, save_filename)
        
        # Call the remove_noise function
        remove_noise(image_path, save_path)


### Cropping Characters (New Image Generation)
This code snippet will run the dataset through the character selection filter, and store the outputs in a folder.

In [13]:
# Folder path containing the images
folder_path = 'denoised_images'

# Output directory for saved contour images
output_directory = 'cropped_grayscale'

# Get list of all files in the folder
file_list = os.listdir(folder_path)

# check if output directory exists
if not os.path.exists(output_directory):
    os.makedirs(output_directory)

image_id = 0 # variable to keep track of each image's ID

# Iterate through each image with no noise
for i, filename in enumerate(file_list):
    # Check if the file is an image
    if filename.lower().endswith(('.png')):
        # Construct the full path to the denoised image
        image_path = os.path.join(folder_path, filename)
        # Extract areas of interest from denoised image
        save_contours_as_images(image_path, output_directory, image_id)

        image_id += 4

### 4. Data Setup
After all data has gone through the preprocessing and segmentation steps, the new data will then be transferred to a custom dataset. This dataset will be the one responsible for storing all the information needed for future training, such as <code>images</code> and <code>labels</code>.
<br><br>
The dataset will take two parameters:
* <code>root_dir</code>: this is the folder where all the new data is located.
* <code>transform</code>: this parameter stores all transformations that will be applied to the information in the dataset.

For this task specifically, the data will be stored in a tuple that will use the following format: <code>(tensor([[image]]), label)</code>

For reference, the code template used in this step can be found here: https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
This link contains a template, the step by step process on how to build a custom dataset, and how to properly store the information in it.

In this project, the data was generated by gathering samples of 35 different classes. They include digits that range from 1 to 9, and upper case characters that range from A to Z. 


In [7]:
classes = ['1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'] # 35 classes
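
As a side note, the same list can be generated instead of typed out, and a dictionary gives a direct label-to-index lookup. This is only a sketch of an alternative; the name `class_to_idx` is an assumption and is not used elsewhere in this notebook.

```python
import string

# Digits 1-9 followed by uppercase A-Z: the same 35 classes as above.
classes = list("123456789") + list(string.ascii_uppercase)

# Map each label to its integer index (what the loss function expects).
class_to_idx = {label: idx for idx, label in enumerate(classes)}

print(len(classes))
print(class_to_idx["A"], class_to_idx["Z"])
```

A dictionary lookup avoids scanning the list on every call, which is what `classes.index(label)` does.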

In [8]:
class CroppedCharacterDataset(Dataset):
    # Constructor
    def __init__(self, root_dir, transform=None):
        self.root_dir = root_dir # where the data is located (folder)
        self.transform = transform # a transforms.Compose() that contains all data transformations
        # image paths are stored in an array for easy iteration over the data
        self.image_paths = [os.path.join(root_dir, img) for img in os.listdir(root_dir) if img.endswith('.png')]

    # returns dataset size
    def __len__(self):
        return len(self.image_paths)

    # returns a tensor that contains the image and the label of that image, given an index.
    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        image = Image.open(img_path).convert('L')  # Convert image to grayscale
        
        # Extract label and image id from file name
        filename = os.path.splitext(os.path.basename(img_path))[0]

        # format used in the image file name creation: {ParentImageName}_{image_num}_{char_label}--{id}.png
        parts = filename.split('_')
        label, image_id = parts[-1].split('--')  # Split last part
        label = label.strip()  # Remove any leading/trailing whitespace

        # apply transforms
        if self.transform:
            image = self.transform(image)

        return image, label

### Creating instances and dataset visualization
The following code snippet defines the transformations that will be applied to the dataset, creates an instance of a dataset, and shows how the data is accessed through the magic methods defined in the class.

In [9]:
# root directory where images are stored
root_dir = "./cropped_grayscale"

# Define transformations. Here, since the previous filters already normalized all the data, 
# the only transformation that will be applied to the dataset will be turning it into tensors.
transform = transforms.Compose([
    transforms.ToTensor()
])

# Create dataset instance
cropped_chars_dataset = CroppedCharacterDataset(root_dir, transform=transform)

# Demonstration of how the data in the dataset is accessed and stored.
print(f"Number of images in the dataset: {len(cropped_chars_dataset)}\n") 
print("Example of a tensor stored in the dataset:")
print(cropped_chars_dataset[0])

Number of images in the dataset: 40415

Example of a tensor stored in the dataset:
(tensor([[[0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         ...,
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.]]]), 'S')



### Train and Test Sets
This section covers how the train and test sets are defined. 

In this project, I used an 80%/20% split between both sets. Since the <code>cropped_chars_dataset</code> contains 40415 different images, this split is enough to train and evaluate the model later. 

The random split was done by storing the size of each set in the variables <code>train_size</code> and <code>test_size</code> and passing them to the <code>random_split</code> function, which randomly splits the data into two sets of the given sizes.

After that, a <code>DataLoader</code> was created for each set. These will be used later in the training and testing steps.

In [10]:
# Define the sizes of each split
train_size = int(0.8 * len(cropped_chars_dataset)) # 80%
test_size = len(cropped_chars_dataset) - train_size # 100% - 80% = 20%

# Split dataset into train, and test sets (randomly)
train_data, test_data = random_split(cropped_chars_dataset, [train_size, test_size])

# Create data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True) 
test_loader = torch.utils.data.DataLoader(test_data, batch_size=32, shuffle=True) 

# Print the sizes of the train and test sets
print(f"Train: {len(train_loader.dataset)} examples into {len(train_loader)} batches")
print(f"Test: {len(test_loader.dataset)} examples into {len(test_loader)} batches")

Train: 32332 examples into 1011 batches
Test: 8083 examples into 253 batches
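
The numbers printed above can be checked by hand. The sketch below repeats the arithmetic in plain Python, assuming the dataset size (40415) and batch size (32) shown in this notebook and a loader that keeps the last, smaller batch.

```python
import math

dataset_size = 40415
batch_size = 32

train_size = int(0.8 * dataset_size)     # 80% of the data, rounded down
test_size = dataset_size - train_size    # the remaining examples

# The last batch may be smaller than batch_size, so round up.
train_batches = math.ceil(train_size / batch_size)
test_batches = math.ceil(test_size / batch_size)

print(train_size, train_batches)
print(test_size, test_batches)
```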


## The Model

Since this part of the project deals with image recognition and classification, a Convolutional Neural Network (CNN) is the perfect fit for this task. 

The model contains three <code>Conv2d</code> (convolutional) layers, each with a 3x3 <code>kernel</code> and a <code>padding</code> of 1. Each convolution is followed by a ReLU activation and a 2x2 <code>MaxPool2d</code>, which halves the spatial dimensions of the feature maps while preserving their most prominent features. At the end, two fully-connected layers map the flattened features to the 35 class scores. 

In [11]:
class CNNModel(nn.Module):
    def __init__(self, num_classes=35):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(128 * 12 * 12, 512)
        self.fc2 = nn.Linear(512, num_classes)

    def forward(self, x):
        x = self.pool(nn.functional.relu(self.conv1(x)))
        x = self.pool(nn.functional.relu(self.conv2(x)))
        x = self.pool(nn.functional.relu(self.conv3(x)))
        x = x.view(-1, 128 * 12 * 12)
        x = nn.functional.relu(self.fc1(x))
        x = self.fc2(x)
        return x
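
The input size of `fc1` (128 * 12 * 12) follows from the 100x100 input images. The sketch below traces the spatial size through the three convolution and pooling stages using the standard output-size formulas; the helper names `conv2d_out` and `maxpool_out` are hypothetical and exist only for this illustration.

```python
def conv2d_out(size, kernel=3, padding=1, stride=1):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def maxpool_out(size, kernel=2, stride=2):
    """Output size of max pooling along one spatial dimension."""
    return (size - kernel) // stride + 1

size = 100            # cropped characters are resized to 100x100
trace = [size]
for _ in range(3):    # three conv + pool stages
    size = maxpool_out(conv2d_out(size))
    trace.append(size)

flattened = 128 * size * size   # 128 channels after conv3

print(trace)
print(flattened)
```

The 3x3 convolutions with a padding of 1 keep the size unchanged, so only the pooling layers shrink the feature maps; the odd size 25 is rounded down to 12 by the last pooling layer.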

## Training

The following function executes one whole training run for a given model.

The <code>train_model()</code> function takes the following parameters:

* <code>model</code>: this is the model that will be used in training.
* <code>train_loader</code>: the data loader that stores the data that will be passed to the model.
* <code>num_epochs</code>: the number of epochs used in training.
* <code>learning_rate</code>: the learning rate used in training.
* <code>loss_arr</code>: a python list that will store the loss values for each epoch that will be used later to plot the model's overall performance.
* <code>acc_arr</code>: a python list that will store the accuracy values for each epoch that will be used later to plot the model's overall performance.


In [12]:
def train_model(model, train_loader, num_epochs, learning_rate, loss_arr, acc_arr):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    
    criterion = nn.CrossEntropyLoss() # Here, I used CrossEntropyLoss because it suits well when working with multiple-class classification.
    optimizer = optim.Adam(model.parameters(), lr=learning_rate) # For optimizer, among all optimizers that I tested in the model, Adam worked best.
    
    # Training run
    for epoch in range(num_epochs):
        # Variables that will keep track of model's performance 
        running_loss = 0.0
        correct = 0
        total = 0
        
        for images, labels in train_loader:
            images = images.to(device)
            
            # Extract and convert labels to integers
            label_strings = [label_tuple[0] for label_tuple in labels]
            label_indices = [classes.index(label) for label in label_strings]
            
            # creating tensor of labels
            labels = torch.tensor(label_indices, dtype=torch.long).to(device)
            
            # zero the gradients
            optimizer.zero_grad()

            # forward propagation
            outputs = model(images)

            # compute loss
            loss = criterion(outputs, labels)
            
            # backward prop
            loss.backward()

            # update
            optimizer.step()
            
            # calculating stats
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            
            running_loss += loss.item() * images.size(0)
        
        train_loss = running_loss / len(train_loader.dataset)
        train_accuracy = correct / total

        print(f"Epoch [{epoch+1}/{num_epochs}], Train Loss: {train_loss:.4f}, Train Accuracy: {100 * train_accuracy:.2f}%")
        
        loss_arr.append(train_loss)
        acc_arr.append(train_accuracy)
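
One detail in the loop above is worth spelling out: `nn.CrossEntropyLoss` returns the mean loss of a batch by default, so each batch loss is multiplied by the batch size before being accumulated and the total is divided by the number of examples. The sketch below shows, with made-up numbers, why this differs from simply averaging the batch losses when the last batch is smaller.

```python
# Hypothetical values: three batches, the last one smaller than the others.
batch_sizes = [32, 32, 12]
batch_mean_losses = [0.50, 0.30, 0.10]

# What the training loop does: weight each batch mean by its size.
running_loss = sum(loss * n for loss, n in zip(batch_mean_losses, batch_sizes))
per_example_loss = running_loss / sum(batch_sizes)

# Naive alternative: average the batch means directly.
naive_loss = sum(batch_mean_losses) / len(batch_mean_losses)

print(per_example_loss)
print(naive_loss)
```

The weighted version is the true average over all examples; the naive version gives the small last batch as much weight as a full one.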
        

## Testing

The following function evaluates a given model on the test set.

The <code>test_model()</code> function takes the following parameters:

* <code>model</code>: this is the model that will be evaluated.
* <code>test_loader</code>: the data loader that stores the data that will be passed to the model.


In [13]:
def test_model(model, test_loader):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.eval() # switch to evaluation mode (no effect on this architecture, but good practice)

    criterion = nn.CrossEntropyLoss() # Here, I used CrossEntropyLoss because it suits well when working with multiple class classification.

    # These are variables that will be used to calculate the test statistics
    total_correct = 0
    total = 0
    running_loss = 0.0

    with torch.no_grad():
        # Iterating through data loader
        for images, labels in test_loader:
            images = images.to(device)
            
            # Extract and convert labels to integers
            label_strings = [label_tuple[0] for label_tuple in labels]
            label_indices = [classes.index(label) for label in label_strings]
            labels = torch.tensor(label_indices, dtype=torch.long).to(device)

            # Forward pass
            outputs = model(images)

            # Compute loss
            loss = criterion(outputs, labels)
            running_loss += loss.item() * images.size(0)

            # Compute accuracy
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            total_correct += (predicted == labels).sum().item()

    # Calculate loss and accuracy
    test_loss = running_loss / len(test_loader.dataset)
    test_accuracy = total_correct / total

    print(f"Test Loss: {test_loss:.4f}, Test Accuracy: {100 * test_accuracy:.2f}%")

### Instances and hyperparameters
All instances and hyperparameters for training are defined here. Values used were acquired through hyperparameter tuning.

In [None]:
# Create model instance
model = CNNModel()

# Define number of epochs and learning rate 
epochs = 10
learning_rate = 0.0001

# python lists to store information that will be used to plot the results.
losses_train = []
accuracies_train = []

# Train the model and record the loss and accuracy of each epoch
print("Training Model...")
train_model(model, train_loader, epochs, learning_rate, losses_train, accuracies_train)

Training Model...
Epoch [1/10], Train Loss: 0.4515, Train Accuracy: 87.03%
Epoch [2/10], Train Loss: 0.0501, Train Accuracy: 98.58%
Epoch [3/10], Train Loss: 0.0303, Train Accuracy: 99.12%
Epoch [4/10], Train Loss: 0.0213, Train Accuracy: 99.39%
Epoch [5/10], Train Loss: 0.0132, Train Accuracy: 99.60%


### Testing Model

Once the model parameters have been adjusted, I tested the model's accuracy on the test set. 

In [None]:
print("Testing Model...")
test_model(model, test_loader)

### Plotting the results

This section will show the results from the training runs for a given model.

For this, I used the <code>matplotlib.pyplot</code> module, commonly used in data visualization.

In [None]:
# Plot loss
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.plot(losses_train, color="green", label='Training Loss', linewidth=2.0) 
plt.title('Overall Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

# Annotate last value of loss on the plot
plt.annotate(f'{losses_train[-1]:.3f}', xy=(len(losses_train)-1, losses_train[-1]), xytext=(-20, 10), textcoords='offset points', color='green')

# Plot accuracy
plt.subplot(1, 2, 2)
plt.plot(accuracies_train, color="green", label='Training Accuracy', linewidth=2.0) 
plt.title('Overall Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

# Annotate last value of accuracy on the plot
plt.annotate(f'{accuracies_train[-1]:.2%}', xy=(len(accuracies_train)-1, accuracies_train[-1]), xytext=(-30, -15), textcoords='offset points', color='green')

plt.show()


### Result Analysis

After some training runs, I was able to reach an overall accuracy of 99.85% and an overall loss value of 0.005, both better than I expected. As a way to measure how well the model performs, I estimated the human-level performance on this task. In the given dataset, I was able to accurately solve 9800 images (or 98% of the dataset). It was quite intriguing to see that the model performed better than I did.  
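
One caveat when comparing these numbers: the model is scored per character, while a person solves whole images. As a rough sketch, assuming the 99.85% figure is a per-character accuracy, that each captcha has four characters, and that character errors are independent (all three are assumptions, not measurements), the per-image accuracy can be estimated as follows.

```python
per_char_accuracy = 0.9985   # assumed to be a per-character figure
chars_per_captcha = 4        # the segmentation step extracts four characters

# A captcha is solved only if every character is classified correctly.
per_captcha_accuracy = per_char_accuracy ** chars_per_captcha

print(f"{per_captcha_accuracy:.4f}")
```

Under these assumptions the estimate stays above the 98% human baseline, but it does not account for captchas where the segmentation step itself fails.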

While working on the project, my expectations of how well the model would work changed several times. Before coming up with a concrete project proposal, I thought that models would have a hard time solving text-based captcha puzzles because these puzzles were designed to prevent computers from accessing confidential data. At first, I expected the model to reach quite poor accuracy (<50%). After doing some research on how to approach this problem and writing the project proposal, I could see that by breaking the problem into smaller steps and turning it into a simple classification task, I could reach more satisfying results. During this time, I was expecting the model to reach an overall accuracy a little below 90% because I still thought that the distortion and random patterns in the images were going to be an obstacle. When I finally trained the model, I was surprised to see that it performed better than a human, with results better than I expected.

## Conclusion 
The development and implementation of the captcha recognition model demonstrated the efficacy of deep learning in solving complex image recognition tasks. Through the use of a convolutional neural network (CNN) and data preprocessing, I was able to achieve a high level of accuracy in solving these puzzles. The model's ability to generalize to new, unseen, distorted captcha characters shows its robustness and potential for a real-world application. 


### Future work
While I was able to build a fully functional model, I strongly believe there is still plenty of research to do. 

Thanks to advancements in Computer Science, end-to-end models have become quite common in solving tasks of many different types. There are approaches to character recognition that involve sequential models that could be used in a task like captcha recognition. Models that combine CNNs and RNNs with attention have reached great results (>90% accuracy) in reading text from images. My plan for future work is to develop an end-to-end model that uses a sequential neural network, aiming to reach an overall accuracy above 90%, possibly reaching results like the ones in this project (>95% accuracy). 

While I could not build a fully functional end-to-end model now, the research I did has shown me that it is possible to attack this problem and reach accuracies above 90%. Some more research must be done, and I will spend the next weeks and months working on this.

### What I have been learning for future work
Here is an article that shows a little of what I have been learning: https://codingvision.net/pytorch-crnn-seq2seq-digits-recognition-ctc. This article shows an approach to reading distorted handwritten characters from images using a sequential model (GRU). I believe that a similar approach can be used in captcha recognition tasks. In this case, the model learns how to distinguish noise and characters, so no noise reduction filter and character segmentation functions are needed to improve performance.

Another really interesting article that implements an end-to-end captcha model is this one: https://www.sciencedirect.com/science/article/pii/S0925231220318518. What makes this approach even more interesting is that the model involves Generative Adversarial Networks (GANs) to synthesize data and make the dataset larger, the use of a Bi-Directional Long-Short Term Memory model instead of Gated Recurrent Units (GRUs), and the ability to solve many types of text-based captcha images. The model in this approach is more complex, but it is able to generalize really well on many different distortions and patterns.

During the summer, I will be diving a little deeper into these articles and building an end-to-end model that is able to solve text-based captcha puzzles. I strongly believe that extending this project to an end-to-end model will deepen my understanding of more advanced techniques and approaches that will be useful later in academic projects and in my future career. 

### What the project has taught me

Overall, this project has not only deepened my understanding of the deep learning principles learned throughout the semester but also highlighted their practical utility in solving challenging image recognition problems. Working on this project has taught me how computer vision works, why it matters, and how much can be done with it. It has also shown me the iterative nature of model development, where constant experimentation and evaluation are necessary to achieve good performance. Moreover, this project has taught me how to create and handle my own dataset, and how much its quality depends on the preprocessing steps that normalize the data. Also, I was able to put hyperparameter tuning and error analysis into practice, two things that I learned in class this semester but never had the chance to do "from scratch" in a big project. This project has deepened my appreciation for the complexity and potential of deep learning in real-world applications. It has definitely changed how I think about my career. It is a challenging field to work in, but personally, it is rewarding to see a project like this one working. I would definitely pursue a career in AI/machine learning/deep learning thanks to this class and this project. 

When it comes to captchas, working on this project has made me reflect on how captcha puzzles must change in the future in order to prevent computers from accessing web pages that can contain confidential data. With the right tools and knowledge, some text-based captchas can be solved with ease, so more challenging captcha formats, such as image-based captchas and drag-and-drop captchas, can provide a more resistant barrier between robots and data and help prevent data from being compromised. 