# The Asirra data set
Web services are often protected with a challenge that's supposed to be easy for people to solve, but difficult for computers. Such a challenge is often called a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) or HIP (Human Interactive Proof). HIPs are used for many purposes, such as to reduce email and blog spam and prevent brute-force attacks on web site passwords.

Asirra (Animal Species Image Recognition for Restricting Access) is a HIP that works by asking users to identify photographs of cats and dogs. This task is difficult for computers, but studies have shown that people can accomplish it quickly and accurately. Many even think it's fun! Here is an example of the Asirra interface:

Asirra is unique because of its partnership with Petfinder.com, the world's largest site devoted to finding homes for homeless pets. They've provided Microsoft Research with over three million images of cats and dogs, manually classified by people at thousands of animal shelters across the United States. Kaggle is fortunate to offer a subset of this data for fun and research.

# Image recognition attacks
While random guessing is the easiest form of attack, various forms of image recognition can allow an attacker to make guesses that are better than random. There is enormous diversity in the photo database (a wide variety of backgrounds, angles, poses, lighting, etc.), making accurate automatic classification difficult. In an informal poll conducted many years ago, computer vision experts posited that a classifier with better than 60% accuracy would be difficult without a major advance in the state of the art. For reference, a 60% classifier improves the guessing probability of a 12-image HIP from 1/4096 to 1/459.

# State of the art
The current literature suggests machine classifiers can score above 80% accuracy on this task [1]. Therfore, Asirra is no longer considered safe from attack.  We have created this contest to benchmark the latest computer vision and deep learning approaches to this problem. Can you crack the CAPTCHA? Can you improve the state of the art? Can you create lasting peace between cats and dogs?

Okay, we'll settle for the former.

# Acknowledgements
We extend our thanks to Microsoft Research for providing the data for this competition.

# Dataset Description
The training archive contains 25,000 images of dogs and cats. Train your algorithm on these files and predict the labels for test1.zip (1 = dog, 0 = cat).

In [48]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import zipfile
import os
from keras.preprocessing.image import  ImageDataGenerator
from keras.callbacks import EarlyStopping, TensorBoard, ModelCheckpoint
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn import set_config
import requests
import datetime

set_config('diagram')

import warnings
warnings.filterwarnings('ignore')

In [32]:
gpu = tf.config.list_physical_devices('GPU')

if gpu:
    print('GPU Available {}'.format(gpu))
else:
    print('No GPU')

GPU Available [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


In [33]:
url = "https://storage.googleapis.com/tensorflow-1-public/course2/cats_and_dogs_filtered.zip"
response = requests.get(url, stream=True)
filename = os.path.join(os.getcwd(), "cats_and_dogs_filtered.zip")

with open(filename, 'wb') as fd:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk:
            fd.write(chunk)

print("Download complete.")

Download complete.


In [34]:
# Unzipping the file
with zipfile.ZipFile('cats_and_dogs_filtered.zip') as file:
    file.extractall()
    print('Extraction Success!!')

In [36]:
current_working_direc = os.getcwd()

train_data_path = 'cats_and_dogs_filtered/train/'
test_data_path = 'cats_and_dogs_filtered/validation/'

def count_items_direc(path):
    return len(os.listdir(path))

print(f'The number of items in the training directory {train_data_path}, is {count_items_direc(train_data_path)}')
print(f'The number of items in the test directory {test_data_path}, is {count_items_direc(test_data_path)}')

The number of items in the training directory cats_and_dogs_filtered/train/, is 2
The number of items in the test directory cats_and_dogs_filtered/validation/, is 2


In [62]:
optimizer = 'RMSprop'
class_mode = 'binary'
epochs = 100
loss = 'binary_crossentropy'
image_size = 150

In [63]:
# Initialize ImageDataGenerator instances for training and testing
train_data_gen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.25,
    rotation_range=45,
    zoom_range=0.15,
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='nearest'
)

test_data_gen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.25,
    rotation_range=45,
    zoom_range=0.15,
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='nearest'
)

# Generator for training data
train_generator = train_data_gen.flow_from_directory(
    train_data_path,
    target_size=(image_size, image_size),
    class_mode=class_mode
)

# Generator for test data
test_generator = test_data_gen.flow_from_directory(
    test_data_path,
    target_size=(image_size, image_size),
    class_mode=class_mode
)


Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.


In [64]:
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Conv2D(input_shape=(image_size, image_size, 3), filters=16, activation='relu', padding='valid',
                               kernel_size=(3, 3)),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Conv2D(filters=32, kernel_size=(32, 32), padding='valid', activation='relu'),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), padding='valid', activation='relu'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(units=1, activation='sigmoid')
    ]
)

model.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])
model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_12 (Conv2D)          (None, 148, 148, 16)      448       
                                                                 
 max_pooling2d_8 (MaxPooling  (None, 74, 74, 16)       0         
 2D)                                                             
                                                                 
 conv2d_13 (Conv2D)          (None, 43, 43, 32)        524320    
                                                                 
 max_pooling2d_9 (MaxPooling  (None, 21, 21, 32)       0         
 2D)                                                             
                                                                 
 conv2d_14 (Conv2D)          (None, 19, 19, 64)        18496     
                                                                 
 flatten_4 (Flatten)         (None, 23104)            

In [65]:
early_stop = EarlyStopping(restore_best_weights=True, patience=10, monitor='val_loss', verbose=1)

log_dir = os.path.join(os.getcwd(), 'logs/fit/', datetime.datetime.now().strftime('%Y%m%d-%H%M%S'))
tensorboard_callback = TensorBoard(log_dir=log_dir,histogram_freq=1)

model_checkpoint_path = 'model_checkpoints/'
if not os.path.exists(model_checkpoint_path):
    os.mkdir(model_checkpoint_path)

checkpoint_callback = ModelCheckpoint(filepath=model_checkpoint_path+'model-{epoch:02d}.h5')

In [66]:
log_dir

'C:\\Users\\sayan\\PycharmProjects\\Machine Learning\\logs/fit/20230630-192306'

In [69]:
history = model.fit(
    train_generator,
    validation_data=test_generator,
    epochs=epochs,
    callbacks=[tensorboard_callback,checkpoint_callback,early_stop]
)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 22: early stopping
