# Example: Horse vs Human Binary Classification

* For this task, we will build a simple neural network to classify images of horses and humans. We’ll use the TensorFlow tf.data API for data loading and preprocessing.

* We assume the dataset is available locally in a directory structure like this:

/data
    /train
        /horses
        /humans
    /test
        /horses
        /humans


In [None]:
# Import

import tensorflow as tf
import numpy as np
import os
import matplotlib.pyplot as plt

In [None]:
# Load data

BATCH_SIZE = 32
IMG_HEIGHT = 150
IMG_WIDTH = 150

# Define directories
train_dir = '/data/train'
test_dir = '/data/test'

# Create image datasets from directories.
# tf.keras.utils is the documented location of this function.
train_dataset = tf.keras.utils.image_dataset_from_directory(
    train_dir,
    image_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    label_mode='binary',  # class indices follow sorted folder names: 0 = horses, 1 = humans
)

test_dataset = tf.keras.utils.image_dataset_from_directory(
    test_dir,
    image_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    label_mode='binary',
)

* image_dataset_from_directory: This function infers labels from the subdirectory names, assigning class indices in alphanumeric order (horses is class 0, humans is class 1).
* image_size: We resize all images to 150x150 pixels so every batch has a consistent shape.
* batch_size: The number of images per batch. Larger batches use more memory but give fewer, steadier gradient updates per epoch; 32 and 64 are common starting points.
* label_mode='binary': Labels are returned as float32 values of 0 or 1 (one per image), which is the format BinaryCrossentropy expects.
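
As a rough illustration of how class indices follow from directory names, the sketch below builds a throwaway directory tree and sorts the subdirectory names alphanumerically. The helper `infer_class_indices` is hypothetical: it only mimics the ordering and is not the TensorFlow implementation.

```python
import os
import tempfile


def infer_class_indices(directory):
    """Map each subdirectory name to an integer index in sorted order.

    Hypothetical helper: it mimics the alphanumeric ordering of class
    names; it is not the TensorFlow implementation.
    """
    class_names = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(class_names)}


# Build a temporary stand-in for /data/train with two empty class folders.
with tempfile.TemporaryDirectory() as train_dir:
    for class_name in ("humans", "horses"):
        os.makedirs(os.path.join(train_dir, class_name))
    class_indices = infer_class_indices(train_dir)

print(class_indices)
```

On the real dataset, the `class_names` attribute of the dataset returned by `image_dataset_from_directory` reports the order that was actually used. Read it before any `map` or `prefetch` call, since those return plain datasets without that attribute.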

* 3. Data Augmentation (Optional)
* You can augment your training dataset by applying random transformations to the images. This can improve generalization.

In [None]:
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),
    tf.keras.layers.RandomRotation(0.2),
    tf.keras.layers.RandomZoom(0.2),
])

# Apply the augmentation only to the training dataset
train_dataset = train_dataset.map(lambda x, y: (data_augmentation(x, training=True), y))

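* RandomFlip, RandomRotation, RandomZoom: These augmentations help the model generalize better by creating slight variations in the images. RandomRotation(0.2) rotates by up to ±20% of a full turn (about ±72°), and RandomZoom(0.2) zooms in or out by up to 20%.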

* 4. Prefetching for Performance
* To improve input pipeline performance, we use prefetching. This lets the pipeline prepare the next batches while the model trains on the current one.
* AUTOTUNE: Lets tf.data choose the prefetch buffer size dynamically at runtime instead of relying on a fixed value.

In [None]:
AUTOTUNE = tf.data.AUTOTUNE
train_dataset = train_dataset.prefetch(buffer_size=AUTOTUNE)
test_dataset = test_dataset.prefetch(buffer_size=AUTOTUNE)
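
To make the idea of overlapping data loading with training more concrete, here is a minimal pure-Python sketch of prefetching with a background thread and a bounded queue. The function `prefetch` below is a hypothetical stand-in written for illustration. The tf.data implementation differs, and AUTOTUNE tunes the buffer size dynamically rather than taking a fixed `buffer_size`.

```python
import queue
import threading


def prefetch(iterable, buffer_size=2):
    """Yield items from `iterable` while a background thread loads ahead."""
    buffer = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks the end of the stream

    def producer():
        for item in iterable:
            buffer.put(item)  # blocks while the buffer is full
        buffer.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()  # blocks until the producer has an item ready
        if item is sentinel:
            return
        yield item


batches = [f"batch-{i}" for i in range(5)]
consumed = list(prefetch(batches, buffer_size=2))
print(consumed)
```

The consumer sees the same batches in the same order. The only difference is that up to `buffer_size` batches can be prepared while the consumer is still busy with an earlier one.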

* 5. Model Creation

* Conv2D and MaxPooling2D: These layers are part of the convolutional architecture commonly used in image classification tasks. They help the model learn spatial hierarchies in images.
* The Conv2D layers use 32, 64, and 128 filters. The number of filters increases as we go deeper into the network, which helps capture more complex patterns.
* Flatten: Converts the 2D feature maps from the convolution layers into a 1D vector to feed into the dense layers.
* Dense(128, activation='relu'): A fully connected layer with 128 neurons, typically used to capture high-level features.
* Dense(1, activation='sigmoid'): The final layer. Since this is a binary classification problem (horse vs human), it uses:
    * Sigmoid: Outputs a probability between 0 and 1, interpreted here as the probability of class 1 (humans).
    * 1 neuron: A single output is enough, because the probability of the other class is the complement.
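
To get a feel for the size of this architecture, the parameter counts can be worked out by hand. The sketch below assumes a 150x150x3 input, 3x3 kernels, and the Keras defaults (padding='valid', 2x2 pooling), under which the last pooling layer outputs 17x17x128. The helper names are hypothetical.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


def dense_params(in_features, units):
    """Weights plus one bias per unit for a Dense layer."""
    return (in_features + 1) * units


# Flatten output size under the stated assumptions: 17 * 17 * 128 values.
flattened = 17 * 17 * 128

params = {
    "conv2d_32": conv2d_params(3, 3, 32),
    "conv2d_64": conv2d_params(3, 32, 64),
    "conv2d_128": conv2d_params(3, 64, 128),
    "dense_128": dense_params(flattened, 128),
    "dense_1": dense_params(128, 1),
}
total = sum(params.values())

for name, count in params.items():
    print(f"{name:>10}: {count:>9,}")
print(f"{'total':>10}: {total:>9,}")
```

Almost all of the parameters sit in the first dense layer, which is worth keeping in mind when the dataset is small. The totals can be compared against the output of `model.summary()`.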

In [None]:
model = tf.keras.Sequential([
    tf.keras.Input(shape=(IMG_HEIGHT, IMG_WIDTH, 3)),  # Input (images have 3 channels for RGB)
    tf.keras.layers.Rescaling(1.0 / 255),  # Scale pixel values from [0, 255] to [0, 1]
    
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),  # Convolutional layer with 32 filters
    tf.keras.layers.MaxPooling2D(),  # MaxPooling to downsample
    
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    
    tf.keras.layers.Flatten(),  # Flatten the output of the convolutional layers
    tf.keras.layers.Dense(128, activation='relu'),  # Dense layer with 128 neurons
    
    tf.keras.layers.Dense(1, activation='sigmoid'),  # Output layer (binary classification)
])

model.summary()

* 6. Compilation and Training
* Adam optimizer: A commonly used optimizer for image classification tasks. It combines momentum with RMSProp-style per-parameter adaptive learning rates.
* BinaryCrossentropy: This loss function is suitable for binary classification tasks. It measures the difference between the predicted probabilities and the actual class labels.
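
For a small worked example of this loss, the sketch below computes the mean of -(y·log(p) + (1-y)·log(1-p)) over a few hand-picked labels and predicted probabilities. The function name, the `eps` clipping value, and the sample numbers are illustrative assumptions, not values taken from the Keras implementation.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over paired labels and probabilities."""
    total = 0.0
    for label, prob in zip(y_true, y_pred):
        prob = min(max(prob, eps), 1.0 - eps)  # keep log() away from 0
        total += -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))
    return total / len(y_true)


labels = [0, 1, 1, 0]             # 0 = horse, 1 = human
good_preds = [0.1, 0.9, 0.8, 0.2]  # probabilities close to the labels
bad_preds = [0.9, 0.1, 0.2, 0.8]   # probabilities far from the labels

loss_good = binary_crossentropy(labels, good_preds)
loss_bad = binary_crossentropy(labels, bad_preds)
print(f"good predictions: {loss_good:.4f}")
print(f"bad predictions:  {loss_bad:.4f}")
```

Predictions close to the true labels give a small loss, while confident mistakes are penalized heavily.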

In [None]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=['accuracy']
)

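# Train the model. Note: the test set doubles as validation data here, so the
# final test accuracy is not a fully independent estimate.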
history = model.fit(
    train_dataset,
    epochs=10,
    validation_data=test_dataset,
)

* 7. Evaluating the Model

In [None]:
test_loss, test_acc = model.evaluate(test_dataset)
print(f"Test accuracy: {test_acc * 100:.2f}%")

### Hyperparameter Selection:
* Number of Filters and Layers: We chose 32, 64, and 128 filters in the convolutional layers, progressively increasing the number of filters as we go deeper. This is common in CNNs, as deeper layers typically learn more abstract and complex features.

* You can experiment with these numbers (e.g., using 256 filters or adding more layers), but be mindful of overfitting if you have limited data.

* Binary Crossentropy Loss: Since this is a binary classification task (horse vs. human), we use BinaryCrossentropy. This loss function is appropriate because we’re predicting a probability (i.e., the likelihood of one class), not a multi-class output.

* Sigmoid Activation: This activation is chosen for the output layer because it produces values in the range of 0 to 1, making it ideal for binary classification.
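
The sketch below shows how a single sigmoid output could be turned into a class decision. It assumes the class order horses = 0, humans = 1 and a decision threshold of 0.5; both `predict_class` and the threshold are illustrative choices rather than anything fixed by Keras.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(probability, class_names=("horses", "humans"), threshold=0.5):
    """Pick the class name for a single sigmoid output."""
    return class_names[int(probability >= threshold)]


for logit in (-3.0, 0.0, 2.5):
    p = sigmoid(logit)
    print(f"logit={logit:+.1f}  p(human)={p:.3f}  ->  {predict_class(p)}")
```

With the trained model, the probability would come from `model.predict` rather than from a hand-picked logit.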

In [None]:
   Input (150x150x3)
        │
        ▼
  +---------------------+    (Conv2D with 32 filters, 3x3 kernel, 'valid' padding)
  |     148x148x32      |
  +---------------------+
        │
        ▼
  +---------------------+    (MaxPooling 2x2)
  |      74x74x32       |
  +---------------------+
        │
        ▼
  +---------------------+    (Conv2D with 64 filters, 3x3 kernel, 'valid' padding)
  |      72x72x64       |
  +---------------------+
        │
        ▼
  +---------------------+    (MaxPooling 2x2)
  |      36x36x64       |
  +---------------------+
        │
        ▼
  +---------------------+    (Conv2D with 128 filters, 3x3 kernel, 'valid' padding)
  |      34x34x128      |
  +---------------------+
        │
        ▼
  +---------------------+    (MaxPooling 2x2)
  |      17x17x128      |
  +---------------------+
        │
        ▼
  +---------------------+    (Flatten to 1D)
  |        36992        |
  |     (17*17*128)     |
  +---------------------+
        │
        ▼
  +---------------------+    (Dense layer with 128 neurons)
  |         128         |
  +---------------------+
        │
        ▼
  +---------------------+    (Output layer with 1 neuron for binary classification)
  |          1          |
  +---------------------+


* Conv2D layers: Change the depth (number of filters). With the default padding='valid', a 3x3 kernel also trims 2 pixels from the height and width; the spatial dimensions stay the same only with padding='same' and a stride of 1.
* MaxPooling layers: Reduce the spatial dimensions (height and width) by a factor of 2 (if using 2x2 pooling), rounding down for odd sizes.
* Flattening: Converts the final 3D feature map (height, width, depth) into a 1D vector.
* Dense layers: Operate on that 1D vector rather than on spatial dimensions, mapping it to a fixed number of output features for classification.
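
The shape rules above can be checked with a few lines of arithmetic. The sketch below assumes a 150x150 input, 3x3 kernels with stride 1, the Keras default padding='valid', and 2x2 pooling. The helper names are hypothetical.

```python
def conv2d_output(size, kernel=3, stride=1, padding="valid"):
    """Spatial output size of a Conv2D layer along one dimension."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1


def maxpool_output(size, pool=2):
    """Spatial output size of MaxPooling2D with stride equal to the pool size."""
    return size // pool


size = 150
shapes = []
for filters in (32, 64, 128):
    size = conv2d_output(size)           # 3x3 'valid' convolution
    shapes.append((size, size, filters))
    size = maxpool_output(size)          # 2x2 max pooling
    shapes.append((size, size, filters))

flattened = size * size * 128
print(shapes)
print(flattened)
```

Passing `padding="same"` to `conv2d_output` leaves the size unchanged at stride 1, which is the case where a convolution only changes the depth.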