# Pill Type Classification using TensorFlow

This notebook trains a TensorFlow image classifier on the OGYEIv2 pill dataset from Kaggle. It covers a `tf.data.Dataset` input pipeline, data augmentation with Keras preprocessing layers, evaluation with the weighted F1 score, conversion to TensorFlow Lite for deployment, and notes on backtesting.

Before running the notebook, download the dataset from Kaggle and place it in the `./data/ogyeiv2` directory.

In [None]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import os

# For evaluation metrics
from sklearn.metrics import f1_score, classification_report

# Set random seed for reproducibility
tf.random.set_seed(42)
print('Libraries imported.')

## Data Preparation

The loader used below, `image_dataset_from_directory`, expects one subfolder per class inside `./data/ogyeiv2` and uses each subfolder name as the class label.

In [None]:
# Define dataset path and parameters
data_dir = os.path.join(os.getcwd(), 'data', 'ogyeiv2')
batch_size = 32
img_height = 224
img_width = 224

# Load training and validation datasets using an 80-20 split
train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset='training',
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size
)

val_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset='validation',
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size
)

# Optimize dataset performance
AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

print('Datasets prepared.')
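
Before loading, it can help to confirm that the directory really is laid out as one subfolder per class. The sketch below is an illustration rather than part of the original pipeline: `count_images_per_class` is a hypothetical helper, and the extension list is an assumption about how the images are stored.

```python
import os

# Assumed image extensions; adjust to match the files in your copy of the dataset.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.bmp', '.gif')

def count_images_per_class(root):
    """Return {class_folder: image_count} for each immediate subfolder of root."""
    counts = {}
    for name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files at the top level
        counts[name] = sum(
            1 for f in os.listdir(class_dir)
            if f.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts

# Example usage (does nothing if the folder is missing)
dataset_root = os.path.join(os.getcwd(), 'data', 'ogyeiv2')
if os.path.isdir(dataset_root):
    for class_name, n in count_images_per_class(dataset_root).items():
        print(f'{class_name}: {n} images')
```

A strongly uneven count across classes is one reason to prefer the weighted F1 score over plain accuracy later on.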

## Data Augmentation

Data augmentation improves model generalization by applying random transformations to the training images.

In [None]:
data_augmentation = tf.keras.Sequential([
  tf.keras.layers.RandomFlip('horizontal'),
  tf.keras.layers.RandomRotation(0.1),
  tf.keras.layers.RandomZoom(0.1)
])

# Visualize some augmented images (optional)
plt.figure(figsize=(10, 10))
for images, labels in train_ds.take(1):
    # training=True makes sure the random layers are active outside of fit()
    augmented_images = data_augmentation(images, training=True)
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(augmented_images[i].numpy().astype('uint8'))
        plt.axis('off')
plt.show()

print('Data augmentation pipeline created.')

## Model Architecture

We define a CNN model using the Keras API. Data augmentation is integrated into the model, so training images are augmented on the fly; the augmentation layers are inactive at inference time.

In [None]:
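# train_ds no longer exposes class_names after cache()/prefetch(), so read the
# class folders directly. image_dataset_from_directory assigns labels in the
# sorted order of the subfolder names.
class_names = sorted(
    d for d in os.listdir(data_dir)
    if os.path.isdir(os.path.join(data_dir, d)) and not d.startswith('.')
)
num_classes = len(class_names)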

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(img_height, img_width, 3)),
    data_augmentation,
    tf.keras.layers.Rescaling(1./255),

    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),

    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),

    tf.keras.layers.Conv2D(128, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),

    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(num_classes, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()
print('Model defined and compiled.')
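
To see where most of the parameters in this architecture sit, the feature-map sizes can be worked out by hand. The sketch below assumes the Keras defaults used above: `Conv2D` with stride 1 and `'valid'` padding, and `MaxPooling2D` with a 2x2 window and stride 2. The helper names are illustrative only.

```python
def conv_valid(size, kernel=3):
    """Output size of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1

def max_pool(size, pool=2):
    """Output size of a max pool with window and stride `pool` (floors odd sizes)."""
    return size // pool

size = 224  # img_height == img_width in this notebook
for _ in range(3):  # three Conv2D + MaxPooling2D stages
    size = max_pool(conv_valid(size))

flat_features = size * size * 128          # length of the Flatten output
dense_params = flat_features * 128 + 128   # weights + biases of Dense(128)

print(size, flat_features, dense_params)   # 26 86528 11075712
```

The first dense layer therefore holds roughly 11 million parameters, far more than the three convolutional layers combined, which is why dropout is applied right after it.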

## Training the Model

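We use two callbacks. `EarlyStopping` halts training once the validation loss has not improved for 5 epochs and restores the best weights; `ReduceLROnPlateau` multiplies the learning rate by 0.2 after 3 epochs without improvement.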

In [None]:
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=3, min_lr=1e-6)

epochs = 30
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs,
    callbacks=[early_stop, reduce_lr]
)

print('Training complete.')
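
As a rough illustration of the learning-rate reduction, the sketch below lists the rates that repeated reductions would step through. It assumes the starting rate is Adam's default of 0.001 (which `optimizer='adam'` uses) and mirrors the `factor` and `min_lr` values above; `plateau_schedule` is a hypothetical helper, not a Keras function.

```python
def plateau_schedule(initial_lr=1e-3, factor=0.2, min_lr=1e-6):
    """List the learning rates reached by repeated plateau reductions."""
    rates = [initial_lr]
    while rates[-1] > min_lr:
        # Each reduction multiplies by factor, but never drops below min_lr.
        rates.append(max(rates[-1] * factor, min_lr))
    return rates

for step, lr in enumerate(plateau_schedule()):
    print(f'after {step} reductions: lr = {lr:.1e}')
```

With these settings the rate reaches `min_lr` after five reductions. In practice far fewer usually occur: since early stopping waits 5 epochs and the reduction waits 3, a single stretch without improvement typically triggers one reduction before training stops.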

## Evaluation and F1 Score

We now evaluate the model on the validation set and compute the weighted F1 score along with a detailed classification report.

In [None]:
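# Collect true labels and predictions
y_true = []
y_pred = []

for images, labels in val_ds:
    preds = model.predict(images, verbose=0)  # suppress the per-batch progress bar
    y_true.extend(labels.numpy())
    y_pred.extend(np.argmax(preds, axis=1))

# Compute weighted F1 score
f1 = f1_score(y_true, y_pred, average='weighted')
print('Weighted F1 Score:', f1)

# Class names in label order. The cached/prefetched train_ds has no class_names
# attribute, so list the class folders (sorted, as the loader does).
report_names = sorted(
    d for d in os.listdir(data_dir)
    if os.path.isdir(os.path.join(data_dir, d)) and not d.startswith('.')
)

# Detailed classification report
print('\nClassification Report:')
print(classification_report(
    y_true, y_pred,
    labels=list(range(len(report_names))),
    target_names=report_names
))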
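
To make the "weighted" average concrete, here is a minimal NumPy sketch that computes per-class F1 and weights each class by its number of true samples. The function `weighted_f1` and the toy labels are illustrative; in the notebook itself, `f1_score` from scikit-learn does this work.

```python
import numpy as np

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    total = 0.0
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1_c = 2 * tp / denom if denom else 0.0
        total += f1_c * np.sum(y_true == c)  # weight by class support
    return total / len(y_true)

# Toy labels: class 0 has three samples, class 1 has one.
print(weighted_f1([0, 0, 0, 1], [0, 0, 1, 1]))
```

Here class 0 scores 0.8 and class 1 scores about 0.667, so the weighted result is (3 × 0.8 + 1 × 0.667) / 4 ≈ 0.767. Frequent classes dominate the figure, so a rare pill type can be misclassified often without moving it much; the per-class rows of the classification report show that detail.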

## Convert Model to TensorFlow Lite

Convert the trained model to a TFLite model for deployment on edge devices.

In [None]:
# Convert the Keras model to TFLite format
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the TFLite model
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

print('TFLite model saved as model.tflite')

## Backtesting

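Backtesting here means checking the model against data it has not seen. The validation evaluation above is a first approximation, but the same validation set also drove early stopping and learning-rate reduction, so its scores tend to be optimistic. For a cleaner estimate, evaluate on a separate hold-out set or use cross-validation. You can also tune a confidence threshold below which a prediction is rejected instead of reported.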
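
One way to apply a prediction threshold is to reject any image whose top softmax probability falls below a cutoff. The sketch below is an assumption about how that could look, not part of the trained pipeline: `predict_with_threshold`, the `-1` reject label, and the example probabilities are all hypothetical.

```python
import numpy as np

def predict_with_threshold(probs, threshold=0.5):
    """Return the argmax class per row, or -1 where the top probability is below threshold."""
    probs = np.asarray(probs)
    top_class = np.argmax(probs, axis=1)
    top_prob = np.max(probs, axis=1)
    return np.where(top_prob >= threshold, top_class, -1)

# Hypothetical softmax outputs for three images and three classes.
example_probs = [
    [0.90, 0.05, 0.05],  # confident -> class 0
    [0.40, 0.35, 0.25],  # uncertain -> rejected
    [0.10, 0.15, 0.75],  # confident -> class 2
]
print(predict_with_threshold(example_probs, threshold=0.6))  # [ 0 -1  2]
```

Raising the threshold trades coverage for precision: fewer images receive a label, but the labels that are given are more likely to be right. The cutoff should be chosen on held-out data, not on the images used for training.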