This repository contains a deep-learning iteration of my anomaly detection project for elderly home surveillance, developed as part of my thesis internship for the Master in Data Analysis for Business Intelligence and Data Science.
While the first iteration used a scikit-learn MLPRegressor as a simple baseline autoencoder, this repository introduces a more realistic setup:
- A convolutional autoencoder implemented in TensorFlow/Keras
- A data generator that streams frames from disk (Colab and RAM friendly)
- Checkpoints, early stopping, and learning rate scheduling (see the callbacks sketch after this list)
- Support for resuming training from saved weights
- The same per-person folder structure for frames, compatible with real datasets such as the Université de Bourgogne Europe - Fall Detection Dataset
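The checkpointing, early stopping, and learning-rate scheduling mentioned above map onto standard Keras callbacks. A minimal sketch of how they could be wired together (the actual wiring lives in src/train.py; the helper names below are illustrative, not the script's real API):

```python
# Sketch only: approximates how src/train.py could assemble its callbacks.
# Paths mirror the CLI examples further below.
import os
import tensorflow as tf

def make_callbacks(checkpoint_path="checkpoints/best_autoencoder.weights.h5"):
    """Checkpointing, early stopping, and LR scheduling in one place."""
    return [
        # Keep only the best weights, judged by validation loss.
        tf.keras.callbacks.ModelCheckpoint(
            checkpoint_path, save_weights_only=True,
            monitor="val_loss", save_best_only=True),
        # Stop when validation loss stops improving.
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=5, restore_best_weights=True),
        # Halve the learning rate on plateaus.
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss", factor=0.5, patience=2),
    ]

def maybe_resume(model, resume_from=None):
    """Load previously saved weights before continuing training."""
    if resume_from and os.path.exists(resume_from):
        model.load_weights(resume_from)
    return model
```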
This iteration focuses on engineering and scalability rather than final production performance.
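To make the architecture side concrete, a convolutional autoencoder of the kind this repository trains can be sketched as follows. The real definition is build_autoencoder in src/model.py; the layer sizes and depths below are assumptions for illustration only:

```python
# Illustrative only: a small conv autoencoder in the spirit of
# src/model.py's build_autoencoder (actual layers may differ).
import tensorflow as tf
from tensorflow.keras import layers

def build_autoencoder(input_shape=(64, 64, 1)):
    inputs = tf.keras.Input(shape=input_shape)
    # Encoder: downsample with strided convolutions.
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
    # Decoder: upsample back to the input resolution.
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    outputs = layers.Conv2D(input_shape[-1], 3, padding="same", activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs, name="conv_autoencoder")
    model.compile(optimizer="adam", loss="mse")
    return model
```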
A deliberate design decision in this repository is to train on pre-extracted frames rather than hard-wiring heavy preprocessing (silhouette extraction, background subtraction, YOLO person crops, etc.) into the training code.
- The training pipeline expects clean, ready-to-use frames in a standard format.
- More complex preprocessing (e.g. silhouettes or bounding-box crops) is performed in separate scripts or pipelines, so it can evolve independently.
- This keeps the model code clean, modular, and easier to reuse across different datasets.
In other words: this repository trains an autoencoder on images; how those images are generated (from videos, silhouettes, or person crops) is up to an upstream preprocessing step.
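As an illustration of what such an upstream step might look like (this script is hypothetical and not part of the repository), frames could be dumped from a video into a per-person folder with OpenCV:

```python
# Hypothetical upstream script (not included in this repo):
# dump every Nth frame of a video into a per-person folder.
import os
import cv2

def extract_frames(video_path, out_dir, every_n=5):
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.png"), gray)
            saved += 1
        idx += 1
    cap.release()

# e.g. extract_frames("videos/person_01.avi", "data/train/person_01")
```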
anomaly-autoencoder-keras-generator/
│
├─ src/
│ ├─ __init__.py # Marks src as a package
│ ├─ model.py # Conv autoencoder definition (build_autoencoder)
│ ├─ data_generator.py # FrameGenerator to stream frames from disk
│ ├─ train.py # Training script with callbacks and resume support
│ └─ evaluate.py # Evaluate reconstruction errors on data/test
│
├─ data/
│ ├─ train/ # Synthetic example frames (per person)
│ ├─ val/ # Synthetic validation frames (per person)
│ ├─ test/ # Synthetic test frames (per person)
│ └─ README.md # Expected data structure and usage notes
│
├─ tests/
│ ├─ test_model_smoke.py # Build model and run a forward pass
│ └─ test_generator_smoke.py # Build FrameGenerator on a tiny temp dataset
│
├─ notebooks/
│ └─ demo_colab.ipynb # Colab demo
│
├─ models/ # (Created at runtime) saved Keras models
├─ checkpoints/ # (Created at runtime) best-model weights
│
├─ requirements.txt # TensorFlow, OpenCV, NumPy, Matplotlib, tqdm, pytest
├─ .gitignore # Ignore caches, environments, models, checkpoints
└─ README.md # This file
Clone the repository:

```bash
git clone https://github.com/giacomobettas/anomaly-autoencoder-keras-generator.git
cd anomaly-autoencoder-keras-generator
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Make sure data/train and data/val follow the structure described in data/README.md.
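With dependencies installed, the smoke tests under tests/ can be run with pytest (pytest is listed in requirements.txt):

```bash
pytest
```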
A tiny synthetic dataset is provided as an example; you can replace it with your own frames.
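During training, frames are streamed from disk batch by batch instead of being loaded into RAM all at once. A minimal sketch of such a generator, assuming the per-person layout described in data/README.md (the real implementation in src/data_generator.py may differ in details):

```python
# Sketch of a disk-streaming generator in the spirit of
# src/data_generator.py's FrameGenerator (details may differ).
import glob
import os
import cv2
import numpy as np
import tensorflow as tf

class FrameGenerator(tf.keras.utils.Sequence):
    """Yields (batch, batch) pairs for autoencoder training."""

    def __init__(self, root_dir, image_size=(64, 64), batch_size=8):
        super().__init__()
        # One subfolder per person, frames as image files inside.
        self.paths = sorted(glob.glob(os.path.join(root_dir, "*", "*.png")))
        self.image_size, self.batch_size = image_size, batch_size

    def __len__(self):
        return int(np.ceil(len(self.paths) / self.batch_size))

    def __getitem__(self, idx):
        batch_paths = self.paths[idx * self.batch_size:(idx + 1) * self.batch_size]
        frames = []
        for p in batch_paths:
            img = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
            img = cv2.resize(img, self.image_size)
            frames.append(img.astype("float32") / 255.0)
        x = np.expand_dims(np.array(frames), -1)  # (B, H, W, 1)
        return x, x  # input == target for an autoencoder
```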
Example training run:
```bash
python -m src.train \
  --train_dir data/train \
  --val_dir data/val \
  --image_size 64 64 \
  --color_mode grayscale \
  --batch_size 8 \
  --epochs 10 \
  --checkpoint_path checkpoints/best_autoencoder.weights.h5 \
  --model_path models/autoencoder_full.keras
```

To resume training from a previous best checkpoint:
```bash
python -m src.train \
  --train_dir data/train \
  --val_dir data/val \
  --image_size 64 64 \
  --color_mode grayscale \
  --batch_size 8 \
  --epochs 20 \
  --checkpoint_path checkpoints/best_autoencoder.weights.h5 \
  --model_path models/autoencoder_full.keras \
  --resume_from checkpoints/best_autoencoder.weights.h5
```

To compute reconstruction errors on data/test:
```bash
python -m src.evaluate \
  --test_dir data/test \
  --model_path models/autoencoder_full.keras \
  --image_size 64 64 \
  --color_mode grayscale \
  --batch_size 8 \
  --show_hist \
  --output_csv results/test_errors.csv
```

This will:
- Print global mean and standard deviation of reconstruction MSE
- Print per-person mean reconstruction error
- Optionally show a histogram of reconstruction errors
- Optionally save per-frame errors to a CSV file
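For reference, the per-frame reconstruction MSE behind these statistics can be computed along the following lines (a sketch using the generator sketched above, not the exact code in src/evaluate.py):

```python
# Sketch: per-frame reconstruction MSE from a saved model.
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("models/autoencoder_full.keras")

def reconstruction_errors(generator):
    errors = []
    for i in range(len(generator)):
        batch_x, _ = generator[i]
        recon = model.predict(batch_x, verbose=0)
        # Mean squared error per frame, averaged over pixels.
        mse = np.mean((batch_x - recon) ** 2, axis=(1, 2, 3))
        errors.extend(mse.tolist())
    return np.array(errors)

# errors = reconstruction_errors(FrameGenerator("data/test"))
# print(errors.mean(), errors.std())
```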
A Colab demo notebook is provided in notebooks/demo_colab.ipynb.
Typical workflow in Colab:
- Clone the repository.
- Install the requirements.
- Mount Google Drive (optional) if your dataset is stored there.
- Set --train_dir and --val_dir to your Drive paths or to data/.
- Run the training and evaluation commands from within the notebook.
See the notebook cells for a step-by-step example.
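For reference, the opening cells typically boil down to something like this (mounting Drive is only needed if your dataset lives there):

```python
# Typical opening cells of the Colab demo (shell commands use "!").
!git clone https://github.com/giacomobettas/anomaly-autoencoder-keras-generator.git
%cd anomaly-autoencoder-keras-generator
!pip install -r requirements.txt

# Optional: mount Google Drive if your frames are stored there.
from google.colab import drive
drive.mount('/content/drive')
```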