[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/digital-marketing-tum/image-analyzer/blob/main/src/notebooks/pipeline_colab.ipynb)

# Introduction

**Image Analyzer** is a comprehensive image analytics pipeline that extracts visual features from images using computer vision and machine learning techniques.  
It provides both a **web interface** and a **Jupyter/Colab interface** for batch image processing.

---

### Using Image Analyzer in Google Colab

This notebook will guide you through:

1. Setting up the Image Analyzer environment.  
2. Configuring where your images and results are stored.  
3. Running the analysis pipeline and generating summary reports.

Simply run each cell **in order** to complete your analysis.

---

### Using GPU Acceleration

To speed up processing (especially for object detection & caption generation):

1. Go to **Runtime → Change runtime type** in Colab.
2. Select a GPU option, e.g., **T4 GPU**.
3. Save and restart the runtime if prompted.

GPU acceleration can significantly reduce analysis time.
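
To confirm that the runtime change took effect, you can check for an NVIDIA GPU from a notebook cell. The sketch below is a hypothetical helper (`gpu_available` is not part of Image Analyzer). It assumes, as is usual for Colab GPU runtimes, that the NVIDIA `nvidia-smi` tool is on the `PATH`, and it treats the tool's absence as "no GPU".

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if `nvidia-smi` is found on PATH and exits successfully.

    Assumption: NVIDIA GPU runtimes expose `nvidia-smi`; CPU-only runtimes do not.
    """
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    if result.returncode == 0:
        print(result.stdout)  # shows GPU model, memory, and driver version
        return True
    return False


print("GPU detected" if gpu_available() else "No GPU detected; running on CPU")
```

Once TensorFlow is installed in Step 1, `tf.config.list_physical_devices("GPU")` offers a similar check from TensorFlow's point of view.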

---

### ⚠️ HTTP Errors

Some features, such as object detection, require Google Colab to download model weights from external sources. If one of these sources blocks further downloads from Colab's IP address, the download fails with an HTTP error.

If this happens, try again later; a retry usually succeeds. If the error persists, run the analysis locally instead.
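
If you prefer to retry from code rather than by hand, a generic retry-with-backoff wrapper is one option. This is a hypothetical helper (`retry_with_backoff` is not part of Image Analyzer), and it assumes the failed download surfaces as an `OSError`, which includes `urllib.error.HTTPError`; the exception a given model download actually raises may differ, so adjust `retry_on` if needed.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between attempts
    and re-raises the last error if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as err:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({err}); retrying in {delay:.0f} s")
            time.sleep(delay)
    raise RuntimeError("attempts must be >= 1")
```

In this notebook, `results, logs = retry_with_backoff(ia.process_batch)` would apply it to the pipeline, assuming the download error propagates out of `process_batch`; keep in mind that each retry re-runs the whole batch.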


## 🛠 Step 1: Install & Initialize Image Analyzer

Before we can run the analysis, we need to:

1. **Download the Image Analyzer code base** from GitHub.
2. **Install all required dependencies** for Google Colab.
3. **Initialize Image Analyzer** with the configuration file.

💡 **Note:** You only need to run this step **once per session**.

In [3]:
# Download the code base; later cells expect it at /content/image-analyzer
!git clone https://github.com/digital-marketing-tum/image-analyzer.git

fatal: destination path 'image-analyzer' already exists and is not an empty directory.
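
The `fatal: destination path 'image-analyzer' already exists` message above appears when the clone cell is run a second time in the same session; the code from the first run is still in place. If you want a cell that is safe to re-run, one option is to clone only when the target folder is missing or empty. This is a sketch, not part of Image Analyzer: `ensure_repo` is a hypothetical helper, and the `/content/image-analyzer` path is taken from the other cells in this notebook.

```python
import os
import subprocess

REPO_URL = "https://github.com/digital-marketing-tum/image-analyzer.git"
REPO_DIR = "/content/image-analyzer"  # same location the other cells use


def ensure_repo(repo_dir: str = REPO_DIR, repo_url: str = REPO_URL) -> bool:
    """Clone repo_url into repo_dir unless repo_dir already has content.

    Returns True if a clone was run and False if it was skipped. This mirrors
    git's own rule: cloning into an existing *empty* directory is allowed.
    """
    if os.path.isdir(repo_dir) and os.listdir(repo_dir):
        print(f"Skipping clone: {repo_dir} already exists and is not empty")
        return False
    subprocess.run(["git", "clone", repo_url, repo_dir], check=True)
    return True
```

To use it, call `ensure_repo()` in place of the `!git clone` line. Skipping the clone also means an existing copy is not updated; `!git -C /content/image-analyzer pull` fetches the latest version.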


In [1]:
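# Remove the preinstalled TensorFlow, Keras, and protobuf packages before installing pinned versions
!pip uninstall -y tensorflow tensorflow-gpu keras keras-nightly keras-preprocessing protobuf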

Found existing installation: tensorflow 2.20.0
Uninstalling tensorflow-2.20.0:
  Successfully uninstalled tensorflow-2.20.0
Found existing installation: keras 3.12.0
Uninstalling keras-3.12.0:
  Successfully uninstalled keras-3.12.0
Found existing installation: protobuf 6.33.2
Uninstalling protobuf-6.33.2:
  Successfully uninstalled protobuf-6.33.2


In [None]:
# Install pinned TensorFlow and Keras versions
!pip install tensorflow==2.17.1 keras==3.3.3

Collecting tensorflow==2.17.1
  Using cached tensorflow-2.17.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.2 kB)
Collecting keras==3.3.3
  Using cached keras-3.3.3-py3-none-any.whl.metadata (5.7 kB)
Collecting ml-dtypes<0.5.0,>=0.3.1 (from tensorflow==2.17.1)
  Using cached ml_dtypes-0.4.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (20 kB)
Collecting protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3 (from tensorflow==2.17.1)
  Using cached protobuf-4.25.8-cp37-abi3-manylinux2014_x86_64.whl.metadata (541 bytes)
Collecting tensorboard<2.18,>=2.17 (from tensorflow==2.17.1)
  Using cached tensorboard-2.17.1-py3-none-any.whl.metadata (1.6 kB)
Collecting numpy<2.0.0,>=1.26.0 (from tensorflow==2.17.1)
  Using cached numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Using cached tensorflow-2.17.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (601.4 MB)
Using ca

In [9]:
# Install the remaining dependencies from the Colab requirements file (Python 3.12)
!pip install -r /content/image-analyzer/requirementsColab_py312.txt

Collecting numpy (from -r /content/image-analyzer/requirementsColab_py312.txt (line 62))
  Downloading numpy-2.2.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (62 kB)
INFO: pip is looking at multiple versions of tensorflow[and-cuda] to determine which version is compatible with other requirements. This could take a while.
Collecting tensorflow[and-cuda] (from -r /content/image-analyzer/requirementsColab_py312.txt (line 105))
  Using cached tensorflow-2.20.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.5 kB)
Collecting protobuf (from -r /content/image-analyzer/requirementsColab_py312.txt (line 76))
  Downloading protobuf-6.33.2-cp39-abi3-manylinux2014_x86_64.whl.metadata (593 bytes)
Collecting tensorboard (from -r /content/image-analyzer/req
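
The log of the requirements install shows pip considering `tensorflow[and-cuda]` 2.20.0 and `protobuf` 6.33.2, i.e., newer versions than the ones pinned in the previous cell, so the versions that end up installed may differ from the pins. The sketch below checks this with the standard library's `importlib.metadata`; `check_pins` is a hypothetical helper, and `PINS` simply repeats the versions from the `pip install tensorflow==2.17.1 keras==3.3.3` cell.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Versions pinned in the `pip install tensorflow==2.17.1 keras==3.3.3` cell
PINS = {"tensorflow": "2.17.1", "keras": "3.3.3"}


def check_pins(
    pins: Dict[str, str],
    get_version: Callable[[str], str] = version,
) -> Dict[str, Optional[str]]:
    """Return {package: installed_version} for every package that misses its pin.

    A value of None means the package is not installed at all.
    """
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in pins.items():
        try:
            installed: Optional[str] = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


for pkg, found in check_pins(PINS).items():
    print(f"{pkg}: expected {PINS[pkg]}, found {found or 'not installed'}")
```

If nothing is printed, the installed versions match the pins.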

In [None]:
# Initialize Image Analyzer using the configuration file
import os  # used in Step 2 to create the output folder
import sys

# Make the cloned package importable
sys.path.append('/content/image-analyzer/src')
import image_analyzer as IA

ia = IA.IA(config_path="/content/image-analyzer/config/configuration.yaml")
print("✅ Image Analyzer initialized successfully!")
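
If the initialization cell fails with an `ImportError` or a missing-file error, checking that the clone landed where the cell expects it can narrow things down. This sketch only inspects the paths used in the cell above (`/content/image-analyzer/src` and `config/configuration.yaml`). It assumes the `image_analyzer` module is either `src/image_analyzer.py` or a `src/image_analyzer/` package, which is an assumption about the repository layout; `missing_setup_paths` is a hypothetical helper.

```python
import os
from typing import List

REPO_DIR = "/content/image-analyzer"


def missing_setup_paths(repo_dir: str = REPO_DIR) -> List[str]:
    """Return the expected setup paths that do not exist under repo_dir."""
    src_dir = os.path.join(repo_dir, "src")
    config = os.path.join(repo_dir, "config", "configuration.yaml")
    # Assumption: the module is src/image_analyzer.py or the package src/image_analyzer/
    module_candidates = [
        os.path.join(src_dir, "image_analyzer.py"),
        os.path.join(src_dir, "image_analyzer"),
    ]
    missing = [p for p in (src_dir, config) if not os.path.exists(p)]
    if not any(os.path.exists(p) for p in module_candidates):
        # Label (not a real path) for "neither module form was found"
        missing.append(os.path.join(src_dir, "image_analyzer(.py)"))
    return missing


problems = missing_setup_paths()
print("Setup paths look fine" if not problems else f"Missing: {problems}")
```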

## 📂 Step 2: Set Your Input & Output Folders

Before running the analysis, we need to tell **Image Analyzer**:

- **Where to find your images** (input directory).  
- **Where to save results** (output directory).

### Option 1: Local Colab Storage
- Temporary: all files are deleted when the session ends.
- Upload your images to a folder under `/content/image-analyzer/data/` and point `ia.input_dir` to it (the cell below uses `data/test_human_20`).

### Option 2: Google Drive (Recommended)
- Persistent: files remain after the session ends.
- Requires mounting Google Drive: set `USE_GOOGLE_DRIVE = True` and adjust the Drive paths in the cell below to your image folder.


In [None]:
# === Set Input & Output Directories ===

USE_GOOGLE_DRIVE = False  # Change to True if you want to use Google Drive

if not USE_GOOGLE_DRIVE:
    ia.input_dir  = "/content/image-analyzer/data/test_human_20"                    # Local images folder
    ia.output_dir = "/content/image-analyzer/outputs"                               # Local results folder

else:
    from google.colab import drive
    drive.mount('/content/drive')

    ia.input_dir  = "/content/drive/MyDrive/image-analyzer/data/test_human_20"     # Google Drive images folder
    ia.output_dir = "/content/drive/MyDrive/image-analyzer/outputs"                # Google Drive results folder

# Create output directory if it doesn't exist
os.makedirs(ia.output_dir, exist_ok=True)

print(f"📁 Input directory: {ia.input_dir}")
print(f"✅ Output directory: {ia.output_dir}")
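
Before starting a long run, it can help to confirm that the input directory actually contains images. The sketch below counts files by extension in the top level of a folder (subfolders are not searched). The extension set is an assumption, not the list of formats Image Analyzer accepts, and `count_images` is a hypothetical helper.

```python
import os
from collections import Counter

# Assumed set of image extensions; Image Analyzer's supported formats may differ.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tif", ".tiff", ".webp"}


def count_images(folder: str) -> Counter:
    """Count files per image extension in folder (top level only, not subfolders)."""
    counts: Counter = Counter()
    with os.scandir(folder) as entries:
        for entry in entries:
            ext = os.path.splitext(entry.name)[1].lower()
            if entry.is_file() and ext in IMAGE_EXTENSIONS:
                counts[ext] += 1
    return counts
```

In the notebook, `print(count_images(ia.input_dir))` after the cell above shows how many images of each type were found; an empty `Counter()` means the path or extensions need checking.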

## ▶️ Step 3: Run the Analysis Pipeline

Now that Image Analyzer is set up and your input/output folders are configured, run the pipeline to process all images in the input directory.  

Results will be saved in the output directory in **CSV** and **Excel** formats.  

💡 *This step may take a while, depending on the dataset size and whether GPU acceleration is enabled.*

In [None]:
# Run the pipeline; when this cell finishes, the results are in ia.output_dir (set in Step 2)
results, logs = ia.process_batch()
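
Since the pipeline writes its results as CSV and Excel files into the output directory, listing that folder shows what was produced. The exact file names are not documented here, so this sketch lists `.csv` and `.xlsx` files, newest first, and reads the header row of a CSV to show the feature columns. The `.xlsx` extension and UTF-8 encoding are assumptions, and both helpers are hypothetical.

```python
import csv
import os
from typing import List

RESULT_EXTENSIONS = (".csv", ".xlsx")  # .xlsx assumed for the Excel output


def list_result_files(output_dir: str) -> List[str]:
    """Return CSV/Excel files in output_dir, most recently modified first."""
    paths = [
        os.path.join(output_dir, name)
        for name in os.listdir(output_dir)
        if name.lower().endswith(RESULT_EXTENSIONS)
    ]
    return sorted(paths, key=os.path.getmtime, reverse=True)


def csv_columns(path: str) -> List[str]:
    """Return the header row of a CSV file (assumed UTF-8), or [] if it is empty."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f), [])
```

For example, `for p in list_result_files(ia.output_dir): print(p)` lists the outputs of the run above.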

## 📄 Step 4 (Optional): Generate Min/Max Summary PDF

Once the analysis is complete, you can create a PDF report showing:

- An example image for each feature's **lowest** value.  
- An example image for each feature's **highest** value.

This is helpful for quickly reviewing feature extremes in your dataset.

In [None]:
# Create a PDF file that shows an example image for each feature's min and max values
# exclude_object_detection_features=True skips all object detection features (i.e., coco_*, imagenet_*, contains_*)
ia.create_argmin_argmax_pdf(exclude_object_detection_features=True)
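
Conceptually, the report picks, for every numeric feature, the image with the smallest value (argmin) and the image with the largest value (argmax). The sketch below illustrates that selection on plain Python rows; it is not Image Analyzer's implementation. The `image` key and the row format are assumptions, the sample rows are made up, and the excluded prefixes (`coco_`, `imagenet_`, `contains_`) come from the comment in the cell above.

```python
import math
from typing import Dict, List, Tuple

# Prefixes of object detection features, as listed in the cell comment above
OBJECT_DETECTION_PREFIXES = ("coco_", "imagenet_", "contains_")


def _numeric(value: object) -> bool:
    """True for int/float values that are not bool and not NaN."""
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and not math.isnan(value)
    )


def argmin_argmax(
    rows: List[Dict[str, object]],
    image_key: str = "image",  # assumed name of the field holding the image file name
    exclude_object_detection_features: bool = True,
) -> Dict[str, Tuple[str, str]]:
    """Map each numeric feature to (image with min value, image with max value)."""
    extremes: Dict[str, Tuple[str, str]] = {}
    features = {k for row in rows for k in row if k != image_key}
    for feature in sorted(features):
        if exclude_object_detection_features and feature.startswith(OBJECT_DETECTION_PREFIXES):
            continue
        # Keep only rows where this feature has a usable numeric value
        scored = [(row[feature], row[image_key]) for row in rows if _numeric(row.get(feature))]
        if not scored:
            continue
        lowest = min(scored, key=lambda pair: pair[0])[1]
        highest = max(scored, key=lambda pair: pair[0])[1]
        extremes[feature] = (lowest, highest)
    return extremes


# Made-up sample rows for illustration only
rows = [
    {"image": "a.jpg", "brightness": 0.2, "coco_person": 3},
    {"image": "b.jpg", "brightness": 0.9, "coco_person": 0},
    {"image": "c.jpg", "brightness": 0.5, "coco_person": 1},
]
print(argmin_argmax(rows))  # {'brightness': ('a.jpg', 'b.jpg')}
```

With `exclude_object_detection_features=False`, `coco_person` would be included as well, mapping to `('b.jpg', 'a.jpg')`.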