# **PROJECT: Mini Reverse Image Search using CNN + Flask**

This project has three parts:

1. Feature Extraction Script (run once)

2. Flask Backend Code (app.py)

3. HTML Templates

**By the end, you'll have a mini Reverse Image Search engine using:**

✔ CNN (VGG16)

✔ TensorFlow/Keras

✔ Cosine similarity

✔ Flask deployment

✔ A neat HTML UI

In [1]:
# from google.colab import drive
# drive.mount('/content/drive')

In [2]:
# !pip install pillow

# **Step 1 - Feature Extraction Script (feature_extraction.py)**

In [3]:
import os
import numpy as np
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model

# Load Pretrained Model

In [4]:
# Keep the network only up to the fc1 layer: fc2 and the final softmax
# classifier are dropped, so the model acts as a feature extractor


base_model = VGG16(weights='imagenet')
model = Model(inputs=base_model.input, outputs=base_model.get_layer('fc1').output)

In [5]:
# Set your Google Drive image folder path
# IMAGE_DIR = "/content/drive/MyDrive/Images/"
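# Build the path with os.path.join: the backslash in 'static\Images' is an
# invalid escape sequence and only works as a separator on Windows
IMAGE_DIR = os.path.join("static", "Images")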

# Lists to store extracted data
features = []
image_paths = []



# **extract_features() function**


**The function extract_features(img_path) takes an image file path as input and returns a 4096-dimensional feature vector produced by the FC1 layer of VGG16.**


This vector is a compact numerical representation of the image and is commonly used for:

Image similarity

Image search engines

Clustering

Recommendation systems


The steps inside the function:

1. Load the image and resize it to 224×224, the input size VGG16 expects when its fully connected layers are included.

2. Convert the image to a NumPy array.

3. Expand the array to shape (1, 224, 224, 3) to simulate a batch (VGG expects batches).

4. Preprocess using VGG16 preprocessing (RGB → BGR, mean subtraction).

5. Pass the image into the VGG16 model and collect the FC1 layer output.

6. L2-normalize the resulting vector to unit length, so that the cosine similarity between two vectors reduces to their dot product.

7. Return the final feature vector.
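
Steps 3 and 4 can be sketched in plain NumPy to show what happens to the pixel values. This is only an illustration under stated assumptions: `to_vgg_batch` and `IMAGENET_BGR_MEAN` are hypothetical names, the image is synthetic, and the notebook itself relies on Keras' `preprocess_input` rather than this code.

```python
import numpy as np

# ImageNet per-channel means in BGR order (Caffe-style VGG preprocessing).
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def to_vgg_batch(rgb_image):
    """Hypothetical helper: (H, W, 3) RGB array -> (1, H, W, 3) VGG-style batch."""
    batch = np.expand_dims(rgb_image.astype(np.float32), axis=0)  # step 3: add batch axis
    batch = batch[..., ::-1]                                      # step 4: RGB -> BGR
    return batch - IMAGENET_BGR_MEAN                              # step 4: subtract channel means

# Synthetic stand-in for a loaded 224x224 image: every pixel is pure red.
fake_image = np.zeros((224, 224, 3), dtype=np.uint8)
fake_image[..., 0] = 255

batch = to_vgg_batch(fake_image)
print(batch.shape)     # shape now includes the leading batch axis
print(batch[0, 0, 0])  # one pixel as (blue, green, red) after mean subtraction
```

Note that the pixel values are not rescaled to the 0 to 1 range; they are only reordered and zero-centered.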

In [6]:
# Step 2.2 — Extract features from a single image
# ---------------------------------------
def extract_features(img_path):
    img = image.load_img(img_path, target_size=(224, 224))
    img_array = image.img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)

    # Preprocess for VGG16
    img_array = preprocess_input(img_array)

    # Get FC1 layer output (4096-dim)
    feature_vector = model.predict(img_array)[0]

    # Normalize the vector for stable similarity comparison
    feature_vector = feature_vector / np.linalg.norm(feature_vector)

    return feature_vector
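
Why normalize? A small worked example, using made-up 3-dimensional vectors instead of real 4096-dimensional features, suggests the reason: once two vectors have unit length, their plain dot product equals their cosine similarity. The helper name `cosine_similarity` is an assumption for illustration only.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of any length."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up vectors standing in for two feature vectors.
a = np.array([3.0, 4.0, 0.0])
b = np.array([4.0, 3.0, 0.0])

# Same normalization as in extract_features(): divide by the L2 norm.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

full = cosine_similarity(a, b)          # 24 / (5 * 5), about 0.96
shortcut = float(np.dot(a_unit, b_unit))  # same value, no division needed

print(full, shortcut)
```

Normalizing once at extraction time means the search step can compare vectors with a dot product alone.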


**Step 2.3 — Loop through all images and extract features**

In [7]:
print("Extracting features...")

# Loop through every file in the image directory
# (sorted, because os.listdir() returns names in arbitrary order)
for img_name in sorted(os.listdir(IMAGE_DIR)):

    # Create the full path for the file
    # Example: "/content/drive/MyDrive/images/cat1.jpg"
    file_path = os.path.join(IMAGE_DIR, img_name)

    # ------------------------------------------------------------
    # Check if file is an image (only process .jpg, .jpeg, .png)
    # Helps avoid errors if the folder contains non-image files
    # ------------------------------------------------------------
    if file_path.lower().endswith((".jpg", ".jpeg", ".png")):

        # ------------------------------------------------------------
        # Extract 4096-dim VGG16 features from the image
        # The extract_features() function returns a normalized vector
        # ------------------------------------------------------------
        features.append(extract_features(file_path))

        # Save the path for later reference / image search
        image_paths.append(file_path)

        # Print progress so we know which image was processed
        print("Processed:", img_name)


Extracting features...
1/1 ━━━━━━━━━━━━━━━━━━━━ 10s 10s/step
Processed: 00befedd19.jpg
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 423ms/step
Processed: 0206101929.jpg
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 410ms/step
Processed: 022a8608b9.jpg
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 459ms/step
Processed: 034292a8ff.jpg
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 448ms/step
Processed: 0428e62301.jpg
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 479ms/step
Processed: 043d33679c.jpg
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 481ms/step
Processed: 0465f6d586.jpg
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 467ms/step
Processed: 04cddbbf04.jpg
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 431ms/step
Processed: 05f5a5379d.jpg
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 463ms/step
Processed: 061
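
The extension check in the loop can be tried on its own, without loading the model. The sketch below assumes a hypothetical helper, `list_image_files`, and a temporary folder with made-up file names; it is not part of the original script.

```python
import os
import tempfile

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")

def list_image_files(directory):
    """Hypothetical helper: sorted paths of the image files found in `directory`."""
    paths = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Lower-casing the name makes the check accept ".PNG", ".Jpg", etc.
        if os.path.isfile(path) and name.lower().endswith(IMAGE_EXTENSIONS):
            paths.append(path)
    return paths

# Build a throwaway folder with a mix of image and non-image files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["b.PNG", "a.jpg", "notes.txt", "c.jpeg"]:
        with open(os.path.join(tmp, name), "w"):
            pass
    found = [os.path.basename(p) for p in list_image_files(tmp)]

print(found)  # notes.txt is skipped; the image files remain
```

This check looks only at the file name. A file with an image extension but corrupt contents would still reach `extract_features()` and raise an error there.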

In [8]:
# Convert to numpy arrays
features = np.array(features)
image_paths = np.array(image_paths)

# ---------------------------------------
# Step 2.4 — Save features and paths
# ---------------------------------------
np.save("features.npy", features)
np.save("image_paths.npy", image_paths)
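
To see how the two saved files might be used later for search, here is a minimal sketch. It assumes small random vectors in place of the real VGG16 features, writes them to a temporary folder so it does not overwrite your files, and uses a hypothetical helper called `top_k_similar`. The actual search code in the Flask backend may look different.

```python
import os
import tempfile
import numpy as np

def top_k_similar(query_vector, features, image_paths, k=3):
    """Hypothetical helper: the k most similar (path, score) pairs, best first."""
    scores = features @ query_vector      # cosine similarity, since rows are unit length
    best = np.argsort(scores)[::-1][:k]   # indices of the highest scores first
    return [(str(image_paths[i]), float(scores[i])) for i in best]

# Synthetic stand-ins for the real 4096-dimensional features (5 images, 8 dims).
rng = np.random.default_rng(0)
fake_features = rng.random((5, 8))
fake_features /= np.linalg.norm(fake_features, axis=1, keepdims=True)
fake_paths = np.array([f"img_{i}.jpg" for i in range(5)])

# Save and reload, mirroring the np.save() calls above.
with tempfile.TemporaryDirectory() as tmp:
    np.save(os.path.join(tmp, "features.npy"), fake_features)
    np.save(os.path.join(tmp, "image_paths.npy"), fake_paths)
    loaded_features = np.load(os.path.join(tmp, "features.npy"))
    loaded_paths = np.load(os.path.join(tmp, "image_paths.npy"))

# Query with the features of image 2: it should come back as its own best match.
results = top_k_similar(loaded_features[2], loaded_features, loaded_paths, k=3)
for path, score in results:
    print(path, round(score, 3))
```

Because every stored vector was normalized at extraction time, one matrix-vector product scores the query against the whole collection.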