# PatternMind – Feature Extraction

This notebook extracts feature vectors from all images in the `patternmind_dataset/` folder using a pre-trained CNN (VGG16).
The resulting feature matrix is saved as `X_features.npy` and will be used later by the K-Means clustering notebook (`02_kmeans_clustering.ipynb`).


In [None]:
import os
import glob
import numpy as np
from tqdm import tqdm

from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing import image


## 1. Define Dataset Location

We assume that all images are stored inside the folder `patternmind_dataset/` in the same directory as this notebook.
If your folder is somewhere else, adjust the `DATA_DIR` path below.


In [None]:
DATA_DIR = 'patternmind_dataset'

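# Collect all image paths (jpg, jpeg, png) recursively
extensions = ['*.jpg', '*.jpeg', '*.png']
image_paths = []
for ext in extensions:
    image_paths.extend(glob.glob(os.path.join(DATA_DIR, '**', ext), recursive=True))

# glob does not guarantee an order; sort so that row i of the feature
# matrix maps to the same image on every run
image_paths.sort()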

print(f'Found {len(image_paths)} images.')
image_paths[:5]  # show a few examples
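
The glob patterns above are matched case-sensitively on most Linux setups, so a file named `photo.JPG` would be skipped there. A minimal alternative sketch using `pathlib` is shown below; `collect_image_paths` and `IMAGE_SUFFIXES` are hypothetical names that are not part of the original notebook.

```python
from pathlib import Path

# Hypothetical helper, not part of the original notebook.
IMAGE_SUFFIXES = {'.jpg', '.jpeg', '.png'}

def collect_image_paths(data_dir):
    """Return a sorted list of image paths under data_dir.

    Suffixes are compared in lower case, so '.JPG' and '.jpg' both match.
    """
    root = Path(data_dir)
    return sorted(
        str(p) for p in root.rglob('*')
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

With this helper, the cell above could be reduced to `image_paths = collect_image_paths(DATA_DIR)`.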


## 2. Load Pre-trained CNN (VGG16)

We use VGG16 pre-trained on ImageNet as a generic feature extractor.
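We drop the fully connected classification head (`include_top=False`) and apply global average pooling (`pooling='avg'`) to the output of the last convolutional block, which yields one 512-dimensional feature vector per image.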


In [None]:
# Load VGG16 without its fully connected classifier head;
# pooling='avg' reduces the last conv block to one 512-d vector per image
base_model = VGG16(weights='imagenet', include_top=False, pooling='avg')
base_model.summary()


## 3. Extract Features for All Images

For each image, we:
1. Load and resize it to 224x224.
2. Convert it to an array and apply the VGG16 preprocessing.
3. Run a forward pass through the network to obtain a feature vector.
4. Store the feature vectors in a NumPy array of shape `(n_samples, n_features)`.
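
To make step 2 more concrete, here is a NumPy-only sketch of what the VGG16 preprocessing does to a batch of RGB pixels, assuming the Caffe-style convention that Keras uses for this model: channels are reordered from RGB to BGR and the ImageNet per-channel mean is subtracted, with no rescaling. The function name `vgg16_preprocess_sketch` is hypothetical; the notebook itself keeps using `preprocess_input`.

```python
import numpy as np

# ImageNet channel means in BGR order (assumed to match the Keras VGG16 convention)
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_preprocess_sketch(rgb_batch):
    """Illustrative stand-in for preprocess_input on arrays shaped (..., 3)."""
    x = np.asarray(rgb_batch, dtype=np.float32)
    x = x[..., ::-1]               # reorder channels: RGB -> BGR
    return x - IMAGENET_BGR_MEAN   # zero-center each channel
```

The output keeps the input shape; only the channel order and value range change, which is why the raw 0-255 pixel array can be passed in directly.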


In [None]:
features_list = []

for img_path in tqdm(image_paths, desc='Extracting features'):
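    # Load as RGB and resize to the 224x224 input size used here
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)      # shape (224, 224, 3)
    x = np.expand_dims(x, axis=0)    # add batch axis: (1, 224, 224, 3)
    x = preprocess_input(x)          # VGG16-specific channel preprocessing

    feats = base_model.predict(x, verbose=0)  # shape (1, 512)
    feats = feats.flatten()                   # 1-D vector of length 512
    features_list.append(feats)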

X_features = np.array(features_list)
print('Feature matrix shape:', X_features.shape)
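
Calling `predict` once per image is simple but can be slow on large datasets. The sketch below shows one possible way to process images in batches instead. `extract_in_batches`, `load_fn`, and `predict_fn` are hypothetical names introduced for illustration, and the batch size of 32 is an arbitrary assumption.

```python
import numpy as np

def extract_in_batches(paths, load_fn, predict_fn, batch_size=32):
    """Hypothetical helper: run predict_fn on stacked batches of images.

    load_fn(path)     -> preprocessed array of shape (H, W, 3)
    predict_fn(batch) -> array of shape (len(batch), n_features)
    """
    chunks = []
    for start in range(0, len(paths), batch_size):
        batch_paths = paths[start:start + batch_size]
        # Stack the individual images into one (batch, H, W, 3) array
        batch = np.stack([load_fn(p) for p in batch_paths], axis=0)
        chunks.append(np.asarray(predict_fn(batch)))
    if not chunks:
        # No input paths: return an empty matrix rather than failing
        return np.empty((0, 0), dtype=np.float32)
    return np.concatenate(chunks, axis=0)
```

With the objects defined in this notebook, `load_fn` could wrap `image.load_img`, `image.img_to_array`, and `preprocess_input` for a single path, and `predict_fn` could be `lambda b: base_model.predict(b, verbose=0)`. The rows of the result follow the order of `paths`.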


## 4. Save Feature Matrix

We now save the feature matrix as `X_features.npy` and store the
corresponding image paths in `image_paths.npy`, so that row `i` of the matrix can be traced back to its source image later.


In [None]:
np.save('X_features.npy', X_features)
np.save('image_paths.npy', np.array(image_paths))
print('Saved X_features.npy and image_paths.npy')
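
Before moving on, it can be worth reloading both files and confirming that they still line up row by row. The following is a minimal sketch of such a check; `load_and_check` is a hypothetical helper, not something the clustering notebook is known to contain.

```python
import numpy as np

def load_and_check(features_file='X_features.npy', paths_file='image_paths.npy'):
    """Hypothetical sanity check: reload both arrays and verify their alignment."""
    X = np.load(features_file)
    paths = np.load(paths_file)
    if X.ndim != 2:
        raise ValueError(f'expected a 2-D feature matrix, got shape {X.shape}')
    if X.shape[0] != len(paths):
        raise ValueError(f'{X.shape[0]} feature rows but {len(paths)} paths')
    return X, paths
```

Calling `X, paths = load_and_check()` in the notebook folder returns both arrays, or raises a `ValueError` if the files are out of sync (for example, after re-running only one of the save lines).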


## 5. Next Steps

- Keep `01_extract_features.ipynb` and `02_kmeans_clustering.ipynb` in the same folder.
- After running this notebook, you will have `X_features.npy` in that folder.
- You can then open and run `02_kmeans_clustering.ipynb` without modifying the loading line:

```python
X = np.load('X_features.npy')
```
