 ## Business Goal
Given an image of a retail shelf, we need to extract KPIs such as:

✅ Visibility on Shelf → How well is the product placed?

✅ Buying from Shelf → Are people interacting with it?

✅ Findability on Shelf → How easy is it to locate?

✅ Time to Find → How long does it take a customer to find the product?

## Technical Approach

1️⃣ Use YOLOv5 for Object Detection to detect products on the shelf.

2️⃣ Extract Attention Maps (Grad-CAM) from EfficientNet-B7 for feature enhancement.

3️⃣ Convert Attention Maps to Feature Embeddings.

4️⃣ Train a CNN Model (EfficientNet-B7) with embeddings to classify visibility, findability, etc.

5️⃣ Deploy Model for Real-Time KPI Estimation.

#### Step 1: Install Dependencies
!pip install torch torchvision ultralytics opencv-python numpy matplotlib

#### Step 2: Detect Products on Shelf Using YOLOv5

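```python
import torch
import cv2
import numpy as np
import matplotlib.pyplot as plt
from ultralytics import YOLO

# Named yolo_model so the KPIClassifier assigned to `model` in Step 4 does not replace the detector
yolo_model = YOLO("yolov5s.pt")  # Load YOLOv5 for object detection

def detect_products(image_path):
    image = cv2.imread(image_path)  # BGR uint8 array, or None if the file cannot be read
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")

    results = yolo_model(image)
    bboxes = results[0].boxes.xyxy.cpu().numpy()    # (N, 4) boxes as x1, y1, x2, y2 in pixels
    class_ids = results[0].boxes.cls.cpu().numpy()  # (N,) class indices
    names = results[0].names                        # class index -> class name

    for bbox, cls in zip(bboxes, class_ids):
        x1, y1, x2, y2 = map(int, bbox)
        cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)  # Draw bounding box
        cv2.putText(image, names[int(cls)], (x1, max(y1 - 10, 10)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)

    # cv2.imshow needs a desktop window; matplotlib renders inline in a notebook
    plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    plt.title("Detected Products")
    plt.axis("off")
    plt.show()

    return bboxes  # Return product locations

image_path = "Shavingrazor.jpg"
bboxes = detect_products(image_path)
```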


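Output:

```text
PRO TIP Replace 'model=yolov5s.pt' with new 'model=yolov5su.pt'.
YOLOv5 'u' models are trained with https://github.com/ultralytics/ultralytics and feature improved performance vs standard YOLOv5 models trained with https://github.com/ultralytics/yolov5.

0: 640x352 (no detections), 127.4ms
Speed: 3.1ms preprocess, 127.4ms inference, 0.8ms postprocess per image at shape (1, 3, 640, 352)
```

The detector found no products in this image. The stock `yolov5s.pt` weights are trained on the 80 COCO classes, which do not include shelf products such as razors, so detecting products reliably generally requires fine-tuning on labeled shelf images. Until then, the later steps may receive an empty set of boxes.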
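YOLO returns each box as floating-point `(x1, y1, x2, y2)` pixel corners, and the crop code in the next step converts them with `map(int, bbox)` before slicing the image. A minimal standard-library sketch of that conversion, with explicit clamping to the image bounds, follows; `to_crop_coords` is a hypothetical helper for illustration, not part of ultralytics or OpenCV.

```python
def to_crop_coords(bbox, width, height):
    """Convert a float (x1, y1, x2, y2) box into integer slice bounds inside a width x height image.

    Returns None when the clamped box has zero area, i.e. there is nothing to crop.
    """
    x1, y1, x2, y2 = (int(v) for v in bbox)  # truncate toward zero, like map(int, bbox)
    x1, x2 = max(0, min(x1, width)), max(0, min(x2, width))
    y1, y2 = max(0, min(y1, height)), max(0, min(y2, height))
    if x2 <= x1 or y2 <= y1:
        return None
    return x1, y1, x2, y2


# A box that spills past the left and right edges of a 640 x 480 image
print(to_crop_coords((-5.2, 10.7, 700.9, 50.1), 640, 480))  # (0, 10, 640, 50)
```

With a NumPy image the crop is then `image[y1:y2, x1:x2]`, because arrays are indexed rows (y) first.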


#### Step 3: Extract Attention Features Using EfficientNet-B7
Extract a Grad-CAM attention map for each detected product crop.
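Grad-CAM, which the `GradCAM` class in the next cell implements, scores how strongly each location in the target layer's feature maps supports a chosen class. Writing $A^k$ for channel $k$ of the `target_layer` output (spatial size $h \times w$), $y^c$ for the network's score for class $c$, and $Z = h \cdot w$:

$$
\alpha_k^{c} = \frac{1}{Z}\sum_{i=1}^{h}\sum_{j=1}^{w}\frac{\partial y^{c}}{\partial A^{k}_{ij}},
\qquad
L^{c} = \mathrm{ReLU}\!\left(\sum_{k}\alpha_k^{c}\,A^{k}\right),
\qquad
\hat{L}^{c} = \frac{L^{c} - \min L^{c}}{\max L^{c} - \min L^{c}}
$$

The code computes $\alpha_k^c$ as `gradients.mean(dim=[2, 3])`, resizes the map to $224 \times 224$ before the min-max normalization, and flattens it, so each product's embedding has $224 \times 224 = 50176$ values. For a $224 \times 224$ crop, the last EfficientNet-B7 block outputs 2560 channels at $7 \times 7$ (stride 32), so the heatmap is upsampled from $7 \times 7$.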

```python
import cv2
import numpy as np
import torch
import torchvision.models as models

efficientnet = models.efficientnet_b7(weights=models.EfficientNet_B7_Weights.DEFAULT)
efficientnet.eval()
target_layer = efficientnet.features[-1]  # Last conv block, used as the Grad-CAM target layer

# ImageNet statistics expected by the pretrained EfficientNet weights
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)
EMBED_SIZE = 224  # Heatmap side length; each embedding has EMBED_SIZE * EMBED_SIZE values

class GradCAM:
    def __init__(self, model, target_layer):
        self.model = model
        self.activations = None
        self.gradients = None
        # Keep the handle so the hook can be removed later with remove()
        self.handle = target_layer.register_forward_hook(self.save_activations)

    def save_activations(self, module, input, output):
        self.activations = output
        if output.requires_grad:
            # A tensor hook captures d(score)/d(activations); register_backward_hook is deprecated
            output.register_hook(self.save_gradients)

    def save_gradients(self, grad):
        self.gradients = grad

    def generate(self, image_tensor, class_idx=None):
        output = self.model(image_tensor)
        if class_idx is None:
            class_idx = output.argmax(dim=1).item()  # Explain the top predicted ImageNet class
        self.model.zero_grad()
        output[0, class_idx].backward()
        weights = self.gradients.mean(dim=[2, 3], keepdim=True)                # Per-channel importance
        cam = torch.relu((self.activations * weights).sum(dim=1)).squeeze(0)  # (h, w)
        cam = cv2.resize(cam.detach().cpu().numpy(), (EMBED_SIZE, EMBED_SIZE))
        return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # Epsilon avoids 0/0 on flat maps

    def remove(self):
        self.handle.remove()

def preprocess_product(image, bbox):
    x1, y1, x2, y2 = map(int, bbox)
    product = cv2.cvtColor(image[y1:y2, x1:x2], cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; the weights expect RGB
    product = cv2.resize(product, (224, 224)).astype(np.float32) / 255.0
    product = (product - IMAGENET_MEAN) / IMAGENET_STD
    return torch.from_numpy(product).permute(2, 0, 1).unsqueeze(0)  # (1, 3, 224, 224)

# Create GradCAM once: a new instance per call would stack another hook on target_layer each time
grad_cam = GradCAM(efficientnet, target_layer)

def extract_attention_embeddings(image_path, bboxes):
    image = cv2.imread(image_path)
    embeddings = []
    for bbox in bboxes:
        heatmap = grad_cam.generate(preprocess_product(image, bbox))
        embeddings.append(torch.from_numpy(heatmap).flatten().unsqueeze(0))  # (1, 224*224)
    if not embeddings:
        return torch.empty(0, EMBED_SIZE * EMBED_SIZE)  # No detections: torch.cat([]) would raise
    return torch.cat(embeddings, dim=0)  # (num_products, 224*224)

attention_embeddings = extract_attention_embeddings(image_path, bboxes)
print("Extracted attention embeddings shape:", attention_embeddings.shape)
```
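To check the weighting arithmetic by hand, here is a dependency-free sketch of the same combination step (channel weights from mean gradients, weighted sum, ReLU, min-max normalization) on a toy two-channel, 2 × 2 feature map. The gradients are supplied directly instead of being computed by backpropagation, and `gradcam_map` is a hypothetical helper for illustration only.

```python
def gradcam_map(activations, gradients):
    """Combine activations A[k][i][j] and gradients dy/dA[k][i][j] into a min-max normalized map."""
    h, w = len(activations[0]), len(activations[0][0])
    # alpha_k: spatial mean of the gradients for channel k
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted sum over channels, then ReLU
    cam = [[max(0.0, sum(a * act[i][j] for a, act in zip(alphas, activations)))
            for j in range(w)] for i in range(h)]
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = hi - lo
    # A flat map has no contrast; return zeros instead of dividing by zero
    return [[(v - lo) / span if span > 0 else 0.0 for v in row] for row in cam]


acts = [[[1, 2], [3, 4]],       # channel 0
        [[0, 1], [0, 1]]]       # channel 1
grads = [[[1, 1], [1, 1]],      # alpha_0 = 1
         [[-2, -2], [-2, -2]]]  # alpha_1 = -2
# Raw weighted map is [[1, 0], [3, 2]]; after normalization: [[1/3, 0], [1, 2/3]]
print(gradcam_map(acts, grads))
```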


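#### Step 4: Train EfficientNet-B7 for KPI Classification

Fine-tune EfficientNet-B7 on the product crops, fusing its image features with the attention embeddings. The training loop expects a `train_loader` that yields `(image_tensor, attention_embedding, kpi_label)` batches from a labeled dataset; this notebook does not define one.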
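In symbols, `KPIClassifier` maps a product crop $\mathbf{x}$ and its attention embedding $\mathbf{e} \in \mathbb{R}^{50176}$ (the flattened $224 \times 224$ heatmap) to five logits, which `CrossEntropyLoss` compares with an integer label $y$. Here $\phi$ denotes EfficientNet-B7 with its final linear layer replaced by a 2560 → 256 projection:

$$
\mathbf{f}_{\text{img}} = \phi(\mathbf{x}) \in \mathbb{R}^{256},
\qquad
\mathbf{f}_{\text{att}} = W_a \mathbf{e} + \mathbf{b}_a \in \mathbb{R}^{256}
$$

$$
\mathbf{z} = W_c \begin{bmatrix} \mathbf{f}_{\text{img}} \\ \mathbf{f}_{\text{att}} \end{bmatrix} + \mathbf{b}_c \in \mathbb{R}^{5},
\qquad
\mathcal{L} = -\log \frac{\exp(z_y)}{\sum_{k=1}^{5} \exp(z_k)}
$$

Because the softmax normalizes across all five outputs, each product is assigned exactly one class. The attention projection $W_a$ alone holds $256 \times 50176 = 12{,}845{,}056$ weights.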

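```python
import torch
import torchvision.models as models

NUM_CLASSES = 5       # 5 KPI classes: Visibility, Findability, etc.
ATTN_DIM = 224 * 224  # Flattened Grad-CAM heatmap from Step 3

class KPIClassifier(torch.nn.Module):
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.base_model = models.efficientnet_b7(weights=models.EfficientNet_B7_Weights.DEFAULT)
        # Replace the 1000-way ImageNet head with a 256-d image feature
        self.base_model.classifier[1] = torch.nn.Linear(self.base_model.classifier[1].in_features, 256)
        self.attention_fc = torch.nn.Linear(ATTN_DIM, 256)  # Project the heatmap to 256-d
        # CrossEntropyLoss treats the outputs as mutually exclusive classes (one label per product);
        # scoring each KPI independently would need a multi-label loss such as BCEWithLogitsLoss
        self.classifier = torch.nn.Linear(512, num_classes)

    def forward(self, image_tensor, attention_embedding):
        cnn_features = self.base_model(image_tensor)                     # (B, 256)
        attention_features = self.attention_fc(attention_embedding)      # (B, 256)
        combined = torch.cat((cnn_features, attention_features), dim=1)  # (B, 512)
        return self.classifier(combined)                                 # (B, num_classes) logits

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = KPIClassifier().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()  # Here: integer class labels of shape (B,)

def train_kpi_model(train_loader, model, optimizer, criterion, epochs=10):
    model.train()
    for epoch in range(epochs):
        running_loss, num_batches = 0.0, 0
        for image_tensor, attention_embedding, kpi_label in train_loader:
            image_tensor = image_tensor.to(device)
            attention_embedding = attention_embedding.to(device)
            kpi_label = kpi_label.to(device)

            optimizer.zero_grad()
            output = model(image_tensor, attention_embedding)
            loss = criterion(output, kpi_label)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            num_batches += 1

        # Report the mean over the epoch rather than the last batch only
        print(f"Epoch [{epoch+1}/{epochs}], Mean loss: {running_loss / max(num_batches, 1):.4f}")

# train_loader must be built from labeled shelf data; it is not defined in this notebook
train_kpi_model(train_loader, model, optimizer, criterion)
```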
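`train_kpi_model` only needs an iterable of `(image_tensor, attention_embedding, kpi_label)` batches. To smoke-test the loop's shapes before real labels exist, a placeholder loader can be built with PyTorch's `TensorDataset` and `DataLoader`. The tensors below are random stand-ins, not shelf data, and the class labels are hypothetical; a model trained on them learns nothing useful.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

NUM_CLASSES = 5
NUM_SAMPLES = 8  # Placeholder size; replace with real labeled product crops

# Random stand-ins with the shapes KPIClassifier expects (NOT real data)
images = torch.randn(NUM_SAMPLES, 3, 224, 224)          # Normalized product crops
attention = torch.rand(NUM_SAMPLES, 224 * 224)          # Flattened heatmaps in [0, 1)
labels = torch.randint(0, NUM_CLASSES, (NUM_SAMPLES,))  # One integer class per product

# TensorDataset yields (image, attention, label) in the order the training loop unpacks them
train_loader = DataLoader(TensorDataset(images, attention, labels), batch_size=4, shuffle=True)

image_tensor, attention_embedding, kpi_label = next(iter(train_loader))
print(image_tensor.shape, attention_embedding.shape, kpi_label.shape)
```

With real data, the same `DataLoader` call applies once `images`, `attention`, and `labels` come from annotated shelf images; `train_kpi_model(train_loader, model, optimizer, criterion)` can then run unchanged.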


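#### Step 5: Predict KPIs for New Shelf Images

```python
def predict_kpis(image_path, model):
    image = cv2.imread(image_path)
    bboxes = detect_products(image_path)
    if len(bboxes) == 0:
        return np.array([], dtype=np.int64)  # No products detected, nothing to classify

    # Grad-CAM needs gradients, so build the attention embeddings outside torch.no_grad()
    attention_embedding = extract_attention_embeddings(image_path, bboxes).to(device)
    # One preprocessed crop per detected product, aligned row-for-row with the embeddings
    image_tensor = torch.cat([preprocess_product(image, bbox) for bbox in bboxes]).to(device)

    model.eval()
    with torch.no_grad():
        output = model(image_tensor, attention_embedding)  # (num_products, num_classes)
        predictions = torch.argmax(output, dim=1)          # One predicted class per product

    return predictions.cpu().numpy()

image_path = "new_shelf.jpg"
print(f"Predicted KPIs: {predict_kpis(image_path, model)}")
```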
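`predict_kpis` returns one class index per detected product. Rolling those up into a shelf-level share per class could look like the sketch below. `summarize_predictions` is a hypothetical helper, and because the notebook names the classes only as "Visibility, Findability, etc.", the result is keyed by class index rather than by KPI name.

```python
from collections import Counter


def summarize_predictions(predictions, num_classes=5):
    """Return the share of detected products assigned to each class index."""
    counts = Counter(int(p) for p in predictions)  # int() also accepts NumPy integer types
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in range(num_classes)}
    return {k: counts.get(k, 0) / total for k in range(num_classes)}


print(summarize_predictions([0, 2, 2, 4]))  # {0: 0.25, 1: 0.0, 2: 0.5, 3: 0.0, 4: 0.25}
```

Passing the NumPy array from `predict_kpis` directly works as well, since each element is converted with `int()`.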


## Business Impact

✅ Retail Shelf Analysis → Measure how well a product is placed.

✅ Consumer Behavior Tracking → Time to find and purchase probability.

✅ Planogram Compliance → Ensuring correct product arrangement.

Possible next steps:

✅ Deploy the pipeline as an API (FastAPI or Flask).

✅ Visualize KPI trends in Power BI.

## Deploying Retail Shelf KPI Detection as an API with Flask

#### Step 1: Install Dependencies

!pip install flask torch torchvision ultralytics opencv-python numpy

#### Step 2: Create a Flask App
Create a new file, `app.py`, with the following code:

```python
import threading

import cv2
import numpy as np
import torch
import torchvision.models as models
from flask import Flask, request, jsonify
from ultralytics import YOLO

app = Flask(__name__)

# Load YOLOv5 for product detection (once, at startup)
yolo_model = YOLO("yolov5s.pt")

# Load EfficientNet-B7, used here as the Grad-CAM backbone
efficientnet = models.efficientnet_b7(weights=models.EfficientNet_B7_Weights.DEFAULT)
efficientnet.eval()
target_layer = efficientnet.features[-1]

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)
EMBED_SIZE = 224

class GradCAM:
    def __init__(self, model, target_layer):
        self.model = model
        self.activations = None
        self.gradients = None
        self.handle = target_layer.register_forward_hook(self.save_activations)

    def save_activations(self, module, input, output):
        self.activations = output
        if output.requires_grad:
            output.register_hook(self.save_gradients)  # Capture gradients w.r.t. the activations

    def save_gradients(self, grad):
        self.gradients = grad

    def generate(self, image_tensor, class_idx=None):
        output = self.model(image_tensor)
        if class_idx is None:
            class_idx = output.argmax(dim=1).item()  # Explain the top predicted ImageNet class
        self.model.zero_grad()
        output[0, class_idx].backward()
        weights = self.gradients.mean(dim=[2, 3], keepdim=True)
        cam = torch.relu((self.activations * weights).sum(dim=1)).squeeze(0)
        cam = cv2.resize(cam.detach().cpu().numpy(), (EMBED_SIZE, EMBED_SIZE))
        return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

# Register the hook once; creating a GradCAM per request would stack a new hook on every call
grad_cam = GradCAM(efficientnet, target_layer)
# The shared models and GradCAM state are not safe for concurrent use, so serialize inference
inference_lock = threading.Lock()

def detect_products(image):
    results = yolo_model(image)
    return results[0].boxes.xyxy.cpu().numpy()

def preprocess_product(image, bbox):
    x1, y1, x2, y2 = map(int, bbox)
    product = cv2.cvtColor(image[y1:y2, x1:x2], cv2.COLOR_BGR2RGB)  # OpenCV decodes to BGR
    product = cv2.resize(product, (224, 224)).astype(np.float32) / 255.0
    product = (product - IMAGENET_MEAN) / IMAGENET_STD  # ImageNet normalization
    return torch.from_numpy(product).permute(2, 0, 1).unsqueeze(0)

def extract_attention_embeddings(image, bboxes):
    embeddings = [
        torch.from_numpy(grad_cam.generate(preprocess_product(image, bbox))).flatten().unsqueeze(0)
        for bbox in bboxes
    ]
    if not embeddings:
        return torch.empty(0, EMBED_SIZE * EMBED_SIZE)  # No detections: torch.cat([]) would raise
    return torch.cat(embeddings, dim=0)

@app.route("/predict", methods=["POST"])
def predict_kpi():
    if "image" not in request.files:
        return jsonify({"error": "No image uploaded"}), 400

    file = request.files["image"]
    image = cv2.imdecode(np.frombuffer(file.read(), np.uint8), cv2.IMREAD_COLOR)
    if image is None:
        return jsonify({"error": "Could not decode image"}), 400

    with inference_lock:
        bboxes = detect_products(image)
        attention_embedding = extract_attention_embeddings(image, bboxes)

    # Returns detections and the embedding size; the trained KPIClassifier is not loaded in this app
    response = {
        "bboxes": bboxes.tolist(),
        "attention_embedding_shape": attention_embedding.shape[1],
    }
    return jsonify(response)

if __name__ == "__main__":
    # debug=True starts the reloader (loading both models twice) and the interactive debugger,
    # which must never be exposed; keep it off outside local development
    app.run(host="127.0.0.1", port=5000, debug=False)
```


#### Step 3: Run the Flask API

```bash
python app.py
```

Test with Postman or cURL:

```bash
curl -X POST -F "image=@shelf.jpg" http://127.0.0.1:5000/predict
```
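The same request can be sent from Python without extra packages. A minimal sketch using only the standard library (`urllib.request` plus a hand-built `multipart/form-data` body) follows; `build_multipart` and `post_image` are hypothetical helpers, and the URL and `image` field name mirror the cURL command.

```python
import json
import mimetypes
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, data):
    """Encode one file upload as a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {mime}\r\n\r\n".encode(),
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return body, f"multipart/form-data; boundary={boundary}"


def post_image(url, path, field_name="image"):
    """POST an image file like `curl -F "image=@shelf.jpg"` and return the parsed JSON reply."""
    with open(path, "rb") as f:
        body, content_type = build_multipart(field_name, os.path.basename(path), f.read())
    req = urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")
    # urlopen raises urllib.error.HTTPError for 4xx/5xx replies, such as the API's 400 errors
    with urllib.request.urlopen(req) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `post_image("http://127.0.0.1:5000/predict", "shelf.jpg")` returns the same JSON object as the cURL call.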

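Expected output (format only; actual boxes depend on the image, and coordinates come back as floats such as 100.0):

```json
{
    "bboxes": [[100, 200, 250, 350], [300, 100, 450, 250]],
    "attention_embedding_shape": 50176
}
```

"bboxes" → Detected product bounding boxes as [x1, y1, x2, y2] pixel corners.

"attention_embedding_shape" → Length of each product's attention embedding: the flattened 224 × 224 Grad-CAM heatmap, so 224 × 224 = 50176 values.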