# Gun Identification Using Gesture Matching

This notebook demonstrates how to detect and identify guns appearing in a live video feed, using a custom-trained deep learning model.

To get started, you must have the following requirements installed:

1. `tensorflow` - to load the custom gun detection model
2. `numpy` - to perform numerical operations and array manipulations
3. `opencv-python` - to capture video from the camera and perform image processing operations
4. `notebook` - to run this notebook in a Jupyter server

It is recommended to use a virtual environment to isolate these requirements from the rest of your system. This can be done using Python's built-in `venv` module.

First, open a Terminal (Command Prompt on Windows) in the same folder as this notebook and create a virtual environment:

```shell
python3 -m venv env
```

Next, activate the virtual environment. For Windows users:

```shell
.\env\Scripts\activate
```

For Linux and macOS users:

```shell
source env/bin/activate
```

Now you can safely install the above requirements in your virtual environment:

```shell
pip install tensorflow numpy opencv-python notebook
```

After installation is complete, launch the Jupyter server to edit this notebook:

```shell
jupyter notebook
```

## Getting Started

The following imports are necessary to work with this notebook:

In [10]:
# Imports

import cv2
import numpy as np
from tensorflow import keras
from keras.models import load_model
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input

We will be using `opencv-python` to read a live video input from the computer's webcam and perform inference on each frame of the video input, using the custom gun detection model we trained <a href="https://github.com/ravi-aratchige/IPCV/blob/main/6.%20Gun%20Detection/Gun%20Detection%20with%20VGG16.ipynb">previously</a>. The predictions made by this model will be shown in real-time on the webcam's video feed as well.

## Prerequisites

Before we start the webcam to detect and identify guns, we need to do the following.

First, we need to load the saved model from the <a href="https://github.com/ravi-aratchige/IPCV/blob/main/6.%20Gun%20Detection/Gun%20Detection%20with%20VGG16.ipynb">previous notebook</a>.

In [12]:
# Load the saved gun detection model
model = load_model("model.h5")

print(model)

<keras.src.engine.sequential.Sequential object at 0x7cef260b5390>


Next, we need to define a **post-processing utility function** to do the following:

1. Convert the model's prediction into a **binary mask** using a threshold (in this case, 0.3) - values greater than the threshold will be converted into 1s and all others into 0s.
2. Ensure that the binary mask is a **NumPy array** (of type `uint8`).
3. Apply **erosion** to remove small false positives (noise). Erosion removes noise from the mask by shrinking the white regions (1s) i.e. the foreground. This must be done iteratively.
4. Apply **dilation** to grow the foreground regions that survived erosion back toward their original size, while the noise removed by erosion stays removed. This must also be done iteratively.

This function returns the processed mask. If the binary mask is empty (i.e. contains only zeros), it is returned unchanged.

In [13]:
def apply_post_processing(prediction, erosion_kernel):
    # Convert the prediction to a binary mask using a threshold
    binary_mask = (prediction > 0.3).astype(np.uint8)

    # Check if the mask is not empty
    if np.any(binary_mask):
        # Ensure that the binary mask is a NumPy array
        binary_mask = np.asarray(binary_mask, dtype=np.uint8)

        # Apply erosion to remove small false positives
        eroded_mask = cv2.erode(binary_mask, erosion_kernel, iterations=8)

        # Apply dilation to restore the size of the image
        post_processed_mask = cv2.dilate(eroded_mask, erosion_kernel, iterations=5)

        return post_processed_mask
    else:
        return binary_mask
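
To make the effect of erosion followed by dilation concrete, here is a minimal NumPy-only sketch on a toy mask. It is an illustration under stated assumptions, not the notebook's implementation: the helper names `erode` and `dilate` are hypothetical, and it uses a 3x3 kernel of ones with a single iteration, whereas the function above calls `cv2.erode` and `cv2.dilate` with a 5x5 kernel and several iterations.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def erode(mask, k=3):
    """Binary erosion, k x k kernel of ones: a pixel stays 1 only if its whole neighborhood is 1."""
    pad = k // 2
    # Pad with 1s so that the image border itself does not eat into the foreground
    padded = np.pad(mask, pad, constant_values=1)
    return sliding_window_view(padded, (k, k)).min(axis=(2, 3))


def dilate(mask, k=3):
    """Binary dilation, k x k kernel of ones: a pixel becomes 1 if any neighbor is 1."""
    pad = k // 2
    # Pad with 0s so that the border does not add foreground
    padded = np.pad(mask, pad, constant_values=0)
    return sliding_window_view(padded, (k, k)).max(axis=(2, 3))


# Toy 7x7 mask: a 4x4 foreground block plus one isolated noisy pixel
mask = np.zeros((7, 7), dtype=np.uint8)
mask[1:5, 1:5] = 1
mask[6, 6] = 1  # noise

# Erode first (removes the noise, shrinks the block), then dilate (regrows the block)
opened = dilate(erode(mask))

print(mask)
print(opened)
```

In the printed result, the isolated pixel has disappeared while the 4x4 block is back at its original size. Erosion followed by dilation is commonly called morphological opening.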

## Working with the Webcam

### VideoCapture Configuration

We can now configure and create the VideoCapture object for reading and manipulating the webcam's input.

In [14]:
# Create a VideoCapture object
cap = cv2.VideoCapture(0)

# Set the width of the video capture
cap.set(3, 640)
# Set the height of the video capture
cap.set(4, 480)

True

Here, `cap.set()` has the following parameters:

1. `propId` - an integer value assigned to a property of the `VideoCapture` object. `3` is the `propId` for width, while `4` is the `propId` for height.
2. The value assigned to these properties (i.e. number of pixels along each axis of the `VideoCapture` object).

### Binary Erosion Kernel

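Next, we initialize the kernel for performing erosion on the binary mask. For this scenario, we will use a 5x5 kernel of ones. As it slides across the binary mask, erosion keeps a pixel as 1 only if every pixel under the kernel is 1 (i.e. it takes the minimum over the neighborhood, not an average). The same kernel is reused for dilation.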

In [16]:
# Initialize the binary erosion kernel
erosion_kernel = np.ones((5, 5), np.uint8)

print(erosion_kernel)

[[1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]]


### Temporal Smoothing

When working with video input that fluctuates over time, it is important to perform **temporal smoothing**. This is the process of averaging predictions made over a period of time, which helps **reduce short-term fluctuations and noise**, thereby providing a more stable and reliable output.

For this, we need to initialize variables to:

1. Store a history of the most recent predictions made.
2. Specify how many predictions will be considered for temporal smoothing (i.e. the maximum number of predictions that will be stored in the predictions history at any moment).

For this scenario, we will use the 5 most recent predictions made.

In [17]:
# Initialize empty list for storing prediction history
prediction_history = []

# Initialize maximum number of predictions considered for temporal smoothing
history_length_config = 5
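
As a rough illustration of how a moving average damps a single-frame spike, the sketch below runs a window of 5 over a made-up sequence of scores. The scores are hypothetical, and `collections.deque` with `maxlen` is used here as an alternative to trimming a list by hand; it drops the oldest entry automatically once the window is full.

```python
from collections import deque

history_length = 5  # same window size as used in this notebook
history = deque(maxlen=history_length)

# Hypothetical per-frame scores with one spurious spike
raw_scores = [0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1]

smoothed = []
for score in raw_scores:
    history.append(score)  # oldest score is discarded once 5 are stored
    smoothed.append(sum(history) / len(history))  # simple moving average

for raw, avg in zip(raw_scores, smoothed):
    print(f"raw={raw:.2f}  smoothed={avg:.2f}")
```

The spike of 0.9 never appears at full strength in the smoothed sequence, at the cost of the output reacting a few frames later to genuine changes.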

### Main Loop

Finally, we can create the main loop to start and work with the webcam input.

In [None]:
while True:
    # Capture a frame from the video feed
    ret, frame = cap.read()

    # Stop if a frame could not be read from the webcam
    if not ret:
        print("Error: Could not read frame")
        break

    # Preprocess the frame as expected by the VGG16 model
    processed_frame = cv2.resize(frame, (224, 224))
    img_array = image.img_to_array(processed_frame)
    img_array = np.expand_dims(img_array, axis=0)
    img_data = preprocess_input(img_array)

    # Perform inference using the model and retrieve the prediction made
    prediction = model.predict(img_data)[0][0]

    # Append the current prediction to the predictions history
    prediction_history.append(prediction)

    # Remove any old predictions if the predictions history has exceeded the maximum number of predictions
    if len(prediction_history) > history_length_config:
        prediction_history = prediction_history[1:]

    # Calculate the smoothed prediction using a simple moving average
    smoothed_prediction = np.mean(prediction_history)

    # Apply post-processing to the smoothed prediction
    post_processed_prediction = apply_post_processing(
        smoothed_prediction, erosion_kernel
    )

    # Initialize label to be displayed based on the post-processed prediction
    prediction_label = (
        "Gun Detected" if post_processed_prediction > 0.5 else "No Gun Detected"
    )

    # Display the frame with the post-processed prediction
    cv2.putText(
        frame,
        f"Prediction: {prediction_label}",
        (10, 30),
        cv2.FONT_HERSHEY_SIMPLEX,
        0.8,
        (255, 255, 255),
        2,
    )
    cv2.imshow("Gun Detection", frame)
    # cv2.waitKey(0)

    # Break the loop when the 'q' key is pressed
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

# Release the VideoCapture object and close all windows
cap.release()
cv2.destroyAllWindows()
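
One detail worth checking in the preprocessing step is channel order. OpenCV delivers frames in BGR order, while the VGG16 `preprocess_input` function expects RGB input. Whether a conversion is needed depends on how the training images were loaded, which this notebook does not show. Assuming they were loaded as RGB, the frame could be converted before resizing. The helper below is a hypothetical NumPy-only sketch, equivalent in effect to `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)`.

```python
import numpy as np


def bgr_to_rgb(frame):
    """Reverse the channel axis of an H x W x 3 frame (BGR -> RGB)."""
    # ascontiguousarray returns a regular array instead of a reversed view
    return np.ascontiguousarray(frame[..., ::-1])


# Synthetic 2x2 "frame" that is pure blue in OpenCV's BGR layout
frame_bgr = np.zeros((2, 2, 3), dtype=np.uint8)
frame_bgr[..., 0] = 255  # channel 0 is blue in BGR

frame_rgb = bgr_to_rgb(frame_bgr)
print(frame_rgb[0, 0])  # blue now sits in the last channel
```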