# Detect Terrorists using YOLOv8 Model

+ In this project, we are going to cover:
  - Before you start
  - Image dataset preparation
  - Autolabel dataset
  - Train target model
  - Evaluate target model

# 🔥 Let's begin!

# ⚡ Before you start
Let's make sure that we have access to a GPU. We can use the `nvidia-smi` command to do that. In case of any problems, navigate to Edit -> Notebook settings -> Hardware accelerator, set it to GPU, and then click Save.

In [None]:
# In Colab: Runtime -> Change runtime type -> select GPU
# then check which GPU was assigned:
# !nvidia-smi # e.g. Tesla T4
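If you prefer to check from Python, the sketch below looks for the `nvidia-smi` binary with `shutil.which` and asks it for the GPU names via `--query-gpu=name --format=csv,noheader`. The helper name `list_gpus` is hypothetical; an empty list most likely means the runtime has no GPU attached.

```python
import shutil
import subprocess


def list_gpus() -> list[str]:
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        # nvidia-smi is not on PATH: most likely no NVIDIA driver / GPU runtime
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    # one GPU name per line, e.g. "Tesla T4"
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
print("GPUs:", gpus if gpus else "none found; switch the runtime to GPU")
```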

In [None]:
# !pip install ultralytics

In [None]:
# from ultralytics import YOLO # YOLO model class for detection, classification or segmentation
# from IPython.display import display, Image # to display images

In [None]:
# !yolo task=detect mode=predict  model=yolov8n.pt  conf=0.25 source='https://media.roboflow.com/notebooks/...
# task=    detect, classify or segment
# mode=    predict, train or val
# model=   yolov8n.pt (nano), yolov8s.pt (small), yolov8m.pt (medium), yolov8l.pt (large) or yolov8x.pt (xlarge)
# conf=    minimum confidence score; detections with confidence below 0.25 are discarded
# source=  path or URL of the input image (or video)
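`conf` is a confidence threshold, not a box size: each detection carries a score, and only detections scoring above the threshold are kept. Below is a plain-Python illustration of that filtering step, with made-up detections and a hypothetical `filter_by_confidence` helper. Ultralytics applies the threshold internally, together with non-maximum suppression, so this is not its actual code.

```python
# Made-up detections: (class_name, confidence, (x1, y1, x2, y2)) in pixels
detections = [
    ("person", 0.91, (34, 50, 120, 300)),
    ("person", 0.18, (400, 60, 460, 210)),
    ("bus", 0.47, (10, 20, 600, 400)),
]


def filter_by_confidence(dets, conf=0.25):
    """Keep only detections whose confidence is above the threshold."""
    return [d for d in dets if d[1] > conf]


kept = filter_by_confidence(detections, conf=0.25)
for name, score, box in kept:
    print(f"{name:>6} {score:.2f} {box}")
```

Raising `conf` trades recall for precision: you get fewer detections, but the ones that remain are more certain.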

In [None]:
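# display the result image (replace 'source' with the path of the saved prediction,
# by default under runs/detect/predict/)
# Image(filename='source', height=500)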

+ Classes
  <!-- - Military -->
  <!-- - Civilian -->
  - Armed military: already labeled
  - Armed civilian: unlabeled
  <!-- - Tank -->

So we need to generate labels for the `Armed civilian` dataset using the Autodistill framework.

🧪 Install Autodistill

In [None]:
!pip install -q \
autodistill \
autodistill-grounded-sam \
autodistill-yolov8 \
supervision==0.9.0

Get the home directory

In [None]:
import os
HOME = os.getcwd() # current working directory
print(HOME)

/content


# 🖼️ Image dataset preparation

**NOTE:** To use Autodistill, all you need is a set of images that you want to annotate automatically and then use to train the target model.

In [None]:
# !rm -r {HOME}/images
# -p: do not fail if the directory already exists
!mkdir -p {HOME}/images
IMAGE_DIR_PATH = f"{HOME}/images"


**NOTE:** If you want to train YOLOv8 on your own data, make sure to upload it into the `images` directory that we just created. ☝️

# Get frames (images) from videos (optional)
### Download raw videos

**NOTE:** In this tutorial, we will start with a directory containing video files and I will show you how to turn it into a ready-to-use collection of images. If you are working with your own images, you can skip this part.

In [None]:
# !mkdir {HOME}/videos
# %cd {HOME}/videos

# # download zip file containing videos
# !wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1wnW7v6UTJZTAcOQj0416ZbQF8b7yO6Pt' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1wnW7v6UTJZTAcOQj0416ZbQF8b7yO6Pt" -O milk.zip && rm -rf /tmp/cookies.txt

# # unzip videos
# !unzip milk.zip

### Convert videos into images (optional)

**NOTE:** Now, let's convert videos into images. By default, the code below saves every `10th` frame from each video. You can change this by changing the value of the `FRAME_STRIDE` parameter.

In [None]:
# VIDEO_DIR_PATH = f"{HOME}/videos"
# IMAGE_DIR_PATH = f"{HOME}/images"
# FRAME_STRIDE = 10 # save one image every 10 frames of the video
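To see what `FRAME_STRIDE` does, the sketch below lists which frame indices would be saved from a video of a given length, and the file names they would get under the `video_name + "-{:05d}.png"` pattern. It assumes frames 0, 10, 20, ... are kept and that images are numbered with a running counter. The helper `sampled_frames` and the 95-frame clip are hypothetical.

```python
FRAME_STRIDE = 10


def sampled_frames(total_frames: int, stride: int, video_name: str):
    """Return (frame_index, image_name) pairs for every `stride`-th frame.

    Assumes frames 0, stride, 2*stride, ... are kept and that saved images
    are numbered with a running counter, matching "-{:05d}.png".
    """
    pattern = video_name + "-{:05d}.png"
    return [
        (frame_index, pattern.format(counter))
        for counter, frame_index in enumerate(range(0, total_frames, stride))
    ]


frames = sampled_frames(total_frames=95, stride=FRAME_STRIDE, video_name="clip")
# a 95-frame clip yields ceil(95 / 10) images
print(len(frames), frames[:2], frames[-1])
```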

**NOTE:** Notice that we set two of our videos aside so that we can use them at the end of the notebook to evaluate our model (optional).

In [None]:
# import supervision as sv
# from tqdm.notebook import tqdm

# video_paths = sv.list_files_with_extensions(
#     directory=VIDEO_DIR_PATH,
#     extensions=["mov", "mp4"])

# TEST_VIDEO_PATHS, TRAIN_VIDEO_PATHS = video_paths[:2], video_paths[2:]

# for video_path in tqdm(TRAIN_VIDEO_PATHS):
#     video_name = video_path.stem
#     image_name_pattern = video_name + "-{:05d}.png"
#     with sv.ImageSink(target_dir_path=IMAGE_DIR_PATH, image_name_pattern=image_name_pattern) as sink:
#         for image in sv.get_video_frames_generator(source_path=str(video_path), stride=FRAME_STRIDE):
#             sink.save_image(image=image)
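The split above takes the first two listed videos as the test set. If the listing order is not guaranteed, sorting the paths first makes the split reproducible across runs. Here is a standard-library sketch of that idea, with a hypothetical `split_videos` helper and dummy files.

```python
import tempfile
from pathlib import Path


def split_videos(video_dir, n_test=2, extensions=("mov", "mp4")):
    """Return (test_paths, train_paths): the first `n_test` videos by sorted name are held out."""
    suffixes = {"." + ext.lower() for ext in extensions}
    paths = sorted(
        p for p in Path(video_dir).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes  # case-insensitive extension match
    )
    return paths[:n_test], paths[n_test:]


# demo on dummy files; in the notebook you would pass VIDEO_DIR_PATH
with tempfile.TemporaryDirectory() as tmp:
    for name in ["c.mp4", "a.mov", "b.MP4", "notes.txt"]:
        (Path(tmp) / name).touch()
    test_paths, train_paths = split_videos(tmp)
    test_names = [p.name for p in test_paths]
    train_names = [p.name for p in train_paths]

print("test:", test_names, "train:", train_names)
```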

### Display image sample

**NOTE:** Before we start building a model with autodistill, let's make sure we have everything we need.

In [None]:
import supervision as sv

# check the number of images
image_paths = sv.list_files_with_extensions(
    directory=IMAGE_DIR_PATH,
    extensions=["png", "jpg", "jpeg"])

print('image count:', len(image_paths))

image count: 1036
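Note that the labeling step later uses `extension=".jpg"`, so PNG files are counted here but would not be annotated. Assuming the match is a case-sensitive suffix match, `.JPG` files would be skipped as well. The following quick standard-library check shows the extensions actually present. It uses a hypothetical `count_by_extension` helper on dummy files; in the notebook you would pass `IMAGE_DIR_PATH`.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_by_extension(directory):
    """Count files in `directory` by extension, keeping the original case ('.jpg' vs '.JPG')."""
    return Counter(p.suffix for p in Path(directory).iterdir() if p.is_file())


# demo on dummy files; in the notebook: count_by_extension(IMAGE_DIR_PATH)
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.jpg", "b.JPG", "c.png", "d.jpeg"]:
        (Path(tmp) / name).touch()
    counts = count_by_extension(tmp)

print(counts)
# files an exact ".jpg" suffix match would pick up
print("matched by '.jpg':", counts[".jpg"], "of", sum(counts.values()))
```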


**NOTE:** We can also plot a sample of our image dataset.

In [None]:
import cv2
import supervision as sv

SAMPLE_SIZE = 16
SAMPLE_GRID_SIZE = (4, 4)
SAMPLE_PLOT_SIZE = (16, 16)

# names of the sampled images (file name without extension)
titles = [
    image_path.stem
    for image_path
    in image_paths[:SAMPLE_SIZE] # first 16 images
    ]

# sampled images, loaded with OpenCV (BGR arrays)
images = [
    cv2.imread(str(image_path))
    for image_path
    in image_paths[:SAMPLE_SIZE]]

sv.plot_images_grid(images=images, titles=titles, grid_size=SAMPLE_GRID_SIZE, size=SAMPLE_PLOT_SIZE)

## 🏷️ Autolabel dataset
### Define ontology

**Ontology** - an Ontology defines how your Base Model is prompted, what your Dataset will describe, and what your Target Model will predict. A simple Ontology is the CaptionOntology which prompts a Base Model with text captions and maps them to class names. Other Ontologies may, for instance, use a CLIP vector or example images instead of a text caption.

In [None]:
from autodistill.detection import CaptionOntology

# ontology for the military dataset (caption prompt -> class name)
# ontology=CaptionOntology({
#     "military person": "military",
#     "military weapon": "weapon"
# })

# ontology for the civilian dataset (caption prompt -> class name)
ontology=CaptionOntology({
    "civilian person": "civilian",
    "civilian weapon": "weapon"
})
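Conceptually, the ontology is a mapping from the caption sent to the base model to the class name stored in the dataset. The plain-dict illustration below is not the autodistill API. It shows that mapping and assumes class ids are assigned in insertion order, so `civilian` would be class 0 and `weapon` class 1.

```python
# Plain-dict illustration (not the autodistill API): keys are the captions
# sent to the base model, values are the class names written to the dataset.
caption_to_class = {
    "civilian person": "civilian",
    "civilian weapon": "weapon",
}

prompts = list(caption_to_class.keys())
classes = list(caption_to_class.values())
# assumption: class ids follow insertion order
class_ids = {name: idx for idx, name in enumerate(classes)}


def class_for_prompt(prompt: str) -> tuple[int, str]:
    """Map a caption prompt to its (class_id, class_name)."""
    name = caption_to_class[prompt]
    return class_ids[name], name


print(prompts)
print(class_ids)
print(class_for_prompt("civilian weapon"))
```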

### Initialize base model and autolabel

**Base Model** - A Base Model is a large foundation model that knows a lot about a lot. Base models are often multimodal and can perform many tasks. They're large, slow, and expensive. Examples of Base Models are GroundedSAM and GPT-4's upcoming multimodal variant. We use a Base Model (along with unlabeled input data and an Ontology) to create a Dataset.

In [None]:
# start from an empty dataset directory (-f: no error if it does not exist yet)
!rm -rf {HOME}/dataset
!mkdir {HOME}/dataset
DATASET_DIR_PATH = f"{HOME}/dataset"

**NOTE:** Base Models are slow... Make yourself a coffee, autolabeling may take a while. ☕

In [None]:
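# clear the autodistill cache (e.g. after a corrupted weight download);
# each `!` command runs in its own shell, so `!cd` + `!rm -r *` would delete the current directory instead
# !rm -rf ~/.cache/autodistill/*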

# !pip install --upgrade autodistill_grounded_sam

In [None]:
# !pip uninstall -y torch
# !pip install torch

In [None]:
from autodistill_grounded_sam import GroundedSAM

base_model = GroundedSAM(ontology=ontology)
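# label every .jpg image in IMAGE_DIR_PATH and write a YOLO-format dataset to DATASET_DIR_PATH
# (images with other extensions, e.g. .png, are not picked up)
dataset = base_model.label(
    input_folder=IMAGE_DIR_PATH,
    extension=".jpg",
    output_folder=DATASET_DIR_PATH)

# END ANNOTATIONS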

trying to load grounding dino directly
downloading dino model weights


final text_encoder_type: bert-base-uncased


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


KeyboardInterrupt: ignored

### Display dataset sample

**Dataset** - a Dataset is a set of auto-labeled data that can be used to train a Target Model. It is the output generated by a Base Model.

In [None]:
ANNOTATIONS_DIRECTORY_PATH = f"{HOME}/dataset/train/labels"
IMAGES_DIRECTORY_PATH = f"{HOME}/dataset/train/images"
DATA_YAML_PATH = f"{HOME}/dataset/data.yaml"
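Each image in `IMAGES_DIRECTORY_PATH` has a matching `.txt` file in `ANNOTATIONS_DIRECTORY_PATH`, with one object per line in YOLO format. Each line holds a class id followed by normalized coordinates, either a box (`xc yc w h`) or, for segmentation labels, a polygon (`x1 y1 x2 y2 ...`). Since GroundedSAM produces masks, we assume polygon lines here. The sketch below is a parser that handles both forms and converts them to a pixel bounding box; `parse_yolo_line` is hypothetical.

```python
def parse_yolo_line(line: str, img_w: int, img_h: int):
    """Parse one YOLO label line into (class_id, (x1, y1, x2, y2)) in pixels.

    Handles both the box format "class xc yc w h" and the polygon
    (segmentation) format "class x1 y1 x2 y2 ...", values normalized to [0, 1].
    """
    parts = line.split()
    class_id = int(parts[0])
    values = [float(v) for v in parts[1:]]
    if len(values) == 4:
        # box: center, width and height
        xc, yc, w, h = values
        x1, y1 = (xc - w / 2) * img_w, (yc - h / 2) * img_h
        x2, y2 = (xc + w / 2) * img_w, (yc + h / 2) * img_h
    elif len(values) >= 6 and len(values) % 2 == 0:
        # polygon: take the bounding box of all vertices
        xs = [x * img_w for x in values[0::2]]
        ys = [y * img_h for y in values[1::2]]
        x1, y1, x2, y2 = min(xs), min(ys), max(xs), max(ys)
    else:
        raise ValueError(f"unexpected YOLO label line: {line!r}")
    return class_id, (x1, y1, x2, y2)


print(parse_yolo_line("0 0.5 0.5 0.25 0.5", img_w=640, img_h=480))
print(parse_yolo_line("1 0.1 0.2 0.3 0.2 0.3 0.4 0.1 0.4", img_w=640, img_h=480))
```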

In [None]:
import supervision as sv

dataset = sv.DetectionDataset.from_yolo(
    images_directory_path=IMAGES_DIRECTORY_PATH,
    annotations_directory_path=ANNOTATIONS_DIRECTORY_PATH,
    data_yaml_path=DATA_YAML_PATH)

len(dataset)
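`len(dataset)` gives the number of images. It is also worth checking how many objects each class received, since autolabeled datasets are often imbalanced. Below is a standard-library sketch that counts class ids across the label files, demonstrated on dummy files. In the notebook you would call it with `ANNOTATIONS_DIRECTORY_PATH` (the train split) and `dataset.classes`. The helper name is hypothetical.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_objects_per_class(labels_dir, class_names: list[str]) -> dict[str, int]:
    """Count labeled objects per class across all YOLO .txt files in a directory."""
    counts = Counter()
    for label_file in Path(labels_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            if line.strip():
                # the first token of each line is the class id
                counts[int(line.split()[0])] += 1
    return {name: counts[idx] for idx, name in enumerate(class_names)}


# demo on dummy label files
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "img1.txt").write_text("0 0.5 0.5 0.2 0.4\n1 0.3 0.3 0.1 0.1\n")
    (Path(tmp) / "img2.txt").write_text("0 0.6 0.4 0.2 0.3\n")
    (Path(tmp) / "img3.txt").write_text("")  # image with no detections
    per_class = count_objects_per_class(tmp, ["civilian", "weapon"])

print(per_class)
```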

In [None]:
import supervision as sv

image_names = list(dataset.images.keys())[:SAMPLE_SIZE]

mask_annotator = sv.MaskAnnotator()
box_annotator = sv.BoxAnnotator()

images = []
for image_name in image_names:
    image = dataset.images[image_name]
    annotations = dataset.annotations[image_name]
    labels = [
        dataset.classes[class_id]
        for class_id
        in annotations.class_id]
    annotated_image = mask_annotator.annotate(
        scene=image.copy(),
        detections=annotations)
    annotated_image = box_annotator.annotate(
        scene=annotated_image,
        detections=annotations,
        labels=labels)
    images.append(annotated_image)

sv.plot_images_grid(
    images=images,
    titles=image_names,
    grid_size=SAMPLE_GRID_SIZE,
    size=SAMPLE_PLOT_SIZE)

## 🔥 Train target model - YOLOv8

**Target Model** - a Target Model is a supervised model that consumes a Dataset and outputs a distilled model that is ready for deployment. Target Models are usually small, fast, and fine-tuned to perform a specific task very well (but they don't generalize well beyond the information described in their Dataset). Examples of Target Models are YOLOv8 and DETR.

In [None]:
# %cd {HOME}

# from autodistill_yolov8 import YOLOv8
# small version of YOLOv8 (yolov8s.pt)
# target_model = YOLOv8("yolov8s.pt")
# target_model.train(DATA_YAML_PATH, epochs=50)

## ⚖️ Evaluate target model

**NOTE:** As with regular YOLOv8 training, we can now look at the artifacts stored in the `runs` directory.
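The cells below hard-code `runs/detect/train`. Ultralytics names repeated runs `train`, `train2`, `train3`, ..., so after retraining, the newest artifacts live in a different folder. The sketch below shows a hypothetical `latest_run_dir` helper that picks the highest-numbered run, demonstrated on dummy folders.

```python
import re
import tempfile
from pathlib import Path


def latest_run_dir(runs_dir, prefix="train"):
    """Return the highest-numbered run folder (train, train2, train3, ...) or None."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d*)$")
    candidates = []
    for p in Path(runs_dir).iterdir():
        match = pattern.match(p.name)
        if p.is_dir() and match:
            # the first run has no numeric suffix, so treat it as run 1
            candidates.append((int(match.group(1) or 1), p))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


# demo on dummy folders; in the notebook: latest_run_dir(f"{HOME}/runs/detect")
with tempfile.TemporaryDirectory() as tmp:
    for name in ["train", "train2", "train10", "predict"]:
        (Path(tmp) / name).mkdir()
    latest_name = latest_run_dir(tmp).name
    missing = latest_run_dir(tmp, prefix="val")

print(latest_name)
```

Numbers are compared as integers, so `train10` correctly wins over `train2`, which a plain string sort would get wrong.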

In [None]:
# %cd {HOME}

# from IPython.display import Image

# Image(filename=f'{HOME}/runs/detect/train/confusion_matrix.png', width=600)

In [None]:
# %cd {HOME}

# from IPython.display import Image

# Image(filename=f'{HOME}/runs/detect/train/results.png', width=600)
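`results.png` plots per-epoch metrics that Ultralytics also writes to `results.csv` in the same run folder. The sketch below reads such a CSV with the standard library and returns the epoch with the highest mAP50-95. The column name `metrics/mAP50-95(B)` and the space-padded headers are assumptions based on recent Ultralytics versions, so check your own file. The CSV below is made-up sample data.

```python
import csv
import io


def best_epoch(csv_text, metric="metrics/mAP50-95(B)"):
    """Return (epoch, value) of the row with the highest `metric`.

    Header names and values are stripped because some Ultralytics versions
    pad them with spaces (assumption; check your own results.csv).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [{k.strip(): v.strip() for k, v in row.items()} for row in reader]
    best = max(rows, key=lambda r: float(r[metric]))
    return int(float(best["epoch"])), float(best[metric])


# made-up sample; in the notebook:
# best_epoch(open(f"{HOME}/runs/detect/train/results.csv").read())
sample = """epoch, metrics/mAP50(B), metrics/mAP50-95(B)
1, 0.41, 0.22
2, 0.55, 0.31
3, 0.53, 0.29
"""
print(best_epoch(sample))
```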

In [None]:
# %cd {HOME}

# from IPython.display import Image

# Image(filename=f'{HOME}/runs/detect/train/val_batch0_pred.jpg', width=600)

## 🎬 Run Inference on a video

In [None]:
# workaround for the Colab "A UTF-8 locale is required" error when running shell commands
# import locale
# locale.getpreferredencoding = lambda: "UTF-8"

In [None]:
# INPUT_VIDEO_PATH = TEST_VIDEO_PATHS[0]
# OUTPUT_VIDEO_PATH = f"{HOME}/output.mp4"
# TRAINED_MODEL_PATH = f"{HOME}/runs/detect/train/weights/best.pt"

In [None]:
# !yolo predict model={TRAINED_MODEL_PATH} source={INPUT_VIDEO_PATH}
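The `yolo predict` call takes `key=value` arguments. Note that `OUTPUT_VIDEO_PATH` is not passed to it. By default the annotated video is saved under `runs/detect/predict`, and the `project` and `name` arguments control that location. The sketch below shows a hypothetical `build_predict_command` helper that assembles the command with `shlex`, so paths containing spaces stay intact. The paths used are only examples.

```python
import shlex


def build_predict_command(model_path, source, conf=0.25, project=None, name=None):
    """Assemble a `yolo predict` command line from key=value arguments."""
    args = ["yolo", "predict", f"model={model_path}", f"source={source}", f"conf={conf}"]
    if project is not None:
        args.append(f"project={project}")  # parent folder for the results
    if name is not None:
        args.append(f"name={name}")  # subfolder inside `project`
    # quote each argument so spaces in paths do not split it
    return shlex.join(args)


cmd = build_predict_command(
    "/content/runs/detect/train/weights/best.pt",
    "/content/videos/test clip.mp4",
    project="/content/predictions",
    name="test_clip",
)
print(cmd)
```

In a notebook cell, `!{cmd}` would run the assembled command.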