# **Notes**

**Introduction and Dataset**

* The tutorial covers the full process of training a pose estimation model on custom data using YOLOv8, including annotation, file preparation, training (local and Colab), and evaluation.
* The project uses the AWA pose dataset, which contains images of various quadrupeds (antelopes, bobcats, buffaloes, etc.).
* The goal is to detect 39 distinct keypoints on the animals, such as the nose, eyes, ears, legs, and tail.

**Data Annotation with CVAT**

* The author uses the Computer Vision Annotation Tool (CVAT) to annotate data.
* Users must create a project and task, then upload images.
* **Key Requirement:** Annotations must follow a strict, consistent order for every image (e.g., always annotating the nose first, then the upper jaw).
* In addition to keypoints, a bounding box must be drawn around the object.
* Data should be exported in the format "CVAT for images 1.1" rather than "COCO keypoints 1.0" to avoid errors.
* A custom Python script (provided in the project repository) is used to convert the exported XML annotations into the YOLOv8-compatible format.
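
The repository's conversion script is not reproduced in these notes, so the following is only a minimal sketch of what such a conversion could look like. It assumes each `<image>` element in the exported XML carries `width` and `height` attributes, one `<box>` child with `xtl`, `ytl`, `xbr`, `ybr` corners, and one `<points>` child whose `points` attribute lists all keypoints as `x1,y1;x2,y2;...` in annotation order. The function name and the choice to mark every point as visible (`2`) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def cvat_image_to_yolo_line(image_el, class_id=0):
    """Convert one CVAT <image> element into a single YOLO pose label line."""
    img_w = float(image_el.get("width"))
    img_h = float(image_el.get("height"))

    # Bounding box: corner coordinates in pixels become a normalized
    # center-x, center-y, width, height.
    box = image_el.find("box")
    xtl, ytl = float(box.get("xtl")), float(box.get("ytl"))
    xbr, ybr = float(box.get("xbr")), float(box.get("ybr"))
    parts = [
        str(class_id),
        f"{(xtl + xbr) / 2 / img_w:.6f}",
        f"{(ytl + ybr) / 2 / img_h:.6f}",
        f"{(xbr - xtl) / img_w:.6f}",
        f"{(ybr - ytl) / img_h:.6f}",
    ]

    # Keypoints: kept in annotation order, normalized by the image size.
    for pair in image_el.find("points").get("points").split(";"):
        x, y = (float(v) for v in pair.split(","))
        # Assumption: every annotated point is treated as visible (flag 2).
        parts += [f"{x / img_w:.6f}", f"{y / img_h:.6f}", "2"]

    return " ".join(parts)


# Hypothetical two-keypoint annotation, used only to exercise the function.
SAMPLE_XML = """
<annotations>
  <image id="0" name="sample.jpg" width="100" height="50">
    <box label="quadruped" xtl="10" ytl="10" xbr="50" ybr="30" />
    <points label="quadruped" points="20,10;40,20" />
  </image>
</annotations>
"""

sample_line = cvat_image_to_yolo_line(ET.fromstring(SAMPLE_XML).find("image"))
```

Because the keypoints are written in the order they appear in the export, this kind of conversion only works if the annotation order was identical for every image.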

**File Structure and Label Format**

* The file system must be structured with a root `data` directory containing `images` and `labels` folders, which are further divided into `train` and `val` subdirectories.
* Label text files contain one line per annotated object and follow specific formatting rules:
    * **Class ID:** The first number indicates the class (e.g., 0 for quadruped).
    * **Bounding Box:** The next four numbers represent the center X, center Y, width, and height of the box, normalized to the image size (values between 0 and 1).
    * **Keypoints:** The remaining numbers represent the keypoints in `x, y, visibility` format, with `x` and `y` normalized in the same way.
* **Visibility Flags (V):**
    * `0`: Not labeled.
    * `1`: Labeled but not visible.
    * `2`: Labeled and visible.
* YOLOv8 also supports a format with only `x, y` coordinates if visibility data is unavailable.
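
To make the label layout concrete, here is a small sketch that splits one label line back into its class ID, bounding box, and keypoints. The function name and the two-keypoint sample line are made up for illustration; a real label in this project would carry 39 keypoints.

```python
def parse_pose_label(line, kpt_dim=3):
    """Split a YOLO pose label line into (class_id, box, keypoints).

    kpt_dim is 3 for `x, y, visibility` and 2 for plain `x, y`.
    """
    values = line.split()
    class_id = int(values[0])
    box = tuple(float(v) for v in values[1:5])  # center x, center y, width, height
    flat = [float(v) for v in values[5:]]
    if len(flat) % kpt_dim != 0:
        raise ValueError("keypoint values do not divide evenly by kpt_dim")
    keypoints = [tuple(flat[i:i + kpt_dim]) for i in range(0, len(flat), kpt_dim)]
    return class_id, box, keypoints


# Hypothetical label with two keypoints: one visible (2), one occluded (1).
class_id, box, keypoints = parse_pose_label("0 0.5 0.5 0.4 0.6 0.45 0.30 2 0.55 0.31 1")
```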

**Configuration and "Flip Index"**

* The `config.yaml` file defines paths to training/validation data, class names, and the keypoint shape (`kpt_shape`, e.g. `[39, 3]` for 39 keypoints in `x, y, visibility` format).
* **Flip Index:** The config requires a `flip_idx` list. This handles data augmentation when an image is flipped horizontally (e.g., a "right eye" keypoint becomes a "left eye" keypoint upon flipping).
* Keypoints that are centered (like the nose) do not change index, but side-specific keypoints must swap indexes in this list.
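
One way to build such a list without typing it by hand is to start from the identity mapping and swap each left/right pair. The sketch below uses a hypothetical five-keypoint ordering (nose, left eye, right eye, left ear, right ear), not the dataset's actual 39-keypoint order, and the helper name is an assumption.

```python
def build_flip_idx(num_keypoints, left_right_pairs):
    """Return a flip_idx list: centered keypoints keep their index,
    each (left, right) pair swaps places."""
    flip_idx = list(range(num_keypoints))
    for left, right in left_right_pairs:
        flip_idx[left], flip_idx[right] = right, left
    return flip_idx


# Hypothetical order: 0 nose, 1 left eye, 2 right eye, 3 left ear, 4 right ear.
flip_idx = build_flip_idx(5, [(1, 2), (3, 4)])
print(flip_idx)  # [0, 2, 1, 4, 3]
```

The resulting list is what would go under the `flip_idx` key in `config.yaml`. Applying the mapping twice should return every keypoint to its original index, which makes a quick sanity check for a hand-written list.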

**Training Process**

* **Local Training:** Requires installing the `ultralytics` package. A simple Python script loads the model and config, then executes the training loop.
* **Google Colab Training:**
    * Data and the config file must be uploaded to Google Drive.
    * The Drive is mounted to the Colab environment.
    * Config file paths must be updated to match the Google Drive directory structure.
    * After training, the results folder (`runs`) is copied back to Google Drive to save the weights.
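
The training and backup steps can be sketched roughly as follows. This is not the tutorial's exact script: the model definition `yolov8n-pose.yaml`, the `imgsz` value, the function names, and the backup location are assumptions. On Colab, the Drive would first be mounted with `from google.colab import drive; drive.mount('/content/drive')`.

```python
import shutil
from pathlib import Path


def train_pose_model(config_path="config.yaml", epochs=100):
    """Train a YOLOv8 pose model on the dataset described by config_path."""
    # Imported here so the backup helper below works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO("yolov8n-pose.yaml")  # assumption: nano pose model built from its YAML definition
    return model.train(data=config_path, epochs=epochs, imgsz=640)


def save_runs(runs_dir, backup_dir):
    """Copy the results folder (e.g. `runs`) into backup_dir and return the new path."""
    destination = Path(backup_dir) / Path(runs_dir).name
    shutil.copytree(runs_dir, destination, dirs_exist_ok=True)
    return destination


# Example usage (paths are hypothetical):
# train_pose_model("config.yaml", epochs=100)
# save_runs("runs", "/content/drive/MyDrive/pose_project")
```

Copying the whole `runs` folder rather than just the weights also keeps the loss plots and validation batch images, which the evaluation step relies on.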



**Model Evaluation**

* **Loss Analysis:** The author examines pose loss graphs. A downward trend in training loss indicates learning; validation loss should also decrease but may eventually plateau.
* **Visual Validation:** By comparing ground truth annotations with model predictions on validation batches, the author notes that facial features were detected well, but the model struggled with legs and tails.
* **Improvement Strategies:** To fix detection issues, the author suggests training for more epochs (the tutorial example used 100) or tuning hyperparameters.
* **Weights:** The process generates `best.pt` (best performance) and `last.pt` (final epoch). The author prefers `last.pt` for robustness.
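
Besides the plotted graphs, the loss trend can be checked numerically. The sketch below assumes the training run wrote a `results.csv` file with `train/pose_loss` and `val/pose_loss` columns, as recent Ultralytics versions do; the exact file location and column names may differ in other versions, and the function name is made up here.

```python
import csv


def read_loss_columns(csv_path, columns=("train/pose_loss", "val/pose_loss")):
    """Return {column name: list of per-epoch values} from a results CSV."""
    with open(csv_path, newline="") as f:
        # Header names can be padded with spaces, so strip keys and values.
        rows = [
            {key.strip(): value.strip() for key, value in row.items()}
            for row in csv.DictReader(f)
        ]
    return {col: [float(row[col]) for row in rows] for col in columns}


# Example usage (path is an assumption about the default output folder):
# losses = read_loss_columns("runs/pose/train/results.csv")
# print(losses["train/pose_loss"][0], losses["train/pose_loss"][-1])
```

Comparing the first and last values of each column gives a quick check of whether the training loss is still falling and whether the validation loss has plateaued.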

**Inference**

* An `inference.py` script is used to predict keypoints on new samples (e.g., a wolf image).
* The script loads the trained model, iterates through results, and uses OpenCV (`cv2`) to draw numbers representing keypoints on the image.
* Final visualization confirmed the model performed well on the body and face but missed specific leg and tail points.
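
A rough sketch of such an inference script is shown below. It is not the tutorial's `inference.py`: the function names, font settings, and file paths are assumptions, and it reads pixel coordinates from `result.keypoints.xy` as exposed by recent Ultralytics versions.

```python
def keypoint_labels(keypoints_xy):
    """Turn a list of (x, y) pixel coordinates into (index text, x, y) triples."""
    return [(str(i), int(x), int(y)) for i, (x, y) in enumerate(keypoints_xy)]


def annotate_image(image_path, weights_path, output_path):
    """Run the trained model on one image and draw each keypoint's index on it."""
    # Imported here so keypoint_labels can be used without these packages.
    import cv2
    from ultralytics import YOLO

    model = YOLO(weights_path)
    image = cv2.imread(image_path)
    for result in model(image_path):
        # result.keypoints.xy: one list of (x, y) pixel pairs per detected object.
        for instance in result.keypoints.xy.tolist():
            for text, x, y in keypoint_labels(instance):
                cv2.putText(image, text, (x, y),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    cv2.imwrite(output_path, image)


# Example usage (paths are hypothetical):
# annotate_image("wolf.jpg", "runs/pose/train/weights/last.pt", "wolf_keypoints.jpg")
```

Drawing the index rather than a plain dot makes it easy to see which specific keypoints, such as those on the legs and tail, are missing or misplaced.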