# Overview

This notebook helps you create a training set for the k-NN classifier described in the MediaPipe [Pose Classification](https://google.github.io/mediapipe/solutions/pose_classification.html) solution, export it to CSV, and then use it in the [ML Kit sample app](https://developers.google.com/ml-kit/vision/pose-detection/classifying-poses#4_integrate_with_the_ml_kit_quickstart_app).

# Step 1: Upload image samples

Locally create a folder named `poses_images_in` with image samples.

Images should represent terminal states of the desired pose classes. For example, if you want to classify burglary, provide images for two classes: `burglary` and `normal`.

To build a good classifier, aim for a few hundred samples per class, covering different camera angles, environment conditions, body shapes, and pose variations.

Required structure of `images_in_folder`:
```
poses_images_in/
  burglary/
    image_001.jpg
    image_002.jpg
    ...
  normal/
    image_001.jpg
    image_002.jpg
    ...
  ...
```
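Before zipping, it can help to confirm that the folder matches this layout and that each class has enough samples. The following is a minimal sketch, not part of the original notebook: `count_images_per_class` and `IMAGE_EXTENSIONS` are hypothetical names, and the accepted extensions are an assumption.

```python
import os

# Assumed set of accepted image types; adjust to match your data.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')


def count_images_per_class(root):
  """Returns {class_name: image_count} for each class subfolder of `root`."""
  counts = {}
  for class_name in sorted(os.listdir(root)):
    class_dir = os.path.join(root, class_name)
    # Skip hidden entries (e.g. .DS_Store) and stray files at the top level.
    if class_name.startswith('.') or not os.path.isdir(class_dir):
      continue
    counts[class_name] = sum(
        1 for name in os.listdir(class_dir)
        if name.lower().endswith(IMAGE_EXTENSIONS))
  return counts
```

Calling `count_images_per_class('poses_images_in')` should list every pose class you expect; a missing class or a very low count points to a misplaced folder.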

Zip the `poses_images_in` folder:
```
zip -r poses_images_in.zip poses_images_in
```

Then upload it to the Colab runtime.
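The upload cell itself is not reproduced in this notebook. Once `poses_images_in.zip` is in the working directory, unpacking it could look like the sketch below, which uses only the standard-library `zipfile` module. `extract_images_archive` is a hypothetical helper name, and the sketch assumes the archive was created with the `zip -r` command above.

```python
import zipfile


def extract_images_archive(zip_path, dest_dir='.'):
  """Extracts `zip_path` into `dest_dir` and returns its top-level names."""
  with zipfile.ZipFile(zip_path) as archive:
    archive.extractall(dest_dir)
    # Archive member names always use '/' as the separator.
    return sorted({name.split('/')[0] for name in archive.namelist()})
```

For the archive built above, the returned list is expected to contain only `poses_images_in`.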

# Step 2: Create samples for classifier

This step runs BlazePose on the provided images to extract target poses in the format the classifier requires.

In [11]:
%cd /Users/freddie/Music

/Users/freddie/Music


In [30]:
# Folder with images to use as target poses for classification.
#
# Images should represent terminal states of desired pose classes. E.g. if you
# want to classify burglary, provide images for two classes: burglary and normal.
#
# Required structure of the images_in_folder:
#   poses_images_in/
#     pose_class_1/
#       image_001.jpg
#       image_002.jpg
#       ...
#     pose_class_2/
#       image_001.jpg
#       image_002.jpg
#       ...
#     ...
images_in_folder = 'poses_images_in'

# Output folder for bootstrapped images. Each image will have the predicted
# pose rendered on it and can be used to spot and remove unwanted samples.
images_out_folder = 'poses_images_out'

# Output folder for the bootstrapped pose CSVs, one file per pose class. These
# CSVs will be used by the demo app.
#
# Output CSV format:
#   poses_csvs_out/
#     pose_class_1.csv
#       image_001.jpg,x1,y1,z1,x2,y2,z2,...,x33,y33,z33
#       image_002.jpg,x1,y1,z1,x2,y2,z2,...,x33,y33,z33
#       ...
#     pose_class_2.csv
#       image_001.jpg,x1,y1,z1,x2,y2,z2,...,x33,y33,z33
#       image_002.jpg,x1,y1,z1,x2,y2,z2,...,x33,y33,z33
#       ...
#     ...
#
csvs_out_folder = 'poses_csvs_out'

In [13]:
import csv
import cv2
import numpy as np
import os
import sys
import tqdm

from mediapipe.python.solutions import drawing_utils as mp_drawing
from mediapipe.python.solutions import pose as mp_pose

# Folder names are used as pose class names.
pose_class_names = sorted([n for n in os.listdir(images_in_folder) if not n.startswith('.')])

for pose_class_name in pose_class_names:
  print('Bootstrapping ', pose_class_name, file=sys.stderr)

  if not os.path.exists(csvs_out_folder):
    os.makedirs(csvs_out_folder)
    
  # newline='' is what the csv module recommends for files it writes.
  with open(os.path.join(csvs_out_folder, pose_class_name + '.csv'), 'w', newline='') as csv_out_file:
    csv_out_writer = csv.writer(csv_out_file, delimiter=',', quoting=csv.QUOTE_MINIMAL)

    if not os.path.exists(os.path.join(images_out_folder, pose_class_name)):
      os.makedirs(os.path.join(images_out_folder, pose_class_name))

    image_names = sorted([
        n for n in os.listdir(os.path.join(images_in_folder, pose_class_name))
        if not n.startswith('.')])
    for image_name in tqdm.tqdm(image_names, position=0):
      # Load image.
      input_frame = cv2.imread(os.path.join(images_in_folder, pose_class_name, image_name))
      if input_frame is None:
        # cv2.imread returns None for files it cannot decode; skip them.
        continue
      input_frame = cv2.cvtColor(input_frame, cv2.COLOR_BGR2RGB)

      # Initialize fresh pose tracker and run it.
      with mp_pose.Pose(static_image_mode=True, min_detection_confidence=0.5) as pose_tracker:
        result = pose_tracker.process(image=input_frame)
        pose_landmarks = result.pose_landmarks
      
      # Save image with pose prediction (if pose was detected).
      output_frame = input_frame.copy()
      if pose_landmarks is not None:
        mp_drawing.draw_landmarks(
            image=output_frame,
            landmark_list=pose_landmarks,
            connections=mp_pose.POSE_CONNECTIONS)
      output_frame = cv2.cvtColor(output_frame, cv2.COLOR_RGB2BGR)
      cv2.imwrite(os.path.join(images_out_folder, pose_class_name, image_name), output_frame)
      
      # Save landmarks.
      if pose_landmarks is not None:
        # Check the number of landmarks and take pose landmarks.
        assert len(pose_landmarks.landmark) == 33, 'Unexpected number of predicted pose landmarks: {}'.format(len(pose_landmarks.landmark))
        pose_landmarks = np.array([[lmk.x, lmk.y, lmk.z] for lmk in pose_landmarks.landmark])

        # Map pose landmarks from [0, 1] range to absolute coordinates to get
        # correct aspect ratio.
        frame_height, frame_width = output_frame.shape[:2]
        pose_landmarks *= np.array([frame_width, frame_height, frame_width])

        # Write pose sample to CSV.
        pose_landmarks = np.around(pose_landmarks, 5).flatten().astype(str).tolist()
        csv_out_writer.writerow([image_name] + pose_landmarks)


Bootstrapping  burglary
100%|██████████| 20/20 [00:04<00:00,  4.56it/s]
Bootstrapping  normal
100%|██████████| 20/20 [00:04<00:00,  4.11it/s]
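
Each CSV row written above holds the image name followed by 33 × 3 = 99 landmark coordinates. The following is a minimal sketch for reading one file back and checking that layout. `load_pose_csv` and `NUM_LANDMARKS` are hypothetical names, not part of MediaPipe.

```python
import csv

import numpy as np

NUM_LANDMARKS = 33  # Matches the assertion in the bootstrapping cell.


def load_pose_csv(csv_path):
  """Returns (sample_names, landmarks), with landmarks shaped (n, 33, 3)."""
  names, rows = [], []
  with open(csv_path, newline='') as csv_file:
    for row in csv.reader(csv_file):
      # One name column plus x, y, z for every landmark.
      if len(row) != 1 + NUM_LANDMARKS * 3:
        raise ValueError(
            'Unexpected column count in {}: {}'.format(csv_path, len(row)))
      names.append(row[0])
      rows.append([float(value) for value in row[1:]])
  landmarks = np.array(rows, dtype=np.float32).reshape(
      len(rows), NUM_LANDMARKS, 3)
  return names, landmarks
```

For example, `load_pose_csv('poses_csvs_out/burglary.csv')` should return as many names as there were images with a detected pose.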


Now look at the output images with the predicted pose and remove the ones you are not satisfied with from the output CSV. Wrongly predicted poses will reduce classification accuracy.

# Step 3: Manual filtration

Manually verify the predictions and remove samples (images) that have a wrong pose prediction. Check each one as if you were asked to classify the pose from the predicted landmarks alone. If you can't, remove it.

Align CSVs and image folders once you are done.

In [31]:
def align_images_and_csvs(print_removed_items=False):
  """Makes sure that image folders and CSVs have the same samples.

  Leaves only the intersection of samples in both image folders and CSVs.
  """
  for pose_class_name in pose_class_names:
    # Paths for the pose class.
    images_out_pose_folder = os.path.join(images_out_folder, pose_class_name)
    csv_out_path = os.path.join(csvs_out_folder, pose_class_name + '.csv')

    # Read CSV into memory.
    rows = []
    with open(csv_out_path, newline='') as csv_out_file:
      csv_out_reader = csv.reader(csv_out_file, delimiter=',')
      for row in csv_out_reader:
        rows.append(row)

    # Image names left in CSV.
    image_names_in_csv = []

    # Re-write the CSV removing lines without corresponding images.
    with open(csv_out_path, 'w', newline='') as csv_out_file:
      csv_out_writer = csv.writer(csv_out_file, delimiter=',', quoting=csv.QUOTE_MINIMAL)
      for row in rows:
        image_name = row[0]
        image_path = os.path.join(images_out_pose_folder, image_name)
        if os.path.exists(image_path):
          image_names_in_csv.append(image_name)
          csv_out_writer.writerow(row)
        elif print_removed_items:
          print('Removed image from CSV: ', image_path)

    # Remove images without corresponding line in CSV.
    for image_name in os.listdir(images_out_pose_folder):
      if image_name not in image_names_in_csv:
        image_path = os.path.join(images_out_pose_folder, image_name)
        os.remove(image_path)
        if print_removed_items:
          print('Removed image from folder: ', image_path)


In [32]:
align_images_and_csvs()
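
After alignment, every remaining image should have exactly one CSV row and vice versa. The following is a minimal sketch for confirming that for a single pose class. `find_misaligned_samples` is a hypothetical helper name, and the sketch only reads the files without modifying them.

```python
import csv
import os


def find_misaligned_samples(images_folder, csv_path):
  """Returns (images_without_row, rows_without_image) for one pose class."""
  with open(csv_path, newline='') as csv_file:
    # The first column of each row is the image name.
    names_in_csv = {row[0] for row in csv.reader(csv_file) if row}
  names_in_folder = {
      name for name in os.listdir(images_folder) if not name.startswith('.')}
  return (sorted(names_in_folder - names_in_csv),
          sorted(names_in_csv - names_in_folder))
```

Both returned lists are expected to be empty for a class that has been aligned, for example with `find_misaligned_samples('poses_images_out/burglary', 'poses_csvs_out/burglary.csv')`.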