# PosePipeline: Getting Started

### General Description
The following notebook introduces PosePipeline, a pose estimation framework that uses computer vision to detect and track a person's posture in images or videos. It assumes each video was recorded with a single camera (monocular). Pose estimation methods generally follow one of two strategies: top-down or bottom-up.
Top-down methods first detect and locate every person in the image (tracking via bounding boxes), then extract the pose (keypoints) for the person of interest. Bottom-up methods first locate all keypoints in the image, then group them into individual people. A downside of top-down approaches is that pose estimation depends heavily on the performance of the detection model: when person detection fails, as it often does in crowded scenes, pose estimation fails with it. Top-down methods may therefore be less well suited to crowded multi-person scenes.
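
The contrast between the two strategies can be sketched in plain Python. This is a toy illustration under stated assumptions, not how PosePipeline's models work: keypoints are hypothetical `(joint, x, y)` tuples, boxes are `(x_min, y_min, width, height)`, and the bottom-up grouping step is a simple greedy distance rule standing in for the learned association step that real bottom-up models use.

```python
# Toy contrast between top-down and bottom-up pose estimation.
# Assumptions (hypothetical, for illustration only):
#   keypoints are (joint_name, x, y) tuples in pixels
#   boxes are (x_min, y_min, width, height)

def in_box(kp, box):
    """Return True if the keypoint lies inside the box."""
    _, x, y = kp
    bx, by, bw, bh = box
    return bx <= x <= bx + bw and by <= y <= by + bh


def top_down(detected_boxes, keypoints):
    """Top-down: one pose per detected box. A person the detector
    misses gets no pose at all."""
    return [[kp for kp in keypoints if in_box(kp, box)] for box in detected_boxes]


def bottom_up(keypoints, max_gap):
    """Bottom-up: take every keypoint first, then group them into people.
    Greedy stand-in rule: a keypoint joins the first group that already holds
    a keypoint within max_gap pixels, otherwise it starts a new group."""
    groups = []
    for kp in keypoints:
        _, x, y = kp
        for group in groups:
            if any((x - gx) ** 2 + (y - gy) ** 2 <= max_gap ** 2 for _, gx, gy in group):
                group.append(kp)
                break
        else:
            groups.append([kp])
    return groups


# Two people in the frame; the (hypothetical) detector only finds person A.
keypoints = [
    ("head", 10, 10), ("hip", 10, 20), ("knee", 12, 30),     # person A
    ("head", 100, 10), ("hip", 100, 20), ("knee", 102, 30),  # person B
]
detected_boxes = [(0, 0, 30, 40)]

print(len(top_down(detected_boxes, keypoints)))  # 1: person B is lost
print(len(bottom_up(keypoints, max_gap=15)))     # 2: both people recovered
```

The toy makes the failure mode concrete: when detection misses someone, the top-down path never sees them, while the bottom-up path still finds their keypoints.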

This tutorial notebook walks through some basic DataJoint operations and video import, then processes videos to obtain 3D keypoints for the person you are analyzing.


## 0.  Initialization

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import json
from datetime import datetime
import io
import os
import sys
import cv2

from IPython.display import Video as JupyterVideo
from IPython.display import HTML
from IPython.display import display
from os import system, name

os.environ["CUDA_VISIBLE_DEVICES"] = '1'        # Set the GPU to use for computation 
os.environ['PYOPENGL_PLATFORM'] = 'osmesa'      # Set the OpenGL platform used for rendering. OSMesa is an off-screen OpenGL backend that doesn't require a display, which is useful when running on a server without one.

import datajoint as dj
dj.config['display.limit'] = 50                 # Sets the limit for the number of rows to display when running a query, adjust as needed

from pose_pipeline import *
from pose_pipeline.utils.jupyter import play, play_grid


## 1.  Introduction to DataJoint

<details>
<summary>Click to Expand DataJoint Description</summary>
DataJoint is an open-source data management framework designed for scientific workflows. It helps researchers organize, process, and query complex data pipelines using Python (or MATLAB) with a clean, relational database backend like MySQL or PostgreSQL.

It is highly recommended to look over the DataJoint documentation: https://datajoint.com/docs/

At its core, DataJoint:
- Structures your data into tables (relations) that represent experiments, results, or processing steps; a schema is a collection of linked tables.
- Tracks dependencies between steps in your pipeline.
- Makes it easy to reproducibly populate, update, and query data, especially when dealing with many subjects, sessions, or models.

Instead of manually managing files and folders, DataJoint uses schemas (sets of linked tables) to ensure data integrity, transparency, and scalability.

DataJoint is a relational database framework, and in PosePipe it organizes data into three key table types (tiers):
1. Lookup tables: store predefined, static values used for standardization.
2. Manual tables: contain user-inserted data, for example which method to apply to which video, or experiment details.
3. Computed tables: 'automatically' generate results by processing upstream data. These tables rely on the `populate()` command to execute and store results.
</details> 
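
How manual and computed tables interact through `populate()` can be sketched in plain Python. This is a hypothetical model of the idea, not DataJoint's implementation: a computed table looks at the keys available upstream, finds the ones it has not processed yet, and calls a `make` function for each. In DataJoint itself you declare tables as classes and call `populate()`; the class names, file names, and the 30 fps figure here are made up.

```python
# Minimal, hypothetical model of DataJoint-style table tiers (not DataJoint's code).

class Table:
    def __init__(self, name):
        self.name = name
        self.rows = {}               # primary key (tuple) -> row attributes (dict)

    def insert1(self, key, attrs=None, skip_duplicates=False):
        if key in self.rows:
            if skip_duplicates:
                return
            raise ValueError(f"duplicate key {key} in {self.name}")
        self.rows[key] = attrs or {}


class ComputedTable(Table):
    def __init__(self, name, upstream, make):
        super().__init__(name)
        self.upstream = upstream     # table whose keys drive the computation
        self.make = make             # function: (key, upstream row) -> computed attrs

    def populate(self):
        """Compute a row for every upstream key that has no result yet."""
        missing = [k for k in self.upstream.rows if k not in self.rows]
        for key in missing:
            self.rows[key] = self.make(key, self.upstream.rows[key])
        return len(missing)


# Manual table: the user inserts videos (hypothetical names and frame counts).
videos = Table("Video")
videos.insert1(("gymnastics_TEST", "clip_a.mp4"), {"num_frames": 300})
videos.insert1(("gymnastics_TEST", "clip_b.mp4"), {"num_frames": 120})

# Computed table: derives a duration from each video, assuming 30 fps.
video_info = ComputedTable("VideoInfo", videos, lambda key, row: {"seconds": row["num_frames"] / 30})
print(video_info.populate())   # 2 new rows computed
print(video_info.populate())   # 0: nothing left to do
```

Running `populate()` a second time does nothing, which is why it is safe to re-run populate cells in this notebook after adding new videos.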

### 1.1.  List All Schemas in PosePipe

In [None]:
schemas = dj.list_schemas()
print('The comprehensive list of all the Schemas in the database:')
display(schemas)


schema = dj.schema('pose_pipeline')
print('The tables in the PosePipeline Schema include:')
display(schema.list_tables())

### 1.2. View the Schemas as Diagram
Diagrams are a great way to visualize the pipeline and understand the flow of data. Diagrams within DataJoint are based on entity relationship diagrams (ERDs).
Each node is a table. Arrows show dependencies between tables (foreign-key relationships).

In [None]:
# display the ERD diagram
# Recall schema = dj.Schema('pose_pipeline')

from IPython.display import Image                                  # Needed to display the saved PNG

diagram = dj.Diagram(schema)                                        # Builds the ERD for every table in the schema

filename = 'pose_pipeline_schema_diagram.png'                       # Sets the filename for the diagram image
diagram.save(filename)                                              # Saves a PNG image of the diagram in your current working directory
Image(filename)                                                     # Displays the image of the diagram
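
Because the arrows encode dependencies, any valid order for populating tables is a topological order of the diagram. The sketch below uses Python's standard-library `graphlib` (Python 3.9+) on a hand-written, simplified subset of the tables named in this notebook. The edge list is an illustrative assumption, not the exact pose_pipeline graph; read the real dependencies from the diagram itself.

```python
from graphlib import TopologicalSorter

# Simplified, ASSUMED dependency graph: each table maps to the tables it depends on.
dependencies = {
    "VideoInfo": {"Video"},
    "BottomUpMethod": {"Video"},
    "BottomUpPeople": {"BottomUpMethod"},
    "BlurredVideo": {"BottomUpPeople"},
    "TrackingBboxMethod": {"Video"},
    "TrackingBbox": {"TrackingBboxMethod"},
    "PersonBboxValid": {"TrackingBbox"},
    "PersonBbox": {"PersonBboxValid"},
    "TopDownMethod": {"PersonBbox"},
    "TopDownPerson": {"TopDownMethod"},
    "LiftingMethod": {"TopDownPerson"},
    "LiftingPerson": {"LiftingMethod"},
}

# static_order() yields every table only after all of the tables it depends on.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

This is the same reason the sections below must be run in order: a table cannot be populated until everything upstream of it has rows.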

### 1.3.  Viewing a table definition 

In [None]:
print('The table definition for the Video is:\n')
print(Video.describe())
print()

print('The table definition for the TopDownMethodLookup is:\n')
print(TopDownMethodLookup.describe())
print()

print('The table definition for the TopDownPerson is:\n')
print(TopDownPerson.describe())
print()

<details>
<summary>Click to Expand Output Explanation</summary>
Viewing a table's definition tells you what data the table holds. If you are working in VS Code, you can also Ctrl+click on `Video` in the code cell to jump to the table's definition block.

Breaking down the output:
- A line starting with a hashtag (#) is the description of the table (after an attribute, a # starts that attribute's comment)
- Three consecutive dashes (---) separate the primary key attributes (above the dashes) from the dependent attributes (below the dashes)
- A dash followed by a greater-than sign (->) indicates a foreign key dependency on another table; for example, `-> TopDownMethod` in the TopDownPerson definition means that TopDownPerson inherits the primary keys of TopDownMethod
</details> 
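
The three rules above are enough to read a definition mechanically. The sketch below parses a definition string into its description, primary key, foreign keys, and dependent attributes. The example definition is made up for illustration and is not copied from pose_pipeline; real definitions can contain more syntax (defaults, indexes, projected foreign keys) that this simplified parser ignores.

```python
def parse_definition(definition):
    """Split a DataJoint-style definition string into its parts (simplified sketch)."""
    parts = {"comment": None, "primary": [], "dependent": [], "foreign_keys": []}
    in_primary = True
    for raw in definition.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):                  # table description
            parts["comment"] = line.lstrip("#").strip()
        elif line.startswith("---"):              # separator: primary key above, dependent below
            in_primary = False
        elif line.startswith("->"):               # foreign key: inherits the referenced table's key
            ref = line[2:].strip()
            parts["foreign_keys"].append(ref)
            parts["primary" if in_primary else "dependent"].append(ref)
        else:                                     # "name : type  # comment"
            name = line.split(":", 1)[0].strip()
            parts["primary" if in_primary else "dependent"].append(name)
    return parts


# Hypothetical definition, written only to exercise the parser.
example = """
# 2D keypoints for the person of interest (made-up example)
-> TopDownMethod
---
keypoints       : longblob   # per-frame keypoints
"""
print(parse_definition(example))
```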

### 1.4. View a table

In [None]:
VideoInfo()

In [None]:
dj.config['display.limit'] = 5

display(VideoInfo())

### 1.5. Querying and Filtering Data from a Table


<details>
<summary>Click to Expand Explanation</summary>
DataJoint uses `fetch` to pull data from tables into Python. Below are some examples using this method.
The `&` operator in DataJoint performs a restriction, like applying a filter to a table or query.
- (example 1) When `fetch` is called with the argument `'KEY'` (all capitals), it returns the primary key attributes of each row as a list of dictionaries
- (example 2) How to fetch a single attribute as a NumPy array and list its unique values
- (example 3) How to filter a table by a specific attribute, in this case by project
</details> 
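
A rough plain-Python analogue of these two operations, assuming a table is just a list of row dictionaries with a known set of primary-key attribute names. It mimics restricting with a dictionary (`table & {...}`) and `fetch('KEY')`; it is not how DataJoint executes queries (DataJoint translates them to SQL), and the rows and attribute values are made up.

```python
# Hypothetical rows of a Video-like table; primary key = (video_project, filename).
PRIMARY_KEY = ("video_project", "filename")
rows = [
    {"video_project": "gymnastics_TEST", "filename": "clip_a", "start_time": "2024-01-01"},
    {"video_project": "gymnastics_TEST", "filename": "clip_b", "start_time": "2024-01-02"},
    {"video_project": "other_project",   "filename": "clip_c", "start_time": "2024-01-03"},
]


def restrict(rows, condition):
    """Analogue of `table & {'attr': value}`: keep rows matching every given attribute.
    Condition keys that are not attributes of the table are ignored, as in DataJoint."""
    return [r for r in rows if all(r[k] == v for k, v in condition.items() if k in r)]


def fetch_key(rows, primary_key=PRIMARY_KEY):
    """Analogue of `.fetch('KEY')`: one dict per row with only the primary-key attributes."""
    return [{k: r[k] for k in primary_key} for r in rows]


proj_filt = {"video_project": "gymnastics_TEST"}
print(fetch_key(restrict(rows, proj_filt)))
# [{'video_project': 'gymnastics_TEST', 'filename': 'clip_a'}, {'video_project': 'gymnastics_TEST', 'filename': 'clip_b'}]
```

Ignoring unknown keys is what lets the same `proj_filt` dictionary restrict every table in this notebook, as long as the table carries `video_project` in its key.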

In [None]:
# (example 1)
primary_keys = Video.fetch('KEY')
print('The first primary key in the Video table is:')
display(primary_keys[0])                                                # Displays only the first primary key in the list

In [None]:
# (example 2)
np.unique(Video.fetch('video_project'))

In [None]:
# (example 3)
proj_filt = {'video_project': 'gymnastics_TEST'}

Video & proj_filt

## 2.  PosePipeline: Raw Video Import

<details>
<summary>Click to Expand Explanation</summary>
Assumes monocular video input. The first step in the process is to upload and import your videos. PosePipeline relies on DataJoint (SQL database) to store and process these videos. 
</details> 

### 2.1. Import Video(s)

In [None]:
from pose_pipeline.utils.video_format import insert_local_video

videos_path = '/mnt/CottonLab/datasets/gymnastics/'      # Path to the videos, adjust as needed
files = os.listdir(videos_path)                          # List of all the files in the directory

for f in files: 
    insert_local_video(f, datetime.now(), os.path.join(videos_path, f), video_project='gymnastics_TEST', skip_duplicates=True)

### 2.2. Check that the Videos Table has been filled

In [None]:
proj_filt = {'video_project': 'gymnastics_TEST'}
Video & proj_filt

### 2.3.  Populate the VideoInfo Table
`populate()` is the DataJoint method that fills computed tables from the rows of their upstream tables. In this case VideoInfo is a computed table that depends on the Video table.

In [None]:
VideoInfo.populate(proj_filt)
VideoInfo & proj_filt

### 2.4. View a raw video

In [None]:
JupyterVideo('/mnt/CottonLab/datasets/gymnastics/' + 'gymnastics_test_1.mp4', embed=True)

## 3. PosePipeline: Bottom-Up Methods

<details>
<summary> Click to Expand Explanation</summary>
After you have uploaded your videos, a typical processing step is to select and run a bottom-up approach. Recall that bottom-up methods first find all the keypoints in the image and then group them into individual people. The following steps take video(s) from the Video table and link them with a chosen bottom-up method (e.g., OpenPose, MMPose, Bridging_OpenPose). BottomUpPeople runs pose estimation and stores the extracted 2D keypoints (pixel locations). Optionally, you can populate BottomUpVideo, which overlays the keypoints from BottomUpPeople onto the video.

To produce most overlay videos in this framework, you must first populate the BlurredVideo table, which protects human subjects' identities. BlurredVideo needs face keypoints to know where to apply the blur, so BottomUpPeople must be populated first.
</details>
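
The reason BlurredVideo needs keypoints first is that the blur region is derived from where the face is. A minimal sketch of one way to turn face keypoints into a padded, image-clipped region follows. The function name, the padding rule, and the choice of which keypoints count as "face" are assumptions for illustration, not PosePipeline's actual blurring code.

```python
def face_blur_region(face_points, image_width, image_height, pad_fraction=0.5):
    """Hypothetical helper: return (x_min, y_min, x_max, y_max) covering the face
    keypoints, padded by pad_fraction of the larger side and clipped to the image.
    face_points: list of (x, y) pixel coordinates, e.g. nose, eyes, ears."""
    if not face_points:
        return None                      # no face keypoints in this frame: nothing to blur
    xs = [x for x, _ in face_points]
    ys = [y for _, y in face_points]
    pad = pad_fraction * max(max(xs) - min(xs), max(ys) - min(ys))
    x_min = max(0, int(min(xs) - pad))
    y_min = max(0, int(min(ys) - pad))
    x_max = min(image_width, int(max(xs) + pad))
    y_max = min(image_height, int(max(ys) + pad))
    return x_min, y_min, x_max, y_max


# Hypothetical nose/eye/ear keypoints in a 640x480 frame.
print(face_blur_region([(300, 100), (290, 95), (310, 95), (280, 100), (320, 100)], 640, 480))
# (260, 75, 340, 120)
```

Without keypoints the function has nothing to anchor the blur to, which mirrors the dependency of BlurredVideo on BottomUpPeople.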

### 3.1.  Creating Keys to Process Images

<details>
<summary>Click to Expand Explanation of 3.1.</summary>
To run (populate) BottomUpPeople, you need to specify the bottom-up method you want to use. To accomplish this, insert the appropriate rows into the BottomUpMethod table (a manual table).
To do this you need to specify the primary keys (PKs) for that table. Below is an example of how you may do this. Because BottomUpMethod inherits the PKs from Video, which are 'video_project' and 'filename', we can 'grab' those keys by fetching with the special argument 'KEY'. From there, you simply add the name of the method you want to run.
</details>

In [None]:
# 'Grab' the PKs from the Video table by fetching 'KEY' (all capitals)
video_keys = (Video & proj_filt).fetch('KEY')
display(video_keys)

for v in video_keys:
    v["bottom_up_method_name"] = "Bridging_OpenPose"        # Set the method name that you want to use
    print(v)
    BottomUpMethod.insert1(v, skip_duplicates=True)

BottomUpMethod() & proj_filt

### 3.2.  Populate BottomUpPeople Table

In [None]:
# Populate BottomUpPeople (BlurredVideo needs its face keypoints)
BottomUpPeople.populate(proj_filt)      # Runs BottomUpPeople for the specified filter

BottomUpPeople() & proj_filt            # Displays the populated BottomUpPeople table

<details>
<summary>Click to Expand for Additional Info </summary>
BottomUpPeople will run without having to define what is referred to here as a 'filter_skeleton'. BottomUpPeople produces a list of 2D keypoints, and the Bridging_OpenPose method can optionally take a 'filter_skeleton'. This is a skeleton chosen by the user that selects, from the full list of output keypoints, the ones that belong to a specific skeleton format; different datasets use different joint conventions (e.g., COCO, SMPL). The other place where the skeleton is used directly is the BottomUpBridgingVideoLookup table, in which you must specify the skeleton used to visualize the video.
</details>
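
Selecting a skeleton amounts to picking a named subset (and order) of joints from a larger model output. The sketch below shows the idea with a hypothetical joint list; the joint names and the `select_joints` helper are made up for illustration and do not reproduce the real `bml_movi_87` or COCO definitions.

```python
def select_joints(keypoints, all_joint_names, skeleton_joint_names):
    """Hypothetical helper: keep only the joints of a target skeleton, in that skeleton's order.
    keypoints: list of per-joint values (e.g. (x, y, confidence)) aligned with all_joint_names."""
    index = {name: i for i, name in enumerate(all_joint_names)}
    missing = [n for n in skeleton_joint_names if n not in index]
    if missing:
        raise KeyError(f"joints not produced by the model: {missing}")
    return [keypoints[index[name]] for name in skeleton_joint_names]


# Hypothetical model output with five joints; the target skeleton only wants three.
all_names = ["nose", "left_shoulder", "right_shoulder", "left_hip", "right_hip"]
frame_keypoints = [(320, 90, 0.9), (290, 150, 0.8), (350, 150, 0.85), (300, 260, 0.7), (340, 260, 0.75)]
print(select_joints(frame_keypoints, all_names, ["left_hip", "right_hip", "nose"]))
# [(300, 260, 0.7), (340, 260, 0.75), (320, 90, 0.9)]
```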

In [None]:
skeleton_keys = []  # Create an empty list to store modified dictionaries

for sk in video_keys:
    sk_copy = sk.copy()  # Make a copy to avoid modifying the original
    sk_copy["skeleton"] = "bml_movi_87"
    skeleton_keys.append(sk_copy)  # Append to the list

display(skeleton_keys) 
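
One side effect worth knowing: the `for v in video_keys:` loop in section 3.1 assigns into each dictionary in place, so after it runs every entry of `video_keys` also carries `bottom_up_method_name`, and the copies made above inherit it. A plain-Python illustration of the difference, with made-up key values:

```python
# Made-up key; in the notebook these dicts come from (Video & proj_filt).fetch('KEY').
video_keys = [{"video_project": "gymnastics_TEST", "filename": "clip_a"}]

# In-place: the loop variable is the same dict object stored in the list.
for v in video_keys:
    v["bottom_up_method_name"] = "Bridging_OpenPose"
print(video_keys[0])     # now includes bottom_up_method_name

# Non-mutating alternative: build new dicts with dict unpacking.
fresh_keys = [{"video_project": "gymnastics_TEST", "filename": "clip_a"}]
method_keys = [{**k, "bottom_up_method_name": "Bridging_OpenPose"} for k in fresh_keys]
skeleton_keys = [{**k, "skeleton": "bml_movi_87"} for k in fresh_keys]
print(fresh_keys[0])     # unchanged
print(skeleton_keys[0])  # only the video key plus the skeleton
```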

### 3.3.  Populate BlurredVideo Table

In [None]:
BlurredVideo.populate(proj_filt)

BlurredVideo() & proj_filt

### 3.4.  Populate BottomUpBridging Table

<details>
<summary> Click to Expand Explanation</summary>
BottomUpBridging is a bottom-up method that follows a slightly different workflow than other bottom-up methods because it 'bridges' (integrates) pose estimation with tracking and additional refinement. Unlike BottomUpPeople, which only extracts keypoints per frame, BottomUpBridging links these keypoints over time to (1) assign consistent IDs to people across frames, (2) estimate 3D keypoints instead of just 2D, and (3) track movement more robustly.
To see in more detail how this works, look at bridging.py. The bridging_formats_bottom_up function in this sub-module processes each frame of the video and extracts bounding boxes, 2D keypoints, 3D keypoints, and the keypoint noise using the MeTRAbs model. MeTRAbs (Metric-Scale Truncation-Robust Heatmaps for Absolute 3D Human Pose Estimation) is a deep learning model. A key challenge is that new people may be detected, or previously detected people lost, from one frame to the next.
</details>
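
Assigning consistent IDs across frames is the core of the tracking problem described above. The sketch below uses a deliberately simple technique, greedy matching of boxes between consecutive frames by intersection-over-union (IoU). This is *not* the method used inside `bridging.py`; it only shows why people who disappear or newly enter need special handling. Boxes are assumed to be `(x_min, y_min, width, height)`, and the 0.3 threshold is an arbitrary choice.

```python
def iou(a, b):
    """Intersection-over-union of two (x_min, y_min, width, height) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def assign_ids(frames, min_iou=0.3):
    """frames: list of per-frame box lists. Returns per-frame lists of track IDs.
    Each box takes the ID of the best-overlapping unclaimed box in the previous
    frame; boxes with no good match start a new ID (a newly seen person)."""
    next_id = 0
    prev = []                                  # (track_id, box) pairs from the previous frame
    all_ids = []
    for boxes in frames:
        ids, claimed = [], set()
        for box in boxes:
            best_id, best_score = None, min_iou
            for track_id, prev_box in prev:
                score = iou(box, prev_box)
                if track_id not in claimed and score >= best_score:
                    best_id, best_score = track_id, score
            if best_id is None:
                best_id, next_id = next_id, next_id + 1
            claimed.add(best_id)
            ids.append(best_id)
        prev = list(zip(ids, boxes))
        all_ids.append(ids)
    return all_ids


# Person 0 moves slightly; in the third frame a second person appears.
frames = [
    [(100, 100, 50, 100)],
    [(105, 100, 50, 100)],
    [(110, 100, 50, 100), (400, 120, 50, 100)],
]
print(assign_ids(frames))   # [[0], [0], [0, 1]]
```

With this naive matcher, a person who is missed for even one frame comes back with a new ID. That kind of track fragmentation is why section 4.3 has you select all the tracks belonging to your subject.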

In [None]:
from pose_pipeline.pipeline import BottomUpBridging, BottomUpBridgingPerson
BottomUpBridging.populate(proj_filt)    
BottomUpBridging & proj_filt            

### 3.5.  Indexing and Extracting Data From Table

In [None]:
data = (BottomUpBridging & proj_filt).fetch('boxes', 'keypoints2d', 'keypoints3d')

boxes, keypoints_2d, keypoints_3d = data

print( np.array(boxes))

In [None]:
first_tbox = boxes[0][0]        # Boxes detected in the first frame of the first video (one row per detection)
# Each row is [x_min, y_min, width, height, confidence_score]
print(first_tbox)

confidence_score = first_tbox[0, 4]          # Confidence score of the first detected bounding box
print("Confidence Score:", confidence_score)

x_min, y_min, width, height, confidence = first_tbox[0]
print(x_min, y_min, width, height)
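
Drawing code and other libraries often expect corner coordinates instead, so a small conversion helper can prevent mix-ups. This assumes, as the unpacking above does, that each stored box row is `[x_min, y_min, width, height, confidence]`; the helper names and the 0.5 confidence threshold are illustrative choices.

```python
def xywh_to_xyxy(box):
    """[x_min, y_min, width, height, conf] -> (x_min, y_min, x_max, y_max, conf)."""
    x, y, w, h, conf = box
    return x, y, x + w, y + h, conf


def confident_boxes(frame_boxes, threshold=0.5):
    """Keep only detections whose confidence is at least threshold (assumed cut-off)."""
    return [b for b in frame_boxes if b[4] >= threshold]


# Made-up detections for one frame.
frame_boxes = [[120.0, 40.0, 80.0, 200.0, 0.93], [500.0, 60.0, 30.0, 90.0, 0.21]]
print([xywh_to_xyxy(b) for b in confident_boxes(frame_boxes)])
# [(120.0, 40.0, 200.0, 240.0, 0.93)]
```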

### 3.6.  Visualize the Extracted Data

In [None]:
# Visualize some of the data - Manually 

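# Read the first frame of the video
video_file = '/mnt/CottonLab/datasets/gymnastics/' + 'gymnastics_test_1.mp4'  # Replace with the actual path
cap = cv2.VideoCapture(video_file)
ret, frame = cap.read()
cap.release()
if not ret:
    raise RuntimeError(f'Could not read a frame from {video_file}')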

# Convert frame to RGB
frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

x_min, y_min, width, height = int(x_min), int(y_min), int(width), int(height)
cv2.rectangle(frame_rgb, (x_min, y_min), (x_min + width, y_min + height), (0, 255, 0), 2)

# Show the image with bounding boxes
plt.imshow(frame_rgb)
plt.axis("off")
plt.title("Bounding Box on Frame 1")
plt.show()

In [None]:
# Visualize some of the data - Automatically
from pose_pipeline.pipeline import BottomUpBridgingVideo
BottomUpBridgingVideo.populate(skeleton_keys)
display(BottomUpBridgingVideo() & skeleton_keys)

video_path = (BottomUpBridgingVideo() & skeleton_keys).fetch("output_video")   # Array with one path per matching row
print(video_path)

# Display the first video in Jupyter Notebook
# Uncomment the following line to display it (JupyterVideo is imported in the Initialization cell)
# JupyterVideo(video_path[0], embed=True)

## 4. Tracking 

<details>
<summary>Click to Expand Explanation </summary>
Up until this point you have likely populated BottomUpBridging and/or BottomUpPeople. These have provided you with, respectively, bounding boxes plus 2D and 3D keypoints, and 2D keypoints, for all the people 'seen' in your videos. If your videos contain multiple people, you are likely interested in one particular subject. The following steps isolate this subject of interest for further analysis. To accomplish this you will need to run tracking.
</details>

In [None]:
from pose_pipeline.pipeline import TrackingBbox, TrackingBboxMethod, TrackingBboxMethodLookup

### 4.1. Create Tracking Keys

In [None]:
# Select the tracking method you want to use via the TrackingBboxMethod table (a manual table)

tracking_method = (TrackingBboxMethodLookup & 'tracking_method_name="MMDet_deepsort"').fetch1('tracking_method')
print('The tracking method is: ', tracking_method)

tracking_keys = (Video & proj_filt).fetch('KEY')
display(tracking_keys)

for key in tracking_keys:
    key["tracking_method"] = tracking_method
    print(key)
    TrackingBboxMethod.insert1(key, skip_duplicates=True)


In [None]:
display(TrackingBboxMethod() & proj_filt)
TrackingBboxMethodLookup & {'tracking_method': tracking_method}      # Shows the lookup row for the chosen method

### 4.2.  Run Tracking

In [None]:
TrackingBbox.populate(tracking_keys)

TrackingBbox() & proj_filt

In [None]:
# View the video with the tracks overlaid by populating the TrackingBboxVideo Table (BlurredVideo has to be populated first)
TrackingBboxVideo.populate(proj_filt)

TrackingBboxVideo & proj_filt

### 4.3.  Working with the Annotations GUI

<details>
<summary>Click to Expand Explanation</summary>
After populating the TrackingBbox table, if you are dealing with a multi-person scene and/or have detections that are not relevant, you will need to select the tracks whose bounding boxes contain the person you are interested in analyzing. To do this, first visually identify the person you want, then select ALL the tracks that locate them throughout the video. For internal users: follow the link below to access the annotation GUI.

http://jc-compute01.ric.org:8505/

The selected tracks are automatically populated into the PersonBboxValid table.
</details>

In [None]:
from pose_pipeline.pipeline import PersonBboxValid
# Populated from annotations GUI
PersonBboxValid & proj_filt

### 4.4.  Extract and view the valid tracks

In [None]:
tracks = (PersonBboxValid & proj_filt).fetch('keep_tracks')
display(tracks)

### 4.5. Populate PersonBbox

<details>
<summary>Click to Expand Explanation</summary>
PersonBboxValid contains all the selected tracks/boxes that correspond to your person of interest. Next, you will populate PersonBbox, which combines all the valid tracks into a single bounding-box track (one box per frame) for the entire video.
</details>
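
Conceptually, this step reduces many tracks to one box per frame. A minimal sketch, assuming tracking output is stored per frame as `(track_id, box, confidence)` tuples and that `keep_tracks` is the list of selected track IDs. When two kept tracks appear in the same frame it picks the higher-confidence one; that tie-break rule is an assumption for illustration, not necessarily what `PersonBbox` does.

```python
def merge_tracks(frames, keep_tracks):
    """Hypothetical helper. frames: list (one per frame) of (track_id, box, confidence) tuples.
    Returns one box per frame for the person of interest, or None when
    none of the kept tracks is present in that frame."""
    keep = set(keep_tracks)
    merged = []
    for detections in frames:
        candidates = [d for d in detections if d[0] in keep]
        if candidates:
            best = max(candidates, key=lambda d: d[2])   # assumed rule: highest confidence wins
            merged.append(best[1])
        else:
            merged.append(None)
    return merged


# Tracks 3 and 7 are the same gymnast (split by an occlusion); track 5 is a coach.
frames = [
    [(3, (100, 50, 60, 150), 0.9), (5, (400, 60, 50, 140), 0.8)],
    [(5, (402, 60, 50, 140), 0.8)],                   # gymnast not detected
    [(7, (110, 52, 60, 150), 0.85)],
]
print(merge_tracks(frames, keep_tracks=[3, 7]))
# [(100, 50, 60, 150), None, (110, 52, 60, 150)]
```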

In [None]:
from pose_pipeline.pipeline import PersonBbox
PersonBbox.populate(proj_filt)
PersonBbox & proj_filt

###  4.6. Assess the quality of the tracking

<details>
<summary>Click to Expand Explanation</summary>
Up to this point, you have tracked all people in the scene and selected which bounding boxes are valid, in other words the bounding box(es) associated with the subject of interest. DetectedFrames is an optional table that provides insight into the tracking quality.
From here you can continue with either a bottom-up or a top-down method. Because part of BottomUpBridging has already been run, you could populate BottomUpBridgingPerson, which filters out the other people in the scene and keeps the keypoints associated with the subject's bounding boxes. Alternatively, you can continue with a top-down approach.
</details>
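
The kind of summary such a quality check provides can be sketched from a per-frame box list: the fraction of frames in which the subject has a box, and the longest run of consecutive missing frames. These two metrics and the function name are illustrative assumptions; check the `DetectedFrames` definition (with `.describe()`) for the fields it actually stores.

```python
def tracking_quality(per_frame_boxes):
    """Hypothetical helper. per_frame_boxes: list with a box (any non-None value)
    or None for each frame. Returns (fraction_of_frames_detected, longest_gap_in_frames)."""
    if not per_frame_boxes:
        return 0.0, 0
    detected = sum(b is not None for b in per_frame_boxes)
    longest_gap = current_gap = 0
    for box in per_frame_boxes:
        current_gap = current_gap + 1 if box is None else 0
        longest_gap = max(longest_gap, current_gap)
    return detected / len(per_frame_boxes), longest_gap


# "b" stands in for a real box; None marks a frame where the subject was not found.
boxes = ["b", None, None, "b", "b", None, "b", "b"]
print(tracking_quality(boxes))   # (0.625, 2)
```

A high detection fraction with one long gap usually points to an occlusion, while many short gaps suggest the detector is struggling throughout the video.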

In [None]:
from pose_pipeline.pipeline import DetectedFrames

DetectedFrames.populate(proj_filt)
DetectedFrames & proj_filt

In [None]:
BottomUpBridgingPerson.populate(proj_filt)

## 5.  Top-Down Methods

<details>
<summary>Click to Expand Explanation</summary>
You may also want to run top-down approaches, for example to test a different method if the tracking performed in BottomUpBridging (Bridging_OpenPose) was not successful enough for your videos.
</details>

### 5.1.  Create Top-Down Method key(s)

In [None]:
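# Choose the top-down method from TopDownMethodLookup (inspect it with TopDownMethodLookup() first).
# 'Bridging_bml_movi_87' is only an example name; this assumes the lookup has a top_down_method_name column.
top_down_method = (TopDownMethodLookup & 'top_down_method_name="Bridging_bml_movi_87"').fetch1('top_down_method')

top_down_keys = (PersonBbox & proj_filt).fetch('KEY')
display(top_down_keys)

for td in top_down_keys:
    td["top_down_method"] = top_down_method
    TopDownMethod.insert1(td, skip_duplicates=True)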

display(TopDownMethod() & proj_filt)

top_down_keys = (TopDownMethod & proj_filt).fetch('KEY')
display(top_down_keys)

### 5.2. Populate the Top-Down Table

In [None]:
TopDownPerson.populate(proj_filt)
TopDownPerson() & proj_filt

## 6. Lifting

<details>
<summary>Click to Expand Explanation</summary>
The goal of lifting is to take your 2D keypoints and estimate them in 3D. Running lifting follows the same general pattern as the previous steps. First, select your lifting method; for this example I will use the Bridging method `Bridging_bml_movi_87`. Create the key(s) for this method by inheriting the keys from TopDownPerson, then add your lifting method to the key(s). From there you can check that you have defined all your PKs and populate LiftingPerson.
</details>
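
Why lifting needs a learned model becomes clear from the pinhole camera model: a 2D pixel fixes only a ray through the camera center, and the depth along that ray has to come from somewhere. Given a depth, back-projection is simple arithmetic. The sketch below assumes a pinhole camera with hypothetical intrinsics (focal lengths and principal point in pixels); it illustrates the geometry and is not how `LiftingPerson` computes 3D keypoints.

```python
def back_project(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at a given depth, in camera coordinates:
    X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth."""
    return (u - cx) * depth / fx, (v - cy) * depth / fy, depth


# Hypothetical intrinsics for a 1920x1080 camera.
fx = fy = 1500.0
cx, cy = 960.0, 540.0

# The same pixel is consistent with many 3D points, one per depth: the 2D input alone
# cannot decide between them, which is the ambiguity a lifting model must resolve.
for z in (2.0, 4.0):
    print(back_project(1260.0, 540.0, z, fx, fy, cx, cy))
# (0.4, 0.0, 2.0)
# (0.8, 0.0, 4.0)
```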

### 6.1.  Create lifting keys

In [None]:
lifting_keys = (TopDownPerson & proj_filt).fetch('KEY')
display(lifting_keys)

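for lk in lifting_keys:
    lk["lifting_method"] = 12                  # Lifting method ID used in this example; check the lifting lookup table for the ID of your chosen method
    LiftingMethod.insert1(lk, skip_duplicates=True)

display(LiftingMethod() & proj_filt)

lifting_keys = (LiftingMethod & proj_filt).fetch('KEY')
display(lifting_keys)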

### 6.2.  Run Lifting

In [None]:
LiftingPerson.populate(proj_filt)
LiftingPerson() & proj_filt

### 6.3.  View Lifting Video

In [None]:
LiftingPersonVideo.populate(proj_filt)
LiftingPersonVideo() & proj_filt

<details>
<summary>Click to Expand Final Summary</summary>
You have now reached the end of the PosePipeline tutorial notebook. You began with raw images or videos that you wanted to analyze. If everything ran correctly, you are left with 2D and 3D keypoints associated with the person you are interested in analyzing, along with the keypoint confidences. If you are familiar with marker-based motion capture, this is similar to reaching the end of data collection, where you are left with the 3D marker positions in space.
</details>