## OpenPose Installation

* https://github.com/huggingface/blog/blob/main/controlnet.md
* https://github.com/lllyasviel/ControlNet
* https://blog.etereo.io/detecting-poses-with-openpose-in-google-colab-d591dc8d8609


In [None]:
%pip install opencv-contrib-python librosa numpy matplotlib
%pip install --upgrade diffusers[torch]==0.27.2
%pip install transformers scipy ftfy accelerate
%pip install peft
%pip install ipywidgets
%pip install controlnet_aux  # provides OpenposeDetector, imported below

In [None]:
%pip install mediapipe

# Pose Detection with OpenPose

This notebook uses the open-source project [CMU-Perceptual-Computing-Lab/openpose](https://github.com/CMU-Perceptual-Computing-Lab/openpose.git) to detect and track multi-person poses in a video from your Google Drive.

Updated by @dinatih, based on https://colab.research.google.com/github/tugstugi/dl-colab-notebooks/blob/master/notebooks/OpenPose.ipynb

## Choose a video from your Google Drive

## Import libraries and instantiate model


In [2]:
from diffusers.utils import load_image
from PIL import Image
import cv2
import numpy as np
import librosa
import os
from controlnet_aux import OpenposeDetector

model = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")


# Testing

In [9]:
!pwd

/Users/mapisangut/Documents/UPC/project/Generative Art Project/colab


In [25]:
import os

project_path = "/Users/mapisangut/Documents/UPC/project/Generative Art Project"
video_path = f"{project_path}/colab/videos"
videos = [f"{video_path}/video{i}.mp4" for i in range(207)]

# Create the per-video 'frames' and 'images' directories if they don't exist
for i in range(207):
    for subdir in ("frames", "images"):
        # makedirs also creates the parent video{i} directory when needed
        os.makedirs(f"{video_path}/video{i}/{subdir}", exist_ok=True)
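
The hard-coded `range(207)` above assumes every file from `video0.mp4` to `video206.mp4` exists. As a minimal sketch, assuming the same `video<N>.mp4` naming, the list could instead be built from the files actually present in the directory. The helper name `find_videos` is hypothetical and not part of the original notebook.

```python
import re
from pathlib import Path


def find_videos(video_dir):
    """Return paths of files named video<N>.mp4 in video_dir, sorted by N.

    Hypothetical helper: assumes the video<N>.mp4 naming used in this notebook.
    """
    pattern = re.compile(r"video(\d+)\.mp4")
    found = []
    for path in Path(video_dir).iterdir():
        match = pattern.fullmatch(path.name)
        # Skip the per-video output directories and any unrelated files
        if match and path.is_file():
            found.append((int(match.group(1)), str(path)))
    # Sort numerically so video10.mp4 comes after video9.mp4
    return [path for _, path in sorted(found)]
```

Calling `find_videos(video_path)` would return only the videos that exist, so the extraction loop never opens a missing file.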


In [26]:
# Loop through each video file
for i, video_file in enumerate(videos):
    # Load the video
    cap = cv2.VideoCapture(video_file)

    # Initialize frame and spectrogram lists
    frames = []
    frames_path = []
    poses = []
    spectrograms = []

    # Get the total duration of the video
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    fps = cap.get(cv2.CAP_PROP_FPS)
    total_duration = total_frames / fps

    # Extract frames and spectrograms
    frame_time = 0  # initialize frame time to 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break

        # Resize frame to 256x256
        resized_frame = cv2.resize(frame, (256, 256))

        print(f"VIDEO {i} - frame resized")
        # Extract audio and convert to spectrogram
        # Load a short segment of audio centered around the current frame

        # Pad the audio appropriately for the first and last frames
        if frame_time < 0.5:
            # For the first frames, load audio from the start and pad the beginning
            padding_duration = 0.5 - frame_time
            y, sr = librosa.load(video_file, sr=None, offset=0, duration=frame_time + 0.5)
            y_padded = np.pad(y, (int(sr * padding_duration), 0), 'constant')

        elif frame_time > total_duration - 0.5:
            # For the last frames, load audio from the end and pad the end
            padding_duration = 0.5 - (total_duration - frame_time)
            audio_offset = frame_time - 0.5
            y, sr = librosa.load(video_file, sr=None, offset=audio_offset, duration= 1- padding_duration)
            y_padded = np.pad(y, (0, int(sr * padding_duration)), 'constant')

        else:
            # For all other frames, load 1 second of audio as before
            y, sr = librosa.load(video_file, sr=None, offset=frame_time - 0.5, duration=1)
            y_padded = y

        win_length = 256  # window length in samples
        hop_length = 64  # hop length in samples
        D = librosa.amplitude_to_db(np.abs(librosa.stft(y_padded, win_length=win_length, hop_length=hop_length)), ref=np.max)

        # Resize spectrogram to 256x256
        D = cv2.resize(D, (256, 256))

        spectrograms.append(D)

        # Save frame and spectrogram to the 'frames' directory using the frame time as the filename
        frame_time_rounded = '{:.3f}'.format(frame_time)
        # Strip only the extension; split('.') would cut at the first dot anywhere in the path
        filename = os.path.splitext(video_file)[0]


        ## Save poses and jpg frames
        frame_filename_jpg = f"{filename}/images/{frame_time_rounded}_frame.jpg"
        cv2.imwrite(frame_filename_jpg, frame)
        image = load_image(frame_filename_jpg)
        print(f"VIDEO {i} - extracting pose")

        pose = model(image)
        pose_filename_jpg = f"{filename}/images/{frame_time_rounded}_pose.jpg"
        pose.save(pose_filename_jpg)

        pose_image = cv2.imread(pose_filename_jpg)
        pose_image = cv2.resize(pose_image, (256, 256))
        pose_filename_npy = f"{filename}/frames/{frame_time_rounded}_pose.npy"

        np.save(pose_filename_npy, pose_image)
        print(f"VIDEO {i} - pose saved")


        ## Save frames with npy extension
        frame_filename_npy = f"{filename}/frames/{frame_time_rounded}_frame.npy"
        spectrogram_filename_npy = f"{filename}/frames/{frame_time_rounded}_spectrogram.npy"

        np.save(frame_filename_npy, resized_frame)
        np.save(spectrogram_filename_npy, spectrograms[-1])
        print(f"VIDEO {i} - spectrogram extracted")

        frames.append(resized_frame)
        frame_time += 1 / fps  # increment frame time by the duration of one frame

    cap.release()

VIDEO 0 - frame resized
VIDEO 0 - extracting pose


  y, sr = librosa.load(video_file, sr=None, offset=0, duration=frame_time + 0.5)


VIDEO 0 - pose saved
VIDEO 0 - spectrogram extracted

KeyboardInterrupt: 
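
The audio handling in the extraction loop can be summarized with two small helpers. This is a sketch, not the notebook's own code: `audio_window` and `stft_shape` are hypothetical names. The first reproduces the three padding branches (start, middle, end of the video) in one formula. The second gives the spectrogram shape before it is resized to 256x256, relying on the fact that `librosa.stft` defaults to `n_fft=2048` and `center=True` when only `win_length` and `hop_length` are passed, as in the loop.

```python
def audio_window(frame_time, total_duration, half_width=0.5):
    """Describe the audio window centered on frame_time.

    Returns (offset, duration, pad_before, pad_after), all in seconds.
    offset and duration are the values to pass to librosa.load; the pads are
    the amounts of silence needed so the window spans 2 * half_width seconds.
    """
    start = frame_time - half_width
    end = frame_time + half_width
    pad_before = max(0.0, -start)                # window starts before the audio
    pad_after = max(0.0, end - total_duration)   # window runs past the audio
    offset = max(0.0, start)
    duration = max(0.0, min(end, total_duration) - offset)
    return offset, duration, pad_before, pad_after


def stft_shape(n_samples, n_fft=2048, hop_length=64):
    """Shape (frequency bins, time frames) of librosa.stft with center=True."""
    return 1 + n_fft // 2, 1 + n_samples // hop_length
```

For example, assuming a 44.1 kHz audio track (the notebook loads at the native rate with `sr=None`, so the actual value depends on the video), a one-second window has 44100 samples and `stft_shape(44100)` gives 1025 frequency bins by 690 time frames. The 256-sample window is zero-padded to `n_fft`, so the number of bins is set by `n_fft`, not by `win_length`.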