## NTU Deep Learning Week Hackathon - 2024

This notebook goes into depth about the backend services and technologies for TuneIn.

### Music Generation

### Video Generation

ComfyUI workflow: https://www.youtube.com/@Ai_Davos/videos

The idea is to make the generated music potentially more entertaining by adding a visual element, which should also help it spread through social media recommendation algorithms.

To achieve this, we want a dance choreography that matches the generated music, or at least comes close to it. Here, we leverage Stanford's EDGE (https://github.com/Stanford-TML/EDGE), which generates dance motion from an input music track.

Let's test it out:

In [2]:
# Output Directories
output_folder = "custom_music"
motion_folder = "SMPL-to-FBX/motions"

In [4]:
urllist = [
    "https://www.youtube.com/watch?v=nsXwi67WgOo",
    "https://www.youtube.com/watch?v=HCq1OcAEAm0",
]

In [8]:
for url in urllist:
    !youtube-dl --extract-audio --audio-format wav --audio-quality 0 --output "{output_folder}/%(id)s.%(ext)s" "{url}"

[youtube] nsXwi67WgOo: Downloading webpage
[youtube] nsXwi67WgOo: Downloading player c48a9559
[dashsegments] Total fragments: 1
[download] Destination: custom_music/nsXwi67WgOo.webm
[download] 100% of 2.20MiB in 00:00
[ffmpeg] Destination: custom_music/nsXwi67WgOo.wav
Deleting original file custom_music/nsXwi67WgOo.webm (pass -k to keep)
[youtube] HCq1OcAEAm0: Downloading webpage
[youtube] HCq1OcAEAm0: Downloading player 9bb09009
[dashsegments] Total fragments: 1
[download] Destination: custom_music/HCq1OcAEAm0.webm
[download] 100% of 3.56MiB in 00:00
[ffmpeg] Destination: custom_music/HCq1OcAEAm0.wav
Deleting original file custom_music/HCq1OcAEAm0.webm (pass -k to keep)


We use the two songs above as a test to see whether we can effectively generate our dances.

## 2. Generate Dances
After the music is downloaded, run the model on it to generate dances. Stick-figure videos will be saved to `output_folder`, and pickle files of the motions will be saved to `motion_folder`.

In [5]:
!python test.py --music_dir "{output_folder}"/ --save_motions --motion_save_dir "{motion_folder}"

Computing features for input music
Slicing custom_music/HCq1OcAEAm0.wav
Computing features for custom_music/HCq1OcAEAm0.wav
  0%|                                                    | 0/84 [00:00<?, ?it/s]Importing jukebox and associated packages...
Setting up the VQ-VAE...
Loading vqvae in eval mode
Setting up the top prior...
Loading artist IDs from /opt/conda/lib/python3.10/site-packages/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /opt/conda/lib/python3.10/site-packages/jukebox/data/ids/v2_genre_ids.txt
Level:2, Cond downsample:None, Raw to tokens:128, Sample length:1048576
Converting to fp16 params
Loading prior in eval mode
Loading the top prior weights into memory...

  0%|                                                   | 0/872 [00:00<?, ?it/s]
  2%|▊                                        | 16/872 [00:00<00:05, 156.27it/s]
  9%|███▌                                     | 76/872 [00:00<00:03, 199.62it/s]
 16%|██████▏                                 | 136/

This script generates the dance as 3D motion. However, there are two problems. First, the output is 3D, which makes it difficult to use for generating 'realistic' video. Second, it doesn't follow a specific pose format. Thus, we'll convert it into OpenPose format.

In [172]:
import numpy as np

with open('/home/poses.npy', 'rb') as f:
    poses = np.load(f)

Here we'll project the 3D motion down to a front-facing 2D view.
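The front-facing projection can be sketched as follows. This is a minimal sketch, assuming the poses are stored as `(frames, joints, 3)` arrays where index 0 is the sideways axis, index 1 is depth and index 2 is height, and that the dancer stays within roughly 3 units of space. `project_front` and its parameters are hypothetical names; the defaults mirror the constants used in the drawing cell further down.

```python
import numpy as np


def project_front(points_3d, width=512, height=768, extent=3.0, y_margin=100):
    """Orthographic front view: drop the depth axis (index 1), keep x and z.

    Hypothetical helper; the default constants are assumptions taken from
    the drawing code in this notebook.
    """
    points_3d = np.asarray(points_3d, dtype=float)
    # x in [-extent/2, extent/2] is mapped to pixel columns [0, width]
    px = ((points_3d[..., 0] + extent / 2) * width / extent).astype(int)
    # z grows upwards while image rows grow downwards, so flip the axis
    py = y_margin + height - (points_3d[..., 2] * height / extent).astype(int)
    return np.stack([px, py], axis=-1)


# Example with dummy data standing in for the real poses array
dummy_poses = np.zeros((200, 14, 3))
pixels = project_front(dummy_poses)
print(pixels.shape)
```

Because the depth axis is simply discarded, this is an orthographic projection: joints do not shrink as they move away from the camera, which is usually acceptable for pose-conditioning images.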

In [173]:
smpl_parents = [
	-1,
	0,
	0,
	0,
	1,
	2,
	3,
	4,
	5,
	6,
	7,
	8,
	9,
	9,
	9,
	12,
	13,
	14,
	16,
	17,
	18,
	19,
	20,
	21,
]

smpl_offsets = np.array([
	[0.0, 0.0, 0.0],
	[0.05858135, -0.08228004, -0.01766408],
	[-0.06030973, -0.09051332, -0.01354254],
	[0.00443945, 0.12440352, -0.03838522],
	[0.04345142, -0.38646945, 0.008037],
	[-0.04325663, -0.38368791, -0.00484304],
	[0.00448844, 0.1379564, 0.02682033],
	[-0.01479032, -0.42687458, -0.037428],
	[0.01905555, -0.4200455, -0.03456167],
	[-0.00226458, 0.05603239, 0.00285505],
	[0.04105436, -0.06028581, 0.12204243],
	[-0.03483987, -0.06210566, 0.13032329],
	[-0.0133902, 0.21163553, -0.03346758],
	[0.07170245, 0.11399969, -0.01889817],
	[-0.08295366, 0.11247234, -0.02370739],
	[0.01011321, 0.08893734, 0.05040987],
	[0.12292141, 0.04520509, -0.019046],
	[-0.11322832, 0.04685326, -0.00847207],
	[0.2553319, -0.01564902, -0.02294649],
	[-0.26012748, -0.01436928, -0.03126873],
	[0.26570925, 0.01269811, -0.00737473],
	[-0.26910836, 0.00679372, -0.00602676],
	[0.08669055, -0.01063603, -0.01559429],
	[-0.0887537, -0.00865157, -0.01010708],
])

In [224]:
old_joints = np.array(["root",  "lhip",  "rhip",  "belly", "lknee", "rknee", "spine", "lankle","rankle","chest", "ltoes", "rtoes", "neck",  "linshoulder", "rinshoulder", "head", "lshoulder", "rshoulder", "lelbow", "relbow",  "lwrist", "rwrist", "lhand", "rhand"])
new_joints = np.array(['neck', 'lhip', 'rhip', 'lknee', 'rknee', 'ltoes', 'rtoes', 'lshoulder', 'rshoulder', 'lelbow', 'relbow', 'lwrist', 'rwrist', 'head'])

# Step 1: Sort the first array and remember the original indices
sorted_indices = np.argsort(old_joints)
sorted_arr1 = old_joints[sorted_indices]

# Step 2: Find positions of the second array's elements in the sorted first array
positions = np.searchsorted(sorted_arr1, new_joints)

# Step 3: Map these positions back to the original indices
new_idx = sorted_indices[positions]
new_idx = new_idx.tolist()
print(new_idx)

[12, 1, 2, 4, 5, 10, 11, 16, 17, 18, 19, 20, 21, 15]
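The same lookup can also be written with a dictionary. This is a minimal sketch, assuming the joint names are unique; `joint_indices` is a hypothetical helper. Unlike `np.searchsorted`, which returns an insertion position even when a name is absent, this version raises an error on a misspelled or missing joint.

```python
def joint_indices(source_names, target_names):
    """Return, for each target joint name, its index in source_names."""
    index_of = {name: i for i, name in enumerate(source_names)}
    missing = [name for name in target_names if name not in index_of]
    if missing:
        raise KeyError(f"joints not found in source skeleton: {missing}")
    return [index_of[name] for name in target_names]


old_joints = ["root", "lhip", "rhip", "belly", "lknee", "rknee", "spine",
              "lankle", "rankle", "chest", "ltoes", "rtoes", "neck",
              "linshoulder", "rinshoulder", "head", "lshoulder", "rshoulder",
              "lelbow", "relbow", "lwrist", "rwrist", "lhand", "rhand"]
new_joints = ["neck", "lhip", "rhip", "lknee", "rknee", "ltoes", "rtoes",
              "lshoulder", "rshoulder", "lelbow", "relbow", "lwrist",
              "rwrist", "head"]

new_idx = joint_indices(old_joints, new_joints)
print(new_idx)
```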


In [227]:
smpl_joints = [
	'neck', # 0
	'lhip', # 1
	'rhip', # 2
	'lknee', # 3
	'rknee', # 4
	'ltoes', # 5
	'rtoes', # 6
	'lshoulder', # 7
	'rshoulder', # 8
	'lelbow', # 9
	'relbow', # 10
	'lwrist', # 11
	'rwrist', # 12
	'head', # 13
]

smpl_parents = [
	-1,
	0,
	0,
	1,
	2,
	3,
	4,
	0,
	0,
	7,
	8,
	9,
	10,
	0
]
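The parent list above describes a tree rooted at the neck (joint 0). As a sketch of how such a list can be walked, the helpers below build the child lists once and return the joints in depth-first order. `build_children` and `dfs_order` are hypothetical names used only for illustration.

```python
def build_children(parents):
    """Turn a parent-index list into a list of child lists."""
    children = [[] for _ in parents]
    for joint, parent in enumerate(parents):
        if parent >= 0:  # -1 marks the root, which has no parent
            children[parent].append(joint)
    return children


def dfs_order(parents):
    """Return joint indices in depth-first (pre-order) order from the root."""
    children = build_children(parents)
    roots = [joint for joint, parent in enumerate(parents) if parent < 0]
    order, stack = [], list(reversed(roots))
    while stack:
        joint = stack.pop()
        order.append(joint)
        # Push children reversed so the lowest index is visited first
        stack.extend(reversed(children[joint]))
    return order


# Same values as the reduced parent list above
reduced_parents = [-1, 0, 0, 1, 2, 3, 4, 0, 0, 7, 8, 9, 10, 0]
print(dfs_order(reduced_parents))
```

Precomputing the child lists avoids scanning the whole parent array at every joint, although for a 14-joint skeleton the difference is negligible.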

In [282]:
import numpy as np
import cv2
import os

# poses, smpl_offsets, smpl_joints, smpl_parents and new_idx are defined in the cells above

num_joints = len(smpl_joints)  # 14 joints in the reduced skeleton
images_paths = []

new_poses = poses[:, new_idx, :]
new_offset = smpl_offsets[new_idx]

marker = {
	'neck': 'FF5500',
	'lhip': '0055FF',
	'rhip': '00FFAA',
	'lknee': '0000FF',
	'rknee': '00FFFF',
	'ltoes': '5500FF',
	'rtoes': '00AAFF',
	'lshoulder': '55FF00',
	'rshoulder': 'FFAA00',
	'lelbow': '00FF00',
	'relbow': 'FFFF00',
	'lwrist': '00FF55',
	'rwrist': 'AAFF00',
	'head': 'FF0000',
	'leye': 'FF00FF',
	'lear': 'FF0055',
	'reye': 'AA00FF',
	'rear': 'FF00AA'
}

bone = {
	'lhip': '009999',
	'rhip': '009900',
	'lknee': '006699',
	'rknee': '009933',
	'ltoes': '003399',
	'rtoes': '009966',
	'lshoulder': '993300',
	'rshoulder': '990000',
	'lelbow': '669900',
	'relbow': '996600',
	'lwrist': '339900',
	'rwrist': '999900',
	'head': '000099',
	'leye': '990099',
	'lear': '990066',
	'reye': '330099',
	'rear': '660099'
}


# Function to draw the skeleton
def dfs(u, par, num, img):
	if par != -1:
		# Convert color hex to BGR

		line_color = tuple(int(bone[smpl_joints[u]][i:i+2], 16) for i in (4, 2, 0))
		# Draw line for bone
		cv2.line(img,
				 (int((new_offset[par, 0] + new_poses[num, par, 0] + 1.5) * 512 / 3), 100 + 768 - int((new_offset[par, 2] + new_poses[num, par, 2]) * 768 / 3)),
				 (int((new_offset[u, 0] + new_poses[num, u, 0] + 1.5) * 512 / 3), 100 + 768 - int((new_offset[u, 2] + new_poses[num, u, 2]) * 768 / 3)),
				 line_color, 2)
		
	joint_color = tuple(int(marker[smpl_joints[u]][i:i+2], 16) for i in (4, 2, 0))
	# Draw joint
	cv2.circle(img,
				(int((new_offset[u, 0] + new_poses[num, u, 0] + 1.5) * 512 / 3), 100 + 768 - int((new_offset[u, 2] + new_poses[num, u, 2]) * 768 / 3)),
				3, joint_color, -1)

	if smpl_joints[u] == 'head':
		cv2.line(img,
				(int((new_offset[par, 0] + new_poses[num, par, 0] + 1.5) * 512 / 3), 100 + 768 - int((new_offset[par, 2] + new_poses[num, par, 2]) * 768 / 3)),
				(int((new_offset[u, 0] + new_poses[num, u, 0] + 1.5) * 512 / 3), 100 + 768 - int((new_offset[u, 2] + new_poses[num, u, 2]) * 768 / 3)),
				line_color, 2)
		
		face_points = {
			'reye': int((new_offset[u, 0] + new_poses[num, u, 0] + 1.5 - 0.07) * 512 / 3), 
			'rear': int((new_offset[u, 0] + new_poses[num, u, 0] + 1.5 - 0.12) * 512 / 3), 
			'leye': int((new_offset[u, 0] + new_poses[num, u, 0] + 1.5 + 0.07) * 512 / 3), 
			'lear': int((new_offset[u, 0] + new_poses[num, u, 0] + 1.5 + 0.12) * 512 / 3),
		}
		names = ['leye', 'lear', 'reye', 'rear']

		par_y_point = 100 + 768 - int((new_offset[u, 2] + new_poses[num, u, 2]) * 768 / 3)
		y_point = 100 + 768 - int((new_offset[u, 2] + new_poses[num, u, 2] + 0.05) * 768 / 3)

		for name in names:


			joint_color = tuple(int(marker[name][i:i+2], 16) for i in (4, 2, 0))
			line_color = tuple(int(bone[name][i:i+2], 16) for i in (4, 2, 0))
			fp = face_points[name]
			# Draw joint

			if name in ['leye', 'reye']:
				px, py = (int((new_offset[u, 0] + new_poses[num, u, 0] + 1.5) * 512 / 3), 100 + 768 - int((new_offset[u, 2] + new_poses[num, u, 2]) * 768 / 3))
			elif name == 'rear':
				px = face_points['reye']
				py = y_point
			else:
				px = face_points['leye']
				py = y_point

			cv2.line(img,
				(px, py),
				(fp, y_point),
				line_color, 2)

			cv2.circle(img,
					(fp, y_point),
					3, joint_color, -1)

		for d in [-0.07, 0.07]:
			# Eye outline: a small ellipse just above each eye keypoint
			center_x = int((new_offset[u, 0] + new_poses[num, u, 0] + 1.5 - d) * 512 / 3)
			center_y = y_point - 5
			axes_length = (4, 2)  # Major and minor axes lengths

			# Draw the ellipse (eye)
			cv2.ellipse(img, (center_x, center_y), axes_length,
						0,  # Rotation angle
						0,  # Starting angle
						360,  # Ending angle
						(255, 255, 255),  # Color (white)
						1)  # Thickness (-1 for filled)

			# Calculate points around the ellipse for placing circles
			angles = np.linspace(0, 2 * np.pi, 7)[:-1]  # Exclude the last point as it's the same as the first
			circle_radius = 2  # Radius of the circles to be drawn

			for angle in angles:
				# For points on the ellipse: x = a * cos(t), y = b * sin(t), where a and b are the semi-axes lengths
				# Adjusting for the center of the ellipse
				x = center_x + axes_length[0] * np.cos(angle)
				y = center_y + axes_length[1] * np.sin(angle)

				# Draw circles at calculated positions
				cv2.circle(img, (int(x), int(y)), circle_radius, (255, 255, 255), -1)  # Filled white dot

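	# Recurse into the children of u, i.e. the joints whose parent index is u
	vis[u] = True
	idx = np.where(np.asarray(smpl_parents) == u)[0]
	for c in idx:
		if not vis[c]:
			dfs(c, u, num, img)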

# Creating a folder to save the images
image_dir = "/home/Gen/opencv"  # Update to your path
if not os.path.exists(image_dir):
	os.makedirs(image_dir)

for num in range(200):  # Generate images for each num
	# Create an empty black image
	img = np.zeros((768, 512, 3), dtype=np.uint8)

	# Initialize visited array for DFS
	vis = np.zeros(num_joints, dtype=bool)

	# Call DFS to draw the skeleton on the image
	dfs(0, -1, num, img)

	# Save the image
	image_path = f"{image_dir}/{num}.png"
	cv2.imwrite(image_path, img)
	images_paths.append(image_path)

images_paths

['/home/Gen/opencv/0.png',
 '/home/Gen/opencv/1.png',
 '/home/Gen/opencv/2.png',
 '/home/Gen/opencv/3.png',
 '/home/Gen/opencv/4.png',
 '/home/Gen/opencv/5.png',
 '/home/Gen/opencv/6.png',
 '/home/Gen/opencv/7.png',
 '/home/Gen/opencv/8.png',
 '/home/Gen/opencv/9.png',
 '/home/Gen/opencv/10.png',
 '/home/Gen/opencv/11.png',
 '/home/Gen/opencv/12.png',
 '/home/Gen/opencv/13.png',
 '/home/Gen/opencv/14.png',
 '/home/Gen/opencv/15.png',
 '/home/Gen/opencv/16.png',
 '/home/Gen/opencv/17.png',
 '/home/Gen/opencv/18.png',
 '/home/Gen/opencv/19.png',
 '/home/Gen/opencv/20.png',
 '/home/Gen/opencv/21.png',
 '/home/Gen/opencv/22.png',
 '/home/Gen/opencv/23.png',
 '/home/Gen/opencv/24.png',
 '/home/Gen/opencv/25.png',
 '/home/Gen/opencv/26.png',
 '/home/Gen/opencv/27.png',
 '/home/Gen/opencv/28.png',
 '/home/Gen/opencv/29.png',
 '/home/Gen/opencv/30.png',
 '/home/Gen/opencv/31.png',
 '/home/Gen/opencv/32.png',
 '/home/Gen/opencv/33.png',
 '/home/Gen/opencv/34.png',
 '/home/Gen/opencv/35.png',
 '

In [240]:
from PIL import Image

# Create a GIF from the saved images
gif_path = "/home/smpl_animation.gif"
frames = [Image.open(image_path) for image_path in images_paths]
frame_one = frames[0]
# append_images takes the frames after the first one; passing all frames would repeat frame 0
frame_one.save(gif_path, format="GIF", append_images=frames[1:],
               save_all=True, duration=20, loop=0)

gif_path

'/home/smpl_animation.gif'

<img src="smpl_animation.gif">

Once we had the OpenPose frames, we moved over to ComfyUI to build the workflow and expose it as an API.
Specifically, we combine the OpenPose images above with AnimateAnyone and ControlNets.

<img src="comfy.PNG" />

<img src="comfy2.PNG" />

This workflow is then exposed as a REST API endpoint, which we call with the relevant information.
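A minimal sketch of what such a call could look like. It assumes the stock ComfyUI HTTP server on its default local address (`127.0.0.1:8188`) and its `/prompt` route, which accepts a workflow exported in API format. The actual TuneIn endpoint is not shown in this notebook, so the URL, the helper names and the one-node workflow fragment are all assumptions for illustration.

```python
import json
import urllib.request

# Assumption: a local ComfyUI server on its default port
COMFY_URL = "http://127.0.0.1:8188/prompt"


def build_prompt_request(workflow, url=COMFY_URL):
    """Build (but do not send) a POST request that queues a workflow."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


def queue_workflow(workflow, url=COMFY_URL):
    """Send the request and return the decoded JSON response.

    Requires a running ComfyUI server, so it is not called here.
    """
    with urllib.request.urlopen(build_prompt_request(workflow, url)) as resp:
        return json.loads(resp.read())


# Placeholder fragment; a real workflow would be the full API-format export
workflow = {"1": {"class_type": "LoadImage", "inputs": {"image": "0.png"}}}
request = build_prompt_request(workflow)
print(request.get_method(), request.full_url)
```

Separating request construction from sending keeps the payload easy to inspect without a server running.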