# Forced Phoneme Alignment

This notebook includes the code used to collect alignment data for my project.

## Importing Charsiu

The following five cells are taken from the [Charsiu forced alignment demo notebook](https://colab.research.google.com/github/lingjzhu/charsiu/blob/development/charsiu_forced_alignment_demo.ipynb#scrollTo=GmHNb4OxRVD8), which describes how to install and use Charsiu, a forced alignment package written by [Jian Zhu](https://lingjzhu.github.io/).

In [None]:
!pip install torch torchvision torchaudio
!pip install datasets transformers
!pip install g2p_en praatio librosa

In [None]:
import os
from os.path import exists, join, expanduser

os.chdir(expanduser("~"))
charsiu_dir = 'charsiu'
# Remove any stale checkout in the home directory so the clone starts clean
if exists(charsiu_dir):
  !rm -rf $charsiu_dir
if not exists(charsiu_dir):
  ! git clone -b development https://github.com/lingjzhu/$charsiu_dir
  ! cd charsiu && git checkout && cd -

os.chdir(charsiu_dir)

In [None]:
import sys
import torch
from datasets import load_dataset
import matplotlib.pyplot as plt
sys.path.insert(0,'src')

In [None]:
from google.colab import drive
drive.mount('/content/drive/')

In [None]:
from Charsiu import charsiu_forced_aligner, charsiu_attention_aligner
charsiu = charsiu_forced_aligner(aligner='charsiu/en_w2v2_fc_10ms')

## Setup

The code needs both the text and the audio filename corresponding to each recording.

In [None]:
from tqdm import tqdm
import json

In [None]:
# Prepare texts
TEXTS = [
    "Petey's mom got a CD record of 'Everlong' by Foo Fighters.",
    "Jude said goodbye to his pet goose when he moved to Tuscon, Arizona.",
    "I went to watch a movie instead, and I met my friend there.",
    "My dog thought it was odd that he never got to eat human food.",
    "Who bought my seat at the 3D movie theater?",
    "My job schedule is full – can Bob do Tuesday at the beach instead?"
]

# Prepare audio titles
AUDIOS_1 = ["petey1", "jude1", "movie1", "dog1", "who1", "job1"]
AUDIOS_2 = ["petey2", "jude2", "movie2", "dog2", "who2", "job2"]
AUDIOS_3 = ["petey3", "jude3", "movie3", "dog3", "who3", "job3"]
AUDIOS_4 = ["petey4", "jude4", "movie4", "dog4", "who4", "job4"]

# Prepare audio filenames
FILES_1 = [f"audios/{audio}.wav" for audio in AUDIOS_1]
FILES_2 = [f"audios/{audio}.wav" for audio in AUDIOS_2]
FILES_3 = [f"audios/{audio}.wav" for audio in AUDIOS_3]
FILES_4 = [f"audios/{audio}.wav" for audio in AUDIOS_4]

# Prepare testing conditions
CONDITIONS = [FILES_1, FILES_2, FILES_3, FILES_4]

# Prepare vowels
VOWELS = ["i", "e", "a", "u"]

# Prepare directory name
DIRNAME = "/content/drive/MyDrive/Colab Notebooks/"
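
Because alignment reads every file from Drive, it can help to confirm that the paths resolve before starting a long run. The following is a minimal sketch, not part of the original workflow; `find_missing_files` is a hypothetical helper name, and it assumes the filenames are relative to the directory it is given, as `FILES_1` through `FILES_4` are relative to `DIRNAME`.

```python
import os


def find_missing_files(dirname, conditions):
    """Return the full paths listed in `conditions` that are not files under `dirname`.

    `conditions` is a list of lists of relative paths, shaped like CONDITIONS.
    This helper is illustrative and is not part of Charsiu.
    """
    missing = []
    for files in conditions:
        for relative_path in files:
            full_path = os.path.join(dirname, relative_path)
            if not os.path.isfile(full_path):
                missing.append(full_path)
    return missing
```

With the names defined in this notebook, `find_missing_files(DIRNAME, CONDITIONS)` should return an empty list once Drive is mounted and all 24 recordings are in place.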

## Alignment

The following cells perform the alignment. For each condition and each sentence, `get_alignment` computes the forced phoneme alignment, and the results are saved to a JSON file (`alignments.json`), with each alignment labeled "c{condition_id}s{sentence_id}". The ids in the labels are zero-based, so "c0s0" is the first sentence in the first condition.
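
The labeling scheme can be sketched as a pair of small helpers that build a label and recover the ids from it. These functions are hypothetical and are not used by the cells in this notebook; they only illustrate the "c{condition_id}s{sentence_id}" format.

```python
import re

# Matches labels such as "c0s0" or "c3s5"
LABEL_PATTERN = re.compile(r"c(\d+)s(\d+)")


def make_label(c_id, s_id):
    """Build a label from a condition id and a sentence id."""
    return f"c{c_id}s{s_id}"


def parse_label(label):
    """Return (condition_id, sentence_id) as integers, or raise ValueError."""
    match = LABEL_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"Unrecognized alignment label: {label!r}")
    return int(match.group(1)), int(match.group(2))
```

Parsing the label back into integers is convenient later when the alignments need to be grouped by condition or by sentence.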

In [None]:
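def get_alignment(condition, sentence):
  # condition and sentence are 1-based; convert them to 0-based list indices
  c_id = condition - 1
  files = CONDITIONS[c_id]

  s_id = sentence - 1
  text = TEXTS[s_id]

  audio = DIRNAME + files[s_id]
  alignment = charsiu.align(audio=audio, text=text)
  # Keep only the first element of the result, the phoneme alignment
  return alignment[0]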

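def get_all_alignments():
  # Align every sentence in every condition, keyed by a zero-based label
  data = {}
  for c_id in tqdm(range(len(CONDITIONS))):
    for s_id in range(len(TEXTS)):
      label = f"c{c_id}s{s_id}"
      # get_alignment expects 1-based ids
      alignment = get_alignment(c_id + 1, s_id + 1)
      data[label] = alignment
      # Report only after the alignment has actually finished
      print(label, "done")
  return data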


In [None]:
data = get_all_alignments()

In [None]:
with open(DIRNAME + "alignments.json", "w") as outfile:
    json.dump(data, outfile)
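
Once saved, the alignments can be read back and summarized. The sketch below assumes each alignment is a sequence of (start, end, phone) entries with times in seconds; that layout is an assumption about the aligner's output, not something this notebook verifies. The sample values are invented for illustration and are not real results, and `phone_durations` is a hypothetical helper.

```python
import json


def phone_durations(alignment):
    """Return (phone, duration) pairs from entries laid out as [start, end, phone]."""
    return [(phone, round(end - start, 3)) for start, end, phone in alignment]


# Invented sample in the assumed layout; real data would come from alignments.json
sample = {"c0s0": [[0.0, 0.12, "[SIL]"], [0.12, 0.2, "P"], [0.2, 0.31, "IY"]]}

# JSON stores tuples as lists, so a round trip mirrors what loading the file returns
loaded = json.loads(json.dumps(sample))
durations = phone_durations(loaded["c0s0"])
```

To use the real file, the round trip would be replaced by `json.load` on `DIRNAME + "alignments.json"`.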