Fixes + Channel Selection for CHiME-7 Task #4934

Merged
merged 39 commits, Feb 14, 2023

Changes from 6 commits

Commits (39)
69fe371
addressing Taejin pointed out issues
popcornell Feb 10, 2023
5bf3c5a
addressing Taejin pointed out issues
popcornell Feb 10, 2023
644e58e
fixed md5sum check on original chime6 script
popcornell Feb 10, 2023
dc770b4
Merge branch 'master' of https://github.com/espnet/espnet
popcornell Feb 10, 2023
445e3ae
adding channel selection
popcornell Feb 11, 2023
b217ed7
revert
popcornell Feb 11, 2023
2518f81
revert
popcornell Feb 11, 2023
721a64a
revert
popcornell Feb 11, 2023
c4a58b2
added skip stages to asr dprep
popcornell Feb 11, 2023
f57c1d5
added flag to generate evaluation
popcornell Feb 11, 2023
447bd9d
addes contain function to data.sh
popcornell Feb 11, 2023
1f106f7
minor changes to run.sh
popcornell Feb 11, 2023
0d84fcb
with pretrained
popcornell Feb 11, 2023
263c36c
data.sh, skipping for decoding only
popcornell Feb 11, 2023
92dedbd
soundfile much faster than torchaudio
popcornell Feb 11, 2023
133dcd6
revised channel selection
popcornell Feb 11, 2023
a594319
applied linters
popcornell Feb 11, 2023
d888a38
applied linters
popcornell Feb 11, 2023
e8dc4d3
added jiwer and conda prefix
popcornell Feb 12, 2023
dd91c90
added dr kamo suggestion
popcornell Feb 12, 2023
17efdb2
changed stage
popcornell Feb 12, 2023
a531c03
better default
popcornell Feb 12, 2023
a412cc4
readme changed instructions
popcornell Feb 12, 2023
d99720d
gss2lhotse changed
popcornell Feb 12, 2023
df99724
Merge branch 'master' into chime7task1
popcornell Feb 12, 2023
ea808d1
prevent exiting on data.sh
popcornell Feb 13, 2023
4180cf7
sox is appended after
popcornell Feb 13, 2023
31292ce
data prep is needed
popcornell Feb 13, 2023
d73ddda
addressed LDC path issues with train calls and mixer6
popcornell Feb 13, 2023
930d388
changed error display
popcornell Feb 13, 2023
ebe8db9
some comments changed
popcornell Feb 13, 2023
270fad9
default is 80% mics channel selection
popcornell Feb 13, 2023
cfbb957
Merge branch 'chime7task1' of https://github.com/popcornell/espnet
popcornell Feb 13, 2023
c1abe1b
applied black
popcornell Feb 13, 2023
f28cca8
applied black
popcornell Feb 13, 2023
e87df34
added registration link to README.md
popcornell Feb 13, 2023
a49870a
added details about evaluation script
popcornell Feb 13, 2023
00308e1
added details about non determinism in GSS inference
popcornell Feb 13, 2023
5179f7a
Merge branch 'master' into chime7task1
popcornell Feb 14, 2023
Empty file modified egs/chime6/asr1/local/distant_audio_list
100644 → 100755
Empty file.
4 changes: 2 additions & 2 deletions egs2/chime7_task1/asr1/local/gen_task1_data.sh
@@ -20,7 +20,7 @@ mixer6_root=
. ./utils/parse_options.sh || exit 1


if ! [ -d chime5_root ]; then
if [ -z "$chime5_root" ]; then
skip_stages="1" # if chime5 undefined skip chime6 generation
fi

@@ -41,7 +41,7 @@ fi

if [ ${stage} -le 1 ] && ! contains $skip_stages 1; then
# from CHiME5 create CHiME6
./generate_chime6_data.sh --cmd "$cmd" \
./local/generate_chime6_data.sh --cmd "$cmd" \
$chime5_root \
$chime6_root
fi
39 changes: 15 additions & 24 deletions egs2/chime7_task1/asr1/local/generate_chime6_data.sh
@@ -56,11 +56,18 @@ sessions1="S01 S02 S03 S04 S05 S06 S07"
sessions2="S08 S09 S12 S13 S16 S17 S18"
sessions3="S19 S20 S21 S22 S23 S24"

CONDA_PATH=${MAIN_ROOT}/tools/venv/bin/sox
CONDA_SOX=$(dirname "$(command -v python)")
if [ ! -x "${CONDA_SOX}/sox" ]; then
    echo "Please run ./local/install_dependencies.sh to install sox via conda"
    exit 1
fi

IN_PATH=${sdir}/audio
OUT_PATH=${odir}/audio
TMP_PATH=${odir}/audio_tmp



if [ ! -d "${IN_PATH}" ]; then
echo "please specify the CHiME-5 data path correctly"
exit 1
@@ -74,49 +81,33 @@ if [ -f ${odir}/audio/dev/S02_P05.wav ]; then
fi

pushd ${SYNC_PATH}
echo "Correct for frame dropping"
for session in ${sessions1}; do
$cmd ${expdir}/correct_signals_for_frame_drops.${session}.log \
${CONDA_PATH}/python correct_signals_for_frame_drops.py --session=${session} chime6_audio_edits.json $IN_PATH $TMP_PATH &
done
wait
for session in ${sessions2}; do
$cmd ${expdir}/correct_signals_for_frame_drops.${session}.log \
${CONDA_PATH}/python correct_signals_for_frame_drops.py --session=${session} chime6_audio_edits.json $IN_PATH $TMP_PATH &
done
wait
for session in ${sessions3}; do
$cmd ${expdir}/correct_signals_for_frame_drops.${session}.log \
${CONDA_PATH}/python correct_signals_for_frame_drops.py --session=${session} chime6_audio_edits.json $IN_PATH $TMP_PATH &
done
wait


echo "Sox processing for correcting clock drift"
for session in ${sessions1}; do
$cmd ${expdir}/correct_signals_for_clock_drift.${session}.log \
${CONDA_PATH}/python correct_signals_for_clock_drift.py --session=${session} --sox_path $CONDA_PATH chime6_audio_edits.json $TMP_PATH $OUT_PATH &
python correct_signals_for_clock_drift.py --session=${session} --sox_path $CONDA_SOX chime6_audio_edits.json $TMP_PATH $OUT_PATH &
done
wait
for session in ${sessions2}; do
$cmd ${expdir}/correct_signals_for_clock_drift.${session}.log \
${CONDA_PATH}/python correct_signals_for_clock_drift.py --session=${session} --sox_path $CONDA_PATH chime6_audio_edits.json $TMP_PATH $OUT_PATH &
python correct_signals_for_clock_drift.py --session=${session} --sox_path $CONDA_SOX chime6_audio_edits.json $TMP_PATH $OUT_PATH &
done
wait
for session in ${sessions3}; do
$cmd ${expdir}/correct_signals_for_clock_drift.${session}.log \
${CONDA_PATH}/python correct_signals_for_clock_drift.py --session=${session} --sox_path $CONDA_PATH chime6_audio_edits.json $TMP_PATH $OUT_PATH &
python correct_signals_for_clock_drift.py --session=${session} --sox_path $CONDA_SOX chime6_audio_edits.json $TMP_PATH $OUT_PATH &
done
wait

echo "adjust the JSON files"
mkdir -p ${odir}/transcriptions/eval ${odir}/transcriptions/dev ${odir}/transcriptions/train
${CONDA_PATH}/python correct_transcript_for_clock_drift.py --clock_drift_data chime6_audio_edits.json ${sdir}/transcriptions ${odir}/transcriptions
python correct_transcript_for_clock_drift.py --clock_drift_data chime6_audio_edits.json ${sdir}/transcriptions ${odir}/transcriptions
popd

# finally check md5sum
pushd ${odir}
echo "check MD5 hash value for generated audios"
md5sum -c ${SYNC_PATH}/audio_md5sums.txt || echo "check https://github.com/chimechallenge/chime6-synchronisation"
popd
sed "s+audio+${odir}/audio+g" ${SYNC_PATH}/audio_md5sums.txt > ${SYNC_PATH}/audio_md5sums_abs.txt
md5sum -c ${SYNC_PATH}/audio_md5sums_abs.txt || echo "check https://github.com/chimechallenge/chime6-synchronisation"
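The `sed` rewrite above turns the relative paths shipped in `audio_md5sums.txt` into absolute ones, so `md5sum -c` can be run without first changing into the output directory. A minimal reproduction of the same pattern (file names are illustrative):

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/audio"
printf 'fake wav payload\n' > "$workdir/audio/S02_U01.CH1.wav"

# checksum manifest with paths relative to the dataset root,
# as the chime6-synchronisation repo ships it
(cd "$workdir" && md5sum audio/S02_U01.CH1.wav > md5sums.txt)

# rewrite "audio/..." into absolute paths, then verify from any directory
sed "s+audio+$workdir/audio+g" "$workdir/md5sums.txt" > "$workdir/md5sums_abs.txt"
md5sum -c "$workdir/md5sums_abs.txt"
```

Using `+` as the `sed` delimiter avoids having to escape the `/` characters in the absolute path.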

echo "`basename $0` Done."
210 changes: 210 additions & 0 deletions egs2/chime7_task1/asr1/local/gss_micrank.py
@@ -0,0 +1,210 @@
import argparse
import os
from copy import deepcopy
from pathlib import Path

import lhotse
import torch
import torchaudio
import tqdm
from torch.utils.data import DataLoader, Dataset


class EnvelopeVariance(torch.nn.Module):
Contributor Author:
Channel selection is based on envelope variance right now.
It is not guaranteed to work well because of overlapped speech.

    """
    Envelope Variance Channel Selection method with
    (optionally) learnable per mel-band weights.
    """

    def __init__(
        self,
        n_mels=40,
        n_fft=400,
        hop_length=200,
        samplerate=16000,
        eps=1e-6,
        chunk_size=4,
        chunk_stride=2,
    ):
        super(EnvelopeVariance, self).__init__()
        self.mels = torchaudio.transforms.MelSpectrogram(
            sample_rate=samplerate,
            n_fft=n_fft,
            hop_length=hop_length,
            n_mels=n_mels,
            power=2,
        )
        self.eps = eps
        self.subband_weights = torch.nn.Parameter(torch.ones(n_mels))
        self.chunk_size = int(chunk_size * samplerate / hop_length)
        self.chunk_stride = int(chunk_stride * samplerate / hop_length)

    def _single_window(self, mels):
        logmels = torch.log(mels + self.eps)
        mels = torch.exp(logmels - torch.mean(logmels, -1, keepdim=True))
        var = torch.var(mels ** (1 / 3), dim=-1)  # channels, subbands
        var = var / torch.amax(var, 1, keepdim=True)
        subband_weights = torch.abs(self.subband_weights)
        ranking = torch.sum(var * subband_weights, -1)
        return ranking

    def _count_chunks(self, inlen, chunk_size, chunk_stride):
        return int((inlen - chunk_size + chunk_stride) / chunk_stride)

    def _get_chunks_indx(self, in_len, chunk_size, chunk_stride, discard_last=False):
        i = -1
        for i in range(self._count_chunks(in_len, chunk_size, chunk_stride)):
            yield i * chunk_stride, i * chunk_stride + chunk_size
        if not discard_last and i * chunk_stride + chunk_size < in_len:
            if in_len - (i + 1) * chunk_stride > 0:
                yield (i + 1) * chunk_stride, in_len

    def forward(self, channels):
        assert channels.ndim == 3
        mels = self.mels(channels)
        if mels.shape[-1] > (self.chunk_size + self.chunk_stride):
            # chunked loop avoids handling padded values in the statistics,
            # and is fast enough in practice
            indxs = self._get_chunks_indx(
                mels.shape[-1], self.chunk_size, self.chunk_stride
            )
            all_win_ranks = [self._single_window(mels[..., s:t]) for s, t in indxs]
            return torch.stack(all_win_ranks).mean(0)
        else:
            return self._single_window(mels)
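The core of the envelope-variance ranking above can be sketched without torch. The following is a hedged NumPy rendition of what `_single_window` computes, assuming uniform band weights; it is an illustration, not the exact module:

```python
import numpy as np


def envelope_variance_scores(specs, eps=1e-6):
    """Score channels by envelope variance; higher = likely cleaner channel.

    specs: (channels, bands, frames) power spectrogram of the same segment
    as captured by each microphone.
    """
    logspec = np.log(specs + eps)
    # normalise out per-band gain using the geometric mean over time
    env = np.exp(logspec - logspec.mean(axis=-1, keepdims=True))
    var = np.var(np.cbrt(env), axis=-1)         # (channels, bands)
    var = var / var.max(axis=0, keepdims=True)  # compare channels per band
    return var.sum(axis=-1)                     # one score per channel


# a channel with a strongly modulated envelope should outrank a flat
# (reverberation-smeared) one
t = np.linspace(0.0, 4.0, 400)
flat = np.ones((5, 400))
mod = 1.0 + 0.9 * np.tile(np.sin(2 * np.pi * 3 * t), (5, 1))
scores = envelope_variance_scores(np.stack([flat, mod]))
```

The intuition is the one the review comment hints at: reverberation and distance smear the temporal envelope, so low envelope variance suggests a worse channel — but overlapped speech also modulates the envelope, which is why the comment hedges.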


class MicRanking(Dataset):
    def __init__(self, recordings, supervisions, ranker, top_k):
        super().__init__()

        self.recordings = recordings
        self.supervisions = supervisions
        self.ranker = ranker
        self.top_k = top_k

    def __len__(self):
        return len(self.supervisions)

    def _get_read_chans(self, c_recordings, start, duration, fs=16000):
        to_tensor = []
        chan_indx = []
        for recording in c_recordings.sources:
            c_wav, _ = torchaudio.load(
                recording.source,
                frame_offset=int(start * fs),
                num_frames=int(duration * fs),
            )
            assert (
                c_wav.shape[0] == 1
            ), "Input audio should be mono for channel selection in this script."
            to_tensor.append(c_wav)
            chan_indx.append(recording.channels[0])

        all_channels = torch.stack(to_tensor).transpose(0, 1)
        return all_channels, chan_indx

    def __getitem__(self, item):
        c_supervision = self.supervisions[item]
        start = c_supervision.start
        duration = c_supervision.duration
        c_recordings = self.recordings[c_supervision.recording_id]
        fs = c_recordings.sampling_rate
        all_channels, chan_indx = self._get_read_chans(
            c_recordings, start, duration, fs
        )

        assert all_channels.ndim == 3
        assert (
            all_channels.shape[0] == 1
        ), "If batch size is more than one here something went wrong."
        with torch.inference_mode():
            c_scores = self.ranker(all_channels)
        c_scores = c_scores[0].numpy().tolist()
        c_scores = list(zip(c_scores, chan_indx))
        # keep the highest-scoring fraction of microphones
        c_scores = sorted(c_scores, key=lambda x: x[0], reverse=True)
        c_scores = c_scores[: int(len(c_scores) * self.top_k)]
        new_sup = deepcopy(c_supervision)
        new_sup.channel = sorted([x[-1] for x in c_scores])
        return new_sup
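The per-segment selection in `__getitem__` boils down to "sort channels by score, keep the best `top_k` fraction, store the kept channel ids sorted". A standalone sketch of that step (pure Python; the function name is mine):

```python
def select_top_fraction(scores, channel_ids, top_k):
    """Keep the best `top_k` fraction (0-1] of channels; higher score = better."""
    ranked = sorted(zip(scores, channel_ids), key=lambda p: p[0], reverse=True)
    n_keep = max(1, int(len(ranked) * top_k))  # always keep at least one mic
    return sorted(ch for _, ch in ranked[:n_keep])


# with 4 mics and top_k=0.5, the two best-scoring channels survive
kept = select_top_fraction([0.2, 0.9, 0.4, 0.7], [0, 1, 2, 3], 0.5)
```

Truncating with `int(...)` rounds the kept count down, so the `max(1, ...)` guard matters when `top_k` is small relative to the number of microphones.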


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        "We use this script to select a subset of microphones to feed to GSS."
    )
    parser.add_argument(
        "-r",
        "--recordings",
        type=str,
        metavar="STR",
        dest="recordings",
        help="Input recordings lhotse manifest",
    )
    parser.add_argument(
        "-s",
        "--supervisions",
        type=str,
        metavar="STR",
        dest="supervisions",
        help="Input supervisions lhotse manifest",
    )
    parser.add_argument(
        "-o",
        "--out_name",
        type=str,
        metavar="STR",
        dest="out_name",
        help="Name and path for the new output manifests with the reduced "
        "channels. E.g. /tmp/chime6_selected --> will create "
        "chime6_selected_recordings.jsonl.gz and chime6_selected_supervisions.jsonl.gz",
    )
    parser.add_argument(
        "-k",
        "--top_k",
        default=25,
        type=int,
        metavar="INT",
        dest="top_k",
        help="Percentage of best microphones to keep "
        "(e.g. 20 -> keep 20%% of all microphones)",
    )
    parser.add_argument(
        "--nj",
        default=8,
        type=int,
        metavar="INT",
        dest="nj",
        help="Number of parallel jobs",
    )
    args = parser.parse_args()

    recordings = lhotse.load_manifest(args.recordings)
    supervisions = lhotse.load_manifest(args.supervisions)
    output_filename = args.out_name
    ranker = EnvelopeVariance(samplerate=recordings[0].sampling_rate)
    single_thread = MicRanking(recordings, supervisions, ranker, args.top_k / 100)
    dataloader = DataLoader(
        single_thread,
        shuffle=False,
        batch_size=1,
        num_workers=args.nj,
        drop_last=False,
        collate_fn=lambda batch: [x for x in batch],
    )

    new_supervisions = []
    for i_batch, elem in enumerate(tqdm.tqdm(dataloader)):
        new_supervisions.extend(elem)

    recording_set, supervision_set = lhotse.fix_manifests(
        lhotse.RecordingSet.from_recordings(recordings),
        lhotse.SupervisionSet.from_segments(new_supervisions),
    )
    lhotse.validate_recordings_and_supervisions(recording_set, supervision_set)

    Path(output_filename).parent.mkdir(exist_ok=True, parents=True)
    filename = Path(output_filename).stem
    supervision_set.to_file(
        os.path.join(Path(output_filename).parent, f"{filename}_supervisions.jsonl.gz")
    )
    recording_set.to_file(
        os.path.join(Path(output_filename).parent, f"{filename}_recordings.jsonl.gz")
    )
4 changes: 2 additions & 2 deletions egs2/chime7_task1/asr1/local/install_dependencies.sh
100644 → 100755
@@ -23,10 +23,10 @@ ${MAIN_ROOT}/tools/installers/install_s3prl.sh

if ! command -v gss &>/dev/null; then
conda install -yc conda-forge cupy=10.2
${MAIN_ROOT}/tools/installers/install_gss.sh.
${MAIN_ROOT}/tools/installers/install_gss.sh
fi

sox_conda=`command -v ../../../tools/venv/bin/sox 2>/dev/null`
sox_conda=`command -v $(dirname $(which python))/sox 2>/dev/null`
Contributor Author:
hopefully this fixes the sox issue.

Collaborator:
FYI, conda sets up some useful shell environment variables: CONDA_PREFIX, CONDA_EXE, etc.

If sox was installed via conda, its path should be ${CONDA_PREFIX}/bin/sox.

Contributor Author:

Many thanks, I was not aware of CONDA_PREFIX. It seems much cleaner to use that.
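As an illustration of the suggestion, a sox lookup that prefers `CONDA_PREFIX` and falls back to the interpreter's bin directory might look like this (a sketch, not the recipe's final code):

```shell
# prefer the conda env prefix when available, else derive it from python's path
if [ -n "${CONDA_PREFIX:-}" ]; then
    candidate="${CONDA_PREFIX}/bin/sox"
else
    candidate="$(dirname "$(command -v python)")/sox"
fi

# only accept the path if an executable sox actually lives there
if [ -x "$candidate" ]; then
    sox_conda="$candidate"
else
    sox_conda=""
fi
echo "sox_conda=${sox_conda}"
```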

Contributor Author:

@kamo-naoyuki I followed your suggestion and added a script plus a JSON file to check the MD5 checksum of each file (https://github.com/espnet/espnet/blob/cfbb957d9c71c5c7aed27a1d4b2b85b62721381a/egs2/chime7_task1/asr1/local/check_data_gen.py), but I also had to add a .json file to this recipe. Is that ok?

if [ -z "${sox_conda}" ]; then
echo "install conda sox (v14.4.2)"
conda install -c conda-forge sox
22 changes: 20 additions & 2 deletions egs2/chime7_task1/asr1/local/run_gss.sh
@@ -13,17 +13,35 @@ cmd=run.pl #if you use gridengine: "queue-freegpu.pl --gpu 1 --mem 8G --config c
max_batch_duration=90 # adjust based on your GPU VRAM, here 40GB
max_segment_length=200
channels=
sel_nj=32
top_k=25
use_selection=0

. ./path.sh
. parse_options.sh

mkdir -p ${exp_dir}/${dset_name}/${dset_part}

if [ $use_selection == 1 ]; then
    echo "Stage 0: Selecting a subset of channels"
    python local/gss_micrank.py -r ${manifests_dir}/${dset_name}/${dset_part}/${dset_name}-mdm_recordings_${dset_part}.jsonl.gz \
        -s ${manifests_dir}/${dset_name}/${dset_part}/${dset_name}-mdm_supervisions_${dset_part}.jsonl.gz \
        -o ${manifests_dir}/${dset_name}/${dset_part}/${dset_name}_${dset_part}_selected \
        -k $top_k \
        --nj $sel_nj

    recordings=${manifests_dir}/${dset_name}/${dset_part}/${dset_name}_${dset_part}_selected_recordings.jsonl.gz
    supervisions=${manifests_dir}/${dset_name}/${dset_part}/${dset_name}_${dset_part}_selected_supervisions.jsonl.gz
else
    recordings=${manifests_dir}/${dset_name}/${dset_part}/${dset_name}-mdm_recordings_${dset_part}.jsonl.gz
    supervisions=${manifests_dir}/${dset_name}/${dset_part}/${dset_name}-mdm_supervisions_${dset_part}.jsonl.gz
fi

if [ $stage -le 1 ] && [ $stop_stage -ge 1 ]; then
echo "Stage 1: Prepare cut set"
lhotse cut simple --force-eager \
-r ${manifests_dir}/${dset_name}/${dset_part}/${dset_name}-mdm_recordings_${dset_part}.jsonl.gz \
-s ${manifests_dir}/${dset_name}/${dset_part}/${dset_name}-mdm_supervisions_${dset_part}.jsonl.gz \
-r $recordings \
-s $supervisions \
${exp_dir}/${dset_name}/${dset_part}/cuts.jsonl.gz
fi

13 changes: 7 additions & 6 deletions egs2/chime7_task1/asr1/run.sh
@@ -126,10 +126,10 @@ if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
exit
fi

if [ ${dset_name} == dipco ]; then
channels=2,5,9,12,16,19,23,26,30,33 # in dipco only using opposite mics on each array, works better
elif [ ${dset_name} == chime6 ] && [ ${dset_part} == dev ]; then # use only outer mics
channels=0,3,4,7,8,11,12,15,16,19
if [ ${dset_part} == dev ]; then # use automatic channel selection on dev
use_selection=1
else
use_selection=0
fi

if [ ${dset_part} == train ]; then
@@ -144,7 +144,8 @@ if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
--nj $ngpu \
--max-segment-length $max_segment_length \
--max-batch-duration $gss_max_batch_dur \
--channels $channels
--channels $channels \
--use-selection $use_selection
log "Guided Source Separation processing for ${dset_name}/${dset_part} was successful !"
done
fi
@@ -167,7 +168,7 @@ if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then

pretrained_affix=
if [ -n "$use_pretrained" ]; then
pretrained_affix+="--skip_data_prep true --skip_train true "
pretrained_affix+="--skip_data_prep false --skip_train true "
pretrained_affix+="--download_model ${use_pretrained}"
fi
