Commit

[egs] Add missing file local/join_suffix.py in TEDLIUM s5_r3; thx:anand@sayint.ai (#2741)
huangruizhe authored and danpovey committed Sep 26, 2018
1 parent 1d079fa commit f1f9a48
Showing 1 changed file with 26 additions and 0 deletions.
26 changes: 26 additions & 0 deletions egs/tedlium/s5_r3/local/join_suffix.py
@@ -0,0 +1,26 @@
```python
#!/usr/bin/env python
#
# Copyright 2014 Nickolay V. Shmyrev
# 2016 Johns Hopkins University (author: Daniel Povey)
# Apache 2.0


import sys
from codecs import open

# This script joins together pairs of split-up words like "you 're" -> "you're".
# The TEDLIUM transcripts are normalized in a way that's not traditional for
# speech recognition.

for line in sys.stdin:
    items = line.split()
    new_items = []
    i = 1
    while i < len(items):
        if i < len(items) - 1 and items[i+1][0] == '\'':
            new_items.append(items[i] + items[i+1])
            i = i + 1
        else:
            new_items.append(items[i])
        i = i + 1
    print(items[0] + ' ' + ' '.join(new_items))
```
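The loop starts at index 1 because, in a Kaldi `text` file, the first field of each line is the utterance ID, which is copied through unchanged. When the next token begins with an apostrophe, the script appends the merged word and advances the index twice (once inside the `if`, once at the end of the loop body), so the consumed suffix is not emitted again. The following is a minimal Python 3 sketch of the same logic, refactored into hypothetical helpers `join_suffix` and `join_suffix_lines` (these names are not part of Kaldi) so the behavior can be checked without piping data through stdin. It assumes, as the script does, that the first field is the utterance ID. One deliberate difference: it passes empty lines through, whereas the committed script would raise `IndexError` on `items[0]`.

```python
def join_suffix(tokens):
    """Merge each token that starts with an apostrophe into the token before it.

    tokens[0] is assumed to be the Kaldi utterance ID; it is copied as-is and
    never merged with the word after it.
    """
    if not tokens:
        # Hypothetical choice: empty input yields empty output instead of an error.
        return []
    joined = [tokens[0]]
    i = 1
    while i < len(tokens):
        # Look ahead: if the next token is a suffix such as 're or 's,
        # glue it onto the current word and skip past both tokens.
        if i + 1 < len(tokens) and tokens[i + 1].startswith("'"):
            joined.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            joined.append(tokens[i])
            i += 1
    return joined


def join_suffix_lines(lines):
    """Yield each transcript line with apostrophe suffixes re-attached."""
    for line in lines:
        yield " ".join(join_suffix(line.split()))
```

To use the sketch as a filter like the original script, one could `import sys` and run `for out in join_suffix_lines(sys.stdin): print(out)`. Note that a word which itself begins with an apostrophe directly after the utterance ID (for example `utt3 'cause it`) is left as a separate token, exactly as in the committed loop.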