Fix errors in scripts for dataset creation
PanosAntoniadis committed May 25, 2019
1 parent 568348c commit b0e04fc
Showing 3 changed files with 29 additions and 4 deletions.
21 changes: 21 additions & 0 deletions data/scripts/createIds.py
@@ -0,0 +1,21 @@
# A Python script that creates the 'test.fileids' file
# for a given dataset. This file contains the id of
# each file in the dataset, in the form dataset_{id}.

import sys

f = open("test.fileids", 'w')

name = raw_input("Give dataset: ")
if (name != "radio" and name != "paramythi"):
    sys.exit("Wrong dataset (should be either radio or paramythi)")
name = name.strip('\n') + "_"

n = int(input("Give number of examples: "))

for i in range(n):
    if name == "radio_":
        f.write(name + format(i, '02d') + '\n')
    if name == "paramythi_":
        f.write("Paramythi_horis_onoma_" + format(i, '04d') + '\n')
f.close()
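
The naming scheme used by createIds.py can be sketched as a small, non-interactive function. This is a minimal illustration, not the committed script: the function names `make_file_ids` and `write_file_ids` are hypothetical, and the prefixes and zero-padding widths are copied from the code above.

```python
def make_file_ids(dataset, n):
    """Return n file ids for `dataset`, using the naming from createIds.py."""
    if dataset == "radio":
        # radio ids are zero-padded to two digits: radio_00, radio_01, ...
        return ["radio_" + format(i, "02d") for i in range(n)]
    if dataset == "paramythi":
        # paramythi ids use a longer prefix and four-digit padding
        return ["Paramythi_horis_onoma_" + format(i, "04d") for i in range(n)]
    raise ValueError("Wrong dataset (should be either radio or paramythi)")


def write_file_ids(path, ids):
    """Write one id per line; the file is closed even if a write fails."""
    with open(path, "w") as f:
        for file_id in ids:
            f.write(file_id + "\n")
```

For example, `make_file_ids("radio", 3)` returns `["radio_00", "radio_01", "radio_02"]`. Validating the dataset name before opening the output file avoids leaving an empty 'test.fileids' behind on bad input.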
7 changes: 3 additions & 4 deletions data/scripts/createText.py
@@ -1,7 +1,6 @@
#!/bin/bash

# A script that reads test.transcriptions file and write each
# sentence in 'train-text.txt' file line by line.
# A Python script that reads the 'test.transcriptions' file and
# writes each sentence to the 'train-text.txt' file, line by line.
# Necessary to build a language model from transcriptions.

# Open read file
r = open("../../test.transcriptions", 'r')
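
The remainder of createText.py is collapsed in this diff. As a rough sketch of the idea in its header comment, the snippet below turns transcription lines into plain sentences, one per line. It assumes each transcription line looks like `<s> some words </s> (file_id)`, which is a common layout but is not confirmed by the diff; the function names `extract_sentence` and `transcriptions_to_text` are hypothetical.

```python
import re

# Assumed layout of one transcription line: "<s> some words </s> (file_id)"
_UTT_ID = re.compile(r"\s*\([^()]*\)\s*$")


def extract_sentence(line):
    """Strip the trailing "(file_id)" and the <s>/</s> markers from a line."""
    text = _UTT_ID.sub("", line.strip())
    text = text.replace("<s>", "").replace("</s>", "")
    # Collapse any leftover runs of whitespace into single spaces
    return " ".join(text.split())


def transcriptions_to_text(src_path, dst_path):
    """Write one cleaned sentence per line, skipping empty lines."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            sentence = extract_sentence(line)
            if sentence:
                dst.write(sentence + "\n")
```

Whether the sentence markers should be kept or removed depends on the language-model toolkit, so the stripping step here is an assumption as well.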
5 changes: 5 additions & 0 deletions data/scripts/merge.sh
@@ -1,5 +1,10 @@
#!/bin/bash

# A bash script that takes two language models
# as input and merges them using SRILM.
# Usage: ./merge.sh <language model 1> <language model 2>

# Check arguments
if [ ! -f "$1" ]; then
    echo "First .lm model does not exist"
    exit 1
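
The rest of merge.sh is collapsed in this diff, so the exact SRILM call it makes is not visible. The sketch below shows one plausible shape of the same two steps (check that both models exist, then interpolate them), written in Python for consistency with the other scripts. It assumes SRILM's `ngram` tool with its `-lm`, `-mix-lm`, `-lambda` and `-write-lm` options; the function name `build_merge_command`, the output path and the 0.5 weight are illustrative assumptions, not taken from the commit.

```python
import os


def build_merge_command(lm1, lm2, out_lm, weight=0.5):
    """Validate both model paths and return an SRILM `ngram` command line.

    Assumption: the merge is a linear interpolation, with `weight` applied
    to the first model. The committed script may use different options.
    """
    for label, path in (("First", lm1), ("Second", lm2)):
        if not os.path.isfile(path):
            raise FileNotFoundError(label + " .lm model does not exist: " + path)
    return [
        "ngram",
        "-lm", lm1,               # main language model
        "-mix-lm", lm2,           # model to interpolate with
        "-lambda", str(weight),   # interpolation weight of the main model
        "-write-lm", out_lm,      # where to write the merged model
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where SRILM is installed. Raising an error on a missing model, rather than exiting with status 0, lets a calling script detect the failure.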