segment_long_utterances.sh failing on decode_segmentation #1629
Comments
This looks like an error in the script, not a user error; hopefully Vimal will fix it today. |
@nizmagu Can you check if this solves the problem? |
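This solves the decode problem, but a new problem came up. The script crashed at stage 9: run.pl reported 6 / 6 failed jobs for retrieve_similar_docs. I tried to change --source-text2tfidf-file to --source-text-id2tfidf, and after a further change to args.source_tfidf the script crashed again, this time in get_ctm_edits. The error messages and log files are quoted in full in the reply below. |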
I'll create a pull request soon.
On Mon, May 22, 2017, 07:18 nizmagu wrote:
This solves the decode problem, however a new problem came up.
The script crashed at stage 9 with the following error:
run.pl: 6 / 6 failed, log is in exp/segment_train_long/lats/log/retrieve_similar_docs.*.log
Here is a sample log file:
# steps/cleanup/internal/retrieve_similar_docs.py --query-tfidf=exp/segment_train_long/query_docs/split6/query_tf_idf.1.ark.txt --source-text2tfidf-file=exp/segment_train_long/docs/source2tf_idf.scp --source-text-id2doc-ids=exp/segment_train_long/docs/text2doc --query-id2source-text-id=exp/segment_train_long/new2orig_utt --num-neighbors-to-search=1 --neighbor-tfidf-threshold=0.5 --relevant-docs=exp/segment_train_long/query_docs/split6/relevant_docs.1.txt
# Started at Mon May 22 14:07:50 IDT 2017
#
usage: retrieve_similar_docs.py [-h] [--verbose {0,1,2,3}]
[--num-neighbors-to-search NUM_NEIGHBORS_TO_SEARCH]
[--neighbor-tfidf-threshold NEIGHBOR_TFIDF_THRESHOLD]
[--partial-doc-fraction PARTIAL_DOC_FRACTION]
--source-text-id2doc-ids
SOURCE_TEXT_ID2DOC_IDS
--query-id2source-text-id
QUERY_ID2SOURCE_TEXT_ID --source-text-id2tfidf
SOURCE_TEXT_ID2TFIDF --query-tfidf QUERY_TFIDF
--relevant-docs RELEVANT_DOCS
retrieve_similar_docs.py: error: argument --source-text-id2tfidf is required
# Accounting: time=0 threads=1
# Ended (code 2) at Mon May 22 14:07:50 IDT 2017, elapsed time 0 seconds
I tried to change --source-text2tfidf-file to --source-text-id2tfidf and
this was the result:
# steps/cleanup/internal/retrieve_similar_docs.py --query-tfidf=exp/segment_train_long/query_docs/split6/query_tf_idf.1.ark.txt --source-text-id2tfidf=exp/segment_train_long/docs/source2tf_idf.scp --source-text-id2doc-ids=exp/segment_train_long/docs/text2doc --query-id2source-text-id=exp/segment_train_long/new2orig_utt --num-neighbors-to-search=1 --neighbor-tfidf-threshold=0.5 --relevant-docs=exp/segment_train_long/query_docs/split6/relevant_docs.1.txt
# Started at Mon May 22 14:11:06 IDT 2017
#
2017-05-22 14:11:06,790 [retrieve_similar_docs.py:336 - run - INFO ] Retrieved similar documents for 0 queries
Traceback (most recent call last):
File "steps/cleanup/internal/retrieve_similar_docs.py", line 353, in <module>
main()
File "steps/cleanup/internal/retrieve_similar_docs.py", line 348, in main
args.relevant_docs, args.query_tfidf, args.source_tfidf]:
AttributeError: 'Namespace' object has no attribute 'source_tfidf'
# Accounting: time=0 threads=1
# Ended (code 1) at Mon May 22 14:11:06 IDT 2017, elapsed time 0 seconds
args.source_tfidf seemed to only be referenced once (in the closing
command), so I changed it again to args.source_text_id2tfidf.
Then the script crashed again:
run.pl: 6 / 6 failed, log is in exp/segment_train_long/lats/log/get_ctm_edits.*.log
Here is the log file:
# steps/cleanup/internal/stitch_documents.py --query2docs=exp/segment_train_long/query_docs/split6/relevant_docs.1.txt --input-documents=exp/segment_train_long/docs/split6/1/docs.txt --output-documents=- | steps/cleanup/internal/align_ctm_ref.py --eps-symbol="<eps>" --oov-word='<UNK>' --symbol-table=data/lang/words.txt --hyp-format=CTM --align-full-hyp=false --hyp=exp/segment_train_long/lats/score_10/train_long_uniform_seg.ctm.1 --ref=- --output=exp/segment_train_long/lats/score_10/train_long_uniform_seg.ctm_edits.1
# Started at Mon May 22 14:15:01 IDT 2017
#
Traceback (most recent call last):
File "steps/cleanup/internal/align_ctm_ref.py", line 615, in <module>
main()
File "steps/cleanup/internal/align_ctm_ref.py", line 598, in main
args = get_args()
File "steps/cleanup/internal/align_ctm_ref.py", line 103, in get_args
"--reco2file-and-channel must be provided for "
RuntimeError: --reco2file-and-channel must be provided for hyp-format=CTM
usage: stitch_documents.py [-h] --query2docs QUERY2DOCS --input-documents
INPUT_DOCUMENTS --output-documents OUTPUT_DOCUMENTS
[--check-sorted-docs-per-query {true,false}]
stitch_documents.py: error: argument --input-documents: can't open 'exp/segment_train_long/docs/split6/1/docs.txt': [Errno 2] No such file or directory: 'exp/segment_train_long/docs/split6/1/docs.txt'
# Accounting: time=0 threads=1
# Ended (code 1) at Mon May 22 14:15:01 IDT 2017, elapsed time 0 seconds
--
Vimal Manohar
PhD Student
Electrical & Computer Engineering
Johns Hopkins University
|
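The two retrieve_similar_docs.py failures quoted above both come down to how argparse names things: a required option that is not passed makes the parser exit with status 2 (the "Ended (code 2)" in the first log), and the attribute on the parsed namespace is derived from the long option name, so --source-text-id2tfidf becomes args.source_text_id2tfidf, not args.source_tfidf. The sketch below reproduces that behaviour in isolation. The parser is a hypothetical, trimmed-down stand-in, not the actual Kaldi script; only the option name and the path are taken from the logs.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the real parser: only the one option
    # discussed in the thread is modelled here.
    parser = argparse.ArgumentParser(prog="retrieve_similar_docs_sketch")
    parser.add_argument("--source-text-id2tfidf", required=True,
                        help="scp file mapping source text ids to TF-IDF files")
    return parser


args = build_parser().parse_args(
    ["--source-text-id2tfidf=exp/segment_train_long/docs/source2tf_idf.scp"])

# argparse drops the leading dashes and turns the remaining dashes into
# underscores to form the attribute name.
print(args.source_text_id2tfidf)

# The attribute name used in the failing line of the traceback is never
# created, which is why accessing it raises AttributeError.
print(hasattr(args, "source_tfidf"))
```

Calling build_parser().parse_args([]) without the option prints the usage message and raises SystemExit with code 2, which is consistent with the first log.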
I fixed some issues in #1639 |
That fixed the issue, thanks a lot! |
This issue may still exist for the |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
I was trying to use segment_long_utterances.sh on six 5-hour-long files.
Upon reaching stage 4, I get the following message:
When inspecting the log files, decode_segmentation gave this error:
I tried to use validate_data_dir.sh, and it says the files are in sorted order.
I believed it may have something to do with locale, so I used export LANG= and export LC_ALL=C and checked with sort -c, to no avail.
How can I fix this issue?
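For readers who want to repeat the sort-order check described above, here is a minimal sketch in Python. It is only a rough analogue of running sort -c under LC_ALL=C on the first field of each line; the function name and the sample lines are illustrative assumptions, not part of Kaldi. Note that the replies in this thread attribute the failure to a bug in the script, not to sort order, so this check passing is the expected outcome.

```python
def first_unsorted_key(lines):
    """Return the first key that breaks byte-wise (C locale) order, or None.

    The key is the first whitespace-separated field of each line, as in
    Kaldi data files such as utt2spk or text.
    """
    previous = None
    for line in lines:
        fields = line.split(None, 1)
        if not fields:
            continue  # skip blank lines
        key = fields[0].encode("utf-8")  # compare raw bytes, like LC_ALL=C
        if previous is not None and key < previous:
            return key.decode("utf-8")
        previous = key
    return None


# Uppercase letters sort before lowercase ones in byte order, so this
# sequence is out of order under LC_ALL=C even though many other locales
# would accept it.
print(first_unsorted_key(["B x", "a y", "C z"]))
```

To check a real file, the lines could be read with open(path, encoding="utf-8") and passed to the same function.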