some files not found, when execute auto_phrase.sh #2

CenturySee · 2016-12-26T06:52:32Z

When I execute auto_phrase.sh, several exceptions are thrown:
java.io.FileNotFoundException: tmp/final_quality_multi-words.txt
java.io.FileNotFoundException: tmp/final_quality_unigrams.txt
java.io.FileNotFoundException: tmp/final_quality_salient.txt

According to the stack trace, the error originates here:
at Tokenizer.tokenizeText(Tokenizer.java:618)
at Tokenizer.main(Tokenizer.java:766)

How can I resolve this?

remenberl · 2016-12-27T16:20:44Z

May I ask what language your data set is in? Or can you provide a sample set so that we can reproduce the problem?

CenturySee · 2016-12-28T15:44:53Z

I just used the Default Run as described in the README, which downloads DBLP.txt.gz from "http://dmserv2.cs.illinois.edu/data/DBLP.txt.gz". Maybe something went wrong with the downloaded file. I will check that first.

remenberl · 2016-12-28T15:59:36Z

It's quite strange. According to your reported exception, the bash script got stuck at the very end of the job (Generating Output stage, line 108). I have rerun the current repository on our Linux machine but could not reproduce this problem. Could you paste the complete log?
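Since the exceptions name three files under tmp/, one quick check before the Generating Output stage is whether those files exist at all. The following is only a minimal sketch: the function name missing_outputs is hypothetical, and the only details taken from this thread are the three file names and the tmp directory.

```shell
# Sketch only: print each expected output file that is missing or empty.
# missing_outputs is a hypothetical helper, not part of auto_phrase.sh.
missing_outputs() {
  # $1: directory to inspect (defaults to tmp, as in the reported exceptions)
  local dir="${1:-tmp}"
  local name
  for name in final_quality_multi-words.txt \
              final_quality_unigrams.txt \
              final_quality_salient.txt; do
    # -s is true only when the file exists and is non-empty
    [ -s "$dir/$name" ] || echo "$dir/$name"
  done
}
```

Calling missing_outputs tmp from the repository root prints nothing when all three files are present, so any output means an earlier stage did not produce its results.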

CenturySee · 2016-12-29T03:19:55Z

Here is the output:
===Compilation===
===Tokenization===
-ne Current step: Tokenizing input file...
Picked up JAVA_TOOL_OPTIONS: -Duser.language=en

real 2m41.072s
user 8m48.506s
sys 0m9.457s
-ne Detected Language: EN

-ne Current step: Tokenizing stopword file...
Picked up JAVA_TOOL_OPTIONS: -Duser.language=en
-ne Current step: Tokenizing wikipedia phrases...

Picked up JAVA_TOOL_OPTIONS: -Duser.language=en
Picked up JAVA_TOOL_OPTIONS: -Duser.language=en
===Part-Of-Speech Tagging===
Current step: Tagging...
ERROR: while reading string from binary file
aborted.

ERROR: while reading string from binary file
aborted.

ERROR: while reading string from binary file
aborted.
Current step: Merging...
===AutoPhrasing===
Loading data...
'#' of total tokens = 111149629
max word token id = 553946
POS file doesn't have enough POS tags

real 0m10.902s
user 0m10.188s
sys 0m0.489s
===Generating Output===
Picked up JAVA_TOOL_OPTIONS: -Duser.language=en
java.io.FileNotFoundException: tmp/final_quality_multi-words.txt (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at Tokenizer.tokenizeText(Tokenizer.java:618)
at Tokenizer.main(Tokenizer.java:766)
Picked up JAVA_TOOL_OPTIONS: -Duser.language=en
java.io.FileNotFoundException: tmp/final_quality_unigrams.txt (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at Tokenizer.tokenizeText(Tokenizer.java:618)
at Tokenizer.main(Tokenizer.java:766)
Picked up JAVA_TOOL_OPTIONS: -Duser.language=en
java.io.FileNotFoundException: tmp/final_quality_salient.txt (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at Tokenizer.tokenizeText(Tokenizer.java:618)
at Tokenizer.main(Tokenizer.java:766)
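
In the log above, the first error appears in the Part-Of-Speech Tagging stage, well before the FileNotFoundException lines, so the missing files may be a downstream symptom rather than the root cause. A minimal sketch for locating the earliest failure line in a saved log follows. The function name first_failure is hypothetical, and the search patterns are simply the messages seen in this log, not an exhaustive list.

```shell
# Sketch only: print the first line (with its line number) of a saved log
# that matches one of the failure messages seen in this thread.
# first_failure is a hypothetical helper, not part of the repository.
first_failure() {
  # $1: path to a log file captured from a run of the script
  grep -n -m 1 -E 'ERROR|Exception|aborted|enough POS tags' "$1"
}
```

Assuming the run was captured with something like sh auto_phrase.sh > run.log 2>&1, calling first_failure run.log would point at the tagging error rather than the final exceptions.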

I downloaded DBLP.txt.gz by hand and put it in the data folder. The first several lines are:
OQL[C++]: Extending C++ with an Object Query Capability.
.
Transaction Management in Multidatabase Systems.
.
Overview of the ADDS System.
.
Multimedia Information Systems: Issues and Approaches.
.
Active Database Systems.
.
Where Object-Oriented DBMSs Should Do Better: A Critique Based on Early Experiences.
.
Distributed Databases.
.
An Object-Oriented DBMS War Story: Developing a Genome Mapping Database in C++.

remenberl · 2016-12-29T20:31:37Z

The dataset should be correct.
I guess the problem is with your environment. Could you provide your operating system and Java version (run java -version)?

remenberl · 2016-12-29T20:33:33Z

If it is Linux, I need your kernel version, which you can get by running:

uname -r
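
The environment details requested above can be gathered in one step. This is a minimal sketch, assuming a POSIX shell; the function name report_env and the output labels are hypothetical, while uname -s, uname -r, and java -version are standard commands.

```shell
# Sketch only: print the operating system, kernel release, and Java version.
# report_env is a hypothetical helper, not part of the repository.
report_env() {
  echo "os: $(uname -s)"
  echo "kernel: $(uname -r)"
  if command -v java >/dev/null 2>&1; then
    # java -version writes to stderr, so merge it into stdout first
    echo "java: $(java -version 2>&1 | head -n 1)"
  else
    echo "java: not found"
  fi
}
```

Pasting the three lines printed by report_env into the issue gives the maintainers the same information in a compact form.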

CenturySee · 2016-12-30T16:51:42Z

I am using macOS Sierra. The Java version is:
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
The macOS kernel version (output of uname -r) is 16.3.0.

remenberl · 2016-12-30T19:55:13Z

FYI, I have tested on a notebook with Sierra (16.3.0) installed. After installing gcc6 and java8, the script runs correctly. Please consider reinstalling gcc and Java following the updated instructions:

g++ 6: $ brew install gcc6
Java 8: $ brew update; brew tap caskroom/cask; brew install Caskroom/cask/java

After installation, you should have the following versions for gcc and java respectively.
gcc version 6.2.0 (Homebrew gcc6 6.2.0)
java version "1.8.0_112"
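
To confirm the installed versions match the ones listed above, the version lines can be checked mechanically. This is a minimal sketch: the function names gcc_major and is_java8 are hypothetical, and they only parse version lines in the two formats shown in this thread.

```shell
# Sketch only: helpers for checking the version lines quoted in this thread.
# gcc_major and is_java8 are hypothetical names, not part of the repository.

# Print the major version from a line such as "gcc version 6.2.0 (...)".
gcc_major() {
  printf '%s\n' "$1" | sed -n 's/^gcc version \([0-9][0-9]*\)\..*/\1/p'
}

# Succeed only if the line reports Java 8 (version strings starting with 1.8).
is_java8() {
  case "$1" in
    *'version "1.8.'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, gcc_major 'gcc version 6.2.0 (Homebrew gcc6 6.2.0)' prints 6, and is_java8 "$(java -version 2>&1 | head -n 1)" succeeds on a Java 8 installation.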

remenberl closed this as completed Jan 8, 2017
