some files not found, when execute auto_phrase.sh #2

Closed
CenturySee opened this issue Dec 26, 2016 · 8 comments

@CenturySee

When I execute auto_phrase.sh, the following exceptions are thrown:
java.io.FileNotFoundException: tmp/final_quality_multi-words.txt
java.io.FileNotFoundException: tmp/final_quality_unigrams.txt
java.io.FileNotFoundException: tmp/final_quality_salient.txt

According to the stack trace, the error occurs
at Tokenizer.tokenizeText(Tokenizer.java:618)
at Tokenizer.main(Tokenizer.java:766)

How can I resolve this?

@remenberl
Collaborator

May I ask what language your data set is in? Or could you provide a sample so that we can reproduce the problem?

@CenturySee
Author

I just used the Default Run as described in the README, which downloads DBLP.txt.gz from "http://dmserv2.cs.illinois.edu/data/DBLP.txt.gz". Maybe something went wrong with the downloaded file. I will check it first.
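
One way to check the download is sketched below in Python: decompress the whole archive and look at its first few lines. The helper name `check_gzip` and the path `data/DBLP.txt.gz` are assumptions for illustration and are not part of the repository.

```python
import gzip
import zlib


def check_gzip(path, preview_lines=5):
    """Decompress the whole file; return (ok, line_count, first_lines).

    Hypothetical helper: reading to the end makes gzip verify the CRC,
    so a truncated or corrupted download is reported as ok=False.
    """
    preview = []
    count = 0
    try:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
            for line in fh:
                if count < preview_lines:
                    preview.append(line.rstrip("\n"))
                count += 1
    except (OSError, EOFError, zlib.error):
        # Missing file, not a gzip file, truncated stream, or bad data.
        return False, count, preview
    return True, count, preview


if __name__ == "__main__":
    # Assumed location of the downloaded corpus.
    print(check_gzip("data/DBLP.txt.gz"))
```

If `ok` is true and the preview shows paper titles, the archive itself is probably intact and the cause lies elsewhere.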

@remenberl
Collaborator

That's quite strange. According to the exception you reported, the bash script failed at the very end of the job (the Generating Output stage, line 108). I have rerun the current repository on our Linux machine but could not reproduce the problem. Could you paste the complete log?

@CenturySee
Author

CenturySee commented Dec 29, 2016

Here is the output:
===Compilation===
===Tokenization===
-ne Current step: Tokenizing input file...
Picked up JAVA_TOOL_OPTIONS: -Duser.language=en

real 2m41.072s
user 8m48.506s
sys 0m9.457s
-ne Detected Language: EN

-ne Current step: Tokenizing stopword file...
Picked up JAVA_TOOL_OPTIONS: -Duser.language=en
-ne Current step: Tokenizing wikipedia phrases...

Picked up JAVA_TOOL_OPTIONS: -Duser.language=en
Picked up JAVA_TOOL_OPTIONS: -Duser.language=en
===Part-Of-Speech Tagging===
Current step: Tagging...
ERROR: while reading string from binary file
aborted.

ERROR: while reading string from binary file
aborted.

ERROR: while reading string from binary file
aborted.

ERROR: while reading string from binary file
aborted.

ERROR: while reading string from binary file
aborted.

ERROR: while reading string from binary file
aborted.

ERROR: while reading string from binary file
aborted.

ERROR: while reading string from binary file
aborted.

ERROR: while reading string from binary file
aborted.

ERROR: while reading string from binary file
aborted.
Current step: Merging...
===AutoPhrasing===
Loading data...
'#' of total tokens = 111149629
max word token id = 553946
POS file doesn't have enough POS tags

real 0m10.902s
user 0m10.188s
sys 0m0.489s
===Generating Output===
Picked up JAVA_TOOL_OPTIONS: -Duser.language=en
java.io.FileNotFoundException: tmp/final_quality_multi-words.txt (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at Tokenizer.tokenizeText(Tokenizer.java:618)
at Tokenizer.main(Tokenizer.java:766)
Picked up JAVA_TOOL_OPTIONS: -Duser.language=en
java.io.FileNotFoundException: tmp/final_quality_unigrams.txt (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at Tokenizer.tokenizeText(Tokenizer.java:618)
at Tokenizer.main(Tokenizer.java:766)
Picked up JAVA_TOOL_OPTIONS: -Duser.language=en
java.io.FileNotFoundException: tmp/final_quality_salient.txt (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at Tokenizer.tokenizeText(Tokenizer.java:618)
at Tokenizer.main(Tokenizer.java:766)

I downloaded DBLP.txt.gz by hand and put it in the data folder. The first several lines are:
OQL[C++]: Extending C++ with an Object Query Capability.
.
Transaction Management in Multidatabase Systems.
.
Overview of the ADDS System.
.
Multimedia Information Systems: Issues and Approaches.
.
Active Database Systems.
.
Where Object-Oriented DBMSs Should Do Better: A Critique Based on Early Experiences.
.
Distributed Databases.
.
An Object-Oriented DBMS War Story: Developing a Genome Mapping Database in C++.
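
Reading the log above from the top, the first error lines appear under the Part-Of-Speech Tagging stage, well before the FileNotFoundException lines at the end. Whether that is the root cause is not confirmed in this thread. As a rough sketch, a log like this can be scanned for its earliest failure with a few lines of Python. The function name `first_failure` and the list of failure markers are assumptions, picked from the messages seen in this particular log.

```python
import re

# Stage banners in the log look like ===Tokenization===
STAGE = re.compile(r"^===(.+)===$")
# Assumed failure markers, chosen from the messages seen in this log.
FAILURE = re.compile(r"ERROR|aborted|Exception|doesn't have enough")


def first_failure(log_text):
    """Return (stage, line) for the earliest failure-looking line, or None."""
    stage = None
    for raw in log_text.splitlines():
        line = raw.strip()
        m = STAGE.match(line)
        if m:
            stage = m.group(1)
        elif FAILURE.search(line):
            return stage, line
    return None


if __name__ == "__main__":
    sample = """===Tokenization===
Detected Language: EN
===Part-Of-Speech Tagging===
Current step: Tagging...
ERROR: while reading string from binary file
aborted.
===AutoPhrasing===
POS file doesn't have enough POS tags
===Generating Output===
java.io.FileNotFoundException: tmp/final_quality_salient.txt (No such file or directory)
"""
    # prints ('Part-Of-Speech Tagging', 'ERROR: while reading string from binary file')
    print(first_failure(sample))
```

The later messages ("POS file doesn't have enough POS tags" and the missing tmp/final_quality_*.txt files) are consistent with downstream effects of the earlier failure, since each stage consumes the previous stage's output.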

@remenberl
Collaborator

The dataset looks correct.
I guess the problem is with your environment. Could you provide your operating system and Java version (run java -version)?

@remenberl
Collaborator

If it is Linux, I also need your kernel version, which you can get by running

uname -r
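
For anyone who wants to gather these details in one step, here is a minimal Python sketch. The helper names `java_version_text` and `environment_report` are made up for this example. It relies on `platform.release()`, which reports the same value as `uname -r`, and on the fact that the java launcher writes its `-version` output to stderr.

```python
import platform
import subprocess


def java_version_text():
    """Return the output of `java -version`, or None if java cannot be run."""
    try:
        proc = subprocess.run(
            ["java", "-version"],
            capture_output=True, text=True, timeout=30, check=False,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    # The java launcher writes -version output to stderr, not stdout.
    return (proc.stderr or proc.stdout).strip() or None


def environment_report():
    """Collect the details requested in this thread (hypothetical helper)."""
    return {
        "os": platform.system(),       # e.g. "Darwin" or "Linux"
        "kernel": platform.release(),  # same value as `uname -r`
        "java": java_version_text(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```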

@CenturySee
Author

I am using macOS Sierra.
The Java version is:
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)

The macOS kernel version (uname -r) is 16.3.0.

@remenberl
Collaborator

FYI, I have tested on a notebook with Sierra (16.3.0) installed. After installing gcc6 and Java 8, the script runs correctly. Please consider reinstalling gcc and Java following the updated instructions:

g++ 6: $ brew install gcc6
Java 8: $ brew update; brew tap caskroom/cask; brew install Caskroom/cask/java

After installation, you should have the following versions of gcc and Java, respectively:
gcc version 6.2.0 (Homebrew gcc6 6.2.0)
java version "1.8.0_112"
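
To compare installed versions against the ones listed above, the version banners can be parsed into tuples. This is only a sketch: the function names are invented for illustration, and the patterns cover just the two banner formats quoted in this thread.

```python
import re


def parse_java_version(text):
    """Extract e.g. (1, 8, 0, 112) from `java version "1.8.0_112"`.

    Handles the legacy 1.x.y_zz scheme shown in this thread; returns
    None if no quoted version string is found.
    """
    m = re.search(r'version "(\d+)(?:\.(\d+))?(?:\.(\d+))?(?:_(\d+))?', text)
    if not m:
        return None
    # Missing components are treated as 0.
    return tuple(int(g) if g is not None else 0 for g in m.groups())


def parse_gcc_version(text):
    """Extract e.g. (6, 2, 0) from `gcc version 6.2.0 (Homebrew gcc6 6.2.0)`."""
    m = re.search(r"gcc version (\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(g) for g in m.groups()) if m else None


if __name__ == "__main__":
    reported = parse_java_version('java version "1.8.0_101"')
    tested = parse_java_version('java version "1.8.0_112"')
    print(reported, tested, reported < tested)
    print(parse_gcc_version("gcc version 6.2.0 (Homebrew gcc6 6.2.0)"))
```

The reporter's 1.8.0_101 sorts below the tested 1.8.0_112, but the thread does not establish whether that difference matters. The gcc installation is the other variable that changed.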


2 participants