Add more detailed steps to update the dictionaries #112

Open
kenden opened this issue Jun 6, 2019 · 5 comments

Comments

@kenden

kenden commented Jun 6, 2019

Add software requirements and steps to update the dictionaries.
This should help potential new maintainers.

I managed to get WiktionarySplitter.sh to start processing with the following steps.
(I'm dumping this here for now in case it helps someone)

Using an Ubuntu 18.04 VM
# clone Dictionary
# clone DictionaryPC
$ apt install openjdk-11-jdk

$ ./compile.sh 
ICU4J needs to be installed
--> apt search ICU4J
--> apt install libicu4j-49-java

$ ./compile.sh 
Junit needs to be installed
---> sudo apt install junit

$ ./compile.sh 
Xerces needs to be installed
--> sudo apt install libxerces2-java


$ ./compile.sh 
commons-lang needs to be installed
---> apt install libcommons-lang3-java

$ ./compile.sh 
commons-compress needs to be installed
--> apt install libcommons-compress-java

$ ./compile.sh 
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: ../Dictionary/Util/src/com/hughes/util/CachingList.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

---> change compile.sh to have:
javac -g -Xlint:deprecation -Xlint:unchecked ../Dictionary/Util/src/com/hughes/util/*.java ../Dictionary/Util/src/com/hughes/util/raf/*.java ../Dictionary/src/com/hughes/android/dictionary/DictionaryInfo.java ../Dictionary/src/com/hughes/android/dictionary/engine/*.java ../Dictionary/src/com/hughes/android/dictionary/C.java src/com/hughes/util/*.java src/com/hughes/android/dictionary/*.java src/com/hughes/android/dictionary/*/*.java src/com/hughes/android/dictionary/*/*/*.java -classpath "$ICU4J:$JUNIT:$XERCES:$COMMONS:$COMMONS_COMPRESS"

$ ./compile.sh 
../Dictionary/Util/src/com/hughes/util/CachingList.java:32: warning: [unchecked] unchecked cast
        chunked = useChunked ? (ChunkedList<T>)list : null;
                                               ^
  required: ChunkedList<T>
  found:    List<T>
  where T is a type-variable:
    T extends Object declared in class CachingList
src/com/hughes/util/MapUtil.java:39: warning: [deprecation] newInstance() in Class has been deprecated
                map.put(key, valueClass.newInstance());
                                       ^
  where T is a type-variable:
    T extends Object declared in class Class
src/com/hughes/android/dictionary/parser/wiktionary/WholeSectionToHtmlParser.java:399: warning: [deprecation] StringEscapeUtils in org.apache.commons.lang3 has been deprecated
        final String htmlEscaped = StringEscapeUtils.escapeHtml3(plainText);
                                   ^
3 warnings


# Note: there are 2048m of RAM in the VM used to test this
$ ./WiktionarySplitter.sh 
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.base/java.io.PipedInputStream.initPipe(PipedInputStream.java:161)
	at java.base/java.io.PipedInputStream.<init>(PipedInputStream.java:125)
	at com.hughes.android.dictionary.engine.WriteBuffer.<init>(WriteBuffer.java:28)
	at com.hughes.android.dictionary.engine.WiktionarySplitter.go(WiktionarySplitter.java:89)
	at com.hughes.android.dictionary.engine.WiktionarySplitter.main(WiktionarySplitter.java:60)
---> add '-Xmx2048m' to java command in ./WiktionarySplitter.sh
$ ./WiktionarySplitter.sh 
./WiktionarySplitter.sh: line 15:  7924 Killed                  "$JAVA" -Xverify:none -Xmx2048m -classpath src:../Util/src/:../Dictionary/src/:"$ICU4J":"$XERCES":"$COMMONS_COMPRESS" com.hughes.android.dictionary.engine.WiktionarySplitter "$@"
---> increase amount of ram in the VM to 4096m
---> add '-Xmx3072m' to java command in ./WiktionarySplitter.sh
$ ./WiktionarySplitter.sh

--> hitting issue #81
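
For anyone repeating the memory steps above, here is a minimal sketch of picking a -Xmx value from the machine's RAM instead of hard-coding it. The helper names and the 1024 MB headroom are assumptions, taken only from the 4096 MB VM / -Xmx3072m combination that got past the out-of-memory failures above; they are not part of the project's scripts.

```shell
#!/bin/sh
# Hypothetical helper: suggest a -Xmx value (in MB) that leaves headroom
# for the OS. The default 1024 MB headroom is an assumption based on the
# 4096 MB VM / -Xmx3072m combination above, not a project recommendation.
suggest_heap_mb() {
    total_mb=$1
    headroom_mb=${2:-1024}
    if [ "$total_mb" -le "$headroom_mb" ]; then
        echo "not enough RAM: ${total_mb} MB" >&2
        return 1
    fi
    echo $((total_mb - headroom_mb))
}

# Total physical RAM in MB, read from /proc/meminfo (Linux only).
total_ram_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}
```

Usage would be something like `echo "-Xmx$(suggest_heap_mb "$(total_ram_mb)")m"`, with the result pasted into the java command in ./WiktionarySplitter.sh.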

@rdoeffinger
Owner

I don't think changing the compile.sh step makes sense; it just makes those warnings even more annoying, since at this point I either don't intend to fix them or can't.
As to the memory change: I don't think you can run this process with less than 8 GB of RAM anyway, which might be why I never needed to change the memory allocation for Java (Java only being able to use a fixed amount of RAM is still a really bad joke anyway).
I'll change it to use the same value as run.sh (and I ought to look into whether it couldn't/shouldn't reuse run.sh anyway).

@kenden
Author

kenden commented Jun 21, 2019

It would be helpful to indicate the requirements for development, maybe in the README.md, or in a new CONTRIBUTING.md.

Have things like:

Hardware requirements:

  • 8 GB RAM minimum (maybe 5 is enough?)
  • XX GB disk minimum

Software requirements:

  • OS: Linux:

    • tested to work: Ubuntu XXX(16.04, 18.04...)
  • Packages:

    • For Ubuntu 18.04
apt install \
  openjdk-11-jdk \
  libicu4j-49-java \
  junit \
  libxerces2-java \
  libcommons-lang3-java \
  libcommons-compress-java
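
A README or CONTRIBUTING.md could also carry a small check that reports which of these packages are missing, so nobody has to discover them one compile.sh run at a time. This is only a sketch: the function names are made up for illustration, and it assumes a Debian/Ubuntu system where `dpkg-query` is available.

```shell
#!/bin/sh
# Package names copied from the Ubuntu 18.04 list above.
REQUIRED_PACKAGES="openjdk-11-jdk libicu4j-49-java junit libxerces2-java libcommons-lang3-java libcommons-compress-java"

# Succeeds if the Debian package $1 is installed (hypothetical helper).
is_installed() {
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q "install ok installed"
}

# Prints the required packages that are not installed, one per line.
missing_packages() {
    for pkg in $REQUIRED_PACKAGES; do
        is_installed "$pkg" || echo "$pkg"
    done
}
```

Running `missing_packages` before `./compile.sh` prints nothing when everything is in place.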

@kenden
Author

kenden commented Jun 21, 2019

Maybe we could also make it easier with docker, have a Dockerfile to specify the build environment and use it to build:

# Add Dockerfile
$ mkdir docker
$ echo 'FROM ubuntu:18.04

RUN apt-get update \
    && apt-get install -y \
         openjdk-11-jdk \
         libicu4j-49-java \
         junit \
         libxerces2-java \
         libcommons-lang3-java \
         libcommons-compress-java' \
> docker/Dockerfile

# create the development environment
$ docker build -t dictionary_build_env --file=docker/Dockerfile docker

# build inside the development environment
$ docker run -it --rm \
     --volume "$(pwd)":/workspace \
     dictionary_build_env \
       bash -c \
         'cd /workspace/DictionaryPC/ && ./compile.sh'

Note: I had to add -encoding UTF-8 -Xlint:deprecation to compile.sh for it to work.

@rdoeffinger
Owner

Sorry that I don't really have time to help much on this, except helping with specific issues.
Does Docker by default pull an image that is not configured with a UTF-8 locale?
I can reproduce the failure with "LANG=en_US ./compile.sh" but not with "LANG=en_US.UTF-8 ./compile.sh".
I guess I can add it, but I don't really intend to support Linux setups stuck in the 1990s...
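
To make the locale point concrete, here is a rough sketch of how a script could tell whether the current environment declares a UTF-8 locale before calling javac. Both function names are hypothetical, and the check only looks at the locale name's suffix, so it is an approximation rather than a full locale parser.

```shell
#!/bin/sh
# Succeeds if the given locale name declares a UTF-8 charset.
locale_is_utf8() {
    case "$1" in
        *.UTF-8|*.utf8|*.UTF8|*.utf-8) return 0 ;;
        *) return 1 ;;
    esac
}

# Effective locale for character handling:
# LC_ALL overrides LC_CTYPE, which overrides LANG; falls back to "C".
effective_ctype() {
    echo "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}
```

With this, `locale_is_utf8 "$(effective_ctype)" || echo "non-UTF-8 locale"` would flag the situation described above. Inside a container, running with something like `LANG=C.UTF-8 ./compile.sh` may be an alternative to passing -encoding UTF-8, assuming the image provides that locale.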

@rdoeffinger
Owner

Here's the Java code compiled to a native Linux binary.
The latest git code is changed to use that binary if it exists.
I have not tested compatibility with older Linux distributions etc., nor tested the scripts all that well, but if anyone wants to help, I think that's likely the best approach available for better usability.
I can't generate a Windows binary so far, though in theory it should be possible. But it should be possible to run the Linux binary under WSL.
DictionaryPC.zip
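
For readers wondering what "use that binary if it exists" could look like in a wrapper script, here is a minimal sketch. It is not the code from the repository: the function name is invented, and the binary's path is passed in by the caller because its actual name is not given in this thread.

```shell
#!/bin/sh
# Prints "native" if an executable regular file exists at path $1,
# otherwise "java". The path is supplied by the caller; the real
# binary name is not stated in this thread.
pick_runner() {
    if [ -f "$1" ] && [ -x "$1" ]; then
        echo native
    else
        echo java
    fi
}
```

A script would then branch on the output, running the native binary in the first case and falling back to the existing java command line in the second.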
