## Table of contents

* [Find out the version info of the underlying JDK/JVM on which this notebook is running](#Find-out-the-version-info-of-the-underlying-JDK/JVM-on-which-this-notebook-is-running)
* [Valohai command-line client](#Valohai-command-line-client)
* [Set up project using the vh client](#Set-up-project-using-the-vh-client)
* Java bindings (Java API) via Valohai client
  * [Language Detector API](#Language-Detector-API)
  * [Sentence Detection API](#Sentence-Detection-API)
  * [Tokenizer API](#Tokenizer-API)
  * [Name Finder API](#Name-Finder-API)
  * [More Name Finder API examples](#More-Name-Finder-API-examples)
  * [Parts of speech (POS) Tagger API](#Parts-of-speech-(POS)-Tagger-API)
  * [Chunking API](#Chunking-API)
  * [Parsing API](#Parsing-API)

### Find out the version info of the underlying JDK/JVM on which this notebook is running

In [85]:
System.out.println("java.version: " + System.getProperty("java.version"));

java.version: 11.0.4


In [86]:
System.out.println("java.specification.version: " + System.getProperty("java.specification.version"));
System.out.println("java.runtime.version: " + System.getProperty("java.runtime.version"));

java.specification.version: 11
java.runtime.version: 11.0.4+11


In [87]:
import java.lang.management.ManagementFactory;

System.out.println("java runtime VM version: " + ManagementFactory.getRuntimeMXBean().getVmVersion());

java runtime VM version: 11.0.4+11
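If later cells need to branch on the JDK release, the `java.specification.version` string printed above can be reduced to a single feature number. The sketch below is a hypothetical helper (`JavaFeatureVersion` is not part of this notebook or of the JDK). It assumes the legacy `1.x` form used up to Java 8 and the plain number form used from Java 9 onwards.

```java
class JavaFeatureVersion {

    // Hypothetical helper: maps "1.8" to 8 and "11" to 11.
    static int parse(String specVersion) {
        String v = specVersion.trim();
        // Up to Java 8 the property had the form "1.x".
        if (v.startsWith("1.")) {
            v = v.substring(2);
        }
        // Drop anything after a remaining dot, if present.
        int dot = v.indexOf('.');
        if (dot >= 0) {
            v = v.substring(0, dot);
        }
        return Integer.parseInt(v);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("feature release: " + parse(spec));
    }
}
```

On the JDK used for this notebook the property is `11`, so the helper would return 11.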


Return to [Table of contents](#Table-of-contents)

### Valohai command-line client

In [1]:
%system vh --help

Usage: vh [OPTIONS] COMMAND [ARGS]...

  :type ctx: click.Context

Options:
  --debug / --no-debug
  --output-format, --table-format [human|csv|tsv|scsv|psv|json]
  --valohai-host URL              Override the Valohai API host (default
                                  https://app.valohai.com/)  [env var:
                                  VALOHAI_HOST]
  --valohai-token SECRET          Use this Valohai authentication token  [env
                                  var: VALOHAI_TOKEN]
  --project UUID                  (Advanced) Override the project ID  [env
                                  var: VALOHAI_PROJECT]
  --project-mode local|remote     (Advanced) When using --project, set the
                                  project mode  [env var:
                                  VALOHAI_PROJECT_MODE]
  --project-root DIR              (Advanced) When using --project, set the
                                  project root directory  [env var:
                                  VALOHAI_PROJECT_

### Set up project using the vh client
_Your Valohai token must have been provided (and set) when the container was started. Without it, the rest of the commands in this notebook may not work: the commands below expect the token and will run successfully only when it is set in the environment._
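Because everything that follows depends on the token, it can be worth checking for it before running any step. A minimal sketch, assuming the token is supplied through the `VALOHAI_TOKEN` environment variable listed in the `vh --help` output; the class and method names are hypothetical and not part of this notebook:

```java
import java.util.Map;

class ValohaiTokenCheck {

    // VALOHAI_TOKEN is the env var named in the `vh --help` output.
    // Taking the environment as a Map keeps the check easy to test.
    static boolean hasToken(Map<String, String> env) {
        String token = env.get("VALOHAI_TOKEN");
        return token != null && !token.trim().isEmpty();
    }

    public static void main(String[] args) {
        if (hasToken(System.getenv())) {
            System.out.println("VALOHAI_TOKEN is set");
        } else {
            System.out.println("VALOHAI_TOKEN is missing; the vh commands will likely fail");
        }
    }
}
```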

In [1]:
%system ./create-project.sh nlp-java-jvm-example

😼  Success! Project nlp-java-jvm-example created.
🙂  Success! Linked /home/jovyan/work to nlp-java-jvm-example.


### Language Detector API

##### Show a simple example of detecting the language of a sentence using a Language Detector model called langdetect-183.bin on a remote instance (powered by Valohai), from within the notebook cell using cell magic!

In [95]:
%system ./exec-step.sh "detect-language"

Executing step detect-language
Packaging /home/jovyan/work...
=>   Git not available, found 15 files to package
Uploading 27.1 kB...
😻  Success! Uploaded ad-hoc code ~44cee86dfced86ab0d69df943757ffb65b46917eac1c3bbd31d42b4b138b190a
😎  Success! Execution #23 created. See https://app.valohai.com/p/neomatrix369/nlp-java-jvm-example/execution/016eaa36-27d9-8731-b00a-f78d7fcd9afc/
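The counter in the `Execution #23 created` message is the value the next cells pass to `./watch-execution.sh` and `./show-final-result.sh`. A minimal sketch of extracting it from the script output, assuming the message format stays as shown here; the class and method names are hypothetical:

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExecutionCounter {

    // Matches the "Execution #<n> created" fragment printed by exec-step.sh.
    private static final Pattern CREATED = Pattern.compile("Execution #(\\d+) created");

    // Returns the execution counter, or empty if the message is absent.
    static OptionalInt parse(String output) {
        Matcher m = CREATED.matcher(output);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        String line = "Success! Execution #23 created. See https://app.valohai.com/";
        parse(line).ifPresent(n -> System.out.println("./watch-execution.sh " + n));
    }
}
```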


In [96]:
%system ./watch-execution.sh 23

Watching counter latest
(nlp-java-jvm-example) #23                            2019-11-27T00:15:08.068091
Status: started     Step: detect-languagCommit: ~44cee86dfce           16 events
00:15:06.72  starting job on i-08c1fa5e3a874d093, Peon 0.27.1                   
00:15:06.73  free scratch space: 400.2 GB (400156061696 B)                      
00:15:06.75  downloading repository (code)                                      
00:15:06.77  /valohai/inputs/model/langdetect-183.bin: found in cache, 10.6 MB  
00:15:06.77  /valohai/inputs/java-program/DetectLanguage.java: found in cache, 1
00:15:06.77  /valohai/inputs/apache-opennlp-jar/apache-opennlp-1.9.1-bin.tar.gz:
00:15:06.77  /valohai/inputs/model/langdetect-183.bin: md5 sum: 87be0a1cf60e5d89
00:15:06.77  /valohai/inputs/java-program/DetectLanguage.java: md5 sum: e47ca724
00:15:06.77  /valohai/inputs/apache-opennlp-jar/apache-opennlp-1.9.1-bin.tar.gz:
00:15:06.77  /valohai/inputs/java-program/DetectLanguage.java: sha1 sum: bf66327
00:1

In [97]:
%system ./show-final-result.sh 23

Gathering output from counter latest
Nothing found, maybe it failed, maybe its still running.
Use the './watch-execution.sh latest' command to find out.


**Apparently it detects this as Latin instead of English; maybe the language detection model needs more training.
See https://opennlp.apache.org/docs/1.9.1/manual/opennlp.html#tools.langdetect.training on how this can be achieved.**

Return to [Table of contents](#Table-of-contents)

### Sentence Detection API


##### Show a simple example detecting sentences using a Sentence detecting model called en-sent.bin on a remote instance (powered by Valohai), from within the notebook cell using cell magic!

In [98]:
%system ./exec-step.sh "detect-sentence"

Executing step detect-sentence
Packaging /home/jovyan/work...
=>   Git not available, found 15 files to package
😸  Success! Ad-hoc code ~44cee86dfced86ab0d69df943757ffb65b46917eac1c3bbd31d42b4b138b190a already uploaded
😸  Success! Execution #24 created. See https://app.valohai.com/p/neomatrix369/nlp-java-jvm-example/execution/016eaa36-496b-204a-1690-5601932f4ac2/


In [99]:
%system ./watch-execution.sh 24

Watching counter latest
(nlp-java-jvm-example) #24                            2019-11-27T00:15:16.560006
Status: started     Step: detect-sentencCommit: ~44cee86dfce           16 events
00:15:15.23  starting job on i-08c1fa5e3a874d093, Peon 0.27.1                   
00:15:15.24  free scratch space: 400.1 GB (400096624640 B)                      
00:15:15.24  downloading repository (code)                                      
00:15:15.26  /valohai/inputs/model/en-sent.bin: found in cache, 98.5 kB         
00:15:15.26  /valohai/inputs/apache-opennlp-jar/apache-opennlp-1.9.1-bin.tar.gz:
00:15:15.27  /valohai/inputs/model/en-sent.bin: md5 sum: 3822c5f82cb4ba139284631
00:15:15.27  /valohai/inputs/apache-opennlp-jar/apache-opennlp-1.9.1-bin.tar.gz:
00:15:15.27  /valohai/inputs/model/en-sent.bin: sha1 sum: 5cc6337965fa2236ad7f08
00:15:15.27  /valohai/inputs/apache-opennlp-jar/apache-opennlp-1.9.1-bin.tar.gz:
00:15:15.27  /valohai/inputs/java-program/DetectSentence.java: found in cache, 1
00:1

In [100]:
%system ./show-final-result.sh 24

Gathering output from counter latest
Nothing found, maybe it failed, maybe its still running.
Use the './watch-execution.sh latest' command to find out.


**As you can see, there are two ways to use the SentenceDetect API to detect sentences in a piece of text.**
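The execution output is not shown in the cell above. As a rough illustration of two result shapes a sentence detector can offer (sentence strings and character offsets), here is a sketch using the JDK's rule-based `java.text.BreakIterator`. This is a stand-in technique, not the model-based OpenNLP detector that the `detect-sentence` step runs, and it is an assumption that these are the two ways meant here. The class name and sample text are made up for the example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceBoundaries {

    // Returns the [start, end) character offsets of each sentence.
    static List<int[]> spans(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<int[]> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            result.add(new int[] {start, end});
        }
        return result;
    }

    // Returns the sentences themselves, with surrounding whitespace trimmed.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        for (int[] span : spans(text)) {
            result.add(text.substring(span[0], span[1]).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "This is a sentence. This is another one.";
        for (String sentence : sentences(text)) {
            System.out.println(sentence);
        }
        for (int[] span : spans(text)) {
            System.out.println("[" + span[0] + ".." + span[1] + ")");
        }
    }
}
```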

Return to [Table of contents](#Table-of-contents)

### Tokenizer API

##### Show a simple example of tokenization of a sentence using a Tokenizer model called en-token.bin on a remote instance (powered by Valohai), from within the notebook cell using cell magic!

In [101]:
%system ./exec-step.sh "tokenize"

Executing step tokenize
Packaging /home/jovyan/work...
=>   Git not available, found 15 files to package
😺  Success! Ad-hoc code ~44cee86dfced86ab0d69df943757ffb65b46917eac1c3bbd31d42b4b138b190a already uploaded
😻  Success! Execution #25 created. See https://app.valohai.com/p/neomatrix369/nlp-java-jvm-example/execution/016eaa36-69ae-6dae-d7d4-784db8729ae6/


In [102]:
%system ./watch-execution.sh 25

Watching counter latest
(nlp-java-jvm-example) #25                            2019-11-27T00:15:24.864919
Status: started     Step: tokenize-senteCommit: ~44cee86dfce           16 events
00:15:23.41  starting job on i-08c1fa5e3a874d093, Peon 0.27.1                   
00:15:23.43  free scratch space: 400.0 GB (400047693824 B)                      
00:15:23.44  downloading repository (code)                                      
00:15:23.46  /valohai/inputs/java-program/Tokenize.java: found in cache, 1.2 kB 
00:15:23.46  /valohai/inputs/model/en-token.bin: found in cache, 439.9 kB       
00:15:23.46  /valohai/inputs/model/en-token.bin: md5 sum: f38628ea25fc246e99fc5e
00:15:23.46  /valohai/inputs/java-program/Tokenize.java: md5 sum: 588d4fcde491e2
00:15:23.46  /valohai/inputs/apache-opennlp-jar/apache-opennlp-1.9.1-bin.tar.gz:
00:15:23.46  /valohai/inputs/java-program/Tokenize.java: sha1 sum: 0bafc5177a992
00:15:23.47  /valohai/inputs/apache-opennlp-jar/apache-opennlp-1.9.1-bin.tar.gz:
00:1

In [103]:
%system ./show-final-result.sh 25

Gathering output from counter 15
Nothing found, maybe it failed, maybe its still running.
Use the './watch-execution.sh 15' command to find out.


Return to [Table of contents](#Table-of-contents)

### Name Finder API

##### Show a simple example of finding the name of a person in a sentence using a Name Finder model called en-ner-person.bin on a remote instance (powered by Valohai), from within the notebook cell using cell magic!

In [105]:
%system ./exec-step.sh "name-finder-person"

Executing step name-finder-person
Packaging /home/jovyan/work...
=>   Git not available, found 15 files to package
😼  Success! Ad-hoc code ~44cee86dfced86ab0d69df943757ffb65b46917eac1c3bbd31d42b4b138b190a already uploaded
😎  Success! Execution #26 created. See https://app.valohai.com/p/neomatrix369/nlp-java-jvm-example/execution/016eaa36-8f84-3867-623d-33136fa30f5a/


In [106]:
%system ./watch-execution.sh 26

Watching counter latest
(nlp-java-jvm-example) #26                            2019-11-27T00:15:34.477259
Status: started     Step: name-finder-peCommit: ~44cee86dfce           16 events
00:15:33.16  starting job on i-08c1fa5e3a874d093, Peon 0.27.1                   
00:15:33.16  free scratch space: 400.0 GB (399998418944 B)                      
00:15:33.17  downloading repository (code)                                      
00:15:33.19  /valohai/inputs/model/en-ner-person.bin: found in cache, 5.2 MB    
00:15:33.19  /valohai/inputs/apache-opennlp-jar/apache-opennlp-1.9.1-bin.tar.gz:
00:15:33.19  /valohai/inputs/java-program/NameFinderPerson.java: found in cache,
00:15:33.20  /valohai/inputs/model/en-ner-person.bin: md5 sum: 909b9017a13b2d69c
00:15:33.20  /valohai/inputs/apache-opennlp-jar/apache-opennlp-1.9.1-bin.tar.gz:
00:15:33.20  /valohai/inputs/java-program/NameFinderPerson.java: md5 sum: e63fd0
00:15:33.20  /valohai/inputs/model/en-ner-person.bin: sha1 sum: 78da74299ae61b26
00:1

In [117]:
%system ./show-final-result.sh 26

Gathering output from counter latest
Nothing found, maybe it failed, maybe its still running.
Use the './watch-execution.sh latest' command to find out.


**As you can see above, it has detected the name of the person in both sentences**

Return to [Table of contents](#Table-of-contents)

### More Name Finder API examples

There are a handful of other Name Finder models, namely:

- Name Finder Date
- Name Finder Location
- Name Finder Money
- Name Finder Organization
- Name Finder Percentage
- Name Finder Time

The corresponding model files are, respectively:

- en-ner-date.bin
- en-ner-location.bin
- en-ner-money.bin
- en-ner-organization.bin
- en-ner-percentage.bin
- en-ner-time.bin

They can be found at the same location as all the other models: http://opennlp.sourceforge.net/models-1.5/
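The file names follow a single pattern, `en-ner-<type>.bin`, so the download locations can be derived rather than typed out. A small sketch, assuming each file sits directly under the base URL given here; the class and method names are hypothetical:

```java
import java.util.List;

class NameFinderModels {

    // Base location of the models, as given in this section.
    static final String BASE_URL = "http://opennlp.sourceforge.net/models-1.5/";

    // Entity types from the list in this section.
    static final List<String> ENTITY_TYPES =
            List.of("date", "location", "money", "organization", "percentage", "time");

    // Builds the model file name for an entity type, e.g. "en-ner-date.bin".
    static String fileName(String entityType) {
        return "en-ner-" + entityType + ".bin";
    }

    // Assumption: the file is served directly under BASE_URL.
    static String url(String entityType) {
        return BASE_URL + fileName(entityType);
    }

    public static void main(String[] args) {
        for (String type : ENTITY_TYPES) {
            System.out.println(url(type));
        }
    }
}
```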

Return to [Table of contents](#Table-of-contents)

### Parts of speech (POS) Tagger API

##### Show a simple example of Parts of speech tagger on a sentence using a PoS Tagger model called en-pos-maxent.bin on a remote instance (powered by Valohai), from within the notebook cell using cell magic!

In [108]:
%system ./exec-step.sh "pos-tagger"

Executing step pos-tagger
Packaging /home/jovyan/work...
=>   Git not available, found 15 files to package
😼  Success! Ad-hoc code ~44cee86dfced86ab0d69df943757ffb65b46917eac1c3bbd31d42b4b138b190a already uploaded
😻  Success! Execution #27 created. See https://app.valohai.com/p/neomatrix369/nlp-java-jvm-example/execution/016eaa36-b070-63c3-7fc7-ff30a81b012a/


In [109]:
%system ./watch-execution.sh 27

Watching counter latest
(nlp-java-jvm-example) #27                            2019-11-27T00:15:43.805001
Status: started     Step: pos-tagger    Commit: ~44cee86dfce           16 events
00:15:42.31  starting job on i-08c1fa5e3a874d093, Peon 0.27.1                   
00:15:42.32  free scratch space: 399.9 GB (399944376320 B)                      
00:15:42.33  downloading repository (code)                                      
00:15:42.35  /valohai/inputs/java-program/PoSTagger.java: found in cache, 1.4 kB
00:15:42.35  /valohai/inputs/model/en-pos-maxent.bin: found in cache, 5.7 MB    
00:15:42.35  /valohai/inputs/java-program/PoSTagger.java: md5 sum: dca9ccc386733
00:15:42.35  /valohai/inputs/java-program/PoSTagger.java: sha1 sum: 2d12d93d7670
00:15:42.35  /valohai/inputs/model/en-pos-maxent.bin: md5 sum: db2cd70395b9e2e4c
00:15:42.35  /valohai/inputs/apache-opennlp-jar/apache-opennlp-1.9.1-bin.tar.gz:
00:15:42.36  /valohai/inputs/java-program/PoSTagger.java: sha256 sum: e9632347ec
00:1

In [118]:
%system ./show-final-result.sh 27

Gathering output from counter 27
00:15:48.87 [Started...]
00:15:49.90 Sentence: [Most, large, cities, in, the, US, had, morning, and, afternoon, newspapers, .]
00:15:49.90 [JJS, JJ, NNS, IN, DT, NNP, VBD, NN, CC, NN, NNS, .]
00:15:49.91 
00:15:49.91 Probabilities of tags:
00:15:49.91 0.6005488809717314
00:15:49.91 0.9346347227057236
00:15:49.91 0.9928943439421191
00:15:49.91 0.993711911129381
00:15:49.92 0.9959619800700815
00:15:49.92 0.9632635300742168
00:15:49.92 0.96904256131942
00:15:49.92 0.936549747737236
00:15:49.92 0.9706281118634225
00:15:49.92 0.8831901977922334
00:15:49.92 0.9711019283924753
00:15:49.92 0.9931572030890747
00:15:49.92 
00:15:49.92 Tags as sequences (contains probabilities:
00:15:49.92 [-0.9196402685290461 [JJS, JJ, NNS, IN, DT, NNP, VBD, NN, CC, NN, NNS, .], -1.4538683571912276 [RBS, JJ, NNS, IN, DT, NNP, VBD, NN, CC, NN, NNS, .], -5.124416242584632 [JJS, JJ, NNS, IN, DT, PRP, VBD, NN, CC, NN, NNS, .]]
00:15:49.92 [...Finished]
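The numbers in front of each candidate in the "Tags as sequences" line read as log-scores, best candidate first. One way to check that reading against this output: summing the natural logarithms of the twelve per-tag probabilities listed above gives approximately -0.9196, which is the score printed for the first sequence. The sketch below does that arithmetic; the class name is made up, and the values are copied from the execution #27 output.

```java
class TagSequenceScore {

    // Per-tag probabilities copied from the execution #27 output.
    static final double[] TAG_PROBS = {
        0.6005488809717314, 0.9346347227057236, 0.9928943439421191,
        0.993711911129381,  0.9959619800700815, 0.9632635300742168,
        0.96904256131942,   0.936549747737236,  0.9706281118634225,
        0.8831901977922334, 0.9711019283924753, 0.9931572030890747
    };

    // Score printed for the first (best) tag sequence in the same output.
    static final double REPORTED_SCORE = -0.9196402685290461;

    // Sum of natural logs, i.e. the log of the product of the probabilities.
    static double logScore(double[] probs) {
        double sum = 0.0;
        for (double p : probs) {
            sum += Math.log(p);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("sum of ln(p): " + logScore(TAG_PROBS));
        System.out.println("reported:     " + REPORTED_SCORE);
    }
}
```

Working in log space keeps the score numerically stable for long sentences, where the product of many probabilities would shrink towards zero.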


Return to [Table of contents](#Table-of-contents)

### Chunking API

##### Show a simple example of chunking on a sentence using a Chunker model called en-chunker.bin on a remote instance (powered by Valohai), from within the notebook cell using cell magic!

In [111]:
%system ./exec-step.sh "chunker"

Executing step chunker
Packaging /home/jovyan/work...
=>   Git not available, found 15 files to package
😻  Success! Ad-hoc code ~44cee86dfced86ab0d69df943757ffb65b46917eac1c3bbd31d42b4b138b190a already uploaded
😀  Success! Execution #28 created. See https://app.valohai.com/p/neomatrix369/nlp-java-jvm-example/execution/016eaa36-d425-9b92-9cbf-c88e4cfdbfe2/


In [119]:
%system ./watch-execution.sh 28

Watching counter 28
(nlp-java-jvm-example) #28                            2019-11-27T00:16:39.814364
Status: error       Step: chunker       Commit: ~44cee86dfce          560 events
00:15:57.61  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/util/
00:15:57.63  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/ml/mo
00:15:57.65  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/ml/na
00:15:57.67  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/posta
00:15:57.69  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/forma
00:15:57.71  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/forma
00:15:57.73  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/forma
00:15:57.75  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/langd
00:15:57.77  apache-opennlp-1.9.1/docs/apidocs/opennlp-morfologik-addon/opennlp/
00:15:57.79  apache-opennlp-1.9.1/docs/apidocs/opennlp-uima/opennlp/uima/util/cl
00:15:57

In [113]:
%system ./show-final-result.sh 28

Gathering output from counter latest
Nothing found, maybe it failed, maybe its still running.
Use the './watch-execution.sh latest' command to find out.


Return to [Table of contents](#Table-of-contents)

### Parsing API

##### Show a simple example of parsing chunked sentences using a Parser Chunker model called en-parser-chunking.bin on a remote instance (powered by Valohai), from within the notebook cell using cell magic!

In [114]:
%system ./exec-step.sh "parser"

Executing step parser
Packaging /home/jovyan/work...
=>   Git not available, found 15 files to package
😻  Success! Ad-hoc code ~44cee86dfced86ab0d69df943757ffb65b46917eac1c3bbd31d42b4b138b190a already uploaded
😊  Success! Execution #29 created. See https://app.valohai.com/p/neomatrix369/nlp-java-jvm-example/execution/016eaa36-f44d-3cc7-4696-d9076bb84067/


In [115]:
%system ./watch-execution.sh 29

Watching counter latest
(nlp-java-jvm-example) #29                            2019-11-27T00:16:00.365568
Status: started     Step: parser        Commit: ~44cee86dfce           16 events
00:15:59.52  starting job on i-08c1fa5e3a874d093, Peon 0.27.1                   
00:15:59.54  free scratch space: 399.8 GB (399838445568 B)                      
00:15:59.56  downloading repository (code)                                      
00:15:59.57  /valohai/inputs/java-program/Parser.java: found in cache, 1.8 kB   
00:15:59.57  /valohai/inputs/model/en-parser-chunking.bin: found in cache, 36.3 
00:15:59.57  /valohai/inputs/apache-opennlp-jar/apache-opennlp-1.9.1-bin.tar.gz:
00:15:59.57  /valohai/inputs/java-program/Parser.java: md5 sum: 2dab48976fb811c6
00:15:59.58  /valohai/inputs/java-program/Parser.java: sha1 sum: 9622dd35e505d94
00:15:59.58  /valohai/inputs/model/en-parser-chunking.bin: md5 sum: 47c1b3f4dd7d
00:15:59.58  /valohai/inputs/apache-opennlp-jar/apache-opennlp-1.9.1-bin.tar.gz:
00:1

In [116]:
%system ./show-final-result.sh 29

Gathering output from counter latest
Nothing found, maybe it failed, maybe its still running.
Use the './watch-execution.sh latest' command to find out.


Return to [Table of contents](#Table-of-contents)

### For more resources please refer to [Apache OpenNLP README](https://github.com/neomatrix369/nlp-java-jvm-example/blob/master/images/java/opennlp/README.md) and [Apache OpenNLP Resources](https://github.com/neomatrix369/nlp-java-jvm-example/blob/master/images/java/opennlp/README.md#resources).