## Table of contents

* [Find out the version info of the underlying JDK/JVM on which this notebook is running](#Find-out-the-version-info-of-the-underlying-JDK/JVM-on-which-this-notebook-is-running)
* [Valohai command-line client](#Valohai-command-line-client)
* [Set up project using the vh client](#Set-up-project-using-the-vh-client)
* Java bindings (Java API) via Valohai client
  * [Language Detector API](#Language-Detector-API)
  * [Sentence Detection API](#Sentence-Detection-API)
  * [Tokenizer API](#Tokenizer-API)
  * [Name Finder API](#Name-Finder-API)
  * [More Name Finder API examples](#More-Name-Finder-API-examples)
  * [Parts of speech (POS) Tagger API](#Parts-of-speech-(POS)-Tagger-API)
  * [Chunking API](#Chunking-API)
  * [Parsing API](#Parsing-API)

### Find out the version info of the underlying JDK/JVM on which this notebook is running

In [85]:
System.out.println("java.version: " + System.getProperty("java.version"));

java.version: 11.0.4


In [86]:
System.out.println("java.specification.version: " + System.getProperty("java.specification.version"));
System.out.println("java.runtime.version: " + System.getProperty("java.runtime.version"));

java.specification.version: 11
java.runtime.version: 11.0.4+11


In [87]:
import java.lang.management.ManagementFactory;

System.out.println("java runtime VM version: " + ManagementFactory.getRuntimeMXBean().getVmVersion());

java runtime VM version: 11.0.4+11
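
`java.specification.version` is `11` here, but on Java 8 and earlier it has the legacy form `1.8`. If a later step needs the major version as a number, a small helper can handle both schemes as well as full strings such as `11.0.4+11`. This is a sketch; `JavaVersions` and `majorVersion` are names made up for this example.

```java
// Hypothetical helper: extract the major Java version from a version string.
final class JavaVersions {
    private JavaVersions() {}

    /** Returns 8 for "1.8" or "1.8.0_252", and 11 for "11", "11.0.4" or "11.0.4+11". */
    static int majorVersion(String version) {
        // Drop any build suffix such as "+11" before splitting into components
        String core = version.split("\\+", 2)[0];
        String[] parts = core.split("[._-]");
        int first = Integer.parseInt(parts[0]);
        // Java 8 and earlier used the legacy "1.x" scheme
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    public static void main(String[] args) {
        System.out.println("major version: "
                + majorVersion(System.getProperty("java.specification.version")));
    }
}
```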


Return to [Table of contents](#Table-of-contents)

### Valohai command-line client

In [1]:
%system vh --help

Usage: vh [OPTIONS] COMMAND [ARGS]...

  :type ctx: click.Context

Options:
  --debug / --no-debug
  --output-format, --table-format [human|csv|tsv|scsv|psv|json]
  --valohai-host URL              Override the Valohai API host (default
                                  https://app.valohai.com/)  [env var:
                                  VALOHAI_HOST]
  --valohai-token SECRET          Use this Valohai authentication token  [env
                                  var: VALOHAI_TOKEN]
  --project UUID                  (Advanced) Override the project ID  [env
                                  var: VALOHAI_PROJECT]
  --project-mode local|remote     (Advanced) When using --project, set the
                                  project mode  [env var:
                                  VALOHAI_PROJECT_MODE]
  --project-root DIR              (Advanced) When using --project, set the
                                  project root directory  [env var:
                                  VALOHAI_PROJECT_

### Set up project using the vh client
_Your Valohai token must have been provided (and set) when the container was started. Without it, the rest of the commands in this notebook may not work. The commands below expect it and will not run successfully if it is not set in the environment._
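
Because everything that follows depends on the token, it can help to check for it before running any step. The `vh --help` output above lists `VALOHAI_TOKEN` and `VALOHAI_HOST` as the environment variables behind `--valohai-token` and `--valohai-host`. The sketch below (class and method names are hypothetical) only inspects those variables; it does not contact Valohai or validate the token.

```java
import java.util.Map;

// Hypothetical pre-flight check based on the environment variables listed by `vh --help`.
final class ValohaiEnv {
    // Default host as printed in the `vh --help` output
    static final String DEFAULT_HOST = "https://app.valohai.com/";

    /** True if VALOHAI_TOKEN is present and non-blank; the token itself is not validated. */
    static boolean hasToken(Map<String, String> env) {
        String token = env.get("VALOHAI_TOKEN");
        return token != null && !token.isBlank();
    }

    /** VALOHAI_HOST if it is set, otherwise the default host. */
    static String host(Map<String, String> env) {
        return env.getOrDefault("VALOHAI_HOST", DEFAULT_HOST);
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        if (!hasToken(env)) {
            System.err.println("VALOHAI_TOKEN is not set; the vh commands below are likely to fail.");
        }
        System.out.println("Valohai host: " + host(env));
    }
}
```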

In [1]:
%system ./create-project.sh nlp-java-jvm-example

😼  Success! Project nlp-java-jvm-example created.
🙂  Success! Linked /home/jovyan/work to nlp-java-jvm-example.


### Language Detector API

##### Show a simple example of detecting the language of a sentence using a language detection model called langdetect-183.bin on a remote instance (powered by Valohai), from within the notebook cell using cell magic!

In [120]:
%system ./exec-step.sh "detect-language"

Executing step detect-language
Packaging /home/jovyan/work...
=>   Git not available, found 15 files to package
Uploading 26.7 kB...
😸  Success! Uploaded ad-hoc code ~a458ac4d1658b53cef4bfc49f288c9ca609ba81ddd87cdf500df412517ce218d
😼  Success! Execution #30 created. See https://app.valohai.com/p/neomatrix369/nlp-java-jvm-example/execution/016eaa3d-4eaf-db74-13e6-d68d23060117/
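
The number in `Execution #30 created` is the counter that the next two cells pass to `./watch-execution.sh` and `./show-final-result.sh`. If you script these steps, the counter can be pulled out of that line. The sketch below assumes the message keeps exactly the format printed above; that format is an observation from this output, not a documented contract.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: find the execution counter in the output of exec-step.sh.
final class ExecutionCounter {
    // Matches e.g. "Execution #30 created" as printed by the vh client above
    private static final Pattern CREATED = Pattern.compile("Execution #(\\d+) created");

    static OptionalInt parse(String output) {
        Matcher m = CREATED.matcher(output);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        String line = "Success! Execution #30 created.";
        parse(line).ifPresent(n -> System.out.println("./watch-execution.sh " + n));
    }
}
```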


In [126]:
%system ./watch-execution.sh 30

Watching counter 30
(nlp-java-jvm-example) #30                            2019-11-27T00:24:13.249564
Status: started     Step: detect-languagCommit: ~a458ac4d165           19 events
00:22:55.33  starting job on i-0e65ed922b53ef596, Peon 0.27.1                   
00:22:55.36  free scratch space: 403.4 GB (403421933568 B)                      
00:22:55.37  downloading repository (code)                                      
00:22:55.38  /valohai/inputs/model/langdetect-183.bin: downloading http://mirror
00:22:55.38  /valohai/inputs/apache-opennlp-jar/apache-opennlp-1.9.1-bin.tar.gz:
00:22:55.38  /valohai/inputs/java-program/DetectLanguage.java: downloading https
00:22:55.44  pulling image neomatrix369/nlp-java:0.2                            
00:22:55.57  /valohai/inputs/java-program/DetectLanguage.java: downloaded, 1.4 k
00:22:55.57  /valohai/inputs/java-program/DetectLanguage.java: md5 sum: 5daac65a
00:22:55.58  /valohai/inputs/java-program/DetectLanguage.java: sha1 sum: 6d3786c
00:22:55
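
Valohai logs an md5 and a sha1 checksum for each downloaded input (truncated in the listing above). To compare a local copy of an input such as `DetectLanguage.java` against those values, the JDK's `MessageDigest` is enough. The sketch below is a generic helper, not part of the notebook's scripts.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Generic checksum helper (hypothetical; not part of the notebook's scripts).
final class Checksums {
    /** Lower-case hex digest of the bytes, e.g. algorithm "MD5" or "SHA-1". */
    static String hex(String algorithm, byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm).digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // two hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("unsupported algorithm: " + algorithm, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Checksums DetectLanguage.java
        byte[] bytes = Files.readAllBytes(Path.of(args[0]));
        System.out.println("md5:  " + hex("MD5", bytes));
        System.out.println("sha1: " + hex("SHA-1", bytes));
    }
}
```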

In [133]:
%system ./show-final-result.sh 30

Gathering output from counter 30
00:25:02.85 [Started...]
00:25:03.65 Sentence: This is a sample text.
00:25:03.67 Best language: lat
00:25:03.68 Best language confidence: 0.017774467481479657
00:25:03.68 
00:25:03.68 Predict languages (with confidence): [tur (0.009708737864077673), bel (0.009708737864077673), san (0.009708737864077673), ara (0.009708737864077673), mon (0.009708737864077673), tel (0.009708737864077673), sin (0.009708737864077673), pes (0.009708737864077673), min (0.009708737864077673), cmn (0.009708737864077673), aze (0.009708737864077673), fao (0.009708737864077673), ita (0.009708737864077673), ceb (0.009708737864077673), mkd (0.009708737864077673), eng (0.009708737864077673), nno (0.009708737864077673), lvs (0.009708737864077673), kor (0.009708737864077673), som (0.009708737864077673), swa (0.009708737864077673), hun (0.009708737864077673), fra (0.009708737864077673), nld (0.009708737864077673), mlt (0.009708737864077673), bak (0.009708737864077673), ekk (0.009708737

**Apparently it detects this sentence as Latin (`lat`) instead of English, and with very low confidence: 0.0178, compared with 0.0097 for each of the other languages listed.
The language detection model may need more training.
See https://opennlp.apache.org/docs/1.9.1/manual/opennlp.html#tools.langdetect.training for how this can be achieved.**
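
One rough way to flag a prediction like this is to compare the best confidence with the uniform baseline 1/n, where n is the number of candidate languages. In the run above the visible scores other than `lat` are all 0.009708737864077673, roughly 1/103, and `lat` gets only about 1.83 times that. The sketch below works on a plain map of language code to confidence; the names and the 2.0 threshold are illustrative assumptions, not OpenNLP behaviour.

```java
import java.util.Map;

// Hypothetical helper: how far the top-scoring language is above a uniform guess.
final class LanguageScores {
    /** Ratio of the best confidence to the uniform baseline 1/n (n = number of languages). */
    static double liftOverUniform(Map<String, Double> confidences) {
        if (confidences.isEmpty()) throw new IllegalArgumentException("no predictions");
        double best = confidences.values().stream().mapToDouble(Double::doubleValue).max().getAsDouble();
        return best * confidences.size();
    }

    /** True if the best language scores at least `minLift` times the uniform baseline. */
    static boolean isConfident(Map<String, Double> confidences, double minLift) {
        return liftOverUniform(confidences) >= minLift;
    }

    public static void main(String[] args) {
        // Made-up scores for illustration, not output from the model
        Map<String, Double> scores = Map.of("eng", 0.90, "fra", 0.05, "ita", 0.05);
        System.out.println("lift: " + liftOverUniform(scores) + ", confident: " + isConfident(scores, 2.0));
    }
}
```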

Return to [Table of contents](#Table-of-contents)

### Sentence Detection API


##### Show a simple example detecting sentences using a Sentence detecting model called en-sent.bin on a remote instance (powered by Valohai), from within the notebook cell using cell magic!

In [128]:
%system ./exec-step.sh "detect-sentence"

Executing step detect-sentence
Packaging /home/jovyan/work...
=>   Git not available, found 15 files to package
Uploading 26.7 kB...
😻  Success! Uploaded ad-hoc code ~bde3d0fc8f223f7c11a40de93a53fd945f2f83e87a7cf81c8303b5c2b3b54fb2
😸  Success! Execution #31 created. See https://app.valohai.com/p/neomatrix369/nlp-java-jvm-example/execution/016eaa3f-3166-8a28-f8f8-bd447a6c07e0/


In [134]:
%system ./watch-execution.sh 31

Watching counter 31
(nlp-java-jvm-example) #31                            2019-11-27T00:26:05.270291
Status: started     Step: detect-sentencCommit: ~bde3d0fc8f2           19 events
00:24:59.06  starting job on i-05915a3cc4afe54df, Peon 0.27.1                   
00:24:59.08  free scratch space: 403.4 GB (403421937664 B)                      
00:24:59.09  downloading repository (code)                                      
00:24:59.11  /valohai/inputs/apache-opennlp-jar/apache-opennlp-1.9.1-bin.tar.gz:
00:24:59.11  /valohai/inputs/java-program/DetectSentence.java: downloading https
00:24:59.11  /valohai/inputs/model/en-sent.bin: downloading http://opennlp.sourc
00:24:59.18  pulling image neomatrix369/nlp-java:0.2                            
00:24:59.30  /valohai/inputs/java-program/DetectSentence.java: downloaded, 1.1 k
00:24:59.30  /valohai/inputs/java-program/DetectSentence.java: md5 sum: 0c75a3d2
00:24:59.30  /valohai/inputs/java-program/DetectSentence.java: sha1 sum: 1cc6d7f
00:24:59

In [151]:
%system ./show-final-result.sh 31

Gathering output from counter 31
00:27:10.68 [Started...]
00:27:10.84 Sentence:   First sentence. Second sentence.
00:27:10.84 [First sentence., Second sentence.]
00:27:10.85 
00:27:10.85 [[2..17), [18..34)]
00:27:10.85 [...Finished]
00:27:11.84 container finished with return code 0, duration 3.011956
00:27:11.85 completed in 132.78 seconds


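**As you can see, the Sentence Detector API can return the detected sentences in two ways: as strings (first line) or as character-offset spans (second line). In OpenNLP these correspond to the `sentDetect` and `sentPosDetect` methods of `SentenceDetectorME`.**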
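
The span notation `[start..end)` gives character offsets into the input, start inclusive and end exclusive. The input here begins with two spaces (hence the three spaces after `Sentence:`), so the first sentence starts at offset 2. The hypothetical helper below parses the printed spans and maps them back to text; it is a stand-in for illustration, not OpenNLP's `Span` class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turn printed spans such as "[[2..17), [18..34)]" back into text.
final class CharSpans {
    // One span in the printed form "[start..end)"; any type label after it is ignored
    private static final Pattern SPAN = Pattern.compile("\\[(\\d+)\\.\\.(\\d+)\\)");

    /** Returns the substring covered by each span; start is inclusive, end exclusive. */
    static List<String> coveredText(String text, String printedSpans) {
        List<String> out = new ArrayList<>();
        Matcher m = SPAN.matcher(printedSpans);
        while (m.find()) {
            int start = Integer.parseInt(m.group(1));
            int end = Integer.parseInt(m.group(2));
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "  First sentence. Second sentence.";
        System.out.println(coveredText(text, "[[2..17), [18..34)]")); // prints [First sentence., Second sentence.]
    }
}
```

The same helper also works on the tokenizer spans in the next section.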

Return to [Table of contents](#Table-of-contents)

### Tokenizer API

##### Show a simple example of tokenization of a sentence using a Tokenizer model called en-token.bin on a remote instance (powered by Valohai), from within the notebook cell using cell magic!

In [136]:
%system ./exec-step.sh "tokenize"

Executing step tokenize
Packaging /home/jovyan/work...
=>   Git not available, found 15 files to package
Uploading 26.6 kB...
😎  Success! Uploaded ad-hoc code ~b0c9a5c78af1fffa0e76a38ffd6bd3e0ce12513d1bda34482723c2f4af35e40b
😀  Success! Execution #32 created. See https://app.valohai.com/p/neomatrix369/nlp-java-jvm-example/execution/016eaa40-982a-89e5-ec0d-ca0dd955ac16/


In [137]:
%system ./watch-execution.sh 32

Watching counter 32
(nlp-java-jvm-example) #32                            2019-11-27T00:26:41.666020
Status: started     Step: tokenize-senteCommit: ~b0c9a5c78af           19 events
00:26:30.79  starting job on i-0c865f2a1773b78fd, Peon 0.27.1                   
00:26:30.81  free scratch space: 403.4 GB (403421937664 B)                      
00:26:30.82  downloading repository (code)                                      
00:26:30.83  /valohai/inputs/java-program/Tokenize.java: downloading https://raw
00:26:30.83  /valohai/inputs/model/en-token.bin: downloading http://opennlp.sour
00:26:30.83  /valohai/inputs/apache-opennlp-jar/apache-opennlp-1.9.1-bin.tar.gz:
00:26:30.89  pulling image neomatrix369/nlp-java:0.2                            
00:26:31.02  /valohai/inputs/java-program/Tokenize.java: downloaded, 1.3 kB     
00:26:31.02  /valohai/inputs/java-program/Tokenize.java: md5 sum: 0be5e9141a98e9
00:26:31.02  /valohai/inputs/java-program/Tokenize.java: sha1 sum: 34998f62e664e
00:26:31

In [152]:
%system ./show-final-result.sh 32

Gathering output from counter 32
00:28:14.32 [Started...]
00:28:14.53 Sentence: An input sample sentence.
00:28:14.53 [An, input, sample, sentence, .]
00:28:14.53 Probabilities of each of the tokens above
00:28:14.54 1.0
00:28:14.54 1.0
00:28:14.54 1.0
00:28:14.54 0.9956236737394807
00:28:14.54 1.0
00:28:14.54 
00:28:14.54 [[0..2), [3..8), [9..15), [16..24), [24..25)]
00:28:14.54 [...Finished]
00:28:15.66 container finished with return code 0, duration 3.009422
00:28:15.67 completed in 104.88 seconds
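
`en-token.bin` is a learned tokenizer, which is why it reports a probability per token (only `sentence` scores below 1.0 here). For comparison, a purely rule-based baseline that starts a new token whenever the character class changes (letters, digits, other symbols, with whitespace as a separator) yields the same tokens and offsets for this particular sentence. The sketch below is such a baseline written for illustration; it is not how `en-token.bin` works, and it is likely to disagree with the model on harder input such as abbreviations (`Corp.`).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative rule-based tokenizer: a token ends where the character class changes.
// This is NOT the learned en-token.bin model, only a baseline for comparison.
final class CharClassTokenizer {
    private enum CharClass { WHITESPACE, LETTER, DIGIT, OTHER }

    private static CharClass classOf(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;
    }

    /** Returns token offsets as {start, end} pairs, end exclusive (like OpenNLP's spans). */
    static List<int[]> tokenize(String text) {
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            CharClass cls = classOf(text.charAt(start));
            int end = start + 1;
            // Extend the run while the character class stays the same
            while (end < text.length() && classOf(text.charAt(end)) == cls) {
                end++;
            }
            if (cls != CharClass.WHITESPACE) {
                spans.add(new int[] {start, end});
            }
            start = end;
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "An input sample sentence.";
        for (int[] s : tokenize(text)) {
            System.out.println("[" + s[0] + ".." + s[1] + ") " + text.substring(s[0], s[1]));
        }
    }
}
```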


Return to [Table of contents](#Table-of-contents)

### Name Finder API

##### Show a simple example of finding person names in sentences using a Name Finder model called en-ner-person.bin on a remote instance (powered by Valohai), from within the notebook cell using cell magic!

In [138]:
%system ./exec-step.sh "name-finder-person"

Executing step name-finder-person
Packaging /home/jovyan/work...
=>   Git not available, found 15 files to package
Uploading 27.2 kB...
🙂  Success! Uploaded ad-hoc code ~081836aba2221ea232a42568193fb470d0287c5931204ec392ab22c61ae399dc
😺  Success! Execution #33 created. See https://app.valohai.com/p/neomatrix369/nlp-java-jvm-example/execution/016eaa40-f36e-d0da-17d5-6eeb14c60181/


In [153]:
%system ./watch-execution.sh 33

Watching counter 33
(nlp-java-jvm-example) #33                            2019-11-27T00:29:59.675008
Status: complete    Step: name-finder-peCommit: ~081836aba22          572 events
00:28:28.88  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/util/
00:28:28.90  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/lemma
00:28:28.92  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/ml/ma
00:28:28.94  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/ml/pe
00:28:28.96  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/posta
00:28:28.98  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/forma
00:28:29.00  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/forma
00:28:29.02  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/forma
00:28:29.04  apache-opennlp-1.9.1/docs/apidocs/opennlp-brat-annotator/opennlp/br
00:28:29.06  apache-opennlp-1.9.1/docs/apidocs/opennlp-uima/opennlp/uima/tokeniz
00:28:29

In [154]:
%system ./show-final-result.sh 33

Gathering output from counter 33
00:28:30.05 [Started...]
00:28:30.85 Sentence: [Pierre, is, from, Paris, France.]
00:28:30.86 [[0..1) person]
00:28:30.87 Sentence: [John, is, from, London, England.]
00:28:30.87 [[0..1) person]
00:28:30.87 [...Finished]
00:28:31.28 container finished with return code 0, duration 3.009486
00:28:31.28 completed in 97.16 seconds


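**As you can see above, it has detected the person's name in both sentences: `[0..1)` is a span over token positions covering the first token (`Pierre` and `John` respectively), labelled `person`.**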
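
Unlike the sentence detector and tokenizer spans earlier, the name finder's spans index tokens rather than characters, so recovering the name means joining tokens `start` up to (not including) `end`. OpenNLP provides `Span.spansToStrings` for this; the hypothetical helper below does the same thing without the library dependency.

```java
import java.util.Arrays;

// Hypothetical helper: recover the text of a token-level span such as "[0..1) person".
final class TokenSpans {
    /** Joins tokens[start..end) with single spaces; end is exclusive. */
    static String coveredTokens(String[] tokens, int start, int end) {
        if (start < 0 || end > tokens.length || start >= end) {
            throw new IllegalArgumentException("invalid token span [" + start + ".." + end + ")");
        }
        return String.join(" ", Arrays.copyOfRange(tokens, start, end));
    }

    public static void main(String[] args) {
        String[] tokens = {"Pierre", "is", "from", "Paris", "France."};
        System.out.println("person: " + coveredTokens(tokens, 0, 1)); // prints person: Pierre
    }
}
```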

Return to [Table of contents](#Table-of-contents)

### More Name Finder API examples

There are a handful of other Name Finder models, namely:

- Name Finder Date
- Name Finder Location
- Name Finder Money
- Name Finder Organization
- Name Finder Percentage
- Name Finder Time

Their model files are named, respectively:

- en-ner-date.bin
- en-ner-location.bin
- en-ner-money.bin
- en-ner-organization.bin
- en-ner-percentage.bin
- en-ner-time.bin

They can all be found at the same location as the other models: http://opennlp.sourceforge.net/models-1.5/
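
Because the files follow the `en-ner-<type>.bin` naming pattern, their download URLs can be derived from the entity type. The sketch below (names are hypothetical) only builds URLs from the base location given above; it does not check that each file exists there.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: build download URLs for the English name finder models listed above.
final class NerModelUrls {
    static final String BASE = "http://opennlp.sourceforge.net/models-1.5/";
    static final List<String> TYPES =
            List.of("date", "location", "money", "organization", "percentage", "time");

    static String urlFor(String type) {
        return BASE + "en-ner-" + type + ".bin";
    }

    public static void main(String[] args) {
        Map<String, String> urls = new LinkedHashMap<>(); // keeps the listed order
        for (String type : TYPES) {
            urls.put(type, urlFor(type));
        }
        urls.forEach((type, url) -> System.out.println(type + " -> " + url));
    }
}
```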

Return to [Table of contents](#Table-of-contents)

### Parts of speech (POS) Tagger API

##### Show a simple example of Parts of speech tagger on a sentence using a PoS Tagger model called en-pos-maxent.bin on a remote instance (powered by Valohai), from within the notebook cell using cell magic!

In [141]:
%system ./exec-step.sh "pos-tagger"

Executing step pos-tagger
Packaging /home/jovyan/work...
=>   Git not available, found 15 files to package
😃  Success! Ad-hoc code ~081836aba2221ea232a42568193fb470d0287c5931204ec392ab22c61ae399dc already uploaded
😻  Success! Execution #34 created. See https://app.valohai.com/p/neomatrix369/nlp-java-jvm-example/execution/016eaa41-405b-d39c-041d-834d48964c65/


In [142]:
%system ./watch-execution.sh 34

Watching counter 34
(nlp-java-jvm-example) #34                            2019-11-27T00:27:17.361272
Status: started     Step: pos-tagger    Commit: ~081836aba22           14 events
00:27:13.76  starting job on i-05915a3cc4afe54df, Peon 0.27.1                   
00:27:13.78  free scratch space: 400.3 GB (400329969664 B)                      
00:27:13.79  downloading repository (code)                                      
00:27:13.80  /valohai/inputs/model/en-pos-maxent.bin: downloading http://opennlp
00:27:13.80  /valohai/inputs/java-program/PoSTagger.java: downloading https://ra
00:27:13.81  /valohai/inputs/apache-opennlp-jar/apache-opennlp-1.9.1-bin.tar.gz:
00:27:13.81  /valohai/inputs/apache-opennlp-jar/apache-opennlp-1.9.1-bin.tar.gz:
00:27:13.81  /valohai/inputs/apache-opennlp-jar/apache-opennlp-1.9.1-bin.tar.gz:
00:27:13.82  /valohai/inputs/apache-opennlp-jar/apache-opennlp-1.9.1-bin.tar.gz:
00:27:13.87  image neomatrix369/nlp-java:0.2 was found in cache                 
00:27:13

In [155]:
%system ./show-final-result.sh 34

Gathering output from counter 34
00:27:26.23 [Started...]
00:27:27.22 Sentence: [Most, large, cities, in, the, US, had, morning, and, afternoon, newspapers, .]
00:27:27.22 [JJS, JJ, NNS, IN, DT, NNP, VBD, NN, CC, NN, NNS, .]
00:27:27.23 
00:27:27.23 Probabilities of tags:
00:27:27.23 0.6005488809717314
00:27:27.23 0.9346347227057236
00:27:27.23 0.9928943439421191
00:27:27.23 0.993711911129381
00:27:27.24 0.9959619800700815
00:27:27.24 0.9632635300742168
00:27:27.24 0.96904256131942
00:27:27.24 0.936549747737236
00:27:27.24 0.9706281118634225
00:27:27.24 0.8831901977922334
00:27:27.24 0.9711019283924753
00:27:27.24 0.9931572030890747
00:27:27.24 
00:27:27.24 Tags as sequences (contains probabilities:
00:27:27.24 [-0.9196402685290461 [JJS, JJ, NNS, IN, DT, NNP, VBD, NN, CC, NN, NNS, .], -1.4538683571912276 [RBS, JJ, NNS, IN, DT, NNP, VBD, NN, CC, NN, NNS, .], -5.124416242584632 [JJS, JJ, NNS, IN, DT, PRP, VBD, NN, CC, NN, NNS, .]]
00:27:27.24 [...Finished]
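
The numbers in front of each candidate under "Tags as sequences" are sequence scores. In OpenNLP's beam search, a sequence's score is the sum of the natural logarithms of its per-tag probabilities, and the output above bears this out: the logs of the twelve probabilities listed add up to about -0.91964, the score of the best sequence. Exponentiating turns a score back into a joint probability: about 0.40 for the best sequence and about 0.23 for the runner-up, which tags `Most` as `RBS` instead of `JJS`. The sketch below reproduces that calculation; the class name is hypothetical.

```java
// Sketch: relate per-tag probabilities to a beam-search sequence score (sum of natural logs).
final class SequenceScores {
    /** Sum of the natural logarithms of the per-tag probabilities. */
    static double score(double[] probs) {
        double sum = 0.0;
        for (double p : probs) {
            sum += Math.log(p);
        }
        return sum;
    }

    /** Joint probability of a sequence, recovered from its log score. */
    static double jointProbability(double score) {
        return Math.exp(score);
    }

    public static void main(String[] args) {
        // Per-tag probabilities printed by the POS tagger run above
        double[] probs = {
            0.6005488809717314, 0.9346347227057236, 0.9928943439421191, 0.993711911129381,
            0.9959619800700815, 0.9632635300742168, 0.96904256131942, 0.936549747737236,
            0.9706281118634225, 0.8831901977922334, 0.9711019283924753, 0.9931572030890747
        };
        double s = score(probs);
        System.out.println("score: " + s + ", joint probability: " + jointProbability(s));
    }
}
```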


Return to [Table of contents](#Table-of-contents)

### Chunking API

##### Show a simple example of chunking on a sentence using a Chunker model called en-chunker.bin on a remote instance (powered by Valohai), from within the notebook cell using cell magic!

In [144]:
%system ./exec-step.sh "chunker"

Executing step chunker
Packaging /home/jovyan/work...
=>   Git not available, found 15 files to package
😃  Success! Ad-hoc code ~081836aba2221ea232a42568193fb470d0287c5931204ec392ab22c61ae399dc already uploaded
😼  Success! Execution #35 created. See https://app.valohai.com/p/neomatrix369/nlp-java-jvm-example/execution/016eaa41-9477-3290-ff15-8119f600cc24/


In [145]:
%system ./watch-execution.sh 35

Watching counter 35
(nlp-java-jvm-example) #35                            2019-11-27T00:27:40.393003
Status: started     Step: chunker       Commit: ~081836aba22           19 events
00:27:35.33  starting job on i-0ce501b6954a07890, Peon 0.27.1                   
00:27:35.36  free scratch space: 403.4 GB (403421937664 B)                      
00:27:35.37  downloading repository (code)                                      
00:27:35.38  /valohai/inputs/java-program/Chunker.java: downloading https://raw.
00:27:35.38  /valohai/inputs/apache-opennlp-jar/apache-opennlp-1.9.1-bin.tar.gz:
00:27:35.38  /valohai/inputs/model/en-chunker.bin: downloading http://opennlp.so
00:27:35.46  pulling image neomatrix369/nlp-java:0.2                            
00:27:35.56  /valohai/inputs/java-program/Chunker.java: downloaded, 1.7 kB      
00:27:35.57  /valohai/inputs/java-program/Chunker.java: md5 sum: cd797541f005bd4
00:27:35.57  /valohai/inputs/java-program/Chunker.java: sha1 sum: 6464c016f3ac7b
00:27:35

In [156]:
%system ./show-final-result.sh 35

Gathering output from counter 35
00:29:14.99 [Started...]
00:29:15.46 Sentence: [Rockwell, International, Corp., 's, Tulsa, unit, said, it, signed, a, tentative, agreement, extending, its, contract, with, Boeing, Co., to, provide, structural, parts, for, Boeing, 's, 747, jetliners, .]
00:29:15.46 
00:29:15.47 Tags chunked: [B-NP, I-NP, I-NP, B-NP, I-NP, I-NP, B-VP, B-NP, B-VP, B-NP, I-NP, I-NP, B-VP, B-NP, I-NP, B-PP, B-NP, I-NP, B-VP, I-VP, B-NP, I-NP, B-PP, B-NP, B-NP, I-NP, I-NP, O]
00:29:15.47 
00:29:15.47 Tags chunked (with probabilities): [-0.3533550124421968 [B-NP, I-NP, I-NP, B-NP, I-NP, I-NP, B-VP, B-NP, B-VP, B-NP, I-NP, I-NP, B-VP, B-NP, I-NP, B-PP, B-NP, I-NP, B-VP, I-VP, B-NP, I-NP, B-PP, B-NP, B-NP, I-NP, I-NP, O], -4.9833651782143225 [B-NP, I-NP, I-NP, B-NP, I-NP, I-NP, B-VP, B-NP, B-VP, B-NP, I-NP, I-NP, B-PP, B-NP, I-NP, B-PP, B-NP, I-NP, B-VP, I-VP, B-NP, I-NP, B-PP, B-NP, B-NP, I-NP, I-NP, O], -5.207232108117287 [B-NP, I-NP, I-NP, B-NP, I-NP, I-NP, B-VP, B-NP, B-VP, 
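
The chunker labels each token in the BIO scheme: `B-X` begins a chunk of type X, `I-X` continues it, and `O` is outside any chunk. Grouping the tags gives the phrases directly, for example `[NP Rockwell International Corp.]`, `[VP to provide]` and `[PP for]`. The sketch below does that grouping by hand; treating an `I-X` that does not follow an X chunk as the start of a new chunk is a choice made here for robustness, not documented OpenNLP behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: group BIO chunk tags into bracketed phrases.
final class BioChunks {
    static List<String> toChunks(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = null; // text of the open chunk, or null outside a chunk
        String currentType = null;
        for (int i = 0; i < tokens.length; i++) {
            String tag = tags[i];
            boolean continues = tag.startsWith("I-") && tag.substring(2).equals(currentType);
            if (!continues) {
                // Close the open chunk, if any
                if (current != null) {
                    chunks.add(current.append(']').toString());
                    current = null;
                    currentType = null;
                }
                // B-X (or a stray I-X) opens a new chunk; O leaves the token outside
                if (tag.startsWith("B-") || tag.startsWith("I-")) {
                    currentType = tag.substring(2);
                    current = new StringBuilder("[").append(currentType);
                }
            }
            if (current != null) {
                current.append(' ').append(tokens[i]);
            }
        }
        if (current != null) {
            chunks.add(current.append(']').toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String[] tokens = {"Boeing", "Co.", "to", "provide", "structural", "parts", "for"};
        String[] tags = {"B-NP", "I-NP", "B-VP", "I-VP", "B-NP", "I-NP", "B-PP"};
        // prints [[NP Boeing Co.], [VP to provide], [NP structural parts], [PP for]]
        System.out.println(toChunks(tokens, tags));
    }
}
```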

Return to [Table of contents](#Table-of-contents)

### Parsing API

##### Show a simple example of parsing chunked sentences using a Parser Chunker model called en-parser-chunking.bin on a remote instance (powered by Valohai), from within the notebook cell using cell magic!

In [1]:
%system ./exec-step.sh "parser"

Executing step parser
Packaging /home/jovyan/work...
=>   Git not available, found 15 files to package
Uploading 27.9 kB...
😸  Success! Uploaded ad-hoc code ~5fdb97fc287ff35abc100fd118818f36d778f44c47ea6d95edc3b7ed2d933875
😻  Success! Execution #41 created. See https://app.valohai.com/p/neomatrix369/nlp-java-jvm-example/execution/016eaa4a-9838-abda-0242-2c4703c96706/


In [7]:
%system ./watch-execution.sh 41

Watching counter 41
(nlp-java-jvm-example) #41                            2019-11-27T01:00:44.328579
Status: error       Step: parser        Commit: ~5fdb97fc287          588 events
00:38:37.45  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/lemma
00:38:37.47  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/ml/mo
00:38:37.49  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/ml/na
00:38:37.51  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/posta
00:38:37.53  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/forma
00:38:37.55  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/forma
00:38:37.57  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/forma
00:38:37.59  apache-opennlp-1.9.1/docs/apidocs/opennlp-tools/opennlp/tools/sentd
00:38:37.61  apache-opennlp-1.9.1/docs/apidocs/opennlp-morfologik-addon/opennlp/
00:38:37.63  apache-opennlp-1.9.1/docs/apidocs/opennlp-uima/opennlp/uima/doccat/
00:38:38

In [8]:
%system ./show-final-result.sh 41

Gathering output from counter 41
00:38:38.70 [Started...]
00:38:38.71 Exception in thread "main" java.io.FileNotFoundException: ../shared/en-parser-chunking.bin (No such file or directory)
00:38:38.71 at java.base/java.io.FileInputStream.open0(Native Method)
00:38:38.71 at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
00:38:38.71 at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
00:38:38.71 at java.base/java.io.FileInputStream.<init>(FileInputStream.java:112)
00:38:38.71 at ParserChunking.main(ParserChunking.java:14)
00:38:39.88 container finished with return code 1, duration 3.008516
00:38:39.88 completed in 73.78 seconds
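
This run failed: `ParserChunking.java` tried to open `../shared/en-parser-chunking.bin`, and no file existed at that path inside the container. The other steps received their models under `/valohai/inputs/model/` (see their logs), so the path used by the parser program is the likely suspect, although the step definition is not shown here. A defensive pattern is to check candidate locations before loading and report every path tried. The sketch below is hypothetical and only does the lookup, leaving the actual model loading to OpenNLP.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper: pick the first existing model file from a list of candidate paths.
final class ModelPaths {
    static Optional<Path> firstExisting(String... candidates) {
        for (String c : candidates) {
            Path p = Paths.get(c);
            if (Files.isRegularFile(p)) {
                return Optional.of(p);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Example candidates: the Valohai input directory used by the other steps,
        // then the relative path from the stack trace above
        String[] candidates = {
            "/valohai/inputs/model/en-parser-chunking.bin",
            "../shared/en-parser-chunking.bin"
        };
        Path model = firstExisting(candidates).orElseThrow(() ->
                new IllegalStateException("model not found; tried: " + String.join(", ", candidates)));
        System.out.println("using model: " + model);
    }
}
```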


Return to [Table of contents](#Table-of-contents)

### For more resources please refer to [Apache OpenNLP README](https://github.com/neomatrix369/nlp-java-jvm-example/blob/master/images/java/opennlp/README.md) and [Apache OpenNLP Resources](https://github.com/neomatrix369/nlp-java-jvm-example/blob/master/images/java/opennlp/README.md#resources).