#### Find out the version info of the underlying JDK/JVM on which this notebook is running

In [3]:
System.out.println("java.version: " + System.getProperty("java.version"));

java.version: 9.0.4


In [4]:
System.out.println("java.specification.version: " + System.getProperty("java.specification.version"));
System.out.println("java.runtime.version: " + System.getProperty("java.runtime.version"));

java.specification.version: 9
java.runtime.version: 9.0.4+11


In [5]:
import java.lang.management.ManagementFactory;

System.out.println("java runtime VM version: " + ManagementFactory.getRuntimeMXBean().getVmVersion());

java runtime VM version: 9.0.4+11
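
On Java 9 and later the same information is also available in structured form through `Runtime.version()`. The following is a minimal sketch and not part of the original notebook: the class name `JavaVersionDemo` and the helper `featureOf` are hypothetical. It assumes a Java 9+ runtime and a version string in the newer format such as `9.0.4+11`. An older string such as `1.8.0_181` would be rejected by `Runtime.Version.parse`.

```java
import java.util.List;

// Hypothetical helper class, for illustration only.
class JavaVersionDemo {

    // Returns the first numeric component of a Java 9+ version string,
    // e.g. 9 for "9.0.4+11".
    static int featureOf(String versionString) {
        Runtime.Version parsed = Runtime.Version.parse(versionString);
        List<Integer> numbers = parsed.version(); // e.g. [9, 0, 4]
        return numbers.get(0);
    }

    public static void main(String[] args) {
        // Version of the JVM that runs this code
        System.out.println("Runtime.version(): " + Runtime.version());
        // Version string reported by the cells above
        System.out.println("First component of 9.0.4+11: " + featureOf("9.0.4+11"));
    }
}
```

In a notebook cell the class can be declared as-is and exercised with `JavaVersionDemo.main(new String[0]);`.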


#### Import the Apache OpenNLP jar files located in the ../shared/apache-opennlp-1.9.1/lib/ folder

In [3]:
List<String> addedJars = %jars "../shared/apache-opennlp-1.9.1/lib/*.jar"

In [4]:
addedJars

[/home/jovyan/work/./../shared/apache-opennlp-1.9.1/lib/jackson-annotations-2.8.4.jar, /home/jovyan/work/./../shared/apache-opennlp-1.9.1/lib/morfologik-stemming-2.1.3.jar, /home/jovyan/work/./../shared/apache-opennlp-1.9.1/lib/grizzly-http-server-2.3.28.jar, /home/jovyan/work/./../shared/apache-opennlp-1.9.1/lib/jersey-common-2.25.jar, /home/jovyan/work/./../shared/apache-opennlp-1.9.1/lib/morfologik-fsa-2.1.3.jar, /home/jovyan/work/./../shared/apache-opennlp-1.9.1/lib/jersey-media-jaxb-2.25.jar, /home/jovyan/work/./../shared/apache-opennlp-1.9.1/lib/jersey-guava-2.25.jar, /home/jovyan/work/./../shared/apache-opennlp-1.9.1/lib/hk2-api-2.5.0-b30.jar, /home/jovyan/work/./../shared/apache-opennlp-1.9.1/lib/grizzly-http-2.3.28.jar, /home/jovyan/work/./../shared/apache-opennlp-1.9.1/lib/jackson-jaxrs-base-2.8.4.jar, /home/jovyan/work/./../shared/apache-opennlp-1.9.1/lib/opennlp-morfologik-addon-1.9.1.jar, /home/jovyan/work/./../shared/apache-opennlp-1.9.1/lib/validation-api-1.1.0.Final.jar

#### To find out more about each of the classes used in this notebook, refer to the Java API JavaDocs at https://opennlp.apache.org/docs/1.9.1/apidocs/opennlp-tools/index.html

### Language Detector API
##### Load the language detection model called langdetect-183.bin from the "../shared/" folder, and show a simple example detecting the language of a sentence

In [28]:
import opennlp.tools.langdetect.LanguageDetectorModel;
import opennlp.tools.langdetect.LanguageDetectorME;
import opennlp.tools.langdetect.LanguageDetector;
import opennlp.tools.langdetect.Language;

try (InputStream modelIn = new FileInputStream("../shared/langdetect-183.bin")) {
    LanguageDetectorModel langModel = new LanguageDetectorModel(modelIn);
    String inputText = "This is a sample text.";
    System.out.println("Sentence: " + inputText);

    // Get the most probable language
    LanguageDetector myCategorizer = new LanguageDetectorME(langModel);
    Language bestLanguage = myCategorizer.predictLanguage(inputText);
    System.out.println("Best language: " + bestLanguage.getLang());
    System.out.println("Best language confidence: " + bestLanguage.getConfidence());

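    // Get an array of all candidate languages with their confidences.
    // Note: an empty string is passed here rather than inputText, so every
    // language receives the same confidence in the output below.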
    Language[] languages = myCategorizer.predictLanguages("");
    System.out.println("");
    System.out.println("Predict languages (with confidence): " + Arrays.toString(languages));
}

Sentence: This is a sample text.
Best language: lat
Best language confidence: 0.017774467481479657

Predict languages (with confidence): [tur (0.009708737864077673), bel (0.009708737864077673), san (0.009708737864077673), ara (0.009708737864077673), mon (0.009708737864077673), tel (0.009708737864077673), sin (0.009708737864077673), pes (0.009708737864077673), min (0.009708737864077673), cmn (0.009708737864077673), aze (0.009708737864077673), fao (0.009708737864077673), ita (0.009708737864077673), ceb (0.009708737864077673), mkd (0.009708737864077673), eng (0.009708737864077673), nno (0.009708737864077673), lvs (0.009708737864077673), kor (0.009708737864077673), som (0.009708737864077673), swa (0.009708737864077673), hun (0.009708737864077673), fra (0.009708737864077673), nld (0.009708737864077673), mlt (0.009708737864077673), bak (0.009708737864077673), ekk (0.009708737864077673), ron (0.009708737864077673), gle (0.009708737864077673), hin (0.009708737864077673), est (0.009708737864077

**Apparently the model detects this sentence as Latin (lat) rather than English, and with a confidence of only about 0.018, possibly because the input is very short or because the language detection model needs more training.
See https://opennlp.apache.org/docs/1.9.1/manual/opennlp.html#tools.langdetect.training for how this can be achieved.**
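
To put the confidence values in perspective: every entry in the array printed above has the confidence 0.009708737864077673, which is numerically 1/103. That is what a uniform distribution over 103 languages would produce, and the `predictLanguages` call in the cell above is given an empty string. The sketch below compares the "lat" confidence with that baseline. The class and method names are hypothetical, and the language count of 103 is an assumption inferred from the printed value rather than read from the model.

```java
// Hypothetical helper class, for illustration only.
class ConfidenceBaselineDemo {

    // Confidence each language would receive if the model had no preference.
    static double uniformBaseline(int languageCount) {
        return 1.0 / languageCount;
    }

    // How many times larger a confidence is than the uniform baseline.
    static double liftOverBaseline(double confidence, int languageCount) {
        return confidence / uniformBaseline(languageCount);
    }

    public static void main(String[] args) {
        int assumedLanguageCount = 103;          // assumption inferred from the output above
        double bestConfidence = 0.017774467481479657; // "lat" confidence from the output above

        System.out.println("Uniform baseline: " + uniformBaseline(assumedLanguageCount));
        System.out.println("Lift of best language over baseline: "
                + liftOverBaseline(bestConfidence, assumedLanguageCount));
    }
}
```

Under that assumption the best language scores less than twice the uniform baseline, which suggests treating such a prediction as weak evidence rather than a confident result.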

### Sentence Detection API
##### Load the [en] sentence detection model called en-sent.bin from the "../shared/" folder, and show a simple example detecting the sentences in a piece of text

In [30]:
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.util.Span;

try (InputStream modelIn = new FileInputStream("../shared/en-sent.bin")) {
  SentenceModel model = new SentenceModel(modelIn);
  SentenceDetectorME sentenceDetector = new SentenceDetectorME(model);
  String sentence = "  First sentence. Second sentence. ";
  System.out.println("Sentence: " + sentence);
  String sentences[] = sentenceDetector.sentDetect(sentence);
  System.out.println(Arrays.toString(sentences));
  Span sentencesUsingSpan[] = sentenceDetector.sentPosDetect(sentence);
  System.out.println();
  System.out.println(Arrays.toString(sentencesUsingSpan));
}

Sentence:   First sentence. Second sentence. 
[First sentence., Second sentence.]

[[2..17), [18..34)]


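**As you can see, there are two ways to use the Sentence Detection API on a piece of text: sentDetect returns the sentences as strings, while sentPosDetect returns their character offsets as Span objects.**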
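
The spans printed above, `[2..17)` and `[18..34)`, are half-open character ranges into the original string: the start offset is included and the end offset is not. The following sketch uses only `String.substring` to show how those offsets map back to the sentences. The class `CharSpanDemo` and its method are hypothetical names; with OpenNLP on the classpath, `Span.getCoveredText` should serve the same purpose.

```java
// Hypothetical helper class, for illustration only.
class CharSpanDemo {

    // Returns the text covered by the half-open character range [start, end).
    static String covered(String text, int start, int end) {
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "  First sentence. Second sentence. ";
        // Offsets taken from the sentPosDetect output above
        System.out.println("[2..17)  -> '" + covered(text, 2, 17) + "'");
        System.out.println("[18..34) -> '" + covered(text, 18, 34) + "'");
    }
}
```

Note that the leading and trailing whitespace of the input falls outside both spans, which matches the trimmed strings returned by `sentDetect`.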

### Tokenizer API
##### Load the [en] Tokenizer model called en-token.bin from the ../shared folder

In [31]:
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.util.Span;
import java.util.Arrays;

try(InputStream modelIn = new FileInputStream("../shared/en-token.bin")) {
    TokenizerModel model = new TokenizerModel(modelIn);
    TokenizerME tokenizer = new TokenizerME(model);
    String sentence = "An input sample sentence.";
    System.out.println("Sentence: " + sentence);    
    String tokens[] = tokenizer.tokenize(sentence);
    System.out.println(Arrays.toString(tokens));
    double tokenProbabilities[] = tokenizer.getTokenProbabilities();
    System.out.println("Probabilities of each of the tokens above");
    Arrays.stream(tokenProbabilities).forEach(System.out::println);
    System.out.println();
    Span tokensUsingSpans[] = tokenizer.tokenizePos(sentence);
    System.out.println(Arrays.toString(tokensUsingSpans));
}

Sentence: An input sample sentence.
[An, input, sample, sentence, .]
Probabilities of each of the tokens above
1.0
1.0
1.0
0.9956236737394807
1.0

[[0..2), [3..8), [9..15), [16..24), [24..25)]


### Name Finder API

In [20]:
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.util.Span;

try (InputStream modelIn = new FileInputStream("../shared/en-ner-person.bin")) {
   TokenNameFinderModel model = new TokenNameFinderModel(modelIn);
   NameFinderME nameFinder = new NameFinderME(model);
   // The sentence has to be split into words and passed to the Name finder function
   String documents[][][] = new String[][][] {{{"Pierre","is", "from", "Paris", "France."}, {"John", "is", "from", "London", "England."}}};
   for (String document[][]: documents) {
      for (String sentence[]: document) {
          System.out.println("Sentence: " + Arrays.toString(sentence));
          Span nameSpans[] = nameFinder.find(sentence);
          System.out.println(Arrays.toString(nameSpans));
      }
      nameFinder.clearAdaptiveData();
   }
}

Sentence: [Pierre, is, from, Paris, France.]
[[0..1) person]
Sentence: [John, is, from, London, England.]
[[0..1) person]


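**As you can see above, the name of the person is detected in both sentences. Each span [0..1) refers to token positions, not characters. The place names are not tagged because the model loaded here, en-ner-person.bin, only looks for person names.**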
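
Unlike the character offsets returned by the sentence detector and tokenizer, the spans returned by `nameFinder.find` index into the token array that was passed in. The sketch below shows how the span `[0..1)` maps back to the name. `TokenSpanDemo` and `coveredTokens` are hypothetical names used for illustration; with OpenNLP available, `Span.spansToStrings(spans, tokens)` should give the same result directly.

```java
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class TokenSpanDemo {

    // Joins the tokens in the half-open token range [start, end) with spaces.
    static String coveredTokens(String[] tokens, int start, int end) {
        return String.join(" ", Arrays.copyOfRange(tokens, start, end));
    }

    public static void main(String[] args) {
        String[] sentence = {"Pierre", "is", "from", "Paris", "France."};
        // Token span [0..1) taken from the name finder output above
        System.out.println("person: " + coveredTokens(sentence, 0, 1));
    }
}
```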

### Parts of speech (POS) Tagger API
##### Load the [en] PoS model called en-pos-maxent.bin from the ../shared folder

In [36]:
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.util.Sequence;
import java.util.Arrays;

try (InputStream modelIn = new FileInputStream("../shared/en-pos-maxent.bin")) {
    POSModel model = new POSModel(modelIn);
    POSTaggerME tagger = new POSTaggerME(model);

    // The sentence has to be split into words and passed to the POS Tagger function
    String sentence[] = new String[]{"Most", "large", "cities", "in", "the", "US", "had",
                             "morning", "and", "afternoon", "newspapers", "."};
    System.out.println("Sentence: " + Arrays.toString(sentence));
    String tags[] = tagger.tag(sentence);
    System.out.println(Arrays.toString(tags));
    System.out.println();
    
    System.out.println("Probabilities of tags: ");
    double tagProbabilities[] = tagger.probs();
    Arrays.stream(tagProbabilities).forEach(System.out::println);
    System.out.println();
    
    System.out.println("Tags as sequences (with scores): ");
    Sequence topSequences[] = tagger.topKSequences(sentence);
    System.out.println(Arrays.toString(topSequences));
}

[Most, large, cities, in, the, US, had, morning, and, afternoon, newspapers, .]
[JJS, JJ, NNS, IN, DT, NNP, VBD, NN, CC, NN, NNS, .]

Probabilities of tags
0.6005488809717314
0.9346347227057236
0.9928943439421191
0.993711911129381
0.9959619800700815
0.9632635300742168
0.96904256131942
0.936549747737236
0.9706281118634225
0.8831901977922334
0.9711019283924753
0.9931572030890747

Tags as sequences
[-0.9196402685290461 [JJS, JJ, NNS, IN, DT, NNP, VBD, NN, CC, NN, NNS, .], -1.4538683571912276 [RBS, JJ, NNS, IN, DT, NNP, VBD, NN, CC, NN, NNS, .], -5.124416242584632 [JJS, JJ, NNS, IN, DT, PRP, VBD, NN, CC, NN, NNS, .]]
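
The negative number in front of each sequence is a score, not a probability. Judging from the values printed above, the score of the top sequence equals the sum of the natural logarithms of the per-tag probabilities. This is an observation about this output, not a statement about OpenNLP internals. The sketch below checks it with plain arithmetic; the class and member names are hypothetical.

```java
// Hypothetical helper class, for illustration only.
class SequenceScoreDemo {

    // Per-tag probabilities copied from the tagger.probs() output above
    static final double[] TAG_PROBABILITIES = {
        0.6005488809717314, 0.9346347227057236, 0.9928943439421191,
        0.993711911129381,  0.9959619800700815, 0.9632635300742168,
        0.96904256131942,   0.936549747737236,  0.9706281118634225,
        0.8831901977922334, 0.9711019283924753, 0.9931572030890747
    };

    // Sums the natural logs of the given probabilities.
    static double logScore(double[] probabilities) {
        double sum = 0.0;
        for (double p : probabilities) {
            sum += Math.log(p);
        }
        return sum;
    }

    public static void main(String[] args) {
        double score = logScore(TAG_PROBABILITIES);
        System.out.println("Sum of log probabilities: " + score);
        // exp(score) is the product of the per-tag probabilities
        System.out.println("As a probability: " + Math.exp(score));
    }
}
```

Read this way, the top sequence has a joint probability of roughly 0.4, and the alternatives with scores near -1.45 and -5.12 are correspondingly less likely.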


### Chunking API
##### Load the [en] Chunker model called en-chunker.bin from the ../shared folder

In [47]:
import opennlp.tools.chunker.ChunkerModel;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.util.Sequence;
import java.util.Arrays;

try (InputStream modelIn = new FileInputStream("../shared/en-chunker.bin")){
  ChunkerModel model = new ChunkerModel(modelIn);
  ChunkerME chunker = new ChunkerME(model);

  String sentence[] = new String[] { "Rockwell", "International", "Corp.", "'s",
    "Tulsa", "unit", "said", "it", "signed", "a", "tentative", "agreement",
    "extending", "its", "contract", "with", "Boeing", "Co.", "to",
    "provide", "structural", "parts", "for", "Boeing", "'s", "747",
    "jetliners", "." };

  String pos[] = new String[] { "NNP", "NNP", "NNP", "POS", "NNP", "NN",
    "VBD", "PRP", "VBD", "DT", "JJ", "NN", "VBG", "PRP$", "NN", "IN",
    "NNP", "NNP", "TO", "VB", "JJ", "NNS", "IN", "NNP", "POS", "CD", "NNS",
    "." };

  String tag[] = chunker.chunk(sentence, pos);
  double probs[] = chunker.probs();
  Sequence topSequences[] = chunker.topKSequences(sentence, pos);
  
  System.out.println("Sentence: " + Arrays.toString(sentence) + "\n");
  System.out.println("Tags chunked: " + Arrays.toString(tag) + "\n");
  System.out.println("Tags chunked (with probabilities): " + Arrays.toString(topSequences) + "\n");
}

Sentence: [Rockwell, International, Corp., 's, Tulsa, unit, said, it, signed, a, tentative, agreement, extending, its, contract, with, Boeing, Co., to, provide, structural, parts, for, Boeing, 's, 747, jetliners, .]

Tags chunked: [B-NP, I-NP, I-NP, B-NP, I-NP, I-NP, B-VP, B-NP, B-VP, B-NP, I-NP, I-NP, B-VP, B-NP, I-NP, B-PP, B-NP, I-NP, B-VP, I-VP, B-NP, I-NP, B-PP, B-NP, B-NP, I-NP, I-NP, O]

Tags chunked (with probabilities): [-0.3533550124421968 [B-NP, I-NP, I-NP, B-NP, I-NP, I-NP, B-VP, B-NP, B-VP, B-NP, I-NP, I-NP, B-VP, B-NP, I-NP, B-PP, B-NP, I-NP, B-VP, I-VP, B-NP, I-NP, B-PP, B-NP, B-NP, I-NP, I-NP, O], -4.9833651782143225 [B-NP, I-NP, I-NP, B-NP, I-NP, I-NP, B-VP, B-NP, B-VP, B-NP, I-NP, I-NP, B-PP, B-NP, I-NP, B-PP, B-NP, I-NP, B-VP, I-VP, B-NP, I-NP, B-PP, B-NP, B-NP, I-NP, I-NP, O], -5.207232108117287 [B-NP, I-NP, I-NP, B-NP, I-NP, I-NP, B-VP, B-NP, B-VP, B-NP, I-NP, I-NP, I-NP, B-NP, I-NP, B-PP, B-NP, I-NP, B-VP, I-VP, B-NP, I-NP, B-PP, B-NP, B-NP, I-NP, I-NP, O], -5.250
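
The chunk tags follow the usual B/I/O convention: `B-NP` starts a noun phrase, `I-NP` continues the current one, and `O` marks a token outside any chunk. The sketch below groups tokens into phrases from such tags, using the first nine tokens and tags of the example above. `BioChunkDemo` and `groupChunks` are hypothetical names and this is not how OpenNLP implements it; with the library available, `chunker.chunkAsSpans(sentence, pos)` should return the same grouping as spans.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class BioChunkDemo {

    // Groups tokens into "TYPE: token token ..." strings based on B-/I-/O tags.
    static List<String> groupChunks(String[] tokens, String[] tags) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = null;
        for (int i = 0; i < tokens.length; i++) {
            String tag = tags[i];
            if (tag.startsWith("B-")) {
                // A B- tag closes any open chunk and starts a new one
                if (current != null) chunks.add(current.toString());
                current = new StringBuilder(tag.substring(2)).append(": ").append(tokens[i]);
            } else if (tag.startsWith("I-") && current != null) {
                // An I- tag extends the chunk that is currently open
                current.append(' ').append(tokens[i]);
            } else {
                // "O" (or an I- tag with no open chunk) closes any open chunk;
                // the token itself is not added to a chunk
                if (current != null) chunks.add(current.toString());
                current = null;
            }
        }
        if (current != null) chunks.add(current.toString());
        return chunks;
    }

    public static void main(String[] args) {
        String[] tokens = {"Rockwell", "International", "Corp.", "'s", "Tulsa",
                           "unit", "said", "it", "signed"};
        String[] tags = {"B-NP", "I-NP", "I-NP", "B-NP", "I-NP",
                         "I-NP", "B-VP", "B-NP", "B-VP"};
        groupChunks(tokens, tags).forEach(System.out::println);
    }
}
```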

### Parsing API
##### Load the [en] Parsing model called en-parser-chunking.bin from the ../shared folder

In [68]:
import opennlp.tools.parser.Parse;
import opennlp.tools.parser.Parser;
import opennlp.tools.parser.ParserModel;
import opennlp.tools.parser.ParserFactory;
import opennlp.tools.cmdline.parser.ParserTool; 
import java.util.Arrays;

System.out.println("Started...");
try (InputStream modelIn = new FileInputStream("../shared/en-parser-chunking.bin")){

  ParserModel model = new ParserModel(modelIn);
  Parser parser = ParserFactory.create(model);

  String sentence = "The quick brown fox jumps over the lazy dog.";
  Parse topParses[] = ParserTool.parseLine(sentence, parser, 1);
  
  System.out.println("Sentence: " + sentence + "\n");
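  // Printing a Parse directly shows only the text it covers, which is why the
  // output below repeats the sentence; call Parse.show() on a parse to print
  // its bracketed parse tree instead.
  Arrays.stream(topParses).forEach(System.out::println);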
}
System.out.println("...Finished");

Started...
Sentence: The quick brown fox jumps over the lazy dog.

The quick brown fox jumps over the lazy dog.
...Finished
