
Apache OpenNLP

Run a Docker container with Apache OpenNLP (written in Java), running either on a traditional Java 11 JDK (from OpenJDK or another source) or on GraalVM (19.3.0 or higher).

Find out more about Natural Language Processing in the NLP section.

The container can start in either traditional JDK mode or GraalVM mode; GraalVM mode is the default.

Goals

  • Run the Docker container containing the Apache OpenNLP tool by using ./docker-runner --runContainer
  • Run in GraalVM mode (the default) inside the Docker container by using switchToGraal at the prompt (GraalVM is a polyglot JVM; this is the GraalVM JDK Community Edition from Oracle Labs)
    • To run traditional Java 11 instead, use switchTo11 at the prompt (optional)
  • Run a number of NLP actions to explore the Apache OpenNLP tool, as shown in the Exploring NLP concepts section below
  • Run the Docker container containing the Apache OpenNLP tool in notebook mode by using ./docker-runner --notebookMode --runContainer
  • Perform NLP actions similar to the ones in the Exploring NLP concepts section
    • Exploring the Apache OpenNLP Java APIs via the notebook directly
    • Exploring the Apache OpenNLP Java APIs via the notebook with the help of remote cloud services
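After switching runtimes with switchToGraal or switchTo11, one quick way to confirm which JVM is active is to inspect its system properties. The class below is a hypothetical helper, not part of this image, and it assumes that GraalVM builds mention "GraalVM" in java.vm.name or java.vendor.version, which may not hold for every distribution.

```java
import java.util.Locale;

/** Hypothetical helper: reports whether the current JVM looks like GraalVM. */
final class RuntimeCheck {

    // Assumption: GraalVM builds include "GraalVM" in at least one of these two properties.
    static boolean looksLikeGraal(String vmName, String vendorVersion) {
        String combined = (vmName == null ? "" : vmName) + " "
                + (vendorVersion == null ? "" : vendorVersion);
        return combined.toLowerCase(Locale.ROOT).contains("graalvm");
    }

    public static void main(String[] args) {
        String vmName = System.getProperty("java.vm.name");
        String vendorVersion = System.getProperty("java.vendor.version"); // may be null
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.vm.name = " + vmName);
        System.out.println(looksLikeGraal(vmName, vendorVersion)
                ? "Running on GraalVM" : "Running on a traditional JDK");
    }
}
```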

Exploring NLP concepts

Detecting language

Detect the language of a single line of text or an article (see the legend of language abbreviations used).

See Detecting Language
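To make the input and output concrete, here is a deliberately simple stand-in that guesses a language by counting common stopwords. This is not how OpenNLP works (its LanguageDetectorME relies on a trained model). The stopword lists and class name are hypothetical, and the three-letter codes are written in ISO 639-3 style, which we assume matches the legend.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy language guesser based on stopword overlap (not OpenNLP's model-based detector). */
final class StopwordLanguageGuesser {

    // Hypothetical, tiny stopword lists keyed by ISO 639-3 style codes.
    private static final Map<String, Set<String>> STOPWORDS = Map.of(
            "eng", Set.of("the", "and", "is", "of", "to", "in", "it"),
            "fra", Set.of("le", "la", "et", "est", "de", "les", "un"),
            "deu", Set.of("der", "die", "und", "ist", "das", "nicht", "ein"));

    // ISO 639 code for "undetermined", returned when no stopword matches.
    static final String UNDETERMINED = "und";

    /** Returns the code whose stopwords occur most often in the text. */
    static String guess(String text) {
        String[] words = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        String best = UNDETERMINED;
        int bestScore = 0;
        for (Map.Entry<String, Set<String>> entry : STOPWORDS.entrySet()) {
            int score = 0;
            for (String word : words) {
                if (entry.getValue().contains(word)) score++;
            }
            if (score > bestScore) { // ties keep the first language seen (map order is unspecified)
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(guess("The cat is on the mat and it sleeps"));
    }
}
```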

Detecting sentences

Detect sentences in a single line of text or an article.

See Detecting sentences
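For comparison, the JDK ships a rule-based sentence splitter, java.text.BreakIterator. The sketch below uses it in place of OpenNLP's trained SentenceDetectorME, so results may differ, for instance around abbreviations; the class name is hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Splits text into sentences with the JDK's BreakIterator (not OpenNLP's SentenceDetectorME). */
final class JdkSentenceSplitter {

    static List<String> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        // Each pair of consecutive boundaries delimits one sentence (with trailing spaces).
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) result.add(sentence);
        }
        return result;
    }

    public static void main(String[] args) {
        sentences("OpenNLP is a toolkit. It detects sentences! Does it handle questions?")
                .forEach(System.out::println);
    }
}
```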

Finding names

Find person names, organization names, dates, times, monetary amounts, locations and percentages in a single line of text or an article.

See Finding names
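OpenNLP's NameFinderME finds names with a trained statistical model. A much simpler technique, shown here only to illustrate the output shape (token spans labelled with an entity type), is a dictionary (gazetteer) lookup that prefers the longest match. The gazetteer entries and class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Toy gazetteer-based name finder; OpenNLP's NameFinderME uses a trained model instead. */
final class GazetteerNameFinder {

    /** A found entity covering tokens [start, end) with its type, e.g. "person". */
    static final class Found {
        final int start;
        final int end;
        final String type;

        Found(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Hypothetical gazetteer: multi-token entries are joined by single spaces.
    private static final Map<String, String> GAZETTEER = Map.of(
            "Pierre Vinken", "person",
            "Apache Software Foundation", "organization",
            "London", "location");

    private static final int MAX_ENTRY_TOKENS = 3;

    /** Scans left to right, taking the longest gazetteer match at each position. */
    static List<Found> find(String[] tokens) {
        List<Found> found = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int matchedEnd = -1;
            String matchedType = null;
            for (int len = Math.min(MAX_ENTRY_TOKENS, tokens.length - i); len >= 1; len--) {
                String candidate = String.join(" ", Arrays.copyOfRange(tokens, i, i + len));
                String type = GAZETTEER.get(candidate);
                if (type != null) {
                    matchedEnd = i + len;
                    matchedType = type;
                    break;
                }
            }
            if (matchedEnd != -1) {
                found.add(new Found(i, matchedEnd, matchedType));
                i = matchedEnd; // skip past the matched entity
            } else {
                i++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String[] tokens = {"Pierre", "Vinken", "joined", "the", "Apache", "Software",
                "Foundation", "in", "London", "."};
        for (Found f : find(tokens)) {
            System.out.println(f.type + ": "
                    + String.join(" ", Arrays.copyOfRange(tokens, f.start, f.end)));
        }
    }
}
```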

Tokenise

Tokenise a line of text or an article into its smaller components (i.e. words, punctuation, numbers).

See Tokenise
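A minimal sketch of rule-based tokenisation splits text wherever the character class changes (letters, digits, whitespace, punctuation). This is a simplified illustration, not the behaviour of OpenNLP's trained TokenizerME; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy character-class tokenizer; a sketch, not OpenNLP's TokenizerME. */
final class CharClassTokenizer {

    private enum CharClass { SPACE, LETTER, DIGIT, OTHER }

    private static CharClass classOf(char c) {
        if (Character.isWhitespace(c)) return CharClass.SPACE;
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;
    }

    /** Starts a new token whenever the class changes; each punctuation mark is its own token. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        CharClass previous = CharClass.SPACE;
        for (char c : text.toCharArray()) {
            CharClass cls = classOf(c);
            boolean boundary = cls != previous || cls == CharClass.OTHER;
            if (boundary && current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            if (cls != CharClass.SPACE) current.append(c); // whitespace is dropped
            previous = cls;
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, world 42!"));
    }
}
```

With these rules "it’s" becomes three tokens (it, ’, s), one of the cases a trained tokenizer may handle better.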

Parser

Parse a line of text or an article and identify groups of words or phrases that go together (see the Penn Treebank tag set for a legend of the token types).

See Parser
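Parser output is commonly printed as a bracketed Penn Treebank-style tree such as (TOP (S (NP ...) (VP ...))). Assuming that format, the hypothetical reader below walks the brackets recursively and collects the words under every constituent with a given label, for example all noun phrases (NP).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for bracketed parses like "(TOP (S (NP (DT The) (NN dog)) (VP (VBZ barks))))". */
final class BracketedParseReader {

    private final String s;
    private int pos;

    private BracketedParseReader(String s) {
        this.s = s;
    }

    /** Returns "LABEL: words" for each constituent labelled {@code wanted}, in closing-bracket order. */
    static List<String> phrases(String bracketed, String wanted) {
        List<String> out = new ArrayList<>();
        new BracketedParseReader(bracketed).readNode(wanted, out);
        return out;
    }

    /** Reads "(LABEL child ...)" starting at pos and returns the words the node covers. */
    private List<String> readNode(String wanted, List<String> out) {
        skipSpaces();
        expect('(');
        String label = readAtom();
        List<String> words = new ArrayList<>();
        skipSpaces();
        while (pos < s.length() && s.charAt(pos) != ')') {
            if (s.charAt(pos) == '(') {
                words.addAll(readNode(wanted, out)); // nested constituent
            } else {
                words.add(readAtom());               // leaf word, e.g. "dog" in "(NN dog)"
            }
            skipSpaces();
        }
        expect(')');
        if (label.equals(wanted)) out.add(label + ": " + String.join(" ", words));
        return words;
    }

    private String readAtom() {
        skipSpaces();
        int start = pos;
        while (pos < s.length() && !Character.isWhitespace(s.charAt(pos))
                && s.charAt(pos) != '(' && s.charAt(pos) != ')') pos++;
        return s.substring(start, pos);
    }

    private void skipSpaces() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
    }

    private void expect(char c) {
        if (pos >= s.length() || s.charAt(pos) != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at position " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        String tree = "(TOP (S (NP (DT The) (JJ quick) (NN fox)) (VP (VBD jumped)"
                + " (PP (IN over) (NP (DT the) (NN dog))))))";
        phrases(tree, "NP").forEach(System.out::println);
    }
}
```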

Tag Parts of Speech

Tag the part of speech of each token in a line of text or an article (see the Penn Treebank tag set for a legend of the token types); also see https://nlp.stanford.edu/software/tagger.shtml.

See Tag Parts of Speech
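A trained tagger such as OpenNLP's POSTaggerME chooses tags from context. The toy baseline below only looks up a tiny hypothetical lexicon and falls back to rough suffix rules, but it uses Penn Treebank tags and prints word_TAG pairs, the format we assume the OpenNLP command-line tagger uses, so the outputs are easy to compare.

```java
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

/** Toy lexicon-plus-suffix POS tagger with Penn Treebank tags (not OpenNLP's POSTaggerME). */
final class LexiconPosTagger {

    // Hypothetical mini lexicon; a real tagger learns tags from annotated text.
    private static final Map<String, String> LEXICON = Map.of(
            "the", "DT", "a", "DT", "is", "VBZ", "on", "IN", "and", "CC", ".", ".", ",", ",");

    static String tag(String token) {
        if (token.isEmpty()) return "NN";
        String lower = token.toLowerCase(Locale.ROOT);
        String known = LEXICON.get(lower);
        if (known != null) return known;
        if (token.matches("\\d+([.,]\\d+)?")) return "CD";        // cardinal number
        if (lower.endsWith("ing")) return "VBG";                  // gerund / present participle
        if (lower.endsWith("ed")) return "VBD";                   // past tense (rough guess)
        if (lower.endsWith("ly")) return "RB";                    // adverb
        if (Character.isUpperCase(token.charAt(0))) return "NNP"; // proper noun
        if (lower.endsWith("s")) return "NNS";                    // plural noun
        return "NN";                                              // default: singular noun
    }

    /** Formats tokens as space-separated "word_TAG" pairs. */
    static String tagLine(String[] tokens) {
        StringJoiner joined = new StringJoiner(" ");
        for (String t : tokens) joined.add(t + "_" + tag(t));
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(tagLine(new String[] {"The", "cat", "is", "sleeping", "on", "2", "mats", "."}));
    }
}
```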

Chunking

Text chunking divides a text or an article into syntactically correlated groups of words, such as noun groups and verb groups. Chunking is applied to text that has already been tagged by the PoS tagger. Also see https://nlpforhackers.io/text-chunking/.

See Chunking
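A chunker typically labels each token with a B-/I-/O tag: B-NP begins a noun phrase, I-NP continues it, and O marks tokens outside any chunk. Assuming tags in that form (OpenNLP's ChunkerME returns tags like these), the hypothetical helper below groups the tokens into bracketed chunks for easier reading.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups tokens into chunks from B-/I-/O tags such as "B-NP", "I-NP", "O". */
final class BioChunkGrouper {

    /** Returns chunks rendered like "[NP The quick fox]"; tokens tagged "O" are left out. */
    static List<String> group(String[] tokens, String[] chunkTags) {
        if (tokens.length != chunkTags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = null;
        String currentType = null;
        for (int i = 0; i < tokens.length; i++) {
            String tag = chunkTags[i];
            boolean continues = tag.startsWith("I-") && current != null
                    && tag.substring(2).equals(currentType);
            if (!continues) {
                if (current != null) chunks.add(current.append(']').toString()); // close open chunk
                current = null;
                currentType = null;
                if (tag.startsWith("B-") || tag.startsWith("I-")) { // a stray I- also opens a chunk
                    currentType = tag.substring(2);
                    current = new StringBuilder("[").append(currentType);
                }
            }
            if (current != null) current.append(' ').append(tokens[i]);
        }
        if (current != null) chunks.add(current.append(']').toString());
        return chunks;
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "quick", "fox", "jumped", "over", "the", "dog", "."};
        String[] tags = {"B-NP", "I-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP", "O"};
        group(tokens, tags).forEach(System.out::println);
    }
}
```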

Scripts provided

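Go to the previous folder to find the following scripts:

  • detectLanguage.sh
  • detectSentence.sh
  • nameFinder.sh
  • tokenizer.sh
  • parser.sh
  • posTagger.sh
  • chunker.sh

All of the above scripts check whether the respective model(s) exist and, if they do not, download them into the shared folder.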
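The scripts themselves are shell scripts. For readers who want the same check-then-download behaviour from Java code, here is a minimal sketch using the Java 11 HttpClient. The class name, folder and example file name are hypothetical, and the model URL is whatever the corresponding script uses. It downloads to a temporary .part file first, so an interrupted download is never mistaken for a complete model.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch: download a model only if it is not already in the shared folder. */
final class ModelCache {

    /** Returns the local model path, downloading from {@code source} only when it is missing. */
    static Path ensureModel(Path sharedFolder, String fileName, URI source)
            throws IOException, InterruptedException {
        Path target = sharedFolder.resolve(fileName);
        if (Files.exists(target)) {
            return target; // already downloaded: reuse it, no network access
        }
        Files.createDirectories(sharedFolder);
        Path partial = sharedFolder.resolve(fileName + ".part");
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<Path> response = client.send(
                HttpRequest.newBuilder(source).GET().build(),
                HttpResponse.BodyHandlers.ofFile(partial));
        if (response.statusCode() != 200) {
            Files.deleteIfExists(partial);
            throw new IOException("Download failed with HTTP " + response.statusCode());
        }
        // Move into place only after a complete download.
        Files.move(partial, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.out.println("usage: java ModelCache.java <shared-folder> <file-name> <model-url>");
            return;
        }
        System.out.println(ensureModel(Path.of(args[0]), args[1], URI.create(args[2])));
    }
}
```

For example, java ModelCache.java /path/to/shared en-token.bin <model URL> prints the local path and skips the download on later runs.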

Exploring NLP concepts from inside a Java-based Jupyter notebook

See Exploring NLP concepts from inside a Java-based Jupyter notebook

Docker image on Docker Hub (optional)

Find the NLP Java/JVM Docker Image on Docker Hub. The docker-runner.sh --pushImageToHub script pushes the image to Docker Hub, and the docker-runner.sh --runContainer script runs it from the local repository. If the image is absent from the local repository, it is downloaded from Docker Hub.

Resources

IJava (Jupyter interpreter)

Jupyhai

Apache OpenNLP

Other NLP Java/JVM libraries

Awesome AI/ML/DL resources

Other related posts

Contributing

Contributions are very welcome, please share back with the wider community (and get credited for it)!

Please have a look at the CONTRIBUTING guidelines, and also read about our licensing policy.


Back to main page (table of contents)
