A Java-based surface realiser for Natural Language Generation in Dutch, based on SimpleNLG (https://github.com/simplenlg/simplenlg)

rfdj/SimpleNLG-NL

SimpleNLG-NL

SimpleNLG-NL is a surface realiser for Natural Language Generation in Dutch. It is based on version 1.1 of the bilingual SimpleNLG-EnFr. Because of that basis, it can be used for all three languages: English, French and Dutch.

The original SimpleNLG is a Java library developed by Ehud Reiter, Albert Gatt and Dave Westwater of the University of Aberdeen.

The Dutch version contains multiple lexicons based on Wiktionary data. The largest lexicon has 79,438 entries. The default lexicon is reduced to the 8,601 entries that match the 10,000 most common words in a word frequency list. An even smaller lexicon of 3,387 entries is also provided.

SimpleNLG-NL was developed as part of the master's thesis of Ruud de Jong. The thesis describing the process can be found in the theses repository of the University of Twente.

Usage

To use this library, you have three options: cloning this repo, downloading the JAR release file, or importing it with Maven using JitPack. To use JitPack, add the following repository and dependency to your POM file:

    <repositories>
        <repository>
            <id>jitpack.io</id>
            <url>https://jitpack.io</url>
        </repository>
    </repositories>
    <dependencies>
        <dependency>
            <groupId>com.github.rfdj</groupId>
            <artifactId>SimpleNLG-NL</artifactId>
            <version>1.1</version>
        </dependency>
    </dependencies>

The API is intentionally kept close to that of SimpleNLG-EnFr, which in turn is based on SimpleNLG.

A basic tutorial can be found in the wiki for SimpleNLG-NL (based on the SimpleNLG wiki).

One noteworthy addition is the DutchFeature.PREVERB feature. Separable Complex Verbs (SCVs) can be split into a preverb and a main verb (e.g. vrijkomen is split into vrij and komen). SimpleNLG-NL tries to detect SCVs automatically. If detection fails, the user can either set the feature on the verb or mark the split with a pipe in the verb input string, e.g. factory.createVerbPhrase("vrij|komen").
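To make the pipe notation concrete, here is a minimal, self-contained sketch of how an input such as "vrij|komen" separates into a preverb and a main verb. It is not SimpleNLG-NL's own code and does not call the library. The class name ScvSplitDemo and the method splitOnPipe are hypothetical names chosen for illustration, and treating an input with nothing on one side of the pipe as "no split" is an assumption of this sketch.

```java
import java.util.Optional;

// Illustrative sketch only: NOT SimpleNLG-NL's implementation.
// ScvSplitDemo and splitOnPipe are hypothetical names.
final class ScvSplitDemo {

    /**
     * Splits a verb string at the first pipe character.
     * Returns {preverb, mainVerb}, or empty if the input has no pipe
     * or has nothing on one side of it (an assumption of this sketch).
     */
    static Optional<String[]> splitOnPipe(String input) {
        int pipe = input.indexOf('|');
        if (pipe <= 0 || pipe == input.length() - 1) {
            return Optional.empty();
        }
        String preverb = input.substring(0, pipe);
        String mainVerb = input.substring(pipe + 1);
        return Optional.of(new String[] {preverb, mainVerb});
    }

    public static void main(String[] args) {
        // "vrij|komen" marks vrij as the preverb and komen as the main verb.
        splitOnPipe("vrij|komen").ifPresent(parts ->
                System.out.println("preverb=" + parts[0] + ", verb=" + parts[1]));

        // Without a pipe there is no explicit split to report.
        System.out.println(splitOnPipe("komen").isPresent());
    }
}
```

In the library itself the split is requested through the verb input string, as in factory.createVerbPhrase("vrij|komen"), or through the DutchFeature.PREVERB feature; the sketch only mirrors the notation.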

License

SimpleNLG-NL is licensed under the MPL. The Dutch lexicons are based on data from Wiktionary.org, which is licensed under the GNU Free Documentation License and CC BY-SA 3.0.