WordGen Library - Configurable word and name generator and language scanner

lume115/wordgen

WordGen Library (random word and name generator + language scanner)

The WordGen library allows creating random words (and names) by providing simple text files or rules for syllables and/or characters.

Usage

The library is located in the wgen-lib module. The main parts of this library are:

  • a parser and word generator (WordGenParser, WordGen)

  • and a text/language scanner (Scanner).

The Scanner is used to scan text files for words and to generate syllable rules (= how syllables and characters can be concatenated together). These rules are used by WordGenParser and WordGen to generate random words and/or names.

Parser

To use the word generator, the class WordGen has to be instantiated. The easiest method is to provide a rules file (see Syllable Rules below) and load it with the WordGenParser.

To create a new instance of WordGenParser, it’s recommended to use the provided WordGenParser.Builder.

With WordGenParser instantiated, it’s easy to parse a rules file and create an instance of WordGen (the word generator itself):

public WordGen fromFile(final File file) throws IOException
public WordGen fromFile(final File file, final long seed) throws IOException

Generator

Once this is done, random words are generated with the following method of the WordGen class:

public String nextWord(final int minLength, final int maxLength)

where minLength and maxLength define the minimum and maximum length of a word (both inclusive). NOTE: the length of a word is not measured in characters but in the number of syllable rules used.

Scanner

To generate rules for a specific language, text files can be scanned by a Scanner. The more words a text file provides, the more accurate the rules become.

To instantiate a new Scanner a Scanner.Builder is provided:

final Scanner scan = new Scanner.Builder().build();

To scan files for words scanFilesForWords is used. This method returns a Set of words found in all provided text files.

public TreeSet<String> scanFilesForWords(final File...files) throws IOException

To generate syllable rules (see below) for the provided text files, scanFilesForRules is used. This method returns a List of generated syllable rules, one rule per entry.

public List<String> scanFilesForRules(final File...files) throws IOException
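To make the idea of collecting words from text files more concrete, here is a minimal, self-contained sketch. It is not the library's Scanner: the class WordScanSketch, the methods collectWords and collectWordsFromFiles, and the tokenization (split on anything that is not a letter, lower-case everything) are assumptions chosen for illustration only.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.TreeSet;

/** Hypothetical sketch of word collection; the real Scanner may tokenize differently. */
final class WordScanSketch {

    /** Splits text on every run of non-letter characters and returns the sorted, unique words. */
    static TreeSet<String> collectWords(final String text) {
        final TreeSet<String> words = new TreeSet<>();
        for (final String token : text.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                words.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    /** Reads every file as UTF-8 and merges the words of all files into one set. */
    static TreeSet<String> collectWordsFromFiles(final File... files) throws IOException {
        final TreeSet<String> words = new TreeSet<>();
        for (final File file : files) {
            final byte[] bytes = Files.readAllBytes(file.toPath());
            words.addAll(collectWords(new String(bytes, StandardCharsets.UTF_8)));
        }
        return words;
    }

    public static void main(final String[] args) throws IOException {
        // Write a small sample text to a temporary file and scan it.
        final Path sample = Files.createTempFile("words", ".txt");
        sample.toFile().deleteOnExit();
        Files.write(sample, "The quick fox. The lazy dog!".getBytes(StandardCharsets.UTF_8));
        System.out.println(collectWordsFromFiles(sample.toFile()));
    }
}
```

A sorted set keeps the result free of duplicates and in a stable order, which matches the TreeSet return type of scanFilesForWords.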

Syllable Rules

To generate random words syllables have to be defined. These syllables need to be put into a simple text file. Only one syllable rule per line is allowed. Rules can be defined manually or by using a Scanner (see above).

Valid syllable rules consist of a position rule (+, -, = or nothing) and an array (list) of syllables/characters to be chosen from. The word generator randomly picks a syllable from the given list. Some examples:

  • +[a,e,i,o,u]

  • [a,e,i,o,u]

  • -[as,is,es,os,us]

Position rules define the allowed position of the syllable in a word:

  • + only at the end of a word

  • - only at the start of a word

  • = everywhere

  • no position rule means: this syllable can only occur inside a word (not allowed at start or end of a word)
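The position rules above can be illustrated with a small parser sketch. It only mirrors the syntax described in this section and is not the library's WordGenParser; the class RuleLineSketch and the enum Position are hypothetical names, and anything after the closing bracket is ignored here.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: splits a rule such as "+[m,n]" into its position and syllable list. */
final class RuleLineSketch {

    /** Where a rule may be used inside a word (names are illustrative only). */
    enum Position { START, INSIDE, END, EVERYWHERE }

    final Position position;
    final List<String> syllables;

    RuleLineSketch(final Position position, final List<String> syllables) {
        this.position = position;
        this.syllables = syllables;
    }

    static RuleLineSketch parse(final String line) {
        final String rule = line.trim();
        final int open = rule.indexOf('[');
        final int close = rule.indexOf(']', open);
        if (open < 0 || close < 0) {
            throw new IllegalArgumentException("Not a syllable rule: " + line);
        }
        // Everything before '[' is the position rule: "-", "+", "=" or nothing.
        final String prefix = rule.substring(0, open);
        final Position position;
        switch (prefix) {
            case "-": position = Position.START; break;
            case "+": position = Position.END; break;
            case "=": position = Position.EVERYWHERE; break;
            case "":  position = Position.INSIDE; break;
            default: throw new IllegalArgumentException("Unknown position rule: " + prefix);
        }
        // The text between the brackets is a comma-separated syllable list.
        final List<String> syllables = Arrays.asList(rule.substring(open + 1, close).split(","));
        return new RuleLineSketch(position, syllables);
    }

    public static void main(final String[] args) {
        for (final String line : new String[] {"+[a,e,i,o,u]", "[a,e,i,o,u]", "-[as,is,es,os,us]"}) {
            final RuleLineSketch rule = parse(line);
            System.out.println(line + " -> " + rule.position + " " + rule.syllables);
        }
    }
}
```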

Example:

-[a,e,i,o,u]
[ka]
[ga]
[rra,tta]
+[m,n]

In this example every word begins with a vowel, because only the rule -[a,e,i,o,u] is allowed to start a word. The word generator randomly picks a vowel from the list. For all middle parts of the word the syllable rules [ka], [ga] and [rra,tta] are relevant: every time a syllable is needed that is neither the start nor the end of the word, one of these three rules is picked at random. A word can only end with m or n, because only the rule +[m,n] is allowed to end a word.

Some output examples using these rules:

  • A

  • Ekakam

  • Igarran

  • Orragakan

  • Un
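The example rules and outputs above can be reproduced with a short, self-contained sketch. It hard-codes the five example rules instead of using the library, and the class ExampleRulesSketch is a hypothetical name. It assumes that a word of length 1 consists of a start syllable only, that a word of length 2 is a start plus an end syllable, and that the first letter is capitalized, which is what the sample outputs suggest; the real WordGen may behave differently.

```java
import java.util.Random;

/** Hypothetical sketch of a generator for the example rules; not the library's WordGen. */
final class ExampleRulesSketch {

    private static final String[] START = {"a", "e", "i", "o", "u"};           // -[a,e,i,o,u]
    private static final String[][] INSIDE = {{"ka"}, {"ga"}, {"rra", "tta"}}; // [ka], [ga], [rra,tta]
    private static final String[] END = {"m", "n"};                            // +[m,n]

    private final Random random;

    ExampleRulesSketch(final long seed) {
        this.random = new Random(seed);
    }

    private String pick(final String[] syllables) {
        return syllables[random.nextInt(syllables.length)];
    }

    /** Length is counted in syllable rules used, not in characters. */
    String nextWord(final int minLength, final int maxLength) {
        if (minLength < 1 || maxLength < minLength) {
            throw new IllegalArgumentException("Need 1 <= minLength <= maxLength");
        }
        final int length = minLength + random.nextInt(maxLength - minLength + 1);
        final StringBuilder word = new StringBuilder(pick(START));
        // Every position that is neither start nor end uses a randomly chosen inside rule.
        for (int i = 0; i < length - 2; i++) {
            word.append(pick(INSIDE[random.nextInt(INSIDE.length)]));
        }
        if (length >= 2) {
            word.append(pick(END));
        }
        word.setCharAt(0, Character.toUpperCase(word.charAt(0)));
        return word.toString();
    }

    public static void main(final String[] args) {
        final ExampleRulesSketch generator = new ExampleRulesSketch(42L);
        for (int i = 0; i < 5; i++) {
            System.out.println(generator.nextWord(1, 5));
        }
    }
}
```

Because length counts rules rather than characters, a word built from three rules can have anywhere from four characters (Akam) to five (Arram).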

Expressions And Flags

Syllable rules can additionally hold expressions and flags. Expressions define specific behaviours of syllable rules (a kind of condition). A flag has no functionality by itself, but flags are needed for some expressions. Expressions and flags are appended to a syllable rule, separated by spaces.

Consonant and vowel constraints

  • +c, +v and +n: The next (+) syllable needs to start with a consonant (c) or a vowel (v) or a number (n)

  • -c, -v and -n: The previous (-) syllable needs to end with a consonant (c) or a vowel (v) or a number (n)

The word generator comes with a small set of vowels and numbers:

    public static final String VOWELS = "aeiouyäöüáéíóúýàèìòùỳâêîôûŷ";

    public static final String NUMBERS = "0123456789";

They can be extended by using the following methods that are provided by the Builder:

    public Builder setVowels(final String vowels)

    public Builder setNumbers(final String numbers)

All other characters are treated as consonants.
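The classification described above can be sketched in a few lines. This is an illustration, not the library's code: the class CharClassSketch and its methods are hypothetical, and the vowel set is shortened to the plain vowels plus three umlauts (written as Unicode escapes) instead of the full default set.

```java
/** Hypothetical sketch of the vowel / number / consonant split used by the c, v and n constraints. */
final class CharClassSketch {

    // Shortened vowel set: a, e, i, o, u, y plus a, o and u with umlaut (as Unicode escapes).
    static final String VOWELS = "aeiouy\u00e4\u00f6\u00fc";
    static final String NUMBERS = "0123456789";

    enum Kind { VOWEL, NUMBER, CONSONANT }

    /** Everything that is neither a vowel nor a number counts as a consonant. */
    static Kind classify(final char c) {
        final char lower = Character.toLowerCase(c);
        if (VOWELS.indexOf(lower) >= 0) {
            return Kind.VOWEL;
        }
        if (NUMBERS.indexOf(lower) >= 0) {
            return Kind.NUMBER;
        }
        return Kind.CONSONANT;
    }

    /** Check in the spirit of +c, +v and +n: does the next syllable start with the given kind? */
    static boolean nextStartsWith(final String nextSyllable, final Kind kind) {
        return !nextSyllable.isEmpty() && classify(nextSyllable.charAt(0)) == kind;
    }

    /** Check in the spirit of -c, -v and -n: does the previous syllable end with the given kind? */
    static boolean previousEndsWith(final String previousSyllable, final Kind kind) {
        return !previousSyllable.isEmpty()
                && classify(previousSyllable.charAt(previousSyllable.length() - 1)) == kind;
    }

    public static void main(final String[] args) {
        System.out.println("k -> " + classify('k'));
        System.out.println("a -> " + classify('a'));
        System.out.println("7 -> " + classify('7'));
        System.out.println("'ka' starts with consonant: " + nextStartsWith("ka", Kind.CONSONANT));
    }
}
```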

Other expressions and flags

  • +accept(a,b), -accept(a,b): the next (+) syllable has to start with a or b, or the previous (-) syllable has to end with a or b.

  • +minlen(5), +maxlen(5): minimum or maximum length (in characters) of the current word plus the next syllable.

  • -minlen(5), -maxlen(5): minimum or maximum length (in characters) of the current word.

  • -flag(A,B): the previous syllable rule needs to contain the flags A and B.

  • +flag(A,B): the next syllable rule needs to contain the flags A and B.

  • +noRepeat: the next (+) syllable must not be the same as the current one.

  • #A: set the flag A for the current syllable rule
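The expressions in this list can be read as simple predicates on the neighbouring syllable or rule. The following sketch spells a few of them out under that reading; the class ExpressionCheckSketch and all method names are hypothetical, and the library may evaluate these expressions differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical predicates illustrating some expressions; not the library's implementation. */
final class ExpressionCheckSketch {

    /** accept with "+": the next syllable has to start with one of the given strings. */
    static boolean acceptNext(final String next, final List<String> allowed) {
        for (final String prefix : allowed) {
            if (next.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** accept with "-": the previous syllable has to end with one of the given strings. */
    static boolean acceptPrevious(final String previous, final List<String> allowed) {
        for (final String suffix : allowed) {
            if (previous.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    /** noRepeat: the next syllable must differ from the current one. */
    static boolean noRepeat(final String current, final String next) {
        return !current.equals(next);
    }

    /** flag(A,B): the neighbouring rule has to carry all listed flags. */
    static boolean hasFlags(final Set<String> ruleFlags, final List<String> required) {
        return ruleFlags.containsAll(required);
    }

    /** maxlen with "+": the current word plus the next syllable must not exceed max characters. */
    static boolean maxLenWithNext(final String word, final String next, final int max) {
        return word.length() + next.length() <= max;
    }

    public static void main(final String[] args) {
        final Set<String> flags = new HashSet<>(Arrays.asList("A", "B", "C"));
        System.out.println(acceptNext("ba", Arrays.asList("a", "b")));
        System.out.println(noRepeat("ka", "ka"));
        System.out.println(hasFlags(flags, Arrays.asList("A", "B")));
        System.out.println(maxLenWithNext("eka", "rra", 5));
    }
}
```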

Some examples:

Only a consonant can be appended to this syllable rule:

[a,e,i,o,u] +c

This syllable rule can only be attached to a consonant and only a consonant can be appended to this syllable rule:

[a,e,i,o,u] -c +c

Only a syllable rule that contains flag A can be appended to this syllable rule:

[a,e,i,o,u] +flag(A)

Only a syllable rule that contains flag A can be appended to this syllable rule. Additionally flag B is set for this rule:

[a,e,i,o,u] +flag(A) #B

Only a syllable rule that contains flag A can be appended to this syllable rule, and it has to start with a consonant. Additionally the flags B, C and D are set for this rule:

[a,e,i,o,u] +flag(A) +c #B #C #D
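Since expressions and flags are appended to a rule with spaces, a rule line can be taken apart by splitting on whitespace. The sketch below shows one way to do this; the class RuleSuffixSketch is a hypothetical name, and it assumes that the syllable list itself contains no spaces and that every token starting with # is a flag.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: separates a rule line into syllable part, expressions and flags. */
final class RuleSuffixSketch {

    final String syllablePart;
    final List<String> expressions = new ArrayList<>();
    final List<String> flags = new ArrayList<>();

    RuleSuffixSketch(final String line) {
        final String[] tokens = line.trim().split("\\s+");
        // The first token is the position rule plus the bracketed syllable list.
        this.syllablePart = tokens[0];
        for (int i = 1; i < tokens.length; i++) {
            if (tokens[i].startsWith("#")) {
                flags.add(tokens[i].substring(1)); // "#B" sets flag B
            } else {
                expressions.add(tokens[i]);        // e.g. "+c" or "+flag(A)"
            }
        }
    }

    public static void main(final String[] args) {
        final RuleSuffixSketch rule = new RuleSuffixSketch("[a,e,i,o,u] +flag(A) +c #B #C #D");
        System.out.println("syllables:   " + rule.syllablePart);
        System.out.println("expressions: " + rule.expressions);
        System.out.println("flags:       " + rule.flags);
    }
}
```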

More Examples

More examples can be found in the wgen-examples module, including six fictive languages (three of them have been generated with the help of Scanners). There is also a Hispanic name generator.

  • Pseudo-finnish has been defined manually and uses flags to simulate vowel harmony.

  • Pseudo-english, pseudo-norwegian, pseudo-polish and pseudo-german have been generated by scanning simple text files containing words of these languages.

  • brarto and simpli are just simple, manually defined languages.

Unit Tests

Unit tests can be found in the wgen-lib module (src/test/java and src/test/resources).
