The WordGen library allows creating random words (and names) by providing simple text files or rules for syllables and/or characters.
The library is located in the wgen-lib module. The main parts of this library are:
- a parser and word generator (WordGenParser, WordGen)
- a text/language scanner (Scanner)
The Scanner is used to scan text files for words and to generate syllable rules (that is, how syllables and characters can be concatenated). These rules are used by WordGenParser and WordGen to generate random words and/or names.
To use the word generator, the class WordGen has to be instantiated. The easiest method is to provide a rules file (see Syllable Rules below) and load it with the WordGenParser.
To create a new instance of WordGenParser, it’s recommended to use the provided WordGenParser.Builder. Once a WordGenParser is instantiated, it’s easy to parse a rules file and create an instance of WordGen (the word generator itself) with one of the following methods:
public WordGen fromFile(final File file) throws IOException
public WordGen fromFile(final File file, final long seed) throws IOException
Once this is done, random words are generated by using the following method of the WordGen class:
public String nextWord(final int minLength, final int maxLength)
where minLength and maxLength define the minimum and maximum length of a word (both inclusive). NOTE: the length of a word is not measured in characters but in the number of syllable rules used.
To generate rules for a specific language, text files can be scanned by a Scanner. The more words a text file provides, the more accurate the rules will become.
To instantiate a new Scanner, a Scanner.Builder is provided:
final Scanner scan = new Scanner.Builder().build();
To scan files for words, scanFilesForWords is used. This method returns a Set of the words found in all provided text files.
public TreeSet<String> scanFilesForWords(final File...files) throws IOException
To generate syllable rules (see below) for the provided text files, scanFilesForRules is used. This method returns a List of generated syllable rules, one rule per entry.
public List<String> scanFilesForRules(final File...files) throws IOException
To generate random words, syllable rules have to be defined. These rules need to be put into a simple text file, with exactly one syllable rule per line. Rules can be defined manually or by using a Scanner (see above).
Valid syllable rules consist of a position rule (+, -, = or nothing) and an array (list) of syllables/characters to be chosen from. The word generator randomly picks a syllable from the given list. Some examples:
- +[a,e,i,o,u]
- [a,e,i,o,u]
- -[as,is,es,os,us]
Position rules define the allowed position of the syllable in a word:
- +: only at the end of a word
- -: only at the start of a word
- =: everywhere
- no position rule: this syllable can only occur inside a word (not allowed at the start or end of a word)
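The position prefixes above can be illustrated with a small sketch that splits a rule line into its position and its syllable list. This is not WordGen's parser: the class RuleLineSketch, its Position enum and the parse method are hypothetical names made up for this illustration, and expressions or flags after the closing bracket are simply ignored.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, NOT part of WordGen: splits a rule such as "+[m,n]" into its parts. */
final class RuleLineSketch {

    /** The four position rules: "-" (start), no prefix (inside), "+" (end), "=" (everywhere). */
    enum Position { START, INSIDE, END, ANYWHERE }

    final Position position;
    final List<String> syllables;

    RuleLineSketch(final Position position, final List<String> syllables) {
        this.position = position;
        this.syllables = syllables;
    }

    static RuleLineSketch parse(final String line) {
        final String rule = line.trim();
        final int open = rule.indexOf('[');
        final int close = rule.indexOf(']', open + 1);
        if (open < 0 || close < 0) {
            throw new IllegalArgumentException("Missing syllable list: " + line);
        }
        // Everything before '[' is the position rule.
        final String prefix = rule.substring(0, open).trim();
        final Position position;
        switch (prefix) {
            case "-": position = Position.START; break;
            case "+": position = Position.END; break;
            case "=": position = Position.ANYWHERE; break;
            case "":  position = Position.INSIDE; break;
            default: throw new IllegalArgumentException("Unknown position rule: " + prefix);
        }
        // Everything between '[' and ']' is a comma-separated syllable list.
        final String[] parts = rule.substring(open + 1, close).split(",");
        return new RuleLineSketch(position, Arrays.asList(parts));
    }

    public static void main(final String[] args) {
        final RuleLineSketch rule = parse("-[as,is,es,os,us]");
        System.out.println(rule.position + " " + rule.syllables);
    }
}
```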
-[a,e,i,o,u]
[ka]
[ga]
[rra,tta]
+[m,n]
In this example every word will begin with a vowel, because -[a,e,i,o,u] is the only rule that is allowed to start a word. The word generator will randomly pick a vowel from the list. For all middle parts of the word the syllable rules [ka], [ga] and [rra,tta] are relevant: every time a syllable is needed that is neither the start nor the end of the word, one of these three rules is randomly picked. The end of a word can only be m or n, because +[m,n] is the only rule that is allowed to end a word.
Some output examples using these rules:
- A
- Ekakam
- Igarran
- Orragakan
- Un
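How such words come together can be sketched in a few lines of Java. This is only an illustration, not WordGen's implementation: the class PositionSketch and its methods are hypothetical. Two behaviours are assumptions inferred from the sample output: a word built from a single rule consists of the start syllable only, and the first letter is capitalized. The length parameter counts syllable rules, not characters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch, NOT WordGen's implementation. */
final class PositionSketch {

    // The example rule set: one start rule, three inner rules, one end rule.
    static final List<List<String>> START =
            Arrays.asList(Arrays.asList("a", "e", "i", "o", "u"));
    static final List<List<String>> INSIDE = Arrays.asList(
            Arrays.asList("ka"), Arrays.asList("ga"), Arrays.asList("rra", "tta"));
    static final List<List<String>> END =
            Arrays.asList(Arrays.asList("m", "n"));

    /** Picks a random rule from the group, then a random syllable from that rule. */
    static String pick(final List<List<String>> rules, final Random random) {
        final List<String> rule = rules.get(random.nextInt(rules.size()));
        return rule.get(random.nextInt(rule.size()));
    }

    /** Builds a word from exactly {@code length} syllable rules. */
    static String word(final int length, final Random random) {
        if (length < 1) {
            throw new IllegalArgumentException("length must be at least 1");
        }
        final StringBuilder word = new StringBuilder(pick(START, random));
        for (int i = 0; i < length - 2; i++) {
            word.append(pick(INSIDE, random));
        }
        if (length >= 2) {
            word.append(pick(END, random));
        }
        // Assumption: capitalize the first letter, as in the sample output.
        return Character.toUpperCase(word.charAt(0)) + word.substring(1);
    }

    public static void main(final String[] args) {
        final Random random = new Random(42L);
        for (int length = 1; length <= 5; length++) {
            System.out.println(word(length, random));
        }
    }
}
```

With this reading, "Un" uses two rules (start and end) and "Orragakan" uses five (start, three inner rules, end).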
Syllable rules can additionally hold expressions and flags. Expressions define specific behaviours of syllable rules (a kind of condition). A flag does not have any functionality by itself, but flags are needed for some expressions. Expressions and flags are appended to a syllable rule, separated by spaces.
- +c, +v and +n: The next (+) syllable needs to start with a consonant (c), a vowel (v) or a number (n)
- -c, -v and -n: The previous (-) syllable needs to end with a consonant (c), a vowel (v) or a number (n)
The word generator comes with a small set of vowels and numbers:
public static final String VOWELS = "aeiouyäöüáéíóúýàèìòùỳâêîôûŷ";
public static final String NUMBERS = "0123456789";
They can be extended by using the following methods that are provided by the Builder:
public Builder setVowels(final String vowels)
public Builder setNumbers(final String numbers)
All other characters are treated as consonants.
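The classification behind +c, +v, +n and their - counterparts can be sketched as follows. The two constants are copied from above; everything else (the class CharClassSketch, the Kind enum, the method names and the lower-casing of the input character) is a hypothetical illustration and not taken from WordGen.

```java
/** Hypothetical sketch, NOT part of WordGen: classifies characters as vowel, number or consonant. */
final class CharClassSketch {

    // Copied from the constants listed in this document.
    static final String VOWELS = "aeiouyäöüáéíóúýàèìòùỳâêîôûŷ";
    static final String NUMBERS = "0123456789";

    enum Kind { VOWEL, NUMBER, CONSONANT }

    static Kind classify(final char c) {
        // Assumption: upper-case letters are classified like their lower-case form.
        final char lower = Character.toLowerCase(c);
        if (VOWELS.indexOf(lower) >= 0) {
            return Kind.VOWEL;
        }
        if (NUMBERS.indexOf(lower) >= 0) {
            return Kind.NUMBER;
        }
        // All other characters are treated as consonants.
        return Kind.CONSONANT;
    }

    /** Check in the style of +c, +v, +n: does the next syllable start with the wanted kind? */
    static boolean startsWith(final String syllable, final Kind kind) {
        return !syllable.isEmpty() && classify(syllable.charAt(0)) == kind;
    }

    /** Check in the style of -c, -v, -n: does the previous syllable end with the wanted kind? */
    static boolean endsWith(final String syllable, final Kind kind) {
        return !syllable.isEmpty()
                && classify(syllable.charAt(syllable.length() - 1)) == kind;
    }

    public static void main(final String[] args) {
        System.out.println(classify('a') + " " + classify('7') + " " + classify('k'));
    }
}
```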
- -accept(a,b), +accept(a,b): the next syllable (+) has to start with a or b, or the previous (-) syllable has to end with a or b.
- +minlen(5), +maxlen(5): min or max length (in characters) of the current word and the next syllable.
- -minlen(5), -maxlen(5): min and max length (in characters) of the current word.
- -flag(A,B): the previous syllable rule needs to contain the flags A and B.
- +flag(A,B): the next syllable rule needs to contain the flags A and B.
- +noRepeat: the next syllable (+) must not be the same as the current one.
- #A: set the flag A for the current syllable rule
Some examples:
Only a consonant can be appended to this syllable rule:
[a,e,i,o,u] +c
This syllable rule can only be attached to a consonant and only a consonant can be appended to this syllable rule:
[a,e,i,o,u] -c +c
Only a syllable rule that contains flag A can be appended to this syllable rule:
[a,e,i,o,u] +flag(A)
Only a syllable rule that contains flag A can be appended to this syllable rule. Additionally, flag B is set for this rule:
[a,e,i,o,u] +flag(A) #B
Only a syllable rule that contains flag A and starts with a consonant can be appended to this syllable rule. Additionally, the flags B, C and D are set for this rule:
[a,e,i,o,u] +flag(A) +c #B #C #D
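To make the interplay of flags and the +flag expression concrete, here is a minimal sketch. It is an illustration only and does not reproduce WordGen's parser: the class FlagSketch and its methods are hypothetical, only +flag(...) is evaluated, and tokens are assumed to be separated by whitespace with no spaces inside the parentheses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch, NOT part of WordGen: reads expressions and flags from a rule line. */
final class FlagSketch {

    final List<String> expressions = new ArrayList<>();
    final Set<String> flags = new LinkedHashSet<>();

    static FlagSketch parse(final String line) {
        final FlagSketch result = new FlagSketch();
        // Everything after the closing bracket holds expressions and flags.
        final String tail = line.substring(line.indexOf(']') + 1).trim();
        if (tail.isEmpty()) {
            return result;
        }
        for (final String token : tail.split("\\s+")) {
            if (token.startsWith("#")) {
                result.flags.add(token.substring(1)); // "#B" sets flag B
            } else {
                result.expressions.add(token);        // e.g. "+c" or "+flag(A)"
            }
        }
        return result;
    }

    /** Collects the flags that "+flag(...)" expressions demand from the next rule. */
    Set<String> requiredNextFlags() {
        final Set<String> required = new LinkedHashSet<>();
        for (final String expression : expressions) {
            if (expression.startsWith("+flag(") && expression.endsWith(")")) {
                final String inner = expression.substring(6, expression.length() - 1);
                required.addAll(Arrays.asList(inner.split(",")));
            }
        }
        return required;
    }

    /** True if {@code next} carries every flag that {@code current} requires. */
    static boolean mayFollow(final FlagSketch current, final FlagSketch next) {
        return next.flags.containsAll(current.requiredNextFlags());
    }

    public static void main(final String[] args) {
        final FlagSketch rule = parse("[a,e,i,o,u] +flag(A) +c #B #C #D");
        System.out.println(rule.expressions + " " + rule.flags);
        System.out.println(mayFollow(rule, parse("[ka] #A")));
    }
}
```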
Examples can be found in the wgen-examples module, including six fictive languages (three of them have been generated with the help of the Scanner).
There is also a Hispanic name generator.
- Pseudo-Finnish has been defined manually and uses flags to simulate vowel harmony.
- Pseudo-English, pseudo-Norwegian, pseudo-Polish and pseudo-German have been generated by scanning simple text files containing words of these languages.
- brarto and simpli are just simple manually defined languages.