mitranim/codex

Description

Generator of random synthetic words or names. Takes sample words, analyses them, and lazily produces a set of similar derived words. Works for any language.

Has a JavaScript port with about half the performance: foliant.

Example program using codex:

package main

import (
  "fmt"
  "github.com/Mitranim/codex"
)

func main() {
  source := []string{"jasmine", "katie", "nariko", "karen"}

  traits, err := codex.NewTraits(source)
  if err != nil {
    panic(err)
  }
  gen := traits.Generator()

  // Print twelve random words.
  for i := 0; i < 12; i++ {
    fmt.Println(gen())
  }

  // Printed (your result will be different):
  //   jarik smiko ikatik arinat nasmin katie
  //   rikatin smikas minena ikatin jasmika rinaren

  // Find out how many words can be generated from this sample.
  gen = traits.Generator()
  i := 0
  for gen() != "" {
    i++
  }
  fmt.Println("total:", i)

  // Printed:
  //   total: 392
}

Installation

In a shell:

go get github.com/Mitranim/codex

In your Go files:

package main

import (
  "fmt"
  "github.com/Mitranim/codex"
)

func main() {
  traits, err := codex.NewTraits([]string{"sample", "pair"})
  if err != nil {
    panic(err)
  }

  gen := traits.Generator()
  for word := gen(); word != ""; word = gen() {
    fmt.Println(word)
  }
}

To test the package, cd into the package directory and run:

# Just tests
go test

To run benchmarks:

# With benchmarks
go test -bench .

API Reference

The entry point for everything is a Traits object. It takes existing words as input. Words must consist of known glyphs, as defined by the sound sets in sounds.go or by custom sets assigned to the Traits struct (see Traits.Examine()). If an invalid word is encountered, an error is returned.

type Traits

type Traits struct {
  // Minimum and maximum number of sounds.
  MinNSounds int
  MaxNSounds int
  // Minimum and maximum number of vowels.
  MinNVowels int
  MaxNVowels int
  // Maximum number of consecutive vowels.
  MaxConseqVow int
  // Maximum number of consecutive consonants.
  MaxConseqCons int
  // Set of sounds that occur in the words.
  SoundSet Set
  // Set of pairs of sounds that occur in the words.
  PairSet PairSet

  // Optional custom set of known sounds.
  KnownSounds Set
  // Optional custom set of known vowels.
  KnownVowels Set
}

Traits represent rudimentary characteristics of a word or group of words. A Traits object unequivocally defines the set of synthetic words that may be derived from it. These words are produced by a generator function made with Traits.Generator().
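To make the fields above more concrete, here is a minimal sketch of how length bounds, a sound set, and a pair set could be collected from sample words. It is not codex's implementation: sketchTraits and examine are hypothetical names, only a few of the fields are modelled, and every rune is assumed to be one sound (the real package also recognises digraphs).

```go
package main

import "fmt"

// sketchTraits is a hypothetical, simplified stand-in for codex.Traits.
type sketchTraits struct {
	MinNSounds, MaxNSounds int
	Sounds                 map[string]bool
	Pairs                  map[[2]string]bool
}

// examine records length bounds, the sounds seen, and every pair of
// adjacent sounds. Assumption: one rune equals one sound.
func examine(words []string) sketchTraits {
	t := sketchTraits{Sounds: map[string]bool{}, Pairs: map[[2]string]bool{}}
	for i, word := range words {
		sounds := []rune(word)
		n := len(sounds)
		if i == 0 || n < t.MinNSounds {
			t.MinNSounds = n
		}
		if n > t.MaxNSounds {
			t.MaxNSounds = n
		}
		for j, s := range sounds {
			t.Sounds[string(s)] = true
			if j > 0 {
				t.Pairs[[2]string{string(sounds[j-1]), string(s)}] = true
			}
		}
	}
	return t
}

func main() {
	t := examine([]string{"goblin", "smoke"})
	fmt.Println(t.MinNSounds, t.MaxNSounds, len(t.Sounds), len(t.Pairs))
	// Prints: 5 6 10 9
}
```

Under this simplified model, a derived word such as "moblin" uses only pairs found in the samples: "mo" comes from "smoke", and the rest come from "goblin".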

The optional fields KnownSounds and KnownVowels specify custom sets of sounds and vowels. This lets you use codex for any character set, including non-Latin alphabets. See Traits.Examine().

NewTraits([]string) (*Traits, error)

Shortcut for creating a Traits object and calling its Examine() method. These are equivalent:

traits, err := NewTraits([]string{"mountain", "waterfall", "grotto"})

traits := &Traits{}
err := traits.Examine([]string{"mountain", "waterfall", "grotto"})

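Skip this shortcut if you're using custom sound sets (e.g. non-Latin). In that case, create a Traits value with KnownSounds and KnownVowels set, then call Examine() yourself.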

Traits.Examine([]string) error

Analyses the given words and merges their attributes into self.

traits := &Traits{}
err := traits.Examine([]string{"mountain", "waterfall", "grotto"})

By default, this uses the sets of known sounds and vowels defined in sounds.go. This includes the 26 letters of the standard US English alphabet and some common digraphs like th, which are treated as single phonemes.

However, codex is language-independent. Assign custom KnownSounds and KnownVowels to teach it a sound system of your choosing. It can be Greek, Cyrillic, Elvish, or Klingon; it doesn't matter, as long as the given sounds and vowels cover the words in your input. Refer to sounds.go as an example.

Here's how to teach it Greek:

traits := &codex.Traits{
  KnownSounds: codex.Set.New(nil,
    "α", "β", "γ", "δ", "ε", "ζ", "η", "θ", "ι", "κ", "λ", "μ",
    "ν", "ξ", "ο", "π", "ρ", "σ", "ς", "τ", "υ", "φ", "χ", "ψ", "ω"),
  KnownVowels: codex.Set.New(nil, "α", "ε", "η", "ι", "ο", "υ", "ω"),
}

if err := traits.Examine([]string{"ελ", "διδασκω", "ελληνικο", "αλφαβητο"}); err != nil {
  panic(err)
}

gen := traits.Generator()
for word := gen(); word != ""; word = gen() {
  fmt.Println(word)
}

// "ιδαλφ"
// "κο"
// "ηνικο"
// ...
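Since an input word is rejected when it contains glyphs outside the known sounds, it can help to see what splitting a word into sounds might look like. The sketch below is an assumption about the general idea, not codex's actual algorithm: splitSounds is a hypothetical helper that greedily prefers a two-rune digraph (such as th) over a single rune and reports the first glyph it cannot match. The plain map used as a sound set is also a stand-in for codex.Set.

```go
package main

import (
	"fmt"
	"strings"
)

// splitSounds greedily splits word into known sounds, trying a two-rune
// digraph before falling back to a single rune. It returns an error for
// the first glyph that is not covered by the known set.
func splitSounds(word string, known map[string]bool) ([]string, error) {
	runes := []rune(word)
	var sounds []string
	for i := 0; i < len(runes); {
		if i+1 < len(runes) && known[string(runes[i:i+2])] {
			sounds = append(sounds, string(runes[i:i+2]))
			i += 2
			continue
		}
		if s := string(runes[i]); known[s] {
			sounds = append(sounds, s)
			i++
			continue
		}
		return nil, fmt.Errorf("unknown glyph %q in %q", runes[i], word)
	}
	return sounds, nil
}

func main() {
	known := map[string]bool{
		"t": true, "h": true, "th": true, "o": true, "r": true, "n": true,
	}

	sounds, err := splitSounds("thorn", known)
	fmt.Println(strings.Join(sounds, " "), err)

	_, err = splitSounds("thorn!", known)
	fmt.Println(err)
}
```

Here "thorn" splits into th, o, r, n, while "thorn!" fails on the exclamation mark. Running a check like this over your samples before calling Examine() is one way to find out which glyphs a custom KnownSounds set is still missing.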

Traits.Generator() func() string

Creates a generator function that yields a new random synthetic word on each call. The words are guaranteed to never repeat, and to be randomly distributed across the total set of possible words for these traits.

After a generator is exhausted, subsequent calls return "".

A traits object is stateless, and Generator() produces a completely new generator on each call. Generators don't affect each other.

This remains fast even for large source datasets, and is suitable for use on web servers and in other applications where responses must be quick.

traits, err := codex.NewTraits([]string{"goblin", "smoke"})
if err != nil {
  panic(err)
}
gen := traits.Generator()

for word := gen(); word != ""; word = gen() {
  fmt.Print(word, " ")
}

// moblin oblin mobli goblin smobli gobli smoke
// this generator is exhausted
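In a server setting you often want a fixed number of words per request rather than the whole set. The sketch below shows one way to do that, relying only on the contract described above (a func() string that returns "" once exhausted). Both take and stubGenerator are hypothetical helpers; stubGenerator merely stands in for traits.Generator() so that the example runs without the package.

```go
package main

import "fmt"

// take collects up to n words from gen, stopping early when gen returns "",
// which signals that the generator is exhausted.
func take(gen func() string, n int) []string {
	var words []string
	for len(words) < n {
		word := gen()
		if word == "" {
			break
		}
		words = append(words, word)
	}
	return words
}

// stubGenerator is a stand-in for traits.Generator(): it yields the given
// words in order, then "" on every later call.
func stubGenerator(words ...string) func() string {
	i := 0
	return func() string {
		if i >= len(words) {
			return ""
		}
		word := words[i]
		i++
		return word
	}
}

func main() {
	gen := stubGenerator("moblin", "oblin", "mobli")
	fmt.Println(take(gen, 2)) // the first two words
	fmt.Println(take(gen, 5)) // only one word is left
}
```

With the real package, the same helper would be called as take(traits.Generator(), 12).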

ToDo / WIP

Investigation

Consider providing an option to enable reverse pairs in Traits.Examine(). Check the performance impact, particularly with large datasets.

Algorithms

Perhaps Traits.validPart() should also forbid repeated triples.

Tests

The random distribution test for generators should verify that earlier calls may return words that begin with (i.e. contain at index 0) words returned by later calls.
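One possible shape for such a check is sketched below. The name longerBeforePrefix is hypothetical, and the function only inspects a recorded sequence of generator output; how many runs a real test would need before asserting on it is left open.

```go
package main

import (
	"fmt"
	"strings"
)

// longerBeforePrefix reports whether some word in the sequence appears
// before a shorter word that is its prefix, e.g. "moblin" before "mobli".
func longerBeforePrefix(words []string) bool {
	for i, earlier := range words {
		for _, later := range words[i+1:] {
			if len(earlier) > len(later) && strings.HasPrefix(earlier, later) {
				return true
			}
		}
	}
	return false
}

func main() {
	// Sequence taken from the generator example in this README.
	seq := []string{"moblin", "oblin", "mobli", "goblin", "smobli", "gobli", "smoke"}
	fmt.Println(longerBeforePrefix(seq))

	// A strictly prefix-first ordering does not satisfy the property.
	fmt.Println(longerBeforePrefix([]string{"gobli", "goblin"}))
}
```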

Readme

  • Include examples of modifying Traits fields to restrict word characteristics.
  • Document what kind of input data is allowed.