The breeding of Siamese words, aka werewords.

The Siamesor

A 'siamese words', aka 'werewords', database generator

An inquiry into words sharing letters and letter patterns, in Python 3.

(Example images: "were word", "competition / computation", "minimize / division", "fingering / frugality")

Mechanism

The program currently runs in the following way:

  • It imports words from a given dictionary (one word per line; an example is provided in the data folder, but another file can be specified with the -f, --file argument);
  • The process is by and large a brute-force triple loop, going through all the words in the dictionary, trying each padding position, then comparing that to each word in the dictionary, summarised like so:
  for each word in the dict:
    for each padding position:
      for each other word in the dict:
        check if conditions are met & save
  • Note that the program creates as many copies of the dictionary as there are padding positions needed, then retrieves the appropriate one in the loop. The padding shifts one word against the other, as follows:
b a n a n a       |  (padding: 0)
a v a t a r       |  (no common letter)

  b   n   n a     |  (padding: 1)
    a   a
a v   t   r       |  (two common letters)

    b a n a n a   |  (padding: 2)
a v a t a r       |  (no common letter)

(And when avatar is the first word:)

a v a t a r       |  (padding: 0)
b a n a n a       |  (no common letter)


    v   t   r     |  (padding: 1)
  a   a   a
b   n   n         |  (three common letters)

etc.
  • The process uses recursion: given one word and a padding, the siamesor moves forward through the word, position by position, and for each position searches for words that have the same letter in that position. The matching words are saved and reused as a reduced dictionary for the next step. The process stops when it reaches the end of the word, the final allowed position given the padding, the constraint on the minimal number of differing letters, or the maximal number of common letters. The machine then checks whether there are common letters outside the given positions (we want exactly those positions to have common letters, and no others), and saves the result in what will be a dictionary.
  • The dictionary comprises keys describing the positions of the common letters, as a string. To continue with the example above, the first result would be classified under the key '1,3;2,4', as a tuple: (('banana', '1,3'),('avatar','2,4')), and the second under the key '0,2,4;1,3,5' as: (('avatar', '0,2,4'),('banana','1,3,5')).
  • The dictionary is then saved as a JSON file, the name of which reflects the chosen options (see below for more detail), in the results folder (created if not yet there).
  • I attempted to implement a multiprocessing pipeline that parallelises the first step (the first loop through all the words), but so far it is unclear whether this improves performance significantly. Given the time it currently takes to build a database, this is not essential unless one wants to build the entire thing (all word lengths, all paddings).
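
As an illustration of the padding comparison and the key format described above, here is a minimal sketch. It is not the code in siamesor.py: it compares each pair of words directly instead of using the recursive filtering, and the names common_positions, structure_key and build_database, as well as the max_padding and min_common parameters, are hypothetical.

```python
from collections import defaultdict


def common_positions(first, second, padding):
    """Shift `first` right by `padding` slots and compare it with `second`.

    Returns two tuples of indices (one per word) where the letters coincide.
    """
    first_hits, second_hits = [], []
    for i, letter in enumerate(first):
        j = i + padding  # index in `second` facing first[i]
        if j < len(second) and second[j] == letter:
            first_hits.append(i)
            second_hits.append(j)
    return tuple(first_hits), tuple(second_hits)


def as_string(hits):
    """Turn (1, 3) into '1,3'."""
    return ",".join(str(h) for h in hits)


def structure_key(first_hits, second_hits):
    """Build a key such as '1,3;2,4' from the two position tuples."""
    return as_string(first_hits) + ";" + as_string(second_hits)


def build_database(words, max_padding=2, min_common=2):
    """Brute-force triple loop: word, padding position, other word."""
    database = defaultdict(list)
    for first in words:
        for padding in range(max_padding + 1):
            for second in words:
                if second == first:
                    continue
                a, b = common_positions(first, second, padding)
                if len(a) >= min_common:
                    entry = ((first, as_string(a)), (second, as_string(b)))
                    database[structure_key(a, b)].append(entry)
    return dict(database)


if __name__ == "__main__":
    for key, entries in build_database(["banana", "avatar"]).items():
        print(key, entries)
```

Because the direct comparison collects every coinciding position, no separate check for common letters outside the recorded positions is needed in this sketch.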

Commands

Typical use

A few example cases:

  • $ python siamesor.py --equal_length --no_padding (equivalent to $ python siamesor.py -qg) will produce all possibilities for words of equal lengths, the shifting/padding mechanism being disabled;

  • $ python siamesor.py --min_length 6 --max_length 7 (equivalent to $ python siamesor.py -m 6 -M 7) will only browse through word lengths 6 to 7;

  • $ python siamesor.py --min_length 6 --intersect --max_length 7 --word supranational (equivalent to $ python siamesor.py -i -m 6 -M 7 -w supranational) will search for possibilities matching the word "supranational" within words of length 6 to 7, with the constraint that the intersect letters together should form a word from the given dictionary;

  • $ python siamesor.py --compact --processors 18 (equivalent to $ python siamesor.py -c -p 18) will search through all possibilities using your mighty 18 cores for speedy parallel computation, and store them in a compact format: the final output dictionary keys will be produced regardless of the position within the word. This is done by aligning all positions leftward to 0, hence positions '2,3' will be equivalent to, and stored under, '0,1', as will '3,4', '4,5', etc.;

  • $ python siamesor.py --allowed_letters i --min_common 3 --verbose (equivalent to $ python siamesor.py -v -a i -C 3) will search for possibilities within equally long, unpadded words for intersect letters comprising at least three 'i's, and print every single result out to the console.
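
The leftward alignment used by the compact format can be sketched as follows. This is only an illustration of the idea, under the assumption that positions are given as a comma-separated string; the function name compact_positions is hypothetical and not taken from siamesor.py.

```python
def compact_positions(positions):
    """Shift a comma-separated position string leftward so it starts at 0.

    '2,3', '3,4' and '4,5' all become '0,1'.
    """
    values = [int(p) for p in positions.split(",")]
    offset = min(values)  # smallest position is aligned to 0
    return ",".join(str(v - offset) for v in values)


if __name__ == "__main__":
    for raw in ("2,3", "3,4", "1,4,5"):
        print(raw, "->", compact_positions(raw))
```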

Output of the -h command

usage: siamesor.py [-h] [-m MIN_LENGTH] [-M MAX_LENGTH] [-g] [-q] [-i]
                   [-I INTERSECT_WORD] [-a ALLOWED_LETTERS]
                   [-A ALLOWED_REMAINDER] [-l] [-G MIN_PADDING]
                   [-D MIN_DIFFERENT] [-C MIN_COMMON] [-w WORD] [-c]
                   [-k STRUCTURE] [-f FILE] [-d DICT_LIMIT] [-v] [-e]
                   [-P PRINT_LIMIT] [-s SAMPLES] [-r] [-p PROCESSORS] [-t]

Find siamese words, aka werewords

optional arguments:
  -h, --help            show this help message and exit
  -m MIN_LENGTH, --min_length MIN_LENGTH
                        Minimum word length for siamese database. Defaults to
                        4.
  -M MAX_LENGTH, --max_length MAX_LENGTH
                        Max word length for siamese database. Defaults to
                        none.
  -g, --no_padding      Disable the padding mechanism shifting one out of the
                        words left/right. Defaults to False.
  -q, --equal_length    Only produces siamese using two words of equal
                        lengths. Defaults to False.
  -i, --intersect       Adds the constraint that the intersect letters must
                        form a word in the dictionary.
  -I INTERSECT_WORD, --intersect_word INTERSECT_WORD
                        Specify the word that the intersect letters must form.
  -a ALLOWED_LETTERS, --allowed_letters ALLOWED_LETTERS
                        Adds the constraint that the intersect letters must
                        only be taken from the given input. Must be comma-
                        separated, e.g. -a a,e,i,o,u,y, for vowels only.
  -A ALLOWED_REMAINDER, --allowed_remainder ALLOWED_REMAINDER
                        Adds the constraint that the remaining letters (not
                        the intersect ones, different for each word, must only
                        be taken from the given input. Must be comma-
                        separated, e.g. -a a,e,i,o,u,y, for vowels only.
  -l, --single_intersect
                        Adds the constraint that the intersect must only be
                        composed of one letter (any permitted).
  -G MIN_PADDING, --min_padding MIN_PADDING
                        Minumum overlap allowed between the two considered
                        words when shifting one left/right. Defaults to 3.
  -D MIN_DIFFERENT, --min_different MIN_DIFFERENT
                        Minumum number of differing letters between the two
                        considered words. Defaults to 2.
  -C MIN_COMMON, --min_common MIN_COMMON
                        Minumum number of common letters between the two
                        considered words. Defaults to 2.
  -w WORD, --word WORD  Search for siamese containing the specified word.
  -c, --compact         Store results in the dictionary by structure, that is,
                        take the letter positions, e.g. '1,4,5', shift them
                        leftward to zero '0,3,4'. The other option stores both
                        position data for both siamese: '1,3:3,5', except when
                        both positions are identical, in which case only one
                        will be used, e.g. '3,6'.
  -k STRUCTURE, --structure STRUCTURE
                        Search for siamese with positions equal to given
                        structure. Format: numbers separated by commas,
                        positions separated by a colon. Example: '1,2' will
                        mean that positions 1 and 2 will have to be met in
                        both words. '1,2:3,4', positions 1 and 2 for one, 3
                        and 4 for the other.
  -f FILE, --file FILE  The source dictionary file (one word per line)
  -d DICT_LIMIT, --dict_limit DICT_LIMIT
                        Maximum word length allowed when importing dictionary.
  -v, --verbose         Setting verbose to True will make the script print all
                        results to the console. Defaults to false.
  -e, --quiet           Limiting the printing to the total found. Defaults to
                        false.
  -P PRINT_LIMIT, --print_limit PRINT_LIMIT
                        Number of siamese to be printed before stopping.
                        Independent of verbose argument. Defaults to 0.
  -s SAMPLES, --samples SAMPLES
                        Number of random samples from results once database is
                        built (each 'structure' key from the final dictionary
                        will be sampled in turn). Defaults to 1.
  -r, --no_recap        Disables the recap section (printing the number of
                        siamese for each structure.
  -p PROCESSORS, --processors PROCESSORS
                        Number of cores used for parallel processing. Default:
                        number of cores detected by the multiprocessing
                        module.
  -t, --time            Calculate the total time of the program.
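
For readers who want to experiment with a similar interface, a few of the options above could be declared with the standard argparse module roughly as follows. This is a sketch under assumptions, not the parser in siamesor.py: the function name build_parser is hypothetical, only a handful of flags are reproduced, and the defaults are copied from the help text above.

```python
import argparse


def build_parser():
    """Declare a small subset of the options listed in the help output."""
    parser = argparse.ArgumentParser(
        description="Find siamese words, aka werewords")
    parser.add_argument("-m", "--min_length", type=int, default=4,
                        help="Minimum word length. Defaults to 4.")
    parser.add_argument("-M", "--max_length", type=int, default=None,
                        help="Max word length. Defaults to none.")
    parser.add_argument("-g", "--no_padding", action="store_true",
                        help="Disable the padding mechanism.")
    parser.add_argument("-q", "--equal_length", action="store_true",
                        help="Only use two words of equal lengths.")
    parser.add_argument("-C", "--min_common", type=int, default=2,
                        help="Minimum number of common letters.")
    # Comma-separated input such as 'a,e,i' is split into a list.
    parser.add_argument("-a", "--allowed_letters",
                        type=lambda s: s.split(","), default=None,
                        help="Letters the intersect may be taken from.")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this sketch, combined short flags such as -qg are handled by argparse itself, since both options are simple switches.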