<a href="https://colab.research.google.com/github/psb-david-petty/google-colaboratory/blob/master/spellingbee.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# `spellingbee.py`

[Spelling Bee](https://www.nytimes.com/puzzles/spelling-bee) is a puzzle in the NYTimes. The puzzle consists of seven letters in a beehive-like hexagon with one letter in the middle and six letters around the outside. The object of the puzzle is to make as many English words as possible using only the seven letters *that must include the middle letter*. For example, if the letters are `qwertyu` and the words must include `q`, then possible words are: `'equerry', 'queer', 'queerer', 'query', 'queue', 'queuer', 'qwerty', 'tuque'`. 

This project started out as an exploration of Python [`set`](https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset)s through writing functions that check whether a word has *only* certain letters (`hasonly` using [`union`](https://docs.python.org/3/library/stdtypes.html#frozenset.union)) and whether that word *definitely* has a letter (`musthave` using [`intersection`](https://docs.python.org/3/library/stdtypes.html#frozenset.intersection)). It was started long enough ago (2016) that some of the code has been deprecated.
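
As a minimal sketch of the two set checks described above (the names `uses_only` and `uses_all` are illustrative stand-ins for `hasonly` and `musthave`), the union and intersection comparisons can be seen to agree with plain subset tests:

```python
def uses_only(word, letters):
    """True if every character of word is in letters (the union adds nothing)."""
    letterset = set(letters)
    return letterset | set(word) == letterset


def uses_all(word, letters):
    """True if every character of letters appears in word."""
    letterset = set(letters)
    return letterset & set(word) == letterset


# The same checks expressed as subset tests.
assert uses_only('viper', 'vteimpr') == (set('viper') <= set('vteimpr'))
assert uses_all('primitive', 'vteimpr') == (set('vteimpr') <= set('primitive'))

print(uses_only('victor', 'vteimpr'))   # 'c' and 'o' are not allowed letters
print(uses_all('viper', 'vteimpr'))     # 't' and 'm' are missing from the word
```

The union and intersection forms mirror the description above; the subset operator `<=` is an equivalent, shorter spelling.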

**But how is it possible to find words that fit these criteria?** From [Scrabble](https://scrabble.hasbro.com/en-us)&reg;, of course! Scrabble players have long used dictionaries to determine valid words, and some of these are available on-line (where they are used by on-line versions of word games). The classic word list is known as the [OWL](https://en.wikipedia.org/wiki/NASPA_Word_List) (Official Word List), and several versions are available on-line.

Once the words in the word list are added to a [`set`](https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset), it is a simple matter to see whether the letters in any word fit the [Spelling Bee](https://www.nytimes.com/puzzles/spelling-bee) criteria.
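
A small sketch of that filtering step, assuming a hypothetical miniature word set in place of the real word-lists and the default minimum length of 5:

```python
# Hypothetical miniature word list; the real project loads far larger lists.
words = {'query', 'queue', 'tuque', 'write', 'quiet', 'qwerty', 'tree'}

letters = 'qwertyu'                 # the first letter is the required one
center, allowed = letters[0], set(letters)

# Keep words that are long enough, contain the center letter,
# and use no letters outside the allowed seven.
matches = sorted(w for w in words
                 if len(w) >= 5 and center in w and set(w) <= allowed)
print(matches)
```

Words such as `write` and `quiet` drop out because `i` is not one of the seven letters, and `tree` fails both the length and center-letter tests.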

I originally downloaded word-lists I found on-line (starting with OWL3) for use in this project, but by [open-sourcing](https://bigtechquestion.com/2019/03/07/software/windows/what-does-open-sourcing-mean/) this tool, I would also have to publish the word-lists. To avoid that, I added `wordset.py` to read the word-list files from URIs (either as raw `.txt` files or from [`.zip`](https://docs.python.org/3/library/zipfile.html) files).
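
The following is a minimal sketch of pulling a word list out of a `.zip` archive held in memory, not the notebook's own code (which, further down, reads a `.tar.gz` with `tarfile`). The function `words_from_zip` and the member name `words.txt` are hypothetical, and the archive is built locally so the example runs without network access; with a real URI the bytes could come from `urllib.request.urlopen(uri).read()`.

```python
import io
import zipfile


def words_from_zip(data, member):
    """Return the set of lower-cased, non-empty lines of member in zip bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        with archive.open(member) as lines:
            return {line.decode('utf-8').strip().lower()
                    for line in lines if line.strip()}


# Build a tiny zip archive in memory to stand in for a downloaded file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('words.txt', 'Query\nqueue\n\ntuque\n')

wordset = words_from_zip(buffer.getvalue(), 'words.txt')
print(sorted(wordset))
```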

This [`Colab Notebook`](https://github.com/psb-david-petty/google-colaboratory/blob/master/spellingbee.ipynb) currently runs `python3 spellingbee.py -iv` so the seven letters can be input at the very bottom of the notebook. The first letter entered *must* be included in every word.

## The word-lists

| Source | Link | Description |
| --- | --- | --- |
| [dolph](https://github.com/dolph/dictionary) | [https://raw.githubusercontent.com/dolph/dictionary/master/enable1.txt](https://raw.githubusercontent.com/dolph/dictionary/master/enable1.txt) | [TK](https://en.wikipedia.org/wiki/To_come_(publishing)) |
| [dolph](https://github.com/dolph/dictionary) | [https://raw.githubusercontent.com/dolph/dictionary/master/ospd.txt](https://raw.githubusercontent.com/dolph/dictionary/master/ospd.txt) | [TK](https://en.wikipedia.org/wiki/To_come_(publishing)) |
| [dolph](https://github.com/dolph/dictionary) | [https://raw.githubusercontent.com/dolph/dictionary/master/popular.txt](https://raw.githubusercontent.com/dolph/dictionary/master/popular.txt) | [TK](https://en.wikipedia.org/wiki/To_come_(publishing)) |
| [dolph](https://github.com/dolph/dictionary) | [https://raw.githubusercontent.com/dolph/dictionary/master/unix-words](https://raw.githubusercontent.com/dolph/dictionary/master/unix-words) | [TK](https://en.wikipedia.org/wiki/To_come_(publishing)) |
| [WordGameDictionary](https://www.wordgamedictionary.com/word-lists/) | [https://www.wordgamedictionary.com/english-word-list/download/english.txt](https://www.wordgamedictionary.com/english-word-list/download/english.txt) | [TK](https://en.wikipedia.org/wiki/To_come_(publishing)) |
| [WordGameDictionary](https://www.wordgamedictionary.com/word-lists/) | [https://www.wordgamedictionary.com/sowpods/download/sowpods.txt](https://www.wordgamedictionary.com/sowpods/download/sowpods.txt) | [TK](https://en.wikipedia.org/wiki/To_come_(publishing)) |
| [WordGameDictionary](https://www.wordgamedictionary.com/word-lists/) | [https://www.wordgamedictionary.com/twl06/download/twl06.txt](https://www.wordgamedictionary.com/twl06/download/twl06.txt) | [TK](https://en.wikipedia.org/wiki/To_come_(publishing)) |
| [yawl](https://github.com/elasticdog/yawl) | [https://raw.githubusercontent.com/elasticdog/yawl/master/yawl-0.3.2.03.tar.gz](https://raw.githubusercontent.com/elasticdog/yawl/master/yawl-0.3.2.03.tar.gz) | `yawl-0.3.2.03/sigword.list` |
| [yawl](https://github.com/elasticdog/yawl) | [https://raw.githubusercontent.com/elasticdog/yawl/master/yawl-0.3.2.03.tar.gz](https://raw.githubusercontent.com/elasticdog/yawl/master/yawl-0.3.2.03.tar.gz) | `yawl-0.3.2.03/word.list` |
| [SDSawtelle](https://sdsawtelle.github.io/blog/output/scrabble-cheatsheet-with-python.html) | [https://sdsawtelle.github.io/blog/output/scrabble-cheatsheet-with-python.html](https://sdsawtelle.github.io/blog/output/scrabble-cheatsheet-with-python.html) | Python cannot directly extract `OWL3_Dictionary.7z` without additional libraries |


## Other enhancements

- I made this a command-line tool using [`optparse`](https://docs.python.org/3/library/optparse.html). (TODO: update to use [`argparse`](https://docs.python.org/3/library/argparse.html).) The command-line help is:

```
Usage: spellingbee.py {LETTERS | -i} [-l L] [-? -v]

Find spelling-bee words using LETTERS and including LETTERS[0].

Options:
  --version         show program's version number and exit
  -?, --help        show this help message and exit
  -i, --input       input LETTERS from keyboard? [False]
  -l L, --length=L  words of length >= L [5]
  -v, --verbose     log status information while processing [False]
```
- Added [`logging`](https://docs.python.org/3/howto/logging.html).
- This [`Colab Notebook`](https://github.com/psb-david-petty/google-colaboratory/blob/master/spellingbee.ipynb) was originally developed from a multi-file module &mdash; which required adapting some code and changing some [`import`](https://docs.python.org/3/reference/import.html) statements.
- The biggest addition was reading the word-lists on-line from URIs, rather than publishing the word-lists myself.
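
For the `argparse` TODO above, one possible shape is sketched here. It is an illustration only, not the project's code: `build_parser` and the hard-coded `VERSION` are assumptions, and the options simply mirror the help text shown above.

```python
import argparse

VERSION = '0.1.1'   # assumed to match the module's __version__


def build_parser():
    """Return an argparse parser mirroring the optparse options shown above."""
    parser = argparse.ArgumentParser(
        prog='spellingbee.py', add_help=False,
        description='Find spelling-bee words using LETTERS '
                    'and including LETTERS[0].')
    parser.add_argument('letters', metavar='LETTERS', nargs='?',
                        help='puzzle letters, required letter first')
    parser.add_argument('--version', action='version', version=VERSION)
    # Replace the default -h with -?, as the optparse version does.
    parser.add_argument('-?', '--help', action='help',
                        help='show this help message and exit')
    parser.add_argument('-i', '--input', action='store_true',
                        help='input LETTERS from keyboard')
    parser.add_argument('-l', '--length', type=int, default=5, metavar='L',
                        help='words of length >= L (default: %(default)s)')
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='log status information while processing')
    return parser


opts = build_parser().parse_args(['qwertyu', '-l', '4', '-v'])
print(opts.letters, opts.length, opts.verbose, opts.input)
```

Making `LETTERS` optional with `nargs='?'` leaves room for the `-i` case, where the letters are typed at a prompt instead; checking that exactly one of the two was supplied would still be up to the caller.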

In [1]:
#!/usr/bin/env python3
#
# log.py
#
import logging, tempfile

# 1234567890123456789012345678901234567890123456789012345678901234567890
"""
Logging module that logs to the console and a temporary log file.
"""
__all__ = ["log", "log_path", ]
__author__ = "David C. Petty"
__copyright__ = "Copyright 2021, David C. Petty"
__license__ = "https://choosealicense.com/licenses/mit/"
__version__ = "0.0.1"
__maintainer__ = "David C. Petty"
__email__ = "david_petty@psbma.org"
__status__ = "Development"

log_path = None             # Initialize global log_path for temporary log file.


def log(name, level=logging.INFO):
    """Return logger with name and level."""
    global log_path
    new_file = log_path is None

    # If name already has a logger, return it.
    if name in logging.root.manager.loggerDict:
        return logging.getLogger(name)

    FORMAT = '{asctime:s} {name:^10s} ' \
             '[{threadName:^10s}] {levelname:<8s} {message:s}'
    FORMAT = '{asctime:s} {name:^10s} {levelname:<8s} {message:s}'
    logging.basicConfig(filename='/dev/null', level=logging.NOTSET)
    logger = logging.getLogger(name)

    # Create file handler which logs messages at level.
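    if new_file:
        # Create and close the temporary log file, keeping only its path, so
        # no file descriptor is leaked (FileHandler reopens the path below).
        with tempfile.NamedTemporaryFile(
                suffix='.log', prefix='spellingbee-', delete=False) as tmp:
            log_path = tmp.name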
    fh = logging.FileHandler(log_path, 'a')
    fh.setLevel(level)

    # Create console handler which logs messages at level.
    ch = logging.StreamHandler()
    ch.setLevel(level)

    # Create formatter and add it to handlers.
    formatter = logging.Formatter(
        FORMAT, style='{', datefmt='%Y/%m/%d-%H:%M:%S')
    ch.setFormatter(formatter)
    fh.setFormatter(formatter)

    # Add the handlers to logger.
    logger.addHandler(ch)
    logger.addHandler(fh)

    return logger


if __name__ == '__main__':
    logger = log(__name__)
    logger.debug('D: SPAM')
    logging.debug('D: SPAM')
    logger.info('I: SPAM, SPAM')
    logger.warning('W: SPAM, SPAM, SPAM')
    logger.error('E: SPAM, SPAM, SPAM, SPAM')
    logger.critical('C: SPAM, SPAM, SPAM, SPAM, SPAM')


2021/11/14-19:15:59  __main__  INFO     I: SPAM, SPAM
2021/11/14-19:15:59  __main__  ERROR    E: SPAM, SPAM, SPAM, SPAM
2021/11/14-19:15:59  __main__  CRITICAL C: SPAM, SPAM, SPAM, SPAM, SPAM


In [2]:
#!/usr/bin/env python3
#
# word.py
#
import string

# 1234567890123456789012345678901234567890123456789012345678901234567890
"""
Letter utilities for solving NYTimes Spelling Bee puzzle.
"""
__all__ = ["hasonly", "musthave", "is_valid", ]
__author__ = "David C. Petty"
__copyright__ = "Copyright 2016-2021, David C. Petty"
__license__ = "https://choosealicense.com/licenses/mit/"
__version__ = "0.0.1"
__maintainer__ = "David C. Petty"
__email__ = "david_petty@psbma.org"
__status__ = "Development"


def hasonly(word, letters):
    """Return True if elements of word are only in letters, otherwise False."""
    letterset = set(letters)
    return letterset.union(set(word)) == letterset


def musthave(word, letters):
    """Return True if elements of letters are all in word, otherwise False."""
    letterset = set(letters)
    return letterset.intersection(set(word)) == letterset


# Return True if w is a (hyphenated) word that is all one case, False otherwise.
is_valid = lambda w: w and hasonly(w, string.ascii_letters + '-') \
    and (w == w.lower() or w == w.upper())


In [3]:
#!/usr/bin/env python3
#
# wordset.py
#
import os.path
# from log import log
# from word import is_valid

# 1234567890123456789012345678901234567890123456789012345678901234567890
"""
Functions to read wordlists from file or URI and parse them into sets.
"""
__all__ = ["wordssites", "wordsfiles", ]
__author__ = "David C. Petty"
__copyright__ = "Copyright 2016-2021, David C. Petty"
__license__ = "https://choosealicense.com/licenses/mit/"
__version__ = "0.0.1"
__maintainer__ = "David C. Petty"
__email__ = "david_petty@psbma.org"
__status__ = "Development"

logger = log(__name__)  # initialize logger


# https://stackoverflow.com/a/5711095
import io, gzip, tarfile, zipfile
from urllib.request import urlopen

# https://docs.python-requests.org/en/master/
# or: requests.get(url).content

# https://docs.python.org/3/library/zipfile.html
# zipfile = ZipFile(io.BytesIO(resp.read()))
# names = zipfile.namelist()
# for name in names:
#     for line in zipfile.open(name).readlines():
#         print(line.decode('utf-8'))

# https://stackoverflow.com/a/49174340
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

from urllib.parse import urlparse

format = lambda k, w: f"{k}({len(w)}): {sorted(list(w))[: 10]} ..."

def txtwordset(uri, verbose=False):
    """Return set of words parsed from raw URI. Echo results if verbose."""
    name = os.path.basename(urlparse(uri).path)
    with urlopen(uri) as resp:
        wordset = {w.lower() for w in
            [line.decode('utf-8').strip() for line in resp.readlines()]
                if is_valid(w)}
        logger.info(format(name, wordset))
        return wordset

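def zipwordsets(uri, names, wordssets, verbose=False):
    """Add to wordssets a set of words for each member path in names of the
    .tar.gz archive at URI, keyed by member basename. List members if verbose."""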
    with urlopen(uri) as resp:
        with tarfile.open(fileobj=io.BytesIO(resp.read()), mode='r:gz') as tar:
            zipname = os.path.basename(urlparse(uri).path)
            if verbose: logger.info(f"{zipname}: {tar.getnames()}")
            for path in names:
                name = os.path.basename(path)
                wordset = {w.lower() for w in
                           [line.decode('utf-8').strip() for line in
                        tar.extractfile(path).readlines()]
                           if is_valid(w)}
                logger.info(format(name, wordset))
                wordssets[name] = wordset

log_site = lambda s: logger.info(f"{'#' * 10} SITE: {s}")

def wordssites(verbose=False):
    """Return dict of sets of words, keyed by file name, read from:
    URI: https://raw.githubusercontent.com/dolph/dictionary/master/enable1.txt
    URI: https://raw.githubusercontent.com/dolph/dictionary/master/ospd.txt
    URI: https://raw.githubusercontent.com/dolph/dictionary/master/popular.txt
    URI: https://raw.githubusercontent.com/dolph/dictionary/master/unix-words
    URI: https://www.wordgamedictionary.com/english-word-list/download/english.txt
    URI: https://www.wordgamedictionary.com/sowpods/download/sowpods.txt
    URI: https://www.wordgamedictionary.com/twl06/download/twl06.txt
    URI: https://raw.githubusercontent.com/elasticdog/yawl/master/yawl-0.3.2.03.tar.gz yawl-0.3.2.03/sigword.list
    URI: https://raw.githubusercontent.com/elasticdog/yawl/master/yawl-0.3.2.03.tar.gz yawl-0.3.2.03/word.list
    URI: https://sdsawtelle.github.io/blog/output/scrabble-cheatsheet-with-python.html # cannot directly extract OWL3_Dictionary.7z
    """
    wordssets = dict()

    # Read word-lists from dolph URIs.
    log_site('dolph')
    for uri in [
        'https://raw.githubusercontent.com/dolph/dictionary/master/enable1.txt',
        'https://raw.githubusercontent.com/dolph/dictionary/master/ospd.txt',
        'https://raw.githubusercontent.com/dolph/dictionary/master/popular.txt',
        'https://raw.githubusercontent.com/dolph/dictionary/master/unix-words',
    ]:
        key = os.path.basename(urlparse(uri).path)
        wordssets[key] = txtwordset(uri, verbose)

    # Read word-lists from wordgamedictionary URIs.
    log_site('wordgamedictionary')
    for uri in [
        'https://www.wordgamedictionary.com/english-word-list/download/english.txt',
        'https://www.wordgamedictionary.com/sowpods/download/sowpods.txt',
        'https://www.wordgamedictionary.com/twl06/download/twl06.txt',
    ]:
        key = os.path.basename(urlparse(uri).path)
        wordssets[key] = txtwordset(uri, verbose)

    # Read word-lists from elasticdog URIs.
    log_site('elasticdog')
    uri = 'https://raw.githubusercontent.com/elasticdog/yawl/master/yawl-0.3.2.03.tar.gz'
    keys = ['yawl-0.3.2.03/word.list', 'yawl-0.3.2.03/sigword.list', ]
    zipwordsets(uri, keys, wordssets, verbose)

    return wordssets


def wordsfiles(wordsdir=os.path.dirname(os.path.abspath(globals().get('__file__', ''))),
      wordsfiles=[
          'enable1.txt', 'ospd.txt', 'popular.txt', 'unix-words',
          'english.txt', 'sowpods.txt', 'twl06.txt',
          'sigword.list', 'word.list',
          'OWL3_Dictionary.txt',
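      ], verbose=False):
    """Return dict of sets of words, keyed by file name, read from the
    word-list files named in wordsfiles in the local directory wordsdir."""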
    # Read word-list files from local directory into dictionary of word-sets.
    wordssets = dict()
    for wordsname in wordsfiles:
        logger.info(f"NAME: {wordsname}")
        with open(os.path.join(wordsdir, wordsname), 'r') as wordsfile:
            wordssets[wordsname] = {w.lower() for w in wordsfile.read().split('\n')
                if is_valid(w)}

    return wordssets


In [4]:
#!/usr/bin/env python3
#
# spellingbee.py
#
import itertools, optparse, os, sys
# from log import log, log_path
# from word import hasonly, musthave
# from wordset import wordsfiles, wordssites

# 1234567890123456789012345678901234567890123456789012345678901234567890
"""
Solution to the NYTimes Spelling Bee puzzle.
https://www.nytimes.com/puzzles/spelling-bee
"""
__all__ = ["spellingbee", ]
__author__ = "David C. Petty"
__copyright__ = "Copyright 2016-2021, David C. Petty"
__license__ = "https://choosealicense.com/licenses/mit/"
__version__ = "0.1.1"
__maintainer__ = "David C. Petty"
__email__ = "david_petty@psbma.org"
__status__ = "Development"

logger = log(__name__)  # initialize logger


def spellingbee(must, only, length):
    """Return list of spelling-bee words."""
    # Word-list files linked from:
    # https://github.com/dolph/dictionary
    # https://www.wordgamedictionary.com/sowpods/download/sowpods.txt
    # https://github.com/elasticdog/yawl
    # https://sdsawtelle.github.io/blog/output/scrabble-cheatsheet-with-python.html

    if '__file__' in globals():                         # not a Colab notebook
        wordsdict = wordsfiles()                        # locally from files
    else:
        wordsdict = wordssites()                        # on-line from sites

    # Print pairwise intersections of word-lists.
    for one, other in itertools.combinations(wordsdict, 2):
        logger.info(f"Intersection of word-lists: "
            f"{one}: {len(wordsdict[one])}; "
            f"{other}: {len(wordsdict[other])}; "
            f"\u2229 {len(wordsdict[one].intersection(wordsdict[other]))}")
    logger.info(f"{len(set.union(*wordsdict.values()))} unique words.")

    # words is the union of all words-sets.
    words = set.union(*wordsdict.values())

    # Return list of words that must have must and have only only.
    m, o = must.lower(), only.lower()
    return [w for w in sorted(words)
        if len(w) >= length and musthave(w, m) and hasonly(w, o)]


class SpellingbeeOptionParser(optparse.OptionParser):
    """OptionParser that uses -? for help and prints the help on errors."""

    def __init__(self, **kwargs):
        optparse.OptionParser.__init__(self, **kwargs)
        self.remove_option("-h")
        self.add_option("-?", "--help", action="help",
            help="show this help message and exit")

    def error(self, msg):
        name = self.get_prog_name()
        sys.stderr.write(f"{name}: error: {msg}\n\n")
        self.print_help()
        sys.exit(2)


def test(argv):
    """Parse argv, solve the puzzle, and print the scored solution."""
    import logging
    # Parse command-line options.
    usage = "usage: %prog {LETTERS | -i} [-l L] [-? -v]"
    description = ("Find spelling-bee words using LETTERS "
                   "and including LETTERS[0].")
    parser = SpellingbeeOptionParser(
        usage=usage, description=description, version=__version__)
    parser.add_option("-i", "--input",
        action="store_true", dest="i", default=False,
        help="input LETTERS from keyboard? [%default]")
    parser.add_option("-l", "--length",
        action="store", type='int', dest="l", default=5,
        help="words of length >= L [%default]")
    parser.add_option("-v", "--verbose",
        action="store_true", dest="verbose", default=False,
        help="log status information while processing [%default]")
    opts, args = parser.parse_args(args=argv[1:])
    # Process command-line options.
    len_args = 0 if opts.i else 1
    if len(args) != len_args:
        error = f"too {'few' if len(args) < len_args else 'many'} arguments"
        parser.error(error)
    letters, = args if not opts.i else (input('SpellingBee letters: '),)
    if not opts.verbose: logging.disable(logging.INFO)
    logger.info(f"python3 {' '.join(argv)}")
    logger.info(f"LOG PATH: {log_path}")
    # Solve SpellingBee.
    solution = spellingbee(letters[0], letters, opts.l)
    # Score and print solutions: one point per word, two extra per pangram.
    threePointers = [w for w in solution if set(w) == set(letters)]  # pangram
    score = len(solution) + 2 * len(threePointers)
    print(f"Letters: {letters}")
    print(f"Words: {solution}\nPangrams: {threePointers}")
    print(f"{len(solution)} words score {score}")

if __name__ == '__main__':
    is_idle, is_pycharm, is_jupyter = (
        'idlelib' in sys.modules,
        int(os.getenv('PYCHARM', 0)),
        '__file__' not in globals()
        )
    if any((is_idle, is_pycharm, is_jupyter, )):
        # Tests for hasonly and musthave
        logger.debug(hasonly('victor', 'vteimpr'))      # False
        logger.debug(hasonly('viper', 'vteimpr'))       # True
        logger.debug(musthave('viper', 'vteimpr'))      # False
        logger.debug(musthave('primitive', 'vteimpr'))  # True
        letters = 'vteimpr'
        letters = 'mailpry'
        letters = 'uatonmi' # 2016/01/31
        letters = 'maiortu' # 2016/05/15
        letters = 'ncehikt' # 2016/07/03
        letters = 'oglntuy' # 2019/03/02
        letters = 'cehilnp' # 2019/11/10
        letters = 'lcnauif' # 2021/01/11
        letters = 'pemntil' # 2021/11/08
        letters = 'yrmaloj' # 2021/11/12
        letters = 'dmoralu' # 2021/11/14
        # test([sys.argv[0], letters, '-v', ])
        # Colab Jupyter Notebook
        test([sys.argv[0], '-iv', ])
    else:
        test(sys.argv)


SpellingBee letters: qwertyu


2021/11/14-19:16:06  __main__  INFO     python3 /usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py -iv
2021/11/14-19:16:06  __main__  INFO     LOG PATH: /tmp/spellingbee-rc_sb_zu.log
2021/11/14-19:16:06  __main__  INFO     ########## SITE: dolph
2021/11/14-19:16:08  __main__  INFO     enable1.txt(172823): ['aa', 'aah', 'aahed', 'aahing', 'aahs', 'aal', 'aalii', 'aaliis', 'aals', 'aardvark'] ...
2021/11/14-19:16:09  __main__  INFO     ospd.txt(79339): ['aa', 'aah', 'aahed', 'aahing', 'aahs', 'aal', 'aalii', 'aaliis', 'aals', 'aardvark'] ...
2021/11/14-19:16:09  __main__  INFO     popular.txt(25322): ['aa', 'aardvark', 'aargh', 'aback', 'abacus', 'abandon', 'abandoned', 'abandoning', 'abandonment', 'abandons'] ...
2021/11/14-19:16:11  __main__  INFO     unix-words(210687): ['a', 'aa', 'aal', 'aalii', 'aam', 'aardvark', 'aardwolf', 'aba', 'abac', 'abaca'] ...
2021/11/14-19:16:11  __main__  INFO     ########## SITE: wordgamedictionary
2021/11/14-19:16:14  __main__  INFO     engli

Letters: qwertyu
Words: ['equerry', 'queer', 'queerer', 'queery', 'queet', 'query', 'queue', 'queuer', 'quyte', 'qwerty', 'requere', 'truqueur', 'tuque']
Pangrams: []
13 words score 13
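
The score printed above follows the rule used in `test`: one point per word plus two extra points for each pangram (a word that uses all seven letters). A standalone sketch of that rule, with a hypothetical `score` helper and a made-up solution list:

```python
def score(solution, letters):
    """Return (pangrams, total): one point per word, two extra per pangram."""
    pangrams = [w for w in solution if set(w) == set(letters)]
    return pangrams, len(solution) + 2 * len(pangrams)


# Hypothetical solution list for the letters 'vteimpr'.
pangrams, total = score(['primitive', 'permit', 'viper'], 'vteimpr')
print(pangrams, total)
```

Here only `primitive` uses every letter, so the three words score 3 + 2 = 5.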
