
Riddler May 22nd, 2020
-- solution by Mark Mace

From Mark Bradwin comes a fishy puzzle about state names:

Ohio is the only state whose name doesn’t share any letters 
with the word “mackerel.” It’s strange, but it’s true.

But that isn’t the only pairing of a state and a word you can 
say that about — it’s not even the only fish! Kentucky has 
“goldfish” to itself, Montana has “jellyfish” and Delaware 
has “monkfish,” just to name a few.

What is the longest “mackerel?” That is, what is the longest 
word that doesn’t share any letters with exactly one state? 
(If multiple “mackerels” are tied for being the longest, can 
you find them all?)

Extra credit: Which state has the most “mackerels?” That is, 
which state has the most words for which it is the only state 
without any letters in common with those words?


In [119]:
'''
Load libraries, the list of state names, and the general word list.
'''
import numpy as np
import pandas as pd

# load and get state names
state_df = pd.read_csv('list_of_states.csv')
state_df = state_df[state_df.State != 'District of Columbia']
state_names = state_df.State.to_list()

# load Peter Norvig’s word list
with open('word.list.txt') as fp:
    # one word per line; strip the trailing newline
    all_words = [line.strip() for line in fp]
print(f'Total word count: {len(all_words)}')


Total word count: 263533


In [120]:
# get all words and then lengths
word_df = pd.DataFrame(all_words, columns=['word'])
word_df['length'] = word_df.word.apply(len)


In [121]:
# determine whether two strings share any characters, using set intersection
# returns True if there is overlap, False otherwise
# note: the comparison is case-sensitive, so a capital letter in a state name
# (e.g. the 'N' in 'New Jersey') never matches a lowercase word
def overlap(string_1, string_2):
    return bool(set(string_1) & set(string_2))
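
The state names in this notebook are capitalized while the word list is lowercase, so the `overlap` function above treats 'N' and 'n' as different characters. A minimal case-insensitive sketch follows, assuming letters should be compared regardless of case and that the space in names such as 'New Jersey' should be ignored. The names `letters` and `overlap_ci` are hypothetical and not part of the original notebook.

```python
def letters(text):
    """Return the set of lowercase alphabetic characters in text."""
    return {ch for ch in text.lower() if ch.isalpha()}


def overlap_ci(string_1, string_2):
    """Return True if the two strings share a letter, ignoring case."""
    return not letters(string_1).isdisjoint(letters(string_2))


# "mackerel" and "Ohio" have no letters in common
print(overlap_ci("mackerel", "Ohio"))  # False
# this word shares the letter 'n' with "New Jersey" once case is ignored
print(overlap_ci("floccinaucinihilipilification", "New Jersey"))  # True
```

Swapping this variant into the cells below may give results that differ from the outputs recorded in this notebook.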


In [122]:
# for each state, flag whether each word shares at least one letter with it
for state in state_names:
    word_df[state] = word_df.word.apply(lambda x: overlap(x, state))

In [123]:
# preview the per-state overlap flags for the first few words
word_df.head()

index,word,length,Alabama,Alaska,Arizona,Arkansas,California,Colorado,Connecticut,Delaware,...,South Dakota,Tennessee,Texas,Utah,Vermont,Virginia,Washington,West Virginia,Wisconsin,Wyoming
0,aa,2,True,True,True,True,True,True,False,True,...,True,False,True,True,False,True,True,True,False,False
1,aah,3,True,True,True,True,True,True,False,True,...,True,False,True,True,False,True,True,True,False,False
2,aahed,5,True,True,True,True,True,True,True,True,...,True,True,True,True,True,True,True,True,False,False
3,aahing,6,True,True,True,True,True,True,True,True,...,True,True,True,True,True,True,True,True,True,True
4,aahs,4,True,True,True,True,True,True,False,True,...,True,True,True,True,False,True,True,True,True,False


In [124]:
# count how many states each word shares a letter with
# (despite its name, 'no_overlap' holds the number of states that DO overlap)
word_df['no_overlap'] = word_df[state_names].sum(axis=1)
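
A word is a "mackerel" when exactly one of the 50 states shares no letters with it, which is the same as overlapping with 49 states. The plain-Python sketch below checks that condition on the four fish from the puzzle statement. It is only an illustration under a few assumptions: the state list is typed in rather than read from `list_of_states.csv`, letters are compared case-insensitively, and the names `STATES`, `letters`, `states_without_overlap` and `is_mackerel` are hypothetical.

```python
STATES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
    "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
    "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
    "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
    "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
    "West Virginia", "Wisconsin", "Wyoming",
]


def letters(text):
    """Lowercase alphabetic characters of text (assumption: case is ignored)."""
    return frozenset(ch for ch in text.lower() if ch.isalpha())


# compute each state's letter set once instead of once per word
STATE_LETTERS = {state: letters(state) for state in STATES}


def states_without_overlap(word):
    """Return the states that share no letters with word."""
    word_letters = letters(word)
    return [state for state, state_letters in STATE_LETTERS.items()
            if word_letters.isdisjoint(state_letters)]


def is_mackerel(word):
    """A mackerel has exactly one state with no letters in common."""
    return len(states_without_overlap(word)) == 1


for fish in ["mackerel", "goldfish", "jellyfish", "monkfish"]:
    print(fish, states_without_overlap(fish))
```

Each of the four fish should print a single state, matching the pairings given in the puzzle (Ohio, Kentucky, Montana and Delaware).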

In [170]:
# words that overlap with exactly 49 states (i.e. share no letters with exactly one state), longest first
one_no_overlap_df = word_df[word_df.no_overlap == 49].sort_values(by=['length'], ascending=False)

# get absolute longest word/words
longest_words_df = one_no_overlap_df[one_no_overlap_df.length == max(one_no_overlap_df.length)]
list_of_longest_words = longest_words_df.word.to_list()
state_without_match = [longest_words_df.columns[(longest_words_df == False).iloc[i]].tolist()[0]
                       for i in range(len(longest_words_df))]
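
The puzzle asks for every word tied for longest, so the selection has to keep all rows at the maximum length rather than just the first one. A small sketch of that step on plain (word, state) pairs, using only the examples from the puzzle statement; `longest_entries` is a hypothetical helper, not part of the notebook.

```python
def longest_entries(pairs):
    """Return every (word, state) pair whose word has the maximum length."""
    if not pairs:
        return []
    max_len = max(len(word) for word, _ in pairs)
    return [(word, state) for word, state in pairs if len(word) == max_len]


# (word, state) pairings taken from the puzzle statement
pairs = [
    ("mackerel", "Ohio"),
    ("goldfish", "Kentucky"),
    ("jellyfish", "Montana"),
    ("monkfish", "Delaware"),
]

print(longest_entries(pairs))      # only the 9-letter word remains
print(longest_entries(pairs[:2]))  # both 8-letter words are kept as a tie
```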



In [174]:
print('Longest word with one state which has no overlap (i.e. a mackerel)')
for word, state in list(zip(list_of_longest_words, state_without_match)):
    print(f'word: {word} (length {len(word)}), state: {state}')

Longest word with one state which has no overlap (i.e. a mackerel)
word: floccinaucinihilipilification (length 29), state: New Jersey


Repeating the same logic as before, we can get the non-overlapping state for every mackerel, not just the longest one.

In [180]:
# for every mackerel, get the one state it shares no letters with
no_overlap_states = [one_no_overlap_df.columns[(one_no_overlap_df == False).iloc[i]].tolist()[0]
                       for i in range(len(one_no_overlap_df))]

In [181]:
states_ranked, n_no_overlaps = np.unique(no_overlap_states, return_counts=True)
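
The tally above can also be sketched with the standard library, which returns the states already ordered by count. The example is illustrative only: `rank_states` is a hypothetical helper, and the sample input uses a handful of word and state pairings that appear elsewhere in this notebook rather than the full result.

```python
from collections import Counter


def rank_states(mackerel_states):
    """Return (state, count) pairs sorted from most to fewest mackerels."""
    return Counter(mackerel_states).most_common()


# one entry per mackerel: the single state it shares no letters with
# mackerel -> Ohio, goldfish -> Kentucky, translatable -> Ohio,
# jellyfish -> Montana, monkfish -> Delaware
sample = ["Ohio", "Kentucky", "Ohio", "Montana", "Delaware"]

ranking = rank_states(sample)
top_state, top_count = ranking[0]
print(f"{top_state} has the most mackerels in the sample: {top_count}")
```

One difference from the NumPy version is how ties are broken: `np.argmax` picks the first maximum in `np.unique`'s sorted order, while `most_common` keeps tied states in the order they first appear.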

In [185]:
print(f'The state with the most words which have no overlap is {states_ranked[np.argmax(n_no_overlaps)]} with {max(n_no_overlaps)} words.')


The state with the most words which have no overlap is Ohio with 8552 words.


In [194]:
print('And here are all of the words for the curious')
for word in one_no_overlap_df[one_no_overlap_df[states_ranked[np.argmax(n_no_overlaps)]] == False].word.tolist():
    print(word+', ', end='')

And here are all of the words for the curious
untranslatablenesses, transcendentalnesses, unpersuadablenesses, preternaturalnesses, unpreventablenesses, unanswerablenesses, transmutablenesses, translatablenesses, unacceptablenesses, unmanageablenesses, untranslatableness, undependablenesses, supernaturalnesses, transcendentnesses, transcendentalness, unpersuadableness, unspeakablenesses, unpeaceablenesses, respectablenesses, measurelessnesses, regrettablenesses, warrantablenesses, understatednesses, pleasurablenesses, unpreventableness, exaggeratednesses, extravagantnesses, unsteadfastnesses, unendurablenesses, unalterablenesses, untractablenesses, unbreakablenesses, unutterablenesses, preternaturalness, ultrastructurally, crestfallennesses, presentablenesses, transparentnesses, transplacentally, transversenesses, unsealablenesses, bareleggednesses, affectlessnesses, transcendentness, transcendentally, transmutableness, untameablenesses, measurablenesses, ununderstandable, translatable