## Time to play the puzzle

This week's puzzle: Take the last two letters of a state's capital city and the first two letters of the state. Rearrange them to name an activity that that state is associated with.


Start by importing the packages we'll be working with: pandas for working with the data, itertools for generating the permutations of each letter combo, and NLTK for checking whether a string is a word.

In [37]:
import pandas as pd
import itertools
import nltk
from nltk.corpus import words

Next, we'll read in the data. We got a list of states and capitals from [Wikipedia](https://en.wikipedia.org/wiki/List_of_capitals_in_the_United_States#State_capitals). After some minor data cleaning, we saved the data as a semicolon-delimited CSV file, which we now read into a pandas DataFrame.

In [38]:
# Read in the data

df = pd.read_csv('capitals.csv', header=0, delimiter=';')
df.head()

Unnamed: 0,State,City,Capital,Since,Area (mi2),Proper,MSA/µSA,CSA,Rank in state
0,Alabama,Montgomery,1846,159.8,199000,386047,476207.0,3,
1,Alaska,Juneau,1906,2716.7,32255,32255,,3,
2,Arizona,Phoenix,1912,517.6,1597000,4845832,4899104.0,1,
3,Arkansas,Little Rock,1821,116.2,206000,748031,912604.0,1,
4,California,Sacramento,1854,97.9,516000,2397382,2680831.0,6,


We really only need the State and City columns, so let's save those as series that we can work with.

In [39]:
states = df['State']
states.head()

0        Alabama
1         Alaska
2        Arizona
3       Arkansas
4     California
Name: State, dtype: object

In [40]:
cities = df['City']
cities.head()

0     Montgomery
1         Juneau
2        Phoenix
3    Little Rock
4     Sacramento
Name: City, dtype: object

Now, we want to get the first two letters of each state and the last two letters of each city. 

In [41]:
# Get the first two letters of state and last two letters of city.
letters = []
for i in range(len(states)):
    letter1 = states[i][0]
    letter2 = states[i][1]
    letter3 = cities[i][-1]
    letter4 = cities[i][-2]
    let = letter1.lower() + letter2 + letter3 + letter4
    letters.append(let)
letters

[' Ayr',
 ' Aua',
 ' Axi',
 ' Akc',
 ' Cot',
 ' Cre',
 ' Cdr',
 ' Dre',
 ' Fee',
 ' Gat',
 ' Hul',
 ' Ies',
 ' Idl',
 ' Isi',
 ' Ise',
 ' Kak',
 ' Ktr',
 ' Leg',
 ' Mat',
 ' Msi',
 ' Mno',
 ' Mgn',
 ' Mlu',
 ' Mno',
 ' Myt',
 ' Man',
 ' Nnl',
 ' Nyt',
 ' Ndr',
 ' Nno',
 ' NeF',
 ' Nyn',
 ' Nhg',
 ' Nkc',
 ' Osu',
 ' Oyt',
 ' Ome',
 ' Pgr',
 ' Rec',
 ' Sai',
 ' Ser',
 ' Tel',
 ' Tni',
 ' Uyt',
 ' Vre',
 ' Vdn',
 ' Wai',
 ' Wno',
 ' Wno',
 ' Wen']

Hmm, each of those values is only 3 letters. Looks like we must have a leading space in front of each state. Let's clean the data and then try again.

In [43]:
# Strip the leading space from every state name.
# Assigning the result of the vectorized .str.strip() to a new Series
# avoids the SettingWithCopyWarning that writing into states[i] in a loop raises.
states = df['State'].str.strip()
states


0            Alabama
1             Alaska
2            Arizona
3           Arkansas
4         California
5           Colorado
6        Connecticut
7           Delaware
8            Florida
9            Georgia
10            Hawaii
11             Idaho
12          Illinois
13           Indiana
14              Iowa
15            Kansas
16          Kentucky
17         Louisiana
18             Maine
19          Maryland
20     Massachusetts
21          Michigan
22         Minnesota
23       Mississippi
24          Missouri
25           Montana
26          Nebraska
27            Nevada
28     New Hampshire
29        New Jersey
30        New Mexico
31          New York
32    North Carolina
33      North Dakota
34              Ohio
35          Oklahoma
36            Oregon
37      Pennsylvania
38      Rhode Island
39    South Carolina
40      South Dakota
41         Tennessee
42             Texas
43              Utah
44           Vermont
45          Virginia
46        Washington
47     West V

Now let's try to get our letters for each state and city again.

In [45]:
# Get the first two letters of state and last two letters of city.
letters = []
for i in range(len(states)):
    letter1 = states[i][0]
    letter2 = states[i][1]
    letter3 = cities[i][-1]
    letter4 = cities[i][-2]
    # Lowercase all four letters so a capital such as the "F" in Santa Fe can match dictionary words
    let = (letter1 + letter2 + letter3 + letter4).lower()
    letters.append(let)
letters

['alyr',
 'alua',
 'arxi',
 'arkc',
 'caot',
 'core',
 'codr',
 'dere',
 'flee',
 'geat',
 'haul',
 'ides',
 'ildl',
 'insi',
 'iose',
 'kaak',
 'ketr',
 'loeg',
 'maat',
 'masi',
 'mano',
 'mign',
 'milu',
 'mino',
 'miyt',
 'moan',
 'nenl',
 'neyt',
 'nedr',
 'neno',
 'neef',
 'neyn',
 'nohg',
 'nokc',
 'ohsu',
 'okyt',
 'orme',
 'pegr',
 'rhec',
 'soai',
 'soer',
 'teel',
 'teni',
 'utyt',
 'vere',
 'vidn',
 'waai',
 'weno',
 'wino',
 'wyen']
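
As an aside, the same four-letter strings can be built more compactly with string slicing. The sketch below is only an illustration: the `pairs` sample and the `four_letters` helper are hypothetical names, not part of the notebook, and the capital's letters come out in their natural order rather than reversed, which makes no difference once we permute them.

```python
# Hypothetical sample of (state, capital) pairs; the notebook reads these from capitals.csv.
# The leading space on " Hawaii" mimics the stray whitespace found in the raw data.
pairs = [(" Hawaii", "Honolulu"), ("New Mexico", "Santa Fe"), ("Kentucky", "Frankfort")]


def four_letters(state, capital):
    """Return the first two letters of the state plus the last two of the capital, lowercased."""
    return (state.strip()[:2] + capital.strip()[-2:]).lower()


letters_sketch = [four_letters(state, capital) for state, capital in pairs]
print(letters_sketch)  # ['halu', 'nefe', 'kert']
```

Stripping and lowercasing inside the helper means stray spaces and capital letters are handled in one place.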

Yay, we've got our four letters! Now, we need to get the different permutations of each of these letter combinations. To do this, we're using the [itertools library](https://docs.python.org/3.1/library/itertools.html).

Essentially, for each item in our letters list, we want to find the different words that could be created by changing the order of our four letters. If I remember high school math correctly, four letters can be arranged in 4! = 4 * 3 * 2 * 1 = 24 ways, so I expect 24 permutations for each four-letter set. When a letter repeats, as in 'flee', some of those 24 permutations come out identical.
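
To sanity-check that count, here is a small sketch using two strings from the letters list above. The `orderings` helper is a hypothetical name introduced only for this illustration.

```python
import itertools
import math


def orderings(letters):
    """Return every permutation of the given letters as a joined string."""
    return ["".join(p) for p in itertools.permutations(letters)]


all_haul = orderings("haul")
all_flee = orderings("flee")

print(len(all_haul), math.factorial(4))  # 24 24
print(len(set(all_haul)))                # 24 (all four letters are distinct)
print(len(set(all_flee)))                # 12 (the repeated 'e' halves the unique results)
```

`itertools.permutations` treats letters as distinct by position, so a repeated letter still produces 24 tuples, some of which spell the same string.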

We create an empty dataframe where we'll store our data. Then we loop through each of the items in the letters list using the itertools permutations function. In order to get the data back in a single string so that we can assess if it's a word, we concatenate the values in each permutation to create a string we call word. We then store the word in our combos list, and create a Pandas series where the data is the combos list and the name is the corresponding state, and then add that series as a column to our df_combos dataframe.

In [46]:
# Get the different permutations of the letters

df_combos = pd.DataFrame()
for i in range(len(letters)):
    combos = []
    st = letters[i]
    per = itertools.permutations(st)
    for val in per:
        # Each permutation is a tuple of characters; join it back into one string
        word = ''.join(val)
        combos.append(word)
    ser = pd.Series(data=combos, name=states[i])
    df_combos[states[i]] = ser
df_combos


Unnamed: 0,Alabama,Alaska,Arizona,Arkansas,California,Colorado,Connecticut,Delaware,Florida,Georgia,...,South Dakota,Tennessee,Texas,Utah,Vermont,Virginia,Washington,West Virginia,Wisconsin,Wyoming
0,alyr,alua,arxi,arkc,caot,core,codr,dere,flee,geat,...,soer,teel,teni,utyt,vere,vidn,waai,weno,wino,wyen
1,alry,alau,arix,arck,cato,coer,cord,deer,flee,geta,...,sore,tele,tein,utty,veer,vind,waia,weon,wion,wyne
2,aylr,aula,axri,akrc,coat,croe,cdor,dree,fele,gaet,...,seor,teel,tnei,uytt,vree,vdin,waai,wneo,wnio,weyn
3,ayrl,aual,axir,akcr,cota,creo,cdro,dree,feel,gate,...,sero,tele,tnie,uytt,vree,vdni,waia,wnoe,wnoi,weny
4,arly,aalu,airx,acrk,ctao,ceor,crod,deer,fele,gtea,...,sroe,tlee,tien,utty,veer,vnid,wiaa,woen,woin,wnye
5,aryl,aaul,aixr,ackr,ctoa,cero,crdo,dere,feel,gtae,...,sreo,tlee,tine,utyt,vere,vndi,wiaa,wone,woni,wney
6,layr,laua,raxi,rakc,acot,ocre,ocdr,edre,lfee,egat,...,oser,etel,etni,tuyt,evre,ivdn,awai,ewno,iwno,ywen
7,lary,laau,raix,rack,acto,ocer,ocrd,eder,lfee,egta,...,osre,etle,etin,tuty,ever,ivnd,awia,ewon,iwon,ywne
8,lyar,luaa,rxai,rkac,aoct,orce,odcr,erde,lefe,eagt,...,oesr,eetl,enti,tyut,erve,idvn,aawi,enwo,inwo,yewn
9,lyra,luaa,rxia,rkca,aotc,orec,odrc,ered,leef,eatg,...,oers,eelt,enit,tytu,erev,idnv,aaiw,enow,inow,yenw


We could stop here, perhaps exporting the data to a csv to make it easier to look at. 

In [48]:
#export to csv
df_combos.to_csv('answers.csv')

But 50 states x 24 words is 1,200 words (well, mostly non-words) to sift through. That's quite a lot to do manually!

This is where NLTK comes in. We'll [download a corpus from NLTK](https://www.nltk.org/data.html) to use as our word list to check if the strings in our dataframe are words. We're using the "words" corpus. We already imported NLTK and the corpus above, but here is the code again for reference:

In [49]:
# NLTK dictionary

import nltk
from nltk.corpus import words

I found a [relevant question on stackoverflow](https://stackoverflow.com/questions/3788870/how-to-check-if-a-word-is-an-english-word-with-python) that recommended storing the words as a set to make the code more efficient.

In [50]:
# Convert words from corpus to a set to make it faster
wordset = set(words.words())
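
To illustrate why the set helps, here is a rough timing sketch. It uses a made-up list of 100,000 strings as a stand-in for the NLTK corpus (an assumption for illustration only), and the exact timings will vary by machine.

```python
import timeit

# Hypothetical stand-in for the corpus: 100,000 made-up "words".
word_list = [f"word{i}" for i in range(100_000)]
word_set = set(word_list)

# Looking up the last element is the worst case for a list, which is scanned front to back.
target = "word99999"

list_time = timeit.timeit(lambda: target in word_list, number=100)
set_time = timeit.timeit(lambda: target in word_set, number=100)

print(f"list: {list_time:.4f}s, set: {set_time:.6f}s for 100 lookups")
```

A list membership test compares against each element in turn, while a set uses a hash lookup, so the gap widens as the word list grows.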

Next, I want to create a function to test if a word is in my set of words. I added a couple of test cases below to see the results using a word or non-word value.

In [51]:
# Function to test if a string is a word in our word set
def test(word):
    # Set membership already evaluates to True or False
    return word in wordset

test('hello')
# test('helo')

True

Now that we know how to check if an item is a word, we can loop through our dataframe and check to see if each item is a word. If the item is indeed a word, we'll print it out below along with the name of the state it is associated with.

In [52]:
# Loop through df_combos
# Check if each item is a word
# If it is a word, print the label (the state) and the value (the word)

for label, content in df_combos.items():
    for index, value in content.items():
        if test(value):
            print(label + ': ' + value)



Alabama: aryl
Alabama: lyra
Alabama: yarl
Alabama: ryal
Alaska: aula
Alaska: aula
Arkansas: rack
Arkansas: cark
California: coat
Colorado: core
Colorado: cero
Connecticut: cord
Delaware: dere
Delaware: deer
Delaware: dree
Delaware: dree
Delaware: deer
Delaware: dere
Delaware: rede
Delaware: reed
Delaware: rede
Delaware: reed
Florida: flee
Florida: flee
Florida: feel
Florida: feel
Georgia: geat
Georgia: geta
Georgia: gaet
Georgia: gate
Hawaii: haul
Hawaii: hula
Idaho: ides
Idaho: desi
Idaho: side
Illinois: dill
Illinois: dill
Kansas: kaka
Kansas: kaka
Kansas: kaka
Kansas: kaka
Kentucky: trek
Louisiana: loge
Louisiana: ogle
Louisiana: egol
Louisiana: goel
Maine: atma
Maine: atma
Maryland: mias
Maryland: saim
Maryland: sima
Massachusetts: mano
Massachusetts: moan
Massachusetts: mona
Massachusetts: noma
Michigan: ming
Minnesota: limu
Mississippi: mino
Missouri: mity
Montana: moan
Montana: mona
Montana: mano
Montana: noma
New Hampshire: dern
New Hampshire: rend
New Jersey: neon
New Jersey: 
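
The output above repeats a word whenever the four letters contain a duplicate (for example, "flee" appears twice for Florida). One possible way to tidy this up is sketched below, assuming a tiny hand-written word set in place of the NLTK corpus. The names `tiny_wordset`, `letters_by_state`, and `unique_words` are hypothetical and not part of the notebook.

```python
import itertools

# Hypothetical miniature word set; the notebook uses the NLTK "words" corpus instead.
tiny_wordset = {"flee", "feel", "haul", "hula", "trek"}

# Four-letter strings taken from the letters list built earlier.
letters_by_state = {"Florida": "flee", "Hawaii": "haul", "Kentucky": "ketr"}


def unique_words(letters, vocabulary):
    """Return the sorted, de-duplicated permutations of letters that appear in vocabulary."""
    candidates = {"".join(p) for p in itertools.permutations(letters)}
    return sorted(candidates & vocabulary)


for state, letters in letters_by_state.items():
    print(state + ": " + ", ".join(unique_words(letters, tiny_wordset)))
```

Collecting the permutations in a set removes the repeats before the dictionary check, and the set intersection does the word lookup in one step.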

And we have it - our list of states and words! Can you spot the answer to this week's puzzle?