# Control Structures + Functions
>- Understand control structures
>    - For loops
>    - Conditionals
>    - Data structures + control structures = list/dictionary comprehensions
>- Create a function

We will learn these ideas by cleaning a real-world dataset.

>Data is a subset taken from the New York Times interactive, [The 258 People, Places and Things Donald
Trump Has Insulted on Twitter: A Complete List](http://www.nytimes.com/interactive/2016/01/28/upshot/donald-trump-twitter-insults.html)

In [2]:
data = {'Hi.Llary Clin,ton': ['Crooked',
  'not a talented person or politician',
  'Crooked'],
 'Hillary Clinton': ['Crooked', 'Crooked'],
 'Hillary clinton': ['Crooked'],
 'New Hampshire Union Leader': ['highly unethical', 'wont survive'],
 'New Hampshire Union leader': ['failing'],
 'New Hampshire, Union Leader': ['kicked out of the ABC news debate like a dog'],
 'The System': ['rigged', 'rigged'],
 'The system': ['allowed Crooked Hillary to get away with murder'],
 'The. System.': ['Rigged', 'rigged', 'Very very unfair!']}

## Better Learning Through Doing
>Our data is a dictionary holding a sample of Trump tweets -- specifically insults lobbed at Clinton, a NH newspaper and "the system". Notice, however, that the keys are not standardized. For example, Hillary Clinton's name is spelled three different ways, so to examine the insults aimed at her we would have to check each key. We will fix this and learn our objectives along the way.

## Goal: 
>***One dictionary with one key each for Hillary, the system and the New Hampshire Union Leader. This will involve combining these disparate keys.***

First, we will learn about looping and conditionals. 

Instead of printing one entry at a time, key by key, let's loop over the dictionary.

In [3]:
print("The keys are: %s" % list(data.keys()))

The keys are: ['New Hampshire Union Leader', 'New Hampshire, Union Leader', 'The System', 'Hillary clinton', 'New Hampshire Union leader', 'Hillary Clinton', 'The. System.', 'Hi.Llary Clin,ton', 'The system']


Let's store these keys in a list.

In [4]:
names = list(data.keys())

In [5]:
print (names[0], data[names[0]]) # This is impractical!

New Hampshire Union Leader ['highly unethical', 'wont survive']


## *Check your understanding*

Print out the key-value pair for every variation of "the system". The output should look similar to the one above.

The for loop will save us here. 

### Loops

In [8]:
for name in names: # 'name' is a variable and you can name it whatever you want. Best practices say we should make it meaningful!
    print(name)

New Hampshire Union Leader
New Hampshire, Union Leader
The System
Hillary clinton
New Hampshire Union leader
Hillary Clinton
The. System.
Hi.Llary Clin,ton
The system


Now we will repeat the earlier key-value lookup inside the loop, printing each key together with its values from our dict.

In [9]:
for name in names:
    print (name, data[name])

New Hampshire Union Leader ['highly unethical', 'wont survive']
New Hampshire, Union Leader ['kicked out of the ABC news debate like a dog']
The System ['rigged', 'rigged']
Hillary clinton ['Crooked']
New Hampshire Union leader ['failing']
Hillary Clinton ['Crooked', 'Crooked']
The. System. ['Rigged', 'rigged', 'Very very unfair!']
Hi.Llary Clin,ton ['Crooked', 'not a talented person or politician', 'Crooked']
The system ['allowed Crooked Hillary to get away with murder']
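The same key-value loop can also be written with the dictionary's `.items()` method, which hands back each key together with its value. This is a minimal sketch on a two-key sample of the tutorial's data; the names `insults` and `pairs` are illustrative and not part of the original notebook.

```python
# A small sample of the tutorial's dictionary, repeated so the block runs on its own.
data = {
    'The System': ['rigged', 'rigged'],
    'The system': ['allowed Crooked Hillary to get away with murder'],
}

# .items() yields (key, value) pairs, so no second lookup with data[name] is needed.
pairs = []
for name, insults in data.items():
    print(name, insults)
    pairs.append((name, insults))
```

Looping over `.items()` avoids keeping a separate list of keys when all we need is to read each entry once.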


Similarly, we could iterate over positions instead of values. Indexing by position should look familiar from slicing strings and lists.

In [10]:
for i in range(len(names)):
    print(names[i], data[names[i]])

New Hampshire Union Leader ['highly unethical', 'wont survive']
New Hampshire, Union Leader ['kicked out of the ABC news debate like a dog']
The System ['rigged', 'rigged']
Hillary clinton ['Crooked']
New Hampshire Union leader ['failing']
Hillary Clinton ['Crooked', 'Crooked']
The. System. ['Rigged', 'rigged', 'Very very unfair!']
Hi.Llary Clin,ton ['Crooked', 'not a talented person or politician', 'Crooked']
The system ['allowed Crooked Hillary to get away with murder']
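When both the position and the value are needed, Python's built-in `enumerate` is a common alternative to `range(len(names))`. The sketch below assumes a shortened `names` list; `numbered` is a hypothetical helper list used only to show what the loop produces.

```python
# Shortened version of the tutorial's names list, so the block runs on its own.
names = ['The System', 'The. System.', 'The system']

# enumerate yields (position, value) pairs, starting at position 0.
numbered = []
for i, name in enumerate(names):
    print(i, name)
    numbered.append((i, name))
```

The index `i` is still available for assignments such as `names[i] = ...`, without indexing the list by hand to read each value.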


## *Check your understanding*
Now loop over the values in the data dictionary and print them out. 

Now that we understand the for loop, let's use it to find specific data points -- those relating to "the system". For this we will need conditionals.

### Conditionals

In [16]:
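for name in names:
    if 'sys' in name.lower(): # Is the substring 'sys' in the key? If so, print it out. Notice the lower method!
        print (name)
    else:
        print(12*"=") # Otherwise, print a separator line.

============
============
The System
============
============
============
The. System.
============
The system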


Now we can use this to subset our data. 

In [17]:
for name in names:
    if 'sys' in name.lower(): 
        print (name, data[name])

The System ['rigged', 'rigged']
The. System. ['Rigged', 'rigged', 'Very very unfair!']
The system ['allowed Crooked Hillary to get away with murder']


We can also do this for Hillary. 

In [18]:
for name in names:
    if 'clin' in name.lower(): # Of course, if another key had the sequence 'clin' we would get that too. 
        print (name, data[name])

Hillary clinton ['Crooked']
Hillary Clinton ['Crooked', 'Crooked']
Hi.Llary Clin,ton ['Crooked', 'not a talented person or politician', 'Crooked']
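The separate checks above can be folded into one loop with `if` / `elif` / `else`, which tests the conditions in order and runs only the first branch that matches. This is a sketch under assumptions: the `labels` dictionary and the label strings `'clinton'`, `'system'` and `'other'` are made up for illustration.

```python
# A few of the tutorial's raw keys, repeated so the block runs on its own.
names = ['Hillary clinton', 'The. System.', 'New Hampshire Union leader', 'Hi.Llary Clin,ton']

labels = {}
for name in names:
    lowered = name.lower()
    if 'clin' in lowered:      # checked first
        labels[name] = 'clinton'
    elif 'sys' in lowered:     # only checked when the first test fails
        labels[name] = 'system'
    else:                      # runs when no earlier test matched
        labels[name] = 'other'

print(labels)
```

As with the single checks, substring matching is loose: any key containing `'clin'` would be labeled `'clinton'`, whoever it refers to.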


## *Check your understanding*
Try subsetting by the NH newspaper.

OK, now that we see there are multiple keys each for Hillary, the system and the NH newspaper, we want to combine them. We will need to standardize these keys to move forward. 

To standardize them, let's:
* remove punctuation
* make all lower case
* remove blank spaces

We know that for strings, we can use the `.replace()` method. We could chain a call for each of these bullets every time we need a clean key, but let's be smarter about it. I once heard that in programming, if you ever do something more than once, make a function for it.

In [19]:
def fixString(x): # We have named our function fixString -- remember to use only descriptive names. 
    '''Accepts a string and returns the string in lower case and without periods, commas, colons, semicolons and spaces.
    Input: a string
    Output: a cleaned string'''
    return x.lower().replace('.', '').replace(',', '').replace(':', '').replace(';', '').replace(' ', '')

In [20]:
fixString?

In [21]:
fixString("Hi.Llary Clin,ton") #Test it

'hillaryclinton'

In [22]:
fixString('Hillary:::Clint...on')

'hillaryclinton'
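Chaining `.replace()` works, but the chain grows with every character we want to drop. One possible alternative, sketched here, uses `str.maketrans` and `str.translate` from the standard library. The function name `fix_string_translate` is hypothetical, and unlike `fixString` it removes all ASCII punctuation and whitespace, not only the five characters listed above.

```python
import string

def fix_string_translate(x):
    """Hypothetical alternative to fixString: lower-case x and drop
    every ASCII punctuation and whitespace character."""
    # The third argument of str.maketrans lists the characters to delete.
    table = str.maketrans('', '', string.punctuation + string.whitespace)
    return x.lower().translate(table)

print(fix_string_translate("Hi.Llary Clin,ton"))
print(fix_string_translate("Hillary:::Clint...on"))
```

Both calls print `hillaryclinton`, matching the results of `fixString` on the same inputs.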

## *Check your understanding*
Create a function that changes the case to upper case and counts the length of the string. `len()` will be helpful here.

These are all consistent now. But remember, we don't want to call the function by hand for each key. We know lists are mutable, so let's loop over our names list and apply our function.

In [23]:
for name in names:
    print(fixString(name)) # Looks good -- this is just printing it though, so let's actually mutate the list.

newhampshireunionleader
newhampshireunionleader
thesystem
hillaryclinton
newhampshireunionleader
hillaryclinton
thesystem
hillaryclinton
thesystem


In [24]:
for i in range(len(names)):
    names[i] = fixString(names[i])

In [25]:
names # All clean

['newhampshireunionleader',
 'newhampshireunionleader',
 'thesystem',
 'hillaryclinton',
 'newhampshireunionleader',
 'hillaryclinton',
 'thesystem',
 'hillaryclinton',
 'thesystem']

In [28]:
for i, j in zip(names, data.values()): # Use a zip object here to combine and print
    print (i,j)

newhampshireunionleader ['highly unethical', 'wont survive']
newhampshireunionleader ['kicked out of the ABC news debate like a dog']
thesystem ['rigged', 'rigged']
hillaryclinton ['Crooked']
newhampshireunionleader ['failing']
hillaryclinton ['Crooked', 'Crooked']
thesystem ['Rigged', 'rigged', 'Very very unfair!']
hillaryclinton ['Crooked', 'not a talented person or politician', 'Crooked']
thesystem ['allowed Crooked Hillary to get away with murder']


This looks good, but we also want the values standardized to lower case.

### List Comprehension

A list comprehension builds a new list by applying an expression to each item of an iterable, all in one line.

In [32]:
[j.upper() for j in names]

['NEWHAMPSHIREUNIONLEADER',
 'NEWHAMPSHIREUNIONLEADER',
 'THESYSTEM',
 'HILLARYCLINTON',
 'NEWHAMPSHIREUNIONLEADER',
 'HILLARYCLINTON',
 'THESYSTEM',
 'HILLARYCLINTON',
 'THESYSTEM']
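To make the comprehension less magical, here it is next to the equivalent explicit loop, plus a variant with a filtering `if`. This is a small sketch on a shortened `names` list; `upper_loop`, `upper_comp` and `only_system` are illustrative names.

```python
# Shortened version of the cleaned names list, so the block runs on its own.
names = ['thesystem', 'hillaryclinton']

# Explicit loop: build the list one element at a time.
upper_loop = []
for j in names:
    upper_loop.append(j.upper())

# Comprehension: the same result in a single expression.
upper_comp = [j.upper() for j in names]

# A comprehension can also filter items with a trailing if.
only_system = [j for j in names if 'sys' in j]

print(upper_loop == upper_comp, only_system)
```

Note that a comprehension returns a new list and leaves `names` itself unchanged.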

In [38]:
cleanValues = []
for i in data.values():
    cleanValues.append([j.lower() for j in i])

In [39]:
cleanValues

[['highly unethical', 'wont survive'],
 ['kicked out of the abc news debate like a dog'],
 ['rigged', 'rigged'],
 ['crooked'],
 ['failing'],
 ['crooked', 'crooked'],
 ['rigged', 'rigged', 'very very unfair!'],
 ['crooked', 'not a talented person or politician', 'crooked'],
 ['allowed crooked hillary to get away with murder']]
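The loop above mixes a for loop with a list comprehension. As a sketch, the same cleaning step can be written as one nested comprehension; the two-key `data` sample and the name `clean_values` are assumptions made so the block runs on its own.

```python
# Two entries from the tutorial's dictionary.
data = {
    'The. System.': ['Rigged', 'rigged', 'Very very unfair!'],
    'Hillary clinton': ['Crooked'],
}

# Outer comprehension walks over each list of insults (i);
# inner comprehension lower-cases each insult (j) in that list.
clean_values = [[j.lower() for j in i] for i in data.values()]

print(clean_values)
```

Whether the nested form is more readable than the loop is a matter of taste; once the nesting goes deeper than two levels, a plain loop is usually easier to follow.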

Now let's store this as a list of dictionaries -- note that at this point we cannot create a single dictionary since the values for each unique key have not been combined. 

In [40]:
dictList = []
for i, j in zip(names, cleanValues):
    print(i,j)
    dictList.append({i:j})

newhampshireunionleader ['highly unethical', 'wont survive']
newhampshireunionleader ['kicked out of the abc news debate like a dog']
thesystem ['rigged', 'rigged']
hillaryclinton ['crooked']
newhampshireunionleader ['failing']
hillaryclinton ['crooked', 'crooked']
thesystem ['rigged', 'rigged', 'very very unfair!']
hillaryclinton ['crooked', 'not a talented person or politician', 'crooked']
thesystem ['allowed crooked hillary to get away with murder']


In [41]:
dictList

[{'newhampshireunionleader': ['highly unethical', 'wont survive']},
 {'newhampshireunionleader': ['kicked out of the abc news debate like a dog']},
 {'thesystem': ['rigged', 'rigged']},
 {'hillaryclinton': ['crooked']},
 {'newhampshireunionleader': ['failing']},
 {'hillaryclinton': ['crooked', 'crooked']},
 {'thesystem': ['rigged', 'rigged', 'very very unfair!']},
 {'hillaryclinton': ['crooked',
   'not a talented person or politician',
   'crooked']},
 {'thesystem': ['allowed crooked hillary to get away with murder']}]

As an aside, the loop above can also be written as a comprehension. Comprehensions exist for lists and dictionaries and are more compact and sometimes more readable. 

In [43]:
[{i:j} for i,j in zip(names, cleanValues)] # Produces the same list of dictionaries as the above for loop -- but in one line.

[{'newhampshireunionleader': ['highly unethical', 'wont survive']},
 {'newhampshireunionleader': ['kicked out of the abc news debate like a dog']},
 {'thesystem': ['rigged', 'rigged']},
 {'hillaryclinton': ['crooked']},
 {'newhampshireunionleader': ['failing']},
 {'hillaryclinton': ['crooked', 'crooked']},
 {'thesystem': ['rigged', 'rigged', 'very very unfair!']},
 {'hillaryclinton': ['crooked',
   'not a talented person or politician',
   'crooked']},
 {'thesystem': ['allowed crooked hillary to get away with murder']}]
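For completeness, here is a sketch of a dictionary comprehension on a shortened sample. It also illustrates why we could not build a single dictionary directly at this stage: when a key repeats, the later value silently replaces the earlier one. The names `clean_values` and `collapsed` are illustrative.

```python
# Shortened sample with a repeated key.
names = ['thesystem', 'hillaryclinton', 'thesystem']
clean_values = [['rigged', 'rigged'], ['crooked'], ['very very unfair!']]

# Dictionary comprehension: {key_expression: value_expression for ...}
collapsed = {i: j for i, j in zip(names, clean_values)}

# The second 'thesystem' entry overwrote the first, so data was lost.
print(collapsed)
```

Here `['rigged', 'rigged']` is gone from the result, which is why the values must be combined per key instead of simply assigned.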

Finally, we want to combine this list of dictionaries into one dictionary, where there are unique keys and combined values.

In [44]:
finalDict = {}
for d in dictList:
    for key, value in d.items():
        finalDict.setdefault(key, []).extend(value)

In [45]:
finalDict

{'hillaryclinton': ['crooked',
  'crooked',
  'crooked',
  'crooked',
  'not a talented person or politician',
  'crooked'],
 'newhampshireunionleader': ['highly unethical',
  'wont survive',
  'kicked out of the abc news debate like a dog',
  'failing'],
 'thesystem': ['rigged',
  'rigged',
  'rigged',
  'rigged',
  'very very unfair!',
  'allowed crooked hillary to get away with murder']}
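In the merge loop, `setdefault(key, [])` returns the list already stored under `key`, or stores and returns a new empty list if the key is missing; `.extend(value)` then appends the insults to that list. An alternative sketch uses `collections.defaultdict` from the standard library, which creates the empty list automatically. The names `dict_list`, `merged` and `final` are illustrative.

```python
from collections import defaultdict

# Shortened version of the tutorial's list of single-key dictionaries.
dict_list = [
    {'thesystem': ['rigged', 'rigged']},
    {'hillaryclinton': ['crooked']},
    {'thesystem': ['very very unfair!']},
]

# defaultdict(list) supplies an empty list the first time a key is accessed.
merged = defaultdict(list)
for d in dict_list:
    for key, value in d.items():
        merged[key].extend(value)

final = dict(merged)  # convert back to a plain dict
print(final)
```

Both approaches give the same result; `setdefault` keeps everything in a plain dictionary, while `defaultdict` keeps the loop body slightly shorter.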

In [46]:
finalDict.keys()

dict_keys(['thesystem', 'newhampshireunionleader', 'hillaryclinton'])

In [47]:
finalDict.values()

dict_values([['rigged', 'rigged', 'rigged', 'rigged', 'very very unfair!', 'allowed crooked hillary to get away with murder'], ['highly unethical', 'wont survive', 'kicked out of the abc news debate like a dog', 'failing'], ['crooked', 'crooked', 'crooked', 'crooked', 'not a talented person or politician', 'crooked']])

So, with our cleaned data, we can start to analyze it. 

In [48]:
finalDict['hillaryclinton']

['crooked',
 'crooked',
 'crooked',
 'crooked',
 'not a talented person or politician',
 'crooked']

In [49]:
finalDict['thesystem']

['rigged',
 'rigged',
 'rigged',
 'rigged',
 'very very unfair!',
 'allowed crooked hillary to get away with murder']

How many times do we see 'rigged'?

In [50]:
finalDict['thesystem'].count('rigged')

4
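To count every insult at once instead of one at a time, one option is `collections.Counter` from the standard library. This is a sketch on the cleaned "the system" values shown above; the names `insults` and `counts` are illustrative.

```python
from collections import Counter

# The cleaned values for 'thesystem', repeated so the block runs on its own.
insults = [
    'rigged',
    'rigged',
    'rigged',
    'rigged',
    'very very unfair!',
    'allowed crooked hillary to get away with murder',
]

# Counter maps each distinct item to the number of times it appears.
counts = Counter(insults)

print(counts['rigged'])       # 4
print(counts.most_common(1))  # [('rigged', 4)]
```

`most_common(n)` returns the `n` most frequent items with their counts, which is a quick way to see the favorite insult for each target.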