### Day 16! Today's topic is list comprehensions.

In [17]:
from collections import Counter
import string
import re
import requests

In [4]:
names = 'pybites mike bob julian tim sara guido'.split()

In [5]:
names

['pybites', 'mike', 'bob', 'julian', 'tim', 'sara', 'guido']

In [6]:
for name in names:
    print(name.title()) 

Pybites
Mike
Bob
Julian
Tim
Sara
Guido


---
What if we only want names that begin with certain letters, say A through M? We can build that list of letters quickly using the [string](https://docs.python.org/3/library/string.html) module.

In [9]:
listLetters = list(string.ascii_lowercase)[:13]
listLetters

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm']

In [10]:
new_names = []
for name in names:
    if name[0] in listLetters:
        new_names.append(name.title())
new_names

['Mike', 'Bob', 'Julian', 'Guido']

---
Luckily for us, there is a more Pythonic way to write this: a list comprehension!

In [12]:
new_names2 = [name.title() for name in names if name[0] in listLetters]

In [14]:
new_names2

['Mike', 'Bob', 'Julian', 'Guido']
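
To make the pattern explicit, here is a minimal sketch of the general shape, `[expression for item in iterable if condition]`, using the same names as above. The variable names `first_half`, `titled`, and `short` are only for illustration and are not part of the original notebook.

```python
import string

names = 'pybites mike bob julian tim sara guido'.split()
first_half = string.ascii_lowercase[:13]  # 'abcdefghijklm'; a string supports `in` too

# Transform and filter in one expression
result = [name.title() for name in names if name[0] in first_half]

# The `if` part is optional: transform every item
titled = [name.title() for name in names]

# The expression can be the item itself: filter only
short = [name for name in names if len(name) <= 3]

print(result)  # ['Mike', 'Bob', 'Julian', 'Guido']
print(short)   # ['bob', 'tim']
```

Note that slicing `string.ascii_lowercase` directly works without converting it to a list first, since membership tests on a string check for substrings and a single character is a valid substring.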

---
Let's look at a more real-world example!

In [19]:
txtHarryPotter = requests.get('http://projects.bobbelderbos.com/pcc/harry.txt')
wordsHarryPotter = txtHarryPotter.text.lower().split()
wordsHarryPotter[:5]

['the', 'boy', 'who', 'lived', 'mr.']

In [21]:
cnt = Counter(wordsHarryPotter)
cnt.most_common(5)

[('the', 202), ('he', 136), ('a', 108), ('and', 100), ('to', 93)]

---
Our word list contains stop words and non-alphabetic characters. Let's clean those up!

In [24]:
words = [re.sub(r'\W+', r'', word) for word in wordsHarryPotter]

In [25]:
'-' in words

False
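
To see what the substitution does to individual tokens, here is a small sketch on a few hand-picked sample strings (the samples are made up for illustration, not taken from the downloaded text). `\W+` matches runs of characters that are not letters, digits, or underscores, so punctuation is stripped and a token made only of punctuation becomes an empty string.

```python
import re

# Hypothetical sample tokens, chosen to show the different cases
samples = ['mr.', "didn't", '-', 'drills,']

cleaned = [re.sub(r'\W+', '', word) for word in samples]
print(cleaned)  # ['mr', 'didnt', '', 'drills']

# The lone dash is gone, but an empty string is left in its place
print('-' in cleaned)  # False
print('' in cleaned)   # True
```

Those leftover empty strings are why a later filtering step still needs to drop blank entries.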

In [26]:
resp = requests.get('http://projects.bobbelderbos.com/pcc/stopwords.txt')
stopwords = resp.text.lower().split()
stopwords[:5]

['a', 'about', 'above', 'across', 'after']

---
Now, let's remove all of the stop words from our words list.

In [27]:
words = [word for word in words if word.strip() and word not in stopwords]
words[:5]

['boy', 'lived', 'mr', 'mrs', 'dursley']

In [28]:
'the' in words

False
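
One possible refinement, sketched here on small stand-in lists rather than the downloaded data: `word not in stopwords` scans the whole list for every word, whereas converting the stop words to a `set` first makes each lookup roughly constant time. The name `stopword_set` is an assumption for this example.

```python
# Small stand-in lists; the notebook downloads the real ones
words = ['the', 'boy', '', 'who', 'lived', 'mr']
stopwords = ['a', 'the', 'who']

stopword_set = set(stopwords)  # hypothetical name; built once, reused for every lookup

# `if word` drops empty strings, `not in stopword_set` drops stop words
filtered = [word for word in words if word and word not in stopword_set]
print(filtered)  # ['boy', 'lived', 'mr']
```

For a text this size the difference is negligible, but it tends to matter as the word list grows.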

---
Now let's use Counter to get the most common words!

In [30]:
cnt = Counter(words)
cnt.most_common(5)

[('dursley', 45),
 ('dumbledore', 35),
 ('said', 32),
 ('mr', 30),
 ('professor', 30)]
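
The steps above can also be chained into one small function. This is only a sketch: `top_words` is a hypothetical helper, not something from the original notebook, and it runs on an inline sample string instead of the downloaded text. It uses generator expressions, which share the comprehension syntax but avoid building intermediate lists.

```python
import re
from collections import Counter

def top_words(text, stopwords, n=5):
    """Return the n most common non-stop words in text (hypothetical helper)."""
    stop = set(stopwords)
    # Lowercase, split on whitespace, then strip non-word characters
    cleaned = (re.sub(r'\W+', '', word) for word in text.lower().split())
    # Skip empty strings and stop words while counting
    return Counter(w for w in cleaned if w and w not in stop).most_common(n)

sample = "The cat sat. The cat ran! A dog sat?"
result = top_words(sample, ['the', 'a'], n=2)
print(result)  # [('cat', 2), ('sat', 2)]
```

Words with equal counts come back in the order they were first seen, which is why `cat` is listed before `sat`.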