# Day 8 Reading Journal

Read _Think Python_, [Chapter 9](http://greenteapress.com/thinkpython2/html/thinkpython2010.html) and [Chapter 13](http://www.greenteapress.com/thinkpython2/html/thinkpython2014.html).

## [Chapter 9](http://greenteapress.com/thinkpython2/html/thinkpython2010.html) Case study: word play

Download the word list for this chapter from [http://thinkpython2.com/code/words.txt](http://thinkpython2.com/code/words.txt) and save it in your ReadingJournal folder. You can then run the test code below to verify you've got the word list.

```python
# Quick test: open 'words.txt' in the current folder and print the first two entries ("aa", "aah")
fp = open("words.txt")
for i in range(2):
    word = fp.readline().strip()
    print("Word #{num}:".format(num=i), word)
fp.close()  # Close the file when we're done with it
```

Let's encapsulate this behavior into a helper function to use with the rest of the exercises.

In this function we use the Python `with` statement to open the file for us. We'll learn more about exactly what this does when we talk about Exceptions, but for now it's enough to say that it opens the file for us and closes it automatically at the end of the `with` block.

If you're interested, you can read more about techniques for [reading and writing files](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files).
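To see that `with` really closes the file, here is a small sketch. It writes a throwaway file using Python's `tempfile` module (an assumption made only so the example doesn't depend on `words.txt`) and checks the file object's `closed` attribute inside and after the block.

```python
import os
import tempfile

# Create a throwaway two-line file so this sketch doesn't need words.txt
sample_dir = tempfile.mkdtemp()
sample_path = os.path.join(sample_dir, "sample.txt")
with open(sample_path, "w") as out:
    out.write("aa\naah\n")

with open(sample_path) as fp:
    first_word = fp.readline().strip()
    print("Inside the with block, closed?", fp.closed)  # prints False

# Leaving the block closed the file, even though we never called fp.close()
print("After the with block, closed?", fp.closed)  # prints True
print("First word:", first_word)  # prints aa
```

The same automatic cleanup happens if an error is raised inside the block, which is one reason `with` is preferred over calling `close()` by hand.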

```python
def get_lines(filename):
    """
    Read all lines from `filename` and return a list of strings,
    one per line, with whitespace stripped from the ends.
    """
    lines = []
    with open(filename) as fp:
        for line in fp:
            # Remove whitespace (or do whatever other processing you like)
            processed_line = line.strip()
            lines.append(processed_line)

    return lines

# Test get_lines helper function by reading word list
word_list = get_lines("words.txt")
print(word_list[78835])
```

### Chapter 9.1

Complete the `longer_than` function below and test it by finding long words in the word list.

```python
def longer_than(word, threshold):
    """
    Return True if 'word' has more than 'threshold' characters.

    >>> longer_than("python", 15)
    False
    >>> longer_than("documentation", 6)
    True
    """
    return False  # Placeholder: replace with your implementation
```

```python
# Look for long words by using your helper function in a list comprehension
[word for word in get_lines("words.txt") if longer_than(word, 20)]
```
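The docstrings in these exercises contain doctest examples: each `>>>` line is a call, and the line after it is the expected result. A minimal sketch of checking such examples with the standard-library `doctest` module follows; `count_vowels` is a hypothetical function used only for illustration.

```python
import doctest

def count_vowels(word):
    """
    Hypothetical example function: count the vowels (a, e, i, o, u) in 'word'.

    >>> count_vowels("banana")
    3
    >>> count_vowels("rhythm")
    0
    """
    return sum(1 for letter in word if letter in "aeiou")

# Run only this function's docstring examples; verbose=True reports each check
doctest.run_docstring_examples(count_vowels, globals(), verbose=True)
```

Swapping in `longer_than` or `uses_only` for `count_vowels` checks your own functions the same way; `doctest.testmod()` instead checks every docstring defined in the current module.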

### Chapter 9.4

Write a function named `uses_only` that takes a word and a string of letters, and that returns `True` if the word contains only letters in the allowed string. Can you make a sentence using only the letters "acefhlo"? Other than “Hoe alfalfa?”

_Note_: If you're stuck, you may want to try Exercise 2 first.

```python
def uses_only(word, allowed):
    """
    Return True if 'word' contains only letters in 'allowed' string, False otherwise.

    >>> uses_only("software", "code")
    False
    >>> uses_only("banana", "ban")
    True
    """
    pass  # Placeholder: replace with your implementation
```

### Chapter 9.7

This question is based on a Puzzler that was broadcast on the radio program [Car Talk](http://www.cartalk.com/content/puzzlers):

> Give me a word with three consecutive double letters. I’ll give you a couple of words that almost qualify, but don’t. For example, the word committee, c-o-m-m-i-t-t-e-e. It would be great except for the ‘i’ that sneaks in there. Or Mississippi: M-i-s-s-i-s-s-i-p-p-i. If you could take out those i’s it would work. But there is a word that has three consecutive pairs of letters and to the best of my knowledge this may be the only word. Of course there are probably 500 more but I can only think of one. What is the word? 

Write a program to find it. Try on your own first, but if you get stuck, you can review Allen's solution: [http://thinkpython2.com/code/cartalk1.py](http://thinkpython2.com/code/cartalk1.py).

## [Chapter 13](http://www.greenteapress.com/thinkpython2/html/thinkpython2014.html) Case study: data structure selection

The content in this chapter will be helpful for the next mini project on text mining. The exercises build on each other and get more interesting as you go. You should complete as many as your time and interest allow, and bookmark the rest to revisit as you work on MP3.

 - Sections 13.3 and 13.4 give a good example of some techniques for working with files, processing text, and doing some simple analysis.
 - Section 13.8 and the Markov generation in Exercise 8 can be a lot of fun.
 - Now that you know a wide range of different data structures, Section 13.9 starts to give some guidance for choosing between them.
 - Section 13.10 explains Allen's "4 r's" of debugging.

This chapter is also excellent preparation for the [Word Frequency Analysis Project Toolbox](https://sd19spring.github.io/toolboxes/word-frequency-analysis) if you're looking for a straightforward toolbox to start on.

### Chapter 13.1

Write a function that reads a file and returns its contents as a list of single-word strings.
It should break each line into words, strip whitespace and punctuation from the words, and convert them to lowercase.

Implementation tips:

 - Re-use or modify your `get_lines` function
 - Keep things simple by implementing the transformations step-by-step, possibly using helper functions
 - Consider using the [string methods](https://docs.python.org/3/library/stdtypes.html#string-methods) `strip`, `replace` and `translate`
  ```python
  >>> "banana".replace('a', 'o')
  'bonono'
  ```
 - The `string` module provides a string named `whitespace`, which contains space, tab, newline, etc., and one named `punctuation`, which contains the punctuation characters:
  ```python
  >>> import string
  >>> print(string.punctuation)
  !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
  ```
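As a quick illustration of the `translate` tip: `str.maketrans` with a third argument builds a table that deletes those characters, and the result can be combined with `lower` and `split`. The sample sentence is made up for the example.

```python
import string

# The third argument to str.maketrans lists characters to delete
strip_punct = str.maketrans("", "", string.punctuation)

line = "Hello, World! It's a fine day."
cleaned = line.translate(strip_punct).lower()
print(cleaned)  # hello world its a fine day

# split() with no arguments splits on any run of whitespace
words_in_line = cleaned.split()
print(words_in_line)  # ['hello', 'world', 'its', 'a', 'fine', 'day']
```

Note that deleting apostrophes turns "It's" into "its"; whether that merge is acceptable is a design choice for your word counts.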

### Chapter 13.2  

Go to [Project Gutenberg](http://gutenberg.org) and download your favorite out-of-copyright book in plain text format.

Use your function from the previous exercise to read the book you downloaded and break it into individual words. 
_Note_: for best results, skip over the header information at the beginning of the file.
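One way to skip the header is sketched below. It assumes the header ends with a marker line beginning `*** START OF` (recent Project Gutenberg files use wording like `*** START OF THE PROJECT GUTENBERG EBOOK ...`, but the exact text varies, so check your own download). It takes a list of already-stripped lines, such as the one `get_lines` returns; `skip_gutenberg_header` is a hypothetical helper name.

```python
def skip_gutenberg_header(lines):
    """
    Hypothetical helper: return the lines that come after the start marker.

    Assumes the marker line begins with '*** START OF'. If no such line
    is found, all lines are returned unchanged.
    """
    for i, line in enumerate(lines):
        if line.startswith("*** START OF"):
            return lines[i + 1:]
    return lines

# Made-up sample lines standing in for a downloaded book
sample = [
    "The Project Gutenberg eBook of Example",
    "Title: Example",
    "*** START OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***",
    "Chapter 1",
    "It was a dark and stormy night.",
]
print(skip_gutenberg_header(sample))
# ['Chapter 1', 'It was a dark and stormy night.']
```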

Write a program that can count the total number of words in the book, and the number of times each word is used.
_Hint_: You can use a modified version of your `histogram` function from Reading Journal 7.
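A minimal sketch of the counting step, assuming you already have a list of lowercase words. The short `words` list below is a made-up stand-in for a real book, and this `histogram` is one possible version, not necessarily the one you wrote for Reading Journal 7.

```python
def histogram(words):
    """Map each word to the number of times it appears in 'words'."""
    counts = {}
    for word in words:
        # dict.get returns 0 the first time a word is seen
        counts[word] = counts.get(word, 0) + 1
    return counts

# Made-up sample standing in for the words of a real book
words = ["the", "cat", "sat", "on", "the", "mat"]
hist = histogram(words)
print("Total words:", len(words))     # 6
print("Different words:", len(hist))  # 5
print("Uses of 'the':", hist["the"])  # 2
```

The number of distinct keys, `len(hist)`, is one rough measure of vocabulary size when comparing books, though longer books will naturally tend to have more distinct words.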

For fun, try comparing different books by different authors, written in different eras. Which author uses the most extensive vocabulary? 