# Keeping Secrets

## Part 1: Code Style

Here's the final version of the file that was projected in class today:

In [None]:
import requests
import twitter
import wikipedia


# oliver_twist_full_text = requests.get('http://www.gutenberg.org/ebooks/730.txt.utf-8').text
# print(oliver_twist_full_text[200:])
#
# summary = wikipedia.summary("Olin College")
# summary

# sorted(set(summary.split()))

def read_key(name):
    return open(name + '.txt').read().strip()

f = open('consumer_key')
CONSUMER_KEY = f.read().strip()

CONSUMER_SECRET = open('consumer_secret.txt').read().strip()
ACCESS_TOKEN_SECRET = open('access_token_secret.txt').read().strip()

ACCESS_TOKEN_KEY = '13835862-rTQP5Ur1v2KfTjQ6WTaw6lWzPnBd06C78O4EeWBmk'

api = twitter.Api(consumer_key=CONSUMER_KEY,
                  consumer_secret=CONSUMER_SECRET,
                  access_token_key=ACCESS_TOKEN_KEY,
                  access_token_secret=ACCESS_TOKEN_SECRET)
print(api.GetUserTimeline(screen_name='gvanrossum'))

I mentioned that it was sloppy, but didn't say why. It was because these lines:

    f = open('consumer_key')
    CONSUMER_KEY = f.read().strip()

    CONSUMER_SECRET = open('consumer_secret.txt').read().strip()
    ACCESS_TOKEN_SECRET = open('access_token_secret.txt').read().strip()

are doing the same thing (initializing a variable from a text file) in two different ways:

    f = open('consumer_key')
    CONSUMER_KEY = f.read().strip()
    
(used once) and:

    CONSUMER_SECRET = open('consumer_secret.txt').read().strip()
    
(used twice, once for `CONSUMER_SECRET` and once for `ACCESS_TOKEN_SECRET`).

This makes it look like two different things are going on, whereas it's really the same thing three times.

Here's one fix that makes them obviously parallel:

In [None]:
CONSUMER_KEY = open('consumer_key.txt').read().strip()
CONSUMER_SECRET = open('consumer_secret.txt').read().strip()
ACCESS_TOKEN_SECRET = open('access_token_secret.txt').read().strip()

It could also have been turned into:

    f1 = open('consumer_key.txt')
    CONSUMER_KEY = f1.read().strip()

    f2 = open('consumer_secret.txt')
    CONSUMER_SECRET = f2.read().strip()

    f3 = open('access_token_secret.txt')
    ACCESS_TOKEN_SECRET = f3.read().strip()

or

    f = open('consumer_key.txt')
    CONSUMER_KEY = f.read().strip()

    f = open('consumer_secret.txt')
    CONSUMER_SECRET = f.read().strip()

    f = open('access_token_secret.txt')
    ACCESS_TOKEN_SECRET = f.read().strip()

Note that `f` in the second example takes on three different values at three different times. The *values* created by `open('consumer_key.txt')` and `open('consumer_secret.txt')` don't have anything more to do with each other in the second example than they do in the first.

(By the way, `CONSUMER_KEY` doesn't need to be kept secret. I just goofed in class and secret-ized one of the public keys first. I'm going to pretend for the rest of this notebook that `CONSUMER_KEY`, which doesn't need to be kept secret, is one of the secrets along with `CONSUMER_SECRET` and `ACCESS_TOKEN_SECRET`, which do.)

### Factoring, and DRY

The versions above all violate a principle of software engineering, [Don't Repeat Yourself](https://en.wikipedia.org/wiki/Don't_repeat_yourself) (“DRY”). The `open(...).read().strip()` pattern is repeated three times.

It's easy to introduce a minor difference into one of these repetitions, and it's inconvenient to change the pattern, since you have to make the same change in three different places. (Atom's [Multiple Cursors](http://flight-manual.atom.io/using-atom/sections/editing-and-deleting-text/#multiple-cursors-and-selections) feature mitigates this; see also [these tips](https://www.sitepoint.com/12-favorite-atom-tips-and-shortcuts-to-improve-your-workflow/#multiplecursors).)

The solution is to [factor](https://en.wikipedia.org/wiki/Code_refactoring) the common code into a function. In this case, the three lines above become the five non-blank lines below:

In [None]:
def read_key(name):
    return open(name).read().strip()

CONSUMER_KEY = read_key('consumer_key.txt')
CONSUMER_SECRET = read_key('consumer_secret.txt')
ACCESS_TOKEN_SECRET = read_key('access_token_secret.txt')

(Factoring in code is the same idea as factoring in algebra. $2x + 2y$ has a common element (factor) $2$, which can be pulled out from the $x$ and $y$ that it applies to: $2x + 2y = 2(x + y)$.)

We could also factor the `.txt` out of the three calls to `read_key`:

In [None]:
def read_key(name):
    return open(name + '.txt').read().strip()

CONSUMER_KEY = read_key('consumer_key')
CONSUMER_SECRET = read_key('consumer_secret')
ACCESS_TOKEN_SECRET = read_key('access_token_secret')

And we could have used the longer form, with an explicit variable to hold the open file, without having to repeat *that* in three places:

In [None]:
def read_key(name):
    f = open(name + '.txt')
    return f.read().strip()

CONSUMER_KEY = read_key('consumer_key')
CONSUMER_SECRET = read_key('consumer_secret')
ACCESS_TOKEN_SECRET = read_key('access_token_secret')

Whether to use DRY or WET (look it up!) code in this case is a judgement call. On the one hand, it's nice to avoid repetition, for the reasons stated above. On the other hand, reading the block above requires chasing more values around from line to line. With so few lines so close together, the fact that the original is WET isn't much of a problem.

## Part 2: Cleaning up: file.close

As mentioned in class, the code above leaves the three files open. This uses up operating system resources, and on some systems (Windows, for example) it prevents the files from being deleted. We could close them when we're done:

In [None]:
def read_key(name):
    f = open(name + '.txt')
    key = f.read().strip()
    f.close()
    return key

CONSUMER_KEY = read_key('consumer_key')
CONSUMER_SECRET = read_key('consumer_secret')
ACCESS_TOKEN_SECRET = read_key('access_token_secret')

Now the work put into DRYing the code pays off. It would be painful to repeat the `open` / `read` / `close` pattern three times:

    f1 = open('consumer_key.txt')
    CONSUMER_KEY = f1.read().strip()
    f1.close()

    f2 = open('consumer_secret.txt')
    CONSUMER_SECRET = f2.read().strip()
    f2.close()

    f3 = open('access_token_secret.txt')
    ACCESS_TOKEN_SECRET = f3.read().strip()
    f3.close()

## Part 3: Moving to a single file

It was suggested in class that the keys could share a single file. For example, the first line could be the `CONSUMER_KEY`, the second the `CONSUMER_SECRET`, and the third the `ACCESS_TOKEN_SECRET`.

Given a file `secrets.txt` that contains:

    cSEN7ExG7qQYBpJf4n14egJKmxPX5NxceYAbZyTkR9SH9
    3icp5X0PiDLm6KJXC1rLuUtgE
    8YquG1daLXmErnz3NXxW54nWJtZwi8AYreUJzFceevi16Kot7l

[I can put these in a public notebook now that I've regenerated the tokens.]

In [None]:
keys = open('secrets.txt').readlines()
CONSUMER_KEY = keys[0].strip()
CONSUMER_SECRET = keys[1].strip()
ACCESS_TOKEN_SECRET = keys[2].strip()

Another approach is to create a Python file `secrets.py`:
    
    CONSUMER_KEY = 'cSEN7ExG7qQYBpJf4n14egJKmxPX5NxceYAbZyTkR9SH9'
    CONSUMER_SECRET = '3icp5X0PiDLm6KJXC1rLuUtgE'
    ACCESS_TOKEN_SECRET = '8YquG1daLXmErnz3NXxW54nWJtZwi8AYreUJzFceevi16Kot7l'
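
The notes above don't show how the main script would pick these values up. One way, sketched here under a couple of assumptions, is to import them. The module name `twitter_secrets` is hypothetical (I chose it because a file named `secrets.py` can shadow the standard library's own `secrets` module), and the values are placeholders written to a scratch directory so that the example runs on its own.

```python
import importlib
import os
import sys
import tempfile

# Assumption: a hypothetical module name that doesn't collide with the
# standard library's `secrets` module (Python 3.6+).
MODULE_NAME = 'twitter_secrets'

# Demo only: write the module to a scratch directory with placeholder values.
# In a real project the file would simply sit next to your script.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, MODULE_NAME + '.py'), 'w') as f:
    f.write("CONSUMER_KEY = 'demo-consumer-key'\n")
    f.write("CONSUMER_SECRET = 'demo-consumer-secret'\n")
    f.write("ACCESS_TOKEN_SECRET = 'demo-access-token-secret'\n")

sys.path.insert(0, demo_dir)  # make the scratch directory importable
twitter_secrets = importlib.import_module(MODULE_NAME)

CONSUMER_KEY = twitter_secrets.CONSUMER_KEY
CONSUMER_SECRET = twitter_secrets.CONSUMER_SECRET
ACCESS_TOKEN_SECRET = twitter_secrets.ACCESS_TOKEN_SECRET
```

When the file lives next to your script, all of this should reduce to a single line: `from twitter_secrets import CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN_SECRET`.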

Now I have to remember not to add `secrets.txt` or `secrets.py` to my repository. This will be the subject of a future post. (Hint: `gitignore`.)

Finally, I could create a more structured text file, and parse it:

`secrets.txt`:

    CONSUMER_KEY: cSEN7ExG7qQYBpJf4n14egJKmxPX5NxceYAbZyTkR9SH9
    CONSUMER_SECRET: 3icp5X0PiDLm6KJXC1rLuUtgE
    ACCESS_TOKEN_SECRET: 8YquG1daLXmErnz3NXxW54nWJtZwi8AYreUJzFceevi16Kot7l
    

In [None]:
def read_key(name):
    f = open('secrets.txt')
    for line in f.readlines():
        if line.startswith(name):
            return line.split(':', 1)[-1].strip()
    # FIXME: this should raise an exception, but I don't think we've covered that yet

CONSUMER_KEY = read_key('CONSUMER_KEY')
CONSUMER_SECRET = read_key('CONSUMER_SECRET')
ACCESS_TOKEN_SECRET = read_key('ACCESS_TOKEN_SECRET')
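
For the curious, here is a rough sketch of what that FIXME might turn into. It is one possibility, not the only one: I've assumed `KeyError` is a reasonable exception to raise, added a hypothetical `path` parameter, and used placeholder values in a scratch file so the example runs by itself.

```python
import os
import tempfile

# Demo only: a scratch secrets file with placeholder values.
demo_path = os.path.join(tempfile.mkdtemp(), 'secrets.txt')
f = open(demo_path, 'w')
f.write('CONSUMER_KEY: demo-key\n')
f.write('CONSUMER_SECRET: demo-secret\n')
f.close()

def read_key(name, path=demo_path):  # `path` is a hypothetical extra parameter
    f = open(path)
    lines = f.readlines()
    f.close()
    for line in lines:
        if line.startswith(name + ':'):
            return line.split(':', 1)[1].strip()
    # No line matched: report the problem instead of silently returning None.
    raise KeyError(name + ' not found in ' + path)

CONSUMER_KEY = read_key('CONSUMER_KEY')

# Asking for a key that isn't in the file now fails loudly.
try:
    read_key('ACCESS_TOKEN_SECRET')
except KeyError as error:
    missing_message = str(error)
```

Without the `raise`, a missing key would come back as `None`, and the mistake would only surface later, somewhere inside the Twitter library.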

## Advanced Material

### More on File Close

Consider the code from above:

    def read_key(name):
        f = open(name + '.txt')
        key = f.read().strip()
        f.close()
        return key

What happens if there's an error during the execution of the `key = f.read().strip()` line?

Then the file will never be closed. (`f.close()` won't be executed.)

Encountering an error in a \*.py file terminates the program anyway, and a program's open files are all closed when it is terminated, so this seems like it might not matter.

However, there are techniques for catching and recovering from an error. If code that calls `read_key` recovers from an error in `read_key`, it will leave the file open.

Here are a couple of techniques for ensuring that the file is closed:

In [None]:
def read_key(name):
    f = open(name + '.txt')
    try:
        key = f.read().strip()
    finally:
        f.close()
    return key

In [None]:
def read_key(name):
    with open(name + '.txt') as f:
        key = f.read().strip()
        return key

`try...finally` evaluates the code in the `try` block, and then *whether or not an error occurred* it executes the `finally` block. Then, if there *had been* an error, code *after* the `finally` block is skipped and the error is thrown to the caller, and the caller's caller, etc. until it reaches someone who handles the error (or until the whole program is exited).

`with expr as f` is the same as `f = expr`, except that something like `f.close()` is automatically called when the block is exited, whether by `return`, an exception, or just running out of lines of code in the block.

These two implementations could be further abbreviated:

In [None]:
def read_key(name):
    f = open(name + '.txt')
    try:
        return f.read().strip()
    finally:
        f.close()

In [None]:
def read_key(name):
    with open(name + '.txt') as f:
        return f.read().strip()
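
To see the cleanup described in this section actually happen, here is a small experiment. It is a sketch, not part of the class code: the `opened` list and the `read_key_badly` function are hypothetical helpers that exist only so we can inspect the file objects afterwards and simulate an error, and the key file is a placeholder in a scratch directory.

```python
import os
import tempfile

# Demo only: a placeholder key file in a scratch directory.
demo_name = os.path.join(tempfile.mkdtemp(), 'consumer_key')
with open(demo_name + '.txt', 'w') as out:
    out.write('demo-key\n')

opened = []  # hypothetical bookkeeping: remembers each file object we open

def read_key(name):
    with open(name + '.txt') as f:
        opened.append(f)
        return f.read().strip()

def read_key_badly(name):
    f = open(name + '.txt')
    opened.append(f)
    try:
        raise ValueError('simulated error while reading')
    finally:
        f.close()  # runs even though the try block raised

key = read_key(demo_name)

try:
    read_key_badly(demo_name)
except ValueError:
    recovered = True  # the caller handles the error and carries on

# Both files are closed: one via `with` + `return`, one via `finally` + error.
closed_flags = [handle.closed for handle in opened]
print(key, closed_flags)
```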

### Configuration files

Having a text file that contains a set of values is something that is done *all the time*. A couple of standard configuration formats are [YAML](http://yaml.org) and INI. (Some others are [JSON](http://www.json.org) and [CSON](https://github.com/bevry/cson).)
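
As a quick illustration of the idea using one of these formats, here is a sketch that uses JSON, since Python's standard library includes a `json` module. The file name `secrets.json`, the `path` parameter, and the placeholder values are all assumptions made for this example.

```python
import json
import os
import tempfile

# Demo only: write a scratch secrets.json with placeholder values.
demo_path = os.path.join(tempfile.mkdtemp(), 'secrets.json')
with open(demo_path, 'w') as f:
    json.dump({
        'CONSUMER_KEY': 'demo-key',
        'CONSUMER_SECRET': 'demo-secret',
        'ACCESS_TOKEN_SECRET': 'demo-token-secret',
    }, f, indent=2)

def read_key(name, path=demo_path):  # `path` is a hypothetical extra parameter
    with open(path) as f:
        secrets = json.load(f)  # parses the whole file into a dict
    return secrets[name]

CONSUMER_KEY = read_key('CONSUMER_KEY')
CONSUMER_SECRET = read_key('CONSUMER_SECRET')
ACCESS_TOKEN_SECRET = read_key('ACCESS_TOKEN_SECRET')
```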

#### Parsing INI files

(Sometimes these are called just "configuration" files, and end in `.cfg`.)

```
# secrets.ini
[twitter]
CONSUMER_KEY = cSEN7ExG7qQYBpJf4n14egJKmxPX5NxceYAbZyTkR9SH9
CONSUMER_SECRET = 3icp5X0PiDLm6KJXC1rLuUtgE
ACCESS_TOKEN_SECRET = 8YquG1daLXmErnz3NXxW54nWJtZwi8AYreUJzFceevi16Kot7l
```

Using the [configparser module](https://docs.python.org/3/library/configparser.html):

In [None]:
import configparser

config = configparser.ConfigParser()
config.read('secrets.ini')

def read_key(name):
    return config.get('twitter', name)
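
If you want to try this without creating a file first, the following sketch feeds the parser a string instead. The `DEMO_INI` text and its values are placeholders; `read_string` is the standard `configparser` method for parsing from a string.

```python
import configparser

# Placeholder contents standing in for a real secrets.ini file.
DEMO_INI = """
[twitter]
CONSUMER_KEY = demo-key
CONSUMER_SECRET = demo-secret
ACCESS_TOKEN_SECRET = demo-token-secret
"""

config = configparser.ConfigParser()
config.read_string(DEMO_INI)

def read_key(name):
    return config.get('twitter', name)

CONSUMER_KEY = read_key('CONSUMER_KEY')
CONSUMER_SECRET = read_key('CONSUMER_SECRET')
ACCESS_TOKEN_SECRET = read_key('ACCESS_TOKEN_SECRET')

# config.read() returns the list of files it managed to read;
# a file that doesn't exist is skipped without an error.
files_read = configparser.ConfigParser().read('no_such_file_for_this_demo.ini')
```

One thing to watch for: because `config.read` silently skips files it cannot find, a misspelled file name only shows up later, as a `NoSectionError` when you ask for a key.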

#### YAML

In [None]:
import yaml  # requires `sudo pip3 install PyYAML`

def read_key(name):
    with open('secrets.txt') as f:
        secrets = yaml.safe_load(f)
    return secrets[name]

#### DIY

Returning to the do-it-yourself version from the earlier section, I might use more advanced techniques to rewrite this function:

    def read_key(name):
        f = open('secrets.txt')
        for line in f.readlines():
            if line.startswith(name):
                return line.split(':', 1)[-1].strip()
        # FIXME: this should raise an exception, but I don't think we've covered that yet

thus:

In [None]:
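def read_key(name):
    with open('secrets.txt') as f:
        # beware: each value in `secrets` still has whitespace at each end
        # (lines without a ':' are skipped, so blank lines don't break dict())
        secrets = dict(line.split(':', 1) for line in f.readlines() if ':' in line)
    return secrets[name].strip()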

In [None]:
import re

def read_key(name):
    with open('secrets.txt') as f:
        # in this version, the values have no leading whitespace or trailing newline
        secrets = dict(re.findall(r'(.+?):\s*(.+)', f.read()))
    return secrets[name]

## Environment variables

Finally, instead of reading the values from a *file*, you can read them from an *environment variable*.

Without going into detail on what that *means* (there's more [here](http://hackingthelibrary.org/assignments/lab-3/#an-introduction-to-environment-variables) if you want it), here's how to *do* this:

In the terminal, execute the following to *set* the environment variable:

    $ export CONSUMER_KEY='cSEN7ExG7qQYBpJf4n14egJKmxPX5NxceYAbZyTkR9SH9'

In your Python file:

    import os
    
    CONSUMER_KEY = os.environ['CONSUMER_KEY']
 
Repeat for `CONSUMER_SECRET` and `ACCESS_TOKEN_SECRET`.

Now run your Python program in the same terminal window where you executed the `export`:

    $ python text_mining.py

Having to type the `export` line (and having to remember where you put the value of the key) each time you open a new terminal window is a bother. Do this instead:

Find out which shell you're running, by executing `printenv SHELL` in a terminal window.

If you are running `bash` (the default if you followed the Get Set instructions), add the `export CONSUMER_KEY=...` and other `export` lines to the end of the file `.bashrc` in your home directory. (`.bashrc` is hidden from directory listings by default, since its name starts with a `.`. You can open it anyway, via `atom ~/.bashrc`.)

Now all new terminal windows will have these variables set.

Since these variables are set in every terminal, you may want to give them more descriptive names; for example `TWITTER_CONSUMER_KEY` etc. Then you can use the same trick later for Facebook or other services, without the names colliding.
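
Here is a sketch of how the Python side might look with prefixed names. The `service` parameter and the choice of `RuntimeError` are my own assumptions, and the variables are set from inside Python only so that the example runs without touching your shell.

```python
import os

# Demo only: normally these come from `export` lines in your shell.
os.environ['TWITTER_CONSUMER_KEY'] = 'demo-key'
os.environ['TWITTER_CONSUMER_SECRET'] = 'demo-secret'
os.environ['TWITTER_ACCESS_TOKEN_SECRET'] = 'demo-token-secret'

def read_key(name, service='TWITTER'):  # `service` is a hypothetical parameter
    variable = service + '_' + name
    value = os.environ.get(variable)  # None if the variable isn't set
    if value is None:
        raise RuntimeError('Environment variable ' + variable + ' is not set')
    return value

CONSUMER_KEY = read_key('CONSUMER_KEY')
CONSUMER_SECRET = read_key('CONSUMER_SECRET')
ACCESS_TOKEN_SECRET = read_key('ACCESS_TOKEN_SECRET')
```

The explicit check trades the bare `KeyError` you'd get from `os.environ[...]` for a message that names the variable you forgot to export.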

Finally, you can use project-specific environment variables, by setting up your system so that environment variables are automatically set when you `cd` into a directory. See the [direnv](https://direnv.net) and [autoenv](https://github.com/kennethreitz/autoenv) utilities for information on how to do this.