# Variables


* Storage containers for data values, like little data gift boxes

![](https://cdn.pixabay.com/photo/2016/09/14/20/48/birthday-1670415_960_720.png)

## Example Word Count Python Code

In [13]:
"""
Example Python code for
calculating word frequency
in a text file
"""

#Import Libraries and Modules

import re
from collections import Counter
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords

# Define Functions

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

# Define Filepaths and Assign Variables

filepath_of_text = "../texts/literature/The-Yellow-Wallpaper.txt"
nltk_stop_words = stopwords.words("english")
number_of_desired_words = 40

# Read in File

full_text = open(filepath_of_text).read()

# Manipulate and Analyze File

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in nltk_stop_words]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

# Output Results

print(most_frequent_meaningful_words)

[('john', 45), ('one', 33), ('said', 30), ('would', 27), ('get', 24), ('see', 24), ('room', 24), ('pattern', 24), ('paper', 23), ('like', 21), ('little', 20), ('much', 16), ('good', 16), ('think', 16), ('well', 15), ('know', 15), ('go', 15), ('really', 14), ('thing', 14), ('wallpaper', 13), ('night', 13), ('long', 12), ('course', 12), ('things', 12), ('take', 12), ('always', 12), ('could', 12), ('jennie', 12), ('great', 11), ('says', 11), ('feel', 11), ('even', 11), ('used', 11), ('dear', 11), ('time', 11), ('enough', 11), ('away', 11), ('want', 11), ('never', 10), ('must', 10)]


[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/melaniewalsh/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


## Assigning Variables

Variables are one of the fundamental building blocks of Python. A variable is like a tiny container where you store values and data — filenames, words, numbers, collections of words and numbers, etc. You can name variables almost anything you want (more on that below). The variable name will point to a value that you "assign" it. You might think about variable assignment like putting a value "into" the variable, as if the variable is a little box 🎁

You assign variables with an equals sign `=`, which can be slightly confusing. In Python, a single equals sign `=` is the "assignment operator," while a double equals sign `==` is the "real" equals sign that tests whether two values are equal, e.g. `2 * 2 == 4`.
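As a minimal sketch of the difference between the two operators (the names `number_of_gift_boxes`, `is_four`, and `is_five` are made up for illustration and are not part of the word count script):

```python
# A single equals sign assigns: the name on the left now points to the value on the right
number_of_gift_boxes = 2 * 2

# A double equals sign compares: it asks whether two values are equal
# and answers with True or False
is_four = number_of_gift_boxes == 4
is_five = number_of_gift_boxes == 5

print(number_of_gift_boxes)  # 4
print(is_four)               # True
print(is_five)               # False
```

Note that the comparison never changes what is stored in `number_of_gift_boxes`; only assignment does that.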

Let's look at some of the variables that we used when we counted the most frequent words in Charlotte Perkins Gilman's "The Yellow Wallpaper."

In [73]:
filepath_of_text = "../texts/literature/The-Yellow-Wallpaper.txt"
nltk_stop_words = stopwords.words("english")
number_of_desired_words = 40

full_text = open(filepath_of_text).read()

We made the variables:
- `filepath_of_text` and assigned it `=` the location of our "The Yellow Wallpaper" text file ("../texts/literature/The-Yellow-Wallpaper.txt")
- `nltk_stop_words` and assigned it `=` the stopwords from the `nltk` library
- `number_of_desired_words` and assigned it `=` `40` because we wanted the 40 most frequently occurring words
- `full_text` and assigned it `=` the contents of "The Yellow Wallpaper" text file

## `print()` Vs Jupyter Display

We can check to see what's "inside" these variables by running a cell with the variable's name. This is one of the handiest features of a Jupyter notebook. Outside the Jupyter environment, you would need to run `print(filepath_of_text)` to display the variable.

In [5]:
filepath_of_text

'../texts/literature/The-Yellow-Wallpaper.txt'

In [106]:
nltk_stop_words

['i',
 'me',
 'my',
 'myself',
 'we',
 'our',
 'ours',
 'ourselves',
 'you',
 "you're",
 "you've",
 "you'll",
 "you'd",
 'your',
 'yours',
 'yourself',
 'yourselves',
 'he',
 'him',
 'his',
 'himself',
 'she',
 "she's",
 'her',
 'hers',
 'herself',
 'it',
 "it's",
 'its',
 'itself',
 'they',
 'them',
 'their',
 'theirs',
 'themselves',
 'what',
 'which',
 'who',
 'whom',
 'this',
 'that',
 "that'll",
 'these',
 'those',
 'am',
 'is',
 'are',
 'was',
 'were',
 'be',
 'been',
 'being',
 'have',
 'has',
 'had',
 'having',
 'do',
 'does',
 'did',
 'doing',
 'a',
 'an',
 'the',
 'and',
 'but',
 'if',
 'or',
 'because',
 'as',
 'until',
 'while',
 'of',
 'at',
 'by',
 'for',
 'with',
 'about',
 'against',
 'between',
 'into',
 'through',
 'during',
 'before',
 'after',
 'above',
 'below',
 'to',
 'from',
 'up',
 'down',
 'in',
 'out',
 'on',
 'off',
 'over',
 'under',
 'again',
 'further',
 'then',
 'once',
 'here',
 'there',
 'when',
 'where',
 'why',
 'how',
 'all',
 'any',
 'both',
 'each

In [74]:
number_of_desired_words

40

In [107]:
full_text

'THE YELLOW WALLPAPER\n\nBy Charlotte Perkins Gilman\n\n\n\nIt is very seldom that mere ordinary people like John and myself secure\nancestral halls for the summer.\n\nA colonial mansion, a hereditary estate, I would say a haunted house, and\nreach the height of romantic felicity—but that would be asking too\nmuch of fate!\n\nStill I will proudly declare that there is something queer about it.\n\nElse, why should it be let so cheaply? And why have stood so long\nuntenanted?\n\nJohn laughs at me, of course, but one expects that in marriage.\n\nJohn is practical in the extreme. He has no patience with faith, an\nintense horror of superstition, and he scoffs openly at any talk of things\nnot to be felt and seen and put down in figures.\n\nJohn is a physician, and perhaps—(I would not say it to a living\nsoul, of course, but this is dead paper and a great relief to my\nmind)—perhaps that is one reason I do not get well faster.\n\nYou see, he does not believe I am sick!\n\nAnd what can one 

Your turn! Pick another variable from the script above and see what's inside it below.

In [224]:
#your_chosen_variable

You can run the `print` function inside the Jupyter environment, too, which is sometimes useful because:

- Jupyter will only display the last variable in a cell, but `print()` can display multiple variables
- Jupyter will display text with `\n` characters (which means "new line") but `print()` will display the text appropriately formatted with new lines


In [111]:
filepath_of_text
nltk_stop_words
number_of_desired_words
full_text

'THE YELLOW WALLPAPER\n\nBy Charlotte Perkins Gilman\n\n\n\nIt is very seldom that mere ordinary people like John and myself secure\nancestral halls for the summer.\n\nA colonial mansion, a hereditary estate, I would say a haunted house, and\nreach the height of romantic felicity—but that would be asking too\nmuch of fate!\n\nStill I will proudly declare that there is something queer about it.\n\nElse, why should it be let so cheaply? And why have stood so long\nuntenanted?\n\nJohn laughs at me, of course, but one expects that in marriage.\n\nJohn is practical in the extreme. He has no patience with faith, an\nintense horror of superstition, and he scoffs openly at any talk of things\nnot to be felt and seen and put down in figures.\n\nJohn is a physician, and perhaps—(I would not say it to a living\nsoul, of course, but this is dead paper and a great relief to my\nmind)—perhaps that is one reason I do not get well faster.\n\nYou see, he does not believe I am sick!\n\nAnd what can one 

Only the last variable in the cell above, `full_text`, is displayed, and its text is shown with raw `\n` characters. But if you `print()` each variable...

In [108]:
print(filepath_of_text)
print(nltk_stop_words)
print(number_of_desired_words)
print(full_text)

../texts/literature/The-Yellow-Wallpaper.txt
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'onl

...then each of the variables is displayed, and "The Yellow Wallpaper" is properly formatted with new lines.
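The same contrast can be reproduced outside Jupyter. As a rough sketch, assuming a short made-up string in place of the full story: for a string, the notebook's automatic display looks like what the built-in `repr()` function returns, while `print()` turns each `\n` into an actual line break. The names `sample_text`, `displayed_form`, and `number_of_lines` are illustrative only.

```python
# A short stand-in for full_text (an illustrative value, not the real file contents)
sample_text = "THE YELLOW WALLPAPER\n\nBy Charlotte Perkins Gilman"

# repr() gives the quoted form with visible \n characters,
# similar to what Jupyter shows for the last line of a cell
displayed_form = repr(sample_text)
print(displayed_form)

# print() renders each \n as a real line break
print(sample_text)

# The text contains two \n characters, so it splits into three lines
# (the middle one is empty)
number_of_lines = len(sample_text.split("\n"))
print(number_of_lines)  # 3
```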

## Variable Names

Though we named our variables `filepath_of_text`, `nltk_stop_words`, `number_of_desired_words`, and `full_text`, we could have named them almost anything else.

Variable names can be as long or as short as you want, and they can include:
- upper and lower-case letters (A-Z, a-z)
- digits (0-9), though not as the first character
- underscores (_)

Variable names *cannot* include:
- ❌ other punctuation (-.!?@)
- ❌ spaces ( )
- ❌ a reserved Python word
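If you are unsure whether a name follows these rules, Python can check it for you. The sketch below is one possible way to do that, using the string method `isidentifier()` and the standard library's `keyword` module. The helper `is_usable_variable_name` and the list `candidate_names` are hypothetical names invented for this example.

```python
import keyword

def is_usable_variable_name(name):
    # isidentifier() checks the letters, digits, and underscores rule
    # (and rejects names that start with a digit);
    # iskeyword() checks whether the name is a reserved Python word
    return name.isidentifier() and not keyword.iskeyword(name)

candidate_names = [
    "filepath_of_text",   # letters and underscores: fine
    "filepath2",          # digits are fine after the first character
    "2nd_filepath",       # starts with a digit: not allowed
    "filepath-of-text",   # hyphens are punctuation: not allowed
    "file path",          # spaces: not allowed
    "True",               # reserved word: not allowed
]

for name in candidate_names:
    print(name, is_usable_variable_name(name))
```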

Instead of `filepath_of_text`, we could have simply named the variable `filepath`.

In [50]:
filepath = "../texts/literature/The-Yellow-Wallpaper.txt"

In [13]:
filepath

'../texts/literature/The-Yellow-Wallpaper.txt'

Or we could have gone even simpler and named the filepath `f`.

In [13]:
f = "../texts/literature/The-Yellow-Wallpaper.txt"

In [14]:
f

'../texts/literature/The-Yellow-Wallpaper.txt'

### Striving for Good Variable Names

As you start to code, you will almost certainly be tempted to use extremely short variable names like `f`.

Your fingers will get tired, your coffee will wear off, you will see other people using variables like `f`, and you'll promise yourself that you'll definitely remember what `f` means. But you probably won't.

Thus, you must resist the temptation of bad variable names. Clear and precisely named variables will:

1. Make your code more readable (both to yourself and others)
2. Reinforce your understanding of Python and what's happening in the code
3. Clarify and strengthen your thinking


### Example Python Code ❌ **With Bad Variable Names** ❌

For the sake of illustration, here's our same word count Python code with poorly named variables. The code works exactly the same as our original code and outputs the 40 most frequently occurring words in "The Yellow Wallpaper" — but it's *so much harder to read*!

Imagine if you stumbled across this code for the first time and were trying to figure out how it works. Or imagine that you wrote this code two summers ago and were returning to it to do some updates. You'd have to spend a lot more time deciphering and decoding.

In [15]:
import re
from collections import Counter
from nltk.corpus import stopwords

def sp(t):
    lt = t.lower()
    sw = re.split(r"\W+", lt)
    return sw

f = "../texts/literature/The-Yellow-Wallpaper.txt"
st = stopwords.words("english")

with open(f, encoding="utf-8") as fo:
    ft = fo.read()

words = sp(ft)
words = [w for w in words if w not in st]
words = Counter(words)
words = words.most_common(40)

print(words)

[('john', 45), ('one', 33), ('said', 30), ('would', 27), ('get', 24), ('see', 24), ('room', 24), ('pattern', 24), ('paper', 23), ('like', 21), ('little', 20), ('much', 16), ('good', 16), ('think', 16), ('well', 15), ('know', 15), ('go', 15), ('really', 14), ('thing', 14), ('wallpaper', 13), ('night', 13), ('long', 12), ('course', 12), ('things', 12), ('take', 12), ('always', 12), ('could', 12), ('jennie', 12), ('great', 11), ('says', 11), ('feel', 11), ('even', 11), ('used', 11), ('dear', 11), ('time', 11), ('enough', 11), ('away', 11), ('want', 11), ('never', 10), ('must', 10)]


### Example Python Code ✨ **With Good Variable Names** ✨

In [17]:
import re
from collections import Counter
from nltk.corpus import stopwords

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

filepath_of_text = "../texts/literature/The-Yellow-Wallpaper.txt"
nltk_stop_words = stopwords.words("english")
number_of_desired_words = 40

full_text = open(filepath_of_text).read()

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in nltk_stop_words]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

print(most_frequent_meaningful_words)

[('john', 45), ('one', 33), ('said', 30), ('would', 27), ('get', 24), ('see', 24), ('room', 24), ('pattern', 24), ('paper', 23), ('like', 21), ('little', 20), ('much', 16), ('good', 16), ('think', 16), ('well', 15), ('know', 15), ('go', 15), ('really', 14), ('thing', 14), ('wallpaper', 13), ('night', 13), ('long', 12), ('course', 12), ('things', 12), ('take', 12), ('always', 12), ('could', 12), ('jennie', 12), ('great', 11), ('says', 11), ('feel', 11), ('even', 11), ('used', 11), ('dear', 11), ('time', 11), ('enough', 11), ('away', 11), ('want', 11), ('never', 10), ('must', 10)]


### Off-Limit Names

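The names that are truly off-limits are Python's reserved keywords, such as `True`, `None`, `if`, and `for`. Assigning to one of them raises a `SyntaxError`. Built-in names such as `print` and `list` are not reserved, so Python will let you re-use them as variable names without complaint, but doing so hides the original function or type until you delete your variable or restart the kernel, so avoid them too. It's not something to worry too much about. In a Jupyter notebook, keywords and built-in names usually show up in green, which is a quick hint that the name is already taken.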

In [10]:
True = "../texts/literature/The-Yellow-Wallpaper.txt"

SyntaxError: can't assign to keyword (<ipython-input-10-fbaebf398d20>, line 1)

In [29]:
filepath-of-text = "../texts/literature/The-Yellow-Wallpaper.txt"

SyntaxError: can't assign to operator (<ipython-input-29-d608198cf705>, line 1)
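To tell ahead of time which kind of name you are dealing with, you can ask Python directly. The following is a small sketch, not an official recipe: it uses the standard library's `keyword` and `builtins` modules, and the helper `describe_name` is a hypothetical function written just for this illustration.

```python
import builtins
import keyword

def describe_name(name):
    # Reserved keywords can never be assigned to (SyntaxError)
    if keyword.iskeyword(name):
        return "reserved keyword"
    # Built-in names can be assigned to, but doing so hides the original
    if hasattr(builtins, name):
        return "built-in name"
    return "free to use"

for name in ["True", "print", "list", "filepath_of_text"]:
    print(name, "->", describe_name(name))
```

Only the first category produces an error message. Names in the second category fail silently, which is why they are worth avoiding as variable names.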

## Re-Assigning Variables

Variable assignment does not set a variable in stone. You can later re-assign the same variable a different value.

For instance, I could re-assign `filepath_of_text` to the filepath for the lyrics of Beyoncé's album *Lemonade* instead of Charlotte Perkins Gilman's "The Yellow Wallpaper."

In [225]:
filepath_of_text = "../texts/music/Beyonce-Lemonade.txt"

In [226]:
filepath_of_text

'../texts/music/Beyonce-Lemonade.txt'

If I change this one variable in our example code, then we get the most frequent words for *Lemonade*.

In [23]:
import re
from collections import Counter
from nltk.corpus import stopwords

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

filepath_of_text = "../texts/music/Beyonce-Lemonade.txt"
nltk_stop_words = stopwords.words("english")
number_of_desired_words = 40

full_text = open(filepath_of_text).read()

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in nltk_stop_words]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

print(most_frequent_meaningful_words)

[('love', 93), ('like', 50), ('slay', 49), ('sorry', 44), ('okay', 42), ('oh', 38), ('get', 32), ('daddy', 28), ('let', 28), ('back', 24), ('said', 22), ('work', 21), ('cause', 21), ('ft', 21), ('hold', 20), ('night', 19), ('feel', 19), ('hurt', 19), ('best', 19), ('winner', 19), ('every', 18), ('bout', 18), ('money', 17), ('baby', 16), ('boy', 16), ('long', 16), ('shoot', 16), ('good', 16), ('catch', 16), ('know', 15), ('ooh', 15), ('got', 14), ('come', 14), ('pray', 14), ('way', 13), ('gon', 13), ('kiss', 13), ('rub', 12), ('girl', 12), ('see', 12)]
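Re-assignment can also be tried without any files at all. The sketch below is a stripped-down variation on the word count code, assuming two short made-up strings instead of real texts and leaving out the stopword step. The variable names `text_to_count`, `first_tally`, and `second_tally` are invented for this example, and the function here additionally drops empty strings, which the original `split_into_words` does not do.

```python
import re
from collections import Counter

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    # Drop any empty strings left over at the edges of the split
    return [word for word in split_words if word]

# First assignment: the variable points to one piece of text
text_to_count = "the paper the pattern the paper"
first_tally = Counter(split_into_words(text_to_count)).most_common(1)

# Re-assignment: the same variable name now points to different text,
# and the rest of the code runs unchanged
text_to_count = "love love sorry"
second_tally = Counter(split_into_words(text_to_count)).most_common(1)

print(first_tally)   # [('the', 3)]
print(second_tally)  # [('love', 2)]
```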


### Your Turn

OK, now it's your turn to insert a new file path and calculate a new word frequency! Take a look inside our `/texts` directory and see which texts you can choose from.

(Do you remember how to look inside a directory and see what's there? Go back to [the command line lesson](https://melaniewalsh.github.io/Intro-Cultural-Analytics/Command-Line/The-Command-Line.htm) if you need a refresher. Hint: To look at the contents of a directory *inside* a directory, you can use the `-R` flag, short for "recursive.")

In [None]:
!ls # your code here

In [None]:
!ls -R #your code here

Pick a file from the list above and assign `filepath_of_text` to its corresponding filepath below:

In [None]:
import re
from collections import Counter
from nltk.corpus import stopwords

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

filepath_of_text = #Insert a New Text File Here
nltk_stop_words = stopwords.words("english")
number_of_desired_words = 40

full_text = open(filepath_of_text).read()

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in nltk_stop_words]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

print(most_frequent_meaningful_words)

Now let's change the `number_of_desired_words` variable! Rename the variable, then choose a value other than 40 and see what happens.

In [None]:
import re
from collections import Counter
from nltk.corpus import stopwords

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

filepath_of_text = #Insert a New Text File Here
nltk_stop_words = stopwords.words("english")
#your_new_variable_name = #number

full_text = open(filepath_of_text).read()

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in nltk_stop_words]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(#your_new_variable_name)

print(most_frequent_meaningful_words)

Bonus: how might you put the stopwords back in?

In [None]:
import re
from collections import Counter
from nltk.corpus import stopwords

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

filepath_of_text = #Insert a New Text File Here
nltk_stop_words = stopwords.words("english")
#your_new_variable_name = #number

full_text = open(filepath_of_text).read()

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in nltk_stop_words]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(#your_new_variable_name)

print(most_frequent_meaningful_words)