# counting everything

The basic counting idiom in Python is just a formalization of how we count things in regular life: arrange them into a collection of the kind of thing to be counted, then go through them one by one, making a note of each one that meets some criterion.

That's a fancy way of saying that if you want to count words, you need words; if you want to count letters, you need letters. Don't try to count words with a pile of letters; transform them into words first.

Counting many words at once takes advantage of the name-based structure of a dictionary to keep all of the separate counts divided up, but it's fundamentally not that different from counting instances of one word. Notably, you don't loop over the dictionary or digest your text into a new format. As before, we almost certainly just want a list of all the words in order (actually, they don't have to be in order for this, as we'll see later in the course; all that matters is that we have an easy way to go through them all).

Let's review how this works by fleshing out the following function:

In [1]:
def count_word(word, text: str):
    '''
    counts all occurrences of word in text and returns the count as an integer
    for this function, a word means a space-separated string of characters; no need to lowercase or strip punctuation
    '''
    count = 0
    return count
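
Here is one possible way to flesh that out, as a minimal sketch rather than the only right answer. It assumes that calling `split()` with no arguments is an acceptable reading of "space-separated" (it breaks on any run of whitespace, including newlines), and the sample sentence is made up for illustration.

```python
def count_word(word: str, text: str) -> int:
    '''
    counts all occurrences of word in text and returns the count as an integer
    '''
    count = 0
    # assumption: split() with no arguments is close enough to "space-separated";
    # it breaks the text on any run of whitespace
    for token in text.split():
        # make a note of each token that meets our criterion
        if token == word:
            count += 1
    return count


sample = 'the cat saw the dog'  # made-up sample text
print(count_word('the', sample))  # prints 2
```

Because we aren't lowercasing or stripping punctuation, `'The'` and `'the.'` would each count as different words from `'the'`.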


With a dictionary, we do something pretty similar, except that for each word, we have to make sure it's in the dictionary, then add to our count. There are actually two ways to do this, depending on how we handle the case where we have a new word and it's not in the dictionary.

In [2]:
def count_all_words(text: str):
    '''
    counts all of the words in text and returns a dict of counts
    '''
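
As a sketch of what the two approaches might look like (these are two common options, and may not be exactly the pair the text has in mind): one checks whether the word is already a key before adding to it, the other uses `dict.get()` with a default of 0. The function names `count_all_words_check` and `count_all_words_get` are hypothetical, chosen only to keep the two versions apart, and the sample sentence is made up.

```python
def count_all_words_check(text: str) -> dict:
    '''
    counts words by first making sure each word is in the dictionary
    '''
    counts = {}
    for word in text.split():
        # a new word needs a starting count before we can add to it
        if word not in counts:
            counts[word] = 0
        counts[word] += 1
    return counts


def count_all_words_get(text: str) -> dict:
    '''
    counts words by asking for the current count with a default of 0
    '''
    counts = {}
    for word in text.split():
        # get() returns 0 when the word isn't a key yet
        counts[word] = counts.get(word, 0) + 1
    return counts


sample = 'the cat saw the dog'  # made-up sample text
print(count_all_words_check(sample))
print(count_all_words_get(sample))
```

Both versions loop over the words, never over the dictionary, and both should return the same counts for the same text.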


Our return value here is a bit less transparent than in the case of `count_word()`. We could further process the dictionary, but if we want to print it out, we need to know a few dictionary methods.

In [13]:
d = {'fish' : 14, 'whale' : 6, 'shark' : 3, 'shrimp' : 4}
print(d)

for word in d:
    print(word)

{'fish': 14, 'whale': 6, 'shark': 3, 'shrimp': 4}
fish
whale
shark
shrimp


If we loop over a dictionary, we just get the keys; it's the equivalent of looping over `d.keys()`. That might suggest we try looping over `d.values()`, but that won't print the keys, which is probably even less helpful. Of course, you could loop over the keys and, for each of them, `print(f'{key} : {d[key]}')`, but there's an easier way. `d.items()` returns a list-like collection of what Python calls tuples. A **tuple** (a double, triple, etc.) is kind of like a list, except you can't change it in any way. It's a useful way to package together information that doesn't make much sense without its other parts, or to show that the individual pieces won't be updated in any way.

Tuples are especially useful in Python because of what's called unpacking, where Python automatically assigns the items of a collection to separate names. Consider this:

In [14]:
word, count = ['fish', 14]
print(word)
print(count)

fish
14


It works with tuples exactly as it does with lists, which brings us around to how to actually make a tuple. Tuples come in parentheses, with items separated by commas, a lot like lists. In actual code, there's not much point in making them by hand (think about making lists: how often have you made a list that's not an empty list? And why would an empty tuple be less useful than an empty list?) but if you want, you can do it!

In [15]:
my_tuple = ('word', 'count')
print(my_tuple)

('word', 'count')
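
To make the two claims about tuples concrete, here is a small sketch: unpacking a tuple works the same way as unpacking a list, and trying to change a tuple raises an error. The variable names `pair` and `as_list` are just for illustration.

```python
pair = ('fish', 14)

# unpacking works on a tuple exactly like it does on a list
word, count = pair
print(word)
print(count)

# trying to change an item in a tuple raises a TypeError
try:
    pair[1] = 15
except TypeError as error:
    print('cannot change a tuple:', error)

# if you really need to change something, make a list copy first
as_list = list(pair)
as_list[1] = 15
print(as_list)
```

The original tuple is untouched by all of this; only the list copy changes.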


Now, for the dictionary case:

In [16]:
for item in d.items():
    print(item)

('fish', 14)
('whale', 6)
('shark', 3)
('shrimp', 4)


These are tuples, and this isn't a terrible way to print out the dictionary, but we can use unpacking to do a bit better:

In [19]:
for key, value in d.items():
    print(f'word:{key} -> count:{value}')

word:fish -> count:14
word:whale -> count:6
word:shark -> count:3
word:shrimp -> count:4


When we start writing things like this to a file, rather than printing them to the screen, we can skip this step, because usually what we want for this kind of data is a list of rows, where each row is another list or something like it, and a tuple qualifies. `d.items()` is dressed up a bit, but that's basically what it is, and you can write it directly into a CSV with one column for words and one for counts.

In [20]:
d.items()

dict_items([('fish', 14), ('whale', 6), ('shark', 3), ('shrimp', 4)])
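
Writing the items to a CSV might look something like the sketch below, which uses the standard library's `csv` module. The filename `counts.csv` and the header row are assumptions made for the example; the file is written to the current directory and then read back so we can see what landed in it.

```python
import csv

d = {'fish': 14, 'whale': 6, 'shark': 3, 'shrimp': 4}

# 'counts.csv' is a hypothetical filename; newline='' is what the csv module expects
with open('counts.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['word', 'count'])  # header row, our own choice of labels
    writer.writerows(d.items())         # each (word, count) tuple becomes a row

# read the file back to check the result
with open('counts.csv', newline='', encoding='utf-8') as f:
    rows = list(csv.reader(f))

for row in rows:
    print(row)
```

Note that everything comes back from the file as a string, so the counts are `'14'` rather than `14` until you convert them.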

In [None]:
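def count_file(filename):
    '''reads the file at filename and returns a dict of word counts'''
    with open(filename, 'r', encoding='UTF-8') as f:
        text = f.read()
    return count_all_words(text)

ygb_counts = count_file('hawthorne_young_goodman_brown.txt')
# dict_items can't be sliced, so turn it into a list first
print(list(ygb_counts.items())[:100])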