# Hist 3368 - Week 2 - Counting words

Let's explore two ways of counting.  Each one uses another data type that is not a list.

First, let's look at how you count with a Pandas Series.


### Counting words with pandas' Series data type

Make sure pandas is installed.  The "import" command loads a software package that is already installed on your computer so that we can use it in this notebook.  When we use "import" we can also use "as" to give the package a short nickname, in this case 'pd'.

In [5]:
import pandas as pd

Next, let's use the *song* list that we made above as the basis for a new pandas 'Series'.  Remember that a Series is just another kind of data type, in this case one that's very good at counting.  We'll use the pandas command .Series().  Because it's a pandas command, we have to name pandas (by its nickname, pd) before calling .Series().

Also, note the capital **S** in Series.  Python is case-sensitive, so capitalization matters here: pd.series() with a lowercase s will not work.

In [6]:
seriessong = pd.Series(song)

In [7]:
print(seriessong)

0       row
1       row
2       row
3      your
4      boat
5    gently
6      down
7       the
8    stream
dtype: object


Note that the Series data type has a particular look. It has row numbers going down the left.  At the bottom, "dtype: object" tells you what kind of values the Series holds; pandas uses "object" for general Python values such as the strings here.  Essentially, a pandas Series is a tiny spreadsheet. Our Series, seriessong, is one column wide.
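
To make the "tiny spreadsheet" idea concrete, here is a minimal sketch that pulls the Series apart into its row numbers and its values. It assumes that *song* is the nine-word list printed above; the list is retyped here only so the cell runs on its own.

```python
import pandas as pd

# Assumption: `song` holds the same words as the list built earlier in the notebook.
song = ['row', 'row', 'row', 'your', 'boat', 'gently', 'down', 'the', 'stream']

seriessong = pd.Series(song)

# The row numbers on the left are called the "index" of the Series.
print(list(seriessong.index))

# The words themselves are the values of the Series.
print(list(seriessong.values))

# Like a list, a Series can tell us how many rows it has.
print(len(seriessong))   # 9
```

The index and the values always have the same length: one label for every row.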

If we're ever confused, we can ask Python to tell us what type of data we're looking at.

In [9]:
type(seriessong)

pandas.core.series.Series

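Pandas is really good at counting the values in a Series quickly, using the command .value_counts().  The result lists each distinct word next to its count, with the most frequent word first.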

In [11]:
seriessong.value_counts()

row       3
boat      1
gently    1
the       1
stream    1
down      1
your      1
dtype: int64
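
To see what .value_counts() saves us, here is a rough sketch that counts the same words by hand with a loop and an ordinary dictionary, then compares the result with pandas. This is only an illustration of the idea, not a description of how pandas works internally; the names *tally* and *counts* are made up for this example.

```python
import pandas as pd

# Assumption: `song` holds the same words as the list built earlier in the notebook.
song = ['row', 'row', 'row', 'your', 'boat', 'gently', 'down', 'the', 'stream']

# Counting by hand: walk through the list and keep a running tally for each word.
tally = {}
for word in song:
    tally[word] = tally.get(word, 0) + 1

# Counting with pandas: one line.
counts = pd.Series(song).value_counts()

# Both approaches agree on the count for 'row' ...
print(tally['row'], counts['row'])

# ... and on how many distinct words there are.
print(len(tally) == len(counts))   # True
```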

Because we'll want to use these counts again, this time let's save the results as a new variable, *songcounts*.

What type of data is songcounts? Let's ask.

In [None]:
songcounts = seriessong.value_counts()

In [None]:
type(songcounts)


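We can navigate the Series the same way we do a list: with square brackets.

We can call the first member of songcounts by using square brackets and the number 0, which you will remember is how Python numbers the first item in a list.  Newer versions of pandas warn about this shortcut when the index holds words rather than numbers; writing songcounts.iloc[0] asks for the first position explicitly.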

In [14]:
songcounts[0]

3
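
Depending on your version of pandas, a bare number in square brackets may print a warning here, because the index of songcounts is made of words rather than numbers. As a hedged sketch, these are the two explicit ways of asking: .iloc[] looks up by position and .loc[] looks up by label. The *song* list is retyped so the cell runs on its own.

```python
import pandas as pd

# Assumption: `song` holds the same words as the list built earlier in the notebook.
song = ['row', 'row', 'row', 'your', 'boat', 'gently', 'down', 'the', 'stream']
songcounts = pd.Series(song).value_counts()

# By position: "give me whatever is in the first row."
first_by_position = songcounts.iloc[0]

# By label: "give me the row whose label is 'row'."
row_by_label = songcounts.loc['row']

print(first_by_position, row_by_label)   # 3 3
```

Both lines return 3 here only because 'row' happens to be the most frequent word and so sits in the first position.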

We can use the method .get() to look up a value in the pandas Series if we know the word in question.

In [37]:
songcounts.get('row')

3

In [38]:
songcounts.get('boat')

1

In [39]:
songcounts.get('your')

1
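
What happens if the word is not in the song at all? A small sketch: .get() quietly returns None for a missing word, and it accepts an optional second argument to use as a fallback instead. The word 'banana' is just an example of a word we know is missing.

```python
import pandas as pd

# Assumption: `song` holds the same words as the list built earlier in the notebook.
song = ['row', 'row', 'row', 'your', 'boat', 'gently', 'down', 'the', 'stream']
songcounts = pd.Series(song).value_counts()

# A word that is not in the index: .get() returns None rather than raising an error.
missing = songcounts.get('banana')
print(missing)           # None

# Supplying a default turns "not found" into a count of zero.
missing_as_zero = songcounts.get('banana', 0)
print(missing_as_zero)   # 0
```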

We can use square brackets to ask Python for more information.

In this case, we put inside the square brackets the conditions we want to meet:  "show us the parts of songcounts where the value of songcounts is 3."

In [27]:
songcounts[songcounts == 3]

row    3
dtype: int64
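
It can help to look at the condition on its own before using it as a filter. In this sketch, songcounts == 3 produces a Series of True and False values, one per word, and the square brackets keep only the rows marked True. The names *is_three* and *sung_once* are invented for this example.

```python
import pandas as pd

# Assumption: `song` holds the same words as the list built earlier in the notebook.
song = ['row', 'row', 'row', 'your', 'boat', 'gently', 'down', 'the', 'stream']
songcounts = pd.Series(song).value_counts()

# Step 1: the condition by itself is a Series of True/False values.
is_three = songcounts == 3
print(is_three)

# Step 2: putting that condition in square brackets keeps only the True rows.
print(songcounts[is_three])

# The same pattern works for any condition, such as words that appear once.
sung_once = songcounts[songcounts == 1]
print(len(sung_once))   # 6
```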

Note that when we called .value_counts(), the words themselves were stored in the part of the pandas Series known as the 'index'.

We can always get to that 'index' by using the attribute .index (note that it takes no parentheses).

.index returns the list-like set of labels that forms the axis of the pandas Series.

In [45]:
songcounts.index

Index(['row', 'boat', 'gently', 'the', 'stream', 'down', 'your'], dtype='object')

We can navigate this as we would a list.  .index[0] produces the first item in the list.

In [46]:
songcounts.index[0]

'row'
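
If you would rather work with a plain Python list of the words, a short sketch like the following converts the index. The name *words* is made up for this example.

```python
import pandas as pd

# Assumption: `song` holds the same words as the list built earlier in the notebook.
song = ['row', 'row', 'row', 'your', 'boat', 'gently', 'down', 'the', 'stream']
songcounts = pd.Series(song).value_counts()

# Turn the index into an ordinary list of distinct words.
words = list(songcounts.index)
print(words[0])     # 'row', the most frequent word
print(len(words))   # 7 distinct words

# We can also ask whether a word is one of the labels.
print('boat' in songcounts.index)   # True
```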

Try navigating the pandas Series songcounts some more.

In [33]:
songcounts[3]

1

In [34]:
songcounts[-1]

1

In [47]:
songcounts.index[-1]

'your'

In [35]:
songcounts[-3:]

stream    1
down      1
your      1
dtype: int64

### Counting words with the 'Dictionary' data type

We can also count the words in a list using a dictionary.  Python's built-in 'collections' module has a tool for this called 'Counter'.

'Counter' produces word counts in the format of a dictionary.

In general, we'll use collections and Counter less frequently than we'll use pandas Series and the function .value_counts().  It's still useful to know that there are many ways to count in Python.  

Test your memory of how to navigate a dictionary object by investigating the use of Counter applied to the variable *song*.

In [49]:
from collections import Counter

In [50]:
Counter(song)

Counter({'row': 3,
         'your': 1,
         'boat': 1,
         'gently': 1,
         'down': 1,
         'the': 1,
         'stream': 1})
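
Notice that a Counter keeps the words in the order they first appear, not sorted by count. As a small sketch, the Counter method .most_common() gives the ranked view that .value_counts() gave us in pandas. The name *wordcounts* is invented for this example.

```python
from collections import Counter

# Assumption: `song` holds the same words as the list built earlier in the notebook.
song = ['row', 'row', 'row', 'your', 'boat', 'gently', 'down', 'the', 'stream']
wordcounts = Counter(song)

# A Counter is a special kind of dictionary.
print(isinstance(wordcounts, dict))   # True

# .most_common(n) returns the n most frequent words as (word, count) pairs.
print(wordcounts.most_common(1))      # [('row', 3)]
```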

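Can we navigate to the count for one word?  Note that the next cell reuses the name songcounts, so from here on songcounts refers to the Counter rather than to the pandas Series.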

In [53]:
songcounts = Counter(song)

In [55]:
songcounts[3]

0

That doesn't appear to work the way we might have thought if we were working from a list. Why not?  The answer is that we are dealing with data structured as a *dictionary*.  The Counter treated 3 as a key, found no word by that name, and reported a count of 0.

Dictionaries expect you to call the 'key' of the dictionary in square brackets.  Then they will return the value.  

In this case, the keys are words and the values are counts.

In [59]:
songcounts['row']

3

This variation does the same thing:

In [66]:
songcounts.get('row')

3

If you look up a key that doesn't exist, an ordinary dictionary raises a 'KeyError'.  A Counter returns a value of zero instead.

In [61]:
songcounts['banana']

0
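
To see the difference between a Counter and an ordinary dictionary side by side, here is a minimal sketch. The names *counter_counts*, *plain_counts*, and *raised* are made up for this example.

```python
from collections import Counter

# Assumption: `song` holds the same words as the list built earlier in the notebook.
song = ['row', 'row', 'row', 'your', 'boat', 'gently', 'down', 'the', 'stream']

counter_counts = Counter(song)        # a Counter
plain_counts = dict(counter_counts)   # the same data as an ordinary dictionary

# A Counter answers 0 for a word it has never seen.
print(counter_counts['banana'])       # 0

# An ordinary dictionary raises a KeyError instead.
raised = False
try:
    plain_counts['banana']
except KeyError:
    raised = True
print(raised)                         # True

# With an ordinary dictionary, .get() with a default avoids the error.
print(plain_counts.get('banana', 0))  # 0
```

Looking up a missing word in a Counter does not add that word to it, so 'banana' is still absent afterwards.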

You can use the 'in' operator to check if a particular item is found in a dictionary.

In [64]:
'banana' in songcounts

False

In [65]:
'row' in songcounts

True

What about looking up a key based on a value?  Well, multiple keys may have the same value.  So there's no easy command for it. This is important: *not every data type is as easy to navigate as every other data type.*
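
There is no single command for it, but a short loop can do the job. This sketch uses .items(), which hands back each key together with its value, to collect every word that has a given count. The name *sung_once* is invented for this example.

```python
from collections import Counter

# Assumption: `song` holds the same words as the list built earlier in the notebook.
song = ['row', 'row', 'row', 'your', 'boat', 'gently', 'down', 'the', 'stream']
songcounts = Counter(song)

# Collect every word (key) whose count (value) is exactly 1.
sung_once = [word for word, count in songcounts.items() if count == 1]
print(sung_once)
# ['your', 'boat', 'gently', 'down', 'the', 'stream']
```

The answer is a list rather than a single word, precisely because several keys can share the same value.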



You can navigate dictionaries in other ways.

We could call all the keys with .keys()

In [68]:
songkeys = songcounts.keys()
songkeys

dict_keys(['row', 'your', 'boat', 'gently', 'down', 'the', 'stream'])

Or all the values with .values()

In [75]:
songvalues = songcounts.values()
songvalues

dict_values([3, 1, 1, 1, 1, 1, 1])
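
Keys and values can also be walked through together. As a brief sketch, .items() yields each word paired with its count, and adding up the values gives the total number of words in the song. The name *total_words* is made up for this example.

```python
from collections import Counter

# Assumption: `song` holds the same words as the list built earlier in the notebook.
song = ['row', 'row', 'row', 'your', 'boat', 'gently', 'down', 'the', 'stream']
songcounts = Counter(song)

# Print every word alongside its count.
for word, count in songcounts.items():
    print(word, count)

# The counts add up to the length of the original list.
total_words = sum(songcounts.values())
print(total_words)   # 9
```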

### Assignment

Create a variable named after your last name. Write out the lyrics to a new poem or song of your choice, at least five lines long, as a list.

* Use the Series method with the function .value_counts() to count the words in the song. 
     * Write out the code to navigate this Series:
        * What is the first item in the Series?
        * What is the last item in the Series?
        
* Use the Dictionary method and the function Counter() to count the words in the song.
     * Write out the code to navigate this dictionary:
        * How many times does the word 'the' appear in the song?
        
Take a screenshot just of your code and results. Upload it to Canvas.


