# Counting made easy

# Using Counter on lists
Counter, found in the collections module, is a powerful tool for counting, validating, and learning more about the elements within a dataset. You pass it an iterable (such as a list, set, or tuple) or a dictionary. A Counter also behaves like a dictionary, so you can assign values by key, for example counter[key] = value.
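
A minimal sketch of that dictionary-like behavior, using a short hypothetical list of station names rather than the CTA data:

```python
from collections import Counter

# Hypothetical station names, for illustration only
names = ["Clark/Lake", "Austin-Forest Park", "Clark/Lake", "Belmont"]

# Each distinct element becomes a key; its value is how often it appears
counts = Counter(names)
print(counts["Clark/Lake"])   # 2

# A Counter can also be built from a dictionary of existing counts
from_dict = Counter({"Belmont": 3, "Clark/Lake": 1})
print(from_dict["Belmont"])   # 3

# Dictionary-style assignment works too
counts["Belmont"] = 10
print(counts["Belmont"])      # 10

# A missing key returns 0 instead of raising KeyError
print(counts["Howard"])       # 0
```

One difference from a plain dict: looking up a missing key returns 0, and that lookup does not add the key to the Counter.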

A common usage for Counter is checking data for consistency prior to using it, so let's do just that. In this exercise, you'll be using data from the Chicago Transit Authority on ridership.
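
As a rough illustration of such a consistency check, the sketch below flags values that occur only once, on the assumption that rare spellings are often typos or stray whitespace. The names are hypothetical, and a count of 1 is only a heuristic, not a rule:

```python
from collections import Counter

# Hypothetical raw names; the third one has a trailing space
raw_names = ["Belmont", "Belmont", "Belmont ", "Clark/Lake", "Clark/Lake"]

name_counts = Counter(raw_names)

# Flag values seen only once as candidates for a closer look
suspicious = [name for name, count in name_counts.items() if count == 1]
print(suspicious)   # ['Belmont ']
```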

1. Import the Counter object from collections.
2. Print the first ten rows of the stations data.
3. Create a Counter of the station names (the stationname column) called station_count.
4. Print the station_count.

In [1]:
# Import the Counter object
import numpy as np
import pandas as pd
from collections import Counter

In [2]:
Stations = r'F:\Data Analysis\Springboard\Data Science Career Track\3.Data Types for Data Science in Python\Datasets\cta_daily_station_totals.csv'

stations = pd.read_csv(Stations, index_col=0)
stations

                       stationname        date         daytype  rides
station_id
40010           Austin-Forest Park  01/01/2015  SUNDAY/HOLIDAY    587
40010           Austin-Forest Park  01/02/2015         WEEKDAY   1386
40010           Austin-Forest Park  01/03/2015        SATURDAY    785
40010           Austin-Forest Park  01/04/2015  SUNDAY/HOLIDAY    625
40010           Austin-Forest Park  01/05/2015         WEEKDAY   1752
...                            ...         ...             ...    ...
41690       Cermak-McCormick Place  11/26/2016        SATURDAY    767
41690       Cermak-McCormick Place  11/27/2016  SUNDAY/HOLIDAY    688
41690       Cermak-McCormick Place  11/28/2016         WEEKDAY   1406
41690       Cermak-McCormick Place  11/29/2016         WEEKDAY   1545


In [3]:
# Print the first ten rows of the stations DataFrame
print(stations[:10])

                   stationname        date         daytype  rides
station_id                                                       
40010       Austin-Forest Park  01/01/2015  SUNDAY/HOLIDAY    587
40010       Austin-Forest Park  01/02/2015         WEEKDAY   1386
40010       Austin-Forest Park  01/03/2015        SATURDAY    785
40010       Austin-Forest Park  01/04/2015  SUNDAY/HOLIDAY    625
40010       Austin-Forest Park  01/05/2015         WEEKDAY   1752
40010       Austin-Forest Park  01/06/2015         WEEKDAY   1777
40010       Austin-Forest Park  01/07/2015         WEEKDAY   1269
40010       Austin-Forest Park  01/08/2015         WEEKDAY   1435
40010       Austin-Forest Park  01/09/2015         WEEKDAY   1631
40010       Austin-Forest Park  01/10/2015        SATURDAY    771


In [4]:
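# Create a Counter of the station names: station_count
# Counter(stations) would iterate over the DataFrame's column labels and return
# Counter({'stationname': 1, 'date': 1, 'daytype': 1, 'rides': 1}),
# so pass the stationname column instead
station_count = Counter(stations['stationname'])

# Print the station_count
print(station_count)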


# Finding most common elements
Another powerful usage of Counter is finding the most common elements in a list. This can be done with the .most_common() method.
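
A small sketch of .most_common(), again with a made-up list of station names:

```python
from collections import Counter

# Hypothetical list of station names, one entry per recorded day
stations = ["Belmont", "Clark/Lake", "Belmont", "Howard", "Belmont", "Clark/Lake"]

station_count = Counter(stations)

# most_common(n) returns the n (element, count) pairs with the highest counts,
# ordered from most to least common
print(station_count.most_common(2))  # [('Belmont', 3), ('Clark/Lake', 2)]

# Called with no argument, it returns every element
print(station_count.most_common())   # [('Belmont', 3), ('Clark/Lake', 2), ('Howard', 1)]
```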

1. Import the Counter object from collections.
2. Create a Counter of the station names called station_count.
3. Print the 5 most common elements.

In [5]:
 
# Create a Counter of the station names: station_count
station_count = Counter(stations['stationname'])

# Find the 5 most common elements
station_count.most_common(5)

# Dictionaries of unknown structure - Defaultdict


# Creating dictionaries of an unknown structure
Occasionally, you'll need a structure to hold nested data, and you may not be certain that all the keys will actually exist. This becomes a problem when you try to append items to a list stored under a key that hasn't been created yet, because a regular dictionary raises a KeyError. To solve this with a regular dictionary, you need to test whether the key exists and, if it does not, add it with an empty list.
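
Because the entries data is not included here, the sketch below runs the same key-check pattern on a few hypothetical (date, stop, riders) tuples; the rider counts are placeholders, not CTA figures:

```python
# Hypothetical (date, stop, riders) tuples standing in for entries
entries = [
    ("03/09/2016", "Austin-Forest Park", 1000),
    ("03/09/2016", "Cermak-McCormick Place", 2000),
    ("03/10/2016", "Austin-Forest Park", 1100),
]

ridership = {}
for date, stop, riders in entries:
    # Looking up a missing key would raise KeyError, so create the list first
    if date not in ridership:
        ridership[date] = []
    ridership[date].append((stop, riders))

print(ridership["03/09/2016"])
# [('Austin-Forest Park', 1000), ('Cermak-McCormick Place', 2000)]
```

The built-in ridership.setdefault(date, []).append((stop, riders)) combines the check and the append in one line; defaultdict removes the check entirely.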

1. Create an empty dictionary called ridership.
2. Iterate over entries, unpacking each entry into the variables date, stop, and riders.
3. Check to see if the date already exists in the ridership dictionary. If it does not exist, create an empty list for the date key.
4. Append a tuple consisting of stop and riders to the date key of the ridership dictionary.
5. Print the ridership for '03/09/2016'.

In [6]:
# Note: the entries data is not available, so running the code below raises a NameError

# Create an empty dictionary: ridership
ridership = {}

# Iterate over the entries
for date, stop, riders in entries:
    # Check to see if date is already in the ridership dictionary
    if date not in ridership:
        # Create an empty list for any missing date
        ridership[date] = []
    # Append the stop and riders as a tuple to the date key's list
    ridership[date].append((stop, riders))
    
# Print the ridership for '03/09/2016'
print(ridership['03/09/2016'])

NameError: name 'entries' is not defined

# Safely appending to a key's value list

Often when working with dictionaries, you will need to initialize a data type before you can use it. A prime example of this is a list, which has to be initialized on each key before you can append to that list.

A defaultdict lets you define what each uninitialized key will contain. When creating a defaultdict, you pass it a callable that takes no arguments, usually a type such as list, tuple, set, int, str, or dict; it is called to create the default value whenever a missing key is accessed.
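
A minimal sketch of a few common factories. The keys and numbers are hypothetical, and the lambda shows that any zero-argument callable can serve as the factory:

```python
from collections import defaultdict

# list: a missing key starts as an empty list, so append works immediately
by_stop = defaultdict(list)
by_stop["Belmont"].append(1000)
by_stop["Belmont"].append(1200)

# int: a missing key starts at int(), which is 0
totals = defaultdict(int)
totals["Belmont"] += 1000

# Any zero-argument callable works, e.g. a lambda returning a custom default
labels = defaultdict(lambda: "unknown")

print(by_stop["Belmont"])   # [1000, 1200]
print(totals["Belmont"])    # 1000
print(labels["Howard"])     # unknown
```

Unlike a Counter, reading a missing key from a defaultdict stores it with the default value, so "Howard" is now a key in labels.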

1. Import defaultdict from collections.
2. Create a defaultdict with a default type of list called ridership.
3. Iterate over the list entries, unpacking each entry into the variables date, stop, and riders, exactly as you did in the previous exercise.
4. Use stop as the key of the ridership dictionary and append riders to its value.
5. Print the first 10 items of the ridership dictionary. You can use the .items() method for this. Remember, you have to convert ridership.items() to a list before slicing.

In [7]:
# Note: the entries data is not available, so running the code below raises a NameError
# Import defaultdict
from collections import defaultdict

# Create a defaultdict with a default type of list: ridership
ridership = defaultdict(list)

# Iterate over the entries
for date, stop, riders in entries:
    # Use the stop as the key of ridership and append the riders to its value
    ridership[stop].append(riders)
    
# Print the first 10 items of the ridership dictionary
print(list(ridership.items())[:10])

NameError: name 'entries' is not defined

# Maintaining Dictionary Order with OrderedDict


# Working with OrderedDictionaries
Since Python 3.6, dictionaries maintain the order in which keys were inserted (an implementation detail in CPython 3.6, guaranteed by the language from Python 3.7); in earlier versions, you need an OrderedDict to maintain insertion order.
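
A short sketch of what still differs once plain dicts keep insertion order: OrderedDict equality is order-sensitive, and OrderedDict offers move_to_end(). The dates and totals are made-up values:

```python
from collections import OrderedDict

# A plain dict keeps insertion order (guaranteed from Python 3.7)
plain = {}
plain["01/02/2015"] = 250
plain["01/01/2015"] = 150
print(list(plain))          # ['01/02/2015', '01/01/2015']

# Made-up daily totals inserted in two different orders
a = OrderedDict([("01/01/2015", 150), ("01/02/2015", 250)])
b = OrderedDict([("01/02/2015", 250), ("01/01/2015", 150)])

print(dict(a) == dict(b))   # True: plain dict equality ignores order
print(a == b)               # False: OrderedDict equality checks order

# move_to_end() relocates an existing key to the end
a.move_to_end("01/01/2015")
print(list(a))              # ['01/02/2015', '01/01/2015']
print(a == b)               # True
```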

1. Import OrderedDict from collections.
2. Create an OrderedDict called ridership_date.
3. Iterate over the list entries, unpacking each entry into date and riders.
4. If a key does not exist in ridership_date for the date, set it equal to 0 (if only you could use defaultdict here!)
5. Add riders to the date key of ridership_date.
6. Print the first 31 records. Remember to convert the items into a list.

In [8]:
# Note: the entries data is not available, so running the code below raises a NameError

# Import OrderedDict from collections
from collections import OrderedDict

# Create an OrderedDict called: ridership_date
ridership_date = OrderedDict()

# Iterate over the entries
for date, riders in entries:
    # If a key does not exist in ridership_date, set it to 0
    if date not in ridership_date:
        ridership_date[date] = 0
        
    # Add riders to the date key in ridership_date
    ridership_date[date] += riders
    
# Print the first 31 records
print(list(ridership_date.items())[:31])

NameError: name 'entries' is not defined

# Powerful Ordered popping

Where OrderedDicts really shine is when you need to access the data in the order you added it. OrderedDict's .popitem() method removes and returns items in reverse insertion order (last in, first out). If you pass it the keyword argument last=False, it instead removes items in insertion order (first in, first out).
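
A minimal sketch of both popping directions on a small OrderedDict with made-up daily totals:

```python
from collections import OrderedDict

# Made-up daily totals, inserted in date order
ridership_date = OrderedDict()
ridership_date["01/01/2015"] = 100
ridership_date["01/02/2015"] = 250
ridership_date["01/03/2015"] = 175

# Peek at the first and last keys without building a list
print(next(iter(ridership_date)))      # 01/01/2015
print(next(reversed(ridership_date)))  # 01/03/2015

# last=False removes the oldest insertion (first in, first out)
first = ridership_date.popitem(last=False)
print(first)                           # ('01/01/2015', 100)

# The default, last=True, removes the newest insertion (last in, first out)
last = ridership_date.popitem()
print(last)                            # ('01/03/2015', 175)

print(list(ridership_date.items()))    # [('01/02/2015', 250)]
```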

Here, you'll use the ridership_date OrderedDict you created in the previous exercise.

1. Print the first key in ridership_date (remember to convert the keys to a list before indexing).
2. Pop the first item from ridership_date and print it.
3. Print the last key in ridership_date.
4. Pop the last item from ridership_date and print it.

In [9]:
# Note: entries was not available, so ridership_date is still empty and the code below raises an IndexError

# Print the first key in ridership_date
print(list(ridership_date.keys())[0])

# Pop the first item from ridership_date and print it
print(ridership_date.popitem(last=False))

# Print the last key in ridership_date
print(list(ridership_date.keys())[-1])

# Pop the last item from ridership_date and print it
print(ridership_date.popitem())

IndexError: list index out of range

# What do you mean I don't have any class? Namedtuple


# Creating namedtuples for storing data
Often when working with data, you will use a dictionary just so you can use key names to make the code easier to read and the data easier to access. Python has another container called a namedtuple: a tuple that also has a name for each position. You create one by passing a name for the tuple type and a list of field names.
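
A minimal sketch using the same DateDetails type described in the steps below, with one made-up row, showing that fields can be read by name while the object remains an ordinary tuple:

```python
from collections import namedtuple

# A type name plus a list of field names
DateDetails = namedtuple("DateDetails", ["date", "stop", "riders"])

# Made-up row values, for illustration only
entry = DateDetails("01/01/2015", "Austin-Forest Park", 100)

# Fields are readable by name as attributes...
print(entry.stop)      # Austin-Forest Park
# ...but it is still a tuple, so indexing and unpacking work too
print(entry[2])        # 100
date, stop, riders = entry
print(entry)           # DateDetails(date='01/01/2015', stop='Austin-Forest Park', riders=100)

# namedtuples are immutable; _replace returns a modified copy
updated = entry._replace(riders=120)
print(updated.riders)  # 120
```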

1. Import namedtuple from collections.
2. Create a namedtuple called DateDetails with a type name of DateDetails and fields of 'date', 'stop', and 'riders'.
3. Create a list called labeled_entries.
4. Iterate over the entries list, unpacking each entry into date, stop, and riders.
5. Create a new DateDetails namedtuple instance for each entry and append it to labeled_entries.
6. Print the first 5 items in labeled_entries.

In [10]:
# Note: the entries data is not available, so running the code below raises a NameError
# Import namedtuple from collections
from collections import namedtuple

# Create the namedtuple: DateDetails
DateDetails = namedtuple('DateDetails', ['date', 'stop', 'riders'])

# Create the empty list: labeled_entries
labeled_entries = []

# Iterate over the entries list
for date, stop, riders in entries:
    # Append a new DateDetails namedtuple instance for each entry to labeled_entries
    labeled_entries.append(DateDetails(date, stop, riders))

# Print the first 5 items in labeled_entries
print(labeled_entries[:5])

NameError: name 'entries' is not defined

# Leveraging attributes on namedtuples

Once you have a namedtuple, you can write more expressive code that is easier to understand. You can access each element of the tuple by its name as an attribute. For example, you can access the date of the namedtuples from the previous exercise using the .date attribute.

1. Iterate over the first twenty items in the labeled_entries list.
2. Print each item's stop.
3. Print each item's date.
4. Print each item's riders.

In [11]:
# Note: labeled_entries is empty because entries was not available, so nothing is printed
# Iterate over the first twenty items in labeled_entries
for item in labeled_entries[:20]:
    # Print each item's stop
    print(item.stop)

    # Print each item's date
    print(item.date)

    # Print each item's riders
    print(item.riders)