In [1]:
# Import libraries
from collections import Counter
from collections import defaultdict
from collections import OrderedDict
from collections import namedtuple

from pprint import pprint
from datetime import datetime

import pandas as pd

In [2]:
# Read data from file into a pandas DataFrame
df = pd.read_csv('data/cta_daily_station_totals.csv')
print(df.head())

   station_id         stationname        date         daytype  rides
0       40010  Austin-Forest Park  01/01/2015  SUNDAY/HOLIDAY    587
1       40010  Austin-Forest Park  01/02/2015         WEEKDAY   1386
2       40010  Austin-Forest Park  01/03/2015        SATURDAY    785
3       40010  Austin-Forest Park  01/04/2015  SUNDAY/HOLIDAY    625
4       40010  Austin-Forest Park  01/05/2015         WEEKDAY   1752


In [3]:
#stations = ['stationname'] + df.stationname.to_list()
stations = df.stationname.to_list()
entries = list(df[['date', 'stationname', 'rides']].to_records(index=False))
entries_Austin_Forest_Park = list(df[df.stationname == 'Austin-Forest Park'][['date', 'rides']].to_records(index=False))

# 03. Meet the collections module

The collections module is part of Python's standard library and holds some more advanced data containers. You'll learn how to use the Counter, defaultdict, OrderedDict and namedtuple in the context of answering questions about the Chicago transit dataset.

## 03.01 Counting made easy

See the video.

In [4]:
nyc_eatery_types = ['Mobile Food Truck', 'Mobile Food Truck', 'Mobile Food Truck', 'Mobile Food Truck', 'Mobile Food Truck', 
                    'Mobile Food Truck', 'Mobile Food Truck', 'Mobile Food Truck', 'Mobile Food Truck', 'Mobile Food Truck', 
                    'Food Cart', 'Food Cart', 'Food Cart', 'Food Cart', 'Food Cart', 'Food Cart', 'Food Cart', 'Snack Bar', 
                    'Snack Bar', 'Snack Bar', 'Snack Bar', 'Snack Bar', 'Restaurant', 'Restaurant', 'Restaurant', 
                    'Fruit & Vegetable Cart']

nyc_eatery_count_by_types = Counter(nyc_eatery_types)
print(nyc_eatery_count_by_types)

print(nyc_eatery_count_by_types['Restaurant'])

Counter({'Mobile Food Truck': 10, 'Food Cart': 7, 'Snack Bar': 5, 'Restaurant': 3, 'Fruit & Vegetable Cart': 1})
3


## 03.02 Using Counter on lists

__Counter__, found in the __collections__ module, is a powerful tool for counting, validating, and learning more about the elements within a dataset. You pass it an iterable (list, set, tuple) or a dictionary. You can also use a __Counter__ object much like a dictionary with key/value assignment, for example __counter[key] = value__.
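
A minimal sketch of the construction options described above. The eatery names and counts are made up for illustration and are not part of the transit dataset.

```python
from collections import Counter

# Build a Counter from an iterable: each element is tallied
from_list = Counter(['Food Cart', 'Snack Bar', 'Food Cart'])

# Build a Counter from a dictionary of existing counts
from_dict = Counter({'Food Cart': 2, 'Snack Bar': 1})

# Both routes produce equal counters
same = from_list == from_dict
print(same)  # True

# Assign or update counts like dictionary values
from_list['Restaurant'] = 3
from_list['Snack Bar'] += 1

# A missing key returns 0 instead of raising KeyError,
# and looking it up does not add the key
missing = from_list['Food Truck']
print(missing)  # 0
print('Food Truck' in from_list)  # False
```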

A common usage for __Counter__ is checking data for consistency prior to using it, so let's do just that. In this exercise, you'll be using data from the Chicago Transit Authority on ridership.
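
One way such a consistency check might look, sketched on a tiny hypothetical list (`sample_stations` and its contents are assumptions, not the real data). It assumes the largest record count is the complete one and flags any station that falls short.

```python
from collections import Counter

# Hypothetical sample: 'Sheridan' is missing one record
sample_stations = ['Davis'] * 3 + ['Morse'] * 3 + ['Sheridan'] * 2

sample_count = Counter(sample_stations)

# Assumption: the largest count represents a complete set of records
expected = max(sample_count.values())

# Collect stations whose record count deviates from the expected value
incomplete = {name: n for name, n in sample_count.items() if n != expected}
print(incomplete)  # {'Sheridan': 2}
```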

**Instructions**

1. Import the Counter object from collections.
2. Print the first ten items from the stations list.
3. Create a Counter of the stations list called station_count.
4. Print the station_count.

**Results:**<br>
<font color=darkgreen>Great work!</font>

In [5]:
# Print the first ten items from the stations list
print(stations[:10])

# Create a Counter of the stations list: station_count
station_count = Counter(stations)

# Print the station_count
pprint(station_count)

['Austin-Forest Park', 'Austin-Forest Park', 'Austin-Forest Park', 'Austin-Forest Park', 'Austin-Forest Park', 'Austin-Forest Park', 'Austin-Forest Park', 'Austin-Forest Park', 'Austin-Forest Park', 'Austin-Forest Park']
Counter({'Austin-Forest Park': 700,
         'Harlem-Lake': 700,
         'Pulaski-Lake': 700,
         'Quincy/Wells': 700,
         'Davis': 700,
         "Belmont-O'Hare": 700,
         'Jackson/Dearborn': 700,
         'Sheridan': 700,
         'Damen-Brown': 700,
         'Morse': 700,
         '35th/Archer': 700,
         '51st': 700,
         'Dempster-Skokie': 700,
         'Pulaski-Cermak': 700,
         'LaSalle/Van Buren': 700,
         'Ashland-Lake': 700,
         'Oak Park-Forest Park': 700,
         'Sox-35th-Dan Ryan': 700,
         'Randolph/Wabash': 700,
         'Damen-Cermak': 700,
         'Western-Forest Park': 700,
         'Cumberland': 700,
         '79th': 700,
         'Kedzie-Homan-Forest Park': 700,
         'State/Lake': 700,
         'Mai

## 03.03 Finding most common elements

Another powerful usage of __Counter__ is finding the most common elements in a list. This can be done with the __.most_common()__ method.

Practice using this now to find the most common stations in a stations list.
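
A short sketch of how __.most_common()__ behaves, using illustrative counts rather than the transit data:

```python
from collections import Counter

# Illustrative counts, not taken from the transit dataset
rides = Counter({'Davis': 5, 'Morse': 9, 'Sheridan': 5, '51st': 2})

# Top two entries as (element, count) pairs, highest count first
top_two = rides.most_common(2)
print(top_two)  # [('Morse', 9), ('Davis', 5)]

# With no argument, every element is returned in descending order of count
everything = rides.most_common()

# The least common entry sits at the end of that list
least = everything[-1]
print(least)  # ('51st', 2)
```

Elements with equal counts keep the order in which they were first encountered (documented behaviour since Python 3.7). That is why 'Davis' precedes 'Sheridan' here, and why a dataset where every station has the same number of records returns stations in their original order.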

**Instructions**

1. Import the Counter object from collections.
2. Create a Counter of the stations list called station_count.
3. Print the 5 most common elements.

**Results:**<br>
<font color=darkgreen>Great work!</font>

In [6]:
# Find the 5 most common elements
print(station_count.most_common(5))

[('Austin-Forest Park', 700), ('Harlem-Lake', 700), ('Pulaski-Lake', 700), ('Quincy/Wells', 700), ('Davis', 700)]


## 03.04 Dictionaries of unknown structure - Defaultdict

See the video.

## 03.05 Creating dictionaries of an unknown structure

Occasionally, you'll need a structure to hold nested data, and you may not be certain that the keys will all actually exist. This can be an issue if you're trying to append items to a list for that key. You might remember the NYC data that we explored in the video. In order to solve the problem with a regular dictionary, you'll need to test that the key exists in the dictionary, and if not, add it with an empty list.

You'll be working with a list of entries that contains ridership details on the Chicago transit system. You're going to solve this same type of problem with a much easier solution in the next exercise.

**Instructions**

1. Create an empty dictionary called ridership.
2. Iterate over entries, unpacking it into the variables date, stop, and riders.
3. Check to see if the date already exists in the ridership dictionary. If it does not exist, create an empty list for the date key.
4. Append a tuple consisting of stop and riders to the date key of the ridership dictionary.
5. Print the ridership for '03/09/2016'.

**Results:**<br>
<font color=darkgreen>Nicely done! In the next exercise, you'll practice creating a defaultdict and see how useful it can be.</font>

In [7]:
# Create an empty dictionary: ridership
ridership = {}

# Iterate over the entries
for date, stop, riders in entries:
    # Check to see if date is already in the ridership dictionary
    if date not in ridership:
        # Create an empty list for any missing date
        ridership[date] = []
    # Append the stop and riders as a tuple to the date key's list
    ridership[date].append((stop, riders))
    
# Print the ridership for '03/09/2016'
pprint(ridership['03/09/2016'][:10])

[('Austin-Forest Park', 2128),
 ('Harlem-Lake', 3769),
 ('Pulaski-Lake', 1502),
 ('Quincy/Wells', 8139),
 ('Davis', 3656),
 ("Belmont-O'Hare", 5294),
 ('Jackson/Dearborn', 8369),
 ('Sheridan', 5823),
 ('Damen-Brown', 3048),
 ('Morse', 4826)]


## 03.06 Safely appending to a key's value list

Often when working with dictionaries, you will need to initialize a data type before you can use it. A prime example of this is a list, which has to be initialized on each key before you can append to that list.

A __defaultdict__ allows you to define what each uninitialized key will contain. When creating a __defaultdict__, you pass it a callable that produces the default value, typically a type such as __list__, __tuple__, __set__, __int__, __str__, or __dict__.
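
A minimal sketch of two common default factories, __list__ and __int__. The `records` list is illustrative and not taken from the dataset.

```python
from collections import defaultdict

# Illustrative (stop, riders) pairs, not taken from the dataset
records = [('Davis', 10), ('Morse', 7), ('Davis', 5)]

# list factory: missing keys start as an empty list
riders_by_stop = defaultdict(list)
# int factory: missing keys start at 0, handy for running totals
total_by_stop = defaultdict(int)

for stop, riders in records:
    riders_by_stop[stop].append(riders)
    total_by_stop[stop] += riders

print(dict(riders_by_stop))  # {'Davis': [10, 5], 'Morse': [7]}
print(dict(total_by_stop))   # {'Davis': 15, 'Morse': 7}

# Caveat: merely reading a missing key inserts it with the default value
_ = total_by_stop['Sheridan']
print('Sheridan' in total_by_stop)  # True
```

The last lines show a side effect worth keeping in mind: use `in` or `.get()` when you only want to test for a key without creating it.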

**Instructions**

1. Import defaultdict from collections.
2. Create a defaultdict with a default type of list called ridership.
3. Iterate over the list entries, unpacking it into the variables date, stop, and riders, exactly as you did in the previous exercise.
4. Use stop as the key of the ridership dictionary and append riders to its value.
5. Print the first 10 items of the ridership dictionary. You can use the .items() method for this. Remember, you have to convert ridership.items() to a list before slicing.

**Results:**<br>
<font color=darkgreen>Great work!</font>

In [8]:
# Create a defaultdict with a default type of list: ridership
ridership = defaultdict(list)

# Iterate over the entries
for date, stop, riders in entries:
    # Use the stop as the key of ridership and append the riders to its value
    ridership[stop].append(riders)
    
# Print the first item of the ridership dictionary (sliced to 1 rather than 10 to keep the output short)
print(list(ridership.items())[:1])

[('Austin-Forest Park', [587, 1386, 785, 625, 1752, 1777, 1269, 1435, 1631, 771, 588, 2065, 2108, 2012, 2069, 2003, 953, 706, 1216, 2115, 2132, 2185, 2072, 854, 585, 2095, 2251, 2133, 2083, 2074, 953, 596, 1583, 2263, 2179, 2105, 2076, 1049, 612, 2095, 2191, 2117, 1931, 1943, 800, 584, 1434, 2078, 1869, 1455, 1830, 841, 621, 1884, 2100, 2046, 2066, 2016, 875, 615, 1975, 2391, 2058, 2035, 2008, 989, 635, 2105, 2148, 2152, 2155, 2182, 1340, 718, 2191, 2220, 2154, 2248, 2183, 1073, 664, 1924, 2060, 2049, 2138, 1930, 972, 693, 2059, 2060, 2120, 2062, 1751, 928, 664, 2047, 2032, 2030, 1899, 2096, 1012, 688, 2090, 2160, 2182, 2184, 2235, 1060, 732, 2090, 2161, 2115, 2203, 2180, 885, 738, 2152, 2175, 2230, 2218, 2320, 1207, 773, 2171, 2090, 2225, 2333, 2098, 1042, 678, 2048, 2097, 2118, 2198, 2273, 1095, 779, 2103, 2119, 2090, 2206, 2081, 1095, 767, 795, 2025, 2171, 2271, 2175, 910, 668, 2148, 2110, 2198, 2152, 2138, 1129, 773, 2041, 2156, 2172, 2093, 2010, 1225, 843, 2006, 2126, 2062, 2341, 

## 03.07 Maintaining Dictionary Order with OrderedDict

See the video.

## 03.08 Working with OrderedDictionaries

As of Python 3.7, regular dictionaries are guaranteed to maintain the order in which keys were inserted (CPython 3.6 already did so as an implementation detail); in earlier versions you need an __OrderedDict__ to maintain insertion order.
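
Even on recent Python versions, __OrderedDict__ still differs from a plain dictionary in a few ways. The sketch below shows two of them, order-sensitive equality and __.move_to_end()__, with a couple of date/ridership pairs chosen for illustration.

```python
from collections import OrderedDict

a = OrderedDict([('01/01/2015', 587), ('01/02/2015', 1386)])
b = OrderedDict([('01/02/2015', 1386), ('01/01/2015', 587)])

# OrderedDict comparisons are order-sensitive
ordered_equal = a == b
print(ordered_equal)  # False

# Plain dict comparisons ignore order
plain_equal = dict(a) == dict(b)
print(plain_equal)    # True

# move_to_end reorders an existing key (to the end by default)
b.move_to_end('01/02/2015')
print(a == b)         # True
```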

Let's create an ordered dictionary of total ridership by date, then use it to see how ridership changes from day to day.

**Instructions**

1. Import OrderedDict from collections.
2. Create an OrderedDict called ridership_date.
3. Iterate over the list entries, unpacking it into date and riders.
4. If a key does not exist in ridership_date for the date, set it equal to 0 (if only you could use defaultdict here!)
5. Add riders to the date key of ridership_date.
6. Print the first 31 records. Remember to convert the items into a list.

**Results:**<br>
<font color=darkgreen>Great work using the OrderedDict! Do you see any interesting patterns in the ridership in January 2015?</font>

In [9]:
# Create an OrderedDict called: ridership_date
ridership_date = OrderedDict()

# Iterate over the entries
for date, riders in entries_Austin_Forest_Park:
    day = datetime.strptime(date, "%m/%d/%Y").strftime("%a")
    date_day = (date, day)
    
    # If a key does not exist in ridership_date, set it to 0
    if date_day not in ridership_date:
        ridership_date[date_day] = 0
        
    # Add riders to the date key in ridership_date
    ridership_date[date_day] += riders
    
# Print the first 31 records
pprint(list(ridership_date.items())[:31])

[(('01/01/2015', 'Thu'), 587),
 (('01/02/2015', 'Fri'), 1386),
 (('01/03/2015', 'Sat'), 785),
 (('01/04/2015', 'Sun'), 625),
 (('01/05/2015', 'Mon'), 1752),
 (('01/06/2015', 'Tue'), 1777),
 (('01/07/2015', 'Wed'), 1269),
 (('01/08/2015', 'Thu'), 1435),
 (('01/09/2015', 'Fri'), 1631),
 (('01/10/2015', 'Sat'), 771),
 (('01/11/2015', 'Sun'), 588),
 (('01/12/2015', 'Mon'), 2065),
 (('01/13/2015', 'Tue'), 2108),
 (('01/14/2015', 'Wed'), 2012),
 (('01/15/2015', 'Thu'), 2069),
 (('01/16/2015', 'Fri'), 2003),
 (('01/17/2015', 'Sat'), 953),
 (('01/18/2015', 'Sun'), 706),
 (('01/19/2015', 'Mon'), 1216),
 (('01/20/2015', 'Tue'), 2115),
 (('01/21/2015', 'Wed'), 2132),
 (('01/22/2015', 'Thu'), 2185),
 (('01/23/2015', 'Fri'), 2072),
 (('01/24/2015', 'Sat'), 854),
 (('01/25/2015', 'Sun'), 585),
 (('01/26/2015', 'Mon'), 2095),
 (('01/27/2015', 'Tue'), 2251),
 (('01/28/2015', 'Wed'), 2133),
 (('01/29/2015', 'Thu'), 2083),
 (('01/30/2015', 'Fri'), 2074),
 (('01/31/2015', 'Sat'), 953)]


## 03.09 Powerful Ordered popping

Where OrderedDicts really shine is when you need to access the data in the order you added it. OrderedDict has a __.popitem()__ method that removes and returns items in reverse insertion order (most recently added first). You can also pass __.popitem()__ the __last=False__ keyword argument to remove and return items in the order they were added.
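
A small sketch of both popping directions, using an illustrative three-item OrderedDict (`daily` is a hypothetical name):

```python
from collections import OrderedDict

# Illustrative values inserted in chronological order
daily = OrderedDict([('Mon', 1752), ('Tue', 1777), ('Wed', 1269)])

# Default behaviour is LIFO: the most recently inserted item comes out
newest = daily.popitem()
print(newest)  # ('Wed', 1269)

# last=False gives FIFO: the oldest item comes out
oldest = daily.popitem(last=False)
print(oldest)  # ('Mon', 1752)

# Popping removes the item, so only the middle entry is left
remaining = list(daily.items())
print(remaining)  # [('Tue', 1777)]
```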

Here, you'll use the ridership_date OrderedDict you created in the previous exercise.

**Instructions**

1. Print the first key in ridership_date (Remember to make keys a list before slicing).
2. Pop the first item from ridership_date and print it.
3. Print the last key in ridership_date.
4. Pop the last item from ridership_date and print it.

**Results:**<br>
<font color=darkgreen>Wonderful work!</font>

In [10]:
# Print the first key in ridership_date
print(list(ridership_date.keys())[0])

# Pop the first item from ridership_date and print it
print(ridership_date.popitem(last=False))

# Print the last key in ridership_date
print(list(ridership_date.keys())[-1])

# Pop the last item from ridership_date and print it
print(ridership_date.popitem())

('01/01/2015', 'Thu')
(('01/01/2015', 'Thu'), 587)
('11/30/2016', 'Wed')
(('11/30/2016', 'Wed'), 2197)


## 03.10 What do you mean I don't have any class? Namedtuple

See the video.

## 03.11 Creating namedtuples for storing data

Often, when working with data, you use a dictionary just so that key names make the code easier to read and the data easier to access. Python has another container called a __namedtuple__: a tuple that has a name for each position. You create one by passing a name for the tuple type and a list of field names.

For example, __Cookie = namedtuple("Cookie", ['name', 'quantity'])__ creates a new tuple type. You create instances of it with __Cookie('chocolate chip', 1)__, then access the name using the __name__ attribute and the quantity using the __quantity__ attribute.
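
The Cookie example can be sketched as runnable code. The calls to `_replace` and `_asdict` go beyond what the text mentions; they are standard namedtuple methods included here as an illustration.

```python
from collections import namedtuple

# Define the tuple type: a type name plus a list of field names
Cookie = namedtuple('Cookie', ['name', 'quantity'])

cookie = Cookie('chocolate chip', 1)

# Fields are readable by attribute or by position
print(cookie.name)  # chocolate chip
print(cookie[1])    # 1

# Instances are immutable; _replace returns a modified copy
more = cookie._replace(quantity=12)
print(more)         # Cookie(name='chocolate chip', quantity=12)

# _asdict converts to a dictionary keyed by field name
as_dict = more._asdict()
print(as_dict['quantity'])  # 12
```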

In this exercise, you're going to restructure the transit data you've been working with into namedtuples for more descriptive code.

**Instructions**

1. Import namedtuple from collections.
2. Create a namedtuple called DateDetails with a type name of DateDetails and fields of 'date', 'stop', and 'riders'.
3. Create a list called labeled_entries.
4. Iterate over the entries list, unpacking it into date, stop, and riders.
5. Create a new DateDetails namedtuple instance for each entry and append it to labeled_entries.
6. Print the first 5 items in labeled_entries.

**Results:**<br>
<font color=darkgreen>Namedtuples are great for making an easy-to-use datatype. Let's look at how we can use them to make our code easier to read and reason about.</font>

In [11]:
# Create the namedtuple: DateDetails
DateDetails = namedtuple('DateDetails', ['date', 'stop', 'riders'])

# Create the empty list: labeled_entries
labeled_entries = []

# Iterate over the entries list
for date, stop, riders in entries:
    # Append a new DateDetails namedtuple instance for each entry to labeled_entries
    labeled_entries.append(DateDetails(date, stop, riders))
    
# Print the first 5 items in labeled_entries
pprint(labeled_entries[:5])

[DateDetails(date='01/01/2015', stop='Austin-Forest Park', riders=587),
 DateDetails(date='01/02/2015', stop='Austin-Forest Park', riders=1386),
 DateDetails(date='01/03/2015', stop='Austin-Forest Park', riders=785),
 DateDetails(date='01/04/2015', stop='Austin-Forest Park', riders=625),
 DateDetails(date='01/05/2015', stop='Austin-Forest Park', riders=1752)]


## 03.12 Leveraging attributes on namedtuples

Once you have a namedtuple, you can write more expressive code that is easier to understand. Remember, you can access the elements in the tuple by their name as an attribute. For example, you can access the date of the namedtuples in the previous exercise using the __.date__ attribute.
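
As a sketch of how attribute access keeps aggregation code readable, the example below totals riders per stop and finds the busiest record. The three sample records reuse values that appear in the outputs above; the names `sample`, `totals`, and `busiest` are hypothetical.

```python
from collections import defaultdict, namedtuple

DateDetails = namedtuple('DateDetails', ['date', 'stop', 'riders'])

# A few records for illustration only
sample = [
    DateDetails('01/01/2015', 'Austin-Forest Park', 587),
    DateDetails('01/02/2015', 'Austin-Forest Park', 1386),
    DateDetails('03/09/2016', 'Harlem-Lake', 3769),
]

# item.stop and item.riders read more clearly than item[1] and item[2]
totals = defaultdict(int)
for item in sample:
    totals[item.stop] += item.riders

# Attribute access also works inside key functions
busiest = max(sample, key=lambda item: item.riders)

print(dict(totals))                # {'Austin-Forest Park': 1973, 'Harlem-Lake': 3769}
print(busiest.date, busiest.stop)  # 03/09/2016 Harlem-Lake
```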

Here, you'll use the tuples you made in the previous exercise to see how this works.

**Instructions**

1. Iterate over the first twenty items in the labeled_entries list:
2. Print each item's stop.
3. Print each item's date.
4. Print each item's riders.

**Results:**<br>
<font color=darkgreen>Congratulations on finishing Chapter 3! See you in Chapter 4, where you'll learn how to deal with Dates and Times in Python.</font>

In [12]:
# Iterate over the first twenty items in labeled_entries
for item in labeled_entries[:20]:
    # Print each item's stop, date and riders
    print(item.stop, item.date, item.riders)

Austin-Forest Park 01/01/2015 587
Austin-Forest Park 01/02/2015 1386
Austin-Forest Park 01/03/2015 785
Austin-Forest Park 01/04/2015 625
Austin-Forest Park 01/05/2015 1752
Austin-Forest Park 01/06/2015 1777
Austin-Forest Park 01/07/2015 1269
Austin-Forest Park 01/08/2015 1435
Austin-Forest Park 01/09/2015 1631
Austin-Forest Park 01/10/2015 771
Austin-Forest Park 01/11/2015 588
Austin-Forest Park 01/12/2015 2065
Austin-Forest Park 01/13/2015 2108
Austin-Forest Park 01/14/2015 2012
Austin-Forest Park 01/15/2015 2069
Austin-Forest Park 01/16/2015 2003
Austin-Forest Park 01/17/2015 953
Austin-Forest Park 01/18/2015 706
Austin-Forest Park 01/19/2015 1216
Austin-Forest Park 01/20/2015 2115


# Additional material

- **Datacamp course**: https://learn.datacamp.com/courses/data-types-for-data-science-in-python