

## Collections Module
- Part of Standard Library
- Advanced data containers




## Counter
- A `dict` subclass for counting hashable items and measuring frequency
- Elements are stored as keys and their counts as values

```python 
In [1]: from collections import Counter
In [2]: nyc_eatery_count_by_types = Counter(nyc_eatery_types)
In [3]: print(nyc_eatery_count_by_types)
Counter({'Mobile Food Truck': 114, 'Food Cart': 74, 'Snack Bar': 24,
'Specialty Cart': 18, 'Restaurant': 15, 'Fruit & Vegetable Cart': 4})
In [4]: print(nyc_eatery_count_by_types['Restaurant'])
15
```


## Counter to find the most common
- `.most_common()` returns a list of `(element, count)` tuples sorted by count in descending order; pass `n` to keep only the top `n`

- Great for frequency analytics, such as how often something occurs

Find the top 3 eatery types in the NYC park system

```python
In [1]: print(nyc_eatery_count_by_types.most_common(3))
[('Mobile Food Truck', 114), ('Food Cart', 74), ('Snack Bar', 24)]
```
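
The transcript above depends on `nyc_eatery_types`, which is not defined in these notes. Here is a minimal, self-contained sketch of the same idea; `sample_types` is a made-up stand-in for the real data.

```python
from collections import Counter

# Hypothetical sample data standing in for nyc_eatery_types
sample_types = [
    "Food Cart", "Mobile Food Truck", "Food Cart",
    "Snack Bar", "Food Cart", "Mobile Food Truck",
]

type_counts = Counter(sample_types)

# Top two types as (element, count) tuples, highest count first
top_two = type_counts.most_common(2)
print(top_two)  # [('Food Cart', 3), ('Mobile Food Truck', 2)]

# With no argument, every element is returned, still sorted by count
print(type_counts.most_common())
```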






---


### Using Counter on lists
Counter, found in the collections module, is a powerful tool for counting, validating, and learning more about the elements within a dataset. You pass an iterable (list, set, tuple) or a dictionary to the Counter. You can also use the Counter object like a dictionary with key/value assignment, for example `counter[key] = value`.
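
The two ways of building a Counter, plus dictionary-style assignment, can be sketched as follows. The station names are invented for illustration and are not the real CTA data.

```python
from collections import Counter

# Hypothetical station names, not the real CTA data
sample_stations = ["Austin", "Clark/Lake", "Austin", "Belmont", "Austin"]

# Build a Counter from an iterable
from_list = Counter(sample_stations)
print(from_list["Austin"])  # 3

# Build a Counter from a dictionary of existing counts
from_dict = Counter({"Austin": 3, "Belmont": 1})

# Dictionary-style assignment and in-place updates both work
from_dict["Belmont"] = 5
from_dict["Clark/Lake"] += 1  # a missing key starts from 0

# Unlike a plain dict, a missing key returns 0 instead of raising KeyError
print(from_dict["Harlem"])  # 0
```

Looking up a missing key returns 0 without adding that key to the Counter.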

A common usage for Counter is checking data for consistency prior to using it, so let's do just that. In this exercise, you'll be using data from the Chicago Transit Authority on ridership.

### INSTRUCTIONS
- Import the Counter object from collections.
- Print the first ten items from the stations list.
- Create a Counter of the stations list called station_count.
- Print the station_count.

```python
# practice

# Using Counter on list


# Import the Counter object
from collections import Counter

# Print the first ten items from the stations list
print(stations[:10])

# Create a Counter of the stations list: station_count
station_count = Counter(stations)

# Print the station_count
print(station_count)

```

# Dictionaries of unknown structure - defaultdict




## Dictionary Handling

```python
In [1]: eateries_by_park = {}
In [2]: for park_id, name in nyc_eateries_parks:
   ...:     if park_id not in eateries_by_park:
   ...:         eateries_by_park[park_id] = []
   ...:     eateries_by_park[park_id].append(name)
In [3]: print(eateries_by_park['M010'])
['MOHAMMAD MATIN', 'PRODUCTS CORP.', 'Loeb Boathouse Restaurant',
'Nandita Inc.', 'SALIM AHAMED', 'THE NY PICNIC COMPANY',
'THE NEW YORK PICNIC COMPANY, INC.', 'NANDITA, INC.',
'JANANI FOOD SERVICE, INC.']
```




## Using defaultdict
- Pass it a default factory (such as `list` or `int`) that supplies the value for any key that doesn't exist yet
- Otherwise works like a regular dictionary








```python
In [1]: from collections import defaultdict
In [2]: eateries_by_park = defaultdict(list)
In [3]: for park_id, name in nyc_eateries_parks:
   ...:     eateries_by_park[park_id].append(name)
In [4]: print(eateries_by_park['M010'])
['MOHAMMAD MATIN', 'PRODUCTS CORP.', 'Loeb Boathouse Restaurant',
'Nandita Inc.', 'SALIM AHAMED', 'THE NY PICNIC COMPANY',
'THE NEW YORK PICNIC COMPANY, INC.', 'NANDITA, INC.',
'JANANI FOOD SERVICE, INC.']
```
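
One subtlety worth knowing: simply reading a missing key from a `defaultdict` creates that key, while a membership test or `.get()` does not. This is a small sketch with made-up park IDs (`X999`, `B000`).

```python
from collections import defaultdict

parks = defaultdict(list)
parks["M010"].append("Loeb Boathouse Restaurant")

# A membership test does not create the key
print("X999" in parks)  # False

# Reading a missing key calls list() and stores the result
print(parks["X999"])    # []
print("X999" in parks)  # True

# .get() does not trigger the default factory
print(parks.get("B000"))  # None
print(len(parks))         # 2
```

If you only want to check whether a key exists, prefer `in` or `.get()` so the dictionary does not fill up with empty entries.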



You can use `defaultdict` as a type of counter for a list of dictionaries, where we are counting multiple keys from those dictionaries.

- Find how many eateries have a phone number or a website


```python
In [1]: from collections import defaultdict
In [2]: eatery_contact_types = defaultdict(int)
In [3]: for eatery in nyc_eateries:
   ...:     if eatery.get('phone'):
   ...:         eatery_contact_types['phones'] += 1
   ...:     if eatery.get('website'):
   ...:         eatery_contact_types['websites'] += 1
In [4]: print(eatery_contact_types)
defaultdict(<class 'int'>, {'phones': 28, 'websites': 31})
```
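
To try the counting pattern without the NYC dataset, here is a self-contained sketch. The records in `sample_eateries` are invented and only mimic the shape of the `nyc_eateries` dictionaries.

```python
from collections import defaultdict

# Hypothetical records shaped like the nyc_eateries dictionaries
sample_eateries = [
    {"name": "A", "phone": "212-555-0100", "website": None},
    {"name": "B", "phone": None, "website": "http://example.com"},
    {"name": "C", "phone": "212-555-0101", "website": "http://example.org"},
    {"name": "D"},
]

contact_types = defaultdict(int)
for eatery in sample_eateries:
    # .get() returns None for a missing key, which is falsy
    if eatery.get("phone"):
        contact_types["phones"] += 1
    if eatery.get("website"):
        contact_types["websites"] += 1

print(dict(contact_types))  # {'phones': 2, 'websites': 2}
```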


---

### Creating dictionaries of an unknown structure
Occasionally, you'll need a structure to hold nested data, and you may not be certain that the keys will all actually exist. This can be an issue if you're trying to append items to a list for that key. You might remember the NYC data that we explored in the video. In order to solve the problem with a regular dictionary, you'll need to test that the key exists in the dictionary, and if not, add it with an empty list.

You'll be working with a list of entries that contains ridership details on the Chicago transit system. You're going to solve this same type of problem with a much easier solution in the next exercise.

INSTRUCTIONS
- Create an empty dictionary called ridership.
- Iterate over entries, unpacking it into the variables date, stop, and riders.
- Check to see if the date already exists in the ridership dictionary. If it does not exist, create an empty list for the date key.
- Append a tuple consisting of stop and riders to the date key of the ridership dictionary.
- Print the ridership for '03/09/2016'.



```python
# Create an empty dictionary: ridership
ridership = {}

# Iterate over the entries
for date, stop, riders in entries:
    # Check to see if date is already in the dictionary
    if date not in ridership:
        # Create an empty list for any missing date
        ridership[date] = []
    # Append the stop and riders as a tuple to the date key's list
    ridership[date].append((stop, riders))
    
# Print the ridership for '03/09/2016'
print(ridership['03/09/2016'])

```
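
As an aside, the check-then-create pattern above can also be written with the built-in `dict.setdefault` method. This is a different technique from the one the exercise asks for. The entries below are invented and only follow the `(date, stop, riders)` shape.

```python
# Hypothetical entries in the (date, stop, riders) shape
sample_entries = [
    ("03/09/2016", "Austin", 100),
    ("03/09/2016", "Belmont", 250),
    ("03/10/2016", "Austin", 120),
]

ridership_by_date = {}
for date, stop, riders in sample_entries:
    # setdefault returns the existing list, or stores and returns a new one
    ridership_by_date.setdefault(date, []).append((stop, riders))

print(ridership_by_date["03/09/2016"])  # [('Austin', 100), ('Belmont', 250)]
```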

## Safely appending to a key's value list
Often when working with dictionaries, you know the data type you want to have each key be; however, some data types such as lists have to be initialized on each key before you can append to that list.

A defaultdict allows you to define what each uninitialized key will contain. When creating a defaultdict, you pass it a callable that builds the default value, such as `list`, `tuple`, `set`, `int`, `str`, `dict`, or any other callable that takes no arguments.
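
A short sketch of how the choice of default changes behavior, using made-up stop names and rider numbers:

```python
from collections import defaultdict

# set as the factory: each new key starts as an empty set (duplicates dropped)
stops_by_date = defaultdict(set)
stops_by_date["03/09/2016"].add("Austin")
stops_by_date["03/09/2016"].add("Austin")
print(stops_by_date["03/09/2016"])  # {'Austin'}

# int as the factory: each new key starts at 0
riders_by_stop = defaultdict(int)
riders_by_stop["Austin"] += 100
riders_by_stop["Austin"] += 20
print(riders_by_stop["Austin"])  # 120

# Any zero-argument callable works, for example a lambda
status = defaultdict(lambda: "unknown")
print(status["Belmont"])  # unknown
```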

INSTRUCTIONS
- Import defaultdict from collections.
- Create a defaultdict with a default type of list called ridership.
- Iterate over the list entries, unpacking it into the variables date, stop, and riders, exactly as you did in the previous exercise.
- Use stop as the key of the ridership dictionary and append riders to its value.
- Print the first 10 items of the ridership dictionary. You can use the .items() method for this. Remember, you have to convert items to a list before slicing.


```python
# Import defaultdict
from collections import defaultdict

# Create a defaultdict with a default type of list: ridership
ridership = defaultdict(list)

# Iterate over the entries
# entries is a list of tuples [('01/02/2015', 'Berwin', '2890'),..]
for date, stop, riders in entries:
    # Use the stop as the key of ridership and append the riders to its value
    ridership[stop].append(riders)
    
# Print the first 10 items of the ridership dictionary
print(list(ridership.items())[:10])
```

# Maintaining Dictionary Order with OrderedDict

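Before Python 3.7, a normal dictionary did not guarantee that keys stay in the order you inserted them.

You may want to store data by `date` or `ranking`

## Order in Python dictionaries
- Python version < 3.6: NOT ordered
- Python version 3.6: insertion order kept in CPython as an implementation detail
- Python version >= 3.7: insertion order guaranteed by the language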
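
Since plain dictionaries now keep insertion order, one remaining difference is how equality is compared. A minimal sketch, assuming Python 3.7 or later:

```python
from collections import OrderedDict

# Plain dicts keep insertion order on Python 3.7+
plain = {}
plain["2016-03-09"] = 1
plain["2015-01-02"] = 2
print(list(plain))  # ['2016-03-09', '2015-01-02']

# Equality of plain dicts ignores order
print({"a": 1, "b": 2} == {"b": 2, "a": 1})  # True

# Equality between two OrderedDicts is order-sensitive
first = OrderedDict([("a", 1), ("b", 2)])
second = OrderedDict([("b", 2), ("a", 1)])
print(first == second)  # False
```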


## Getting started with OrderedDict




```python
In [1]: from collections import OrderedDict
In [2]: nyc_eatery_permits = OrderedDict()
In [3]: for eatery in nyc_eateries:
   ...:     nyc_eatery_permits[eatery['end_date']] = eatery
In [4]: print(list(nyc_eatery_permits.items())[:3])
('2029-04-28', {'name': 'Union Square Seasonal Cafe',
'location': 'Union Square Park', 'park_id': 'M089',
'start_date': '2014-04-29', 'end_date': '2029-04-28', 
'description': None, 'permit_number': 'M89-SB-R', 'phone': '212-677-7818', 
'website': 'http://www.thepavilionnyc.com/', 'type_name': 'Restaurant'})
```




## OrderedDict power feature

- `.popitem()` removes and returns the most recently inserted item, so repeated calls walk the dictionary in reverse insertion order

```python
In [1]: print(nyc_eatery_permits.popitem())
('2029-04-28', {'name': 'Union Square Seasonal Cafe',
'location': 'Union Square Park', 'park_id': 'M089',
'start_date': '2014-04-29', 'end_date': '2029-04-28',
'description': None, 'permit_number': 'M89-SB-R', 'phone': '212-677-7818',
'website': 'http://www.thepavilionnyc.com/', 'type_name': 'Restaurant'})
In [2]: print(nyc_eatery_permits.popitem())
('2027-03-31', {'name': 'Dyckman Marina Restaurant',
'location': 'Dyckman Marina Restaurant', 'park_id': 'M028',
'start_date': '2012-04-01', 'end_date': '2027-03-31',
'description': None, 'permit_number': 'M28-R', 'phone': None,
'website': None, 'type_name': 'Restaurant'})
```

## OrderedDict power feature (2)
- You can use the `last=False` keyword argument to remove and return items in insertion order, oldest first
 
```python 
In [3]: print(nyc_eatery_permits.popitem(last=False))
('2012-12-07', {'name': 'Mapes Avenue Ballfields Mobile Food Truck',
'location': 'Prospect Avenue, E. 181st Street', 'park_id': 'X289',
'start_date': '2009-07-01', 'end_date': '2012-12-07', 
'description': None, 'permit_number': 'X289-MT', 'phone': None,
'website': None, 'type_name': 'Mobile Food Truck'})
```
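
Both directions of `.popitem()` can be tried on a small example. The permit values below are shortened placeholders, not the real records.

```python
from collections import OrderedDict

# Hypothetical permits keyed by end date, inserted oldest first
permits = OrderedDict()
permits["2012-12-07"] = "Food Truck"
permits["2027-03-31"] = "Marina Restaurant"
permits["2029-04-28"] = "Seasonal Cafe"

# Default (last=True): remove and return the most recently inserted pair
newest = permits.popitem()
print(newest)  # ('2029-04-28', 'Seasonal Cafe')

# last=False: remove and return the oldest pair
oldest = permits.popitem(last=False)
print(oldest)  # ('2012-12-07', 'Food Truck')

# Both calls removed their item, so one entry is left
print(list(permits.items()))  # [('2027-03-31', 'Marina Restaurant')]
```

Note that `.popitem()` mutates the dictionary: each call shrinks it by one item.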





```python
# Import OrderedDict from collections
from collections import OrderedDict

# Create an OrderedDict called: ridership_date
ridership_date = OrderedDict()

# Iterate over the entries
for date, riders in entries:
    # If a key does not exist in ridership_date, set it to 0
    if date not in ridership_date:
        ridership_date[date] = 0
        
    # Add riders to the date key in ridership_date
    ridership_date[date] += riders
    
# Print the first 31 records
print(list(ridership_date.items())[:31])
```

```python
# Print the first key in ridership_date
print(list(ridership_date.keys())[0])

# Pop the first item from ridership_date and print it
print(ridership_date.popitem(last=False))


# Print the last key in ridership_date
print(list(ridership_date.keys())[-1])

# Pop the last item from ridership_date and print it
print(ridership_date.popitem())


```

# namedtuple


## What is a namedtuple?
- A tuple where each position (column) has a name
- Ensures every record has the same fields
- Alternative to a `pandas` DataFrame row







## Creating a namedtuple
- Pass a name and a list of fields

```python
In [1]: from collections import namedtuple
In [2]: Eatery = namedtuple('Eatery', ['name', 'location', 'park_id',
   ...: 'type_name'])
In [3]: eateries = []
In [4]: for eatery in nyc_eateries:
   ...:     details = Eatery(eatery['name'],
   ...:                      eatery['location'],
   ...:                      eatery['park_id'],
   ...:                      eatery['type_name'])
   ...:     eateries.append(details)

In [5]: print(eateries[0])
Eatery(name='Mapes Avenue Ballfields Mobile Food Truck',
location='Prospect Avenue, E. 181st Street',
park_id='X289', type_name='Mobile Food Truck')
```



## Leveraging namedtuples
- Each field is available as an attribute of the namedtuple

```python
In [1]: for eatery in eateries[:3]:
   ...:     print(eatery.name)
   ...:     print(eatery.park_id)
   ...:     print(eatery.location)

Mapes Avenue Ballfields Mobile Food Truck
X289
Prospect Avenue, E. 181st Street

Claremont Park Mobile Food Truck
X008
East 172 Street between Teller & Morris avenues

Slattery Playground Mobile Food Truck
X085
North corner of Valenti Avenue & East 183 Street
```
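
Beyond attribute access, a namedtuple still behaves like a tuple and cannot be modified in place. The sketch below reuses the values of the first record printed above; the park ID `X000` is made up.

```python
from collections import namedtuple

Eatery = namedtuple("Eatery", ["name", "location", "park_id", "type_name"])

# Values copied from the first record printed above
truck = Eatery(
    name="Mapes Avenue Ballfields Mobile Food Truck",
    location="Prospect Avenue, E. 181st Street",
    park_id="X289",
    type_name="Mobile Food Truck",
)

# Fields are still reachable by position and by unpacking
print(truck[2])  # X289
name, location, park_id, type_name = truck

# namedtuples are immutable: assigning to a field raises AttributeError
try:
    truck.park_id = "X000"
except AttributeError:
    print("cannot modify a namedtuple field")

# _replace returns a new instance; _asdict converts to a dictionary
moved = truck._replace(park_id="X000")
print(moved.park_id, truck.park_id)  # X000 X289
print(truck._asdict()["type_name"])  # Mobile Food Truck
```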






```python
# Import namedtuple from collections
from collections import namedtuple

# Create the namedtuple: DateDetails
DateDetails = namedtuple('DateDetails', ['date', 'stop', 'riders'])

# Create the empty list: labeled_entries
labeled_entries = []

# Iterate over the entries
for date, stop, riders in entries:
    # Append a new DateDetails namedtuple instance for each entry to labeled_entries
    entry = DateDetails(date, stop, riders)
    labeled_entries.append(entry)
    
# Print the first 5 items in labeled_entries
print(labeled_entries[:5])

```

```python
# Iterate over the first twenty items in labeled_entries
for item in labeled_entries[:20]:
    # Print each item's stop
    print(item.stop)

    # Print each item's date
    print(item.date)

    # Print each item's riders
    print(item.riders)
```