<a href="https://colab.research.google.com/github/prof-rossetti/intro-to-python/blob/master/notes/python/datatypes/Intermediate_Datatypes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Instructions

Make a copy of this notebook so you can edit and save your own version of it. Then read through the notebook and take opportunities to explore in the designated "exploration" code cells. 

# Intermediate Datatypes

Here are some datatypes that are essentially containers for other datatypes. We can use them to represent collections of related data:

  + Tuple
  + Set
  + List
  + Dictionary

We'll focus mainly on Lists and Dictionaries. 




# Tuples

Reference: 
  + https://docs.python.org/3.3/library/stdtypes.html?highlight=tuple#tuple
  + https://www.w3schools.com/python/python_tuples.asp

A "tuple" contains an ordered sequence of items, but unlike the "list" datatype, tuples are "immutable", meaning their items can't be added, removed, or re-assigned after the tuple is created. Tuples may be faster to use than lists, but beginners can feel free to use lists instead for most purposes.


```python
coords = (5,4)

print(coords)
print(coords[0]) #> 5
print(coords[1]) #> 4
```
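
To see what "immutable" means in practice, here is a minimal sketch that tries to re-assign a tuple item. The variable names `error_message` and `new_coords` are illustrative only:

```python
coords = (5, 4)

try:
    coords[0] = 10  # tuples don't support item assignment
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# to "change" a tuple, build a new one instead:
new_coords = (10, coords[1])
print(new_coords)  #> (10, 4)
```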

In [None]:
#
# EXPLORATION CELL (TUPLES)
#







# Sets

Reference: 

  + https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset
  + https://www.w3schools.com/python/python_sets.asp


A "set" is like a "list" in that it contains a number of items, but the main difference is that values in a list can be repeated, whereas values in a set must be unique. 

Sets can also be faster than lists in some cases (such as checking whether a value is included), but beginners can feel free to mainly use lists instead. The primary use case for sets will be to remove duplicate values from a list (by converting the list to a set and back again to a list).


```python
letters = {"a", "b", "c", "d", "e"}
print(sorted(letters)) #> ['a', 'b', 'c', 'd', 'e']

letters.add("f")
print(sorted(letters)) #> ['a', 'b', 'c', 'd', 'e', 'f']

letters.add("c") # no effect, because "c" is already in the set
print(sorted(letters)) #> ['a', 'b', 'c', 'd', 'e', 'f']
```

```python
print(letters[0]) 
#> TypeError: 'set' object is not subscriptable
```

```python
# example of removing duplicates from a list:
my_list = [1,1,2,2,3,3,3,3,3,3,3,3]
print(list(set(my_list))) #> [1, 2, 3] (a set is unordered, so the order of the result is not guaranteed)
```
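
Converting through a set does not promise to keep the items in their original order. If the order matters, two possible approaches are sketched below: sorting the result, or using `dict.fromkeys()`, which relies on dictionaries preserving insertion order (Python 3.7+). The example list is made up for illustration:

```python
my_list = ["b", "a", "b", "c", "a"]

# set() removes duplicates, but the resulting order is arbitrary,
# so sort the result if you need predictable output:
unique_sorted = sorted(set(my_list))
print(unique_sorted)  #> ['a', 'b', 'c']

# dict.fromkeys() keeps the first occurrence of each item, in original order:
unique_in_order = list(dict.fromkeys(my_list))
print(unique_in_order)  #> ['b', 'a', 'c']
```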






In [None]:
#
# EXPLORATION CELL (SETS)
#





# Dictionaries



Reference:

  + https://docs.python.org/3/library/stdtypes.html#dict
  + https://docs.python.org/3/library/stdtypes.html#dictionary-view-objects
  + https://docs.python.org/3/tutorial/datastructures.html#dictionaries
  + https://docs.python.org/3/tutorial/datastructures.html#looping-techniques
  + https://www.w3schools.com/python/python_dictionaries.asp

Many programming languages provide an "associative array" datatype which represents an object with named attributes. Associative arrays are said to have "key/value" pairs, where the "key" represents the name of the attribute and the "value" represents the attribute's value.

Python's implementation of the associative array concept is known as a "dictionary". A Python dictionary comprises curly braces (`{}`) containing zero or more key/value pairs, with each key separated from its value by a colon (`:`) and each key/value pair separated by a comma (`,`).

Example dictionaries:

```python
{}

{"a": 1, "b": 2, "c": 3}

{"a": 1, "b": 2, "c": 3, "fruits": ["apple", "banana", "pear"]} # dictionaries can contain lists, or even other nested dictionaries

{"first_name": "Ophelia", "last_name": "Clark", "message": "Hello Again"}
```

A list of dictionaries is conceptually similar to the structure of a CSV file, spreadsheet, or database record, where each dictionary's "keys" represent the column names and its "values" represent the cell values:

city | name | league
--- | --- | ---
New York | Yankees | major
New York | Mets | major
Boston | Red Sox | major
New Haven | Ravens | minor


```python
[
    {"city": "New York", "name": "Yankees", "league":"major"},
    {"city": "New York", "name": "Mets", "league":"major"},
    {"city": "Boston", "name": "Red Sox", "league":"major"},
    {"city": "New Haven", "name": "Ravens", "league":"minor"}
]
```
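
As a rough illustration of that correspondence, the sketch below uses the `csv` module from Python's standard library to turn CSV text into a list of dictionaries. The `csv_text` string is a stand-in for the contents of a real file, and it reuses the data from the table above:

```python
import csv
import io

# stand-in for the contents of a CSV file (same data as the table above)
csv_text = """city,name,league
New York,Yankees,major
New York,Mets,major
Boston,Red Sox,major
New Haven,Ravens,minor
"""

# DictReader uses the header row as the dictionary keys
reader = csv.DictReader(io.StringIO(csv_text))
teams = [dict(row) for row in reader]

print(len(teams))  #> 4
print(teams[0])  #> {'city': 'New York', 'name': 'Yankees', 'league': 'major'}
```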



## Dictionary Operations

Accessing dictionary values by their key, or attribute name:

```python
person = {
    "first_name": "Ophelia",
    "last_name": "Clark",
    "message": "Hi, thanks for the ice cream!",
    "fav_flavors": ["Vanilla Bean", "Mocha", "Strawberry"]
}

person["first_name"] #> "Ophelia"
person["last_name"] #> "Clark"
person["message"] #> "Hi, thanks for the ice cream!"
person["fav_flavors"] #> ["Vanilla Bean", "Mocha", "Strawberry"]
person["fav_flavors"][1] #> "Mocha" (a list is still a list, even if it exists inside a dictionary!)
```
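
Accessing a key that doesn't exist using square brackets raises a `KeyError`. A minimal sketch of two ways to guard against that, using the inclusion operator or the dictionary's `get()` method (the `"middle_name"` key is a made-up example):

```python
person = {"first_name": "Ophelia", "last_name": "Clark"}

# check whether a key exists before accessing it:
print("first_name" in person)  #> True
print("middle_name" in person)  #> False

# person["middle_name"] would raise a KeyError,
# whereas get() returns None (or a default value you provide):
print(person.get("middle_name"))  #> None
print(person.get("middle_name", "N/A"))  #> N/A
```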

Adding, updating, or removing attributes from a dictionary:

```python
person = {
    "first_name": "Ophelia",
    "last_name": "Clark",
    "message": "Hi, thanks for the ice cream!",
    "fav_flavors": ["Vanilla Bean", "Mocha", "Strawberry"]
}

# updating an existing attribute:
person["message"] = "New Message" # this is mutating

# adding a new attribute:
person["fav_color"] = "blue" # this is mutating

# removing an attribute:
del person["fav_flavors"] # this is mutating

person #> {'first_name': 'Ophelia', 'last_name': 'Clark', 'message': 'New Message', 'fav_color': 'blue'}
```

Accessing a dictionary's keys and/or values separately, and iterating through them:

```python
person = {
    "first_name": "Ophelia",
    "last_name": "Clark",
    "message": "Hi, thanks for the ice cream!",
    "fav_flavors": ["Vanilla Bean", "Mocha", "Strawberry"]
}

# KEYS:
person.keys()
#> dict_keys(['first_name', 'last_name', 'message', 'fav_flavors'])
list(person.keys())
#> ['first_name', 'last_name', 'message', 'fav_flavors']

# VALUES:
person.values()
#> dict_values(['Ophelia', 'Clark', 'Hi, thanks for the ice cream!', ['Vanilla Bean', 'Mocha', 'Strawberry']])
list(person.values())
#> ['Ophelia', 'Clark', 'Hi, thanks for the ice cream!', ['Vanilla Bean', 'Mocha', 'Strawberry']]

# PAIRS:
person.items()
#> dict_items([('first_name', 'Ophelia'), ('last_name', 'Clark'), ('message', 'Hi, thanks for the ice cream!'), ('fav_flavors', ['Vanilla Bean', 'Mocha', 'Strawberry'])])

for k, v in person.items():
    print("KEY:", k, "... VALUE:", v)

#> KEY: first_name ... VALUE: Ophelia
#> KEY: last_name ... VALUE: Clark
#> KEY: message ... VALUE: Hi, thanks for the ice cream!
#> KEY: fav_flavors ... VALUE: ['Vanilla Bean', 'Mocha', 'Strawberry']
```

In [None]:
#
# EXPLORATION CELL (DICTIONARIES)
#





# Lists



Reference:
  + https://docs.python.org/3/library/stdtypes.html#lists
  + https://docs.python.org/3/tutorial/datastructures.html?highlight=lists#more-on-lists
  + https://www.w3schools.com/python/python_lists.asp

A "list" represents a numbered, ordered collection of items. A list may contain zero or more items. A list can contain items of any datatype, but as a best practice, all items in a list should share a datatype and structure:

```python
# DO:

[]

[1, 2, 3, 4]

[100, 75, 33]

["fun", "times", "right?"]

[{"a": 1, "b": 2}, {"a": 5, "b": 6}] # lists can contain dictionaries

[[1, 2, 3], [4, 5, 6], [7, 8, 9]] # lists can be "nested" inside other lists

# DON'T:

[100, "fun"] # mixed datatypes

[{"a": 1, "b": 2}, {"x": 5, "z": 6}] # non-standard dictionary keys

```



## List Operations

Accessing a list item by specifying its numeric index:

```python
arr = ["a", "b", "c", "d"]
arr[0] #> "a"
arr[1] #> "b"
arr[2] #> "c"
arr[3] #> "d"
arr[4] #> IndexError: list index out of range
```

> NOTE: List item indices are zero-based, meaning the index of the first list item is 0.

Getting the index for a given item (be careful if your list has duplicates though):

```python
arr.index("a") #> 0
arr.index("b") #> 1
arr.index("c") #> 2
arr.index("z") #> ValueError: 'z' is not in list (applies to any item not found in the list)
```
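
The `index()` method reports only the position of the first match, and it raises a `ValueError` when the item isn't in the list. A minimal sketch of both behaviors, where the `target` and `position` names are illustrative only:

```python
arr = ["a", "b", "c", "b"]

print(arr.index("b"))  #> 1 (only the first match is reported)

# checking for inclusion first avoids the ValueError:
target = "z"
if target in arr:
    position = arr.index(target)
else:
    position = None

print(position)  #> None
```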

Adding an element to the end of a list:

```python
arr = ["a", "b", "c", "d"]
arr.append("e") # this is a mutating operation
arr #> ["a", "b", "c", "d", "e"]
```

Removing an element from a list, by specifying the index of the item you would like to remove:

```python
arr = ["a", "b", "c", "d"]
del arr[1] # this is a mutating operation
arr #> ['a', 'c', 'd']
```

Concatenating two lists:

```python
arr = ["a", "b", "c", "d"]
arr2 = ["x", "y", "z"]
arr3 = arr + arr2
arr3 #> ["a", "b", "c", "d", "x", "y", "z"]
```

Performing aggregations, like counting the number of items in a list:

```python
arr = [6,3,9,7]

len(arr) #> 4 (COUNT)
min(arr) #> 3 (MINIMUM)
max(arr) #> 9 (MAXIMUM)
```
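
Other aggregations follow the same pattern. For example, a sketch of computing a total with the built-in `sum()` function, and deriving an average from it (the `total` and `average` names are illustrative):

```python
arr = [6, 3, 9, 7]

total = sum(arr)
average = total / len(arr)

print(total)  #> 25 (SUM)
print(average)  #> 6.25 (AVERAGE)
```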

Equality operators apply:

```python
[1,2,3] == [1,2,3] #> True
[1,2,3] == [3,2,1] #> False
```

Inclusion operators apply:

```python
arr = [1,2,3,4,5]

3 in arr #> True
3 not in arr #> False
```

In [None]:
#
# EXPLORATION CELL (LISTS)
#




### Sorting Lists




Sort a list:

```python
arr = [6,3,8]
arr.sort() # this is mutating
arr #> [3, 6, 8]

arr.reverse() # this is mutating
arr #> [8, 6, 3]
```
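
If you would rather not mutate the original list, the built-in `sorted()` function returns a new sorted list instead. A minimal sketch, with illustrative variable names:

```python
arr = [6, 3, 8]

ascending = sorted(arr)  # returns a new list
descending = sorted(arr, reverse=True)

print(ascending)  #> [3, 6, 8]
print(descending)  #> [8, 6, 3]
print(arr)  #> [6, 3, 8] (the original list is unchanged)
```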

If you have a list of dictionaries, you should be able to sort it based on dictionary values:

```python
teams = [
    {"city": "New York", "name": "Yankees"},
    {"city": "New York", "name": "Mets"},
    {"city": "Boston", "name": "Red Sox"},
    {"city": "New Haven", "name": "Ravens"}
]

def team_name(team):
    return team["name"]

def sort_by_hometown(team):
    return team["city"]

def sort_special(team):
    return team["city"] + "-" + team["name"]

teams2 = sorted(teams, key=team_name)
teams2 #> [{'city': 'New York', 'name': 'Mets'}, {'city': 'New Haven', 'name': 'Ravens'}, {'city': 'Boston', 'name': 'Red Sox'}, {'city': 'New York', 'name': 'Yankees'}]

teams3 = sorted(teams, key=sort_by_hometown)
teams3 #> [{'city': 'Boston', 'name': 'Red Sox'}, {'city': 'New Haven', 'name': 'Ravens'}, {'city': 'New York', 'name': 'Yankees'}, {'city': 'New York', 'name': 'Mets'}]

teams4 = sorted(teams, key=sort_special)
teams4 #> [{'city': 'Boston', 'name': 'Red Sox'}, {'city': 'New Haven', 'name': 'Ravens'}, {'city': 'New York', 'name': 'Mets'}, {'city': 'New York', 'name': 'Yankees'}]
```

Alternatively for simple attribute-based sorting, you could use the [`operator` module's](https://docs.python.org/3/library/operator.html) `itemgetter()` function, for example:

```python
import operator

teams = [
    {"city": "New York", "name": "Yankees"},
    {"city": "New York", "name": "Mets"},
    {"city": "Boston", "name": "Red Sox"},
    {"city": "New Haven", "name": "Ravens"}
]

teams = sorted(teams, key=operator.itemgetter('city'))
teams #> [{'city': 'Boston', 'name': 'Red Sox'}, {'city': 'New Haven', 'name': 'Ravens'}, {'city': 'New York', 'name': 'Yankees'}, {'city': 'New York', 'name': 'Mets'}]
```


In [None]:
#
# EXPLORATION CELL (SORTING LISTS)
#




### Iterating Lists 




Reference:

  + https://docs.python.org/3/tutorial/datastructures.html?#list-comprehensions
  + https://docs.python.org/3/tutorial/controlflow.html#for-statements
  + https://www.w3schools.com/python/python_for_loops.asp


A list can be iterated, or "looped" using a `for ... in ...` statement:

```python
for letter in ["a", "b", "c", "d"]:
    print(letter)

#> a
#> b
#> c
#> d
```

> TIP: If it helps, you can vocalize this like "for each item in the list of items, do something with that item"
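
If you also need each item's position while looping, one option is the built-in `enumerate()` function, which provides the index alongside the item. A minimal sketch (the `labels` list is only there to collect the results):

```python
letters = ["a", "b", "c", "d"]

labels = []
for index, letter in enumerate(letters):
    label = f"{index}: {letter}"
    labels.append(label)
    print(label)

#> 0: a
#> 1: b
#> 2: c
#> 3: d
```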




In [None]:
#
# EXPLORATION CELL (ITERATING LISTS)
#




### Mapping Lists



The simplest way to map, or transform, each value in a list is to initialize an empty list, then loop through the original list and append a transformed version of each item into the new list:

```python
numbers = [1, 2, 3, 4]

bigger_numbers = []
for n in numbers:
    bigger_numbers.append(n * 100)
 
print(bigger_numbers) #> [100, 200, 300, 400]
```


Lists can also be transformed using Python's built-in `map()` function, which leaves the original list unchanged and returns a new "map object" that can be converted to a list. The `map()` function takes two parameters. The first parameter is the name of a pre-defined function to perform on each item in the list. The function should accept a single parameter representing a single list item. The second parameter is the actual list to be operated on:

```python
arr = [1, 2, 3, 4]

def enlarge(num):
    return num * 100

arr2 = map(enlarge, arr)
arr2 #> <map object at 0x10c62e710>
list(arr2) #> [100, 200, 300, 400]
```
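
The object returned by `map()` is an iterator that can only be consumed once, so it is usually worth converting it to a list right away. A minimal sketch of that behavior, with illustrative variable names:

```python
arr = [1, 2, 3, 4]

def enlarge(num):
    return num * 100

mapped = map(enlarge, arr)

first_pass = list(mapped)
second_pass = list(mapped)  # the iterator is already exhausted

print(first_pass)  #> [100, 200, 300, 400]
print(second_pass)  #> []
print(arr)  #> [1, 2, 3, 4] (the original list is unchanged)
```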


#### List Comprehensions

Another way of mapping is to use a List Comprehension:

```python
arr = [1, 2, 3, 4]

[i * 100 for i in arr] #> [100, 200, 300, 400]
```

```python
teams = [
    {"city": "New York", "name": "Yankees"},
    {"city": "New York", "name": "Mets"},
    {"city": "Boston", "name": "Red Sox"},
    {"city": "New Haven", "name": "Ravens"}
]

[team["name"] for team in teams] #> ['Yankees', 'Mets', 'Red Sox', 'Ravens']
```












In [None]:
#
# EXPLORATION CELL (MAPPING LISTS)
#




### Filtering Lists
















Reference: https://docs.python.org/3/library/functions.html#filter

Use the `filter()` function to select a subset of items from a list - only those items matching a given condition. The filter function accepts the same parameters as the `map()` function:

```python
arr = [1,2,4,8,16]

def all_of_them(i):
    return True # same as ... return i == i

def equals_two(i):
    return i == 2

def greater_than_two(i):
    return i > 2

def really_big(i):
    return i > 102

filter(all_of_them, arr) #> <filter at 0x103fa71d0>
list(filter(all_of_them, arr)) #> [1, 2, 4, 8, 16]
list(filter(equals_two, arr)) #> [2]
list(filter(greater_than_two, arr)) #> [4, 8, 16]
list(filter(really_big, arr)) #> []
```

> Note: depending on how many items matched the filter condition, the resulting filtered list may be empty, or it may contain one item, or it may contain multiple items

When using the filter function, observe this alternative filtering syntax involving the keyword `lambda`:

```python
arr = [1,2,4,8,16]
list(filter(lambda i: i > 2, arr)) #> [4, 8, 16]
```
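
The same `lambda` syntax works anywhere a small one-off function is expected, not only with `filter()`. A sketch using it with `map()` and with the `key` parameter of `sorted()`, on a shortened, made-up version of the teams list:

```python
numbers = [1, 2, 3, 4]
teams = [
    {"city": "New York", "name": "Yankees"},
    {"city": "Boston", "name": "Red Sox"},
]

# lambda as the mapping function:
bigger_numbers = list(map(lambda n: n * 100, numbers))
print(bigger_numbers)  #> [100, 200, 300, 400]

# lambda as the sorting key:
teams_by_name = sorted(teams, key=lambda team: team["name"])
print([team["name"] for team in teams_by_name])  #> ['Red Sox', 'Yankees']
```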

If your list is full of dictionaries, you can `filter()` based on their attribute values:

```python
teams = [
    {"city": "New York", "name": "Yankees"},
    {"city": "New York", "name": "Mets"},
    {"city": "Boston", "name": "Red Sox"},
    {"city": "New Haven", "name": "Ravens"}
]

def yanks(obj):
    return obj["name"] == "Yankees"

def from_new_york(obj):
    return obj["city"] == "New York"

def from_new_haven(obj):
    return obj["city"] == "New Haven"

def from_new_something(obj):
    return "New" in obj["city"]

list(filter(yanks, teams)) #> [{...}]
list(filter(from_new_york, teams)) #> [{...}, {...}]
list(filter(from_new_haven, teams)) #> [{...}]
list(filter(from_new_something, teams)) #> [{...}, {...}, {...}]
```

If you need to implement complex filtering conditions, consider using a list comprehension, or "lambda" syntax, or consider writing out your function the long way:

```python
teams = [
    {"city": "New York", "name": "Yankees"},
    {"city": "New York", "name": "Mets"},
    {"city": "Boston", "name": "Red Sox"},
    {"city": "New Haven", "name": "Ravens"}
]

# using a list comprehension
def teams_from(city):
    return [team for team in teams if team["city"] == city]

# using "lambda" syntax
def teams_from2(city):
    return list(filter(lambda team: team["city"] == city, teams))

# the long way (this version also ignores upper/lower case differences)
def teams_from3(city):
    matches = []
    for team in teams:
        if team["city"].upper() == city.upper():
            matches.append(team)
    return matches

print(teams_from("New York")) #> [{'city': 'New York', 'name': 'Yankees'}, {'city': 'New York', 'name': 'Mets'}]
print(teams_from2("New York")) #> [{'city': 'New York', 'name': 'Yankees'}, {'city': 'New York', 'name': 'Mets'}]
print(teams_from3("New York")) #> [{'city': 'New York', 'name': 'Yankees'}, {'city': 'New York', 'name': 'Mets'}]
```


In [None]:
#
# EXPLORATION CELL (FILTERING LISTS)
#






### Grouping Lists

Reference the [`itertools` module](https://github.com/prof-rossetti/intro-to-python/blob/master/notes/python/modules/itertools.md) for grouping operations. The `groupby()` function only groups consecutive items that share a key, so the list generally needs to be sorted by that same key before grouping.

```python
import itertools
from operator import itemgetter

teams = [
    {"city": "New York", "name": "Yankees"},
    {"city": "New York", "name": "Mets"},
    {"city": "Boston", "name": "Red Sox"},
    {"city": "New Haven", "name": "Ravens"}
]

sorted_teams = sorted(teams, key=itemgetter("city")) # sort by some attribute

teams_by_city = itertools.groupby(sorted_teams, key=itemgetter("city")) # group by the sorted attribute
#> <itertools.groupby object at 0x10339dc50>

for city, city_teams in teams_by_city: # a separate name avoids overwriting the original teams list
    print("----------------------------")
    print(city.upper() + ":")
    for team in city_teams:
        print("  + " + team["name"])

#> ----------------------------
#> BOSTON:
#>   + Red Sox
#> ----------------------------
#> NEW HAVEN:
#>   + Ravens
#> ----------------------------
#> NEW YORK:
#>   + Yankees
#>   + Mets
```
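
To see why the sorting step matters, the sketch below runs `groupby()` on an unsorted list, then shows one possible alternative that groups with a plain dictionary and doesn't depend on the order. The shortened teams list and the `names_by_city` name are made up for illustration:

```python
import itertools
from operator import itemgetter

teams = [
    {"city": "New York", "name": "Yankees"},
    {"city": "Boston", "name": "Red Sox"},
    {"city": "New York", "name": "Mets"},
]

# without sorting, "New York" shows up as two separate groups:
unsorted_cities = [city for city, group in itertools.groupby(teams, key=itemgetter("city"))]
print(unsorted_cities)  #> ['New York', 'Boston', 'New York']

# a plain dictionary groups the teams regardless of their order:
names_by_city = {}
for team in teams:
    names_by_city.setdefault(team["city"], []).append(team["name"])

print(names_by_city)  #> {'New York': ['Yankees', 'Mets'], 'Boston': ['Red Sox']}
```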

In [None]:
#
# EXPLORATION CELL (GROUPING LISTS)
#



