# [CptS 215 Data Analytics Systems and Algorithms](https://github.com/gsprint23/cpts215)
[Washington State University](https://wsu.edu)

[Gina Sprint](http://eecs.wsu.edu/~gsprint/)
# Built-in Data Structures

Learner objectives for this lesson:
* Work with commonly used built-in Python data structures
    * Strings
    * Lists
    * Tuples
    * Dictionaries
* Learn about object aliasing
* Pass arguments into programs via command line arguments

Content used in this lesson is based upon information in the following sources:
* None to report

## Review of Strings
Recall a string is a *sequence of characters*. In Python, we can have sequences of items other than characters. For example, we can have sequences of:
* Numbers
    * Integers
    * Floats
* Objects
    * Strings
    * Files
    * Turtles
    * Our own objects we define ourselves (to be learned later, stay tuned!)

## Lists
A list is a *sequence of items*. In a string, the items are characters. In a list, they can be any type. Items in a list are also called *elements*.

We declare a sequence of items as a list with hard brackets: `[<comma separated list items>]`

In [1]:
list_ints = [0, 1, 10, 20]
print(list_ints)

list_floats = [0.2, 0.4, 0.6, 1.0]
print(list_floats)

# types can be mixed in a list
list_numbers = [0, 0.0, 1, 1.0, -2]
print(list_numbers)

list_strings = ["cat", "dog", "bird"]
print(list_strings)

[0, 1, 10, 20]
[0.2, 0.4, 0.6, 1.0]
[0, 0.0, 1, 1.0, -2]
['cat', 'dog', 'bird']


Note: the data types in a list need not all be the same.

### List Indexing
Just like with strings, list indices are 0-based. We can index into a list to access a list item just like how we indexed into a string to get an individual character:

In [4]:
print(list_ints[0])

0


### List Length
We can also use the `len()` function to determine the number of items in a list:

In [5]:
print(len(list_strings))
print(list_strings[len(list_strings) - 1])

3
bird
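
As a shorthand for `len(list_strings) - 1`, Python also accepts negative indices, which count back from the end of a list. A minimal sketch, reusing the `list_strings` example from above (the variable `last_item` is introduced here for illustration):

```python
list_strings = ["cat", "dog", "bird"]

# index -1 refers to the last item, -2 to the second-to-last, and so on
last_item = list_strings[-1]
print(last_item)

# equivalent to the len()-based expression used above
print(last_item == list_strings[len(list_strings) - 1])
```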


### The Empty List
Just like how we can have an empty string (`""`), a string with no characters, we can have an empty list (`[]`). An empty list has no items.

In [10]:
empty_list = []
print(len(empty_list))

0


### Nested Lists
We can have lists of lists!

In [4]:
nested_list = [[0, 1], [2], [3], [4, 5], []]
print(nested_list)

[[0, 1], [2], [3], [4, 5], []]


Note: the sub-lists can be of unequal lengths.

Now, consider the following nested list:

`matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]`

Logically, `matrix` looks like the following:

|Column Index:||||
|-|-|-|-|
||**0**|**1**|**2**|
|**Row Index**||||
|**0:**|1|2|3|
|**1:**|4|5|6|
|**2:**|7|8|9|

To access an item in a 2-dimensional nested list, we index into the `<nested list variable>` twice, by row first and then by column: `<nested list variable>[row_index][column_index]`. For example:

In [6]:
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(matrix)
# the first element in the first list
print(matrix[0][0])
# the last element in the last list
print(matrix[2][2])
# the middle element in the last list (8)
print(matrix[2][1])

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
1
9
8
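
Row-then-column indexing combines naturally with nested loops. The sketch below is one possible way to visit every item in `matrix`; the variable names `row_index`, `column_index`, and `total` are chosen here for illustration:

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# outer loop walks the rows, inner loop walks the columns of each row
total = 0
for row_index in range(len(matrix)):
    for column_index in range(len(matrix[row_index])):
        value = matrix[row_index][column_index]
        total += value
        print(value, end=" ")
    print()  # start a new line after each row

print(total)
```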


### Lists are Mutable
Unlike strings, we can change the items in a list:

In [8]:
buildings = ["Sloan", "EME", "Dana", "ETRL"]
print(buildings)

# modify the list
buildings[2] = "Carpenter"
print(buildings)

['Sloan', 'EME', 'Dana', 'ETRL']
['Sloan', 'EME', 'Carpenter', 'ETRL']


Note: We still cannot change a string. Strings are immutable!
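
To see the difference, the sketch below tries the same kind of item assignment on a string. The variable names are hypothetical; the `try`/`except` is only there so the example keeps running after the error:

```python
course = "cpts215"

try:
    # strings do not support item assignment
    course[0] = "C"
except TypeError as error:
    print("TypeError:", error)

# build a new string instead of changing the old one
capitalized = "C" + course[1:]
print(capitalized)
```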

## Looping Through List Items
Just like with strings, we can use the `in` operator or indices to iterate through items in a list:

In [6]:
candies = ["twix", "reeses", "oreos", "snickers"]

for candy in candies:
    print(candy)
    
i = 0
while i < len(candies):
    print(candies[i])
    i += 1
    
# range() supplies each index, so no manual counter is needed
for i in range(len(candies)):
    print(candies[i])

twix
reeses
oreos
snickers
twix
reeses
oreos
snickers
twix
reeses
oreos
snickers


## List Operators
### List Concatenation
Just like with strings, we can use the concatenation `+` operator to add lists together:

In [23]:
candies = ["twix", "reeses", "oreos", "peach rings"]

print(candies)
candies += ["m&ms", "starburst"]
print(candies)

['twix', 'reeses', 'oreos', 'peach rings']
['twix', 'reeses', 'oreos', 'peach rings', 'm&ms', 'starburst']


### List Repetition
Just like with strings, we can repeat items in a list with the repetition `*` operator:

In [24]:
bag_o_candies = 5 * ["twix"]
print(bag_o_candies)

bag_o_candies += 3 * ["peach rings"]
print(bag_o_candies)

['twix', 'twix', 'twix', 'twix', 'twix']
['twix', 'twix', 'twix', 'twix', 'twix', 'peach rings', 'peach rings', 'peach rings']


### List Slicing
Just like with strings, we can use the slice operator `:` with lists:

In [8]:
candies = ["twix", "reeses", "oreos", "snickers"]
print(candies[1:3])
# a full slice returns a copy of the whole list
print(candies[:])

['reeses', 'oreos']
['twix', 'reeses', 'oreos', 'snickers']


However, since lists are mutable, we can now change multiple items in a list at a time using slices:

In [22]:
candies = ["twix", "reeses", "oreos", "peach rings"]
print(candies)
candies[3:] = ["butterfinger", "heath", "swedish fish"]
print(candies)
candies[0:2] = ["carmello", "airheads"]
print(candies)

['twix', 'reeses', 'oreos', 'peach rings']
['twix', 'reeses', 'oreos', 'butterfinger', 'heath', 'swedish fish']
['carmello', 'airheads', 'oreos', 'butterfinger', 'heath', 'swedish fish']


## List Methods
Just like with strings, lists are objects that have methods we can utilize. 

### `append()`
For example, since lists are mutable, there is an `append(<new item>)` method to add an item to the end of a list:

In [3]:
cities = ["Pullman", "Spokane"]
print(cities)

# adds the string as an item
cities.append("Seattle")
print(cities)

# adds another string as an item
cities.append("Moscow")
print(cities)

['Pullman', 'Spokane']
['Pullman', 'Spokane', 'Seattle']
['Pullman', 'Spokane', 'Seattle', 'Moscow']


As review, how could we achieve the same functionality as `append()` without using `append()`?

In [None]:
cities = ["Pullman", "Spokane"]
print(cities)

# concatenates a one-item list onto the end
cities += ["Seattle"]
print(cities)

### `extend()`
`extend()` is similar to `append()`; however, `extend()` takes a list as an argument and adds each of its items to the end of the list:

In [4]:
cities = ["Pullman", "Spokane"]
print(cities)

# adds each string in the list as an item
cities.extend(["Seattle", "Couer d'Alene"])
print(cities)

['Pullman', 'Spokane']
['Pullman', 'Spokane', 'Seattle', "Couer d'Alene"]


What would happen if we used `append()` instead of `extend()` in the above code?

In [5]:
cities = ["Pullman", "Spokane"]
print(cities)
cities.append(["Seattle", "Couer d'Alene"])
print(cities)

['Pullman', 'Spokane']
['Pullman', 'Spokane', ['Seattle', "Couer d'Alene"]]


`cities` becomes a nested list!

### `sort()`
Many applications require lists of items to be sorted. In CptS121, you will learn how to write your own sorting algorithms. For now, we will use the `sort()` list method:

In [6]:
cities = ["Pullman", "Spokane", "Seattle", "Couer d'Alene"]
print(cities)

# ascending order
cities.sort()
print(cities)

['Pullman', 'Spokane', 'Seattle', "Couer d'Alene"]
["Couer d'Alene", 'Pullman', 'Seattle', 'Spokane']


How would you sort a list in descending order? Try using `help(cities.sort)` to find out:

In [7]:
help(cities.sort)

Help on built-in function sort:

sort(...) method of builtins.list instance
    L.sort(key=None, reverse=False) -> None -- stable sort *IN PLACE*



In [8]:
print(cities)
cities.sort(reverse=True)
print(cities)

["Couer d'Alene", 'Pullman', 'Seattle', 'Spokane']
['Spokane', 'Seattle', 'Pullman', "Couer d'Alene"]
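
The `help()` output also lists a `key` parameter. As a rough sketch of how it can be used, the example below sorts a hypothetical `words` list by string length by passing the built-in `len()` function as the key:

```python
words = ["kiwi", "fig", "banana"]

# key=len compares the items by their length instead of alphabetically
words.sort(key=len)
print(words)

# key and reverse can be combined
words.sort(key=len, reverse=True)
print(words)
```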


## Deleting Items in a List
Since lists are mutable, we can delete items in a list. 

### Single Item Deletes
We have two list methods to delete a *single* item in a list:
1. When you know the *index* of the item to delete
    * `pop(<index>)`
1. When you know the *value* of the item to delete
    * `remove(<item>)`

In [9]:
cities = ["Pullman", "Spokane", "Seattle", "Couer d'Alene"]

# pop returns the item removed
city = cities.pop(2)
print(city)
print(cities)

# remove does not return the item removed
cities.remove("Spokane")
print(cities)

Seattle
['Pullman', 'Spokane', "Couer d'Alene"]
['Pullman', "Couer d'Alene"]
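
One detail worth knowing: `remove()` raises a `ValueError` when the value is not in the list, and it deletes only the first match. The sketch below guards the call with the `in` operator; `city_to_remove` and `numbers` are names made up for this example:

```python
cities = ["Pullman", "Spokane", "Seattle"]
city_to_remove = "Moscow"

# check membership first to avoid a ValueError
if city_to_remove in cities:
    cities.remove(city_to_remove)
else:
    print("%s is not in the list" % (city_to_remove))

# remove() deletes only the first matching item
numbers = [1, 2, 1]
numbers.remove(1)
print(numbers)
```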


### `del` Keyword and Multiple Item Deletes
Alternatively, we can delete an item using the `del` reserved keyword:

In [10]:
cities = ["Pullman", "Spokane", "Seattle", "Couer d'Alene"]
print(cities)

# del is not a function
del cities[1]
print(cities)

['Pullman', 'Spokane', 'Seattle', "Couer d'Alene"]
['Pullman', 'Seattle', "Couer d'Alene"]


We may want to delete multiple items at a time. We can do this with a slice and `del`:

In [11]:
cities = ["Pullman", "Spokane", "Seattle", "Couer d'Alene"]
print(cities)

del cities[0:3]
print(cities)

['Pullman', 'Spokane', 'Seattle', "Couer d'Alene"]
["Couer d'Alene"]


### Relationship Between Strings and Lists
A list of single character strings is not a string:

In [12]:
my_list = ["c", "p", "t", "s", "1", "1", "1"]
print("%s" %(my_list))

['c', 'p', 't', 's', '1', '1', '1']


### `join()` (string method)
However, we can turn a list of strings into a string with the `join()` string method. We need to specify a "delimiter" string to use to concatenate the individual strings in a list into a single string:

In [1]:
my_list = ["c", "p", "t", "s", "2", "1", "5"]
delimiter = '' # empty string
my_string = delimiter.join(my_list)
print("%s" %(my_string))

delimiter = ':)'
my_string = delimiter.join(my_list)
print("%s" %(my_string))

cpts215
c:)p:)t:)s:)2:)1:)5


### `list()` (function)
To convert the string back into a list, we can type cast the string into a list with `list()`:

In [14]:
my_string = "cpts215"
my_list = list(my_string)
print(my_list)

['c', 'p', 't', 's', '2', '1', '5']


### `split()` (string method)
`split(<string delimiter>)` breaks a string into pieces at each `<string delimiter>`. The pieces are returned as a list: 

In [15]:
sentence = "hello how are you"
pieces = sentence.split(" ")
print(pieces)

['hello', 'how', 'are', 'you']
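
`split()` and `join()` are roughly inverses of each other. A small sketch of a round trip, using the same `sentence` as above (the names `hyphenated` and `restored` are illustrative):

```python
sentence = "hello how are you"

# split the sentence into words, then join them with a new delimiter
pieces = sentence.split(" ")
hyphenated = "-".join(pieces)
print(hyphenated)

# joining with the original delimiter restores the original string
restored = " ".join(pieces)
print(restored == sentence)
```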


## Aliasing
When we declare a list variable, as in `list1 = [0, 1, 2, 3]`, a list *object* is created. We say the variable `list1` is a *reference* to the list object `[0, 1, 2, 3]`. In memory, this looks like the following:
![](https://raw.githubusercontent.com/gsprint23/cpts215/master/lessons/figures/reference_example.png)

If we declare another list variable, `list2 = [0, 1, 2, 3]`, `list2` refers to a *different* list object, even though both objects that `list1` and `list2` refer to contain the same items:
![](https://raw.githubusercontent.com/gsprint23/cpts215/master/lessons/figures/references_multiple_example.png)

We can test if `list1` and `list2` refer to lists that contain the same elements:

In [2]:
list1 = [0, 1, 2, 3]
list2 = [0, 1, 2, 3]
print(list1 == list2)

True


To test if `list1` and `list2` *refer* to the same list object, we can use the Python reserved keyword, `is`. `is` tests whether two variables refer to the same object: 

In [3]:
list1 = [0, 1, 2, 3]
list2 = [0, 1, 2, 3]
print(list1 is list2)

False


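Note: Python may reuse objects for immutable values. Since strings are immutable, the standard Python interpreter (CPython) typically creates only one object for the two identical literals in the following code. This is an optimization rather than a guarantee, so use `==` (not `is`) to compare strings: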

In [4]:
string1 = "hello"
string2 = "hello"
print(string1 == string2)
print(string1 is string2)

True
True


In the above code, both `string1` and `string2` refer to the same string object. This phenomenon is called *aliasing*. 

Let's return to our list example and see aliasing at work. 

If, instead of creating a new list object for `list2`, we assign `list1` to `list2` with `list2 = list1`, then `list2` refers to the same object as `list1`.
![](https://raw.githubusercontent.com/gsprint23/cpts215/master/lessons/figures/alias_example.png)

We now say the object is *aliased*, because it has more than one reference, or alias.

If the aliased object is mutable, either reference can modify the object:

In [5]:
# same object aliased by list1 and list2
list1 = [0, 1, 2, 3]
list2 = list1
print(list1)
print(list2)
list2[2] = 100
print(list1)
print(list2)
print("\n")

# compared to creating two separate objects list1 and list2
list1 = [0, 1, 2, 3]
list2 = [0, 1, 2, 3]
print(list1)
print(list2)
list2[2] = 100
print(list1)
print(list2)

[0, 1, 2, 3]
[0, 1, 2, 3]
[0, 1, 100, 3]
[0, 1, 100, 3]


[0, 1, 2, 3]
[0, 1, 2, 3]
[0, 1, 2, 3]
[0, 1, 100, 3]


Aliasing is important to keep in mind, especially when passing lists as arguments.
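
When an alias is not what we want, one option is to copy the list with a full slice. The sketch below assumes a flat list of numbers; for nested lists, a slice copies only the outer list, so the sub-lists would still be shared:

```python
list1 = [0, 1, 2, 3]

# a full slice creates a new list object with the same items
list2 = list1[:]
print(list1 == list2)
print(list1 is list2)

# modifying the copy leaves the original unchanged
list2[2] = 100
print(list1)
print(list2)
```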

## List Arguments
We can pass lists into functions as arguments:

In [None]:
def pretty_print_list(list_to_print):
    '''
    Prints each item in list_to_print on one line, separated by spaces.
    '''
    for value in list_to_print:
        print(value, end=" ")

numbers = [0.0, 0.2, 0.4]
pretty_print_list(numbers)

When a list is passed as an argument to a function, the function parameter variable is a *reference* to the list, making the list *aliased*. This means that if we modify a list in our function, the change to the object persists and the calling code will see the change.

In the example above, `numbers` and `list_to_print` are aliases to the list object `[0.0, 0.2, 0.4]`. This means `pretty_print_list()` could use `list_to_print` to modify the object.

Let's write a new function, `add_one()`, that adds one to each value in a list:

In [None]:
def add_one(list_arg):
    '''
    Adds one to each item of list_arg in place; the caller sees the change.
    '''
    for i in range(len(list_arg)):
        list_arg[i] += 1

numbers = [0.0, 0.2, 0.4]
print(numbers)
add_one(numbers)
print(numbers)
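
If the caller's list should stay untouched, the function can build and return a new list instead. `add_one_copy()` below is a hypothetical variant written for this comparison, not part of the lesson's code:

```python
def add_one_copy(list_arg):
    '''
    Hypothetical variant of add_one(): returns a new list and
    leaves list_arg unchanged.
    '''
    result = []
    for value in list_arg:
        result.append(value + 1)
    return result

numbers = [0.0, 0.2, 0.4]
incremented = add_one_copy(numbers)
# the original list is unchanged
print(numbers)
print(incremented)
```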

## Returning Lists
We can write functions that return lists. Consider a function that returns a list of numbers from `start_index` up to, but not including, `end_index`:

In [None]:
def create_sequence(start_index, end_index):
    '''
    Returns a list of the integers from start_index up to, but not
    including, end_index.
    '''
    sequence = []
    
    for i in range(start_index, end_index):
        sequence.append(i)
    return sequence

first_ten_nums = create_sequence(0, 10)
print(first_ten_nums)

## Command Line Arguments
We can pass arguments into our Python programs. The arguments will be stored in a list, referenced by `sys.argv`. Note: we will have to `import sys` to get access to `sys.argv`.

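The first argument is always the name of the script, and is counted in the total number of command line arguments. The output below comes from running inside a Jupyter notebook, so the "script" is the notebook kernel launcher: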

In [1]:
import sys

print(sys.argv)
print(len(sys.argv))

['C:\\Anaconda3\\lib\\site-packages\\ipykernel\\__main__.py', '-f', 'C:\\Users\\gsprint\\AppData\\Roaming\\jupyter\\runtime\\kernel-1e668b0e-6bd8-4985-b65d-41430143d283.json']
3
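
For a standalone script, the remaining items of `sys.argv` are the arguments typed after the script name, and they always arrive as strings. The sketch below uses a hypothetical helper, `sum_arguments()`, and a simulated argument list (the script name `add.py` is made up) so that it also runs inside a notebook:

```python
import sys

def sum_arguments(argv):
    '''
    Hypothetical helper: adds up the numeric arguments that follow
    the script name in an argv-style list.
    '''
    total = 0.0
    # argv[0] is the script name, so start at index 1
    for argument in argv[1:]:
        # command line arguments are strings, so convert before adding
        total += float(argument)
    return total

# simulated argument list, as if we had run: python add.py 1.5 2 3
print(sum_arguments(["add.py", "1.5", "2", "3"]))

# in a real script we would pass sys.argv instead:
# print(sum_arguments(sys.argv))
```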


## Tuples
Tuples are immutable sequences; they behave like lists that cannot be changed. They are declared as comma-separated items, with or without parentheses:

In [6]:
my_tuple = "x", "y", "z"
print(my_tuple)
print(type(my_tuple))

# need a comma after a single element initialization
my_tuple2 = (1, )
print(my_tuple2)

# without the trailing comma, the parentheses just wrap a string
not_a_tuple = ("a")
print(not_a_tuple)
print(type(not_a_tuple))

# creating an empty tuple
empty_tuple = tuple()
print(empty_tuple)
print(type(empty_tuple))

('x', 'y', 'z')
<class 'tuple'>
(1,)
a
<class 'str'>
()
<class 'tuple'>


Tuple indexing and slicing work the same as for lists:

In [7]:
my_tuple = ("x", "y", "z")
print(my_tuple[1])
print(my_tuple[0:2])

y
('x', 'y')


However, tuples are immutable, so you cannot modify them. The following code demonstrates the immutability of tuples:

In [1]:
my_tuple = ("x", "y", "z")
# raises a TypeError! tuples are immutable, you cannot change them
my_tuple[2] = "a"

TypeError: 'tuple' object does not support item assignment

## Key-Value Pairs
Consider the following set of items:
* Your student ID number
* Your checking account number
* The VIN on your car
* Your social security number

What do all of the above items have in common? They are all *unique* identifiers for something. For example, there may be several students named "John Smith" at WSU. How is the university to distinguish academic records for multiple John Smiths? They assign a unique *key* to identify each individual student:

|ID Number|Last name|First name|
|-|-|-|
|28905|Smith|Jane|
|19485|Smith|John|
|28450|Smith|John|
|25543|Smith|John|
|17834|Smith|Justin|

For the other examples, your checking account number is a key that uniquely identifies your account, the VIN is a key that uniquely identifies your car, and your SSN is a key that uniquely identifies you for government purposes.

Keys are useful because they *map* to values. In the example above, a student ID number of 28905 maps to the academic record of Jane Smith at WSU. The academic record of Jane Smith is called the *value* that the *key* (ID number) maps to. Together, the ID number (28905) and the record (Jane Smith's academic record) form a *key-value pair*.

Keys can be represented as a list of unique values (no duplicates). Values can be represented as a list as well (can have duplicates). A single data structure that combines key lists and value lists is called a *dictionary*.

## Dictionaries
A *dictionary is like a list with keys as indices*. Keys can be integers, strings, file objects, etc. Keys cannot be lists, because lists are mutable. To declare a dictionary, we use curly braces `{ }`:

In [8]:
# declares an empty dictionary
my_dict = {}
print(my_dict)
# can also use dict()
my_second_dict = dict()
print(my_second_dict)

{}
{}


We can initialize a dictionary with values using comma separated `key:value` pairs:

In [9]:
state_capitals = {'washington': 'olympia', 'idaho': 'boise', 'oregon': 'salem'}
print(state_capitals)

{'idaho': 'boise', 'washington': 'olympia', 'oregon': 'salem'}


We can create a dictionary from a list of tuples, where each tuple in the list is a key-value pair:

In [10]:
# roman numerals
key_values = [("I", 1), ("V", 5), ("X", 10), ("L", 50)]
roman_numerals = dict(key_values)
print(roman_numerals)

{'V': 5, 'L': 50, 'I': 1, 'X': 10}


We can also convert a dictionary back to a list of tuples with the dictionary method `items()` and the built-in function `list()`:

In [11]:
list_of_tuples = list(roman_numerals.items())
print(list_of_tuples)

[('V', 5), ('L', 50), ('I', 1), ('X', 10)]
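
The idea of a list of keys paired with a list of values can be made concrete with the dictionary methods `keys()` and `values()`. In the sketch below, the built-in `zip()` function (not covered in this lesson) pairs the two lists back up so that `dict()` can rebuild the dictionary; `rebuilt` is a name chosen for illustration:

```python
roman_numerals = dict([("I", 1), ("V", 5), ("X", 10), ("L", 50)])

# keys() and values() give the two halves of each key-value pair
keys = list(roman_numerals.keys())
values = list(roman_numerals.values())
print(keys)
print(values)

# zip() pairs the lists back up so dict() can rebuild the dictionary
rebuilt = dict(zip(keys, values))
print(rebuilt == roman_numerals)
```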


### Compatible Dictionary Data Types
#### Keys
Dictionary keys can be integers, strings, files, tuples, etc. Lists cannot be keys because they are mutable.
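
A short sketch of the difference between tuple keys and list keys. The dictionary `grid_labels` is a made-up example, and the `try`/`except` is only there so the example keeps running after the error:

```python
# a tuple is immutable, so it can serve as a key
grid_labels = {(0, 0): "top left", (2, 2): "bottom right"}
print(grid_labels[(0, 0)])

try:
    # a list is mutable, so using it as a key raises a TypeError
    bad_dict = {[0, 0]: "top left"}
except TypeError as error:
    print("TypeError:", error)
```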

#### Values
Values can be any type. For example, we can have string keys and list values:

In [12]:
fruit_colors = {'kiwi': ['brown', 'green'], 'banana': ['yellow'], 'watermelon': ['green', 'red']}
print(fruit_colors)

{'watermelon': ['green', 'red'], 'banana': ['yellow'], 'kiwi': ['brown', 'green']}


### Dictionary Indexing
We can access an item via a key using hard brackets `[ ]` (similar to indexing into a list):

In [13]:
state_capitals = {'washington': 'olympia', 'idaho': 'boise', 'oregon': 'salem'}
print("The capital of idaho is %s" %(state_capitals['idaho']))

The capital of idaho is boise
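
Indexing with a key that is not in the dictionary raises a `KeyError`. As a hedged sketch, the example below shows the error and the dictionary method `get()`, which returns `None` (or a default value we supply) instead; the default `'unknown'` is an arbitrary choice for this example:

```python
state_capitals = {'washington': 'olympia', 'idaho': 'boise', 'oregon': 'salem'}

try:
    print(state_capitals['montana'])
except KeyError as error:
    print("KeyError:", error)

# get() returns None (or a default we supply) instead of raising an error
print(state_capitals.get('montana'))
print(state_capitals.get('montana', 'unknown'))
```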


### Adding Key-Value Pairs
Since dictionaries are *mutable*, we can add key-value pairs to the dictionary using hard brackets `[ ]`:

In [14]:
state_capitals = {'washington': 'olympia', 'idaho': 'boise', 'oregon': 'salem'}
print(state_capitals)

state_capitals['montana'] = 'helena'
print(state_capitals)

{'idaho': 'boise', 'washington': 'olympia', 'oregon': 'salem'}
{'idaho': 'boise', 'washington': 'olympia', 'montana': 'helena', 'oregon': 'salem'}


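Note: keys in a dictionary are not sorted. The output above comes from an older Python version that did not guarantee any particular order; since Python 3.7, dictionaries preserve the order in which keys were inserted.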

### Dictionary Length with `len()`
We can still determine the number of items (key-value pairs) in a dictionary with `len()`:

In [15]:
state_capitals = {'washington': 'olympia', 'idaho': 'boise', 'oregon': 'salem'}
print(len(state_capitals))

3


### Existence of a Key
We can also test if a key is a valid key in the dictionary with the `in` keyword:

In [16]:
state_capitals = {'washington': 'olympia', 'idaho': 'boise', 'oregon': 'salem'}

print('california' in state_capitals)
print('idaho' in state_capitals)
# in tests keys only, so a value such as 'olympia' is not found
print('olympia' in state_capitals)

False
True
False


## Looping through a Dictionary
We can traverse a dictionary easily with a `for` loop that walks through each key in the dictionary:

In [17]:
sides = {'square': 4, 'triangle': 3, 'pentagon': 5, 'rectangle': 4}

# each key is a shape name; sides[shape] is its number of sides
for shape in sides:
    print(shape, sides[shape], sep=": ")

triangle: 3
rectangle: 4
square: 4
pentagon: 5
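
Two common variations on this loop, sketched with the same `sides` dictionary: `items()` hands us each key together with its value, and the built-in `sorted()` function walks the keys in alphabetical order. The loop variable names are chosen here for illustration:

```python
sides = {'square': 4, 'triangle': 3, 'pentagon': 5, 'rectangle': 4}

# items() yields (key, value) tuples, unpacked here into two variables
for shape, side_count in sides.items():
    print(shape, side_count, sep=": ")

# sorted() walks the keys in alphabetical order
for shape in sorted(sides):
    print(shape, sides[shape], sep=": ")
```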


## Example Problem: Letter Frequencies
Suppose we want to keep track of the frequency of letters in a word. For example, the word "hello" has 4 distinct letters with the following frequencies:
* h: 1
* e: 1
* l: 2
* o: 1

Let's write a program to prompt the user to enter a word. Our program will tell the user the frequency of each letter in the word. We could solve this problem with either a list or a dictionary:
* List solution
    1. Create a list with 26 zeros
    1. Write a function to convert a letter into an integer in the range [0-25] to index into the list. We can do this with the `ord(<character>)` function and ASCII codes...
    1. Walk through the word and increment the corresponding list position for each letter
    1. Convert the index of non-zero list entries back to characters using `chr(<integer>)` to print out the histogram results
* Dictionary solution
    1. Create an empty dictionary
    1. Walk through the word and add the letter to the dictionary with a count of one if the letter is not already a key, increment its count otherwise.
    
The dictionary solution is better suited to this problem because we do not have to allocate space for all letters ahead of time and we don't have to perform a character to integer conversion to index into the data structure.

In [None]:
def compute_letter_frequencies(word):
    '''
    Returns a dictionary mapping each letter in word to its count.
    '''
    histogram = {}
    
    for letter in word:
        if letter in histogram:
            histogram[letter] += 1
        else:
            histogram[letter] = 1
    return histogram

print(compute_letter_frequencies("hello"))
print(compute_letter_frequencies("mississippi"))
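
The `if`/`else` inside the loop can be shortened. The sketch below shows two alternatives that should produce the same counts: a hypothetical variant, `compute_letter_frequencies_get()`, built on the dictionary method `get()`, and the `Counter` class from the standard library's `collections` module:

```python
from collections import Counter

def compute_letter_frequencies_get(word):
    '''
    Hypothetical variant: builds the same histogram with dict.get().
    '''
    histogram = {}
    for letter in word:
        # get() supplies 0 when the letter is not yet a key
        histogram[letter] = histogram.get(letter, 0) + 1
    return histogram

print(compute_letter_frequencies_get("hello"))

# Counter from the standard library builds the same mapping
print(dict(Counter("hello")))
```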

Compared to the list solution:

In [None]:
def letter_to_index(letter):
    '''
    Converts a lowercase letter to an index in the range 0-25.
    '''
    ascii_val = ord(letter)
    index = ascii_val - ord('a')
    return index

def index_to_letter(index):
    '''
    Converts an index in the range 0-25 back to a lowercase letter.
    '''
    ascii_val = index + ord('a')
    letter = chr(ascii_val)
    return letter
    
def compute_letter_frequencies_list(word):
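    '''
    Returns a list of 26 counts, one per letter a-z.
    Assumes word contains only letters.
    '''
    histogram = [0] * 26
    
    # strings are immutable, so lower() returns a new string we must keep
    word = word.lower()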
    for letter in word:
        index = letter_to_index(letter)
        histogram[index] += 1
    return histogram

def pretty_print(histogram):
    '''
    Prints each letter that has a non-zero count, followed by its count.
    '''
    for i in range(len(histogram)):
        if histogram[i] != 0:
            letter = index_to_letter(i)
            print("%s: %d" %(letter, histogram[i]), end=" ")
    print("")

histogram = compute_letter_frequencies_list("hello")
pretty_print(histogram)

Note: We have now seen lists of tuples, lists of lists, dictionaries of lists, etc. In general, we can have sequences of sequences. The types of sequences that can be nested and the number of nesting levels is up to you, the programmer!