# Data Structures


## What's a data structure?
As its name implies, a data structure is a container that holds data. Just like some post office boxes hold packages and others hold letters, Python's built-in data structures have different purposes and uses. Use data structures to organize and perform operations on data. Python has the following built-in data structures: Lists, Dictionaries, Sets, and Tuples. Each container has different attributes and is used for a different purpose.

sources: (W3Schools, RealPython.com)

<img src="./images/post_office_boxes.jpg" align="middle">

## Comparing Built-in Data Structures
Below is a comparison of four built-in data structures in Python.

<img src="./images/structures.jpg" align="middle">

# Tuples
What is the proper pronunciation of "tuple"? Answer: either TEW-pull or TUP-pull (with the 'u' sound in "pup").

* Ordered sequence of elements
* Tuples are immutable
* Parentheses denote a tuple
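
To see what immutability means in practice, here is a minimal sketch (the names `point` and `moved_point` are hypothetical and used only for illustration). Assigning to an index of a tuple raises a `TypeError`, while building a new tuple from the old one is allowed.

```python
# Hypothetical example tuple; the name `point` is illustrative only
point = (3, 4)

try:
    point[0] = 10  # tuples do not support item assignment
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# "Changing" a tuple really means building a new one
moved_point = (10,) + point[1:]
print(moved_point)
```

The original tuple is untouched; only the new name refers to the new values.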

In [None]:
t = 99

In [None]:
# Create an empty tuple
t = ()

# Create a tuple that holds values of different types
t = (4, "hello", True, 3.1)

In [None]:
print(t[3])

In [None]:
# Concatenation with a tuple

print(t)
print("Concatenate '7'")
print((t) + (7,)) # Note the comma--the comma tells Python that this is a tuple and not an int

In [None]:
# Iterating a tuple
for v in t:
    print(v)

In [None]:
def tip_options(amount):
    # Use a tuple to return more than one value
    return(amount, amount*1.10, amount*1.15, amount*1.2)

print(tip_options(30))
print(type(tip_options(30)))

In [None]:
# Use a tuple to swap values
y = 5
x = 10
print('x =', x, 'y =', y)
(x, y) = (y, x)
print('x =', x, 'y =', y)

# Lists

A list is an ordered sequence of *items*. Lists are similar to arrays in other languages. One difference is that Lists can contain different types of data.

## Creating Lists
 
Lists are created using several methods.

In [None]:
#Use square brackets to make list

cities = [] # an empty list

In [None]:
print(cities)

In [None]:
type(cities)

In [None]:
cities = ["Dallas","Chicago","Miami","Grand Rapids" ]
print(cities," is of type ", type(cities))

In [None]:
# Get the length of your list using the len() function
len(cities)

In [None]:
#Ordered -- accessible via index
print(cities[2]) # Print the third item in a list

In [None]:
# Print the last item in a list
print(cities[-1])

In [None]:
print(cities[1:3]) # print the second and third items. Second value is not inclusive.

In [None]:
print(cities[1:]) # print the second item through the end of the list.

In [None]:
print(cities[1::2]) # Start at 2nd item (index=1) and print every other (increment=2)

In [None]:
# Reverse a list (Step -1)
print(cities[::-1])

In [None]:
# Lists are not limited to containing only values of a single type
# A list may contain objects such as another list 
my_list = [True, 0, "Greg Bott", 3.14159, ["steak","eggs","donuts"]]
print(my_list)

## Lists are Mutable

In [None]:
print(cities)
cities[2] = "San Antonio" # Replace the third entry ('Miami') with 'San Antonio'
print(cities) # mutated list object

## Adding items to a list
* Use append to add elements to the end of a list. This operation *mutates* the list.
```Python
<list>.append(element)
``` 

Combine lists using the extend() method or the + operator. extend() mutates the list it is called on; + returns a new list.
```Python
<list>.extend(iterable)

<list> + <list>
```

Insert an item at specified index location using insert().



In [None]:
# Use the append() method to add an item to the list
print(cities)
print('Adding Columbia...')
cities.append("Columbia")
print(cities)

In [None]:
# Using append() to add multiple items results in a list within a list
more_cities = ['St. Louis', 'Tempe', 'Atlanta']

# append() accepts exactly 1 argument; passing a list adds that list as a single item
cities.append(more_cities) 
print(cities)

In [None]:
cities[5][2]

In [None]:
# Use insert() to add an item to specific location in the list. 
print(cities)
cities.insert(1, 'Austin') # Insert Austin in the second position
print(cities)

In [None]:
# Reset the list to the original cities
cities = ['Dallas', 'Austin', 'Chicago', 'Miami', 'Grand Rapids', 'Columbia']

In [None]:
more_cities = ['St. Louis', 'Tempe', 'Atlanta']

# Use extend() when you want to add multiple values to a list
cities.extend(more_cities)
print(cities)

## Copying a list
If you use the following expression to create a new list, what you have is two references to a single object, NOT two lists.
```Python
list_b = list_a
```

In [None]:
list_a = [1,2,3,4,5]

list_b = list_a
print("a=",list_a)
print("b=",list_b)

In [None]:
# Change ONLY list_b
list_b[0] = 'Protein bar'
print("a=",list_a)
print("b=",list_b)

Even though I did not change list a, changing list b also changed list a because the two lists are not two separate objects. They are two references pointing to the same object.

In [None]:
list_a = [1,2,3,4,5]

# To make a *copy* of the object instead of referencing the same object, use copy()
list_b = list_a.copy()
print("a=",list_a)
print("b=",list_b)

In [None]:
# Change ONLY list_b
list_b[0] = 'Protein bar'
print("a=",list_a)
print("b=",list_b)

This time, list a did not change. Now we have TWO separate objects. Therefore, changing list b does not affect list a and vice versa.
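
One caveat worth sketching: `copy()` makes a *shallow* copy, so nested lists are still shared between the original and the copy. The example below is an illustration under that assumption, using hypothetical names (`menu_a`, `shallow`, `deep`) and the standard library's `copy.deepcopy()` for a fully independent copy.

```python
import copy

# Hypothetical nested list for illustration
menu_a = ["coffee", ["steak", "eggs"]]

shallow = menu_a.copy()        # new outer list, same inner list object
deep = copy.deepcopy(menu_a)   # new outer list and new inner list

menu_a[1].append("donuts")     # mutate the nested list through menu_a

print(shallow[1])  # the shallow copy sees the change
print(deep[1])     # the deep copy does not
```

For flat lists of numbers or strings, `copy()` is enough; `deepcopy()` matters only when the list contains other mutable objects.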

## Splitting and Joining Lists
Using the split() method to separate a string on a delimiter (e.g., a comma) creates a list object.

In [None]:
states = "Missouri, Alabama, Texas, Washington, Florida"
print("States is of ", type(states))

states = states.split(",")
print(states, "this is of",type(states))

In [None]:
print(states[2])

In [None]:
print(type(states))
states = ','.join(states)
print(states, "this is of",type(states))
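
Note that splitting on `","` alone keeps the space that follows each comma. The sketch below shows one way to clean that up with `strip()`; the variable names `raw`, `clean`, and `rejoined` are hypothetical.

```python
states = "Missouri, Alabama, Texas, Washington, Florida"

# Splitting on "," keeps the leading space on every item after the first
raw = states.split(",")
print(repr(raw[1]))

# Strip the surrounding whitespace from each piece
clean = []
for name in raw:
    clean.append(name.strip())
print(clean)

# join() is the inverse of split(): it glues the pieces back together
rejoined = ", ".join(clean)
print(rejoined)
```

Splitting on `", "` (comma plus space) would also work here, but `strip()` is more forgiving when the spacing in the input is inconsistent.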

In [None]:
# Remember that you may split on any character.
#   Here is an example of splitting an email address
#   into the user name (the part before the '@' symbol)
#   and the domain (the part following the '@' symbol).

addr = 'monty@python.org'
uname, domain = addr.split('@')
split_email = addr.split('@' )
print(f"user name = {uname}")
print(f"domain name = {domain}")
print(split_email)

## List Comprehensions

List comprehensions are a compact method to build lists using a single line of code.

Basic syntax
```python
[ expr for item in iterable ]
```

Using a conditional:
```python
my_list = [ expr for item in iterable if condition ]
```

For example, if I wanted to iterate over a range of numbers and make a list of the even ones, I could use this code:

In [None]:
even_number_list = []
for x in range(20):
    if x % 2 == 0:
        even_number_list.append(x)
print(even_number_list)

Using a list comprehension I can do the same with a single line of code.

In [None]:
even_number_list = [x for x in range(20) if x % 2 == 0]
print(even_number_list)

In [None]:
num_list = [y for y in range(100) if (y % 2 == 0) or (y % 5 == 0)]
print(num_list)
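
The examples above copy each item unchanged. As a small sketch of the `expr` part of the syntax, the comprehension can also transform each item. The list `sample_cities` and the result names are hypothetical.

```python
sample_cities = ["Dallas", "Chicago", "Miami", "Grand Rapids"]

# The expression can transform each item, not just copy it
city_lengths = [len(city) for city in sample_cities]
print(city_lengths)

# Transform and filter together: upper-case only the one-word names
one_word_upper = [city.upper() for city in sample_cities if " " not in city]
print(one_word_upper)
```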

## Removing Items from a List

In [None]:
# Remove elements by index
t = ['a', 'b', 'c', 'd', 'e']
del t[1]
print(t)

In [None]:
# Delete the last item on the list. Returns the item deleted. Mutates list.
x = t.pop()

print('Deleted ', x)
print('New list is ', t)

In [None]:
# Remove specific element (e.g., remove 'Chicago'), mutates the list.
print(cities)

cities.remove('Chicago')
print(cities)

In [None]:
# ERROR: remove() raises a ValueError if the item is not in the list.
cities.remove('Tuscaloosa')

Be careful when removing items from a list. If you attempt to remove items while iterating over the same list, items may be skipped. 

In [None]:
my_list = [1,2,3,4,5,6,7,8,9,10]

In [None]:
# The intent of this code is to remove numbers greater than 5 from my list.

for item in my_list:
    if item > 5:
        my_list.remove(item)

# ERROR: However, 7 and 9 remain because they were skipped as items were removed.
print(my_list)

One solution is to use a list comprehension.

In [None]:
my_list = [1,2,3,4,5,6,7,8,9,10]

# Only keep items in the list that are less than 6
my_list = [item for item in my_list if item < 6]

print(my_list)

Another solution is to iterate over the list in reverse. That way, when an item near the end (e.g., the item at index 9) is removed, the indexes of the items that have not been visited yet do not change.

In [None]:
my_list = [1,2,3,4,5,6,7,8,9,10]

# Reverse the list and delete items greater than 5
for item in reversed(my_list):
    if item > 5:
        my_list.remove(item)
        
print(my_list)
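
A third option, sketched here as one possible approach, is to loop over a copy of the list while removing from the original. The slice `my_list[:]` creates that copy, so the loop walks a list that never changes.

```python
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Iterate over a shallow copy; remove from the original
for item in my_list[:]:
    if item > 5:
        my_list.remove(item)

print(my_list)
```

This costs one extra copy of the list, which is usually negligible for small lists.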

## Testing for membership

Use the in keyword to test for list membership.

In [None]:
print(cities)
print("Dallas" in cities)
print("Tuscaloosa" in cities)

## Enumerating a list

In [None]:
# WITHOUT using enumerate()
count = 0
for city in cities:
    print(count, city)
    count += 1

In [None]:
for x in range(len(cities)):
    print(x, cities[x])
                  

In [None]:
# Print each (index, item) tuple that enumerate() produces
for city in enumerate(cities):
    print (city)

In [None]:
# Change start of index
for count,ele in enumerate(cities,10):
    print (count,ele)

## Iterating a list

Use a for loop to iterate a list.

In [None]:
# Loop through list
for city in cities:
    print(city)

Use the len() function to determine how many items are in the list and use that within a range() function.

In [None]:
for i in range(len(cities)):
    print(cities[i],end=" ")

## List Concatenation


In [None]:
# Use the '+' operator to concatenate lists
a = [1,2,3]
b = [4,5,6]
c = a + b # does not mutate 'a' or 'b'

print("c = ", c)

print("a = ", a)
print("b = ", b)

## Extending a list

In [None]:
print("list 'a' = ", a)
print("list 'b' =", b)

a.extend(b) # This combines a and b, mutates a but not b

print("list 'a' =", a)
print("list 'b' =", b)
print(c)

In [None]:
# Use the '*' operator to repeat items
print(a * 3)

## Slicing Lists
You can return parts of a list using slicing operators. Other objects (e.g., strings and tuples) can also be sliced.

In [None]:
# Slicing operations

t = ['a', 'b','c','d','e','f','g']

# return the 2nd and 3rd elements in t
print(t[1:3])

In [None]:
# Omitting the first parameter tells the interpreter to start at the beginning
print(t[:3])

In [None]:
# Omitting the second parameter tells the interpreter to continue to the end
# start with the fourth element (index 3) and return all elements to the end of the list
print(t[3:])

In [None]:
# Get the last item
print("Last item in the list = ", t[-1])

In [None]:
# Use negative indexing to replace the last item in the list
t[-1] = 'watermelon'
print(t)

## Zipping a list

The zip() function takes iterables as its parameter and returns a zip object. Think of zip like a physical zipper that brings together two different iterable objects. 

In [None]:
# zip function example

students = 'Darlene','Desmond','Billy'
grades = [100, 99, 72] 
zipped_students = list(zip(students, grades))

print(zipped_students)

## Zipping unequal length lists

The zip() function stops when the shortest iterable is exhausted, so it works best with iterables of **equal lengths**. Trailing items in the longer iterable are ignored.

To work with iterables of different lengths, Python provides the zip_longest() function as part of the **itertools** module.

```Python
zip_longest( iterable1, iterable2, fillvalue)
```

In [None]:
import itertools

# zip_longest function example

students = 'Darlene','Desmond','Billy', 'Macy'
grades = [100, 99, 72] 
zipped_students = list(itertools.zip_longest(students, grades, fillvalue = None))

# Notice that Macy appears in the list, paired with the fill value (None)
print(zipped_students)

In [None]:
# zip function example

students = 'Darlene','Desmond','Billy', 'Macy'
grades = [100, 99, 72] 
zipped_students = list(zip(students, grades))

# Notice that Macy does not appear in the list of student-grade pairs
print(zipped_students)
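
Because `zip()` produces pairs, its output can feed other constructors directly. The sketch below is illustrative only (the names `grade_book`, `pairs`, `names`, and `scores` are hypothetical). It builds a dictionary from the pairs and then reverses the operation with `zip(*pairs)`.

```python
students = ("Darlene", "Desmond", "Billy")
grades = [100, 99, 72]

# dict() accepts an iterable of (key, value) pairs, which is what zip() yields
grade_book = dict(zip(students, grades))
print(grade_book["Desmond"])

# zip(*pairs) "unzips" a list of pairs back into separate tuples
pairs = list(zip(students, grades))
names, scores = zip(*pairs)
print(names)
print(scores)
```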

## Sorting Lists

In [None]:
# Use sorted() to display a sorted list but not mutate it.
my_letters = ['n','r','y','x','a','w']

print("sorted list = ", sorted(my_letters))
print("my_letters = ", my_letters)

In [None]:
# Use the sort() method to sort the items in a list
my_letters.sort()
print(my_letters)

In [None]:
# Don't do this...sort() returns "None"
my_letters = my_letters.sort() 
print(my_letters)

In [None]:
# Sort a list in descending order
my_letters = ['n','r','y','x','a','w']
my_letters.sort(reverse=True)
print(my_letters)
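
Both `sorted()` and `sort()` also accept a `key` argument, a function applied to each item to decide the order. A brief sketch, using a hypothetical `words` list:

```python
words = ["pear", "Fig", "banana", "kiwi"]

# Sort by length; items of equal length keep their original relative order
by_length = sorted(words, key=len)
print(by_length)

# Sort alphabetically while ignoring upper/lower case
case_insensitive = sorted(words, key=str.lower)
print(case_insensitive)
```

Without a `key`, upper-case letters sort before lower-case ones, so "Fig" would come first.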

## Use a list to return more than one value from a function

In [None]:
# Use a list to return more than one value from a function
def tip_options(amount):
    # Use a list to return the bill amount plus three totals with a tip added (10%, 15%, 20%)
    return[amount, amount*1.10, amount*1.15, amount*1.2]

print(tip_options(30))
print(type(tip_options(30)))

## Working With Nested Lists

In [None]:
my_list = [['Ford','Chevrolet','Volkswagen'],
           ['F150','Suburban','Passat'],
           ['Big Bang Theory','Young Sheldon','Mindhunter']]

print(my_list[0][1]) # row 1 (index 0), item 2 (index 1)
print(my_list[2][2]) # row 3 (index 2), item 3 (index 2)


In [None]:
# Use a list to swap values
y = 5
x = 10

print('x =', x, 'y =', y)

[x, y] = [y, x]

print('x =', x, 'y =', y)

# Sets
* Sets are unordered.
* Set elements are unique. Duplicate elements are not allowed.
* You may add or remove items from the set, but you cannot edit an item in a set.
* Accessing items by index (e.g., myset[1]) is NOT supported.
* Sets are denoted by curly braces.
* Membership tests are more efficient using sets than lists or tuples.

You can define a set using the set() function.
```python
x = set(<iter>)
```

In [None]:
my_list = ['a','b',1, 'c', 1]
set2 = set(my_list)
print(set2)
print(my_list)

You can also create a set using curly braces {}. However, you cannot create an empty set using a pair of empty curly braces the way you can create an empty list using a pair of empty square brackets.

In [None]:
# INCORRECT
my_set = {}  # <-- results in a dictionary, NOT a set
print(type(my_set))

In [None]:
# Instead use the set constructor
my_set = set()
print(type(my_set))

In [None]:
# Use curly braces to create a set
my_set = {1,1,6,7,3,5,5,5,5,5, 'red'}
print(type(my_set))
print(my_set)

In [None]:
# Error - sets are unordered and not accessible by subscript
my_set[0]

## Why do I care about sets?
Sets in Python provide the same benefits as sets in mathematics. Sets contain a well-defined collection of distinct objects called elements. Using the set object enables you to efficiently perform set operations such as union and intersection.

![](images/data_science_diagram.png) <br>
(image source: https://towardsdatascience.com)

## Creating sets
Use curly braces to denote a set or use the set() constructor. To populate a set with set(), provide a single iterable as the argument.

In [None]:
# Persons with expertise in specific areas
cs_expertise = {"Bill", "Matt", "Alexandra", "Joe", "Dexter"}
stats_expertise = set(["Dexter", "Subha", "Brad", "Bruce"])
business_expertise = {"Kay","Jonathan","Dexter","Suzanne", "Matt"}

In [None]:
# Who might be suited for Data Science (intersection of three topics)
data_scientists = cs_expertise.intersection(stats_expertise, business_expertise)
print(data_scientists)

The set() constructor accepts any iterable as its argument. Duplicate items in the iterable appear only once in the resulting set.

In [None]:
#Use the set() constructor to create a set; the argument must be an iterable (e.g., a list)
my_set2 = set(['foo', 'bar', 3.141, 'bar'])
print(my_set2)

In [None]:
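# set() takes a single iterable, so the fruit names are wrapped in a list.
# Passing the names as separate arguments would raise a TypeError. The duplicate "Banana" is dropped.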
tropical_fruits = set(["Guava", "Dragon Fruit", "Banana","Banana"])
temperate_fruits = {"Apple", "Peach", "Plum"}

all_fruit = tropical_fruits.union(temperate_fruits)
print(all_fruit)

In [None]:
#Empty sets are evaluated as False
loch_ness_monsters = set()
print("The set of Loch Ness Monsters is " + str(bool(loch_ness_monsters)))
print()

In [None]:
#You can add and remove items, but you cannot change an item in place
loch_ness_monsters.add("Marvin")
print("Added Marvin to monster set...")
print("The set of Loch Ness Monsters is " + str(bool(loch_ness_monsters)), loch_ness_monsters)
print("The length of the monster set is " + str(len(loch_ness_monsters)))

In [None]:
# Reduce these grades to only unique values; a set literal drops the duplicates
grades = {81,100,81,89,99,99,99,76,94,93,86,75,88,96,76,87,90,81,78,99,83,94,75,83,92,96,81,99,89,99,98,100,95,84,94,97,100,92,97,98,92,95,88,90,98,87,86,95,86,84,91,87,88,83,89,84,98,75,90,100,79,83,94,89,93,84,83,94,84,93,97,75,81,91,84,78,89,96,97,99,90,98,83,93,96,98,91,77,98,97,76,98,75,89,92,81,83,84,82,94,89,77,96,94,100,86,79,87,78,83,86,89,99,77,96,88,91,86,89,99,82,83,92,91,84,83,76,89,90,82,75,84,83,81,96,87,90,82,93,76,86,100,81,88,100,94,84,99,77,91,92,98,88,90,83,88}
print(grades)

In [None]:
c_and_higher = set(range(75,101))

# difference() returns the items in c_and_higher that are not in grades
missing_grades = c_and_higher.difference(grades)

print("What grades are missing from 75-100?: " + str(missing_grades))
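
To make the distinction between `difference()` and `symmetric_difference()` concrete, here is a small sketch with two hypothetical sets. `difference()` is one-directional, while `symmetric_difference()` returns the items that appear in exactly one of the two sets.

```python
required = {75, 80, 85, 90}
earned = {75, 90, 105}

# Items in `required` that are not in `earned`
only_required = required.difference(earned)
print(only_required)

# Items that are in exactly one of the two sets
in_one_only = required.symmetric_difference(earned)
print(in_one_only)

# The operators - and ^ are shorthand for the same two operations
same_difference = required - earned
same_symmetric = required ^ earned
```

The two methods give the same answer only when the second set contains nothing outside the first.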

In [None]:
# Add set lookup differences and symmetric_difference vs. difference

# Dictionaries

A Python dictionary object stores key-value pairs. Just like you would use a traditional dictionary to find the definition (value) of a word (key), a Python dictionary uses keys to look up values. A key is a unique identifier used to find data (the value). You can think of a dictionary as a list with a flexible index. List indexes are integers (e.g., MyList[2]), but the keys used to look up values in a dictionary can be any hashable data type, which in practice means immutable types such as strings, numbers, and tuples of immutable values (e.g., employee['2334']).

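Dictionaries use key-value pairs to store and retrieve data, and items are accessed by key rather than by position. In other languages this structure might be called an *associative array* or a *hashmap*.

As of Python 3.7, a dict is guaranteed to preserve insertion order (in CPython 3.6 this was an implementation detail). Earlier versions made no ordering guarantee.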
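
The ordering behavior can be sketched as follows, assuming Python 3.7 or later; the `colors` dictionary is hypothetical. Iterating a dict yields its keys in the order they were inserted, and a key that is deleted and re-added moves to the end.

```python
colors = {}
colors["red"] = 1
colors["green"] = 2
colors["blue"] = 3

# Iteration follows insertion order
key_order = list(colors)
print(key_order)

# A key that is deleted and re-inserted goes to the end
del colors["red"]
colors["red"] = 4
new_order = list(colors)
print(new_order)
```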


## Creating dictionaries
Use curly braces and a colon to indicate to the interpreter that you are creating a dictionary data structure. A key can be any immutable data type.

In [1]:
# Pretty Print is a module that displays dictionaries in a more human-readable format.
import pprint as pp

# Each key (e.g., 'netID') is associated with a value (e.g., 'gjbott')
ua_directory = {'netID':'gjbott', 'name':'Gregory J. Bott', 'role':'faculty','courses':['MIS501', 'MIS460/561', 'MIS598']}

print("Without pretty print:")
print(ua_directory,'\n')

print("With pretty print:")
pp.pprint(ua_directory)

Without pretty print:
{'netID': 'gjbott', 'name': 'Gregory J. Bott', 'role': 'faculty', 'courses': ['MIS501', 'MIS460/561', 'MIS598']} 

With pretty print:
{'courses': ['MIS501', 'MIS460/561', 'MIS598'],
 'name': 'Gregory J. Bott',
 'netID': 'gjbott',
 'role': 'faculty'}


### Creating an empty dictionary
If you attempt to reference a dictionary object without creating it first, you'll receive a NameError.

In [2]:
# ERROR you cannot create a key-value pair in a non-existent dict
person[1000] = {'first_name':'Greg', 'last_name':'Bott', 'spouse':'Amy', 'children':['John Davis', 'Piper', 'Will', 'Truett'], 'pets':{'Bama':'dog', 'TJ':'cat'}}

NameError: name 'person' is not defined

To assign a dict object to a variable, use a set of empty curly braces.

In [3]:
# Create an empty dictionary
person = {}

#Display the type of the 'person' variable
print(type(person))

<class 'dict'>


In [5]:
a = {'first_name':'Greg', 'last_name':'Bott', 'spouse':'Amy'}
print(a)

{'first_name': 'Greg', 'last_name': 'Bott', 'spouse': 'Amy'}


In [6]:
# Attempting to add values to a dict like this REPLACES the dict
a = {'first_name':'Joe', 'last_name':'Devlin', 'spouse':'Suzanne'}
print(a)

{'first_name': 'Joe', 'last_name': 'Devlin', 'spouse': 'Suzanne'}


### Adding values to a dictionary

Instead, add a key (e.g., 1000, 1001) and provide the dict as the value. So, person has nested dictionaries and a nested list.

1000, 1001 are keys. The dict containing first_name, last_name, etc. is the value. Within that dict are additional key-value pairs (first name --> Greg, etc.). 

In [7]:
person[1000] = {'first_name':'Greg', 'last_name':'Bott', 'spouse':'Amy', 'children':['John Davis', 'Piper', 'Will', 'Truett'], 'pets':{'Bama':'dog', 'TJ':'cat'}}
person[1001] = {'first_name':'Joe', 'last_name':'Devlin', 'spouse':'Suzanne', 'children':['CK', 'Alan', 'Devin', 'Tom'], 'pets':{'Orangey':'gold fish', 'Hammer':'turtle'}}

pp.pprint(person)

{1000: {'children': ['John Davis', 'Piper', 'Will', 'Truett'],
        'first_name': 'Greg',
        'last_name': 'Bott',
        'pets': {'Bama': 'dog', 'TJ': 'cat'},
        'spouse': 'Amy'},
 1001: {'children': ['CK', 'Alan', 'Devin', 'Tom'],
        'first_name': 'Joe',
        'last_name': 'Devlin',
        'pets': {'Hammer': 'turtle', 'Orangey': 'gold fish'},
        'spouse': 'Suzanne'}}


### Accessing data in a dictionary
Use ```dict[key]``` to return the value from the key-value pair. If the key doesn't exist, a KeyError is raised. Use the get() method to retrieve values and handle missing keys more gracefully.

You can also use ```keys()``` to list the keys in your dictionary, ```values()``` to access the values of the key-value pairs, or ```items()``` to access both.

In [12]:
person.keys()

dict_keys([1000, 1001, 'a'])

In [9]:
person.values()

dict_values([{'first_name': 'Greg', 'last_name': 'Bott', 'spouse': 'Amy', 'children': ['John Davis', 'Piper', 'Will', 'Truett'], 'pets': {'Bama': 'dog', 'TJ': 'cat'}}, {'first_name': 'Joe', 'last_name': 'Devlin', 'spouse': 'Suzanne', 'children': ['CK', 'Alan', 'Devin', 'Tom'], 'pets': {'Orangey': 'gold fish', 'Hammer': 'turtle'}}])

In [None]:
person.items() # The key-value pair is an item. 

In [13]:
# ERROR: a list is mutable (unhashable), so it cannot be used as a key
person[[]] = {'name':'green'}

TypeError: unhashable type: 'list'

In [10]:
# Print the first_name attribute of the person 1001 key.
print(person[1001]['first_name'])

Joe


### Using the get() method
Use the get() method to access values by key. Using get() avoids a KeyError if the desired key does not exist. Instead, Python returns the None value.

In [14]:
# Display the person dictionary
pp.pprint(person)

# ERROR - this key does NOT exist
person[9999]

{1000: {'children': ['John Davis', 'Piper', 'Will', 'Truett'],
        'first_name': 'Greg',
        'last_name': 'Bott',
        'pets': {'Bama': 'dog', 'TJ': 'cat'},
        'spouse': 'Amy'},
 1001: {'children': ['CK', 'Alan', 'Devin', 'Tom'],
        'first_name': 'Joe',
        'last_name': 'Devlin',
        'pets': {'Hammer': 'turtle', 'Orangey': 'gold fish'},
        'spouse': 'Suzanne'},
 'a': {'name': 'green'}}


KeyError: 9999

In [15]:
# get() calls can be chained to reach nested values; get() returns None instead of raising a KeyError when a key is missing
print(person.get(1000).get('pets').get('bama'.title()))

dog


In [18]:
pp.pprint(person.get(1001))

{'children': ['CK', 'Alan', 'Devin', 'Tom'],
 'first_name': 'Joe',
 'last_name': 'Devlin',
 'pets': {'Hammer': 'turtle', 'Orangey': 'gold fish'},
 'spouse': 'Suzanne'}


In [19]:
# You can also provide a default value if a key does not exist
print(person.get('ss_number', 'no SS# provided'))

no SS# provided
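
One thing to watch for when chaining `get()` calls: if an earlier call returns None, the next `.get()` fails with an AttributeError. A possible workaround, sketched here with a hypothetical `sample_people` dictionary, is to supply an empty dict as the default at each level.

```python
# Hypothetical, trimmed-down version of the nested structure used above
sample_people = {1000: {"first_name": "Greg", "pets": {"Bama": "dog", "TJ": "cat"}}}

# Supplying {} as the default keeps the chain from calling .get() on None
pet_type = sample_people.get(1000, {}).get("pets", {}).get("Bama")
print(pet_type)

# A missing top-level key now falls through to None instead of raising
missing = sample_people.get(9999, {}).get("pets", {}).get("Bama")
print(missing)
```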


In [20]:
# Use a list as a value in a dictionary
make_model = {"Ford":["Mustang","Explorer","Focus"],"Volkswagen":["Passat","Jetta","Beetle"]}
print(make_model["Ford"])

['Mustang', 'Explorer', 'Focus']


In [21]:
# Replace values using a key
pp.pprint(person[1001])

{'children': ['CK', 'Alan', 'Devin', 'Tom'],
 'first_name': 'Joe',
 'last_name': 'Devlin',
 'pets': {'Hammer': 'turtle', 'Orangey': 'gold fish'},
 'spouse': 'Suzanne'}


In [22]:
person[1001]['pets'] = {'Rocky':'flying squirrel'}
pp.pprint(person[1001])

{'children': ['CK', 'Alan', 'Devin', 'Tom'],
 'first_name': 'Joe',
 'last_name': 'Devlin',
 'pets': {'Rocky': 'flying squirrel'},
 'spouse': 'Suzanne'}


### Removing items using del or pop
You can use ```del``` or ```pop``` to remove items from a dictionary.

In [23]:
pp.pprint(person[1001])

{'children': ['CK', 'Alan', 'Devin', 'Tom'],
 'first_name': 'Joe',
 'last_name': 'Devlin',
 'pets': {'Rocky': 'flying squirrel'},
 'spouse': 'Suzanne'}


In [24]:
# Remove the pets key and values
del person[1001]['pets']
pp.pprint(person[1001])

{'children': ['CK', 'Alan', 'Devin', 'Tom'],
 'first_name': 'Joe',
 'last_name': 'Devlin',
 'spouse': 'Suzanne'}


Just as with a list, ```pop``` returns the removed value, which you can store in a variable.

In [25]:
pp.pprint(person[1000])
children = person[1000].pop('children')

{'children': ['John Davis', 'Piper', 'Will', 'Truett'],
 'first_name': 'Greg',
 'last_name': 'Bott',
 'pets': {'Bama': 'dog', 'TJ': 'cat'},
 'spouse': 'Amy'}


In [26]:
pp.pprint(person[1000])

{'first_name': 'Greg',
 'last_name': 'Bott',
 'pets': {'Bama': 'dog', 'TJ': 'cat'},
 'spouse': 'Amy'}


In [27]:
children

['John Davis', 'Piper', 'Will', 'Truett']
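
Unlike a list's `pop()`, a dictionary's `pop()` accepts an optional default that is returned when the key is missing. The sketch below uses a hypothetical `employee` dictionary to show both cases.

```python
employee = {"first_name": "Joe", "last_name": "Devlin"}

# Key is absent, so the default (None) is returned and nothing is removed
nickname = employee.pop("nickname", None)

# Key is present, so its value is returned and the pair is removed
last = employee.pop("last_name")

try:
    employee.pop("nickname")  # no default supplied: raises KeyError
except KeyError as err:
    missing_key = err.args[0]
    print("KeyError for key:", missing_key)

print(employee)
```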

### Clear a dictionary
Use the clear() method to empty the contents of a dictionary.

In [28]:
print(person)
person.clear()
print(person)

{1000: {'first_name': 'Greg', 'last_name': 'Bott', 'spouse': 'Amy', 'pets': {'Bama': 'dog', 'TJ': 'cat'}}, 1001: {'first_name': 'Joe', 'last_name': 'Devlin', 'spouse': 'Suzanne', 'children': ['CK', 'Alan', 'Devin', 'Tom']}, 'a': {'name': 'green'}}
{}


### Using a loop to examine a dictionary
Although looping through a dictionary fails to take advantage of the speed of key lookups, sometimes you may find it useful. Remember that dictionaries contain key-value pairs and that you must loop through them differently than you would a list.

In [29]:
person[1000] = {'first_name':'Greg', 'last_name':'Bott', 'spouse':'Amy', 'children':['John Davis', 'Piper', 'Will', 'Truett'], 'pets':{'Bama':'dog', 'TJ':'cat'}}
person[1001] = {'first_name':'Joe', 'last_name':'Devlin', 'spouse':'Suzanne', 'children':['CK', 'Alan', 'Devin', 'Tom'], 'pets':{'Orangey':'gold fish', 'Hammer':'turtle'}}

In [30]:
# Attempting to iterate through a dictionary as you would a list 
#   will yield only the keys
for x in person:
    print(x)

1000
1001


In [31]:
for v in person.values():
    print(v)
    for val in v.values():
        pp.pprint(val)
    print('-'*30)

{'first_name': 'Greg', 'last_name': 'Bott', 'spouse': 'Amy', 'children': ['John Davis', 'Piper', 'Will', 'Truett'], 'pets': {'Bama': 'dog', 'TJ': 'cat'}}
'Greg'
'Bott'
'Amy'
['John Davis', 'Piper', 'Will', 'Truett']
{'Bama': 'dog', 'TJ': 'cat'}
------------------------------
{'first_name': 'Joe', 'last_name': 'Devlin', 'spouse': 'Suzanne', 'children': ['CK', 'Alan', 'Devin', 'Tom'], 'pets': {'Orangey': 'gold fish', 'Hammer': 'turtle'}}
'Joe'
'Devlin'
'Suzanne'
['CK', 'Alan', 'Devin', 'Tom']
{'Hammer': 'turtle', 'Orangey': 'gold fish'}
------------------------------


In [34]:
# Instead, use the items() method to return both the key and the value
for k, v in person.items(): # k = key; v = value
    print(k, v) 

1000 {'first_name': 'Greg', 'last_name': 'Bott', 'spouse': 'Amy', 'children': ['John Davis', 'Piper', 'Will', 'Truett'], 'pets': {'Bama': 'dog', 'TJ': 'cat'}}
1001 {'first_name': 'Joe', 'last_name': 'Devlin', 'spouse': 'Suzanne', 'children': ['CK', 'Alan', 'Devin', 'Tom'], 'pets': {'Orangey': 'gold fish', 'Hammer': 'turtle'}}


In [37]:
list(person.items())[-1]

(1001,
 {'first_name': 'Joe',
  'last_name': 'Devlin',
  'spouse': 'Suzanne',
  'children': ['CK', 'Alan', 'Devin', 'Tom'],
  'pets': {'Orangey': 'gold fish', 'Hammer': 'turtle'}})

### Using a loop to add values to a dictionary.
So far we have manually added items to a dictionary. Most often you will add items programmatically (e.g., using a loop) rather than manually. Below is part of the code I use to find duplicate files using a hash. A hash is a one-way algorithm applied to an object that results in a fixed-length string that, for practical purposes, uniquely identifies that object.

We'll use the os module to access the file system and the hashlib module to apply the SHA-256 algorithm to the files, and then store them in a dictionary using the hash value as the key.

In [38]:
import os
import hashlib
import pprint as pp

# Create a blank dictionary
os_files = {}

# Hash Function
def hashfile(path, blocksize=65536):

    # Create a SHA-256 hasher object
    hasher = hashlib.sha256()

    # Open the file in binary mode; the with statement closes it for us
    with open(path, 'rb') as file_to_hash:

        # Load part of the file (the size of the block)
        #    We divide the file into smaller parts to avoid using all the
        #    available RAM
        buf = file_to_hash.read(blocksize)

        # Add blocks of the file until the buffer is empty
        while len(buf) > 0:
            hasher.update(buf)
            buf = file_to_hash.read(blocksize)

    # Return the SHA-256 hex digest of the file
    return hasher.hexdigest()

for file in os.listdir():
    # Use error handling to skip entries that cannot be read
    #    (e.g., directories or file permission issues)
    try:
        # Use the hash value (hex digest) as the key and use the file name as the value
        os_files[hashfile(file)] = file
    except OSError:
        pass
    
# Display the Dictionary
pp.pprint(os_files)

{'2e91abec96a61bbcc85d97f119de46ea227fac20e7d5880e34020be259e8b474': '.gitignore',
 '421661dbb13d699ac75da6753d072b7dc5bfabb33740df49c7f05275bd232832': 'Core '
                                                                     'Python-Regex.ipynb',
 '589d75e6c083bdaeac23babcc11848e1e1f3285f46f7eec63c3ef3aef5da7a0b': 'Core '
                                                                     'Python-Pathlib-File '
                                                                     'Operations.ipynb',
 '79e6d7d31d3441c3a8288d40f690a8b470e94560bb322333101c3a24397d4fbd': 'Core '
                                                                     'Python-Selenium4.ipynb',
 '79fe49f5a365a62f42e73377e4d9d734252d477836ae8c0a11e2574c3a19fe96': 'Core '
                                                                     'Python-Functions-Strings.ipynb',
 '7c48d430e62b28bed5e827172b4289363047d33851f0915b0c7ddb378653d1d6': 'Core '
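
Because each key in the dictionary above maps to a single file name, two files with identical contents would overwrite one another. One way to actually report duplicates, sketched here as an assumption rather than the author's code, is to store a list of names under each hash. The in-memory `contents` dictionary is a hypothetical stand-in for real files so the example runs anywhere.

```python
import hashlib

# Hypothetical file contents keyed by file name (stand-ins for real files)
contents = {
    "notes.txt": b"hello world",
    "copy_of_notes.txt": b"hello world",
    "report.txt": b"quarterly numbers",
}

files_by_hash = {}
for name, data in contents.items():
    digest = hashlib.sha256(data).hexdigest()
    # setdefault() creates the empty list the first time a digest is seen
    files_by_hash.setdefault(digest, []).append(name)

# Any digest with more than one file name marks a set of duplicates
duplicates = [names for names in files_by_hash.values() if len(names) > 1]
print(duplicates)
```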
                                                              

### Check for Values in a Dictionary

To determine if a value is present within the list stored under a key, use the *in* keyword.

In [None]:
make_model

In [None]:
# "Focus" is a value in the dict
print("Focus" in make_model["Ford"])

In [None]:
make_model["Ford"]

In [None]:
# "Explorer II" is not in the dict
print("Explorer II" in make_model["Ford"])

### Check for Keys in a Dictionary

In [None]:
search_key = "Ford"
if search_key in make_model:
    print(f"'{search_key}' key found in dictionary!")
else:
    print(f"'{search_key}' key NOT found in dictionary.")

### Attempting to Access Keys that Don't exist
If you attempt to access a key that does not exist within the dictionary, Python will raise a KeyError exception.


In [None]:

person[1000] = {'first_name':'Greg', 'last_name':'Bott', 'spouse':'Amy', 'children':['John Davis', 'Piper', 'Will', 'Truett'], 'pets':{'Bama':'dog', 'TJ':'cat'}}
person[1001] = {'first_name':'Joe', 'last_name':'Devlin', 'spouse':'Suzanne', 'children':['CK', 'Alan', 'Devin', 'Tom'], 'pets':{'Orangey':'gold fish', 'Hammer':'turtle'}}


In [None]:
pp.pprint(person)

In [None]:
# ERROR: KeyError (key does not exist in the dictionary)
print(person[1000])
print(person['first_name']) # first_name IS a key in the dict, why the error?
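
The error occurs because 'first_name' is a key of the nested dictionaries, not of the outer one. The sketch below, which uses a hypothetical trimmed-down `sample_people` dictionary, catches the KeyError and then reaches the nested key one level down.

```python
sample_people = {1000: {"first_name": "Greg"}, 1001: {"first_name": "Joe"}}

try:
    name = sample_people["first_name"]  # not a top-level key
except KeyError as err:
    name = None
    print("No such key:", err)

# The key lives one level down, inside each person's own dictionary
first_names = [record["first_name"] for record in sample_people.values()]
print(first_names)
```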

### Creating a dictionary from a JSON file

In [42]:
import pathlib
import pprint as pp
import json
# create Path object for file path (path to the file, filename)
fb_game_path = pathlib.Path('/home/gjbott/Dropbox/research/github/gregbott/mis501/2023_501/files/2017_Alabama_v_LSU.json')
fb_game_path.absolute()

PosixPath('/home/gjbott/Dropbox/research/github/gregbott/mis501/2023_501/files/2017_Alabama_v_LSU.json')

In [44]:
if fb_game_path.exists():
    # Open the data file in text mode; JSON files are UTF-8 encoded
    with open(fb_game_path, 'r', encoding='utf-8') as fp:

        # Parse the JSON text directly from the file object
        game_data = json.load(fp)

    print(f'The game file includes {len(game_data)} records.\n The game file is a {type(game_data)}.')
else:
    print(f'{fb_game_path.name} does not exist')

The game file includes 8 records.
 The game file is a <class 'dict'>.


In [45]:
# Let's examine what's in the dictionary
# Most often, dictionaries will be too long to just print them. Even just printing the keys can be problematic, but since 
#    we already know the length of the dict is 8, let's print the keys.
game_data.keys()

dict_keys(['scoringPlays', 'videos', 'drives', 'teams', 'id', 'competitions', 'season', 'week'])

One way to view part of a dictionary's contents is to convert its items (that is, its key/value pairs) to a list and then slice the list. Another is to index directly into the nested values, as the next cells do.
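
Here is a minimal sketch of converting the items to a list and slicing it. It uses a small hypothetical dictionary (`sample`) in place of `game_data`.

```python
from itertools import islice

# Hypothetical dictionary standing in for a larger one such as game_data
sample = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# dict views cannot be sliced, so convert the items to a list first
first_two = list(sample.items())[:2]
print(first_two)  # [('a', 1), ('b', 2)]

# For very large dicts, islice avoids building the full list
also_first_two = list(islice(sample.items(), 2))
print(also_first_two)  # [('a', 1), ('b', 2)]
```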

In [None]:
pp.pprint(game_data['videos'][0])

In [48]:
pp.pprint(game_data['videos'][0]['links']['source']['full']['href'])

'http://media.video-cdn.espn.com/motion/2017/1104/dm_171104_NCF_One-Play_Alabama_TD_pass/dm_171104_NCF_One-Play_Alabama_TD_pass_360p30_1464k.mp4'


In [None]:
len(game_data['drives']['previous'][0]['plays'])

In [None]:
for play in game_data['drives']['previous'][0]['plays']:
    pp.pprint(play['text'])
    #pp.pprint(play['start']['yardLine'])
    #pp.pprint(play['statYardage'])

In [None]:
# First let's determine what type of object 'scoringPlays' is
print(type(game_data['scoringPlays']))

In [None]:
# Since it's a list, we can slice it
# Here are the first three items
plays = game_data['scoringPlays']
plays[:3]

In [None]:
# Plays is a list of dicts
print(type(plays))
print(type(plays[0]))

In [None]:
# Since each item of the list is a dict, we can get its keys
plays[0].keys()

In [None]:
# How do we access objects of interest within the dict?
# We can get a play by play by looping through plays
for play in plays:
    print(play['clock']['displayValue'], play['text'])

In [None]:
list(game_data['drives']['previous'])[1]

In [None]:
# Textual information can also be found in drives
#    Note that each series of plays ends with a drive termination event
#    (e.g., TD/FG/Safety, punt, turnover)
for play in game_data['drives']['previous'][5]['plays']:
    print(play['text'])

### Updating dictionary values
Assigning a value to an existing key/value pair will replace the value.

You can also use the update method to replace multiple values in a dictionary and add new values in a single call.

```update()``` takes a dictionary (or another iterable of key/value pairs) as its argument.

To update or add multiple values at the same time, provide the keys to add or update together with their values.
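
Here is a minimal sketch of the difference between plain assignment and `update()`. The `settings` dictionary is hypothetical and used only for illustration.

```python
# Hypothetical dictionary used only for illustration
settings = {'theme': 'light', 'font_size': 12}

# Assigning to an existing key replaces its value
settings['theme'] = 'dark'

# Assigning to a new key adds a key/value pair
settings['language'] = 'en'

# update() replaces existing values and adds new ones in one call
settings.update({'font_size': 14, 'autosave': True})

print(settings)
# {'theme': 'dark', 'font_size': 14, 'language': 'en', 'autosave': True}
```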

In [None]:
person = {}

In [None]:
person[1000] = {'first_name':'Greg', 'last_name':'Bott', 'spouse':'Amy', 'children':['John Davis', 'Piper', 'Will', 'Truett'], 'pets':{'Bama':'dog', 'TJ':'cat'}}
person[1001] = {'first_name':'Joe', 'last_name':'Devlin', 'spouse':'Suzanne', 'children':['CK', 'Alan', 'Devin', 'Tom'], 'pets':{'Orangey':'gold fish', 'Hammer':'turtle'}}

pp.pprint(person)

In [None]:
# Update first_name ('Greg' to 'Gregory'), add ss_number and a middle name
person[1000].update({'first_name':'Gregory','ss_number':'123-45-6789','middle':'Hamilton'})
pp.pprint(person)

In [None]:
# The value stored under 'children' is a list, so we can call its append() method
#   Here we are adding Sally to Joe Devlin's children
person[1001]['children'].append('Sally')
pp.pprint(person[1001])

# JSON

load() vs loads()

![load() vs loads()](./images/load_vs_loads.jpg "load() vs loads()")

dump() vs dumps()

![dump() vs dumps()](./images/dump-vs-dumps.jpg "dump() vs dumps()")
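
In short, `load()` and `dump()` work with file objects, while `loads()` and `dumps()` work with strings (the trailing "s" stands for "string"). Below is a minimal sketch. `io.StringIO` stands in for a real file so the example is self-contained, and the `pet` data is hypothetical.

```python
import io
import json

# Hypothetical data used only for illustration
pet = {'name': 'Bama', 'type': 'dog'}

# dumps(): Python object -> JSON string
pet_str = json.dumps(pet)

# loads(): JSON string -> Python object
from_str = json.loads(pet_str)

# dump(): Python object -> written to a file-like object
buffer = io.StringIO()
json.dump(pet, buffer)

# load(): file-like object -> Python object
buffer.seek(0)  # rewind so load() reads from the start
from_file = json.load(buffer)

print(pet_str)
print(from_str == from_file == pet)
```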

## Loading JSON

In [None]:
# JSON requires double quotes around keys and string values
json_file = """
{"people": [{"children": ["John Davis", "Piper", "Will", "Truett"],
             "first_name": "Greg",
             "last_name": "Bott",
             "pets": {"Bama": "dog", "TJ": "cat"},
             "spouse": "Amy"},
            {"children": ["CK", "Alan", "Devin", "Tom"],
             "first_name": "Joe",
             "last_name": "Devlin",
             "pets": {"Hammer": "turtle", "Orangey": "gold fish"},
             "spouse": "Suzanne"}]}

"""

In [None]:
# Load the file first so that data exists before we print it
with open('files/directory.json', 'r') as f:
    data = json.loads(f.read())

In [None]:
pp.pprint(data)

In [None]:
type(data)

<table class="docutils align-default" id="json-to-py-table">
<colgroup>
<col style="width: 44%">
<col style="width: 56%">
</colgroup>
<thead>
<tr class="row-odd"><th class="head"><p>JSON</p></th>
<th class="head"><p>Python</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>object</p></td>
<td><p>dict</p></td>
</tr>
<tr class="row-odd"><td><p>array</p></td>
<td><p>list</p></td>
</tr>
<tr class="row-even"><td><p>string</p></td>
<td><p>str</p></td>
</tr>
<tr class="row-odd"><td><p>number (int)</p></td>
<td><p>int</p></td>
</tr>
<tr class="row-even"><td><p>number (real)</p></td>
<td><p>float</p></td>
</tr>
<tr class="row-odd"><td><p>true</p></td>
<td><p>True</p></td>
</tr>
<tr class="row-even"><td><p>false</p></td>
<td><p>False</p></td>
</tr>
<tr class="row-odd"><td><p>null</p></td>
<td><p>None</p></td>
</tr>
</tbody>
</table>
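
The mapping in the table can be checked directly. This is a minimal sketch in which the JSON document `doc` is made up for illustration and contains one value of each JSON type.

```python
import json

# Hypothetical JSON document with one value of each JSON type
doc = """
{"object": {"k": "v"}, "array": [1, 2], "string": "hi",
 "int": 7, "real": 7.5, "true": true, "false": false, "null": null}
"""

parsed = json.loads(doc)

# Print the Python type that each JSON value was converted to
for name, value in parsed.items():
    print(f'{name:<7} -> {type(value).__name__}')
```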

## Dumping JSON

In [None]:
import json

with open('files/directory.json', 'w') as f:
    json.dump(person, f)
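
One detail worth knowing when dumping a dictionary like `person`: JSON object keys are always strings, so integer keys such as `1000` come back as `'1000'` after a round trip. Below is a minimal sketch that uses a shortened, hypothetical stand-in (`directory`) and strings instead of a file.

```python
import json

# Hypothetical, shortened stand-in for the person dictionary
directory = {1000: {'first_name': 'Greg'}, 1001: {'first_name': 'Joe'}}

# Round trip: Python -> JSON string -> Python
as_json = json.dumps(directory)
restored = json.loads(as_json)
print(list(restored.keys()))  # ['1000', '1001']

# Convert the keys back to int if the original types matter
fixed = {int(key): value for key, value in restored.items()}
print(fixed == directory)  # True
```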

In [None]:
import pandas as pd
directory_df = pd.DataFrame.from_dict(person, orient='index')

In [None]:
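# Note: this overwrites the file written by json.dump() above.
#   orient='records' writes a list of row dicts and drops the index (1000, 1001)
directory_df.to_json('files/directory.json', orient='records')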