# COSC 210: Introduction to Python Programming
### Dictionaries and Sets

In this notebook, we will cover:
- Defining dictionaries and sets
- Functions that are useful for working with dictionaries and sets
- Creating bar charts from datasets

#### Some notes about dictionaries and sets

- Dictionaries and sets are collection data types, just like lists and tuples.
- However, their elements are not accessed by position: sets are unordered, and dictionaries (which keep insertion order as of Python 3.7) are looked up by key, so we cannot use index subscripting to get specific elements.
- Dictionaries are composed of two components: keys and values.
- Keys are unique and cannot be duplicated within the same dictionary.
- Each key has an associated value.
- Sets are unordered collections of unique values.
- Sets are iterable, but we cannot use indexing or slicing on them.
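
The notes above can be illustrated with a short sketch. The names `ages` and `letters` are made up for this example and do not appear elsewhere in the notebook.

```python
# Hypothetical data, used only for illustration
ages = {'Ann': 30, 'Bob': 25, 'Ann': 31}  # duplicate key: the last value wins
print(ages)        # {'Ann': 31, 'Bob': 25}
print(len(ages))   # 2

letters = {'a', 'b', 'a', 'c'}  # duplicate values are dropped
print(len(letters))  # 3

# A set does not support positional indexing
try:
    letters[0]
except TypeError as err:
    print('TypeError:', err)
```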

### Dictionary

In [None]:
AL_Central = {
    'Kansas City': 'Royals', 'Minnesota': 'Twins', 'Cleveland': 'Indians', 'Chicago': 'White Sox', 'Detroit': 'Tigers'}

In [None]:
# How many key-value pairs
len(AL_Central)

In [None]:
# Unpack into tuples
list(AL_Central.items())

In [None]:
# Get a certain value by entering its key
AL_Central['Minnesota']

In [None]:
# Raises a KeyError because 'Boston' is not a key in the dictionary
AL_Central['Boston']

In [None]:
# Another way to do it
AL_Central.get('Minnesota')

In [None]:
# .get() returns None for a missing key, so this multiplication raises a TypeError
test_dic = {'a':1}
test_dic.get('b') * 1

In [None]:
# Will not get an error if the key doesn't exist
AL_Central.get('Boston')
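
As a small sketch of how `.get()` behaves, the example below passes a second argument, which is returned when the key is missing. The dictionary `inventory` is hypothetical and used only for illustration.

```python
# Hypothetical dictionary, used only for illustration
inventory = {'apples': 3}

missing = inventory.get('pears')      # None, and no error is raised
pears = inventory.get('pears', 0)     # 0, the default we supplied
apples = inventory.get('apples', 0)   # 3, the stored value

print(missing, pears, apples)

# Arithmetic is safe here because the default is a number, not None
total = pears + apples
print(total)
```

Supplying a default is one way to avoid the `TypeError` that appears when `None` is used in arithmetic.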

In [None]:
# Check if a key is in a dictionary
'Cleveland' in AL_Central

In [None]:
# A key that is not in the dictionary gives False
'Arizona' in AL_Central

In [None]:
# List out all the Keys in the dictionary
list(AL_Central.keys())

In [None]:
# List out all the values in the dictionary
list(AL_Central.values())

In [None]:
# Add an entry to the dictionary
AL_Central.update(Milwaukee = 'Brewers')

In [None]:
AL_Central

In [None]:
# How to delete an entry
del AL_Central['Milwaukee']

In [None]:
# How to update an entry
AL_Central.update(Cleveland = 'Guardians')

In [None]:
AL_Central

#### A script for storing scores

In [None]:
# fig06_01.py
"""Using a dictionary to represent an instructor's grade book."""
grade_book = {            
    'Susan': [92, 85, 100], 
    'Eduardo': [83, 95, 79],
    'Azizi': [91, 89, 82],  
    'Pantipa': [97, 91, 92] 
}

all_grades_total = 0
all_grades_count = 0

for name, grades in grade_book.items():
    total = sum(grades)
    print(f'Average for {name} is {total/len(grades):.2f}')
    all_grades_total += total
    all_grades_count += len(grades)
    
print(f"Class's average is: {all_grades_total / all_grades_count:.2f}")


#### A script for tokenizing a string and counting words

In [None]:
# fig06_02.py
"""Tokenizing a string and counting unique words."""

text = ('this is sample text with several words '
       'this is more sample text with some different words')
word_counts = {}

# count occurrences of each unique word
for word in text.split():
    if word in word_counts:
        word_counts[word] += 1  # update existing key-value pair
    else:
        word_counts[word] = 1  # insert new key-value pair

print(f'{"WORD":<12}COUNT')

for word, count in sorted(word_counts.items()):
    print(f'{word:<12}{count}')

print('\nNumber of unique words:', len(word_counts))

In [None]:
text = ('this is sample text with several words '
       'this is more sample text with some different words')
text.split()

In [None]:
from collections import Counter
text = ('this is sample text with several words '
       'this is more sample text with some different words')
Counter(text.split())

#### The Counter class

In [None]:
from collections import Counter
text = ('this is sample text with several words '
        'this is more sample text with some different words')

counter = Counter(text.split())

print(f'{"WORD":<12}COUNT')

for word, count in sorted(counter.items()):
    print(f'{word:<12}{count}')

print('Number of unique keys:', len(counter.keys()))
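
A `Counter` also offers a few conveniences beyond a plain dictionary. The sketch below reuses the same sample text and shows `most_common()` and the behavior for a missing key; the name `top_three` is made up for this example.

```python
from collections import Counter

text = ('this is sample text with several words '
        'this is more sample text with some different words')
counter = Counter(text.split())

# most_common(n) returns the n highest counts as (word, count) tuples
top_three = counter.most_common(3)
print(top_three)

# A Counter returns 0 for a missing key instead of raising a KeyError
print(counter['python'])
```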

### Sets

In [None]:
states = {'Minnesota','Wisconsin','Minnesota','California','Minnesota','South Dakota'}

In [None]:
states

In [None]:
len(states)

In [None]:
# We can convert a list to a set using the set() function
a = list(range(10))+list(range(5))
b = set(a)
print(a)
print(b)
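
Because a set is unordered, converting a list to a set does not promise to keep the original order. If the first-seen order matters, one option (a sketch, not the only approach) is `dict.fromkeys()`, which relies on dictionaries keeping insertion order. The list `items` is hypothetical.

```python
# Hypothetical list with duplicates, used only for illustration
items = ['b', 'a', 'b', 'c', 'a']

unique_any_order = set(items)                 # order is not guaranteed
unique_in_order = list(dict.fromkeys(items))  # keeps first-seen order

print(unique_in_order)  # ['b', 'a', 'c']
```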

#### Subsets and Supersets

In [None]:
a = {1,3,5,7,9,11}
b = {3,9,11}

In [None]:
a.issubset(b)

In [None]:
a.issuperset(b)
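
The same checks can also be written with comparison operators. This sketch reuses the two sets from above; the result names are made up for illustration.

```python
a = {1, 3, 5, 7, 9, 11}
b = {3, 9, 11}

b_in_a = b <= a         # same as b.issubset(a)
a_contains_b = a >= b   # same as a.issuperset(b)
a_in_b = a <= b         # same as a.issubset(b)
proper = b < a          # proper subset: b is a subset of a and b != a

print(b_in_a, a_contains_b, a_in_b, proper)  # True True False True
```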

#### Unions

![image.png](attachment:image.png)

In [None]:
x = {1,2,3,4}
y = {4,5,6,7,8}
x | y

In [None]:
# Alternatively
x.union(y)

#### Intersections

In [None]:
x & y

In [None]:
# Alternatively
x.intersection(y)

#### Difference

In [None]:
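# Elements in x that are not in y
x - y

In [None]:
# Elements in y that are not in x
y.difference(x)

In [None]:
# Symmetric difference: elements in exactly one of the two sets
x ^ y

In [None]:
# Alternatively
y.symmetric_difference(x)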
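
Unlike union and intersection, difference depends on the order of the operands. The sketch below uses the same two sets as above to show this, and to show that the symmetric difference is the union of the two one-sided differences. The result names are made up for illustration.

```python
x = {1, 2, 3, 4}
y = {4, 5, 6, 7, 8}

only_in_x = x - y   # {1, 2, 3}
only_in_y = y - x   # {5, 6, 7, 8}
in_one_only = x ^ y # {1, 2, 3, 5, 6, 7, 8}

print(only_in_x, only_in_y, in_one_only)

# The symmetric difference combines both one-sided differences
print(in_one_only == only_in_x | only_in_y)  # True
```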

#### Disjoint

In [None]:
x.isdisjoint(y)

In [None]:
y = {'apple','banana','orange'}
x.isdisjoint(y)

#### Adding and Deleting from a Set

In [None]:
# Add Elements
x.add(19)
x

In [None]:
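# Remove an element (raises a KeyError if the element is missing)
x.remove(4)
x

In [None]:
# discard() does nothing if the element is not in the set
x.discard(13)

In [None]:
# Raises a KeyError because 13 is not in the set
x.remove(13)

In [None]:
x.discard(2)
x

In [None]:
# Remove and return an arbitrary element
x.pop()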
x

In [None]:
x.clear()
x
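
The difference between `remove()` and `discard()` can be sketched in one cell by catching the error. The set `numbers` is hypothetical and used only for illustration.

```python
# Hypothetical set, used only for illustration
numbers = {1, 2, 3}

numbers.discard(99)   # 99 is missing, but discard() raises no error

try:
    numbers.remove(99)  # remove() raises a KeyError for a missing element
except KeyError:
    print('99 is not in the set')

removed = numbers.pop()  # removes and returns an arbitrary element
print(removed, numbers)
```

Use `discard()` when a missing element is acceptable, and `remove()` when a missing element should be treated as a mistake.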

### Exercise 1

Take the following message, tokenize it with the .split() method, and then extract all of the unique words.

### Exercise 2
Given the sets {10, 20, 30} and {5, 10, 15, 20}, use the mathematical set operators to produce the following sets:

a. {30}

b. {5, 15, 30}

c. {5, 10, 15, 20, 30}

d. {10, 20}

In [None]:
x = {10,20,30}
y = {5,10,15,20}

### Exercise 3
Associate each letter in the English alphabet with its position in the alphabet and put the pairs in a dictionary.
For example:
- {'a':1, 'b':2, 'c':3, 'd':4...'z':26}

You must NOT just type in each key-value pair. I suggest using a looping structure. Use the subscription operator to add key-value pairs to the dictionary.

### Data Analysis with Dictionaries

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import random 
import seaborn as sns
from collections import Counter

In [None]:
transmission = ['Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual',
                'Manual','Manual','Manual','Automatic','Manual','Manual','Manual','Manual','Manual',
                'Manual','Manual','Manual','Manual','Automatic','Manual','Manual','Manual','Manual',
                'Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual',
                'Manual','Manual','Manual','Manual','Automatic','Manual','Manual','Manual','Manual',
                'Automatic','Manual','Manual','Manual','Automatic','Automatic','Automatic','Automatic','Automatic',
                'Manual','Automatic','Manual','Manual','Manual','Automatic','Manual','Manual','Automatic','Automatic',
                'Automatic','Manual','Automatic','Manual','Manual','Manual','Manual','Manual','Manual','Manual',
                'Manual','Manual','Manual','Automatic','Automatic','Automatic','Manual','Manual','Automatic',
                'Manual','Automatic','Manual','Manual','Manual','Manual','Manual','Manual','Automatic',
                'Automatic','Manual','Automatic','Manual','Manual','Manual','Manual','Manual','Manual',
                'Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual',
                'Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual',
                'Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual',
                'Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual',
                'Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual',
                'Manual','Manual','Automatic','Manual','Manual','Manual','Automatic','Manual','Manual',
                'Manual','Manual','Manual','Automatic','Manual','Manual','Manual','Manual','Manual','Manual',
                'Manual','Automatic','Manual','Manual','Manual','Automatic','Automatic','Manual','Manual',
                'Manual','Manual','Manual','Automatic','Automatic','Manual','Manual','Manual','Manual','Manual',
                'Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual',
                'Manual','Manual','Manual','Manual','Automatic','Manual','Manual','Manual','Manual','Manual',
                'Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Automatic','Manual',
                'Manual','Automatic','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Automatic',
                'Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual',
                'Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Automatic',
                'Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual',
                'Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual',
                'Manual','Manual','Automatic','Manual','Manual','Manual','Manual','Manual','Manual','Manual',
                'Manual','Manual','Automatic','Manual','Automatic','Manual','Manual','Manual','Manual',
                'Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual','Manual']

#### We can use Counter() to get a dictionary of the counts

In [None]:
list(Counter(transmission).keys())

The .keys() and .values() methods give us the groups and counts, respectively.

In [None]:
groups = list(Counter(transmission).keys())
counts = list(Counter(transmission).values())
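
As a small variation, the Counter can be built once and reused, so the list is only counted a single time. The sketch below uses a short hypothetical list (`sample`) in place of the full `transmission` list, and the `_sample` names are made up for illustration.

```python
from collections import Counter

# Hypothetical short list standing in for the full transmission data
sample = ['Manual', 'Manual', 'Automatic', 'Manual']

sample_counter = Counter(sample)  # count once, then reuse the result
groups_sample = list(sample_counter.keys())
counts_sample = list(sample_counter.values())

print(groups_sample)  # ['Manual', 'Automatic']
print(counts_sample)  # [3, 1]
```

The keys and values come back in the same order, so each group lines up with its count.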

Creating the bar chart

In [None]:
sns.set_theme(style='darkgrid')
ax = sns.barplot(x = groups, y = counts)
ax.set_title('Transmission Types')
ax.set(xlabel='Transmission',ylabel='Frequency')
plt.show()

### Try it with some imported data

Make sure the 'churn.csv' dataset is in your working directory so that it can be loaded. This dataset contains customers who have phone and internet plans with a telecommunications company. From this dataset we can create some bar charts.

In [None]:
df = pd.read_csv('churn.csv')

When we load the file, its contents are stored as a DataFrame, which we save under the name "df". We can use the .head(x) method to peek at the top x rows of the dataset.

In [None]:
df.head(5)

To access a column (think of this as a list), we use the subscription operator. Start with the dataset name (df), then square brackets, and inside the square brackets put the name of the column we want to grab. Let's start with 'Phone_service'.

In [None]:
df['Phone_service']

We can use the Counter() function to count up the number of yes and no in this list. Let's save this dictionary as 'counter'

In [None]:
counter = Counter(df['Phone_service'])
counter

We will save the keys (group names) and the values (counts) for use in our bar chart.

In [None]:
groups = list(counter.keys())
counts = list(counter.values())

In [None]:
sns.set_theme(style='darkgrid')
ax = sns.barplot(x = groups, y = counts)
ax.set_title('Customers with Phone Service')
ax.set(xlabel='Does the Customer Have Phone Service',ylabel='Frequency')
plt.show()

### Exercise 4

Create a bar chart for the type of 'Internet_service' a customer has.