Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [None]:
NAME = ""
COLLABORATORS = ""

---

_HDS5210 Programming for Data Science_

# Dictionaries, Sets, and Tuples

In [1]:
# We create an empty dictionary using curly braces:
d = {}

In [2]:
d

{}

In [3]:
# Or we can initialize it with key / value pairs:
v = { 
    'a': 'Three', 
    'b': 3 }

In [4]:
v

{'a': 'Three', 'b': 3}

In [5]:
v = { 1: 'one', 2:'two'}

In [6]:
v

{1: 'one', 2: 'two'}

In [7]:
# The values within a dictionary can be of any type:
elements = { 
    'H' : {
        'name': 'Hydrogen',
        'number': 1,
        'isotopes': [1,2,3]
    },
    'He' : {
        'name': 'Helium',
        'number': 2,
        'isotopes': [2,3,4]
    }
}

In [8]:
elements

{'H': {'name': 'Hydrogen', 'number': 1, 'isotopes': [1, 2, 3]},
 'He': {'name': 'Helium', 'number': 2, 'isotopes': [2, 3, 4]}}

In [9]:
elements['H']['name']

'Hydrogen'

In [None]:
elements = {
    1 : {
        'symbol': 'H',
        'name': 'Hydrogen',
        'number': 1
    }
}

In [None]:
elements[1]['symbol']

In [10]:
# Note that the keys within a dictionary can only be used once.  
# The following isn't illegal, but it may not do what you expect it to do.

duplicate = {
    'name': 'Paul Boal',
    'name': 'Eric Westhus'
}

In [11]:
duplicate

{'name': 'Eric Westhus'}

## Accessing Keys/Values of the Dictionary

**Note that since Python 3.7, dictionaries keep their keys in the order they were inserted. In older versions the ordering was ARBITRARY. Either way, keys are never sorted alphabetically for you.**

In [None]:
# The [] operator calls __getitem__ behind the scenes
help(dict.__getitem__)

In [None]:
help(str.format)

In [None]:
elements = { 
    'H' : {
        'name': 'Hydrogen',
        'number': 1,
        'isotopes': [1,2,3]
    },
    'He' : {
        'name': 'Helium',
        'number': 2,
        'isotopes': [2,3,4]
    }
}

# for key in dictionary
for e in elements:
    print(e)
    print(elements[e])
    print("{:s} is short for {:s}".format(e, elements[e]['name']))
    print(e + " is short for " + elements[e]['name'])

In [None]:
for k, v in elements.items():
    print("{:s} is short for {:s}".format(k, v['name']))
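Indexing with `[]` raises a `KeyError` when the key is missing. Below is a minimal sketch of some safer ways to look things up. The name `elements_demo` and the missing key `'Li'` are made up for this illustration, and the dictionary is a trimmed-down copy of `elements` so the example runs on its own.

```python
# Trimmed-down copy of the elements dictionary (hypothetical name)
elements_demo = {
    'H': {'name': 'Hydrogen', 'number': 1},
    'He': {'name': 'Helium', 'number': 2},
}

# The "in" operator checks the keys
has_h = 'H' in elements_demo
has_li = 'Li' in elements_demo

# .get() returns a default (None unless you give one) instead of raising KeyError
missing = elements_demo.get('Li')
fallback = elements_demo.get('Li', {'name': 'Unknown'})

# [] on a missing key raises KeyError
try:
    elements_demo['Li']
    raised = False
except KeyError:
    raised = True

# .keys() and .values() can be turned into lists
symbols = list(elements_demo.keys())
names = [info['name'] for info in elements_demo.values()]
print(symbols, names)
```

Use `.get()` when a missing key is a normal situation, and `[]` when a missing key should be treated as an error.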

## Other ways of creating dictionaries

In [None]:
keys = ['one', 'two', 'three']
vals = [1,     2,     3      ]


In [None]:
d = dict(zip(keys, vals))

In [None]:
d

If key names are really simple, we can use a different syntax:

In [None]:
d = dict(one=1, two=2, three=3)

In [None]:
d

In [12]:
keys = ['one', 'two', 'three']
vals = [1,2,3]
dict(zip(keys, vals))

{'one': 1, 'two': 2, 'three': 3}

In [13]:
d = {}

In [14]:
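# Build the dictionary by hand; note this maps each value to its key,
# the reverse of dict(zip(keys, vals)) above
for index in range(len(keys)):
    d[vals[index]] = keys[index]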

In [15]:
d

{1: 'one', 2: 'two', 3: 'three'}

In [16]:
d = dict(zip(keys, vals))
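Another option, not shown above, is a dictionary comprehension. This is only a sketch of the idea. The names `forward` and `reverse` are made up for this illustration.

```python
keys = ['one', 'two', 'three']
vals = [1, 2, 3]

# Same result as dict(zip(keys, vals)), written as a comprehension
forward = {k: v for k, v in zip(keys, vals)}

# Swapping k and v builds the reverse mapping in one line
reverse = {v: k for k, v in zip(keys, vals)}

print(forward)
print(reverse)
```

A comprehension is handy when you want to transform or filter the keys and values while building the dictionary.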

## Other ways of looping over dictionaries

In [None]:
for abbr, info in elements.items():
    print("{:s} = {:s}".format(str(abbr),str(info)))

In [None]:
alpha = { 'a': 1, 'b': 2, 'd': 4, 'c': 3}
alpha

In [None]:
for letter in alpha:
    print(letter)

In [None]:
for letter in sorted(alpha):
    print(letter)
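`sorted(alpha)` sorts the keys. If you want to loop in order of the values instead, one possible approach is to sort the key / value pairs. The names `by_value` and `letters_desc` are made up for this illustration.

```python
alpha = {'a': 1, 'b': 2, 'd': 4, 'c': 3}

# Sort the (key, value) pairs by the value, which is position 1 of each pair
by_value = sorted(alpha.items(), key=lambda pair: pair[1])
for letter, number in by_value:
    print("{:s} = {:d}".format(letter, number))

# reverse=True sorts the keys in descending order
letters_desc = sorted(alpha, reverse=True)
print(letters_desc)
```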

## Ways to use dictionaries

In [None]:
Asprin100 = dict( drug="Aspirin", amount=100, mass_unit="mg", time_unit="hr")
Asprin100

In [None]:
dosages = [
    dict( drug="Aspirin", amount=100, mass_unit="mg", time_unit="hr"),
    dict( drug="Digoxin", amount=50,  mass_unit="mg", time_unit="hr")
]

In [None]:
dosages

In [None]:
for d in dosages:
    print("{:s} {:d} {:s}/{:s}".format(d["drug"],d["amount"],d["mass_unit"],d["time_unit"]))

In [None]:
Asprin100 = dict( drug="Aspirin", amount=100, mass_unit="mg", time_unit="hr")
Asprin100

In [None]:
Asprin100['drug name'] = Asprin100['drug']
del Asprin100['drug']
Asprin100

In [None]:
Asprin100 = {'drug name': 'Aspirin', 'amount':100, 'mass_unit': 'mg', 'time_unit': 'hr'}
Asprin100

# Inverting Dictionaries

The setup here is that we have a dictionary of patients keyed by Subject ID.  We want to re-key them by Patient ID.

In [None]:
subjects = {
    "A1": { "PatientID": "E143291", "Age": 19 },
    "B2": { "PatientID": "E872839", "Age": 32 },
    "C3": { "PatientID": "E878198", "Age": 19 },
    "D4": { "PatientID": "E871111", "Age": 21 },
    "E5": { "PatientID": "E143299", "Age": 3  },
    "F6": { "PatientID": "E123332", "Age": 21 },
    "H7": { "PatientID": "E989891", "Age": 19 }
} 

In [None]:
patients = {}
for subjectID, subject in subjects.items():
    patientID = subject['PatientID']
    age = subject['Age']
    patients[patientID] = { 'SubjectID': subjectID, 'Age': age }

patients
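The same re-keying can be sketched as a dictionary comprehension. This assumes every `PatientID` is unique. If two subjects shared one, the later entry would silently overwrite the earlier one. The names `subjects_small` and `patients_small` are made up for this illustration and use only two of the subjects.

```python
# Two of the subjects from above, so this example runs on its own
subjects_small = {
    "A1": {"PatientID": "E143291", "Age": 19},
    "B2": {"PatientID": "E872839", "Age": 32},
}

# New key is the PatientID; the old key moves inside as SubjectID
patients_small = {
    subject["PatientID"]: {"SubjectID": subject_id, "Age": subject["Age"]}
    for subject_id, subject in subjects_small.items()
}

print(patients_small)
```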

Here we're going to regroup instead of just reorganize!

In [None]:
patient_ages = {
    "E143291": 19,
    "E872839": 32,
    "E878198": 19,
    "E871111": 21,
    "E143299": 3,
    "E123332": 21,
    "E989891": 19
} 

In [None]:
age_counts = {}
for pat_id, age in patient_ages.items():
    if age in age_counts:
        age_counts[age] += 1
    else:
        age_counts[age] = 1
        
age_counts

In [None]:
age_counts = {}
for pat_id, age in patient_ages.items():
    age_counts.setdefault(age, 0)
    age_counts[age] += 1
    
age_counts
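Counting like this is common enough that the standard library has a tool for it. As a sketch of an alternative to the loops above, `collections.Counter` does the counting in one step:

```python
from collections import Counter

patient_ages = {
    "E143291": 19,
    "E872839": 32,
    "E878198": 19,
    "E871111": 21,
    "E143299": 3,
    "E123332": 21,
    "E989891": 19
}

# Counter tallies how many times each value appears
age_counts = Counter(patient_ages.values())
print(age_counts)

# Ages that never appear count as 0 instead of raising KeyError
print(age_counts[99])

# most_common(n) returns the n most frequent (value, count) pairs
print(age_counts.most_common(1))
```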

In [None]:
age_patients = {}
for pat_id, age in patient_ages.items():
    print(pat_id, age)
    
    if age in age_patients:
        print(age, "is already here")
        age_patients[age].append(pat_id)
    else:
        print(age, "is not here")
        age_patients[age] = [pat_id]

age_patients

In [None]:
age_patients = {}
for pat_id, age in patient_ages.items():
    print(pat_id, age)
    
    age_patients.setdefault(age,[])
    age_patients[age].append(pat_id)

age_patients

In [None]:
age_patients = {}
for pat_id, age in patient_ages.items():
    print(pat_id, age)

    if age not in age_patients:
        age_patients[age] = []
    
    age_patients[age].append(pat_id)
    print("Age {:d} has {:d} patients".format(age, len(age_patients[age])))

age_patients

for age, pat_list in age_patients.items():
    print("-- Age {:d} has {:d} patients".format(age, len(pat_list)))
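The "create an empty list if the key isn't there yet" step can also be handled by `collections.defaultdict`. This is a sketch of an alternative to the `setdefault` and `if age not in ...` versions above, not a replacement for understanding them.

```python
from collections import defaultdict

patient_ages = {
    "E143291": 19,
    "E872839": 32,
    "E878198": 19,
    "E871111": 21,
    "E143299": 3,
    "E123332": 21,
    "E989891": 19
}

# defaultdict(list) creates an empty list the first time a key is used
age_patients = defaultdict(list)
for pat_id, age in patient_ages.items():
    age_patients[age].append(pat_id)

for age, pat_list in age_patients.items():
    print("-- Age {:d} has {:d} patients".format(age, len(pat_list)))
```

One thing to watch for: simply reading a missing key from a `defaultdict` with `[]` adds that key with an empty list.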
    

# Sets

Sets are an unordered collection of unique values - there won't be any duplicates, and you can't look items up by position the way you can with a list.

You can also think about it as just a dictionary with only keys - remember that dictionaries can only have one entry for each key.

In [None]:
sex = {'M', 'F', 'U', 'O'}
sex

In [None]:
sex = {'M', 'F', 'U', 'M', 'F', 'O'}
sex

In [None]:
sex.add('T')
sex

You can also do all kinds of other set operations, such as union, intersection, and difference. See Chapter 11, p. 202.
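Here is a short sketch of a few of those operations. The names `recorded` and `allowed` and their values are made up for this illustration.

```python
recorded = {'M', 'F', 'U'}
allowed = {'U', 'O'}

union = recorded | allowed          # values in either set
common = recorded & allowed         # values in both sets
only_recorded = recorded - allowed  # values in recorded but not in allowed
in_one_only = recorded ^ allowed    # values in exactly one of the sets

# Membership tests work like they do for dictionary keys
is_member = 'M' in recorded

# set() removes duplicates from a list
unique_ages = set([19, 32, 19, 21, 3, 21, 19])
print(len(unique_ages))
```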

# Tuples

There's another special kind of ordered sequence called a tuple.  What's special about tuples is that they can't be altered once they're created, even though mutable objects inside them (such as lists) can be.  Weird.
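A small sketch of what that means in practice. The name `point` and its values are made up for this illustration.

```python
point = (1, 2, [3, 4])

# Trying to replace an item in a tuple raises TypeError
try:
    point[0] = 10
    changed = True
except TypeError:
    changed = False

# But the list stored inside the tuple can still be modified
point[2].append(5)

print(changed)
print(point)
```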

One thing you can do with tuples is assign to multiple variables at once.

In [None]:
(a, b) = (1, 2) 
a

In [None]:
b
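This is the same mechanism behind `for k, v in some_dict.items()`: each item is a (key, value) tuple that gets unpacked into two variables. A brief sketch, with names made up for this illustration:

```python
# The parentheses are optional when unpacking
x, y = 1, 2

# Swapping two variables without a temporary variable
x, y = y, x

# .items() produces (key, value) tuples
pairs = list({'one': 1, 'two': 2}.items())
first_key, first_val = pairs[0]

print(x, y)
print(pairs)
```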