# Week-12: Python Dictionary and Machine Learning

## 0. Recap from HW6

<font size='4'>

* The TA reported that many students got different results for HW6 Q1-Q2 even with the same random seed.
* To create a reproducible workflow, please click Run -> Run All Cells to refresh from scratch.
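
One plausible reason for the mismatch (an assumption, since the TA's report does not name the cause) is that a random number generator keeps internal state. Re-running a drawing cell without re-running the seeding cell continues the random stream instead of restarting it. A minimal sketch with the standard-library `random` module follows; the seed value `2024` is arbitrary, and NumPy's generators keep state in the same way.

```python
import random

random.seed(2024)            # "seeding cell": run once at the top
first_draw = random.random()

# Re-running only the drawing cell continues the same stream,
# so the value differs from the first draw.
second_draw = random.random()

# Restarting from scratch (seed again, then draw) reproduces the first draw.
random.seed(2024)
fresh_draw = random.random()

print(first_draw == second_draw)  # False
print(first_draw == fresh_draw)   # True
```

Run -> Run All Cells plays the role of the last step: every cell, including the seeding cell, is executed again in order.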

In [2]:
import numpy as np
import sklearn

## 1. Dictionary

### 1.1. Initialize a dictionary in Python

<font size='4'>

* A *dictionary* in Python is a collection of items accessed by a specific *key* rather than by an *index*.
* It is usually initialized by `{}` or `dict()`.

In [3]:
# 1.1.1
dict_a = {'fruit': 'apple',
          'vegetable': 'lettuce',
          'dessert': 'tiramisu'}
print(dict_a)

{'fruit': 'apple', 'vegetable': 'lettuce', 'dessert': 'tiramisu'}


<font size='4'>

* <img src="figures/dict_mapping.png" alt="drawing" width="600"/>
* Each `key` maps to a `value`, creating a map-like structure.
    * If you remove `keys` from the picture, all you are left with is a data structure containing a sequence of numbers.
    * Therefore, `dict` in Python holds a `key:value` pair at each position.
* Note that
    * `key` must be hashable; strings are the most common choice, but numbers and tuples also work;
    * There is a colon `:` between `key` and `value`;
    * After specifying one `key:value` pair, add a comma `,` to separate;
    * The entire dict should be enclosed in `{ }` or built with `dict()`.
* If you would like to access the value stored under a key, use the syntax `dict_name['key_name']`
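
Keys are most often strings, but any hashable object can serve as a key. The short sketch below builds a dictionary with the `dict()` constructor and another with non-string keys; the names `menu` and `mixed_keys` are made up for illustration.

```python
# Same content as dict_a, built with the dict() constructor
# (the keyword form only works when keys are valid identifiers).
menu = dict(fruit='apple', vegetable='lettuce', dessert='tiramisu')

# Keys only need to be hashable: ints and tuples work too.
mixed_keys = {1: 'one', (2, 3): 'a tuple key', 'four': 4}

print(menu['dessert'])       # tiramisu
print(mixed_keys[(2, 3)])    # a tuple key
```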

In [4]:
# 1.1.2
print(dict_a['fruit'])

apple


In [5]:
# 1.1.3
print(dict_a[0])
# raises a KeyError because a dictionary is accessed by key, not by positional index

KeyError: 0
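
If you are not sure whether a key exists, two built-in options avoid the `KeyError`: the `in` operator and the `.get()` method. This is a small sketch that reuses `dict_a` from above; the default string `'not found'` is just an illustrative choice.

```python
dict_a = {'fruit': 'apple', 'vegetable': 'lettuce', 'dessert': 'tiramisu'}

# The membership test looks at keys, not values.
print('fruit' in dict_a)     # True
print(0 in dict_a)           # False

# .get() returns a default (None unless specified) instead of raising KeyError.
missing = dict_a.get(0, 'not found')
print(missing)               # not found
```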

### 1.2. Data types in Python Dictionary

<font size='4'>

* The `value` in Python dictionary can have any data type.

In [6]:
# 1.2.1
dict_b = {'one': 1, 'two': 'to', 'three': 3.0, 'four': [4, 4.0]}
print(dict_b)

{'one': 1, 'two': 'to', 'three': 3.0, 'four': [4, 4.0]}


<font size='4'>

* You can update the value associated with a particular `key`.
* For example, we would like to update the value of the key `one` from `1` to `1.0`.

In [9]:
# 1.2.2
dict_b['one'] = 1.0
print(dict_b)
print(dict_b['one'])

{'one': 1.0, 'two': 'to', 'three': 3.0, 'four': [4, 4.0]}
1.0


<font size='4'>

* You can delete one key-value pair using the `del` statement.
* For example, we would like to delete the content associated with the key `one` from `dict_b`.

In [10]:
# 1.2.3
del dict_b['one']
print(dict_b)

{'two': 'to', 'three': 3.0, 'four': [4, 4.0]}


<font size='4'>

* You can remove all key-value pairs using the `.clear()` method.
* You can delete the dictionary `dict_b` itself using the `del` keyword.
* Think about the difference between the following two commands.
    * After `.clear()`, `dict_b` still exists but is empty; after `del dict_b`, printing `dict_b` raises a `NameError`.

In [11]:
# 1.2.4
dict_b.clear()
print(dict_b)

{}


In [13]:
# 1.2.5
dict_b = {'one': 1, 'two': 'to', 'three': 3.0, 'four': [4, 4.0]}
del dict_b
print(dict_b)

NameError: name 'dict_b' is not defined
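
The difference becomes clearer when a second name points at the same dictionary. In the sketch below, `alias` is a hypothetical extra name introduced only for illustration: `.clear()` empties the shared object, while `del` removes only the name `dict_b`.

```python
dict_b = {'one': 1, 'two': 'to'}
alias = dict_b               # second name for the same dictionary object

dict_b.clear()               # empties the object in place
print(alias)                 # {}  (the alias sees the change)

dict_b = {'one': 1, 'two': 'to'}
alias = dict_b
del dict_b                   # removes only the name dict_b
print(alias)                 # {'one': 1, 'two': 'to'}  (the object is still alive)

try:
    dict_b                   # the name no longer exists
except NameError as err:
    name_error_message = str(err)
print(name_error_message)
```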

<font size='4'>

* Remember that a key has to be **unique** in a dictionary; no duplicates are allowed.
* However, when a key is duplicated, Python does not raise an error; it keeps only the last key-value pair for that key.

In [17]:
# 1.2.6
dict_c = {'a1':'cake', 'a2': 'cookie', 'a1': 'ice-cream'}
print(dict_c)

{'a1': 'ice-cream', 'a2': 'cookie'}


## 2. Python Dict Comprehension

<font size='4'>

* Dictionary comprehension is a method to transform one dictionary into another.
* Items within the original dictionary can be conditionally included in the new dictionary, and each item can be transformed as needed.
* The way to perform dictionary comprehension is to access `key` objects and `values` objects of a dictionary.
* I will walk through with a couple of examples.

In [19]:
# 2.0.1
dict_c = {'a': 1, 'b': [2,2], 'c': [3,3,3], 'd': [4,4,4,4]}
print(dict_c.keys())
print(list(dict_c.keys()))
# access the key

dict_keys(['a', 'b', 'c', 'd'])
['a', 'b', 'c', 'd']


In [20]:
# 2.0.2
dict_c.values()
print(dict_c.values())
print(list(dict_c.values()))
# .values() returns a view of all values saved in `dict_c`; list() converts the view to a list

dict_values([1, [2, 2], [3, 3, 3], [4, 4, 4, 4]])
[1, [2, 2], [3, 3, 3], [4, 4, 4, 4]]


### 2.1. Basic value transformation

<font size='4'>

* Let's start with a simple dictionary comprehension.

In [23]:
# 2.1.1
dict_simple = {'a':1, 'b': 2, 'c': 3, 'd': 4}

In [24]:
# 2.1.2 double the value
double_dict_simple = {key_iter: value_iter*2 for (key_iter, value_iter) in dict_simple.items()}
print(double_dict_simple)

{'a': 2, 'b': 4, 'c': 6, 'd': 8}


In [25]:
# 2.1.3 "double" the key
# What is the change of the key_iter?
double_dict_simple_2 = {key_iter*2: value_iter for (key_iter, value_iter) in dict_simple.items()}
print(double_dict_simple_2)

{'aa': 1, 'bb': 2, 'cc': 3, 'dd': 4}


In [26]:
'a' * 2

'aa'

### 2.2. Using the `.items()` method

<font size='4'>

* You can access each key-value pair within a dictionary using the `.items()` method.

In [28]:
# 2.2.1
dict_simple.items()
print(dict_simple.items())
print(list(dict_simple.items()))

dict_items([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]


<font size='4'>

* The general template for dictionary comprehension in Python:
* `dict_variable = {key: value for (key, value) in dictionary.items()}`.
* We can make it more complex to achieve advanced goals.
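
As one small step beyond the template, both the key expression and the value expression can be any expression built from the loop variables. The sketch below swaps them; the name `inverted` is made up, and the swap is only safe when the original values are unique and hashable.

```python
dict_simple = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# The key and value expressions trade places.
inverted = {value: key for (key, value) in dict_simple.items()}
print(inverted)   # {1: 'a', 2: 'b', 3: 'c', 4: 'd'}
```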

### 2.3. Using the `.fromkeys()` method

<font size='4'>

* This method allows you to create dictionaries with a uniform value for a specified set of keys.
* It is useful when initializing dictionaries with default values.

In [29]:
# 2.3.1
dict_d = dict.fromkeys(range(5), True)
print(dict_d)

{0: True, 1: True, 2: True, 3: True, 4: True}
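
One caveat worth knowing: `.fromkeys()` stores the *same* object under every key. That is harmless for immutable defaults such as `True`, but a mutable default such as a list ends up shared. The sketch below (with made-up names `shared` and `separate`) contrasts it with a dictionary comprehension, which creates a fresh list per key.

```python
shared = dict.fromkeys(['x', 'y'], [])
shared['x'].append(1)        # mutates the single list stored under both keys
print(shared)                # {'x': [1], 'y': [1]}

separate = {key: [] for key in ['x', 'y']}
separate['x'].append(1)      # only the list under 'x' changes
print(separate)              # {'x': [1], 'y': []}
```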


## 3. Why Use Dictionary Comprehension

<font size='4'>

* Dictionary comprehension is a powerful concept that can be used to substitute `for` loops and `lambda` functions.
* Not every `for` loop can be written as a dictionary comprehension, but every dictionary comprehension can be written as a `for` loop.
* Let's look at the following example.

In [32]:
# 3.0.1
num_vec = range(10)
new_dict_for = {}
for n_iter in num_vec:
    if n_iter % 2 == 0:
        new_dict_for[n_iter] = n_iter**2
print(new_dict_for)

{0: 0, 2: 4, 4: 16, 6: 36, 8: 64}


<font size='4'>

* We can simplify it using dictionary comprehension.

In [33]:
# 3.0.2
new_dict_comp = {n: n**2 for n in num_vec if n % 2 ==0}
print(new_dict_comp)

{0: 0, 2: 4, 4: 16, 6: 36, 8: 64}
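
For contrast, here is a sketch of a loop that does not translate directly into a single comprehension: each update depends on what the dictionary already contains. The list `words` is invented for illustration.

```python
words = ['apple', 'banana', 'apple', 'cherry', 'banana', 'apple']

word_count = {}
for word in words:
    # Each update reads the current count, which a comprehension
    # cannot refer to while the dictionary is still being built.
    word_count[word] = word_count.get(word, 0) + 1
print(word_count)   # {'apple': 3, 'banana': 2, 'cherry': 1}
```

The same result can still be obtained with a comprehension over `set(words)` and `words.count(...)`, but that rescans the list once per distinct word, so the loop is usually the clearer choice here.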


### 3.1. A detailed python dictionary comprehension example (converting F to C)

<font size='4'>

1. Define the formula that converts Fahrenheit to Celsius; here it is written as a `lambda` function.
2. Pass this function as an argument to the `map()` function, which applies it to every item in `f_dict.values()`.
    * Recall that the `.values()` method returns a view of the values stored in the dictionary.
3. Wrapping the result in `list()` gives a list `c_val` containing the corresponding temperatures in Celsius.
4. Finally, convert it to a dictionary using the `zip()` function.
    * In particular, `zip()` pairs each item of `f_dict.keys()` with the matching item of `c_val`, and `dict()` turns those pairs into key-value pairs.

In [36]:
# 3.1.1
# Step 0: initialize fahrenheit dict
f_dict = {'t1': -30, 't2': -20, 't3': -10, 't4': 0}
# Step 1: convert it to celsius
convert_f_to_c_fn = lambda x: (x-32)/1.8
# Step 2-3: 
c_val = list(map(convert_f_to_c_fn, f_dict.values()))
print(c_val)
# Step 4: create celsius dictionary
c_dict = dict(zip(f_dict.keys(), c_val))
print(c_dict)


[-34.44444444444444, -28.88888888888889, -23.333333333333332, -17.77777777777778]
{'t1': -34.44444444444444, 't2': -28.88888888888889, 't3': -23.333333333333332, 't4': -17.77777777777778}
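
The same conversion can also be sketched as a single dictionary comprehension, which replaces the `lambda`, `map()`, and `zip()` steps with one expression. The name `c_dict_comp` is introduced here only to keep it apart from `c_dict`.

```python
f_dict = {'t1': -30, 't2': -20, 't3': -10, 't4': 0}

# The value expression applies the Fahrenheit-to-Celsius formula directly.
c_dict_comp = {key: (value - 32) / 1.8 for (key, value) in f_dict.items()}
print(c_dict_comp)
```

The printed dictionary should match `c_dict` from cell 3.1.1, since the same formula is applied to the same values.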


In [39]:
# 3.1.2
# Practice (converting C to F)
# Write down your own code
c_dict = {'t1': 0, 't2': 5, 't3': 10, 't4': 15}
convert_c_to_f_fn = lambda x: x * 1.8 + 32
f_val = list(map(convert_c_to_f_fn, c_dict.values()))
print(f_val)
f_dict = dict(zip(c_dict.keys(), f_val))
print(f_dict)
# F = C * 1.8 + 32

[32.0, 41.0, 50.0, 59.0]
{'t1': 32.0, 't2': 41.0, 't3': 50.0, 't4': 59.0}


## 4. Adding conditionals to dictionary comprehension
### 4.1. If conditions

<font size='4'>

* Suppose that you create a new dictionary from an existing one but would like to include only the items whose values are greater than 2.

In [40]:
# 4.1.1
dict_1 = {'a': 1, 'b':2, 'c': 3, 'd': 4, 'e': 5}
dict_1_cond = {k:v for (k,v) in dict_1.items() if v>2}
print(dict_1_cond)

{'c': 3, 'd': 4, 'e': 5}


### 4.2. Multiple if conditions

<font size='4'>

* What if you want to get items larger than 2 and check if they are even numbers simultaneously?
* The consecutive `if` statements work as if they had `and` clauses between them.

In [43]:
# 4.2.1
dict_1_double_cond = {k:v for (k,v) in dict_1.items() if v>2 if v%2 == 0}
print(dict_1_double_cond)

{'d': 4}


In [45]:
# 4.2.2 another example
dict_2 = {'a': 1, 'b':2, 'c': 3, 'd': 4, 'e': 5, 'f': 6}
dict_2_triple_cond = {k:v for (k,v) in dict_2.items() if v>2 if v%2 == 0 if v%3 == 0}
print(dict_2_triple_cond)

{'f': 6}


In [46]:
# 4.2.3 In a for loop, the above procedure corresponds to:
dict_2_triple_cond = {}
for (k,v) in dict_2.items():
    if (v>2 and v%2 ==0 and v%3==0):
        dict_2_triple_cond[k] = v
print(dict_2_triple_cond)

{'f': 6}


### 4.3. If-else conditions

<font size='4'>

* Suppose that you create a new dictionary to determine whether the value is an odd or even number.
* Treat `('even' if v%2 == 0 else 'odd')` as a whole thing.

In [47]:
# 4.3.1
dict_3 = {'a': 1, 'b':2, 'c': 3, 'd': 4, 'e': 5, 'f': 6}
# identify odd and even values
dict_3_if_else = {k:('even' if v%2==0 else 'odd') for (k,v) in dict_3.items()}
print(dict_3_if_else)

{'a': 'odd', 'b': 'even', 'c': 'odd', 'd': 'even', 'e': 'odd', 'f': 'even'}
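
The two kinds of `if` sit in different places and can be combined: the if-else expression in the value position chooses *what* to store, while a trailing `if` decides *whether* the item is kept at all. A small sketch, with the made-up name `dict_3_filtered`:

```python
dict_3 = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6}

# The trailing `if v > 2` filters items; the parenthesized if-else picks the value.
dict_3_filtered = {k: ('even' if v % 2 == 0 else 'odd')
                   for (k, v) in dict_3.items() if v > 2}
print(dict_3_filtered)   # {'c': 'odd', 'd': 'even', 'e': 'odd', 'f': 'even'}
```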


### 4.4. Nested dictionary comprehension

<font size='4'>

* Define `outer_k`, `outer_v`, `inner_k`, and `inner_v` first!

In [None]:
# 4.4.1


<font size='4'>

* `outer_k` refers to `'first'` and `'second'`.
* `outer_v` is another dictionary, e.g., `{'a': 1}`.
* `inner_k` refers to `'a'` and `'b'`.
* `inner_v` is `1` and `2`.

In [None]:
# 4.4.2
# Since inner_k is not used, we can replace it with _.
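
The two cells above are left blank in these notes, so the following is only a sketch of what a nested dictionary comprehension could look like. The input `nested_dict` and the names `float_dict` and `inner_sums` are assumptions, chosen to be consistent with the description of `outer_k`, `outer_v`, `inner_k`, and `inner_v`; they are not the original in-class code.

```python
# Assumed input: the notes only say the outer keys are 'first' and 'second',
# the inner keys are 'a' and 'b', and the inner values are 1 and 2.
nested_dict = {'first': {'a': 1}, 'second': {'b': 2}}

# Nested comprehension: rebuild every inner dictionary with float values.
float_dict = {outer_k: {inner_k: float(inner_v)
                        for (inner_k, inner_v) in outer_v.items()}
              for (outer_k, outer_v) in nested_dict.items()}
print(float_dict)   # {'first': {'a': 1.0}, 'second': {'b': 2.0}}

# When the inner key is not needed, `_` marks it as deliberately unused.
inner_sums = {outer_k: sum(inner_v for (_, inner_v) in outer_v.items())
              for (outer_k, outer_v) in nested_dict.items()}
print(inner_sums)   # {'first': 1, 'second': 2}
```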


## 5. Intro to Machine Learning
### 5.1. What is Machine Learning?
<font size='4'>

* You may hear the keywords "Machine Learning" numerous times.
* Machine learning is a subfield of artificial intelligence (AI) devoted to understanding and building models to imitate the way human beings learn.
<img src="figures/ml_category.png" alt="drawing" width="800"/>

* It uses algorithms and data to improve performance on some set of tasks and often falls into one of three common types of learning: Supervised, Unsupervised, and Reinforcement Learning.

* Supervised Learning: Learn the relationship between input and output; the two main tasks are classification and regression.
    * *Classification* techniques predict **categorical** responses, for example, whether the drug alleviates the symptoms, whether a tumor is benign, etc. Classification models classify input data into categories. Typical applications include medical imaging, image and speech recognition, and credit scoring.
        * You will perform a binary classification using the P300 data in HW7(8) using various ML methods.
    * *Regression* techniques predict **continuous** responses, for example, changes in biomarkers or psychiatric scores (GAD-7, PHQ-9).
    * In most of your existing work, regression serves the "association" purpose (also called "statistical inference"). However, you can also apply regression for "prediction", which follows a different procedure.
* Unsupervised Learning: Learn the underlying structure or pattern of an unlabelled dataset (only input data), i.e., clustering.
  * *Clustering* is the most common unsupervised learning technique. It is usually used for exploratory data analysis to find hidden patterns or groupings in data. 
* Optional: Reinforcement Learning: A software `agent` learns to perform certain `actions` in an environment that returns certain `rewards` (usually with the goal of maximizing the rewards).
    * A typical example is a personalized recommendation system. A search engine or social media platform pushes certain content (videos, reels, advertisements) to you based on your past selections and browsing history.
    * Another example is a physician deciding on a subsequent treatment for a patient based on the response to the previous treatment.

### 5.2. Selecting the right algorithm.

<font size='4'>

* There is no single best method and no one-size-fits-all solution; finding the right algorithm is partly based on trial and error.
* Highly flexible models (e.g., neural networks) tend to overfit data by modeling minor variations that could be noise.
* Simple models are easier to interpret but may lead to poor accuracy.
* However, you can borrow the following flowchart to try the proper methods.
<img src="figures/ml_method_types_flowchart.png" alt="drawing" width="1000"/>


### 5.3. Workflow of machine learning.

<font size='4'>

<img src="figures/ml_workflow.png" alt="drawing" width="800"/>

* If your goal is to predict, i.e., reduce the misclassification rate (classification) or minimize the mean squared difference between predicted and observed values (regression), you need a training/testing split. We train/fit the model on the training set and evaluate the performance of the model on the (separate) testing set.
* If your goal is to examine the association or find hidden patterns, you do not need a training/testing split.
<img src="figures/ml_training_test_split.png" alt="drawing" width="1200"/>
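
To make the split concrete, here is a minimal pure-Python sketch of the idea. The helper `split_train_test`, its parameters, and the 80/20 ratio are illustrative assumptions, not part of the course code; in practice, scikit-learn's `train_test_split` (in `sklearn.model_selection`) does this job.

```python
import random

def split_train_test(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and cut it into (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)     # local generator, reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(10))                        # stand-in for 10 observations
train_set, test_set = split_train_test(data, test_fraction=0.2, seed=0)
print(len(train_set), len(test_set))          # 8 2
```

Every observation lands in exactly one of the two sets, so the model is never evaluated on data it was fitted on.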


### 5.4. `Scikit-learn` package in Python

<font size='4'>

* We will go over an example in `sklearn` to demonstrate the pipeline of classification in Python next time.
* Make sure `scikit-learn` is properly installed in the Python environment that PyCharm uses.
* `import sklearn`