# 90-803 Machine Learning Foundations with Python
### Spring 2025 / Lab-1: Our Python environment and two ML tools

#### Your name: `Joanna Chang`
#### Your Andrew Id: `joannac2`

#### Instructions
1. Rename this notebook to `lab-1-andrewid.ipynb`
2. Complete the code for `one_hot_encoding` and `confusion_matrix`
3. Submit the notebook via the Canvas box for lab-1

## Problem 1: One Hot Encoding

A recommended way of encoding nominal values (e.g., colors or car models) as numbers is called one-hot encoding. The idea is simple: if there are 5 unique nominal values, each value is encoded as a vector of 5 bits in which exactly one bit is set to 1, and each unique value sets a different bit. The nominal values are sorted alphabetically before encoding, so the first value in sorted order sets the first bit, the second value sets the second bit, and so on.

For example,
```
> one_hot_encoding(['alpha','beta', 'gamma'])

[['alpha', [1, 0, 0]], ['beta', [0, 1, 0]], ['gamma', [0, 0, 1]]]

> one_hot_encoding(['ford','honda', 'toyota', 'mazda', 'subaru'])

[['ford', [1, 0, 0, 0, 0]],
 ['honda', [0, 1, 0, 0, 0]],
 ['mazda', [0, 0, 1, 0, 0]],
 ['subaru', [0, 0, 0, 1, 0]],
 ['toyota', [0, 0, 0, 0, 1]]]
```
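The same rule applies when a column of data contains repeated values: the categories are the unique values in sorted order, and each observation is replaced by the bit vector of its category. A minimal sketch, using a hypothetical helper `encode_column` (not one of the lab's required functions) and assuming the values are strings:

```python
def encode_column(values):
    """Map each observation to its one-hot row (hypothetical helper).

    The categories are the unique values sorted alphabetically, so the
    number of bits per row equals the number of distinct categories.
    """
    categories = sorted(set(values))
    # Position of each category in sorted order, i.e. which bit it sets
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows

cats, rows = encode_column(['red', 'green', 'red', 'blue'])
print(cats)  # ['blue', 'green', 'red']
print(rows)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Building the `index` dictionary once makes each lookup constant time, instead of searching the category list for every observation.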

```python
def one_hot_encoding(nominals):
    '''
    Convert a list of nominal values to one-hot binary vectors
    one_hot_encoding(['alpha','beta', 'gamma']) ->
    [['alpha', [1, 0, 0]], ['beta', [0, 1, 0]], ['gamma', [0, 0, 1]]]
    '''
    # Keep each unique value once, then sort alphabetically
    nominals_sorted = sorted(set(nominals))
    n = len(nominals_sorted)

    # Value i (in sorted order) gets a vector of n bits with only bit i set
    result = []
    for i, value in enumerate(nominals_sorted):
        one_hot = [0] * n
        one_hot[i] = 1
        result.append([value, one_hot])

    return result

# Example usage
print(one_hot_encoding(['alpha', 'beta', 'gamma']))
```

```
[['alpha', [1, 0, 0]], ['beta', [0, 1, 0]], ['gamma', [0, 0, 1]]]
```


```python
one_hot_encoding(['ford','honda', 'toyota', 'mazda', 'subaru'])
```

```
[['ford', [1, 0, 0, 0, 0]],
 ['honda', [0, 1, 0, 0, 0]],
 ['mazda', [0, 0, 1, 0, 0]],
 ['subaru', [0, 0, 0, 1, 0]],
 ['toyota', [0, 0, 0, 0, 1]]]
```
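Since every category sets a different single bit, the encoding can be inverted: the position of the 1 identifies the category. A minimal sketch with a hypothetical `decode_one_hot` helper that takes the `[value, bits]` pairs in the format `one_hot_encoding` returns (here copied from the output above):

```python
def decode_one_hot(bits, encoding):
    """Recover the nominal value whose one-hot vector equals `bits`.

    `encoding` is a list of [value, bits] pairs in sorted order, where
    value i sets bit i. This helper is hypothetical, not part of the lab.
    """
    # A valid one-hot vector has one bit per category and exactly one 1
    if len(bits) != len(encoding):
        raise ValueError(f"expected {len(encoding)} bits, got {len(bits)}")
    if sum(bits) != 1 or any(b not in (0, 1) for b in bits):
        raise ValueError(f"not a one-hot vector: {bits}")
    # The index of the 1 is the value's position in sorted order
    return encoding[bits.index(1)][0]

cars = [['ford', [1, 0, 0, 0, 0]],
        ['honda', [0, 1, 0, 0, 0]],
        ['mazda', [0, 0, 1, 0, 0]],
        ['subaru', [0, 0, 0, 1, 0]],
        ['toyota', [0, 0, 0, 0, 1]]]
print(decode_one_hot([0, 0, 1, 0, 0], cars))  # mazda
```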

## Problem 2: Generating a Confusion Matrix

```python
import random
```

```python
N = 10
```

```python
random.seed(10)
actual = [random.choice([0, 1]) for _ in range(N)]
actual
```

```
[0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
```

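```python
random.seed(20)

def flip(x):
    # With probability about 0.1 (random.random() > 0.9), flip the label
    if random.random() > 0.9:
        return 0 if x == 1 else 1
    # Otherwise keep the actual label unchanged
    return x

predicted = [flip(x) for x in actual]
predicted
```

```
[1, 1, 1, 1, 0, 1, 0, 1, 0, 0]
```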
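The `flip` function above injects label noise through the global `random` module, so its result depends on the global random state (reseeded with `random.seed(20)` just before it runs). A variant sketch, with a hypothetical `add_label_noise` helper whose parameters `flip_prob` and `seed` are chosen for illustration, draws from its own `random.Random` instance instead. Because it tests `rng.random() < flip_prob` rather than `> 0.9`, its flips need not match `predicted` above even with the same seed.

```python
import random

def add_label_noise(labels, flip_prob=0.1, seed=20):
    """Flip each binary label independently with probability flip_prob.

    Hypothetical variant of flip(): it uses a private random.Random(seed),
    so it neither depends on nor changes the global random state.
    """
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so "< flip_prob" holds with probability flip_prob
    return [1 - y if rng.random() < flip_prob else y for y in labels]

labels = [0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
print(add_label_noise(labels))                 # random, but the same on every run
print(add_label_noise(labels, flip_prob=0.0))  # no flips: identical to labels
```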

```python
def show(lst):
    # Print all values on one line, each right-aligned in a 4-character column
    for x in lst:
        print("%4d" % x, end='')
    print()

show(predicted)
show(actual)
```

```
   1   1   1   1   0   1   0   1   0   0
   0   1   1   0   0   1   1   1   0   0
```


```python
def confusion_matrix(actual, predicted):
    '''
    Given the actual values and predicted values, generate a confusion matrix.
    Rows are the actual class and columns the predicted class:

                     Predicted
                      0    1
        Actual   0   TN   FP
                 1   FN   TP

    Returned as [[TN, FP], [FN, TP]]
    '''
    # Initialize the counts for TN, FP, FN, TP
    TN = FP = FN = TP = 0

    # Compare each actual value with its prediction and update one count
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            TP += 1
        elif a == 0 and p == 0:
            TN += 1
        elif a == 1 and p == 0:
            FN += 1
        elif a == 0 and p == 1:
            FP += 1
    return [[TN, FP], [FN, TP]]
```

```python
cm = confusion_matrix(actual, predicted)
cm
```

```
[[3, 2], [1, 4]]
```
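As a cross-check on `confusion_matrix`, the same four counts can be obtained by tallying `(actual, predicted)` pairs with `collections.Counter`: row `a`, column `p` holds the number of examples with actual class `a` and predicted class `p`. This is a sketch assuming binary 0/1 labels; `confusion_matrix_counter`, `y_true`, `y_pred`, and `cm_check` are illustrative names, and the label lists are copied from the outputs above.

```python
from collections import Counter

def confusion_matrix_counter(actual, predicted):
    """Alternative sketch: count (actual, predicted) pairs with Counter.

    Assumes binary labels 0/1. Returns [[TN, FP], [FN, TP]], with rows
    indexed by the actual class and columns by the predicted class.
    """
    counts = Counter(zip(actual, predicted))
    # Missing pairs count as 0 because Counter returns 0 for absent keys
    return [[counts[(a, p)] for p in (0, 1)] for a in (0, 1)]

y_true = [0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0]
cm_check = confusion_matrix_counter(y_true, y_pred)
print(cm_check)                               # [[3, 2], [1, 4]]
print(sum(map(sum, cm_check)) == len(y_true))  # True
```

The sum check is a cheap sanity test: any label other than 0 or 1 is still counted by `Counter` but falls outside the 2x2 matrix, so the total would come up short.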

```python
def print_confusion_matrix(cm):
    (TN, FP), (FN, TP) = cm
    print("%4d %4d" % (TN, FP))
    print("%4d %4d" % (FN, TP))

print_confusion_matrix(cm)
```

```
   3    2
   1    4
```
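Because a bare 2x2 grid does not say which axis is which, a labeled printout can make the orientation explicit. A sketch with a hypothetical `format_confusion_matrix` helper that returns the table as a string (so it can be printed or tested), using rows for the actual class and columns for the predicted class, as `confusion_matrix` does:

```python
def format_confusion_matrix(cm, width=8):
    """Return a labeled text table for a 2x2 confusion matrix (hypothetical helper).

    Rows are the actual class and columns the predicted class, matching
    the [[TN, FP], [FN, TP]] layout.
    """
    (tn, fp), (fn, tp) = cm
    lines = [
        f"{'':>10}{'pred 0':>{width}}{'pred 1':>{width}}",
        f"{'actual 0':>10}{tn:>{width}}{fp:>{width}}",
        f"{'actual 1':>10}{fn:>{width}}{tp:>{width}}",
    ]
    return "\n".join(lines)

print(format_confusion_matrix([[3, 2], [1, 4]]))
```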
