
# Data Coding Techniques in Neural Networks

## Common Coding Techniques

This notebook demonstrates four techniques for encoding categorical data as numbers that a neural network can consume:
1. One-hot encoding
2. Label encoding
3. Binary encoding
4. Ordinal encoding

## One-Hot Encoding

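One-hot encoding converts a categorical variable into binary vectors in which:
- Each category gets its own column
- Exactly one column is 1 for each sample; all others are 0

Example (the column order is a convention; sklearn sorts categories alphabetically, so its output uses the order blue, green, red instead):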
- 'red' → [1, 0, 0]
- 'blue' → [0, 1, 0]
- 'green' → [0, 0, 1]

In [3]:
import torch
import torch.nn.functional as F
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# PyTorch one-hot encoding
def torch_one_hot_example():
    # Create sample data
    labels = torch.tensor([0, 2, 1, 3])
    num_classes = 4
    
    # One-hot encode
    one_hot = F.one_hot(labels, num_classes=num_classes)
    print("PyTorch one-hot encoding:")
    print(one_hot)

# Sklearn one-hot encoding for categorical data
def sklearn_one_hot_example():
    # Sample categorical data
    data = np.array(['red', 'blue', 'green', 'red']).reshape(-1, 1)
    
    # Initialize and fit the encoder
    encoder = OneHotEncoder(sparse_output=False)
    one_hot = encoder.fit_transform(data)
    
    print("\nSklearn one-hot encoding:")
    print(one_hot)

torch_one_hot_example()
sklearn_one_hot_example()

PyTorch one-hot encoding:
tensor([[1, 0, 0, 0],
        [0, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1]])

Sklearn one-hot encoding:
[[0. 0. 1.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
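
In the sklearn output above, 'red' ends up in the last column because the encoder sorts the categories it finds. The following is a minimal plain-Python sketch of that behaviour, assuming sorted column order; `one_hot_encode` is a hypothetical helper written for illustration and is not part of PyTorch or sklearn.

```python
def one_hot_encode(values):
    """One-hot encode a list of sortable category values (illustrative helper)."""
    # Sort the distinct categories so the column order is deterministic
    categories = sorted(set(values))
    column = {cat: i for i, cat in enumerate(categories)}
    # One row per value, with a single 1 in that category's column
    rows = [[1 if column[v] == c else 0 for c in range(len(categories))]
            for v in values]
    return categories, rows

categories, rows = one_hot_encode(['red', 'blue', 'green', 'red'])
print("Column order:", categories)
for row in rows:
    print(row)
```

Returning the column order alongside the rows makes it possible to decode a vector back to its category later.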


## Label Encoding

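Label encoding maps each category to a single integer. The integers are identifiers only and do not imply any order between categories. sklearn's `LabelEncoder` assigns them in sorted (alphabetical) order of the class names.

Example (sklearn's alphabetical assignment):
- 'blue' → 0
- 'green' → 1
- 'red' → 2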

In [4]:
from sklearn.preprocessing import LabelEncoder

# Label encoding example
def label_encoding_example():
    # Sample data
    categories = ['red', 'blue', 'green', 'red', 'blue']
    
    # Initialize and fit the encoder
    encoder = LabelEncoder()
    encoded = encoder.fit_transform(categories)
    
    print("Original categories:", categories)
    print("Encoded values:", encoded)
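    # Build the mapping from classes_ directly; casting to str/int keeps the
    # printed dict free of NumPy scalar wrappers on newer NumPy versions
    print("Mapping:", {str(c): int(i) for i, c in enumerate(encoder.classes_)})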

label_encoding_example()

Original categories: ['red', 'blue', 'green', 'red', 'blue']
Encoded values: [2 0 1 2 0]
Mapping: {'blue': 0, 'green': 1, 'red': 2}
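
To make the mapping explicit, here is a small sketch that builds the same label-to-integer table by hand and then reverses it. It assumes sorted class order, as in the output above; `fit_label_mapping` is a hypothetical name used only for this illustration.

```python
def fit_label_mapping(values):
    """Map each distinct label to an integer, in sorted order (illustrative helper)."""
    classes = sorted(set(values))
    return {label: idx for idx, label in enumerate(classes)}

colors = ['red', 'blue', 'green', 'red', 'blue']
mapping = fit_label_mapping(colors)

# Encode: look up each label's integer code
encoded = [mapping[c] for c in colors]

# Decode: invert the dictionary to recover the original labels
inverse = {idx: label for label, idx in mapping.items()}
decoded = [inverse[i] for i in encoded]

print("Encoded:", encoded)
print("Decoded:", decoded)
```

sklearn's `LabelEncoder` provides `inverse_transform` for the decoding step.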


## Binary Encoding

Binary encoding assigns each category an integer index, writes that index in binary, and stores each bit in its own column.

Example for 8 categories:
- Need log2(8) = 3 bits (in general, ceil(log2(n)) bits for n categories)
- Category 5 → 101 → [1, 0, 1]

In [5]:
def binary_encoding_example():
    # Sample data (numbers 0-7)
    data = np.array([0, 3, 5, 7])
    
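    # Convert to binary representation (3 bits).
    # Cast to uint8 first: np.unpackbits only accepts uint8, and viewing the
    # default int64 array as bytes would pick up its zero padding bytes instead.
    bits = np.unpackbits(data.astype(np.uint8).reshape(-1, 1), axis=1)
    binary = bits[:, -3:]  # keep the 3 least significant bits
    
    print("Original values:", data)
    print("Binary encoded:")
    print(binary)

binary_encoding_example()

Original values: [0 3 5 7]
Binary encoded:
[[0 0 0]
 [0 1 1]
 [1 0 1]
 [1 1 1]]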
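
A library-free way to see the mapping from a category index to its bit columns is to extract the bits with shifts. This is a sketch under the assumption that categories are already numbered 0 to n-1; `binary_encode` is a hypothetical helper, not a NumPy or sklearn function.

```python
def binary_encode(value, num_categories):
    """Return the bits of `value`, most significant first (illustrative helper)."""
    if not 0 <= value < num_categories:
        raise ValueError(f"value {value} is outside 0..{num_categories - 1}")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2; use at least 1 bit
    num_bits = max(1, (num_categories - 1).bit_length())
    # Shift each bit position down to the lowest bit and mask it out
    return [(value >> shift) & 1 for shift in range(num_bits - 1, -1, -1)]

for v in [0, 3, 5, 7]:
    print(v, "->", binary_encode(v, 8))
```

Computing the width from the category count means the same function works when the count is not a power of two, for example 5 categories still use 3 bits.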


## Ordinal Encoding

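Ordinal encoding maps categories to integers that preserve a meaningful order, so that comparisons between codes reflect comparisons between categories.

Example (education level, zero-based as in sklearn's `OrdinalEncoder`):
- 'primary' → 0
- 'secondary' → 1
- 'bachelor' → 2
- 'master' → 3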

In [6]:
from sklearn.preprocessing import OrdinalEncoder

def ordinal_encoding_example():
    # Sample data
    education = [['primary'], ['secondary'], ['bachelor'], ['master']]
    
    # Define the ordering
    ordering = [['primary', 'secondary', 'bachelor', 'master']]
    
    # Initialize and fit the encoder
    encoder = OrdinalEncoder(categories=ordering)
    encoded = encoder.fit_transform(education)
    
    print("Original values:", [x[0] for x in education])
    print("Ordinal encoded:", encoded.flatten())

ordinal_encoding_example()

Original values: ['primary', 'secondary', 'bachelor', 'master']
Ordinal encoded: [0. 1. 2. 3.]
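
The key input to ordinal encoding is the explicit ordering. The sketch below shows the same idea with a plain dictionary and rejects values that are missing from the ordering; `ordinal_encode` is a hypothetical helper for illustration only.

```python
def ordinal_encode(values, ordering):
    """Encode values by their position in `ordering` (illustrative helper)."""
    rank = {category: i for i, category in enumerate(ordering)}
    # Fail loudly on categories that have no defined rank
    unknown = [v for v in values if v not in rank]
    if unknown:
        raise ValueError(f"categories not in ordering: {unknown}")
    return [rank[v] for v in values]

ordering = ['primary', 'secondary', 'bachelor', 'master']
codes = ordinal_encode(['bachelor', 'primary', 'master'], ordering)
print(codes)
```

Because the codes follow the ordering, comparing two codes answers which education level is higher.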
