# **Problem Statement**  
## **8. Implement one-hot encoding and label encoding from scratch.**

Implement Label Encoding and One-Hot Encoding from scratch without using libraries such as sklearn.

- Label Encoding maps each unique category to an integer label.
- One-Hot Encoding maps each value to a binary vector of length k (the number of unique categories), with a single 1 at that category's index.

### Constraints & Example Inputs/Outputs

#### Constraints

- Input is a list of categorical values (strings or hashable types)
- Categories may repeat
- Order of encoding should be deterministic (the same input always produces the same mapping)
- No external ML libraries allowed
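Sorting the unique values is one way to meet the determinism constraint, but `sorted()` raises `TypeError` when the categories are hashable yet not mutually comparable (for example, a mix of `int`, `str` and `tuple`). A minimal sketch of an alternative, assuming order of first appearance is acceptable: `dict.fromkeys` preserves insertion order (guaranteed since Python 3.7), so the mapping is still deterministic for a given input. The helper names `first_seen_categories` and `label_encode_first_seen` are hypothetical.

```python
def first_seen_categories(data):
    """Unique values in order of first appearance (hypothetical helper)."""
    # dict keeps insertion order, so duplicates are dropped without reordering
    return list(dict.fromkeys(data))


def label_encode_first_seen(data):
    """Label encoding that does not require the categories to be sortable."""
    categories = first_seen_categories(data)
    encoding_map = {cat: idx for idx, cat in enumerate(categories)}
    return encoding_map, [encoding_map[value] for value in data]


mixed = ["b", 1, "a", 1, (2, 3)]
try:
    sorted(set(mixed))  # int, str and tuple cannot be compared with '<'
except TypeError as exc:
    print("sorted() fails:", exc)

print(label_encode_first_seen(mixed))
# ({'b': 0, 1: 1, 'a': 2, (2, 3): 3}, [0, 1, 2, 1, 3])
```

The trade-off is that the codes now depend on the order of the input rather than on the values alone, so two datasets with the same categories in a different order get different mappings.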


#### Example Input:
```python
data = ["red", "blue", "green", "blue", "red"]
```

#### Expected Output:

Label Encoding
```python
{'blue': 0, 'green': 1, 'red': 2}  # mapping
[2, 0, 1, 0, 2]                    # encoded data
```

One-Hot Encoding
```python
[
 [0, 0, 1],
 [1, 0, 0],
 [0, 1, 0],
 [1, 0, 0],
 [0, 0, 1]
]
```

### Solution Approach

#### Label Encoding
1. Extract the unique categories and sort them so the mapping is deterministic
2. Assign each category a unique integer (its position in the sorted order)
3. Replace each original value with its integer

#### One-Hot Encoding
1. Identify all unique categories (k of them), sorted as above
2. Assign each category an index from 0 to k - 1
3. For each value, create a zero vector of length k
4. Set the position at that value's index to 1

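#### Why Both Are Needed
- Label Encoding → Tree-based models, which split on thresholds and are largely insensitive to the arbitrary order the integers impose
- One-Hot Encoding → Linear & distance-based models, which would otherwise treat the integer labels as ordered magnitudes (e.g., "red" = 2 would look twice as large as "green" = 1)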
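To make the distance argument concrete, here is a small sketch comparing Euclidean distances between the example's categories under each encoding. It uses only the standard library (`math.dist` needs Python 3.8+); `pairwise_distances` is a hypothetical helper written for this illustration.

```python
import math
from itertools import combinations


def pairwise_distances(encoding):
    """Euclidean distance between every pair of encoded categories."""
    return {
        (a, b): math.dist(encoding[a], encoding[b])
        for a, b in combinations(encoding, 2)
    }


# Label encoding from the example, written as 1-D points
label_points = {"blue": (0,), "green": (1,), "red": (2,)}
# One-hot encoding of the same categories
one_hot_points = {"blue": (1, 0, 0), "green": (0, 1, 0), "red": (0, 0, 1)}

for pair, dist in pairwise_distances(label_points).items():
    print("label  ", pair, dist)
for pair, dist in pairwise_distances(one_hot_points).items():
    print("one-hot", pair, round(dist, 3))
```

With label encoding, "blue" and "red" end up at distance 2.0 while "blue" and "green" are at 1.0, an ordering that comes only from the alphabetical sort. With one-hot encoding every pair of distinct categories is sqrt(2) ≈ 1.414 apart.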

### Solution Code

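```python
# Approach 1: Label Encoding – Brute Force
def label_encoding_bruteforce(data):
    # Sort the unique values so the same input always yields the same mapping
    unique_categories = sorted(set(data))
    encoding_map = {}

    # Assign consecutive integers in sorted order
    for idx, category in enumerate(unique_categories):
        encoding_map[category] = idx

    # Replace every original value with its integer label
    encoded_data = []
    for value in data:
        encoded_data.append(encoding_map[value])

    return encoding_map, encoded_data
```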
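The returned `encoding_map` can be reused on data that arrives later and inverted to decode labels. A minimal sketch, assuming unseen categories should map to a sentinel value of `-1` (a choice made here for illustration, not part of the problem statement); `encode_with_map` and `decode_labels` are hypothetical helper names.

```python
def encode_with_map(encoding_map, data, unknown=-1):
    """Encode values with an existing mapping; unseen values get `unknown`."""
    return [encoding_map.get(value, unknown) for value in data]


def decode_labels(encoding_map, encoded):
    """Invert the mapping to recover the original categories."""
    inverse = {idx: cat for cat, idx in encoding_map.items()}
    return [inverse[idx] for idx in encoded]


encoding_map = {"blue": 0, "green": 1, "red": 2}  # mapping from the example
print(encode_with_map(encoding_map, ["green", "purple", "red"]))  # [1, -1, 2]
print(decode_labels(encoding_map, [2, 0, 1, 0, 2]))
# ['red', 'blue', 'green', 'blue', 'red']
```

Raising an error on unseen values would be an equally valid policy; the sentinel simply keeps the output the same length as the input.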


```python
# Approach 1: One-Hot Encoding – Brute Force
def one_hot_encoding_bruteforce(data):
    # Same deterministic, sorted category order as label encoding
    unique_categories = sorted(set(data))
    index_map = {cat: idx for idx, cat in enumerate(unique_categories)}

    one_hot_encoded = []

    for value in data:
        # Zero vector of length k with a single 1 at this value's index
        vector = [0] * len(unique_categories)
        vector[index_map[value]] = 1
        one_hot_encoded.append(vector)

    return index_map, one_hot_encoded
```
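Decoding a one-hot vector reverses the last step of the approach: find the position holding the 1 and look up its category. A small sketch with a hypothetical `decode_one_hot` helper; it assumes the rows are plain Python lists (as returned by the brute-force version) and raises `ValueError` on any row that is not exactly one 1 among 0s.

```python
def decode_one_hot(index_map, vectors):
    """Map each one-hot row back to its category (hypothetical helper)."""
    categories = {idx: cat for cat, idx in index_map.items()}
    decoded = []
    for row in vectors:
        # A valid row has exactly one 1 and zeros everywhere else
        if row.count(1) != 1 or row.count(0) != len(row) - 1:
            raise ValueError(f"not a valid one-hot row: {row}")
        decoded.append(categories[row.index(1)])
    return decoded


index_map = {"blue": 0, "green": 1, "red": 2}
rows = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
print(decode_one_hot(index_map, rows))  # ['red', 'blue', 'green']
```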


### Alternative Solution

```python
# Approach 2: Optimized Version (Using NumPy, Logic Still Manual)
import numpy as np

def label_encoding_optimized(data):
    # np.unique returns the sorted unique values; return_inverse gives,
    # for every element of data, its index in that sorted array
    unique_categories, encoded = np.unique(data, return_inverse=True)
    encoding_map = {cat: idx for idx, cat in enumerate(unique_categories)}
    return encoding_map, encoded


def one_hot_encoding_optimized(data):
    unique_categories, codes = np.unique(data, return_inverse=True)
    index_map = {cat: idx for idx, cat in enumerate(unique_categories)}

    # Set one 1 per row in a single vectorized assignment
    one_hot = np.zeros((len(data), len(unique_categories)), dtype=int)
    one_hot[np.arange(len(data)), codes] = 1

    return index_map, one_hot
```


### Alternative Approaches

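- Target Encoding: replace each category with a statistic of the target variable for that category (commonly its mean)
- Binary Encoding: label-encode, then write each integer in base 2, one column per bit
- Hash Encoding: map each category to one of a fixed number of columns via a hash function (collisions are possible)
- Using `sklearn.preprocessing` (`LabelEncoder`, `OneHotEncoder`), which is not allowed here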
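Binary and hash encoding can also be sketched from scratch. This is an illustration under stated assumptions, not a reference implementation. Binary encoding here reuses sorted label codes starting at 0 and writes each one in base 2, most significant bit first. Hash encoding uses `hashlib.md5` rather than Python's built-in `hash()`, which is randomized per process for strings, so bucket assignments stay deterministic. The function names and the `n_buckets` parameter are hypothetical.

```python
import hashlib


def binary_encoding(data):
    """Label-encode (sorted order), then spread each code over bit columns."""
    categories = sorted(set(data))
    code = {cat: idx for idx, cat in enumerate(categories)}
    # Enough bits to represent the largest code, at least one column
    n_bits = max(1, (len(categories) - 1).bit_length())
    # Most significant bit first, e.g. code 2 with 2 bits -> [1, 0]
    return [[(code[v] >> bit) & 1 for bit in reversed(range(n_bits))] for v in data]


def hash_encoding(data, n_buckets=4):
    """One-hot over a fixed number of hashed buckets; collisions are possible."""
    rows = []
    for value in data:
        # str() lets any value be hashed (note: 1 and "1" share a bucket)
        digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
        row = [0] * n_buckets
        row[int(digest, 16) % n_buckets] = 1
        rows.append(row)
    return rows


data = ["red", "blue", "green", "blue", "red"]
print(binary_encoding(data))  # [[1, 0], [0, 0], [0, 1], [0, 0], [1, 0]]
print(hash_encoding(data))
```

Binary encoding needs about log2(k) columns instead of k; hash encoding fixes the width in advance, which helps when new categories keep appearing, at the cost of occasional collisions.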

### Test the Code

```python
# Test Case 1: Basic Categories

data = ["red", "blue", "green", "blue", "red"]

print("Label Encoding (Brute):")
print(label_encoding_bruteforce(data))

print("\nOne-Hot Encoding (Brute):")
print(one_hot_encoding_bruteforce(data))
```

```
Label Encoding (Brute):
({'blue': 0, 'green': 1, 'red': 2}, [2, 0, 1, 0, 2])

One-Hot Encoding (Brute):
({'blue': 0, 'green': 1, 'red': 2}, [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]])
```


```python
# Test Case 2: Optimized Version (reuses `data` from Test Case 1)

print("Label Encoding (Optimized):")
print(label_encoding_optimized(data))

print("\nOne-Hot Encoding (Optimized):")
print(one_hot_encoding_optimized(data))
```

```
Label Encoding (Optimized):
({np.str_('blue'): 0, np.str_('green'): 1, np.str_('red'): 2}, array([2, 0, 1, 0, 2]))

One-Hot Encoding (Optimized):
({np.str_('blue'): 0, np.str_('green'): 1, np.str_('red'): 2}, array([[0, 0, 1],
       [1, 0, 0],
       [0, 1, 0],
       [1, 0, 0],
       [0, 0, 1]]))
```


```python
# Test Case 3: Single Category

data = ["apple", "apple", "apple"]

print(label_encoding_bruteforce(data))
print(one_hot_encoding_bruteforce(data))
```

```
({'apple': 0}, [0, 0, 0])
({'apple': 0}, [[1], [1], [1]])
```


```python
# Test Case 4: Numeric Categories

data = [1, 2, 3, 2, 1]

print(label_encoding_bruteforce(data))
print(one_hot_encoding_bruteforce(data))
```

```
({1: 0, 2: 1, 3: 2}, [0, 1, 2, 1, 0])
({1: 0, 2: 1, 3: 2}, [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]])
```


## Complexity Analysis

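### Label Encoding
- Time: O(n + k log k): one pass to build the set, a sort of the k unique categories, one pass to encode
- Space: O(k) for the mapping (k = number of unique categories), plus O(n) for the encoded output

### One-Hot Encoding
- Time: O(n × k + k log k): each of the n rows allocates a length-k vector
- Space: O(n × k) for the dense output matrix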
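Because each one-hot row contains exactly one 1, storing only that column index avoids the O(n × k) zero-filled matrix. A sketch of a hypothetical sparse representation, a list of (row, column) coordinates of the 1 entries, which needs O(n) space for the output plus O(k) for the index map and O(n + k log k) time; `one_hot_sparse` and `to_dense` are illustrative names, and the dense matrix is rebuilt only when something requires it.

```python
def one_hot_sparse(data):
    """Return the index map and the (row, col) position of each 1."""
    index_map = {cat: idx for idx, cat in enumerate(sorted(set(data)))}
    coords = [(row, index_map[value]) for row, value in enumerate(data)]
    return index_map, coords


def to_dense(coords, n_rows, n_cols):
    """Expand sparse coordinates back into the dense one-hot matrix."""
    dense = [[0] * n_cols for _ in range(n_rows)]
    for row, col in coords:
        dense[row][col] = 1
    return dense


data = ["red", "blue", "green", "blue", "red"]
index_map, coords = one_hot_sparse(data)
print(coords)  # [(0, 2), (1, 0), (2, 1), (3, 0), (4, 2)]
print(to_dense(coords, len(data), len(index_map)))
# [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```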

#### Thank You!!