# 📌 Data Encoding

Machine learning and deep learning algorithms operate on **numerical data**.  
Categorical variables (**strings, text-based classes**) therefore have to be converted into numbers before a model can use them, and **data encoding** methods perform that conversion.  

---

## 🔹 1. Nominal Encoding  
The categories have **no inherent order** (e.g. city names, colors).  
Such variables are encoded with **One-Hot Encoding (OHE)** or **Dummy Encoding**.

### ✅ One-Hot Encoding (OHE)  
- Creates a **separate column** for each category.  
- A sample gets `1` in the column of its own category and `0` in all the others.  

Example:

`Color = ["Red", "Blue", "Green"]`

One-hot encoded:

| Color | Red | Blue | Green |
|-------|-----|------|-------|
| Red   | 1   | 0    | 0     |
| Blue  | 0   | 1    | 0     |
| Green | 0   | 0    | 1     |
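
As a minimal sketch of this example, pandas' `get_dummies` can produce both variants mentioned in this section. The DataFrame and variable names are illustrative assumptions, and dummy encoding is taken here to mean the common "drop one column" variant. Note that pandas orders the generated columns alphabetically, so they appear as Blue, Green, Red.

```python
import pandas as pd

# Hypothetical toy frame mirroring the Color example
df = pd.DataFrame({"Color": ["Red", "Blue", "Green"]})

# One-hot encoding: one 0/1 column per category (columns sorted alphabetically)
one_hot = pd.get_dummies(df["Color"], dtype=int)

# Dummy encoding: same idea, but the first category ("Blue") is dropped,
# so a row of all zeros stands for "Blue"
dummy = pd.get_dummies(df["Color"], dtype=int, drop_first=True)

print(one_hot)
print(dummy)
```

With k categories, one-hot encoding yields k columns while this dummy variant yields k - 1, which avoids a redundant column in linear models.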

---

## 🔹 2. Label Encoding  
- Each category is assigned an **integer label**.  
- Advantage: the feature stays in a single column, so the number of columns does not grow.  
- Disadvantage: **algorithms may treat the integers as ordered** (for example, reading `Green > Blue`), which can lead to misleading results.

Example:

`Color = ["Red", "Blue", "Green"]`

Label encoding:
- Red -> 0
- Blue -> 1
- Green -> 2
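
The mapping above can be sketched with `pandas.factorize`, which assigns integer codes in order of first appearance and therefore reproduces Red -> 0, Blue -> 1, Green -> 2. The sample series (with one repeated value) is an assumption added for illustration. scikit-learn's `LabelEncoder` would instead sort the classes alphabetically and give different integers.

```python
import pandas as pd

# Hypothetical sample; "Blue" is repeated to show that equal values share a code
colors = pd.Series(["Red", "Blue", "Green", "Blue"])

# codes: integer label per row, uniques: categories in order of first appearance
codes, uniques = pd.factorize(colors)

# Explicit category -> label lookup derived from the result
mapping = {category: code for code, category in enumerate(uniques)}

print(codes.tolist())
print(mapping)
```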

---

## 🔹 3. Ordinal Encoding  
- When the categories have a **logical order** (e.g. education level, low-medium-high), the integers are assigned so that they preserve that order.  

Example:

`Education = ["Primary", "High School", "Bachelor", "Master", "PhD"]`

Ordinal encoding:
- Primary -> 1
- High School -> 2
- Bachelor -> 3
- Master -> 4
- PhD -> 5
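
A minimal sketch of this mapping in pandas, assuming the order is written out by hand as a list. The names `education_order`, `rank` and the sample series are hypothetical and only serve the illustration.

```python
import pandas as pd

# The order is domain knowledge, so it is listed explicitly (lowest to highest)
education_order = ["Primary", "High School", "Bachelor", "Master", "PhD"]

# Build the category -> rank lookup, starting at 1 as in the example
rank = {level: position for position, level in enumerate(education_order, start=1)}

# Hypothetical sample column
education = pd.Series(["Bachelor", "Primary", "PhD", "High School"])
encoded = education.map(rank)

print(encoded.tolist())
```

Values missing from the lookup become `NaN` under `Series.map`, which makes unexpected categories easy to spot.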

---

## 🔹 4. Target Guided Ordinal Encoding  
- The categories are ordered by their relationship with the **target variable**.  
- For example, the **mean target value** is computed for each category and the categories are ranked by it.  

Example (target: purchase probability):

| City     | Purchase Rate |
|----------|---------------|
| Istanbul | 0.70          |
| Ankara   | 0.50          |
| Izmir    | 0.20          |

Encoding:

- Istanbul -> 3
- Ankara -> 2
- Izmir -> 1
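
The ranking step can be sketched with a pandas `groupby`. The rows below are made-up toy data, chosen only so that the per-city means match the rates in the example (0.70, 0.50, 0.20). The column names `city`, `purchased` and `city_encoded` are assumptions.

```python
import pandas as pd

# Toy data: 10 rows per city, constructed to reproduce the example's rates
df = pd.DataFrame({
    "city": ["Istanbul"] * 10 + ["Ankara"] * 10 + ["Izmir"] * 10,
    "purchased": [1] * 7 + [0] * 3 + [1] * 5 + [0] * 5 + [1] * 2 + [0] * 8,
})

# Mean target value per category
mean_target = df.groupby("city")["purchased"].mean()

# Rank categories from lowest to highest mean, starting at 1
ordered = mean_target.sort_values().index
mapping = {city: rank for rank, city in enumerate(ordered, start=1)}

df["city_encoded"] = df["city"].map(mapping)
print(mapping)
```

Because the mapping is derived from the target, it is usually computed on the training split only and then applied to validation and test data, which limits the overfitting risk of this method.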


---

# 📊 Summary Table

| Encoding Type               | When to Use                                                    | Drawback |
|-----------------------------|----------------------------------------------------------------|----------|
| One-Hot Encoding (OHE)      | Unordered (nominal) categorical variables                      | Number of columns grows |
| Label Encoding              | No order, but can be used for simplicity                       | Can create a false impression of order |
| Ordinal Encoding            | Naturally ordered categories                                   | None |
| Target Guided Ordinal       | When the order should reflect the relationship with the target | Risk of overfitting |

---

👉 Summary:  
- **Nominal → OHE**  
- **Ordinal → Ordinal Encoding**  
- **When the categories' relationship with the target matters → Target Guided Ordinal**


---


#### One Hot Encoder

    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import OneHotEncoder

    # Toy data: a single categorical column
    colors = np.array(['red', 'green', 'blue', 'yellow', 'skyblue'])
    df = pd.DataFrame({"color": colors})
    df

Output:

         color
    0      red
    1    green
    2     blue
    3   yellow
    4  skyblue

Fit the encoder and transform the column:

    # OneHotEncoder expects 2D input, hence the double brackets
    encoder = OneHotEncoder()
    encoder.fit_transform(df[['color']]).toarray()

Output (columns follow the sorted categories: blue, green, red, skyblue, yellow):

    array([[0., 0., 1., 0., 0.],
           [0., 1., 0., 0., 0.],
           [1., 0., 0., 0., 0.],
           [0., 0., 0., 0., 1.],
           [0., 0., 0., 1., 0.]])

---


#### Label Encoding

    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import LabelEncoder

    colors = np.array(['red', 'green', 'blue', 'yellow', 'blue', 'skyblue'])
    df = pd.DataFrame({"color": colors})
    lbl_encoder = LabelEncoder()

    # LabelEncoder expects 1D input; passing df[['color']] (2D) triggers a
    # DataConversionWarning, so the column is selected as a Series instead
    lbl_encoder.fit_transform(df['color'])

Output (labels follow the sorted classes: blue=0, green=1, red=2, skyblue=3, yellow=4):

    array([2, 1, 0, 4, 0, 3])

Encoding a single value with the fitted encoder:

    lbl_encoder.transform(['red'])

Output:

    array([2])

---


#### Ordinal Encoding

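    import pandas as pd
    from sklearn.preprocessing import OrdinalEncoder

    df = pd.DataFrame({
        'size': ['small', 'medium', 'large', 'medium', 'small', 'large']
    })
    df

Output:

         size
    0   small
    1  medium
    2   large
    3  medium
    4   small
    5   large

Pass the category order explicitly so that small < medium < large:

    encoder = OrdinalEncoder(categories=[['small', 'medium', 'large']])
    encoder.fit_transform(df[['size']])

Output:

    array([[0.],
           [1.],
           [2.],
           [1.],
           [0.],
           [2.]])

Encoding a single value with the fitted encoder:

    # A DataFrame with the same column name keeps the feature names consistent
    # with the data the encoder was fitted on
    encoder.transform(pd.DataFrame({'size': ['small']}))

Output:

    array([[0.]])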