# **Basic Concept of OneHotEncoder**
One-hot encoding is a technique for converting categorical variables into a numeric form that machine learning algorithms can use. It takes a column of categorical data (strings or integer labels) and converts it into a binary matrix with one column per category, where each row has a 1 in the column of its category and 0 everywhere else. This matters because many machine learning algorithms accept only numerical input.
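
To make the idea concrete, here is a minimal sketch of one-hot encoding done by hand with NumPy. The `colors` column is a made-up toy example, not part of the housing dataset, and this is only an illustration of the concept, not how `OneHotEncoder` is implemented internally.

```python
import numpy as np

# Toy column of categories (hypothetical values, not from the dataset)
colors = np.array(["red", "green", "blue", "green"])

# np.unique returns the sorted categories and, with return_inverse=True,
# the integer position of each value within that sorted list
categories, codes = np.unique(colors, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for category i,
# so indexing the identity matrix by the codes builds the binary matrix
one_hot = np.eye(len(categories))[codes]

print(categories)
print(one_hot)
```

Each row of `one_hot` contains exactly one 1, placed in the column that corresponds to that row's category.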

# **Usage and Key Aspects**
1. Transforming Categorical Data
2. Creating Dummy Variables
3. Preventing Algorithm Misinterpretation
4. Compatibility with ML Models
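
The "misinterpretation" point can be illustrated with a small sketch. The three integer codes below are hypothetical labels for unordered categories. The example compares the distances a model would see between categories under integer coding and under one-hot coding.

```python
import numpy as np

# Hypothetical integer codes for three unordered categories
# (for example blue=0, green=1, red=2)
codes = np.array([0, 1, 2])

# Pairwise distances between integer codes: category 2 appears
# twice as far from category 0 as category 1 is, which is an artifact
dist_codes = np.abs(codes[:, None] - codes[None, :])

# Pairwise Euclidean distances between one-hot vectors: every pair
# of distinct categories is the same distance apart (sqrt(2))
one_hot = np.eye(3)
dist_one_hot = np.linalg.norm(one_hot[:, None, :] - one_hot[None, :, :], axis=-1)

print(dist_codes)
print(dist_one_hot)
```

Integer codes impose an ordering and spacing that the categories do not have, while one-hot vectors treat all categories symmetrically.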

# **Importing OneHotEncoder**

In [19]:
from sklearn.preprocessing import OneHotEncoder

In [20]:
import pandas as pd
import numpy as np

df = pd.read_csv('../Data/house_competition/train.csv')

# **OneHotEncoding a Series**

### **Getting the Series**

In [21]:
# Selecting categorical data
street = df.select_dtypes(include=['object']).Street

### **Reshaping the Series**
OneHotEncoder requires a 2D array, so you need to reshape the Series to shape `(n_samples, 1)`.

In [22]:
street_reshaped = street.values.reshape(-1,1)
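
As a quick check of the shapes involved, the sketch below uses a small made-up DataFrame (`toy` is a hypothetical stand-in for `df`). It also shows an alternative to reshaping: selecting the column with double brackets, which yields a single-column DataFrame that is already 2D.

```python
import pandas as pd

# Hypothetical stand-in for the housing DataFrame
toy = pd.DataFrame({"Street": ["Pave", "Grvl", "Pave"]})

as_series = toy["Street"]                       # 1D Series
as_array = toy["Street"].values.reshape(-1, 1)  # 2D NumPy array, one column
as_frame = toy[["Street"]]                      # 2D single-column DataFrame

print(as_series.shape, as_array.shape, as_frame.shape)
```

When the encoder is fitted on a DataFrame in recent scikit-learn versions, `get_feature_names_out()` should use the column name as a prefix (for example `Street_Grvl`) rather than the generic `x0`.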

### **Apply OneHotEncoder**

In [23]:
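# sparse_output=False returns a dense NumPy array instead of a SciPy sparse matrix.
# On scikit-learn versions before 1.2, this parameter is named sparse.
OneHotEncoded = OneHotEncoder(sparse_output=False)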
street_encoded = OneHotEncoded.fit_transform(street_reshaped)
street_encoded



array([[0., 1.],
       [0., 1.],
       [0., 1.],
       ...,
       [0., 1.],
       [0., 1.],
       [0., 1.]])

### **Convert the Encoded Data Back to a DataFrame**
The result of the encoding is a NumPy array. Convert it back to a pandas DataFrame for better readability and further processing.

In [24]:
encoded_data = pd.DataFrame(street_encoded, columns=OneHotEncoded.get_feature_names_out())
encoded_data

| | x0_Grvl | x0_Pave |
|---|---|---|
| 0 | 0.0 | 1.0 |
| 1 | 0.0 | 1.0 |
| 2 | 0.0 | 1.0 |
| 3 | 0.0 | 1.0 |
| 4 | 0.0 | 1.0 |
| ... | ... | ... |
| 1455 | 0.0 | 1.0 |
| 1456 | 0.0 | 1.0 |
| 1457 | 0.0 | 1.0 |
| 1458 | 0.0 | 1.0 |
