# Task 4: Feature Encoding & Scaling
## Adult Income Dataset

This notebook encodes the categorical features and scales the numerical features of the Adult Income dataset, as required for Task 4 of the AI & ML Internship.

In [None]:

# Libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler


## Load Dataset

In [None]:

columns = [
    'age', 'workclass', 'fnlwgt', 'education', 'education-num',
    'marital-status', 'occupation', 'relationship', 'race',
    'sex', 'capital-gain', 'capital-loss', 'hours-per-week',
    'native-country', 'income'
]

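# Assumes adult.csv has no header row (as in the raw UCI file), so column names are passed in.
# skipinitialspace=True strips the space that follows each comma in that raw file.
df = pd.read_csv("adult.csv", names=columns, skipinitialspace=True)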
df.head()


## Identify Categorical & Numerical Features

In [None]:

categorical_features = df.select_dtypes(include='object').columns
numerical_features = df.select_dtypes(exclude='object').columns

categorical_features, numerical_features


## Label Encoding (Target Column)

In [None]:

le = LabelEncoder()
df['income'] = le.fit_transform(df['income'])
df['income'].value_counts()
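
To make the 0/1 codes interpretable, the fitted encoder can be queried for its label-to-code mapping. The snippet below is a minimal sketch on a toy list standing in for the `income` column. The values and the names `le_demo`, `mapping`, and `decoded` are illustrative assumptions, not taken from `adult.csv`.

```python
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the 'income' column (assumed values, not read from adult.csv)
income = ["<=50K", ">50K", "<=50K", "<=50K"]

le_demo = LabelEncoder()
codes = le_demo.fit_transform(income)

# classes_ stores the original labels in sorted order; each label's index is its code
mapping = {str(label): int(code) for code, label in enumerate(le_demo.classes_)}

# inverse_transform converts integer codes back to the original labels
decoded = le_demo.inverse_transform(codes)

print(mapping)
print(list(decoded))
```

Keeping the fitted encoder around makes it possible to translate model predictions back into the original income labels later.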


## One-Hot Encoding

In [None]:

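# 'income' is already label-encoded, so exclude it from one-hot encoding.
# errors='ignore' keeps this cell safe to re-run.
categorical_features = categorical_features.drop('income', errors='ignore')
# dtype=int yields 0/1 columns instead of booleans on recent pandas versions.
df_encoded = pd.get_dummies(df, columns=categorical_features, dtype=int)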
df_encoded.head()


## Feature Scaling

In [None]:

# numerical_features was computed before label encoding, so it excludes the target 'income'.
scaler = StandardScaler()
df_encoded[numerical_features] = scaler.fit_transform(df_encoded[numerical_features])
df_encoded[numerical_features].head()
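
As a sanity check on what `StandardScaler` computes, the sketch below compares it with a manual z-score, (x - mean) / std, on a toy column. The `ages` values are made up for illustration and are not taken from the dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy numeric column (assumed values, for illustration only)
ages = np.array([[25.0], [38.0], [52.0], [45.0]])

# Manual z-score: subtract the column mean, divide by the population std (ddof=0)
manual = (ages - ages.mean(axis=0)) / ages.std(axis=0)

# StandardScaler applies the same formula column by column
scaled = StandardScaler().fit_transform(ages)

print(np.allclose(manual, scaled))
```

When the processed data feeds a model, the scaler is usually fitted on the training split only and then applied to the test split, so that test statistics do not leak into training.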


## Save Processed Dataset

In [None]:

df_encoded.to_csv("adult_processed.csv", index=False)
print("Processed dataset saved successfully.")



## Interview Questions (Short Answers)

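- **Label vs One-Hot Encoding**: Label encoding maps each category to an integer, which suits targets and ordered categories. One-hot encoding creates one binary column per category, which suits unordered categories because it implies no ranking.
- **Why Scaling?**: To put features on comparable scales so that large-valued features (e.g. `fnlwgt`) do not dominate distance calculations or slow down gradient-based training.
- **Normalization**: Min-max scaling, which rescales each feature to the range 0 to 1 using (x - min) / (max - min).
- **Algorithms needing scaling**: Distance-based and gradient-based models such as KNN, SVM, Logistic Regression, and K-Means. Tree-based models are largely insensitive to feature scale.
- **Feature Engineering**: Transforming raw data into features that help a model learn, for example by encoding, scaling, or combining columns.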
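
The encoding and normalization answers above can be illustrated on a small toy frame. This is a hedged sketch: the rows, the chosen education order in `edu_order`, and the new column names (`education_code`, `hours_norm`) are assumptions for illustration, not part of the notebook's pipeline.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder

# Toy rows using Adult-style column names (values are made up)
toy = pd.DataFrame({
    "education": ["HS-grad", "Bachelors", "Masters", "HS-grad"],
    "race": ["White", "Black", "Other", "White"],
    "hours-per-week": [40, 50, 20, 60],
})

# Ordered category: integer codes that follow an explicit, assumed order
edu_order = [["HS-grad", "Bachelors", "Masters"]]
encoder = OrdinalEncoder(categories=edu_order)
toy["education_code"] = encoder.fit_transform(toy[["education"]]).ravel()

# Unordered category: one binary column per value, so no ranking is implied
race_dummies = pd.get_dummies(toy["race"], prefix="race", dtype=int)

# Normalization: min-max scaling maps the column onto the range 0 to 1
toy["hours_norm"] = MinMaxScaler().fit_transform(toy[["hours-per-week"]]).ravel()

print(toy[["education", "education_code", "hours-per-week", "hours_norm"]])
print(race_dummies)
```

`OrdinalEncoder` is used here instead of `LabelEncoder` because it accepts an explicit category order, whereas `LabelEncoder` assigns codes in sorted label order and is intended for target columns.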
