# Slide 1: Introduction to Pandas for ML/DL

**Pandas** is a Python library for data manipulation and analysis. It provides two primary data structures:
- `Series`: a one-dimensional labeled array.
- `DataFrame`: a two-dimensional labeled table whose columns can have different dtypes.
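
How the two structures relate can be sketched briefly: each column of a `DataFrame` is itself a `Series` that shares the frame's index. The names `ages` and `people` and the index labels below are illustrative assumptions, not part of the original slides.

```python
import pandas as pd

# A Series: one-dimensional values, each with an index label
ages = pd.Series([25, 30, 22], index=['a', 'b', 'c'], name='Age')

# A DataFrame: several columns sharing one index
people = pd.DataFrame(
    {'Age': [25, 30, 22], 'Salary': [50000, 60000, 48000]},
    index=['a', 'b', 'c'],
)

# Selecting a single column returns a Series
age_column = people['Age']
print(type(age_column).__name__)
print(ages.ndim, people.ndim)
```

Label-based access works the same way on both: `ages.loc['b']` and `people.loc['b', 'Age']` each look up the row labeled `'b'`.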

In Machine Learning and Deep Learning workflows, Pandas is used for:
- Data cleaning
- Exploratory Data Analysis (EDA)
- Feature engineering
- Data transformation before feeding into models

In [1]:
import pandas as pd

# Creating a simple DataFrame
data = {
    'Age': [25, 30, 22, 35],
    'Salary': [50000, 60000, 48000, 80000],
    'Purchased': [0, 1, 0, 1]
}
df = pd.DataFrame(data)
df

   Age  Salary  Purchased
0   25   50000          0
1   30   60000          1
2   22   48000          0
3   35   80000          1


# Slide 2: Data Inspection and Cleaning

Before training a model, inspect the data and handle missing values:
- Inspection: `.head()`, `.info()`, `.describe()`
- Missing values: `.isnull()`, `.dropna()`, `.fillna()`

In [2]:
# Inspecting the data
print(df.head())
df.info()  # prints its summary directly and returns None, so no print() is needed
print(df.describe())

# Check for missing values
print(df.isnull().sum())

   Age  Salary  Purchased
0   25   50000          0
1   30   60000          1
2   22   48000          0
3   35   80000          1
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   Age        4 non-null      int64
 1   Salary     4 non-null      int64
 2   Purchased  4 non-null      int64
dtypes: int64(3)
memory usage: 228.0 bytes
             Age        Salary  Purchased
count   4.000000      4.000000    4.00000
mean   28.000000  59500.000000    0.50000
std     5.715476  14640.127504    0.57735
min    22.000000  48000.000000    0.00000
25%    24.250000  49500.000000    0.00000
50%    27.500000  55000.000000    0.50000
75%    31.250000  65000.000000    1.00000
max    35.000000  80000.000000    1.00000
Age          0
Salary       0
Purchased    0
dtype: int64
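
The example data has no missing values, so `.dropna()` and `.fillna()` have nothing to act on there. A minimal sketch on a hypothetical frame with gaps (the name `raw`, its values, and the choice of the median as fill value are assumptions for illustration) shows the two usual options:

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps (not from the slides)
raw = pd.DataFrame({
    'Age': [25, np.nan, 22, 35],
    'Salary': [50000, 60000, np.nan, 80000],
})

# Count missing values per column
missing_per_column = raw.isnull().sum()

# Option 1: drop every row that contains a missing value
dropped = raw.dropna()

# Option 2: fill each gap with the median of its column
filled = raw.fillna(raw.median())

print(missing_per_column)
print(dropped)
print(filled)
```

Dropping rows is simple but discards data (two of the four rows here), while filling keeps every row at the cost of introducing estimated values. Which one is appropriate depends on the dataset.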


# Slide 3: Data Selection and Transformation

Use Pandas for selecting and transforming data:
- Selecting rows/columns using `.loc[]`, `.iloc[]`
- Feature engineering using `.apply()`, `.map()`, and arithmetic operations

In [3]:
# Select specific columns
features = df[['Age', 'Salary']]
labels = df['Purchased']

# Creating a new feature
df['Salary_in_K'] = df['Salary'] / 1000
df

   Age  Salary  Purchased  Salary_in_K
0   25   50000          0         50.0
1   30   60000          1         60.0
2   22   48000          0         48.0
3   35   80000          1         80.0
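
The cell above uses bracket selection and arithmetic only. The other tools named in this slide (`.loc[]`, `.iloc[]`, `.map()`, `.apply()`) can be sketched as follows. The frame `demo` repeats the example values under a separate name so the notebook's `df` is left untouched; the derived column names are illustrative assumptions.

```python
import pandas as pd

demo = pd.DataFrame({
    'Age': [25, 30, 22, 35],
    'Salary': [50000, 60000, 48000, 80000],
    'Purchased': [0, 1, 0, 1],
})

# .loc selects by label or boolean mask: rows with Age > 24, two named columns
older = demo.loc[demo['Age'] > 24, ['Age', 'Salary']]

# .iloc selects by integer position: first two rows, first two columns
top_left = demo.iloc[:2, :2]

# .apply with axis=1 calls the function once per row
demo['Salary_per_age'] = demo.apply(lambda row: row['Salary'] / row['Age'], axis=1)

# .map transforms a Series element by element, here through a dict lookup
demo['Purchased_label'] = demo['Purchased'].map({0: 'no', 1: 'yes'})

print(older)
print(top_left)
print(demo)
```

For a simple ratio like this, the vectorized form `demo['Salary'] / demo['Age']` gives the same result and is usually faster than a row-wise `.apply()`; the row-wise version is shown only to illustrate the call.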


# Slide 4: Useful Operations for ML Preprocessing

Pandas simplifies preprocessing:
- Handling categorical variables: `.astype()`, `pd.get_dummies()`
- Standardization (z-score scaling): `(df - df.mean()) / df.std()`
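
The scaling expression in the list above computes z-scores with pandas' default sample standard deviation (`ddof=1`). As a rough sketch, the snippet below compares it with the population version (`ddof=0`) and with min-max scaling, which maps values onto [0, 1]. The variable names are illustrative assumptions.

```python
import pandas as pd

ages = pd.Series([25, 30, 22, 35], name='Age')

# z-score with the sample standard deviation (pandas default, ddof=1)
z_sample = (ages - ages.mean()) / ages.std()

# z-score with the population standard deviation (ddof=0)
z_population = (ages - ages.mean()) / ages.std(ddof=0)

# Min-max scaling: smallest value becomes 0, largest becomes 1
min_max = (ages - ages.min()) / (ages.max() - ages.min())

print(z_sample.round(6).tolist())
print(z_population.round(6).tolist())
print(min_max.round(6).tolist())
```

On small datasets the two z-score variants differ noticeably, so it is worth checking which convention a downstream tool expects before mixing scaled features from different sources.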

In [4]:
# Convert 'Purchased' to a categorical dtype (if not already)
df['Purchased'] = df['Purchased'].astype('category')

# One-hot encode; drop_first=True keeps a single indicator column (Purchased_1) for the two classes
df_encoded = pd.get_dummies(df, columns=['Purchased'], drop_first=True)

# Standardize the numerical columns (z-score, using the sample standard deviation)
num_cols = ['Age', 'Salary']
df_encoded[num_cols] = (df_encoded[num_cols] - df_encoded[num_cols].mean()) / df_encoded[num_cols].std()
df_encoded

        Age    Salary  Salary_in_K  Purchased_1
0 -0.524891 -0.648901         50.0        False
1  0.349927  0.034153         60.0         True
2 -1.049781 -0.785512         48.0        False
3  1.224745  1.400261         80.0         True
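
Because `Purchased` has only two values, the cell above produces a single indicator column. A sketch with a hypothetical three-category column makes the effect of `pd.get_dummies()` and `drop_first` easier to see. The `City` column and the `customers` frame are assumptions for illustration, not part of the slides' dataset.

```python
import pandas as pd

# Hypothetical data: 'City' does not appear in the original example
customers = pd.DataFrame({
    'Age': [25, 30, 22, 35],
    'City': ['Paris', 'Lyon', 'Paris', 'Nice'],
})

# One indicator column per category; dtype=int yields 0/1 instead of booleans
encoded = pd.get_dummies(customers, columns=['City'], dtype=int)

# drop_first=True omits the first category (alphabetically, 'Lyon')
encoded_drop_first = pd.get_dummies(
    customers, columns=['City'], drop_first=True, dtype=int
)

print(encoded.columns.tolist())
print(encoded_drop_first.columns.tolist())
```

Dropping the first category removes a redundant column, since a row with zeros in all remaining indicators must belong to the omitted category. This is commonly done for linear models to avoid perfectly collinear features.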
