# Data Preprocessing Techniques with Pandas DataFrames

Data preprocessing turns raw data into a clean, consistent form that analysis and modelling steps can rely on. The sections below cover common preprocessing techniques with pandas DataFrames, each with a short example.

## 1. Handling Missing Values

### Example: Filling Missing Values

```python
import pandas as pd
import numpy as np

# Sample DataFrame with missing values
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [np.nan, 2, 3, 4],
    'C': [1, np.nan, np.nan, 4]
})

# Fill missing values with a specific value
df_filled = df.fillna(0)
print(df_filled)
```
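`fillna` is not limited to a single constant. It also accepts a Series or dict of per-column fill values, and `ffill()` carries the last valid observation forward. Checking `isna().sum()` first shows how many values each column is missing.

The sketch below uses the same sample data. Whether a mean fill or a forward fill is appropriate depends on the data, so treat both as illustrations rather than recommendations.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [np.nan, 2, 3, 4],
    'C': [1, np.nan, np.nan, 4]
})

# Count missing values per column before deciding how to fill them
missing_counts = df.isna().sum()
print(missing_counts)

# Fill each column with its own mean (NaNs are skipped when computing the mean)
df_mean_filled = df.fillna(df.mean())
print(df_mean_filled)

# Forward fill: propagate the last valid value down each column
df_ffilled = df.ffill()
print(df_ffilled)
```

After `ffill()`, column `B` keeps its leading NaN because there is no earlier value to carry forward.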

### Example: Dropping Missing Values

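```python
# Drop rows with any missing values (reuses df from the previous cell).
# Only the last row has no missing values, so it is the only row kept.
df_dropped = df.dropna()
print(df_dropped)
```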
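`dropna` can be made less aggressive than dropping every incomplete row. The sketch below uses the same sample data and shows three options:

- `subset` restricts the check to specific columns.
- `thresh` keeps rows with at least a given number of non-missing values.
- `axis=1` applies the same threshold to columns instead of rows.

The threshold values are arbitrary, picked only for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [np.nan, 2, 3, 4],
    'C': [1, np.nan, np.nan, 4]
})

# Drop a row only if column 'A' is missing
df_subset = df.dropna(subset=['A'])

# Keep rows that have at least 2 non-missing values
df_thresh = df.dropna(thresh=2)

# Keep columns that have at least 3 non-missing values
df_cols = df.dropna(axis=1, thresh=3)

print(df_subset)
print(df_thresh)
print(df_cols)
```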

## 2. Removing Duplicates

### Example: Dropping Duplicate Rows

```python
# Sample DataFrame with duplicate rows (rows 1 and 2 are identical)
df = pd.DataFrame({
    'A': [1, 2, 2, 4],
    'B': [1, 2, 2, 4]
})

# Drop duplicate rows; by default the first occurrence is kept
df_no_duplicates = df.drop_duplicates()
print(df_no_duplicates)
```
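Rows often count as duplicates when only some key columns match. The sketch below uses hypothetical `id` and `value` columns to show three controls:

- `duplicated()` returns a Boolean mask you can inspect before deleting anything.
- `subset` limits the comparison to the chosen columns.
- `keep` controls which occurrence survives.

```python
import pandas as pd

# Hypothetical data: 'id' should be unique, but id 1 appears twice
df = pd.DataFrame({
    'id': [1, 1, 2, 3],
    'value': [10, 15, 20, 30]
})

# Boolean mask: True for rows whose 'id' already appeared earlier
mask = df.duplicated(subset=['id'])
print(mask)

# Keep the last occurrence of each 'id' instead of the first
df_last = df.drop_duplicates(subset=['id'], keep='last')
print(df_last)

# keep=False drops every row whose 'id' occurs more than once
df_unique_only = df.drop_duplicates(subset=['id'], keep=False)
print(df_unique_only)
```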

## 3. Data Transformation

### Example: Applying a Function to a Column

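```python
# Sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [10, 20, 30, 40]
})

# Apply a function element-wise to column 'A'.
# For simple arithmetic like this, the vectorized df['A'] * 2 gives the same result faster.
df['A'] = df['A'].apply(lambda x: x * 2)
print(df)
```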
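The sketch below shows two common alternatives to `apply` on the same sample data:

- **Vectorized arithmetic**, which pandas evaluates on the whole column without a Python-level loop.
- **`Series.map` with a dictionary**, for value lookups.

The `size_labels` mapping and the new column names are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [10, 20, 30, 40]
})

# Vectorized arithmetic: same result as df['A'].apply(lambda x: x * 2),
# but computed on the whole column at once
df['A_doubled'] = df['A'] * 2

# Series.map with a dict: replace each value with its looked-up label
# (values missing from the dict would become NaN)
size_labels = {10: 'small', 20: 'small', 30: 'large', 40: 'large'}
df['B_label'] = df['B'].map(size_labels)

print(df)
```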

## 4. Encoding Categorical Variables

### Example: One-Hot Encoding

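```python
# Sample DataFrame with a categorical variable
df = pd.DataFrame({
    'A': ['cat', 'dog', 'cat', 'bird']
})

# One-hot encode the categorical variable into columns A_bird, A_cat, A_dog.
# dtype=int produces 0/1 values; pandas 2.0+ otherwise returns True/False columns.
df_encoded = pd.get_dummies(df, columns=['A'], dtype=int)
print(df_encoded)
```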
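The sketch below shows two variations on this encoding, using the same sample column.

`drop_first=True` removes the column for the first category. Categories are sorted, so that column is `A_bird` here. It carries no extra information, because a row is `bird` exactly when the other columns are all 0. Dropping it matters for models such as linear regression that are sensitive to perfectly collinear features.

Integer codes from the `category` dtype are a compact alternative. However, they impose an order (`bird` < `cat` < `dog`) that the data may not actually have. The `A_code` column name is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['cat', 'dog', 'cat', 'bird']
})

# drop_first=True drops the first (alphabetically sorted) category's column, A_bird;
# dtype=int gives 0/1 values instead of booleans
df_encoded = pd.get_dummies(df, columns=['A'], dtype=int, drop_first=True)
print(df_encoded)

# Integer (label) encoding via the categorical dtype: bird -> 0, cat -> 1, dog -> 2
df['A_code'] = df['A'].astype('category').cat.codes
print(df)
```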

## 5. Normalization and Scaling

### Example: Min-Max Scaling

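```python
from sklearn.preprocessing import MinMaxScaler

# Sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [10, 20, 30, 40]
})

# Apply Min-Max scaling: each column is rescaled to the range [0, 1].
# fit_transform returns a NumPy array, so wrap it back into a DataFrame.
scaler = MinMaxScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns, index=df.index)
print(df_scaled)
```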
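The same min-max transformation can be written directly in pandas with the per-column formula (x - min) / (max - min), which avoids the scikit-learn dependency. The sketch below also shows z-score standardization, (x - mean) / std, as a common alternative. It uses `ddof=0` (the population standard deviation), which matches what scikit-learn's `StandardScaler` uses.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [10, 20, 30, 40]
})

# Min-max scaling by hand: (x - min) / (max - min), computed per column
col_min = df.min()
col_max = df.max()
df_minmax = (df - col_min) / (col_max - col_min)
print(df_minmax)

# Standardization (z-score): (x - mean) / std, with the population std (ddof=0)
df_standardized = (df - df.mean()) / df.std(ddof=0)
print(df_standardized)
```

Two caveats apply to this hand-written version:

- **Constant columns.** A column whose minimum equals its maximum (or whose standard deviation is zero) divides by zero here, so constant columns need separate handling.
- **Data leakage.** In a modelling workflow, compute the min/max or mean/std on the training data only and reuse those values for validation and test data. This is what calling `fit` on the training data and `transform` on the rest achieves with scikit-learn scalers.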

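These examples cover the basics: filling or dropping missing values, removing duplicates, transforming columns, encoding categorical variables, and scaling numeric features. Which technique to use depends on your data and on what the downstream analysis or model expects. The parameters to choose depend on the same factors.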