# Pandas

### **1. Basics: Creating and Inspecting DataFrames and Series**

``` python
import pandas as pd
import numpy as np

# Create a Series
series = pd.Series([10, 20, np.nan, 40], index=['a', 'b', 'c', 'd'])
print("Series with NaN:\n", series)
print("Check for NaN:\n", series.isna())
print("Fill NaN with 0:\n", series.fillna(0))

# Create a DataFrame with missing values
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, np.nan, 35, 40],
    'Score': [85.5, 90.0, np.nan, 92.0]
}
df = pd.DataFrame(data)
print("\nDataFrame with NaN:\n", df)
print("Drop rows with NaN:\n", df.dropna())
print("Fill NaN with mean:\n", df.fillna(df.mean(numeric_only=True)))

# Create a DataFrame from a NumPy array
array = np.array([[1, 2], [3, 4], [5, 6]])
df_array = pd.DataFrame(array, columns=['Col1', 'Col2'], index=['Row1', 'Row2', 'Row3'])
print("\nDataFrame from NumPy:\n", df_array)
```
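
`dropna()` as used above removes every row that contains at least one NaN. A minimal sketch of two narrower options, `subset` and `thresh`, on the same sample data (the variable names `age_known` and `complete_rows` are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, np.nan, 35, 40],
    'Score': [85.5, 90.0, np.nan, 92.0],
})

# Drop a row only when 'Age' is missing; NaN in other columns is tolerated.
age_known = df.dropna(subset=['Age'])

# Keep rows with at least 3 non-missing values (here: all three columns).
complete_rows = df.dropna(thresh=3)

print(age_known)
print(complete_rows)
```

Which variant fits depends on whether a missing value in a given column actually invalidates the row for the analysis at hand.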

------------------------------------------------------------------------

### **2. Data Manipulation: Indexing, Filtering, Grouping, and Merging**

``` python
import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Score': [85.5, 90.0, 78.5, 92.0]
}
df = pd.DataFrame(data)

# Indexing
print("First row:\n", df.iloc[0])
print("Name column:\n", df['Name'])
print("Row where Name is Bob:\n", df.loc[df['Name'] == 'Bob'])

# Filtering
high_scorers = df[df['Score'] > 85]
print("\nScores > 85:\n", high_scorers)

# Adding a column
df['Pass'] = df['Score'] > 80
print("\nWith Pass column:\n", df)

# Grouping (numeric_only=True skips the text column 'Name', which has no mean)
grouped = df.groupby('Pass').mean(numeric_only=True)
print("\nGrouped by Pass:\n", grouped)

# Merging
df2 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Eve'],
    'Grade': ['A', 'B', 'C']
})
merged = pd.merge(df, df2, on='Name', how='left')
print("\nMerged DataFrame:\n", merged)

# Data cleaning
df = df.rename(columns={'Score': 'Test_Score'})  # Rename column
df['Age'] = df['Age'].astype(float)  # Change data type
print("\nAfter cleaning:\n", df)

# Sorting
sorted_df = df.sort_values('Test_Score', ascending=False)
print("\nSorted by Test_Score:\n", sorted_df)

# Working with dates
dates = pd.date_range('2023-01-01', periods=4, freq='D')
df['Date'] = dates
print("\nWith Dates:\n", df)
```
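
The left merge above leaves `Grade` as NaN for names that are missing from the second table. A small sketch, assuming you want to see which rows found a match, using the `indicator` parameter of `pd.merge` (the frame names `students` and `grades` are illustrative):

```python
import pandas as pd

students = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David']})
grades = pd.DataFrame({'Name': ['Alice', 'Bob', 'Eve'], 'Grade': ['A', 'B', 'C']})

# indicator=True adds a '_merge' column: 'both', 'left_only', or 'right_only'.
merged = pd.merge(students, grades, on='Name', how='left', indicator=True)
print(merged)

# Names from the left table that had no partner in the right table.
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Name'].tolist()
print(unmatched)
```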

------------------------------------------------------------------------

### **3. Mathematical Operations: Aggregations and Computations**

``` python
import pandas as pd
import numpy as np

# Sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [10, 20, 30, 40]
})

# Basic operations
print("Sum of columns:\n", df.sum())
print("Mean of rows:\n", df.mean(axis=1))

# Element-wise operations
df['A_plus_B'] = df['A'] + df['B']
print("\nWith A_plus_B:\n", df)

# Apply a function
df['C_squared'] = df['C'].apply(lambda x: x ** 2)
print("\nWith C_squared:\n", df)

# Aggregations
agg_results = df.agg({'A': 'mean', 'B': 'sum', 'C': 'max'})
print("\nAggregations:\n", agg_results)

# Descriptive statistics
print("\nDescribe:\n", df.describe())
print("\nValue counts for A:\n", df['A'].value_counts())
```

### **4. Bonus: Input/Output (Reading/Writing Data)**

``` python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# Write to CSV
df.to_csv('sample.csv', index=False)
print("Data written to sample.csv")

# Read from CSV
df_read = pd.read_csv('sample.csv')
print("\nData read from CSV:\n", df_read)
```

### **5. Conditional Columns with `np.select()`**

``` python
import pandas as pd
import numpy as np

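# Create the DataFrame
df = pd.DataFrame({'value': [10, -5, 0, 20, -8, 7]})
print(df)

# Define the conditions (checked in order) and the matching labels
conditions = [df['value'] > 10, df['value'] > 0, df['value'] <= 0]
choices = ['High', 'Medium', 'Low']

# Create a new column based on the conditions. A string default keeps the
# result dtype consistent with the string labels; some NumPy versions raise
# a TypeError when string choices meet the implicit integer default of 0.
df['category'] = np.select(conditions, choices, default='Unknown')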

print(df)
```
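
With `np.select`, the first condition that is true for a row decides its label, so the order of the condition list matters. The sketch below reuses the sample values to show how swapping the first two conditions changes the outcome (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'value': [10, -5, 0, 20, -8, 7]})

# Most specific condition first: 20 is labelled 'High'.
ordered = np.select(
    [df['value'] > 10, df['value'] > 0, df['value'] <= 0],
    ['High', 'Medium', 'Low'],
    default='Unknown',
)

# Broad condition first: 'value > 0' already matches 20, so 'High' is never used.
reordered = np.select(
    [df['value'] > 0, df['value'] > 10, df['value'] <= 0],
    ['Medium', 'High', 'Low'],
    default='Unknown',
)

print(ordered.tolist())
print(reordered.tolist())
```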

### Pandas Topic by Topic

-   **Reading/Writing Data**

    ### ✅ **Read Data**

    ``` python
    import pandas as pd

    # Read CSV file
    df = pd.read_csv("data.csv")

    # Read Excel file
    df = pd.read_excel("data.xlsx")
    ```

    ### ✅ **Write Data**

    ``` python
    # Save to CSV
    df.to_csv("output.csv", index=False)

    # Save to Excel
    df.to_excel("output.xlsx", index=False)
    ```

    ------------------------------------------------------------------------

    ### **📌 Creating DataFrames**

    ### ✅ **From Dictionary**

    ``` python
    data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
    df = pd.DataFrame(data)
    ```

    ### ✅ **From List of Lists**

    ``` python
    data = [['Alice', 25], ['Bob', 30]]
    df = pd.DataFrame(data, columns=['Name', 'Age'])
    ```

    ### ✅ **Using Pandas Series**

    ``` python
    s = pd.Series([10, 20, 30], name="Numbers")
    df = pd.DataFrame(s)
    ```

-   **Indexing & Selecting**

    ### **✅ Indexing**

    | Method    | Description                                         |
    |-----------|-----------------------------------------------------|
    | `.loc[]`  | Label-based indexing (uses row/column labels)       |
    | `.iloc[]` | Position-based indexing (uses row/column positions) |
    | `.at[]`   | Fast access for a single value (label-based)        |
    | `.iat[]`  | Fast access for a single value (position-based)     |

    ``` python
    import pandas as pd

    # Sample DataFrame
    data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'Score': [85, 90, 95]}
    df = pd.DataFrame(data, index=['a', 'b', 'c'])

    # Label-based indexing
    print(df.loc['a'])       # Select row with index 'a'
    print(df.loc['b', 'Age']) # Select 'Age' of index 'b'

    # Position-based indexing
    print(df.iloc[0])        # Select first row
    print(df.iloc[1, 1])     # Select second row, second column (30)
    ```

    ### **✅ Column Selection**

    ``` python
    # Select a single column
    print(df['Name'])

    # Select multiple columns
    print(df[['Name', 'Age']])
    ```

    ### **✅ Row Selection**

    ``` python
    # Select row by label
    print(df.loc['b'])

    # Select row by position
    print(df.iloc[2])
    ```
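
The indexing table lists `.at[]` and `.iat[]`, but the snippets above only use `.loc[]` and `.iloc[]`. A short sketch on the same sample frame, which also shows that label slices include their end point while position slices do not:

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'Score': [85, 90, 95]},
    index=['a', 'b', 'c'],
)

# Scalar access: .at uses labels, .iat uses integer positions.
age_b = df.at['b', 'Age']
score_c = df.iat[2, 2]

# Label slices include the end label; position slices exclude the end position.
by_label = df.loc['a':'b']      # rows 'a' and 'b'
by_position = df.iloc[0:1]      # row 'a' only

print(age_b, score_c)
print(by_label)
print(by_position)
```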

-   **Filtering & Conditional Selection**

    ### **✅ Boolean Masking (Filtering based on conditions)**

    ``` python
    import pandas as pd

    # Sample DataFrame
    data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
            'Age': [25, 30, 35, 40],
            'Score': [85, 90, 75, 95]}

    df = pd.DataFrame(data)

    # Filter rows where Age > 30
    print(df[df['Age'] > 30])

    # Filter rows where Score >= 85
    print(df[df['Score'] >= 85])
    ```

    ### **✅ Multiple Conditions (`&` for AND, `|` for OR)**

    ``` python
    # Filter rows where Age > 30 AND Score > 80
    print(df[(df['Age'] > 30) & (df['Score'] > 80)])

    # Filter rows where Age < 35 OR Score > 90
    print(df[(df['Age'] < 35) | (df['Score'] > 90)])
    ```

    ### **✅ Filtering with `isin()` (Multiple values check)**

    ``` python
    # Select rows where Name is either 'Alice' or 'Charlie'
    print(df[df['Name'].isin(['Alice', 'Charlie'])])
    ```

    ### **✅ Filtering with `str.contains()` (For text-based filtering)**

    ``` python
    # Select rows where Name contains "bo" (case insensitive)
    print(df[df['Name'].str.contains('bo', case=False)])
    ```

    ### **✅ Filtering with `between()` (Range filtering)**

    ``` python
    # Select rows where Age is between 30 and 40
    print(df[df['Age'].between(30, 40)])
    ```
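
The boolean masks above can also be written as a string expression with `DataFrame.query()`. This is only an alternative spelling of the same filter, sketched here on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Score': [85, 90, 75, 95],
})

# Boolean-mask version of "Age > 30 AND Score > 80".
mask_result = df[(df['Age'] > 30) & (df['Score'] > 80)]

# The same filter as a query string; column names are used directly.
query_result = df.query('Age > 30 and Score > 80')

print(query_result)
```

`query()` tends to read better for long conditions, while masks are easier to build up programmatically.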

-   **Sorting & Ranking**

    ### **✅ Sorting Data**

    | Method                                      | Description                             |
    |------------------------------------|------------------------------------|
    | `df.sort_values(by='col')`                  | Sort by a column (ascending by default) |
    | `df.sort_values(by='col', ascending=False)` | Sort by a column in descending order    |
    | `df.sort_values(by=['col1', 'col2'])`       | Sort by multiple columns                |
    | `df.sort_index()`                           | Sort by index                           |

    ``` python
    import pandas as pd

    # Sample DataFrame
    data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
            'Age': [25, 30, 35, 40],
            'Score': [85, 90, 75, 95]}

    df = pd.DataFrame(data)

    # Sort by Age (ascending)
    print(df.sort_values(by='Age'))

    # Sort by Score (descending)
    print(df.sort_values(by='Score', ascending=False))

    # Sort by multiple columns (Age ascending, Score descending)
    print(df.sort_values(by=['Age', 'Score'], ascending=[True, False]))
    ```

    ------------------------------------------------------------------------

    ### **✅ Ranking Data**

    | Method                            | Description                                      |
    |------------------------------------|------------------------------------|
    | `df['col'].rank()`                | Assign ranks (default: average ranking for ties) |
    | `df['col'].rank(method='first')`  | Assign ranks based on first occurrence           |
    | `df['col'].rank(method='dense')`  | Dense ranking (no gaps in rank)                  |
    | `df['col'].rank(ascending=False)` | Rank in descending order                         |

    ``` python
    # Rank by Score (default method = 'average')
    df['Rank'] = df['Score'].rank()
    print(df)

    # Rank with method='first' (resolves ties by order of appearance)
    df['Rank_First'] = df['Score'].rank(method='first')
    print(df)

    # Rank in descending order
    df['Rank_Desc'] = df['Score'].rank(ascending=False)
    print(df)
    ```
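
The sample scores in the ranking snippet contain no ties, so every `method` gives the same ranks there. A small sketch with a tied value (the series is made up for illustration) shows where the methods differ:

```python
import pandas as pd

scores = pd.Series([90, 85, 90, 95], name='Score')

ranks = pd.DataFrame({
    'Score': scores,
    'average': scores.rank(),                # ties share the mean of their ranks
    'min': scores.rank(method='min'),        # ties share the lowest rank, gap follows
    'dense': scores.rank(method='dense'),    # ties share a rank, no gap follows
    'first': scores.rank(method='first'),    # ties broken by order of appearance
})
print(ranks)
```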

-   **Grouping & Aggregation**

    ### **✅ Grouping (`groupby()`)**

    | Method                         | Description                   |
    |--------------------------------|-------------------------------|
    | `df.groupby('col')`            | Groups data based on a column |
    | `df.groupby(['col1', 'col2'])` | Groups by multiple columns    |
    | `df.groupby('col').size()`     | Counts rows per group         |

    ``` python
    import pandas as pd

    # Sample DataFrame
    data = {'Department': ['IT', 'HR', 'IT', 'HR', 'IT'],
            'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
            'Salary': [60000, 55000, 70000, 50000, 65000]}

    df = pd.DataFrame(data)

    # Group by Department
    grouped = df.groupby('Department')

    # A GroupBy object is lazy and has no table view; show the first row of each group
    print(grouped.first())
    ```

    ------------------------------------------------------------------------

    ### **✅ Aggregation (`agg()`, `sum()`, `mean()`, etc.)**

    | Method                                   | Description                                  |
    |------------------------------------------|----------------------------------------------|
    | `df.groupby('col').sum()`                | Sum of values per group                      |
    | `df.groupby('col').mean()`               | Mean of values per group                     |
    | `df.groupby('col').count()`              | Count of non-null values per group and column |
    | `df.groupby('col').agg(['sum', 'mean'])` | Multiple aggregations                        |

    ``` python
    # Sum of salaries by Department
    print(df.groupby('Department')['Salary'].sum())

    # Average salary per department
    print(df.groupby('Department')['Salary'].mean())

    # Multiple aggregations
    print(df.groupby('Department')['Salary'].agg(['sum', 'mean', 'max']))
    ```

    ------------------------------------------------------------------------

    ### **✅ Grouping with Multiple Columns**

    ``` python
    # Group by multiple columns and count
    print(df.groupby(['Department', 'Salary']).size())
    ```
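
When several aggregations run over different columns, named aggregation keeps the output column names readable. A sketch on the department data from above; the output names `headcount`, `total_salary`, and `avg_salary` are chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['IT', 'HR', 'IT', 'HR', 'IT'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Salary': [60000, 55000, 70000, 50000, 65000],
})

# Each keyword becomes an output column: name=(source column, aggregation).
summary = df.groupby('Department').agg(
    headcount=('Employee', 'count'),
    total_salary=('Salary', 'sum'),
    avg_salary=('Salary', 'mean'),
)
print(summary)
```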

-   **Joining & Merging**

    ### **✅ Merging (`merge()`)**

    | Method                                      | Description              |
    |---------------------------------------------|--------------------------|
    | `pd.merge(df1, df2, on='col')`              | Merge on a common column |
    | `pd.merge(df1, df2, how='left', on='col')`  | Left Join                |
    | `pd.merge(df1, df2, how='right', on='col')` | Right Join               |
    | `pd.merge(df1, df2, how='inner', on='col')` | Inner Join (default)     |
    | `pd.merge(df1, df2, how='outer', on='col')` | Outer Join               |

    ``` python
    import pandas as pd

    # Sample DataFrames
    df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
    df2 = pd.DataFrame({'ID': [2, 3, 4], 'Score': [90, 85, 88]})

    # Inner Join (only matching IDs)
    print(pd.merge(df1, df2, on='ID', how='inner'))

    # Left Join (all from df1, matching from df2)
    print(pd.merge(df1, df2, on='ID', how='left'))

    # Right Join (all from df2, matching from df1)
    print(pd.merge(df1, df2, on='ID', how='right'))

    # Outer Join (all records, NaN where no match)
    print(pd.merge(df1, df2, on='ID', how='outer'))
    ```

    ------------------------------------------------------------------------

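    ### **✅ Joining (`join()`)**

    | Method                                            | Description                                    |
    |---------------------------------------------------|------------------------------------------------|
    | `df1.join(df2)`                                   | Join on the indexes of both DataFrames         |
    | `df1.join(df2, on='col')`                         | Match `df1['col']` against the index of `df2`  |
    | `df1.set_index('col').join(df2.set_index('col'))` | Join on a shared column by making it the index |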

    ``` python
    # Set index and join
    df1 = df1.set_index('ID')
    df2 = df2.set_index('ID')

    print(df1.join(df2, how='left'))
    ```
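
The `on=` argument of `join()` is easy to misread: it names a column in the calling frame, which is matched against the index of the other frame. A minimal sketch with made-up `orders` and `customers` tables:

```python
import pandas as pd

orders = pd.DataFrame({'order_id': [1, 2, 3], 'customer_id': [10, 20, 10]})

# The lookup table is indexed by customer_id.
customers = pd.DataFrame(
    {'name': ['Alice', 'Bob']},
    index=pd.Index([10, 20], name='customer_id'),
)

# orders['customer_id'] is matched against customers.index.
joined = orders.join(customers, on='customer_id')
print(joined)
```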

-   **Handling Missing Data**

    | Method                      | Description                                        |
    |-----------------------------|----------------------------------------------------|
    | `df.isnull()`               | Returns a DataFrame with `True` for missing values |
    | `df.notnull()`              | Returns `True` for non-missing values              |
    | `df.dropna()`               | Removes rows or columns with missing values        |
    | `df.fillna(value)`          | Fills missing values with a specified value        |
    | `df.interpolate()`          | Fills missing values using interpolation           |
    | `df.replace(np.nan, value)` | Replaces NaN with a specific value                 |

    ------------------------------------------------------------------------

    ### **✅ Detecting Missing Values**

    ``` python
    import pandas as pd
    import numpy as np

    # Sample DataFrame with Missing Values
    data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
            'Age': [25, np.nan, 35, 40],
            'Score': [85, 90, np.nan, 95]}

    df = pd.DataFrame(data)

    # Check for missing values
    print(df.isnull())

    # Count missing values per column
    print(df.isnull().sum())
    ```

    ------------------------------------------------------------------------

    ### **✅ Removing Missing Values (`dropna()`)**

    ``` python
    # Drop rows with missing values
    print(df.dropna())

    # Drop columns with missing values
    print(df.dropna(axis=1))
    ```

    ------------------------------------------------------------------------

    ### **✅ Filling Missing Values (`fillna()`)**

    ``` python
    # Fill missing values with a specific number
    print(df.fillna(0))

    # Fill missing values with the column mean (numeric columns only;
    # without numeric_only=True the text column 'Name' raises an error)
    print(df.fillna(df.mean(numeric_only=True)))

    # Forward fill (carry the previous value forward);
    # fillna(method='ffill') is deprecated in recent pandas versions
    print(df.ffill())

    # Backward fill (use the next value)
    print(df.bfill())
    ```

    ------------------------------------------------------------------------

    ### **✅ Interpolation (Filling Missing Data Smartly)**

    ``` python
    # Fill missing values using linear interpolation; only numeric columns
    # can be interpolated, so select them first
    print(df[['Age', 'Score']].interpolate())
    ```

    ------------------------------------------------------------------------

    ### **✅ Replacing NaN Values (`replace()`)**

    ``` python
    # Replace NaN with a specific value
    print(df.replace(np.nan, 'Unknown'))
    ```
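
The fill strategies above apply one rule to the whole frame. When columns need different treatment, `fillna()` also accepts a dict that maps column names to fill values. A sketch on the same sample data, where the choice of median for `Age` and 0 for `Score` is only an assumption for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, np.nan, 35, 40],
    'Score': [85, 90, np.nan, 95],
})

# Per-column fill values: median for Age, a fixed 0 for Score.
filled = df.fillna({'Age': df['Age'].median(), 'Score': 0})
print(filled)
```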

-   **Data Transformation**

    ### **✅ 1. Applying Functions (`apply()`, `map()`, `applymap()`)**

    | Method                   | Description                                                                                      |
    |--------------------------|--------------------------------------------------------------------------------------------------|
    | `df['col'].apply(func)`  | Applies a function to a column                                                                   |
    | `df.apply(func, axis=1)` | Applies a function row-wise                                                                      |
    | `df['col'].map(func)`    | Element-wise transformation for Series                                                           |
    | `df.applymap(func)`      | Element-wise transformation for entire DataFrame (deprecated since pandas 2.1; use `df.map(func)`) |

    ``` python
    import pandas as pd

    data = {'Name': ['Alice', 'Bob', 'Charlie'],
            'Age': [25, 30, 35],
            'Salary': [50000, 60000, 70000]}

    df = pd.DataFrame(data)

    # Apply function to a column (increase salary by 10%)
    df['Salary'] = df['Salary'].apply(lambda x: x * 1.1)

    # Apply function row-wise
    df['Age Group'] = df.apply(lambda row: 'Young' if row['Age'] < 30 else 'Old', axis=1)

    # Using map() on a Series
    df['Name Length'] = df['Name'].map(len)

    print(df)
    ```

    ------------------------------------------------------------------------

    ### **✅ 2. Changing Data Types (`astype()`)**

    | Method                    | Description        |
    |---------------------------|--------------------|
    | `df['col'].astype(str)`   | Convert to string  |
    | `df['col'].astype(int)`   | Convert to integer |
    | `df['col'].astype(float)` | Convert to float   |

    ``` python
    df['Age'] = df['Age'].astype(str)  # Convert Age to string
    ```

    ------------------------------------------------------------------------

    ### **✅ 3. Renaming Columns (`rename()`)**

    | Method                                        | Description        |
    |-----------------------------------------------|--------------------|
    | `df.rename(columns={'old_name': 'new_name'})` | Rename column      |
    | `df.columns = ['new_col1', 'new_col2']`       | Rename all columns |

    ``` python
    df.rename(columns={'Salary': 'Income'}, inplace=True)
    ```

    ------------------------------------------------------------------------

    ### **✅ 4. Binning (Creating Ranges)**

    ``` python
    bins = [20, 30, 40]
    labels = ['20-30', '30-40']
    df['Age Group'] = pd.cut(df['Age'].astype(int), bins=bins, labels=labels)
    ```

    ------------------------------------------------------------------------

    ### **✅ 5. Encoding Categorical Data**

    ``` python
    df['Gender'] = ['F', 'M', 'M']
    df['Gender_encoded'] = df['Gender'].map({'M': 0, 'F': 1})  # Label Encoding
    ```

    ------------------------------------------------------------------------

    ### **✅ 6. Scaling & Normalization (`MinMaxScaler`, `StandardScaler`)**

    ``` python
    from sklearn.preprocessing import MinMaxScaler

    # 'Salary' was renamed to 'Income' in step 3, so scale that column
    scaler = MinMaxScaler()
    df[['Income']] = scaler.fit_transform(df[['Income']])
    ```
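
The scaling step names `StandardScaler` but only shows `MinMaxScaler`. As a rough sketch of what both transformations compute, here are the two formulas in plain pandas on a made-up salary column; `ddof=0` is used because `StandardScaler` divides by the population standard deviation:

```python
import pandas as pd

df = pd.DataFrame({'Salary': [50000, 60000, 70000]})
col = df['Salary']

# Min-max scaling: map the smallest value to 0 and the largest to 1.
df['Salary_minmax'] = (col - col.min()) / (col.max() - col.min())

# Standardization (z-score): zero mean and unit population standard deviation.
df['Salary_zscore'] = (col - col.mean()) / col.std(ddof=0)

print(df)
```

The scikit-learn scalers remain the better choice when the same fitted parameters must be reused on new data, for example on a test set.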

-   **Pivoting & Reshaping**

    ## **✅ 1. Pivoting (`pivot()`, `pivot_table()`)**

    ### **🔹 `pivot()` - Reshape DataFrame based on column values**

    | Method                             | Description                                         |
    |------------------------------------|------------------------------------|
    | `df.pivot(index, columns, values)` | Converts unique values in a column into new columns |

    ``` python
    import pandas as pd

    # Sample DataFrame
    data = {'Date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
            'City': ['NY', 'LA', 'NY', 'LA'],
            'Temperature': [30, 25, 28, 26]}

    df = pd.DataFrame(data)

    # Pivot: Make 'City' as columns, 'Date' as index, and 'Temperature' as values
    pivot_df = df.pivot(index='Date', columns='City', values='Temperature')

    print(pivot_df)
    ```

    🔹 **Converts rows into columns** based on unique values in `City`.

    ------------------------------------------------------------------------

    ### **🔹 `pivot_table()` - Aggregate data while pivoting**

    | Method                                            | Description                |
    |------------------------------------|------------------------------------|
    | `df.pivot_table(values, index, columns, aggfunc)` | Aggregates and pivots data |

    ``` python
    # Pivot Table: Get the average temperature per city
    pivot_table_df = df.pivot_table(values='Temperature', index='Date', columns='City', aggfunc='mean')

    print(pivot_table_df)
    ```

    ✅ **Key Difference:**

    -   `pivot()` **fails if duplicate values exist** for an
        index-column combination.
    -   `pivot_table()` **aggregates values** to avoid duplication
        issues.

    ------------------------------------------------------------------------

    ## **✅ 2. Reshaping (`melt()`, `stack()`, `unstack()`)**

    ### **🔹 `melt()` - Convert Wide Data to Long Format**

    | Method                                               | Description                |
    |------------------------------------|------------------------------------|
    | `df.melt(id_vars, value_vars, var_name, value_name)` | Converts columns into rows |

    ``` python
    # Sample Wide Data
    df_wide = pd.DataFrame({'Date': ['2024-01-01', '2024-01-02'],
                            'NY': [30, 28],
                            'LA': [25, 26]})

    # Melt: Convert 'NY' and 'LA' columns into a single 'City' column
    df_long = df_wide.melt(id_vars=['Date'], var_name='City', value_name='Temperature')

    print(df_long)
    ```

    ✅ **Melt is useful for converting wide data (multiple columns) into
    long format.**

    ------------------------------------------------------------------------

    ### **🔹 `stack()` - Convert Columns to Rows**

    ``` python
    df_stacked = df_wide.set_index('Date').stack()
    print(df_stacked)
    ```

    📌 **`stack()` moves columns into a hierarchical row index.**

    ------------------------------------------------------------------------

    ### **🔹 `unstack()` - Convert Rows to Columns**

    ``` python
    df_unstacked = df_stacked.unstack()
    print(df_unstacked)
    ```

    📌 **`unstack()` reverses `stack()`, moving rows back into
    columns.**
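
The key difference between `pivot()` and `pivot_table()` noted in this topic only shows up when an index/column pair occurs more than once. A small sketch with a deliberately duplicated reading (two NY values for the same date, made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-01', '2024-01-01'],
    'City': ['NY', 'NY', 'LA'],
    'Temperature': [30, 32, 25],
})

# pivot() cannot decide which NY value to keep and raises a ValueError.
pivot_failed = False
try:
    df.pivot(index='Date', columns='City', values='Temperature')
except ValueError as exc:
    pivot_failed = True
    print("pivot() failed:", exc)

# pivot_table() resolves the duplicate by aggregating (here: the mean).
table = df.pivot_table(values='Temperature', index='Date', columns='City', aggfunc='mean')
print(table)
```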

-   **Datetime Operations**

    ## **✅ 1. Creating Datetime Objects (`to_datetime()`)**

    ``` python
    import pandas as pd

    # Convert a column to datetime
    df = pd.DataFrame({'date': ['2024-03-01', '2024-03-02', '2024-03-03']})
    df['date'] = pd.to_datetime(df['date'])

    print(df.dtypes)  # Check data type
    ```

    📌 **`to_datetime()`** automatically converts string dates into
    proper datetime objects.

    ------------------------------------------------------------------------

    ## **✅ 2. Extracting Date Components**

    | Attribute         | Description                |
    |-------------------|----------------------------|
    | `dt.year`         | Extracts the year          |
    | `dt.month`        | Extracts the month         |
    | `dt.day`          | Extracts the day           |
    | `dt.weekday`      | Day of the week (0=Monday) |
    | `dt.day_name()`   | Full weekday name          |
    | `dt.month_name()` | Full month name            |

    ``` python
    df['year'] = df['date'].dt.year
    df['month'] = df['date'].dt.month
    df['day'] = df['date'].dt.day
    df['weekday'] = df['date'].dt.day_name()

    print(df)
    ```

    ------------------------------------------------------------------------

    ## **✅ 3. Generating Date Ranges (`date_range()`)**

    ``` python
    # Create a sequence of dates
    date_series = pd.date_range(start='2024-03-01', periods=5, freq='D')
    print(date_series)
    ```

    📌 **Common frequencies:**

    -   `'D'` → Daily
    -   `'W'` → Weekly
    -   `'ME'` → Month end (`'M'` before pandas 2.2)
    -   `'h'` → Hourly (`'H'` before pandas 2.2)

    ------------------------------------------------------------------------

    ## **✅ 4. Date Arithmetic (Adding/Subtracting Time)**

    ``` python
    df['next_day'] = df['date'] + pd.Timedelta(days=1)
    df['prev_week'] = df['date'] - pd.Timedelta(weeks=1)

    print(df)
    ```

    ------------------------------------------------------------------------

    ## **✅ 5. Filtering & Conditional Selection with Dates**

    ``` python
    # Filter data after a specific date
    df_filtered = df[df['date'] > '2024-03-01']
    print(df_filtered)
    ```

    ------------------------------------------------------------------------

    ## **✅ 6. Setting Datetime as Index (`set_index()`)**

    ``` python
    df.set_index('date', inplace=True)
    print(df)
    ```

    📌 **Useful for time-series analysis.**

    ------------------------------------------------------------------------

    ## **✅ 7. Resampling Time-Series Data (`resample()`)**

    ``` python
    # Example DataFrame with hourly data ('h' replaces the deprecated 'H' alias)
    df = pd.DataFrame({'date': pd.date_range(start='2024-03-01', periods=10, freq='h'),
                       'value': range(10)})

    df.set_index('date', inplace=True)

    # Resample to daily data and get sum
    df_daily = df.resample('D').sum()
    print(df_daily)
    ```
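
The resampling snippet covers only ten hours, so its daily result has a single row. A sketch with two full days of made-up hourly values makes the grouping by calendar day visible:

```python
import pandas as pd

# 48 hourly readings: 0..23 fall on 2024-03-01, 24..47 on 2024-03-02.
hourly = pd.DataFrame(
    {'value': range(48)},
    index=pd.date_range(start='2024-03-01', periods=48, freq='h'),
)

# One output row per calendar day, with two aggregations per day.
daily = hourly['value'].resample('D').agg(['sum', 'mean'])
print(daily)
```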

-   **Window Functions**

    ## **✅ 1. Rolling Window (`rolling()`)**

    Used for moving averages, smoothing, and aggregations over a defined
    window.

    | Function               | Description                   |
    |------------------------|-------------------------------|
    | `df.rolling(n).mean()` | Rolling mean (moving average) |
    | `df.rolling(n).sum()`  | Rolling sum                   |
    | `df.rolling(n).max()`  | Rolling maximum               |
    | `df.rolling(n).min()`  | Rolling minimum               |

    ``` python
    import pandas as pd

    # Sample Data
    df = pd.DataFrame({'date': pd.date_range(start='2024-03-01', periods=7, freq='D'),
                       'sales': [100, 200, 300, 400, 500, 600, 700]})

    df.set_index('date', inplace=True)

    # 3-day moving average
    df['rolling_avg'] = df['sales'].rolling(window=3).mean()

    print(df)
    ```

    📌 The **rolling window** here averages the **current row and the two before it**; the first two rows are `NaN` because fewer than 3 values are available.

    ------------------------------------------------------------------------

    ## **✅ 2. Expanding Window (`expanding()`)**

    Computes cumulative metrics from the start of the data to the
    current row.

    | Function                | Description    |
    |-------------------------|----------------|
    | `df.expanding().mean()` | Expanding mean |
    | `df.expanding().sum()`  | Expanding sum  |

    ``` python
    df['cumulative_avg'] = df['sales'].expanding().mean()
    print(df)
    ```

    📌 **Expanding Window** grows over time, accumulating all previous
    data.

    ------------------------------------------------------------------------

    ## **✅ 3. Cumulative Functions (`cumsum()`, `cumprod()`, `cummax()`, `cummin()`)**

    Used to track cumulative trends over time.

    ``` python
    df['cumulative_sum'] = df['sales'].cumsum()
    df['cumulative_max'] = df['sales'].cummax()
    print(df)
    ```

    📌 **Cumulative sum** is the running total of all values up to and including the current row.

    ------------------------------------------------------------------------

    ## **✅ 4. Weighted Moving Average (`ewm()`)**

    Gives **more weight to recent values** (good for smoothing
    time-series data).

    | Function                | Description                         |
    |-------------------------|-------------------------------------|
    | `df.ewm(span=n).mean()` | Exponential weighted moving average |

    ``` python
    df['ewm_avg'] = df['sales'].ewm(span=3, adjust=False).mean()
    print(df)
    ```
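
To make "more weight to recent values" concrete: with `adjust=False`, `ewm(span=n).mean()` follows the recursion `y[t] = (1 - alpha) * y[t-1] + alpha * x[t]` with `alpha = 2 / (n + 1)`. The sketch below recomputes it by hand on the sales values and compares the result with pandas:

```python
import numpy as np
import pandas as pd

sales = pd.Series([100, 200, 300, 400, 500, 600, 700], dtype=float)

span = 3
alpha = 2 / (span + 1)  # 0.5 for span=3

# Manual recursion: start from the first value, then blend in each new one.
manual = [sales.iloc[0]]
for x in sales.iloc[1:]:
    manual.append((1 - alpha) * manual[-1] + alpha * x)

pandas_ewm = sales.ewm(span=span, adjust=False).mean()

print(manual)
print(np.allclose(manual, pandas_ewm))
```

A larger `span` lowers `alpha`, so the average reacts more slowly to new values.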

### Load, Explore, and Clean a Dataset with pandas

Here's how to **load, explore, and clean** a dataset like **Iris or
Titanic** using `pandas`:

### **📌 Step 1: Load the Dataset**

``` python
import pandas as pd
from seaborn import load_dataset  # Loader for seaborn's sample datasets (fetched online on first use)

# Load the Titanic dataset
df = load_dataset("titanic")

# OR load Iris dataset
# df = load_dataset("iris")

print(df.head())  # Display first 5 rows
```

### **📌 Step 2: Explore the Data**

``` python
df.info()  # Data types & non-null counts (info() prints directly and returns None)
print(df.describe())  # Summary of numerical features
print(df.isnull().sum())  # Count missing values in each column
```

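### **📌 Step 3: Handle Missing Values**

``` python
# Fill missing age values with the median (assign the result back; inplace=True
# on a column selection is not applied reliably in newer pandas versions)
df['age'] = df['age'].fillna(df['age'].median())

# Fill missing embark_town values with the most frequent value
df['embark_town'] = df['embark_town'].fillna(df['embark_town'].mode()[0])

# Drop rows where 'deck' is missing (in Titanic this removes most rows,
# so dropping the column instead may be preferable)
df.dropna(subset=['deck'], inplace=True)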
```

### **📌 Step 4: Encode Categorical Data**

``` python
# Convert categorical columns to numerical using Label Encoding
df['sex'] = df['sex'].map({'male': 0, 'female': 1})

# One-Hot Encoding for "embark_town"
df = pd.get_dummies(df, columns=['embark_town'], drop_first=True)
```
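
To see what the encoding step produces without downloading a dataset, here is a sketch on a three-row stand-in frame (the rows are made up; only the column names follow the Titanic data):

```python
import pandas as pd

df = pd.DataFrame({
    'sex': ['male', 'female', 'female'],
    'embark_town': ['Southampton', 'Cherbourg', 'Queenstown'],
})

# Label encoding: replace each category with an integer code.
df['sex'] = df['sex'].map({'male': 0, 'female': 1})

# One-hot encoding; drop_first=True drops the first category ('Cherbourg'),
# which is then represented by all remaining indicator columns being false.
encoded = pd.get_dummies(df, columns=['embark_town'], drop_first=True)
print(encoded)
```

Dropping one category avoids perfectly redundant columns, which matters for linear models; tree-based models usually work with or without it.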