# Pandas 

## Filtering Rows & Columns

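`loc[]` vs `iloc[]`
- `df.loc[row_label, column_label]`: selects by **label**; slice end points are included
- `df.iloc[row_index, column_index]`: selects by integer **position**; slice end points are excluded, as with Python lists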
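The label vs. position distinction is easiest to see on a frame whose labels are not 0..n-1. A minimal sketch, using a small hypothetical DataFrame whose integer labels (10 to 50) deliberately differ from its positions (0 to 4):

```python
import pandas as pd

# Hypothetical data: index labels 10..50 are NOT the same as positions 0..4
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy", "Dee", "Eve"], "age": [25, 32, 41, 29, 38]},
    index=[10, 20, 30, 40, 50],
)

# loc: label-based, the end label of a slice IS included
by_label = df.loc[20:40, "age"]        # labels 20, 30, 40

# iloc: position-based, the end position of a slice is NOT included
by_position = df.iloc[1:3, 1]          # positions 1, 2 (column 1 is "age")

# Negative positions count from the end
last_two = df.iloc[-2:]                # labels 40, 50
all_but_last_two = df.iloc[:-2]        # labels 10, 20, 30

# A boolean mask and a column label can be combined in one loc call
older = df.loc[df["age"] > 30, "name"]
```

Here `df.loc[20:40]` includes label 40, while `df.iloc[1:3]` stops before position 3.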

```python
df.loc['row1']                          # Row with label 'row1'
df.loc[:, 'column1']                    # Entire column named 'column1'
df.loc[3:7]                             # Rows with label 3 to 7 (inclusive)
df.loc[df['age'] > 30]                  # Filter rows where age > 30
df.loc[df['column'] == 'value']         # Filter by column match
```

```python
df.iloc[0]          # First row
df.iloc[-1]         # Last row
df.iloc[-5:]        # Last five rows
df.iloc[:, 0]       # First column
df.iloc[:, 0:7]     # First seven columns
df.iloc[:, :-3]     # All columns except the last three
df.iloc[3:7]        # Rows at positions 3–6 (position 7 is excluded)
df.iloc[:]          # All rows and columns
df.iloc[::]         # All rows and columns (same as above)
df.iloc[::2]        # Every other row
df.iloc[3:7:2]      # Rows at positions 3 and 5
```

```python
# Single condition
df[df["product_price"] > 10.00]

# AND condition
df[(df["item_name"] == "Canned Soda") & (df["quantity"] > 1)]

# OR condition
df[(df["age"] == 90) | (df["native-country"] == "Hungary")]

# isin() for Multiple Matching
df[df["age"].isin([30, 40, 50])]

# Column Subsetting with Condition
df.loc[df["item_name"] == "Veggie Salad Bowl", ["item_name", "quantity"]]

# Use .idxmax() with loc
# idxmax() returns the index label of the (first) maximum, which is what loc expects
# Get the quantity of the most expensive item
chipo.loc[chipo["item_price"].idxmax(), "quantity"]
```

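A runnable sketch of the same filtering patterns on a small hypothetical orders table (column names borrowed from the examples above, values made up). It also shows why each condition needs its own parentheses, `~` for negation, and `Series.between()`:

```python
import pandas as pd

# Hypothetical order data, loosely modeled on the Chipotle columns used above
orders = pd.DataFrame({
    "item_name": ["Canned Soda", "Chips", "Canned Soda", "Veggie Salad Bowl", "Chips"],
    "quantity": [2, 1, 1, 1, 3],
    "item_price": [2.18, 2.39, 1.09, 11.25, 7.17],
})

# Each comparison is wrapped in parentheses: & and | bind more tightly
# than == and >, so without them the expression is parsed incorrectly.
soda_multi = orders[(orders["item_name"] == "Canned Soda") & (orders["quantity"] > 1)]

# ~ negates a boolean mask element-wise (Python's `not` raises an error on a Series)
not_chips = orders[~orders["item_name"].isin(["Chips"])]

# between() is inclusive on both ends by default
mid_price = orders[orders["item_price"].between(2.00, 8.00)]

# idxmax() gives the index label of the most expensive row; loc then picks "quantity"
top_qty = orders.loc[orders["item_price"].idxmax(), "quantity"]
```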
## Index 
- `set_index()` sets one or more columns as the index of a DataFrame
    - E.g. `df.set_index('CustomerID')`

- `sort_index()` sorts a DataFrame by its row or column labels
    - `df.sort_index(axis=0, ascending=True, inplace=False)`
    - `axis=0`: sort rows (default)
    - `axis=1`: sort columns

- `reset_index()` moves the index back into a regular column and restores the default 0..n-1 integer index; useful after filtering, sorting, or shuffling
    - Use `drop=True` to discard the old index entirely instead of inserting it as a column

- `index.get_level_values(level)` accesses a single level of a MultiIndex and returns an `Index` of the values at that level (one entry per row)

```python
# Resetting Index After Filtering
filtered_chipo = chipo.loc[chipo["product_price"] > 10.00].reset_index(drop=True)

# Set index to age but keep age as a column
df_index_age = df_pd.set_index("age", drop=False)

# Select all rows where index is 30 (returns a DataFrame when the label repeats)
df_index_age.loc[30].head()
```

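A self-contained sketch of the index methods listed above, on a hypothetical `sales` table (the `region` and `amount` columns are illustrative, not from a real dataset):

```python
import pandas as pd

# Hypothetical customer purchases
sales = pd.DataFrame({
    "CustomerID": [103, 101, 102, 101],
    "region": ["West", "East", "East", "West"],
    "amount": [250, 100, 175, 60],
})

# set_index: the column becomes the row index (and leaves the columns)
by_customer = sales.set_index("CustomerID")

# sort_index: order rows by index label (axis=0 is the default)
by_customer_sorted = by_customer.sort_index()

# reset_index: move the index back into a column, labels become 0..n-1
restored = by_customer_sorted.reset_index()

# reset_index(drop=True) after filtering: renumber without keeping the old labels
subset = sales[sales["amount"] > 100].reset_index(drop=True)

# Two-level MultiIndex, then pull out a single level by name
multi = sales.set_index(["region", "CustomerID"]).sort_index()
regions = multi.index.get_level_values("region")
```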
## Merge

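`concat()`
- `pd.concat([df1, df2], axis=0, ignore_index=False)`
    - `axis=0` stacks rows (default); `axis=1` places columns side by side
    - `ignore_index=True` discards the original labels and renumbers the result 0..n-1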

```python
# Vertical Concatenation
df_vertical = pd.concat([df1, df2], axis=0, ignore_index=True)
# Horizontal Concatenation
df_horizontal = pd.concat([df1, df2], axis=1)
```
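A small sketch of how `concat()` aligns data, assuming two hypothetical frames where `df2` has an extra `grade` column:

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "score": [90, 85]})
df2 = pd.DataFrame({"id": [3], "score": [70], "grade": ["C"]})

# Vertical: columns are matched by NAME; a column missing from one frame becomes NaN
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# Without ignore_index the original row labels are kept, so duplicates can appear
stacked_dup = pd.concat([df1, df2], axis=0)

# Horizontal: rows are matched by INDEX LABEL, not by position
extra = pd.DataFrame({"passed": [True, False]}, index=[1, 0])
side_by_side = pd.concat([df1, extra], axis=1)
```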

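`merge()`
- `pd.merge(df1, df2, how='inner', on='key_column')`
    - `how`: `'inner'` (default), `'left'`, `'right'`, or `'outer'`
    - `on`: the column name(s) present in both DataFrames to join on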

```python
# Inner Join
df_inner = pd.merge(df1, df2, how='inner', on='key_column')
# Left Join
df_left = pd.merge(df1, df2, how='left', on='key_column')
# Right Join
df_right = pd.merge(df1, df2, how='right', on='key_column')
# Outer Join
df_outer = pd.merge(df1, df2, how='outer', on='key_column')
```
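A sketch of how `how` changes the row count, using hypothetical `customers` and `orders` tables that share `key_column`. `indicator=True` is a real `merge()` option that adds a `_merge` column recording where each row came from:

```python
import pandas as pd

customers = pd.DataFrame({"key_column": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
orders = pd.DataFrame({"key_column": [2, 3, 3, 4], "total": [20, 35, 15, 50]})

# inner: only keys present in both (2 once, 3 twice because orders repeats it)
inner_join = pd.merge(customers, orders, how="inner", on="key_column")

# left: every customer; key 1 has no order, so its total is NaN
left_join = pd.merge(customers, orders, how="left", on="key_column")

# outer: union of keys; _merge is "both", "left_only" or "right_only"
outer_join = pd.merge(customers, orders, how="outer", on="key_column", indicator=True)
```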


**Difference between concat and merge**
- `concat()` stacks DataFrames along an axis: vertically (`axis=0`, adding rows) or horizontally (`axis=1`, adding columns). Rows or columns are aligned by their index labels, not by the values in any column.
- `merge()` combines DataFrames by matching the values in one or more key columns, similar to SQL joins.
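To make the difference concrete, a sketch with two hypothetical frames whose `key` columns hold the same values in a different order:

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
right = pd.DataFrame({"key": ["b", "a"], "y": [20, 10]})

# concat(axis=1) pairs rows by index label (0 with 0, 1 with 1) and ignores "key",
# so row "a" ends up next to b's y value; the "key" column also appears twice
glued = pd.concat([left, right], axis=1)

# merge pairs rows whose "key" values match
joined = pd.merge(left, right, on="key")
```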


## Visualization

### Pandas `.plot()` — Quick & Easy Plotting
`DataFrame.plot()` and `Series.plot()` are a convenience wrapper around matplotlib, well suited to quick exploratory analysis.

**Common Plot Types (`kind=...`):**

| Kind       | Chart Type         | Use Case                           |
|------------|--------------------|-------------------------------------|
| `'line'`   | Line plot           | Time series, trends (default)       |
| `'bar'`    | Vertical bar chart  | Category comparison                 |
| `'barh'`   | Horizontal bar      | Better for long category names      |
| `'hist'`   | Histogram           | Single variable distribution        |
| `'box'`    | Box plot            | Spread, outliers, quartiles         |
| `'kde'`    | Density plot        | Smooth curve of data distribution   |
| `'area'`   | Area plot           | Stacked time series                 |
| `'pie'`    | Pie chart           | Proportion of categories (few only) |
| `'scatter'`| Scatter plot        | Correlation between 2 variables     |
| `'hexbin'` | Hexbin plot         | High-density scatter (use `gridsize`) |

**Useful Parameters:**
- `x=`, `y=`: specify columns for axes  
- `title='...'`: chart title  
- `legend=True/False`: toggle legend  
- `figsize=(w, h)`: figure size  
- `grid=True`: show gridlines  
- `color='red'`: line or bar color  
- `rot=45`: rotate x-axis labels  
- `subplots=True`: split plots per column  
- `stacked=True`: stack areas/bars
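A minimal sketch combining several of these parameters on hypothetical monthly sales data. Selecting the `Agg` backend is an assumption so the snippet runs without a display; in a notebook those two lines can be skipped:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render off-screen
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly sales figures
sales = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "online": [120, 135, 150, 160],
    "store": [200, 190, 185, 170],
})

# One call: grouped bar chart, one group of two bars per month
ax = sales.plot(
    kind="bar",
    x="month",
    y=["online", "store"],
    title="Monthly sales by channel",
    figsize=(6, 4),
    rot=0,
    grid=True,
)
# .plot() returns a matplotlib Axes, so further tweaks use matplotlib methods
ax.set_ylabel("Units sold")
plt.tight_layout()
```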

Pros:
- One-liner plots for EDA
- Built-in integration with DataFrame

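Cons:
- Limited customization through `.plot()` arguments alone; finer styling needs matplotlib calls on the returned `Axes`
- Not suited for complex multi-panel layouts or heavily annotated charts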

---

### `matplotlib.pyplot` — Full Control for Publication-Quality Charts
Matplotlib is the foundational library behind most Python visualizations.

**Core Functions:**

| Function             | Description                          |
|----------------------|--------------------------------------|
| `plt.plot()`         | Line plot                            |
| `plt.bar()` / `barh()` | Bar/horizontal bar plot            |
| `plt.hist()`         | Histogram                            |
| `plt.scatter()`      | Scatter plot                         |
| `plt.pie()`          | Pie chart                            |
| `plt.boxplot()`      | Box plot                             |
| `plt.subplots()`     | Layout multiple plots                |
| `plt.axhline()` / `axvline()` | Add horizontal/vertical lines  |
| `plt.annotate()`     | Text annotations                     |
| `plt.legend()`       | Add legend                           |
| `plt.title()` / `xlabel()` / `ylabel()` | Set labels/titles     |
| `plt.xticks(rotation=...)` | Rotate x-axis tick labels        |
| `plt.grid(True)`     | Show grid                            |
| `plt.tight_layout()` | Adjust spacing between plots         |

**Advanced Features:**
- Layout control: `subplot`, `gridspec`
- Dual axes: `twinx()`
- Save figures: `plt.savefig("file.png")`
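- Style themes: `plt.style.use("ggplot")`; the old `"seaborn"` style name was deprecated in matplotlib 3.6 and later removed (use `"seaborn-v0_8"` instead)
- Accepts NumPy arrays and plain Python lists directly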
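A sketch of the object-oriented style behind several of these features (`plt.subplots()`, `twinx()`, `axvline()`, `savefig()`), using made-up revenue numbers. The `Agg` backend and saving to an in-memory buffer instead of `"file.png"` are assumptions that keep the example self-contained:

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

months = [1, 2, 3, 4, 5, 6]
revenue = [10.0, 12.5, 11.0, 14.2, 15.8, 17.1]   # hypothetical values
margin_pct = [22, 24, 21, 26, 27, 29]            # hypothetical values

# One Figure with two side-by-side Axes
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: revenue bars plus a second y-axis (twinx) for the margin line
ax_left.bar(months, revenue, color="steelblue", label="Revenue")
ax_left.set_xlabel("Month")
ax_left.set_ylabel("Revenue (k$)")
ax_margin = ax_left.twinx()  # shares the x-axis, has its own y-axis
ax_margin.plot(months, margin_pct, color="darkorange", marker="o", label="Margin %")
ax_margin.set_ylabel("Margin (%)")
ax_left.set_title("Revenue vs. margin")

# Right panel: histogram with a dashed reference line at the mean
ax_right.hist(revenue, bins=4, color="gray")
mean_rev = sum(revenue) / len(revenue)
ax_right.axvline(mean_rev, color="red", linestyle="--")
ax_right.set_title("Revenue distribution")

fig.tight_layout()

# savefig accepts a file path or a file-like object
buf = io.BytesIO()
fig.savefig(buf, format="png")
png_bytes = buf.getvalue()
```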

Pros:
- Maximum flexibility for professional plots
- Supports complex layouts, annotations, customization

Cons:
- Verbose code
- Steeper learning curve than `DataFrame.plot()`

---

### `seaborn` — Statistical Data Visualization Made Simple
Seaborn is built on top of matplotlib and integrates tightly with pandas. It offers beautiful default styles and easy high-level functions for common statistical plots.

**Common Plotting Functions:**

| Function                | Chart Type                        | Use Case                                  |
|-------------------------|-----------------------------------|--------------------------------------------|
| `sns.lineplot()`        | Line plot                         | Time series, trends with CI bands          |
| `sns.barplot()`         | Bar plot with error bars          | Category mean comparisons                  |
| `sns.countplot()`       | Bar plot for counts               | Frequency of categorical variables         |
| `sns.histplot()`        | Histogram                         | Data distribution                          |
| `sns.boxplot()`         | Box plot                          | Median, IQR, outliers                      |
| `sns.violinplot()`      | Violin plot                       | Box + KDE for richer distribution view     |
| `sns.kdeplot()`         | KDE (density) plot                | Smooth data distribution                   |
| `sns.scatterplot()`     | Scatter plot                      | Relationship between 2 variables           |
| `sns.regplot()`         | Regression + scatter plot         | Linear fit + confidence interval           |
| `sns.lmplot()`          | Regression plot with grouping     | Faceted linear models                      |
| `sns.heatmap()`         | Heatmap                           | Correlation matrices, pivot tables         |
| `sns.pairplot()`        | Matrix of scatterplots            | Multivariate relationship exploration      |
| `sns.clustermap()`      | Hierarchical clustering heatmap   | Cluster analysis + heatmap                 |
| `sns.catplot()`         | Figure-level categorical plot     | Wrapper for `box`, `violin`, `strip` plots |

**Common Parameters Across Plots:**
- `data=df`: DataFrame to use  
- `x=`, `y=`, `hue=`: Variables to map to axes and color  
- `palette='Set2'`: Color theme  
- `col=` / `row=`: Create subplots by category  
- `errorbar=('ci', 95)`: 95% confidence-interval error bars (seaborn 0.12+; older versions used `ci=95`)
- `kind='box'` / `'violin'` / `'bar'`: Plot type in multi-function APIs like `catplot()`

Pros:
- Clean, beautiful default styles
- Built-in support for grouping (`hue`, `col`, `row`)
- Excellent for statistical exploration
- Great integration with pandas

Cons:
- Less customizable than raw matplotlib
- Sometimes requires reshaping data first (e.g. `melt()` into long format for `hue=` grouping, or `pivot()` into a wide matrix for `heatmap()`)
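The reshaping itself is done in pandas. A sketch on a hypothetical wide sales table, showing `melt()` (wide to long) and `pivot()` (long to wide):

```python
import pandas as pd

# Hypothetical wide table: one column per year
wide = pd.DataFrame({
    "region": ["East", "West"],
    "2022": [100, 80],
    "2023": [120, 95],
})

# melt(): wide -> long, one row per (region, year) pair;
# this is the shape that hue=/col= grouping works with
long = wide.melt(id_vars="region", var_name="year", value_name="sales")

# pivot(): long -> wide matrix (rows = region, columns = year),
# the rectangular shape a heatmap draws cell by cell
matrix = long.pivot(index="region", columns="year", values="sales")
```

The long frame could then be passed as `sns.barplot(data=long, x="region", y="sales", hue="year")`, and the matrix as `sns.heatmap(matrix, annot=True)`.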

Tip: Use `sns.set_theme()` or `sns.set_style('whitegrid')` at the top of your notebook for consistent aesthetics across plots.
