##  1. JOIN Operations

Joins combine rows from two or more tables based on a related column.

###  Types of Joins

| Join Type     | Description | Example |
|---------------|-------------|---------|
| **Inner Join** | Returns matching rows from both tables | Customers with orders |
| **Left Join**  | All rows from left + matching from right | All customers, even if no orders |
| **Right Join** | All rows from right + matching from left | All orders, even if customer missing |
| **Full Join**  | All rows from both tables, matched where possible; columns from the unmatched side are NULL | All customers and all orders, whether or not they match |
| **Cross Join** | Cartesian product of both tables | Every customer with every product |

###  SQL Example
```sql
-- A bare JOIN is an INNER JOIN: only customers with at least one order appear
SELECT c.name, o.order_id
FROM customers AS c
INNER JOIN orders AS o ON c.customer_id = o.customer_id;
```

###  pandas Example
```python
import pandas as pd
pd.merge(customers, orders, on='customer_id', how='inner')
```
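
To make the join types in the table above concrete, here is a minimal sketch that runs each one on two tiny DataFrames. The sample rows and names are hypothetical and chosen so that one customer has no orders and one order has no matching customer; `how="cross"` assumes pandas 1.2 or later.

```python
import pandas as pd

# Hypothetical sample data: customer 3 has no orders, and order 103
# points at a customer_id (4) that is missing from `customers`.
customers = pd.DataFrame({"customer_id": [1, 2, 3],
                          "name": ["Ana", "Ben", "Cara"]})
orders = pd.DataFrame({"order_id": [101, 102, 103],
                       "customer_id": [1, 2, 4]})

# Same call each time; only `how` changes which rows are kept.
inner = pd.merge(customers, orders, on="customer_id", how="inner")
left = pd.merge(customers, orders, on="customer_id", how="left")
right = pd.merge(customers, orders, on="customer_id", how="right")
full = pd.merge(customers, orders, on="customer_id", how="outer")
cross = pd.merge(customers, orders, how="cross")  # no key: every pairing

for label, df in [("inner", inner), ("left", left), ("right", right),
                  ("full", full), ("cross", cross)]:
    print(f"{label}: {len(df)} rows")
```

With this data the row counts are 2, 3, 3, 4, and 9. The unmatched customer shows up in `left` and `full` with a missing `order_id`, and the unmatched order shows up in `right` and `full` with a missing `name`.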

###  PySpark Example
```python
customers.join(orders, on='customer_id', how='inner')
```

---

##  2. GROUP BY

Used to aggregate data based on one or more columns.

###  Common Aggregations
- `COUNT()`
- `SUM()`
- `AVG()`
- `MAX()`, `MIN()`

###  SQL Example
```sql
SELECT region, SUM(sales) AS total_sales
FROM orders
GROUP BY region;
```

###  pandas Example
```python
orders.groupby('region')['sales'].sum().reset_index()
```
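
The common aggregations listed above can also be computed in one pass. The sketch below uses pandas named aggregation on hypothetical sample data; the output column names (`order_count`, `total_sales`, and so on) are illustrative choices, not part of any fixed API.

```python
import pandas as pd

# Hypothetical sample data for illustration.
orders = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "sales": [100, 200, 50, 150, 100],
})

# Named aggregation: output_column=(input_column, function).
summary = (
    orders.groupby("region", as_index=False)
          .agg(order_count=("sales", "count"),   # COUNT()
               total_sales=("sales", "sum"),     # SUM()
               avg_sales=("sales", "mean"),      # AVG()
               max_sales=("sales", "max"),       # MAX()
               min_sales=("sales", "min"))       # MIN()
)
print(summary)
```

Passing `as_index=False` keeps `region` as a regular column, which plays the same role as `.reset_index()` in the one-column example.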

###  PySpark Example
```python
from pyspark.sql import functions as F
orders.groupBy('region').agg(F.sum('sales').alias('total_sales')).show()
```

---

##  3. Window Functions

Window functions compute a value for each row from a set of related rows (the window). Unlike `GROUP BY`, they do not collapse the result: every input row stays in the output.

###  Use Cases
- Ranking (e.g., `RANK()`, `DENSE_RANK()`)
- Running totals (`SUM() OVER`)
- Lag/Lead analysis (`LAG()`, `LEAD()`)

###  SQL Example
```sql
-- RANK is a reserved word in some dialects, so the alias uses a different name
SELECT name, sales,
       RANK() OVER (PARTITION BY region ORDER BY sales DESC) AS sales_rank
FROM orders;
```
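
###  pandas Example

pandas has no `OVER` clause, but grouped `rank`, `cumsum`, and `shift` cover the use cases listed above. This is a rough sketch on hypothetical data rather than an exact translation of the SQL: the running total follows the DataFrame's current row order, so sort first if a specific order matters.

```python
import pandas as pd

# Hypothetical sample data; the two 300s in East show how ties are ranked.
orders = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "region": ["East", "East", "East", "West", "West"],
    "sales": [300, 300, 100, 200, 50],
})

by_region = orders.groupby("region")["sales"]

# RANK() OVER (PARTITION BY region ORDER BY sales DESC): gaps after ties.
orders["sales_rank"] = by_region.rank(method="min", ascending=False).astype(int)

# DENSE_RANK(): no gaps after ties.
orders["dense_rank"] = by_region.rank(method="dense", ascending=False).astype(int)

# Running total within each region, in current row order.
orders["running_total"] = by_region.cumsum()

# LAG(sales): previous row's value in the same region, NaN for the first row.
orders["prev_sales"] = by_region.shift(1)

print(orders)
```

Every column added here has one value per input row, which is the defining difference from the `groupby(...).sum()` summary.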

###  PySpark Example
```python
from pyspark.sql.window import Window
from pyspark.sql.functions import col, rank

# Sort descending so the highest sales gets rank 1, as in the SQL example
windowSpec = Window.partitionBy("region").orderBy(col("sales").desc())
orders.withColumn("rank", rank().over(windowSpec)).show()
```

---

##  Tips for Mastery

- Use **JOIN** to enrich data, **GROUP BY** to summarize, and **WINDOW** to analyze trends or patterns.
- In **pandas**, chaining `.groupby()` with `.agg()` computes several aggregations per group in one call; `.apply()` accepts arbitrary per-group logic but is usually slower.
- In **PySpark**, window functions scale to large datasets and are especially useful for time-series and ranking problems.
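
The first tip can be sketched as a single pandas pipeline that joins, summarizes, and then ranks. The tables and column names below are hypothetical and serve only to show how the three operations chain together.

```python
import pandas as pd

# Hypothetical tables; column names are assumptions for illustration.
customers = pd.DataFrame({"customer_id": [1, 2, 3],
                          "region": ["East", "East", "West"]})
orders = pd.DataFrame({"customer_id": [1, 1, 2, 3],
                       "sales": [100, 50, 200, 80]})

# JOIN: enrich each order with the customer's region.
enriched = orders.merge(customers, on="customer_id", how="left")

# GROUP BY: collapse to one row per customer with total sales.
totals = (enriched.groupby(["region", "customer_id"], as_index=False)
                  .agg(total_sales=("sales", "sum")))

# WINDOW: rank customers inside each region without collapsing rows.
totals["rank_in_region"] = (totals.groupby("region")["total_sales"]
                                  .rank(method="min", ascending=False)
                                  .astype(int))
print(totals)
```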

