## 1285. Find the Start and End Number of Continuous Ranges
### Table: Logs

| Column Name | Type |
|-------------|------|
| log_id      | int  |

log_id is the primary key for this table.  
Each row of this table contains an ID from a log table.

Some IDs have been removed from Logs. Write an SQL query to find the start and end number of each continuous range in the Logs table.

Order the result table by start_id.

---

### Logs table:

| log_id |
|--------|
| 1      |
| 2      |
| 3      |
| 7      |
| 8      |
| 10     |

---

### Result table:

| start_id | end_id |
|----------|--------|
| 1        | 3      |
| 7        | 8      |
| 10       | 10     |

---

**Explanation:**  
The result table should contain all ranges in table Logs.  
From 1 to 3 is contained in the table.  
From 4 to 6 is missing in the table.  
From 7 to 8 is contained in the table.  
Number 9 is missing in the table.  
Number 10 is contained in the table.
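
The explanation above amounts to splitting the sorted IDs into runs of consecutive values. As a rough plain-Python sketch (the helper name `find_ranges` is hypothetical and not part of the original notebook), one way to see it is that the difference between each ID and its 1-based position in sorted order stays constant within a run and changes whenever IDs are missing:

```python
from itertools import groupby


def find_ranges(log_ids):
    """Return (start_id, end_id) pairs for each run of consecutive IDs."""
    # log_id is a primary key, so IDs are already unique; set() is only defensive.
    ordered = sorted(set(log_ids))
    # Pair each ID with its 1-based position, mirroring a row number ordered by log_id.
    numbered = enumerate(ordered, start=1)
    ranges = []
    # Within a run of consecutive IDs, log_id - row_number is constant.
    for _, run in groupby(numbered, key=lambda pair: pair[1] - pair[0]):
        ids = [log_id for _, log_id in run]
        ranges.append((ids[0], ids[-1]))
    return ranges


print(find_ranges([1, 2, 3, 7, 8, 10]))
# [(1, 3), (7, 8), (10, 10)]
```

For the sample data the differences are 0, 0, 0, 3, 3, 4, which yields the three groups shown in the result table.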

In [0]:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType
# Note: min and max here are Spark aggregates and shadow Python's built-ins in this notebook.
from pyspark.sql.functions import col, row_number, min, max
from pyspark.sql.window import Window

# Start Spark session
spark = SparkSession.builder.appName("LogRanges").getOrCreate()

# Define schema
schema = StructType([
    StructField("log_id", IntegerType(), True)
])

# Sample data
data = [(1,), (2,), (3,), (7,), (8,), (10,)]

# Create DataFrame
df = spark.createDataFrame(data, schema)
 

# Register temp view
df.createOrReplaceTempView("Logs")




In [0]:
from pyspark.sql.functions import col, row_number, min, max
from pyspark.sql.window import Window

# Consecutive IDs share the same log_id - row_number value, which identifies each range.
win_spec = Window.orderBy("log_id")
result = df.withColumn("rn", row_number().over(win_spec)).withColumn("group_id", col("log_id") - col("rn"))
(result.groupBy("group_id")
    .agg(min("log_id").alias("start_id"), max("log_id").alias("end_id"))
    .orderBy("start_id")
    .select("start_id", "end_id")
    .display())  # display() is Databricks-specific; use .show() elsewhere

In [0]:
%sql

with cte as (
  select
    log_id,
    -- Consecutive IDs share the same log_id - row_number value
    log_id - row_number() over (order by log_id) as diff
  from logs
)
select min(log_id) as start_id, max(log_id) as end_id
from cte
group by diff
order by start_id



In [0]:
%sql
WITH NumberedLogs AS (
  SELECT 
    log_id,
    ROW_NUMBER() OVER (ORDER BY log_id) AS row_num
  FROM Logs
),
GroupedLogs AS (
  SELECT 
    log_id,
    row_num,
    log_id - row_num AS group_id
  FROM NumberedLogs
)
SELECT 
  MIN(log_id) AS start_id,
  MAX(log_id) AS end_id
FROM GroupedLogs
GROUP BY group_id
ORDER BY start_id;

In [0]:
# Step 1: Sort and assign row numbers
window_spec = Window.orderBy("log_id")
df_with_row = df.withColumn("row_num", row_number().over(window_spec))

# Step 2: Compute group identifier
df_grouped = df_with_row.withColumn("group_id", col("log_id") - col("row_num"))

# Step 3: Group by group_id, aggregate, and keep only the range columns
result = df_grouped.groupBy("group_id").agg(
    min("log_id").alias("start_id"),
    max("log_id").alias("end_id")
).orderBy("start_id").select("start_id", "end_id")

# Display result
display(result)