# Odd and Even Measurements  
**Google SQL Interview Question**

---

### Question  
This is the same question as problem #28 in the SQL Chapter of *Ace the Data Science Interview!*

Assume you're given a table with measurement values obtained from a Google sensor over multiple days with measurements taken multiple times within each day.

Write a query to calculate the sum of odd-numbered and even-numbered measurements separately for each day, and display the results in two different columns. Refer to the Example Output below for the desired format.

---

### Definition:  
- Within a day, the 1st, 3rd, 5th, and later odd-positioned measurements (ordered by `measurement_time`) are considered **odd-numbered** measurements.
- The 2nd, 4th, 6th, and later even-positioned measurements are considered **even-numbered** measurements.
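
To make the numbering concrete, here is a minimal sketch in plain Python (no Spark required) that sorts one day's timestamps and labels each by the parity of its 1-based position. The helper name `label_positions` is hypothetical and is used only for illustration; the timestamps are the three 07/10/2022 readings from the example input.

```python
from datetime import datetime

def label_positions(times):
    """Return (time, position, label) tuples, numbering from 1 in time order."""
    labeled = []
    for position, t in enumerate(sorted(times), start=1):
        label = "odd" if position % 2 == 1 else "even"
        labeled.append((t, position, label))
    return labeled

# The three 07/10/2022 readings, deliberately listed out of order
fmt = "%m/%d/%Y %H:%M:%S"
day_times = [
    datetime.strptime("07/10/2022 13:15:00", fmt),
    datetime.strptime("07/10/2022 09:00:00", fmt),
    datetime.strptime("07/10/2022 11:00:00", fmt),
]

for t, position, label in label_positions(day_times):
    print(t.time(), position, label)
# 09:00:00 1 odd
# 11:00:00 2 even
# 13:15:00 3 odd
```

The position depends only on time order within the day, not on `measurement_id`.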

> Effective April 15th, 2023, the question and its solution have been revised.

---

### `measurements` Table:

| Column Name         | Type      |
|---------------------|-----------|
| measurement_id      | integer   |
| measurement_value   | decimal   |
| measurement_time    | datetime  |

---

### Example Input:

| measurement_id | measurement_value | measurement_time       |
|----------------|-------------------|------------------------|
| 131233         | 1109.51           | 07/10/2022 09:00:00    |
| 135211         | 1662.74           | 07/10/2022 11:00:00    |
| 523542         | 1246.24           | 07/10/2022 13:15:00    |
| 143562         | 1124.50           | 07/11/2022 15:00:00    |
| 346462         | 1234.14           | 07/11/2022 16:45:00    |

---

### Example Output:

| measurement_day       | odd_sum | even_sum |
|-----------------------|---------|----------|
| 07/10/2022 00:00:00   | 2355.75 | 1662.74  |
| 07/11/2022 00:00:00   | 1124.50 | 1234.14  |

---

### Explanation:  
On **07/10/2022**, the sum of the odd-numbered measurements (1st: 1109.51, 3rd: 1246.24) is **2355.75**, and the even-numbered measurement (2nd: 1662.74) totals **1662.74**.  
On **07/11/2022**, there are only two measurements: the 1st (**1124.50**) is the odd sum and the 2nd (**1234.14**) is the even sum.
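
The arithmetic in this explanation can be checked with a short plain-Python sketch. It is only a sanity check on the example rows, not the Spark solution; the function name `odd_even_sums` is hypothetical, and `Decimal` is used here (an assumption that mirrors the `decimal` column type) so the sums print exactly.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal

# (measurement_value, measurement_time) pairs from the example input
rows = [
    ("1109.51", "07/10/2022 09:00:00"),
    ("1662.74", "07/10/2022 11:00:00"),
    ("1246.24", "07/10/2022 13:15:00"),
    ("1124.50", "07/11/2022 15:00:00"),
    ("1234.14", "07/11/2022 16:45:00"),
]

def odd_even_sums(rows):
    """Map each day to (odd_sum, even_sum), numbering readings from 1 in time order."""
    by_day = defaultdict(list)
    for value, stamp in rows:
        t = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
        by_day[t.date()].append((t, Decimal(value)))

    sums = {}
    for day, readings in by_day.items():
        values = [v for _, v in sorted(readings)]  # order by time within the day
        odd = sum(values[0::2], Decimal(0))   # positions 1, 3, 5, ...
        even = sum(values[1::2], Decimal(0))  # positions 2, 4, 6, ...
        sums[day] = (odd, even)
    return sums

for day, (odd, even) in sorted(odd_even_sums(rows).items()):
    print(day, odd, even)
# 2022-07-10 2355.75 1662.74
# 2022-07-11 1124.50 1234.14
```

Note that this sketch reports `0` for a day with no even-numbered readings, whereas a SQL `SUM` over no matching rows returns null.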


In [1]:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, DoubleType, TimestampType
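# Import only the functions used below. Note that this `sum` is Spark's
# aggregate function and shadows Python's built-in sum() in this notebook.
from pyspark.sql.functions import col, row_number, sum, to_date, when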
from datetime import datetime

# Initialize Spark session
spark = SparkSession.builder.master('local[1]').appName("OddEvenMeasurements").getOrCreate()

# Define schema
schema = StructType([
    StructField("measurement_id", IntegerType(), True),
    StructField("measurement_value", DoubleType(), True),
    StructField("measurement_time", TimestampType(), True)
])

# Sample data
data = [
    (131233, 1109.51, datetime.strptime("07/10/2022 09:00:00", "%m/%d/%Y %H:%M:%S")),
    (135211, 1662.74, datetime.strptime("07/10/2022 11:00:00", "%m/%d/%Y %H:%M:%S")),
    (523542, 1246.24, datetime.strptime("07/10/2022 13:15:00", "%m/%d/%Y %H:%M:%S")),
    (143562, 1124.50, datetime.strptime("07/11/2022 15:00:00", "%m/%d/%Y %H:%M:%S")),
    (346462, 1234.14, datetime.strptime("07/11/2022 16:45:00", "%m/%d/%Y %H:%M:%S"))
]

# Create DataFrame
df = spark.createDataFrame(data, schema)

# Show the DataFrame
df.show(truncate=False)


+--------------+-----------------+-------------------+
|measurement_id|measurement_value|measurement_time   |
+--------------+-----------------+-------------------+
|131233        |1109.51          |2022-07-10 09:00:00|
|135211        |1662.74          |2022-07-10 11:00:00|
|523542        |1246.24          |2022-07-10 13:15:00|
|143562        |1124.5           |2022-07-11 15:00:00|
|346462        |1234.14          |2022-07-11 16:45:00|
+--------------+-----------------+-------------------+



In [2]:
from pyspark.sql.window import Window

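# Number each day's measurements in time order (1, 2, 3, ...)
day_window = Window.partitionBy('measurement_day').orderBy('measurement_time')

(df
    .withColumn('measurement_day', to_date('measurement_time'))
    .withColumn('rnk', row_number().over(day_window))
    .groupBy('measurement_day')
    # when() without otherwise() yields NULL, which sum() skips
    .agg(sum(when(col('rnk') % 2 != 0, col('measurement_value'))).alias('odd_sum'),
         sum(when(col('rnk') % 2 == 0, col('measurement_value'))).alias('even_sum'))
    .orderBy('measurement_day')  # groupBy alone does not guarantee row order
    .show())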


+---------------+-------+--------+
|measurement_day|odd_sum|even_sum|
+---------------+-------+--------+
|     2022-07-10|2355.75| 1662.74|
|     2022-07-11| 1124.5| 1234.14|
+---------------+-------+--------+



In [3]:
df.createOrReplaceTempView('measurements')

spark.sql('''
WITH cte AS (
    SELECT
        measurement_value,
        CAST(measurement_time AS DATE) AS measurement_day,
        ROW_NUMBER() OVER (
            PARTITION BY CAST(measurement_time AS DATE)
            ORDER BY measurement_time
        ) AS rnk
    FROM measurements
)
SELECT
    measurement_day,
    SUM(CASE WHEN rnk % 2 != 0 THEN measurement_value END) AS odd_sum,
    SUM(CASE WHEN rnk % 2 = 0 THEN measurement_value END) AS even_sum
FROM cte
GROUP BY measurement_day
ORDER BY measurement_day
''').show()

+---------------+-------+--------+
|measurement_day|odd_sum|even_sum|
+---------------+-------+--------+
|     2022-07-10|2355.75| 1662.74|
|     2022-07-11| 1124.5| 1234.14|
+---------------+-------+--------+
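
The same window-function logic can also be tried without a Spark session. The sketch below runs an equivalent query with Python's built-in `sqlite3` module, under a few assumptions: the bundled SQLite is version 3.25 or newer (needed for `ROW_NUMBER()`), timestamps are stored as ISO-8601 text, and SQLite's `DATE()` stands in for Spark's `CAST(... AS DATE)`. The row with id `999999` is hypothetical, added only to show what happens on a day with a single reading.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements ("
    "measurement_id INTEGER, measurement_value REAL, measurement_time TEXT)"
)
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [
        (131233, 1109.51, "2022-07-10 09:00:00"),
        (135211, 1662.74, "2022-07-10 11:00:00"),
        (523542, 1246.24, "2022-07-10 13:15:00"),
        (143562, 1124.50, "2022-07-11 15:00:00"),
        (346462, 1234.14, "2022-07-11 16:45:00"),
        # Hypothetical extra row: a day with only one reading
        (999999, 1000.00, "2022-07-12 08:00:00"),
    ],
)

query = """
WITH cte AS (
    SELECT
        measurement_value,
        DATE(measurement_time) AS measurement_day,
        ROW_NUMBER() OVER (
            PARTITION BY DATE(measurement_time)
            ORDER BY measurement_time
        ) AS rnk
    FROM measurements
)
SELECT
    measurement_day,
    SUM(CASE WHEN rnk % 2 != 0 THEN measurement_value END) AS odd_sum,
    SUM(CASE WHEN rnk % 2 = 0 THEN measurement_value END) AS even_sum
FROM cte
GROUP BY measurement_day
ORDER BY measurement_day
"""

results = conn.execute(query).fetchall()
for day, odd_sum, even_sum in results:
    print(day, odd_sum, even_sum)
```

For the hypothetical 2022-07-12 row, `even_sum` comes back as `None` (SQL `NULL`) because no even-numbered reading exists that day. If a zero is preferred, wrapping each `SUM(...)` in `COALESCE(..., 0)` is one option.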

