##### Given a Weather table, write a SQL query to find the Ids of all dates with a higher temperature than the previous day (yesterday).
+---------+------------------+------------------+<br>
| Id(INT) | RecordDate(DATE) | Temperature(INT) |<br>
+---------+------------------+------------------+<br>
| 1 | 2015-01-01 | 10 |<br>
| 2 | 2015-01-02 | 25 |<br>
| 3 | 2015-01-03 | 20 |<br>
| 4 | 2015-01-04 | 30 |<br>
+---------+------------------+------------------+<br>
For example, return the following Ids for the above Weather table:<br>
+----+<br>
| Id |<br>
+----+<br>
| 2 |<br>
| 4 |<br>
+----+<br>

##### PySpark Solution

In [0]:
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DateType

data = [(1, '2015-01-01', 10), (2, '2015-01-02', 25), (3, '2015-01-03', 20), (4, '2015-01-04', 30)]

schema = StructType([
    StructField('Id', IntegerType(), True),
    StructField('RecordDate', StringType(), True),
    StructField('Temperature', IntegerType(), True)
])

df = spark.createDataFrame(data, schema=schema)

# Convert 'RecordDate' column to DateType
df = df.withColumn('RecordDate', df['RecordDate'].cast(DateType()))

df.show()

+---+----------+-----------+
| Id|RecordDate|Temperature|
+---+----------+-----------+
|  1|2015-01-01|         10|
|  2|2015-01-02|         25|
|  3|2015-01-03|         20|
|  4|2015-01-04|         30|
+---+----------+-----------+



In [0]:
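from pyspark.sql.functions import col, datediff, lag
from pyspark.sql.window import Window

# No partitionBy: Spark moves all rows to one partition, which is fine for small data
w = Window.orderBy('RecordDate')

df2 = (df
       .withColumn('prev_temp', lag('Temperature', 1).over(w))
       .withColumn('prev_date', lag('RecordDate', 1).over(w))
       # Keep rows whose previous record is exactly one day earlier and cooler
       .filter((datediff(col('RecordDate'), col('prev_date')) == 1)
               & (col('Temperature') > col('prev_temp')))
       # Filter before select so the helper columns are still available
       .select('Id')
       )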

df2.show()

+---+
| Id|
+---+
|  2|
|  4|
+---+
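
The problem statement asks for a SQL query, so the result can also be cross-checked in plain SQL. The following is a minimal sketch, not part of the original notebook: it assumes an in-memory SQLite database through Python's built-in `sqlite3` module (the original does not name a SQL dialect), and the aliases `today` and `yesterday` are illustrative. It joins each row to the row dated exactly one calendar day earlier, so a gap in the dates produces no match.

```python
import sqlite3

# Assumption: SQLite stands in for the unspecified SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Weather (Id INTEGER, RecordDate TEXT, Temperature INTEGER)"
)
conn.executemany(
    "INSERT INTO Weather VALUES (?, ?, ?)",
    [
        (1, "2015-01-01", 10),
        (2, "2015-01-02", 25),
        (3, "2015-01-03", 20),
        (4, "2015-01-04", 30),
    ],
)

# Self-join: pair each day with the record dated exactly one day earlier.
# date(x, '-1 day') is SQLite syntax; other engines use different date functions.
query = """
SELECT today.Id
FROM Weather AS today
JOIN Weather AS yesterday
  ON yesterday.RecordDate = date(today.RecordDate, '-1 day')
WHERE today.Temperature > yesterday.Temperature
ORDER BY today.Id
"""

warmer_ids = [row[0] for row in conn.execute(query)]
print(warmer_ids)  # [2, 4]
```

The date arithmetic is the dialect-specific part of this query; the join and comparison carry over to other engines unchanged.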

