### Window Functions

![](Images/99/99 Window Functions.jpg)

### First and Last Window Function

![](Images/99/99 First and Last Window Function.jpg)

### Syntax

![](Images/99/99 Syntax.jpg)

### Create Sample Dataframe

In [0]:
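# import only the functions used below; a wildcard import shadows Python built-ins such as sum and max
from pyspark.sql.functions import col, first, last, to_date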

# create the input dataframe
data = [
    ('C1', '2023-06-01', 100.0),
    ('C1', '2023-06-02', 150.0),
    ('C1', '2023-06-03', 200.0),
    ('C2', '2023-06-01', 50.0),
    ('C2', '2023-06-02', 75.0),
    ('C2', '2023-06-03', 100.0),
]

df = spark.createDataFrame(data, ['Customer_Id', 'Transaction_Date', 'Amount'])

# convert the Transaction_Date column to date type
df = df.withColumn('Transaction_Date', to_date(col('Transaction_Date')))

# display the dataframe
df.display()

### What Is a Window Function?

In [0]:
from pyspark.sql.window import Window

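# order each partition by date and let the frame span the whole partition;
# without an ordering, first() and last() are non-deterministic
windowSpec = Window.partitionBy('Customer_Id').orderBy('Transaction_Date') \
    .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
resultDf = df.withColumn('First_Transaction_Date', first('Transaction_Date').over(windowSpec)) \
    .withColumn('Last_Transaction_Date', last('Transaction_Date').over(windowSpec))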

resultDf.display()
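
The effect of a window's ordering and frame on `first` and `last` can be sketched in plain Python, without Spark. This is only an illustration under stated assumptions: the helper `first_last_per_partition` and its `whole_partition_frame` flag are hypothetical names, and the emulation assumes there are no duplicate dates within a customer.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter

# Same sample rows as the DataFrame above: (Customer_Id, Transaction_Date, Amount)
rows = [
    ("C1", date(2023, 6, 1), 100.0),
    ("C1", date(2023, 6, 2), 150.0),
    ("C1", date(2023, 6, 3), 200.0),
    ("C2", date(2023, 6, 1), 50.0),
    ("C2", date(2023, 6, 2), 75.0),
    ("C2", date(2023, 6, 3), 100.0),
]


def first_last_per_partition(rows, whole_partition_frame=True):
    """Emulate first()/last() over a window partitioned by customer, ordered by date.

    whole_partition_frame=True:  the frame covers every row of the partition.
    whole_partition_frame=False: the frame ends at the current row, which is
    how an ordered window behaves when no explicit frame is given.
    """
    out = []
    # Sort by partition key, then by the ordering column
    ordered = sorted(rows, key=itemgetter(0, 1))
    for customer, group in groupby(ordered, key=itemgetter(0)):
        partition = list(group)
        for i, (_, tx_date, _) in enumerate(partition):
            frame = partition if whole_partition_frame else partition[: i + 1]
            # (customer, current date, first date in frame, last date in frame)
            out.append((customer, tx_date, frame[0][1], frame[-1][1]))
    return out


for row in first_last_per_partition(rows, whole_partition_frame=False):
    print(row)
```

With `whole_partition_frame=False`, the "last" value is always the current row's own date, which is why a frame that reaches the end of the partition is needed to get the true last transaction date on every row.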

### Find First and Last Transaction Date for Each Customer

In [0]:
# reuse windowSpec from the previous cell, then keep one row per customer
resultDf = df.withColumn('First_Transaction_Date', first('Transaction_Date').over(windowSpec)) \
    .withColumn('Last_Transaction_Date', last('Transaction_Date').over(windowSpec)) \
    .drop('Transaction_Date', 'Amount') \
    .distinct()

display(resultDf)

### How to Perform the Same Transformation Using Spark SQL

In [0]:
# register the dataframe as a temporary view
df.createOrReplaceTempView('Transactions')

### SQL Window Function

In [0]:
%sql
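-- order each partition by date and frame the whole partition so FIRST and LAST are deterministic
SELECT DISTINCT Customer_Id,
       FIRST(Transaction_Date) OVER (PARTITION BY Customer_Id ORDER BY Transaction_Date
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS First_Transaction_Date,
       LAST(Transaction_Date) OVER (PARTITION BY Customer_Id ORDER BY Transaction_Date
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Last_Transaction_Date
FROM Transactions
ORDER BY Customer_Id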