# Learn the basics of Druid window functions

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->
  
[Window functions](https://druid.apache.org/docs/latest/querying/sql-window-functions) in Apache Druid produce values based upon the relationship of one row within a window of rows to the other rows within the same window. A window is a group of related rows within a result set, for example, rows with the same value for a specific dimension.

This tutorial uses Wikipedia data to demonstrate how window functions work in Druid.

## Prerequisites

This tutorial works with Druid 28.0.0 or later.

> Note that window functions are an experimental feature.

In [3]:
import druidapi
import os

if 'DRUID_HOST' not in os.environ.keys():
    druid_host=f"http://localhost:8888"
else:
    druid_host=f"http://{os.environ['DRUID_HOST']}:8888"
    
print(f"Opening a connection to {druid_host}.")
druid = druidapi.jupyter_client(druid_host)

display = druid.display
sql_client = druid.sql
status_client = druid.status

status_client.version

Opening a connection to http://router:8888.


'28.0.0-SNAPSHOT'

## Load example data

The example queries compare the total delta of change events in Wikipedia by channel and by user. For that reason, the source data only needs the timestamp, channel, user, and delta columns.

In [22]:
sql='''
REPLACE INTO "example-wikipedia-windows" OVERWRITE ALL
WITH "ext" AS (SELECT *
FROM TABLE(
  EXTERN(
    '{"type":"http","uris":["https://druid.apache.org/data/wikipedia.json.gz"]}',
    '{"type":"json"}'
  )
) EXTEND ("timestamp" VARCHAR, "channel" VARCHAR, "user" VARCHAR, "delta" BIGINT))
SELECT
  TIME_PARSE("timestamp") AS "__time",
  "channel",
  "user",
  "delta"
FROM "ext"
PARTITIONED BY DAY
'''

display.run_task(sql)
sql_client.wait_until_ready('example-wikipedia-windows')
display.table('example-wikipedia-windows')

Loading data, status:[SUCCESS]: 100%|██████████| 100.0/100.0 [00:09<00:00, 10.83it/s]


Position,Name,Type
1,__time,TIMESTAMP
2,channel,VARCHAR
3,delta,BIGINT
4,user,VARCHAR


The dataset describes the changes that each individual `user` made to Wikipedia pages within a `channel`. The `delta` column holds the number of bytes added or deleted, and `__time` is when the change was submitted.

Run this query to have a look at the data:

In [49]:
query = """
SELECT
    __time,
    channel,
    user,
    delta
FROM "example-wikipedia-windows"
WHERE channel IN ('#kk.wikipedia', '#lt.wikipedia')
  AND __time BETWEEN '2016-06-27' AND '2016-06-28'
ORDER BY __time
"""
display.sql(query)

__time,channel,user,delta
2016-06-27T04:20:52.858Z,#kk.wikipedia,Нұрлан Рахымжанов,56
2016-06-27T04:35:03.186Z,#kk.wikipedia,Nurkhan,2440
2016-06-27T06:15:57.686Z,#kk.wikipedia,Шокай,91
2016-06-27T06:17:54.507Z,#lt.wikipedia,Powermelon,-2
2016-06-27T07:32:40.116Z,#kk.wikipedia,Салиха,-1
2016-06-27T07:55:47.080Z,#lt.wikipedia,Powermelon,13
2016-06-27T09:00:56.856Z,#kk.wikipedia,Салиха,2703
2016-06-27T09:05:11.299Z,#lt.wikipedia,80.4.147.222,894
2016-06-27T09:12:13.268Z,#lt.wikipedia,178.11.203.212,391
2016-06-27T09:23:24.677Z,#lt.wikipedia,178.11.203.212,56


## Window Functions in Druid

Druid implements Window Functions over aggregate queries. The general syntax is:
```
SELECT
    <dimensions>,
    <aggregation function(s)>,
    window_function()
      OVER ( PARTITION BY <partitioning expression>
             ORDER BY <order expression>
             <window frame>
            )
FROM <table>
GROUP BY <dimensions>
```

The `GROUP BY <dimensions>` clause is applied first and calculates all non-window `<aggregation function(s)>`. The window function is then applied over the aggregated results.
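The two-stage evaluation described above can be sketched in plain Python. This is only an illustration of the order of operations, not how Druid executes the query. The rows are a hand-picked subset of the sample rows shown earlier, and the variable names (`grouped`, `channel_totals`) are made up for this sketch, so the totals cover only these few rows.

```python
from collections import defaultdict

# A few raw events: (hour, channel, user, delta). Subset of the sample rows above.
rows = [
    ("2016-06-27T04", "#kk.wikipedia", "Nurkhan", 2440),
    ("2016-06-27T06", "#lt.wikipedia", "Powermelon", -2),
    ("2016-06-27T07", "#lt.wikipedia", "Powermelon", 13),
    ("2016-06-27T09", "#lt.wikipedia", "80.4.147.222", 894),
    ("2016-06-27T09", "#lt.wikipedia", "178.11.203.212", 391),
    ("2016-06-27T09", "#lt.wikipedia", "178.11.203.212", 56),
]

# Stage 1: GROUP BY hour, channel, user and compute SUM(delta).
grouped = defaultdict(int)
for hour, channel, user, delta in rows:
    grouped[(hour, channel, user)] += delta

# Stage 2: the window function runs over the aggregated rows.
# Here: SUM(SUM(delta)) OVER (PARTITION BY channel).
channel_totals = defaultdict(int)
for (hour, channel, user), net in grouped.items():
    channel_totals[channel] += net

# Every aggregated row keeps its own value and gains the window result.
result = [
    (hour, channel, user, net, channel_totals[channel])
    for (hour, channel, user), net in grouped.items()
]
for row in result:
    print(row)
```

Note that the two events by `178.11.203.212` collapse into one aggregated row before the window function ever sees them.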

Start by defining the aggregation to use as the base query. 
In this example, the query standardizes the Wikipedia activity metrics by summarizing them by hour, by `channel`, and by `user`:

In [60]:
query = """
SELECT
    channel, 
    TIME_FLOOR(__time, 'PT1H') as time_hour, 
    user,
    SUM(delta) net_user_changes
FROM "example-wikipedia-windows"
WHERE channel IN ('#kk.wikipedia', '#lt.wikipedia')
  AND __time BETWEEN '2016-06-27' AND '2016-06-28'
GROUP BY TIME_FLOOR(__time, 'PT1H'), channel, user
ORDER BY channel, TIME_FLOOR(__time, 'PT1H'), user

"""

req = sql_client.sql_request(query)
# Window functions are currently experimental. Set the enableWindowing
# context parameter to "true" to use them.
req.add_context("enableWindowing", "true")
display.sql(req)


channel,time_hour,user,net_user_changes
#kk.wikipedia,2016-06-27T04:00:00.000Z,Nurkhan,2440
#kk.wikipedia,2016-06-27T04:00:00.000Z,Нұрлан Рахымжанов,56
#kk.wikipedia,2016-06-27T06:00:00.000Z,Шокай,91
#kk.wikipedia,2016-06-27T07:00:00.000Z,Салиха,-1
#kk.wikipedia,2016-06-27T09:00:00.000Z,Салиха,2702
#kk.wikipedia,2016-06-27T11:00:00.000Z,Нұрлан Рахымжанов,126
#kk.wikipedia,2016-06-27T15:00:00.000Z,Nurkhan,6900
#lt.wikipedia,2016-06-27T06:00:00.000Z,Powermelon,-2
#lt.wikipedia,2016-06-27T07:00:00.000Z,Powermelon,13
#lt.wikipedia,2016-06-27T09:00:00.000Z,178.11.203.212,447
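Every window query in this tutorial repeats the same three steps: build a request, set the `enableWindowing` context parameter, and display the result. A small helper can wrap them. The function name `run_windowed` is hypothetical and not part of `druidapi`; it only calls the `sql_request`, `add_context`, and `sql` methods already used in the cells of this notebook.

```python
def run_windowed(sql_client, display, query):
    """Run a SQL query with window functions enabled and display the result.

    Hypothetical convenience wrapper: it relies only on the calls already
    used in this notebook (sql_request, add_context, display.sql).
    """
    req = sql_client.sql_request(query)
    # Window functions are experimental, so opt in through the query context.
    req.add_context("enableWindowing", "true")
    display.sql(req)
    return req

# Usage in this notebook (assuming the `sql_client` and `display` objects
# created in the first cell):
#   run_windowed(sql_client, display, query)
```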


## ORDER BY Windows

When the window definition only specifies `ORDER BY <order expression>`, it sorts the aggregate data set and applies the function in that order.

The following query uses `ORDER BY SUM(delta) DESC` to rank hourly user activity from the largest net change in an hour to the smallest:

In [58]:
query = """
SELECT
    TIME_FLOOR(__time, 'PT1H') as time_hour, 
    channel, 
    user,
    SUM(delta) net_user_changes,
    RANK( ) OVER ( ORDER BY SUM(delta) DESC ) editing_rank
FROM "example-wikipedia-windows"
WHERE channel IN ('#kk.wikipedia', '#lt.wikipedia')
  AND __time BETWEEN '2016-06-27' AND '2016-06-28'
GROUP BY TIME_FLOOR(__time, 'PT1H'), channel, user
ORDER BY 5 

"""

req = sql_client.sql_request(query)
# Window functions are currently experimental. Set the enableWindowing
# context parameter to "true" to use them.
req.add_context("enableWindowing", "true")
display.sql(req)

time_hour,channel,user,net_user_changes,editing_rank
2016-06-27T15:00:00.000Z,#kk.wikipedia,Nurkhan,6900,1
2016-06-27T19:00:00.000Z,#lt.wikipedia,77.221.66.41,4358,2
2016-06-27T09:00:00.000Z,#kk.wikipedia,Салиха,2702,3
2016-06-27T04:00:00.000Z,#kk.wikipedia,Nurkhan,2440,4
2016-06-27T09:00:00.000Z,#lt.wikipedia,80.4.147.222,894,5
2016-06-27T09:00:00.000Z,#lt.wikipedia,178.11.203.212,447,6
2016-06-27T11:00:00.000Z,#kk.wikipedia,Нұрлан Рахымжанов,126,7
2016-06-27T06:00:00.000Z,#kk.wikipedia,Шокай,91,8
2016-06-27T11:00:00.000Z,#lt.wikipedia,MaryroseB54,59,9
2016-06-27T04:00:00.000Z,#kk.wikipedia,Нұрлан Рахымжанов,56,10
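To see what `RANK() OVER (ORDER BY SUM(delta) DESC)` does with no partitioning, here is a minimal Python sketch. It takes the ten `net_user_changes` values from the output above, deliberately in scrambled order, and ranks each one against the whole set. The helper name `rank_desc` is invented for this sketch.

```python
# net_user_changes values from the output above, in arbitrary order.
net_user_changes = [2440, 56, 91, 2702, 126, 6900, 447, 894, 4358, 59]

def rank_desc(values):
    """RANK() with ORDER BY value DESC: one plus the number of strictly larger values."""
    return [1 + sum(other > v for other in values) for v in values]

editing_rank = rank_desc(net_user_changes)

# Print in rank order, as ORDER BY 5 does in the query.
for rank, value in sorted(zip(editing_rank, net_user_changes)):
    print(rank, value)
```

Because there is no `PARTITION BY`, the single window spans every aggregated row, so the two channels are ranked together.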


## PARTITION BY Windows

When a window only specifies `PARTITION BY <partition expression>`, Druid calculates the aggregate window function over all the rows that share the same value of the `<partition expression>` within the selected dataset.

In this example, the query uses two different windows `PARTITION BY channel` and `PARTITION BY user` to calculate the overall total activity in the channel and total activity by the user so that they can be compared to individual hourly activity.


In [59]:
query = """
SELECT
    TIME_FLOOR(__time, 'PT1H') as time_hour, channel, user,
    SUM(delta) hourly_user_changes,
    SUM(SUM(delta)) OVER (PARTITION BY user ) AS total_user_changes,
    SUM(SUM(delta)) OVER (PARTITION BY channel ) AS total_channel_changes
FROM "example-wikipedia-windows"
WHERE channel IN ('#kk.wikipedia', '#lt.wikipedia')
  AND __time BETWEEN '2016-06-27' AND '2016-06-28'
GROUP BY TIME_FLOOR(__time, 'PT1H'),2,3
ORDER BY channel,TIME_FLOOR(__time, 'PT1H'), user

"""

req = sql_client.sql_request(query)
# Window functions are currently experimental. Set the enableWindowing
# context parameter to "true" to use them.
req.add_context("enableWindowing", "true")
display.sql(req)


time_hour,channel,user,hourly_user_changes,total_user_changes,total_channel_changes
2016-06-27T04:00:00.000Z,#kk.wikipedia,Nurkhan,2440,9340,12314
2016-06-27T04:00:00.000Z,#kk.wikipedia,Нұрлан Рахымжанов,56,182,12314
2016-06-27T06:00:00.000Z,#kk.wikipedia,Шокай,91,91,12314
2016-06-27T07:00:00.000Z,#kk.wikipedia,Салиха,-1,2701,12314
2016-06-27T09:00:00.000Z,#kk.wikipedia,Салиха,2702,2701,12314
2016-06-27T11:00:00.000Z,#kk.wikipedia,Нұрлан Рахымжанов,126,182,12314
2016-06-27T15:00:00.000Z,#kk.wikipedia,Nurkhan,6900,9340,12314
2016-06-27T06:00:00.000Z,#lt.wikipedia,Powermelon,-2,39,5851
2016-06-27T07:00:00.000Z,#lt.wikipedia,Powermelon,13,39,5851
2016-06-27T09:00:00.000Z,#lt.wikipedia,178.11.203.212,447,447,5851


Because the windows only define the `PARTITION BY` clause, the calculation is done over the whole dataset for each value of the `<partition expression>`. Because the dataset is filtered to a single day, these window function results represent the total activity for the day, for the `user` and for the `channel` respectively.

Such a result helps us see the impact of an individual user's hourly activity:
- the impact on the channel, by comparing `hourly_user_changes` to `total_channel_changes`
- the impact of each user on the channel, by comparing `total_user_changes` to `total_channel_changes`
- the progress of each user's individual activity, by comparing `hourly_user_changes` to `total_user_changes`
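The comparisons listed above can be sketched in Python using the `#kk.wikipedia` rows from the output. This is an illustrative calculation outside Druid; the names `total_channel`, `total_user`, and `shares` are assumptions made for this sketch.

```python
from collections import defaultdict

# (time_hour, user, hourly_user_changes) for #kk.wikipedia, from the output above.
rows = [
    ("04:00", "Nurkhan", 2440),
    ("04:00", "Нұрлан Рахымжанов", 56),
    ("06:00", "Шокай", 91),
    ("07:00", "Салиха", -1),
    ("09:00", "Салиха", 2702),
    ("11:00", "Нұрлан Рахымжанов", 126),
    ("15:00", "Nurkhan", 6900),
]

# SUM(SUM(delta)) OVER (PARTITION BY channel): a single channel here.
total_channel = sum(net for _, _, net in rows)

# SUM(SUM(delta)) OVER (PARTITION BY user)
total_user = defaultdict(int)
for _, user, net in rows:
    total_user[user] += net

# Compare each hourly value to both partition totals.
shares = []
for hour, user, net in rows:
    shares.append((hour, user, net / total_channel, net / total_user[user]))

for hour, user, of_channel, of_user in shares:
    print(f"{hour} {user}: {of_channel:.1%} of channel, {of_user:.1%} of user total")
```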

## PARTITION BY + ORDER BY Windows

By combining the two window types the query can do ordered calculations within each partition of data.


The following query ranks user hourly activity within the channel:

In [65]:
query = """
SELECT
    channel, 
    TIME_FLOOR(__time, 'PT1H') as time_hour, 
    user,
    SUM(delta) hourly_user_changes,
    RANK() OVER (PARTITION BY channel ORDER BY SUM(delta) DESC) AS rank_within_channel_day
FROM "example-wikipedia-windows"
WHERE channel IN ('#kk.wikipedia', '#lt.wikipedia')
  AND __time BETWEEN '2016-06-27' AND '2016-06-28'
GROUP BY 1, TIME_FLOOR(__time, 'PT1H'),3
ORDER BY channel, 5

"""

req = sql_client.sql_request(query)
# Window functions are currently experimental. Set the enableWindowing
# context parameter to "true" to use them.
req.add_context("enableWindowing", "true")
display.sql(req)


channel,time_hour,user,hourly_user_changes,rank_within_channel_day
#kk.wikipedia,2016-06-27T15:00:00.000Z,Nurkhan,6900,1
#kk.wikipedia,2016-06-27T09:00:00.000Z,Салиха,2702,2
#kk.wikipedia,2016-06-27T04:00:00.000Z,Nurkhan,2440,3
#kk.wikipedia,2016-06-27T11:00:00.000Z,Нұрлан Рахымжанов,126,4
#kk.wikipedia,2016-06-27T06:00:00.000Z,Шокай,91,5
#kk.wikipedia,2016-06-27T04:00:00.000Z,Нұрлан Рахымжанов,56,6
#kk.wikipedia,2016-06-27T07:00:00.000Z,Салиха,-1,7
#lt.wikipedia,2016-06-27T19:00:00.000Z,77.221.66.41,4358,1
#lt.wikipedia,2016-06-27T09:00:00.000Z,80.4.147.222,894,2
#lt.wikipedia,2016-06-27T09:00:00.000Z,178.11.203.212,447,3
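Combining `PARTITION BY` and `ORDER BY` restarts the ranking inside each partition. The sketch below mimics that in Python. The `#kk.wikipedia` list holds all seven hourly values for that channel; the `#lt.wikipedia` list holds only the three largest values visible in the output above, so its lower ranks are not represented. The helper `rank_desc` is a name made up for this example.

```python
# hourly_user_changes per channel (the #lt.wikipedia list is truncated to the
# three rows visible in the output above).
hourly = {
    "#kk.wikipedia": [2440, 56, 91, -1, 2702, 126, 6900],
    "#lt.wikipedia": [447, 4358, 894],
}

def rank_desc(values):
    """RANK() with ORDER BY value DESC within one partition."""
    return [1 + sum(other > v for other in values) for v in values]

# PARTITION BY channel: rank each channel's values independently.
ranked = {
    channel: sorted(zip(rank_desc(values), values))
    for channel, values in hourly.items()
}

for channel, pairs in ranked.items():
    for rank, value in pairs:
        print(channel, value, rank)
```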


## Window Frames

Window frames are used to limit the set of rows used for the windowed aggregation.
The general form is:
```
<window function>
OVER (
        [ PARTITION BY <partition expression>] ORDER BY <order expression>
        ROWS BETWEEN <range start> AND <range end>
     )
```
`<range start>` and `<range end>` can take the following values:
- `UNBOUNDED PRECEDING`: from the beginning of the partition, as ordered by the `<order expression>`
- `N PRECEDING`: N rows before the current row, as ordered by the `<order expression>`
- `CURRENT ROW`: the current row
- `N FOLLOWING`: N rows after the current row, as ordered by the `<order expression>`
- `UNBOUNDED FOLLOWING`: to the end of the partition, as ordered by the `<order expression>`

The following query uses a few different window frames to calculate overall activity by channel:

In [79]:
query = """
SELECT
    channel, 
    TIME_FLOOR(__time, 'PT1H')      AS time_hour, 
    SUM(delta)                      AS hourly_channel_changes,
    SUM(SUM(delta)) OVER cumulative AS cumulative_activity_in_channel,
    SUM(SUM(delta)) OVER moving5    AS csum5,
    COUNT(1) OVER moving5           AS count5
FROM "example-wikipedia-windows"
WHERE channel = '#en.wikipedia'
  AND __time BETWEEN '2016-06-27' AND '2016-06-28'
GROUP BY 1, TIME_FLOOR(__time, 'PT1H')
WINDOW cumulative AS (   
                         PARTITION BY channel 
                         ORDER BY TIME_FLOOR(__time, 'PT1H') 
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                     )
                     ,
        moving5 AS ( 
                    PARTITION BY channel 
                    ORDER BY TIME_FLOOR(__time, 'PT1H') 
                    ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
                  )
"""

req = sql_client.sql_request(query)
# Window functions are currently experimental. Set the enableWindowing
# context parameter to "true" to use them.
req.add_context("enableWindowing", "true")
display.sql(req)


channel,time_hour,hourly_channel_changes,cumulative_activity_in_channel,csum5,count5
#en.wikipedia,2016-06-27T00:00:00.000Z,74996,74996,74996,1
#en.wikipedia,2016-06-27T01:00:00.000Z,24150,99146,99146,2
#en.wikipedia,2016-06-27T02:00:00.000Z,102372,201518,201518,3
#en.wikipedia,2016-06-27T03:00:00.000Z,61362,262880,262880,4
#en.wikipedia,2016-06-27T04:00:00.000Z,61666,324546,324546,5
#en.wikipedia,2016-06-27T05:00:00.000Z,144199,468745,393749,5
#en.wikipedia,2016-06-27T06:00:00.000Z,33414,502159,403013,5
#en.wikipedia,2016-06-27T07:00:00.000Z,79397,581556,380038,5
#en.wikipedia,2016-06-27T08:00:00.000Z,104436,685992,423112,5
#en.wikipedia,2016-06-27T09:00:00.000Z,58020,744012,419466,5


The example above uses the `WINDOW` clause to define multiple window specifications that can be reused across many window function calculations.
The query uses two windows:
- `cumulative` is partitioned by `channel` and includes all rows from the beginning of the partition up to the current row, as ordered by the hour of `__time`, which enables cumulative aggregation
- `moving5` is also partitioned by `channel` but only includes up to 4 preceding rows plus the current row, as ordered by the same hourly time

Notice in the `count5` resulting column that the number of rows considered for the `moving5` window:
- starts at 1 because there are no rows before the current one
- and grows up to 5 as defined by `ROWS BETWEEN 4 PRECEDING AND CURRENT ROW`
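The two frames can be reproduced outside Druid with a short Python sketch. It takes the ten `hourly_channel_changes` values from the output above and applies `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` and `ROWS BETWEEN 4 PRECEDING AND CURRENT ROW` by slicing. The helper names `frame_sum` and `frame_count` are assumptions made for illustration.

```python
# hourly_channel_changes for #en.wikipedia, ordered by hour (from the output above).
hourly = [74996, 24150, 102372, 61362, 61666, 144199, 33414, 79397, 104436, 58020]

def frame_start(i, preceding):
    """Index of the first row in the frame that ends at row i.
    preceding=None stands for UNBOUNDED PRECEDING."""
    return 0 if preceding is None else max(0, i - preceding)

def frame_sum(values, preceding=None):
    """SUM over ROWS BETWEEN <preceding> PRECEDING AND CURRENT ROW."""
    return [sum(values[frame_start(i, preceding):i + 1]) for i in range(len(values))]

def frame_count(values, preceding=None):
    """COUNT over the same frame."""
    return [i + 1 - frame_start(i, preceding) for i in range(len(values))]

cumulative = frame_sum(hourly)            # cumulative_activity_in_channel
csum5 = frame_sum(hourly, preceding=4)    # moving sum over at most 5 rows
count5 = frame_count(hourly, preceding=4)

for row in zip(hourly, cumulative, csum5, count5):
    print(row)
```

The first five rows of `csum5` equal the cumulative sum because the frame has not yet reached its full width of five rows.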

## Ranking Functions
Ranking window functions calculate their results based on the ORDER BY clause in the window definition.
The next query looks at the activity of a single channel, `#en.wikipedia`, during a single hour, `__time BETWEEN '2016-06-27 00:00:00' AND '2016-06-27 01:00:00'`, and uses all the ranking functions, ordered by each user's total activity rounded to the nearest hundred. The rounding causes some ties in the values, which helps show the differences among the ranking functions:

In [112]:
query = """
SELECT 
    channel,
    user,
    ROUND(SUM(delta),-2) AS hourly_change_rounded
    ,ROW_NUMBER()   OVER (  ORDER BY ROUND(SUM(delta),-2) DESC ) AS row_no
    ,RANK()         OVER (  ORDER BY ROUND(SUM(delta),-2) DESC ) AS rank_no
    ,DENSE_RANK()   OVER (  ORDER BY ROUND(SUM(delta),-2) DESC ) AS dense_rank_no
    ,PERCENT_RANK() OVER (  ORDER BY ROUND(SUM(delta),-2) DESC ) AS pct_rank
    ,CUME_DIST()    OVER (  ORDER BY ROUND(SUM(delta),-2) DESC ) AS cumulative_dist
    ,NTILE(4)       OVER (  ORDER BY ROUND(SUM(delta),-2) DESC ) AS ntile_val
FROM "example-wikipedia-windows"
WHERE channel = '#en.wikipedia'
  AND __time BETWEEN '2016-06-27 00:00:00' AND '2016-06-27 01:00:00'
GROUP BY TIME_FLOOR(__time, 'PT1H'), channel, user

"""

req = sql_client.sql_request(query)
req.add_context("enableWindowing", "true")
display.sql(req)

channel,user,hourly_change_rounded,row_no,rank_no,dense_rank_no,pct_rank,cumulative_dist,ntile_val
#en.wikipedia,MediaWiki message delivery,24400,1,1,1,0.0,0.0048780487804878,1
#en.wikipedia,Kind Tennis Fan,10600,2,2,2,0.0049019607843137,0.0097560975609756,1
#en.wikipedia,ReferenceBot,5700,3,3,3,0.0098039215686274,0.0146341463414634,1
#en.wikipedia,ClueBot NG,2400,4,4,4,0.0147058823529411,0.0195121951219512,1
#en.wikipedia,Ian.thomson,2200,5,5,5,0.0196078431372549,0.0390243902439024,1
#en.wikipedia,Jim1138,2200,6,5,5,0.0196078431372549,0.0390243902439024,1
#en.wikipedia,Nakon,2200,7,5,5,0.0196078431372549,0.0390243902439024,1
#en.wikipedia,SuggestBot,2200,8,5,5,0.0196078431372549,0.0390243902439024,1
#en.wikipedia,EranBot,1500,9,9,6,0.0392156862745098,0.0439024390243902,1
#en.wikipedia,Derek R Bullamore,1400,10,10,7,0.0441176470588235,0.048780487804878,1


Notice the differences in ordinal ranking values:
- row_no - `ROW_NUMBER()` grows monotonically by one for each row, regardless of ties.
- rank_no - `RANK()` assigns the same rank value of `5` to the tied rows but then skips to `9` for the next row because there are 8 rows before it.
- dense_rank_no - `DENSE_RANK()` also assigns the same rank of `5` to the tied values but then continues with `6`.

Distribution ranks:
- pct_rank - `PERCENT_RANK()` calculates `(rank() - 1) / (rows in partition - 1)`, a measure of what fraction of values fall before the current value in the distribution.
- cumulative_dist - `CUME_DIST()` calculates the cumulative distribution. Read it as: this row is in the top `cumulative_dist` fraction of the population.
- ntile_val - `NTILE(N)` calculates which distribution bucket the row corresponds to, where N is the number of buckets. `NTILE(4)` calculates quartiles; `NTILE(100)` would calculate percentiles.
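The ranking formulas above can be checked with a small Python sketch. It uses only the ten rounded values shown in the output as a toy partition, so `row_no`, `rank_no`, and `dense_rank_no` match the output, while the distribution values differ because the real partition contains many more rows. The sketch follows the standard SQL definitions and is not taken from Druid's implementation; the `ntile` helper is a name chosen for this example.

```python
# hourly_change_rounded values from the output above, already sorted DESC.
values = [24400, 10600, 5700, 2400, 2200, 2200, 2200, 2200, 1500, 1400]
n = len(values)

# ROW_NUMBER(): position in the ordered window, ties ignored.
row_no = list(range(1, n + 1))

# RANK(): one plus the number of rows that sort strictly before this one.
rank_no = [1 + sum(v > x for v in values) for x in values]

# DENSE_RANK(): rank among the distinct values, so no gaps after ties.
distinct = sorted(set(values), reverse=True)
dense_rank_no = [distinct.index(x) + 1 for x in values]

# PERCENT_RANK(): (rank - 1) / (rows in partition - 1)
pct_rank = [(r - 1) / (n - 1) for r in rank_no]

# CUME_DIST(): rows that sort before or tie with this one / rows in partition
cumulative_dist = [sum(v >= x for v in values) / n for x in values]

def ntile(n_rows, buckets):
    """NTILE(buckets): split n_rows ordered rows into buckets of near-equal size.
    Earlier buckets receive the extra rows when the split is uneven."""
    base, extra = divmod(n_rows, buckets)
    out = []
    for b in range(1, buckets + 1):
        out.extend([b] * (base + (1 if b <= extra else 0)))
    return out

ntile_val = ntile(n, 4)

for row in zip(values, row_no, rank_no, dense_rank_no, pct_rank, cumulative_dist, ntile_val):
    print(row)
```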
   

In [47]:
query = """
SELECT FLOOR(__time TO DAY) AS event_time,
    channel,
    ABS(delta) AS change,
    ROW_NUMBER() OVER w AS row_no,
    RANK() OVER w AS rank_no,
    DENSE_RANK() OVER w AS dense_rank_no,
    PERCENT_RANK() OVER w AS pct_rank,
    CUME_DIST() OVER w AS cumulative_dist,
    NTILE(4) OVER w AS ntile_val,
    LAG(ABS(delta), 1, 0) OVER w AS lag_val,
    LEAD(ABS(delta), 1, 0) OVER w AS lead_val,
    FIRST_VALUE(ABS(delta)) OVER w AS first_val,
    LAST_VALUE(ABS(delta)) OVER w AS last_val
FROM "example-wikipedia-windows"
WHERE channel IN ('#kk.wikipedia', '#lt.wikipedia')
GROUP BY channel, ABS(delta), FLOOR(__time TO DAY) 
WINDOW w AS (PARTITION BY channel ORDER BY ABS(delta) ASC)
"""

req = sql_client.sql_request(query)
req.add_context("enableWindowing", "true")
display.sql(req)

event_time,channel,change,row_no,rank_no,dense_rank_no,pct_rank,cumulative_dist,ntile_val,lag_val,lead_val,first_val,last_val
2016-06-27T00:00:00.000Z,#kk.wikipedia,1,1,1,1,0.0,0.125,1,,7.0,1,6900
2016-06-27T00:00:00.000Z,#kk.wikipedia,7,2,2,2,0.1428571428571428,0.25,1,1.0,56.0,1,6900
2016-06-27T00:00:00.000Z,#kk.wikipedia,56,3,3,3,0.2857142857142857,0.375,2,7.0,63.0,1,6900
2016-06-27T00:00:00.000Z,#kk.wikipedia,63,4,4,4,0.4285714285714285,0.5,2,56.0,91.0,1,6900
2016-06-27T00:00:00.000Z,#kk.wikipedia,91,5,5,5,0.5714285714285714,0.625,3,63.0,2440.0,1,6900
2016-06-27T00:00:00.000Z,#kk.wikipedia,2440,6,6,6,0.7142857142857143,0.75,3,91.0,2703.0,1,6900
2016-06-27T00:00:00.000Z,#kk.wikipedia,2703,7,7,7,0.8571428571428571,0.875,4,2440.0,6900.0,1,6900
2016-06-27T00:00:00.000Z,#kk.wikipedia,6900,8,8,8,1.0,1.0,4,2703.0,,1,6900
2016-06-27T00:00:00.000Z,#lt.wikipedia,1,1,1,1,0.0,0.1,1,,2.0,1,4358
2016-06-27T00:00:00.000Z,#lt.wikipedia,2,2,2,2,0.1111111111111111,0.2,1,1.0,13.0,1,4358


In [None]:
INSERT INTO test VALUES('2023-10-01T00:00:00.000Z','a',5  );
INSERT INTO test VALUES('2023-10-01T00:00:00.000Z','a',5  );
INSERT INTO test VALUES('2023-10-01T00:00:00.000Z','a',5  );
INSERT INTO test VALUES('2023-10-01T00:00:00.000Z','b',5  );
INSERT INTO test VALUES('2023-10-01T00:00:01.000Z','a',10  );
INSERT INTO test VALUES('2023-10-01T00:00:01.000Z','b',5  );
INSERT INTO test VALUES('2023-10-01T00:00:02.000Z','a',10  );
INSERT INTO test VALUES('2023-10-01T00:00:02.000Z','b',5  );
INSERT INTO test VALUES('2023-10-01T00:00:03.000Z','b',10  );
INSERT INTO test VALUES('2023-10-01T00:00:04.000Z','b',10  );

The earlier query that defines the window `w` illustrates all of the built-in window functions over the same data:

* ROW_NUMBER which returns the numeric value for the row within the window
* RANK which returns the numeric rank of the row within the window
* DENSE_RANK which returns the rank of the row with no gaps. For example, if two rows tie for 1, the next row is ranked 2
* PERCENT_RANK which returns the percentage rank of the row within the window according to the formula (rank - 1)/(total window rows - 1)
* CUME_DIST which returns the cumulative distribution of the current row within the window calculated as (number of window rows at the same rank or higher than the current row) / (total window rows)
* NTILE which divides the number of results as evenly as possible into a number of tiles and returns the value of the tile for the row
* LAG which returns the value for the row that precedes the current row by a given offset
* LEAD which returns the value for the row that follows the current row by a given offset
* FIRST_VALUE which returns the value for the expression for the first row within the window
* LAST_VALUE which returns the value for the expression for the last row within the window

See [Window functions](https://druid.apache.org/docs/latest/querying/sql-window-functions) for syntax and more detail.

The query also demonstrates how you can alias a window within the SELECT clause and define it later with the WINDOW keyword.
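The offset and value functions in the list above can be illustrated with a minimal Python sketch over the `#kk.wikipedia` partition from the recorded output. The helpers `lag` and `lead` are hypothetical names. The sketch uses `None` where no row exists at the requested offset, and it takes the first and last values over the whole partition, which is what the recorded output shows (`1` and `6900` on every row).

```python
# ABS(delta) values for the #kk.wikipedia partition, ordered ASC as in window w.
changes = [1, 7, 56, 63, 91, 2440, 2703, 6900]

def lag(values, offset=1, default=None):
    """Value from the row `offset` positions before the current row."""
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]

def lead(values, offset=1, default=None):
    """Value from the row `offset` positions after the current row."""
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]

lag_val = lag(changes)
lead_val = lead(changes)

# First and last value of the ordered partition, repeated for every row.
first_val = [changes[0]] * len(changes)
last_val = [changes[-1]] * len(changes)

for row in zip(changes, lag_val, lead_val, first_val, last_val):
    print(row)
```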