You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Briefly, the function last_value() does not return the last value in a window. Only running first_value() with ORDER BY ... DESC returns the last value correctly.
The following is my test data:
c_caseid | country | city | AssignDate |
1c741bd70b2322007518478d83673af3 | USA | Kansas City | 1481560249000 |
1c741bd70b2322007518478d83673af3 | USA | Solana Beach | 1481590617000 |
When I run the following query
SELECT DISTINCT
c_caseid,
first_value(city) OVER (PARTITION BY c_caseid ORDER BY AssignDate) AS IncidentFirstAssignedToCity,
last_value(city) OVER (PARTITION BY c_caseid ORDER BY AssignDate) AS IncidentLastAssignedToCity
FROM test
I get the following result:
c_caseid | IncidentFirstAssignedToCity | IncidentLastAssignedToCity |
1c741bd70b2322007518478d83673af3 | Kansas City | Kansas City |
1c741bd70b2322007518478d83673af3 | Kansas City | Solana Beach |
It's as if last_value() provides two different results, instead of one? When I change the query to first_value() and order in reverse, as follows:
SELECT DISTINCT
c_caseid,
first_value(city) OVER (PARTITION BY c_caseid ORDER BY AssignDate) AS IncidentFirstAssignedToCity,
first_value(city) OVER (PARTITION BY c_caseid ORDER BY AssignDate DESC) AS IncidentLastAssignedToCity
FROM test
``
last_ Value: the default statistical range is rows between unbounded preceding and current row
example:
SELECT DISTINCT
c_caseid,
first_value(city) OVER (PARTITION BY c_caseid ORDER BY AssignDate) AS IncidentFirstAssignedToCity,
last_value(city) OVER (PARTITION BY c_caseid ORDER BY AssignDate rows between unbounded preceding and current row ) AS IncidentLastAssignedToCity
FROM test
last_value(city) OVER (PARTITION BY c_caseid ORDER BY AssignDate ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
But then either the implementation of the function, or documentation need to be adjusted. It seems counter-intuitive that I need to specify which rows to use when the documentation clearly says it considers the whole window; and considering I am not defining any rows, offsets, etc. I assume the entire partition is considered as a window?
Briefly, the function
last_value()
does not return the last value in a window. Only runningfirst_value()
withORDER BY ... DESC
returns the last value correctly.The following is my test data:
When I run the following query
I get the following result:
It's as if
last_value()
provides two different results, instead of one? When I change the query tofirst_value()
and order in reverse, as follows:Only then I get the correct result:
Am I doing something wrong, or is this actually a bug? I'm running the queries on AWS Athena.
The text was updated successfully, but these errors were encountered: