# Median latitude

* Source: https://towardsdatascience.com/twenty-five-sql-practice-exercises-5fc791e24082
* Source: https://www.hackerrank.com/challenges/weather-observation-station-20/problem

Write a query that returns the median latitude of the weather stations in each state in the following table, rounded to the nearest tenth of a degree. Note that there is no MEDIAN() function in standard SQL or in PostgreSQL, which is used here!

In [1]:
%run Question.ipynb

 * postgresql://fknight:***@localhost/postgres
Done.
Done.
9 rows affected.
9 rows affected.


# Part A

To find the median, we first assign each row an integer row number within its state, ordered by latitude. We will also need the total number of rows for each state.

In [5]:
%%sql

SELECT 
    *, 
    row_number() OVER (PARTITION BY state ORDER BY latitude ASC)
        AS state_rank,
    count(*) OVER (PARTITION BY state)
        AS state_count
FROM stations

 * postgresql://fknight:***@localhost/postgres
9 rows affected.


id,city,state,latitude,longitude,state_rank,state_count
4,Davidson,North Carolina,35.5,80.8,1,5
1,Asheville,North Carolina,35.6,82.6,2,5
3,Chapel Hill,North Carolina,35.9,79.1,3,5
2,Burlington,North Carolina,36.1,79.4,4,5
5,Elizabeth City,North Carolina,36.3,76.3,5,5
8,Hettinger,North Dakota,46.0,102.6,1,4
6,Fargo,North Dakota,46.9,96.8,2,4
7,Grand Forks,North Dakota,47.9,97.0,3,4
9,Inkster,North Dakota,48.2,97.6,4,4


# Part B

Using the query from Part A as a subquery (written here as the common table expression below), solve the original problem.

```sql
WITH ranks_and_counts AS (
    SELECT 
        *, 
        row_number()
            OVER (PARTITION BY state ORDER BY latitude ASC)
            AS state_rank,
        count(*) OVER (PARTITION BY state)
            AS state_count
    FROM stations
)
```

In [8]:
%%sql

WITH ranks_and_counts AS (
    SELECT 
        *, 
        row_number() OVER (PARTITION BY state ORDER BY latitude ASC)
            AS state_rank,
        count(*) OVER (PARTITION BY state)
            AS state_count
    FROM stations
)

SELECT state, round(avg(latitude)::numeric, 1) AS median_latitude
FROM ranks_and_counts

WHERE
     (1.0*state_count/2) <= state_rank AND
              state_rank <= (1.0*state_count/2 + 1)
GROUP BY state;

 * postgresql://fknight:***@localhost/postgres
2 rows affected.


state,median_latitude
North Carolina,35.9
North Dakota,47.4
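
To see why the WHERE clause keeps exactly the middle row(s), it can help to apply the same condition to bare row numbers. The sketch below is an illustration only and does not touch the stations table: it uses PostgreSQL's generate_series to build hypothetical groups of 1 to 6 rows (the names n, r and middle_ranks are made up for this example) and lists which row numbers pass the filter.

```sql
-- For each hypothetical group size n, list the row numbers r that
-- satisfy the same condition used in the WHERE clause above.
SELECT
    n AS state_count,
    array_agg(r ORDER BY r) AS middle_ranks
FROM generate_series(1, 6) AS n
CROSS JOIN LATERAL generate_series(1, n) AS r
WHERE 1.0 * n / 2 <= r
  AND r <= 1.0 * n / 2 + 1
GROUP BY n
ORDER BY n;

-- state_count | middle_ranks
--           1 | {1}
--           2 | {1,2}
--           3 | {2}
--           4 | {2,3}
--           5 | {3}
--           6 | {3,4}
```

An odd count leaves a single middle row, and an even count leaves the two rows on either side of the midpoint, so averaging the surviving latitudes gives the median in both cases. This matches the output above: North Carolina (5 rows) keeps row 3, and North Dakota (4 rows) averages rows 2 and 3.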


## The full solution is given below

In [2]:
%%sql

-- assign latitude-ordered row numbers for each state, and get total
-- row count for each state

WITH ranks_and_counts AS (
    SELECT 
        *, 
        row_number() OVER (PARTITION BY state ORDER BY latitude ASC)
            AS row_number_state,
        count(*) OVER (PARTITION BY state)
            AS row_count
    FROM stations 
)

-- filter to the middle row (odd row count) or the middle two rows
-- (even row count), then average those latitudes per state and
-- round to the nearest tenth of a degree

SELECT state, round(avg(latitude)::numeric, 1) AS median_latitude
FROM ranks_and_counts

WHERE
    1.0*row_count/2  <= row_number_state
    AND
    row_number_state <= 1.0*row_count/2 + 1
    
GROUP BY state;

 * postgresql://fknight:***@localhost/postgres
2 rows affected.


state,median_latitude
North Carolina,35.9
North Dakota,47.4
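
As a cross-check on the hand-built median, PostgreSQL provides the ordered-set aggregate percentile_cont, which interpolates between the two middle values when the row count is even. The query below is a sketch rather than part of the intended exercise solution; it assumes the same stations table and casts explicitly so that it works whether latitude is stored as numeric or as a floating-point type.

```sql
-- Cross-check: continuous 50th percentile (the median) per state.
-- percentile_cont returns double precision here, so cast to numeric
-- before rounding to one decimal place.
SELECT
    state,
    round(
        (percentile_cont(0.5)
            WITHIN GROUP (ORDER BY latitude::double precision))::numeric,
        1
    ) AS median_latitude
FROM stations
GROUP BY state
ORDER BY state;
```

On the nine rows shown in Part A, this should agree with the result above. The row-number approach remains the more portable of the two, since it needs only window functions.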
