# Election data

The database for these exercises is called `ex_election`.

The process for obtaining this data is described in [this blog post](https://kiwidamien.github.io/munging-with-multiindices-election-data.html).

This is a copy of the presidential election results per state from 1952 to 2016. It is a medium-sized dataset: too large to answer the queries comfortably "by hand", but not large enough to stress test your queries.

If you think you have found an error in the questions below, please open a GitHub issue.

## Note on table format

Note that some of these questions would be considerably easier if the election data were _tidy_. At the moment, the rows in the election data take the form:
```
 state | democrat_votes | republican_votes | other_votes | year 
-------+----------------+------------------+-------------+------
 AL    |         275075 |           149231 |           0 | 1952
 AR    |         226300 |           177155 |           0 | 1952
 AZ    |         108528 |           152042 |           0 | 1952
 CA    |        2257646 |          3035587 |           0 | 1952
```

A tidy dataset would take the form:
```
 state | votes  |   party    | year 
-------+--------+------------+------
 AL    |      0 | other      | 1952
 AL    | 149231 | republican | 1952
 AL    | 275075 | democrat   | 1952
 AR    | 177155 | republican | 1952
 AR    | 226300 | democrat   | 1952
 AR    |      0 | other      | 1952
 AZ    | 152042 | republican | 1952
 AZ    | 108528 | democrat   | 1952
 AZ    |      0 | other      | 1952
 CA    |      0 | other      | 1952
```

Sadly, we don't get to choose the format of the data. However, you could transform the data to look like this using a `VIEW` (hint: see the `UNION` operator). For reference, the instructions for creating this view are included in this subdirectory, but you should try making it yourself first.

You are not required to transform the data this way, but it may make some of the queries easier to write.

In [2]:
%load_ext sql
%sql postgres://localhost/ex_election

'Connected: @ex_election'

## Questions

1. **How many candidates are in the candidate table for the 2000 election?**

In [3]:
%%sql
SELECT 
    count(candidate)
FROM
    candidate 
WHERE
    year = 2000

 * postgres://localhost/ex_election
1 rows affected.


count
3


In [4]:
%%sql 
SELECT
    candidate
FROM
    candidate 
WHERE
    year = 2000

 * postgres://localhost/ex_election
3 rows affected.


candidate
"Gore, Al"
"Nader, Ralph"
"Bush, George W."


2. **How many candidates are in the candidate table for each election from 1984 to 2016?**

In [5]:
%%sql
SELECT
    year, count(candidate)
FROM
    candidate 
WHERE
    year BETWEEN 1984 AND 2016
GROUP BY 
    year
ORDER BY
    year

 * postgres://localhost/ex_election
9 rows affected.


year,count
1984,2
1988,2
1992,3
1996,3
2000,3
2004,2
2008,2
2012,2
2016,3


3. **For each election from 1984 to 2016, give the party that won the popular vote (i.e. the most votes nationwide, not the most Electoral College votes).**

In [6]:
%%sql
CREATE VIEW tidy_election
AS
    SELECT 
        state
        ,year
        ,'democrat' party
        ,democrat_votes votes
    FROM
        election
UNION
    SELECT 
        state
        ,year
        ,'republican' party
        ,republican_votes votes
    FROM
        election
UNION
    SELECT
        state
        ,year
        ,'other' party
        ,other_votes votes
    FROM
        election
        

 * postgres://localhost/ex_election
Done.


[]
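
The reshaping done by this view can be tried on a small scale without touching the real database. The following is a minimal sketch, not the author's setup: it assumes an in-memory SQLite database (via Python's standard `sqlite3` module) as a stand-in for the PostgreSQL `ex_election` database, and loads only the four 1952 sample rows shown in the note on table format. It uses `UNION ALL` rather than `UNION`; the three branches can never produce identical rows because the `party` label differs, so the duplicate-removal pass that `UNION` performs is not needed.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for PostgreSQL ex_election.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE election ("
    "state TEXT, democrat_votes INTEGER, republican_votes INTEGER, "
    "other_votes INTEGER, year INTEGER)"
)

# The four 1952 sample rows from the note on table format.
conn.executemany(
    "INSERT INTO election VALUES (?, ?, ?, ?, ?)",
    [
        ("AL", 275075, 149231, 0, 1952),
        ("AR", 226300, 177155, 0, 1952),
        ("AZ", 108528, 152042, 0, 1952),
        ("CA", 2257646, 3035587, 0, 1952),
    ],
)

# One SELECT per vote column, stacked into a single long (tidy) result.
conn.execute(
    """
    CREATE VIEW tidy_election AS
        SELECT state, year, 'democrat' AS party, democrat_votes AS votes
        FROM election
    UNION ALL
        SELECT state, year, 'republican' AS party, republican_votes AS votes
        FROM election
    UNION ALL
        SELECT state, year, 'other' AS party, other_votes AS votes
        FROM election
    """
)

tidy_rows = conn.execute(
    "SELECT state, votes, party, year FROM tidy_election ORDER BY state, party"
).fetchall()

for row in tidy_rows:
    print(row)
```

Each of the four wide rows becomes three tidy rows, one per party, so the view returns twelve rows for this sample.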

In [7]:
%%sql
WITH 
    party_totals AS (
        SELECT
            year
            ,party
            ,sum(votes) total
            ,RANK() OVER (PARTITION BY year ORDER BY SUM(votes) DESC) AS rank
        FROM
            tidy_election
        GROUP BY 
            year, party
)
SELECT
    year, party, total
FROM
    party_totals
WHERE
    rank = 1
AND
    year >= 1984
ORDER BY
    year

 * postgres://localhost/ex_election
9 rows affected.


year,party,total
1984,republican,54455472
1988,republican,48886597
1992,democrat,44909806
1996,democrat,47400125
2000,democrat,51009810
2004,republican,62039572
2008,democrat,69499428
2012,democrat,65918507
2016,democrat,65853625
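
To see how ranking within each year picks out the winner, here is a small sketch on made-up data. Everything in it is an assumption for illustration: the states `S1` and `S2` and their vote counts are hypothetical, and SQLite (3.25 or later, for window functions) stands in for PostgreSQL. Unlike the query above, it totals the votes in one CTE and ranks them in a second, which keeps the aggregate and the window function in separate steps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tidy_election "
    "(state TEXT, year INTEGER, party TEXT, votes INTEGER)"
)

# Hypothetical vote counts for two made-up states, purely for illustration.
toy_rows = [
    ("S1", 2000, "democrat", 60), ("S1", 2000, "republican", 40), ("S1", 2000, "other", 5),
    ("S2", 2000, "democrat", 30), ("S2", 2000, "republican", 45), ("S2", 2000, "other", 0),
    ("S1", 2004, "democrat", 50), ("S1", 2004, "republican", 55), ("S1", 2004, "other", 2),
    ("S2", 2004, "democrat", 20), ("S2", 2004, "republican", 40), ("S2", 2004, "other", 1),
]
conn.executemany("INSERT INTO tidy_election VALUES (?, ?, ?, ?)", toy_rows)

winners = conn.execute(
    """
    WITH party_totals AS (
        -- national total per party per year
        SELECT year, party, SUM(votes) AS total
        FROM tidy_election
        GROUP BY year, party
    ),
    ranked AS (
        -- rank parties within each year, largest total first
        SELECT year, party, total,
               RANK() OVER (PARTITION BY year ORDER BY total DESC) AS party_rank
        FROM party_totals
    )
    SELECT year, party, total
    FROM ranked
    WHERE party_rank = 1
    ORDER BY year
    """
).fetchall()

for row in winners:
    print(row)
```

`PARTITION BY year` restarts the ranking for every election, so filtering on rank 1 leaves exactly one row per year unless two parties tie.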


4. **Extension of previous question: for each election from 1984 to 2016, give the party that won the popular vote and the margin (i.e. the amount that the winning party got over the party that came in second place).** You can assume that the third party votes ("Other") are irrelevant, and just compare Democrats and Republicans.

In [8]:
%%sql
WITH 
    party_totals AS (
        SELECT
            year
            ,party
            ,sum(votes) total
            ,RANK() OVER (PARTITION BY year ORDER BY SUM(votes) DESC) AS rank
        FROM
            tidy_election
        GROUP BY 
            year, party
)
SELECT 
    /* w = winning party, l = losing/2nd place party */
    w.year, w.party, w.total, w.total - l.total margin
FROM
        party_totals w
    JOIN
        party_totals l
    ON
        w.year = l.year
    AND
        w.rank = 1
    AND
        l.rank = 2
    AND
        w.year >= 1984
ORDER BY
    w.year

 * postgres://localhost/ex_election
9 rows affected.


year,party,total,margin
1984,republican,54455472,16878120
1988,republican,48886597,7077121
1992,democrat,44909806,5805256
1996,democrat,47400125,8201370
2000,democrat,51009810,547398
2004,republican,62039572,3012457
2008,democrat,69499428,9549105
2012,democrat,65918507,4984100
2016,democrat,65853625,2868519
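
Because the question allows us to compare only Democrats and Republicans, the margin can also be computed directly from the wide `election` table, without the tidy view or a self-join. The sketch below is an alternative under stated assumptions, not the solution used above: the states `S1` and `S2` and their vote counts are made up, and SQLite stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE election (state TEXT, democrat_votes INTEGER, "
    "republican_votes INTEGER, other_votes INTEGER, year INTEGER)"
)

# Hypothetical vote counts for two made-up states, purely for illustration.
conn.executemany(
    "INSERT INTO election VALUES (?, ?, ?, ?, ?)",
    [
        ("S1", 60, 40, 5, 2000),
        ("S2", 30, 45, 0, 2000),
        ("S1", 50, 55, 2, 2004),
        ("S2", 20, 40, 1, 2004),
    ],
)

margins = conn.execute(
    """
    WITH totals AS (
        -- national totals for the two major parties, per year
        SELECT year,
               SUM(democrat_votes) AS dem,
               SUM(republican_votes) AS rep
        FROM election
        GROUP BY year
    )
    SELECT year,
           -- an exact tie would be labelled 'republican' by this CASE
           CASE WHEN dem > rep THEN 'democrat' ELSE 'republican' END AS party,
           CASE WHEN dem > rep THEN dem ELSE rep END AS total,
           ABS(dem - rep) AS margin
    FROM totals
    ORDER BY year
    """
).fetchall()

for row in margins:
    print(row)
```

This version would give a wrong answer in any year where the "other" column outpolled a major party, which is why it relies on the assumption stated in the question.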


5. **Which states have had fewer than 3 democratic victories (i.e. fewer than 3 elections where the democrats got the most votes in that state) since 1952?**

In [9]:
%%sql
CREATE VIEW
    year_state_rank
AS
SELECT
    year
    ,state 
    ,party
    ,votes
    ,RANK() OVER (PARTITION BY year, state ORDER BY votes DESC) AS rank
FROM
    tidy_election
;
    

 * postgres://localhost/ex_election
Done.


[]

In [10]:
%%sql
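/* Note: a state with no democratic wins has no rows left after the WHERE clause,
   so it would be missing from this result rather than listed with a count of 0. */
SELECT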
    state
    ,count(state)
FROM
    year_state_rank
WHERE
    rank = 1 
AND
    party = 'democrat'
GROUP BY 
    state
HAVING
    count(state) < 3
ORDER BY 
    state

 * postgres://localhost/ex_election
12 rows affected.


state,count
AK,1
AZ,1
ID,1
IN,2
KS,1
MT,2
ND,1
NE,1
OK,1
SD,1


6. **Which states have had fewer than 3 republican victories since 1952?**

In [11]:
%%sql
SELECT 
    l.state
    , COUNT(DISTINCT r.year)  num_years 
FROM 
        year_state_rank l 
    LEFT JOIN 
        year_state_rank r 
    ON 
        l.state = r.state  
        AND 
            r.rank = 1 
        AND 
            r.party = 'republican'
GROUP BY 
    l.state
HAVING 
    count(distinct r.year) < 3
ORDER BY 
    l.state

 * postgres://localhost/ex_election
2 rows affected.


state,num_years
DC,0
HI,2


In [12]:
%%sql
/* alternative solution */ 
WITH rwins AS (
    SELECT 
        * 
    FROM 
        year_state_rank
    WHERE
        party = 'republican'
    AND
        rank = 1
),
states AS (
    SELECT
        DISTINCT state
    FROM
        election
)
SELECT
    s.state
    ,count(r.year)
FROM
        states s
    LEFT JOIN
        rwins r
    ON
        s.state = r.state
GROUP BY
    s.state
HAVING
    count(r.year) < 3

 * postgres://localhost/ex_election
2 rows affected.


state,count
HI,2
DC,0
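
Both solutions above use a `LEFT JOIN` so that a state with no republican wins (such as DC) still appears with a count of 0. The sketch below illustrates the difference on made-up data; the `winners` table, the states `S1` and `S2`, and their results are all hypothetical, and SQLite stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE winners (state TEXT, year INTEGER, party TEXT)")

# Hypothetical results: S1 is won once by each party, S2 is never won by republicans.
conn.executemany(
    "INSERT INTO winners VALUES (?, ?, ?)",
    [
        ("S1", 2000, "republican"), ("S1", 2004, "democrat"),
        ("S2", 2000, "democrat"), ("S2", 2004, "democrat"),
    ],
)

# Filtering first: S2 has no republican rows left, so it disappears entirely.
filtered = conn.execute(
    """
    SELECT state, COUNT(*)
    FROM winners
    WHERE party = 'republican'
    GROUP BY state
    HAVING COUNT(*) < 3
    ORDER BY state
    """
).fetchall()

# LEFT JOIN from the full list of states: S2 is kept, and COUNT(w.year)
# ignores the NULL produced by the unmatched join, giving 0.
left_joined = conn.execute(
    """
    WITH states AS (SELECT DISTINCT state FROM winners)
    SELECT s.state, COUNT(w.year)
    FROM states s
    LEFT JOIN winners w
        ON s.state = w.state AND w.party = 'republican'
    GROUP BY s.state
    HAVING COUNT(w.year) < 3
    ORDER BY s.state
    """
).fetchall()

print(filtered)
print(left_joined)
```

Note that the party condition sits in the `ON` clause. Moving it to a `WHERE` clause would discard the unmatched rows again and turn the `LEFT JOIN` back into an inner join.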


7. We are interested in measuring the partisanship of the states. We will define a partisan state as one that is consistently won by a single party (either Democrat or Republican) since 1988. For example, since 1988 California has been won by the republicans once and by the democrats 7 times, so under a loose reading of this metric California would be considered "partisan". (Note that if we include elections back to 1952, the republicans have won CA 9 times, and the democrats have only won it 8 times.) The question below applies a stricter test: the same party must win every election, so California's single republican win in that period rules it out.

**Find the states where all of the elections since 1988 (including 1988) have been won by the same party**

In [13]:
%%sql
SELECT
    state,
    party
FROM
    year_state_rank
WHERE
    year >= 1988
GROUP BY 
    state, party
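HAVING
    /* each (state, party) group has one row per election, so its ranks sum to
       the number of elections only if the party ranked 1st in every one */
    sum(rank) = count(distinct year)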
ORDER BY 
    party, state

 * postgres://localhost/ex_election
21 rows affected.


state,party
DC,democrat
HI,democrat
MA,democrat
MN,democrat
NY,democrat
OR,democrat
RI,democrat
WA,democrat
AK,republican
AL,republican
