# Evaluating rules of thumb for selecting target seats

## Background

The Liberal Democrats (LDs) use two high-level measures of a constituency's viability as a potential target seat: whether the LDs are currently in second place, and whether there is a difference of 10,000 votes or fewer between the LDs and the winner of the seat. In this analysis I seek to determine which of the two is the better predictor of whether a non-incumbent party (not just the LDs) will win the constituency at the next election.
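As a minimal sketch of the two rules of thumb, the function below flags whether a party came second and whether it finished within 10,000 votes of the winner. It assumes the previous election's (notional) results for one constituency are held as a plain mapping from party name to vote count. That structure, the function name and the party labels are hypothetical, not the layout of any published dataset.

```python
# Threshold from the rule of thumb described above.
MAX_VOTE_GAP = 10_000


def rule_of_thumb_flags(votes: dict[str, int], party: str) -> dict[str, bool]:
    """Apply both rules of thumb to one party in one constituency.

    `votes` is a hypothetical mapping of party name -> votes at the
    previous election (e.g. notional 2019 results on 2024 boundaries).
    """
    # Parties ordered from most to fewest votes.
    ranked = sorted(votes, key=votes.get, reverse=True)
    winner = ranked[0]
    is_second = len(ranked) > 1 and ranked[1] == party
    # A party that did not stand is treated as having 0 votes.
    gap = votes[winner] - votes.get(party, 0)
    return {
        "second_place": is_second,
        "within_10k": party != winner and gap <= MAX_VOTE_GAP,
    }
```

For example, with `{"Con": 20000, "LD": 14000, "Lab": 9000}` the LDs are second and 6,000 votes behind, so both flags are true, whereas Labour (third, 11,000 behind) meets neither.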

Many other factors are included in the selection of target seats, such as the number of LD councillors, the level of fundraising, a qualitative assessment of the party's candidate in that seat, and so on. I hope to analyse these other factors in future.

## Research Design

In this first analysis, I will consider each unique 'campaign' - that is, each unique combination of a political party and a constituency - in the 2024 election. Future analysis will include earlier elections. 
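A minimal sketch of how 2024 results might be turned into one record per campaign, with the outcome being whether that party won the seat. The input layout of `(constituency, party, votes)` rows and the `Campaign` type are hypothetical. It also assumes at most one candidate per party per constituency, so multiple independents in one seat would need distinct keys.

```python
from typing import NamedTuple


class Campaign(NamedTuple):
    constituency: str
    party: str
    won_2024: bool


def build_campaigns(results_2024):
    """Build one Campaign per (constituency, party) pair.

    `results_2024` is an iterable of hypothetical (constituency, party, votes)
    rows. Assumes one candidate per party per seat; a second row for the same
    party in the same seat would overwrite the first.
    """
    by_seat = {}
    for constituency, party, votes in results_2024:
        by_seat.setdefault(constituency, {})[party] = votes
    campaigns = []
    for constituency, votes in by_seat.items():
        winner = max(votes, key=votes.get)
        for party in votes:
            campaigns.append(Campaign(constituency, party, party == winner))
    return campaigns
```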

Both principal factors are determined by a party's performance at the previous election, so I will also need to draw on 2019 election results. Since the boundaries of general election constituencies changed between the 2019 and 2024 elections, I will need to use the estimates of each party's notional 2019 performance in the new 2024 constituencies produced by Rallings & Thrasher.

## Hypotheses:

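- H1. All three selected factors (whether the campaign came second in 2019, its 2019 margin behind the winner in number of votes, and its 2019 margin behind the winner in percentage vote share) will be statistically discernible predictors of the outcome of a campaign.
    - H0. Null Hypothesis: one or more of the three selected factors will _not_ be a statistically discernible predictor of the outcome of the campaign.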
- H2. Difference in 2019 percentage vote share will be a stronger predictor of the campaign outcome than difference in 2019 number of votes.
    - H0. Null Hypothesis: difference in 2019 percentage vote share will _not_ be a stronger predictor of the campaign outcome than difference in 2019 number of votes.
- H3. Whether the campaign achieved second place in the 2019 election will be a stronger predictor of the 2024 campaign outcome than either other factor.
    - H0. Null hypothesis: whether the campaign achieved second place in the 2019 election will _not_ be a stronger predictor of the 2024 campaign outcome than either other factor.
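One way to make "stronger predictor" concrete for H2 and H3 (an assumption, not a method stated in this plan) is to compare each factor's area under the ROC curve (AUC): the probability that a randomly chosen winning campaign scores higher on the factor than a randomly chosen losing one. The sketch computes AUC by direct pairwise comparison; because smaller 2019 margins should favour winning, margins are negated before scoring. The column layout and toy numbers are hypothetical. AUC is univariate, so it neither tests statistical discernibility (H1) nor adjusts for the controls listed below; that would need a multivariable model such as logistic regression.

```python
def auc(scores, won):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison.

    Returns the probability that a winning campaign outscores a losing one,
    counting ties as half; 0.5 means no better than chance.
    """
    pos = [s for s, w in zip(scores, won) if w]
    neg = [s for s, w in zip(scores, won) if not w]
    if not pos or not neg:
        raise ValueError("need at least one winning and one losing campaign")
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos) * len(neg))


# Hypothetical toy campaigns: (second in 2019, vote margin behind winner,
# vote-share margin behind winner in percentage points, won in 2024)
campaigns = [
    (True, 2_000, 4.0, True),
    (True, 8_000, 15.0, False),
    (False, 5_000, 12.0, True),
    (False, 15_000, 30.0, False),
    (True, 3_000, 9.0, False),
]
won = [c[3] for c in campaigns]

# A higher score should mean "more likely to win", so margins are negated.
predictors = {
    "second_place": [int(c[0]) for c in campaigns],
    "vote_margin": [-c[1] for c in campaigns],
    "share_margin": [-c[2] for c in campaigns],
}
for name, scores in predictors.items():
    print(f"{name}: AUC = {auc(scores, won):.3f}")
```

The pairwise loop is quadratic in the number of campaigns, which should remain manageable at the scale of a single general election.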


## Controls:

Many other factors could credibly contribute to the outcome of a campaign, including characteristics of the constituency and its population; the candidate; the party; and the campaign itself. Examples of those are as follows:

- The constituency & its population: rural vs urban, economic classification, levels of education, levels of employment, etc.
- The candidate: gender, age, whether or not they reside in the constituency, etc.
- The national party: support in opinion polls, approval ratings for the leader, national expenditure, etc.
- The local party: number of councillors, results in local elections, number of members, etc.
- The campaign: quality of message, number of volunteers, local expenditure, volume of literature delivered, etc.

Over future analyses I would like to establish the explanatory power of a number of these variables. In this initial analysis, I will instead treat them as potentially influential factors to control for, limiting myself to the variables provided in the House of Commons official results datasets. In future analyses I will source additional data with which both to test and to control for other considerations.

From the House of Commons official results data, I will use the following as controls:

- Candidates:
    - Gender
    - Former MP: whether or not the candidate has previously served as an MP

## Data Considerations:

To ensure valid conclusions can be drawn, I will need to remove: 

1. all campaigns by incumbent MPs
2. all campaigns in the Speaker's constituency, Chorley, which by convention is not contested by the major parties
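The two exclusions above amount to a simple filter. A minimal sketch, assuming each campaign is a dict with hypothetical keys `constituency` and `sitting_mp` (true if the candidate was an MP when Parliament was dissolved); the constituency name must match the spelling used in the data.

```python
# Hypothetical constant; must match the constituency name used in the data.
SPEAKER_SEAT = "Chorley"


def eligible_campaigns(campaigns):
    """Drop campaigns by incumbent MPs and all campaigns in the Speaker's seat.

    Each campaign is a dict with hypothetical keys 'constituency' and
    'sitting_mp'.
    """
    return [
        c for c in campaigns
        if not c["sitting_mp"] and c["constituency"] != SPEAKER_SEAT
    ]
```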

Unlike in the previous analysis, I will not need to remove constituencies in Northern Ireland, because this current analysis is not specific to the performance of the Liberal Democrats (who do not run in those constituencies).

I will also need to control for other likely factors. In this initial analysis, I will control for considerations covered by variables in the datasets in use:

1. Political party
2. Gender of candidate
3. Region / country of campaign
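Party, gender and region are categorical, so a regression-style model would need them encoded as indicator (dummy) variables. A minimal sketch, assuming each campaign is a dict with hypothetical keys `party`, `gender` and `region`. One level of each variable (the alphabetically first) is dropped as the reference category, so the indicators are not perfectly collinear with a model's intercept.

```python
def one_hot(campaigns, fields=("party", "gender", "region")):
    """Encode categorical controls as 0/1 indicator columns.

    The alphabetically first level of each field is the reference category
    and gets no column of its own.
    """
    levels = {f: sorted({c[f] for c in campaigns}) for f in fields}
    rows = []
    for c in campaigns:
        row = {}
        for f in fields:
            for level in levels[f][1:]:
                row[f"{f}={level}"] = int(c[f] == level)
        rows.append(row)
    return rows
```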

As such, I will need to bring together three data sources for this initial analysis:

1. 2024 general election results by constituency
2. 2024 general election results by candidate
3. Estimates of notional 2019 general election results by constituency
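A minimal sketch of bringing the three sources together, attaching constituency-level fields and notional 2019 votes to each 2024 candidate row. It assumes all three sources share a constituency identifier, called `ons_id` here (a hypothetical key name), and that party labels are spelt identically across datasets. Rows that fail to match are returned separately so they can be inspected rather than silently dropped.

```python
def join_sources(candidates_2024, constituencies_2024, notional_2019):
    """Join the three sources on a hypothetical 'ons_id' constituency key.

    - candidates_2024: list of dicts, one per candidate
    - constituencies_2024: dict ons_id -> dict of constituency fields
    - notional_2019: dict ons_id -> dict of party -> notional 2019 votes
    A party absent from the notional results is given 0 votes in 2019.
    """
    joined, unmatched = [], []
    for cand in candidates_2024:
        key = cand["ons_id"]
        if key not in constituencies_2024 or key not in notional_2019:
            unmatched.append(cand)
            continue
        row = {**cand, **constituencies_2024[key]}
        row["votes_2019"] = notional_2019[key].get(cand["party"], 0)
        joined.append(row)
    return joined, unmatched
```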

(Note also that a party might have stood different candidates in the same constituency in each of the two elections, so ideally I would also take into account data about 2019 candidates, rather than just 2024 candidates. However, the notional 2019 Rallings & Thrasher data only covers results per constituency, not per candidate, so I would not be able to establish a match between 2019 and 2024 candidates.)