# 1. Describe the data
- It has 24 observations.
- There are 14 variables, all measured in thousands of voters:
    - EI denotes électeurs inscrits, or registered voters.
    - A stands for Mitterrand's votes in the first round.
    - B stands for Giscard's votes in the first round.
    - A2 stands for Mitterrand's votes in the second round.
    - B2 stands for Giscard's votes in the second round.
    - C-K are the first-round votes of the other candidates.
    - The total number of voters in the second round exceeded that of the first round; we denote this difference as N. We treat this group as if it were another first-round candidate, although there are other reasonable ways to handle it.
- French presidential elections are held in two rounds. In 1981, there were 10 candidates in the first round. The top two candidates advanced to the second round, where François Mitterrand defeated Valéry Giscard-d’Estaing. Candidates who lost in the first round can gain political favors by encouraging their supporters to vote for one of the finalists. Since voting is private, we can’t know exactly how these votes were transferred, but we can infer patterns from the published vote totals. Anderson and Loynes (1987) provided data on these vote totals for every fourth department in France.

# 2. Load package and data

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy as sp
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
import faraway.utils

In [4]:
import faraway.datasets.fpe
fpe_df = faraway.datasets.fpe.load()
fpe_df.head()

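| | EI | A | B | C | D | E | F | G | H | J | K | A2 | B2 | N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ain | 260 | 51 | 64 | 36 | 23 | 9 | 5 | 4 | 4 | 3 | 3 | 105 | 114 | 17 |
| Alpes | 75 | 14 | 17 | 9 | 9 | 3 | 1 | 2 | 1 | 1 | 1 | 32 | 31 | 5 |
| Ariege | 107 | 27 | 18 | 13 | 17 | 2 | 2 | 2 | 1 | 1 | 1 | 57 | 33 | 6 |
| Bouches.du.Rhone | 1036 | 191 | 204 | 119 | 205 | 29 | 13 | 13 | 10 | 10 | 6 | 466 | 364 | 30 |
| Charente.Maritime | 367 | 71 | 76 | 47 | 37 | 8 | 34 | 5 | 4 | 4 | 2 | 163 | 142 | 17 |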


In [5]:
fpe_df.shape

(24, 14)
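
As a quick sanity check on the definition of N in section 1, the sketch below recomputes it for the five departments printed by `fpe_df.head()`. The rows are copied by hand from that output, so this only checks those five departments, not the full data set.

```python
import numpy as np

# Column order: EI, A, B, C, D, E, F, G, H, J, K, A2, B2, N (thousands of voters).
# Rows copied from the fpe_df.head() output; the other 19 departments are not checked.
rows = np.array([
    [260, 51, 64, 36, 23, 9, 5, 4, 4, 3, 3, 105, 114, 17],            # Ain
    [75, 14, 17, 9, 9, 3, 1, 2, 1, 1, 1, 32, 31, 5],                  # Alpes
    [107, 27, 18, 13, 17, 2, 2, 2, 1, 1, 1, 57, 33, 6],               # Ariege
    [1036, 191, 204, 119, 205, 29, 13, 13, 10, 10, 6, 466, 364, 30],  # Bouches.du.Rhone
    [367, 71, 76, 47, 37, 8, 34, 5, 4, 4, 2, 163, 142, 17],           # Charente.Maritime
])

first_round = rows[:, 1:11].sum(axis=1)    # A through K
second_round = rows[:, 11:13].sum(axis=1)  # A2 + B2
n_check = second_round - first_round       # second-round total minus first-round total
print(n_check, rows[:, 13])
```

For these rows the recomputed difference matches the N column, which is why A2 + B2 equals the sum of the eleven predictors used in the regression later on.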

# 3. Weighted Least Squares
- Sometimes the errors are uncorrelated but have unequal variance where the form of the inequality is known. In such cases, Σ is diagonal but the entries are not equal. Weighted least squares (WLS) is a special case of GLS and can be used in this situation.
- We have ${\Sigma}=\left(\begin{matrix}\frac{1}{w_1}&\cdots&0\\\vdots&\ddots&\vdots\\0&\cdots&\frac{1}{w_n}\\\end{matrix}\right)$ where $w_i$ are the weights.
- Then $S=\left(\begin{matrix}\frac{1}{\sqrt{w_1}}&\cdots&0\\\vdots&\ddots&\vdots\\0&\cdots&\frac{1}{\sqrt{w_n}}\\\end{matrix}\right).$
- We regress $\sqrt{w_i}y_i$ on $\sqrt{w_i}x_i$.
- When the model has an intercept, the column of ones in the model matrix $X$ is replaced with $\sqrt{w_i}.$
- The residuals become $\sqrt{w_i}{\hat{\varepsilon}}_i.$
- Some examples:
    - If the variance of the errors $\varepsilon_i$ is proportional to the predictor $x_i$ ($\operatorname{var}\varepsilon_i \propto x_i$), then the weights $w_i = x_i^{-1}$ may be appropriate. This choice is suggested when a plot shows a positive relationship between $\left|\hat{\varepsilon}_i\right|$ and $x_i$.
    - When each response $Y_i$ is the average of a group of $n_i$ observations, $\operatorname{var} Y_i = \operatorname{var}\varepsilon_i = \sigma^2/n_i$, which suggests the weights $w_i = n_i$. Responses that are averages are common. For example, when studying life expectancies in different countries, each response is a country average. It might seem reasonable to set the weight for each country equal to its population size, but this assumes that the variance of the average is truly inversely proportional to the group size. Many other factors contribute to the variation in life expectancies, and these could overshadow the effect of population size. It is therefore important to consider all sources of variation when choosing weights for averages.
    - When the observed responses are of varying quality, we can assign the weights $w_i = 1/\operatorname{var}(y_i)$. Responses with lower variance (higher quality) then receive more weight in the fit.
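
The rescaling described in the list above can be checked numerically. This is a minimal sketch on synthetic data (the sample size, coefficients and the choice $w_i = 1/x_i$ are all assumptions made for illustration): it compares the weighted normal equations with ordinary least squares on the rescaled variables.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30                                       # hypothetical sample size
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])         # intercept column plus one predictor
w = 1 / x                                    # assumed weights: var(eps_i) proportional to x_i
y = 2 + 3 * x + rng.normal(scale=np.sqrt(x))

# Route 1: weighted normal equations, (X' W X) beta = X' W y
W = np.diag(w)
beta_normal = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Route 2: ordinary least squares on sqrt(w_i) * x_i and sqrt(w_i) * y_i
sw = np.sqrt(w)
Xs = X * sw[:, None]                         # the column of ones becomes sqrt(w_i)
ys = y * sw
beta_scaled, *_ = np.linalg.lstsq(Xs, ys, rcond=None)

# Residuals of the rescaled fit are sqrt(w_i) times the raw residuals
resid_scaled = ys - Xs @ beta_scaled
resid_raw = y - X @ beta_scaled
print(beta_normal, beta_scaled)
```

Both routes give the same coefficients up to floating-point error, which is the sense in which WLS is ordinary least squares on transformed data.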

- $A2 = β_A A + β_B B + β_C C + β_D D + β_E E + β_F F + β_G G + β_H H + β_J J + β_K K + β_N N$
- $β_i$ represents the proportion of votes transferred from candidate i to Mitterrand in the second round.
- We can do the same for Giscard-d’Estaing, but the β's will just be the remaining proportions, so it’s unnecessary. Our first model uses -1 in the formula to indicate no intercept.
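
To see why the second regression adds nothing, note that the two second-round totals sum to the sum of all the predictors, and least squares is linear in the response. The sketch below illustrates this on synthetic data; the sizes, vote counts and transfer proportions are made up and are not the election data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_dept, n_cand = 24, 4                            # hypothetical sizes
X = rng.uniform(5, 100, size=(n_dept, n_cand))    # synthetic first-round votes
beta_true = np.array([0.9, 0.1, 0.4, 0.6])        # made-up transfer proportions
y_a = X @ beta_true + rng.normal(scale=1.0, size=n_dept)  # second-round votes, finalist A
y_b = X.sum(axis=1) - y_a                         # all remaining votes go to finalist B

# No-intercept least squares fit for each finalist
beta_a, *_ = np.linalg.lstsq(X, y_a, rcond=None)
beta_b, *_ = np.linalg.lstsq(X, y_b, rcond=None)
print(np.round(beta_a + beta_b, 6))               # each pair of coefficients sums to one
```

The same argument carries over to the weighted fit, since that is ordinary least squares on rescaled variables.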

In [6]:
# Unweighted fit: smf.wls with its default weights (all equal to 1) is ordinary least squares
lmod = smf.wls("A2 ~ A + B + C + D + E + F + G + H + J + K + N - 1", fpe_df).fit()

- We expect transfer proportions to vary between departments.
- If we consider the above as a regression equation, there will be some error varying from department to department. The error’s variance will be proportional to the number of voters, resembling the variance of a sum rather than a mean.
- Since the weights should be inversely proportional to the variance, they should be set to 1/EI.
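
The "variance of a sum" argument can be illustrated with a small simulation. This is a sketch under a simplifying assumption that is not in the text: each of $n$ voters transfers independently with the same probability `p`, so the number of transferred votes is binomial.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 0.3                                      # hypothetical transfer probability
sizes = np.array([100, 400, 1600])           # hypothetical numbers of voters
reps = 20000

# Each draw is the number of voters, out of n, who transfer their vote.
totals = rng.binomial(sizes, p, size=(reps, sizes.size))
var_sum = totals.var(axis=0)                 # variance of the total grows with n
var_mean = (totals / sizes).var(axis=0)      # variance of the proportion shrinks with n

print(var_sum / sizes)                       # roughly constant, near p * (1 - p)
print(var_mean * sizes)                      # the same ratio, seen from the mean
```

Because the response A2 is a total rather than an average, its error variance scales like the number of voters, and weights inversely proportional to the variance scale like one over that number.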

In [7]:
wmod = smf.wls("A2 ~ A + B + C + D + E + F + G + H + J + K + N - 1", fpe_df, weights = 1/fpe_df.EI ).fit()

- Only the relative proportions of the weights matter. For instance, if we multiply the weights by an arbitrary factor, such as 53, the results remain unchanged.

In [8]:
wmod53 = smf.wls("A2 ~ A + B + C + D + E + F + G + H + J + K + N - 1", fpe_df, weights = 53/fpe_df.EI ).fit()

In [9]:
# Examine the coefficients from these three models
pd.DataFrame([lmod.params, wmod.params, wmod53.params],
             index=['no weights','weights','weights*53']).round(3)

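| | A | B | C | D | E | F | G | H | J | K | N |
|---|---|---|---|---|---|---|---|---|---|---|---|
| no weights | 1.075 | -0.125 | 0.257 | 0.905 | 0.671 | 0.783 | 2.166 | -0.854 | 0.144 | 0.518 | 0.558 |
| weights | 1.067 | -0.105 | 0.246 | 0.926 | 0.249 | 0.755 | 1.972 | -0.566 | 0.612 | 1.211 | 0.529 |
| weights*53 | 1.067 | -0.105 | 0.246 | 0.926 | 0.249 | 0.755 | 1.972 | -0.566 | 0.612 | 1.211 | 0.529 |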


- We see that using weights makes a difference but only the relative size of the weights matters.
- One remaining issue, unrelated to weighting, is that proportions should be between zero and one. We can address this by truncating any coefficients that fall outside this range to either zero or one. This is done by subtracting from the response the variables whose coefficient is fixed at one and omitting the variables whose coefficient is fixed at zero.

In [10]:
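# Fix the coefficients of A, G and K at one by subtracting them from the response
y = fpe_df.A2 - fpe_df.A - fpe_df.G - fpe_df.K
# B, H and J are omitted, which fixes their coefficients at zero
X = fpe_df.loc[:,["C","D","E","F","N"]]
wmod = sm.WLS(y, X, weights = 1/fpe_df.EI ).fit()
wmod.params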

C    0.225773
D    0.969977
E    0.390204
F    0.744240
N    0.608539
dtype: float64

- We see that voters for the Communist candidate D apparently almost all voted for the Socialist Mitterrand in the second round.
- However, we see that around 20% of the voters for the Gaullist candidate C voted for Mitterrand. This is surprising since these voters would normally favor the more right-wing candidate, Giscard.
- This appears to be the decisive factor. We see that of the larger blocks of smaller candidates, the Ecology party voters, E, roughly split their votes as did the first round non-voters.
- The other candidates had very few voters, and so their behavior is less interesting.

In [11]:
# Alternative methods to constrain the coefficients
# Bake the weights into the variables first
y = fpe_df.A2
X = fpe_df.loc[:,["A","B","C","D","E","F","G","H","J","K","N"]]
weights = 1/fpe_df.EI
Xw = (X.T * np.sqrt(weights)).T
yw = y * np.sqrt(weights)

In [17]:
# Bounded least squares on the weighted variables: every coefficient is constrained to [0, 1]
res = sp.optimize.lsq_linear(Xw, yw, bounds=(0, 1))
pd.Series(np.round(res.x,3),index=lmod.params.index)

A    1.000
B    0.000
C    0.208
D    0.969
E    0.359
F    0.743
G    1.000
H    0.367
J    0.000
K    1.000
N    0.575
dtype: float64

- The results are quite similar for the candidates C, D, E and N who have substantial numbers of votes, but the coefficients for small party candidates vary much more. 
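
The two ways of enforcing the bounds are not equivalent in general: truncating the unconstrained coefficients leaves the other coefficients where they were, while bounded least squares lets them adjust. The sketch below compares the two on synthetic data (the design matrix and coefficients are made up), using `scipy.optimize.lsq_linear` as in the cell above.

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 3))                         # synthetic design matrix
y = X @ np.array([1.4, -0.3, 0.5]) + rng.normal(scale=0.1, size=40)

# Unconstrained fit, then naive truncation of each coefficient to [0, 1]
beta_free, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_clip = np.clip(beta_free, 0, 1)

# Bounded least squares: same criterion, minimised subject to 0 <= beta <= 1
res = lsq_linear(X, y, bounds=(0, 1))

def rss(beta):
    """Residual sum of squares for a coefficient vector."""
    return float(np.sum((y - X @ beta) ** 2))

print(np.round(beta_clip, 3), np.round(res.x, 3))
print(rss(beta_clip), rss(res.x))
```

The truncated vector is feasible for the bounded problem, so the bounded fit can do no worse in residual sum of squares; how much the coefficients differ depends on how correlated the predictors are.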