Copyright (C) 2023 Macarena Picazo Mora

Examining the effect of the Refugee Crisis. The relationship between attitudes toward Immigration and public Euroscepticism in EU member states

### The Influence of Attitudes Toward Migration on Attitudes Toward European Integration

This statistical analysis aims to measure the influence of anti-immigration sentiment on Euroscepticism. We utilize data from the cross-sectional European Social Surveys conducted in [2012](https://ess-search.nsd.no/en/study/7ccf7f30-fd1a-470a-9b90-4c91b0bc7438), [2014](https://ess-search.nsd.no/en/study/ccd56840-e949-4320-945a-927c49e1dc4f) and [2016](https://ess-search.nsd.no/en/study/f8e11f55-0c14-4ab3-abde-96d3f14d3c76)).

Four regression models are run:

**Model 1** serves as a baseline model to confirm the negative link between anti-immigrant attitudes and Euroscepticism. It uses the 11-value European integration variable (column `euftf`) as the dependent variable, and the independent variable is whether respondents think that immigrants make their country a better or worse place to live (column `imwbcnt`). Other control variables are also included.

**Model 2** includes country dummies or fixed effects to account for uncontrolled between-country variation in Euroscepticism.

**Model 3** and **Model 4** add time dummies for 2014 and 2016, with the year 2012 serving as a reference category. They also add an interaction term between the two-time dummies and the variable gauging immigration attitudes. This helps to measure whether there is a statistically significant difference in the effect of immigration attitudes on attitudes toward European integration between 2012, 2014, and 2016. The only difference between these two models is that Model 3 does not include country dummies, whereas Model 4 does.

Dummy Variables
In these regression models, dummy variables are used to incorporate categorical variables into the regression model. In Model 2, country dummies are used to account for uncontrolled between-country variation in Euroscepticism. In Models 3 and 4, time dummies are added for the years 2014 and 2016.

Interaction Terms
Interaction terms are used in regression analysis when we want to investigate whether the relationship between two variables depends on the value of a third variable. In Models 3 and 4, interaction terms between the time dummies and the variable measuring immigration attitudes are added. This helps to understand whether the influence of immigration attitudes on attitudes towards European integration changes over time.

In [13]:
library(tidyverse)
library(haven)
library(ggplot2)
library(gridExtra)
library(svglite)
library(car)
library(stargazer)

In [14]:
raw_data_2012 <- read_csv("./data/ESS6e02_5/ESS6e02_5.csv")
raw_data_2014 <- read_csv("./data/ESS7e02_2/ESS7e02_2.csv")
raw_data_2016 <- read_csv("./data/ESS8e02_2/ESS8e02_2.csv")

[1mRows: [22m[34m54673[39m [1mColumns: [22m[34m625[39m
[36m──[39m [1mColumn specification[22m [36m────────────────────────────────────────────────────────[39m
[1mDelimiter:[22m ","
[31mchr[39m  (10): name, proddate, cntry, ctzshipc, cntbrthc, lnghom1, lnghom2, fbrn...
[32mdbl[39m (615): essround, edition, idno, dweight, pspwght, pweight, anweight, tvt...

[36mℹ[39m Use `spec()` to retrieve the full column specification for this data.
[36mℹ[39m Specify the column types or set `show_col_types = FALSE` to quiet this message.
[1mRows: [22m[34m40185[39m [1mColumns: [22m[34m601[39m
[36m──[39m [1mColumn specification[22m [36m────────────────────────────────────────────────────────[39m
[1mDelimiter:[22m ","
[31mchr[39m  (10): name, proddate, cntry, ctzshipc, cntbrthc, lnghom1, lnghom2, fbrn...
[32mdbl[39m (591): essround, edition, idno, dweight, pspwght, pweight, tvtot, tvpol,...

[36mℹ[39m Use `spec()` to retrieve the full column specification for

In [33]:
print(length(raw_data_2012$euftf)) # number of rows
print(length(raw_data_2014$euftf)) # number of rows
print(length(raw_data_2016$euftf)) # number of rows

[1] 54673
[1] 40185
[1] 44387


**All variables have been rescaled to go from 0 to 10 except for age**

**The name of the variables have been chosen such that a higher numerical value indicates more in quantity or intensity of the variable. The scale of some variables has been inverted to make this possible. The variables that have been inverted before being fed to the linear model are: `euftf`, `imwbcnt`, `hincfel`, `domicil`, `polintr`, `uemp3m`**

## Model 1

- **Euroscepticism (dependent variable)**: `euftf`
    - European unification go further or gone too far?
- **Opposition to immigration (independent variable)**: `imwbcnt`
    - Immigrants make country worse or better place to live
- **Gender**: `gndr`
    - Gender
- **Age**: `ageo`
    - Age
- **Education**: `eisced`
    - Highest level of education, ES - ISCED
- **Feeling about household's income**: `hincfel`
    - Feeling about household's income nowadays 
- **Degree of urbanisation**: `domicil`
    - Domicile, respondent's description
- **Political Intereset**: `polintr`
    - How interested in politics
- **Satisfaction with democracy**: `stfdem`
    - How satisfied with the way democracy works in country
- **Unemployment**: `uemp3m`
    - Ever unemployed and seeking work for a period more than three months
- **Ideology**: `lrscale`
    - Placement on left right scale

In [18]:
variables <- c(
            "euftf",
            "imwbcnt",
            "agea",
            "eisced",
            "hincfel",
            "domicil",
            "polintr",
            "stfdem",
            "uemp3m",
            "lrscale",
            "gndr"
            )

data_2012 <- raw_data_2012[, variables]
data_2014 <- raw_data_2014[, variables]
data_2016 <- raw_data_2016[, variables]

df <- rbind(data_2012, data_2014, data_2016)

print("Missing values:")
print(sapply(df, function(x) sum(is.na(x))))

print("Unique values in column")
for (column in variables) {
  unique_values <- unique(df[[column]])
  print(paste("Unique values in", column, ":"))
  print(unique_values)
}

print("Length before cleaning NA values:")
print(length(data))

# We clean the data by removing all rows where there's no response for at least one question
# see ESS data codebook
df <- subset(df, euftf <= 10 & imwbcnt <= 10 & gndr <= 2
             & eisced <= 7 & eisced >= 1
             & hincfel <= 4 & domicil <= 5
             & polintr <= 4 & stfdem <= 10 & uemp3m <= 2
             & lrscale <= 10)

# inverting the scale
df$euftf <- 10 - df$euftf
df$imwbcnt <- 10 - df$imwbcnt
df$hincfel <- 4 - df$hincfel
df$domicil <- 5 - df$domicil
df$polintr <- 4 - df$polintr
df$uemp3m <- 2 - df$uemp3m

# setting the lowest value to 0
df$gndr <- df$gndr - 1
df$eisced <- df$eisced - 1
df$hincfel <- df$hincfel - 1
df$domicil <- df$domicil - 1
df$polintr <- df$polintr - 1
df$uemp3m <- df$uemp3m - 1

# rescaling to go from 0 to 10
df$gndr <- df$gndr * 10
df$eisced <- df$eisced / 6 * 10
df$hincfel <- df$hincfel / 3 * 10
df$domicil <- df$domicil / 4 * 10
df$polintr <- df$polintr / 3 * 10
df$uemp3m <- df$uemp3m * 10

print("Length after removing NA values:")
print(length(df$imwbcnt))

[1] "Missing values:"
  euftf imwbcnt    agea  eisced hincfel domicil polintr  stfdem  uemp3m lrscale 
      0       0       0       0       0       0       0       0       0       0 
   gndr 
      0 
[1] "Unique values in column"
[1] "Unique values in euftf :"
 [1]  8 10  7  9  6  0  3 88  4  5  1  2 77 99
[1] "Unique values in imwbcnt :"
 [1]  8 10  5  7 88  6  2  0  9  3  1  4 99 77
[1] "Unique values in agea :"
 [1]  63  29  66 999  59  74  57  64  17  42  80  78  61  60  38  39  18  68  51
[20]  16  23  56  43  32  71  53  50  55  69  79  52  33  20  40  62  35  54  70
[39]  22  31  46  49  41  21  25  24  30  36  48  77  45  26  19  34  75  76  65
[58]  44  47  72  15  73  27  67  28  37  58  83  82  84  81  94  88  98  85  87
[77]  86  92  90  89  91  96  93  95 103  99  97 100 101 102  14 104 114
[1] "Unique values in eisced :"
 [1]  2  4 88  1  3  6  5  7 55 99 77
[1] "Unique values in hincfel :"
[1] 4 2 9 3 1 8 7
[1] "Unique values in domicil :"
[1] 4 1 8 3 2 5 9 7
[1] "Uniq

In [19]:
model_1 <- lm(euftf ~ imwbcnt + gndr + agea + eisced + hincfel + domicil + polintr + stfdem + uemp3m + lrscale, data = df)

summary(model_1)


Call:
lm(formula = euftf ~ imwbcnt + gndr + agea + eisced + hincfel + 
    domicil + polintr + stfdem + uemp3m + lrscale, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.4107 -1.6991 -0.0187  1.7012  7.1677 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.8791227  0.0385619 100.595  < 2e-16 ***
imwbcnt      0.3060895  0.0035621  85.931  < 2e-16 ***
gndr         0.0035213  0.0015468   2.277   0.0228 *  
agea         0.0011888  0.0001788   6.647 3.00e-11 ***
eisced      -0.0319975  0.0026691 -11.988  < 2e-16 ***
hincfel      0.0136400  0.0029500   4.624 3.77e-06 ***
domicil     -0.0283796  0.0025378 -11.183  < 2e-16 ***
polintr      0.0023357  0.0027387   0.853   0.3937    
stfdem      -0.0730834  0.0033624 -21.736  < 2e-16 ***
uemp3m       0.0039081  0.0017311   2.258   0.0240 *  
lrscale      0.0284908  0.0034569   8.242  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.49

## Model 2 (adding year interaction variables)

- **Euroscepticism (dependent variable)**: `euftf`
    - European unification go further or gone too far?
- **Opposition to immigration (independent variable)**: `imwbcnt`
    - Immigrants make country worse or better place to live
- **Gender**: `gndr`
    - Gender
- **Age**: `ageo`
    - Age
- **Education**: `eisced`
    - Highest level of education, ES - ISCED
- **Feeling about household's income**: `hincfel`
    - Feeling about household's income nowadays 
- **Degree of urbanisation**: `domicil`
    - Domicile, respondent's description
- **Political Intereset**: `polintr`
    - How interested in politics
- **Satisfaction with democracy**: `stfdem`
    - How satisfied with the way democracy works in country
- **Unemployment**: `uemp3m`
    - Ever unemployed and seeking work for a period more than three months
- **Ideology**: `lrscale`
    - Placement on left right scale
- **Year 2014**: `imwyys==2014`
    - Set to 1 if year of interview is 2014
- **Year 2016**: `imwyys==2016`
    - Set to 1 if year of interview is 2016
- **Interaction term between different-ethnic immigration opposition and year 2014**: `imwyys==2014` and `imdfetn`
    - Generated by `year_2014 * imdfetn`
- **Interaction term between different-ethnic immigration opposition and year 2016**: `imwyys==2016` and `imdfetn`
    - Generated by `year_2016 * imdfetn`

In [22]:
variables <- c(
            "euftf",
            "imwbcnt",
            "agea",
            "eisced",
            "hincfel",
            "domicil",
            "polintr",
            "stfdem",
            "uemp3m",
            "lrscale",
            "gndr",
            "inwyys"
            )

data_2012 <- raw_data_2012[, variables]
data_2014 <- raw_data_2014[, variables]
data_2016 <- raw_data_2016[, variables]

df <- rbind(data_2012, data_2014, data_2016)

print("Missing values:")
print(sapply(df, function(x) sum(is.na(x))))

print("Unique values in column")
for (column in variables) {
  unique_values <- unique(df[[column]])
  print(paste("Unique values in", column, ":"))
  print(unique_values)
}

print("Length before cleaning NA values:")
print(length(df$imwbcnt))

# We clean the data by removing all rows where there's no response for at least one question
# see ESS data codebook
df <- subset(df, euftf <= 10 & imwbcnt <= 10 & gndr <= 2
             & eisced <= 7 & eisced >= 1
             & hincfel <= 4 & domicil <= 5
             & polintr <= 4 & stfdem <= 10 & uemp3m <= 2
             & lrscale <= 10 & inwyys <= 3000)

# we invert the scale such that a higher value reflects more eurosceptism
df$euftf <- 10 - df$euftf
df$imwbcnt <- 10 - df$imwbcnt
df$hincfel <- 4 - df$hincfel
df$domicil <- 5 - df$domicil
df$polintr <- 4 - df$polintr
df$uemp3m <- 2 - df$uemp3m

# setting the lowest value to 0
df$gndr <- df$gndr - 1
df$eisced <- df$eisced - 1
df$hincfel <- df$hincfel - 1
df$domicil <- df$domicil - 1
df$polintr <- df$polintr - 1
df$uemp3m <- df$uemp3m - 1

# rescaling to go from 0 to 10
df$gndr <- df$gndr * 10
df$eisced <- df$eisced / 6 * 10
df$hincfel <- df$hincfel / 3 * 10
df$domicil <- df$domicil / 4 * 10
df$polintr <- df$polintr / 3 * 10
df$uemp3m <- df$uemp3m * 10

df$year_2014 <- ifelse(df$inwyys == 2014, 1, 0)
df$year_2016 <- ifelse(df$inwyys == 2016, 1, 0)

df$imm_interaction_2014 <- df$imwbcnt * df$year_2014
df$imm_interaction_2016 <- df$imwbcnt * df$year_2016

print("Length after removing NA values:")
print(length(df$imwbcnt))

[1] "Missing values:"
  euftf imwbcnt    agea  eisced hincfel domicil polintr  stfdem  uemp3m lrscale 
      0       0       0       0       0       0       0       0       0       0 
   gndr  inwyys 
      0       0 
[1] "Unique values in column"
[1] "Unique values in euftf :"
 [1]  8 10  7  9  6  0  3 88  4  5  1  2 77 99
[1] "Unique values in imwbcnt :"
 [1]  8 10  5  7 88  6  2  0  9  3  1  4 99 77
[1] "Unique values in agea :"
 [1]  63  29  66 999  59  74  57  64  17  42  80  78  61  60  38  39  18  68  51
[20]  16  23  56  43  32  71  53  50  55  69  79  52  33  20  40  62  35  54  70
[39]  22  31  46  49  41  21  25  24  30  36  48  77  45  26  19  34  75  76  65
[58]  44  47  72  15  73  27  67  28  37  58  83  82  84  81  94  88  98  85  87
[77]  86  92  90  89  91  96  93  95 103  99  97 100 101 102  14 104 114
[1] "Unique values in eisced :"
 [1]  2  4 88  1  3  6  5  7 55 99 77
[1] "Unique values in hincfel :"
[1] 4 2 9 3 1 8 7
[1] "Unique values in domicil :"
[1] 4 1 8 3 2

In [23]:
model_2 <- lm(euftf ~ imwbcnt + gndr + agea + eisced + hincfel + domicil + polintr + stfdem + uemp3m + lrscale + year_2014 + year_2016 + imm_interaction_2014 + imm_interaction_2016, data = df)

summary(model_2)


Call:
lm(formula = euftf ~ imwbcnt + gndr + agea + eisced + hincfel + 
    domicil + polintr + stfdem + uemp3m + lrscale + year_2014 + 
    year_2016 + imm_interaction_2014 + imm_interaction_2016, 
    data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.6769 -1.6842 -0.0163  1.6880  7.2898 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)           3.8407919  0.0406145  94.567  < 2e-16 ***
imwbcnt               0.2960785  0.0043620  67.877  < 2e-16 ***
gndr                  0.0032110  0.0015441   2.080 0.037565 *  
agea                  0.0011566  0.0001785   6.480 9.24e-11 ***
eisced               -0.0311379  0.0026656 -11.682  < 2e-16 ***
hincfel               0.0048410  0.0029789   1.625 0.104146    
domicil              -0.0260881  0.0025356 -10.289  < 2e-16 ***
polintr              -0.0007451  0.0027378  -0.272 0.785515    
stfdem               -0.0761838  0.0033626 -22.656  < 2e-16 ***
uemp3m                0.0030589  0.00

## Model 3 (segmenting by immigration origin)

- **Euroscepticism (dependent variable)**: `euftf`
    - European unification go further or gone too far?
    - "Unification go further" (0) -> "Already too far" (10)
    - **The scale of the variable has been inverted**
- **Opposition to same ethnic immigration(independent variable)**: `imsmetn`
    - Question: "Allow many/few immigrants of same race/ethnic group as majority"
    - "Allow many to come and live here" (1) to "Allow none" (4)
- **Opposition to different ethnic immigration(independent variable)**: `imdfetn`
    - Question: "Allow many/few immigrants of different race/ethnic group as majority"
    - "Allow many to come and live here" (1) to "Allow none" (4)
- **Gender**: `gndr`
    - "Male" (1) and "Female" (2)
- **Age**: `ageo`
- **Education**: `eisced`
    - This variable is scaled such that a lower value corresponds to a lower level of education, while a higher value represents a higher level of education.
    - Lowest value is "lower than secondary education" (1) and highest is "doctoral degree" (7)
- **Feeling about household's income**: `hincfel`
    - Values from 1 to 4 ranging from "Living comfortably on present income" (1) to "Very difficult on present income" (4)
- **Urbanization**: `domicil`
    - A higher value indicates a more rural background, and a lower value indicates a more urban background.
    - Values from 1 to 5 ranging from "A big city" (1), "Suburbs or outskirts of big city" (2), ..., "Farm or home in countryside" (5)
- **Political Intereset**: `polintr`
    - "Very Interested" (1) to "Not at all interested" (4)
- **Satisfaction with democracy**: `stfdem`
    - "Extremely dissatisfied" (0) to "Extremely satisfied" (10)
- **Unemployment**: `uemp3m`
    - "Yes" (1) and "No" (2)
- **Ideology**: `lrscale`
    - "Left" (0) to "Right" (10)

In [24]:
variables <- c(
            "euftf",
            "agea",
            "eisced",
            "hincfel",
            "domicil",
            "polintr",
            "stfdem",
            "uemp3m",
            "lrscale",
            "gndr",
            "imsmetn",
            "imdfetn"
            )

data_2012 <- raw_data_2012[, variables]
data_2014 <- raw_data_2014[, variables]
data_2016 <- raw_data_2016[, variables]

df <- rbind(data_2012, data_2014, data_2016)

print("Missing values:")
print(sapply(df, function(x) sum(is.na(x))))

print("Unique values in column")
for (column in variables) {
  unique_values <- unique(df[[column]])
  print(paste("Unique values in", column, ":"))
  print(unique_values)
}

print("Length before cleaning NA values:")
print(length(df$euftf))

# We clean the data by removing all rows where there's no response for at least one question
# see ESS data codebook
df <- subset(df, euftf <= 10 & gndr <= 2
             & eisced <= 7 & eisced >= 1
             & hincfel <= 4 & domicil <= 5
             & polintr <= 4 & stfdem <= 10 & uemp3m <= 2
             & lrscale <= 10 & imsmetn >= 1 & imsmetn <= 4
             & imdfetn >= 1 & imdfetn <= 4) 

# we invert the scale such that a higher value reflects more eurosceptism
df$euftf <- 10 - df$euftf
df$hincfel <- 4 - df$hincfel
df$domicil <- 5 - df$domicil
df$polintr <- 4 - df$polintr
df$uemp3m <- 2 - df$uemp3m

# setting the lowest value to 0
df$imsmetn <- df$imsmetn - 1
df$imdfetn <- df$imdfetn - 1
df$gndr <- df$gndr - 1
df$eisced <- df$eisced - 1
df$hincfel <- df$hincfel - 1
df$domicil <- df$domicil - 1
df$polintr <- df$polintr - 1
df$uemp3m <- df$uemp3m - 1

# rescaling to go from 0 to 10
df$imsmetn <- df$imsmetn / 3 * 10
df$imdfetn <- df$imdfetn / 3 * 10
df$gndr <- df$gndr * 10
df$eisced <- df$eisced / 6 * 10
df$hincfel <- df$hincfel / 3 * 10
df$domicil <- df$domicil / 4 * 10
df$polintr <- df$polintr / 3 * 10
df$uemp3m <- df$uemp3m * 10

print("Length after removing NA values:")
print(length(df$euftf))

[1] "Missing values:"
  euftf    agea  eisced hincfel domicil polintr  stfdem  uemp3m lrscale    gndr 
      0       0       0       0       0       0       0       0       0       0 
imsmetn imdfetn 
      0       0 
[1] "Unique values in column"
[1] "Unique values in euftf :"
 [1]  8 10  7  9  6  0  3 88  4  5  1  2 77 99
[1] "Unique values in agea :"
 [1]  63  29  66 999  59  74  57  64  17  42  80  78  61  60  38  39  18  68  51
[20]  16  23  56  43  32  71  53  50  55  69  79  52  33  20  40  62  35  54  70
[39]  22  31  46  49  41  21  25  24  30  36  48  77  45  26  19  34  75  76  65
[58]  44  47  72  15  73  27  67  28  37  58  83  82  84  81  94  88  98  85  87
[77]  86  92  90  89  91  96  93  95 103  99  97 100 101 102  14 104 114
[1] "Unique values in eisced :"
 [1]  2  4 88  1  3  6  5  7 55 99 77
[1] "Unique values in hincfel :"
[1] 4 2 9 3 1 8 7
[1] "Unique values in domicil :"
[1] 4 1 8 3 2 5 9 7
[1] "Unique values in polintr :"
[1] 1 3 2 4 8 9 7
[1] "Unique values in 

In [25]:
model_3 <- lm(euftf ~ imsmetn + imdfetn + gndr + agea + eisced + hincfel + domicil + polintr + stfdem + uemp3m + lrscale, data = df)

summary(model_3)


Call:
lm(formula = euftf ~ imsmetn + imdfetn + gndr + agea + eisced + 
    hincfel + domicil + polintr + stfdem + uemp3m + lrscale, 
    data = df)

Residuals:
   Min     1Q Median     3Q    Max 
-7.894 -1.718  0.011  1.694  6.849 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.6789854  0.0355076 131.774  < 2e-16 ***
imsmetn      0.0567974  0.0038361  14.806  < 2e-16 ***
imdfetn      0.1567630  0.0037261  42.072  < 2e-16 ***
gndr         0.0034873  0.0015667   2.226 0.026021 *  
agea         0.0010272  0.0001821   5.639 1.71e-08 ***
eisced      -0.0325533  0.0027128 -12.000  < 2e-16 ***
hincfel      0.0203825  0.0030015   6.791 1.12e-11 ***
domicil     -0.0294403  0.0025706 -11.452  < 2e-16 ***
polintr      0.0041341  0.0027885   1.483 0.138200    
stfdem      -0.1113286  0.0033296 -33.436  < 2e-16 ***
uemp3m       0.0061095  0.0017555   3.480 0.000501 ***
lrscale      0.0218851  0.0035229   6.212 5.25e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**

## Model 4 (segmenting by immigration origin and adding year interaction terms)

- **Euroscepticism (dependent variable)**: `euftf`
    - European unification go further or gone too far?
    - "Unification go further" (0) -> "Already too far" (10)
    - **The scale of the variable has been inverted**
- **Opposition to same ethnic immigration(independent variable)**: `imsmetn`
    - Question: "Allow many/few immigrants of same race/ethnic group as majority"
    - "Allow many to come and live here" (1) to "Allow none" (4)
- **Opposition to different ethnic immigration(independent variable)**: `imsmetn`
    - Question: "Allow many/few immigrants of different race/ethnic group as majority"
    - "Allow many to come and live here" (1) to "Allow none" (4)
- **Gender**: `gndr`
    - "Male" (1) and "Female" (2)
- **Age**: `ageo`
- **Education**: `eisced`
    - This variable is scaled such that a lower value corresponds to a lower level of education, while a higher value represents a higher level of education.
    - Lowest value is "lower than secondary education" (1) and highest is "doctoral degree" (7)
- **Feeling about household's income**: `hincfel`
    - Values from 1 to 4 ranging from "Living comfortably on present income" (1) to "Very difficult on present income" (4)
- **Urbanization**: `domicil`
    - A higher value indicates a more rural background, and a lower value indicates a more urban background.
    - Values from 1 to 5 ranging from "A big city" (1), "Suburbs or outskirts of big city" (2), ..., "Farm or home in countryside" (5)
- **Political Intereset**: `polintr`
    - "Very Interested" (1) to "Not at all interested" (4)
- **Satisfaction with democracy**: `stfdem`
    - "Extremely dissatisfied" (0) to "Extremely satisfied" (10)
- **Unemployment**: `uemp3m`
    - "Yes" (1) and "No" (2)
- **Ideology**: `lrscale`
    - "Left" (0) to "Right" (10)
- **Year 2014**: `imwyys==2014`
    - Set to 1 if year of interview is 2014
- **Year 2016**: `imwyys==2016`
    - Set to 1 if year of interview is 
- **Interaction term between different-ethnic immigration opposition and year 2014**: `imwyys==2014` and `imdfetn`
    - Generated by `year_2014 * imdfetn`
- **Interaction term between different-ethnic immigration opposition and year 2016**: `imwyys==2016` and `imdfetn`
    - Generated by `year_2016 * imdfetn`

In [26]:
variables <- c(
            "euftf",
            "agea",
            "eisced",
            "hincfel",
            "domicil",
            "polintr",
            "stfdem",
            "uemp3m",
            "lrscale",
            "gndr",
            "imsmetn",
            "imdfetn",
            "inwyys"
            )

data_2012 <- raw_data_2012[, variables]
data_2014 <- raw_data_2014[, variables]
data_2016 <- raw_data_2016[, variables]

df <- rbind(data_2012, data_2014, data_2016)

print("Missing values:")
print(sapply(df, function(x) sum(is.na(x))))

print("Unique values in column")
for (column in variables) {
  unique_values <- unique(df[[column]])
  print(paste("Unique values in", column, ":"))
  print(unique_values)
}

print("Length before cleaning NA values:")
print(length(df$euftf))

# We clean the data by removing all rows where there's no response for at least one question
# see ESS data codebook
df <- subset(df, euftf <= 10 & gndr <= 2
             & eisced <= 7 & eisced >= 1
             & hincfel <= 4 & domicil <= 5
             & polintr <= 4 & stfdem <= 10 & uemp3m <= 2
             & lrscale <= 10 & imsmetn >= 1 & imsmetn <= 4
             & imdfetn >= 1 & imdfetn <= 4 & inwyys <= 3000) 

df$euftf <- 10 - df$euftf
df$hincfel <- 4 - df$hincfel
df$domicil <- 5 - df$domicil
df$polintr <- 4 - df$polintr
df$uemp3m <- 2 - df$uemp3m

# setting the lowest value to 0
df$imsmetn <- df$imsmetn - 1
df$imdfetn <- df$imdfetn - 1
df$gndr <- df$gndr - 1
df$eisced <- df$eisced - 1
df$hincfel <- df$hincfel - 1
df$domicil <- df$domicil - 1
df$polintr <- df$polintr - 1
df$uemp3m <- df$uemp3m - 1

# rescaling to go from 0 to 10
df$imsmetn <- df$imsmetn / 3 * 10
df$imdfetn <- df$imdfetn / 3 * 10
df$gndr <- df$gndr * 10
df$eisced <- df$eisced / 6 * 10
df$hincfel <- df$hincfel / 3 * 10
df$domicil <- df$domicil / 4 * 10
df$polintr <- df$polintr / 3 * 10
df$uemp3m <- df$uemp3m * 10

df$year_2014 <- ifelse(df$inwyys == 2014, 1, 0)
df$year_2016 <- ifelse(df$inwyys == 2016, 1, 0)
df$imm_interaction_2014 <- df$imdfetn * df$year_2014
df$imm_interaction_2016 <- df$imdfetn * df$year_2016

print("Length after removing NA values:")
print(length(df$euftf))

[1] "Missing values:"
  euftf    agea  eisced hincfel domicil polintr  stfdem  uemp3m lrscale    gndr 
      0       0       0       0       0       0       0       0       0       0 
imsmetn imdfetn  inwyys 
      0       0       0 
[1] "Unique values in column"
[1] "Unique values in euftf :"
 [1]  8 10  7  9  6  0  3 88  4  5  1  2 77 99
[1] "Unique values in agea :"
 [1]  63  29  66 999  59  74  57  64  17  42  80  78  61  60  38  39  18  68  51
[20]  16  23  56  43  32  71  53  50  55  69  79  52  33  20  40  62  35  54  70
[39]  22  31  46  49  41  21  25  24  30  36  48  77  45  26  19  34  75  76  65
[58]  44  47  72  15  73  27  67  28  37  58  83  82  84  81  94  88  98  85  87
[77]  86  92  90  89  91  96  93  95 103  99  97 100 101 102  14 104 114
[1] "Unique values in eisced :"
 [1]  2  4 88  1  3  6  5  7 55 99 77
[1] "Unique values in hincfel :"
[1] 4 2 9 3 1 8 7
[1] "Unique values in domicil :"
[1] 4 1 8 3 2 5 9 7
[1] "Unique values in polintr :"
[1] 1 3 2 4 8 9 7
[1] "U

In [27]:
model_4 <- lm(euftf ~ imsmetn + imdfetn + gndr + agea + eisced + hincfel + domicil + polintr + stfdem + uemp3m + lrscale + year_2014 + year_2016 + imm_interaction_2014 + imm_interaction_2016, data = df)

summary(model_4)


Call:
lm(formula = euftf ~ imsmetn + imdfetn + gndr + agea + eisced + 
    hincfel + domicil + polintr + stfdem + uemp3m + lrscale + 
    year_2014 + year_2016 + imm_interaction_2014 + imm_interaction_2016, 
    data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.6987 -1.7232 -0.0032  1.6992  6.9008 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)           4.6000996  0.0368227 124.926  < 2e-16 ***
imsmetn               0.0571524  0.0038308  14.919  < 2e-16 ***
imdfetn               0.1511586  0.0042199  35.821  < 2e-16 ***
gndr                  0.0031846  0.0015630   2.037 0.041607 *  
agea                  0.0009839  0.0001817   5.415 6.15e-08 ***
eisced               -0.0313652  0.0027074 -11.585  < 2e-16 ***
hincfel               0.0106152  0.0030274   3.506 0.000454 ***
domicil              -0.0268697  0.0025669 -10.468  < 2e-16 ***
polintr               0.0010040  0.0027857   0.360 0.718542    
stfdem               -0.114

## Plotting all the model results

In [34]:
stargazer(model_1, model_2, align=TRUE,
covariate.labels = c("Opposition to immigration",
                     "Gender",
                     "Age",
                     "Degree of Education",
                     "Perceived household income",
                     "Degree of urbanisation",
                     "Interest in politics",
                     "Satisfaction with democracy",
                     "Unemployment",
                     "Political ideology in left-right scale",
                     "Year 2014",
                     "Year 2016",
                     "Interaction term between 2014 and immigration attitudes",
                     "Interaction term between 2016 and immigration attitudes",
                     "Constant"),
                     dep.var.labels=c("Euroscepticism"))


% Table created by stargazer v.5.2.3 by Marek Hlavac, Social Policy Institute. E-mail: marek.hlavac at gmail.com
% Date and time: Thu, Jun 01, 2023 - 14:43:57
% Requires LaTeX packages: dcolumn 
\begin{table}[!htbp] \centering 
  \caption{} 
  \label{} 
\begin{tabular}{@{\extracolsep{5pt}}lD{.}{.}{-3} D{.}{.}{-3} } 
\\[-1.8ex]\hline 
\hline \\[-1.8ex] 
 & \multicolumn{2}{c}{\textit{Dependent variable:}} \\ 
\cline{2-3} 
\\[-1.8ex] & \multicolumn{2}{c}{Euroscepticism} \\ 
\\[-1.8ex] & \multicolumn{1}{c}{(1)} & \multicolumn{1}{c}{(2)}\\ 
\hline \\[-1.8ex] 
 Opposition to immigration & 0.306^{***} & 0.296^{***} \\ 
  & (0.004) & (0.004) \\ 
  & & \\ 
 Gender & 0.004^{**} & 0.003^{**} \\ 
  & (0.002) & (0.002) \\ 
  & & \\ 
 Age & 0.001^{***} & 0.001^{***} \\ 
  & (0.0002) & (0.0002) \\ 
  & & \\ 
 Degree of Education & -0.032^{***} & -0.031^{***} \\ 
  & (0.003) & (0.003) \\ 
  & & \\ 
 Perceived household income & 0.014^{***} & 0.005 \\ 
  & (0.003) & (0.003) \\ 
  & & \\ 
 Degree of ur

In [35]:
stargazer(model_3, model_4, align=TRUE,
covariate.labels = c("Opposition to same ethnic/race immigration",
                     "Opposition to different ethnic/race immigration",
                     "Age",
                     "Gender",
                     "Degree of education",
                     "Perceived household income",
                     "Degree of urbanisation",
                     "Interest in politics",
                     "Satisfaction with democracy",
                     "Unemployment",
                     "Political ideology in left-right scale",
                     "Year 2014",
                     "Year 2016",
                     "Interaction term between 2014 and immigration attitudes",
                     "Interaction term between 2016 and immigration attitudes",
                     "Constant"),
                     dep.var.labels=c("Euroscepticism"))


% Table created by stargazer v.5.2.3 by Marek Hlavac, Social Policy Institute. E-mail: marek.hlavac at gmail.com
% Date and time: Thu, Jun 01, 2023 - 14:44:07
% Requires LaTeX packages: dcolumn 
\begin{table}[!htbp] \centering 
  \caption{} 
  \label{} 
\begin{tabular}{@{\extracolsep{5pt}}lD{.}{.}{-3} D{.}{.}{-3} } 
\\[-1.8ex]\hline 
\hline \\[-1.8ex] 
 & \multicolumn{2}{c}{\textit{Dependent variable:}} \\ 
\cline{2-3} 
\\[-1.8ex] & \multicolumn{2}{c}{Euroscepticism} \\ 
\\[-1.8ex] & \multicolumn{1}{c}{(1)} & \multicolumn{1}{c}{(2)}\\ 
\hline \\[-1.8ex] 
 Opposition to same ethnic/race immigration & 0.057^{***} & 0.057^{***} \\ 
  & (0.004) & (0.004) \\ 
  & & \\ 
 Opposition to different ethnic/race immigration & 0.157^{***} & 0.151^{***} \\ 
  & (0.004) & (0.004) \\ 
  & & \\ 
 Age & 0.003^{**} & 0.003^{**} \\ 
  & (0.002) & (0.002) \\ 
  & & \\ 
 Gender & 0.001^{***} & 0.001^{***} \\ 
  & (0.0002) & (0.0002) \\ 
  & & \\ 
 Degree of education & -0.033^{***} & -0.031^{***} \\ 
  & (

###