<h1 style="text-align: center;">PSTAT 100 Final Project</h1>
<h4 style="text-align: center;">By: Mason Delan, Drew Cockerham, Priyanka Pulikeshi, and Omar de Santiago</h4>


<h2>Data Description</h2>

<p>
 
Why was the data collected? What is the purpose of this dataset?</li>
    <ul>
        <li>
            The World Happiness Report is an annual publication that provides insights into the subjective well-being and happiness levels of countries around the world. The purpose of this dataset is to provide policymakers, politicians, and the general public with valuable information that contributes to a society’s well-being.
What are the observational units?
        </li>
    </ul>
The observational units are a specific country during a particular year.
What is the population of interest?
    <ul>
        <li>
            The population of interest is the global population, and more specifically, the citizens of respective countries that are included in the data. This captures a wide range of perspectives and other nuances in order to represent happiness worldwide.
        </li>
    </ul>
How was the sample obtained (e.g. random sampling, administrative data, convenience sampling, etc.)?
    <ul>
        <li>
            The data regarding GDP per capita was obtained through the administrative data of particular countries. The rest of the sample was obtained through random sampling, specifically from data from the WHO and the Gallup World Poll. Every nation distributes surveys to its citizens in a way that guarantees a representative and varied sample. According to the Gallup World Poll website, the interviews cover at least 80% of a country’s population and are representative of its non institutionalized adult population above the age of 15. This method guarantees that the results may be extrapolated to a wider population while also reducing bias.
        </li>
    </ul>
Can inferences about the population be drawn from the sample?
    <ul>
        <li>
            Inferences can be made about the sample population. The possibility that the sample is representative of the greater population rises when random sampling is used. But it's crucial to remember that poll results are arbitrary and impacted by individual and cultural variances.
          </li>
    </ul>


</p>

<p>
Dataset Description:
    
1. Country name: This column contains the names of different countries included in the report. Each row corresponds to a specific country.

2. Year: The year column represents the specific year for which the happiness data is recorded. It indicates the time period during which the survey or data collection took place.

3. Life Ladder: The Life Ladder column measures the overall subjective well-being or life satisfaction of individuals in a country. It is typically represented on a scale from 0 to 10, where higher values indicate greater life satisfaction.

4. Log GDP per capita: This column represents the logarithm of the Gross Domestic Product (GDP) per capita of a country, which is calculated by how much each country produces, divided by the number of people in the country. It serves as a measure of economic prosperity or standard of living, with higher values indicating higher GDP per person.

5. Social Support: The Social Support column assesses the availability and strength of social networks and support systems within a country. It measures the degree to which individuals have assistance from family, friends, and other social connections. The question associated with this column was “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?”.

6. Health life expectancy: This column represents the average life expectancy or the number of years a person can expect to live in good health. It serves as an indicator of the physical health, mental health, and well-being of the population.

7. Freedom to make Life Choices: This column measures the extent to which individuals perceive having freedom and autonomy in making life choices, such as career, relationships, and personal decisions. Higher values indicate greater freedom. The question associated with this column is “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”.

8. Generosity: The Generosity column reflects the level of generosity or willingness to help others within a country’s population. It measures the frequency of charitable donations and voluntary work. The question associated with this column is “Have you donated money to a charity in the past month?”.

9. Perception of Corruption: This column captures the perceived level of corruption within a country. It assesses the extent to which corruption is perceived to be prevalent in public institutions and the overall trust in the government and public sector. The questions associated with this column are “Is corruption widespread throughout the government or not” and “Is corruption widespread within businesses or not?”.

10. Positive affect: The Positive affect column measures the frequency and intensity of positive emotions experienced by individuals in a country. The result of this column is given by the average of yes or no answers based on emotions of laughter, enjoyment, and interest.

11. Negative affect: The Negative affect column assesses the frequency and intensity of negative emotions experienced by individuals in a country. The result of this column is given by the average of yes or no answers based on emotions of worry, sadness, and anger.
</p>

<h2>Data Analysis (Part 1)</h2>

In [40]:
import numpy as np  # linear algebra
import pandas as pd  # data manipulation and processing
import altair as alt # data visualization
import statsmodels.formula.api as smf  # regression model
import statsmodels.tools as sm # regression model
import warnings
warnings.filterwarnings('ignore')

In [41]:
# import and clean up the World Happiness Report 2023 dataset
data = pd.read_csv('data/whr-2023.csv')
data.dropna()
data.head()

# compute pairwise correlations of variables
data_ = data.copy()
data_.drop(['Country name','year'], axis=1, inplace=True)
corr_data = data_.corr()
corr_data

Unnamed: 0,Life Ladder,Log GDP per capita,Social support,Healthy life expectancy at birth,Freedom to make life choices,Generosity,Perceptions of corruption,Positive affect,Negative affect
Life Ladder,1.0,0.784868,0.721662,0.713499,0.534493,0.18163,-0.4315,0.518169,-0.339969
Log GDP per capita,0.784868,1.0,0.68359,0.818126,0.367525,-0.000854,-0.352847,0.237933,-0.247541
Social support,0.721662,0.68359,1.0,0.597659,0.409326,0.068572,-0.222551,0.431038,-0.441837
Healthy life expectancy at birth,0.713499,0.818126,0.597659,1.0,0.373465,0.010775,-0.299016,0.223048,-0.1407
Freedom to make life choices,0.534493,0.367525,0.409326,0.373465,1.0,0.32503,-0.476517,0.57868,-0.275438
Generosity,0.18163,-0.000854,0.068572,0.010775,0.32503,1.0,-0.279435,0.307097,-0.080801
Perceptions of corruption,-0.4315,-0.352847,-0.222551,-0.299016,-0.476517,-0.279435,1.0,-0.280606,0.266265
Positive affect,0.518169,0.237933,0.431038,0.223048,0.57868,0.307097,-0.280606,1.0,-0.330301
Negative affect,-0.339969,-0.247541,-0.441837,-0.1407,-0.275438,-0.080801,0.266265,-0.330301,1.0


In [42]:
# create heatmap based on correlation matrix
corrlong1 = pd.melt(corr_data.reset_index(), id_vars='index', value_vars=corr_data.columns)
corrlong1.columns = ['row', 'col', 'Correlation']
fig1=alt.Chart(corrlong1).mark_rect().encode(
    x=alt.X('col:O', title=None),
    y=alt.Y('row:O',  title=None),
    color=alt.Color('Correlation:Q', scale=alt.Scale(scheme='blueorange', domain=(1,-1)))).properties(
    width=400,
    height=400,
    title='World Happiness Report Correlation Heatmap'
).configure_axis(
    labelFontSize=12, 
).configure_title(
    fontSize=16  
)
fig1

<p>We now have constructed a heatmap to determine possible variables of interest based on their respective correlations. In essence, we want to choose strong relationships to explore further. Some noticeably strong correlations include Log GDP per capita and Healthy life expectancy at birth. We want to further break down Log GDP per capita into two main groups, low and high, with the threshold being based on the median Log GDP of this dataset. This categorization will help us in further investigating the relationship.
</p>

In [43]:
# create variable that indicates wealth (High or Low) based on Log GDP per capita
median_gdp =  data['Log GDP per capita'].median()
data['Wealth'] = np.where(data['Log GDP per capita'] > median_gdp, 'High', 'Low')

# create variable that indicates time period (recession, covid, or other) based on Year
data['Indicator'] = data['year'].apply(lambda x: 'recession' if x in [2008, 2009, 2010] 
                                       else ('covid' if x in [2020, 2021, 2022] 
                                       else 'other'))

# drop Country name and year columns
data.drop(['Country name','year'], axis=1, inplace=True)

<p><p>Based on our breakdown of the Log GDP per capita, we are interested in exploring the impacts of specific global events such as the Great Recession (2008-2009) & the COVID-19 Pandemic (2020-2022), and how these events may have impacted the happiness of populations in both high GDP nations and low GDP nations. By organizing the data further with these two categorical variables, we may see whether a change in these factors would have some significant impact on the relationships between various signifiers of happiness in nations across the world.</p></p>

In [44]:
# split data into 6 groups: high gdp during covid, high gdp during recession, high gdp other,
# low gdp during covid, low gdp during recession, and low gdp other
highgdp_rec = data[(data['Wealth'] == 'High') & (data['Indicator'] == 'recession')]
highgdp_rec.drop(['Wealth', 'Indicator'], axis=1, inplace=True)
lowgdp_rec = data[(data['Wealth'] == 'Low') & (data['Indicator'] == 'recession')]
lowgdp_rec.drop(['Wealth', 'Indicator'], axis=1, inplace=True)
highgdp_covid = data[(data['Wealth'] == 'High') & (data['Indicator'] == 'covid')]
highgdp_covid.drop(['Wealth', 'Indicator'], axis=1, inplace=True)
lowgdp_covid = data[(data['Wealth'] == 'Low') & (data['Indicator'] == 'covid')]
lowgdp_covid.drop(['Wealth', 'Indicator'], axis=1, inplace=True)
highgdp_other = data[(data['Wealth'] == 'High') & (data['Indicator'] == 'other')]
highgdp_other.drop(['Wealth', 'Indicator'], axis=1, inplace=True)
lowgdp_other = data[(data['Wealth'] == 'Low') & (data['Indicator'] == 'other')]
lowgdp_other.drop(['Wealth', 'Indicator'], axis=1, inplace=True)

# compute pairwise correlations of variables for each group
corr_mx1=highgdp_rec.corr()
corr_mx2=lowgdp_rec.corr()
corr_mx3=highgdp_covid.corr()
corr_mx4=lowgdp_covid.corr()
corr_mx5=highgdp_other.corr()
corr_mx6=lowgdp_other.corr()

# create heatmaps based on the correlation matrixes for each group
corr_mx_long1 = pd.melt(corr_mx1.reset_index(), id_vars='index', value_vars=corr_mx1.columns)
corr_mx_long1.columns = ['row', 'col', 'Correlation']

fig1=alt.Chart(corr_mx_long1).mark_rect().encode(
    x=alt.X('col:O', title=None),
    y=alt.Y('row:O',  title=None),
    color=alt.Color('Correlation:Q', scale=alt.Scale(scheme='blueorange', domain=(1,-1)))
)
corr_mx_long2 = pd.melt(corr_mx2.reset_index(), id_vars='index', value_vars=corr_mx2.columns)
corr_mx_long2.columns = ['row', 'col', 'Correlation']

fig2=alt.Chart(corr_mx_long2).mark_rect().encode(
    x=alt.X('col:O', title=None),
    y=alt.Y('row:O',  title=None),
    color=alt.Color('Correlation:Q', scale=alt.Scale(scheme='blueorange', domain=(1,-1)))
)
corr_mx_long3 = pd.melt(corr_mx3.reset_index(), id_vars='index', value_vars=corr_mx3.columns)
corr_mx_long3.columns = ['row', 'col', 'Correlation']

fig3=alt.Chart(corr_mx_long3).mark_rect().encode(
    x=alt.X('col:O', title=None),
    y=alt.Y('row:O',  title=None),
    color=alt.Color('Correlation:Q', scale=alt.Scale(scheme='blueorange', domain=(1,-1)))
)
corr_mx_long4 = pd.melt(corr_mx4.reset_index(), id_vars='index', value_vars=corr_mx4.columns)
corr_mx_long4.columns = ['row', 'col', 'Correlation']

fig4=alt.Chart(corr_mx_long4).mark_rect().encode(
    x=alt.X('col:O', title=None),
    y=alt.Y('row:O',  title=None),
    color=alt.Color('Correlation:Q', scale=alt.Scale(scheme='blueorange', domain=(1,-1)))
)
corr_mx_long5 = pd.melt(corr_mx5.reset_index(), id_vars='index', value_vars=corr_mx5.columns)
corr_mx_long5.columns = ['row', 'col', 'Correlation']

fig5=alt.Chart(corr_mx_long5).mark_rect().encode(
    x=alt.X('col:O', title=None),
    y=alt.Y('row:O',  title=None),
    color=alt.Color('Correlation:Q', scale=alt.Scale(scheme='blueorange', domain=(1,-1)))
)
corr_mx_long6 = pd.melt(corr_mx6.reset_index(), id_vars='index', value_vars=corr_mx6.columns)
corr_mx_long6.columns = ['row', 'col', 'Correlation']

fig6=alt.Chart(corr_mx_long6).mark_rect().encode(
    x=alt.X('col:O', title=None),
    y=alt.Y('row:O',  title=None),
    color=alt.Color('Correlation:Q', scale=alt.Scale(scheme='blueorange', domain=(1,-1)))
)

fig1 = fig1.properties(title='High GDP, recession',
    width=200,
    height=200,
)
fig2 = fig2.properties(title='Low GDP, recession',
    width=200,
    height=200,
)
fig3 = fig3.properties(title='High GDP, covid',
    width=200,
    height=200,
)
fig4 = fig4.properties(title='Low GDP, covid',
    width=200,
    height=200,
)
fig5 = fig5.properties(title='High GDP, other',
    width=200,
    height=200,
)
fig6 = fig6.properties(title='Low GDP, other',
    width=200,
    height=200,
)

row1 = alt.hconcat(fig1, fig2)  
row2 = alt.hconcat(fig3, fig4)  
row3 = alt.hconcat(fig5, fig6)

final_layout = alt.vconcat(row1, row2, row3).configure_axis(
    labelFontSize=10, 
).configure_title(
    fontSize=12  
)

final_layout.display()

<p>By plotting these heatmaps, we can visualize the variation in pairwise correlations between the following categories: High GDP & recession, Low GDP & recession, High GDP & covid, Low GDP & covid, High GDP & other, and Low GDP & other. The term ‘other’ will be our baseline for our correlations of interest, ‘other’ referring to time periods not included in the Great Recession or COVID-19 Pandemic. An interesting observation of the heatmaps is the discrepancy between High GDP & recession and Low GDP & recession with the relationship between Life Ladder and Social Support; this is something we will be exploring further

We were interested in answering the question of whether there was discrepancy in relationships between variables of interest when comparing data from high GDP nations to data from low GDP nations, as well as data from around the time of the 2008 financial crisis, COVID-19 pandemic, or neither. In the interest of exploratory analysis, we decided to create multiple linear regression models with Life Ladder, a variable that most comprehensively encapsulates the measure for happiness in this dataset, being the response variable. In terms of visualization, make a scatterplot of correlations of interest and then overlay the respective regression lines made with predicted values of the response for each of the 6 combinations of factors. 
</p>

<h2>Question of Interest</h2>

<b>How are the relationships between variables of interest, which are happiness signifiers, affected by variations in GDP among nations and the timing of data collection, specifically during the 2008 financial crisis or the COVID-19 pandemic?</b>

<p>Hypothesis: We have reason to believe that the differences in the characteristics of these two globally disruptive events would lead to discrepancies in the relationships between signifiers of happiness. For example, the unique way in which the COVID 19 pandemic affected people’s social lives might result in a change in the relationship between social support and life ladder, as compared with during the recession or other years. Both events might see a reduction in mean life ladder scores, but the social support score might have a greater drop during COVID. </p>

<h2>Data Analysis (Part 2)</h2>

<h3>Regression Model 1</h3>

$$
\text{Life Ladder}_i
    = \beta_0 + \beta_1\text{Social Support} + \underbrace{\beta_2\text{High}_i }_\text{GDP Level} + \underbrace{\beta_3\text{Recession}_i + \beta_4\text{Covid}_i}_\text{Time Period} + \epsilon_i
$$

$$\mathbb{E}\left(\text{Life Ladder})\;|\; \text{Low GDP, Other}\right) = \underbrace{\beta_0}_\text{intercept} + \beta_1\text{Social Support} $$

$$\mathbb{E}\left(\text{Life Ladder})\;|\; \text{High GDP, Other}\right) = \underbrace{\beta_0 + \beta_2}_\text{intercepts} + \beta_1\text{Social Support} $$

$$\mathbb{E}\left(\text{Life Ladder})\;|\; \text{Low GDP, Recession}\right) =\underbrace{\beta_0 + \beta_3}_\text{intercepts} + \beta_1\text{Social Support} $$

$$\mathbb{E}\left(\text{Life Ladder})\;|\; \text{High GDP, Recession}\right) = \underbrace{\beta_0 + \beta_2 + \beta_3}_\text{intercepts} + \beta_1\text{Social Support} $$

$$\mathbb{E}\left(\text{Life Ladder})\;|\; \text{Low GDP, Covid}\right) = \underbrace{\beta_0 + \beta_4}_\text{intercepts} + \beta_1\text{Social Support} $$

$$\mathbb{E}\left(\text{Life Ladder})\;|\; \text{High GDP, Covid}\right) = \underbrace{\beta_0 + \beta_2+ \beta_4}_\text{intercepts} + \beta_1\text{Social Support} $$

In [45]:
data1 = data.copy()

# rename columns to prepare for applying regression
data1.rename(columns={"Life Ladder": "Life_Ladder"}, inplace = True)
data1.rename(columns={"Social support": "Social_Support"}, inplace = True)
data1.rename(columns={"Freedom to make life choices": "Freedom_to_make_life_choices"}, inplace = True)
data1.rename(columns={"Perceptions of corruption": "Perceptions_of_corruption"}, inplace = True)
data1.rename(columns={"Positive affect": "Positive_affect"}, inplace = True)
data1.rename(columns={"Negative affect": "Negative_affect"}, inplace = True)
data1.rename(columns={"Healthy life expectancy at birth": "Healthy_life_expectancy_at_birth"}, inplace = True)

# explanatory variable matrix with Social Support as quantitative variable and Wealth and Indicator as categorical variables
dataa1 = data1.copy()
density_encoded1 = pd.get_dummies(dataa1.Wealth, dtype=int)
density_encoded1_5 = pd.get_dummies(dataa1.Indicator, dtype=int)
x_vars = pd.concat([dataa1.Social_Support, density_encoded1, density_encoded1_5], axis = 1)
x = sm.tools.add_constant(x_vars)
x.head()

Unnamed: 0,const,Social_Support,High,Low,covid,other,recession
0,1.0,0.451,0,1,0,0,1
1,1.0,0.552,0,1,0,0,1
2,1.0,0.539,0,1,0,0,1
3,1.0,0.521,0,1,0,1,0
4,1.0,0.521,0,1,0,1,0


In [53]:
# apply regression using chosen model with Wealth = Low and Indicator = Other being the reference levels
fit1 = smf.ols(formula='Life_Ladder ~ Social_Support + C(Wealth, Treatment("Low")) + C(Indicator, Treatment("other"))',
               data=data1).fit() 
print(fit1.summary())

                            OLS Regression Results                            
Dep. Variable:            Life_Ladder   R-squared:                       0.604
Model:                            OLS   Adj. R-squared:                  0.604
Method:                 Least Squares   F-statistic:                     833.4
Date:                Wed, 13 Dec 2023   Prob (F-statistic):               0.00
Time:                        09:27:53   Log-Likelihood:                -2345.9
No. Observations:                2186   AIC:                             4702.
Df Residuals:                    2181   BIC:                             4730.
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                                                    coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------------------

<p>Here, we wondered if there was a way to improve the $R^2$ value in some way and decided to add an interaction term to the model that captured the effects of the Social Support, Wealth, and Indicator variables on eachother.</p>

$$
\text{Life Ladder}_i
    = \beta_0 + \beta_1\text{Social Support} + \underbrace{\beta_2\text{High}_i }_\text{GDP Level} + \underbrace{\beta_3\text{Recession}_i + \beta_4\text{Covid}_i}_\text{Time Period} + \underbrace{\beta_5\text{Social Support}_i\text{Covid}_i + \beta_6\text{Social Support}_i\text{Recession}_i}_\text{Interactions} $$
    $$\underbrace{ + \beta_7\text{Social Support}_i\text{High}_i\text{Covid}_i + \beta_8\text{Social Support}_i\text{High}_i\text{Recession}_i + \beta_9\text{Social Support}_i\text{High}_i\text{Other}_i}_\text{Interactions} + \epsilon_i
$$

$$\mathbb{E}\left(\text{Life Ladder})\;|\; \text{Low GDP, Other}\right) = \underbrace{\beta_0}_\text{intercepts} + \beta_1\text{Social Support} $$

$$\mathbb{E}\left(\text{Life Ladder})\;|\; \text{High GDP, Other}\right) = \underbrace{\beta_0 + \beta_2 + \beta_9}_\text{intercepts} + \beta_1\text{Social Support} $$

$$\mathbb{E}\left(\text{Life Ladder})\;|\; \text{Low GDP, Recession}\right) =\underbrace{\beta_0 + \beta_3 + \beta_6}_\text{intercepts} + \beta_1\text{Social Support} $$

$$\mathbb{E}\left(\text{Life Ladder})\;|\; \text{High GDP, Recession}\right) = \underbrace{\beta_0 + \beta_2 + \beta_3 + \beta_8}_\text{intercepts} + \beta_1\text{Social Support} $$

$$\mathbb{E}\left(\text{Life Ladder})\;|\; \text{Low GDP, Covid}\right) = \underbrace{\beta_0 + \beta_4 + \beta_5}_\text{intercepts} + \beta_1\text{Social Support} $$

$$\mathbb{E}\left(\text{Life Ladder})\;|\; \text{High GDP, Covid}\right) = \underbrace{\beta_0 + \beta_2+ \beta_4 + \beta_7}_\text{intercepts} + \beta_1\text{Social Support} $$

In [55]:
# apply regression to above model including interaction term
fit2 = smf.ols(formula='''Life_Ladder ~ Social_Support + C(Wealth, Treatment("Low")) + C(Indicator, Treatment("other")) 
+ Social_Support:C(Wealth, Treatment("Low")):C(Indicator, Treatment("other"))''', data=data1).fit() 
print(fit2.summary())

                            OLS Regression Results                            
Dep. Variable:            Life_Ladder   R-squared:                       0.638
Model:                            OLS   Adj. R-squared:                  0.637
Method:                 Least Squares   F-statistic:                     426.5
Date:                Wed, 13 Dec 2023   Prob (F-statistic):               0.00
Time:                        09:29:45   Log-Likelihood:                -2248.5
No. Observations:                2186   AIC:                             4517.
Df Residuals:                    2176   BIC:                             4574.
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
                                                                                                     coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------

<p>After adding the interaction term, the $R^2$ of our model improved slightly from 0.604 to 0.638. This means that around 63.8% of the variance in the model can be explained by the independent variables.</p>

<h5>Fitted Model:</h5>
$$
\hat{\text{Life Ladder}}
    = 1.9707 + 3.7052\text{(Social Support)} - 3.1774\text{(High)} + 0.5519\text{(Recession)}$$$$ - 0.7239\text{(Covid)} + 1.3297\text{(Social Support*Covid)}-0.7160\text{(Social Support*Recession)}$$$$ + 4.6819\text{(Social Support*High*Other)} + 4.2995\text{(Social Support*High*Covid)}$$$$ + 4.8485\text{(Social Support*High*Recession)}
$$


<h5>Low GDP and Other:</h5>
$$
\hat{\text{Life Ladder}} = 1.9707 + 3.7052\text{(Social Support)}
$$
The expected Life Ladder is the sum of the intercept $\beta_0$ and the impact of a one-unit increase in Social Support.

<h5>High GDP and Other:</h5>
$$
\hat{\text{Life Ladder}} = 3.4752 + 3.7052\text{(Social Support)}
$$
The expected Life Ladder is the sum of the intercept $\beta_0$, the impact of a one-unit increase in Social Support, the impact of being in the High GDP category, and the interaction between the conditions Social Support, High GDP, and other.

<h5>Low GDP and Covid:</h5>
$$
\hat{\text{Life Ladder}} = 2.5765 + 3.7052\text{(Social Support)}
$$
The expected Life Ladder is the sum of the intercept $\beta_0$, the impact of a one-unit increase in Social Support, the impact of Covid, and the interaction between the conditions Social Support and Covid.

<h5>High GDP and Covid:</h5>
$$
\hat{\text{Life Ladder}} = 2.3689 + 3.7052\text{(Social Support)}
$$
The expected Life Ladder is the sum of the intercept $\beta_0$, the impact of a one-unit increase in Social Support, the impact of being in the High GDP category and of Covid, and the interaction between the conditions Social Support, High GDP, and Covid.

<h5>Low GDP and Recession:</h5>
$$
\hat{\text{Life Ladder}} = 1.8066 + 3.7052\text{(Social Support)}
$$
The expected Life Ladder is the sum of the intercept $\beta_0$, the impact of a one-unit increase in Social Support, and the impact of the Recession, and the interaction between the conditions Social Support, High GDP, Recession.

<h5>High GDP and Recession:</h5>
$$
\hat{\text{Life Ladder}} = 4.1937 + 3.7052\text{(Social Support)}
$$
The expected Life Ladder is the sum of the intercept $\beta_0$, the impact of a one-unit increase in Social Support, the impact of being in the High GDP category and of the Recession, and the interaction between the conditions Social Support, High GDP, and Recession.

<h5>Interpretation:</h5>

The intercepts highlight the impact of the two categorical variables, Wealth and Indicator, and the interaction variables on the Life Ladder of countries having the same Social Support. Based on these equations, the highest intercept occurs in the "High GDP and Recession" condition at 4.1937 and the lowest intercept occurs in the "Low GDP and Recession" condition at 1.8066. For each time period except the Covid, the conditions with High GDP have a higher intercept than their Low GDP counterparts. In order to better visualize these discrepancies, we decided to plot the Social Support and Life Ladder as a base scatterplot and layered each of the six regression lines on top. We used the observed Social Support values from the dataset to calculate predicted response values for the regression lines.

In [56]:
reg_data1 = data1.copy()

# make a list of sample coefficients from model fit
coefficients = {
    'Intercept': 1.9707,
    'C(Wealth, Treatment("Low"))[T.High]': -3.1774,
    'C(Indicator, Treatment("other"))[T.covid]': -0.7239,
    'C(Indicator, Treatment("other"))[T.recession]': 0.5519,
    'Social_Support': 3.7052,
    'Social_Support:C(Indicator, Treatment("other"))[T.covid]':1.3297,
    'Social_Support:C(Indicator, Treatment("other"))[T.recession]':                                      -0.7160,
    'Social_Support:C(Wealth, Treatment("Low"))[T.High]:C(Indicator, Treatment("other"))[covid]':         4.2995,
    'Social_Support:C(Wealth, Treatment("Low"))[T.High]:C(Indicator, Treatment("other"))[other]':         4.6819,
    'Social_Support:C(Wealth, Treatment("Low"))[T.High]:C(Indicator, Treatment("other"))[recession]':     4.8485 
}

# calculate predicted values of Life Ladder based on the estimated coefficients plus specific interactions
reg_data1['predicted_ll_lo'] = (coefficients['Intercept'] + (coefficients['Social_Support'] * reg_data1['Social_Support']))
reg_data1['predicted_ll_lc_i'] = (coefficients['Intercept'] + coefficients['C(Indicator, Treatment("other"))[T.covid]'] + 
(coefficients['Social_Support'] * reg_data1['Social_Support']) + coefficients['Social_Support:C(Indicator, Treatment("other"))[T.covid]'])
reg_data1['predicted_ll_lr_i'] = (coefficients['Intercept'] + coefficients['C(Indicator, Treatment("other"))[T.recession]'] + 
(coefficients['Social_Support'] * reg_data1['Social_Support']) + coefficients['Social_Support:C(Indicator, Treatment("other"))[T.recession]'])
reg_data1['predicted_ll_ho_i'] = (coefficients['Intercept'] + coefficients['C(Wealth, Treatment("Low"))[T.High]'] + 
(coefficients['Social_Support'] * reg_data1['Social_Support']) + coefficients['Social_Support:C(Wealth, Treatment("Low"))[T.High]:C(Indicator, Treatment("other"))[other]'])
reg_data1['predicted_ll_hc_i'] = (coefficients['Intercept'] + coefficients['C(Wealth, Treatment("Low"))[T.High]'] + coefficients['C(Indicator, Treatment("other"))[T.covid]'] + 
(coefficients['Social_Support'] * reg_data1['Social_Support']) + coefficients['Social_Support:C(Wealth, Treatment("Low"))[T.High]:C(Indicator, Treatment("other"))[covid]'])
reg_data1['predicted_ll_hr_i'] = (coefficients['Intercept'] + coefficients['C(Wealth, Treatment("Low"))[T.High]'] + coefficients['C(Indicator, Treatment("other"))[T.recession]'] + 
(coefficients['Social_Support'] * reg_data1['Social_Support']) + coefficients['Social_Support:C(Wealth, Treatment("Low"))[T.High]:C(Indicator, Treatment("other"))[recession]'])


# create scatter plot of Social Support against Life Ladder
chart1 = alt.Chart(reg_data1).mark_circle(color='pink').encode(
    x=alt.X('Social_Support:Q', title='Social Support'),
    y='Life_Ladder:Q',
)

# create regression lines based on predicted values from each group
line_ll_lo = alt.Chart(reg_data1).mark_line(color='coral').encode(
    x='Social_Support:Q',
    y='predicted_ll_lo:Q',
)
line_ll_lc_i = alt.Chart(reg_data1).mark_line(color='green').encode(
    x='Social_Support:Q',
    y='predicted_ll_lc_i:Q'
)
line_ll_lr_i = alt.Chart(reg_data1).mark_line(color='red').encode(
    x='Social_Support:Q',
    y='predicted_ll_lr_i:Q'
)
line_ll_ho_i = alt.Chart(reg_data1).mark_line(color='purple').encode(
    x='Social_Support:Q',
    y='predicted_ll_ho_i:Q'
)
line_ll_hc_i = alt.Chart(reg_data1).mark_line(color='black').encode(
    x='Social_Support:Q',
    y='predicted_ll_hc_i:Q'
)
line_ll_hr_i = alt.Chart(reg_data1).mark_line(color='blue').encode(
    x='Social_Support:Q',
    y='predicted_ll_hr_i:Q'
)
title_ll_lo = alt.Chart({'values': [{'x': 0.015, 'y': 2.7, 'title': 'Low, Other'}]}).mark_text(
    align='left',
    baseline='middle',
    dx=5,
    color='coral',
    fontSize=11
).encode(
    x='x:Q',
    y='y:Q',
    text='title:N'
)
title_ll_lc_i = alt.Chart({'values': [{'x': 0.015, 'y': 3.45, 'title': 'Low, Covid, Interaction'}]}).mark_text(
    align='left',
    baseline='middle',
    dx=5,
    color='green',
    fontSize=11
).encode(
    x='x:Q',
    y='y:Q',
    text='title:N'
)
title_ll_lr_i = alt.Chart({'values': [{'x': 0.015, 'y': 2.45, 'title': 'Low, Recession,  Interaction'}]}).mark_text(
    align='left',
    baseline='middle',
    dx=5,
    color='red',
    fontSize=11
).encode(
    x='x:Q',
    y='y:Q',
    text='title:N'
)
title_ll_ho_i = alt.Chart({'values': [{'x': 0.015, 'y': 4.25, 'title': 'High, Other,  Interaction'}]}).mark_text(
    align='left',
    baseline='middle',
    dx=5,
    color='purple',
    fontSize=11
).encode(
    x='x:Q',
    y='y:Q',
    text='title:N'
)
title_ll_hc_i = alt.Chart({'values': [{'x': 0.015, 'y': 3.15, 'title': 'High, Covid, Interaction'}]}).mark_text(
    align='left',
    baseline='middle',
    dx=5,
    color='black',
    fontSize=11
).encode(
    x='x:Q',
    y='y:Q',
    text='title:N'
)
title_ll_hr_i = alt.Chart({'values': [{'x': 0.015, 'y': 4.85, 'title': 'High, Recession,  Interaction'}]}).mark_text(
    align='left',
    baseline='middle',
    dx=5,
    color='blue',
    fontSize=11
).encode(
    x='x:Q',
    y='y:Q',
    text='title:N'
)

# combine scatterplot and regression lines
final_chart1 = (chart1 + line_ll_lo + line_ll_lc_i + line_ll_lr_i + line_ll_ho_i + line_ll_hc_i + line_ll_hr_i + title_ll_lo + title_ll_lc_i + title_ll_lr_i 
    + title_ll_ho_i + title_ll_hc_i + title_ll_hr_i).properties(
    width=600,
    height=600,
    title='Social Support vs. Life Ladder, considering GDP and Time Period'
).configure_axis(
    labelFontSize=14,
    titleFontSize=16,
).configure_title(
    fontSize=16
).encode(
    y=alt.Y(title='Life Ladder', scale=alt.Scale(domain=[1.5, 8.5]))
)
final_chart1

<p>By the graph, we can see that all of the lines representing each variation of the model have the same slope, but different intercepts. According to this regression model, countries having the same average social support have a higher average life ladder score if they are a country with high GDP compared to if they are a country with low GDP, except in the case of the Covid time period. Low GDP countries having the same average social support have a higher average Life Ladder score during Covid as compared to during the Recession or other years. There is a significant absolute difference between the High GDP/Recession & Low GDP/Recession intercepts of about 2.3871, presenting us with another very interesting point of contention, why is this discrepancy far greater than our baseline ‘other'? Along the same lines, why is the discrepency between High GDP/Covid & Low GDP/Covid intercepts considerably smaller than than our baseline ‘other'?
</p>

<h3>Regression Model 2</h3>

<p>We decided to further investigate the impact of the discrepencies in the High and Low GDP countries in the Recession and Covid time periods by creating a new model with Social Support replaced by Freedom to Make Life Choices. In this way, we would be measuring the same response, Life Ladder, and seeing if the impact of the categorical variables and their interaction with a separate quantitative variable had a similar affect as it did with our first model. </p>

$$
\text{Life Ladder}_i
    = \beta_0 + \beta_1\text{Freedom to Make Life Choices} + \underbrace{\beta_2\text{High}_i }_\text{GDP Level} + \underbrace{\beta_3\text{Recession}_i + \beta_4\text{Covid}_i}_\text{Time Period} + \epsilon_i
$$

$$\mathbb{E}\left(\text{Life Ladder})\;|\; \text{Low GDP, Other}\right) = \underbrace{\beta_0}_\text{intercept} + \beta_1\text{Freedom to Make Life Choices} $$

$$\mathbb{E}\left(\text{Life Ladder})\;|\; \text{High GDP, Other}\right) = \underbrace{\beta_0 + \beta_2}_\text{intercepts} + \beta_1\text{Freedom to Make Life Choices} $$

$$\mathbb{E}\left(\text{Life Ladder})\;|\; \text{Low GDP, Recession}\right) =\underbrace{\beta_0 + \beta_3}_\text{intercepts} + \beta_1\text{Freedom to Make Life Choices} $$

$$\mathbb{E}\left(\text{Life Ladder})\;|\; \text{High GDP, Recession}\right) = \underbrace{\beta_0 + \beta_2 + \beta_3}_\text{intercepts} + \beta_1\text{Freedom to Make Life Choices} $$

$$\mathbb{E}\left(\text{Life Ladder})\;|\; \text{Low GDP, Covid}\right) = \underbrace{\beta_0 + \beta_4}_\text{intercepts} + \beta_1\text{Freedom to Make Life Choices} $$

$$\mathbb{E}\left(\text{Life Ladder})\;|\; \text{High GDP, Covid}\right) = \underbrace{\beta_0 + \beta_2+ \beta_4}_\text{intercepts} + \beta_1\text{Freedom to Make Life Choices} $$

In [49]:
data2 = data1.copy()

# explanatory variable matrix with Freedom to make Life Choices as quantitative variable and Wealth and Indicator as categorical variables
dataa2 = data2.copy()
density_encoded2 = pd.get_dummies(dataa1.Wealth, dtype=int)
density_encoded2_5 = pd.get_dummies(dataa1.Indicator, dtype=int)
x_vars2 = pd.concat([dataa2.Freedom_to_make_life_choices, density_encoded2, density_encoded2_5], axis = 1)
x2 = sm.tools.add_constant(x_vars2)
x2.head()

Unnamed: 0,const,Freedom_to_make_life_choices,High,Low,covid,other,recession
0,1.0,0.718,0,1,0,0,1
1,1.0,0.679,0,1,0,0,1
2,1.0,0.6,0,1,0,0,1
3,1.0,0.496,0,1,0,1,0
4,1.0,0.531,0,1,0,1,0


In [57]:
# apply regression to above model including interaction variable
fit2 = smf.ols(formula='''Life_Ladder ~ Freedom_to_make_life_choices + C(Wealth, Treatment("Low")) + C(Indicator, Treatment("other")) + 
Freedom_to_make_life_choices:C(Wealth, Treatment("Low")):C(Indicator, Treatment("other"))''', data=data2).fit() 
print(fit2.summary())

                            OLS Regression Results                            
Dep. Variable:            Life_Ladder   R-squared:                       0.582
Model:                            OLS   Adj. R-squared:                  0.580
Method:                 Least Squares   F-statistic:                     333.3
Date:                Wed, 13 Dec 2023   Prob (F-statistic):               0.00
Time:                        09:34:59   Log-Likelihood:                -2393.2
No. Observations:                2166   AIC:                             4806.
Df Residuals:                    2156   BIC:                             4863.
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
                                                                                                                   coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------

<h5>Fitted Model:</h5>
$$
\hat{\text{Life Ladder}}
    = 3.3119 + 1.9841\text{(Freedom to Make Life Choices)} - 0.5854\text{(High)} + 0.5247\text{(Recession)} - 1.3826\text{(Covid)}$$ $$+ 1.8772\text{(Freedom to Make Life Choices*Covid)}-0.5785\text{(Freedom to Make Life Choices*Recession)}$$$$ + 2.3936\text{(Freedom to Make Life Choices*High*Other)} + 2.1542\text{(Freedom to Make Life Choices*High*Covid)}$$$$ + 2.6484\text{(Freedom to Make Life Choices*High*Recession)}$$



<h5>Low GDP and Other:</h5>
$$
\hat{\text{Life Ladder}} = 3.3119 + 1.9841\text{(Freedom to Make Life Choices)}
$$
The expected Life Ladder is the sum of the intercept $\beta_0$ and the impact of a one-unit increase in Freedom to Make Life Choices.

<h5>High GDP and Other:</h5>
$$
\hat{\text{Life Ladder}} = 5.1201 + 1.9841\text{(Freedom to Make Life Choices)}
$$
The expected Life Ladder is the sum of the intercept $\beta_0$, the impact of a one-unit increase in Freedom to Make Life Choices, the impact of being in the High GDP category, and the interaction between the conditions Freedom to Make Life Choices, High GDP, and other.

<h5>Low GDP and Covid:</h5>
$$
\hat{\text{Life Ladder}} = 3.8065 + 1.9841\text{(Freedom to Make Life Choices)}
$$
The expected Life Ladder is the sum of the intercept $\beta_0$, the impact of a one-unit increase in Freedom to Make Life Choices, the impact of Covid, and the interaction between the conditions Freedom to Make Life Choices and Covid.

<h5>High GDP and Covid:</h5>
$$
\hat{\text{Life Ladder}} = 3.4981 + 1.9841\text{(Freedom to Make Life Choices)}
$$
The expected Life Ladder is the sum of the intercept $\beta_0$, the impact of a one-unit increase in Freedom to Make Life Choices, the impact of being in the High GDP category and of Covid, and the interaction between the conditions Freedom to Make Life Choices, High GDP, and Covid.

<h5>Low GDP and Recession:</h5>
$$
\hat{\text{Life Ladder}} = 3.2581 + 1.9841\text{(Freedom to Make Life Choices)}
$$
The expected Life Ladder is the sum of the intercept $\beta_0$, the impact of a one-unit increase in Freedom to Make Life Choices, and the impact of the Recession, and the interaction between the conditions Freedom to Make Life Choices, High GDP, and Recession.

<h5>High GDP and Recession:</h5>
$$
\hat{\text{Life Ladder}} = 5.8996 + 1.9841\text{(Freedom to Make Life Choices)}
$$
The expected Life Ladder is the sum of the intercept $\beta_0$, the impact of a one-unit increase in Freedom to Make Life Choices, the impact of being in the High GDP category and of the Recession, and the interaction between the conditions Freedom to Make Life Choices, High GDP, and Recession.

<h5>Interpretation:</h5>

The intercepts highlight the impact of the two categorical variables, Wealth and Indicator, and the interaction variables on the Life Ladder of countries having the same Freedom to Make Life Choices. Based on these equations, the highest intercept occurs in the "High GDP and Recession" condition at 5.8996 and the lowest intercept occurs in the "Low GDP and Recession" condition at 3.2581. Notably, the "High GDP and Recession" and "Low GDP and Recession" had the highest and lowest intercepts, respectively, in our first model involving Social Support as well. Again, as was the case with the first model, for each time period except the Covid, the conditions with High GDP have a higher intercept than their Low GDP counterparts. In order to better visualize these discrepancies, we decided to plot the Freedom to Make Life Choices and Life Ladder as a base scatterplot and layered each of the six regression lines on top. We used the observed Freedom to Make Life Choices values from the dataset to calculate predicted response values for the regression lines.

In [58]:
reg_data2 = data2.copy()

# make a list of sample coefficients from model fit
coefficients2 = {
    'Intercept': 3.3119,
    'C(Wealth, Treatment("Low"))[T.High]': -0.5854,
    'C(Indicator, Treatment("other"))[T.covid]': -1.3826,
    'C(Indicator, Treatment("other"))[T.recession]': 0.5247,
    'Freedom_to_make_life_choices': 1.9841 ,
    'Freedom_to_make_life_choices:C(Indicator, Treatment("other"))[T.covid]':1.8772,
    'Freedom_to_make_life_choices:C(Indicator, Treatment("other"))[T.recession]': -0.5785,
    'Freedom_to_make_life_choices:C(Wealth, Treatment("Low"))[T.High]:C(Indicator, Treatment("other"))[covid]': 2.1542,
    'Freedom_to_make_life_choices:C(Wealth, Treatment("Low"))[T.High]:C(Indicator, Treatment("other"))[other]': 2.3936,
    'Freedom_to_make_life_choices:C(Wealth, Treatment("Low"))[T.High]:C(Indicator, Treatment("other"))[recession]': 2.6484 
}

# calculate predicted values of Life Ladder based on the estimated coefficients plus specific interactions
reg_data2['predicted_ll_lo2'] = (coefficients2['Intercept'] + (coefficients2['Freedom_to_make_life_choices'] * reg_data2['Freedom_to_make_life_choices']))
reg_data2['predicted_ll_lc_i2'] = (coefficients2['Intercept'] + coefficients2['C(Indicator, Treatment("other"))[T.covid]'] + 
(coefficients2['Freedom_to_make_life_choices'] * reg_data2['Freedom_to_make_life_choices']) + coefficients2['Freedom_to_make_life_choices:C(Indicator, Treatment("other"))[T.covid]'])
reg_data2['predicted_ll_lr_i2'] = (coefficients2['Intercept'] + coefficients2['C(Indicator, Treatment("other"))[T.recession]'] + 
(coefficients2['Freedom_to_make_life_choices'] * reg_data2['Freedom_to_make_life_choices']) + coefficients2['Freedom_to_make_life_choices:C(Indicator, Treatment("other"))[T.recession]'])
reg_data2['predicted_ll_ho_i2'] = (coefficients2['Intercept'] + coefficients2['C(Wealth, Treatment("Low"))[T.High]'] + 
(coefficients2['Freedom_to_make_life_choices'] * reg_data2['Freedom_to_make_life_choices']) + 
coefficients2['Freedom_to_make_life_choices:C(Wealth, Treatment("Low"))[T.High]:C(Indicator, Treatment("other"))[other]'])
reg_data2['predicted_ll_hc_i2'] = (coefficients2['Intercept'] + coefficients2['C(Wealth, Treatment("Low"))[T.High]'] + coefficients2['C(Indicator, Treatment("other"))[T.covid]'] + 
(coefficients2['Freedom_to_make_life_choices'] * reg_data2['Freedom_to_make_life_choices']) + 
coefficients2['Freedom_to_make_life_choices:C(Wealth, Treatment("Low"))[T.High]:C(Indicator, Treatment("other"))[covid]'])
reg_data2['predicted_ll_hr_i2'] = (coefficients2['Intercept'] + coefficients2['C(Wealth, Treatment("Low"))[T.High]'] + coefficients2['C(Indicator, Treatment("other"))[T.recession]'] + 
(coefficients2['Freedom_to_make_life_choices'] * reg_data2['Freedom_to_make_life_choices']) + 
coefficients2['Freedom_to_make_life_choices:C(Wealth, Treatment("Low"))[T.High]:C(Indicator, Treatment("other"))[recession]'])


# create scatter plot of Social Support against Life Ladder
chart2 = alt.Chart(reg_data2).mark_circle(color='pink').encode(
    x=alt.X('Freedom_to_make_life_choices:Q', title='Freedom to Make Life Choices'),
    y='Life_Ladder:Q',
)

# create regression lines based on predicted values from each group
line_ll_lo2 = alt.Chart(reg_data2).mark_line(color='coral').encode(
    x='Freedom_to_make_life_choices:Q',
    y='predicted_ll_lo2:Q',
)
line_ll_lc_i2 = alt.Chart(reg_data2).mark_line(color='green').encode(
    x='Freedom_to_make_life_choices:Q',
    y='predicted_ll_lc_i2:Q'
)
line_ll_lr_i2 = alt.Chart(reg_data2).mark_line(color='red').encode(
    x='Freedom_to_make_life_choices:Q',
    y='predicted_ll_lr_i2:Q'
)
line_ll_ho_i2 = alt.Chart(reg_data2).mark_line(color='purple').encode(
    x='Freedom_to_make_life_choices:Q',
    y='predicted_ll_ho_i2:Q'
)
line_ll_hc_i2 = alt.Chart(reg_data2).mark_line(color='black').encode(
    x='Freedom_to_make_life_choices:Q',
    y='predicted_ll_hc_i2:Q'
)
line_ll_hr_i2 = alt.Chart(reg_data2).mark_line(color='blue').encode(
    x='Freedom_to_make_life_choices:Q',
    y='predicted_ll_hr_i2:Q'
)
title_ll_lo2 = alt.Chart({'values': [{'x': 0.015, 'y': 3.75, 'title': 'Low, Other'}]}).mark_text(
    align='left',
    baseline='middle',
    dx=5,
    color='coral',
    fontSize=11
).encode(
    x='x:Q',
    y='y:Q',
    text='title:N'
)
title_ll_lc_i2 = alt.Chart({'values': [{'x': 0.015, 'y': 4.35, 'title': 'Low, Covid, Interaction'}]}).mark_text(
    align='left',
    baseline='middle',
    dx=5,
    color='green',
    fontSize=11
).encode(
    x='x:Q',
    y='y:Q',
    text='title:N'
)
title_ll_lr_i2 = alt.Chart({'values': [{'x': 0.015, 'y': 3.5, 'title': 'Low, Recession,  Interaction'}]}).mark_text(
    align='left',
    baseline='middle',
    dx=5,
    color='red',
    fontSize=11
).encode(
    x='x:Q',
    y='y:Q',
    text='title:N'
)
title_ll_ho_i2 = alt.Chart({'values': [{'x': 0.015, 'y': 5.45, 'title': 'High, Other,  Interaction'}]}).mark_text(
    align='left',
    baseline='middle',
    dx=5,
    color='purple',
    fontSize=11
).encode(
    x='x:Q',
    y='y:Q',
    text='title:N'
)
title_ll_hc_i2 = alt.Chart({'values': [{'x': 0.015, 'y': 4, 'title': 'High, Covid, Interaction'}]}).mark_text(
    align='left',
    baseline='middle',
    dx=5,
    color='black',
    fontSize=11
).encode(
    x='x:Q',
    y='y:Q',
    text='title:N'
)
title_ll_hr_i2 = alt.Chart({'values': [{'x': 0.015, 'y': 6.3, 'title': 'High, Recession,  Interaction'}]}).mark_text(
    align='left',
    baseline='middle',
    dx=5,
    color='blue',
    fontSize=11
).encode(
    x='x:Q',
    y='y:Q',
    text='title:N'
)

# combine scatterplot and regression lines
final_chart2 = (chart2 + line_ll_lo2 + line_ll_lc_i2 + line_ll_lr_i2 + line_ll_ho_i2 + line_ll_hc_i2 + line_ll_hr_i2 + title_ll_lo2 + 
    title_ll_lc_i2 + title_ll_lr_i2 + title_ll_ho_i2 + title_ll_hc_i2 + title_ll_hr_i2).properties(
    width=600,
    height=600,
    title='Freedom to Make Life Choices vs. Life Ladder, considering GDP and Time Period'
).configure_axis(
    labelFontSize=14,
    titleFontSize=16,
).configure_title(
    fontSize=16
).encode(
    y=alt.Y(title='Life Ladder', scale=alt.Scale(domain=[1.0, 8.5]))
)
final_chart2

<p>According to this regression model, countries having the same average Freedom to Make Life Choices have a higher average life ladder score if they are a country with high GDP compared to if they are a country with low GDP, except in the case of the Covid time period. Low GDP countries having the same average Freedom to Make Life Choices have a higher average Life Ladder score during Covid as compared to during the Recession or other years. The largest intercept discrepancy based on GDP occurs during the Recession, with the absolute difference between the High GDP/Recession & Low GDP/Recession intercepts being 2.6415. The smallest intercept discrepancy based on GDP occurs during Covid, with the absolute difference between the High GDP/Covid & Low GDP/Covid intercepts being a mere 0.3084. 
    
Interestingly, the patterns observed above were reflected, with different specific values of course, in the first model that used Social Support as a quantiative varibale instead of Freedom to Make Life Choices. This indicates that there is some notable consistency as to the impact of the various combinations of the Wealth and Indicator variables on the relationship between various vairbales and Life Ladder. 
</p>

<h2>Summary</h2>

<p>PARAGRAPH! about the similarity of the observations and the order of the regression lines between the first and second model. something about how the recession led to an exacerbaiton of the difference in gdp while during covid, the impact of gdp was much lower, almost negligible. makes sense social support was widespread during covid, regardless of high or low gdp. also makes sense in terms of freedom to make life choices having a similar impact on life ladder during covid because everyone was prettymuch in the same situation of lockdown and was restricted in that way whether you lived in a welathy country or not.

Intercepts of various condition combinations from highest to lowest in both models:
<ol>
    <li>
        High GDP and Recession
    </li>
      <li>
        High GDP and Other
    </li>
      <li>
        Low GDP and Covid
    </li>
      <li>
        High GDP and Covid
    </li>
      <li>
        Low GDP and Other
    </li>
      <li>
        Low GDP and Recession
    </li>
</ol>

 </p>

In [59]:
def fit_and_display_r_squared(data, independent_variables):
    for var1 in independent_variables:
        for var2 in independent_variables:
            if var1 != var2:
                formula = f'{var1} ~ {var2} + C(Wealth, Treatment("Low")) + C(Indicator, Treatment("other")) + {var2}:C(Wealth, Treatment("Low")):C(Indicator, Treatment("other"))'
                
                fit = smf.ols(formula=formula, data=data).fit()
                r_squared = fit.rsquared
                
                print(f'R-squared for {var1}, {var2}: {r_squared:.4f}')

# Example usage:
# Replace 'your_dataset', 'Life_ladder', and the list of independent variables with your actual data
# and variables.
fit_and_display_r_squared(data2,['Life_Ladder','Freedom_to_make_life_choices', 'Social_Support', 'Healthy_life_expectancy_at_birth','Generosity','Perceptions_of_corruption',
                                 'Positive_affect'])


R-squared for Life_Ladder, Freedom_to_make_life_choices: 0.5818
R-squared for Life_Ladder, Social_Support: 0.6382
R-squared for Life_Ladder, Healthy_life_expectancy_at_birth: 0.6004
R-squared for Life_Ladder, Generosity: 0.5056
R-squared for Life_Ladder, Perceptions_of_corruption: 0.5251
R-squared for Life_Ladder, Positive_affect: 0.6099
R-squared for Freedom_to_make_life_choices, Life_Ladder: 0.3328
R-squared for Freedom_to_make_life_choices, Social_Support: 0.2528
R-squared for Freedom_to_make_life_choices, Healthy_life_expectancy_at_birth: 0.1763
R-squared for Freedom_to_make_life_choices, Generosity: 0.2453
R-squared for Freedom_to_make_life_choices, Perceptions_of_corruption: 0.2889
R-squared for Freedom_to_make_life_choices, Positive_affect: 0.4164
R-squared for Social_Support, Life_Ladder: 0.5616
R-squared for Social_Support, Freedom_to_make_life_choices: 0.4142
R-squared for Social_Support, Healthy_life_expectancy_at_birth: 0.4381
R-squared for Social_Support, Generosity: 0.350