# Research Question

To what extent does self-perceived loneliness vary across different education levels in Canada?

# Variables

- Education Level: A categorical variable representing different education levels (e.g., high school diploma, college diploma, bachelor’s degree, graduate degree).

Education often shapes social networks and access to support systems, potentially influencing loneliness. Exploring these relationships may reveal critical insights for social health initiatives.

- Self-Perceived Loneliness: A self-reported categorical variable that reflects participants’ sense of isolation or lack of companionship.

Loneliness is an important indicator of mental well-being and social health, making it a valuable outcome variable for this analysis.

**Control Variables:**

- Gender: A categorical variable (male, female, non-binary) that may affect social experiences and thus perceived loneliness.

- Ethnicity: A categorical variable covering various ethnic backgrounds (e.g., Asian) that could influence social networks and cultural experiences of loneliness.

**Visualisations**

1. Bar Plots

Bar plots provide a straightforward way to display the distribution of respondents across the categories of education level, gender, and ethnicity. This is helpful for understanding the sample composition and identifying differences in educational attainment, ethnic background, or gender among participants. It also helps reveal patterns, such as whether certain education levels, ethnic backgrounds, or genders have higher or lower frequencies.

2. Stacked Bar Plot or Grouped Bar Plot

Stacked bar charts can visualize the percentage distribution of loneliness categories within each education level. For example, the x-axis could show education levels and the y-axis the percentage of respondents within each education level. Each bar is divided into color-coded segments representing the loneliness categories, and the chart can be repeated in separate panels for each gender or ethnic group. This visualization highlights the proportion of each loneliness category within educational groups and allows for easy comparison between groups.
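A minimal sketch of such a stacked percentage bar chart, assuming `pandas` and `matplotlib` are available. The data are synthetic, and the category labels are illustrative assumptions rather than the survey's actual response options; only the column names follow the regression formula used later in this proposal.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Synthetic stand-in data; the labels below are assumptions, not survey values.
rng = np.random.default_rng(0)
education_levels = ["High school", "College", "Bachelor's", "Graduate"]
loneliness_levels = ["Never", "Sometimes", "Often"]
data = pd.DataFrame({
    "Education_Level": rng.choice(education_levels, size=300),
    "Loneliness": rng.choice(loneliness_levels, size=300),
})

# Row-normalised cross-tabulation: each education level sums to 100%.
pct = pd.crosstab(data["Education_Level"], data["Loneliness"], normalize="index") * 100
pct = pct.reindex(index=education_levels, columns=loneliness_levels)

# One bar per education level, one coloured segment per loneliness category.
ax = pct.plot(kind="bar", stacked=True, figsize=(7, 4))
ax.set_xlabel("Education level")
ax.set_ylabel("Respondents within education level (%)")
ax.legend(title="Loneliness")
fig = ax.get_figure()
fig.tight_layout()
```

Calling `fig.savefig(...)` or displaying the figure in a notebook would show the chart. Filtering `data` by gender or ethnicity before building `pct` gives one panel per group.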

3. Heatmaps

Heatmaps can illustrate the interaction between education levels, gender, and loneliness categories, providing a visual summary of how these factors relate. For example, one axis represents education levels, while the other represents loneliness categories. The color intensity indicates the number of respondents in each category combination. Heatmaps allow for a quick visual assessment of where the highest concentrations of loneliness occur among different education levels and demographics, making patterns easy to identify.
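The heatmap described above could be sketched as follows, again on synthetic data with assumed category labels. It uses plain `matplotlib` (`imshow`) on a count table, which is one of several possible ways to draw it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data; the labels below are assumptions, not survey values.
rng = np.random.default_rng(1)
education_levels = ["High school", "College", "Bachelor's", "Graduate"]
loneliness_levels = ["Never", "Sometimes", "Often"]
data = pd.DataFrame({
    "Education_Level": rng.choice(education_levels, size=300),
    "Loneliness": rng.choice(loneliness_levels, size=300),
})

# Count of respondents in each education x loneliness combination.
counts = pd.crosstab(data["Education_Level"], data["Loneliness"]).reindex(
    index=education_levels, columns=loneliness_levels, fill_value=0
)

fig, ax = plt.subplots(figsize=(5, 4))
image = ax.imshow(counts.to_numpy(), cmap="Blues")  # darker cell = more respondents
ax.set_xticks(range(len(loneliness_levels)))
ax.set_xticklabels(loneliness_levels)
ax.set_yticks(range(len(education_levels)))
ax.set_yticklabels(education_levels)
ax.set_xlabel("Loneliness category")
ax.set_ylabel("Education level")
fig.colorbar(image, ax=ax, label="Number of respondents")

# Write the count inside each cell so exact values remain readable.
for i in range(counts.shape[0]):
    for j in range(counts.shape[1]):
        ax.text(j, i, str(counts.iat[i, j]), ha="center", va="center")
fig.tight_layout()
```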

**Summary Statistics**

- Frequency Counts

We can count the number of respondents in each loneliness category within each education level. This gives a direct indication of how many individuals fall into each category and helps in understanding the distribution of loneliness across education levels.

- Proportions

We can calculate the proportion of respondents in each loneliness category relative to the total number of respondents in the same education level. Proportions provide insight into the relative distribution of loneliness categories within each education level, even when group sizes differ.
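As a small worked example of both summaries, the sketch below uses `pandas.crosstab` on six hand-made rows. The rows and category labels are invented for illustration only.

```python
import pandas as pd

# Six invented respondents; labels are illustrative assumptions.
data = pd.DataFrame({
    "Education_Level": ["High school", "High school", "Bachelor's",
                        "Bachelor's", "Bachelor's", "Graduate"],
    "Loneliness": ["Often", "Sometimes", "Never",
                   "Sometimes", "Never", "Never"],
})

# Frequency counts: respondents per loneliness category within each education level.
counts = pd.crosstab(data["Education_Level"], data["Loneliness"])

# Proportions: each row is divided by that education level's total, so rows sum to 1.
proportions = pd.crosstab(data["Education_Level"], data["Loneliness"], normalize="index")

print(counts)
print(proportions.round(2))
```

Here two of the three respondents with a bachelor's degree answered "Never", so that cell has a count of 2 and a proportion of 2/3.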

# Analysis

1. Data Preparation

First, we will ensure that the dataset is cleaned and all relevant variables (loneliness, education level, gender, ethnicity) are appropriately formatted.
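A minimal sketch of what this preparation might look like in `pandas`. The raw rows, the category labels, and the 0 to 2 scoring of loneliness are all hypothetical assumptions for illustration; the real coding depends on the survey's response options.

```python
import pandas as pd

# Hypothetical raw responses with typical problems: missing values and untidy text.
raw = pd.DataFrame({
    "Loneliness": ["Often", "sometimes ", None, "Never", "Sometimes"],
    "Education_Level": ["High school", "Graduate", "College", "Bachelor's", None],
    "Gender": ["female", "male", "non-binary", "female", "male"],
    "Ethnicity": ["Asian", "Group B", "Group C", "Group B", "Asian"],
})

columns = ["Loneliness", "Education_Level", "Gender", "Ethnicity"]

# Drop respondents missing any variable used in the analysis.
data = raw.dropna(subset=columns).copy()

# Normalise text so that "sometimes " and "Sometimes" count as the same category.
data["Loneliness"] = data["Loneliness"].str.strip().str.capitalize()

# Assumed ordinal coding: a numeric score is needed for a linear regression outcome.
loneliness_scores = {"Never": 0, "Sometimes": 1, "Often": 2}
data["Loneliness_Score"] = data["Loneliness"].map(loneliness_scores)

# Store the predictors as categorical variables.
for col in ["Education_Level", "Gender", "Ethnicity"]:
    data[col] = data[col].astype("category")

print(data)
```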

2. Linear Regression

To assess the relationship between education level and loneliness while controlling for gender and ethnic background, we will use multiple linear regression with loneliness as the outcome variable and education level, gender, and ethnic background as predictors. Because linear regression requires a numeric outcome, this assumes the ordered loneliness categories are coded as a numeric score.

Regression Model:

Loneliness = β0 + β1(Education Level) + β2(Gender) + β3(Ethnic Background) + ϵ
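Because all three predictors are categorical, each β term above stands for a set of coefficients, one per non-reference category. A sketch of the expanded, dummy-coded form, assuming treatment (reference-level) coding and using J, K, and M as hypothetical names for the number of education, gender, and ethnicity categories:

$$
\text{Loneliness}_i = \beta_0 + \sum_{j=2}^{J} \beta_{1j}\,\mathbf{1}[\text{Education}_i = j] + \sum_{k=2}^{K} \beta_{2k}\,\mathbf{1}[\text{Gender}_i = k] + \sum_{m=2}^{M} \beta_{3m}\,\mathbf{1}[\text{Ethnicity}_i = m] + \epsilon_i
$$

Under this reading, a statement such as β1 = 0 is shorthand for β1j = 0 for every non-reference education category j.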

Or in Python:

```python
import statsmodels.formula.api as smf

# Define the formula for the linear regression model
formula = 'Loneliness ~ C(Education_Level) + C(Gender) + C(Ethnicity)'

# Fit the model (`data` is the cleaned DataFrame from the data preparation step)
model = smf.ols(formula=formula, data=data).fit()

# View the summary of the model
print(model.summary())
```


**Null Hypotheses for Each Coefficient**

For each coefficient in our model, we will establish a null hypothesis that assumes there is no association between the predictor variable and the outcome variable (loneliness).

- For Education Level:

Null Hypothesis (H0,1): β1 = 0 (There is no average change in loneliness associated with changes in education level.)

- For Gender:

Null Hypothesis (H0,2): β2 = 0 (There is no average change in loneliness associated with differences in gender.)

- For Ethnicity:

Null Hypothesis (H0,3): β3 = 0 (There is no average change in loneliness associated with differences in ethnicity.)


**Alternative Hypotheses**

For each null hypothesis, the alternative hypothesis would assert that there is a significant association:

- For Education Level:

Alternative Hypothesis (H1,1): β1 ≠ 0

- For Gender:

Alternative Hypothesis (H1,2): β2 ≠ 0

- For Ethnicity:

Alternative Hypothesis (H1,3): β3 ≠ 0


**Accessing p-values**

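Once we fit our model and obtain the summary statistics, we can extract the p-values for each coefficient using:

```python
# Coefficient table (estimates, standard errors, t-statistics, p-values)
hypothesis_testing_table = model.summary().tables[1]
print(hypothesis_testing_table)

# The p-values alone, as a pandas Series indexed by coefficient name
print(model.pvalues)
```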

**Interpretation of Results**

After we have the p-values for each coefficient:

If the p-value for β1 (Education Level) is less than our significance level (typically 0.05), we reject the null hypothesis H0,1 and conclude that education level is significantly associated with loneliness. We then repeat this interpretation for gender (β2) and ethnicity (β3).

**Assumptions**:

- Linearity: There’s a linear relationship between loneliness and predictors.

- Independence: Observations are independent of each other.

- Homoscedasticity: Variance of residuals is constant across all groups.

- Normality of errors: The errors ϵi are normally distributed.

- No multicollinearity: Predictors are not highly correlated with each other, as high multicollinearity can distort the coefficients.
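Some of these assumptions can be checked after fitting. The sketch below shows one possible set of diagnostics from `statsmodels`: a Breusch-Pagan test for homoscedasticity, a Jarque-Bera test for normality of residuals, and variance inflation factors for multicollinearity. The data are synthetic and the labels are assumptions; independence is a property of the study design and is not tested here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.stattools import jarque_bera

# Synthetic stand-in data; labels and distribution are assumptions.
rng = np.random.default_rng(7)
n = 400
data = pd.DataFrame({
    "Education_Level": rng.choice(["High school", "College", "Bachelor's", "Graduate"], size=n),
    "Gender": rng.choice(["male", "female", "non-binary"], size=n),
    "Ethnicity": rng.choice(["Group A", "Group B", "Group C"], size=n),
})
data["Loneliness"] = rng.normal(loc=5, scale=2, size=n)

formula = "Loneliness ~ C(Education_Level) + C(Gender) + C(Ethnicity)"
model = smf.ols(formula=formula, data=data).fit()

exog = model.model.exog               # design matrix, including the intercept
exog_names = model.model.exog_names   # matching column names

# Homoscedasticity: a small p-value suggests non-constant residual variance.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(model.resid, exog)

# Normality of errors: a small p-value suggests non-normal residuals.
jb_stat, jb_pvalue, skew, kurtosis = jarque_bera(model.resid)

# Multicollinearity: VIF for every dummy column (the intercept is skipped).
vif = pd.Series(
    [variance_inflation_factor(exog, i) for i in range(1, exog.shape[1])],
    index=exog_names[1:],
)

print(f"Breusch-Pagan p-value: {lm_pvalue:.3f}")
print(f"Jarque-Bera p-value: {jb_pvalue:.3f}")
print(vif.round(2))
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic multicollinearity, though the exact cutoff is a judgment call.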

# Hypothesis

Our null hypotheses are as follows:
(H0,1): β1 = 0, (H0,2): β2 = 0, (H0,3): β3 = 0

And our alternative hypotheses are:
(H1,1): β1 ≠ 0, (H1,2): β2 ≠ 0, (H1,3): β3 ≠ 0

**Expectation**

- For education, we hypothesize that individuals with higher education levels (e.g., a master's degree) may report lower levels of self-perceived loneliness compared to those with lower education levels (e.g., high school or less). This expectation is grounded in the idea that higher education can lead to better social integration and networking opportunities. We expect to reject the null hypothesis H0,1 (β1 = 0) with a p-value < 0.05, indicating a statistically significant relationship between education level and loneliness.

- For gender, we note that some studies suggest men may report higher loneliness due to socialization patterns that discourage emotional expression. Overall, however, we believe gender does not have a significant effect on loneliness, so we expect to fail to reject the null hypothesis H0,2 (β2 = 0), with a p-value > 0.05.

- For ethnicity, we hypothesize that individuals from minority ethnic backgrounds may experience higher levels of loneliness due to factors such as cultural disconnection or social isolation. We expect to reject the null hypothesis H0,3 (β3 = 0) with a p-value < 0.05, indicating that ethnicity is significantly associated with self-perceived loneliness.

**Relevance of Results**

The estimated coefficients and their p-values will indicate how self-perceived loneliness varies across education levels after accounting for gender and ethnicity.

# Ethical Considerations

Avoid Stereotyping - It’s important to interpret findings responsibly to avoid reinforcing stereotypes about certain educational or ethnic groups being “more lonely” or “more connected.” Findings should be presented in a way that respects cultural diversity and emphasizes that loneliness is a complex, multifaceted experience.

**Group Partner**

I would like to have Tara Billa in my project group.