# Course Project Proposal

### Q1: How does the frequency of social interaction with family, friends, and neighbors relate to feelings of existential loneliness and emotional well-being among survey respondents?

This question aims to estimate the relationship between social interaction frequency and two key outcomes: existential loneliness and emotional well-being. By testing whether more frequent social interaction is associated with lower loneliness and better well-being, it seeks to highlight the role of social engagement.

The analysis compares mean loneliness and well-being scores across interaction frequency groups to determine whether any differences are statistically significant. I am interested in the relationships between social interaction frequency (with family, friends, and neighbors) and both existential loneliness and emotional well-being, particularly feelings of emptiness.

1. **Social Interaction Frequency**  
   - Variables: `CONNECTION_social_days_family_p7d`, `CONNECTION_social_days_friends_p7d`, `CONNECTION_social_days_neighbours_p7d`
   - Response Categories: none, some (1-3), most (4-6), every day
   - **Visualization**: A set of bar charts, one per interaction variable, showing the distribution of responses across frequency categories.

2. **Loneliness Metrics**  
   - Variables: `LONELY_existential_loneliness_scale_outlook`, `LONELY_existential_loneliness_scale_share`
   - Response Scale: 1-9, where 1 is "strongly agree" and 9 is "strongly disagree"
   - **Visualization**: A box plot for each loneliness metric, grouped by levels of social interaction frequency. 

3. **Emotional Well-being**  
   - Variable: `LONELY_dejong_emotional_social_loneliness_scale_emptiness`
   - Response Categories: yes, no, more or less
   - **Visualization**: A stacked bar chart could represent the proportion of each response category (yes, no, more or less) across interaction levels. 
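
The table behind the stacked bar chart can be sketched as follows. This is a minimal illustration on made-up rows, assuming the frequency labels are spelled exactly as listed above (the real file may differ); the helper column `friends_freq` is a hypothetical name introduced here.

```python
import pandas as pd

# Toy rows only; these values are invented for illustration.
df = pd.DataFrame({
    "CONNECTION_social_days_friends_p7d": [
        "none", "some (1-3)", "most (4-6)", "every day",
        "some (1-3)", "every day", "none", "most (4-6)",
    ],
    "LONELY_dejong_emotional_social_loneliness_scale_emptiness": [
        "yes", "more or less", "no", "no",
        "yes", "no", "yes", "more or less",
    ],
})

# Assumed label spellings; adjust to match the actual survey file.
levels = ["none", "some (1-3)", "most (4-6)", "every day"]
df["friends_freq"] = pd.Categorical(
    df["CONNECTION_social_days_friends_p7d"], categories=levels, ordered=True
)

# Row-normalized table: share of each emptiness response within each level.
emptiness_share = pd.crosstab(
    df["friends_freq"],
    df["LONELY_dejong_emotional_social_loneliness_scale_emptiness"],
    normalize="index",
)
print(emptiness_share.round(2))
```

Calling `emptiness_share.plot(kind="bar", stacked=True)` on this table would draw the stacked bars, provided matplotlib is installed.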

#### Analysis
1. Calculate descriptive statistics for each variable using pandas: means and standard deviations for the numeric loneliness scores, and counts and proportions for the categorical interaction frequency and emptiness responses. Summarize loneliness and well-being responses at each interaction level ("none," "some," "most," "every day"). Use a bar plot to show the distribution of social interaction frequency, a box plot to visualize loneliness scores for each interaction level, and a stacked bar chart for the emptiness responses. This will help identify differences in score distributions and detect outliers.

2. Utilize bootstrapping to generate confidence intervals around the mean loneliness and emotional well-being scores for each interaction level, drawing a large number of bootstrap samples (e.g., 10,000). Non-overlapping confidence intervals suggest a statistically significant difference, indicating a potential association between interaction frequency and well-being metrics.

3. Set up a linear regression model with Loneliness Metrics as the outcome variable, including indicator variables for each level of social interaction frequency. This allows measurement of the distinct association of each frequency level with loneliness, with coefficients indicating whether interaction levels are associated with higher or lower scores compared to a baseline (e.g., "none"). A similar model can be used for Emotional Well-being.
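
The bootstrapping step can be sketched as below. This is a minimal percentile-bootstrap illustration on synthetic scores, not the survey data; the function name `bootstrap_mean_ci` and the column names `interaction` and `loneliness` are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def bootstrap_mean_ci(values, n_boot=10_000, level=0.95, rng=rng):
    """Percentile bootstrap confidence interval for the mean."""
    values = np.asarray(values, dtype=float)
    # Resample with replacement, each resample the same size as the data.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    alpha = 1 - level
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Synthetic stand-in data on a 1-9 scale (not survey results).
toy = pd.DataFrame({
    "interaction": ["none"] * 30 + ["every day"] * 30,
    "loneliness": np.concatenate([
        rng.integers(1, 10, size=30),
        rng.integers(1, 10, size=30),
    ]),
})

for level_name, group in toy.groupby("interaction"):
    lo, hi = bootstrap_mean_ci(group["loneliness"])
    mean = group["loneliness"].mean()
    print(f"{level_name}: mean={mean:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

The percentile method is only one of several ways to form a bootstrap interval; it is used here because it is the simplest to read.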

**Assumptions** 
- The loneliness scale has a linear relationship with the frequency of social interactions.
- Bootstrapping assumes that the sample is representative of the population and that observations are independent.

#### Hypothesis

Individuals who engage in social interactions more frequently will report lower levels of loneliness and higher levels of emotional well-being. Specifically, we expect to see:

1. **Existential Loneliness**
   - H<sub>0</sub>: There is no linear association between the frequency of social interaction and existential loneliness scores in the population. This could be expressed more formally as H<sub>0</sub>: β<sub>1</sub> = 0, where β<sub>1</sub> is the slope coefficient for social interaction frequency in a Simple Linear Regression model predicting existential loneliness.
   - Expected: a negative association with interaction frequency, where individuals who interact with family, friends, or neighbors on most days or every day will have significantly lower loneliness scores than those who interact on no days or only some days.

2. **Emotional Well-being**
   - H<sub>0</sub>: There is no linear association between the frequency of social interaction and emotional well-being scores in the population. This could also be represented as H<sub>0</sub>: β<sub>1</sub> = 0, where β<sub>1</sub> is the slope coefficient for social interaction frequency in a Simple Linear Regression model predicting emotional well-being.
   - Expected: a positive association, with individuals engaging in frequent interactions reporting higher emotional well-being scores.

**Expected Results**:
The bootstrapped confidence intervals for loneliness scores are expected to sit lower and be narrower in the "most" and "every day" interaction groups than in the "none" group. If the intervals do not overlap, this would indicate that higher social interaction correlates with lower loneliness. In the linear regression analysis, we expect negative coefficients for loneliness at higher interaction levels, suggesting reduced loneliness compared to the "none" baseline, while positive coefficients for emotional well-being would indicate increased well-being among those who engage socially on most days or every day.
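
To make the role of the baseline concrete, here is a small sketch of a regression with indicator variables fitted by ordinary least squares. The scores are invented, the short level names and the choice of "none" as baseline are assumptions, and NumPy's `lstsq` stands in for whichever regression library the project ends up using.

```python
import numpy as np
import pandas as pd

# Made-up scores for illustration only.
toy = pd.DataFrame({
    "interaction": ["none", "none", "some", "some",
                    "most", "most", "every day", "every day"],
    "loneliness": [7.0, 9.0, 6.0, 6.0, 4.0, 6.0, 2.0, 4.0],
})
levels = ["none", "some", "most", "every day"]  # "none" is the assumed baseline

# Design matrix: an intercept plus one 0/1 indicator per non-baseline level.
X = pd.DataFrame({"intercept": 1.0}, index=toy.index)
for level in levels[1:]:
    X[level] = (toy["interaction"] == level).astype(float)

coef, *_ = np.linalg.lstsq(X.to_numpy(), toy["loneliness"].to_numpy(), rcond=None)
fit = pd.Series(coef, index=X.columns)
print(fit)
```

With only indicator predictors, the intercept equals the baseline group's mean and each coefficient equals that group's mean minus the baseline mean, so a negative coefficient reads directly as a lower average score than the baseline group.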

---
### Q2: Is there an association between self-perceived mental health, self-esteem, and relationship satisfaction, and to what extent does loneliness influence these associations?

This question examines the relationships between self-perceived mental health, self-esteem, and relationship satisfaction, focusing on how loneliness relates to them. It aims to determine whether individuals with higher loneliness scores report lower self-esteem, mental health, and relationship satisfaction, highlighting loneliness's potential impact on well-being.

By estimating mean scores for these variables across different loneliness levels, we can assess how they vary and whether they vary together. I am interested in how loneliness influences the strength and direction of associations among self-esteem, mental health, and relationship satisfaction.

1. **Self-Perceived Mental Health**  
   - Variables: `WELLNESS_self_rated_mental_health`, `WELLNESS_subjective_happiness_scale_happy`, `PSYCH_self_esteem_unknown_scale_think_of_me`
   - Response Scales:  
      - Mental health: Excellent, Very good, Good, Fair, Poor  
      - Happiness scale: 1-7 (1 - Not at all, 2, 3, 4, 5, 6, 7 - A great deal)
      - Self-esteem: 1-7 (1 - Not at all, 2, 3, 4, 5, 6, 7 - A great deal)
   - **Visualization**: A histogram for mental health and happiness distributions, and scatter plots to examine associations between self-rated happiness and other mental health variables.

2. **Self-Esteem Metrics**  
   - Variables: `PSYCH_rosenberg_self_esteem_satisfied`, `PSYCH_rosenberg_self_esteem_worth`, `PSYCH_rosenberg_self_esteem_good_qualities`
   - Response Scale: strongly agree, agree, disagree, strongly disagree
   - **Visualization**: A stacked bar chart showing each self-esteem measure's distribution, allowing comparison across levels of mental health.

3. **Social Satisfaction**  
   - Variable: `WELLNESS_satisfied_relationship`
   - Response Categories: (Extremely dissatisfied, Dissatisfied, Somewhat dissatisfied, Neither satisfied nor dissatisfied, Somewhat satisfied, Satisfied, Extremely satisfied)
   - **Visualization**: A violin plot showing the spread of relationship satisfaction across self-esteem and mental health levels, which can highlight patterns in satisfaction.
   
4. **Loneliness Metric**  
   - Variable: `LONELY_existential_loneliness_scale_understand`
   - Response Scale: 1-9 (where 1 is strongly agree and 9 is strongly disagree)
   - **Visualization**: A bar plot that categorizes participants by loneliness level and shows the average relationship satisfaction score for each group.
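
Because several of these variables are recorded as text labels, they need numeric codes before means can be computed. The sketch below shows one possible coding on invented rows; the 1-5 mapping, the three loneliness bands, and the column `relationship_satisfaction_score` are all assumptions made for illustration, not part of the dataset documentation.

```python
import pandas as pd

# Assumed coding: higher number = better self-rated mental health.
mental_health_codes = {"Poor": 1, "Fair": 2, "Good": 3, "Very good": 4, "Excellent": 5}

# Invented rows for illustration only.
toy = pd.DataFrame({
    "WELLNESS_self_rated_mental_health": [
        "Good", "Excellent", "Fair", "Very good", "Poor", "Good",
    ],
    "LONELY_existential_loneliness_scale_understand": [4, 8, 2, 7, 1, 5],
    "relationship_satisfaction_score": [5, 7, 3, 6, 2, 4],  # hypothetical 1-7 coding
})
toy["mental_health_score"] = toy["WELLNESS_self_rated_mental_health"].map(
    mental_health_codes
)

# Assumed cut points that group the 1-9 loneliness item into three bands.
toy["loneliness_band"] = pd.cut(
    toy["LONELY_existential_loneliness_scale_understand"],
    bins=[0, 3, 6, 9],
    labels=["1-3", "4-6", "7-9"],
)

# Average relationship satisfaction per band: the heights of the bar plot.
mean_by_band = toy.groupby("loneliness_band", observed=True)[
    "relationship_satisfaction_score"
].mean()
print(mean_by_band)
```

The bands are labeled by their raw score range rather than as "low" or "high" loneliness, since which end of the 1-9 scale indicates more loneliness depends on the item wording.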

#### Analysis Plan

1. **Data Summarization with pandas**:
   - Calculate summary statistics for each variable, such as the average self-esteem score across levels of self-perceived mental health and relationship satisfaction. Visualize self-perceived mental health ratings against self-esteem and relationship satisfaction using grouped bar plots to identify patterns or trends, such as whether better self-rated mental health goes with higher self-esteem and relationship satisfaction.

2. **Bootstrapped Confidence Intervals**:
   - Use bootstrapping to create confidence intervals around mean self-esteem and relationship satisfaction scores for each mental health category, estimating significant differences without parametric assumptions. Compare confidence intervals to determine if individuals with “Excellent” self-perceived mental health have higher average self-esteem and relationship satisfaction than those with “Good” or “Fair” ratings.

3. **Permutation Test for Associations**:
   - Conduct a permutation test to assess the significance of the association between self-perceived mental health and self-esteem by shuffling self-esteem scores and recalculating the distribution of associations. Repeat this test for the relationship between self-perceived mental health and relationship satisfaction to validate any observed relationships.

4. **Linear Regression with Indicator Variables**:
   - Perform linear regression using indicator variables for each mental health level (e.g., “Fair” as the baseline) to quantify their associations with self-esteem and relationship satisfaction. This will help determine the strength and significance of the associations; including the loneliness score as an additional predictor would show how they change once loneliness is accounted for.
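
The permutation test in step 3 can be sketched as follows. This is a minimal version on synthetic scores, assuming mental health has been coded 1-5 and using the Pearson correlation as the test statistic; the function name `permutation_test_corr` is hypothetical, and another statistic (for example a difference in group means) could be substituted.

```python
import numpy as np

rng = np.random.default_rng(1)

def permutation_test_corr(x, y, n_perm=5_000, rng=rng):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.corrcoef(x, y)[0, 1]
    perm_stats = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling y breaks any real pairing between x and y.
        perm_stats[i] = np.corrcoef(x, rng.permutation(y))[0, 1]
    # Add-one correction keeps the p-value strictly above zero.
    p_value = (np.sum(np.abs(perm_stats) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, p_value

# Synthetic scores, not survey data: self-esteem is built to track mental health.
mental_health = rng.integers(1, 6, size=80)
self_esteem = mental_health + rng.normal(0, 1, size=80)

r, p = permutation_test_corr(mental_health, self_esteem)
print(f"r = {r:.2f}, permutation p = {p:.4f}")
```

The p-value is the share of shuffled datasets whose correlation is at least as extreme as the observed one, which is what "recalculating the distribution of associations" amounts to in practice.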
   
**Assumptions**
- The samples drawn from the population are independent of one another, allowing for valid comparisons between groups and ensuring that results from one individual do not influence those of another.
- The scales used to measure self-perceived mental health, self-esteem, and relationship satisfaction are reliable and valid: they consistently yield the same results under similar conditions and accurately reflect the constructs of interest.
- There are linear relationships between self-perceived mental health, self-esteem, and relationship satisfaction, allowing for the application of linear regression techniques to analyze the data effectively.

#### Hypothesis and Expected Results

1. **Self-Esteem and Mental Health**  
   - **Null Hypothesis (H₀)**: There is no significant difference in self-esteem scores across categories of self-perceived mental health. Any observed differences are due to random chance.
   - **Alternative Hypothesis (Hₐ)**: Individuals with higher self-perceived mental health ("Excellent" or "Very Good") will show significantly higher self-esteem scores compared to those with lower mental health ratings.

   **Expected Result**: Bootstrapped confidence intervals should reveal significant differences, with higher self-esteem in the "Excellent" and "Very Good" categories. Permutation tests are expected to yield significant p-values, supporting the association. Positive coefficients in linear regression would confirm a positive relationship with self-esteem.

2. **Relationship Satisfaction and Mental Health**  
   - **Null Hypothesis (H₀)**: There is no significant difference in relationship satisfaction across different levels of self-perceived mental health. Observed differences are likely due to chance.
   - **Alternative Hypothesis (Hₐ)**: Higher self-perceived mental health ("Excellent" or "Very Good") is associated with significantly greater relationship satisfaction compared to lower ratings.

   **Expected Result**: Bootstrapped confidence intervals should confirm that higher mental health correlates with increased relationship satisfaction. A significant p-value from permutation tests would indicate this relationship is not due to chance. Positive coefficients in linear regression would affirm a positive association with relationship satisfaction.
   
---
### Q3: How does social engagement during physical activities relate to feelings of loneliness and the ability to express one's true self?

This question explores whether social engagement in physical activities (like sports or group exercises) is associated with lower loneliness and greater self-expression. It aims to highlight the benefits of socializing while exercising by estimating average loneliness and self-expression scores across different levels of social engagement. This will help assess whether individuals who participate socially in physical activities experience lower loneliness and higher authenticity.

I am interested in how social engagement in physical activities relates to feelings of loneliness and self-expression, specifically if individuals who engage socially feel more comfortable being themselves.

1. **Social Engagement in Physical Activities**  
   - Variable: `EXERCISE_social_frequency`
   - Response Categories: Always, Often, Sometimes, Rarely, Never with others
   - **Visualization**: A bar chart showing the distribution of amount of social engagement. 

2. **Loneliness**  
   - Variable: `LONELY_ucla_loneliness_scale_left_out`
   - Response Categories: hardly, sometimes, often
   - **Visualization**: A grouped bar chart to show loneliness levels by exercise engagement frequency.

3. **Relational Satisfaction**  
   - Variable: `PSYCH_relational_needs_satisfaction_scale_7pt_true_self_without_rejection`
   - Response Scale: 1-9 (1 is strongly agree and 9 is strongly disagree)
   - **Visualization**: A scatter plot of relational satisfaction scores by exercise frequency, with a trendline indicating possible positive associations. 

#### Analysis Plan

1. **Summarizing Data with pandas**:
   - Calculate descriptive statistics for loneliness and self-expression scores across the levels of social engagement during physical activity (Never, Rarely, Sometimes, Often, Always). Use bar or box plots to visualize trends, helping identify patterns between social engagement and the two outcome variables.

2. **Two-Sample Bootstrapped Confidence Intervals**:
   - Create confidence intervals for each outcome variable to compare average scores between groups with different levels of social engagement. Bootstrapping will help determine if differences, such as between the “Always” and “Never” groups, are statistically significant.

3. **Comparing Groups with Permutation Tests**:
   - Conduct permutation tests to assess if differences in loneliness and self-expression scores are statistically meaningful by comparing observed data to a randomly permuted distribution. This will test whether frequent social engagement is significantly associated with lower loneliness scores compared to infrequent engagement.

4. **Linear Regression with Indicator Variable Contrasts**:
   - Apply linear regression using social engagement levels as indicator variables to model each outcome variable separately. By setting “Never” as the baseline, we can examine the coefficients for the other engagement levels to determine their association with loneliness and self-expression.
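
The two-sample bootstrap comparison in step 2 can be sketched as below. This is a minimal illustration on synthetic scores for two hypothetical groups; the function name `bootstrap_diff_ci` and the group arrays are placeholders, and the percentile interval is one choice among several.

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrap_diff_ci(a, b, n_boot=10_000, level=0.95, rng=rng):
    """Percentile bootstrap CI for mean(a) - mean(b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Resample each group independently, with replacement.
    boot_a = a[rng.integers(0, len(a), size=(n_boot, len(a)))].mean(axis=1)
    boot_b = b[rng.integers(0, len(b), size=(n_boot, len(b)))].mean(axis=1)
    alpha = 1 - level
    lo, hi = np.quantile(boot_a - boot_b, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Synthetic scores for two hypothetical groups (not survey data).
group_a = rng.normal(5.0, 1.5, size=40)
group_b = rng.normal(4.0, 1.5, size=40)

lo, hi = bootstrap_diff_ci(group_a, group_b)
print(f"95% CI for the difference in means: ({lo:.2f}, {hi:.2f})")
```

An interval for the difference that excludes zero suggests the two group means differ; this is a more direct check than comparing two separate per-group intervals for overlap.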
   
**Assumptions**
- The responses from participants regarding their social engagement during physical activities are independent, meaning that one participant's level of engagement does not influence another's.
- The scales used to measure loneliness and self-expression are valid and effectively capture the constructs they are intended to assess, ensuring that the findings are meaningful and accurate.
- Variances of loneliness and self-expression scores are consistent across different levels of social engagement, which is important for the validity of the statistical tests employed, such as linear regression and permutation tests.
   
#### Hypothesis and Expected Results

**Hypothesis**:
1. **Loneliness**  
   - **Null Hypothesis (H<sub>0</sub>)**: There is no association between social engagement frequency during physical activities and loneliness levels, meaning individuals' loneliness scores do not significantly differ based on their social engagement.

2. **Ability to Express True Self**  
   - **Null Hypothesis (H<sub>0</sub>)**: There is no association between social engagement frequency and the ability to express one's true self, implying varying social engagement levels do not significantly affect self-expression scores.

**Expected Results**:
1. Loneliness: Bootstrapped confidence intervals should show lower loneliness scores for individuals with higher social engagement, with non-overlapping intervals indicating significant differences. A significant p-value from the permutation test would support the claim that more frequent social engagement is associated with lower loneliness. We anticipate a negative coefficient for higher engagement levels in linear regression, indicating that greater social engagement correlates with lower loneliness.
2. Ability to Express True Self: Higher engagement levels are expected to show higher self-expression scores in bootstrapped confidence intervals, with non-overlapping intervals suggesting significant associations. The permutation test is likely to yield a significant p-value, indicating that high-engagement participants report greater self-expression. In regression analysis, we expect positive coefficients for higher engagement levels, suggesting that increased engagement is associated with greater self-expression.