# Course Project Proposal

### Q1: How does the frequency of social interaction with family, friends, and neighbors relate to feelings of existential loneliness and emotional well-being among survey respondents?

This question aims to estimate the association between social interaction frequency and two key outcome variables: existential loneliness and emotional well-being. Examining whether more frequent social interaction is linked to lower loneliness and better well-being will indicate whether social engagement may play a protective role against feelings of loneliness.

The analysis compares mean loneliness and well-being scores for individuals at different levels of social interaction frequency. Specifically, it estimates how these scores vary across groups with different interaction frequencies and determines whether any observed differences are statistically meaningful.

I am interested in the relationship between social interaction frequency (with family, friends, and neighbors) and existential loneliness scores, as well as the relationship between social interaction frequency and emotional well-being, as reflected by feelings of emptiness.

1. **Social Interaction Frequency**  
   - Variables: `CONNECTION_social_days_family_p7d`, `CONNECTION_social_days_friends_p7d`, `CONNECTION_social_days_neighbours_p7d`
   - Response Categories: none, some (1-3), most (4-6), every day
   - **Purpose**: We hypothesize that people who socialize more frequently with family, friends, or neighbors may report lower levels of loneliness and emotional emptiness.
   - **Visualization**: A bar chart for each social interaction frequency variable will be used to visualize the distribution of responses. These charts show how interaction levels are spread across respondents; the association with loneliness and well-being is examined with the grouped plots described for the outcome variables.

2. **Loneliness Metrics**  
   - Variables: `LONELY_existential_loneliness_scale_outlook`, `LONELY_existential_loneliness_scale_share`
   - Response Scale: 1-9, where 1 is "strongly agree" and 9 is "strongly disagree"
   - **Purpose**: These variables measure existential loneliness through agreement with statements about feeling connected or misunderstood. We aim to analyze if greater social interaction correlates with lower loneliness scores.
   - **Visualization**: A box plot could be used for each loneliness metric, grouped by levels of social interaction frequency. This will help in visualizing the spread of loneliness scores across different interaction levels, showing if higher interaction correlates with lower loneliness.

3. **Emotional Well-being**  
   - Variable: `LONELY_dejong_emotional_social_loneliness_scale_emptiness`
   - Response Categories: yes, no, more or less
   - **Purpose**: This variable serves as a direct indicator of emotional well-being, specifically feelings of emptiness. Comparing this across social interaction frequencies could reveal whether frequent socializing reduces the likelihood of reported emptiness.
   - **Visualization**: A stacked bar chart could represent the proportion of each response category (yes, no, more or less) across interaction levels. This will allow for a clear comparison of well-being states by social frequency.
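The proportions behind the stacked bar chart for emptiness can be sketched with a row-normalized cross-tabulation. This is a minimal illustration on made-up responses, not the survey data; the exact label strings stored in the two columns are assumptions based on the response categories listed above.

```python
import pandas as pd

freq_col = "CONNECTION_social_days_friends_p7d"
empty_col = "LONELY_dejong_emotional_social_loneliness_scale_emptiness"

# Hypothetical toy responses; the label strings are assumptions.
df = pd.DataFrame({
    freq_col: ["none", "none", "some (1-3)", "some (1-3)",
               "most (4-6)", "most (4-6)", "every day", "every day"],
    empty_col: ["yes", "more or less", "yes", "no",
                "more or less", "no", "no", "no"],
})

levels = ["none", "some (1-3)", "most (4-6)", "every day"]

# Share of each emptiness response within each interaction level (rows sum to 1).
props = pd.crosstab(df[freq_col], df[empty_col], normalize="index")
props = props.reindex(levels)  # put interaction levels in their natural order

print(props.round(2))
```

Calling `props.plot(kind="bar", stacked=True)` on this table would then draw the stacked bars, assuming matplotlib is installed.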

#### Analysis
1. Calculate descriptive statistics for each variable group: _Social Interaction Frequency_, _Loneliness Metrics_, and _Emotional Well-being_. Use pandas to summarize the mean and standard deviation of the loneliness and well-being scores at each level of social interaction frequency ("none," "some (1-3)," "most (4-6)," "every day"); the categorical emptiness responses need a numeric coding before means can be computed. Use bar plots of social interaction frequency to see how interaction levels vary across participants, and box plots to visualize loneliness and emotional well-being scores at each interaction level. These plots will help identify potential differences in score distributions across interaction frequencies and detect any outliers or unusual patterns.

2. Use bootstrapping to generate confidence intervals around the mean loneliness and emotional well-being scores for each level of social interaction. Draw a large number of bootstrap samples (e.g., 10,000) to estimate the confidence interval of the mean score at each interaction level. If the confidence intervals for two interaction levels do not overlap, this would suggest a statistically significant difference, implying that interaction frequency might be associated with these well-being metrics.

3. Assess the relationship between level of social interaction (categorical predictor) and loneliness or emotional well-being (treated as numeric outcomes). Set up a linear regression model with _Loneliness Metrics_ as the outcome variable and indicator (dummy) variables for each level of social interaction frequency except a baseline. The coefficient of each indicator (e.g., "some (1-3)," "most (4-6)") then shows whether that interaction level is associated with higher or lower loneliness scores than the baseline (e.g., "none"). A similar model can be run with _Emotional Well-being_ as the outcome variable.
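The bootstrapping step in this plan can be sketched as a percentile bootstrap of the group mean. The function name `bootstrap_mean_ci` and the toy 1-9 scores are hypothetical, chosen only to illustrate the idea on two interaction levels.

```python
import numpy as np

def bootstrap_mean_ci(values, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Each row of `idx` is one resample (with replacement) of the original data.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    alpha = (1 - level) / 2
    lo, hi = np.quantile(boot_means, [alpha, 1 - alpha])
    return lo, hi

# Hypothetical 1-9 loneliness scores for two interaction levels (not real data).
scores_by_group = {
    "none": [7, 8, 6, 9, 7, 8, 6, 7],
    "every day": [2, 3, 1, 4, 2, 3, 2, 1],
}

cis = {group: bootstrap_mean_ci(s) for group, s in scores_by_group.items()}
for group, (lo, hi) in cis.items():
    print(f"{group}: 95% CI for the mean = ({lo:.2f}, {hi:.2f})")
```

Non-overlapping intervals point to a difference, but overlapping intervals do not by themselves rule one out; bootstrapping the difference in means between two groups is a more direct check.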

#### Hypothesis and Expected Results

**Hypothesis**:
Individuals who engage in social interactions more frequently will report lower levels of loneliness and higher levels of emotional well-being. Specifically, we expect to see:

- **Loneliness**: A negative association with interaction frequency, where individuals who interact with family, friends, or neighbors on most days or every day will have significantly lower loneliness scores than those who interact on no days or only some days.
- **Emotional Well-being**: A positive association, with individuals engaging in frequent interactions reporting higher emotional well-being scores.

**Expected Results**:
- **Bootstrapping Confidence Intervals**: We expect that the confidence intervals for mean loneliness scores will sit lower for the "most (4-6)" and "every day" interaction groups than for the "none" group. If the intervals for two groups do not overlap, this would provide evidence that higher interaction frequency is associated with lower loneliness.
- **Linear Regression with Indicator Variables**: We anticipate negative coefficients for the loneliness score (indicating reduced loneliness) for each higher level of interaction frequency compared to the "none" baseline. For emotional well-being, we expect positive coefficients (indicating increased well-being) for higher interaction levels, particularly for those who socialize on most days or every day.

**Relevance of Results**:
If our hypothesis holds true, it would suggest that increased social interaction could play a protective role against loneliness and might enhance emotional well-being. This could be significant for developing mental health interventions that promote socialization as a buffer against loneliness. Identifying such patterns could help guide community and public health initiatives focused on improving access to social opportunities, particularly for those at risk of social isolation.

---
### Q2: Is there an association between self-perceived mental health, self-esteem, and relationship satisfaction, and to what extent does loneliness influence these associations?

This question seeks to examine the relationships between self-perceived mental health, self-esteem, and relationship satisfaction, with a particular focus on how loneliness may moderate these associations. By assessing if individuals with higher loneliness scores report lower self-esteem, mental health, and relationship satisfaction, we aim to understand loneliness's potential impact on these areas of well-being. 

By estimating the mean scores for self-perceived mental health, self-esteem, and relationship satisfaction across different levels of loneliness, we can see how these scores differ between individuals with higher and lower loneliness and determine whether the differences indicate significant associations.

I am interested in the relationship between loneliness levels and each of the well-being variables (self-esteem, mental health, and relationship satisfaction) and whether loneliness significantly influences the strength or direction of the associations between self-esteem, mental health, and relationship satisfaction.


1. **Self-Perceived Mental Health**  
   - Variables: `WELLNESS_self_rated_mental_health`, `WELLNESS_subjective_happiness_scale_happy`, `PSYCH_self_esteem_unknown_scale_think_of_me`
   - Response Scales:  
      - Mental health: Excellent, Very good, Good, Fair, Poor  
      - Happiness scale: 1-7 (1 - Not at all, 2, 3, 4, 5, 6, 7 - A great deal)
      - Self-esteem: 1-7 (1 - Not at all, 2, 3, 4, 5, 6, 7 - A great deal)
   - **Purpose**: These variables offer self-assessments on mental health and happiness, with an additional metric on self-esteem (worrying about others’ opinions). We aim to determine if high self-rated mental health and happiness align with higher self-esteem and satisfaction in relationships.
   - **Visualization**: A combination of histograms for the mental health and happiness distributions and scatter plots to examine associations between self-rated happiness and the other mental health variables. This approach will reveal underlying trends in perceived mental health.

2. **Self-Esteem Metrics**  
   - Variables: `PSYCH_rosenberg_self_esteem_satisfied`, `PSYCH_rosenberg_self_esteem_worth`, `PSYCH_rosenberg_self_esteem_good_qualities`
   - Response Scale: strongly agree, agree, disagree, strongly disagree
   - **Purpose**: These self-esteem metrics provide deeper insights into respondents’ sense of self-worth and satisfaction. By exploring these metrics in conjunction with self-perceived mental health and relationship satisfaction, we can assess if higher self-esteem correlates with better self-perceived mental health and greater relationship satisfaction.
   - **Visualization**: A series of stacked bar charts can display each self-esteem measure's distribution and allow comparison across levels of mental health. This will visually capture potential links between self-esteem and self-perceived mental health.

3. **Social Satisfaction**  
   - Variable: `WELLNESS_satisfied_relationship`
   - Response Categories: (Extremely dissatisfied, Dissatisfied, Somewhat dissatisfied, Neither satisfied nor dissatisfied, Somewhat satisfied, Satisfied, Extremely satisfied)
   - **Purpose**: Satisfaction in relationships may indicate overall well-being. Exploring its relationship with mental health and loneliness could reveal if stronger social ties improve mental health outcomes.
   - **Visualization**: A violin plot showing the spread of relationship satisfaction across self-esteem and mental health levels can highlight patterns in satisfaction, offering a detailed view of relationship satisfaction distribution.

4. **Loneliness Metric**  
   - Variable: `LONELY_existential_loneliness_scale_understand`
   - Response Scale: 1-9 (where 1 is strongly agree and 9 is strongly disagree)
   - **Purpose**: This variable lets us test whether high loneliness scores are inversely associated with self-perceived mental health and relationship satisfaction. We anticipate that higher loneliness scores are related to lower relationship satisfaction.
   - **Visualization**: A bar plot: categorize participants by loneliness level and plot the average relationship satisfaction score for each group. This shows how relationship satisfaction changes across levels of loneliness, making trends associated with loneliness easier to spot.
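The loneliness bar plot needs two preparation steps: coding the ordered satisfaction labels as numbers and grouping the 1-9 loneliness item into levels. A minimal sketch on made-up rows follows. The three bins (1-3, 4-6, 7-9), the 1-7 coding, and the exact label strings are assumptions, and the sketch treats higher values on the loneliness item as lonelier, which should be checked against the item wording.

```python
import pandas as pd

sat_col = "WELLNESS_satisfied_relationship"
lonely_col = "LONELY_existential_loneliness_scale_understand"

# Ordered from least to most satisfied, following the listed response categories.
sat_order = [
    "Extremely dissatisfied", "Dissatisfied", "Somewhat dissatisfied",
    "Neither satisfied nor dissatisfied", "Somewhat satisfied",
    "Satisfied", "Extremely satisfied",
]

# Hypothetical toy rows (not real survey data).
df = pd.DataFrame({
    lonely_col: [1, 2, 5, 6, 8, 9],
    sat_col: ["Extremely satisfied", "Satisfied", "Somewhat satisfied",
              "Neither satisfied nor dissatisfied", "Dissatisfied",
              "Extremely dissatisfied"],
})

# Code satisfaction as 1 (extremely dissatisfied) to 7 (extremely satisfied).
df[sat_col] = pd.Categorical(df[sat_col], categories=sat_order, ordered=True)
df["satisfaction_score"] = df[sat_col].cat.codes + 1

# Assumed grouping of the 1-9 item into three levels.
df["loneliness_level"] = pd.cut(
    df[lonely_col], bins=[0, 3, 6, 9],
    labels=["low (1-3)", "medium (4-6)", "high (7-9)"],
)

mean_sat = df.groupby("loneliness_level", observed=False)["satisfaction_score"].mean()
print(mean_sat)
```

The resulting `mean_sat` series holds one average per loneliness level, which is what the bar plot would display.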

#### Analysis Plan

1. **Summarizing Data with pandas**:
   - Begin by calculating basic summary statistics for each variable. For example, examine the average self-esteem score across different levels of self-perceived mental health and relationship satisfaction.
   - Visualize self-perceived mental health ratings (e.g., Excellent, Very Good, Good, Fair, Poor) against self-esteem and relationship satisfaction scores using grouped bar plots. These visualizations will help us quickly see any patterns or trends in the data, such as whether higher self-esteem or satisfaction ratings align with better self-rated mental health.

2. **Two-Sample Bootstrapped Confidence Intervals**:
   - We can apply bootstrapping to construct confidence intervals around the mean self-esteem and relationship satisfaction scores for each category of self-perceived mental health. This will allow us to estimate if the average self-esteem and relationship satisfaction significantly differ across mental health categories without relying on parametric assumptions.
   - For instance, by comparing confidence intervals, we can check if individuals with “Excellent” self-perceived mental health have significantly higher average self-esteem and relationship satisfaction than those with “Good” or “Fair” ratings.

3. **Permutation Test to Assess Associations**:
   - We can perform a permutation test to evaluate if there is a statistically significant association between self-perceived mental health and self-esteem. In this test, we randomly shuffle self-esteem scores across the sample and calculate a new distribution of associations, allowing us to see whether the observed association between self-esteem and mental health is likely due to chance.
   - This test can be repeated to assess the relationship between self-perceived mental health and relationship satisfaction, further validating any observed relationships between the variables.

4. **Linear Regression with Indicator Variable Contrasts**:
   - To quantify how self-perceived mental health relates to self-esteem and relationship satisfaction, we could conduct linear regression with indicator variables for each level of mental health (e.g., using “Fair” as a baseline and comparing “Poor,” “Good,” “Very Good,” and “Excellent” against it).
   - Here, self-esteem and relationship satisfaction will serve as the dependent variables in separate regression models. This approach will allow us to determine the strength and significance of the association between self-perceived mental health levels and these outcomes while controlling for potential confounding variables if available.
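The permutation test in this plan can be sketched for the simplest case, a difference in mean self-esteem between two mental health categories. The helper name `permutation_test_mean_diff`, the 1-4 coding of the Rosenberg responses, and the scores themselves are hypothetical.

```python
import numpy as np

def permutation_test_mean_diff(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        # Shuffling the pooled scores breaks any link between score and group.
        shuffled = rng.permutation(pooled)
        diff = shuffled[:len(a)].mean() - shuffled[len(a):].mean()
        if abs(diff) >= abs(observed) - 1e-12:
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical self-esteem scores coded 1-4 (4 = strongly agree), not real data.
excellent = [4, 4, 3, 4, 3, 4, 4, 3]
fair = [2, 1, 2, 2, 1, 3, 2, 1]

observed, p_value = permutation_test_mean_diff(excellent, fair)
print(f"observed difference = {observed:.3f}, p = {p_value:.4f}")
```

A small p-value means few shuffles produce a difference as large as the observed one. Comparing all five mental health categories at once would need a different test statistic than a two-group mean difference.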

#### Hypothesis and Expected Results

**Hypothesis**:
We hypothesize that individuals with higher self-perceived mental health ratings (e.g., "Excellent" or "Very Good") will report significantly higher self-esteem and relationship satisfaction than those with lower ratings (e.g., "Fair" or "Poor"). This hypothesis is based on the idea that positive mental health perceptions often correspond to higher self-esteem and satisfaction in relationships, potentially due to greater emotional stability, confidence, and perceived support in personal connections.

**Expected Results**:
1. **If the hypothesis is correct**, we would expect:
   - Bootstrapped confidence intervals to show a significant difference in self-esteem and relationship satisfaction between high and low mental health categories, with higher self-esteem and satisfaction scores for those with better mental health.
   - Permutation tests to yield a statistically significant p-value, indicating that the observed relationships between self-perceived mental health and the other two variables are unlikely to be due to chance.
   - In the linear regression models, we expect positive and significant coefficients for higher levels of self-perceived mental health when predicting self-esteem and relationship satisfaction, with "Excellent" or "Very Good" ratings being associated with higher outcomes.

2. **Interpretation**:
   - If these results are obtained, they would suggest that improving individuals’ self-perceived mental health could potentially enhance their self-esteem and relationship satisfaction. These insights could be valuable for mental health interventions that aim to strengthen self-perception and relational well-being as intertwined aspects of emotional health.

---
### Q3: How does social engagement during physical activities relate to feelings of loneliness and the ability to express one's true self?

This question explores whether social engagement during physical activities (such as sports or group exercise) is associated with lower loneliness and greater self-expression. By examining how social engagement relates to these outcomes, this study aims to highlight potential benefits of social participation in physical activity contexts. Estimating the average loneliness and self-expression scores across different levels of social engagement during physical activities will allow us to assess whether those who exercise socially experience lower loneliness and higher authenticity (or self-expression).

I am interested in the relationship between social engagement in physical activities and feelings of loneliness, as well as the relationship between social engagement in physical activities and self-expression or authenticity, examining if individuals who engage socially feel more comfortable being themselves.

1. **Social Engagement in Physical Activities**  
   - Variable: `EXERCISE_social_frequency`
   - Response Categories: Always, Often, Sometimes, Rarely, Never with others
   - **Purpose**: This variable shows how often the respondent socializes with others while exercising. We expect that more frequent social exercise is associated with a greater ability to express oneself and fewer feelings of loneliness.
   - **Visualization**: A bar chart showing the distribution of social engagement categories will provide a clear visual representation of respondent engagement frequency. Combined with the loneliness variable, it will help show whether frequent social engagement is associated with lower loneliness.

2. **Loneliness**  
   - Variable: `LONELY_ucla_loneliness_scale_left_out`
   - Response Categories: hardly, sometimes, often
   - **Purpose**: This variable measures how often respondents feel left out, used here as an indicator of loneliness. Higher loneliness is expected among those who rarely exercise socially.
   - **Visualization**: A grouped bar chart to show loneliness levels by exercise engagement frequency, making it easy to spot associations between social exercise and loneliness.

3. **Relational Satisfaction**  
   - Variable: `PSYCH_relational_needs_satisfaction_scale_7pt_true_self_without_rejection`
   - Response Scale: 1-9 (1 is strongly agree and 9 is strongly disagree)
   - **Purpose**: This variable gauges comfort with showing one’s true self. By exploring it alongside exercise engagement, we can see if more social exercise settings relate to higher relational satisfaction.
   - **Visualization**: A scatter plot of relational satisfaction scores by exercise frequency, with a trendline indicating the direction of any association. This visualization will help establish whether social engagement in physical activity is associated with greater relational comfort.
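A first look at the true-self item by social exercise frequency can be sketched with a grouped summary. The rows below are made up, and the short labels such as "Never" are assumptions about how the response categories are stored.

```python
import pandas as pd

freq_col = "EXERCISE_social_frequency"
self_col = "PSYCH_relational_needs_satisfaction_scale_7pt_true_self_without_rejection"

# Hypothetical toy rows; 1 = strongly agree, as described for this item.
df = pd.DataFrame({
    freq_col: ["Never", "Never", "Sometimes", "Sometimes", "Always", "Always"],
    self_col: [7, 9, 4, 6, 1, 3],
})

# An ordered categorical keeps the groups in a meaningful order in the summary.
order = ["Never", "Rarely", "Sometimes", "Often", "Always"]
df[freq_col] = pd.Categorical(df[freq_col], categories=order, ordered=True)

summary = df.groupby(freq_col, observed=True)[self_col].agg(["count", "mean", "std"])
print(summary)
```

Because 1 means "strongly agree" on this item, lower group means indicate greater comfort showing one's true self.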

#### Analysis Plan

1. **Summarizing Data with pandas**:
   - First, calculate descriptive statistics for each variable: the distribution of `EXERCISE_social_frequency`, and the loneliness and true-self scores at each level of social engagement during physical activity (Always, Often, Sometimes, Rarely, Never).
   - Visualize these relationships with bar or box plots to quickly identify trends, such as whether loneliness appears to fall, or comfort with showing one's true self appears to rise, with more frequent social exercise. These visualizations will provide an initial sense of any patterns between social exercise and the two outcome variables.

2. **Two-Sample Bootstrapped Confidence Intervals**:
   - For each outcome variable (loneliness and true-self expression), we will create confidence intervals to compare average scores between groups with different levels of social engagement in physical activity.
   - Bootstrapping will help us estimate whether differences between, for example, the “Always” and “Never” groups are statistically significant, without relying on parametric assumptions.

3. **Comparing Groups with Permutation Tests**:
   - To further investigate whether the differences in loneliness and true-self scores are statistically meaningful, we’ll perform permutation tests. For each outcome, we’ll compare the observed difference against the distribution obtained by randomly permuting group labels to see if the association with social exercise is likely due to chance.
   - For instance, this will allow us to test if always exercising with others is significantly associated with lower loneliness than never exercising with others.

4. **Linear Regression with Indicator Variable Contrasts**:
   - We can apply linear regression to further quantify the relationships. Using indicator variables for the levels of social exercise frequency, we can model each outcome variable (loneliness and true-self expression) separately. This approach will allow us to estimate how each level of social exercise is associated with each outcome relative to a baseline level.
   - In these models, “Never” could serve as the baseline category. We would then examine the coefficients for “Rarely,” “Sometimes,” “Often,” and “Always” to see if they are positive or negative and statistically significant.
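For the Q3 variables, a regression with indicator variables can be sketched with ordinary least squares. This minimal version uses `numpy.linalg.lstsq` rather than a full statistics package, so it returns coefficients only, without standard errors or p-values. The data, the three levels shown, and the short column name `true_self_score` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; true_self_score uses 1 = strongly agree.
df = pd.DataFrame({
    "EXERCISE_social_frequency": ["Never", "Never", "Sometimes",
                                  "Sometimes", "Always", "Always"],
    "true_self_score": [6.0, 8.0, 4.0, 6.0, 2.0, 4.0],
})

# The first listed category ("Never") becomes the baseline once it is dropped.
levels = ["Never", "Sometimes", "Always"]
df["EXERCISE_social_frequency"] = pd.Categorical(
    df["EXERCISE_social_frequency"], categories=levels
)
dummies = pd.get_dummies(
    df["EXERCISE_social_frequency"], prefix="freq", drop_first=True
)

# Design matrix: an intercept column plus one indicator per non-baseline level.
X = np.column_stack([np.ones(len(df)), dummies.to_numpy(dtype=float)])
names = ["Intercept"] + list(dummies.columns)

coefs, *_ = np.linalg.lstsq(X, df["true_self_score"].to_numpy(), rcond=None)
result = pd.Series(coefs, index=names)
print(result)
```

With a single categorical predictor, the intercept equals the baseline group mean and each indicator coefficient equals that group's mean minus the baseline mean. Here the negative coefficients mean stronger agreement with the true-self item than in the "Never" group.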

#### Hypothesis and Expected Results

**Hypothesis**:
We hypothesize that more frequent social engagement during physical activities is associated with lower loneliness and greater comfort expressing one's true self. This follows the expectation stated for the variables above: the more respondents interact with others while exercising, the greater their ability to express themselves and the fewer their feelings of loneliness.

**Expected Results**:
1. **If the hypothesis holds**:
   - Bootstrapped confidence intervals would show significant differences between the most and least socially engaged groups, with lower average loneliness and stronger agreement with the true-self item among those who exercise with others more often.
   - Permutation tests are likely to yield statistically significant p-values, especially when comparing those who always exercise with others to those who never do, indicating a non-random association between social exercise and these outcomes.
   - Linear regression would reveal coefficients for the more frequent social exercise levels that point toward lower loneliness and stronger agreement with the true-self item relative to the “Never” baseline, suggesting that more frequent social exercise correlates with better outcomes in these areas.

2. **Interpretation**:
   - If these results are found, it would suggest that exercising with others is associated with lower loneliness and greater comfort with authentic self-expression. This insight could guide well-being recommendations, highlighting the potential value of social physical activity for reducing loneliness and supporting relational satisfaction.