# Research Analysis Proposal: CSCS
---

### **Analysis 1: Impact of Social Media on Perceived Social Connection**

**Research Question:**  
What is the relationship between the frequency of social media usage and social connection levels with friends among Canadian respondents?

**Variables:**  
1. **CONNECTION_social_media_time_per_day** (independent quantitative variable): This measures how many hours respondents have spent using social media per day in the past week. 
<br>
Visualization: <br>
KDE - A kernel density estimate (KDE) can illustrate the distribution of hours spent on social media. It shows the common ranges of social media usage among respondents and highlights any patterns or outliers in the data. Because it represents all of the data as a smooth curve, it is easy to see where most of the data falls and whether there are any trends.

<br>

2. **CONNECTION_social_time_friends_p7d** (dependent quantitative variable): This measures how many hours respondents have spent time with friends in the past week. It reflects the extent of real-life social interactions, which is the outcome of interest in this analysis.
<br>
Visualization: <br>
Box Plot - A box plot clearly shows the median and interquartile range for how many hours respondents have spent with their friends. It indicates outliers which can be useful for this data since some people may respond very differently for this type of question. 
<br>
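
As a rough illustration of what the box plot would summarize, the sketch below computes the quartiles and the conventional 1.5 × IQR outlier fences in plain Python. The `hours_with_friends` values are made up for illustration and are not CSCS responses, and `box_plot_summary` is a hypothetical helper name.

```python
import statistics

def box_plot_summary(values):
    """Return the quartiles and the 1.5 * IQR outliers that a box plot displays."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Points beyond the fences are the ones a box plot draws individually.
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}

# Hypothetical weekly hours spent with friends (NOT real CSCS data).
hours_with_friends = [0, 2, 3, 4, 5, 5, 6, 8, 10, 12, 40]
summary = box_plot_summary(hours_with_friends)
print(summary)
```

A respondent reporting an unusually large number of hours would fall above the upper fence and be listed under `outliers`, which is the behaviour the box plot is chosen for here.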

**Assumptions:**
<br>
- Linearity: There is a linear relationship between the independent variable (social media usage) and the dependent variable (time spent with friends). This can be assessed visually using scatter plots.
- Independence: Observations are independent of each other. Each respondent's social media usage and time spent with friends should not be influenced by others' responses.
- Homoscedasticity: The variance of the residuals (errors) is constant across all levels of the independent variable. You can check this by plotting the residuals against the predicted values and looking for a consistent spread.
- Normality of Residuals: The residuals of the model should be approximately normally distributed. This can be assessed using a histogram or a Q-Q plot of the residuals.

**Hypothesis: It is expected that as the frequency of social media usage increases, the social time spent with friends will decrease.**

**Planned Analysis: Simple Linear Regression**  
<br>
$Y = \beta_0 + \beta_1 x + \epsilon$
<br>
The coefficient $\beta_1$ will indicate the direction and strength of the relationship. A negative $\beta_1$ suggests that higher social media usage is associated with less time spent with friends. The null hypothesis is that $\beta_1 = 0$. The alternative hypothesis is that the null hypothesis is false.
<br>
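
A minimal sketch of the planned regression, written with the standard library only. It estimates the intercept and slope by least squares and reports the t statistic for the null hypothesis that the slope is zero. The function name `simple_linear_regression` and the data are assumptions for illustration, not CSCS values.

```python
import math
import statistics

def simple_linear_regression(x, y):
    """Fit y = b0 + b1 * x by least squares and return the t statistic for H0: b1 = 0."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope estimate
    b0 = y_bar - b1 * x_bar             # intercept estimate
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return {"b0": b0, "b1": b1, "se_b1": se_b1, "t": b1 / se_b1}

# Hypothetical data (NOT CSCS responses).
media_hours = [0.5, 1, 1.5, 2, 2.5, 3, 4, 5, 6, 8]
friend_hours = [14, 12, 13, 10, 9, 10, 7, 6, 5, 2]

result = simple_linear_regression(media_hours, friend_hours)
print(result)
```

The p-value would come from comparing `t` to a t distribution with n − 2 degrees of freedom, which a statistics package would normally report alongside the fit.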

**Visualization** - Scatter Plot <br> 
A scatter plot will visually illustrate any potential linear relationship and can help in interpreting the correlation results. It allows for a clear visual representation of the relationship (or lack thereof) between the two variables.  

**Hypothesized Results**<br>  
It's expected that as social media usage increases, time spent with friends will decrease. So, a negative correlation is expected. This is because social media often takes away time that could be spent interacting face-to-face and shifts the social world online. These findings could have important implications for understanding how digital communication affects real-life social relationships and may inform recommendations for balancing online and offline social interactions.

If the null hypothesis is rejected and the estimated slope is negative, it suggests a significant relationship where higher social media usage is linked to reduced in-person interactions. If the null hypothesis is not rejected, it suggests that there is not enough evidence to support the claim that social media usage lowers time spent with friends. 


---

### **Analysis 2: Effect of Income on Participation in Social Activities**

**Research Question:**  
What is the relationship between income level and the frequency of participation in social activities?

**Variables:**  
1. **DEMO_household_income** (independent quantitative variable separated into categories): Respondents' yearly household income until Dec. 31, 2022.<br>
Visualization: <br>
Bar Plot - Since the question has already separated the income values into ranges for the respondents to pick from, a bar plot would be the best visualization tool. A bar plot can effectively display the distribution of household incomes among respondents, separated into bins of income ranges. 
<br>

2. **CONNECTION_activities_meeting_organization_p3m** (dependent qualitative variable): How often respondents have attended a meeting of other organization(s) (i.e. outside of work).<br>
Visualization: <br>
Bar Plot - A bar plot can be used to display the frequency of responses for social activity participation. Each bar would represent a category of participation (e.g., "Never," "Monthly," etc.), with the height of the bars indicating the number of respondents in each category. This visualization provides a clear overview of how respondents engage socially outside of work and can help identify trends in participation related to different income levels.
<br>

**Assumptions:**

- Independence of Observations: Each respondent's income and participation level should be independent. One respondent's answers should not influence another's.
- Sampling Distribution: The sampling distribution of the sample mean should be approximately normal. 
- Homogeneity of Variance: Across different income groups, the variances among groups should be roughly equal.

**Hypothesis: It is expected that higher income levels will lead to increased participation in social activities among respondents.**

**Planned Analysis: One-sided Hypothesis Test** <br>  
Null Hypothesis ($H_0$): There is no increase in participation in social activities with higher income (i.e., mean participation frequency is constant or decreases).
<br> Alternative Hypothesis ($H_a$): Higher income leads to increased participation in social activities (i.e., mean participation frequency increases).
<br> P-Value Calculation: Use bootstrapping to create a sampling distribution under the null hypothesis and calculate the p-value for the one-sided test.

If the p-value < $\alpha = 0.05$, reject $H_0$ (indicating that income levels influence social activity participation positively).
<br> If the p-value $\geq \alpha$, fail to reject $H_0$ (suggesting insufficient evidence to support that higher income leads to increased participation).
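
One way the bootstrapped one-sided test could look is sketched below. It makes several assumptions that are not in the proposal: participation categories are coded as ordinal numbers, income is collapsed into a lower and a higher group, and the null distribution is simulated by resampling both groups from the pooled responses. The data and the name `bootstrap_one_sided_p_value` are hypothetical.

```python
import random
import statistics

def bootstrap_one_sided_p_value(low_group, high_group, n_resamples=5000, seed=0):
    """Estimate P(diff >= observed | H0), where diff = mean(high) - mean(low)."""
    rng = random.Random(seed)
    observed = statistics.fmean(high_group) - statistics.fmean(low_group)
    # Under H0 both groups share one distribution, so resample from the pooled data.
    pooled = list(low_group) + list(high_group)
    count = 0
    for _ in range(n_resamples):
        sim_low = rng.choices(pooled, k=len(low_group))
        sim_high = rng.choices(pooled, k=len(high_group))
        if statistics.fmean(sim_high) - statistics.fmean(sim_low) >= observed:
            count += 1
    return observed, count / n_resamples

# Hypothetical ordinal coding (an assumption): 0 = "Never", larger = more frequent.
lower_income = [0, 0, 1, 0, 2, 1, 0, 1, 0, 1, 2, 0]
higher_income = [1, 2, 3, 2, 1, 3, 2, 0, 2, 3, 1, 2]

observed_diff, p_value = bootstrap_one_sided_p_value(lower_income, higher_income)
alpha = 0.05
print(f"observed difference = {observed_diff:.3f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Collapsing income into two groups keeps the sketch short. With the full set of income categories, a statistic such as the slope of mean participation across ordered income bins could be resampled in the same way.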

**Visualization** - Box Plot
<br> A box plot is ideal for comparing the distribution of social activity participation across different income groups. It provides a clear view of the median, quartiles, and potential outliers for each income category, allowing you to see differences in participation frequency easily.
By categorizing income into ranges, the box plot can visually highlight any trends or disparities in social activity participation associated with those income levels.

**Hypothesized Results**  
<br> It is expected that respondents with higher income levels will report higher participation levels in community events. So, the null hypothesis is expected to be rejected. If income significantly affects social participation, it may indicate that economic resources support social engagement. If no significant difference is found, this could suggest that social participation is accessible across income levels. Understanding this relationship could help policymakers and community organizers design programs that promote inclusivity and access to social activities, particularly for lower-income populations.

---

### **Analysis 3: Association Between Age and Loneliness**

**Research Question:**  
What is the relationship between age and levels of loneliness?

**Variables:**  
1. **DEMO_age** (independent quantitative variable): Ages of respondents. <br>
Visualization: <br>
Histogram - A histogram is the best visualization method for the ages of the respondents because ages can be grouped into bins of different age ranges. This makes it easier to analyse trends among different age groups such as teenagers, middle-aged adults, and seniors.
<br>

2. **LONELY_dejong_emotional_social_loneliness_scale_miss** (dependent qualitative variable): The extent to which the following phrase applies to respondents' situations. - "I miss having people around." <br>
Visualization: <br>
Bar Plot - A bar plot can clearly show the frequency of data in each qualitative option that was given to the respondents. These ordinal categories can then be organized into a bar plot with the frequency of data for each, making it easy to see if there were more common answers than others.

**Assumptions:**

- Independence of Observations: Each respondent’s age and loneliness level should be independent. Each individual's responses should not affect others.
- Sampling Distribution: The sampling distribution of the sample mean should be approximately normal.
- Homogeneity of Variance: The variances among groups should be roughly equal.

**Planned Analysis: One-sided Hypothesis Test** <br>  
Null Hypothesis ($H_0$): There is no increase in loneliness levels with age (i.e., age does not contribute to higher loneliness).
<br> Alternative Hypothesis ($H_a$): Increased age is associated with higher loneliness levels (i.e., loneliness increases with age).
<br> P-Value Calculation: Use bootstrapping to create a sampling distribution and calculate the p-value for the one-sided hypothesis test.

If the p-value < $\alpha = 0.05$, reject $H_0$ (suggesting that age is associated with increased loneliness).
<br> If the p-value $\geq \alpha$, fail to reject $H_0$ (indicating insufficient evidence to support the claim that age is associated with increased loneliness).
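
A possible shape for this test is sketched below, under assumptions the proposal does not state: loneliness responses are coded as ordinal numbers, the test statistic is the least-squares slope of loneliness on age, and the null distribution is simulated by resampling ages and scores independently so that any association is broken. The data and function names are hypothetical.

```python
import random
import statistics

def slope(x, y):
    """Least-squares slope of y on x (0.0 if x has no variation)."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        return 0.0
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx

def bootstrap_slope_p_value(ages, scores, n_resamples=5000, seed=0):
    """Estimate P(slope >= observed | H0: no association between age and score)."""
    rng = random.Random(seed)
    observed = slope(ages, scores)
    n = len(ages)
    count = 0
    for _ in range(n_resamples):
        # Resampling the two columns independently simulates "no association".
        sim_ages = rng.choices(ages, k=n)
        sim_scores = rng.choices(scores, k=n)
        if slope(sim_ages, sim_scores) >= observed:
            count += 1
    return observed, count / n_resamples

# Hypothetical data (NOT CSCS responses); higher score = misses having people around more.
ages = [19, 23, 27, 31, 36, 42, 48, 55, 61, 67, 72, 80]
loneliness = [0, 0, 1, 0, 1, 0, 1, 1, 2, 1, 2, 2]

observed_slope, p_value = bootstrap_slope_p_value(ages, loneliness)
print(f"observed slope = {observed_slope:.4f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

Because the response is ordinal, the slope is only a convenient summary of direction. A rank-based statistic could be substituted for `slope` without changing the resampling logic.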

**Visualization** - Violin Plot <br>
A violin plot shows the median and interquartile range while also illustrating the shape of the distribution, giving insight into how loneliness levels vary by age.


**Hypothesized Results** <br>

It’s expected that older age groups may report higher loneliness levels, especially relating to missing having people around. So, the null hypothesis is expected to be rejected. This could be due to lifestyle changes: for example, their children may have moved away, leaving fewer people around them at home. Understanding age and loneliness associations can guide mental health and social support initiatives, improving targeted interventions based on age demographics.

---


Project Group Request: Edie Chen, Jason Li, and Zain Elsayed