<a href="https://colab.research.google.com/github/jcdumlao14/ESS11DataAnalysis/blob/main/Part_2_CFA_Inter_Marginal_Effects.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Part 2: CFA Inter-Marginal Effects**

# **Interaction Model Results**

## **Summary of Models 1-6: CVD Risk Factors (with Caveats)**
**Models 1, 5, and 6: Unreliable Results**

Models 1 (BMI x Education), 5 (Main Activity x Education), and 6 (Sleep Quality x Age) produced `NaN` values and extremely large coefficients, so their estimates cannot be interpreted. This pattern usually points to an estimation problem such as complete or quasi-complete separation, sparse interaction cells, or strong collinearity. The cause needs to be identified and fixed before these models can be used.
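
`NaN` values and very large coefficients in a logistic interaction model often arise when some combination of categories contains only cases or only non-cases. A quick check is to tabulate the outcome within each interaction cell. The sketch below is only an illustration under that assumption: the function name `degenerate_cells` and the toy records are hypothetical and are not taken from the notebook.

```python
from collections import defaultdict

def degenerate_cells(records):
    """Return interaction cells in which the binary outcome never varies.

    records: iterable of (level_a, level_b, outcome) with outcome 0 or 1.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_outcome_0, n_outcome_1]
    for level_a, level_b, outcome in records:
        counts[(level_a, level_b)][outcome] += 1
    return {
        cell: tuple(n)
        for cell, n in counts.items()
        if n[0] == 0 or n[1] == 0
    }

# Hypothetical toy data: (main activity, education level, CVD yes/no)
toy_records = [
    ("employed", "tertiary", 0), ("employed", "tertiary", 1),
    ("employed", "primary", 0), ("employed", "primary", 1),
    ("retired", "tertiary", 1), ("retired", "tertiary", 1),  # only cases
    ("retired", "primary", 0), ("retired", "primary", 1),
]

problem_cells = degenerate_cells(toy_records)
print(problem_cells)
```

Any cell returned by this check is a candidate for merging categories or for a penalized fit before the interaction model is re-estimated.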

**Model 2: Smoking and Gender Interaction**
- Smoking is correlated with CVD risk.
- Gender also plays a role.
- A significant interaction suggests the effect of smoking on CVD risk may be different for males versus females.
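
A minimal way to write this kind of interaction model, assuming a logistic link and a 0/1 gender indicator (the symbols $\beta_0, \dots, \beta_3$ and the coding of $\text{female}$ are notation chosen here, not taken from the notebook):

$$
\operatorname{logit} P(\text{CVD} = 1) = \beta_0 + \beta_1\,\text{smoking} + \beta_2\,\text{female} + \beta_3\,(\text{smoking} \times \text{female})
$$

Under this parameterization the smoking slope on the log-odds scale is $\beta_1$ for males and $\beta_1 + \beta_3$ for females, so a significant $\beta_3$ is what indicates that the effect of smoking differs by gender.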

**Model 3: Region and BMI Interaction**
- BMI is associated with CVD, and the BMI interaction terms are significant for the North, West, and Unknown regions.
- However, the distribution is hard to read from the current plots.

**Model 4: Alcohol and Gender Interaction**
- The interaction terms are significant: the association between weekday alcohol consumption and CVD risk differs between genders.



# **Marginal Effects Plots**

**Key Observations and Interpretations:**

**Smoking and Gender (Marginal Effects of Smoking on CVD by Gender)**:
- For males, the curve slopes downward as the smoking level increases from 1 to 5, so the predicted probability of CVD decreases over that range (the direction of this effect depends on how the smoking levels are coded).
- For females, the predicted probability of CVD rises sharply between smoking levels 5 and 9.
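
As a rough illustration of how such group-specific curves are obtained, the sketch below computes predicted probabilities and the marginal effect of smoking for each gender from a logistic model with an interaction term. The coefficients are hypothetical placeholders, not estimates from the notebook, and smoking is treated as a numeric score for simplicity even though the plotted levels are ordinal codes.

```python
import math

# Hypothetical coefficients on the log-odds scale (not estimates from the notebook)
COEF = {"intercept": -2.0, "smoking": -0.10, "female": -0.30, "smoking_x_female": 0.25}

def predicted_probability(smoking, female, coef=COEF):
    """P(CVD = 1) from a logistic model with a smoking x gender interaction."""
    eta = (coef["intercept"]
           + coef["smoking"] * smoking
           + coef["female"] * female
           + coef["smoking_x_female"] * smoking * female)
    return 1.0 / (1.0 + math.exp(-eta))

def marginal_effect_of_smoking(smoking, female, coef=COEF):
    """Derivative of P(CVD = 1) with respect to smoking: p * (1 - p) * slope."""
    p = predicted_probability(smoking, female, coef)
    slope = coef["smoking"] + coef["smoking_x_female"] * female
    return p * (1.0 - p) * slope

for female, label in [(0, "male"), (1, "female")]:
    curve = [round(predicted_probability(s, female), 3) for s in range(1, 6)]
    print(label, curve, round(marginal_effect_of_smoking(3, female), 4))
```

With these placeholder values the male curve falls and the female curve rises, which is the kind of sign difference an interaction term allows.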

**Sleep Quality and Age (Marginal Effects of Sleep Quality on CVD at Mean Age)**:
- This plot is essentially a step function: below a sleep quality level of about 4, the predicted probability of CVD is near zero.
- At level 4, the probability jumps sharply to 1. If the model were valid, this would imply a hard threshold in sleep quality. Because Model 6 produced `NaN` values and extreme coefficients, the step is more likely an artifact of the failed fit than evidence of a real threshold.
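
A step-shaped probability curve is what a logistic model tends to produce when the outcome is perfectly separated by a predictor: the likelihood has no finite maximum, so the slope keeps growing as the optimizer runs. The sketch below demonstrates this on hypothetical toy data with a plain gradient-ascent fit. It is not the notebook's estimation routine, and the names `fit_slope`, `levels`, and `cvd` are made up for illustration.

```python
import math

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_slope(x, y, steps, learning_rate=0.5):
    """Gradient ascent on the logistic log-likelihood; returns (intercept, slope)."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        grad0 = sum(yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y))
        grad1 = sum((yi - sigmoid(b0 + b1 * xi)) * xi for xi, yi in zip(x, y))
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1

# Hypothetical, perfectly separated data: CVD = 1 exactly when level >= 4
levels = [1, 2, 3, 4, 5, 6]
cvd = [0, 0, 0, 1, 1, 1]
centered = [v - 3.5 for v in levels]  # center so the threshold sits at 0

_, slope_short = fit_slope(centered, cvd, steps=200)
b0_long, slope_long = fit_slope(centered, cvd, steps=5000)

# The slope keeps growing with more iterations instead of converging
print(round(slope_short, 2), round(slope_long, 2))
# Fitted probabilities just below (level 3) and just above (level 4) the threshold
print(round(sigmoid(b0_long + slope_long * -0.5), 3),
      round(sigmoid(b0_long + slope_long * 0.5), 3))
```

If the real data behave like this toy example, the jump in the plot reflects the separation problem and not a property of sleep quality.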


**BMI and Region (Marginal Effects of BMI on CVD by Region)**:
- In the West region, the predicted probability of CVD increases as BMI increases.
- The South region shows a similar trend to the West.
- In the North region, the predicted probability of CVD decreases as BMI increases.
- The Unknown region has the steepest slope, which may indicate a stronger association between BMI and CVD in that group.
- The East region shows an approximately horizontal line, i.e., little change in predicted CVD probability with BMI. It appears to be the baseline against which the other regions' effects are measured.


**Main Activity and Education (Marginal Effects of Main Activity on CVD by Education Level)**:
- The marginal-effect curves show no visible interaction. Note that the underlying model (Model 5) was flagged as unreliable.


**BMI and Education (Marginal Effects of BMI on CVD by Education Level)**:
- The marginal-effect curves show no visible interaction. Note that the underlying model (Model 1) was flagged as unreliable.


**Alcohol Consumption and Gender (Marginal Effects of Weekday Alcohol on CVD by Gender):**
- For males, the curve shows a slight increase in the predicted probability of CVD as weekday alcohol consumption increases.
- For females, the slope is much steeper, indicating that weekday alcohol consumption is more strongly associated with CVD for females than for males.




# **Partial Correlation Network**

**Key Observations and Interpretations**

- **Partial Correlation Network**: The image depicts a network where nodes represent variables (e.g., lifestyle factors, demographics, health conditions), and edges (lines) represent partial correlations between them. A partial correlation measures the relationship between two variables after controlling for the other variables in the network. This matters because it separates direct (conditional) associations from those that are explained by other variables; it does not by itself establish causation.
- **Density**: The network is dense. The degree centralities in the table below range from 0.579 to 0.947, so every node is linked to more than half of the other nodes. This suggests that many of the variables are interrelated.
- **Layout**: The nodes are arranged on a circle, which makes clusters harder to see but makes connections between nodes on opposite sides easier to follow.
- **Color Mapping and Betweenness Centrality**: Node color is determined by betweenness centrality, with redder nodes having higher betweenness and bluer nodes having lower betweenness. The color scale indicates the range of betweenness centrality values. This helps to visually identify which nodes act as bridges between different parts of the network.
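
To make the idea of "controlling for the other variables" concrete, the sketch below computes partial correlations from the inverse covariance (precision) matrix using NumPy. This is one standard way to obtain them and may differ from the notebook's method. The simulated variables `x`, `y`, and `z` are hypothetical: `x` and `y` are related only through `z`.

```python
import numpy as np

def partial_correlations(data):
    """Partial correlation of every pair of columns given all other columns.

    data: array of shape (n_samples, n_variables).
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

# Hypothetical data: x and y both depend on z but not on each other
rng = np.random.default_rng(0)
z = rng.normal(size=5000)
x = z + 0.5 * rng.normal(size=5000)
y = z + 0.5 * rng.normal(size=5000)
data = np.column_stack([x, y, z])

raw = np.corrcoef(data, rowvar=False)
partial = partial_correlations(data)
print("raw corr(x, y):    ", round(raw[0, 1], 3))
print("partial corr(x, y):", round(partial[0, 1], 3))
```

The raw correlation between `x` and `y` is large, while their partial correlation given `z` is close to zero, so a partial correlation network would draw no edge (or only a very weak one) between them.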


**Centrality Measures**

| Node           | Degree Centrality | Betweenness Centrality | Eigenvector Centrality |
|----------------|-------------------|------------------------|------------------------|
| CVD            | 0.789             | 0.0198                 | 0.225                 |
| BMI            | 0.789             | 0.0193                 | 0.226                 |
| cgtsmok        | 0.895             | 0.0311                 | 0.256                 |
| alcfreq        | 0.947             | 0.0310                 | 0.275                 |
| etfruit        | 0.842             | 0.0254                 | 0.239                 |
| eatveg         | 0.632             | 0.0107                 | 0.174                 |
| dosprt         | 0.737             | 0.0133                 | 0.213                 |
| mainact        | 0.737             | 0.0109                 | 0.219                 |
| slprl          | 0.895             | 0.0243                 | 0.260                 |
| alcwkdy        | 0.579             | 0.0044                 | 0.168                 |
| alcwknd        | 0.842             | 0.0188                 | 0.247                 |
| ppltrst        | 0.737             | 0.0177                 | 0.206                 |
| pplfair_r      | 0.579             | 0.0065                 | 0.156                 |
| pplhlp_r       | 0.684             | 0.0127                 | 0.191                 |
| agea           | 0.684             | 0.0074                 | 0.202                 |
| gndr           | 0.842             | 0.0243                 | 0.241                 |
| region_North   | 0.895             | 0.0312                 | 0.252                 |
| region_South   | 0.789             | 0.0241                 | 0.221                 |
| region_Unknown | 0.737             | 0.0108                 | 0.217                 |
| region_West    | 0.842             | 0.0247                 | 0.240                 |
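
The degree centralities in the table can be converted back into neighbor counts and an overall network density. The sketch below assumes the values were normalized as degree / (n - 1), which is the convention networkx uses. That assumption is not confirmed by the notebook text, but it yields whole-number degrees for every row.

```python
# Degree centrality values copied from the table above
degree_centrality = {
    "CVD": 0.789, "BMI": 0.789, "cgtsmok": 0.895, "alcfreq": 0.947,
    "etfruit": 0.842, "eatveg": 0.632, "dosprt": 0.737, "mainact": 0.737,
    "slprl": 0.895, "alcwkdy": 0.579, "alcwknd": 0.842, "ppltrst": 0.737,
    "pplfair_r": 0.579, "pplhlp_r": 0.684, "agea": 0.684, "gndr": 0.842,
    "region_North": 0.895, "region_South": 0.789,
    "region_Unknown": 0.737, "region_West": 0.842,
}

n_nodes = len(degree_centrality)

# Assumption: centrality = degree / (n - 1), so degree = centrality * (n - 1)
neighbors = {node: round(c * (n_nodes - 1)) for node, c in degree_centrality.items()}

n_edges = sum(neighbors.values()) // 2       # each edge is counted at both ends
max_edges = n_nodes * (n_nodes - 1) // 2     # edges in a complete graph
density = n_edges / max_edges

print(neighbors["alcfreq"], neighbors["alcwkdy"])
print(n_edges, max_edges, round(density, 3))
```

Under this assumption the most connected node (alcfreq) is linked to 18 of the 19 other nodes, and the network contains 147 of 190 possible edges.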

# **Centrality Measures and Node Importance:**

- **High Betweenness Centrality Nodes (Red/Orange)**:
  - **alcfreq (Alcohol Frequency)**: The most connected node in the network (degree centrality 0.947) with one of the highest betweenness values (0.0310). Alcohol frequency is associated with most other variables and bridges many parts of the network.
  - **cgtsmok (Cigarette Smoking)**: Like alcfreq, a central node linking different factors (betweenness 0.0311), which emphasizes the role of smoking in the health/lifestyle network.
  - **region_North, region_South, region_West**: The orange color of these region nodes suggests that they act as bridges. They do not mainly interact with each other but with other nodes, so the regions are associated with different parts of the network. region_North has the highest betweenness in the table (0.0312).
  - **etfruit (Fruit Consumption)**: Also appears important by centrality (betweenness 0.0254).
- **Low Betweenness Centrality Nodes (Blue)**:
  - **pplfair_r (Perception of Fairness)**: Low betweenness (0.0065) means that few shortest paths between other nodes pass through this node, so it rarely acts as a bridge.
  - **alcwkdy (Alcohol on Weekdays)**: Like pplfair_r, it has little bridging role. Its betweenness (0.0044) is the lowest in the table.
  - **agea (Age)**: Also has low betweenness (0.0074).
- **Connecting observations with centrality measures:**
  - **ppltrst (Trust in People)**: The lighter gray color suggests a fairly neutral role (betweenness 0.0177). The node connects to different parts of the network without being a strong bridge.
  - **BMI and CVD**: Both have a gray/orange color (betweenness 0.0193 and 0.0198), so they sit in well-connected areas of the network but are less important as bridges than the redder nodes.
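
Betweenness centrality counts how often a node lies on shortest paths between other pairs of nodes. The sketch below implements Brandes' algorithm for an unweighted, undirected graph and normalizes by the number of node pairs, as networkx does by default. It is an illustration on a hypothetical toy graph, not the notebook's computation, which may use edge weights.

```python
from collections import deque

def betweenness_centrality(adjacency):
    """Normalized betweenness for an unweighted, undirected graph.

    adjacency: dict mapping each node to an iterable of its neighbors.
    """
    nodes = list(adjacency)
    score = dict.fromkeys(nodes, 0.0)
    for source in nodes:
        order = []                                # nodes in order of discovery
        preds = {v: [] for v in nodes}            # predecessors on shortest paths
        n_paths = dict.fromkeys(nodes, 0.0)       # number of shortest paths from source
        dist = dict.fromkeys(nodes, -1)
        n_paths[source], dist[source] = 1.0, 0
        queue = deque([source])
        while queue:                              # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):                 # accumulate from the farthest nodes back
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    n = len(nodes)
    # Each unordered pair was counted twice, and there are (n-1)(n-2)/2 pairs
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: s * scale for v, s in score.items()}

# Hypothetical toy graph: two triangles joined through the bridge node "d"
toy_graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d", "f", "g"], "f": ["e", "g"], "g": ["e", "f"],
}

toy_betweenness = betweenness_centrality(toy_graph)
for node, value in sorted(toy_betweenness.items(), key=lambda kv: -kv[1]):
    print(node, round(value, 3))
```

In the toy graph the bridge node `d` scores highest even though it has only two neighbors, which shows why betweenness and degree can rank nodes differently.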

# **General Interpretations & Hypotheses:**

- **Lifestyle Factors and Health**: The network highlights the strong interconnections between lifestyle factors (diet, alcohol, smoking, exercise) and health outcomes (BMI and, potentially, CVD and mental wellbeing). The graph shows the direct (conditional) associations of these lifestyle factors, not causal effects.
- **Social Determinants**: Social determinants of health (like trust in people, fairness) might have more indirect effects in this network, influencing certain aspects (potentially psychological health or adherence to healthy behaviors) but not acting as central bridges.
- **Region and Lifestyle**: The regional indicators suggest there might be some geographic differences in lifestyle patterns and health, independent of other factors.



**In summary**, this partial correlation network gives a useful overview of the direct relationships between various factors influencing health and lifestyle. The centrality measures, particularly betweenness, highlight the key nodes that act as bridges connecting different parts of the network. Further analysis, with the explicit values and strengths of each measure, and potentially more focused examination of subgroups of variables, would be needed for a more detailed understanding.


#### 👉 See the plot at this link: https://github.com/jcdumlao14/ESS11DataAnalysis/blob/main/CFA_Inter_Marginal_Effects.ipynb