# Statistical Analysis Plan
## by Sam Colgan and Joseph Lee

To address our first objective, which concerns the prevalence of depression among gun owners, we will cross-tabulate depression status (`ADDEPEV3`) against gun ownership status (`FIREARM5`) in a contingency table. `ADDEPEV3` is a binary indicator for whether the respondent has ever been told they have a depressive disorder, while `FIREARM5` is a binary indicator for whether the respondent has firearms in the home. We will evaluate whether there is a statistically significant association between the two variables using a chi-square test of independence.
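As a minimal sketch of the cross-tabulation and test described above, the snippet below uses a small synthetic data frame in place of the survey file. It assumes both variables have already been recoded to 1 (yes) and 0 (no); the data values are made up purely for illustration.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Synthetic stand-in for the survey data (assumption: both variables
# have already been recoded so that 1 = yes and 0 = no).
df = pd.DataFrame({
    "ADDEPEV3": [1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0],
    "FIREARM5": [1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0],
})

# Counts of depression status within each gun ownership group.
table = pd.crosstab(df["FIREARM5"], df["ADDEPEV3"])

# Row proportions: the share with depression among owners and non-owners.
prevalence = pd.crosstab(df["FIREARM5"], df["ADDEPEV3"], normalize="index")

# Chi-square test of independence on the table of counts.
chi2, p_value, dof, expected = chi2_contingency(table)

print(table)
print(prevalence.round(3))
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.3f}")
```

The row for `FIREARM5 == 1` in `prevalence` gives the proportion of gun owners reporting depression, which is the quantity the first objective targets.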

Our second objective addresses whether there is an association between depression (`ADDEPEV3`) / poor mental health status (`_MENT14D`) and gun ownership (`FIREARM5`). `_MENT14D` is an ordinal variable where level 1 corresponds to 0 days of poor mental health status within the past 30 days, level 2 corresponds to 1–13 days, and level 3 corresponds to 14 or more days. 
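One way to represent the three ordered levels in `pandas` is an ordered categorical, sketched below. The label strings are illustrative, and treating a code of 9 as missing is an assumption made for this example rather than something stated in the plan.

```python
import pandas as pd

# Level labels as described above; the label text itself is illustrative.
ment_labels = {1: "0 days", 2: "1-13 days", 3: "14+ days"}

# Toy values; the code 9 stands in for a hypothetical "missing" code.
raw = pd.Series([1, 3, 2, 1, 9, 2], name="_MENT14D")

# Codes absent from the mapping (here 9) become NaN.
ment14d = pd.Categorical(
    raw.map(ment_labels),
    categories=list(ment_labels.values()),
    ordered=True,
)

print(ment14d)
```

Listing "0 days" first keeps it as the lowest level, which makes it straightforward to use as the reference category later.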

We will evaluate this objective by fitting two logistic regressions in which gun ownership is the dependent variable. In the first regression, depression status is the predictor of interest, and the controls are age group (`_AGE80`; 5-year age categories from 18 to 80+), sex (`SEXVAR`; male or female), race/ethnicity (`_RACEGR3`; white, Black, other, multiracial, or Hispanic), health insurance (`PRIMINS1`; private, public, other, or uninsured), highest level of education (`EDUCA`; high school or less, some college, Bachelor’s or more), and income group (`INCOME3`; 11 categories). The second regression will use the same outcome and covariates as the first, but the predictor of interest will be poor mental health status, with 0 days of poor mental health as the reference category.
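The two models could be specified roughly as follows with `statsmodels.formula.api`. This is a sketch on randomly generated data, not the survey itself: the column names (`gun_owner`, `depression`, `ment14d`, and so on) are hypothetical renames of the variables above, and the coefficients used to simulate the outcome are arbitrary.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 3000

# Synthetic stand-in data; column names are hypothetical renames of the
# survey variables, with the number of levels described in the text.
df = pd.DataFrame({
    "depression": rng.integers(0, 2, n),   # 0 = no, 1 = yes
    "ment14d": rng.integers(1, 4, n),      # 1 = 0 days, 2 = 1-13, 3 = 14+
    "age_group": rng.integers(1, 14, n),   # 13 five-year age categories
    "sex": rng.integers(0, 2, n),
    "race": rng.integers(1, 6, n),         # 5 categories
    "insurance": rng.integers(1, 5, n),    # 4 categories
    "education": rng.integers(1, 4, n),    # 3 categories
    "income": rng.integers(1, 12, n),      # 11 categories
})

# Simulated outcome with arbitrary effect sizes, for illustration only.
linpred = -0.5 + 0.3 * df["depression"] + 0.2 * (df["ment14d"] == 3)
prob = 1.0 / (1.0 + np.exp(-linpred.to_numpy()))
df["gun_owner"] = rng.binomial(1, prob)

controls = (
    "C(age_group) + C(sex) + C(race) + C(insurance) "
    "+ C(education) + C(income)"
)

# Model 1: depression status as the predictor of interest.
model_dep = smf.logit(
    f"gun_owner ~ depression + {controls}", data=df
).fit(disp=False)

# Model 2: poor mental health status, with level 1 (0 days) as reference.
model_mh = smf.logit(
    f"gun_owner ~ C(ment14d, Treatment(reference=1)) + {controls}", data=df
).fit(disp=False)

# Odds ratios are the exponentiated coefficients.
odds_ratios_dep = np.exp(model_dep.params)
print(odds_ratios_dep["depression"])
print(model_mh.params.filter(like="ment14d"))
```

Wrapping each control in `C(...)` tells the formula interface to expand it into indicator variables, and `Treatment(reference=1)` fixes the reference level explicitly instead of relying on the default ordering.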

Our third objective investigates the following question: among gun owners, is having depression (`ADDEPEV3`) or poor mental health status (`_MENT14D`) associated with unsafe gun storage practices? Among respondents who report firearms in the home (`FIREARM5`), unsafe gun storage will be defined using `LOADULK2`, a binary yes/no outcome indicating whether the respondent keeps any loaded firearms unlocked. As with our second objective, we will evaluate this question using two logistic regression models, one for each predictor (depression and mental health status). These models use the same covariates (age, sex, race/ethnicity, education, income group, insurance status) but will additionally control for whether the individual has any children under 18 years of age (`CHILDREN`), which we anticipate to be associated with safer gun storage practices.
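The restriction to gun owners and the additional children covariate might look like the sketch below. It again uses synthetic data with hypothetical column names, includes only a subset of the covariates for brevity, and assumes the children variable is available as a count (`n_children`) that is collapsed into an any-children indicator.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 4000

# Synthetic stand-in data; all column names are hypothetical.
df = pd.DataFrame({
    "gun_owner": rng.integers(0, 2, n),
    "depression": rng.integers(0, 2, n),
    "sex": rng.integers(0, 2, n),
    "education": rng.integers(1, 4, n),
    "n_children": rng.choice([0, 0, 1, 2, 3], n),  # assumed to be a count
})

# Simulated storage outcome with arbitrary effect sizes.
linpred = -0.7 + 0.2 * df["depression"] - 0.5 * (df["n_children"] > 0)
prob = 1.0 / (1.0 + np.exp(-linpred.to_numpy()))
df["loaded_unlocked"] = rng.binomial(1, prob)

# Keep gun owners only, then derive the any-children indicator.
owners = df.loc[df["gun_owner"] == 1].copy()
owners["any_children"] = (owners["n_children"] > 0).astype(int)

# Same structure as the ownership models, plus the children control.
model_storage = smf.logit(
    "loaded_unlocked ~ depression + any_children + C(sex) + C(education)",
    data=owners,
).fit(disp=False)

print(model_storage.params[["depression", "any_children"]])
```

Filtering before fitting means the estimates describe gun owners only, which matches how the third objective is framed.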

All analyses will be performed using the Python programming language within the PyCharm IDE and Jupyter Notebook open-source web application. Packages we will use include: `os` (to set our directories), `pandas` (data loading/exporting, manipulation, and cleaning), `matplotlib` (for graphs and visualization), `scipy.stats` (for hypothesis testing and significance), `sklearn` (for logistic regressions), `statsmodels.formula.api` (for regressions and statistical testing) and `seaborn` (for heatmap creation). 

To present our results, we will create regression tables displaying the marginal effects, confidence intervals, and p-values from our logistic models examining the association between mental health status/depression and gun ownership, with covariates included. We will also include a summary statistics table for all our variables, reporting the mean, standard deviation, maximum, minimum, and median for continuous variables and the proportions in each group for categorical variables. To illustrate the findings visually, we will include bar charts of the adjusted probabilities (from our logistic models) of gun ownership and unsafe gun storage by depression status and mental health status. We will also create a heatmap to show the heterogeneity of association across U.S. regions (states in the Northeast, Southeast, Midwest, Southwest, and West). The y-axis will show the region, and the x-axis will show mental health status or depression status. The heat color/legend will display the adjusted probability (from our logistic models) of our outcomes: gun ownership and unsafe gun storage.
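The quantities behind these tables and figures could be computed along the following lines. This is a sketch under several assumptions: the data are synthetic, the column names are hypothetical, only one covariate is included, and region enters the second model through a region-by-depression interaction, which is one possible choice that the plan itself does not specify. Adjusted probabilities are computed here by setting the predictor to each level for every respondent and averaging the predictions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 4000
regions = ["Northeast", "Southeast", "Midwest", "Southwest", "West"]

# Synthetic stand-in data; column names are hypothetical.
df = pd.DataFrame({
    "depression": rng.integers(0, 2, n),
    "sex": rng.integers(0, 2, n),
    "region": rng.choice(regions, n),
})
linpred = -0.8 + 0.3 * df["depression"] + 0.4 * df["sex"]
prob = 1.0 / (1.0 + np.exp(-linpred.to_numpy()))
df["gun_owner"] = rng.binomial(1, prob)

# Main-effects model: source of the marginal effects table.
main_model = smf.logit(
    "gun_owner ~ depression + C(region) + C(sex)", data=df
).fit(disp=False)

# Average marginal effects with standard errors, p-values, and CIs.
ame_table = main_model.get_margeff(at="overall", method="dydx").summary_frame()


def adjusted_probabilities(fitted, data, predictor, levels, by=None):
    """Average predicted probability with `predictor` set to each level."""
    frames = []
    for level in levels:
        counterfactual = data.copy()
        counterfactual[predictor] = level
        counterfactual["adj_prob"] = fitted.predict(counterfactual)
        frames.append(counterfactual)
    stacked = pd.concat(frames)
    keys = [predictor] if by is None else [by, predictor]
    return stacked.groupby(keys)["adj_prob"].mean()


# Values for the bar chart: adjusted probability by depression status.
overall = adjusted_probabilities(main_model, df, "depression", [0, 1])

# Assumed interaction model, so the association can differ by region.
region_model = smf.logit(
    "gun_owner ~ depression * C(region) + C(sex)", data=df
).fit(disp=False)

# Region (rows) by depression status (columns) grid for the heatmap.
grid = adjusted_probabilities(
    region_model, df, "depression", [0, 1], by="region"
).unstack()

print(ame_table.round(3))
print(overall.round(3))
print(grid.round(3))
```

The `overall` series can be drawn with `overall.plot.bar()`, and `grid` can be passed directly to `seaborn.heatmap(grid, annot=True)` so that regions appear on the y-axis and depression status on the x-axis.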