# Veteran Suicide Prevention Scratchpad

## Standard Library Imports

In [1]:
# for data manipulation 
import pandas as pd
import numpy as np

# for data visualization
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

## Getting the data

- originally an Excel spreadsheet

- converted to a Google Sheet

In [2]:
import re # just in case I need to clean up the data with regex after it comes in
    
# Original Link:
# https://docs.google.com/spreadsheets/d/1xiVtqcMEK50ipQpfgFs7iMqxspAFAVRZvxwItnoacFo/edit?usp=sharing

sheet_id = "1xiVtqcMEK50ipQpfgFs7iMqxspAFAVRZvxwItnoacFo"

va_df = pd.read_csv(f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv")

pd.set_option("display.max_columns", None)

va_df

| | Name | Type | Length | Description | Rand Type | SAS Rand Function |
|---|---|---|---|---|---|---|
| 0 | Alprazolam12 | Numeric | 8 | Alprazolam tx in prior 12 months | Binomial | Alprazolam12=rand("Binomial",p,n); |
| 1 | Alprazolam24 | Numeric | 8 | Alprazolam tx in prior 24 months | Binomial | Alprazolam24=rand("Binomial",p,n); |
| 2 | als12 | Numeric | 8 | ALS dx in prior 12 months | Binomial | als12=rand("Binomial",p,n); |
| 3 | als24 | Numeric | 8 | ALS dx in prior 24 months | Binomial | als24=rand("Binomial",p,n); |
| 4 | ami12 | Numeric | 8 | Acute myocardial infarction in prior 12 months | Binomial | ami12=rand("Binomial",p,n); |
| ... | ... | ... | ... | ... | ... | ... |
| 466 | UCvisits_prior3 | Numeric | 8 | Number Urgent Care visits in past 3 months | Normal | UCvisits_prior3=rand("Normal",mean,std); |
| 467 | UCvisits_prior6 | Numeric | 8 | Number Urgent Care visits in past 6 months | Normal | UCvisits_prior6=rand("Normal",mean,std); |
| 468 | weight_pm | Numeric | 8 | Weight (person-months) | Normal | weight_pm=rand("Normal",mean,std); |
| 469 | white | Numeric | 8 | Race: white, non-white or unknown | Normal | white=rand("Normal",mean,std); |


**^^Data as is on import.  No corrections, cleaning, or adjustments made.**
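The `re` import above is there "just in case." One place it could help is pulling the look-back window out of the `Description` text ("in prior 12 months", "in past 3 months"). A minimal sketch, assuming every windowed description uses the "prior/past N months" wording seen in the displayed rows; `lookback_months` and `WINDOW_RE` are hypothetical names, and rows without a window return `None`.

```python
import re

# Matches look-back phrases such as "in prior 12 months" or "in past 3 months".
# Assumption: every windowed description uses this "prior/past N months" wording.
WINDOW_RE = re.compile(r"\b(?:prior|past)\s+(\d+)\s+months?\b", re.IGNORECASE)


def lookback_months(description):
    """Return the look-back window in months, or None if the text names none."""
    if not isinstance(description, str):
        return None  # e.g. NaN for a missing description
    match = WINDOW_RE.search(description)
    return int(match.group(1)) if match else None


# Descriptions copied from the output above.
descriptions = [
    "Alprazolam tx in prior 12 months",
    "ALS dx in prior 24 months",
    "Number Urgent Care visits in past 3 months",
    "Weight (person-months)",
]
windows = [lookback_months(d) for d in descriptions]
print(windows)  # [12, 24, 3, None]
```

On the full sheet this would be `va_df["Description"].map(lookback_months)`; pandas may store that result as floats, with NaN for rows that have no window.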

## Dictionary of Terms / Data Dictionary

### Column Data:

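**Name** - variable name, e.g. 'Alprazolam12'.

**Type** - storage type of the variable; every row is 'Numeric'.

**Length** - storage length of the variable; every row is 8.

**Description** - plain-English description of the variable.

**Rand Type** - distribution the variable is simulated from: 'Binomial' or 'Normal'.

**SAS Rand Function** - SAS statement that generates a random value for the variable with SAS's RAND function, e.g. 'Alprazolam12=rand("Binomial",p,n);'.

### Row Data:

**Alprazolam** - a benzodiazepine medication (brand name Xanax); 'tx' in the descriptions stands for treatment.

**Numeric** - the only value found in the 'Type' column.

**ALS (als)** - amyotrophic lateral sclerosis; 'dx' in the descriptions stands for diagnosis.

**Binomial** - the binomial distribution; rand("Binomial",p,n) draws the number of successes in n independent trials, each with success probability p.

**p** - probability of success on each trial; no value is given in the sheet.

**n** - number of trials; no value is given in the sheet.

**ami** - acute myocardial infarction (heart attack), per the 'Description' column.

**UCvisits_Prior** - number of Urgent Care visits in the past 3 or 6 months ('UCvisits_prior3', 'UCvisits_prior6').

**weight_pm** - weight in person-months, per the 'Description' column.

**white** - race: white, non-white, or unknown, per the 'Description' column.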

**YearsSinceFirstUse** - 
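The 'SAS Rand Function' column suggests each variable was meant to be simulated with SAS's RAND function: rand("Binomial", p, n) draws a binomial count and rand("Normal", mean, std) draws from a normal distribution. Since the sheet leaves p, n, mean, and std as placeholders, any values plugged in are assumptions. Below is a minimal sketch that parses one of these strings with `re` and draws the NumPy equivalent; `parse_sas_rand`, `draw`, and the parameter values are hypothetical. Note that NumPy's `binomial` takes `(n, p)`, the reverse of the SAS argument order.

```python
import re

import numpy as np

# Matches strings such as: Alprazolam12=rand("Binomial",p,n);
SAS_RAND_RE = re.compile(
    r'^\s*(\w+)\s*=\s*rand\(\s*"(\w+)"\s*,\s*([^)]*)\)\s*;?\s*$', re.IGNORECASE
)


def parse_sas_rand(call):
    """Split a SAS rand() assignment into (variable, distribution, parameter names)."""
    match = SAS_RAND_RE.match(call)
    if match is None:
        raise ValueError(f"Unrecognized SAS rand call: {call!r}")
    name, dist, args = match.groups()
    params = [a.strip() for a in args.split(",")]
    return name, dist.lower(), params


def draw(dist, values, size, rng):
    """Draw NumPy samples; `values` maps the SAS placeholder names to numbers (assumed)."""
    if dist == "binomial":
        # SAS order is rand("Binomial", p, n); NumPy's order is binomial(n, p).
        return rng.binomial(values["n"], values["p"], size=size)
    if dist == "normal":
        return rng.normal(values["mean"], values["std"], size=size)
    raise ValueError(f"Unsupported distribution: {dist}")


parsed = parse_sas_rand('Alprazolam12=rand("Binomial",p,n);')
print(parsed)  # ('Alprazolam12', 'binomial', ['p', 'n'])

rng = np.random.default_rng(seed=0)
# Hypothetical parameters: one trial per person, 5% probability of treatment.
draws = draw(parsed[1], {"p": 0.05, "n": 1}, size=10, rng=rng)
print(draws)
```

With n=1 each draw is a 0/1 indicator, which matches how the "tx/dx in prior N months" variables read; whether the original SAS code used n=1 is not stated in the sheet.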



In [3]:
va_df.columns

Index(['Name', 'Type', 'Length', 'Description', 'Rand Type',
       'SAS Rand Function'],
      dtype='object')

In [4]:
va_df["Name"].nunique()

471

In [5]:
va_df["Type"].nunique()

1

In [6]:
va_df["Length"].nunique()

1

In [7]:
va_df["Description"].nunique()

470

In [8]:
va_df["Rand Type"].nunique()

2

In [9]:
va_df["SAS Rand Function"].nunique()

470
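The six per-column `nunique()` cells above can be collapsed into one call: `DataFrame.nunique()` returns the number of distinct non-null values for every column at once (pass `dropna=False` to count NaN as a value). A minimal sketch on a few rows copied from the output above; on the full sheet it would simply be `va_df.nunique()`.

```python
import pandas as pd

# Three rows copied from the va_df output above.
demo = pd.DataFrame({
    "Name": ["Alprazolam12", "Alprazolam24", "white"],
    "Type": ["Numeric", "Numeric", "Numeric"],
    "Length": [8, 8, 8],
    "Description": [
        "Alprazolam tx in prior 12 months",
        "Alprazolam tx in prior 24 months",
        "Race: white, non-white or unknown",
    ],
    "Rand Type": ["Binomial", "Binomial", "Normal"],
})

# One call replaces the per-column cells: distinct non-null values per column.
counts = demo.nunique()
print(counts)
```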

### Since the "Type" and "Length" columns each contain only one unique value, I'm dropping them; they add no information to the data

In [10]:
va_df = va_df.drop(columns=["Type", "Length"])

va_df

| | Name | Description | Rand Type | SAS Rand Function |
|---|---|---|---|---|
| 0 | Alprazolam12 | Alprazolam tx in prior 12 months | Binomial | Alprazolam12=rand("Binomial",p,n); |
| 1 | Alprazolam24 | Alprazolam tx in prior 24 months | Binomial | Alprazolam24=rand("Binomial",p,n); |
| 2 | als12 | ALS dx in prior 12 months | Binomial | als12=rand("Binomial",p,n); |
| 3 | als24 | ALS dx in prior 24 months | Binomial | als24=rand("Binomial",p,n); |
| 4 | ami12 | Acute myocardial infarction in prior 12 months | Binomial | ami12=rand("Binomial",p,n); |
| ... | ... | ... | ... | ... |
| 466 | UCvisits_prior3 | Number Urgent Care visits in past 3 months | Normal | UCvisits_prior3=rand("Normal",mean,std); |
| 467 | UCvisits_prior6 | Number Urgent Care visits in past 6 months | Normal | UCvisits_prior6=rand("Normal",mean,std); |
| 468 | weight_pm | Weight (person-months) | Normal | weight_pm=rand("Normal",mean,std); |
| 469 | white | Race: white, non-white or unknown | Normal | white=rand("Normal",mean,std); |
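Rather than naming "Type" and "Length" by hand, constant columns can be found and dropped programmatically, which keeps the step valid if the sheet ever changes. A minimal sketch, assuming "constant" means at most one distinct non-null value; `drop_constant_columns` is a hypothetical helper, and because `nunique()` ignores NaN, an entirely empty column would be dropped as well.

```python
import pandas as pd


def drop_constant_columns(df):
    """Return a copy of df without columns that hold at most one distinct non-null value."""
    # nunique() ignores NaN, so an all-empty column (0 distinct values) is dropped too.
    keep = df.nunique() > 1
    return df.loc[:, keep].copy()


# Rows copied from the va_df output above, before Type and Length were dropped.
demo = pd.DataFrame({
    "Name": ["Alprazolam12", "als12", "white"],
    "Type": ["Numeric", "Numeric", "Numeric"],
    "Length": [8, 8, 8],
    "Rand Type": ["Binomial", "Binomial", "Normal"],
})
trimmed = drop_constant_columns(demo)
print(trimmed.columns.tolist())  # ['Name', 'Rand Type']
```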


### VIMP: After several hours of searching, I was unable to find any further documentation for the data in these columns. For instance, the only further description of "Name" is "string," and the other columns are similarly sparse. As of now, the Department of Veterans Affairs (the data's owner) has not responded to requests for clarification and has since taken this dataset offline.

In [11]:
# Original Link:
# https://docs.google.com/spreadsheets/d/18QZWC80YlnF8eMYUugbdtrnzif9fuANos8XZJ_2j27w/edit?usp=sharing

sheet1_id = "18QZWC80YlnF8eMYUugbdtrnzif9fuANos8XZJ_2j27w"

age_adjusted_df = pd.read_csv(f"https://docs.google.com/spreadsheets/d/{sheet1_id}/export?format=csv")

pd.set_option("display.max_columns", None)

age_adjusted_df

| | 2005-2017 National Suicide Data Appendix | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | All Veteran Deaths by Suicide, Crude and Age A... |  |  |  |  |  |  |  |  |  |  |  |  |
| 1 |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 2 | 2005-2017 National Suicide Data Appendix |  |  |  |  |  |  |  |  |  |  |  |  |
| 3 | Year\n of\n Death | Veteran\n Suicide\n Deaths | Veteran\n Population\n Estimate | Veteran\n Crude\n Rate\n per\n 100,000 | Veteran\n Age\n Adjusted\n Rate per\n 100,000 | Male\n Veteran\n Suicide\n Deaths | Male\n Veteran\n Population\n Estimate | Male\n Veteran\n Crude\n Rate\n per\n 100,000 | Male\n Veteran\n Age\n Adjusted\n Rate per\n 1... | Female\n Veteran\n Suicide\n Deaths | Female\n Veteran\n Population\n Estimate | Female\n Veteran\n Crude\n Rate\n per\n 100,000 | Female\n Veteran\n Age\n Adjusted\n Rate per\n... |
| 4 | 2005 | 5787 | 24240000 | 23.9 | 25.5 | 5610 | 22501000 | 24.9 | 27.3 | 177 | 1739000 | 10.2 | 10.4 |
| 5 | 2006 | 5688 | 23731000 | 24 | 24.8 | 5527 | 21992000 | 25.1 | 26.8 | 161 | 1739000 | 9.3 | 9.2 |
| 6 | 2007 | 5893 | 23291000 | 25.3 | 26.5 | 5724 | 21588000 | 26.5 | 28.7 | 169 | 1703000 | 9.9 | 9.7 |
| 7 | 2008 | 6216 | 22996000 | 27 | 28.4 | 6024 | 21322000 | 28.3 | 30.8 | 192 | 1674000 | 11.5 | 11.3 |
| 8 | 2009 | 6172 | 22603000 | 27.3 | 28.3 | 5968 | 20917000 | 28.5 | 30.4 | 204 | 1686000 | 12.1 | 12.2 |
| 9 | 2010 | 6158 | 22411000 | 27.5 | 28.9 | 5943 | 20697000 | 28.7 | 31.3 | 215 | 1714000 | 12.5 | 12.3 |
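Both loading cells build the same CSV-export URL by hand. A small helper would remove that duplication; this is a sketch, and `sheet_csv_url` and `load_sheet` are hypothetical names. The URL pattern is exactly the one used in the cells above, and nothing else about the export endpoint is assumed.

```python
import pandas as pd


def sheet_csv_url(sheet_id):
    """CSV-export URL for a shared Google Sheet (the same pattern the cells above use)."""
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv"


def load_sheet(sheet_id, **read_csv_kwargs):
    """Download the sheet into a DataFrame; extra keyword arguments go to pd.read_csv."""
    return pd.read_csv(sheet_csv_url(sheet_id), **read_csv_kwargs)


sheet1_id = "18QZWC80YlnF8eMYUugbdtrnzif9fuANos8XZJ_2j27w"
print(sheet_csv_url(sheet1_id))
```

Usage would be `age_adjusted_df = load_sheet(sheet1_id)`, which needs network access and the sheet to still be shared publicly.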


In [14]:
# Renaming columns:
age_adjusted_df = age_adjusted_df.rename(
    columns={
        "2005-2017 National Suicide Data Appendix": "Year of Death",
        "Unnamed: 1": "veteran_suicide_deaths",
        "Unnamed: 2": "veteran_population_estimate",
        "Unnamed: 3": "veteran_crude_rate_per_100K",
        "Unnamed: 4": "veteran_age_adjusted_rate_per_100K",
        "Unnamed: 5": "male_veteran_suicide_deaths",
        "Unnamed: 6": "male_veteran_population_estimate",
        "Unnamed: 7": "male_veteran_crude_rate_per_100K",
        "Unnamed: 8": "male_veteran_age_adjusted_rate_per_100K",
        "Unnamed: 9": "female_veteran_suicide_deaths",
        "Unnamed: 10": "female_veteran_population_estimate",
        "Unnamed: 11": "female_veteran_crude_rate_per_100K",
        "Unnamed: 12": "female_veteran_age_adjusted_rate_per_100K",
    },
)


age_adjusted_df.head()

| | Year of Death | veteran_suicide_deaths | veteran_population_estimate | veteran_crude_rate_per_100K | veteran_age_adjusted_rate_per_100K | male_veteran_suicide_deaths | male_veteran_population_estimate | male_veteran_crude_rate_per_100K | male_veteran_age_adjusted_rate_per_100K | female_veteran_suicide_deaths | female_veteran_population_estimate | female_veteran_crude_rate_per_100K | female_veteran_age_adjusted_rate_per_100K |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | All Veteran Deaths by Suicide, Crude and Age A... |  |  |  |  |  |  |  |  |  |  |  |  |
| 1 |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 2 | 2005-2017 National Suicide Data Appendix |  |  |  |  |  |  |  |  |  |  |  |  |
| 3 | Year\n of\n Death | Veteran\n Suicide\n Deaths | Veteran\n Population\n Estimate | Veteran\n Crude\n Rate\n per\n 100,000 | Veteran\n Age\n Adjusted\n Rate per\n 100,000 | Male\n Veteran\n Suicide\n Deaths | Male\n Veteran\n Population\n Estimate | Male\n Veteran\n Crude\n Rate\n per\n 100,000 | Male\n Veteran\n Age\n Adjusted\n Rate per\n 1... | Female\n Veteran\n Suicide\n Deaths | Female\n Veteran\n Population\n Estimate | Female\n Veteran\n Crude\n Rate\n per\n 100,000 | Female\n Veteran\n Age\n Adjusted\n Rate per\n... |
| 4 | 2005 | 5787 | 24240000 | 23.9 | 25.5 | 5610 | 22501000 | 24.9 | 27.3 | 177 | 1739000 | 10.2 | 10.4 |
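The rename mapping above is written out by hand. Since row 3 already holds the original multi-line headers, snake_case names could instead be generated from it. A sketch under assumptions: `to_snake_case` is a hypothetical helper, and because the printed output doesn't show whether the headers contain real newline characters or literal `\n` sequences, it handles both.

```python
import re


def to_snake_case(header):
    """Normalize a multi-line header to snake_case, e.g. 'Year of Death' -> 'year_of_death'."""
    text = header.replace("\\n", " ")          # literal backslash-n sequences, if present
    text = text.lower().replace("100,000", "100k")
    text = re.sub(r"[^0-9a-z]+", "_", text)    # spaces, real newlines, punctuation -> "_"
    return text.strip("_")


# Headers copied from row 3 of the raw output (the untruncated ones).
headers = [
    "Year\n of\n Death",
    "Veteran\n Suicide\n Deaths",
    "Veteran\n Crude\n Rate\n per\n 100,000",
    "Female\n Veteran\n Population\n Estimate",
]
snake = [to_snake_case(h) for h in headers]
print(snake)
# ['year_of_death', 'veteran_suicide_deaths', 'veteran_crude_rate_per_100k', 'female_veteran_population_estimate']
```

Applied before dropping rows 0-3, this would be `age_adjusted_df.columns = [to_snake_case(str(h)) for h in age_adjusted_df.loc[3]]`. The generated names differ slightly from the hand-written ones ('year_of_death' and lowercase '100k').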


**^^Dropping rows 0, 1, 2, and 3 because they contain only the sheet title, a blank row, and the original column headers, not data.**

In [15]:
age_adjusted_df = age_adjusted_df.drop([0, 1, 2, 3])

age_adjusted_df

| | Year of Death | veteran_suicide_deaths | veteran_population_estimate | veteran_crude_rate_per_100K | veteran_age_adjusted_rate_per_100K | male_veteran_suicide_deaths | male_veteran_population_estimate | male_veteran_crude_rate_per_100K | male_veteran_age_adjusted_rate_per_100K | female_veteran_suicide_deaths | female_veteran_population_estimate | female_veteran_crude_rate_per_100K | female_veteran_age_adjusted_rate_per_100K |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 2005 | 5787 | 24240000 | 23.9 | 25.5 | 5610 | 22501000 | 24.9 | 27.3 | 177 | 1739000 | 10.2 | 10.4 |
| 5 | 2006 | 5688 | 23731000 | 24.0 | 24.8 | 5527 | 21992000 | 25.1 | 26.8 | 161 | 1739000 | 9.3 | 9.2 |
| 6 | 2007 | 5893 | 23291000 | 25.3 | 26.5 | 5724 | 21588000 | 26.5 | 28.7 | 169 | 1703000 | 9.9 | 9.7 |
| 7 | 2008 | 6216 | 22996000 | 27.0 | 28.4 | 6024 | 21322000 | 28.3 | 30.8 | 192 | 1674000 | 11.5 | 11.3 |
| 8 | 2009 | 6172 | 22603000 | 27.3 | 28.3 | 5968 | 20917000 | 28.5 | 30.4 | 204 | 1686000 | 12.1 | 12.2 |
| 9 | 2010 | 6158 | 22411000 | 27.5 | 28.9 | 5943 | 20697000 | 28.7 | 31.3 | 215 | 1714000 | 12.5 | 12.3 |
| 10 | 2011 | 6116 | 22061000 | 27.7 | 29.8 | 5889 | 20326000 | 29.0 | 32.3 | 227 | 1735000 | 13.1 | 12.9 |
| 11 | 2012 | 6065 | 21765000 | 27.9 | 30.3 | 5846 | 20017000 | 29.2 | 33.1 | 219 | 1748000 | 12.5 | 12.3 |
| 12 | 2013 | 6132 | 21415000 | 28.6 | 31.7 | 5901 | 19640000 | 30.0 | 34.7 | 231 | 1775000 | 13.0 | 13.0 |
| 13 | 2014 | 6272 | 21029000 | 29.8 | 32.6 | 5998 | 19234000 | 31.2 | 35.4 | 274 | 1795000 | 15.3 | 15.2 |
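Because rows 0-3 held text, pandas most likely read every column as strings (`object` dtype), and dropping those rows does not change the dtypes by itself. The `24.0` here, versus `24` in the raw output, hints that a cell missing from this export (12 or 13) may already have converted some columns; `age_adjusted_df.dtypes` would settle it. A minimal sketch of the conversion plus an index reset, on two rows copied from the table; `to_numeric_frame` is a hypothetical helper.

```python
import pandas as pd


def to_numeric_frame(df):
    """Convert every column to numbers and renumber the rows from 0."""
    # errors="raise" (the default) surfaces any stray text instead of hiding it as NaN.
    converted = df.apply(pd.to_numeric)
    return converted.reset_index(drop=True)


# Two rows copied from the cleaned output above, stored as strings as they would be on import.
raw = pd.DataFrame(
    {
        "Year of Death": ["2005", "2006"],
        "veteran_suicide_deaths": ["5787", "5688"],
        "veteran_crude_rate_per_100K": ["23.9", "24"],
    },
    index=[4, 5],
)
clean = to_numeric_frame(raw)
print(clean.dtypes)
print(clean)
```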


### Data Dictionary To Better Interpret the Columns:

**veteran_suicide_deaths** - Total number of veteran suicide deaths in that year; equal to the male plus female counts.

**veteran_population_estimate** - The estimated *total* veteran population: the sum of the 'male_veteran_population_estimate' and 'female_veteran_population_estimate' columns.

**veteran_crude_rate_per_100K** - The number of suicide deaths per 100,000 veterans in that year. The formula is (# of deaths / total # of veterans) * 100,000, which reproduces the published values (for 2005: 5,787 / 24,240,000 * 100,000 ≈ 23.9). It is a straightforward measure of the overall veteran suicide rate but, unlike the age-adjusted rate, does not account for the population's age distribution. The same definition applies to 'male_veteran_crude_rate_per_100K' and 'female_veteran_crude_rate_per_100K'.
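The population and crude-rate definitions can be checked against the 2005 row: the male and female columns should sum to the totals, and deaths / population * 100,000 should reproduce each published crude rate after rounding to one decimal. A minimal sketch using numbers copied from the cleaned table; `crude_rate_per_100k` is a hypothetical helper.

```python
def crude_rate_per_100k(deaths, population):
    """Crude rate: deaths per 100,000 people, with no age adjustment."""
    return deaths / population * 100_000


# 2005 row, copied from the cleaned table above.
row = {
    "veteran_suicide_deaths": 5787,
    "veteran_population_estimate": 24_240_000,
    "male_veteran_suicide_deaths": 5610,
    "male_veteran_population_estimate": 22_501_000,
    "female_veteran_suicide_deaths": 177,
    "female_veteran_population_estimate": 1_739_000,
}

# Totals are the sum of the male and female columns.
assert (row["male_veteran_population_estimate"] + row["female_veteran_population_estimate"]
        == row["veteran_population_estimate"])
assert (row["male_veteran_suicide_deaths"] + row["female_veteran_suicide_deaths"]
        == row["veteran_suicide_deaths"])

rates = {
    group: round(crude_rate_per_100k(row[f"{group}_suicide_deaths"],
                                     row[f"{group}_population_estimate"]), 1)
    for group in ("veteran", "male_veteran", "female_veteran")
}
print(rates)  # {'veteran': 23.9, 'male_veteran': 24.9, 'female_veteran': 10.2}
```

After numeric conversion, the same check over every year would be `age_adjusted_df["veteran_suicide_deaths"] / age_adjusted_df["veteran_population_estimate"] * 100_000`.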

**veteran_age_adjusted_rate_per_100K** - A directly age-adjusted death rate per 100,000 veterans (overall, male, and female): the death rate the study population (veterans) would have IF it had the same age distribution as the standard population (non-veterans). *Description courtesy of [North Carolina Public Health](https://schs.dph.ncdhhs.gov/schs/pdf/primer13_2.pdf).*
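The direct method is a weighted average: each age group's rate in the study population is weighted by that group's share of the standard population. The sheet provides only the final rates, not age-specific counts, so the sketch below uses three made-up age groups purely to show the arithmetic; none of its numbers come from the VA data, and `direct_age_adjusted_rate` is a hypothetical helper.

```python
def direct_age_adjusted_rate(deaths, population, standard_population):
    """Directly age-adjusted rate per 100,000.

    deaths, population: per-age-group counts for the study population.
    standard_population: per-age-group counts for the reference (standard) population.
    """
    total_standard = sum(standard_population)
    rate = 0.0
    for d, p, s in zip(deaths, population, standard_population):
        age_specific_rate = d / p * 100_000   # rate within this age group
        weight = s / total_standard           # this group's share of the standard population
        rate += age_specific_rate * weight
    return rate


# HYPOTHETICAL numbers for three age groups; not from the VA sheet.
deaths = [30, 50, 20]
population = [100_000, 200_000, 100_000]
standard = [300_000, 500_000, 200_000]

crude = sum(deaths) / sum(population) * 100_000
adjusted = direct_age_adjusted_rate(deaths, population, standard)
print(round(crude, 1), round(adjusted, 1))  # 25.0 25.5
```

Here the age-specific rates are 30, 25, and 20 per 100,000, weighted 0.3, 0.5, and 0.2, giving 25.5; the crude rate (25.0) differs only because the study population's age mix differs from the standard one.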