## Data Collection Procedures for the 2020 N-SSATS
### Field period and reference date
### The survey reference date for the 2020 N-SSATS was March 31, 2020. The field period was from March 31, 2020, through December 14, 2020.
### Survey universe
### The 2020 N-SSATS facility universe totaled 19,926 facilities: all 18,434 active treatment facilities on SAMHSA’s I-BHS at a point five weeks before the survey reference date, plus 1,492 facilities that were added by state substance abuse agencies or otherwise discovered during the data collection period.

In [1]:
# Dependencies and Setup
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scipy.stats as st

In [3]:
file_path = "Resources/NSSATS/NSSATS_PUF_2020_CSV.csv"

data = pd.read_csv(file_path, low_memory=False)
data.head(12)

Unnamed: 0,CASEID,STATE,STFIPS,DETOX,TREATMT,SMISEDSUD,OWNERSHP,FEDOWN,HOSPITAL,LOCS,...,T_CLIHI_X,T_CLIML_D,T_CLIML_O,T_CLIML_X,T_CLIOP_D,T_CLIOP_O,T_CLIOP_X,T_CLIRC_D,T_CLIRC_O,T_CLIRC_X
0,1,AK,2,0,1,1,6,3.0,0,,...,,,,1.0,,,4.0,,,
1,2,AK,2,0,1,1,2,,0,,...,,,,,,,,,,4.0
2,3,AK,2,1,1,0,2,,0,,...,,,1.0,,,1.0,,,,
3,4,AK,2,0,1,1,2,,0,,...,,,,,,,,,,2.0
4,5,AK,2,1,1,0,2,,0,,...,,,,,,,,,,
5,6,AK,2,0,1,1,2,,1,1.0,...,,,,,,,,,,
6,7,AK,2,0,1,0,2,,1,1.0,...,,,,,,,,,,
7,8,AK,2,0,1,1,2,,0,,...,,,,,,,,,,
8,9,AK,2,0,1,0,2,,0,,...,,1.0,,,1.0,,,,,
9,10,AK,2,0,1,0,2,,0,,...,,,,,,,,,,


In [28]:
len(data)

16066

### Initial EDA Guiding Question: Why is the opioid mortality rate so high in WV?

In [11]:
# Keep only the West Virginia facilities from the loaded dataset
WV = data.loc[data.STATE == 'WV']
WV

Unnamed: 0,CASEID,STATE,STFIPS,DETOX,TREATMT,SMISEDSUD,OWNERSHP,FEDOWN,HOSPITAL,LOCS,...,T_CLIHI_X,T_CLIML_D,T_CLIML_O,T_CLIML_X,T_CLIOP_D,T_CLIOP_O,T_CLIOP_X,T_CLIRC_D,T_CLIRC_O,T_CLIRC_X
15870,15871,WV,54,0,1,1,2,,0,,...,,,,,,,,,,
15871,15872,WV,54,0,1,1,1,,0,,...,,,,,,,,,,
15872,15873,WV,54,1,1,0,1,,0,,...,,,5.0,,,5.0,,,,
15873,15874,WV,54,1,1,0,1,,0,,...,,,4.0,,,4.0,,,,
15874,15875,WV,54,1,1,0,1,,0,,...,,,5.0,,,5.0,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15991,15992,WV,54,0,1,1,2,,0,,...,,,,,,,,,,
15992,15993,WV,54,1,1,1,2,,0,,...,,,,,,,,,,
15993,15994,WV,54,1,1,1,6,1.0,1,1,...,1.0,,,5.0,,,5.0,,,5.0
15994,15995,WV,54,0,1,1,2,,0,,...,,,,,,,,,1.0,


In [18]:
columns = WV.columns
columns

Index(['CASEID', 'STATE', 'STFIPS', 'DETOX', 'TREATMT', 'SMISEDSUD',
       'OWNERSHP', 'FEDOWN', 'HOSPITAL', 'LOCS',
       ...
       'T_CLIHI_X', 'T_CLIML_D', 'T_CLIML_O', 'T_CLIML_X', 'T_CLIOP_D',
       'T_CLIOP_O', 'T_CLIOP_X', 'T_CLIRC_D', 'T_CLIRC_O', 'T_CLIRC_X'],
      dtype='object', length=262)

In [29]:
n_caseid = WV.CASEID.count()
print(f"There are responses from {n_caseid} West Virginia treatment centers in this dataset.") 

There are responses from 126 West Virginia treatment centers in this dataset.


In [17]:
wv_detox_vc = WV.DETOX.value_counts()
wv_detox_vc

0    86
1    40
Name: DETOX, dtype: int64

### Work out how to calculate the share of WV facilities offering detoxification

In [36]:
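# Facilities with DETOX == 1 divided by all WV facilities (a proportion between 0 and 1, not a percentage)
wv_percent_detox = round(wv_detox_vc[1]/(wv_detox_vc[0]+wv_detox_vc[1]),2)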
wv_percent_detox

0.32
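
Because DETOX is coded 0/1, the same share can also be computed without indexing into the counts by hand. A minimal sketch, assuming a small made-up frame (`toy` is a hypothetical stand-in for the WV subset, not real N-SSATS data). Note that `mean()` skips missing values, so it matches the calculation above only when the column contains nothing but 0s and 1s.

```python
import pandas as pd

# Toy stand-in for the WV subset; values are illustrative, not N-SSATS data
toy = pd.DataFrame({"DETOX": [0, 1, 0, 0, 1]})

# For a 0/1 column, the mean is the share of rows equal to 1
share_mean = round(toy["DETOX"].mean(), 2)

# value_counts(normalize=True) returns the share of each distinct value;
# .get(1, 0) falls back to 0 if no facility answered 1
share_vc = round(toy["DETOX"].value_counts(normalize=True).get(1, 0), 2)

print(share_mean, share_vc)
```

Both approaches give the same number here; the `mean()` form is shorter, while `value_counts(normalize=True)` also shows the share of 0s.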

### Question to self: Which boolean (0/1) variables should we loop through to calculate these shares?

### 1. DETOX: Facility offers detoxification
### 2. TREATMT: Facility offers substance use treatment
### 3. SMISEDSUD: Facility offers treatment for co-occurring serious mental illness (SMI)/serious emotional disturbance (SED) and substance use disorders

### Pseudocode: For every variable_name in my list of Y/N (0/1) variables, calculate value_counts, then calculate the share of facilities where that variable equals 1

In [48]:
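important_variables = ['DETOX', 'TREATMT', 'SMISEDSUD']

# Share of WV facilities answering 1 (yes) for each 0/1 variable
stats = {}
for variable in important_variables:
    y = WV[variable].value_counts()
    # .get(..., 0) avoids a KeyError when every facility gives the same answer
    yes, no = y.get(1, 0), y.get(0, 0)
    z = round(yes/(yes+no),2)
    stats[variable] = z
stats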

{'DETOX': 0.32, 'TREATMT': 0.99, 'SMISEDSUD': 0.83}

### Next step: loop through the entire dataset to build a dictionary of these statistics for each state
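
One way this per-state step could look is a `groupby` on STATE instead of an explicit loop. This is only a sketch under assumptions: `toy` is a hypothetical frame with made-up values standing in for `data`, and `state_shares` and `state_stats` are names introduced here for illustration.

```python
import pandas as pd

# Toy stand-in for `data`; values are illustrative, not N-SSATS rows
toy = pd.DataFrame({
    "STATE":     ["WV", "WV", "WV", "WV", "OH", "OH"],
    "DETOX":     [1, 0, 0, 0, 1, 1],
    "TREATMT":   [1, 1, 1, 1, 1, 0],
    "SMISEDSUD": [1, 1, 1, 0, 0, 1],
})
important_variables = ["DETOX", "TREATMT", "SMISEDSUD"]

# Mean of a 0/1 column within each state = share of facilities answering 1
state_shares = toy.groupby("STATE")[important_variables].mean().round(2)

# Nested dict keyed by state, e.g. state_stats["WV"]["DETOX"]
state_stats = state_shares.to_dict(orient="index")
print(state_stats["WV"])
```

Keeping `state_shares` as a DataFrame is convenient for sorting and plotting; the dictionary form matches the shape of the single-state `stats` result.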

### For later...
### ASSESSMENT: Number of assessment and pre-treatment services offered by this facility (0-8) (pg.31)
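
Since ASSESSMENT is a count (0-8) and not a 0/1 flag, a share does not apply; a mean or median per state is one possible summary. A rough sketch, assuming a hypothetical `toy` frame with made-up values:

```python
import pandas as pd

# Toy stand-in; ASSESSMENT values are illustrative, not N-SSATS data
toy = pd.DataFrame({
    "STATE":      ["WV", "WV", "WV", "OH", "OH"],
    "ASSESSMENT": [8, 6, 7, 4, 8],
})

# Mean and median number of assessment services per facility, by state
assessment_summary = toy.groupby("STATE")["ASSESSMENT"].agg(["mean", "median"])
print(assessment_summary)
```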

In [30]:
for col in columns:
    print(col)

CASEID
STATE
STFIPS
DETOX
TREATMT
SMISEDSUD
OWNERSHP
FEDOWN
HOSPITAL
LOCS
LOC5
OTP
ASSESSMENT
TESTING
MEDICAL
TRANSITION
RECOVERY
EDUCATION
ANCILLARY
OTHER_SRVC
PHARMACOTHERAPIES
SRVC89
SRVC90
SRVC1
SRVC2
SRVC107
SRVC91
SRVC93
SRVCEDCON
SRVCORAL
SRVC10
SRVC11
SRVC73
SRVC74
SRVC14
SRVC15
SRVC16
SRVCMETA
SRVCHAV
SRVCHBV
SRVC37
SRVC27
SRVCODED
SRVCOUTCM
SRVC97
SRVC102
SRVC39
SRVC38
SRVC36
SRVCCOACH
SRVC24
SRVC104
SRVC99
SRVC100
SRVC105
SRVC6
SRVC5
SRVC4
SRVC103
SRVCVOCED
SRVC49
SRVC96
SRVC50
SRVC52
SRVC98
SRVC59
SRVC101
SRVC48
SRVC75
SRVC117
SRVC118
SRVC119
SRVC70
SRVC71
SRVC108
SRVC88
SRVC94
SRVC106
SRVC95
SRVC85
SRVC87
SRVC86
SRVC129
SRVC130
SRVCMEDHIV
SRVCMEDHCV
SRVCMEDLOFE
SRVCMEDCLON
SRVC30
SRVC120
SRVC34
SRVC33
SRVC64
SRVC63
SRVC62
SRVC113
SRVC114
SRVC115
SRVC61
SRVC31
SRVCPAINSA
SRVC32
SRVC121
SRVC122
SRVC116
SRVC35
CTYPE4
CTYPEHI1
CTYPEHI2
CTYPE7
CTYPERC1
CTYPERC3
CTYPERC4
CTYPE1
CTYPE6
CTYPEML
CTYPEOP
CTYPE2
CTYPE3
SIGNLANG
LANG
LANG1
LANG2
LANG3
LANG4
LANG5
LANG6
LANG7
LANG8
LAN

In [None]:
df = pd.DataFrame(data)

In [4]:
top_ten_states = ['WV', 'DE', 'NH', 'OH', 'PA', 'KY', 'MD', 'MA', 'ME', 'RI']
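
The list above can be used to restrict a comparison to those ten states. A minimal sketch, assuming a hypothetical `toy` frame with made-up values in place of `data`; `subset` and `detox_share` are names introduced here for illustration.

```python
import pandas as pd

top_ten_states = ['WV', 'DE', 'NH', 'OH', 'PA', 'KY', 'MD', 'MA', 'ME', 'RI']

# Toy stand-in for `data`; values are illustrative, not N-SSATS rows
toy = pd.DataFrame({
    "STATE": ["WV", "OH", "AK", "WV", "TX"],
    "DETOX": [1, 0, 1, 0, 1],
})

# Keep only facilities located in one of the listed states
subset = toy[toy["STATE"].isin(top_ten_states)]

# Share of facilities offering detoxification, per listed state
detox_share = subset.groupby("STATE")["DETOX"].mean().round(2)
print(detox_share)
```

States from the list that have no rows in the frame simply do not appear in the result.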