CONTENT WARNING: explicit and multiple references to suicide attempts and suicidal ideation

The following is an analysis regarding attempted suicides and suicidal thoughts. Questions around these topics will be directly referenced and mentioned explicitly. While their potential correlations with other parameters will be analysed and discussed at length, suicide attempts and suicidal ideation as concepts will not be discussed. Deaths by suicide will not be referenced or discussed at all.

Please only continue if it is comfortable for you to do so.

In [92]:
import pandas as pd
import statsmodels.api as sm
import numpy as np
from sklearn.model_selection import train_test_split
import seaborn as sb
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE

The data are the National Survey on Drug Use and Health (NSDUH) 2017 results, obtained from SAMHDA, the Substance Abuse and Mental Health Data Archive (https://www.datafiles.samhsa.gov/study-series/national-survey-drug-use-and-health-nsduh-nid13517), after agreeing to their terms of use.

In [36]:
df = pd.read_csv('NSDUH_2017_Tab.tsv',sep='\t',index_col=0)


In [6]:
df

QUESTID2,FILEDATE,CIGEVER,CIGOFRSM,CIGWILYR,CIGTRY,CIGYFU,CIGMFU,CIGREC,CIG30USE,CG30EST,...,POVERTY3,TOOLONG,TROUBUND,PDEN10,COUTYP4,MAIIN102,AIIND102,ANALWT_C,VESTR,VEREP
55235143,10/09/2018,1,99,99,13,9999,99,4,93,93,...,3.0,2,2,1,1,2,2,11203.888954,40043,1
13435143,10/09/2018,1,99,99,15,9999,99,1,18,99,...,3.0,1,2,1,1,2,2,9496.462244,40006,2
81345143,10/09/2018,1,99,99,14,9999,99,1,10,99,...,3.0,2,2,1,1,2,2,2943.702802,40030,2
53955143,10/09/2018,1,99,99,16,9999,99,4,93,93,...,3.0,2,2,2,2,2,2,1783.702549,40026,2
51775143,10/09/2018,2,99,99,991,9991,91,91,91,91,...,3.0,1,1,1,1,2,2,31528.749357,40029,1
47796143,10/09/2018,1,99,99,15,9999,99,4,93,93,...,1.0,2,2,3,3,2,2,13593.927387,40035,1
13196143,10/09/2018,1,99,99,15,9999,99,3,93,93,...,3.0,2,2,3,3,2,2,3486.457416,40011,2
81726143,10/09/2018,1,99,99,14,9999,99,3,93,93,...,2.0,2,2,2,2,2,2,782.266930,40032,2
61536143,10/09/2018,1,99,99,985,9998,98,14,93,93,...,1.0,2,2,2,3,2,2,836.875263,40024,2
10636143,10/09/2018,2,4,4,991,9991,91,91,91,91,...,1.0,2,2,1,1,2,2,782.663302,40001,2


That is 2667 columns: a lot of parameters, and the large majority of them are categorical.

Descriptions of the parameters can be found in the codebook.

'SUICTHNK' - At any time in the past 12 months, that is from [DATEFILL] up to and including today, did you seriously think about trying to kill yourself?, yes = 1, no = 2, several other coded values also occur

'SUICPLAN' - During the past 12 months, did you make any plans to kill yourself?, yes = 1, no = 2

'SUICTRY' - During the past 12 months, did you try to kill yourself?

'MHSUITHK' - same as SUICTHNK, no = 0, yes = 1, all others = nan

'MHSUIPLN' - same as SUICPLAN, no = 0, yes = 1, all others = nan

'MHSUITRY' - same as SUICTRY, no = 0, yes = 1, all others = nan

'MHSUTK_U' - same as above, but no/unknown are grouped together

'ADWRSTHK' - Did you think about committing suicide? (think about the period of time/most recent period of time when your [FEELNOUN] and other problems were the worst.), yes = 1, no = 2

'ADWRSPLN' - same as above, but Did you make a suicide plan?

'ADWRSATP' - same as above, but Did you make a suicide attempt?

'AD_MDEA9' - ANY THOUGHTS OR PLANS OF SUICIDE, 1 = has symptoms, 2 = does not

'SIMHSUI2' - whether a respondent received their most recent mental health services from at least one inpatient/residential specialty mental health source in the past year because they thought about or tried to kill themselves, 0 = no, 1 = yes

'SOMHSUI' -  whether a respondent received their most recent mental health services from at least one outpatient specialty mental health source in the past year because they thought about or tried to kill themselves, 0 = no, 1 = yes

'SMHSUI2' -  whether a respondent received their most recent mental health services from at least one specialty mental health source in the past year because they thought about or tried to kill themselves, 0 = no, 1 = yes

'YOWRSTHK' - same as 'ADWRSTHK' but for youth (12-17)

'YOWRSPLN' - same as above, but with plan

'YOWRSATP' - same as above, but suicide attempt

'YO_MDEA9' - same as 'AD_MDEA9', but for youth (12-17 yrs)
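The relationship described above between the raw items (yes = 1, no = 2, other codes) and the derived `MH*` indicators (no = 0, yes = 1, all others = nan) can be sketched on a toy column. This is only an illustration of the coding scheme as listed here: `recode_yes_no` and the toy values are hypothetical, and the actual derived variables in the dataset may involve more than this simple mapping.

```python
import pandas as pd

def recode_yes_no(raw):
    """Hypothetical helper: map 1 -> 1.0 (yes), 2 -> 0.0 (no), any other code -> NaN."""
    # Series.map with a dict leaves values that are not keys as NaN.
    return raw.map({1: 1.0, 2: 0.0})

# Toy answers (assumed): two "yes", two "no", and two special codes.
raw = pd.Series([1, 2, 2, 99, 97, 1], name="SUICTHNK_toy")
recoded = recode_yes_no(raw)

print(recoded.tolist())
print("yes count:", int(recoded.sum()), "missing:", int(recoded.isna().sum()))
```

With a 0/1/NaN coding, the mean of the column is directly the share of "yes" among respondents with a usable answer, which is convenient when reading `describe()` output.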

In [7]:
use_col = ['SUICTHNK','SUICPLAN','SUICTRY','MHSUITHK','MHSUIPLN','MHSUITRY','MHSUTK_U','ADWRSTHK','ADWRSPLN',
           'ADWRSATP','AD_MDEA9','SIMHSUI2','SOMHSUI','SMHSUI2','YOWRSTHK','YOWRSPLN','YOWRSATP','YO_MDEA9']
df[use_col].describe()

Unnamed: 0,SUICTHNK,SUICPLAN,SUICTRY,MHSUITHK,MHSUIPLN,MHSUITRY,MHSUTK_U,ADWRSTHK,ADWRSPLN,ADWRSATP,AD_MDEA9,SIMHSUI2,SOMHSUI,SMHSUI2,YOWRSTHK,YOWRSPLN,YOWRSATP,YO_MDEA9
count,56276.0,56276.0,56276.0,42240.0,42237.0,42237.0,42554.0,56276.0,56276.0,56276.0,56276.0,320.0,1792.0,1896.0,56276.0,56276.0,56276.0,56276.0
mean,26.128367,94.519369,94.527063,0.061269,0.020006,0.009612,0.060817,85.948681,93.597146,93.604698,85.303469,0.553125,0.297991,0.313819,94.265939,96.37835,96.382206,94.021235
std,41.965473,20.373581,20.337973,0.239826,0.140023,0.097572,0.238997,33.170977,22.276343,22.245322,33.012253,0.497948,0.457503,0.464166,20.944202,15.750741,15.727177,21.049307
min,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0
25%,2.0,99.0,99.0,0.0,0.0,0.0,0.0,99.0,99.0,99.0,98.0,0.0,0.0,0.0,99.0,99.0,99.0,99.0
50%,2.0,99.0,99.0,0.0,0.0,0.0,0.0,99.0,99.0,99.0,98.0,1.0,0.0,0.0,99.0,99.0,99.0,99.0
75%,2.0,99.0,99.0,0.0,0.0,0.0,0.0,99.0,99.0,99.0,98.0,1.0,1.0,1.0,99.0,99.0,99.0,99.0
max,99.0,99.0,99.0,1.0,1.0,1.0,1.0,99.0,99.0,99.0,99.0,1.0,1.0,1.0,99.0,99.0,99.0,99.0


Note: as discussed in the codebook, several parameters are derived from a regression model in the 2012 iteration of this study. The codebook recommends not using them when studying suicide attempts and ideation (among other things), because they were found to systematically overpredict the occurrence of these phenomena. I will not be using them in any modeling.

In [8]:
print('Num adults thought about killing themselves in last year: '+str(len(df[df.MHSUITHK==1])))
print('Num adults made plans to kill themselves in last year: '+str(len(df[df.MHSUIPLN==1])))
print('Num adults attempted to kill themselves in last year: '+str(len(df[df.MHSUITRY==1])))
print('Num adults thought at difficult time about killing themselves: '+str(len(df[df.ADWRSTHK==1])))
print('Num adults made plans at difficult time to kill themselves: '+str(len(df[df.ADWRSPLN==1])))
print('Num adults attempted at difficult time to kill themselves: '+str(len(df[df.ADWRSATP==1])))
print('Num adults any thoughts/attempts to kill themselves: '+str(len(df[df.AD_MDEA9==1])))
len(df[(df.ADWRSTHK==1) | (df.ADWRSPLN==1) | (df.ADWRSATP==1) | (df.MHSUITHK==1) | (df.MHSUIPLN==1) | (df.MHSUITRY==1)])

Num adults thought about killing themselves in last year: 2588
Num adults made plans to kill themselves in last year: 845
Num adults attempted to kill themselves in last year: 406
Num adults thought at difficult time about killing themselves: 3121
Num adults made plans at difficult time to kill themselves: 1183
Num adults attempted at difficult time to kill themselves: 761
Num adults any thoughts/attempts to kill themselves: 4988


4153

Why does the union of all those conditions (4153 respondents) not equal 4988, the number of adults flagged in AD_MDEA9 as having any thoughts or plans of suicide?

In [9]:
use_col = ['SUICTHNK','SUICPLAN','SUICTRY','MHSUITHK','MHSUIPLN','MHSUITRY','MHSUTK_U','ADWRSTHK','ADWRSPLN',
           'ADWRSATP','AD_MDEA9','SIMHSUI2','SOMHSUI','SMHSUI2','YOWRSTHK','YOWRSPLN','YOWRSATP','YO_MDEA9',
           'YUHOSUIC','YURSSUIC','ADWRDBTR','ADWRDLOT']

df[df.AD_MDEA9==1][use_col]

QUESTID2,SUICTHNK,SUICPLAN,SUICTRY,MHSUITHK,MHSUIPLN,MHSUITRY,MHSUTK_U,ADWRSTHK,ADWRSPLN,ADWRSATP,...,SOMHSUI,SMHSUI2,YOWRSTHK,YOWRSPLN,YOWRSATP,YO_MDEA9,YUHOSUIC,YURSSUIC,ADWRDBTR,ADWRDLOT
37837143,2,99,99,0.0,0.0,0.0,0.0,2,99,99,...,,,99,99,99,99,99,99,2,1
34608143,2,99,99,0.0,0.0,0.0,0.0,1,2,2,...,,,99,99,99,99,99,99,1,1
24160143,2,99,99,0.0,0.0,0.0,0.0,2,99,99,...,,,99,99,99,99,99,99,1,1
41932143,2,99,99,0.0,0.0,0.0,0.0,2,99,99,...,,,99,99,99,99,99,99,1,1
63244143,2,99,99,0.0,0.0,0.0,0.0,1,2,2,...,,,99,99,99,99,99,99,1,1
19246243,2,99,99,0.0,0.0,0.0,0.0,2,99,99,...,,,99,99,99,99,99,99,1,2
49297243,2,99,99,0.0,0.0,0.0,0.0,1,2,2,...,,,99,99,99,99,99,99,1,1
34879243,2,99,99,0.0,0.0,0.0,0.0,1,2,2,...,,,99,99,99,99,99,99,1,1
47040243,2,99,99,0.0,0.0,0.0,0.0,1,2,2,...,,,99,99,99,99,99,99,1,1
24770243,1,1,2,1.0,1.0,0.0,1.0,1,1,2,...,,,99,99,99,99,99,99,1,1


In [10]:
len(df[(df.ADWRSTHK==1) | (df.ADWRSPLN==1) | (df.ADWRSATP==1) | (df.ADWRDBTR==1) | (df.ADWRDLOT==1)])

4988

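AD_MDEA9 refers only to the most difficult period in a person's life, and it also counts thoughts about death (ADWRDLOT) and thinking it would be better to be dead (ADWRDBTR), not only the suicide items. Including those two columns in the condition reproduces the 4988 count.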
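The counts above come from `|` (OR) conditions, so they count respondents who match at least one condition rather than adding up the per-question counts. A minimal sketch on made-up data (the column names and values below are assumptions for illustration only) shows the difference, and why adding further columns to the OR can only raise the count:

```python
import pandas as pd

# Toy answers (assumed), coded 1 = yes, 2 = no as in the raw survey items.
toy = pd.DataFrame({
    "thought": [1, 1, 2, 2, 1],
    "plan":    [1, 2, 2, 2, 2],
    "attempt": [1, 2, 2, 1, 2],
})

yes = toy.eq(1)                           # boolean mask of "yes" answers
sum_of_counts = int(yes.sum().sum())      # adds per-column counts, double-counting people
any_yes = int(yes.any(axis=1).sum())      # people with at least one "yes" (the union)

print(sum_of_counts, any_yes)  # 6 4
```

The first respondent answers "yes" three times and is counted three times in the sum, but only once in the union.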

In [11]:
print('Num adults w/ suicidal thoughts last year: '+str(len(df[df.MHSUITHK==1])))
print('Num adults w/ suicidal plan last year: '+str(len(df[df.MHSUIPLN==1])))
print('Num adults w/ suicidal attempt last year: '+str(len(df[df.MHSUITRY==1])))
print('Num adults w/ suicidal thoughts in difficult time: '+str(len(df[df.ADWRSTHK==1])))
print('Num adults w/ suicidal plan in difficult time: '+str(len(df[df.ADWRSPLN==1])))
print('Num adults w/ suicidal attempt in difficult time: '+str(len(df[df.ADWRSATP==1])))
print('Num adults w/ thoughts about death (theirs, others, general) in difficult time: '+str(len(df[df.ADWRDLOT==1])))
print('Num adults w/ thought better if they were dead in difficult time: '+str(len(df[df.ADWRDBTR==1])))
print('Num adults answered yes for past 5 questions: '+str(len(df[df.AD_MDEA9==1])))

Num adults w/ suicidal thoughts last year: 2588
Num adults w/ suicidal plan last year: 845
Num adults w/ suicidal attempt last year: 406
Num adults w/ suicidal thoughts in difficult time: 3121
Num adults w/ suicidal plan in difficult time: 1183
Num adults w/ suicidal attempt in difficult time: 761
Num adults w/ thoughts about death (theirs, others, general) in difficult time: 4084
Num adults w/ thought better if they were dead in difficult time: 3930
Num adults answered yes for past 5 questions: 4988


In [12]:
print('Num adults w/ suicidal thoughts: '+str(len(df[(df.MHSUITHK==1) | (df.ADWRSTHK==1)])))
print('Num adults w/ suicidal plans: '+str(len(df[(df.MHSUIPLN==1) | (df.ADWRSPLN==1)])))
print('Num adults w/ suicidal attempt: '+str(len(df[(df.MHSUITRY==1) | (df.ADWRSATP==1)])))
print('Num adults w/ thoughts of death: '+str(len(df[(df.ADWRDLOT==1) | (df.ADWRDBTR==1)])))
print('Num adults w/ thoughts of death or suicide: '+str(len(df[(df.ADWRDLOT==1) | (df.ADWRDBTR==1) | (df.MHSUITHK==1) | (df.ADWRSTHK==1)])))

Num adults w/ suicidal thoughts: 4153
Num adults w/ suicidal plans: 1522
Num adults w/ suicidal attempt: 946
Num adults w/ thoughts of death: 4909
Num adults w/ thoughts of death or suicide: 5914


In [13]:
df[df.MHSUITRY==1][['MHSUIPLN','MHSUITHK']].describe()

Unnamed: 0,MHSUIPLN,MHSUITHK
count,406.0,406.0
mean,0.864532,1.0
std,0.342645,0.0
min,0.0,1.0
25%,1.0,1.0
50%,1.0,1.0
75%,1.0,1.0
max,1.0,1.0


In [14]:
df[df.MHSUITRY==0][['MHSUIPLN','MHSUITHK']].describe()

Unnamed: 0,MHSUIPLN,MHSUITHK
count,41829.0,41831.0
mean,0.011762,0.052091
std,0.107815,0.222212
min,0.0,0.0
25%,0.0,0.0
50%,0.0,0.0
75%,0.0,0.0
max,1.0,1.0


In [15]:
df[df.ADWRSATP==1][['ADWRSPLN','ADWRSTHK','ADWRDLOT','ADWRDBTR']].describe()

Unnamed: 0,ADWRSPLN,ADWRSTHK,ADWRDLOT,ADWRDBTR
count,761.0,761.0,761.0,761.0
mean,1.254928,1.0,1.093298,1.148489
std,0.436107,0.0,0.291041,3.482328
min,1.0,1.0,1.0,1.0
25%,1.0,1.0,1.0,1.0
50%,1.0,1.0,1.0,1.0
75%,2.0,1.0,1.0,1.0
max,2.0,1.0,2.0,97.0


In [16]:
df[df.ADWRSATP==2][['ADWRSPLN','ADWRSTHK','ADWRDLOT','ADWRDBTR']].describe()

Unnamed: 0,ADWRSPLN,ADWRSTHK,ADWRDLOT,ADWRDBTR
count,2357.0,2357.0,2357.0,2357.0
mean,1.817989,1.0,1.240136,1.13322
std,2.766133,0.0,2.7722,1.935743
min,1.0,1.0,1.0,1.0
25%,1.0,1.0,1.0,1.0
50%,2.0,1.0,1.0,1.0
75%,2.0,1.0,1.0,1.0
max,97.0,1.0,97.0,94.0


In [17]:
df[df.YOWRSATP==1][['YOWRSPLN','YOWRSTHK','YOWRDLOT','YOWRDBTR']].describe()

Unnamed: 0,YOWRSPLN,YOWRSTHK,YOWRDLOT,YOWRDBTR
count,545.0,545.0,545.0,545.0
mean,1.322936,1.0,1.592661,1.033028
std,4.121135,0.0,7.035568,0.178873
min,1.0,1.0,1.0,1.0
25%,1.0,1.0,1.0,1.0
50%,1.0,1.0,1.0,1.0
75%,1.0,1.0,1.0,1.0
max,97.0,1.0,97.0,2.0


In [18]:
df[df.YOWRSATP==2][['YOWRSPLN','YOWRSTHK','YOWRDLOT','YOWRDBTR']].describe()

Unnamed: 0,YOWRSPLN,YOWRSTHK,YOWRDLOT,YOWRDBTR
count,964.0,964.0,964.0,964.0
mean,2.479253,1.0,1.797718,1.582988
std,8.629478,0.0,7.933495,6.727014
min,1.0,1.0,1.0,1.0
25%,1.0,1.0,1.0,1.0
50%,2.0,1.0,1.0,1.0
75%,2.0,1.0,1.0,1.0
max,97.0,1.0,97.0,97.0


According to the counts in the codebook, only respondents who reported suicidal ideation were then asked the questions about plans and attempts.

As one would expect, a large share of those who attempted suicide had also made a suicide plan: about 86% of adults reporting a past-year attempt (MHSUITRY == 1) also reported a plan (MHSUIPLN mean of 0.864532 above).
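The `describe()` tables above mix two codings, so the means need different readings. For the 0/1 columns (MHSUIPLN) the mean is already the share of "yes". For a column whose only values are 1 (yes) and 2 (no), the mean is 1*p + 2*(1-p) = 2 - p, so the share of "yes" is p = 2 - mean. A small sketch of that arithmetic, using the means printed above (`share_yes_from_mean` is a hypothetical helper, and the formula only holds when min = 1 and max = 2, as for ADWRSPLN in the ADWRSATP == 1 table):

```python
def share_yes_from_mean(mean_12):
    """Share answering 1 (yes) for a column whose only values are 1 (yes) and 2 (no)."""
    # mean = 1*p + 2*(1 - p) = 2 - p  =>  p = 2 - mean
    return 2.0 - mean_12

# Means copied from the describe() outputs above.
adult_past_year = 0.864532                         # MHSUIPLN mean where MHSUITRY == 1 (0/1 coding)
adult_worst_period = share_yes_from_mean(1.254928) # ADWRSPLN mean where ADWRSATP == 1 (1/2 coding)

print(round(adult_past_year, 3), round(adult_worst_period, 3))
```

Columns whose maximum is 94 or 97 in the tables above contain special codes, so this shortcut does not apply to them without first removing those codes.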

In [78]:
def mkTrainTest(df):
    # Special survey codes (for example the 91, 93, 99, 991 and 9999 values
    # visible in the table above); they are replaced with NaN in every column.
    rem_val = [81,83,85,89,91,93,94,97,98,99,981,983,985,989,991,993,994,997,998,999,9981,9983,9985,9989,9991,9993,
               9994,9997,9998,9999]

    dfc = df.copy()
    dfc = dfc.replace(rem_val,np.nan)
    df1 = dfc[dfc.MHSUITHK==1]
    df0 = dfc[dfc.MHSUITHK==0]

    # Undersample the "no" class so both classes have the same size.
    # Note: no random_state is set here, so the sample differs between runs.
    one_size = len(df1)
    df0p = df0.sample(one_size)

    # Drop the target and the raw question it is derived from.
    y0 = df0p.MHSUITHK
    X0 = df0p.drop(columns=['MHSUITHK','SUICTHNK'])
    y1 = df1.MHSUITHK
    X1 = df1.drop(columns=['MHSUITHK','SUICTHNK'])

    X_all = pd.concat([X0,X1])
    y_all = pd.concat([y0,y1])

    # 70/30 split of the balanced data.
    return train_test_split(X_all, y_all, test_size=0.3, random_state=0)
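The class balancing done in `mkTrainTest` can be sketched in isolation on toy data. This is a minimal illustration rather than the function above: `balanced_undersample`, its `seed` parameter, and the toy frame are all assumptions, and it presumes the positive class is the smaller one (otherwise `sample` would raise an error).

```python
import pandas as pd

def balanced_undersample(frame, target, seed=0):
    """Hypothetical helper: keep all rows with target == 1 and an equally
    sized random sample of rows with target == 0."""
    pos = frame[frame[target] == 1]
    neg = frame[frame[target] == 0]
    # A fixed random_state makes the sampled rows reproducible between runs.
    neg_sample = neg.sample(n=len(pos), random_state=seed)
    return pd.concat([neg_sample, pos])

# Toy data (assumed): 3 positive rows and 7 negative rows.
toy = pd.DataFrame({"x": range(10), "label": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]})
balanced = balanced_undersample(toy, "label")

print(len(balanced), int(balanced["label"].sum()))
```

Fixing the seed is a design choice for reproducibility: without it, the selected features and downstream results can change every time the cell is rerun.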

In [137]:
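def featFind(X,y):
    # Take candidate columns from X itself (skipping FILEDATE, the first column),
    # so the function does not depend on the global df or on dropped target columns.
    list_p = list(X)[1:]
    find_p = []
    logreg = LogisticRegression()
    rfe = RFE(logreg)  # by default RFE keeps half of the features it is given

    # Only the first 3 blocks of 200 columns are processed here.
    for i in range(3):
        # Slicing past the end of a list is safe, so the last block needs no special case.
        use_p = list_p[i*200:(i+1)*200]

        # One-hot encode the block, since most columns hold categorical codes.
        use_X = X[use_p]
        use_X = pd.get_dummies(use_X,columns=use_p)
        rfe = rfe.fit(use_X,y.values.ravel())
        tmp_p = np.array(list(use_X))
        find_p.extend(tmp_p[rfe.support_])

    return find_p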
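`featFind` walks through the column names in blocks of 200. The block logic on its own can be sketched as a small generator. `chunk_columns` is a hypothetical helper, not part of the notebook, and it relies on the fact that Python slices past the end of a list simply return the remaining items.

```python
def chunk_columns(columns, size=200):
    """Yield consecutive slices of `columns`, each at most `size` items long."""
    for start in range(0, len(columns), size):
        yield columns[start:start + size]

# Toy column names (assumed), not the survey's.
cols = [f"col{i}" for i in range(450)]
chunk_lengths = [len(chunk) for chunk in chunk_columns(cols)]

print(chunk_lengths)  # [200, 200, 50]
```

Iterating over every block this way would cover all columns instead of only the first three blocks, at the cost of a much longer run.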

In [138]:
X_train,X_test,y_train,y_test = mkTrainTest(df)
test=featFind(X_train,y_train)
len(test)
###logreg = LogisticRegression()
###rfe = RFE(logreg)
#rfe = rfe.fit(X_train, y_train.values.ravel())
#print(rfe.support_)
#print(rfe.ranking_)
#print(X_train.columns)

#logit_model=sm.Logit(y_train,X_train)
#result=logit_model.fit()
#print(result.summary2())

2678

In [139]:
test

['CIGEVER_1',
 'CIGEVER_2',
 'CIGTRY_1.0',
 'CIGTRY_6.0',
 'CIGTRY_7.0',
 'CIGTRY_8.0',
 'CIGTRY_9.0',
 'CIGTRY_10.0',
 'CIGTRY_15.0',
 'CIGTRY_17.0',
 'CIGTRY_19.0',
 'CIGTRY_23.0',
 'CIGTRY_24.0',
 'CIGTRY_27.0',
 'CIGTRY_28.0',
 'CIGTRY_29.0',
 'CIGTRY_31.0',
 'CIGTRY_33.0',
 'CIGTRY_35.0',
 'CIGTRY_36.0',
 'CIGTRY_37.0',
 'CIGTRY_39.0',
 'CIGTRY_40.0',
 'CIGTRY_50.0',
 'CIGYFU_2015.0',
 'CIGYFU_2016.0',
 'CIGMFU_2.0',
 'CIGMFU_3.0',
 'CIGMFU_5.0',
 'CIGMFU_6.0',
 'CIGMFU_7.0',
 'CIGMFU_8.0',
 'CIGMFU_9.0',
 'CIGMFU_10.0',
 'CIGREC_4.0',
 'CIGREC_9.0',
 'CIGREC_14.0',
 'CIGREC_19.0',
 'CIG30USE_1.0',
 'CIG30USE_3.0',
 'CIG30USE_4.0',
 'CIG30USE_5.0',
 'CIG30USE_6.0',
 'CIG30USE_8.0',
 'CIG30USE_10.0',
 'CIG30USE_16.0',
 'CIG30USE_17.0',
 'CIG30USE_19.0',
 'CIG30USE_21.0',
 'CIG30USE_22.0',
 'CIG30USE_23.0',
 'CIG30USE_25.0',
 'CIG30USE_26.0',
 'CIG30USE_28.0',
 'CIG30USE_29.0',
 'CIG30USE_30.0',
 'CG30EST_4.0',
 'CIG30AV_2.0',
 'CIG30AV_7.0',
 'CIG30BR2_102.0',
 'CIG30BR2_109.0',
 '

In [None]:
use_col = ['CATAG6','NOMARR2','SERVICE','HEALTH','MOVSINPYR2','SEXATRACT','DIFFHEAR','DIFFSEE','DIFFTHINK','DIFFWALK',
           'DIFFDRESS','DIFFERAND','IRSEX','IRMARIT','IREDUHIGHST2','NEWRACE2','EDUHIGHCAT','EDUSCHLGO','WRKSTATWK2',
           'EDFAM18','PRVHLTIN','HLCNOTYR','IRPINC3','IFAMIN3','INCOME','POVERTY','MAIIN102','BOOKED','TXEVRRCVD',
           'PREGNANT','HTINCHE2','WTPOUND2','INHOSPYR','NMVSOEST','AUINPYR','AUOPTYR','AURXYR','AUUNMTYR','SNYSELL',
           'SNYSTOLE','SNYATTAK','SNRLGIMP','DSTWORST','ADDPREV','AMDELT']