In [2]:
import sqlite3 as sql
import pandas as pd
import numpy as np
import seaborn as sns

In [3]:
con = sql.connect("mental_health.sqlite")
cur = con.cursor()

In [4]:
query = "SELECT name FROM sqlite_master WHERE type='table';"
cur.execute(query)
tables = cur.fetchall()

In [5]:
tables

[('Answer',), ('Question',), ('Survey',)]

In [6]:
columnDict = {}

# Each entry of `tables` is a 1-tuple such as ('Answer',), so it works both
# as the % argument below and as the dictionary key.
for table in tables:
    query = "SELECT * FROM %s;" % table
    cur.execute(query)
    # cur.description holds one 7-tuple per column; item 0 is the column name.
    columnDict[table] = [col[0] for col in cur.description]

columnDict

{('Answer',): ['AnswerText', 'SurveyID', 'UserID', 'QuestionID'],
 ('Question',): ['questiontext', 'questionid'],
 ('Survey',): ['SurveyID', 'Description']}
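
The loop above runs a full `SELECT *` on each table just to read `cur.description`. As a possible alternative, SQLite's `PRAGMA table_info` reports the columns without scanning any rows. The sketch below is only an illustration: it uses a small in-memory database as a stand-in for `mental_health.sqlite`, and the helper name `list_columns` is hypothetical.

```python
import sqlite3

def list_columns(con):
    """Map each table name (a plain string) to its list of column names."""
    cur = con.cursor()
    cur.execute("SELECT name FROM sqlite_master WHERE type='table';")
    names = [row[0] for row in cur.fetchall()]
    columns = {}
    for name in names:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        # Table names cannot be bound as parameters, so the name is quoted by hand.
        cur.execute('PRAGMA table_info("%s");' % name.replace('"', '""'))
        columns[name] = [row[1] for row in cur.fetchall()]
    return columns

# Stand-in schema (an assumption, modeled on the column names printed above).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Survey (SurveyID INTEGER, Description TEXT)")
demo.execute("CREATE TABLE Question (questiontext TEXT, questionid INTEGER)")
demo_columns = list_columns(demo)
```

Keying the dictionary by string rather than by 1-tuple would make later lookups such as `demo_columns["Survey"]` a little more direct.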

In [7]:
query = "SELECT SurveyID, Description from Survey"
cur.execute(query)
cur.fetchall()

[(2014, 'mental health survey for 2014'),
 (2016, 'mental health survey for 2016'),
 (2017, 'mental health survey for 2017'),
 (2018, 'mental health survey for 2018'),
 (2019, 'mental health survey for 2019')]

In [8]:
query = "SELECT questiontext from Question"
cur.execute(query)
cur.fetchall()

[('What is your age?',),
 ('What is your gender?',),
 ('What country do you live in?',),
 ('If you live in the United States, which state or territory do you live in?',),
 ('Are you self-employed?',),
 ('Do you have a family history of mental illness?',),
 ('Have you ever sought treatment for a mental health disorder from a mental health professional?',),
 ('How many employees does your company or organization have?',),
 ('Is your employer primarily a tech company/organization?',),
 ('Does your employer provide mental health benefits as part of healthcare coverage?',),
 ('Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources provided by your employer?',),
 ('Would you bring up a mental health issue with a potential employer in an interview?',),
 ('Is your primary role within your company related to tech/IT?',),
 ('Do you know the options for mental health care available under your employer-provided health coverage?',),
 ('Ha

A few choices for the questions to look at:

* 'What is your age?'
* 'What is your gender?'
* 'What is your race?'
* 'Which of the following best describes your work position?'
* 'What country do you live in?'
* 'Is your employer primarily a tech company/organization?'
* 'Are you self-employed?'
* 'How many employees does your company or organization have?'
* 'Do you work remotely?'
* 'Do you work remotely (outside of an office) at least 50% of the time?'
* 'Do you have a family history of mental illness?'
* 'Have you ever sought treatment for a mental health disorder from a mental health professional?'
* 'Do you currently have a mental health disorder?'
* 'If you have a mental health disorder, how often do you feel that it interferes with your work when being treated effectively?'
* 'Are you openly identified at work as a person with a mental health issue?'
* 'Has being identified as a person with a mental health issue affected your career?'
* 'Does your employer provide mental health benefits as part of healthcare coverage?'
* 'Does your employer provide resources to learn more about mental health issues and how to seek help?'
* 'Do you believe your productivity is ever affected by a mental health issue?'
* 'If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?'
* 'Would you feel comfortable discussing a mental health issue with your coworkers?'
* 'Would you feel comfortable discussing a mental health issue with your direct supervisor(s)?'

These questions fall into three broad categories: general demographics, job questions, and mental health questions. The goal is to see whether demographics such as country of residence, gender, and age affect how comfortable people are sharing mental health issues, seeking help, or getting diagnosed.

Another dimension of the analysis is an overview of the state of mental health awareness and discussion in the industry. Here I want to look at factors such as remote work, company size, self-employment, and whether mental health care is covered by the company's insurance.

The last group is the mental health questions, which cover the employee's mental health background, their comfort with sharing it, and how it may affect their job.

Overall, this is a broad analysis trying to figure out what should be looked at in more detail.
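
To compare answers across these categories, each respondent's answers need to sit side by side. The sketch below shows one way this might be done with a parameterized `IN` list, under the assumption that `Answer.QuestionID` joins to `Question.questionid` as in the schema above. The helper name `answers_by_user` is hypothetical, and the rows are made up purely for illustration.

```python
import sqlite3
from collections import defaultdict

def answers_by_user(con, questions):
    """Return {UserID: {question text: answer text}} for the chosen questions."""
    # One "?" per question so the texts are bound as parameters, not pasted in.
    placeholders = ",".join("?" for _ in questions)
    query = (
        "SELECT A.UserID, Q.questiontext, A.AnswerText "
        "FROM Question AS Q JOIN Answer AS A ON Q.questionid = A.QuestionID "
        "WHERE Q.questiontext IN (%s)" % placeholders
    )
    table = defaultdict(dict)
    for user, question, answer in con.execute(query, list(questions)):
        table[user][question] = answer
    return dict(table)

# Made-up rows for illustration only; the IDs and answers are not from the survey.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Question (questiontext TEXT, questionid INTEGER)")
demo.execute(
    "CREATE TABLE Answer (AnswerText TEXT, SurveyID INTEGER, "
    "UserID INTEGER, QuestionID INTEGER)"
)
demo.executemany("INSERT INTO Question VALUES (?, ?)", [
    ("What is your age?", 1),
    ("Are you self-employed?", 2),
])
demo.executemany("INSERT INTO Answer VALUES (?, ?, ?, ?)", [
    ("30", 2016, 1, 1), ("0", 2016, 1, 2),
    ("41", 2016, 2, 1), ("1", 2016, 2, 2),
])
wide = answers_by_user(demo, ["What is your age?", "Are you self-employed?"])
```

Each value in `wide` is one respondent's row, which could then be handed to `pd.DataFrame.from_dict(wide, orient="index")` for further analysis.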


In [14]:
query = """SELECT Q.questiontext as Ques,
A.answertext as Ans,
A.UserID as User
FROM Question as Q
JOIN Answer as A
on Q.questionid = A.QuestionID
WHERE Ques IN ('What is your age?')
"""
cur.execute(query)
cur.fetchall()

[('What is your age?', '-1', 391),
 ('What is your age?', '-1', 716),
 ('What is your age?', '-1', 1128),
 ('What is your age?', '-1', 3447),
 ('What is your age?', '-1', 3449),
 ('What is your age?', '-29', 144),
 ('What is your age?', '0', 3981),
 ('What is your age?', '11', 1091),
 ('What is your age?', '15', 2069),
 ('What is your age?', '17', 1354),
 ('What is your age?', '18', 44),
 ('What is your age?', '18', 94),
 ('What is your age?', '18', 119),
 ('What is your age?', '18', 283),
 ('What is your age?', '18', 288),
 ('What is your age?', '18', 479),
 ('What is your age?', '18', 483),
 ('What is your age?', '18', 2715),
 ('What is your age?', '18', 2803),
 ('What is your age?', '19', 76),
 ('What is your age?', '19', 130),
 ('What is your age?', '19', 151),
 ('What is your age?', '19', 593),
 ('What is your age?', '19', 683),
 ('What is your age?', '19', 750),
 ('What is your age?', '19', 992),
 ('What is your age?', '19', 1021),
 ('What is your age?', '19', 1028),
 ('What is y

In [13]:
query = """SELECT Q.questiontext as Ques,
AVG(A.answertext) as Ans,
A.UserID as User
FROM Question as Q
JOIN Answer as A
on Q.questionid = A.QuestionID
WHERE Ques IN ('What is your age?') AND A.answertext > 0
"""
cur.execute(query)
cur.fetchall()

[('What is your age?', 33.91536273115221, 391)]

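A few of the answers were nonsense, so I added the condition that responses must be positive. I am assuming the eleven- and fifteen-year-old responses are legitimate because I cannot investigate them. Note that the `User` value in the result is not meaningful: `AVG` collapses all rows into one, so SQLite fills that column from an arbitrary row.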

Regardless, the mean age of about 34 suggests that the respondents skew young.
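
A mean alone says little about the shape of the distribution, and the answers are stored as text, so the filter can be made more explicit. The sketch below is one possible approach, not the query used above: it casts each answer to an integer, keeps only ages inside a range, and reports the median alongside the mean. The helper name `age_summary` and the default bounds of 1 and 120 are assumptions, and the demo rows are invented.

```python
import sqlite3
import statistics

def age_summary(con, min_age=1, max_age=120):
    """Summarize ages after an explicit cast and a range filter."""
    # CAST turns non-numeric text into 0, which the lower bound then excludes.
    query = """
        SELECT CAST(A.AnswerText AS INTEGER) AS age
        FROM Question AS Q
        JOIN Answer AS A ON Q.questionid = A.QuestionID
        WHERE Q.questiontext = 'What is your age?'
          AND CAST(A.AnswerText AS INTEGER) BETWEEN ? AND ?
    """
    ages = [row[0] for row in con.execute(query, (min_age, max_age))]
    return {
        "n": len(ages),
        "mean": statistics.mean(ages),
        "median": statistics.median(ages),
    }

# Invented rows for illustration only; these are not the survey's answers.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Question (questiontext TEXT, questionid INTEGER)")
demo.execute(
    "CREATE TABLE Answer (AnswerText TEXT, SurveyID INTEGER, "
    "UserID INTEGER, QuestionID INTEGER)"
)
demo.execute("INSERT INTO Question VALUES ('What is your age?', 1)")
demo.executemany(
    "INSERT INTO Answer VALUES (?, 2016, ?, 1)",
    [("-1", 1), ("-29", 2), ("0", 3), ("18", 4), ("30", 5), ("42", 6)],
)
summary = age_summary(demo)
```

If the median sits below the mean on the real data, that would support the reading that most respondents are young with a tail of older ones. The function raises `statistics.StatisticsError` when no age passes the filter.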