## Unit 1 Capstone

### Assignment

The main component of this capstone is an experimentation RFC. Using the data set you selected, propose and outline an experiment plan. The plan should consist of three key components:

- Analysis that highlights your experimental hypothesis.
- A rollout plan showing how you would implement and roll out the experiment.
- An evaluation plan showing what constitutes success in this experiment.

Your experiment should be as real as possible. Though you obviously will not have access to the full production environment to deploy your experiment, it should be feasible and of interest to the parties involved with your actual data source.

### Exploring the Data

For this capstone project, I will be using 2014 data from the Mental Health in Tech survey. This survey measures attitudes toward mental health and the frequency of mental health disorders in the tech workplace.

More information on the Mental Health in Tech survey is available [here](https://www.kaggle.com/osmi/mental-health-in-tech-survey/data).

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

In [2]:
# Uploading file.

df = pd.read_csv('datafiles/mental_health_tech_survey.csv')

In [3]:
# Let's see what we're working with.

df.head()

Unnamed: 0,Timestamp,Age,Gender,Country,state,self_employed,family_history,treatment,work_interfere,no_employees,...,leave,mental_health_consequence,phys_health_consequence,coworkers,supervisor,mental_health_interview,phys_health_interview,mental_vs_physical,obs_consequence,comments
0,2014-08-27 11:29:31,37,Female,United States,IL,,No,Yes,Often,6-25,...,Somewhat easy,No,No,Some of them,Yes,No,Maybe,Yes,No,
1,2014-08-27 11:29:37,44,M,United States,IN,,No,No,Rarely,More than 1000,...,Don't know,Maybe,No,No,No,No,No,Don't know,No,
2,2014-08-27 11:29:44,32,Male,Canada,,,No,No,Rarely,6-25,...,Somewhat difficult,No,No,Yes,Yes,Yes,Yes,No,No,
3,2014-08-27 11:29:46,31,Male,United Kingdom,,,Yes,Yes,Often,26-100,...,Somewhat difficult,Yes,Yes,Some of them,No,Maybe,Maybe,No,Yes,
4,2014-08-27 11:30:22,31,Male,United States,TX,,No,No,Never,100-500,...,Don't know,No,No,Some of them,Yes,Yes,Yes,Don't know,No,


In [4]:
# How is Gender being categorized?

df['Gender'].unique()

array(['Female', 'M', 'Male', 'male', 'female', 'm', 'Male-ish', 'maile',
       'Trans-female', 'Cis Female', 'F', 'something kinda male?',
       'Cis Male', 'Woman', 'f', 'Mal', 'Male (CIS)', 'queer/she/they',
       'non-binary', 'Femake', 'woman', 'Make', 'Nah', 'All', 'Enby',
       'fluid', 'Genderqueer', 'Female ', 'Androgyne', 'Agender',
       'cis-female/femme', 'Guy (-ish) ^_^', 'male leaning androgynous',
       'Male ', 'Man', 'Trans woman', 'msle', 'Neuter', 'Female (trans)',
       'queer', 'Female (cis)', 'Mail', 'cis male', 'A little about you',
       'Malr', 'p', 'femail', 'Cis Man',
       'ostensibly male, unsure what that really means'], dtype=object)

In [5]:
# Looks like people wrote in responses.  Data cleaning time!

df['Gender'] = df['Gender'].str.lower()
df['Gender'] = df['Gender'].replace('m','male')
df['Gender'] = df['Gender'].replace('f','female')

df['Gender'] = df['Gender'].str.strip()

df['Gender'] = df['Gender'].apply(lambda x: str(x).replace('cis ',''))
df['Gender'] = df['Gender'].apply(lambda x: str(x).replace('(cis)',''))

df['Gender'] = df['Gender'].apply(lambda x: str(x).replace('make','male'))
df['Gender'] = df['Gender'].apply(lambda x: str(x).replace('mail','male'))
df['Gender'] = df['Gender'].apply(lambda x: str(x).replace('mal','male'))
df['Gender'] = df['Gender'].apply(lambda x: str(x).replace('malr','male'))
df['Gender'] = df['Gender'].apply(lambda x: str(x).replace('woman','female'))
df['Gender'] = df['Gender'].apply(lambda x: str(x).replace('maler','male'))
df['Gender'] = df['Gender'].apply(lambda x: str(x).replace('msle','male'))
df['Gender'] = df['Gender'].apply(lambda x: str(x).replace('man','male'))
df['Gender'] = df['Gender'].apply(lambda x: str(x).replace('cis-female/femme','female'))
df['Gender'] = df['Gender'].apply(lambda x: str(x).replace('malee','male'))

df['Gender'] = df['Gender'].str.strip()

df['Gender'].value_counts()

male                                              989
female                                            246
female (trans)                                      2
androgyne                                           1
all                                                 1
enby                                                1
ostensibly male, unsure what that really means      1
p                                                   1
trans-female                                        1
agender                                             1
male-ish                                            1
trans female                                        1
male leaning androgynous                            1
queer/she/they                                      1
cis-female/femme                                    1
genderqueer                                         1
queer                                               1
nah                                                 1
malee                       

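There are 24 respondents whose answers did not resolve to "male" or "female". This count includes a couple of write-ins that the replacements above did not catch, such as "malee" and "cis-female/femme". Because these respondents make up a small percentage of the survey, I am going to aggregate them into one "other" category.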

In [6]:
# Changing those who didn't reply "male" or "female" to "other".

df['Gender'] = df['Gender'].apply(lambda x: 'other' if x != 'male' and x != 'female' else x)

df['Gender'].value_counts()

male      989
female    246
other      24
Name: Gender, dtype: int64
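
The chained substring replacements above depend on their order: replacing `mal` with `male`, for instance, also turns `male` into `malee`, which is why a later cleanup step is needed and why a few write-ins slip through. One alternative, sketched below under my own assumptions, is to normalize each response and look it up in an explicit table. The `MALE` and `FEMALE` sets and the `normalize_gender` helper are illustrative names that are not part of the original notebook, and deciding which write-ins belong in which set is a judgment call, so the resulting counts would not necessarily match the ones above.

```python
# Hypothetical lookup tables; the membership choices are assumptions.
MALE = {"m", "male", "man", "mal", "maile", "make", "malr", "msle", "mail",
        "cis male", "cis man", "male (cis)"}
FEMALE = {"f", "female", "woman", "femake", "femail",
          "cis female", "female (cis)", "cis-female/femme"}


def normalize_gender(raw):
    """Map a free-text gender response to 'male', 'female', or 'other'."""
    value = str(raw).strip().lower()
    if value in MALE:
        return "male"
    if value in FEMALE:
        return "female"
    return "other"


# A few raw responses taken from the unique() output above.
samples = ["Male ", "maile", "Cis Female", "Femake", "Enby", "p"]
print([normalize_gender(s) for s in samples])
```

With pandas, the same helper could be applied in one pass with `df['Gender'].map(normalize_gender)`.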

### What is the frequency of mental health issues in tech?

When trying to determine the frequency of mental health conditions in the tech workplace, the most relevant survey question is this one: Have you sought treatment for a mental health condition?  

It is worth mentioning that this question presents some limitations.

First, the question does not ask whether or not the respondent has a mental health issue, but rather if s/he has gone in for treatment.  A person who may have a mental health condition but is forgoing treatment would respond "no."  In this case, the data would be under-reporting the frequency of mental health conditions in tech.

Second, respondents were not randomly selected to take this survey, but were instead self-selected. People who have mental health issues are more likely to complete surveys about mental health. In this case, the data would be over-reporting the frequency of mental health conditions in tech.



In [7]:
df['treatment'].value_counts()

Yes    637
No     622
Name: treatment, dtype: int64

In [8]:
# Changing treatment responses to binary.

df['treatment'] = df.treatment.map({'Yes':1, 'No':0})

In [9]:
# Proportion of survey respondents that say they have sought treatment for a mental health condition.

df['treatment'].mean()

0.50595710881652101

Just over 50% of respondents say they have sought mental health treatment in the past. According to the [National Institute of Mental Health](https://www.nimh.nih.gov/health/statistics/mental-illness.shtml), 18.3% of U.S. adults have a mental illness. Because this survey involves self-selection, it is not surprising that the percentage for the Mental Health in Tech survey is higher than the national average.

In [10]:
# Proportion of people who say they went for mental health treatment by gender.

df.groupby('Gender')['treatment'].mean()

Gender
female    0.686992
male      0.453994
other     0.791667
Name: treatment, dtype: float64

While the majority of survey respondents are male, a larger proportion of female respondents (69%) than male respondents (45%) say they have sought mental health treatment. This is consistent with [existing research](https://www.theguardian.com/society/2016/nov/05/men-less-likely-to-get-help--mental-health) showing that men are less likely than women to seek mental health treatment. Note that the study is based in the UK.

79% of survey respondents in the "other" category have sought mental health treatment. This is also consistent with [existing research](https://www.healthypeople.gov/2020/topics-objectives/topic/lesbian-gay-bisexual-and-transgender-health#one). Many of the write-in responses in the "other" category suggest a transgender or non-binary identity. LGBT individuals, who are more likely to experience societal stigma, discrimination, and denial of rights, have higher rates of mental health issues.

### Proportion Test
To compare the proportion of respondents who have sought treatment across the answers to each categorical question, I will be running a two-proportion z-test.

In [11]:
# Adapted from https://github.com/Volodymyrk/stats-testing-in-python/blob/master/03%20-%20AB%20testing%20Proportions%20with%20z-test.ipynb

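def ztest_proportion(p1, n1, p2, n2, one_sided=False):
    """Two-proportion z-test using the pooled standard error."""
    # Pooled proportion across both groups.
    p_pooled = (p1*n1+p2*n2)/(n1+n2)
    se = p_pooled*(1-p_pooled)*(1/n1+1/n2)
    se = np.sqrt(se)

    z = (p1-p2)/se
    # Upper-tail probability, doubled for a two-sided test.
    p = 1-stats.norm.cdf(abs(z))
    if not one_sided:
        p *= 2

    print(' z-stat = {z} \n p-value = {p}'.format(z=z,p=p))

    return z, p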
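
Written out, the statistic that `ztest_proportion` computes is the pooled two-proportion z-statistic, where $p_1, p_2$ are the group proportions and $n_1, n_2$ the group sizes:

$$
\hat{p} = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2}, \qquad
z = \frac{p_1 - p_2}{\sqrt{\hat{p}\,(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
$$

The two-sided p-value is then $2\,\bigl(1 - \Phi(|z|)\bigr)$, with $\Phi$ the standard normal CDF.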

#### P-value correction 
Since I am running the z-test eight times (two comparisons for each of four survey questions), I need to adjust the p-value cut-off used to determine significance. This is known as the [Bonferroni Correction](https://en.wikipedia.org/wiki/Bonferroni_correction).

In [12]:
new_pvalue = 0.05/8

print(new_pvalue)

0.00625


A p-value below 0.00625 means that, if the two groups truly had the same proportion, we would expect a z-value this extreme or more extreme less than 0.625% of the time due to random chance. This would lead us to reject the null hypothesis that the two groups are statistically similar.

Now we are ready to look at the different groups!

#### Benefits
Proportion of people who say they went for mental health treatment by whether or not the workplace provides mental health benefits.


In [13]:
df.benefits.value_counts()

Yes           477
Don't know    408
No            374
Name: benefits, dtype: int64

In [14]:
# Proportion of people who say they went for mental health treatment by 
#     whether or not the workplace provides mental health benefits.

df.groupby('benefits')['treatment'].mean()

benefits
Don't know    0.370098
No            0.483957
Yes           0.639413
Name: treatment, dtype: float64

In [15]:
# Is the proportional difference between the Yes and No groups statistically significant?
p1 = 0.639413
n1 = 477
p2 = 0.483957
n2 = 374

ztest_proportion(p1, n1, p2, n2, one_sided=False)

 z-stat = 4.54781385680887 
 p-value = 5.420604620276492e-06


(4.5478138568088697, 5.4206046202764924e-06)

In [16]:
# Is the difference in No and Don't Know groups statistically significant?
p3 = 0.370098
n3 = 408

ztest_proportion(p2, n2, p3, n3, one_sided=False)

 z-stat = 3.217817004814399 
 p-value = 0.00129170200204487


(3.2178170048143988, 0.00129170200204487)

The proportion of respondents who sought treatment is significantly different between those who said their workplaces offer benefits and those who said their workplaces do not. The proportion is also significantly different between the No and Don't know groups.

#### Care Options 
Proportion of people who say they went for mental health treatment by whether or not the workplace provides mental health care options.

In [17]:
df.care_options.value_counts()

No          501
Yes         444
Not sure    314
Name: care_options, dtype: int64

In [18]:
# Proportion of people who say they went for mental health treatment by 
#     whether or not the workplace provides mental health care options.

df.groupby('care_options')['treatment'].mean()

care_options
No          0.413174
Not sure    0.391720
Yes         0.691441
Name: treatment, dtype: float64

In [19]:
# Is the difference in No and Yes groups statistically significant?
p1 = 0.691441
n1 = 444
p2 = 0.413174
n2 = 504  # value_counts above reports 501 "No" responses; the recorded output below was produced with 504

ztest_proportion(p1, n1, p2, n2, one_sided=False)

 z-stat = 8.583101451121674 
 p-value = 0.0


(8.5831014511216743, 0.0)

In [20]:
# Is the difference in No and Not Sure groups statistically significant?
p3 = 0.391720
n3 = 314

ztest_proportion(p3, n3, p2, n2, one_sided=False)

 z-stat = -0.6079057315668314 
 p-value = 0.5432499979706544


(-0.60790573156683136, 0.54324999797065443)

Again, the proportion of respondents who sought treatment is significantly different between those who said their workplaces offer care options and those who said their workplaces do not. The proportion is not significantly different between the No and Not sure groups.
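
The p-value of 0.0 reported for the Yes versus No comparison is a floating-point artifact: for a z-statistic this large, `stats.norm.cdf` rounds to exactly 1, so `1 - cdf` comes out as zero. A minimal sketch of one way around this, assuming we work from raw counts rather than hand-typed proportions, is shown below. It uses only the standard library so that it runs without the data set. `ztest_from_counts` is a hypothetical helper, and the success counts (305 of 477, 181 of 374, 307 of 444, 207 of 501) are back-calculated from the group means and sizes printed above.

```python
import math


def ztest_from_counts(x1, n1, x2, n2):
    """Two-sided pooled two-proportion z-test from success counts."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # erfc evaluates the normal tail directly, so very small p-values
    # do not round to zero: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Benefits: Yes (305 of 477) vs. No (181 of 374).
z_benefits, p_benefits = ztest_from_counts(305, 477, 181, 374)
# Care options: Yes (307 of 444) vs. No (207 of 501).
z_care, p_care = ztest_from_counts(307, 444, 207, 501)

print(z_benefits, p_benefits)
print(z_care, p_care)
```

Inside the notebook's own function, `stats.norm.sf(abs(z))` would serve the same purpose as `1 - stats.norm.cdf(abs(z))` while keeping precision in the tail.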

#### Wellness Program
Proportion of people who say they went for mental health treatment by whether or not the workplace discussed mental health as part of an employee wellness program

In [21]:
df.wellness_program.value_counts()

No            842
Yes           229
Don't know    188
Name: wellness_program, dtype: int64

In [22]:
# Proportion of people who say they went for mental health treatment by 
#     whether or not the workplace discussed mental health as part of an employee wellness program
df.groupby('wellness_program')['treatment'].mean()

wellness_program
Don't know    0.430851
No            0.498812
Yes           0.593886
Name: treatment, dtype: float64

In [23]:
# Is the difference in No and Yes groups statistically significant?
p1 = 0.593886
n1 = 229
p2 = 0.498812
n2 = 842

ztest_proportion(p1, n1, p2, n2, one_sided=False)

 z-stat = 2.5532260434106693 
 p-value = 0.010673020435427283


(2.5532260434106693, 0.010673020435427283)

In [29]:
# Is the difference in No and Don't Know groups statistically significant?
p3 = 0.430851
n3 = 188

ztest_proportion(p3, n3, p2, n2, one_sided=False)

 z-stat = -1.6697029056054142 
 p-value = 0.09497815859499625


(-1.6697029056054142, 0.094978158594996254)

At the corrected threshold of 0.00625, the proportion of respondents who sought treatment is NOT significantly different between those who said their workplaces discuss mental health as part of a wellness program and those who said their workplaces do not (p = 0.0107, which would pass an uncorrected 0.05 cut-off). The proportion is also not significantly different between the No and Don't know groups.

#### Seek Help
Proportion of people who say they went for mental health treatment by whether or not the workplace provides resources to learn more about mental health and how to seek help.


In [25]:
df.seek_help.value_counts()

No            646
Don't know    363
Yes           250
Name: seek_help, dtype: int64

In [26]:
# Proportion of people who say they went for mental health treatment by 
#     whether or not the workplace provides resources to learn more about mental health and
#     how to seek help.

df.groupby('seek_help')['treatment'].mean()

seek_help
Don't know    0.4573
No            0.5000
Yes           0.5920
Name: treatment, dtype: float64

In [27]:
# Is the difference in No and Yes groups statistically significant?
p1 = 0.5920
n1 = 250
p2 = .5000
n2 = 646

ztest_proportion(p1, n1, p2, n2, one_sided=False)

 z-stat = 2.473564144179719 
 p-value = 0.013377278748244015


(2.4735641441797189, 0.013377278748244015)

In [28]:
# Is the difference in No and Don't Know groups statistically significant?
p3 = 0.4573
n3 = 363

ztest_proportion(p3, n3, p2, n2, one_sided=False)

 z-stat = -1.3025275890371322 
 p-value = 0.19273609305531458


(-1.3025275890371322, 0.19273609305531458)

At the corrected threshold of 0.00625, the proportion of respondents who sought treatment is NOT significantly different between those who said their workplaces provide resources and those who said their workplaces do not (p = 0.0134, which would pass an uncorrected 0.05 cut-off). The proportion is also not significantly different between the No and Don't know groups.

#### Z-Test conclusions

Respondents who say their employers offer mental health benefits or care options are significantly more likely to have sought mental health treatment. The proportion who sought treatment is 15.5 percentage points higher among respondents with access to mental health benefits than among those without (63.9% vs. 48.4%), and 27.8 percentage points higher among respondents with access to care options than among those without (69.1% vs. 41.3%). Both differences are statistically significant, with p-values of 5.4e-06 and 0.0, respectively (the latter is reported as 0.0 because it is below floating-point precision for this calculation).

These results could signify that respondents who need mental health services are more aware of the mental health benefits offered by their workplaces.  Or, it could mean that workplaces that offer mental health services are more likely to have employees that seek mental health treatment.  Because this is a survey, it is impossible to establish temporality. 

On the other hand, the z-tests show that the differences in treatment-seeking associated with mental health wellness programs and with resources to seek help were not statistically significant at the corrected threshold. It is worth noting that most respondents said that their workplaces do not offer wellness programs or mental health resources, in contrast to mental health benefits and care options.
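
To make the comparison against the corrected threshold explicit, the sketch below tabulates the eight two-sided p-values printed in the cells above (rounded here) against 0.05/8. The dictionary labels are my own shorthand rather than names from the notebook.

```python
ALPHA = 0.05

# Two-sided p-values as printed in the cells above (rounded).
p_values = {
    "benefits: Yes vs. No": 5.42e-06,
    "benefits: No vs. Don't know": 0.00129,
    "care_options: Yes vs. No": 0.0,
    "care_options: Not sure vs. No": 0.543,
    "wellness_program: Yes vs. No": 0.0107,
    "wellness_program: Don't know vs. No": 0.0950,
    "seek_help: Yes vs. No": 0.0134,
    "seek_help: Don't know vs. No": 0.193,
}

# Bonferroni: divide the overall alpha by the number of tests.
threshold = ALPHA / len(p_values)
significant = {name: p < threshold for name, p in p_values.items()}

for name, flag in significant.items():
    print(f"{name:38s} p = {p_values[name]:<10.3g} significant: {flag}")
```

Only the two benefits comparisons and the care options Yes versus No comparison fall below the corrected threshold.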

The "Don't know" and "Not sure" responses also make up a sizable share of the response data: 32% of respondents (408 of 1,259) for benefits, 25% (314) for care options, 15% (188) for wellness programs, and 29% (363) for resources to seek help. As a result, we get less definitive answers as to whether having more mental health support is correlated with more employees seeking out mental health treatment. The uncertainty is largest for benefits and smallest for wellness programs.

It would be interesting to run an experiment to determine: 
1. If a company adopts a mental health wellness program, would employees use the services?  
2. Given that employees are utilizing mental health services, would overall job satisfaction and productivity improve?

## Mental Health in Tech: Experimentation RFC

### Setting

Our hypothetical company, BookFace, is concerned about the mental health of its employees. Currently, BookFace offers employees health insurance and provides physical wellness programs that promote exercise and healthy eating. While the health insurance includes mental health benefits and care options, some members of the leadership team believe that more can be done to support their employees' mental wellness. Of course, robust mental health programs mean additional costs. It is important to learn whether such a program would improve employee satisfaction and productivity.

Thus, they designed a mental health experiment to determine:
1. What would usage of mental health services look like?
2. How could mental health benefits improve overall job satisfaction and productivity?

To do this, they will implement a mental health wellness program in randomly selected offices in North America. They will measure employee sick days as a proxy for productivity.  

### Experiment overview

BookFace has 30 locations across North America. 16 of these locations will be randomly selected and split evenly between the test and control groups (8 offices each).

For offices in the control group, employees will learn about their current health benefits, including any mental health services offered through their health insurance plan. To ensure consistency across control offices, employees will be shown an informational video during a staff meeting and reminded that the office HR representative is available to answer any questions publicly or privately.

Offices in the test group will adopt a more comprehensive mental health wellness program. This program includes a "wellness week," in which leadership and employees openly discuss and address mental health issues in the workplace to combat stigma around mental health conditions. For the duration of the experiment, a mental health professional will be available to the staff, offering three free counselling sessions. Employees will be assured that the names of the clients and the nature of the visits will be kept confidential. BookFace will also offer biweekly meditation sessions for staff members.
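
The random assignment described above could be done along these lines. This is only a sketch: the office labels are placeholders (BookFace's actual locations are not specified), the seed is arbitrary, and the even 8/8 split is my reading of the timeline (2 + 6 offices per arm).

```python
import random

# Hypothetical office labels; the real locations are not specified.
offices = [f"office_{i:02d}" for i in range(1, 31)]

rng = random.Random(2018)  # fixed seed so the assignment is reproducible

# Draw 16 of the 30 locations without replacement. The draw comes back
# in random order, so slicing it yields a random split into two arms.
selected = rng.sample(offices, 16)
test_group = sorted(selected[:8])
control_group = sorted(selected[8:])

print("test:", test_group)
print("control:", control_group)
```

Recording the seed alongside the results makes it possible to audit the assignment later.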

### Hypotheses
Hypothesis:
The average number of employee sick days will drop in the test group relative to the control group.

Null Hypothesis:
There is no difference in the change of average employee sick days between test and control groups.

### Success metrics
Primary success metric:  Change in average number of sick days.
Secondary metrics:  Number of visits to mental health professional (Validation/Manipulation check).  [Employee satisfaction survey](https://www.surveymonkey.com/r/Job-Satisfaction-Survey-Template).

### A/A Testing with smaller test group
To work out any potential issues in implementation, we would first need to test the program on two test offices and two control offices. By implementing A/A testing, we could compare the results between the two test offices and between the two control offices to identify any biases.

Elements of the experiment we would check for include:
- The content provided
- The amount of notice given to the offices to prepare for the experiment
- The date and time the experiment is run
- How the experiment fits into the normal work schedule (contextual bias)
- Office size and location to assess any sampling biases
- Demographic information of employees to assess any sampling biases

### Timeline
Week 0 - Send employee satisfaction survey for baseline.

Week 1 - Implement A/A test for 2 test groups and 2 control groups.

Week 2 - Wellness Week for test group concludes.  Start evaluating results of A/A test.

Week 3 - Expand experiment to 6 additional test groups and 6 control groups.

Week 4 - Wellness Week concludes for test groups.

End of experiment (6 months) - Send out satisfaction survey again 

### Evaluation plan
We will be using average sick days to determine whether implementing a wellness program has increased workplace productivity. We will calculate the change in sick days for each office before and after the experiment, being sure we are comparing similar time periods (e.g., if the experiment is run from January 2018 to June 2018, we would compare average sick days during this time period with that of January 2017 to June 2017). Using a t-test, if the change in average sick days is statistically significant between test and control groups, we can reject the null hypothesis.
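
As a rough illustration of that comparison, the sketch below computes Welch's t-statistic (a t-test variant that does not assume equal variances, which is my choice here rather than something the plan specifies) on per-office changes in average sick days. The numbers are made up purely to show the mechanics, and `welch_t` is a hypothetical helper.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t-statistic and approximate degrees of freedom."""
    # Squared standard errors of each sample mean.
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up per-office changes in average sick days
# (experiment period minus the same months one year earlier).
test_change = [-1.2, -0.8, -1.5, -0.3, -0.9, -1.1, -0.6, -1.0]
control_change = [-0.2, 0.1, -0.4, 0.3, 0.0, -0.1, 0.2, -0.3]

t_stat, dof = welch_t(test_change, control_change)
print(t_stat, dof)
```

In the notebook environment, `scipy.stats.ttest_ind(test_change, control_change, equal_var=False)` would return the same statistic together with a p-value.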

We will also be comparing results of the employee satisfaction survey before and after the experiment to determine whether employees in the test group are more satisfied with their workplace than employees in the control group by the end of the experiment. It is important to beware of the [Hawthorne Effect](https://en.wikipedia.org/wiki/Hawthorne_effect), in which individuals may modify their behavior or answers in response to their awareness of being observed. With the test group undergoing a more intensive experience, it would not be surprising to see this reflected in the employee satisfaction survey.

We would be unable to note whether or not employees in the control group sought mental health treatment after they were shown the informational video.  For the test groups, we would ask the mental health professionals how many employees have sought their services and how many sessions were scheduled and attended.  For confidentiality, we will not be collecting information on the names of the employees and the content of those sessions.  