***Group Project | Data Science Bertelsmann and Udacity Scholarship***

***Work Environment***

Our goal is to explore a possible relation between positive behaviors related to wellness and self-care in the workplace and job satisfaction. 
For this purpose we’ve created a collaborative kernel and a public notebook on Kaggle based on the [Stack Overflow Developer Survey 2018 dataset](https://insights.stackoverflow.com/survey/2018/).

From the dataset we’ve analyzed the developers' responses to the items related to well-being and job satisfaction.

***Hypothesis***
*Is there a relation between well-being and job satisfaction among developers?*

> Of employers offering wellness programs, 67% reported increased employee satisfaction, 66% reported increased productivity, 63% reported increased financial sustainability and growth, and 50% reported decreased absenteeism.
> ***IFEBP***

> “89% of workers at companies that support well-being efforts are more likely to recommend their company as a good place to work.” 
> ***American Psychological Association***


A growing body of research analyzes the relation between well-being and job satisfaction, and some studies report that adopting behaviors that promote workers' well-being can lead to increased job satisfaction and even productivity. Starting from this, we hypothesized that there may be a relationship between the answers to the items related to well-being and those related to job satisfaction in the [Stack Overflow Developer Survey 2018 dataset](https://insights.stackoverflow.com/survey/2018/).

***Analysis***
*Our Kernel workflow*

1. *Load dataset:* Stack Overflow Developer Survey 2018 dataset on Kaggle.
2. *Cleaning data:* conversion of the answers\* into numerical values, dropping rows with missing answers and filling missing *ErgonomicDevices* answers.
3. *Aggregating data:* aggregation of the items into two “TotPoints” scores, one for job satisfaction and one for well-being.
4. *Normalizing data:* normalization of “Job Satisfaction TotPoints” and “Well-being TotPoints”.
5. *Data visualization:* visualization of the two scores on a plot.
6. *Data correlation:* application of the Spearman and Pearson correlation coefficients to the dataframe.

\*As indicators of *job satisfaction* we considered the answers to the items "**JobSatisfaction**" and "**CareerSatisfaction**", while for *well-being* we considered "**SkipMeals**", "**ErgonomicDevices**", "**Exercise**" and "**HoursOutside**".

**Conclusion**
*what we’ve reached so far and a look into the future*

From the analyses made so far, the frequency distribution of the total job satisfaction score appears asymmetric, with its peak on the right, while the relative frequencies of the total well-being score appear to follow a roughly normal curve. There isn’t a significant correlation between the two total scores. We will apply further statistical analysis techniques to see whether the results change and whether there are gender-based differences.

In the future, we would like to develop a specific survey to be administered to scholarship students, to explore a possible relation between participation in initiatives such as the "wellness week" and course satisfaction.

**Team “DevsWellbeing”**

@elisaromondia |  @Viola P | @BarbaraC | @Gianni Latorre

**TODO**

1. transform the answers to the item *ErgonomicDevices* into a numerical value, +1 for each answer (done)
2. add the *ErgonomicDevices* points to *TotWellbeing* (done)
3. normalize the *TotSatisfaction* and *TotWellbeing* points in order to compare them (done)
4. create a graph (two examples done)
5. drop NaN (done)
6. explore difference between female and male (*Gender*)
7. explore data applying further statistical analysis techniques
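
For item 6, one possible starting point is sketched below. It is not part of the current kernel: the toy dataframe, its values and the choice of the Mann-Whitney U test are assumptions made for illustration, and the real *Gender* column may contain more categories than the two used here.

```python
import pandas as pd
from scipy import stats

# Toy stand-in for the survey dataframe; all values are invented for illustration.
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female"],
    "TotSatisfaction": [4, 5, -2, 3, 6, 1, 0, 4],
    "TotWellbeing": [3, 6, 1, 4, 5, 2, 0, 7],
})

# Descriptive comparison: median score per gender group.
medians = toy.groupby("Gender")[["TotSatisfaction", "TotWellbeing"]].median()
print(medians)

# Mann-Whitney U test: a rank-based test that does not assume normally distributed scores.
male = toy.loc[toy["Gender"] == "Male", "TotWellbeing"]
female = toy.loc[toy["Gender"] == "Female", "TotWellbeing"]
u_stat, p_value = stats.mannwhitneyu(male, female, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.3f}")
```

A rank-based test is used here because the scores are sums of ordinal answers; with the real data the groups would first need to be filtered to the categories of interest.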

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import scipy.stats as stats
import pylab as pl

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))

# Any results you write to the current directory are saved as output.


**'SkipMeals'** (Never (46%), NA (27%), 1 - 2 times pe... (18%), 3 - 4 times pe... (4%), Daily or almos... (4%))
"Never" = +1, "1-2" = -1, "3-4" = -2, "Daily" = -3

**'ErgonomicDevices'** (NA (66%), Ergonomic keyb... (10%), Ergonomic desk (10%), Wrist/hand sup... (3%), Standing desk;... (3%))
**Ergonomic keyboard = +1**, **Ergonomic desk = +1**, **Wrist = +1**, **Standing = +1**

**'Exercise'** (I don't typica... (27%), NA (27%), 1 - 2 times pe... (21%), 3 - 4 times pe... (14%), Daily or almos... (10%))
"I don't" = 0, "1-2" = +1, "3-4" = +2, "Daily" = +3

**'HoursOutside'** (1 - 2 hours (28%), NA (27%), 30 - 59 minutes (24%), Less than 30 m... (11%), 3 - 4 hours (7%))
"Less" = 0, "59 minutes" = +1, "1-2" = +2, "3-4" = +3, "Over 4" = +4

>>>**TOTAL WELLNESS/SELFCARE POINTS**
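
As a worked example of the scoring rules above, the sketch below computes the total for two invented respondents. The answer labels are the full strings used in the cleaning cell of this notebook; the respondents and the device names are placeholders chosen for illustration only.

```python
import pandas as pd

# Scoring rules from the legend above, keyed by the full answer labels.
skip_meals = {"Never": 1, "1 - 2 times per week": -1,
              "3 - 4 times per week": -2, "Daily or almost every day": -3}
exercise = {"I don't typically exercise": 0, "1 - 2 times per week": 1,
            "3 - 4 times per week": 2, "Daily or almost every day": 3}
hours_outside = {"Less than 30 minutes": 0, "30 - 59 minutes": 1,
                 "1 - 2 hours": 2, "3 - 4 hours": 3, "Over 4 hours": 4}

# Two invented respondents; device names are placeholders.
toy = pd.DataFrame({
    "SkipMeals": ["Never", "3 - 4 times per week"],
    "Exercise": ["3 - 4 times per week", "I don't typically exercise"],
    "HoursOutside": ["1 - 2 hours", "Less than 30 minutes"],
    "ErgonomicDevices": ["Ergonomic desk;Standing desk", None],
})

# One point per device: devices are separated by ";", a missing answer scores 0.
erg_points = toy["ErgonomicDevices"].str.count(";").add(1).fillna(0).astype(int)

toy["TotWellbeing"] = (toy["SkipMeals"].map(skip_meals)
                       + toy["Exercise"].map(exercise)
                       + toy["HoursOutside"].map(hours_outside)
                       + erg_points)
print(toy["TotWellbeing"].tolist())  # [7, -2]
```

The first respondent scores 1 + 2 + 2 + 2 = 7, the second -2 + 0 + 0 + 0 = -2.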


**'JobSatisfaction'** (Extremely sat +3, Moderately sat +2, Slightly sat +1, Neither 0, Slightly dis -1, Moderately dis -2, Extremely dis -3)

**'CareerSatisfaction'** (Extremely sat +3, Moderately sat +2, Slightly sat +1, Neither 0, Slightly dis -1, Moderately dis -2, Extremely dis -3)

>>>**TOTAL SATISFACTION POINTS**

//

**'HoursComputer'** (9 - 12 hours (38%), NA (27%), 5 - 8 hours (22%), Over 12 hours (10%), 1 - 4 hours (2%))

In [None]:
#load dataset

x = pd.read_csv('../input/survey_results_public.csv')

In [None]:
#cleaning data

# Both satisfaction items use the same 7-point scale, centered on zero.
satisfaction_scale = {"Extremely satisfied": 3, "Moderately satisfied": 2, "Slightly satisfied": 1,
                      "Neither satisfied nor dissatisfied": 0,
                      "Slightly dissatisfied": -1, "Moderately dissatisfied": -2, "Extremely dissatisfied": -3}

cleanup_js = {"JobSatisfaction": satisfaction_scale,
              "CareerSatisfaction": satisfaction_scale,
              "Exercise": {"I don't typically exercise": 0, "1 - 2 times per week": 1, "3 - 4 times per week": 2, "Daily or almost every day": 3},
              "HoursOutside": {"Less than 30 minutes": 0, "30 - 59 minutes": 1, "1 - 2 hours": 2, "3 - 4 hours": 3, "Over 4 hours": 4},
              "SkipMeals": {"Never": 1, "1 - 2 times per week": -1, "3 - 4 times per week": -2, "Daily or almost every day": -3}}

# Map each answer to its score, column by column, so every scored column ends up numeric.
# Missing answers (and any answer not listed in the mapping) become NaN.
for column, mapping in cleanup_js.items():
    x[column] = x[column].map(mapping)

In [None]:
#drop row with NaN

x.dropna(subset=['JobSatisfaction', 'Exercise', 'SkipMeals', 'HoursOutside', 'CareerSatisfaction'], how='any', inplace=True)

In [None]:
#fillna in ErgonomicDevices

x.ErgonomicDevices = x.ErgonomicDevices.fillna("no")

In [None]:
#calculate ErgonomicDevices points: answers are separated by ';', so the number of devices is the number of ';' plus one

x['ErgPoints'] = np.where(x['ErgonomicDevices'] == 'no', 0, x['ErgonomicDevices'].str.count(';') + 1)

In [None]:
#data aggregation

x['TotSatisfaction'] = x['JobSatisfaction'] + x['CareerSatisfaction']

x['TotWellbeing'] = x['Exercise'] + x['HoursOutside'] + x['SkipMeals'] +x['ErgPoints']

In [None]:
#show the table

x.loc[1:100, x.columns.isin(list(['ErgPoints', 'TotWellbeing', 'TotSatisfaction', 'SkipMeals', 'ErgonomicDevices', 'Exercise', 'HoursOutside', 'JobSatisfaction', 'CareerSatisfaction', 'Gender']))]

In [None]:
#normalize TotSatisfaction points

# Scale by the column maximum, computed once with a vectorized division.
x['TotSatisfactionNorm'] = x['TotSatisfaction'] / x['TotSatisfaction'].max()
print(x['TotSatisfactionNorm'])

In [None]:
#normalize TotWellbeing points

# Scale by the column maximum, computed once with a vectorized division.
x['TotWellbeingNorm'] = x['TotWellbeing'] / x['TotWellbeing'].max()
print(x['TotWellbeingNorm'])
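
The two normalization cells above divide each score by the column maximum. Because both totals can be negative, the scaled values do not start at 0 (a satisfaction total of -6 becomes -1). A possible alternative, shown here only as a sketch on toy values and not as the method used in this kernel, is min-max scaling, which maps each column onto the interval [0, 1]. The helper name `min_max_scale` is hypothetical.

```python
import pandas as pd

def min_max_scale(scores: pd.Series) -> pd.Series:
    """Rescale a numeric Series linearly so that its minimum maps to 0 and its maximum to 1."""
    low, high = scores.min(), scores.max()
    if high == low:
        # Constant column: avoid dividing by zero and return all zeros.
        return pd.Series(0.0, index=scores.index)
    return (scores - low) / (high - low)

# Toy satisfaction totals covering the possible range -6..6.
toy_satisfaction = pd.Series([-6, -3, 0, 3, 6])

divided_by_max = toy_satisfaction / toy_satisfaction.max()
rescaled = min_max_scale(toy_satisfaction)

print(divided_by_max.tolist())  # [-1.0, -0.5, 0.0, 0.5, 1.0]
print(rescaled.tolist())        # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Both transformations are linear with a positive slope, so the Pearson and Spearman coefficients are the same whichever one is used; the choice only affects how the two scores line up on a shared plot axis.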

In [None]:
#show the table

x.loc[1:100, x.columns.isin(list(['TotWellbeingNorm', 'TotSatisfactionNorm', 'Gender']))].sort_values(by='TotWellbeingNorm', ascending=False)

In [None]:
#show plot

import seaborn as sns

sns.kdeplot(x['TotWellbeingNorm'])
sns.kdeplot(x['TotSatisfactionNorm'])

In [None]:
#show graph_hist

plt.hist(x['TotWellbeingNorm'], bins=50, histtype='stepfilled', density=True, color='b', label='Wellbeing')
plt.hist(x['TotSatisfactionNorm'], bins=50, histtype='stepfilled', density=True, color='r', alpha=0.5, label='Satisfaction')
plt.title("Gaussian/Uniform Histogram")
plt.xlabel("Points")
plt.ylabel("Density")
plt.legend()
plt.show()
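
The histogram gives a visual impression of how symmetric each distribution is. A numeric summary could complement it; the sketch below uses `scipy.stats.skew` on two invented samples (not the survey data) to show how the sign of the skewness relates to the position of the peak.

```python
import numpy as np
from scipy import stats

# Invented samples for illustration only.
# Peak on the right with a long tail to the left, like a left-skewed score.
left_skewed = np.array([-6, 0, 2, 4, 4, 5, 5, 6, 6, 6])
# Perfectly symmetric around its mean.
symmetric = np.array([1, 2, 3, 4, 5])

skew_left = stats.skew(left_skewed)
skew_sym = stats.skew(symmetric)

# Negative skewness: the tail extends to the left of the peak.
print(f"left-skewed sample: {skew_left:.3f}")
# Skewness close to zero: no asymmetry.
print(f"symmetric sample:   {skew_sym:.3f}")
```

Applied to the real columns, a clearly negative value for the satisfaction score and a value near zero for the well-being score would be consistent with what the plot suggests.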

In [None]:
#test spearman correlation

data = x[['TotSatisfactionNorm','TotWellbeingNorm', 'JobSatisfaction', 'CareerSatisfaction', 'HoursOutside', 'SkipMeals', 'Exercise', 'TotWellbeing', 'TotSatisfaction']]
correlation = data.corr(method='spearman')
print(correlation)

In [None]:
#test pearson correlation

data = x[['TotSatisfactionNorm','TotWellbeingNorm', 'JobSatisfaction', 'CareerSatisfaction', 'HoursOutside', 'SkipMeals', 'Exercise', 'TotWellbeing', 'TotSatisfaction']]
correlation = data.corr(method='pearson')
print(correlation)
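
`DataFrame.corr` returns only the coefficients. To judge whether a correlation is statistically significant, the functions in `scipy.stats` also return a p-value. The sketch below runs them on synthetic, independently generated scores, which are an assumption standing in for the two normalized totals.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# Synthetic, independent scores standing in for the two normalized totals.
wellbeing = rng.normal(size=500)
satisfaction = rng.normal(size=500)

# Each call returns the coefficient together with a two-sided p-value.
rho, rho_p = stats.spearmanr(wellbeing, satisfaction)
r, r_p = stats.pearsonr(wellbeing, satisfaction)

print(f"Spearman rho = {rho:.3f} (p = {rho_p:.3f})")
print(f"Pearson r    = {r:.3f} (p = {r_p:.3f})")
```

With tens of thousands of survey rows, even a very small coefficient can have a small p-value, so the size of the coefficient should be read alongside the p-value.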

**Bibliography**

[Workplace Well-being Linked to Senior Leadership Support, New Survey Finds, *American Psychological Association*](http://www.apa.org/news/press/releases/2016/06/workplace-well-being.aspx)

[Workplace Wellness Trends 2017, *IFEBP*](http://www.ifebp.org/bookstore/workplacewellness/Pages/default.aspx)

[Psychological Well-Being and Job Satisfaction as Predictors of Job Performance, *T.A. Wright, R. Cropanzano*](https://www.researchgate.net/profile/Russell_Cropanzano/publication/12655756_Psychological_well-being_and_job_satisfaction_as_predictors_of_job_performance/links/0deec533ae242ccfd3000000.pdf/)

[Investigating the Psychological Well-Being and Job Satisfaction Levels in Different Occupations, *İ.Y. İşgör, N.K. Haspolat*](https://files.eric.ed.gov/fulltext/EJ1121589.pdf)

[A New Way of Examining Job Satisfaction and Employee Well-Being: The Value of Employee Attributed Importance, *R.L. Maxwell*](http://www.eawop.org/ckeditor_assets/attachments/772/rosanna_l_maxwell_final_version.pdf?1482168576/)

[25 Fascinating Statistics About Workplace Wellness, *Elizabeth The*](https://risepeople.com/blog/fascinating-workplace-wellness-statistics/)