# **Statistical inference project**

In [None]:
library(cowplot)
library(datateachr)
library(digest)
library(infer)
library(repr)
library(taxyvr)
library(tidyverse)
library(dplyr)


**Title: Comparing life expectancy between workers and non-workers.**

**Introduction**

Long work hours are known to cause stress, and stress harms health through symptoms such as depression, anxiety, and poor sleep (Betterhealth, 2012). We are interested in finding out whether more work leads to decreased life expectancy. Our response variable will be life expectancy, summarized by its mean; our scale parameter will be the standard deviation rather than the standard error, because we have access to all the data. The dataset used for this analysis is healthy_lifestyle_city_2021.csv. It covers 44 cities, which are compared using 11 metrics. The underlying data originated from many non-profit organizations but were compiled by Lenstore, a for-profit company.

This dataset has 12 columns, but we will only need the city name, the life expectancy and the annual average hours worked. We will split the cities into 2 groups, "workers" and "non-workers". Workers are cities whose annual average work hours are above the median (50th percentile) of this dataset; non-workers are cities whose annual average work hours are at or below the median.

**Preliminary Results**

Our data can be read directly from the web. We read it in first and then clean it up.

In [None]:
link <- "https://raw.githubusercontent.com/zhong-test/data-for-class/master/healthy_lifestyle_city_2021.csv"
data <- read.csv(link)
head(data)

Unnamed: 0_level_0,City,Rank,Sunshine.hours.City.,Cost.of.a.bottle.of.water.City.,Obesity.levels.Country.,Life.expectancy.years...Country.,Pollution.Index.score...City.,Annual.avg..hours.worked,Happiness.levels.Country.,Outdoor.activities.City.,Number.of.take.out.places.City.,Cost.of.a.monthly.gym.membership.City.
Unnamed: 0_level_1,<chr>,<int>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<dbl>,<int>,<int>,<chr>
1,Amsterdam,1,1858,£1.92,20.40%,81.2,30.93,1434,7.44,422,1048,£34.90
2,Sydney,2,2636,£1.48,29.00%,82.1,26.86,1712,7.22,406,1103,£41.66
3,Vienna,3,1884,£1.94,20.10%,81.0,17.33,1501,7.29,132,1008,£25.74
4,Stockholm,4,1821,£1.72,20.60%,81.8,19.63,1452,7.35,129,598,£37.31
5,Copenhagen,5,1630,£2.19,19.70%,79.8,21.24,1380,7.64,154,523,£32.53
6,Helsinki,6,1662,£1.60,22.20%,80.4,13.08,1540,7.8,113,309,£35.23


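We will now tidy the data: we keep only life expectancy and annual hours worked, rename both columns, and drop cities whose work hours are not numeric.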

In [None]:
tidy <- data %>% 
        # keep only the two columns we need, under shorter names
        transmute(Life = `Life.expectancy.years...Country.`, 
                  # the hours column is read as text; non-numeric entries become NA
                  Work = as.numeric(`Annual.avg..hours.worked`)) %>% 
        # drop cities without a usable work-hours value
        filter(!is.na(Work)) 
head(tidy)

[1m[22m[36mℹ[39m In argument: `Work = as.numeric(Annual.avg..hours.worked)`.
[33m![39m NAs introduced by coercion”


Unnamed: 0_level_0,Life,Work
Unnamed: 0_level_1,<dbl>,<dbl>
1,81.2,1434
2,82.1,1712
3,81.0,1501
4,81.8,1452
5,79.8,1380
6,80.4,1540
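
The coercion warning above is expected: the hours column is read as text (`<chr>`), so any entry that is not a number turns into `NA` when passed to `as.numeric()`. A minimal sketch of that behaviour follows; the `"-"` placeholder is a hypothetical example of a non-numeric entry, not a value confirmed from the dataset.

```r
# Hypothetical raw values: two numeric strings and one non-numeric placeholder
hours <- c("1434", "1712", "-")

# as.numeric() returns NA for anything it cannot parse (and warns about it);
# suppressWarnings() only hides the warning, it does not change the result
parsed <- suppressWarnings(as.numeric(hours))
parsed

# Count how many entries were lost to coercion before filtering them out
sum(is.na(parsed))
```

Counting the `NA` values before calling `filter(!is.na(Work))` shows how many cities the tidy step removes.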


We will now separate the cities into 2 groups, high work and low work. We can visualize this first.
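
One way to visualize the split is a histogram of annual work hours with the median marked. The sketch below uses base R and only the six rows printed above; the `excerpt` data frame is an illustrative stand-in for `tidy`, so the median it produces is not the median of the full dataset.

```r
# Six rows copied from the head(tidy) output above (not the full dataset)
excerpt <- data.frame(
  Life = c(81.2, 82.1, 81.0, 81.8, 79.8, 80.4),
  Work = c(1434, 1712, 1501, 1452, 1380, 1540)
)

# Median of work hours within the excerpt; this is the cut-off between groups
median_work <- median(excerpt$Work)

# Histogram of work hours, with a dashed vertical line at the median
h <- hist(excerpt$Work,
          main = "Annual average hours worked",
          xlab = "Hours worked per year")
abline(v = median_work, lty = 2)
```

Cities to the right of the dashed line would fall in the high-work group, the rest in the low-work group.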

In [None]:
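# Find the median of annual work hours; it is the cut-off for separating the cities into 2 groups
# (named median_work so that it does not shadow the median() function)
median_work <- median(tidy$Work)
median_work

# Separate the cities into 2 groups: hi_work is TRUE when Work is above the median
group <- tidy %>%
    mutate(hi_work = Work > median_work)
head(group)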

Unnamed: 0_level_0,Life,Work,hi_work
Unnamed: 0_level_1,<dbl>,<dbl>,<lgl>
1,81.2,1434,False
2,82.1,1712,True
3,81.0,1501,False
4,81.8,1452,False
5,79.8,1380,False
6,80.4,1540,False


We now compare the estimated mean life expectancy of the two groups.

In [None]:
# Mean life expectancy in the high-work and low-work groups
hi_worker_life <- group %>% filter(hi_work) %>% summarise(mean = mean(Life)) %>% pull(mean)
lo_worker_life <- group %>% filter(!hi_work) %>% summarise(mean = mean(Life)) %>% pull(mean)
hi_worker_life
lo_worker_life
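
Since the introduction names the standard deviation as the scale parameter, it may help to report it next to each group mean. The sketch below is one possible way to do so in base R; it uses only the six rows printed earlier, so `excerpt`, its median, and the resulting numbers are illustrative and do not describe the full dataset.

```r
# Six rows copied from the head(tidy) output (not the full dataset)
excerpt <- data.frame(
  Life = c(81.2, 82.1, 81.0, 81.8, 79.8, 80.4),
  Work = c(1434, 1712, 1501, 1452, 1380, 1540)
)
excerpt$hi_work <- excerpt$Work > median(excerpt$Work)

# Per-group size, mean and standard deviation of life expectancy
group_n    <- tapply(excerpt$Life, excerpt$hi_work, length)
group_mean <- tapply(excerpt$Life, excerpt$hi_work, mean)
group_sd   <- tapply(excerpt$Life, excerpt$hi_work, sd)

summary_table <- data.frame(
  hi_work   = names(group_mean),
  n         = as.vector(group_n),
  mean_life = as.vector(group_mean),
  sd_life   = as.vector(group_sd)
)
summary_table
```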

The initial estimates are interesting, but the difference between the groups could be due to chance.
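
A permutation test is one common way to check how often a difference this large would appear by chance alone. The following is a minimal sketch under stated assumptions, not the project's chosen method: it uses base R, the six rows printed earlier as a stand-in for the full data, an arbitrary seed, and a hypothetical helper `diff_in_means()`.

```r
# Six rows copied from the head(tidy) output (not the full dataset)
excerpt <- data.frame(
  Life = c(81.2, 82.1, 81.0, 81.8, 79.8, 80.4),
  Work = c(1434, 1712, 1501, 1452, 1380, 1540)
)
excerpt$hi_work <- excerpt$Work > median(excerpt$Work)

# Hypothetical helper: mean life expectancy of high-work minus low-work cities
diff_in_means <- function(life, hi_work) {
  mean(life[hi_work]) - mean(life[!hi_work])
}

# Observed difference in the excerpt
obs_diff <- diff_in_means(excerpt$Life, excerpt$hi_work)

# Null distribution: shuffle the group labels so that any link between
# work hours and life expectancy is broken, then recompute the difference
set.seed(2021)  # arbitrary seed, for reproducibility only
reps <- 2000
null_diffs <- replicate(reps, diff_in_means(excerpt$Life, sample(excerpt$hi_work)))

# Two-sided p-value: share of shuffled differences at least as extreme as the
# observed one (the small tolerance guards against floating-point ties)
p_value <- mean(abs(null_diffs) >= abs(obs_diff) - 1e-9)
obs_diff
p_value
```

With only six rows the result carries no weight; the point is the procedure. The same steps could likely be expressed with the `infer` functions `specify()`, `hypothesize()`, `generate()` and `calculate()`, since the notebook already loads that package.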

**Methods & Plan**

**TODO**

Reflections:

* What do you expect to find?
We expect cities with longer work hours to have lower average life expectancy. In other words, we are testing the hypothesis that longer work hours have a negative effect on the overall health and life expectancy of the people living in these cities.
* What impact could such findings have?
Our time on Earth is limited, so if we learn that something might be shortening it, we tend to make lifestyle changes in the hope of countering it. The findings of this study could have significant impacts on public health, labor policies, and societal well-being. If our research shows that longer work hours might lead to reduced life expectancy, it would signal the need for us, as a society and as individuals, to reevaluate our approach to work-life balance and lifestyle choices. This might lead to discussions and potential policy changes regarding working hours, overtime regulations, and the importance of maintaining a healthy work-life balance. It could also raise awareness of the importance of mental and physical health, as well as work-related stress management.
* What future questions could this lead to?
Some possible future questions and areas of research could be: 
    - Studies on specific occupational health risks. Are certain industries or types of jobs more likely to contribute to decreased life expectancy due to long work hours? 
    - What possible strategies can be implemented to mitigate the negative effects of long work hours on health? This could include policies for work-hour limits, stress management programs, etc.
    - Do certain cities or countries manage to maintain longer work hours without significant negative health impacts? This would help in formulating strategies that improve productivity without harming employees' health.
    - Do longer work hours directly cause reduced life expectancy, or are there other contributing factors that should be considered, and if so, which ones?


**References**

Our data came from Lenstore, which compiled it from:

* https://www.gfmag.com/global-data/non-economic-data/best-cities-to-live
* https://ourworldindata.org/obesity
* http://happyplanetindex.org/countries
* https://en.wikipedia.org/wiki/List_of_cities_by_sunshine_duration
* https://www.numbeo.com/pollution/rankings.jsp
* https://worldhappiness.report
* https://www.numbeo.com/cost-of-living
* https://worldpopulationreview.com/country-rankings/average-work-week-by-country
* https://data.oecd.org/emp/hours-worked.htm
* https://www.tripadvisor.co.uk
    
Publications:

https://www.betterhealth.vic.gov.au/health/healthyliving/work-related-stress

https://www.ctpublic.org/news/2022-01-04/chronic-stress-can-reduce-lifespan-says-recent-yale-study