Project proposal
Requirements:

**Specific expectations for the proposal:**

Each group is expected to prepare a one-page (maximum 500 words) written proposal that identifies the dataset they plan to work on and the question they would like to answer with it for their group project. The proposal should be written in a Jupyter notebook and submitted both as an .html file (File -> Download As -> HTML) and as a reproducible .ipynb file (i.e., one that runs without any additional files).

Each proposal should include the following sections:

**Title**

**Introduction:**

Sleep is fundamental to human health and well-being, so it is important to understand which factors influence sleep patterns and how sleep in turn relates to daily habits and health outcomes.
This proposal outlines a data analysis project using the Sleep Health and Lifestyle Dataset, which records sleep measurements, lifestyle factors, cardiovascular health indicators, and sleep disorders for each person.


Using this dataset, we aim to answer *Insert research question* and to uncover findings that contribute to the broader understanding of how sleep health is connected to other aspects of people's lives.

- Provide some relevant background information on the topic so that someone unfamiliar with it will be prepared to understand the rest of your proposal
- Clearly state the question you will try to answer with your project
- Identify and describe the dataset that will be used to answer the question

**Preliminary exploratory data analysis:**
- Demonstrate that the dataset can be read from the web into R 
- Clean and wrangle your data into a tidy format
- Using only training data, summarize the data in at least one table (this is exploratory data analysis). An example of a useful table could be one that reports the number of observations in each class, the means of the predictor variables you plan to use in your analysis and how many rows have missing data. 
- Using only training data, visualize the data with at least one plot relevant to the analysis you plan to do (this is exploratory data analysis). An example of a useful visualization could be one that compares the distributions of each of the predictor variables you plan to use in your analysis.
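
The summary-table requirement above can be sketched as follows. This is a minimal illustration, not the group's actual analysis: the toy data frame and its values are made up, the choice of `Sleep.Disorder` as the class and of `Sleep.Duration` and `Stress.Level` as predictors is an assumption, and the 75/25 split proportion and the seed are arbitrary.

```r
library(dplyr)

# Toy stand-in for the cleaned data; all values are made up for illustration.
toy_sleep <- data.frame(
  Sleep.Disorder = c("None", "None", "Insomnia", "Sleep Apnea",
                     "None", "Insomnia", "Sleep Apnea", "None"),
  Sleep.Duration = c(7.8, 6.2, 5.9, 5.9, 7.5, 6.3, 6.0, NA),
  Stress.Level   = c(6, 8, 8, 8, 5, 7, 7, 6)
)

# Hold out 25% of the rows as test data; the seed makes the split reproducible.
set.seed(2023)
train_rows  <- sample(nrow(toy_sleep), size = floor(0.75 * nrow(toy_sleep)))
sleep_train <- toy_sleep[train_rows, ]
sleep_test  <- toy_sleep[-train_rows, ]

# One row per class, computed from the training data only:
# number of observations, predictor means, and rows with a missing predictor.
train_summary <- sleep_train |>
  group_by(Sleep.Disorder) |>
  summarize(
    n = n(),
    mean_sleep_duration = mean(Sleep.Duration, na.rm = TRUE),
    mean_stress_level   = mean(Stress.Level, na.rm = TRUE),
    rows_with_na        = sum(is.na(Sleep.Duration) | is.na(Stress.Level))
  )
train_summary
```

Splitting before summarizing keeps the test rows out of every exploratory number, which is what "using only training data" asks for.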

**Methods:**
- Explain how you will conduct your data analysis and which variables/columns you will use. Note: you do not need to use all variables/columns that exist in the raw data set. In fact, that's often not a good idea. For each variable, ask: is this a useful variable for prediction?
- Describe at least one way that you will visualize the results


**Expected outcomes and significance:**
- What do you expect to find?
- What impact could such findings have?
- What future questions could this lead to?

Please submit your group project proposal. Only one member of your team needs to submit. 

In [42]:
library(tidyverse)

In [43]:
# Read the local CSV; skip = 1 skips the first line of the file before the header is read
sleep_data <- read_csv("Sleep_health_and_lifestyle_dataset.csv", skip = 1)
sleep_data

Rows: 748 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): Gender, Occupation, BMI Category, Blood Pressure, Sleep Disorder
dbl (8): Person ID, Age, Sleep Duration, Quality of Sleep, Physical Activity...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
<dbl>,<chr>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<chr>
,,,,,,,,,,,,
1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
,,,,,,,,,,,,
2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
,,,,,,,,,,,,
3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
,,,,,,,,,,,,
4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
,,,,,,,,,,,,
5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea


In [44]:
# Convert the column names to syntactically valid R names (spaces become dots)
colnames(sleep_data) <- make.names(colnames(sleep_data))
sleep_data

Person.ID,Gender,Age,Occupation,Sleep.Duration,Quality.of.Sleep,Physical.Activity.Level,Stress.Level,BMI.Category,Blood.Pressure,Heart.Rate,Daily.Steps,Sleep.Disorder
<dbl>,<chr>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<chr>
,,,,,,,,,,,,
1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
,,,,,,,,,,,,
2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
,,,,,,,,,,,,
3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
,,,,,,,,,,,,
4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
,,,,,,,,,,,,
5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
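
To make the renaming step concrete, here is a small sketch of what base R's `make.names()` does to a few of the original column names. The vector `raw_names` is just a hand-typed sample of the headers shown in the output.

```r
# A sample of the original headers, typed by hand for illustration.
raw_names <- c("Person ID", "Sleep Duration", "BMI Category")

# make.names() replaces characters that are invalid in R names (here, spaces) with dots.
clean_names <- make.names(raw_names)
clean_names
# [1] "Person.ID"      "Sleep.Duration" "BMI.Category"
```

Names without spaces can be used directly in `select()` and `separate()` without backticks.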


In [45]:
# Drop the blank rows; note that na.omit() removes every row containing at least one NA
sleep_data <- na.omit(sleep_data)
head(sleep_data)

Person.ID,Gender,Age,Occupation,Sleep.Duration,Quality.of.Sleep,Physical.Activity.Level,Stress.Level,BMI.Category,Blood.Pressure,Heart.Rate,Daily.Steps,Sleep.Disorder
<dbl>,<chr>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<chr>
1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
6,Male,28,Software Engineer,5.9,4,30,8,Obese,140/90,85,3000,Insomnia
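
One thing to keep in mind about the step above: `na.omit()` removes any row with at least one missing value, not only the fully blank rows. The sketch below contrasts it with a filter that drops only rows where every column is missing. The toy data frame is made up, and the `if_all()` alternative is a suggestion rather than what the notebook does (it assumes dplyr 1.0.4 or later).

```r
library(dplyr)

# Toy data: two complete rows, two fully blank rows, one partially missing row.
toy <- data.frame(
  Person.ID  = c(1, NA, 2, NA, 3),
  Heart.Rate = c(77, NA, NA, NA, 75)
)

# na.omit() drops every row that has at least one NA.
strict <- na.omit(toy)

# Alternative: drop only the rows in which every column is NA.
blank_only <- toy |>
  filter(!if_all(everything(), is.na))

nrow(strict)      # 2
nrow(blank_only)  # 3
```

If the real data turn out to contain partially missing rows that should be kept for the missing-data count in the summary table, the second form preserves them.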


In [48]:
sleep_data <- separate(sleep_data,
    col = Blood.Pressure,
    into = c("systolic.bp", "diastolic.bp"), # names for the two new columns
    sep = "/", # split each reading at the slash
    convert = TRUE) # parse the new columns as integers
sleep_data

Person.ID,Gender,Age,Occupation,Sleep.Duration,Quality.of.Sleep,Physical.Activity.Level,Stress.Level,BMI.Category,systolic.bp,diastolic.bp,Heart.Rate,Daily.Steps,Sleep.Disorder
<dbl>,<chr>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<int>,<int>,<dbl>,<dbl>,<chr>
1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126,83,77,4200,
2,Male,28,Doctor,6.2,6,60,8,Normal,125,80,75,10000,
3,Male,28,Doctor,6.2,6,60,8,Normal,125,80,75,10000,
4,Male,28,Sales Representative,5.9,4,30,8,Obese,140,90,85,3000,Sleep Apnea
5,Male,28,Sales Representative,5.9,4,30,8,Obese,140,90,85,3000,Sleep Apnea
6,Male,28,Software Engineer,5.9,4,30,8,Obese,140,90,85,3000,Insomnia
7,Male,29,Teacher,6.3,6,40,7,Obese,140,90,82,3500,Insomnia
8,Male,29,Doctor,7.8,7,75,6,Normal,120,80,70,8000,
9,Male,29,Doctor,7.8,7,75,6,Normal,120,80,70,8000,
10,Male,29,Doctor,7.8,7,75,6,Normal,120,80,70,8000,
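
The effect of `convert = TRUE` in the split above can be checked on a small example. This is a sketch on a hand-typed toy data frame (three readings copied from the output shown), not a replacement for the cell itself.

```r
library(tidyr)

# Three blood pressure readings copied from the printed output.
toy_bp <- data.frame(
  Person.ID      = c(1, 2, 4),
  Blood.Pressure = c("126/83", "125/80", "140/90")
)

bp_split <- separate(
  toy_bp,
  col = Blood.Pressure,
  into = c("systolic.bp", "diastolic.bp"),
  sep = "/",
  convert = TRUE  # without this, both new columns would stay character
)

# Inspect the column types: both new columns should be integer.
str(bp_split)
```

Having the two pressures as integers matters later, since means and plots of these predictors need numeric columns.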


In [50]:
sleep_data_2 <- sleep_data |>
    select(-Gender, -Occupation)
head(sleep_data_2)

Person.ID,Age,Sleep.Duration,Quality.of.Sleep,Physical.Activity.Level,Stress.Level,BMI.Category,systolic.bp,diastolic.bp,Heart.Rate,Daily.Steps,Sleep.Disorder
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<int>,<int>,<dbl>,<dbl>,<chr>
1,27,6.1,6,42,6,Overweight,126,83,77,4200,
2,28,6.2,6,60,8,Normal,125,80,75,10000,
3,28,6.2,6,60,8,Normal,125,80,75,10000,
4,28,5.9,4,30,8,Obese,140,90,85,3000,Sleep Apnea
5,28,5.9,4,30,8,Obese,140,90,85,3000,Sleep Apnea
6,28,5.9,4,30,8,Obese,140,90,85,3000,Insomnia
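
With the numeric columns in place, one possible way to meet the visualization requirement is a faceted histogram that compares predictor distributions. The sketch below uses a made-up toy data frame, and the three predictors shown are an assumption; in the actual notebook only the training data would be passed in.

```r
library(ggplot2)
library(tidyr)

# Toy stand-in for the training data; values are illustrative only.
toy_sleep <- data.frame(
  Sleep.Duration = c(6.1, 6.2, 5.9, 7.8, 6.3, 7.5),
  Stress.Level   = c(6, 8, 8, 6, 7, 5),
  Heart.Rate     = c(77, 75, 85, 70, 82, 68)
)

# Reshape to long format: one row per (predictor, value) pair.
long_sleep <- pivot_longer(
  toy_sleep,
  cols = everything(),
  names_to = "predictor",
  values_to = "value"
)

# One histogram panel per predictor, each with its own axis scales.
predictor_plot <- ggplot(long_sleep, aes(x = value)) +
  geom_histogram(bins = 10) +
  facet_wrap(~ predictor, scales = "free") +
  labs(x = "Value", y = "Count")
predictor_plot
```

Free scales are used because the predictors are measured in different units, so a shared x-axis would squash most panels.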
