Project proposal
Reqs:

**Specific expectations for the proposal:**

Each group is expected to prepare a 1-page (max 500 words) written proposal that identifies the dataset they plan to work on, as well as the question they would like to answer using that dataset for their group project. The proposal should be done in a Jupyter notebook, and then submitted both as an .html file (File -> Download As -> HTML) and as an .ipynb file that is reproducible (i.e. it works and runs without any additional files).

Each proposal should include the following sections:

**Title**

**Introduction:**

Sleep is a fundamental aspect of human life and is critical to physical and mental health. However, many people suffer from sleep disorders, which involve problems with the quality, timing, and amount of sleep. The two most common sleep disorders are sleep apnea, where breathing frequently stops during sleep, and insomnia, where people have difficulty falling and staying asleep. Sleep disorders can be influenced by lifestyle and are often comorbid with chronic health disorders.

In our project, we will use the Sleep Health and Lifestyle Dataset, a collection of information on sleep, lifestyle factors, cardiovascular health, and sleep disorders. Using this dataset, we aim to answer: “How can lifestyle and physiological measures be used to determine the absence or presence of sleep disorders?”

**Preliminary exploratory data analysis:**
- Demonstrate that the dataset can be read from the web into R 
- Clean and wrangle your data into a tidy format
- Using only training data, summarize the data in at least one table (this is exploratory data analysis). An example of a useful table could be one that reports the number of observations in each class, the means of the predictor variables you plan to use in your analysis and how many rows have missing data. 
- Using only training data, visualize the data with at least one plot relevant to the analysis you plan to do (this is exploratory data analysis). An example of a useful visualization could be one that compares the distributions of each of the predictor variables you plan to use in your analysis.
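
The last two points can be sketched as follows. This is only an illustration under stated assumptions: the `toy_sleep` data frame is made up (it borrows column names from the cleaned data), the 75/25 split and the seed are arbitrary choices, and base R `sample()` stands in for whichever splitting function the group ends up using.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data using column names from the cleaned dataset;
# the values are invented for illustration only.
toy_sleep <- data.frame(
    Sleep.Duration = c(6.1, 6.2, 5.9, 5.9, 7.7, 6.5, 7.4, NA),
    Stress.Level   = c(6, 8, 8, 8, 6, 7, 5, 7),
    Sleep.Disorder = c("None", "None", "Sleep Disorder", "Sleep Disorder",
                       "None", "Sleep Disorder", "None", "Sleep Disorder")
)

# Assumed 75/25 split; fixing the seed makes the split reproducible.
set.seed(1)
train_rows <- sample(nrow(toy_sleep), size = floor(0.75 * nrow(toy_sleep)))
toy_train <- toy_sleep[train_rows, ]
toy_test  <- toy_sleep[-train_rows, ]

# Summary table built from the training rows only: class counts,
# predictor means, and the number of rows with a missing value.
train_summary <- toy_train |>
    group_by(Sleep.Disorder) |>
    summarise(
        n_obs = n(),
        mean_sleep_duration = mean(Sleep.Duration, na.rm = TRUE),
        mean_stress_level = mean(Stress.Level, na.rm = TRUE),
        rows_with_missing = sum(is.na(Sleep.Duration) | is.na(Stress.Level))
    )
train_summary

# Distribution of one predictor in the training data, split by class.
train_plot <- ggplot(toy_train, aes(x = Sleep.Duration, fill = Sleep.Disorder)) +
    geom_histogram(bins = 5, alpha = 0.6, position = "identity") +
    labs(x = "Sleep duration", y = "Count", fill = "Sleep disorder")
train_plot
```

Keeping the test rows out of both the table and the plot means the exploratory analysis does not leak information from data that will later be used to evaluate the model.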

**Methods:**
- Explain how you will conduct your data analysis and which variables/columns you will use. Note: you do not need to use all variables/columns that exist in the raw data set. In fact, that's often not a good idea. For each variable, think: is this a useful variable for prediction?
- Describe at least one way that you will visualize the results
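
One possible way to carry this out is sketched below. It assumes a K-nearest neighbours classifier (suggested by the K parameter mentioned under expected outcomes, but not named in this proposal), uses simulated stand-in data rather than the real dataset, and picks K by leave-one-out cross-validation with `knn.cv()` from the `class` package. The predictor names, means, seed, and candidate K values are all illustrative assumptions.

```r
library(class)  # recommended package included with most R installations

set.seed(2023)  # arbitrary seed so the simulated data are reproducible

# Simulated stand-in data: two numeric predictors and a two-level label.
n <- 60
label <- factor(rep(c("None", "Sleep Disorder"), each = n / 2))
sleep_duration <- c(rnorm(n / 2, mean = 7.4, sd = 0.5),
                    rnorm(n / 2, mean = 6.2, sd = 0.5))
stress_level <- c(rnorm(n / 2, mean = 5, sd = 1),
                  rnorm(n / 2, mean = 7.5, sd = 1))

# K-nearest neighbours is distance based, so predictors are standardized first.
predictors <- scale(cbind(sleep_duration, stress_level))

# Leave-one-out cross-validated accuracy for each candidate K.
k_values <- seq(1, 15, by = 2)
cv_accuracy <- sapply(k_values, function(k) {
    predicted <- knn.cv(train = predictors, cl = label, k = k)
    mean(as.character(predicted) == as.character(label))
})

results <- data.frame(k = k_values, accuracy = cv_accuracy)
results

# Smallest K among those with the highest estimated accuracy.
best_k <- results$k[which.max(results$accuracy)]
best_k

# One way to visualize the results: estimated accuracy against K.
plot(results$k, results$accuracy, type = "b",
     xlab = "Number of neighbours (K)", ylab = "Estimated accuracy")
```

Odd values of K are used here so that a two-class vote cannot tie; with the real data, the same loop would be run on the training set only.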


**Expected outcomes and significance:**
From our project, we expect to build a classification model that predicts whether a new, unlabelled observation has a sleep disorder, based on its physiological measures and lifestyle factors. We also expect to find the value of the K parameter that gives the most accurate classifier. These findings may help identify factors commonly associated with sleep disorders and show which factors are strongly or weakly linked to them.

Future questions our project could lead to are:
- What forms of interventions could be implemented to help the populations most affected by sleep disorders?
- Is there one factor that best predicts the presence of sleep disorders?
- Are there any other factors relating to lifestyle habits that also correlate with sleep disorder diagnosis?
 

In [1]:
library(tidyverse)
sleep_data <- read_csv("Sleep_health_and_lifestyle_dataset.csv", skip = 1)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.7     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Rows: 748 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): Gender, Occupation, BMI Category, Blood Pressure, Sleep Disorder
dbl (8): Person ID, Age, Sleep Duration, Quality of Sleep, Physical 

In [2]:
colnames(sleep_data) <- make.names(colnames(sleep_data))
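
To illustrate what this step does: `make.names()` turns each string into a syntactically valid R name, so spaces become dots. The names below are taken from the column specification printed above; `raw_names` and `clean_names` are illustrative variable names.

```r
# Column names as they appear in the CSV header.
raw_names <- c("Person ID", "Sleep Duration", "Age")

# Spaces are replaced with dots; names that are already valid are unchanged.
clean_names <- make.names(raw_names)
clean_names
# [1] "Person.ID"      "Sleep.Duration" "Age"
```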

In [3]:
sleep_data <- na.omit(sleep_data)
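
Because `na.omit()` drops every row that has a missing value in any column, it can be worth counting the missing values first to see how much data the step removes. A minimal sketch on a made-up data frame (`toy` and its values are hypothetical):

```r
# Hypothetical toy data with a few missing values.
toy <- data.frame(
    Age = c(27, 28, NA, 30),
    Heart.Rate = c(77, 75, 85, NA),
    Sleep.Disorder = c("None", NA, "Insomnia", "Sleep Apnea")
)

# Missing values per column.
na_per_column <- colSums(is.na(toy))
na_per_column

# Rows with at least one missing value; these are the rows na.omit() removes.
rows_with_na <- sum(!complete.cases(toy))
rows_with_na

# Only fully observed rows remain.
toy_complete <- na.omit(toy)
nrow(toy_complete)
```

If a single column accounts for most of the missing values, dropping or recoding that column may preserve more observations than dropping rows.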

In [4]:
sleep_data <- separate(sleep_data,
    col = Blood.Pressure,
    into = c("systolic.bp", "diastolic.bp"), # names of the two new columns
    sep = "/", # split values such as "126/83" at the slash
    convert = TRUE) # convert the new columns from character to integer

In [5]:
sleep_data_2 <- sleep_data |>
    select(-Gender, -Occupation)
head(sleep_data_2)

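| Person.ID | Age | Sleep.Duration | Quality.of.Sleep | Physical.Activity.Level | Stress.Level | BMI.Category | systolic.bp | diastolic.bp | Heart.Rate | Daily.Steps | Sleep.Disorder |
|---|---|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<int>` | `<int>` | `<dbl>` | `<dbl>` | `<chr>` |
| 1 | 27 | 6.1 | 6 | 42 | 6 | Overweight | 126 | 83 | 77 | 4200 | |
| 2 | 28 | 6.2 | 6 | 60 | 8 | Normal | 125 | 80 | 75 | 10000 | |
| 3 | 28 | 6.2 | 6 | 60 | 8 | Normal | 125 | 80 | 75 | 10000 | |
| 4 | 28 | 5.9 | 4 | 30 | 8 | Obese | 140 | 90 | 85 | 3000 | Sleep Apnea |
| 5 | 28 | 5.9 | 4 | 30 | 8 | Obese | 140 | 90 | 85 | 3000 | Sleep Apnea |
| 6 | 28 | 5.9 | 4 | 30 | 8 | Obese | 140 | 90 | 85 | 3000 | Insomnia |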


In [10]:
# Relabel sleep apnea cases as the combined "Sleep Disorder" class
sleep_apnea_untidy <- sleep_data_2 |>
    filter(Sleep.Disorder == "Sleep Apnea") |>
    select(-Sleep.Disorder)
sleep_apnea_tidy <- data.frame(sleep_apnea_untidy, Sleep.Disorder = "Sleep Disorder")

# Relabel insomnia cases as the same combined class
insomnia_untidy <- sleep_data_2 |>
    filter(Sleep.Disorder == "Insomnia") |>
    select(-Sleep.Disorder)
insomnia_tidy <- data.frame(insomnia_untidy, Sleep.Disorder = "Sleep Disorder")

# Keep observations without a disorder labelled "None"
none_untidy <- sleep_data_2 |>
    filter(Sleep.Disorder == "None") |>
    select(-Sleep.Disorder)
none_tidy <- data.frame(none_untidy, Sleep.Disorder = "None")

# Stack the three groups into one data frame with a two-level label
tidy_sleep_data <- rbind(sleep_apnea_tidy, insomnia_tidy, none_tidy)

tidy_sleep_data

| Person.ID | Age | Sleep.Duration | Quality.of.Sleep | Physical.Activity.Level | Stress.Level | BMI.Category | systolic.bp | diastolic.bp | Heart.Rate | Daily.Steps | Sleep.Disorder |
|---|---|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<int>` | `<int>` | `<dbl>` | `<dbl>` | `<chr>` |
| 4 | 28 | 5.9 | 4 | 30 | 8 | Obese | 140 | 90 | 85 | 3000 | Sleep Disorder |
| 5 | 28 | 5.9 | 4 | 30 | 8 | Obese | 140 | 90 | 85 | 3000 | Sleep Disorder |
| 17 | 29 | 6.5 | 5 | 40 | 7 | Normal Weight | 132 | 87 | 80 | 4000 | Sleep Disorder |
| 18 | 29 | 6.0 | 6 | 30 | 8 | Normal | 120 | 80 | 70 | 8000 | Sleep Disorder |
| 31 | 30 | 6.4 | 5 | 35 | 7 | Normal Weight | 130 | 86 | 78 | 4100 | Sleep Disorder |
| 50 | 31 | 7.7 | 7 | 75 | 6 | Normal | 120 | 80 | 70 | 8000 | Sleep Disorder |
| 81 | 34 | 5.8 | 4 | 32 | 8 | Overweight | 131 | 86 | 81 | 5200 | Sleep Disorder |
| 82 | 34 | 5.8 | 4 | 32 | 8 | Overweight | 131 | 86 | 81 | 5200 | Sleep Disorder |
| 94 | 35 | 7.4 | 7 | 60 | 5 | Obese | 135 | 88 | 84 | 3300 | Sleep Disorder |
| 104 | 36 | 6.6 | 5 | 35 | 7 | Overweight | 129 | 84 | 74 | 4800 | Sleep Disorder |
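
The same relabelling could also be written as a single recoding step. The sketch below is an alternative under assumptions, not the code used in this notebook: `toy` and `toy_binary` are made-up names and data, and unlike the `rbind()` approach this version keeps the original row order.

```r
library(dplyr)

# Hypothetical toy data with the three labels used in the filters.
toy <- data.frame(
    Person.ID = 1:5,
    Sleep.Disorder = c("None", "Sleep Apnea", "Insomnia", "None", "Sleep Apnea")
)

# Keep only the three known labels, then collapse the two disorders
# into one class and leave "None" unchanged.
toy_binary <- toy |>
    filter(Sleep.Disorder %in% c("Sleep Apnea", "Insomnia", "None")) |>
    mutate(Sleep.Disorder = if_else(
        Sleep.Disorder == "None", "None", "Sleep Disorder"
    ))

# Class counts after recoding.
count(toy_binary, Sleep.Disorder)
```

The explicit `filter()` mirrors the behaviour of the three separate filters, which silently drop any row whose label is missing or not one of the three expected values.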
