<div style="text-align: center"> <h1>Project Report</h1></div>

***

<div style="text-align: center"> <h2>Predicting Occupation Using KNN Classification</h2></div>

<h4> Introduction </h4>

A person's occupation has a significant impact on their lifestyle and health. Work-related factors such as physical demands, irregular hours, sedentary positions, and job stress can all influence overall well-being, and a healthy balance between work obligations and personal well-being is essential for a sustainable and satisfying career. This observation motivates our project.

Our work is founded on the concept that different occupational categories may show certain patterns in health and wellness parameters, allowing one to infer a person's career from variables like stress and sleep habits.

Our research aims to determine whether an individual's occupation can be predicted from health and lifestyle variables in the Sleep Health and Lifestyle Dataset, using K-nearest neighbors (KNN) classification.

The Sleep Health and Lifestyle Dataset (Tharmalingam, 2023) contains 374 rows and 13 columns: Person ID, Gender, Age, Occupation, Sleep Duration, Quality of Sleep, Physical Activity Level, Stress Level, BMI Category, Blood Pressure, Heart Rate, Daily Steps, and Sleep Disorder (None, Insomnia, or Sleep Apnea). Together these columns describe sleep patterns, lifestyle factors, cardiovascular health, and sleep disorders.

In this analysis we keep Occupation as the class label and five numeric variables as predictors: Daily Steps, Physical Activity Level, Stress Level, Quality of Sleep, and Sleep Duration.


In [4]:
# Load packages for notebook display, data wrangling, and modelling
library(repr)
library(tidyverse)
library(tidymodels)

# Read the Sleep Health and Lifestyle Dataset from the project repository
url <- "https://raw.githubusercontent.com/hmza-exe/DSCI-100-GroupProject_003-12/main/Sleep_health_and_lifestyle_dataset.csv"

sleep_health_data <- read_csv(url) |>
    # Rename every column to snake_case so names need no quoting later
    rename("person_id" = "Person ID",
           "gender" = "Gender",
           "age" = "Age",
           "occupation" = "Occupation",
           "sleep_duration" = "Sleep Duration",
           "quality_of_sleep" = "Quality of Sleep",
           "physical_activity_level" = "Physical Activity Level",
           "stress_level" = "Stress Level",
           "bmi_category" = "BMI Category",
           "blood_pressure" = "Blood Pressure",
           "heart_rate" = "Heart Rate",
           "daily_steps" = "Daily Steps",
           "sleep_disorder" = "Sleep Disorder") |>
    # Keep the class label (occupation) and the five numeric predictors
    select(occupation, daily_steps, physical_activity_level,
           stress_level, quality_of_sleep, sleep_duration)

# Preview the first and last six rows
head(sleep_health_data)
tail(sleep_health_data)

Rows: 374 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): Gender, Occupation, BMI Category, Blood Pressure, Sleep Disorder
dbl (8): Person ID, Age, Sleep Duration, Quality of Sleep, Physical Activity...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


| occupation | daily_steps | physical_activity_level | stress_level | quality_of_sleep | sleep_duration |
|---|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Software Engineer | 4200 | 42 | 6 | 6 | 6.1 |
| Doctor | 10000 | 60 | 8 | 6 | 6.2 |
| Doctor | 10000 | 60 | 8 | 6 | 6.2 |
| Sales Representative | 3000 | 30 | 8 | 4 | 5.9 |
| Sales Representative | 3000 | 30 | 8 | 4 | 5.9 |
| Software Engineer | 3000 | 30 | 8 | 4 | 5.9 |


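| occupation | daily_steps | physical_activity_level | stress_level | quality_of_sleep | sleep_duration |
|---|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Nurse | 7000 | 75 | 3 | 9 | 8.1 |
| Nurse | 7000 | 75 | 3 | 9 | 8.1 |
| Nurse | 7000 | 75 | 3 | 9 | 8.0 |
| Nurse | 7000 | 75 | 3 | 9 | 8.1 |
| Nurse | 7000 | 75 | 3 | 9 | 8.1 |
| Nurse | 7000 | 75 | 3 | 9 | 8.1 |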
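
Before modelling, it may help to check how many rows each occupation contributes and whether group averages differ, since KNN can struggle with classes that have very few observations. The sketch below uses only the six rows printed by `head()` above, retyped by hand into a hypothetical `head_rows` tibble; on the real data the same summary could be applied to `sleep_health_data` directly.

```r
library(dplyr)

# Hypothetical stand-in: the six head() rows above, retyped by hand
# (only three of the six columns are kept to keep the example short)
head_rows <- tibble::tibble(
  occupation = c("Software Engineer", "Doctor", "Doctor",
                 "Sales Representative", "Sales Representative",
                 "Software Engineer"),
  stress_level   = c(6, 8, 8, 8, 8, 8),
  sleep_duration = c(6.1, 6.2, 6.2, 5.9, 5.9, 5.9)
)

# One row per occupation: class size plus the mean of each predictor
occupation_summary <- head_rows |>
  group_by(occupation) |>
  summarise(n = n(),
            mean_stress = mean(stress_level),
            mean_sleep  = mean(sleep_duration))

occupation_summary
```

A summary like this gives a first, informal look at the hypothesis: if the per-occupation means were all nearly identical, the predictors would carry little information about the class label.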


In [6]:
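# Standardize each predictor to mean 0 and standard deviation 1 so that
# no single variable dominates the distance calculation used by KNN.
# Note: scale() returns a one-column matrix, which is why the new
# columns print with type <dbl[,1]> rather than <dbl>.
sleep_health_data_scaled <- sleep_health_data |>
    mutate(scaled_daily_steps = scale(daily_steps, center = TRUE),
           scaled_physical_activity = scale(physical_activity_level, center = TRUE),
           scaled_stress_level = scale(stress_level, center = TRUE),
           scaled_quality_of_sleep = scale(quality_of_sleep, center = TRUE),
           scaled_sleep_duration = scale(sleep_duration, center = TRUE))
sleep_health_data_scaled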

| occupation | daily_steps | physical_activity_level | stress_level | quality_of_sleep | sleep_duration | scaled_daily_steps | scaled_physical_activity | scaled_stress_level | scaled_quality_of_sleep | scaled_sleep_duration |
|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl[,1]>` | `<dbl[,1]>` | `<dbl[,1]>` | `<dbl[,1]>` | `<dbl[,1]>` |
| Software Engineer | 4200 | 42 | 6 | 6 | 6.1 | -1.6174174 | -0.82431400 | 0.3465563 | -1.0968108 | -1.2971493 |
| Doctor | 10000 | 60 | 8 | 6 | 6.2 | 1.9674419 | 0.03979093 | 1.4736175 | -1.0968108 | -1.1714669 |
| Doctor | 10000 | 60 | 8 | 6 | 6.2 | 1.9674419 | 0.03979093 | 1.4736175 | -1.0968108 | -1.1714669 |
| Sales Representative | 3000 | 30 | 8 | 4 | 5.9 | -2.3591124 | -1.40038394 | 1.4736175 | -2.7677161 | -1.5485140 |
| Sales Representative | 3000 | 30 | 8 | 4 | 5.9 | -2.3591124 | -1.40038394 | 1.4736175 | -2.7677161 | -1.5485140 |
| Software Engineer | 3000 | 30 | 8 | 4 | 5.9 | -2.3591124 | -1.40038394 | 1.4736175 | -2.7677161 | -1.5485140 |
| Teacher | 3500 | 40 | 7 | 6 | 6.3 | -2.0500728 | -0.92032565 | 0.9100869 | -1.0968108 | -1.0457846 |
| Doctor | 8000 | 75 | 6 | 7 | 7.8 | 0.7312835 | 0.75987836 | 0.3465563 | -0.2613582 | 0.8394505 |
| Doctor | 8000 | 75 | 6 | 7 | 7.8 | 0.7312835 | 0.75987836 | 0.3465563 | -0.2613582 | 0.8394505 |
| Doctor | 8000 | 75 | 6 | 7 | 7.8 | 0.7312835 | 0.75987836 | 0.3465563 | -0.2613582 | 0.8394505 |
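
The cell above standardizes each predictor using the mean and standard deviation of the full dataset. For classification, a common alternative is to split the data first and let a recipe estimate the scaling from the training set only, so that the test set does not influence preprocessing. The following is a minimal sketch of that idea, not the project's final model: `toy_data` is made-up data with two placeholder occupations, `neighbors = 5` and the 75/25 split are arbitrary choices, and the `kknn` package is assumed to be installed.

```r
library(tidymodels)  # attaches rsample, recipes, parsnip, workflows, yardstick

# Hypothetical toy data standing in for sleep_health_data:
# two made-up occupations ("A", "B") with well-separated predictors
set.seed(2023)
toy_data <- tibble(
  occupation   = factor(rep(c("A", "B"), each = 20)),
  daily_steps  = c(rnorm(20, 4000, 300), rnorm(20, 9000, 300)),
  stress_level = c(rnorm(20, 7, 0.5), rnorm(20, 4, 0.5))
)

# Split first, stratifying on the class label
data_split <- initial_split(toy_data, prop = 0.75, strata = occupation)
train_data <- training(data_split)
test_data  <- testing(data_split)

# The recipe learns centering and scaling values from the training set only
knn_recipe <- recipe(occupation ~ ., data = train_data) |>
  step_normalize(all_numeric_predictors())

# KNN classifier; the number of neighbors here is an arbitrary placeholder
knn_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = 5) |>
  set_engine("kknn") |>
  set_mode("classification")

# Bundle preprocessing and model, then fit on the training set
knn_fit <- workflow() |>
  add_recipe(knn_recipe) |>
  add_model(knn_spec) |>
  fit(data = train_data)

# Predict on the held-out rows and compute accuracy
test_predictions <- predict(knn_fit, test_data) |>
  bind_cols(test_data)

test_accuracy <- test_predictions |>
  accuracy(truth = occupation, estimate = .pred_class)

test_accuracy
```

On the real data, `occupation` would first need to be converted to a factor (for example with `as_factor()`), and the number of neighbors would normally be chosen by cross-validation rather than fixed in advance.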


<h4>Discussion</h4>

<h4>References</h4>

Tharmalingam, L. (2023, September 18). Sleep health and lifestyle dataset. Kaggle. https://www.kaggle.com/datasets/uom190346a/sleep-health-and-lifestyle-dataset 