# COGS 108 - Project Proposal

## Authors

- Jacob Lee: Conceptualization, Background research, Methodology, Analysis, Writing - original draft
- Travis Dao: Software, Visualization, Data curation, Analysis, Experimental investigation, Writing - review & editing
- Ranya Tashkandy: Project administration, Software, Visualization, Analysis, Writing - original draft
- Steven Bui: Project administration, Software, Data curation, Analysis, Writing - review & editing

## Research Question

To what extent does sleep quality predict academic performance among university students?

This project will measure sleep quality primarily through insomnia severity, using self-reported questionnaires in which students rate the severity of their insomnia. For academic performance, we plan to use self-reported measures such as assignment completion, focus, and motivation.

## Background and Prior Work

Sleep is a periodically recurring state of rest that occurs roughly once every 24 hours and lasts for several hours. It is naturally induced by the brain and is characterized by reduced brain activity and reduced responsiveness to stimuli. <a name="cite_ref-1"></a>[<sup>1</sup>](#cite_note-1) Sleep serves numerous functions for the brain, including clearing out accumulated toxins and waste products, consolidating information acquired during the day, and repairing neurons damaged by free radicals. These processes are crucial to keeping the brain functioning properly, which makes sleep a strong influence on behavior in most situations, including academic settings, where cognitive functioning plays a role in performance. It follows that sleep quality may serve as a predictor of students’ academic performance, which we plan to test in our project.

Numerous studies have examined the impact of sleep on cognitive functioning in general. For example, a study by García et al. found that 24 or more hours without sleep detrimentally affected basic cognitive processes, including attention, working memory, and executive functions such as inhibition of irrelevant sensory stimuli. However, these processes were not affected equally: attention-related processes were impacted the most, working memory was impacted moderately, and cognitive flexibility, another component of executive functioning, was not found to be impacted at all. <a name="cite_ref-2"></a>[<sup>2</sup>](#cite_note-2) The researchers quantified these effects by administering cognitive tests to participants before and after the period of sleep deprivation.

Narrowing the focus further, researchers from the University of Washington directly examined the impact of sleep on academic performance in their study “Students’ Sleep and Academic Performance”. The project investigated sleep’s impact through multiple parameters, including novel ones such as chronotype (whether one is an early bird or a night owl) and differences in sleep timing between school days and weekends, as well as more conventional parameters such as “variability of sleep onset, offset, duration, etc”. <a name="cite_ref-3"></a>[<sup>3</sup>](#cite_note-3)

Overall, prior research indicates that sleep strongly influences cognition, and through it behavior and its consequences, which makes sleep quality a promising predictor of academic performance. We aim to replicate these findings with our own project correlating sleep metrics with academic outcomes.


1. <a name="cite_note-1"></a> [^](#cite_ref-1) Kalat, J. (2018). *Biological Psychology*. Cengage Learning.  
2. <a name="cite_note-2"></a> [^](#cite_ref-2) García, A., et al. (2021). *Effects of Sleep Deprivation on Cognitive Performance*. *Frontiers in Psychology*. https://pmc.ncbi.nlm.nih.gov/articles/PMC8340886/  
3. <a name="cite_note-3"></a> [^](#cite_ref-3) University of Washington eScience Institute. (2014). *Students’ Sleep and Academic Performance*. https://escience.washington.edu/incubator-14-sleep/


## Hypothesis


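We predict a strong positive correlation between sleep quality and academic performance. Specifically, students with poor sleep quality will report lower academic performance, focus, and motivation. Because our primary sleep metric is insomnia severity, where higher scores indicate worse sleep, this corresponds to a negative correlation between insomnia severity and academic performance.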

This expectation is based on prior research showing that insufficient or poor-quality sleep impairs attention, working memory, and executive function, which are essential cognitive processes for learning and academic success.

## Data

The ideal dataset would include variables capturing sleep quality, academic performance, and relevant demographic information. Key variables would include self-reported sleep quality, possibly supplemented by objectively measured data such as average sleep duration or sleep consistency, along with self-reported academic performance and demographic details such as age, school year, and gender. To provide sufficient statistical power and represent a range of experiences, the dataset should contain at least 500 students. The data would be collected primarily through self-reported surveys in which students share their sleep habits, lifestyle factors, and academic details. Ideally, this information would be stored in a structured format such as a CSV file, either compiled from new survey responses or sourced from an existing public dataset available for research use.
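The 500-student target can be sanity-checked with a standard sample-size approximation for testing a correlation, based on the Fisher z-transformation: $n \approx \left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{\operatorname{artanh}(r)}\right)^2 + 3$. The sketch below assumes a two-sided test at $\alpha = 0.05$ with 80% power; the function name and these defaults are our own choices, not requirements of the course or the datasets.

```python
import math
from statistics import NormalDist


def required_sample_size(r, alpha=0.05, power=0.80):
    """Approximate n needed to detect a correlation of size r (two-sided test).

    Uses the Fisher z-transformation:
    n = ((z_{1-alpha/2} + z_{power}) / atanh(r))^2 + 3, rounded up.
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = standard_normal.inv_cdf(power)          # about 0.84 for 80% power
    n = ((z_alpha + z_power) / math.atanh(abs(r))) ** 2 + 3
    return math.ceil(n)


for r in (0.1, 0.13, 0.2, 0.3):
    print(f"r = {r}: about {required_sample_size(r)} students")
```

Under these assumptions, detecting r = 0.2 needs about 194 students and r = 0.1 about 783, so 500 students would be enough to detect correlations of roughly |r| ≥ 0.13.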

The first dataset is hosted on Mendeley Data (URL: https://data.mendeley.com/datasets/5mvrx4v62z/3 ). It is freely accessible and does not appear to require special permissions beyond citing the source. The dataset includes variables such as Sleep_Hours, Difficulty_Falling_Asleep, Stress_Level, Academic_Performance, Caffeine_Consumption, and Electronic_Device_Use. These variables are highly relevant because they link sleep behaviors and lifestyle factors to academic outcomes, which aligns closely with our research question.

The second dataset is available on Kaggle (URL: https://www.kaggle.com/datasets/arsalanjamal002/student-sleep-patterns ). It is also free to access (though a Kaggle account may be needed) and contains variables including Sleep_Duration, Sleep_Quality, Study_Hours, Screen_Time, Caffeine_Intake, and Physical_Activity. These variables are valuable because they allow us to explore how lifestyle behaviors such as screen time, caffeine intake, and exercise correlate with both sleep metrics and study-related activities, providing a rich basis for modeling and statistical analysis.
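Because the two datasets name overlapping measures differently (for example, `Sleep_Hours` versus `Sleep_Duration`), a first wrangling step could map both onto shared column names. The sketch below uses only the standard `csv` module; the shared names, the mapping, and the sample rows are hypothetical, and the two sources may use different scales or units for the same concept, which would still need checking against each dataset's documentation.

```python
import csv
import io

# Hypothetical mapping from each source's column names (as listed above)
# to shared names of our own choosing.
COLUMN_MAP = {
    "mendeley": {
        "Sleep_Hours": "sleep_hours",
        "Academic_Performance": "academic_performance",
        "Caffeine_Consumption": "caffeine",
    },
    "kaggle": {
        "Sleep_Duration": "sleep_hours",
        "Sleep_Quality": "sleep_quality",
        "Caffeine_Intake": "caffeine",
    },
}


def load_harmonized(fileobj, source):
    """Read a CSV and keep only mapped columns, renamed to the shared names.

    Values are left as strings; converting them (and reconciling scales)
    is a separate, dataset-specific step.
    """
    mapping = COLUMN_MAP[source]
    rows = []
    for raw in csv.DictReader(fileobj):
        row = {shared: raw[original] for original, shared in mapping.items() if original in raw}
        row["source"] = source  # remember where each row came from
        rows.append(row)
    return rows


# Tiny made-up CSV snippets standing in for the real files.
mendeley_csv = io.StringIO("Sleep_Hours,Academic_Performance,Stress_Level\n7,3,2\n")
kaggle_csv = io.StringIO("Sleep_Duration,Sleep_Quality,Study_Hours\n6.5,4,3\n")

combined = load_harmonized(mendeley_csv, "mendeley") + load_harmonized(kaggle_csv, "kaggle")
for row in combined:
    print(row)
```

In this sketch, rows from the Kaggle source carry no `academic_performance` field, since none of the Kaggle variables listed above maps onto it.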

## Ethics 

Instructions: Keep the contents of this cell. For each item on the checklist
-  put an X there if you've considered the item
-  IF THE ITEM IS RELEVANT place a short paragraph after the checklist item discussing the issue.
  
Items on this checklist are meant to provoke discussion among good-faith actors who take their ethical responsibilities seriously. Your teams will document these discussions and decisions for posterity using this section.  You don't have to solve these problems, you just have to acknowledge any potential harm no matter how unlikely.

Here is a [list of real-world examples](https://deon.drivendata.org/examples/) for each item in the checklist that you can refer to.

[![Deon badge](https://img.shields.io/badge/ethics%20checklist-deon-brightgreen.svg?style=popout-square)](http://deon.drivendata.org/)

### A. Data Collection
 - [X] **A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent?


 - [X] **A.2 Collection bias**: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those?
 - [ ] **A.3 Limit PII exposure**: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis?
 - [ ] **A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)?

### B. Data Storage
 - [ ] **B.1 Data security**: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)?
 - [ ] **B.2 Right to be forgotten**: Do we have a mechanism through which an individual can request their personal information be removed?
 - [ ] **B.3 Data retention plan**: Is there a schedule or plan to delete the data after it is no longer needed?

### C. Analysis
 - [X] **C.1 Missing perspectives**: Have we sought to address blindspots in the analysis through engagement with relevant stakeholders (e.g., checking assumptions and discussing implications with affected communities and subject matter experts)?
 - [X] **C.2 Dataset bias**: Have we examined the data for possible sources of bias and taken steps to mitigate or address these biases (e.g., stereotype perpetuation, confirmation bias, imbalanced classes, or omitted confounding variables)?
 - [X] **C.3 Honest representation**: Are our visualizations, summary statistics, and reports designed to honestly represent the underlying data?
 - [ ] **C.4 Privacy in analysis**: Have we ensured that data with PII are not used or displayed unless necessary for the analysis?
 - [X] **C.5 Auditability**: Is the process of generating the analysis well documented and reproducible if we discover issues in the future?

### D. Modeling
 - [X] **D.1 Proxy discrimination**: Have we ensured that the model does not rely on variables or proxies for variables that are unfairly discriminatory?
 - [ ] **D.2 Fairness across groups**: Have we tested model results for fairness with respect to different affected groups (e.g., tested for disparate error rates)?
 - [ ] **D.3 Metric selection**: Have we considered the effects of optimizing for our defined metrics and considered additional metrics?
 - [ ] **D.4 Explainability**: Can we explain in understandable terms a decision the model made in cases where a justification is needed?
 - [X] **D.5 Communicate limitations**: Have we communicated the shortcomings, limitations, and biases of the model to relevant stakeholders in ways that can be generally understood?

### E. Deployment
 - [ ] **E.1 Monitoring and evaluation**: Do we have a clear plan to monitor the model and its impacts after it is deployed (e.g., performance monitoring, regular audit of sample predictions, human review of high-stakes decisions, reviewing downstream impacts of errors or low-confidence decisions, testing for concept drift)?
 - [ ] **E.2 Redress**: Have we discussed with our organization a plan for response if users are harmed by the results (e.g., how does the data science team evaluate these cases and update analysis and models to prevent future harm)?
 - [ ] **E.3 Roll back**: Is there a way to turn off or roll back the model in production if necessary?
 - [ ] **E.4 Unintended use**: Have we taken steps to identify and prevent unintended uses and abuse of the model and do we have a plan to monitor these once the model is deployed?


## Team Expectations 

- Communicate early and often
   - Whenever a task is accomplished
   - Any doubts or questions, no matter how small
   - Availability, especially when you are not able to do something
  
- Collaborative environment
   - Actively contribute
   - Actively give updates
  
- Respect
   - Respectful discussion
   - Respectful of each other's time

- Accountability
   - Do your part in the project
   - Give high quality effort

## Project Timeline Proposal

| Meeting Date | Meeting Time | Completed Before Meeting | Discuss at Meeting |
|---|---|---|---|
| 10/29 | Before 11:59 PM | Do background research on the topic | Discuss ideal dataset(s) and ethics; draft project proposal |
| 11/7 | Before 11:59 PM | Edit, finalize, and submit proposal; search for datasets | Discuss wrangling and possible analytical approaches; assign group members to lead each specific part |
| 11/14 | Before 11:59 PM | Import and wrangle data; EDA | Review/edit wrangling/EDA; discuss analysis plan |
| 11/21 | Before 11:59 PM | Finalize wrangling/EDA; begin analysis | Discuss/edit analysis; complete project check-in |
| 11/29 | Before 11:59 PM | Complete analysis; draft results/conclusion/discussion | Discuss/edit full project |
| 12/6 | Before 11:59 PM | NA | Turn in final project & group project surveys |