# Data Visualisation Coursework 2
### March 2023

## Introduction

The COVID-19 pandemic needs no introduction. Beyond the millions of deaths worldwide, it has had far-reaching secondary effects. It is now considered not only a health crisis but also a social and economic one, the repercussions of which have yet to be fully explored and understood.

From early 2020 until mid-2021, most parts of the United States were subject to shelter-in-place orders issued by local governments. The effect of these orders on physical health has not been widely studied quantitatively, but the cultural and societal shift has been obvious.

The pandemic has caused disruptions in daily routines, including changes in work, school, and exercise patterns. Many people have been working from home and spending more time indoors, leading to more sedentary behavior and less physical activity. Additionally, the pandemic has caused an increase in stress and anxiety levels, which can lead to overeating and unhealthy food choices.

According to a study published in the Journal of the American Medical Association (JAMA) in March 2021 (Source: https://www.cdc.gov/nchs/products/databriefs/db360.htm), the prevalence of obesity in the United States increased from 42.4% in 2017-2018 to 44.9% in 2019-2020. The study suggests that the COVID-19 pandemic may have contributed to this increase in obesity rates.

Furthermore, the pandemic has disproportionately affected certain populations, including those with lower incomes, who are more likely to have limited access to healthy foods and safe places to exercise. These disparities can exacerbate existing health inequities related to obesity. The COVID-19 pandemic has had a significant impact on obesity rates in the United States, highlighting the importance of maintaining healthy habits, even during times of stress and uncertainty.

The goal of this project is to learn how the COVID-19 pandemic and its associated lockdowns and cultural and social shifts have affected the physical health of average Americans. Using the data we gather ourselves, together with data provided by the *Centers for Disease Control and Prevention* (CDC), we will attempt to learn how Americans as a whole have been affected by the pandemic and which groups of Americans have been most affected.

## Research topic

Research Questions
- population and sampling method

The United States currently has a population of roughly 332.4 million. For a population this large, the standard 95% confidence level and 5% margin of error give an ideal sample size of 385 individuals.
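The figure of 385 can be reproduced with a short calculation. The sketch below assumes Cochran's formula with a z-score of 1.96 (95% confidence), a 5% margin of error, and the most conservative proportion of 0.5; the text does not state which formula was used, so the function name and its parameters are illustrative only.

```python
import math


def cochran_sample_size(z=1.96, margin_of_error=0.05, proportion=0.5, population=None):
    """Estimate a survey sample size with Cochran's formula.

    Assumed defaults: z = 1.96 (95% confidence), 5% margin of error,
    and proportion = 0.5, which maximises the required sample size.
    """
    # Sample size for an effectively infinite population
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2

    # Optional finite population correction; negligible for very large populations
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)

    # Round up, since a fraction of a respondent cannot be surveyed
    return math.ceil(n0)


# US population of roughly 332.4 million, as quoted above
sample_size = cochran_sample_size(population=332_400_000)
print(sample_size)
```

With a population this large the finite population correction changes the result by far less than one respondent, so the estimate is driven almost entirely by the confidence level and the margin of error.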

- explicitly stated research question(s)

In our efforts to quantify exactly how the COVID-19 pandemic has affected the average American we attempt to answer the following more specific questions:

1. Have the pandemic and its associated lockdowns increased or decreased the average American’s daily physical activity, if at all? E.g. average minutes of physical activity per day.
2. Have the pandemic and its associated lockdowns increased or decreased the health of the average American, if at all? E.g. obesity statistics.
3. How have the pandemic and its associated lockdowns affected Americans by state?
4. How have the pandemic and its associated lockdowns affected Americans by demographic? E.g. gender, race, or socioeconomic stratum.

- scope (should be appropriate for the assignment)

domain concepts
- clearly define important terms and concepts in the study

## Data collection / Survey design

Data sources [5%]
- briefly explain your survey methodology
- briefly explain how you use external data in this project


- where/how did you find it?
- how/why was the data initially collected?
- are there any ethical or legal issues?

During the course of this project we will attempt to gather as much meaningful data as possible using the JISC survey. We will also use a secondary data source alongside our survey data to help complete the picture and allow us to draw further conclusions.

We have chosen to use data provided by a United States federal agency, the Centers for Disease Control and Prevention (CDC). The CDC has gained prominence in the US since the beginning of the pandemic for its extensive work in testing for and halting the spread of the coronavirus. However, the agency is more traditionally associated with surveying public health. Consequently, it records an abundance of public health statistics that can be used to paint a picture of the average American’s health before, during, and after the pandemic.

As with almost all government-collected public data, the CDC data is made available on its website (https://chronicdata.cdc.gov) for public use. For the purposes of this project we will combine several CDC public health data sets covering a variety of topics, including obesity rates, average physical exercise, nutrition, and public policy information.

The CDC has in the past been involved with some controversies regarding its handling of the initial stages of the COVID-19 pandemic, along with some very public clashes with then-president Trump. However, these controversies have never been associated with the quality and validity of its data. The CDC exists fundamentally as an apolitical entity and its collected data can be reasonably assumed to be free of bias and to be of good quality. 

- critically evaluate your data, is the data trustworthy, and valid for your purposes?

## Data overview and pre-processing

data types and pre-processing
- brief description of variables and data types
- describe and justify data cleaning and pre-processing (i.e. tidy data)
- handling of missing or erroneous data

data summary statistics
- number of survey responses
- number of observations in external data
- summary of demographics and key variables
- use of tables or easily understandable quantities in prose

In [2]:
# Import necessary packages for data processing and visualization

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [None]:
# Import the raw data

# TODO: load the raw survey responses and CDC data sets here (sources not yet specified)

## Analysis and results

- visualise individual variables
- visualise relationships between variables
- aim for high quality explanatory visualisation that describe or tell a story about the behaviour or phenomena under investigation
- marks will be awarded for (see rubric for more detail):
- appropriate plots for variable data types
- presentation quality
- visual communication
- methodical data visualisation process

## Conclusions

- summarise key findings
- future directions
- evaluate your process and visualisations
- things to improve and/or pointers to future research

## References

## Appendices

## Word Count

The following code will count the number of words in Markdown cells. Code cells are not included.

- `Main word count` is the number of words in the main body of the text, *excluding* references or appendices.
- `References and appendices word count` is the number of words in any references or appendices.

Only `Main word count` relates to the assignment word limit. There is no limit to the number of words that can be included in references or appendices. Please note that appendices should only be used to provide context or supporting information. *No marks will be directly awarded for material submitted in appendices*.

Important:

- Please do not modify the word count code!
- To exclude references from your word count **you must** have a cell that starts with the text `## References`. Everything below this cell will not count towards the main word count.
- If you are submitting additional material as appendices **you must** have a cell that starts with the text `## Appendices`. Everything below this cell will not count towards the main word count. If you do not have any appendices you can delete the `## Appendices` cell.
- Code comments should only be used to explain details of the implementation, not for discussing your findings. All analysis commentary **must** be written in Markdown cells. *No marks will be awarded for analysis discussion submitted as comments in code cells*.

In [1]:
%%js

// Run this cell to update your word count.

function wordcount() {
    let wordCount = 0
    let extraCount = 0
    let mainBody = true

    let cells = Jupyter.notebook.get_cells()
    cells.forEach((cell) => {
        if (cell.cell_type == 'markdown') {
            let text = cell.get_text()
            // Stop counting as main body when get to References or Appendices.
            if (text.startsWith('## References') ||
                text.startsWith('## Appendices')) {
                mainBody = false
            }
            if (text.startsWith('## Word Count')) {
                text = ''
            }
            if (text) {
                let words = text.toLowerCase().match(/\b[a-z\d]+\b/g)
                if (words) {
                    let cellCount = words.length
                    if (mainBody) {
                        wordCount += cellCount
                    } else {
                        extraCount += cellCount
                    }
                }
            }
        }
    })
    return [wordCount, extraCount]
}

let wc = wordcount()
element.append(`Main word count: ${wc[0]} (References and appendices word count: ${wc[1]})`)
