# Ethics of Health Data Science Homework

## Misuse of Health Data


**Tuskegee Syphilis Study**: this study was conducted by the U.S. Public Health Service (USPHS) with the goal of understanding the natural history of syphilis. It involved serious misuse of health data and major ethical violations. Participants joined and took part in the study under the belief that they were part of an experiment aimed at discovering a treatment for the disease, meaning their data was accessed under false pretenses.

Participants did not provide **informed consent**: their data was repeatedly collected and used without their permission, and it was used to track the natural history of syphilis rather than to find a treatment. The use of this data was therefore entirely unauthorized.

Along with the misuses of health data, other ethical issues in this study include:
- Deception
- Withholding of treatment
- Exploitation of a vulnerable population
- Violations of beneficence and nonmaleficence

> **Prevention Measures**: This situation could have been prevented if there were stronger regulations about informed consent, better measures protecting vulnerable populations, and increased oversight in institutional research.

---

## Breaches in Health Data Security


**UCLA Health Network Breach**: in 2015, the UCLA Health System disclosed that it had been the target of a cyberattack in which hackers were able to access the sensitive information of around **4.5 million individuals**, including both patients and staff members. The intrusion unfolded over several months, with suspicious activity starting in September 2014 and the breach not confirmed until May 2015.

The attack increased the risk of identity theft, insurance fraud, and financial harm for all individuals exposed and caused serious reputational damage to UCLA. 

> **Prevention Measures**: This breach could have been prevented if UCLA had implemented stronger cybersecurity measures, better database monitoring systems, and more regular software updates.

---

## Bias in Health Data


**Framingham Heart Study**: the original cohort was almost entirely white and middle-class, with very little representation of racial and ethnic minority populations. This meant that the data generated was **not representative of the broader US population**. Because the study was so influential, it reinforced a one-size-fits-all model of cardiovascular disease (CVD) that lasted for decades. This has led to inaccurate risk estimates and delayed recognition of the patterns that actually occur in populations unlike the cohort.

> **Prevention Measures**: This could have been prevented if the original researchers included a more diverse sample, over-sampled or engaged in outreach activities for underrepresented groups, and critically evaluated potential biases in their experiment.

---

## Anonymization of Health Data


**BRFSS**: the CDC's Behavioral Risk Factor Surveillance System (BRFSS) is one of the largest health surveys in the United States. Because hundreds of thousands of individuals participate in the BRFSS, anonymization is critically important.

The survey uses several of the techniques discussed, the most important being **data masking** and **generalization**:
- All identifying information of participants is removed from the publicly available datasets
- Geographic data is reported at the state or county level
- Participant characteristics are largely reported as ranges (for example, age brackets rather than exact ages)
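
The two techniques named above can be illustrated with a minimal Python sketch. This is not the BRFSS's actual processing pipeline; the records, field names, and the `age_bracket` and `deidentify` helpers are all hypothetical and exist only to show masking and generalization side by side.

```python
# Hypothetical survey records; every name and value here is invented.
records = [
    {"name": "Alex Doe", "phone": "202-555-0143", "age": 37, "zip": "20052", "state": "DC"},
    {"name": "Sam Roe", "phone": "703-555-0198", "age": 64, "zip": "22201", "state": "VA"},
]

# Fields assumed (for this sketch) to identify a person directly.
DIRECT_IDENTIFIERS = {"name", "phone"}


def age_bracket(age, width=10):
    """Generalize an exact age into a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record):
    """Return a copy with direct identifiers masked and other fields generalized."""
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            out[key] = "*" * 8  # masking: replace the value with a fixed placeholder
        elif key == "age":
            out["age_group"] = age_bracket(value)  # generalization: exact age -> range
        elif key == "zip":
            continue  # drop fine-grained geography; the coarser 'state' field is kept
        else:
            out[key] = value
    return out


public_records = [deidentify(r) for r in records]
for row in public_records:
    print(row)
```

In this sketch the masked fields keep their column names but lose their values, while age and geography stay usable for analysis at a coarser level of detail.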

---

## GWU OHR

GWU's **Office of Human Research** (OHR) supports the work of all GWU IRBs and is therefore responsible for supporting all human subjects research at the university. Its website offers comprehensive documentation on which types of research need IRB approval, how to obtain and work under IRB approval, and how to navigate problems that arise during research.

The OHR also has many resources available for students, faculty, and staff, including classroom/department visits and individual appointments.

---

## Study Questions

### Question 1
**What is the primary goal of data anonymization?**

> To prevent individuals from being identified from their data, and thereby protect them from exploitation or other harms that could follow from identification.

### Question 2
**What is the Havasupai Tribe case about?**

> Health scientists at Arizona State University (ASU) used blood samples that members of the Havasupai Tribe had provided for diabetes research to conduct genetic studies beyond the scope of the original agreement with the tribe. This was a violation of informed consent and a clear misuse of health data, as permission for the additional research was never granted.

### Question 3
**What was the largest health data breach in history?**

> In 2015, the health insurance company **Anthem Inc.** suffered a cyberattack that compromised the records of roughly **78 million individuals**, including names, birthdays, and Social Security numbers. The breach demonstrated the reality of data threats and was later attributed to state-sponsored actors.

### Question 4
**What is the pulse oximeter controversy about?**

> Studies found that pulse oximeters can **overestimate oxygen levels in Black patients** because the devices were developed using data primarily from light-skinned individuals. This bias can delay diagnosis and treatment of low oxygen levels, showing the need for more inclusive medical data and device testing.

### Question 5
**What is the role of an Institutional Review Board (IRB) in research?**

> IRBs review and monitor research involving human subjects to ensure studies are ethical and protect participants' rights, safety, and privacy.

### Question 6
**What is the difference between spreadsheet software and database software for health data management?**

> **Spreadsheets** store data in tables and are best for small datasets and basic analysis; they are usually more error-prone and harder to scale.
> 
> **Databases** manage larger and more complex datasets, usually with increased security, organization, and multi-user access, making them better suited to managing healthcare records and other large health-related datasets.
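
One concrete reason databases tend to be less error-prone is that they can enforce rules about which values a column may hold. The sketch below uses Python's built-in `sqlite3` module to show this; the `visits` table, its columns, and the blood-pressure limits are hypothetical choices made for illustration, not a recommended clinical schema.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table: the CHECK constraint rejects implausible readings.
conn.execute(
    """
    CREATE TABLE visits (
        patient_id  INTEGER NOT NULL,
        visit_date  TEXT    NOT NULL,
        systolic_bp INTEGER CHECK (systolic_bp BETWEEN 50 AND 300)
    )
    """
)

# A plausible reading is accepted.
conn.execute("INSERT INTO visits VALUES (?, ?, ?)", (1, "2024-01-15", 128))

# A data-entry typo (1280 instead of 128) violates the constraint and is refused.
try:
    conn.execute("INSERT INTO visits VALUES (?, ?, ?)", (2, "2024-01-16", 1280))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
print("typo rejected:", rejected, "| rows stored:", row_count)
```

A spreadsheet would typically accept the mistyped value unless validation rules were configured by hand for that cell range.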

### Question 7
**What is a conflict of interest in public health research?**

> A conflict of interest (COI) is a situation in which someone involved at any step of the research process has motivations, or potential motivations, that diverge from the pure goal of the research. These are typically **monetary, career, or social incentives** from which the individual could derive some personal benefit.

### Question 8
**What is informed consent in the context of health data collection?**

> Informed consent in health data collection is the process of ensuring that individuals clearly understand how their health information will be collected, used, stored, and shared before they agree to participate. It requires that participation is **voluntary** and that individuals are given enough information about risks, benefits, and privacy protections to make an informed decision.

### Question 9
**What is data masking in the context of de-identification of health data?**

> The process of **replacing or removing identifiable information** within health data with random characters or blanks to protect participant privacy.

### Question 10
**What is the principle of $k$-anonymity in the context of de-identification of health data?**

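> The idea that if records are generalized so that every combination of quasi-identifiers (such as age range and location) is shared by at least $k$ individuals, then no one can be distinguished from at least $k - 1$ others, which makes singling out any one individual much harder.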
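
A minimal sketch of how a $k$-anonymity check might look in Python is given here. The rows, the choice of `age_group` and `state` as quasi-identifiers, and the helper names `smallest_group` and `is_k_anonymous` are all assumptions made for illustration.

```python
from collections import Counter


def smallest_group(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values(), default=0)


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination appears in at least k rows."""
    return smallest_group(rows, quasi_identifiers) >= k


# Hypothetical, already-generalized records (invented for this example).
rows = [
    {"age_group": "30-39", "state": "DC", "diagnosis": "asthma"},
    {"age_group": "30-39", "state": "DC", "diagnosis": "diabetes"},
    {"age_group": "60-69", "state": "VA", "diagnosis": "asthma"},
    {"age_group": "60-69", "state": "VA", "diagnosis": "hypertension"},
    {"age_group": "60-69", "state": "VA", "diagnosis": "asthma"},
]

quasi = ["age_group", "state"]
print(smallest_group(rows, quasi))     # smallest group size in this sample
print(is_k_anonymous(rows, quasi, 2))  # every group has at least 2 members
print(is_k_anonymous(rows, quasi, 3))  # the DC group has only 2 members
```

The sensitive column (`diagnosis`) is deliberately left out of the grouping, since $k$-anonymity is defined over the quasi-identifiers only.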

### Question 11
**What is the main concern about conflicts of interest in public health research?**

> The primary concern about COIs is that they **bias research findings and decision-making**, potentially leading to inaccurate evidence and harmful policies or interventions based on said evidence. They can also reduce public trust in health research and institutions if people believe results are influenced by personal, financial, or professional gain.

### Question 12
**What is the first step in managing conflicts of interest in public health research?**

> **Transparency**: all COIs should be disclosed at the outset of the research and updated throughout the process.

### Question 13
**What is the role of independent oversight in managing conflicts of interest in public health research?**

> To search for, review, and evaluate any potential conflicts of interest among the researchers. In practice, it adds a layer of outside scrutiny so that COIs are caught and addressed rather than left to self-reporting.

### Question 14
**What is one strategy for mitigating conflicts of interest in public health research?**

> **Education**: researchers and third parties should be formally trained to identify and manage conflicts of interest.

### Question 15
**What is the potential impact of not properly managing conflicts of interest in public health research?**

> The results of the research may be biased or even fraudulent, in which case interventions developed based on said research could have **serious efficacy or safety concerns**.

AI acknowledgment: 

VS Code's GitHub Copilot extension, using the Claude Opus 4.6 model, was used to enhance the visual quality of this notebook. NOTE: None of the actual writing was changed. The following prompt was given: "without altering the actual content or organization of this notebook, increase the aestheticness of this notebook."