<a href="https://colab.research.google.com/github/frank-895/machine_learning_journey/blob/main/ethics/AI_ethics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Science Ethics

These are my notes on data science ethics, an area I consider important, especially when studying groundbreaking technologies like machine learning.

These notes are from the FastAI bonus chapter on ethics, specifically this YouTube Video: [Ethics for Data Science](https://www.youtube.com/watch?v=krIVOb23EH8).

This notebook focuses on current ethical dilemmas surrounding data science - not future ethical problems. While technology has done a lot of good for the world, it has also done a lot of harm, and there are plenty of examples of unethical behaviour in the past and present without having to look to the future.

## What is Ethics?

> Ethics is the discipline dealing with what is good and bad; a set of moral principles.

Ethics is not a fixed set of rules, and it is not the same as religion or law.
1. It is a set of well-founded standards of right and wrong, prescribing what humans ought to do.
2. It is the study and development of one's ethical standards.

3 case studies to be aware of:
1. **Feedback loops**: "when your model is controlling the next round of data you get. The data that is returned quickly becomes flawed by the software itself."

Feedback loops are common with recommendation systems, as they return what the user likes... but also what they are exposed to. It can reinforce and recommend damaging videos/articles/images etc.

2. **No reporting**: Systems can go wrong - it is important that reporting and feedback systems are in place, so that errors can be identified and addressed.

3. **Bias**: Shows up in advertising all the time. Ad systems can discriminate by race and gender.

## Why is ethics in data science important?

Data collection has played a pivotal role in genocides, including the Holocaust.

IBM supplied data tabulation machines that were used to record whether people were Jewish and whether they should be executed. These computer systems were used for genocide, and the machines required constant maintenance and an ongoing relationship between user and vendor.

This is an important reminder of how technology can be used for harm and why it is critical that ethics are considered when using technology.

## Unintended Consequences

How could my tech be used in the wrong hands? Could it be used by harassers, by authoritarian governments, or for propaganda/disinformation?

## Recourse and Accountability

Errors in databases and datasets can have a devastating impact on people's lives. Data can also be misused, for example when confidential information is abused.

## Feedback Loops and Metrics

> Reliance on metrics is a fundamental challenge for AI.

Choosing appropriate metrics is very important when building an AI model, as deep learning is extremely effective at optimising metrics. While this is the strength of deep learning, it is also a fundamental challenge, as inappropriate metrics can have a devastating impact.

**Overemphasising metrics** can lead to:
- manipulation
- gaming
- focus on short-term goals
- unexpected negative consequences

Let's look at an example from the 2000s, when the UK started focusing intensely on numbers to improve performance in the healthcare system. This project was called "*What's measured is what matters*".

One of the metrics was emergency department (ED) wait times. By **overemphasising** this metric, the following issues occurred:
1. Scheduled operations were cancelled to draft extra staff to ED.
2. Patients were required to wait in queues of ambulances.
3. Stretchers were turned into "beds" by putting them in hallways.
4. There were big discrepancies in numbers reported by hospitals vs by patients.

This is an example of **gaming** occurring due to metrics. The healthcare system did not actually improve. If anything it got worse, as the people in charge were manipulating processes to optimise a single metric.

Essay grading software had similar issues in America. The metrics for grading an essay included sentence length, vocabulary, spelling, and subject-verb agreement, because these are easy to measure. As a result, the software could not measure hard-to-quantify qualities like creativity.

As such, gibberish essays with sophisticated words scored best - an example of poorly chosen metrics not representative of what a **good** submission **actually** looks like.
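To make the essay-grading problem concrete, here is a minimal sketch of a grader that only looks at surface features. The scoring rule (`naive_essay_score`, average sentence length plus average word length) and both sample essays are invented for illustration; they are not the metrics used by any real grading product.

```python
import re

def naive_essay_score(text: str) -> float:
    """Score an essay from easy-to-measure surface features only.

    Hypothetical rule: average sentence length (in words) plus average
    word length (in letters). It never checks whether the text makes sense.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    avg_sentence_length = len(words) / len(sentences)
    avg_word_length = sum(len(w) for w in words) / len(words)
    return avg_sentence_length + avg_word_length

# A short, clear essay versus a long string of impressive-sounding nonsense.
clear_essay = (
    "Metrics are useful. They are also easy to game. "
    "We should use them with care."
)
gibberish_essay = (
    "Notwithstanding epistemological juxtaposition, quintessential paradigms "
    "obfuscate multitudinous perspicacious ramifications ubiquitously."
)

print("clear:    ", naive_essay_score(clear_essay))
print("gibberish:", naive_essay_score(gibberish_essay))
```

Under this made-up rule the gibberish essay scores higher than the clear one, because the score is only a proxy for quality and the proxy is easy to inflate.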

Goodhart's Law is an important reminder of why we should not over-rely on metrics:

> "When a measure becomes a target, it ceases to be a good measure."

A metric is just a proxy for what you care about.

## Feedback Loops

Our online environments are susceptible to feedback loops.

For example, recommendation systems use watch time as a **proxy** for how interested we are in something. This leads to conspiracy content performing well, as it encourages its viewers to keep "uncovering" more "information". This was not intended when recommendation algorithms were originally built, but it is an unintended consequence that is now widely exploited.
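The loop can be sketched with a toy simulation. Everything here is an assumption made for illustration: two items `"A"` and `"B"` with identical appeal, a recommender that gives a fixed `top_share` of impressions to whichever item currently has more watch time, and one unit of watch time per impression. Real recommendation systems are far more complex.

```python
def simulate_feedback_loop(rounds: int = 10, top_share: float = 0.8) -> list:
    """Return item A's share of total watch time after each round.

    Hypothetical setup: both items are equally appealing, but A starts with
    a tiny lead. Each round the recommender hands `top_share` of 100
    impressions to the current leader, and every impression adds one unit
    of watch time. The model's output decides the data it sees next.
    """
    watch_time = {"A": 51.0, "B": 49.0}
    history = []
    for _ in range(rounds):
        leader = max(watch_time, key=watch_time.get)
        for item in watch_time:
            share = top_share if item == leader else 1.0 - top_share
            watch_time[item] += 100 * share  # exposure becomes new "engagement"
        history.append(watch_time["A"] / sum(watch_time.values()))
    return history

shares = simulate_feedback_loop()
print([round(s, 3) for s in shares])
```

Item A's share of watch time climbs every round even though the two items are equally appealing. The metric ends up measuring what the system chose to show, not what people prefer.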

Our online environments are designed to be addictive and content creators are always trying to **game** the metrics to improve their performance. This makes choosing appropriate metrics even harder.

A good quote from James Grimmelmann:
> "These platforms are structurally at war with themselves. The same characteristics that make outrageous & offensive content unacceptable are what make it go viral in the first place."

In this way, disinformation is built into modern tech companies and into their business models.

## How is speed/hypergrowth related to data ethics?

- Super-fast growth requires automation & reliance on metrics.
- Prioritising speed above all else doesn't leave time to reflect on ethics.
- Problems happen or surface on a large scale if the company grows too quickly.

## Bias

Commercial computer vision products perform significantly better on men and on white people. This research was conducted on several large commercial products and they all showed this significant bias.

**What is the source of this problem?**
- Generally, unrepresentative datasets which were built primarily on white men. When the benchmark contains bias, this will be perpetuated on a larger scale, as the algorithm will optimise to this biased dataset.
- Black-box algorithms can be trained on many variables and are hard to analyse to check for bias.
- Generally, bias in technology is sourced from bias in real-life - but, it has the potential to amplify it, especially if algorithms are trained to optimise biased metrics or benchmarks.
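One practical way to spot this kind of bias is to report accuracy per group instead of a single overall number. The sketch below uses a hypothetical helper, `accuracy_by_group`, and made-up labels, predictions, and group names; it is not data from the research mentioned above.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} so gaps between groups are visible."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}

# Made-up evaluation data for two hypothetical groups, "a" and "b".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
per_group = accuracy_by_group(y_true, y_pred, groups)

print("overall accuracy:", overall)
print("accuracy by group:", per_group)
```

In this toy data the overall accuracy looks reasonable, while the per-group breakdown shows the model is much worse for group `"b"`. A single aggregate metric can hide exactly the disparity we care about.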

Historical bias is:
> "a fundamental, structural issue with the first step of the data generation process and can exist even given perfect sampling and feature selection."

An example of this is the [COMPAS recidivism algorithm](https://www.theatlantic.com/technology/archive/2018/01/equivant-compas-algorithm/550646/), used in the US to predict whether someone will re-offend and to decide whether they should pay bail. This algorithm was found to be not only supremely racist but also no more effective than guessing. Its use was upheld even after extensive research demonstrated its flaws.

Measurement bias is:
> when data collection methods systematically distort the true values of what is being measured.

An example of this is in the paper [Does Machine Learning Automate Moral Hazard and Error?](https://scholar.harvard.edu/files/sendhil/files/aer.p20171084.pdf) It discusses an algorithm proposed to predict a person's risk of stroke in order to improve efficiency in the ED. The researchers found that a number of seemingly irrelevant factors were among the most predictive of stroke, like "accidental injury" and "colonoscopy".

*Why is this?* The researchers hadn't measured the chance of stroke, but the chance that someone had symptoms, went to the doctor, got tests and received a diagnosis. This is influenced by **MANY** factors other than the chance of stroke, including race, class, gender, and health insurance.
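A small calculation shows how this distorts the labels. The numbers, group names, and the helper `recorded_rate` are all invented for illustration: both groups have the same true stroke rate, but differ in how likely a stroke is to end up as a recorded diagnosis.

```python
def recorded_rate(true_rate: float, p_diagnosed_given_event: float) -> float:
    """Rate that appears in the data = P(event) * P(diagnosis recorded | event)."""
    return true_rate * p_diagnosed_given_event

# Assumption: identical true risk in both hypothetical groups.
true_stroke_rate = 0.05

# Assumption: chance that a stroke leads to a recorded diagnosis,
# which depends on access to care rather than on health.
p_recorded = {"well_insured": 0.9, "under_insured": 0.5}

recorded = {
    group: recorded_rate(true_stroke_rate, p)
    for group, p in p_recorded.items()
}

for group, rate in recorded.items():
    print(f"{group}: true rate {true_stroke_rate:.3f}, recorded rate {rate:.3f}")
```

A model trained on the recorded labels would learn that the under-insured group has a lower stroke risk, when in this sketch the only difference is how often the event gets measured.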

Humans are very biased. See [these researched and peer-reviewed examples](https://www.nytimes.com/2015/01/04/upshot/the-measuring-sticks-of-racial-bias-.html) of racial bias:

>"When doctors were shown patient histories and asked to make judgments about heart disease, they were much less likely to recommend cardiac catheterization (a helpful procedure) to black patients"

> "When whites and blacks were sent to bargain for a used car, blacks were offered initial prices roughly $700 higher, and they received far smaller concessions."

**If humans are biased, why does algorithmic bias matter?**
1. **Machine learning can amplify bias** - [Bias in bios](https://arxiv.org/abs/1901.09451) showed that the gender imbalance in medicine was amplified when asking an algorithm to predict a person's job title.
2. **Algorithms are used differently than human decision makers** - people are more likely to assume algorithms are objective, algorithms are more likely to be implemented with no appeals process, algorithms are often used at scale, and algorithmic systems are cheap.
3. **Machine learning can create feedback loops**.
4. **Technology is power. And that comes with responsibility**.
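The first point, amplification, can be illustrated with a toy predictor. This is a sketch under loud assumptions: the `training` pairs are made up (they are not figures from the Bias in Bios paper), and the "model" is a hypothetical majority-vote rule that predicts the most common occupation for each gender.

```python
from collections import Counter, defaultdict

# Made-up training data: (gender, occupation) pairs with a 70/30 imbalance.
training = (
    [("woman", "physician")] * 30 + [("woman", "nurse")] * 70
    + [("man", "physician")] * 70 + [("man", "nurse")] * 30
)

def fit_majority_predictor(pairs):
    """Map each gender to its most common occupation in the training data."""
    counts = defaultdict(Counter)
    for gender, occupation in pairs:
        counts[gender][occupation] += 1
    return {g: c.most_common(1)[0][0] for g, c in counts.items()}

def share_of_women(pairs, occupation):
    """Fraction of people labelled with `occupation` who are women."""
    genders = [g for g, o in pairs if o == occupation]
    return genders.count("woman") / len(genders)

predictor = fit_majority_predictor(training)
predicted = [(gender, predictor[gender]) for gender, _ in training]

print("women among physicians (data):       ", share_of_women(training, "physician"))
print("women among physicians (predictions):", share_of_women(predicted, "physician"))
```

In the data, 30% of physicians are women. In the predictions, none are. A model that optimises accuracy on imbalanced data can turn a skew into an absolute rule, which is worse than the bias it started from.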

## What can we do as engineers?

- Vet the company you're joining for their ethics. We normally have lots of options and we can use our skill as leverage.
- While pressure from management might seem to give us some leeway for unethical behaviour, it is important to be personally accountable for our actions and the harm they may cause.
- Talk to experts **and** people directly impacted by technology. Get feedback before and after release.