In [1]:
#hide
! [ -e /content ] && pip install -Uqq fastbook
import fastbook
fastbook.setup_book()

# Data Ethics

## Chapter notes

Centrality of metrics in driving a financially important system.

Meetup’s algorithm -> an example of a company not just unthinkingly optimizing a metric but considering its impact.

Quote from Evan Estola:
> “You need to decide which feature not to use in your algorithm... the most optimal algorithm is perhaps not the best one to launch into production.”

Anticipate feedback loops, and “take positive action to break it when you see the first signs of it in your own projects.”

Ethics is complicated and context-dependent. It involves the perspectives of many stakeholders. Ethics is a muscle that you have to develop and practice. 

One very natural reaction to considering these issues is: "So what? What's that got to do with me? I'm a data scientist, not a politician. I'm not one of the senior executives at my company who make the decisions about what we do. I'm just trying to build the most predictive model I can."

No one is better placed to inform everyone involved in this chain about the capabilities, constraints, and details of your work than you are. This is a call to arms: encourage ethical behaviour at the time the work is being done.

Now, as you are collecting your data and developing your model, you are making lots of decisions. What level of aggregation will you store your data at? What loss function should you use? What validation and training sets should you use? Should you focus on simplicity of implementation, speed of inference, or accuracy of the model? How will your model handle out-of-domain data items? Can it be fine-tuned, or must it be retrained from scratch over time?

Data scientists need to be part of a cross-disciplinary team. And researchers need to work closely with the kinds of people who will end up using their research.
It's the kind of work that tends to be highly appreciated by senior executives, even if it is sometimes considered rather uncomfortable by middle management.

*The statement above was an interesting read in the context of the firing of Timnit Gebru by Google (on the same day in 2020).*

## Lecture notes

### Feedback loops

Can occur when your model controls the next round of data you get.

A key difference between academic science (and/or social science) and community practice -> an awareness that you are an actor within the system. The scientific perspective is that you are observing data, whereas in machine learning the model you build, and the product it sits within, are likely interacting with the real world. The model therefore affects what the data looks like. That is a key point for my own PhD research.
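
A toy sketch of the point above, assuming two hypothetical regions with the same true incident rate and a "model" that always directs attention to the region with the most recorded incidents. The function name, starting counts, and rate are made up for illustration and do not come from the lecture.

```python
def simulate_feedback_loop(initial_counts, true_rate, rounds):
    """Each round, look only at the region with the most recorded incidents.

    Incidents are recorded only where we look, so the data the "model"
    sees next round is shaped by its own previous decision.
    """
    counts = list(initial_counts)
    for _ in range(rounds):
        target = counts.index(max(counts))  # the model's decision
        counts[target] += true_rate         # new data exists only where we looked
    return counts


# Both regions have the same true rate; region 0 merely starts one record ahead.
recorded = simulate_feedback_loop(initial_counts=[11, 10], true_rate=5, rounds=20)
share = recorded[0] / sum(recorded)
print(recorded, round(share, 2))  # [111, 10] 0.92
```

A one-record head start ends up as roughly 92% of all recorded incidents, even though nothing about the regions differs. The data looks like evidence about the world, but it is mostly evidence about where the model chose to look.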

### Getting specific about bias

Centre for Applied Ethics – focused on immediate harms rather than ones a long time in the future (i.e. AI ethics)

See Harini Suresh and John V. Guttag, ‘A Framework for Understanding Unintended Consequences of Machine Learning’, or the blog post ‘The Problem with “Biased Data”’.

Clean, simplified, and logical abstractions that computer scientists deal with.

Kristian Lum, Elizabeth Bender and Terrence Wilkerson -> see FAccT paper ‘Translating to Computer Science’.

In the paper ‘Does Machine Learning Automate Moral Hazard and Error’, why is sinusitis found to be predictive of a stroke? -> People who use health care a lot go to the doctor both when they have sinusitis and when they have a stroke. We are not measuring stroke; we are measuring health care use.
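
The sinusitis effect can be reproduced in a toy population. This is only a sketch under a strong simplifying assumption that is mine, not the paper's: a condition is recorded only for people who use health care heavily. The field names and the eight-person population are hypothetical.

```python
from itertools import product

# One person per combination of traits, so sinusitis and stroke are
# independent of each other (and of health care use) by construction.
people = [
    {"high_use": use, "sinusitis": sinus, "stroke": stroke}
    for use, sinus, stroke in product([False, True], repeat=3)
]


def rate(rows, outcome):
    """Share of rows for which outcome(row) is true."""
    rows = list(rows)
    return sum(outcome(row) for row in rows) / len(rows)


def recorded(person, condition):
    """Assumption: a condition shows up in the data only for heavy users."""
    return person["high_use"] and person[condition]


# Ground truth: stroke rate with and without sinusitis.
true_with = rate((p for p in people if p["sinusitis"]), lambda p: p["stroke"])
true_without = rate((p for p in people if not p["sinusitis"]), lambda p: p["stroke"])

# What the records show: recorded stroke rate with and without recorded sinusitis.
rec_with = rate((p for p in people if recorded(p, "sinusitis")),
                lambda p: recorded(p, "stroke"))
rec_without = rate((p for p in people if not recorded(p, "sinusitis")),
                   lambda p: recorded(p, "stroke"))

print(true_with, true_without)          # 0.5 0.5
print(rec_with, round(rec_without, 3))  # 0.5 0.167
```

In the ground truth sinusitis tells you nothing about stroke, yet in the records a sinusitis entry triples the apparent stroke rate, because both entries are proxies for health care use.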

Mitigating measurement bias -> a subtle issue; could it be improved through system dynamics? A single black member of a jury -> even a tiny bit of diversity reduces bias.

### Difference between humans and machines

How are machines and people different, in terms of their use for making decisions?
* ML can create feedback loops
* ML can amplify bias
* Algorithms are used very differently from humans in practice
* People assume algorithms are objective or error-free
* Often used without appeals process
* Often used at scale
* Algorithmic systems are cheap

See Cathy O’Neil’s comments in her book Weapons of Math Destruction -> the privileged are processed by people, the poor are processed by algorithms.

Technology is power -> with that comes responsibility.

Domain experts are important to find when developing algorithms, for feedback. But how should they be engaged?

Questions to ask:
* Should we do this?
* What bias is in the data? All data contains bias.
* Can the code and data be audited?
* What are the error rates for different sub-groups? (Really important to investigate.)
* What is the accuracy of a simple rule-based alternative (a baseline)?
* What processes are in place to handle appeals or mistakes?
* How diverse is the team that built it?
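
Two of the questions above, error rates per sub-group and comparison with a simple rule-based baseline, are easy to check in code. A minimal sketch, assuming predictions are available as hypothetical `(sub-group, true label, prediction)` triples; the data, the function name, and the "always predict 0" baseline are invented for illustration.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """Return {group: share of records where prediction != true label}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, label, prediction in records:
        totals[group] += 1
        errors[group] += int(label != prediction)
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical (sub-group, true label, model prediction) triples.
records = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 0),
    ("B", 1, 0), ("B", 0, 0), ("B", 1, 0), ("B", 0, 1),
]

overall_error = sum(label != pred for _, label, pred in records) / len(records)
per_group_error = error_rate_by_group(records)

# Hypothetical rule-based baseline: always predict 0.
baseline_error = sum(label != 0 for _, label, _ in records) / len(records)

print(overall_error)    # 0.375
print(per_group_error)  # {'A': 0.0, 'B': 0.75}
print(baseline_error)   # 0.5
```

The overall error rate looks better than the baseline, but it hides that the model is perfect for sub-group A and worse than the trivial rule for sub-group B. A single aggregate metric would not reveal this.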

### Ethical Foundations

Platform neutrality -> no given design decisions.
Markkula Center – see consequentialist questions:
* Who will be directly and indirectly affected by the relevant technology?
* [See others at 1:35:00 in Lesson 5 of fast.ai course]

Expanding the ethical circle:
* See [Markkula Center document](https://www.scu.edu/ethics-in-technology-practice/ethical-toolkit/)
* [Diverse Voices Methodology](https://techpolicylab.uw.edu/project/diverse-voices/)

### Role of policy

See the ‘Datasheets for Datasets’ paper -> stories of regulation in three industries (car safety, electronics industry, other). See the 99% Invisible podcast for an episode on car safety.

## Questionnaire

1. Does ethics provide a list of "right answers"?
1. How can working with people of different backgrounds help when considering ethical questions?
1. What was the role of IBM in Nazi Germany? Why did the company participate as it did? Why did the workers participate?
1. What was the role of the first person jailed in the Volkswagen diesel scandal?
1. What was the problem with a database of suspected gang members maintained by California law enforcement officials?
1. Why did YouTube's recommendation algorithm recommend videos of partially clothed children to pedophiles, even though no employee at Google had programmed this feature?
1. What are the problems with the centrality of metrics?
1. Why did Meetup.com not include gender in its recommendation system for tech meetups?
1. What are the six types of bias in machine learning, according to Suresh and Guttag?
1. Give two examples of historical race bias in the US.
1. Where are most images in ImageNet from?
1. In the paper ["Does Machine Learning Automate Moral Hazard and Error"](https://scholar.harvard.edu/files/sendhil/files/aer.p20171084.pdf) why is sinusitis found to be predictive of a stroke?
1. What is representation bias?
1. How are machines and people different, in terms of their use for making decisions?
1. Is disinformation the same as "fake news"?
1. Why is disinformation through auto-generated text a particularly significant issue?
1. What are the five ethical lenses described by the Markkula Center?
1. Where is policy an appropriate tool for addressing data ethics issues?

### Further Research:

1. Read the article "What Happens When an Algorithm Cuts Your Healthcare". How could problems like this be avoided in the future?
1. Research to find out more about YouTube's recommendation system and its societal impacts. Do you think recommendation systems must always have feedback loops with negative results? What approaches could Google take to avoid them? What about the government?
1. Read the paper ["Discrimination in Online Ad Delivery"](https://arxiv.org/abs/1301.6822). Do you think Google should be considered responsible for what happened to Dr. Sweeney? What would be an appropriate response?
1. How can a cross-disciplinary team help avoid negative consequences?
1. Read the paper "Does Machine Learning Automate Moral Hazard and Error". What actions do you think should be taken to deal with the issues identified in this paper?
1. Read the article "How Will We Prevent AI-Based Forgery?" Do you think Etzioni's proposed approach could work? Why?
1. Complete the section "Analyze a Project You Are Working On" in this chapter.
1. Consider whether your team could be more diverse. If so, what approaches might help?

## Deep Learning in Practice: That's a Wrap!

Congratulations! You've made it to the end of the first section of the book. In this section we've tried to show you what deep learning can do, and how you can use it to create real applications and products. At this point, you will get a lot more out of the book if you spend some time trying out what you've learned. Perhaps you have already been doing this as you go along—in which case, great! If not, that's no problem either... Now is a great time to start experimenting yourself.

If you haven't been to the [book's website](https://book.fast.ai) yet, head over there now. It's really important that you get yourself set up to run the notebooks. Becoming an effective deep learning practitioner is all about practice, so you need to be training models. So, please go get the notebooks running now if you haven't already! And also have a look on the website for any important updates or notices; deep learning changes fast, and we can't change the words that are printed in this book, so the website is where you need to look to ensure you have the most up-to-date information.

Make sure that you have completed the following steps:

- Connect to one of the GPU Jupyter servers recommended on the book's website.
- Run the first notebook yourself.
- Upload an image that you find in the first notebook; then try a few different images of different kinds to see what happens.
- Run the second notebook, collecting your own dataset based on image search queries that you come up with.
- Think about how you can use deep learning to help you with your own projects, including what kinds of data you could use, what kinds of problems may come up, and how you might be able to mitigate these issues in practice.

In the next section of the book you will learn about how and why deep learning works, instead of just seeing how you can use it in practice. Understanding the how and why is important for both practitioners and researchers, because in this fairly new field nearly every project requires some level of customization and debugging. The better you understand the foundations of deep learning, the better your models will be. These foundations are less important for executives, product managers, and so forth (although still useful, so feel free to keep reading!), but they are critical for anybody who is actually training and deploying models themselves.