# Day 4: Pre-class Assignment

---


### <p style="text-align: right;"> &#9989; Put your name here</p>

## Introduction to Data Ethics: Data and Algorithmic Bias

The following is a series of important points to keep in mind whenever you are thinking in data science terms, regardless of whether you are focused on biology.

<img src="https://imgs.xkcd.com/comics/selection_bias.png" style="display:block; margin-left: auto; margin-right: auto; width: 30%" alt="A comic depicting a lecturer showing that everybody already knows the term sampling bias; however, the lecture happens at a statistics conference.">
<p style="font-size:0.85em; text-align: center;">Credits: <a href="https://xkcd.com/2618/" target="_blank">xkcd</a></p>

### Learning goals for today's pre-class assignment

* Identify how bias occurs in data and algorithms
* Understand the impact that data and algorithmic bias have on people
* Apply practices to look for and minimize bias in your own work and in the work of others
* Construct your personal academic integrity statement
* Practice with lists and loops

### Assignment instructions

Read this notebook, watch the video below, and complete the assigned programming problems.  Please get started early, and come to office hours if you have any questions!

**This assignment is due by 11:59 p.m. the day before class** and should be uploaded into the "Pre-Class Assignments" dropbox folder for Day 4.  Submission instructions can be found at the end of the notebook.

----

# 1. Introduction

Data and algorithms are everywhere, and we encounter them every day. Streaming services use the information you've provided about shows and movies you've watched to give more accurate recommendations. Advertisements are customized to us based on our search histories. As students dealing with data and constructing your own algorithms, you'll have even more involvement in these processes.

&#9989;&nbsp; **Question**: 

**Give an example of a time when you've interacted with data or algorithms outside of this class:**

<font size="+3">&#9998;</font> *Write your response here*

We give a lot of power to data and algorithms. Perhaps you've heard someone say, "look at the data/numbers," or "it's just fact." Data-driven and evidence-based thinking is a very important skill that can lead to well-informed, insightful decisions. However, we must also recognize the limitations that data may have.

While using data, it is important to ask ourselves:

1) **Who collected this data, and do they have a motivation to highlight a certain perspective?**
  * This is similar to watching the news: each network has its own bias and leanings. Several news channels might tell the same news story very differently. It is our job to tease out the most complete story we can. Data is not neutral.
  
2) **Who/What does this data exclude?**

  * Oftentimes, people, regions, and species that have been historically marginalized find themselves erased in data. For example, in plant science, an analysis of ~300,000 papers published between 2000 and 2020 found striking geographical biases that are correlated with national affluence. Gender imbalances were also evident, with far more papers led by authors with masculine names than by authors with feminine names. Lastly, there are substantial taxonomic sampling gaps: the vast majority of surveyed studies focused on major crop and model species, and the remaining biodiversity accounted for only a fraction of publications [(Marks *et al.*, 2023)](https://doi.org/10.1073/pnas.2217564120).
 


---

# 2. What is data bias?

<img src="https://images.prismic.io/sketchplanations/f2fdb7cb-f126-4897-ad78-4fd11c743172_SP+723+-+Sampling+bias.png" style="display:block; margin-left: auto; margin-right: auto; width: 40%" alt="A comic depicting data showing that 100% of the people who respond to surveys responded to the survey they were sent.">
<p style="font-size:0.85em; text-align: center;">Credits: <a href="https://sketchplanations.com/sampling-bias" target="_blank">sketchplanations</a></p>

Bias is defined to be "*prejudice in favor of or against one thing, person, or group compared with another, usually in a way considered to be unfair.*" Data bias occurs when parts of a dataset are overemphasized, underemphasized, or are completely nonexistent. 
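
As a rough illustration of how overemphasizing one part of a dataset can distort a result, consider the sketch below. The plant heights, group names, and variable names are all made up for this example; they are not real measurements.

```python
# Hypothetical heights (cm) for two made-up groups of plants.
greenhouse_heights = [30, 32, 31, 29]   # plants grown in a greenhouse
field_heights = [20, 22, 21, 19]        # plants grown in a field

# A balanced dataset includes every plant from both groups.
balanced_sample = greenhouse_heights + field_heights

# A biased dataset overemphasizes the greenhouse plants and
# keeps only a single field plant.
biased_sample = greenhouse_heights + [field_heights[0]]

# The average is the sum of the values divided by how many there are.
balanced_mean = sum(balanced_sample) / len(balanced_sample)
biased_mean = sum(biased_sample) / len(biased_sample)

print('Average height, balanced sample:', balanced_mean)
print('Average height, biased sample:', biased_mean)
```

The plants themselves are the same in both cases. Only the choice of which plants were included changed, yet the biased sample suggests noticeably taller plants overall.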

---

&#9989;&nbsp;**Task 1**:

**Watch the video below and answer the reflection questions that follow.**

In [None]:
# Imports the functionality that we need to display YouTube videos in a Jupyter Notebook.  
# You need to run this cell before you run ANY of the YouTube videos.

from IPython.display import YouTubeVideo  

# Video on how algorithms spread bias
YouTubeVideo("1z9KsNoAmFA",width=640,height=360)  

**Write a paragraph reflecting on the video. Be prepared to discuss this video and your reflections in class. Consider answering the questions below, but you are not limited to them:**

1. Which example of data and algorithmic bias was most impactful and surprising to you? Why?

2. If you were explaining this video to a friend who hadn't watched it, what would you tell them? What are the major takeaways?

3. Will this change how you engage with data and algorithms going forward? If so, why and how? If not, why not?

4. What is something you can do to fight against bias in algorithms, both as a user of algorithms and as someone who could help create algorithms in the future?

<font size="+3">&#9998;</font> *Write your response here*

**Data bias can lead well-intentioned algorithms to output biased results. When those results and the biased data are fed back into the algorithm, they perpetuate a cycle of bias.**
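
One way to picture this cycle is with a deliberately oversimplified toy model. In the sketch below, every number and name is hypothetical: two neighborhoods are assumed to have the same true rate of incidents, but the historical records start out slightly uneven, and the "algorithm" only collects new records wherever it already has the most. This is not how any real system is implemented; it only shows how a small initial imbalance can feed on itself.

```python
# Toy model with made-up numbers: the records start out almost equal.
records_a = 11   # recorded incidents in neighborhood A (hypothetical)
records_b = 10   # recorded incidents in neighborhood B (hypothetical)
new_records_per_visit = 5  # incidents recorded wherever the algorithm looks

for round_number in range(1, 6):
    # The "algorithm" looks wherever it already has more records...
    if records_a >= records_b:
        records_a += new_records_per_visit
    else:
        records_b += new_records_per_visit
    # ...so only that neighborhood gains new records, and the gap widens.
    print('Round', round_number, '- A:', records_a, 'B:', records_b)
```

After a few rounds, the records make neighborhood A look far worse than B, even though the only real difference was a single extra record at the start.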

&#9989;&nbsp;**Task 2**:

Choose **at least one** of the examples below of real algorithmic bias. Some of them were actually referred to in the TEDx Talk above. 

* [Amazon hiring Algorithm bias](https://www.theguardian.com/technology/2018/oct/10/amazon-hiring-ai-gender-bias-recruiting-engine)
* [Health Care risk algorithm bias](https://www.scientificamerican.com/article/racial-bias-found-in-a-major-health-care-risk-algorithm/)
* [Twitter Chatbot turning racist](https://spectrum.ieee.org/in-2016-microsofts-racist-chatbot-revealed-the-dangers-of-online-conversation)
* [Predictive Policing](https://www.latimes.com/california/story/2022-07-04/researchers-use-ai-to-predict-crime-biased-policing)
* [Health Insurance Premium Increases](https://www.propublica.org/article/health-insurers-are-vacuuming-up-details-about-you-and-it-could-raise-your-rates)
* [How Uber Profits While Its Drivers Aren't Earning Money](https://www.vice.com/en/article/wnxd84/how-uber-profits-even-while-its-drivers-arent-earning-money)

Based on what you read, try addressing **at least three** of the questions below:

* How is data being used?
* How does the actual usage of data relate to its intended usage?
* Who owns and/or controls the data?
* Who benefits from the data usage?
* How is the data usage related to bias?

<font size="+3">&#9998;</font> *Write your response here*

---

# 3. Craft Your Personal Academic Integrity Statement

After spending some time thinking about data and algorithmic bias and some of the ethical implications of that bias, it's worth spending some time thinking about how you, personally, are going to approach your development as a scientist, especially in the context of your work in this course.

As you work to develop your computational skills and learn to write ever more complex code for ever more complex experimental setups, you will likely find yourself searching the internet for help. **This is a completely authentic part of data science**. However, **it is important that you use the resources you find on the internet in transparent and honest ways**. This includes being thoughtful about how to give credit to the code authors and websites you lean on when you need to figure out something new.

Along these lines, Mizzou has enacted a [Standard of Conduct for Academic Integrity](https://oai.missouri.edu/students/). It is important for you to be aware of this standard as it acknowledges some of our shared code of ethics as members of MU. Academic integrity is the foundation for university success and future success. Learning how to express original ideas, cite works, work independently, and report results accurately and honestly are skills that carry students beyond their academic career.

---

**[Mizzou's Academic Integrity Honor Pledge](https://oai.missouri.edu/about/academic-integrity/honor-pledge/):**

*I strive to uphold the University values of respect, responsibility, discovery, and excellence. On my honor, I pledge that I have neither given nor received unauthorized assistance on this work.*

---



&#9989;&nbsp; **Task 3: Your pledge**

In the cell below, craft a personal statement of commitment to academic honesty and integrity. As part of this statement, address the following components:

1. Why is integrity important to you?
2. What values motivate the work that you do?
3. Commitment to conducting yourself with integrity
4. Acknowledgement that you are aware of [Mizzou's ethical standards for integrity](https://oai.missouri.edu/students/).

**IMPORTANT:** This personal integrity statement will be placed on all of your major homework and exams. (You will be asked to paste your statement into each assignment as an acknowledgement of your commitment to ethical behavior.)

Your personal statement may share elements with those of your peers, but it should also, ideally, be unique to you. Try to make something that has personal meaning!

---------------
<font size="+3" color="#009600">&#9998;</font> *Put your statement here*

I, `_________`, commit to `_______`

-------------


If you need a starting place, here is a sample personal statement:

**Integrity Pledge:**

*I, [name], value the opportunity to receive a collegiate education.  Because of this value and the sacrifices of people who have made this possible for me, I commit to studying to the best of my ability, submitting work that is my own, and citing sources when I receive help.  I acknowledge I am aware of the University of Missouri policy concerning academic honesty, plagiarism, and cheating.*

Additionally, there are numerous examples of such statements on the [internet](https://www.google.com/search?client=firefox-b-1-d&q=personal+academic+integrity+statement). You may find them useful for inspiration.

---

# 4. More Practice With Variables, Lists, and Loops

## (Not required, but could be useful for building your skills and providing additional preparation for class)

If you have some extra time and want some extra practice building on your new Python skills, you are encouraged to work through the following examples and exercises. **It is not required that you complete this section to get credit for this pre-class assignment.** However, **you will be writing more lists and loops in class for Day 4**, so if you feel like you need to spend some time practicing this, you may wish to do so.

### REMINDER on Generative AI Usage

To ensure that you are starting to build a strong basis in foundational concepts, please do not use Generative AI (ChatGPT, DALL-E, Claude, Copilot, etc.) at this time. We will introduce how to use these tools in support of your learning soon!

However, feel free to post in Slack, talk with your peers and instructors, and use the resources below:
* [Stack Overflow](https://stackoverflow.com/)
* [r/learnpython](https://www.reddit.com/r/learnpython/)
* [pythontutor](https://pythontutor.com/python-compiler.html#mode=edit)
* [W3Schools](https://www.w3schools.com/python/default.asp)
* [programiz](https://www.programiz.com/python-programming)

## 4.1 Variables

Review the following code for examples of how variables can be defined, used, and manipulated.

In [None]:
int_var = 3   # Integer variable
float_var = 15.75  # floating point variable
str_var = 'Truman the Tiger'  # string variable

print('1:', 'An integer plus a float works in python:',int_var+float_var)

# You cannot do math with strings, but you can concatenate strings (if you turn your variables into strings first)
new_str_var = str_var +' has won '+str(int_var)+' Best Mascot National Championships.'
print('2:',new_str_var)

# or you can just use a print statement with commas to make meaningful debugging and result statements
print('3:',str_var,'is',float_var+int_var,'times better than Big Jay.')

print('4: The value of int_var:', int_var)
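
To see why the conversion with `str()` matters, here is a small sketch of what happens without it. The variable names are illustrative only, and the `try`/`except` lines are simply a way to show the error message without stopping the cell; you are not expected to know them yet.

```python
count = 3                  # an integer
label = 'Mascot titles: '  # a string

# Adding a string and an integer directly raises a TypeError.
try:
    message = label + count
except TypeError as error:
    print('This fails:', error)

# Converting the integer with str() first makes the concatenation work.
message = label + str(count)
print(message)

# The reverse conversion, int(), turns a string of digits back into a number.
print(int('3') + count)
```

Using commas inside `print` avoids the issue entirely, because `print` converts each item to text for you.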

&#9989;&nbsp; **Task 4**

Write a print statement that concatenates all of the following strings to show the complete quote.

In [None]:
q1 = "including those at MU games, hospitals, schools, community events and campus gatherings."
q2 = 'Truman was first acclaimed the "Best Mascot in the Nation" in 2004'
q3 = "In a typical year, Truman makes more than 400 appearances, "
q4 = "and repeated the honor in 2014 and 2024."

In [None]:
# put your code here

## 4.2 Lists

A list stores a series of items in a particular order. You access items using an index, or with a `for` loop (e.g., `for val in list:`).

In [None]:
list_ex = []   # initialize an empty list
list_ex.append('Truman the Tiger')  # append an item to a list
list_ex.append('The Columns')
list_ex.append('Mizzou')
list_ex.append('Columbia')
print('Print 1:',list_ex)  # print contents of variable or whole list
list_ex.remove('Mizzou')  # remove specific entry from list, but only first entry with this value
print('Print 2:',list_ex)  # print contents of variable or list
list_ex.append('Show-Me State')
print('Print 3:',list_ex)
print('Print 4:',list_ex[3])  # print the 4th value in the list 'list_ex'
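
The comment about `remove` only deleting the first matching entry is easier to see with a list that contains repeats. The sketch below uses a made-up list for that purpose.

```python
colors = ['black', 'gold', 'black', 'gold']  # hypothetical list with repeats

colors.remove('gold')   # removes only the first 'gold' (the one at index 1)
print(colors)           # the second 'gold' is still in the list

# Removing a value that is not in the list raises a ValueError,
# so checking with "in" first avoids the error.
if 'purple' in colors:
    colors.remove('purple')
print('Number of entries left:', len(colors))
```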

**Note:** An important concept with lists is that they have values stored at specific indexes.  It is important to remember the idea of an **Index** (which is the location) and the **Value** (which is the value of the single variable at that index).  

<img src="https://railsware.com/blog/wp-content/uploads/2018/10/positive-indexes.png" style="display:block; margin-left: auto; margin-right: auto; width: 70%" alt="A diagram showing the elements of a list labeled with their positive indexes, starting from 0.">
<p style="font-size:0.85em; text-align: center;">Credits: <a href="https://railsware.com/blog/python-for-machine-learning-indexing-and-slicing-for-lists-tuples-strings-and-other-sequential-types/" target="_blank">railsware.com</a></p>

To access an element by its index we need to use square brackets.



In [None]:
# Example of Values and Indexes
index = 1
print(list_ex[index],'is the value at the', index, 'index.')
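
Because indexes start at 0, the last valid index is always one less than the length of the list. The short sketch below shows this with an illustrative list (the names are made up for the example), along with the error Python raises when an index is past the end. As before, `try`/`except` is only used here to display the error without stopping the cell.

```python
crops = ['maize', 'soybean', 'rice']  # illustrative list

last_index = len(crops) - 1       # indexes run from 0 to len(crops) - 1
print('Number of entries:', len(crops))
print('Last valid index:', last_index)
print('Value at the last index:', crops[last_index])

# Asking for an index past the end raises an IndexError.
try:
    print(crops[len(crops)])
except IndexError as error:
    print('Out of range:', error)
```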

## 4.3 Loops

So far, we have learned:
- `for` loops (repeat a block of code the number of times described in the "for" statement)
- `while` loops (repeat a block of code as long as a certain condition is true)

In [None]:
# First Loop Type 
for value1 in list_ex:   # loop through all the entries in list "list_ex"
    print('Current entry in variable value1 is:', value1)  # for each iteration, variable named "value1" 
                                                        #      will be assigned the next entry in "list_ex"

In [None]:
# Second Loop Type
for index1 in range(len(list_ex)): # loop through integers from 0 up to (but not including) the length of list "list_ex"
                                # for each iteration, variable named "index1"
                                #      will be assigned the next integer from 0 to len(list_ex) - 1
    str_now = list_ex[index1]   # assign a variable the content of the index1-th entry of list "list_ex"
    print('The',index1,'entry in list_ex is',str_now)

In [None]:
# Third Loop Type
index1 = 0
while index1 < len(list_ex):  # perform a while loop until index1 is equal to or greater than the length of list "list_ex"
    str_now = list_ex[index1]
    print('The',index1,'entry in list_ex is',str_now)
    index1 += 1               # increment whatever is in index1 by +1
    # Note: this produces the same result as the for loop in the cell above
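
A `while` loop is especially handy when you do not know ahead of time how many repetitions you will need. The sketch below is a made-up example, not tied to `list_ex`: it keeps doubling a number until it reaches at least 100 and counts how many doublings that took.

```python
value = 1
doublings = 0
while value < 100:      # keep going as long as the condition is true
    value = value * 2   # the condition must eventually become false...
    doublings += 1      # ...otherwise the loop would never stop
print('Doubled', doublings, 'times to reach', value)
```

If the line that changes `value` were left out, the condition would stay true forever and the cell would never finish running.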

&#9989;&nbsp; **Task 5**

Write a loop using one of the types above that prints the entries in list_ex in **reverse order**. There is more than one way to tackle this problem!

In [None]:
# Put your code here

&#9989;&nbsp; **Task 5 (continued)**

If you were able to successfully print the list in reverse order, describe how you came up with your solution. If not, describe where you are stuck and what you have tried so far.

<font size="+3">&#9998;</font> *Write your response here*

---
## Follow-up Questions

Copy and paste the following questions into the appropriate box in the assignment survey included below and answer them there. (Note: You'll have to fill out the section number and the assignment number and go to the "NEXT" section of the survey to paste in these questions.)

1. In your own words, how would you define algorithmic bias?

2. What is one example of something we can do as either users or creators of algorithms and data to help avoid algorithmic bias?

3. How are you feeling about your ability to work with lists and loops in Python?

---

### Assignment wrap-up

Please fill out the form at the link below. You must log in using your MU credentials. **You must completely fill this out in order to receive credit for the assignment!**

#### https://forms.office.com/r/37zmzq3PT8

In [None]:
# Click on the link above if this cell fails to produce a survey form.

from IPython.display import HTML
HTML(
"""
<iframe 
	src="https://forms.office.com/r/37zmzq3PT8" 
	width="800px" 
	height="600px" 
	frameborder="0" 
	marginheight="0" 
	marginwidth="0">
	Click the link above if this cell fails to produce a survey
</iframe>
"""
)

### Congratulations, you're done!

Submit this assignment by uploading it to the course Canvas web page.  Go to the "Pre-class assignments" folder, find the appropriate submission folder link, and upload it there.

See you in class!

Material drawn with permission from:
<br>
&#169; Copyright 2023. Department of Computational Mathematics, Science and Engineering at Michigan State University 

Adapted for:
<br>
&#169; Copyright 2026,  Division of Plant Science & Technology&mdash;University of Missouri