# AI Incidents

## Introduction

Artificial intelligence has transformed numerous industries and opened up new possibilities. Alongside these benefits, the deployment of AI systems carries risks and challenges that need to be addressed. AI incidents, ranging from algorithmic bias to unintended consequences, have become a critical concern in recent years. Exploring these incidents and understanding their implications is essential for developing robust and ethically responsible AI systems.

The focus of this project is to explore the available data, clean and wrangle it, and ultimately visualise key findings such as:

- How many AI incidents happened over time?
- Who are the developers/deployers responsible for the AI system?
- What parties are affected the most?

For the EDA and the data cleaning/preparation I will use Python; for the visualisation, Tableau Desktop.

Data Source:
McGregor, S. (2021) Preventing Repeated Real World AI Failures by Cataloging Incidents: The AI Incident Database. In Proceedings of the Thirty-Third Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-21). Virtual Conference. https://incidentdatabase.ai/research/snapshots/

## EDA 

First, I import the incidents dataset into a DataFrame and get an idea of its contents.

In [1]:
import pandas as pd

incidents = pd.read_csv("incidents.csv")

incidents.head()

Unnamed: 0,_id,incident_id,date,reports,Alleged deployer of AI system,Alleged developer of AI system,Alleged harmed or nearly harmed parties,description,title
0,ObjectId(625763db343edc875fe639ff),1,2015-05-19,"[1,2,3,4,5,6,7,8,9,10,11,12,14,15]","[""youtube""]","[""youtube""]","[""children""]",YouTube’s content filtering and recommendation...,Google’s YouTube Kids App Presents Inappropria...
1,ObjectId(625763dc343edc875fe63a00),2,2018-12-05,"[139,141,142,143,144,145,146,148,149,150,151,1...","[""amazon""]","[""amazon""]","[""warehouse-workers""]",Twenty-four Amazon workers in New Jersey were ...,Warehouse robot ruptures can of bear spray and...
2,ObjectId(625763dc343edc875fe63a01),3,2018-10-27,"[372,373,374,375,376,377,378,379,380,381,382,3...","[""boeing""]","[""boeing""]","[""airplane-passengers"",""airplane-crew""]","A Boeing 737 crashed into the sea, killing 189...",Crashes with Maneuvering Characteristics Augme...
3,ObjectId(625763dc343edc875fe63a02),4,2018-03-18,"[629,630,631,632,633,634,635,636,637,638,639,6...","[""uber""]","[""uber""]","[""elaine-herzberg"",""pedestrians""]",An Uber autonomous vehicle (AV) in autonomous ...,Uber AV Killed Pedestrian in Arizona
4,ObjectId(625763dc343edc875fe63a03),5,2015-07-13,"[767,768,769,770,771,772,773,774,775,776,777,778]","[""hospitals"",""doctors""]","[""intuitive-surgical""]","[""patients""]",Study on database reports of robotic surgery m...,Collection of Robotic Surgery Malfunctions


I will visualise the data in Tableau. I need all columns except the reports one, with a particular focus on the deployer, developer and harmed parties columns. The column names are too clunky for data operations, and the values carry extra characters ([""]).
In addition, when two or more parties are involved, the cell holds all of them as a single list-like string, which will hinder further data operations. For this reason, I want the deployer, developer and harmed parties columns to hold one value per row, and to create an additional row with the same incident id and details for each further value.
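
Before reshaping, it can help to gauge how many rows are affected. The following is a minimal sketch, not part of the original notebook: it rebuilds the five rows shown by `incidents.head()` (party columns only, already using the short column names) and assumes every cell is a valid JSON-style list string. The names `head`, `party_counts`, `multi_party_rows` and `expected_rows` are illustrative.

```python
import json

import pandas as pd

# The five rows shown by incidents.head(), reduced to the three party columns.
head = pd.DataFrame({
    "deployer": ['["youtube"]', '["amazon"]', '["boeing"]', '["uber"]',
                 '["hospitals","doctors"]'],
    "developer": ['["youtube"]', '["amazon"]', '["boeing"]', '["uber"]',
                  '["intuitive-surgical"]'],
    "harmed_parties": ['["children"]', '["warehouse-workers"]',
                       '["airplane-passengers","airplane-crew"]',
                       '["elaine-herzberg","pedestrians"]', '["patients"]'],
})

# Number of parties listed in each cell (assumes each cell parses as a JSON list).
party_counts = head.apply(lambda col: col.map(lambda cell: len(json.loads(cell))))

# Per column: how many rows list more than one party.
multi_party_rows = (party_counts > 1).sum()

# Rows needed once every deployer/developer/harmed-party combination gets its own row.
expected_rows = int(party_counts.prod(axis=1).sum())

print(multi_party_rows)
print("Rows after expansion:", expected_rows)
```

Comparing `expected_rows` with the length of the cleaned DataFrame is a cheap sanity check on the expansion step.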


## Cleaning and Exporting


In [30]:
# Rename the columns in place with rename() (inplace=True modifies the incidents DF directly)
incidents.rename(columns={"Alleged deployer of AI system": "deployer",
                          "Alleged developer of AI system": "developer",
                          "Alleged harmed or nearly harmed parties": "harmed_parties"}, inplace=True)

# Separate the multiple values in the columns

# Create a list to collect the expanded rows
new_rows = []

# Iterate over each row in the original incidents DF
for _, row in incidents.iterrows():
    deployers = row['deployer'].split(',')  # Split deployer values
    developers = row['developer'].split(',')  # Split developer values
    harmed_parties = row['harmed_parties'].split(',')  # Split harmed_parties values
    
    # Create new rows for each combination of deployer, developer, and harmed_party values
    for deployer in deployers:
        for developer in developers:
            for harmed_party in harmed_parties:
                new_row = row.copy()  # Create a copy of the original row
                new_row['deployer'] = deployer.strip()  # Update deployer value
                new_row['developer'] = developer.strip()  # Update developer value
                new_row['harmed_parties'] = harmed_party.strip()  # Update harmed_parties value
                new_rows.append(new_row)  # Append the new row to the list

# Create the new DF with modified rows
cleaned_incidents = pd.DataFrame(new_rows)

# Get rid of the [""] in the values themselves
# (raw strings keep the backslashes from being read as invalid string escapes)

cleaned_incidents["deployer"] = cleaned_incidents["deployer"].str.replace(r'[\[\]"]', "", regex=True)
cleaned_incidents["developer"] = cleaned_incidents["developer"].str.replace(r'[\[\]"]', "", regex=True)
cleaned_incidents["harmed_parties"] = cleaned_incidents["harmed_parties"].str.replace(r'[\[\]"]', "", regex=True)


# Save it as .csv so I can feed Tableau with it

#cleaned_incidents.to_csv("cleaned_incidents.csv", index=False)

cleaned_incidents



Unnamed: 0,_id,incident_id,date,reports,deployer,developer,harmed_parties,description,title
0,ObjectId(625763db343edc875fe639ff),1,2015-05-19,"[1,2,3,4,5,6,7,8,9,10,11,12,14,15]",youtube,youtube,children,YouTube’s content filtering and recommendation...,Google’s YouTube Kids App Presents Inappropria...
1,ObjectId(625763dc343edc875fe63a00),2,2018-12-05,"[139,141,142,143,144,145,146,148,149,150,151,1...",amazon,amazon,warehouse-workers,Twenty-four Amazon workers in New Jersey were ...,Warehouse robot ruptures can of bear spray and...
2,ObjectId(625763dc343edc875fe63a01),3,2018-10-27,"[372,373,374,375,376,377,378,379,380,381,382,3...",boeing,boeing,airplane-passengers,"A Boeing 737 crashed into the sea, killing 189...",Crashes with Maneuvering Characteristics Augme...
2,ObjectId(625763dc343edc875fe63a01),3,2018-10-27,"[372,373,374,375,376,377,378,379,380,381,382,3...",boeing,boeing,airplane-crew,"A Boeing 737 crashed into the sea, killing 189...",Crashes with Maneuvering Characteristics Augme...
3,ObjectId(625763dc343edc875fe63a02),4,2018-03-18,"[629,630,631,632,633,634,635,636,637,638,639,6...",uber,uber,elaine-herzberg,An Uber autonomous vehicle (AV) in autonomous ...,Uber AV Killed Pedestrian in Arizona
...,...,...,...,...,...,...,...,...,...
526,ObjectId(646b2692c8905efc00635a19),537,2023-01-20,"[2991,2992]",scammers,unknown,destefanos-family,A mother in Arizona received a ransom call fro...,Mother in Arizona Received Fake Ransom Call Fe...
527,ObjectId(646b2e36c8905efc0066a5e9),538,2023-05-15,"[2993,2994,2995,2996,2997]",openai,openai,texas-aandm-university-students,A Texas A&M-Commerce professor informed his cl...,Texas A&M Professor Misused ChatGPT to Detect ...
528,ObjectId(646b42470cd415467ed122fb),539,2023-03-11,[3000],snapchat,snapchat,minors,Snapchat's ChatGPT-powered My AI was reported ...,Snapchat's My AI Reported for Lacking Protecti...
528,ObjectId(646b42470cd415467ed122fb),539,2023-03-11,[3000],snapchat,openai,minors,Snapchat's ChatGPT-powered My AI was reported ...,Snapchat's My AI Reported for Lacking Protecti...
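
The nested loops above can also be expressed with pandas' built-in `DataFrame.explode`. This is an alternative sketch rather than the method used in this notebook. It assumes each party cell is a valid JSON-style list string, and it uses two rows copied from the `head()` output as a stand-in for the full file; `explode_parties` is a hypothetical helper name.

```python
import json

import pandas as pd

# Two rows copied from the head() output, with the renamed columns.
sample = pd.DataFrame({
    "incident_id": [3, 5],
    "deployer": ['["boeing"]', '["hospitals","doctors"]'],
    "developer": ['["boeing"]', '["intuitive-surgical"]'],
    "harmed_parties": ['["airplane-passengers","airplane-crew"]', '["patients"]'],
})

party_cols = ["deployer", "developer", "harmed_parties"]


def explode_parties(df, cols):
    """Return one row per combination of the values listed in `cols`."""
    out = df.copy()
    # Parse the list-like strings into real Python lists first.
    for col in cols:
        out[col] = out[col].apply(json.loads)
    # Exploding one column after another yields every combination,
    # the same result the nested loops produce.
    for col in cols:
        out = out.explode(col)
    return out.reset_index(drop=True)


exploded = explode_parties(sample, party_cols)
print(exploded)
```

Because the lists are parsed rather than split on commas, no separate step is needed to strip the brackets and quotes. Unlike the loop version, this sketch resets the index instead of keeping the original row number.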


I noticed that Facebook, Meta and Instagram are reported separately, so I decided to group them under one umbrella label, since they belong to the same company.

In [31]:
# How many occurrences are there?

#print(cleaned_incidents["developer"].unique())
dev_counts = cleaned_incidents["developer"].value_counts(normalize=False).to_dict()

print("Occurrences of 'meta' in developers: " + str(dev_counts.get('meta',0)))
print("Occurrences of 'facebook' in developers: " + str(dev_counts.get('facebook',0)))
print("Occurrences of 'instagram' in developers: " + str(dev_counts.get('instagram',0)))

#cleaned_incidents.head(50)
# Facebook acquired Instagram in 2012
# Facebook was renamed Meta in October 2021

Occurrences of 'meta' in developers: 40
Occurrences of 'facebook' in developers: 194
Occurrences of 'instagram' in developers: 61


In [32]:
# Map the three names onto one shared label
cleaned_incidents['developer'] = cleaned_incidents['developer'].replace({
    'meta': 'meta-fb-instagram',
    'facebook': 'meta-fb-instagram',
    'instagram': 'meta-fb-instagram',
})

print(cleaned_incidents["developer"].value_counts())


meta-fb-instagram         295
unknown                   127
shotspotter               120
twitter                    87
youtube                    76
                         ... 
stephan-de-vries            1
serve-robotics              1
waymo                       1
allen-institute-for-ai      1
snapchat                    1
Name: developer, Length: 271, dtype: int64


In [33]:
depl_counts = cleaned_incidents["deployer"].value_counts(normalize=False).to_dict()
print("Occurrences of 'meta' in deployers: " + str(depl_counts.get('meta',0)))
print("Occurrences of 'facebook' in deployers: " + str(depl_counts.get('facebook',0)))
print("Occurrences of 'instagram' in deployers: " + str(depl_counts.get('instagram',0)))

Occurrences of 'meta' in deployers: 49
Occurrences of 'facebook' in deployers: 203
Occurrences of 'instagram' in deployers: 70


In [35]:
# Map the three names onto one shared label
cleaned_incidents['deployer'] = cleaned_incidents['deployer'].replace({
    'meta': 'meta-fb-instagram',
    'facebook': 'meta-fb-instagram',
    'instagram': 'meta-fb-instagram',
})

print(cleaned_incidents["deployer"].value_counts())

cleaned_incidents.to_csv("cleaned_incidents.csv", index=False)

meta-fb-instagram                     322
twitter                                87
youtube                                76
tesla                                  66
stability-ai                           56
                                     ... 
uk-department-of-work-and-pensions      1
metropolitan-police-service             1
boston-dynamics                         1
naver                                   1
wechat                                  1
Name: deployer, Length: 345, dtype: int64
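
The developer and deployer columns receive the same mapping, so it could be defined once and applied to both. A minimal sketch, assuming a small made-up sample in place of `cleaned_incidents`; the name `META_ALIASES` is illustrative and not part of the original notebook.

```python
import pandas as pd

# Single source of truth for the names that should share one label.
META_ALIASES = {
    "meta": "meta-fb-instagram",
    "facebook": "meta-fb-instagram",
    "instagram": "meta-fb-instagram",
}

# Made-up sample rows standing in for cleaned_incidents.
sample = pd.DataFrame({
    "developer": ["facebook", "openai", "instagram"],
    "deployer": ["meta", "snapchat", "facebook"],
})

# Series.replace with a dict swaps exact matches and leaves other values untouched.
for col in ["developer", "deployer"]:
    sample[col] = sample[col].replace(META_ALIASES)

print(sample)
```

Keeping the mapping in one dictionary means a later addition only has to be made in one place.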


In [36]:
print(cleaned_incidents["harmed_parties"].value_counts())

facebook-users                     79
twitter-users                      54
minority-groups                    40
linkedin-users                     38
instagram-users                    34
                                   ..
gao-yaning                          1
yoshihiro-umeda                     1
tumblr-content-creators             1
tumblr-users                        1
texas-aandm-university-students     1
Name: harmed_parties, Length: 780, dtype: int64


## Next steps (End of part 1)

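I now have a clean, easy-to-read dataset that I can use for visualisation. Note that an incident with several deployers, developers or harmed parties spans several rows, so incident counts should be based on distinct incident_id values.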
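
For the "incidents over time" question, the following minimal pandas sketch computes a per-year count that treats rows sharing an `incident_id` as one incident. The sample rows are copied from the cleaned output above (incidents 2, 3 and 539); the variable names are illustrative, and the actual dashboard was built in Tableau, where this may be handled differently.

```python
import pandas as pd

# Rows copied from the cleaned output: incidents 3 and 539 each span two rows.
rows = pd.DataFrame({
    "incident_id": [2, 3, 3, 539, 539],
    "date": ["2018-12-05", "2018-10-27", "2018-10-27", "2023-03-11", "2023-03-11"],
})

# Parse the date strings and extract the year.
rows["year"] = pd.to_datetime(rows["date"]).dt.year

# Counting rows overstates the number of incidents ...
rows_per_year = rows.groupby("year").size()

# ... while nunique() counts each incident_id once per year.
incidents_per_year = rows.groupby("year")["incident_id"].nunique()

print(rows_per_year)
print(incidents_per_year)
```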

Part 2 (Visualisation): https://public.tableau.com/app/profile/marky.quarky/viz/AIincidentsFinal/X_Dashboard?publish=yes