<!--

# Final Report - Work in Progress
- Research Hypothesis / Questions:
    - Is Formula 1 fandom Toxic?
    - Are there specific groups that show more toxic behaviour than others?
    - Is the toxicity a "self-made" problem of Formula 1?
- APIs: YouTube
    - (Not Reddit, as posts are often off-topic, especially during the off-season that we are currently in)
- Methods:
    - TBD
    - Dictionary
        - Formula 1 specific words that are toxic
        - racism / ethnic slurs -> [@ethnic_slurs]
        - toxicity -> [@orthrus-lexicon_orthrus_2022]
        - hate speech -> [@van_der_vegt_grievance_2021]
        - insults -> [@van_der_vegt_grievance_2021]
    - Transformer classifier
        - sentiment -> cardiffnlp/twitter-roberta-base-sentiment-latest [@tweet_sentiment_classifier]
        - racism -> jaumefib/datathon-against-racism [@racism_classifier]
        - hate speech -> Hate-speech-CNERG/dehatebert-mono-english [@hate_speech_classifier]
    - statistical analysis
        - group toxic behavior by drivers and teams
        - group by topics
            - topic modelling?
- Contents:
    - Introduction
        - What is Formula 1
        - Why do we need to analyze this
        - introduce the three research questions / hypothesis
    - Fundamentals
        - Formula 1
        - What is fandom
          - 
        - Defining toxic fan behavior
        - Youtube API
        - Maybe explaining the used methods?
    - Concept
        - What will be done
        - How will i be doing it
    - Creating the Dataset
        - Explain Dataset creation
    - Applying Method 1
    - Applying Method 2
    - Results

-->

# Analysing Toxicity in Formula 1 Fandom - Computational Analysis of Communications Final
Author: Leon Knorr

Matr-Nr: 1902854

## Introduction
Formula 1 is the highest class of international racing for open-wheel single-seater formula racing cars and is generally considered the most competitive, fastest and most demanding class of motor racing. Since its first season in 1950, Formula 1 has visited a diverse range of countries, where the best drivers in the world race against each other in teams of two to determine the best driver and the best team on the grid [@about_f1]. These events are attended by thousands of fans, with millions more following them on television and social media. Formula 1's popularity is growing rapidly, helped by the release of the Netflix series *Drive to Survive* and by the 2021 season, one of the closest and most entertaining in the history of the sport: after a full season of controversy, drama and intense on-track battles, Red Bull's Max Verstappen beat Mercedes driver Lewis Hamilton in the season finale under controversial circumstances. However, reports of toxic and abusive fan behaviour at events and in social media comment sections are accumulating and cast an ugly shadow over Formula 1's latest successes [@woodhouse_scary_2022].
As reports of toxic and abusive fan behaviour on social media and at live events increase, Formula 1, fans and drivers alike are taking a stand against toxicity in the Formula 1 community. However, an independent scientific analysis of the topic is missing, so the accusations lack a solid empirical foundation. Research into the toxicity of the Formula 1 fandom is therefore necessary to understand the problem and where it originates, and to build a foundation for future measures that make attending Formula 1 events, and following the media around them, a safer and more enjoyable experience. As a first step in this direction, this thesis analyses YouTube comments on the official Formula 1 channel in order to determine:

- Whether the Formula 1 fandom is toxic
- Whether specific groups are more toxic than others
- Whether the toxicity is a "self-made" problem of Formula 1, and where it originates

## Fundamentals
This chapter presents the fundamental knowledge required for the analysis.

### Formula 1
Formula 1 is the world's most prestigious motor racing competition, as well as the world's most popular annual sporting series [@about_f1]. It is the highest class of international open-wheel single-seater formula racing. The first Formula 1 World Championship season was held in 1950. Since then, the World Drivers' Championship (WDC), which determines the world's best driver, has been held annually, joined in 1958 by the World Constructors' Championship (WCC), which determines the best team; both are sanctioned by the Fédération Internationale de l'Automobile (FIA). During a season, Formula 1 visits a variety of countries and race tracks, and each event (a Grand Prix) is attended by thousands of people, with millions watching from home [@formula_1_2023]. All rights to the Formula 1 brand and to the competition itself are owned by Formula One World Championship Limited, a company that provides media distribution and promotion services and controls the contracts, distribution and commercial management of Formula 1 rights and licences [@formula_1_limited_company_profile]. The term Formula 1 is used for both the company and the competition, as neither can exist without the other.

### What is Fandom
According to Cornel Sandvoss, a fandom is a community of people who regularly consume a given popular narrative or text with great emotional involvement [@toxic_fandom]. The members of the community are called fans, a short form of "fanatic" [@arouh_toxic_2020]. In other words, a fandom is a community of people who are fanatic about a popular narrative or text such as a TV series, a movie franchise or a sport.

Becoming a fan starts with adopting a fan identity centred on a fan object, which makes fandom a powerful way of defining the self. The fan object can be anything people can be fanatic about, from a physical object such as trains to an intangible one such as a movie franchise. By taking part in a fandom, people therefore express themselves through an identity they have chosen for themselves. As a result, fans may come to see the fan object as an extension of themselves and feel personally threatened when the fan object faces a threat such as accusations [@toxic_fandom]. Beyond forming a strong part of their identity, fandom also makes fans feel more connected: studies indicate that even fans who do not interact with other members of a fan community still perceive themselves as part of it. Fans therefore become not only personally but also socially invested in their fandom [@toxic_fandom].

Because of this strong connection to the fan object, the point in life at which the fan identity was adopted also plays a role. Many people become fans of a TV series, franchise or sport in their childhood, which often leads them to feel entitled to having their fan object preserved in a way they deem acceptable. This behaviour is called fan entitlement. Good examples are the new movies and series in the *Lord of the Rings* and *Star Wars* franchises: most fan communities of these franchises were outraged by the new characters and storylines, with many people claiming that they had "ruined their childhood" [@toxic_fandom].

From an economic point of view, fans are seen as ideal customers: they are eager to get their hands on the newest products and make stable, recurring purchases, since intense consumption is considered part of the fan identity [@arouh_toxic_2020].

### Defining Toxic Fan Behaviour

### YouTube API

## Concept

## The Dataset

```python
from dotenv import dotenv_values
import googleapiclient.discovery
import pandas as pd

# Read the API key from keys.env so that it is not hard-coded in the notebook
api_keys = dotenv_values("keys.env")
api_service_name = "youtube"
api_version = "v3"
api_key = api_keys["YOUTUBE_API_KEY"]
# Maximum number of videos to collect from the channel
max_results = 1000
youtube_api = googleapiclient.discovery.build(api_service_name, api_version, developerKey=api_key)
```

```python
# Look up the official Formula 1 channel by its username
formula1_official_channel = youtube_api.channels().list(part="snippet", forUsername="Formula1").execute()["items"][0]

# search().list returns at most 50 results per page, so follow nextPageToken
# until max_results video IDs have been collected or no pages are left
video_ids_after_2020 = []
page_token = None  # None values are dropped from the request, so the first call has no pageToken
while len(video_ids_after_2020) < max_results:
    response = youtube_api.search().list(
        channelId=formula1_official_channel["id"],
        maxResults=50,
        publishedAfter="2020-01-01T00:00:00Z",
        type="video",  # only videos, so every item has an id.videoId
        part="id",
        pageToken=page_token,
    ).execute()
    video_ids_after_2020 += [item["id"]["videoId"] for item in response["items"]]
    page_token = response.get("nextPageToken")
    if page_token is None:
        break
```


```python
# Fetch title, description and statistics for every collected video
df_list = []
for video_id in video_ids_after_2020:
    video_data = youtube_api.videos().list(part="snippet,statistics", id=video_id).execute()
    snippet = video_data["items"][0]["snippet"]
    statistics = video_data["items"][0]["statistics"]
    df_list.append({
        "video_id": video_id,
        "title": snippet["title"],
        "description": snippet["description"],
        "channel": snippet["channelTitle"],
        "published_at": snippet["publishedAt"],
        "tags": snippet.get("tags"),
        # The API returns counts as strings; likeCount and commentCount are
        # missing when likes are hidden or comments are disabled
        "like_count": int(statistics.get("likeCount", 0)),
        "favorite_count": int(statistics.get("favoriteCount", 0)),
        "comment_count": int(statistics.get("commentCount", 0)),
    })

videos = pd.DataFrame(df_list)
videos
```
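The cell above sends one `videos().list` request per video. A possible alternative, sketched under the assumption that the same `youtube_api` client is used: the `id` parameter of `videos().list` accepts a comma-separated list of up to 50 video IDs, so the details can be fetched in batches with far fewer requests. The helpers `chunks` and `fetch_video_rows` are hypothetical, and only some of the columns above are kept for brevity.

```python
from typing import Any, Iterator


def chunks(items: list, size: int = 50) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_video_rows(youtube_api: Any, video_ids: list) -> list:
    """Fetch snippet and statistics for many videos, 50 IDs per videos().list call."""
    rows = []
    for batch in chunks(video_ids, 50):
        response = youtube_api.videos().list(
            part="snippet,statistics",
            id=",".join(batch),  # comma-separated list of video IDs
        ).execute()
        # Deleted or private videos are simply absent from response["items"]
        for item in response["items"]:
            snippet, statistics = item["snippet"], item["statistics"]
            rows.append({
                "video_id": item["id"],
                "title": snippet["title"],
                "published_at": snippet["publishedAt"],
                # Counts arrive as strings and may be missing
                "like_count": int(statistics.get("likeCount", 0)),
                "comment_count": int(statistics.get("commentCount", 0)),
            })
    return rows
```

Usage would be `videos = pd.DataFrame(fetch_video_rows(youtube_api, video_ids_after_2020))`. Besides saving quota, a video that was deleted in the meantime is skipped instead of raising an `IndexError` on `video_data["items"][0]`.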

```python
video_ids_after_2020 = videos.video_id.to_list()
video_ids_after_2020
```

```text
['FZjG3oft5rs',
 'iAn--HeOcfE',
 'GxPDQV1GGEI',
 'Ko4ylgDpABQ',
 'Mu-zHpC0Y58',
 'lX0rXb7gqrU',
 'xl3S8qMjx64',
 'Z8wPGQhw4Pg',
 '7rmSEh_v0Hk',
 'epufjHjbVVY',
 'Laz6i7970Ys',
 'KUbJ63SwnCo',
 'eioKgQUICjA',
 '6dg-1fTYMvA',
 's5VCm-YxrKQ',
 '-gSXfbAZQLY',
 'c4TPEGBdQtY',
 'Hbpyr1CWhBw',
 'xMWweGKHvpM',
 '5GfseCOTOHM',
 'jk1v9CKCJmQ',
 'MZUxwPHVpA0',
 'UeQ3yC7Du8A',
 'whGhECSp5LE',
 'suuTl7w5Hd8',
 'g156vj8y-oU',
 'RTVcVAxYuwQ',
 '7DzBLDgzFmk',
 'm8tGGT-N9AA',
 'fXxGYbiumr8',
 'jlSXQuVnHAE',
 '3DQqMtf_mTw',
 'W0yiezif3iU',
 'htQQwZA41rM',
 '12lU7KQNFq8',
 '18GzBdVJUlA',
 't4_dKbILzNU',
 'Hjg89OewDp8',
 'ZbGi5zC--jU',
 '9IQ-Zj8W20s',
 'bvGbEopdo84',
 'N-255QvCrWY',
 'ZFT_a00Z1D8',
 'A9scBg_Hs2k',
 'yjZfW-yTpZ4',
 'I1WEmbI12H4',
 '5daN3RDsP80',
 'IuzMtAieIFk',
 'uhrAgoHCoyo',
 'Sb3tUAAnjzI',
 'v0G051xP7Y4',
 '0GW-ROqLRTI',
 '_n_jYp5yigA',
 'ejKobaJwB6s',
 'adgoQxJA4Ek',
 'F2MWJz--Aug',
 'RhM_yNDPWjU',
 'mV51HqkmAjM',
 'YFWUKn6t_dw',
 '9uq1Y2MvBJY',
 '7L89zGBSmp8',
 '8HP-8Hn1U6I',
 '5Smtk_
```

```python
def comment_to_row(video_id: str, comment: dict, reply_count: int = 0) -> dict:
    """Flatten a YouTube comment resource into one row of the comment table."""
    snippet = comment["snippet"]
    return {
        "video_id": video_id,
        "id": comment["id"],
        "text": snippet["textDisplay"],
        # authorChannelId can be missing, e.g. for deleted channels
        "user": snippet.get("authorChannelId", {}).get("value"),
        "like_count": snippet["likeCount"],
        "published_at": snippet["publishedAt"],
        "reply_count": reply_count,
    }


df_list_comments = []
for video_id in video_ids_after_2020:
    # Skip videos without comments; commentThreads().list fails if comments are disabled
    if int(videos.loc[videos["video_id"] == video_id, "comment_count"].iloc[0]) == 0:
        continue
    # Up to 50 of the most relevant top-level comments per video
    threads = youtube_api.commentThreads().list(
        part="snippet",
        maxResults=50,
        order="relevance",
        videoId=video_id,
    ).execute()["items"]
    for thread in threads:
        top_level_comment = thread["snippet"]["topLevelComment"]
        reply_count = thread["snippet"]["totalReplyCount"]
        df_list_comments.append(comment_to_row(video_id, top_level_comment, reply_count))
        # Every request costs quota, so only ask for replies when there are any
        if reply_count == 0:
            continue
        replies = youtube_api.comments().list(
            part="snippet",
            maxResults=50,
            parentId=top_level_comment["id"],
        ).execute()["items"]
        for reply in replies:
            df_list_comments.append(comment_to_row(video_id, reply))

comment_df: pd.DataFrame = pd.DataFrame(df_list_comments)
comment_df
```

```text
HttpError: <HttpError 403 when requesting https://youtube.googleapis.com/youtube/v3/comments?part=snippet&maxResults=50&parentId=UgzGiDkS5UhFiWKkCRV4AaABAg&key=<API_KEY>&alt=json returned "The request cannot be completed because you have exceeded your <a href="/youtube/v3/getting-started#quota">quota</a>.". Details: "[{'message': 'The request cannot be completed because you have exceeded your <a href="/youtube/v3/getting-started#quota">quota</a>.', 'domain': 'youtube.quota', 'reason': 'quotaExceeded'}]">
```
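The 403 `quotaExceeded` error above can be put into perspective with a rough estimate of the quota cost of the collection. The sketch below assumes the costs listed in the YouTube Data API v3 quota documentation (100 units per `search().list` call, 1 unit per `channels`, `videos`, `commentThreads` and `comments().list` call) and the default allocation of 10,000 units per day. `estimate_quota` is a hypothetical helper that returns an upper bound; `reply_share=1.0` models a separate replies request for every top-level comment.

```python
import math

# Quota costs per request in the YouTube Data API v3 (assumed from the quota documentation)
SEARCH_COST = 100     # search().list
LIST_COST = 1         # channels(), videos(), commentThreads(), comments().list
DAILY_QUOTA = 10_000  # default daily allocation per project
PAGE_SIZE = 50        # maximum maxResults per request


def estimate_quota(n_videos: int, threads_per_video: int = PAGE_SIZE, reply_share: float = 1.0) -> int:
    """Upper-bound quota units for collecting videos, comment threads and replies.

    reply_share is the fraction of top-level comments for which a separate
    comments().list request is sent (1.0 means every comment).
    """
    channel_lookup = LIST_COST
    search_pages = math.ceil(n_videos / PAGE_SIZE) * SEARCH_COST
    video_details = n_videos * LIST_COST    # one videos().list call per video
    comment_threads = n_videos * LIST_COST  # one commentThreads().list call per video
    reply_requests = math.ceil(n_videos * threads_per_video * reply_share) * LIST_COST
    return channel_lookup + search_pages + video_details + comment_threads + reply_requests
```

For the 491 videos in the dataset, `estimate_quota(491)` gives 26,533 units, more than twice the default daily quota, which is consistent with the run stopping with `quotaExceeded`. Without any reply requests (`reply_share=0.0`) the estimate drops to 1,983 units, so requesting replies only where `totalReplyCount > 0`, or spreading the collection over several days, keeps the daily cost down.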

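```python
import os

# Cache the collected data so the analysis can be rerun without spending API quota
os.makedirs("datasets", exist_ok=True)
videos.to_pickle("datasets/video_data.pkl")
comment_df.to_pickle("datasets/comment_data.pkl")
```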

```python
# Load the cached data
videos: pd.DataFrame = pd.read_pickle("datasets/video_data.pkl")
comment_df: pd.DataFrame = pd.read_pickle("datasets/comment_data.pkl")
```
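Depending on how the cached files were produced, the count columns may hold the strings returned by the API (the statistics fields are delivered as strings), and `published_at` is stored as ISO 8601 text. A small sketch of normalizing the types after loading; `normalise_types` is a hypothetical helper, and converting the dates to UTC timestamps is an assumption about how they will be used later.

```python
import pandas as pd


def normalise_types(videos: pd.DataFrame, comment_df: pd.DataFrame) -> tuple:
    """Return copies of both tables with numeric counts and UTC timestamps."""
    videos = videos.copy()
    comment_df = comment_df.copy()
    for column in ["like_count", "favorite_count", "comment_count"]:
        # Counts may be API strings or ints; casting via str parses both the same way
        videos[column] = pd.to_numeric(videos[column].astype(str))
    for df in (videos, comment_df):
        # published_at is ISO 8601 text such as "2021-12-03T19:46:38Z"
        df["published_at"] = pd.to_datetime(df["published_at"], utc=True)
    return videos, comment_df
```

Usage: `videos, comment_df = normalise_types(videos, comment_df)`. Working on copies keeps the loaded pickles untouched if the cell is rerun.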

```python
videos
```

| | video_id | title | description | channel | published_at | tags | like_count | favorite_count | comment_count |
|---|---|---|---|---|---|---|---|---|---|
| 0 | FZjG3oft5rs | Max Verstappen - On The Brink of 2021 Glory: A... | On the brink of glory. At just 24 years old, M... | FORMULA 1 | 2021-12-03T19:46:38Z | [F1, Formula One, Formula 1, Sports, Sport, Ac... | 24442 | 0 | 1155 |
| 1 | iAn--HeOcfE | How The Team Battle For 3rd Was Won \| Jolyon P... | Jolyon Palmer digs deeper into how McLaren got... | FORMULA 1 | 2020-12-16T12:00:10Z | [F1, Formula One, Formula 1, Sports, Sport, Ac... | 4796 | 0 | 329 |
| 2 | GxPDQV1GGEI | Top 5 Formula 3 Moments \| 2020 Belgian Grand Prix | For more F1® videos, visit http://www.Formula1... | FORMULA 1 | 2020-09-01T14:43:00Z | [F1, Formula One, Formula 1, Sports, Sport, Ac... | 3253 | 0 | 118 |
| 3 | Ko4ylgDpABQ | Verstappen And Hamilton's Incident \| Jolyon Pa... | Former F1 racer Jolyon Palmer looks back on th... | FORMULA 1 | 2021-12-07T18:45:00Z | [F1, Formula One, Formula 1, Sports, Sport, Ac... | 12448 | 0 | 3121 |
| 4 | Mu-zHpC0Y58 | New Driver And A New Engine For McLaren \| 2021... | Third in the Constructor Standings last season... | FORMULA 1 | 2021-03-22T14:15:31Z | [F1, Formula One, Formula 1, Sports, Sport, Ac... | 10248 | 0 | 461 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 487 | l-doHT5cxkA | F1 LIVE: Austrian GP Post-Race Show | Join us LIVE for the Post Race Show in Austria... | FORMULA 1 | 2021-07-04T15:40:45Z | [F1, Formula One, Formula 1, Sports, Sport, Ac... | 13143 | 0 | 609 |
| 488 | V3w-0z-iOGw | Title Contender Chaos, Stealing The P1 Board A... | It was a dramatic penultimate round of Formula... | FORMULA 1 | 2021-09-07T19:00:07Z | [F1, Formula One, Formula 1, Sports, Sport, Ac... | 2005 | 0 | 53 |
| 489 | Xs2ogi1x0tw | F1 Esports Pro Series 2020: FINAL RACE, Round 12 | One more time! We head to Brazil for the final... | FORMULA 1 | 2020-12-17T21:27:29Z | [F1, Formula One, Formula 1, Sports, Sport, Ac... | 10865 | 0 | 134 |
| 490 | v6asa9zegAs | Mercedes Launch their 2022 Car: The W13 | Mercedes have been a force to be reckoned with... | FORMULA 1 | 2022-02-18T09:53:46Z | [F1, Formula One, Formula 1, Sports, Sport, Ac... | 18172 | 0 | 1088 |


### Dataset limitations

## Dictionary Analysis

### Orthrus Lexicon for Toxicity

```python
comment_df
```

| | video_id | id | text | user | like_count | published_at | reply_count |
|---|---|---|---|---|---|---|---|
| 0 | FZjG3oft5rs | UgznjAR9SoXoE0gpoK54AaABAg | THAT SUPER MAX AT THE END WAS FANTASTIC | UCmCBpDZeM9LbTeWbMPW1N2A | 4626 | 2021-12-03T19:48:05Z | 48 |
| 1 | FZjG3oft5rs | UgznjAR9SoXoE0gpoK54AaABAg.9VVbyYu52Uy9eDYvXWpXVD | \@IIIlllIII I got cold | UCQtwk7iBtv0hTwedp13yDKA | 0 | 2022-08-02T08:40:19Z | 0 |
| 2 | FZjG3oft5rs | UgznjAR9SoXoE0gpoK54AaABAg.9VVbyYu52Uy9Xuy0JALbx9 | \@Miz is Awesome oh yes. | UCuzQCuC86Pmhq9ixSff1z-A | 0 | 2022-02-01T18:23:44Z | 0 |
| 3 | FZjG3oft5rs | UgwRx2dFc5pnjsGjCdh4AaABAg | Super Max at the Dutch GP and now Super Max us... | UCWW1z4S5ez9wKuCG0FfliDg | 3038 | 2021-12-03T19:59:34Z | 10 |
| 4 | FZjG3oft5rs | UgwRx2dFc5pnjsGjCdh4AaABAg.9VVdHdnh2yK9W3Eu2Izk79 | \@TheConfidentNoob song was made in 2016 | UCkJ_kGEIYAMwW4UoDBNrq5A | 0 | 2021-12-17T15:50:24Z | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2704 | g5yQmp1ctXk | UgxgY_-JNRHqISd0XBJ4AaABAg.9iKEXyhtqsg9jp6bIKVrqo | Agree | UCY5xRnwZI_yXUme24Ho4ECQ | 1 | 2022-12-19T14:40:15Z | 0 |
| 2705 | g5yQmp1ctXk | UgxgY_-JNRHqISd0XBJ4AaABAg.9iKEXyhtqsg9jOyrDWrGHk | Best comment I’ve seen 😂😂😂 | UCBJBBdoGcKzkr-dtD7lgesw | 2 | 2022-12-09T01:44:17Z | 0 |
| 2706 | g5yQmp1ctXk | Ugyrxo9tmlohfPOhLhl4AaABAg | Seb with ferrari was just something else for me ❤ | UCKnlidGRJNAfAl5Pt32Na0A | 648 | 2022-11-09T20:17:18Z | 6 |
| 2707 | g5yQmp1ctXk | Ugyrxo9tmlohfPOhLhl4AaABAg.9iDiO-cK_Ai9iYr9pQXAap | Red Bull + Vette =l &lt;3 | UC64gizpEZp0zXXiNUSCSDJQ | 1 | 2022-11-18T01:18:04Z | 0 |


```python
# Load the toxicity lexicon: one word per line, blank lines are skipped
with open("dictionaries/toxic_words.txt", encoding="utf-8") as toxic_words_file:
    set_of_toxic_words: set = {word.strip() for word in toxic_words_file if word.strip()}
set_of_toxic_words
```

```text
{'coksucka',
 'garbage',
 'licking',
 'viagra',
 'asinine',
 'f*c*',
 'testical',
 'goofiest',
 'hogs',
 'cuntrag',
 'ignoramus',
 'dumbweazy',
 'degenerates',
 'predators',
 'piggish',
 'pisser',
 'orgasim',
 'mo-fo',
 'clitface',
 'degenerate',
 'incompetent',
 'poontang',
 'penis',
 'leftist',
 'mierda',
 'thug',
 'vacuous',
 'poopy',
 'niggers',
 'jizm',
 'boooobs',
 'dicksucking',
 'rapistinchief',
 'fashy',
 'nutters',
 'cockfucker',
 'gook',
 'hypocrisy',
 'atrocities',
 'horrific',
 'phonesex',
 'jackasses',
 'cocksucks',
 'fuxcking',
 'scrotum',
 'stfu',
 'bitches',
 'nazi-like',
 'feces',
 'trump-suckers',
 'meanspirited',
 'dope',
 'masterb8',
 'Goddamn',
 'shitty',
 'unpayable',
 'asslick',
 'deviant',
 'vapidity',
 'cocksniffer',
 'corupt',
 'cocksmoker',
 'bulls**tter',
 'bullsshit',
 'crooks',
 'jerkass',
 'goatse',
 'shitbagger',
 'illprepared',
 'racism',
 'incest',
 'daft',
 'fvcked',
 'moronic',
 'fagtard',
 'thoughtless',
 'isl',
 'crack',
 'cretins',
 'yuck',
 'suc
```

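```python
from collections import Counter

# Overall frequency of each lexicon word (counted at most once per comment)
toxic_word_counter: Counter = Counter()
# Number of distinct lexicon words in each comment
toxic_word_count: list = []
for row in comment_df.text:
    # Whitespace tokenization: matching is case-sensitive and punctuation stays attached
    toxic_words_in_comment: set = set(row.split(" ")).intersection(set_of_toxic_words)
    toxic_word_counter.update(toxic_words_in_comment)
    toxic_word_count.append(len(toxic_words_in_comment))
comment_df["toxic_word_count"] = toxic_word_count
```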
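Splitting on single spaces keeps punctuation and HTML attached to words, and the comment texts are the HTML-formatted `textDisplay` values (see `weren&#39;t` and `<a href=...` in the flagged comments), so "trash!" would not match "trash". Matching is also case-sensitive, while the lexicon mixes cases (e.g. `'Goddamn'`). A possible refinement, sketched under the assumption that lexicon words should be matched case-insensitively as whole tokens; `tokenize` and `count_toxic_words` are hypothetical helpers, not part of the lexicon.

```python
import html
import re
from collections import Counter

# HTML tags in textDisplay, e.g. <a href="...">, <br> or <b>
TAG_PATTERN = re.compile(r"<[^>]+>")
# Word characters plus * ' and -, so masked spellings such as "a**" stay one token
TOKEN_PATTERN = re.compile(r"[\w*'-]+")


def tokenize(text: str) -> list:
    """Lower-cased tokens of a comment, with HTML tags removed and entities decoded."""
    # Remove tags before decoding, so an escaped "&lt;3" is not mistaken for a tag
    text = html.unescape(TAG_PATTERN.sub(" ", text))
    return TOKEN_PATTERN.findall(text.lower())


def count_toxic_words(texts, lexicon) -> tuple:
    """Distinct lexicon words per comment and their overall frequencies."""
    lexicon = {word.lower() for word in lexicon}
    counter = Counter()
    per_comment = []
    for text in texts:
        hits = set(tokenize(text)) & lexicon
        counter.update(hits)
        per_comment.append(len(hits))
    return per_comment, counter
```

Usage: `comment_df["toxic_word_count"], toxic_word_counter = count_toxic_words(comment_df.text, set_of_toxic_words)`. Keeping `*` inside tokens is a trade-off: masked spellings match, but YouTube's `*bold*` markup stays attached to the word. Tokenization also cannot remove context-dependent false positives such as "rear" (as in "rear end"), which the counter below reports.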

```python
comment_df.loc[comment_df["toxic_word_count"] > 0]
```

| | video_id | id | text | user | like_count | published_at | reply_count | toxic_word_count |
|---|---|---|---|---|---|---|---|---|
| 45 | Z8wPGQhw4Pg | UgxXqdTRbrYXuxQQCVh4AaABAg.9CGSVWl_40q9CHkZgsp1SK | \@Keisuke Takahasi ferrari needs some time at t... | UCiTlmx-EYVTmzQKDx4m4rgQ | 1 | 2020-08-13T04:37:20Z | 0 | 1 |
| 73 | eioKgQUICjA | UgxcOZFJK8moiNCjIjp4AaABAg | The development of everything including the ca... | UC4rb1tv1SDYSkb2aV-b028g | 256 | 2022-05-25T16:03:13Z | 0 | 1 |
| 181 | jlSXQuVnHAE | UgxurCAEQFF6vd1mye54AaABAg.9BKmPlNxJID9BMRaNqTUPt | \@Rizaldi Ramdlani Pamungkas You weren&#39;t ev... | UCXv3XwEyoze6sutnSmNtmMw | 0 | 2020-07-21T03:47:37Z | 0 | 2 |
| 267 | I1WEmbI12H4 | UgwyhwIhgKevCF_hUyN4AaABAg.9AoJrrp9V0a9Aoki5Bw02a | God the 2019 French Grand Prix was dreadful | UCxzc_8degY9mHQp_LK46xdw | 9 | 2020-07-07T16:30:00Z | 0 | 1 |
| 270 | 5daN3RDsP80 | UgzrRxvUoECK2M8vOAh4AaABAg.9VXqzFB0Hvy9VdqSMSaz08 | \@FIA Random Penalty Generator Machine You are ... | UCM9Sm_7O6qFUV_E6Ec4We-Q | 0 | 2021-12-07T09:47:47Z | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2493 | 7G7KewfdzTY | UgyHYEmG50jkG9rY0HR4AaABAg | `luke smith with the insane traction <a href="h...` | UCAaFqlgiHtYfJB8zYVt4Big | 27 | 2022-07-13T20:56:21Z | 0 | 1 |
| 2588 | mYfBKflmgAQ | Ugzti_zPdpEc3gw5ZPd4AaABAg | The pure silence from the crowd is killing me 🫠😂😂 | UCg5kkdme8ppRkmhiwZTqaxA | 194 | 2022-12-10T11:18:13Z | 3 | 1 |
| 2600 | 3oy0msSIkEI | UgwI2-SU-3i8bAQgQQp4AaABAg.9iNbXFAHftm9iNpYgga4xC | \@Oscar Arrieta though i love him, i immediatel... | UCoc2g8Cz_yboQOWie_Eq0iA | 1 | 2022-11-13T18:32:20Z | 0 | 1 |
| 2632 | PdW4pLiOtL0 | Ugzrrw9zFYyEfdCCDMZ4AaABAg | I think Mercedes will figure out their rear en... | UCNBRYTDssd7yL5tuGNpc-4w | 258 | 2021-03-14T17:12:51Z | 11 | 1 |
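The raw number of flagged comments per video depends on how many comments were collected for it. A sketch for normalizing, assuming the `toxic_word_count` column created above; `toxic_share_by` is a hypothetical helper, "flagged" only means that the lexicon matched (not that the comment is confirmed toxic), and grouping by video is just one possible choice.

```python
import pandas as pd


def toxic_share_by(comment_df: pd.DataFrame, group_column: str = "video_id") -> pd.DataFrame:
    """Per group: number of comments, number flagged, and share flagged."""
    # A comment counts as flagged if it contains at least one lexicon word
    flagged = comment_df["toxic_word_count"] > 0
    summary = (
        comment_df.assign(flagged=flagged)
        .groupby(group_column)
        .agg(comments=("flagged", "size"), flagged=("flagged", "sum"))
    )
    summary["share_flagged"] = summary["flagged"] / summary["comments"]
    # Groups with the highest share of flagged comments first
    return summary.sort_values("share_flagged", ascending=False)
```

For example, `toxic_share_by(comment_df.merge(videos[["video_id", "title"]], on="video_id"), "title").head(10)` lists the videos with the highest share of flagged comments by title.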


```python
toxic_word_counter
```

```text
Counter({'ass': 1,
         'insane': 13,
         'stfu': 1,
         'bullshit': 1,
         'God': 1,
         'hating': 3,
         'immature': 1,
         'terrible': 3,
         'garbage': 1,
         'crazy': 5,
         'trash': 5,
         'shills': 1,
         'a**': 1,
         'kick': 3,
         'monster': 5,
         'treacherous': 3,
         'blow': 1,
         'mediocre': 3,
         'clown': 3,
         'threats': 1,
         'useless': 1,
         'weird': 4,
         'beating': 5,
         'con': 2,
         'killing': 6,
         'rear': 6,
         'pathetic': 1,
         'chick': 1,
         'fake': 5,
         'deluded': 1,
         'bums': 1,
         'disgrace': 1,
         'cheating': 1,
         'ridiculously': 1,
         'fooled': 1,
         'aggressive': 1,
         'messed': 1,
         'duh': 1,
         'fukin': 1,
         'rat': 1,
         'beaten': 1,
         'weak': 2,
         'fkn': 1,
         'choke': 1,
         'fails': 2,
         'l': 1,
```

### Grievance Dictionary

### Ethnic Slurs

```python
import os

import pandas as pd

slur_dir = "dictionaries/ethnic_slurs"
# Every CSV file in the folder holds one part of the ethnic slur dictionary
dict_files: list = [f for f in os.listdir(slur_dir) if f.endswith(".csv")]
# Stack all parts into one table with a fresh 0..n-1 index
dict_df: pd.DataFrame = pd.concat(
    [pd.read_csv(os.path.join(slur_dir, f)) for f in dict_files],
    ignore_index=True,
)
dict_df
```

| | Term | Location or origin | Targets | Meaning, origin and notes | References |
|---|---|---|---|---|---|
| 0 | Eight ball, 8ball | | Black people | Referring to the black ball in pool. Slang, us... | |
| 1 | Eyetie | United States, United Kingdom | Italian people | Originated through the mispronunciation of "It... | |
| 2 | Dago, Dego | United States, Commonwealth | Italians, Spaniards, Portuguese people | Possibly derived from the Spanish name "Diego" | |
| 3 | Dago, Dego | United States | Italian people | | |
| 4 | Dal Khor | Urdu-speaking people | Indians and Pakistanis (specifically Punjabis) | The term literally translates to "dal eater", ... | |
| ... | ... | ... | ... | ... | ... |
| 424 | Huinca | Argentina, Chile | Non-Mapuche Chileans, non-Mapuche Argentines | Mapuche term dating back at least to the Conqu... | |
| 425 | Hun | United States, United Kingdom | German people | (United States, United Kingdom) Germans, espec... | |
| 426 | Hun | Ireland | Protestants and British soldiers | A Protestant in Northern Ireland or historical... | |
| 427 | Hunky, Hunk | United States | Central European laborers. | It originated in the coal regions of Pennsylva... | |
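The `Term` column bundles several spellings per row (e.g. "Dago, Dego"), and some terms span several words ("Eight ball"), so they cannot be matched with the single-word set intersection used for the toxicity lexicon. A sketch, assuming the column names shown above; `slur_variants` and `find_slurs` are hypothetical helpers. Word boundaries matter here: without them, "Hun" would match inside "Hungarian", a word that appears naturally in Formula 1 comments.

```python
import re

import pandas as pd


def slur_variants(dict_df: pd.DataFrame) -> set:
    """Split the comma-separated Term column into lower-cased single variants."""
    variants = set()
    for term in dict_df["Term"].dropna():
        for variant in str(term).split(","):
            variant = variant.strip().lower()
            if variant:
                variants.add(variant)
    return variants


def find_slurs(text: str, variants: set) -> set:
    """Variants that occur in text as whole words or phrases, ignoring case."""
    text = text.lower()
    # \b requires each variant to start and end with a word character
    return {v for v in variants if re.search(r"\b" + re.escape(v) + r"\b", text)}
```

Usage: `variants = slur_variants(dict_df)` followed by `comment_df["slurs"] = comment_df.text.apply(lambda t: find_slurs(t, variants))`. Checking every variant with a regular expression is slow for large tables, but it is simple and sufficient for a few thousand comments.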


## Transformer Classifiers

## Results

## Bibliography

<!--

In [41]:
import os
os.system("jupyter nbconvert --to markdown final.ipynb")
os.system("pandoc -s final.md -t pdf -o final.pdf --citeproc --bibliography=refs.bib --csl=apa.csl")

[NbConvertApp] Converting notebook final.ipynb to markdown
[NbConvertApp] Writing 12893 bytes to final.md


0

-->