# Notebook 3: Are GPT Detectors Biased Against Non-Native English Speakers?

In 2022, OpenAI released ChatGPT, a chatbot built on a large language model that can mimic human writing to a high degree. With this new tool available, fears among educators and other stakeholders about AI-generated content led to the rise of GPT detectors, tools that try to determine whether a text was written by a human or by AI. A study published in the journal *Patterns* highlights a potential concern: [Are GPT detectors biased against non-native English speakers?](https://www.sciencedirect.com/science/article/pii/S2666389923001307) In this notebook, we will look into the data to determine whether GPT detectors show evidence of bias. More information on the data set can be found [here](https://github.com/rfordatascience/tidytuesday/blob/main/data/2023/2023-07-18/readme.md).

## Run Initial Code

In [None]:
# This code will load the R packages we will use
install.packages(c("csucistats"),
                 repos = c("https://inqs909.r-universe.dev", "https://cloud.r-project.org"))
library(csucistats)
library(tidyverse)


# Uncomment and run for categorical plots
# csucistats::install_plots()
# library(ggtricks)
# library(ggmosaic)
# library(waffle)

# Uncomment and run for themes
# csucistats::install_themes()
# library(ThemePark)
# library(ggthemes)


detectors <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2023/2023-07-18/detectors.csv')


### Reset Data

In [None]:
detectors <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2023/2023-07-18/detectors.csv')

## 1.0 - Thinking about the Question and Data

The `detectors` data set contains 6,185 rows and 9 variables, where each row represents information about an essay.

1.1 - Take a quick `glimpse()` at the data set and identify variables that may be of interest for the analysis.

1.2 - Given the variables, what are some questions that you may want to explore with the data? Look at the source [page](https://github.com/rfordatascience/tidytuesday/blob/main/data/2023/2023-07-18/readme.md) for more information about the variables.

## 2.0 - Descriptive Statistics

2.1 - Using the variable `kind`, what proportion of essays were written by an "AI" model?

2.2 - Using the `model` variable, what proportion of essays were written with "GPT3"? "GPT4"?

2.3 - Using the `native` variable, how many essays were written by non-native English speakers?

2.4 - Using the `detector` variable, describe how the essays are distributed across the different GPT detectors.
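
As a rough illustration of how counts and proportions for a single categorical variable can be computed, the sketch below uses base R's `table()` and `prop.table()`. The vector `toy_kind` is made up for illustration and is not taken from `detectors`; the same idea would apply to a column such as `detectors$kind`.

```r
# Hypothetical toy data (NOT the real detectors data)
toy_kind <- c("AI", "Human", "AI", "Human", "Human")

# Frequency table: number of observations in each category
counts <- table(toy_kind)

# Proportions: each count divided by the total number of observations
props <- prop.table(counts)

print(counts)
print(props)
```

Multiplying the proportions by 100 gives percentages, if that form is easier to report.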

## 3.0 - Effectiveness of GPT Detectors

A GPT detector is only useful if it can differentiate between a human piece of work and an AI-generated one. In this section, we will see how effective these GPT detectors are.

3.1 - Generate a cross-tabulation table between the variables `model` and `.pred_class`.

3.2 - What percentage of essays were correctly classified?

3.3 - What percentage of essays were incorrectly classified?

3.4 - Given that the paper was written by a human, what is the percentage that it will be classified as human-created?

3.5 - Given that the paper was written by GPT4, what is the proportion that it will be classified as AI-created?

3.6 - Given that a paper was classified as written by a human, what is the proportion that it was actually written by a human?

3.7 - Given that a paper was classified as written by AI, what is the proportion that it was actually written by an AI?

3.8 - Look at questions 3.6 and 3.7. What do you think these questions are really asking? Hint: These questions are much more powerful than they appear.
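
The questions in this section all come down to a cross-tabulation and proportions computed from it. The sketch below shows one possible way to do this in base R with `table()` and `prop.table()`. The data frame `toy` and its columns `author` and `predicted` are invented for illustration; they stand in for columns such as `model` and `.pred_class` and do not reflect the real results.

```r
# Hypothetical toy data (NOT the real detectors data)
toy <- data.frame(
  author    = c(rep("Human", 4), rep("AI", 6)),
  predicted = c("Human", "Human", "Human", "AI",
                "AI", "AI", "AI", "Human", "Human", "Human")
)

# Cross-tabulation: rows are the true author, columns are the predicted class
tab <- table(author = toy$author, predicted = toy$predicted)

# Overall share of correct classifications (diagonal cells over the total)
accuracy <- sum(diag(tab)) / sum(tab)

# Row proportions: each row sums to 1 (conditions on the true author)
row_props <- prop.table(tab, margin = 1)

# Column proportions: each column sums to 1 (conditions on the predicted class)
col_props <- prop.table(tab, margin = 2)

print(tab)
print(accuracy)
print(row_props)
print(col_props)
```

Note that `diag()` only picks out the correct classifications when the row and column categories are the same and appear in the same order, as they do in this toy table.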

## 4.0 - Are GPT Detectors Biased?

In this section, we will create a new data set called `detectors2`. This new data set will contain a new variable called `model2`, which is similar to `model` from `detectors`, but it will reclassify "Human" as "Native" or "Non-native".

4.1 - Describe in words what you think the code is doing below:

In [5]:
detectors2 <- detectors |> mutate(model2 = case_when(native == "Yes" & model == "Human" ~ "Native",
                                                     native == "No" & model == "Human" ~ "Non-native",
                                                     model == "GPT3" ~ "GPT3",
                                                     model == "GPT4" ~ "GPT4"))
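
To see how `case_when()` behaves, it can help to run the same kind of call on a handful of rows where the result is easy to check by eye. The sketch below is only an illustration: the tibble `toy` is made up, and it assumes that `native` is missing (`NA`) for AI-written essays, which should be confirmed against the real data.

```r
library(dplyr)

# Hypothetical toy data (NOT the real detectors data)
# Assumption: `native` is NA for AI-written essays
toy <- tibble(
  model  = c("Human", "Human", "GPT3", "GPT4"),
  native = c("Yes", "No", NA, NA)
)

# case_when() checks the conditions from top to bottom for each row
# and returns the value to the right of ~ for the first one that is TRUE
toy2 <- toy |>
  mutate(model2 = case_when(
    native == "Yes" & model == "Human" ~ "Native",
    native == "No"  & model == "Human" ~ "Non-native",
    model == "GPT3" ~ "GPT3",
    model == "GPT4" ~ "GPT4"
  ))

print(toy2)
```

A row that matches none of the conditions would receive `NA` in `model2`, so unexpected missing values in the new column are worth checking for.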

4.2 - Obtain the contingency table for the variables `model` and `model2` from the `detectors2` data set. Does the variable `model2` look like it was created correctly?

4.3 - Create a cross-tabs table between `model2` and `.pred_class`.

4.4 - Given that a native student wrote the paper, what is the proportion of it being classified as AI written?

4.5 - Given that a non-native student wrote the paper, what is the proportion of it being classified as AI written?

4.6 - Compare the two proportions from 4.4 and 4.5. What conclusions can you draw from the results?

4.7 - In 4.4 and 4.5, what method did you use to calculate the answer: row or column proportions?

4.8 - Based on your answer in 4.7, why would the other method (row or column, the opposite of your answer in 4.7) not work?

## 5.0 - Submit Notebook

Submit your notebook to Canvas.