<h1 style="text-align: center">
<div style="color: #DD3403; font-size: 60%">Data Science DISCOVERY MicroProject</div>
<span style="">MicroProject #12: AI vs Human: Response Analysis</span>
<div style="font-size: 60%;"><a href="https://discovery.cs.illinois.edu/microproject/ai-vs-human-response-analysis/">https://discovery.cs.illinois.edu/microproject/ai-vs-human-response-analysis/</a></div>
</h1>

<hr style="color: #DD3403;">

## Data Source: Hugging Face HC3 Dataset

[Hugging Face](https://huggingface.co/) is a company that has developed a platform for natural language processing (NLP) applications. They have created and shared a large collection of pre-trained models, datasets, and learning resources which are open-source and available for the public to use.

Hello-SimpleAI is a small group of researchers (PhD students and engineers) that released the [HC3 dataset](https://arxiv.org/abs/2301.07597). This dataset pairs human responses (from Reddit, Wikipedia, and other sources) with AI (ChatGPT) responses to the same questions. Details are available on their GitHub repository ([@Hello-SimpleAI/chatgpt-comparison-detection](https://github.com/Hello-SimpleAI/chatgpt-comparison-detection)) or via the [Hugging Face API](https://huggingface.co/datasets/Hello-SimpleAI/HC3/viewer/all/train).

Are responses from AI different from those written by humans? Let's nerd out with this data and find out! 🎉


### Background Knowledge

To finish this MicroProject, we assume you already know:

- All topics covered in *DISCOVERY Module 1: Basics of Data Science with Python* ([review the module here](https://discovery.cs.illinois.edu/learn/))
- Adding new columns into an existing DataFrame ([review creating new columns here](https://discovery.cs.illinois.edu/guides/Modifying-DataFrames/adding-columns-in-dataframes/))
- Working with functions in Python ([review Python functions here](https://discovery.cs.illinois.edu/learn/Simulation-and-Distributions/Functions-in-Python/))
- Understanding (and performing) hypothesis tests ([review hypothesis testing here](https://discovery.cs.illinois.edu/learn/Polling-Confidence-Intervals-and-Hypothesis-Testing/Hypothesis-Testing/))

Let's get started! :)

<hr style="color: #DD3403;">

## Part 1: Loading the Dataset

For this MicroProject, we have provided the first 100 samples of the `Hello-SimpleAI/HC3` dataset, gathered from the following URL: https://datasets-server.huggingface.co/rows?dataset=Hello-SimpleAI%2FHC3&config=all&split=train&offset=0&length=100

This data has been **cleaned up and provided to you** as `huggingface-hello-sampleAI-100.csv`. This dataset has 100 `question`s that were asked and answered on the "Explain Like I'm Five" (ELI5) subreddit, with the `human_answer` from Reddit and a `chatgpt_answer` from ChatGPT.

Load the cleaned-up CSV file as a DataFrame named `df`:

In [1]:
import pandas as pd

df = pd.read_csv("huggingface-hello-sampleAI-100.csv")
df

|  | question | human_answer | chatgpt_answer |
|---|---|---|---|
| 0 | Why is every book I hear about a " NY Times # ... | Basically there are many categories of " Best ... | There are many different best seller lists tha... |
| 1 | If salt is so bad for cars , why do we use it ... | salt is good for not dying in car crashes and ... | Salt is used on roads to help melt ice and sno... |
| 2 | Why do we still have SD TV channels when HD lo... | The way it works is that old TV stations got a... | There are a few reasons why we still have SD (... |
| 3 | Why has nobody assassinated Kim Jong - un He i... | You ca n't just go around assassinating the le... | It is generally not acceptable or ethical to a... |
| 4 | How was airplane technology able to advance so... | Wanting to kill the shit out of Germans drives... | After the Wright Brothers made the first power... |
| ... | ... | ... | ... |
| 95 | How come the Constitution of the US is relevan... | The Constitution is the highest law of the lan... | The Constitution of the United States is the s... |
| 96 | Why does Valve have Greenlight would n't it ma... | Now , There are 2 problems with steam , and bo... | Valve, the company that developed the Steam pl... |
| 97 | why do people so commonly say SO instead of bo... | SO covers boyfriends , girlfriends , husbands ... | People might use the term "SO" instead of "boy... |
| 98 | How do companies that buy & re - sell other pe... | I get that they are " bottom feeders " but to ... | When a company or individual owes money to ano... |


### View Sample Outputs

Run the following cell to view a random question and the human and AI answers to get a feeling for this data (run it again to view a different random question).
- *Note that this text has already been tokenized (tokenization, an early step in almost all text analysis, breaks a sentence into small tokens for further processing), so it may contain extra spaces not normally seen outside of NLP, such as `ca n't` or `they 're`.*

In [2]:
# Run the following cell to view a random question and the Human and AI answers to
# get a feeling for this data (run it again to view a different random question).
sample = df.sample(n=1)
print("== Question ==")
print(f"{sample.question.values[0]}")
print()
print("== Human Answer (reddit response) ==")
print(f"{sample.human_answer.values[0]}")
print()
print("== AI Answer (ChatGPT) ==")
print(f"{sample.chatgpt_answer.values[0]}")


== Question ==
The differences between flash and HTML5 Which one is better for me as a user ? Please explain like I'm five.

== Human Answer (reddit response) ==
They are really quite different . HTML is the language that websites are written in . It describes what elements are in the page and how they 're organized ( elements such as text , images , buttons and links ) . However , HTML originally did n't include the ability to show graphics , animations or videos ( other than pre - rendered images and animated gifs ) . HTML did provide the ability to include embedded objects , which would require the user to install a plugin that could display them . One such common plugin was Flash , which was basically another small program that runs inside your browser ( another such plugin is Java which can display Java Applets , but they 're hardly in use anymore ) . So developers could use Flash to create things like games ( such as the ones you 'll find in URL_0 ) and video players ( such as Yo

In [3]:
### TEST CASE for Part 1: Loading the Dataset
# - This read-only cell contains a "checkpoint" for this section of the MicroProject and verifies you are on the right track.
# - If this cell results in a celebration message, you PASSED all test cases!
# - If this cell results in any errors, check your previous cells, make changes, and RE-RUN your code and then this cell.
tada = "\N{PARTY POPPER}"
df = df.replace(r'\r\n', '\n', regex=True)
assert("df" in vars()), "Make sure your data is stored in `df`."
assert(len(df) == 100), "Your `df` is not the correct length."
assert("How come I have to pay for Foxtel and get ads , when things like youtube are free because they play ads . Should n't Foxtel be you pay and you do nt get ads , or it should be free and you get ads . Please explain like I'm five." in df["question"].values)
assert("It's important to remember that Japan is a country with a rich and diverse culture, and like any other country, it has a range of cultural practices and expressions. While it is true that Japan has a reputation for being a more traditional and conservative culture, it is also home to a vibrant and creative media industry. \nThe Japanese media industry is known for producing a wide variety of content, including video games, TV shows, and music, that often incorporates elements of Japanese culture and history. This can include things like traditional Japanese instruments and themes, as well as more modern and unconventional elements. \nIt's also worth noting that different parts of Japanese culture can have different levels of conservatism. For example, some aspects of Japanese culture, such as traditional family structure or gender roles, may be more conservative, while other parts of Japanese culture, such as the media industry, may be more open to experimentation and innovation. \nOverall, it's important to remember that Japan is a complex and multifaceted culture, and it's not fair to generalize or stereotype all aspects of Japanese culture based on a few specific examples." in df["chatgpt_answer"].values)
print(f"{tada} All Tests Passed! {tada}")

🎉 All Tests Passed! 🎉


<hr style="color: #DD3403;">

## Part 2: Response Length Analysis

This MicroProject attempts to identify features that may distinguish a human response from an AI response. One common critique of current AI systems is that their answers are often excessively long. Our first hypothesis to investigate is that AI responses may be longer than human responses.

To explore this, we will compute the average length of the AI responses and test whether its difference from the average length of the human responses is statistically significant.

### Part 2.1: Finding the Response Length

To perform the analysis of response length, we first need to add two new columns to our DataFrame holding the length of each response for our 100 sample questions. When working with columns that contain `string` data, we can operate on each individual string using the following **generic syntax**:

> ```py
> df["column"].str.STRING_FUNCTION()
> ```

...where `STRING_FUNCTION()` is any of the [string accessor methods](https://pandas.pydata.org/docs/reference/series.html#api-series-str).

Specifically, one string accessor method is:

> `[...].str.len()`: Compute the length of each element in the Series/Index.
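Because `.str.len()` counts every character, including spaces and punctuation, the extra spaces added by tokenization (for example, `ca n't` instead of `can't`) slightly inflate the character counts in this dataset. A minimal sketch on two hand-written strings (not rows from the CSV):

```python
import pandas as pd

# The same word, tokenized (with an extra space) and as normally written
words = pd.Series(["ca n't", "can't"])

# .str.len() computes the number of characters in each string, spaces included
lengths = words.str.len()
print(lengths.tolist())  # [6, 5]
```

Missing values (`NaN`) are passed through as `NaN` rather than raising an error.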

To begin our analysis of the length of each response, add two new columns to the DataFrame:
- `human_answer_len`, containing the length (in characters) of the human response
- `chatgpt_answer_len`, containing the length (in characters) of the ChatGPT response

In [4]:
df["human_answer_len"] = df["human_answer"].str.len()
df["chatgpt_answer_len"] = df["chatgpt_answer"].str.len()

### Part 2.2: Compute the Average Answer Lengths

Now that we have the lengths of both sets of responses, save the average lengths of the human and ChatGPT responses in the variables `human_avg` and `ai_avg`, respectively:

In [5]:
# Average response length for a human response:
human_avg = df["human_answer_len"].mean()
human_avg

np.float64(729.88)

In [6]:
# Average response length for a ChatGPT response:
ai_avg = df["chatgpt_answer_len"].mean()
ai_avg

np.float64(1098.77)

In [7]:
### TEST CASE for Part 2.2: Compute the Average Answer Lengths
tada = "\N{PARTY POPPER}"
assert("human_avg" in vars()), "Your average response length for the human responses has to be stored in the variable `human_avg`."
assert("ai_avg" in vars()), "Your average response length for the ai responses has to be stored in the variable `ai_avg`."
assert('human_answer_len' in df.columns), "The lengths of human responses should be in a column named `human_answer_len` in `df`."
assert('chatgpt_answer_len' in df.columns), "The lengths of ChatGPT responses should be in a column named `chatgpt_answer_len` in `df`."
assert(876 in df["human_answer_len"].values and 930 in df["chatgpt_answer_len"].values), "Make sure you're using the correct string accessor method."
assert((human_avg * 390)/30 == 9488.44), "`human_avg` was calculated incorrectly."
assert(((ai_avg * 444)/30) + 86 == 16347.796), "`ai_avg` was calculated incorrectly."
print(f"{tada} All Tests Passed! {tada}")

🎉 All Tests Passed! 🎉


### Part 2.3: Testing if the Difference in Length is Statistically Significant

To determine if the difference between the means is statistically significant, we're going to conduct a t-test. Let's first state our null and alternative hypotheses.

> **Null hypothesis**: The mean length of human responses is equal to the mean length of ChatGPT responses.
> 
> **Alternative hypothesis**: The mean length of human responses is different from the mean length of ChatGPT responses.

Use the [`scipy.stats.ttest_ind()`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html) function to get the t-stat (as `t_stat`) and p-value (as `p_value`).
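By default, `ttest_ind()` runs Student's two-sample t-test, which pools the two sample variances because it assumes the two populations have equal variances (passing `equal_var=False` gives Welch's t-test instead). As a sketch of what it computes, the cell below reproduces the statistic and p-value by hand on the response lengths from the first five rows of `df` (values copied from the displayed tables; a tiny illustrative sample, not the full dataset):

```python
import numpy as np
from scipy import stats

# Response lengths from the first five rows of df (illustration only)
human = np.array([600, 207, 784, 613, 59], dtype=float)
ai = np.array([1114, 1148, 789, 930, 1235], dtype=float)

n1, n2 = len(human), len(ai)
dof = n1 + n2 - 2  # degrees of freedom for the pooled test

# Pooled sample variance (ddof=1 gives the unbiased sample variance)
sp2 = ((n1 - 1) * human.var(ddof=1) + (n2 - 1) * ai.var(ddof=1)) / dof

# t = (mean1 - mean2) / standard error of the difference in means
t_manual = (human.mean() - ai.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

# Two-sided p-value: chance of a |t| at least this large if the null is true
p_manual = 2 * stats.t.sf(abs(t_manual), dof)

t_scipy, p_scipy = stats.ttest_ind(human, ai)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```

The statistic is negative because the first argument (the human lengths) has the smaller mean. With the full 100-row columns, `dof` would be 198.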

In [8]:
# Compute a t-test:
#    Syntax: t_stat, p_value = scipy.stats.ttest_ind( ... )
from scipy.stats import ttest_ind
t_stat, p_value = ttest_ind(df["human_answer_len"], df["chatgpt_answer_len"])

In [9]:
# Display the t-statistic:
t_stat

np.float64(-3.2043931944794153)

In [10]:
# Display the p-value:
p_value

np.float64(0.0015776855917118066)
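The hypothesis stated at the start of Part 2 is directional (AI responses may be *longer*), whereas `ttest_ind()` is two-sided by default. Recent SciPy versions (1.6 and later) also accept an `alternative` argument; `alternative="less"` tests whether the first sample's mean is smaller than the second's. A sketch on the response lengths from the first five rows of `df` (illustration only):

```python
import numpy as np
from scipy.stats import ttest_ind

# Response lengths from the first five rows of df (illustration only)
human = np.array([600, 207, 784, 613, 59], dtype=float)
ai = np.array([1114, 1148, 789, 930, 1235], dtype=float)

two_sided = ttest_ind(human, ai)                      # H1: the means differ
one_sided = ttest_ind(human, ai, alternative="less")  # H1: human mean < AI mean

# The t statistic is identical; because t is negative (the direction H1 predicts),
# the one-sided p-value is half of the two-sided one.
print(two_sided.statistic, one_sided.statistic)
print(two_sided.pvalue / 2, one_sided.pvalue)
```

The graded cells in this notebook expect the default two-sided test, so keep using it there.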

### Part 2.4: Choosing Your Conclusion

Using the calculated p-value and an alpha level of 0.05, we can determine if there is a significant difference between the two mean values.
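The decision rule is mechanical: reject the null hypothesis when the p-value is below the chosen alpha. A sketch of it as a small function (`decide` is a hypothetical helper, not part of any library), applied to the response-length and polarity p-values reported in this notebook:

```python
def decide(p_value, alpha=0.05):
    """Hypothetical helper: apply the 'reject the null if p < alpha' rule."""
    if p_value < alpha:
        return "there IS a statistically significant difference"
    return "there is NOT a statistically significant difference"

# p-values reported in this notebook
length_result = decide(0.0015776855917118066)  # response length
polarity_result = decide(0.3258256055853501)   # polarity

print(length_result)    # there IS a statistically significant difference
print(polarity_result)  # there is NOT a statistically significant difference
```

Alpha should be fixed before looking at the p-value; choosing it afterwards defeats the purpose of the test.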

For each set of comments below, un-comment **exactly one line** corresponding to the true statement:

In [11]:
# == Un-comment the correct statement for the p-value: ==
#conclusion_p_value = "The p-value is greater than 0.05 (5%)"
conclusion_p_value = "The p-value is less than 0.05 (5%)"

# == Un-comment the correct statement for the significance: ==
conclusion_significance = "there IS a statistically significant difference"
#conclusion_significance = "there is NOT a statistically significant difference"

# == We will print your conclusion ==
print(f"Conclusion: {conclusion_p_value}, so {conclusion_significance} between the means of the response lengths.")

Conclusion: The p-value is less than 0.05 (5%), so there IS a statistically significant difference between the means of the response lengths.


In [12]:
### TEST CASE for Part 2.4: Response Length Analysis
tada = "\N{PARTY POPPER}"
assert("t_stat" in vars()), "Store your t-statistic in the variable `t_stat`. Check the ttest_ind() documentation or the comment in that code box for the correct syntax."
assert("p_value" in vars()), "Store your p-value in the variable `p_value`. Check the ttest_ind() documentation or the comment in that code box for the correct syntax."

import math
assert( math.isclose(t_stat * p_value, -0.005055524973109543) ), "Check your t_stat and p_value."
assert( len(conclusion_p_value) * len(conclusion_significance) == 1598 ), "Your conclusion is incorrect."

print(f"{tada} All Tests Passed! {tada}")

🎉 All Tests Passed! 🎉


<hr style="color: #DD3403;">

## Part 3: Sentiment Analysis

A second common critique of current AI systems is that their answers are often **more subjective** and **more optimistic** than a human's.

Natural Language Processing (NLP) is a field of data science and computer science that includes algorithms for scoring the "polarity" and "subjectivity" of a piece of text. In the next two parts of this MicroProject, we will explore the polarity and subjectivity of human and AI responses.

### Part 3.1: Using TextBlob to Find the Polarity Scores

TextBlob is a Python library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. 

Among its many features, TextBlob can calculate a polarity score for any given piece of text. A "polarity score" ranges from -1 to 1, where:
- A score of -1 is very negative,
- A score of 0 is neutral, and
- A score of 1 is very positive.

For example, the text *"You will fail DISCOVERY if you do not take the final exam"* is a negative statement and has a polarity score of -0.25.
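The sign of the score carries most of its meaning. A minimal sketch that turns a score into a coarse label (`polarity_label` is a hypothetical helper, and treating only an exact 0 as neutral is a simplifying assumption):

```python
def polarity_label(score):
    """Hypothetical helper: map a polarity score in [-1, 1] to a coarse label."""
    if not -1 <= score <= 1:
        raise ValueError(f"polarity must be between -1 and 1, got {score}")
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"  # only an exact 0 counts as neutral here

# Scores taken from examples in this notebook
print(polarity_label(-0.25))                # negative
print(polarity_label(0.0))                  # neutral
print(polarity_label(0.39166666666666666))  # positive
```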


### Part 3.2: Installing TextBlob

To use the TextBlob library, we first need to install it:

1. In your terminal, type the following: `python3 -m pip install textblob`

    - If this command does not work, try the other `pip install` commands listed [here](https://discovery.cs.illinois.edu/guides/System-Setup/Setup-Your-System/#Step-4-Installing-pandas), replacing `pandas` with `textblob`.

2. Once it is installed, import the `TextBlob` library with the following import statement:

In [13]:
from textblob import TextBlob

### Part 3.3: Creating a `polarity` Function

To allow us to easily find the "polarity score" of hundreds of pieces of text using pandas, we need a function that will take a string `s` as input and return the polarity score.

Complete the `polarity` function below by:
- Reading the [TextBlob Tutorial: Quickstart](https://textblob.readthedocs.io/en/dev/quickstart.html), specifically the "Sentiment Analysis" portion of the tutorial.
- Finishing the `polarity` function so that it returns the polarity score of the input string `s`.

In [14]:
def polarity(s):
  testimonial = TextBlob(s)
  return testimonial.sentiment.polarity

Now, let's test your function to make sure it's working:

In [15]:
# From the documentation, we expect a positive (above zero) polarity:
polarity("Textblob is amazingly simple to use. What great fun!")

0.39166666666666666

In [16]:
# A negative statement, we expect a negative (below zero) polarity:
polarity("You will fail DISCOVERY if you do not take the final exam")

-0.25

In [17]:
# A neutral statement, we expect a zero polarity:
polarity("Hello, world!")

0.0

In [18]:
# Feel free to try your own sentence :)
polarity("...")

0.0

### Part 3.4: Finding the Polarity of Human Answers

Once you have defined a custom function, you can use it on every row of data by using:
> ```python
> df["column"].apply( customFunctionName )
> ```

Note that `customFunctionName` is **ONLY the function name** and does NOT contain a **function call** (there are no parentheses after the name). You provide ONLY the **name of the function**, and `apply` will internally **call** the function on each value in the column.
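To see the difference between passing a function and calling it, here is a sketch with a hypothetical `word_count` function standing in for `polarity` (so it runs without TextBlob installed):

```python
import pandas as pd

def word_count(s):
    """Hypothetical example function: number of whitespace-separated tokens."""
    return len(s.split())

answers = pd.Series(["salt is good", "Hello , world !"])

# Pass the function itself (no parentheses); apply() calls it once per value
counts = answers.apply(word_count)
print(counts.tolist())  # [3, 4]

# Writing word_count() instead calls the function immediately with no argument,
# which raises a TypeError before apply() even runs.
try:
    answers.apply(word_count())
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```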

Using the syntax above, create a new column named `human_polarity` that finds the polarity of the `human_answer` data using the `.apply` operation on a column of the DataFrame:

In [19]:
df["human_polarity"] = df["human_answer"].apply(polarity)
df

|  | question | human_answer | chatgpt_answer | human_answer_len | chatgpt_answer_len | human_polarity |
|---|---|---|---|---|---|---|
| 0 | Why is every book I hear about a " NY Times # ... | Basically there are many categories of " Best ... | There are many different best seller lists tha... | 600 | 1114 | 0.696667 |
| 1 | If salt is so bad for cars , why do we use it ... | salt is good for not dying in car crashes and ... | Salt is used on roads to help melt ice and sno... | 207 | 1148 | 0.289286 |
| 2 | Why do we still have SD TV channels when HD lo... | The way it works is that old TV stations got a... | There are a few reasons why we still have SD (... | 784 | 789 | 0.160761 |
| 3 | Why has nobody assassinated Kim Jong - un He i... | You ca n't just go around assassinating the le... | It is generally not acceptable or ethical to a... | 613 | 930 | -0.025000 |
| 4 | How was airplane technology able to advance so... | Wanting to kill the shit out of Germans drives... | After the Wright Brothers made the first power... | 59 | 1235 | -0.200000 |
| ... | ... | ... | ... | ... | ... | ... |
| 95 | How come the Constitution of the US is relevan... | The Constitution is the highest law of the lan... | The Constitution of the United States is the s... | 368 | 1281 | -0.083333 |
| 96 | Why does Valve have Greenlight would n't it ma... | Now , There are 2 problems with steam , and bo... | Valve, the company that developed the Steam pl... | 876 | 1034 | -0.061111 |
| 97 | why do people so commonly say SO instead of bo... | SO covers boyfriends , girlfriends , husbands ... | People might use the term "SO" instead of "boy... | 125 | 846 | -0.250000 |
| 98 | How do companies that buy & re - sell other pe... | I get that they are " bottom feeders " but to ... | When a company or individual owes money to ano... | 202 | 1254 | 0.170000 |


### Part 3.5: Finding the Polarity of ChatGPT Answers

Now, find the polarity of ChatGPT answers and save each polarity in the column `chatgpt_polarity`:

In [20]:
df["chatgpt_polarity"] = df["chatgpt_answer"].apply(polarity)
df

|  | question | human_answer | chatgpt_answer | human_answer_len | chatgpt_answer_len | human_polarity | chatgpt_polarity |
|---|---|---|---|---|---|---|---|
| 0 | Why is every book I hear about a " NY Times # ... | Basically there are many categories of " Best ... | There are many different best seller lists tha... | 600 | 1114 | 0.696667 | 0.352774 |
| 1 | If salt is so bad for cars , why do we use it ... | salt is good for not dying in car crashes and ... | Salt is used on roads to help melt ice and sno... | 207 | 1148 | 0.289286 | 0.113158 |
| 2 | Why do we still have SD TV channels when HD lo... | The way it works is that old TV stations got a... | There are a few reasons why we still have SD (... | 784 | 789 | 0.160761 | 0.196140 |
| 3 | Why has nobody assassinated Kim Jong - un He i... | You ca n't just go around assassinating the le... | It is generally not acceptable or ethical to a... | 613 | 930 | -0.025000 | 0.032037 |
| 4 | How was airplane technology able to advance so... | Wanting to kill the shit out of Germans drives... | After the Wright Brothers made the first power... | 59 | 1235 | -0.200000 | 0.336167 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | How come the Constitution of the US is relevan... | The Constitution is the highest law of the lan... | The Constitution of the United States is the s... | 368 | 1281 | -0.083333 | 0.182286 |
| 96 | Why does Valve have Greenlight would n't it ma... | Now , There are 2 problems with steam , and bo... | Valve, the company that developed the Steam pl... | 876 | 1034 | -0.061111 | 0.000000 |
| 97 | why do people so commonly say SO instead of bo... | SO covers boyfriends , girlfriends , husbands ... | People might use the term "SO" instead of "boy... | 125 | 846 | -0.250000 | 0.080769 |
| 98 | How do companies that buy & re - sell other pe... | I get that they are " bottom feeders " but to ... | When a company or individual owes money to ano... | 202 | 1254 | 0.170000 | 0.112500 |


In [21]:
### TEST CASE for Part 3: Sentiment Analysis - Polarity
tada = "\N{PARTY POPPER}"

assert('human_polarity' in df.columns), "You must add a column named `human_polarity` to your DataFrame."
assert('chatgpt_polarity' in df.columns), "You must add a column named `chatgpt_polarity` to your DataFrame."

import math
assert(math.isclose( df["human_polarity"].mean(), 0.07790771420489018 )), "The calculation of the `human_polarity` is incorrect."
assert(math.isclose( df["chatgpt_polarity"].std(), 0.11014935757814867 )), "The calculation of the `chatgpt_polarity` is incorrect."

print(f"{tada} All Tests Passed! {tada}")

🎉 All Tests Passed! 🎉


### Analysis: Overall Results So Far

Run the following cell to see a summary of the results you found so far:

In [22]:
pd.DataFrame([
  {"Metric": "Response Length (mean):",  "Human": human_avg, "ChatGPT": ai_avg, "p-value": p_value, "significant": ("Yes (p < 0.05)" if p_value < 0.05 else "No") },
  {"Metric": "Polarity (mean):", "Human": df.human_polarity.mean(), "ChatGPT": df.chatgpt_polarity.mean(), "p-value": "(not tested)", "significant": "?" },
])

|  | Metric | Human | ChatGPT | p-value | significant |
|---|---|---|---|---|---|
| 0 | Response Length (mean): | 729.88 | 1098.77 | 0.001578 | Yes (p < 0.05) |
| 1 | Polarity (mean): | 0.077908 | 0.098595 | (not tested) | ? |


<hr style="color: #DD3403;">

## Part 4: Sentiment Analysis - Subjectivity

TextBlob can also calculate a subjectivity score for any given piece of text. A "subjectivity score" ranges from 0 to 1, where:
- A score of 0 is a fully objective (not subjective) statement, and
- A score of 1 is a very subjective statement.

Using the same library and design as Part 3, create a function called `subjectivity` that returns the subjectivity score of a given string `s`:

In [23]:
def subjectivity(s):
  testimonial = TextBlob(s)
  return testimonial.sentiment.subjectivity

Let's try it out:

In [24]:
# A highly subjective statement, expecting a score above 0.5:
subjectivity("The best fruit is a kiwi and everyone else is wrong!")

0.6

In [25]:
# A factual statement, expecting a score near 0:
subjectivity("Winter is colder than summer.")

0.0

In [26]:
# Feel free to try your own :)
subjectivity("...")

0.0

### Part 4.1: Finding the Subjectivity of Human and ChatGPT Answers

Add two new columns to the DataFrame, `human_subjectivity` and `chatgpt_subjectivity`, that store the subjectivity score of each answer for every question.

*You may need to look back at Part 3.4 to review how to use the `.apply()` function.*
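Because the human and ChatGPT columns follow the same naming pattern, the two `.apply()` calls can also be written as a loop. A sketch using a hypothetical helper `add_score_columns`, tested here with Python's built-in `len` as a stand-in scorer on a made-up two-row DataFrame:

```python
import pandas as pd

def add_score_columns(frame, scorer, name):
    """Hypothetical helper: add human_<name> and chatgpt_<name> columns in place."""
    for who in ["human", "chatgpt"]:
        frame[f"{who}_{name}"] = frame[f"{who}_answer"].apply(scorer)
    return frame

# Made-up short answers (not rows from the dataset)
toy = pd.DataFrame({
    "human_answer": ["salt is good", "SO covers boyfriends"],
    "chatgpt_answer": ["Salt is used on roads", "People might use the term"],
})
add_score_columns(toy, len, "len")
print(toy[["human_len", "chatgpt_len"]])
```

With the functions from this notebook, `add_score_columns(df, subjectivity, "subjectivity")` would create the same `human_subjectivity` and `chatgpt_subjectivity` columns as the two `.apply()` calls in the next cell.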

In [27]:
df["human_subjectivity"] = df["human_answer"].apply(subjectivity)
df["chatgpt_subjectivity"] = df["chatgpt_answer"].apply(subjectivity)
df

|  | question | human_answer | chatgpt_answer | human_answer_len | chatgpt_answer_len | human_polarity | chatgpt_polarity | human_subjectivity | chatgpt_subjectivity |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Why is every book I hear about a " NY Times # ... | Basically there are many categories of " Best ... | There are many different best seller lists tha... | 600 | 1114 | 0.696667 | 0.352774 | 0.348333 | 0.490007 |
| 1 | If salt is so bad for cars , why do we use it ... | salt is good for not dying in car crashes and ... | Salt is used on roads to help melt ice and sno... | 207 | 1148 | 0.289286 | 0.113158 | 0.625000 | 0.577193 |
| 2 | Why do we still have SD TV channels when HD lo... | The way it works is that old TV stations got a... | There are a few reasons why we still have SD (... | 784 | 789 | 0.160761 | 0.196140 | 0.387570 | 0.574912 |
| 3 | Why has nobody assassinated Kim Jong - un He i... | You ca n't just go around assassinating the le... | It is generally not acceptable or ethical to a... | 613 | 930 | -0.025000 | 0.032037 | 0.521429 | 0.602222 |
| 4 | How was airplane technology able to advance so... | Wanting to kill the shit out of Germans drives... | After the Wright Brothers made the first power... | 59 | 1235 | -0.200000 | 0.336167 | 0.800000 | 0.484583 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | How come the Constitution of the US is relevan... | The Constitution is the highest law of the lan... | The Constitution of the United States is the s... | 368 | 1281 | -0.083333 | 0.182286 | 0.333333 | 0.648095 |
| 96 | Why does Valve have Greenlight would n't it ma... | Now , There are 2 problems with steam , and bo... | Valve, the company that developed the Steam pl... | 876 | 1034 | -0.061111 | 0.000000 | 0.388194 | 0.346154 |
| 97 | why do people so commonly say SO instead of bo... | SO covers boyfriends , girlfriends , husbands ... | People might use the term "SO" instead of "boy... | 125 | 846 | -0.250000 | 0.080769 | 0.495833 | 0.504487 |
| 98 | How do companies that buy & re - sell other pe... | I get that they are " bottom feeders " but to ... | When a company or individual owes money to ano... | 202 | 1254 | 0.170000 | 0.112500 | 0.700000 | 0.465625 |


In [28]:
### TEST CASE for Part 4: Sentiment Analysis - Subjectivity
tada = "\N{PARTY POPPER}"
import math
assert('human_subjectivity' in df.columns), "You must add a column named `human_subjectivity` to your DataFrame."
assert('chatgpt_subjectivity' in df.columns), "You must add a column named `chatgpt_subjectivity` to your DataFrame."
assert( math.isclose( df["human_subjectivity"].mean(), 0.4507534813188739 ) ), "Your calculation of `human_subjectivity` is incorrect."
assert( math.isclose( df["chatgpt_subjectivity"].std(), 0.10025355258208316 ) ), "Your calculation of `chatgpt_subjectivity` is incorrect."
print(f"{tada} All Tests Passed! {tada}")

🎉 All Tests Passed! 🎉


### Analysis: Overall Results So Far

Run the following cell to see a summary of the results you found so far:

In [29]:
pd.DataFrame([
  {"Metric": "Response Length (mean):",  "Human": human_avg, "ChatGPT": ai_avg, "p-value": p_value, "significant": ("Yes (p < 0.05)" if p_value < 0.05 else "No") },
  {"Metric": "Polarity (mean):", "Human": df.human_polarity.mean(), "ChatGPT": df.chatgpt_polarity.mean(), "p-value": "(not tested)", "significant": "?" },
  {"Metric": "Subjectivity (mean):", "Human": df.human_subjectivity.mean(), "ChatGPT": df.chatgpt_subjectivity.mean(), "p-value": "(not tested)", "significant": "?" },
])

|  | Metric | Human | ChatGPT | p-value | significant |
|---|---|---|---|---|---|
| 0 | Response Length (mean): | 729.88 | 1098.77 | 0.001578 | Yes (p < 0.05) |
| 1 | Polarity (mean): | 0.077908 | 0.098595 | (not tested) | ? |
| 2 | Subjectivity (mean): | 0.450753 | 0.499217 | (not tested) | ? |


<hr style="color: #DD3403;">

## Part 5: Testing if the Difference in Subjectivity is Statistically Significant

To determine whether the difference between the mean subjectivity scores is statistically significant, we're going to conduct another t-test. Let's first state our null and alternative hypotheses.

> **Null hypothesis**: The mean subjectivity score of human responses is equal to the mean subjectivity score of ChatGPT responses.
> 
> **Alternative hypothesis**: The mean subjectivity score of human responses is different from the mean subjectivity score of ChatGPT responses.

### Part 5.1: Performing the t-test

Use the `scipy.stats.ttest_ind()` function again to get the t-statistic and p-value, storing them as `t_stat_subjectivity` and `p_value_subjectivity`:
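Each row pairs a human answer and a ChatGPT answer to the *same* question, while `ttest_ind()` treats the two columns as independent samples. A paired test, `scipy.stats.ttest_rel()`, is one alternative worth knowing: it is equivalent to a one-sample t-test on the per-question differences. A sketch using the first five subjectivity values shown in the tables (illustration only; the graded cells in this notebook expect `ttest_ind()`):

```python
import numpy as np
from scipy.stats import ttest_rel, ttest_1samp

# Subjectivity scores from the first five rows of df (illustration only)
human = np.array([0.348333, 0.625000, 0.387570, 0.521429, 0.800000])
ai = np.array([0.490007, 0.577193, 0.574912, 0.602222, 0.484583])

# Paired t-test: compares the two scores within each question
paired = ttest_rel(human, ai)

# The same test written out: one-sample t-test of the differences against 0
diffs = ttest_1samp(human - ai, 0.0)

print(paired.statistic, diffs.statistic)
print(paired.pvalue, diffs.pvalue)
```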

In [30]:
# Compute a t-test:
#    Syntax: t_stat_subjectivity, p_value_subjectivity = ttest_ind( ... )
t_stat_subjectivity, p_value_subjectivity = ttest_ind(df["human_subjectivity"], df["chatgpt_subjectivity"])

In [31]:
t_stat_subjectivity

np.float64(-2.5152049954631126)

In [32]:
p_value_subjectivity

np.float64(0.012691373254489834)

### Part 5.2: Choosing Your Conclusion

Using the calculated p-value and an alpha level of 0.05, we can determine if there is a significant difference between the two mean values.

For each set of comments below, un-comment **exactly one line** corresponding to the true statement:

In [37]:
# == Un-comment the correct statement for the p-value: ==
#conclusion2_p_value = "The p-value is greater than 0.05 (5%)"
conclusion2_p_value = "The p-value is less than 0.05 (5%)"

# == Un-comment the correct statement for the significance: ==
conclusion2_significance = "there IS a statistically significant difference"
#conclusion2_significance = "there is NOT a statistically significant difference"

# == We will print your conclusion ==
print(f"Conclusion: {conclusion2_p_value}, so {conclusion2_significance} between the means of the subjectivity scores.")

Conclusion: The p-value is less than 0.05 (5%), so there IS a statistically significant difference between the means of the subjectivity scores.


In [38]:
### TEST CASE for Part 5: Testing if the Difference in Subjectivity is Statistically Significant
tada = "\N{PARTY POPPER}"
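assert("t_stat_subjectivity" in vars()), "Store your t-statistic in the variable `t_stat_subjectivity`."
assert("p_value_subjectivity" in vars()), "Store your p-value in the variable `p_value_subjectivity`."

import math
assert( math.isclose(t_stat_subjectivity * p_value_subjectivity, -0.031921405408979774) ), "Check your `t_stat_subjectivity` and `p_value_subjectivity`."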
assert( len(conclusion2_p_value) * len(conclusion2_significance) == 1598 ), "Your conclusion is incorrect."

print(f"{tada} All Tests Passed! {tada}")

🎉 All Tests Passed! 🎉


### Analysis: Results

Run the following cell to see a summary of the results of all of the metrics (we finished the t-test for polarity for you as well):
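The summary table can also be built programmatically. A sketch with a hypothetical `summarize` helper that runs `ttest_ind()` on each `human_<metric>` / `chatgpt_<metric>` column pair; whether the provided polarity p-value was computed this way is an assumption. It is tested here on a five-row frame built from values in the displayed tables:

```python
import pandas as pd
from scipy.stats import ttest_ind

def summarize(frame, metrics, alpha=0.05):
    """Hypothetical helper: compare human_<m> and chatgpt_<m> for each metric m."""
    rows = []
    for m in metrics:
        human, chatgpt = frame[f"human_{m}"], frame[f"chatgpt_{m}"]
        p = ttest_ind(human, chatgpt).pvalue  # two-sided, pooled-variance t-test
        rows.append({
            "Metric": f"{m} (mean)",
            "Human": human.mean(),
            "ChatGPT": chatgpt.mean(),
            "p-value": p,
            "significant": f"Yes (p < {alpha})" if p < alpha else "No",
        })
    return pd.DataFrame(rows)

# First five rows of the length and polarity columns (illustration only)
toy = pd.DataFrame({
    "human_answer_len": [600, 207, 784, 613, 59],
    "chatgpt_answer_len": [1114, 1148, 789, 930, 1235],
    "human_polarity": [0.696667, 0.289286, 0.160761, -0.025000, -0.200000],
    "chatgpt_polarity": [0.352774, 0.113158, 0.196140, 0.032037, 0.336167],
})
summary = summarize(toy, ["answer_len", "polarity"])
print(summary)
```

On the full notebook, `summarize(df, ["answer_len", "polarity", "subjectivity"])` would produce one row per metric.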

In [39]:
pd.DataFrame([
  {"Metric": "Response Length (mean):",  "Human": human_avg, "ChatGPT": ai_avg, "p-value": p_value, "significant": ("Yes (p < 0.05)" if p_value < 0.05 else "No") },
  {"Metric": "Polarity (mean):", "Human": df.human_polarity.mean(), "ChatGPT": df.chatgpt_polarity.mean(), "p-value": 0.3258256055853501, "significant": "No" },
  {"Metric": "Subjectivity (mean):", "Human": df.human_subjectivity.mean(), "ChatGPT": df.chatgpt_subjectivity.mean(), "p-value": p_value_subjectivity, "significant": ("Yes (p < 0.05)" if p_value_subjectivity < 0.05 else "No") },
])

|  | Metric | Human | ChatGPT | p-value | significant |
|---|---|---|---|---|---|
| 0 | Response Length (mean): | 729.88 | 1098.77 | 0.001578 | Yes (p < 0.05) |
| 1 | Polarity (mean): | 0.077908 | 0.098595 | 0.325826 | No |
| 2 | Subjectivity (mean): | 0.450753 | 0.499217 | 0.012691 | Yes (p < 0.05) |


<hr style="color: #DD3403;">

## Submission

You're almost done! All you need to do is commit your lab to GitHub and run the GitHub Actions Grader:

1.  ⚠️ **Make certain to save your work.** ⚠️ To do this, go to **File => Save All**

2.  After you have saved, exit this notebook and return to https://discovery.cs.illinois.edu/microproject/ai-vs-human-response-analysis/ and complete the section **"Commit and Grade Your Notebook"**.

3. If you see a 100% grade result on your GitHub Action, you've completed this MicroProject! 🎉