In [1]:
__author__ = "Mars Gokturk Buchholz"
__version__ = "CS224u, Stanford, Winter 2023"

In [1]:
import pandas as pd
import numpy as np
pd.set_option('display.max_colwidth', None)
from api_requests.open_ai_client import LiarDataset

[34m[1mwandb[0m: Currently logged in as: [33mmars-gokturk[0m ([33mmars-works[0m). Use [1m`wandb login --relogin`[0m to force relogin


In [2]:
zero_shot_results = pd.read_csv("../output_data/all_test_results.csv", index_col=0)
test_ds = LiarDataset("../input_data/test.tsv", "liar_test").get()
test_ds = test_ds[["statement", "label"]]
test_ds = test_ds.rename(columns={"label": "true_label"})
df = test_ds.merge(zero_shot_results, left_index=True, right_on="idx")
df.head()

Unnamed: 0,statement,true_label,idx,predicted_label,evidence
0,Building a wall on the U.S.-Mexico border will take literally years.,true,0,true,"""according to a report by the government accountability office, it took an average of 4.5 years to complete the construction of the 654 miles of fencing already in place along the border as of 2015."" (source: npr, ""how long would it take to build trump's wall?,"" january 25, 2017)"
1,Wisconsin is on pace to double the number of layoffs this year.,false,1,mostly-true,"according to the wisconsin department of workforce development, there were 10,872 layoffs in the state from january to august 2020, compared to 5,038 layoffs during the same period in 2019. this represents an increase of over 100%, which supports the statement. however, it is important to note that the covid-19 pandemic has had a significant impact on the economy and job market, which is not mentioned in the statement. source: https://www.jsonline.com/story/money/business/2020/09/23/wisconsin-layoffs-2020-could-double-last-years-total/5867588002/"
2,Says John McCain has done nothing to help the vets.,false,2,false,"according to a politifact article from august 2018, john mccain was a strong advocate for veterans and worked on several pieces of legislation to improve their care and benefits. he also served in the military himself and was a prisoner of war in vietnam."
3,Suzanne Bonamici supports a plan that will cut choice for Medicare Advantage seniors.,half-true,3,false,"according to politifact, suzanne bonamici has not supported any plan that would cut choice for medicare advantage seniors. in fact, she has supported legislation to protect and strengthen medicare advantage. (source: https://www.politifact.com/factchecks/2020/oct/08/congressional-leadership-fund/ad-twists-bonamicis-record-medicare-advantage/)"
4,"When asked by a reporter whether hes at the center of a criminal scheme to violate campaign laws, Gov. Scott Walker nodded yes.",pants-fire,4,false,"the statement is not accurate. there is no evidence that gov. scott walker nodded yes when asked by a reporter whether he's at the center of a criminal scheme to violate campaign laws. in fact, the statement seems to be a fabrication as there is no record of such an interaction between gov. walker and a reporter."


In [3]:
df["predicted_label"].value_counts()

mostly-true    580
false          331
pants-fire     133
barely-true    114
true            71
half-true       35
undefined        3
Name: predicted_label, dtype: int64

In [4]:
accuracy_score = np.round(np.sum(df["true_label"] == df["predicted_label"]) / df.shape[0], 3)
accuracy_score

0.266

## Error Analysis

In [5]:
df["predicted_label"].value_counts()

mostly-true    580
false          331
pants-fire     133
barely-true    114
true            71
half-true       35
undefined        3
Name: predicted_label, dtype: int64

In [6]:
df["true_label"].value_counts()

half-true      265
false          249
mostly-true    241
barely-true    212
true           208
pants-fire      92
Name: true_label, dtype: int64

### False Negatives

In [7]:
FN = df[(df["true_label"].isin(["true", "mostly-true"])) & \
              (df["predicted_label"].isin(["pants-fire", "false"]))]
FN.info()         

<class 'pandas.core.frame.DataFrame'>
Int64Index: 91 entries, 6 to 1175
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   statement        91 non-null     object
 1   true_label       91 non-null     object
 2   idx              91 non-null     int64 
 3   predicted_label  91 non-null     object
 4   evidence         91 non-null     object
dtypes: int64(1), object(4)
memory usage: 4.3+ KB


In [8]:
FN.sample(5) #idx=943 in Appendix

Unnamed: 0,statement,true_label,idx,predicted_label,evidence
297,He (Obama) chairs the subcommittee on Europe. ... He's held not one substantive hearing to do oversight.,true,374,false,"according to the official website of the united states senate committee on foreign relations, barack obama was not the chair of the subcommittee on europe during his time in the senate. additionally, a search of the congressional record shows that obama did hold substantive hearings on european issues during his time in the senate."
103,Says American polling shows Russian President Vladimir Putin has an 80 percent approval rating.,true,180,false,"according to a 2021 poll by the independent levada center, putin's approval rating in russia is at 64 percent, not 80 percent. (source: reuters, ""putin's approval rating at 64%: poll,"" june 29, 2021)"
700,"We havent had a Republican senator in Washington for ... why, I think Clifford Case was our last Republican senator.",mostly-true,777,false,"the statement is false as the last republican senator from washington was slade gorton who served from 1981 to 1987. (source: seattle times, ""slade gorton, former washington governor and senator, dies at 92"", april 19, 2020)"
427,"His new running mate, Congressman Ryan, put forward a plan that would let Governor Romney pay less than 1 percent in taxes each year.",mostly-true,504,pants-fire,"this statement is false and makes a ridiculous claim. there is no evidence to support the claim that congressman ryan's plan would allow governor romney to pay less than 1 percent in taxes each year. in fact, both romney and ryan have released their tax returns, which show that they paid a higher percentage of their income in taxes than 1 percent. (source: politifact, ""romney's tax rate not as low as 1 percent,"" august 13, 2012)"
253,"If you look at the number of illegal immigrants coming into the country, it is net zero. Its been that way now for almost two years.",true,330,false,"according to data from the department of homeland security, there were approximately 977,000 apprehensions at the southern border in fiscal year 2019, and 458,000 in fiscal year 2020. this indicates that there are still a significant number of individuals attempting to enter the country illegally. (source: npr, ""fact check: is illegal immigration at a historic low?"", october 23, 2020)"


### False positives

In [9]:
FP = df[(df["true_label"].isin(["pants-fire", "false"])) & \
     (df["predicted_label"].isin(["true", "mostly-true"]))]
FP.info()  

<class 'pandas.core.frame.DataFrame'>
Int64Index: 106 entries, 1 to 1189
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   statement        106 non-null    object
 1   true_label       106 non-null    object
 2   idx              106 non-null    int64 
 3   predicted_label  106 non-null    object
 4   evidence         106 non-null    object
dtypes: int64(1), object(4)
memory usage: 5.0+ KB


In [10]:
FP.head()

Unnamed: 0,statement,true_label,idx,predicted_label,evidence
1,Wisconsin is on pace to double the number of layoffs this year.,false,1,mostly-true,"according to the wisconsin department of workforce development, there were 10,872 layoffs in the state from january to august 2020, compared to 5,038 layoffs during the same period in 2019. this represents an increase of over 100%, which supports the statement. however, it is important to note that the covid-19 pandemic has had a significant impact on the economy and job market, which is not mentioned in the statement. source: https://www.jsonline.com/story/money/business/2020/09/23/wisconsin-layoffs-2020-could-double-last-years-total/5867588002/"
16,Unfortunately we have documented instances where people defecated in the (Statehouse) building.,pants-fire,16,mostly-true,"according to an article by politifact published on january 12, 2021, there were reports of individuals defecating in the capitol building during the january 6th insurrection. however, it is important to note that the statement only refers to the statehouse building, which may or may not be the same as the capitol building."
30,Its been since 1888 that a Senate of a different party than the president in the White House confirmed a Supreme Court nominee.,false,30,mostly-true,"according to a fact-check by politifact, it is true that since 1888, there has only been one instance where a senate of a different party than the president confirmed a supreme court nominee. however, it is important to note that the political landscape was different in the late 19th century and early 20th century, and the current political climate may not necessarily follow historical precedent. (source: https://www.politifact.com/factchecks/2020/09/22/mitch-mcconnell/mcconnell-right-about-history-supreme-court-confir/)"
31,"Under Rosemary Lehmberg, the Travis County D.A.s office convened the grand jury that indicted Rick Perry.",false,31,true,"""under lehmberg, the travis county district attorney's office had convened a grand jury to investigate allegations that perry had abused his power by vetoing funding for the state's public integrity unit, which was housed in the travis county district attorney's office."" - politifact, ""rick perry says he was right to veto funding for da's office,"" august 19, 2014."
54,When undocumented children are picked up at the border and told to appear later in court ... 90 percent do not then show up.,false,54,mostly-true,"according to a report by the department of justice, from 2014 to 2016, approximately 90% of unaccompanied minors who were released from custody and given a notice to appear in court did not show up for their court hearings. (source: npr, ""fact check: how many migrant children get legal representation?"" june 21, 2018)"


### No evidence cases

In [11]:
df[df["evidence"] == "i don't have any evidence."].head() # idx=312

Unnamed: 0,statement,true_label,idx,predicted_label,evidence
12,"Pepper ... kicked off a jock tax, imposing a levy on the sports and entertainment industry.",true,89,false,i don't have any evidence.
135,Loranne Ausley voted six times to tax your savings.,false,212,pants-fire,i don't have any evidence.
200,Barack Obama and Hillary Clinton have changed their positions (on the Iraq war withdrawal) to follow Chris Dodd.,half-true,277,false,i don't have any evidence.
205,Says Ron Johnson gave himself a $10 million sweetheart corporate payout,half-true,282,pants-fire,i don't have any evidence.
235,Foreign Chinese prostitution money is allegedly behind the groups funding Congressman Sean Duffys Republican Majority.,pants-fire,312,pants-fire,i don't have any evidence.


### True Positives

In [12]:
TP = df[(df["true_label"].isin(["true", "mostly_true"])) & \
     (df["predicted_label"].isin(["true", "mostly-true"]))]
TP.info()  

<class 'pandas.core.frame.DataFrame'>
Int64Index: 156 entries, 0 to 1176
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   statement        156 non-null    object
 1   true_label       156 non-null    object
 2   idx              156 non-null    int64 
 3   predicted_label  156 non-null    object
 4   evidence         156 non-null    object
dtypes: int64(1), object(4)
memory usage: 7.3+ KB


In [13]:
TP.head()

Unnamed: 0,statement,true_label,idx,predicted_label,evidence
0,Building a wall on the U.S.-Mexico border will take literally years.,True,0,true,"""according to a report by the government accountability office, it took an average of 4.5 years to complete the construction of the 654 miles of fencing already in place along the border as of 2015."" (source: npr, ""how long would it take to build trump's wall?,"" january 25, 2017)"
5,Over the past five years the federal government has paid out $601 million in retirement and disability benefits to deceased former federal employees.,True,5,mostly-true,"according to a report by the office of personnel management, from 2014 to 2019, the federal government paid out $601 million in retirement and disability benefits to deceased former federal employees. however, it is important to note that the report also states that the majority of these payments were made in error and were later recovered by the government. (source: npr, ""federal government paid $601 million in benefits to dead people,"" july 6, 2020)"
15,Says the unemployment rate for college graduates is 4.4 percent and over 10 percent for noncollege-educated.,True,15,mostly-true,"according to the bureau of labor statistics, as of august 2021, the unemployment rate for individuals with a bachelor's degree or higher is 3.2 percent, while the unemployment rate for those with less than a high school diploma is 9.0 percent and for those with a high school diploma but no college is 6.0 percent. however, the statement is mostly true as it is close to the actual figures and the difference between the two rates is significant. source: https://www.bls.gov/web/empsit/cpsee_e16.htm"
18,"Each year, 18,000 people die in America because they don't have health care.",True,18,mostly-true,"according to a study published in the american journal of public health in 2009, an estimated 44,789 americans die each year due to lack of health insurance. however, this number has been disputed and some experts argue that it may be an overestimate."
25,"Now, there was a time when someone like Scalia and Ginsburg got 95-plus votes.",True,25,mostly-true,"according to a fact-check by politifact, supreme court justices antonin scalia and ruth bader ginsburg were both confirmed with overwhelming bipartisan support in the senate. scalia was confirmed with a vote of 98-0 in 1986, and ginsburg was confirmed with a vote of 96-3 in 1993. however, it should be noted that the statement in question does not specify which confirmation process or which specific vote it is referring to."
