# Using ChatGPT to generate false statements

The aim is to generate a set of true and false statements for testing fact-checking approaches. The statements should be interesting, and the false ones need to be plausible.

For the true statements we use [TriviaQA](http://nlp.cs.washington.edu/triviaqa/), a dataset of questions and answers intended for reading comprehension and question answering.

To generate the true and false statements, we use ChatGPT to:

1. combine the question and answer into a true statement;
2. generate a fake answer similar to the true answer (we assume the true answer is unique and not equivalent to the fake one);
3. combine the question and the fake answer into a false statement.



In [None]:
import openai
import json
import os
import re

MODEL = "gpt-3.5-turbo"
BASEDIR = os.getcwd()
DATADIR = os.path.join(BASEDIR, '../datasets/triviaqa-unfiltered')
OUTDIR = os.path.join(BASEDIR, '../testData')
BATCHSIZE = 16
DUMPINTERVAL = 8

def processQA(pairs: str):
    """Send one stringified batch of (question, answer) pairs to the model.

    `pairs` holds one pair per line; the raw API response is returned.
    """
    query = [
        {"role": "user",
         "content": f"""This is a multi-step task.
         1. Given the list of pairs of question and answer below, combine them to make a statement. Example:
         
         ("What is the fastest mammal?", "the cheetah")
         Statement: The cheetah is the fastest mammal.
         
         2. Given the answer, generate a fake answer in the same class. Example:

         ("What is the fastest mammal?", "the cheetah")
         Fake answer: the sloth

         3. Given the question and the fake answer, combine them to make a fake statement. Example:

         ("What is the fastest mammal?", "the cheetah")
         Fake answer: the sloth
         Fake statement: The sloth is the fastest mammal.

         4. Generate your response as a json list of dicts with keys: 'question', 'answer', 'statement', 'fake_answer' and 'fake_statement'.
         List of pairs:
         {pairs}
         """
        }]
    # openai.ChatCompletion is the pre-1.0 interface of the openai package
    oaiResp = openai.ChatCompletion.create(
        model=MODEL,
        messages=query,
        temperature=0,  # keep the output as deterministic as possible
    )
    return oaiResp



## TriviaQA

Each record contains a question and its answer, among other fields, all human verified. For instance:
~~~
        {
            "Answer": {
                "Aliases": [
                    "Niger Republic",
                    "Nigerois",
                    "Republic Of Niger",
                    "Republic of Niger",
                    "The Republic of Niger",
                    "Nigerien",
                    "Niger (country)",
                    "République du Niger",
                    "Republique du Niger",
                    "ISO 3166-1:NE",
                    "Niger",
                    "NG-NI"
                ],
                "MatchedWikiEntityName": "Niger",
                "NormalizedAliases": [
                    "republic of niger",
                    "niger republic",
                    "niger",
                    "république du niger",
                    "niger country",
                    "ng ni",
                    "republique du niger",
                    "nigerois",
                    "nigerien",
                    "iso 3166 1 ne"
                ],
                "NormalizedMatchedWikiEntityName": "niger",
                "NormalizedValue": "niger",
                "Type": "WikipediaEntity",
                "Value": "Niger"
            },
            "EntityPages": [
                {
                    "DocSource": "TagMe",
                    "Filename": "Niamey.txt",
                    "LinkProbability": "1.00000",
                    "Rho": "0.67254",
                    "Title": "Niamey"
                }
            ],
            "Question": "Of which African country is Niamey the capital?",
            "QuestionId": "tc_241",
            "QuestionSource": "http://www.triviacountry.com/",
            "SearchResults": [
                {
                    "Description": "Location Niamey is the capital and largest city of the West African ... Niamey is the capital and largest city of the West African country Niger. Niamey lies ...",
                    "DisplayUrl": "fortuneofafrica.com/niger/niamey-city",
                    "Filename": "166/166_2204158.txt",
                    "Rank": 0,
                    "Title": "Niamey City - Fortune of Africa Niger",
                    "Url": "http://fortuneofafrica.com/niger/niamey-city/"
                },
...
                {
                    "Description": "Country: Niger. Diocese of Niamey. Latin Name: Niameyensis; Elevated: 21 March 1961; Immediately Subject to the Holy See Country: Niger. ... Archdiocese of Niamey: 2013:",
                    "DisplayUrl": "www.catholic-hierarchy.org/diocese/dniam.html",
                    "Rank": 45,
                    "Title": "Niamey (Archdiocese) [Catholic-Hierarchy]",
                    "Url": "http://www.catholic-hierarchy.org/diocese/dniam.html"
                }
            ]
        },


~~~

We use `MatchedWikiEntityName` as the answer if it is available and the first alias otherwise, then output a file of questions and answers together with the generated statements.



In [None]:
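# processing the triviaQA data
from typing import Iterator

def batchGen(inFname: str) -> Iterator[list[list[str]]]:
    """Yield batches of up to BATCHSIZE [question, answer] pairs."""
    batch = []
    with open(inFname, 'r', encoding='utf-8') as ix:
        ds = json.load(ix)
    for d in ds['Data']:
        try:
            # prefer the matched Wikipedia entity name, else the first alias
            if 'MatchedWikiEntityName' in d['Answer']:
                ans = d['Answer']['MatchedWikiEntityName']
            else:
                ans = d['Answer']['Aliases'][0]
            # drop parenthesized qualifiers such as "(country)"
            ans = re.sub(r'\(.*\)', '', ans).strip()
            batch.append([d['Question'], ans])
        except Exception as e:
            print(e)
            continue
        if len(batch) >= BATCHSIZE:
            yield batch
            batch = []
    # emit the final, possibly shorter, batch
    if batch:
        yield batch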

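def stringifyBatch(batch: list[list[str]]) -> str:
    """Render a batch as one ("question", "answer") pair per line, as in the prompt examples."""
    sb = ''
    for r in batch:
        # comma separator, matching the pair format shown in the prompt
        sb += f"(\"{r[0]}\", \"{r[1]}\")\n"
    return sb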

def genStmtFile(inFile: str, outFile: str, startBatch: int = 0, batchesToDo: int = -1) -> None:
    """Generate statements for batches from startBatch onward and dump them to outFile.

    Stops after batchesToDo successfully processed batches (no limit if <= 0).
    """
    outData = []
    dbcnt = 0  # number of batches processed successfully
    for bcnt, batch in enumerate(batchGen(inFile)):
        if batchesToDo > 0 and dbcnt >= batchesToDo:
            break
        if bcnt % 10 == 0:
            print(f"batch: {bcnt}")
        if bcnt < startBatch:
            continue
        sbatch = stringifyBatch(batch)
        try:
            oaiObj = processQA(sbatch)
            jresp = oaiObj['choices'][0]['message']['content']
            resp = json.loads(jresp)
            outData.extend(resp)
            dbcnt += 1
            # periodically save partial results
            if dbcnt % DUMPINTERVAL == 0:
                with open(outFile, 'w') as ox:
                    json.dump(outData, ox, indent=4)
        except Exception as e:
            # a failed batch is reported and skipped
            print('Exception ', e)
            print(batch)
    with open(outFile, 'w') as ox:
        json.dump(outData, ox, indent=4)
 
            
genStmtFile(os.path.join(DATADIR, 'unfiltered-web-dev.json'),
            os.path.join(OUTDIR, 'tf_qa-dev.json'),
            0, 100)
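
The `json.loads(jresp)` call in `genStmtFile` assumes that the reply is nothing but a JSON list. If the model wraps the list in extra prose, or omits a key in some record, the whole batch ends up in the exception handler. The following is a minimal sketch of a more tolerant parser, not part of the original pipeline; `parse_records` and `REQUIRED_KEYS` are hypothetical names introduced here, and the bracket search assumes the reply contains a single top-level list.

```python
import json

# Keys requested in step 4 of the prompt.
REQUIRED_KEYS = {"question", "answer", "statement", "fake_answer", "fake_statement"}

def parse_records(reply: str) -> list[dict]:
    """Extract the JSON list from a model reply and keep only complete records."""
    # Assumption: the list spans from the first '[' to the last ']'.
    start = reply.find("[")
    end = reply.rfind("]")
    if start == -1 or end <= start:
        raise ValueError("no JSON list found in reply")
    records = json.loads(reply[start:end + 1])
    # Drop entries that are not dicts or that miss one of the requested keys.
    return [r for r in records
            if isinstance(r, dict) and REQUIRED_KEYS.issubset(r)]
```

In `genStmtFile`, `resp = parse_records(jresp)` could then stand in for `resp = json.loads(jresp)`.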
        

## Results

Generally, the true statements are correct and the false statements seem plausible.
~~~
    {
        "question": "Of which African country is Niamey the capital?",
        "answer": "Niger",
        "statement": "Niamey is the capital of Niger.",
        "fake_answer": "Nigeria",
        "fake_statement": "Niamey is the capital of Nigeria."
    },
~~~

ChatGPT came up with a tricky fake answer in this case, since Nigeria is easily confused with Niger. Another example:

~~~
    {
        "question": "In Lewis Carroll's poem The Hunting of the Snark, what did the elusive, troublesome snark turn into to fool hunters?",
        "answer": "Boojum",
        "statement": "The elusive, troublesome snark turned into Boojum to fool hunters in Lewis Carroll's poem The Hunting of the Snark.",
        "fake_answer": "Jabberwocky",
        "fake_statement": "The elusive, troublesome snark turned into Jabberwocky to fool hunters in Lewis Carroll's poem The Hunting of the Snark."
    },

~~~

There are some issues, however, for instance:

~~~
    {
        "question": "Dave Gilmore and Roger Waters were in which rock group?",
        "answer": "Pink Floyd",
        "statement": "Dave Gilmore and Roger Waters were in the rock group Pink Floyd.",
        "fake_answer": "The Beatles",
        "fake_statement": "The Beatles were in the rock group Pink Floyd."
    },
~~~
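ChatGPT does not seem to be operating on the syntax of the question: the fake answer replaced the subject of the sentence rather than the band name, so the true answer still appears in the fake statement. A similar kind of error: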
~~~
   {
        "question": "Complete this Biblical quotation: 'It is easier for a camel to go through the eye of a needle, than...'",
        "answer": "...for a rich man to enter into the kingdom of God. The words are those of Jesus, from Matthew 19:24",
        "statement": "It is easier for a camel to go through the eye of a needle, than for a rich man to enter into the kingdom of God. The words are those of Jesus, from Matthew 19:24.",
        "fake_answer": "...for a poor man to find true happiness. The words are those of Jesus, from Matthew 19:24",
        "fake_statement": "It is easier for a poor man to find true happiness, than for a rich man to enter into the kingdom of God. The words are those of Jesus, from Matthew 19:24."
    },
~~~
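
In both problem cases above, the true answer survives inside the fake statement. That suggests a cheap sanity filter, sketched below as a heuristic and not as part of the original pipeline; `suspicious` and `_normalize` are hypothetical names. Matching is done on whole words after lowercasing and stripping punctuation, so "Niger" does not match inside "Nigeria". The second check can give false alarms when the model legitimately rephrases the fake answer, so flagged records are best reviewed by hand instead of being dropped automatically.

```python
import re

def _normalize(text: str) -> str:
    """Lowercase and replace runs of non-alphanumeric characters with one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def suspicious(record: dict) -> list[str]:
    """Return the reasons, if any, why a generated record looks malformed."""
    reasons = []
    answer = _normalize(record["answer"])
    fake_answer = _normalize(record["fake_answer"])
    # Pad with spaces so that only whole-word matches count.
    fake_statement = f" {_normalize(record['fake_statement'])} "
    if answer and f" {answer} " in fake_statement:
        reasons.append("true answer appears in fake statement")
    if fake_answer and f" {fake_answer} " not in fake_statement:
        reasons.append("fake answer missing from fake statement")
    return reasons
```

Applied to the examples in this section, the Pink Floyd and camel records are flagged, while the Niger and Snark records pass.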
