# WebSearch + inferenceModel for factchecking

This is a fairly straightforward way of checking facts. The steps are:
1. do a web search
2. scrape the web sites
3. extract relevant chunks of text
4. determine whether the text supports or denies the fact
5. combine the results from all the text chunks

Some examples of facts to check:
- Mount Kilimanjaro is the highest mountain in Africa
- Route 66 was revisited in a classic 60s album by Bob Dylan.
- Albert Einstein was the last inmate of Spandau jail in Berlin

The URLs and texts used to reach the result are retained, which gives this approach a high degree of explainability.
  

In [3]:

import os
import requests
import time
import json
import pandas as pd
import random
import re
from trafilatura import fetch_url, extract, html2txt
from trafilatura.settings import use_config

BINGKEY = os.environ['BING_SEARCH_V7_SUBSCRIPTION_KEY']
BINGURL = os.environ['BING_SEARCH_V7_ENDPOINT']

BASEDIR = os.getcwd()
TESTDIR = os.path.join(BASEDIR, 'testData')
RESULTDIR = os.path.join(BASEDIR, 'testResults/webDeberta')

RETRIES = 3
MAXSLEEP = 5
MINSLEEP = 1
MINTXTLEN = 64
MINURLTXTLEN = 100
NUMURLS = 20
NUMURLMULT = 1.5
WITHTABLES = False

UrlTexts = dict['url': str, 'texts': list[str]]

trafConfig = use_config()
trafConfig.set('DEFAULT', 'EXTRACTION_TIMEOUT', '0')


## Doing the web search

I use the [Bing API](https://learn.microsoft.com/en-us/rest/api/cognitiveservices-bingsearch/bing-web-api-v7-reference) to search for URLs relevant to the fact we want to verify. I only use the URLs and none of the text summaries.
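
For reference, the only part of the search response used here is the `webPages.value` list, where each entry carries a `url` field. The sketch below pulls the URLs out of an already-decoded response. The helper name `extractUrls` and the sample payload are made up for illustration, and the defensive `.get` calls are an assumption about how one might handle a response without web results.

```python
def extractUrls(payload: dict) -> list[str]:
    """Return result URLs from a decoded Bing Web Search v7 response.

    Hypothetical helper: falls back to an empty list when the response
    has no 'webPages' section (for example, when there are no hits).
    """
    pages = payload.get('webPages', {}).get('value', [])
    return [p['url'] for p in pages if 'url' in p]


# Made-up payload with the same shape as the relevant part of a response
sample = {
    'webPages': {
        'value': [
            {'name': 'Kilimanjaro', 'url': 'https://example.org/kilimanjaro',
             'snippet': 'ignored'},
            {'name': 'Africa', 'url': 'https://example.org/africa',
             'snippet': 'ignored'},
        ]
    }
}
print(extractUrls(sample))
print(extractUrls({}))
```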


In [71]:

def getUrls(text: str, num: int) -> list[str]:
    params = {'q': text, 'mkt': 'en-US', 'answerCount': 1, 'count': num, 
              'responseFilter': 'Webpages'}
    # build the URL by string concatenation; os.path.join would insert '\' on Windows
    endpoint = BINGURL.rstrip('/') + '/v7.0/search'
    headers = {'Ocp-Apim-Subscription-Key': BINGKEY}
    urls = []
    response = None
    for t in range(RETRIES):
        try:
            response = requests.get(endpoint, headers=headers, params=params)
            response.raise_for_status()
            break
        except Exception as e:
            print('Exception in web search\n', e)
            response = None  # do not try to parse a failed response
            time.sleep(MINSLEEP + random.random()*(MAXSLEEP-MINSLEEP))
    if response is not None:
        # 'webPages' is missing when the search returns no web results
        for u in response.json().get('webPages', {}).get('value', []):
            urls.append(u['url'])
    return urls

## Scraping the text and keeping the relevant parts

I use [Trafilatura](https://github.com/adbar/trafilatura) to download and scrape each page. I split the text into paragraphs to (1) limit the input to the inference model and (2) determine the truth of the fact more precisely. Not all paragraphs are likely to be relevant to the fact, so I embed the paragraphs using a [sentence-transformer](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2), compute their cosine similarity to the fact, and keep only the paragraphs whose similarity exceeds a threshold.
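
To make the filtering step concrete, here is a small pure-Python sketch in which made-up 3-dimensional vectors stand in for the 384-dimensional MiniLM embeddings. The names `cosineSimilarity` and `keepRelevant` are illustrative only; the 0.5 threshold mirrors `MINSIMILARITY` in the notebook code.

```python
import math

MINSIMILARITY = 0.5  # same threshold as in the notebook code


def cosineSimilarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    normA = math.sqrt(sum(x * x for x in a))
    normB = math.sqrt(sum(y * y for y in b))
    return dot / (normA * normB)


def keepRelevant(stmtEmb: list[float],
                 paragraphs: list[str],
                 paraEmbs: list[list[float]]) -> list[str]:
    """Keep the paragraphs whose embedding is close enough to the statement."""
    return [p for p, e in zip(paragraphs, paraEmbs)
            if cosineSimilarity(stmtEmb, e) > MINSIMILARITY]


# Toy data: the vectors are invented, not real embeddings
stmt = [1.0, 0.0, 0.0]
paras = ['on topic', 'off topic', 'partly related']
embs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]
print(keepRelevant(stmt, paras, embs))
```

Because cosine similarity ignores vector length, the result does not depend on how the embeddings are normalised beforehand.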


In [75]:
# get texts and filter out irrelevant ones
from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F
import itertools

MINPARLEN = 64
MINSIMILARITY = 0.5

miniLM_tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
miniLM_model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')


def getText(url: str, withTables: bool = True) -> str:
    page = fetch_url(url)
    webText = extract(page, include_comments=False, include_links=False,
                      include_images=False, include_tables=withTables,
                      config=trafConfig, favor_recall=True)
    if webText is None: webText = ''
    if len(webText) < MINURLTXTLEN:
        webText = extract(page, include_comments=True, include_links=True,
                      include_images=False, include_tables=withTables,
                      config=trafConfig, favor_recall=True)
        if webText is None: webText = ''
    return webText

def getSentenceEmbeddings(sentences: list[str]) -> torch.Tensor:
    encoded = miniLM_tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
    with torch.no_grad():
        modelOutput = miniLM_model(**encoded)
    sentenceEmbeddings = meanPooling(modelOutput, encoded['attention_mask'])
    # L2-normalise so every embedding has unit length
    sentenceEmbeddings = F.normalize(sentenceEmbeddings, p=2, dim=1)
    return sentenceEmbeddings

def meanPooling(embeddings: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    tokenEmbeddings = embeddings[0] #First element of model_output contains all token embeddings
    inputMaskExpanded = mask.unsqueeze(-1).expand(tokenEmbeddings.size()).float()
    return torch.sum(tokenEmbeddings * inputMaskExpanded, 1) / torch.clamp(inputMaskExpanded.sum(1), 
                                                                           min=1e-9)

def filterTextStmt(stmtEmb: torch.Tensor,    # statement embedding
                    paragraphs: list[str], # paragraph texts
                    paraEmb: torch.Tensor  # embeddings of the paragraphs
                   ) -> list[str] :        # paragraphs that are similar to the statement
    cosineSims = F.cosine_similarity(stmtEmb, paraEmb)
    #for z in zip(cosineSims.numpy(), paragraphs): print(z)
    fList = (cosineSims > MINSIMILARITY).numpy().tolist()
    relevantParas = list(itertools.compress(paragraphs, fList))
    return relevantParas


def getRelevantParagraphs(stmtEmb: torch.Tensor,   # statement embedding
                          paragraphs: list[str]    # paragraphs from a url
                         ) -> list[str]:           # paragraphs relevant to the statement
    notTooShortParas = [x for x in paragraphs if len(x) > MINPARLEN]
    if len(notTooShortParas) == 0: 
        return []
    pset = set()
    deDuped = []
    dcnt = 0
    for p in notTooShortParas:
        if (ps := p.strip()) not in pset:
            pset.add(ps)
            deDuped.append(ps)
        else:
            dcnt += 1
    paragraphEmbeddings = getSentenceEmbeddings(deDuped)
    relevantParagraphs = filterTextStmt(stmtEmb, deDuped, paragraphEmbeddings)
    return relevantParagraphs

def getRelevantTexts(stmtEmbedding, url):
    webText = getText(url)
    paras = webText.split('\n')
    relevantParas = getRelevantParagraphs(stmtEmbedding, paras)
    return relevantParas


## Finding whether the text supports or denies the statement

Now that I have the bits of text that are relevant to the fact, I run a [DeBERTa NLI cross-encoder](https://huggingface.co/cross-encoder/nli-deberta-base) to determine whether each text supports, contradicts or is neutral to the fact.
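
As an illustration of how the code in this section turns one prediction into a label and a score, the sketch below takes made-up logits for a single text/fact pair, picks the arg-max label, and scores it by the largest component of the L2-normalised logit vector. The logits are invented and `labelAndScore` is a hypothetical name. Note that this score is not a probability; a softmax would be the more common choice.

```python
import math

labelMapping = ['contradict', 'entail', 'neutral']  # same order as in the notebook
MINMAXSCORE = 0.6


def labelAndScore(logits: list[float]) -> tuple[str, float]:
    """Return the arg-max label and the L2-normalised value of its logit."""
    norm = math.sqrt(sum(x * x for x in logits))
    best = max(range(len(logits)), key=lambda i: logits[i])
    return labelMapping[best], logits[best] / norm


# Invented logits for one (paragraph, statement) pair
label, score = labelAndScore([-2.0, 4.0, 1.0])
print(label, round(score, 3))
print('kept' if score >= MINMAXSCORE else 'dropped')
```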

In [36]:
from transformers import AutoModelForSequenceClassification
TextScore = dict['score': float, 'text': str]

inferenceModel = AutoModelForSequenceClassification.from_pretrained('cross-encoder/nli-deberta-base')
inferenceModel.eval()
inferenceTokenizer = AutoTokenizer.from_pretrained('cross-encoder/nli-deberta-base')
labelMapping = ['contradict', 'entail', 'neutral']

MINMAXSCORE = 0.6

def findSupport(stmt: str,
                paragraphs: list[str]
               ) -> dict['entail': list[TextScore],
                      'contradict': list[TextScore],
                      'neutral': list[TextScore]]:
    if len(paragraphs) == 0: return {'entail': [], 'contradict': [], 'neutral': []}
    conclusions = [stmt] * len(paragraphs)
    results = hasConseq(paragraphs, conclusions)
    return results
                 
def hasConseq(premises: list[str],
              conclusions: list[str]
             ) -> dict['entail': list[TextScore],
                      'contradict': list[TextScore],
                      'neutral': list[TextScore]]:
    results = {'entail': [], 'contradict': [], 'neutral': []}
    # assume there are not too many relevant paragraphs so we do not split
    modelInput = inferenceTokenizer(premises, conclusions, padding=True,
                                    truncation=True, return_tensors='pt')
    with torch.no_grad():
        scores = inferenceModel(**modelInput).logits
        labels = scores.argmax(dim=1)
        maxscores = F.normalize(scores, p=2.0, dim=1).max(dim=1).values.numpy()
    for t in zip(premises, labels, maxscores):
        if t[2] >= MINMAXSCORE:
            # cast to built-in types so the results can be serialised with json
            results[labelMapping[int(t[1])]].append({'text': t[0], 'score': float(t[2])})
    return results

## Combining all inference results

I now have the list of texts, whether they support or deny the fact, and the degree to which they do so. These need to be combined to generate a support/deny/unknown label and a score.

Not all URLs returned by the search engine are equal. I assume the ones closer to the top are more relevant to the fact, and model this with a power law. I treat all paragraphs in a text equally and add up their scores to get the result.
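
The rank weighting can be sketched in isolation as follows. This is a simplified illustration, not a reproduction of `collateResults`: it assumes each URL has already been reduced to a single net score (positive for support, negative for contradiction), and the names `rankWeights` and `weightedNetSupport` are hypothetical.

```python
POWER = -1.0  # same exponent as in the notebook code


def rankWeights(n: int, power: float = POWER) -> list[float]:
    """Power-law weight for ranks 1..n: weight(k) = k ** power."""
    return [(k + 1) ** power for k in range(n)]


def weightedNetSupport(perUrlNet: list[float], power: float = POWER) -> float:
    """Weighted average of per-URL net scores, top-ranked URLs counting most."""
    weights = rankWeights(len(perUrlNet), power)
    return sum(w * s for w, s in zip(weights, perUrlNet)) / sum(weights)


# With POWER = -1 the weights fall off as 1, 1/2, 1/3, 1/4
print(rankWeights(4))
# One supporting URL at rank 1 outweighs two contradicting URLs below it
print(weightedNetSupport([0.8, -0.7, -0.7]))
```

In the second call the result stays positive even though two of the three URLs contradict the fact, which shows how strongly an exponent of -1 favours the first hit.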


In [66]:
import math
MINNETSUPPORT = 0.3
POWER = -1.0

def collateResults(allRes: list[dict['url': str, 
                                     'support': dict['entail': list[TextScore],
                                                     'contradict': list[TextScore],
                                                     'neutral': list[TextScore]]]]
                  ) -> dict['status': str, 
                            'score': float, 
                            'entail': list[dict['url': str, 'texts': list[TextScore]]],
                            'contradict': list[dict['url': str, 'texts': list[TextScore]]]]:
    netSupport = 0.0
    supportCount = 0
    collRes = {'entail': [], 'contradict': []}
    ucnt = 0; mSum = 0
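    for i, us in enumerate(allRes):
        mult = 1.0  # no rank weighting unless POWER is negative
        if POWER < 0:
            # power-law weight based on how many URLs have contributed so far
            mult = math.pow((ucnt+1), POWER)
            # NOTE: scores are scaled in place, so calling collateResults again
            # on the same evidence applies the weight again
            for inf in ['entail', 'contradict']:
                for ts in us['support'][inf]:
                    ts['score'] = ts['score'] * mult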
        thisSupport, thisCount = 0, 0
        if len(us['support']['entail']) > 0:
            thisSupport += sum([x['score'] for x in us['support']['entail']])
            collRes['entail'].append({'url': us['url'], 'texts': us['support']['entail']})
            supportCount += mult * len(us['support']['entail']) 
        if len(us['support']['contradict']) > 0:
            thisSupport -= sum([x['score'] for x in us['support']['contradict']])
            collRes['contradict'].append({'url': us['url'], 'texts': us['support']['contradict']})
            supportCount += mult * len(us['support']['contradict'])
        if (thisCount := len(us['support']['entail']) + len(us['support']['contradict'])) > 0: 
            ucnt += 1
            netSupport += mult * (thisSupport/thisCount)
            mSum += mult
    if mSum > 0: 
        netSupport /= mSum
    if supportCount == 0:
        collRes['status'] = 'unknown'
        collRes['score'] = 0
        return collRes
    if netSupport > MINNETSUPPORT:
        collRes['status'] = True
        collRes['score'] = netSupport
    elif netSupport < - MINNETSUPPORT:
        collRes['status'] = False
        collRes['score'] = -netSupport
    else:
        collRes['status'] = 'unknown'
        collRes['score'] = 0
    return collRes 

## Assembling the parts

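This simply assembles the parts above. URLs are processed in rank order until at least `minNum` of them have yielded evidence and the status is no longer unknown, or until the URLs run out.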

In [70]:

def wwwFactCheck(stmt: str, num: int=10, minNum: int=3):
    urls = getUrls(stmt, num)
    stmtEmbedding = getSentenceEmbeddings([stmt])
    i, procUrl, status = 0, 0, 'unknown'
    allEvidence = []
    fcResults = None
    while i < len(urls) and (procUrl < minNum or status == 'unknown'):
        url = urls[i]
        relevantTexts = getRelevantTexts(stmtEmbedding, url)
        support = findSupport(stmt, relevantTexts)
        allEvidence.append({'url': url, 'support': support})
        if len(support['entail']) > 0 or len(support['contradict']) > 0:
            procUrl += 1
        if len(allEvidence) >= minNum:
            fcResults = collateResults(allEvidence)
            status = fcResults['status']
        i += 1
    if fcResults is None:
        # fewer than minNum URLs were processed, so nothing has been collated yet
        fcResults = collateResults(allEvidence)
    fcResults['numUrls'] = procUrl
    return fcResults


## Running on a test file

This is run on files generated by gen_falseStmts. The result is a CSV file with columns: question, true answer, computed answer and whether the computed answer matches the true answer. In addition, a log file is generated with details of the texts and URLs used to obtain the result.
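
One possible way to summarise such a result file is sketched below. It uses the column names written by the code that follows (`question`, `trueAnswer`, `webAnswer`, `correct?`) and assumes booleans appear as the strings `True` and `False` in the CSV. The helper name `summarise` and the three sample rows are made up for illustration and are not actual results.

```python
import csv
import io
from collections import Counter


def summarise(csvText: str) -> dict:
    """Count undecided rows and compute accuracy over the decided ones."""
    rows = list(csv.DictReader(io.StringIO(csvText)))
    counts = Counter(r['webAnswer'] for r in rows)
    decided = [r for r in rows if r['webAnswer'] != 'unknown']
    correct = sum(r['correct?'] == 'True' for r in decided)
    return {
        'total': len(rows),
        'unknown': counts['unknown'],
        'accuracyOnDecided': correct / len(decided) if decided else None,
    }


# Invented rows in the same layout as the output file
sample = """question,trueAnswer,webAnswer,correct?
stmt A,True,True,True
stmt B,False,True,False
stmt C,True,unknown,False
"""
print(summarise(sample))
```

Reporting the unknown count separately keeps abstentions from being mixed up with wrong answers.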

In [None]:

def testFactCheck(inFile: str, selectFile: str, outFile: str, logFile: str = None,
                  start: int = 0, number: int = -1):
    """
    inFile: json list of dicts containing 'question', 'statement' and 'fake_statement'
    selectFile: json list of true/false as to which version to verify
    outFile: csv of question, selection, verification, match?
    """
    print('Start factcheck')
    results = []
    dc = 0
    # raw strings avoid invalid-escape warnings for '\.'
    ret = re.compile(r'.*3\..*True.*', re.DOTALL)
    ref = re.compile(r'.*3\..*False.*', re.DOTALL)
    with open(inFile, 'r') as ix:
        testData = json.load(ix)
    with open(selectFile, 'r') as sx:
        selData = json.load(sx)
    if logFile is not None:
        log = open(logFile, 'w')
    for i, inst in enumerate(testData):
        if i < start: continue
        if number > 0 and dc >= number: break
        try:
            stmt = inst['statement'] if selData[i] else inst['fake_statement']
            print(stmt)
            sleepTime = MINSLEEP + random.random() * (MAXSLEEP - MINSLEEP)
            time.sleep(sleepTime)
            fcResult = wwwFactCheck(stmt)
            fc = fcResult['status']
            dc += 1
            results.append([stmt, selData[i], fc, selData[i]==fc])
            if logFile is not None:
                log.write(f"{i}: {stmt}\n{json.dumps(fcResult, indent=2)}\n\n")
            print(dc, ': ', str(results[-1]))
            if dc % 25 == 0:
                df = pd.DataFrame(results, columns=['question', 'trueAnswer', 
                                                    'webAnswer', 'correct?'])
                df.to_csv(outFile, index=False, header=True)
        except Exception as e:
            print('Error in ', i, ': ', inst['question'], '\n', e)
            if logFile is not None:
                log.write(f"Error in {i}. {inst['question']}\n{str(e)}")
    if logFile is not None: log.close()
    df = pd.DataFrame(results, columns=['question', 'trueAnswer', 'webAnswer', 'correct?'])
    df.to_csv(outFile, index=False, header=True)

testFactCheck(os.path.join(TESTDIR, 'tf_qa-dev.json'),
              os.path.join(TESTDIR, 'tf_qa-dev_select1.json'),
              os.path.join(RESULTDIR, 'triviaQA_dev.csv'),
              os.path.join(RESULTDIR, 'triviaQA_dev.log'),
              0,
              -1,
             )


## Results



### Examples

#### Example 1
- True fact: Nigel Hawthorne was Oscar nominated for The Madness of George.
- Computed answer: True

Details:
~~~
{
  "entail": [
    {
      "url": "https://en.wikipedia.org/wiki/The_Madness_of_King_George",
      "texts": [
        {
          "text": "The Madness of King George won the BAFTA Awards in 1995 for Outstanding British Film and Best Actor in a Leading Role for Nigel Hawthorne, who was also nominated for the Academy Award for Best Actor. The film won the Oscar for Best Art Direction and was also nominated for Oscars for Best Supporting Actress for Mirren and Best Adapted Screenplay. Helen Mirren also won the Cannes Film Festival Award for Best Actress and Hytner was nominated for the Palme d'Or.",
          "score": 0.6066892147064209
        }
      ]
    },
    {
      "url": "https://www.latimes.com/archives/la-xpm-2001-dec-27-me-18351-story.html",
      "texts": [
        {
          "text": "Sir Nigel Hawthorne, an award-winning stage actor who received a best actor Oscar nomination for his vivid portrayal of the title role in the 1994 film \u201cThe Madness of King George,\u201d has died. He was 72.",
          "score": 0.044708751142024994
        },
        {
          "text": "Hawthorne obituary--A reference to the obituary of actor Nigel Hawthorne on the front of Thursday\u2019s California section noted incorrectly that he had won an Academy Award for \u201cThe Madness of King George.\u201d In fact, Hawthorne was nominated but did not win in the best actor category.",
          "score": 0.04643946513533592
        }
      ]
    },
    {
      "url": "https://www.imdb.com/title/tt0115177/",
      "texts": [
        {
          "text": "Oscar\u00ae nominee Nigel Hawthorne (The Madness of King George) stars as a renowned surgeon in the midst of a personal and professional crisis in this compelling miniseries.Oscar\u00ae nominee Nigel Hawthorne (The Madness of King George) stars as a renowned surgeon in the midst of a personal and professional crisis in this compelling miniseries.Oscar\u00ae nominee Nigel Hawthorne (The Madness of King George) stars as a renowned surgeon in the midst of a personal and professional crisis in this compelling miniseries.",
          "score": 0.17288510501384735
        }
      ]
    }
  ],
  "contradict": [
    {
      "url": "https://www.latimes.com/archives/la-xpm-2001-dec-27-me-18351-story.html",
      "texts": [
        {
          "text": "He followed that triumph by sweeping his country\u2019s major theatrical awards for his starring role in \u201cThe Madness of George III,\u201d Alan Bennett\u2019s prize-winning play about the mentally tormented 18th-century British monarch.",
          "score": 0.04210841655731201
        }
      ]
    },
    {
      "url": "https://www.theguardian.com/film/2010/feb/18/madness-king-george-alan-bennett-nigel-hawthorne",
      "texts": [
        {
          "text": "A triumph. Shockingly, Nigel Hawthorne lost the Oscar to Tom Hanks for Forrest Gump. Since that makes even less sense than mistaking an oak tree for the King of Prussia, perhaps it was a final act of revenge by what the film calls those \"ramshackle colonists in America\" on their last and unlamented king.",
          "score": 0.08574755986531575
        }
      ]
    }
  ],
  "status": true,
  "score": 0.30216061005989714,
  "numUrls": 4
}

~~~
- The outcome is dominated by the Wikipedia entry, whose score (0.61) is far higher than that of any other text.


#### Example 2

- True fact: Fiddler on the Roof is the stage show from which 'If I Were A Rich Man' was a big hit.
- Computed answer: False

Details of the result:
~~~
{
  "entail": [],
  "contradict": [
    {
      "url": "https://www.mentalfloss.com/article/65530/12-things-you-might-not-know-about-fiddler-roof",
      "texts": [
        {
          "text": "1. Fiddler on the Roof Was Based on a Series of Stories Written by \"The Jewish Mark Twain.\"",
          "score": 0.6259282827377319
        }
      ]
    }
  ],
  "status": false,
  "score": 0.6259282827377319,
  "numUrls": 1
}
~~~

In this case, only one URL yielded a scored text, and that text (a list heading about the stories the show is based on) says nothing about the song, so the contradiction is spurious.

#### Example 3

- True statement: Laius, King of Thebes, and his queen, Jocasta were the names of Oedipus's parents.
- Computed answer: Unknown
- Details:
~~~
59: Laius, King of Thebes, and his queen, Jocasta were the names of Oedipus's parents.
{
  "entail": [
    {
      "url": "https://en.wikipedia.org/wiki/Jocasta",
      "texts": [
        {
          "text": "In Greek mythology, Jocasta (/d\u0292o\u028a\u02c8k\u00e6st\u0259/), also rendered Iocaste[1] (Ancient Greek: \u1f38\u03bf\u03ba\u03ac\u03c3\u03c4\u03b7 Iok\u00e1st\u0113 [i.ok\u00e1st\u025b\u02d0]) and also known as Epicaste (/\u02cc\u025bp\u026a\u02c8k\u00e6sti\u02d0/; \u1f18\u03c0\u03b9\u03ba\u03ac\u03c3\u03c4\u03b7 Epik\u00e1st\u0113[2]), was a daughter of Menoeceus, a descendant of the Spartoi Echion,[3] and queen consort of Thebes. She was the wife of first Laius, then of their son Oedipus, and both mother and grandmother of Antigone, Eteocles, Polynices and Ismene. She was also sister of Creon and mother-in-law of Haimon.",
          "score": 0.7372725009918213
        }
      ]
    },
    {
      "url": "https://www.thecollector.com/oedipus-rex-summary-story-breakdown/",
      "texts": [
        {
          "text": "For Oedipus Rex, his string of Fate had some terrors woven into it. When he was born, his parents were told a prophecy that their son would grow up to kill his father, Laius. Laius and his wife Jocasta were the King and Queen of Thebes. Horrified at this prophecy of patricide, the parents decided to abandon the baby.",
          "score": 0.0003306630216998817
        }
      ]
    },
    {
      "url": "https://en.wikipedia.org/wiki/Oedipus",
      "texts": [
        {
          "text": "Oedipus was the son of Laius and Jocasta, king and queen of Thebes. Having been childless for some time, Laius consulted the Oracle of Apollo at Delphi. The Oracle prophesied that any son born to Laius would kill him. In an attempt to prevent this prophecy's fulfillment, when Jocasta indeed bore a son, Laius had his son's ankles pierced and tethered together so that he could not crawl; Jocasta then gave the boy to a servant to abandon (\"expose\") on the nearby mountain. However, rather than leave the child to die of exposure, as Laius intended, the servant passed the baby on to a shepherd from Corinth, who then gave the child to another shepherd.",
          "score": 0.0005718348202881988
        }
      ]
    },
    {
      "url": "https://www.encyclopedia.com/literature-and-arts/classical-literature-mythology-and-folklore/folklore-and-mythology/oedipus",
      "texts": [
        {
          "text": "Oedipus was born to King Laius (pronounced LAY-uhs) and Queen Jocasta (pronounced joh-KAS-tuh) of Thebes (pronounced THEEBZ). The oracle at Delphi (pronounced DEL-fye), who could communicate direcdy with the gods, told Laius and Jocasta that their child would grow up to murder Laius and marry Jocasta. Horrified, the king fastened the infant's feet together with a large pin and left him on a mountainside to die.",
          "score": 0.08248225185606214
        }
      ]
    }
  ],
  "contradict": [
    {
      "url": "https://en.wikipedia.org/wiki/Jocasta",
      "texts": [
        {
          "text": "Jocasta handed the newborn infant over to Laius. Jocasta or Laius pierced and pinned the infant's ankles together. Laius instructed his chief shepherd, Menoetes (not to be confused with Menoetes, the underworld spirit) a slave who had been born in the palace, to expose the infant on Cithaeron and leave it to die. Laius' shepherd took pity on the infant and gave him to another shepherd in the employ of King Polybus of Corinth. Childless, Polybus and his queen, Merope of Corinth (according to Sophocles, or Periboea according to Pseudo-Apollodorus), raised the infant to adulthood.[4]",
          "score": 0.7989310026168823
        },
        {
          "text": "Oedipus grew up in Corinth under the assumption that he was the biological son of Polybus and his wife. Hearing rumors about his parentage, he consulted the Delphic Oracle. Oedipus was informed by the Oracle that he was fated to kill his father and to marry his mother. Fearing for the safety of the only parents known to him, Oedipus fled from Corinth before he could commit these sins. During his travels, Oedipus encountered Laius on a narrow pass at Phocis. After a heated argument regarding right-of-way, Oedipus killed Laius, unknowingly fulfilling the first half of the prophecy. Oedipus continued his journey to Thebes and discovered that the city was being terrorized by the sphinx. Oedipus solved the sphinx's riddle, and the grateful city, along with the acting regent Creon, elected Oedipus as its new king. Oedipus accepted the throne and married Laius' widowed queen Jocasta, Oedipus\u2019 actual mother, thereby fulfilling the second half of the prophecy. Jocasta bore her son's four children: two girls, Antigone and Ismene, and two boys, Eteocles and Polynices.",
          "score": 0.6485769748687744
        },
        {
          "text": "- Sophocles, Sophocles. Vol 1: Oedipus the king. Oedipus at Colonus. Antigone. With an English translation by F. Storr. The Loeb classical library, 20. Francis Storr. London; New York. William Heinemann Ltd.; The Macmillan Company. 1912. Greek text available at the Perseus Digital Library.",
          "score": 0.6379601955413818
        }
      ]
    },
    {
      "url": "https://www.britannica.com/topic/Oedipus-Greek-mythology",
      "texts": [
        {
          "text": "According to one version of the story, Laius, king of Thebes, was warned by an oracle that his son would slay him. Accordingly, when his wife, Jocasta (Iocaste; in Homer, Epicaste), bore a son, he had the baby exposed (a form of infanticide) on Cithaeron. (Tradition has it that his name, which means \u201cSwollen-Foot,\u201d was a result of his feet having been pinned together, but modern scholars are skeptical of that etymology.) A shepherd took pity on the infant, who was adopted by King Polybus of Corinth and his wife and was brought up as their son. In early manhood Oedipus visited Delphi and upon learning that he was fated to kill his father and marry his mother, he resolved never to return to Corinth.",
          "score": 0.0032469015568494797
        },
        {
          "text": "Traveling toward Thebes, he encountered Laius, who provoked a quarrel in which Oedipus killed him. Continuing on his way, Oedipus found Thebes plagued by the Sphinx, who put a riddle to all passersby and destroyed those who could not answer. Oedipus solved the riddle, and the Sphinx killed herself. In reward, he received the throne of Thebes and the hand of the widowed queen, his mother, Jocasta. They had four children: Eteocles, Polyneices, Antigone, and Ismene. Later, when the truth became known, Jocasta committed suicide, and Oedipus (according to another version), after blinding himself, went into exile, accompanied by Antigone and Ismene, leaving his brother-in-law Creon as regent. Oedipus died at Colonus near Athens, where he was swallowed into the earth and became a guardian hero of the land.",
          "score": 0.0032663613092154264
        }
      ]
    },
    {
      "url": "https://www.thecollector.com/oedipus-rex-summary-story-breakdown/",
      "texts": [
        {
          "text": "Laius, the father of Oedipus and first husband of Jocasta, had made some bad choices in his early years as a young man. These actions caused a curse to be placed upon Laius and his descendants. Laius had two brothers, and not much is known about Laius\u2019 mother, but his father, Labdacus, was King of Thebes. Labdacus died when his sons were very young, and so Lycus became their guardian and also the regent of Thebes.",
          "score": 0.00034624839624859276
        },
        {
          "text": "Jocasta, the wife (and mother) of Oedipus Rex, at first told Oedipus to ignore the \u201cmad ravings\u201d of the prophet, but then she tells Oedipus about the prophecy about her son who was fated to kill his father and marry his mother. She hopes these words will comfort Oedipus, but in fact they have the opposite effect. Oedipus slowly comes to realize the truth\u2026",
          "score": 0.00036687427423945493
        },
        {
          "text": "Jocasta could not live with the truth, and so she took her own life. Oedipus decided to inflict punishment on himself to protect the people of Thebes and he gouged his own eyes out. The end of Sophocles\u2019 play was indeed gruesome.",
          "score": 0.0002744449673039104
        }
      ]
    },
    {
      "url": "https://en.wikipedia.org/wiki/Oedipus_Rex",
      "texts": [
        {
          "text": "Two oracles in particular dominate the plot of Oedipus Rex. Jocasta relates the prophecy that was told to Laius before the birth of Oedipus (lines 711\u20134):",
          "score": 0.00016070202400442213
        },
        {
          "text": "- ^ Theodoridis, G. (2005). Oedipus Rex (Oedipus Tyrannus, Tyrannos, King, Vasileus) \u039f\u03b9\u03b4\u03af\u03c0\u03bf\u03c5\u03c2 \u03a4\u03cd\u03c1\u03b1\u03bd\u03bd\u03bf\u03c2. Retrieved from Bacchicstage: https://bacchicstage.wordpress.com/sophocles/oedipus-rex/ Note: this source is assumed as reliable, as it is provided in Powell (2015), a university-course-level textbook.",
          "score": 0.00014714800636284053
        }
      ]
    },
    {
      "url": "https://www.sparknotes.com/drama/oedipus/character/jocasta/",
      "texts": [
        {
          "text": "Jocasta, who only appears in Oedipus the King, is both Oedipus\u2019s mother and wife, as well as Creon\u2019s sister. Having served as the Queen of Thebes for many years, she believes herself to be well aware of the events surrounding her first-born son\u2019s death, Laius\u2019s murder, and Oedipus\u2019s ascension to the throne. She initially appears as a mediator between Oedipus and Creon as the two argue over Tiresias\u2019s visions, and she tries to convince her husband to dismiss the prophecies by telling him the story of an oracle that wrongly predicted Laius\u2019s murder at the hands of his son. Jocasta speaks with a confident attitude in this moment, one which highlights her belief in the power of human behavior over the will of the gods. Even as Oedipus explains to her that he too received a similar prophecy stating that he would kill his father and marry his mother, she maintains that nothing can be predicted and tells him not to worry. This behavior reveals her hubris and suggests that from her position of authority, she feels protected from the tales that others tell about her fate.",
          "score": 0.00022836444854736337
        }
      ]
    },
    {
      "url": "https://en.wikipedia.org/wiki/Oedipus",
      "texts": [
        {
          "text": "In the best-known version of the myth, Oedipus was born to King Laius and Queen Jocasta of Thebes. Laius wished to thwart the prophecy, so he sent a shepherd-servant to leave Oedipus to die on a mountainside. However, the shepherd took pity on the baby and passed him to another shepherd who gave Oedipus to King Polybus and Queen Merope to raise as their own. Oedipus learned from the oracle at Delphi of the prophecy that he would end up killing his father and marrying his mother but, unaware of his true parentage, believed he was fated to murder Polybus and marry Merope, and so he left for Thebes. On his way, he met an older man and killed him in a quarrel. Continuing on to Thebes, he found that the king of the city (Laius) had recently been killed and that the city was at the mercy of the Sphinx. Oedipus answered the monster's riddle correctly, defeating it and winning the throne of the dead king \u2013 and the hand in marriage of the king's widow, who was also (unbeknownst to him) his mother Jocasta.",
          "score": 0.0004737162387665406
        },
        {
          "text": "Years later, to end a plague on Thebes, Oedipus searched to find who had killed Laius and discovered that he himself was responsible. Jocasta, upon realizing that she had married her own son, hanged herself. Oedipus then seized two pins from her dress and blinded himself with them.",
          "score": 0.0005539536108205347
        },
        {
          "text": "Variations on the legend of Oedipus are mentioned in fragments by several ancient Greek poets including Homer, Hesiod, Pindar, Aeschylus and Euripides. However, the most popular version of the legend comes from the set of Theban plays by Sophocles: Oedipus Rex, Oedipus at Colonus, and Antigone.",
          "score": 0.0005049973174377723
        },
        {
          "text": "The infant Oedipus eventually came to the house of Polybus, king of Corinth, and his queen, Merope, who adopted him, as they were without children of their own. Little Oedipus was named after the swelling from the injuries to his feet and ankles (\"swollen foot\"). The word \"oedema\" (British English) or \"edema\" (American English) is from this same Greek word for swelling: \u03bf\u1f34\u03b4\u03b7\u03bc\u03b1, or oed\u0113ma.",
          "score": 0.0006237451309039268
        },
        {
          "text": "On the way, Oedipus came to Davlia, where three roads crossed. There he encountered a chariot driven by his birth-father, King Laius. They fought over who had the right to go first and Oedipus killed Laius when the charioteer tried to run him over. The only witness of the king's death was a slave who fled from a caravan of slaves also traveling on the road at the time.",
          "score": 0.00046437164699589755
        },
        {
          "text": "Queen Jocasta's brother, Creon, had announced that any man who could rid the city of the Sphinx would be made king of Thebes and given the recently widowed Queen Jocasta's hand in marriage. This marriage of Oedipus to Jocasta fulfilled the rest of the prophecy. Oedipus and Jocasta had four children: sons Eteocles and Polynices (see Seven Against Thebes) and daughters Antigone and Ismene.",
          "score": 0.0006061963461063526
        },
        {
          "text": "King Laius of Thebes hears of a prophecy that his infant son will one day kill him.[2] He pierces Oedipus' feet and leaves him out to die, but a shepherd finds him and carries him away.[3] Years later, Oedipus, not knowing he was adopted, leaves home in fear of the same prophecy that he will kill his father and marry his mother.[4] Laius journeys out to seek a solution to the Sphinx's mysterious riddle.[5] As prophesied, Oedipus and Laius cross paths, but they do not recognize each other. A fight ensues, and Oedipus kills Laius and most of his guards.[6] Oedipus goes on to defeat the Sphinx by solving a riddle to become king.[7] He marries the widowed Queen Jocasta, unaware that she is his mother. A plague falls on the people of Thebes. Upon discovering the truth, Oedipus blinds himself, and Jocasta hangs herself.[8] After Oedipus is no longer king, Oedipus's brother-sons kill each other.",
          "score": 0.0006277196477224795
        },
        {
          "text": "Overwhelmed with the knowledge of all his crimes, Oedipus rushes into the palace where he finds his mother-wife, dead by her own hand. Ripping a brooch from her dress, Oedipus blinds himself with it. Bleeding from the eyes, he begs his uncle and brother-in-law Creon, who has just arrived on the scene, to exile him forever from Thebes. Creon agrees to this request. Oedipus begs to hold his two daughters Antigone and Ismene with his hands one more time to have their eyes full of tears and Creon out of pity sends the girls in to see Oedipus one more time.",
          "score": 0.0005260834723343083
        },
        {
          "text": "- ^ a b Sophocles. Sophocles I: Oedipus the King, Oedipus at Colonus, Antigone. 2nd ed. Grene, David and Lattimore, Richard, eds. Chicago: University of Chicago, 1991. pp. 1\u20132.",
          "score": 0.0005413027549231493
        }
      ]
    },
    {
      "url": "https://mythology.net/greek/mortals/oedipus/",
      "texts": [
        {
          "text": "King Lauis had difficulty believing the prophecy but when Queen Jocasta finally did get pregnant, he decided to take fate into his own hands. He commanded his servants pierce holes into his baby\u2019s ankles, preventing him from crawling. This is how his name was chosen, as Oedipus means \u201cswollen foot\u201d. Queen Jocasta then commanded that the baby be brought to the mountains to die. Oedipus was handed to a shepherd, but the man was unable to bring himself to kill the child. Instead, he gave Oedipus to another shepherd, who brought him to King Polybus and Queen Merope of Corinth.",
          "score": 0.0023366986141260438
        }
      ]
    },
    {
      "url": "https://kids.britannica.com/students/article/Oedipus/276164",
      "texts": [
        {
          "text": "In horror Oedipus put out his eyes, while his mother hanged herself. A blind and helpless outcast, Oedipus wandered away with his faithful daughter Antigone. She cared for him until he died. The Greek dramatist Sophocles told the story of Oedipus and his children in the great trilogy of Oedipus Rex, Oedipus at Colonus, and Antigone.",
          "score": 0.01057521440088749
        }
      ]
    },
    {
      "url": "https://www.encyclopedia.com/literature-and-arts/classical-literature-mythology-and-folklore/folklore-and-mythology/oedipus",
      "texts": [
        {
          "text": "Oedipus and Jocasta lived happily for a time and had two sons and two daughters. Then a dreadful plague came upon Thebes. A prophet declared that the plague would not end until the Thebans drove out Laius's murderer, who was within the city. A messenger then arrived from Corinth, announcing the death of King Polybus and asking Oedipus to return and rule the Corinthians. Oedipus told Jocasta what the oracle had predicted for him and expressed relief that the danger of his murdering Polybus was past. Jocasta told him not to fear oracles, for the oracle had said that her first husband would be killed by his own son, and instead he had been murdered by a stranger on the road to Delphi.",
          "score": 0.08617854118347168
        },
        {
          "text": "Suddenly Oedipus remembered that fatal encounter on the road and knew that he had met and killed his real father, Laius. At the same time, Jocasta realized that the scars on Oedipus's feet marked him as the baby whose feet Laius had pinned together so long ago. Faced with the fact that she had married her own son and the murderer of Laius, she hanged herself. Oedipus seized a pin from her dress and blinded himself with it.",
          "score": 0.07412354813681708
        },
        {
          "text": "The Myth. The story begins with a son born to King Laius and Queen Jocasta of Thebes*. The oracle at Delphi* told them that their child would grow up to murder Laius and marry Jocasta. Horrified, the king fastened the infant's feet together with a large pin and left him on a mountainside to die.",
          "score": 0.08844863706164889
        },
        {
          "text": "Oedipus and Jocasta lived happily for a time and had two sons and two daughters. Then a dreadful plague came upon Thebes. A prophet declared that the plague would not end until the Thebans drove out the murderer of Laius, who was within the city. A messenger then arrived from Corinth, announcing the death of King Polybus and asking Oedipus to return and rule the Corinthians. Oedipus told Jocasta what the oracle had predicted for him and expressed relief that the danger of his murdering Polybus was past. Jocasta told him not to fear oracles, for the oracle had said that her first husband would be killed by his own son, and instead he had been murdered by a stranger on the road to Delphi.",
          "score": 0.0843855341275533
        }
      ]
    }
  ],
~~~
- Even though multiple URLs were found relevant, only the Wikipedia article was significant because of the weights applied
- The inference model sometimes makes unsupported inferences
- There should be a within-document weighting of paragraphs: those at the beginning and at the end of the text are typically more significant.
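
The within-document weighting suggested in the last point could look roughly like the sketch below. It is not part of the notebook: `position_weights`, `weighted_doc_score` and the `edge_boost` parameter are hypothetical names, and the sketch assumes the paragraph scores of a document are listed in their original reading order.

```python
def position_weights(n: int, edge_boost: float = 2.0) -> list[float]:
    """U-shaped weights: the first and last paragraphs count edge_boost times
    as much as a paragraph at the exact centre. Weights sum to 1."""
    if n <= 0:
        return []
    if n == 1:
        return [1.0]
    half = (n - 1) / 2
    weights = []
    for i in range(n):
        # Distance from the nearer edge: 0 at either edge, up to 1 at the centre
        d = min(i, n - 1 - i) / half
        weights.append(edge_boost - (edge_boost - 1.0) * d)
    total = sum(weights)
    return [w / total for w in weights]


def weighted_doc_score(scores: list[float], edge_boost: float = 2.0) -> float:
    """Combine per-paragraph scores into one document score (assumed scheme)."""
    weights = position_weights(len(scores), edge_boost)
    return sum(w * s for w, s in zip(weights, scores))
```

With three paragraphs and the default `edge_boost`, the weights are 0.4, 0.2 and 0.4, so a supporting passage in the introduction or conclusion counts twice as much as one in the middle. Whether a linear U-shape is the right profile is an open question; it is only the simplest choice.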

### Computing the confusion matrix

In [9]:
from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_recall_fscore_support
import numpy as np
import os

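def getConfusionMatrix(inFile):
    # Everything is read as a string, so the answers are 'True', 'False' or 'unknown'
    df = pd.read_csv(inFile, index_col=None, dtype=str)
    preds = df['webAnswer'].tolist()
    trues = df['trueAnswer'].tolist()
    # Use string labels so that they match the dtype=str columns explicitly
    labels = ['True', 'False', 'unknown']
    p,r,f,s = precision_recall_fscore_support(trues, preds, labels=labels, average=None,
                                             zero_division=np.nan)
    # Keep only the 'True' and 'False' columns; 'unknown' never occurs as a true answer
    prf = np.array([p[:2], r[:2], f[:2]])
    print('Precision, recall, F1 for True, False')
    print(prf)
    print('\nConfusion matrix:')
    # Rows are the true answers, columns the predicted answers, both in `labels` order
    cm = confusion_matrix(trues, preds, labels=labels)
    print(cm)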

getConfusionMatrix(os.path.join(RESULTDIR, 'triviaQA_dev.csv'))

Precision, recall, F1 for True, False
[[0.8358209  0.64876033]
 [0.28       0.77339901]
 [0.41947566 0.70561798]]

Confusion matrix:
[[ 56  85  59]
 [ 11 157  35]
 [  0   0   0]]
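
As a cross-check, the precision, recall and F1 values can be recomputed by hand from the confusion matrix printed above (rows are true answers, columns are predictions, in the order True, False, unknown). The sketch below only restates that arithmetic; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above
cm = np.array([[56, 85, 59],
               [11, 157, 35],
               [0, 0, 0]])

correct = np.diag(cm).astype(float)   # correctly predicted facts per label
pred_totals = cm.sum(axis=0)          # column sums: how often each label was predicted
true_totals = cm.sum(axis=1)          # row sums: how often each label is the true answer

# Only True and False have true examples, so restrict to the first two labels
precision = correct[:2] / pred_totals[:2]
recall = correct[:2] / true_totals[:2]
f1 = 2 * precision * recall / (precision + recall)

# Share of facts for which the pipeline answered 'unknown'
unknown_rate = cm[:, 2].sum() / cm.sum()

print(precision, recall, f1, unknown_rate)
```

For the True label this gives a precision of 56/67 and a recall of 56/200, which matches the printed table. The last value shows that 94 of the 403 facts (about 23%) were answered with 'unknown'.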


- The 'True' performance is very poor: precision is 0.84, but recall is only 0.28 (56 of the 200 true facts are recognized as true)

Potential improvements:
- tweaking the parameters, for instance the thresholds for true/unknown/false, number of urls, rate of decay etc.
- using a better inference model, like an LLM
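
Tweaking the true/unknown/false thresholds could be done with a small grid search, sketched below. This is an assumption-laden illustration, not the notebook's code: it assumes every fact ends up with a single combined score in [0, 1] where higher means better supported, and `decide`, `sweep_thresholds` and the toy data are hypothetical.

```python
def decide(score: float, lower: float, upper: float) -> str:
    """Map a combined score to an answer using two thresholds (assumed scheme)."""
    if score >= upper:
        return 'True'
    if score <= lower:
        return 'False'
    return 'unknown'


def sweep_thresholds(scores, trues, grid):
    """Try every (lower, upper) pair from grid with lower < upper and
    return (accuracy, lower, upper) for the best pair."""
    best = None
    for lower in grid:
        for upper in grid:
            if lower >= upper:
                continue
            preds = [decide(s, lower, upper) for s in scores]
            acc = sum(p == t for p, t in zip(preds, trues)) / len(trues)
            if best is None or acc > best[0]:
                best = (acc, lower, upper)
    return best


# Toy data for illustration only, not taken from the test results
toy_scores = [0.9, 0.6, 0.2, 0.1]
toy_trues = ['True', 'True', 'False', 'False']
print(sweep_thresholds(toy_scores, toy_trues, [0.1, 0.3, 0.5, 0.7]))
```

Thresholds tuned this way should be chosen on one data set and evaluated on another; otherwise the reported scores will be too optimistic.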
