# Table of contents 

- [Purpose](#purpose)
    - [Libraries](#libraries) 
- [Ten randomly selected articles](#)
- [Quality assurance (QA)](#qualityassurance)
    - [Preprocessing](#preprocessing)
    - [Metrics](#metrics) 
- [Human extractor evaluation](#humanextractorevaluation)
    - [H-URLs](#hurls)
    - [H-Sentences](#hsentences)
- [Code evaluation](#codeevaluation)
    - [C-URLs](#curls)
    - [C-Sentences](#csentences)
- [Overview](#overview)
- [References](#references)

<a name='purpose'></a>
# Purpose

This notebook evaluates the extraction of URLs and of the sentences containing them, as performed by a human (Extractor 1, E1) and by the code implemented in 'Code/datasets_v5'. Both are compared against a manually extracted ground truth (GT). The data compared are: 
- 'Data/URL_extraction_QA/groundtruth.csv': This data was extracted manually and serves as the ground truth (GT).
- 'Data/URL_extraction_QA/extractor1.csv': This data was extracted manually by Extractor 1 (E1).
- 'Data/articles_filtered_urls.csv': This data was extracted automatically by the code implemented in 'Code/datasets_v5'.


In [1]:
import pandas as pd
import numpy as np

import json 
import os 
import re 
import io

# Random 
import random

# Difflib to find difference between strings 
import difflib 

# 1. Ten randomly selected articles 

I want to manually extract the URLs and sentences containing URLs from 10 randomly selected research papers. First, I need to get the research papers and make sure they are not a part of the NeuroImage groundtruth set ('Data/articles_groundtruth.csv'). 

In [2]:
# Path to the groundtruth data directory
groundtruth_path = os.path.join(os.pardir, 'Data/articles_groundtruth_urls_and_sentences.csv')
groundtruth_articles = pd.read_csv(groundtruth_path)

In [3]:
groundtruth_dois = groundtruth_articles['DOI'].unique()

In [4]:
def get_random_dois(json_file_path, num_samples, dois_to_exclude=None, random_seed=40):
    """
    Get a list of random DOIs from a JSON file while ensuring that they are not in the groundtruth DOI list.

    Parameters:
    json_file_path (str): The path to the JSON file containing DOI data.
    num_samples (int): The number of random DOIs to sample.
    dois_to_exclude (list, optional): A list of DOIs to exclude from the random sampling. Default is None.
    random_seed (int, optional): Seed for the random number generator for reproducibility. Default is 40.

    Returns:
    list: A list of unique random DOIs not in the groundtruth DOI list.
    """
    # Set the random seed for reproducibility
    random.seed(random_seed)

    # Load the DOI data from the JSON file
    with open(json_file_path, 'r') as json_file:
        doi_data = json.load(json_file)
        doi_list = doi_data['DOIs']

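    # Exclude the DOIs passed in by the caller instead of relying on a global variable
    excluded = set(dois_to_exclude) if dois_to_exclude is not None else set()
    doi_list = [doi for doi in doi_list if doi not in excluded]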

    # Ensure the number of requested samples does not exceed available DOIs
    num_samples = min(num_samples, len(doi_list))

    # Get a sample of DOIs
    random_dois = random.sample(doi_list, num_samples)

    return random_dois
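
The function above seeds Python's global random generator, which also affects any later use of `random` in the notebook. Below is a minimal sketch of an alternative, assuming a private `random.Random` instance is acceptable. The helper name `sample_excluding` and the placeholder DOIs are hypothetical and not part of the original code.

```python
import random

def sample_excluding(items, num_samples, exclude=None, seed=40):
    """Sample without replacement from the items that are not in exclude."""
    excluded = set(exclude) if exclude is not None else set()
    candidates = [item for item in items if item not in excluded]
    # A private generator leaves the global random state untouched
    rng = random.Random(seed)
    return rng.sample(candidates, min(num_samples, len(candidates)))

# Placeholder identifiers, not real DOIs
all_dois = [f"10.0000/example.{i}" for i in range(20)]
known_dois = all_dois[:5]
picked = sample_excluding(all_dois, 3, exclude=known_dois)
```

Calling the helper twice with the same seed returns the same sample, and none of the excluded identifiers can appear in it.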

In [5]:
# Path to the JSON file containing DOI values
json_file_path = '../Data/ElsevierAPI/downloadedPDFs_info.json'
samples = 12

# Get 12 random DOIs 
random_dois = get_random_dois(json_file_path, samples, groundtruth_dois)

random_dois

['10.1016/j.neuroimage.2022.119254',
 '10.1016/j.neuroimage.2022.119133',
 '10.1016/j.neuroimage.2022.119360',
 '10.1016/j.neuroimage.2022.119742',
 '10.1016/j.neuroimage.2022.119294',
 '10.1016/j.neuroimage.2022.118992',
 '10.1016/j.neuroimage.2022.119077',
 '10.1016/j.neuroimage.2021.118868',
 '10.1016/j.neuroimage.2022.118960',
 '10.1016/j.neuroimage.2022.119769',
 '10.1016/j.neuroimage.2022.119048',
 '10.1016/j.neuroimage.2022.119199']

I saved the PDFs of these articles and shared them with the person acting as my quality control. 
The PDFs were shared in a folder alongside an Excel sheet, which the extractor filled out as follows: 

| Column      | Description |
| ----------- | ----------- |
| DOI         | The article's DOI. |
| URL         | The URL. It can start with http or www, or simply contain .com or similar. It has to occur in the text, i.e., from the title to "References". If there are no URLs, leave this field and the others blank. |
| Sentence(s) | Copy of the full sentence(s) that contain the URL. Put quotation marks (either "" or '') around each sentence. If there are multiple sentences, put a comma between them. |

---

The extractor and I extracted the URLs and sentences together from 10.1016/j.neuroimage.2022.119254 and 10.1016/j.neuroimage.2022.119133, and we extracted the URLs and sentences from the other ten articles individually. 

<a name='qualityassurance'></a>
# Quality assurance (QA)

<a name='preprocessing'></a>
## Preprocessing
I will perform some simple cleaning of the manually extracted data (removing certain characters and extra whitespace).

In [6]:
def clean_URLs(URL):
    """Remove stray spaces next to hyphens, dots, and slashes, newline characters,
    and leading/trailing whitespace and dots from a manually extracted link.
    """ 
    if not pd.isna(URL):
        return URL.replace("- ", "-").replace("-  ", "-").replace(" -", "-").replace("  -", "-").replace(". ", ".").replace(" /", "/").replace("/ ", "/").replace('\n', '').strip().strip('.')
    else: 
        return np.nan

def clean_and_flatten(sentence):
    """Clean and flatten the input sentence.
    Collapses all whitespace (including newline characters) into single spaces.
    Removes spaces around hyphens (-) and before dots (.).
    Removes single and double quotation marks.
    Returns an empty string for missing values.
    """
    if not pd.isna(sentence):  # Check if the value is not NaN
        sentence = sentence.replace('"', "'")
        sentence = ' '.join(sentence.split())
        sentence = sentence.replace(' -', '-').replace('- ', '-').replace('\n', ' ').replace(' .', '.').replace("'", '')
        return re.sub('  +', ' ', sentence)  # Remove extra spaces (more than one space becomes a single space)
    else:
        return ''
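
The chained `replace` calls in `clean_URLs` could also be expressed with a single regular expression. The sketch below is a variant under stated assumptions, not the cleaning used for the results in this notebook: it treats any whitespace directly before or after `-`, `/`, or `.` as a copying artifact, which is slightly broader than the original rules. The helper name `clean_url_regex` is hypothetical, and missing values are represented by `None` rather than NaN to keep the example free of dependencies.

```python
import re

def clean_url_regex(url):
    """Remove whitespace around '-', '/' and '.', then trim spaces and dots."""
    if url is None:
        return None
    # Whitespace (spaces, tabs, newlines) touching a separator is dropped
    url = re.sub(r"\s*([-/.])\s*", r"\1", url)
    return url.strip().strip(".")

# A link broken across two lines in the PDF, with a trailing sentence dot
cleaned = clean_url_regex("https://balsa.wustl.\nedu/jjnjZ.")
print(cleaned)  # https://balsa.wustl.edu/jjnjZ
```

Text that is separated from the link by an ordinary space, such as 'upon request', is left in place, so this variant does not repair URLs that were copied together with surrounding words.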

In [7]:
# Read manual data from the CSV files 
path_manual = os.path.join(os.pardir, 'Data/URL_extraction_QA')
together = 'together.csv'
extractor1 = 'extractor1.csv'
groundtruth = 'groundtruth.csv'

manual_together = pd.read_csv(os.path.join(path_manual, together))
manual_e1 = pd.read_csv(os.path.join(path_manual, extractor1))
manual_gt = pd.read_csv(os.path.join(path_manual, groundtruth))

# Read automatic data 
path_automatic = os.path.join(os.pardir, 'Data/articles_filtered_urls.csv')
automatic_urls = pd.read_csv(path_automatic)
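# Copy the filtered rows so that later in-place edits do not trigger pandas' SettingWithCopyWarning
automatic_df = automatic_urls[automatic_urls['DOI'].isin(random_dois)].copy()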

# Sort the dataframes 
manual_together = manual_together.sort_values(by=['DOI', 'URL'])
manual_together.reset_index(drop=True, inplace=True)
manual_e1 = manual_e1.sort_values(by=['DOI', 'URL'])
manual_e1.reset_index(drop=True, inplace=True)
manual_gt = manual_gt.sort_values(by=['DOI', 'URL'])
manual_gt.reset_index(drop=True, inplace=True)
automatic_df = automatic_df.sort_values(by=['DOI', 'URL'])
automatic_df.reset_index(drop=True, inplace=True)

In [8]:
manual_e1.loc[20].values

array(['10.1016/j.neuroimage.2022.119360',
       'https://balsa.wustl.edu/jjnjZ.',
       'Methods sections 2.2 and 2.5 describe the participants and data, and preprocessing is described in methods sections 2.6 and 2.9. https://balsa.wustl.\nedu/jjnjZ.'],
      dtype=object)

In [9]:
manual_gt.loc[23].values

array(['10.1016/j.neuroimage.2022.119360',
       'https://balsa.wustl.edu/jjnjZ',
       '"https://balsa.wustl.edu/jjnjZ."'], dtype=object)

In [10]:
# Apply the cleaning functions to the DataFrame
manual_together['URL'] = manual_together['URL'].apply(clean_URLs)
manual_together['Sentence(s)'] = manual_together['Sentence(s)'].apply(clean_and_flatten)

manual_e1['URL'] = manual_e1['URL'].apply(clean_URLs)
manual_e1['Sentence(s)'] = manual_e1['Sentence(s)'].apply(clean_and_flatten)

manual_gt['URL'] = manual_gt['URL'].apply(clean_URLs)
manual_gt['Sentence(s)'] = manual_gt['Sentence(s)'].apply(clean_and_flatten)

In [11]:
manual_e1.loc[20].values

array(['10.1016/j.neuroimage.2022.119360',
       'https://balsa.wustl.edu/jjnjZ',
       'Methods sections 2.2 and 2.5 describe the participants and data, and preprocessing is described in methods sections 2.6 and 2.9. https://balsa.wustl. edu/jjnjZ.'],
      dtype=object)

In [12]:
manual_gt.loc[23].values

array(['10.1016/j.neuroimage.2022.119360',
       'https://balsa.wustl.edu/jjnjZ', 'https://balsa.wustl.edu/jjnjZ.'],
      dtype=object)

In [13]:
# Rename column in automatic_df to match column names from the manual extraction  
automatic_df.rename(columns={'Sentences': 'Sentence(s)'}, inplace=True)

automatic_df['Sentence(s)'] = automatic_df['Sentence(s)'].str.strip('[]')
automatic_df['Sentence(s)'] = automatic_df['Sentence(s)'].apply(clean_and_flatten)

<a name='metrics'></a>
## Metrics 
I will evaluate the performance of the human extractor and the code based on **precision**, **recall**, and **F1-score** (Géron 2019). 
To calculate precision, recall, and F1-score, I will count the following:
- True Positives (TP): The number of extractions by the extractor that match the ground truth's extractions (correct extractions).
- False Positives (FP): The number of extractions by the extractor that do not match the ground truth (incorrect extractions).
- False Negatives (FN): The number of extractions in the ground truth that were not extracted (missed extractions).

**Precision** measures the accuracy of the extractor's and code's results and is calculated in the following manner: 

$$\text{precision} = \frac{TP}{TP + FP}$$

**Recall** measures the completeness of the extractor's and code's results and is calculated in the following manner: 

$$\text{recall} = \frac{TP}{TP + FN}$$

**F1-score** provides the balance between precision and recall and is calculated in the following manner: 

$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$ 
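
As a small worked example of the three formulas, the sketch below computes them from raw counts. The counts are invented for illustration, and the zero-division guards are an assumption about how empty cases should be handled rather than something this notebook defines.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) computed from TP, FP, and FN counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts: 8 correct, 2 spurious, and 2 missed extractions
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.8 0.8 0.8
```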


I will also include a metric called 'raw agreement', which is described as follows: 

> "The simplest way to measure agreement between annotators is to count the number of items for which they provide identical labels, and report that number as a percentage of the total to be annotated. This is called raw agreement or observed agreement, and according to Bayerl and Paul [5] it is still the most common way of reporting agreement in the literature. Raw agreement is easy to measure and understand; however, agreement in itself does not imply that the annotation process is reliable, because some agreement may be accidental – and this accidental agreement could be very high." (Artstein 2017, p. 299)

<a name='functions'></a>
## Functions 

In [14]:
def compare_extractions(df1, df2):
    """This function compares two dataframes to see if they contain the same URLs.
    For each article (DOI), it checks if both dataframes contain the same URLs.
    If df1 contains a URL that is not in df2, the columns will be updated accordingly, and vice versa.
    """
    unique_dois = set(df1['DOI'].unique()) | set(df2['DOI'].unique())
    data = []

    for doi in unique_dois:
        unique_urls1 = set(df1.loc[df1['DOI'] == doi, 'URL'].unique())
        unique_urls2 = set(df2.loc[df2['DOI'] == doi, 'URL'].unique())
        unique_urls = unique_urls1 | unique_urls2

        for URL in unique_urls:
            if pd.isna(URL):
                doi1 = df1.loc[(df1['DOI'] == doi) & (df1['URL'].isna()), 'DOI'].values
                doi2 = df2.loc[(df2['DOI'] == doi) & (df2['URL'].isna()), 'DOI'].values
            else:
                doi1 = df1.loc[(df1['DOI'] == doi) & (df1['URL'] == URL), 'DOI'].values
                doi2 = df2.loc[(df2['DOI'] == doi) & (df2['URL'] == URL), 'DOI'].values

            e_extracted = bool(doi1.size > 0)
            gt_extracted = bool(doi2.size > 0)

            e_sentences = (
                df1.loc[(df1['DOI'] == doi) & (df1['URL'] == URL), 'Sentence(s)'].values[0]
                if e_extracted and len(df1.loc[(df1['DOI'] == doi) & (df1['URL'] == URL), 'Sentence(s)']) > 0
                else None
            )
            gt_sentences = (
                df2.loc[(df2['DOI'] == doi) & (df2['URL'] == URL), 'Sentence(s)'].values[0]
                if gt_extracted and len(df2.loc[(df2['DOI'] == doi) & (df2['URL'] == URL), 'Sentence(s)']) > 0
                else None
            )

            data.append({
                'DOI': doi,
                'URL': URL,
                'URL_both': e_extracted and gt_extracted,
                'e_extracted': e_extracted,
                'gt_extracted': gt_extracted,
                'e_sentences': e_sentences,
                'gt_sentences': gt_sentences,
            })

    extracted_df = pd.DataFrame(data)

    return extracted_df
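
The URL-level comparison can also be viewed as set arithmetic on (DOI, URL) pairs, which makes the link to TP, FP, and FN explicit. The sketch below is a simplified illustration with invented DOIs and URLs. It ignores sentences and articles without URLs, both of which `compare_extractions` handles.

```python
# Invented (DOI, URL) pairs for illustration only
extractor_pairs = {
    ("10.0000/example.1", "https://example.org/data"),
    ("10.0000/example.1", "https://example.org/code upon request"),
}
groundtruth_pairs = {
    ("10.0000/example.1", "https://example.org/data"),
    ("10.0000/example.1", "https://example.org/code"),
    ("10.0000/example.2", "https://example.org/atlas"),
}

true_positives = extractor_pairs & groundtruth_pairs    # extracted by both
false_positives = extractor_pairs - groundtruth_pairs   # extractor only
false_negatives = groundtruth_pairs - extractor_pairs   # ground truth only

print(len(true_positives), len(false_positives), len(false_negatives))  # 1 1 2
```

A URL copied with extra characters shows up twice under exact matching: once as a false positive (the extractor's version) and once as a false negative (the ground truth's version).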

In [15]:
def compare_sentences(df):
    """This function compares sentences (e_sentences and gt_sentences) within a single dataframe.
    It checks if sentences are the same within each row and reports the result.
    """
    data = []

    for index, row in df.iterrows():
        e_sentences = row['e_sentences']
        gt_sentences = row['gt_sentences']
        
        same = e_sentences == gt_sentences
        difference = None if same else highlight_differences(e_sentences, gt_sentences)

        data.append({
            'DOI': row['DOI'],
            'URL': row['URL'],
            'Same': same,
            'Difference': difference,
            'e_sentences': e_sentences,
            'gt_sentences': gt_sentences
        })

    result_df = pd.DataFrame(data)
    
    return result_df
    

def highlight_differences(sentence1, sentence2):
    """The - symbol indicates words removed from sentence1, and the + symbol indicates words added in sentence2."""
    # Create a differ object
    differ = difflib.Differ()

    if sentence1 is None and sentence2 is None:
        return None 
    elif sentence1 is None:
        return sentence2
    elif sentence2 is None:
        return sentence1
    
    # Compute the differences between the sentences
    diff = list(differ.compare(sentence1.split(), sentence2.split()))

    # Join the differences to get the highlighted text
    highlighted_text = ' '.join(diff)

    return highlighted_text

In [16]:
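def evaluation_metrics(gt_observations, e_observations, both_observed, either_observed, TP, FP, FN, title):
    """Summarize counts and agreement metrics for one comparison.
    Returns two dataframes. The first holds the number of ground-truth, extractor,
    shared ('Both'), and non-shared ('Either') observations. The second holds the raw
    agreement (shared observations as a percentage of the average number of
    observations per party), precision, recall, and F1-score computed from TP, FP, and FN.
    """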
    gt_extracted = len(gt_observations)
    e_extracted = len(e_observations)
    both_extracted = len(both_observed)
    either_extracted = len(either_observed)

    average_extracted = (e_extracted + gt_extracted) / 2
    agreement_pct = (both_extracted / average_extracted) * 100 

    precision = TP / (TP+FP)
    recall = TP / (TP+FN)
    f1_score = 2 * (precision * recall) / (precision + recall)

    counts = {
        title: ['Ground Truth', 'Extractor', 'Both', 'Either'],
        'Count': [int(gt_extracted), int(e_extracted), int(both_extracted), int(either_extracted)]
    }

    metrics = {
        'Metric': ['Raw Agreement %', 'Precision', 'Recall', 'F1-Score'],
        'Value': [round(agreement_pct, 2), round(precision, 2), round(recall, 2), round(f1_score, 2)]
    }

    return pd.DataFrame(counts), pd.DataFrame(metrics) 
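
In `evaluation_metrics`, raw agreement is computed as the number of items extracted by both parties divided by the average number of items each party extracted. Below is a minimal sketch of that calculation on its own, using the H-URLs counts reported further down in this notebook (44 ground-truth rows, 39 Extractor 1 rows, 34 shared). The function name `raw_agreement_pct` is hypothetical.

```python
def raw_agreement_pct(n_first, n_second, n_both):
    """Shared items as a percentage of the average number of items per annotator."""
    average = (n_first + n_second) / 2
    return n_both / average * 100

agreement = raw_agreement_pct(n_first=44, n_second=39, n_both=34)
print(round(agreement, 2))  # 81.93
```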

In [17]:
sentence1 = "Pre-processed functional data were then analyzed in the vistasoftsoftware (https://github.com/vistalab/vistasoft)."
sentence2 = "Pre-processed functional data were then analyzed in the vistasoft software (https://github.com/vistalab/vistasoft)."

highlighted_text = highlight_differences(sentence1, sentence2)
print(highlighted_text)

  Pre-processed   functional   data   were   then   analyzed   in   the - vistasoftsoftware + vistasoft + software   (https://github.com/vistalab/vistasoft).


<a name='humanextractorevaluation'></a>
## Human extractor evaluation  

I will investigate the following: 
- Did the person extract the same URLs in each article as the ground truth URLs?
- Did the person extract the same sentences for each URL?
- In percent, how similar is the performance of the person to the ground truth? 

In [18]:
results_h = compare_extractions(manual_e1, manual_gt)
results_h = results_h.sort_values(by=['DOI', 'URL'])
results_h.reset_index(drop=True, inplace=True)

In [19]:
results_h

|   | DOI | URL | URL_both | e_extracted | gt_extracted | e_sentences | gt_sentences |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 10.1016/j.neuroimage.2021.118868 | https://github.com/vistalab/vistasoft | True | True | True | Pre-processed functional data were then analyz... | Pre-processed functional data were then analyz... |
| 1 | 10.1016/j.neuroimage.2022.118960 |  | True | True | True |  |  |
| 2 | 10.1016/j.neuroimage.2022.118992 |  | True | True | True |  |  |
| 3 | 10.1016/j.neuroimage.2022.119048 |  | True | True | True |  |  |
| 4 | 10.1016/j.neuroimage.2022.119077 | http://neuroimage.usc.edu/brainstorm | True | True | True | Brainstorm is documented and freely available ... | Brainstorm is documented and freely available ... |
| 5 | 10.1016/j.neuroimage.2022.119077 | https://neuroimage.usc.edu/brainstorm/Introduc... | True | True | True | The data presented in this manuscript were acq... | The analysis was performed in MATLAB using Bra... |
| 6 | 10.1016/j.neuroimage.2022.119077 | https://sites.google.com/site/bctnet/home | True | True | True | The data presented in this manuscript were acq... | The analysis was performed in MATLAB using Bra... |
| 7 | 10.1016/j.neuroimage.2022.119077 | https://www.neurobs.com/ | True | True | True | The stimuli were presented us-ing NBS Presenta... | The stimuli were presented us-ing NBS Presenta... |
| 8 | 10.1016/j.neuroimage.2022.119199 | https://civmvoxport.vm.duke.edu | False | False | True |  | The diffusion MRI met-rics are available from:... |
| 9 | 10.1016/j.neuroimage.2022.119199 | https://civmvoxport.vm.duke.edu upon request | False | True | False | The diffusion MRI met-rics are available from:... |  |


<a name='hurls'></a>
### H-URLs 

In [20]:
# Calculate True Positives (TP), False Positives (FP), and False Negatives (FN)
TP_url = len(results_h[(results_h['e_extracted'] == True) & (results_h['gt_extracted'] == True)])
FP_url = len(results_h[(results_h['e_extracted'] == True) & (results_h['gt_extracted'] == False)])
FN_url = len(results_h[(results_h['e_extracted'] == False) & (results_h['gt_extracted'] == True)])
    
h_url_counts, h_url_metrics = evaluation_metrics(manual_gt['URL'], manual_e1['URL'], results_h[results_h['URL_both']==True], results_h[results_h['URL_both']==False], TP_url, FP_url, FN_url, 'H-URLs')

In [21]:
h_url_counts

|   | H-URLs | Count |
| --- | --- | --- |
| 0 | Ground Truth | 44 |
| 1 | Extractor | 39 |
| 2 | Both | 34 |
| 3 | Either | 11 |


In [22]:
h_url_metrics

|   | Metric | Value |
| --- | --- | --- |
| 0 | Raw Agreement % | 81.93 |
| 1 | Precision | 0.92 |
| 2 | Recall | 0.81 |
| 3 | F1-Score | 0.86 |


The human extractor (E1) extracted fewer URLs than the ground truth (GT) set. 
- 92 % of the extractions made by E1 match GT (precision).
- E1 captured 81 % of the URLs present in the GT data (recall).
- The F1-score, the harmonic mean of precision and recall, is 0.86. 

Investigating the URLs that E1 and GT did not both extract, I can see that some of the disagreement is due to how the URLs were extracted: 

- In two instances, E1 added characters that were not a part of the links when copying them:
    - For 10.1016/j.neuroimage.2022.119199, the URL is 'https://civmvoxport.vm.duke.edu', but E1 included the text 'upon request' when copying it, saving the URL as 'https://civmvoxport.vm.duke.edu upon request'.
    - For 10.1016/j.neuroimage.2022.119742, E1 added a dot to a link, turning 'http://fcon_1000.projects.nitrc.org/indi/retro/yale_hires.html' into 'http://fcon_1000.projects.nitrc.org/indi/retro/.yale_hires.html'

- Different operating systems copy characters differently, so for 10.1016/j.neuroimage.2022.119360, the 'https://nda.nih.gov' link looks different: 
    - E1 uses a Windows computer and the '%' is encoded as '\\04520', so the link ends up looking like this: 'https://nda.nih.gov/general-query.html?q=query=featured-datasets:HCP\\04520Aging\\04520and\\04520Development'
    - GT uses a Mac computer and the link looks like it does in the PDF: 'https://nda.nih.gov/general-query.html?q=query=featured-datasets:HCP%20Aging%20and%20Development'

- GT had five additional URLs that E1 did not find: 
    - Three URLs in 10.1016/j.neuroimage.2022.119360:
        - https://balsa.wustl.edu/6VjVk
        - https://balsa.wustl.edu/7qwq3
        - https://balsa.wustl.edu/nplpK
    - Two URLs in 10.1016/j.neuroimage.2022.119742:
        - https://www.nitrc.org/projects/bioimagesuite/
        - https://github.com/SNeuroble/NBS_benchmarking

Of the 11 rows where E1 and GT disagree, four come from the two URLs where E1 added characters (each URL appears once in E1's form and once in GT's form), two come from the URL that was copied differently, and the remaining five are URLs that GT extracted but E1 did not. 
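
The breakdown above can be checked against the reported H-URLs metrics. This sketch assumes, based on the description above, that each miscopied URL produces one E1-only row (a false positive) and one GT-only row (a false negative), and that each URL found only by GT is one false negative.

```python
miscopied_urls = 2 + 1   # two with added characters, one with a different encoding
gt_only_urls = 5         # URLs that only GT extracted
tp = 34                  # rows where both extracted the same URL ('Both' in the counts table)

fp = miscopied_urls                   # E1-only rows
fn = miscopied_urls + gt_only_urls    # GT-only rows

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(fp + fn)  # 11
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.92 0.81 0.86
```

The 11 disagreeing rows and the three rounded metrics agree with the values in the H-URLs tables.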

In [23]:
different_urls_h = results_h[results_h['URL_both'] == False]

In [24]:
different_urls_h

|   | DOI | URL | URL_both | e_extracted | gt_extracted | e_sentences | gt_sentences |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 8 | 10.1016/j.neuroimage.2022.119199 | https://civmvoxport.vm.duke.edu | False | False | True |  | The diffusion MRI met-rics are available from:... |
| 9 | 10.1016/j.neuroimage.2022.119199 | https://civmvoxport.vm.duke.edu upon request | False | True | False | The diffusion MRI met-rics are available from:... |  |
| 17 | 10.1016/j.neuroimage.2022.119360 | https://balsa.wustl.edu/6VjVk | False | False | True |  | https://balsa.wustl.edu/6VjVk. |
| 18 | 10.1016/j.neuroimage.2022.119360 | https://balsa.wustl.edu/7qwq3 | False | False | True |  | https://balsa.wustl.edu/7qwq3. |
| 26 | 10.1016/j.neuroimage.2022.119360 | https://balsa.wustl.edu/nplpK | False | False | True |  | https://balsa.wustl.edu/nplpK. |
| 32 | 10.1016/j.neuroimage.2022.119360 | https://nda.nih.gov/general-query.html?q=query... | False | False | True |  | The raw HCD and HCA data are available in the ... |
| 33 | 10.1016/j.neuroimage.2022.119360 | https://nda.nih.gov/general-query.html?q=query... | False | True | False | The raw HCD and HCA data are available in the ... |  |
| 37 | 10.1016/j.neuroimage.2022.119742 | http://fcon_1000.projects.nitrc.org/indi/retro... | False | True | False | The Yale data used in this study to construct ... |  |
| 38 | 10.1016/j.neuroimage.2022.119742 | http://fcon_1000.projects.nitrc.org/indi/retro... | False | False | True |  | The Yale data used in this study to construct ... |
| 40 | 10.1016/j.neuroimage.2022.119742 | https://github.com/SNeuroble/NBS_benchmarking | False | False | True |  | Code used for bench-marking, inference, and su... |


In [25]:
results_h['URL'][(results_h['DOI'] == '10.1016/j.neuroimage.2022.119360') & (results_h['URL_both'] == False)].values

array(['https://balsa.wustl.edu/6VjVk', 'https://balsa.wustl.edu/7qwq3',
       'https://balsa.wustl.edu/nplpK',
       'https://nda.nih.gov/general-query.html?q=query=featured-datasets:HCP%20Aging%20and%20Development',
       'https://nda.nih.gov/general-query.html?q=query=featured-datasets:HCP\\04520Aging\\04520and\\04520Development'],
      dtype=object)

In [26]:
results_h['URL'][(results_h['DOI'] == '10.1016/j.neuroimage.2022.119742') & (results_h['URL_both'] == False)].values

array(['http://fcon_1000.projects.nitrc.org/indi/retro/.yale_hires.html',
       'http://fcon_1000.projects.nitrc.org/indi/retro/yale_hires.html',
       'https://github.com/SNeuroble/NBS_benchmarking',
       'https://www.nitrc.org/projects/bioimagesuite/'], dtype=object)

<a name='hsentences'></a>
### H-Sentences
I will now look at and compare the extracted sentences.

In [27]:
results_h_sentences = compare_sentences(results_h)
results_h_sentences = results_h_sentences.sort_values(by=['DOI', 'URL'])
results_h_sentences.reset_index(drop=True, inplace=True)

In [28]:
results_h_sentences

|   | DOI | URL | Same | Difference | e_sentences | gt_sentences |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 10.1016/j.neuroimage.2021.118868 | https://github.com/vistalab/vistasoft | True |  | Pre-processed functional data were then analyz... | Pre-processed functional data were then analyz... |
| 1 | 10.1016/j.neuroimage.2022.118960 |  | True |  |  |  |
| 2 | 10.1016/j.neuroimage.2022.118992 |  | True |  |  |  |
| 3 | 10.1016/j.neuroimage.2022.119048 |  | True |  |  |  |
| 4 | 10.1016/j.neuroimage.2022.119077 | http://neuroimage.usc.edu/brainstorm | True |  | Brainstorm is documented and freely available ... | Brainstorm is documented and freely available ... |
| 5 | 10.1016/j.neuroimage.2022.119077 | https://neuroimage.usc.edu/brainstorm/Introduc... | False | - The - data - presented - in - this - manuscr... | The data presented in this manuscript were acq... | The analysis was performed in MATLAB using Bra... |
| 6 | 10.1016/j.neuroimage.2022.119077 | https://sites.google.com/site/bctnet/home | False | - The - data - presented - in - this - manuscr... | The data presented in this manuscript were acq... | The analysis was performed in MATLAB using Bra... |
| 7 | 10.1016/j.neuroimage.2022.119077 | https://www.neurobs.com/ | True |  | The stimuli were presented us-ing NBS Presenta... | The stimuli were presented us-ing NBS Presenta... |
| 8 | 10.1016/j.neuroimage.2022.119199 | https://civmvoxport.vm.duke.edu | False | The diffusion MRI met-rics are available from:... |  | The diffusion MRI met-rics are available from:... |
| 9 | 10.1016/j.neuroimage.2022.119199 | https://civmvoxport.vm.duke.edu upon request | False | The diffusion MRI met-rics are available from:... | The diffusion MRI met-rics are available from:... |  |


In [29]:
# Calculate True Positives (TP), False Positives (FP), and False Negatives (FN)
# TP is the number of correct extractions that match the ground truth 
TP_h_sentence = len(results_h_sentences[results_h_sentences['Same'] == True]) 
# FP is the number of extractions by E1 that do not match the ground truth, i.e., the sentences are not the same and E1 caused the mismatch
FP_h_sentence = len(results_h_sentences[(results_h_sentences['Same'] == False) & (results_h_sentences['Difference'].str.contains(" - "))]) 
# FN is the number of extractions that E1 missed, i.e., the sentences are not the same and GT has text that E1 does not 
FN_h_sentence = len(results_h_sentences[(results_h_sentences['Same'] == False) & (results_h_sentences['Difference'].str.contains(r" \+ "))]) 

h_sentence_counts, h_sentence_metrics = evaluation_metrics(manual_gt['Sentence(s)'], manual_e1['Sentence(s)'], results_h_sentences[results_h_sentences['Same']==True], results_h_sentences[results_h_sentences['Same']==False], TP_h_sentence, FP_h_sentence, FN_h_sentence, 'H-Sentences')
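
The FP and FN counts above depend on the `-` and `+` markers in the joined diff string. The toy example below uses invented sentences and a hypothetical helper `diff_words` that mirrors `highlight_differences`, and shows which diff strings contain the ` - ` and ` + ` patterns. One consequence worth noting is that a marker at the very start of the string has no leading space, so a single extra word at the beginning of a sentence is not matched by ` - `.

```python
import difflib

def diff_words(sentence1, sentence2):
    """Word-level diff joined into one string, as in highlight_differences."""
    return ' '.join(difflib.Differ().compare(sentence1.split(), sentence2.split()))

# First sentence has an extra word in the middle -> ' - ' marker
extra_middle = diff_words("the code is openly available at https://example.org",
                          "the code is available at https://example.org")
# Second sentence has an extra word in the middle -> ' + ' marker
missing_middle = diff_words("data are available at https://example.org",
                            "data are freely available at https://example.org")
# First sentence has one extra word at the very start -> marker without leading space
extra_start = diff_words("Also data at https://example.org",
                         "data at https://example.org")

for label, diff in [("extra_middle", extra_middle),
                    ("missing_middle", missing_middle),
                    ("extra_start", extra_start)]:
    print(label, " - " in diff, " + " in diff)
```

A diff that contains both markers is counted as a false positive and as a false negative, so the two counts are not mutually exclusive.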

In [30]:
h_sentence_counts

|   | H-Sentences | Count |
| --- | --- | --- |
| 0 | Ground Truth | 44 |
| 1 | Extractor | 39 |
| 2 | Both | 15 |
| 3 | Either | 30 |


In [31]:
h_sentence_metrics

|   | Metric | Value |
| --- | --- | --- |
| 0 | Raw Agreement % | 36.14 |
| 1 | Precision | 0.48 |
| 2 | Recall | 0.65 |
| 3 | F1-Score | 0.56 |


In [32]:
different_urls_h_list = different_urls_h['URL'].tolist()

In [33]:
different_h_sentences = results_h_sentences[results_h_sentences['Same']==False]

different_h_sentences_diffurls = different_h_sentences[different_h_sentences['URL'].isin(different_urls_h_list)]
different_h_sentences_excl_diffurls = different_h_sentences[~different_h_sentences['URL'].isin(different_urls_h_list)]

In [34]:
pd.set_option('display.max_colwidth', None)

I want to look at the sentences where E1 and GT extracted the same URL but different sentences: 

In [35]:
different_h_sentences_excl_diffurls[['DOI', 'Difference', 'e_sentences', 'gt_sentences']]

Unnamed: 0,DOI,Difference,e_sentences,gt_sentences
5,10.1016/j.neuroimage.2022.119077,- The - data - presented - in - this - manuscript - were - acquired - as - part - of - an - ongoing - longitudinal - study - of - dyslexia - (2018–2023). - Some - data - derivatives - will - be - made - available - after - completion - of - the - study - by - contacting - the - senior - author - (UG) - of - the - study. The analysis was performed in MATLAB using Brainstorm toolbox https://neuroimage.usc.edu/brainstorm/Introduction and Brain Con-nectivity toolbox https://sites.google.com/site/bctnet/home.,The data presented in this manuscript were acquired as part of an ongoing longitudinal study of dyslexia (2018–2023). Some data derivatives will be made available after completion of the study by contacting the senior author (UG) of the study. The analysis was performed in MATLAB using Brainstorm toolbox https://neuroimage.usc.edu/brainstorm/Introduction and Brain Con-nectivity toolbox https://sites.google.com/site/bctnet/home.,The analysis was performed in MATLAB using Brainstorm toolbox https://neuroimage.usc.edu/brainstorm/Introduction and Brain Con-nectivity toolbox https://sites.google.com/site/bctnet/home.
6,10.1016/j.neuroimage.2022.119077,- The - data - presented - in - this - manuscript - were - acquired - as - part - of - an - ongoing - longitudinal - study - of - dyslexia - (2018–2023). - Some - data - derivatives - will - be - made - available - after - completion - of - the - study - by - contacting - the - senior - author - (UG) - of - the - study. The analysis was performed in MATLAB using Brainstorm toolbox https://neuroimage.usc.edu/brainstorm/Introduction and Brain Con-nectivity toolbox https://sites.google.com/site/bctnet/home.,The data presented in this manuscript were acquired as part of an ongoing longitudinal study of dyslexia (2018–2023). Some data derivatives will be made available after completion of the study by contacting the senior author (UG) of the study. The analysis was performed in MATLAB using Brainstorm toolbox https://neuroimage.usc.edu/brainstorm/Introduction and Brain Con-nectivity toolbox https://sites.google.com/site/bctnet/home.,The analysis was performed in MATLAB using Brainstorm toolbox https://neuroimage.usc.edu/brainstorm/Introduction and Brain Con-nectivity toolbox https://sites.google.com/site/bctnet/home.
12,10.1016/j.neuroimage.2022.119294,"- The + To + facilitate + future + use + of + this + method, + we + provide + a toolbox to run FR-RSA in Python - is - available - via - GitHub - (https://github.com/ViCCo-Group/frrsa). ? ^\n + (https://github.com/ViCCo-Group/frrsa), ? ^\n + with + recommenda-tions + regarding + implementational + choices.",The toolbox to run FR-RSA in Python is available via GitHub (https://github.com/ViCCo-Group/frrsa).,"To facilitate future use of this method, we provide a toolbox to run FR-RSA in Python (https://github.com/ViCCo-Group/frrsa), with recommenda-tions regarding implementational choices."
14,10.1016/j.neuroimage.2022.119360,- Methods - sections - 2.2 - and - 2.3 - describe - the - participants - and - data - and - preprocessing - is - described - in - methods - sections - 2.6 - and - 2.7. https://balsa.wustl.edu/1BgBP.,Methods sections 2.2 and 2.3 describe the participants and data and preprocessing is described in methods sections 2.6 and 2.7. https://balsa.wustl.edu/1BgBP.,https://balsa.wustl.edu/1BgBP.
16,10.1016/j.neuroimage.2022.119360,"- Methods - sections - 2.2 - and - 2.3 - describe - the - participants - and - data, - preprocessing - is - described - in - methods - sections - 2.6 - and - 2.7, - and - the - analyses - are - described - in - methods - sections - 2.10 - and - 2.11. https://balsa.wustl.edu/5XqXP.","Methods sections 2.2 and 2.3 describe the participants and data, preprocessing is described in methods sections 2.6 and 2.7, and the analyses are described in methods sections 2.10 and 2.11. https://balsa.wustl.edu/5XqXP.",https://balsa.wustl.edu/5XqXP.
19,10.1016/j.neuroimage.2022.119360,"- Methods - sections - 2.2 - and - 2.4 - describe - the - participants - and - data, - and - preprocessing - is - described - in - methods - sections - 2.6 - and - 2.8. https://balsa.wustl.edu/B494V.","Methods sections 2.2 and 2.4 describe the participants and data, and preprocessing is described in methods sections 2.6 and 2.8. https://balsa.wustl.edu/B494V.",https://balsa.wustl.edu/B494V.
20,10.1016/j.neuroimage.2022.119360,"- Methods - sections - 2.2 - and - 2.4 - describe - the - participants - and - data, - preprocessing - is - described - in - methods - sections - 2.6 - and - 2.8, - and - the - analyses - are - described - in - methods - sections - 2.10 - and - 2.11. https://balsa.wustl.edu/Mxnx8.","Methods sections 2.2 and 2.4 describe the participants and data, preprocessing is described in methods sections 2.6 and 2.8, and the analyses are described in methods sections 2.10 and 2.11. https://balsa.wustl.edu/Mxnx8.",https://balsa.wustl.edu/Mxnx8.
21,10.1016/j.neuroimage.2022.119360,- Methods - sections - 2.2 - and - 2.3 - describe - the - participants - and - data - and - preprocessing - is - described - in - methods - sections - 2.6 - and - 2.7. https://balsa.wustl.edu/PrBrK.,Methods sections 2.2 and 2.3 describe the participants and data and preprocessing is described in methods sections 2.6 and 2.7. https://balsa.wustl.edu/PrBrK.,https://balsa.wustl.edu/PrBrK.
22,10.1016/j.neuroimage.2022.119360,"- Methods - sections - 2.2 - and - 2.4 - describe - the - participants - and - data, - and - preprocessing - is - described - in - methods - sections - 2.6 - and - 2.8. https://balsa.wustl.edu/g767V.","Methods sections 2.2 and 2.4 describe the participants and data, and preprocessing is described in methods sections 2.6 and 2.8. https://balsa.wustl.edu/g767V.",https://balsa.wustl.edu/g767V.
23,10.1016/j.neuroimage.2022.119360,"- Methods - sections - 2.2 - and - 2.5 - describe - the - participants - and - data, - and - preprocessing - is - described - in - methods - sections - 2.6 - and - 2.9. - https://balsa.wustl. + https://balsa.wustl.edu/jjnjZ. ? ++++++++++\n - edu/jjnjZ.","Methods sections 2.2 and 2.5 describe the participants and data, and preprocessing is described in methods sections 2.6 and 2.9. https://balsa.wustl. edu/jjnjZ.",https://balsa.wustl.edu/jjnjZ.


There are 30 sentences that are different between E1 and GT.
The sentences are different in 11 cases because the URLs extracted are different.
If we exclude the cases where the URLs are different, 19 sentences are extracted differently. 

When looking at sentences, there are two major differences between E1 and GT: 
- E1 extracted one or more sentences before the sentence containing the URL. There are a total of 13 sentences that are different because of this. 
    -  This is the cause for most of the differences in article 10.1016/j.neuroimage.2022.119360, e.g., for URL https://balsa.wustl.edu/1BgBP, E1 extracted: "Methods sections 2.2 and 2.3 describe the participants and data and preprocessing is described in methods sections 2.6 and 2.7. https://balsa.wustl.edu/1BgBP." and GT extracted: "https://balsa.wustl.edu/1BgBP."
    -  This is the cause for differences in article 10.1016/j.neuroimage.2022.119077, where E1 included "The data presented in this manuscript were acquired as part of an ongoing longitudinal study of dyslexia (2018–2023). Some data derivatives will be made available after completion of the study by contacting the senior author (UG) of the study." in addition to what both E1 and GT extracted. 


- E1 did not include all sentences that contained links they had already identified. There are a total of 4 sentences that are different because of this. 
    - This is the cause for some of the differences in article 10.1016/j.neuroimage.2022.119360, e.g., for URL https://balsa.wustl.edu/study/show/mDBP0, where both E1 and GT extracted "The data for this study are available at the BALSA neuroimaging study results database (https://balsa.wustl.edu/study/show/mDBP0; Van Essen et al., 2017), and a link to each figure’s specific data is provided in the legend.", but GT additionally extracted: "The study results data from this manuscript are already available in the BALSA neuroimaging study results database: https://balsa.wustl.edu/study/show/mDBP0."
    - This is the cause for all of the differences in article 10.1016/j.neuroimage.2022.119742, where GT included sentences even if they were exact repetitions of previous sentences. 

In total, 17 sentences are different because E1 and GT followed different extraction practices. E1 stated that they erred on the side of caution and included more text in many cases, resulting in 13 comparisons being different, but they forgot to include all sentences that contained URLs they had encountered before, resulting in 4 comparisons being different. 

11 sentences are different because the URLs are different; in these cases, the comparison is between a sentence and an empty entry. 

The last two instances: 
- E1 and GT found different sentences for a URL, indicating that they both overlooked a sentence in 10.1016/j.neuroimage.2022.119294: 
    - E1: "The toolbox to run FR-RSA in Python is available via GitHub (https://github.com/ViCCo-Group/frrsa)."
    - GT: "To facilitate future use of this method, we provide a toolbox to run FR-RSA in Python (https://github.com/ViCCo-Group/frrsa), with recommenda-tions regarding implementational choices."
- The last instance is in 10.1016/j.neuroimage.2022.119360, where the only difference is a space after "PublicData/" in the last URL as extracted by E1:
    - E1: "The raw HCP-YA data are available in ConnectomeDB (https://db.humanconnectome.org/) or Amazon Public Datasets (https://registry.opendata.aws/hcp-openaccess; https://wiki.humanconnectome.org/display/PublicData/ How+To+Connect+to+Connectome+Data+via+AWS)."
    - GT: "The raw HCP-YA data are available in ConnectomeDB (https://db.humanconnectome.org/) or Amazon Public Datasets (https://registry.opendata.aws/hcp-openaccess; https://wiki.humanconnectome.org/display/PublicData/How+To+Connect+to+Connectome+Data+via+AWS)."
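
A difference of a single whitespace character is easy to miss when reading two long sentences side by side. As a minimal sketch (the helper name `char_level_differences` is hypothetical and not part of the notebook's code), a character-level comparison with `difflib.SequenceMatcher` lists only the spans that differ, and `repr` makes whitespace visible:

```python
import difflib


def char_level_differences(a, b):
    """Return (tag, text_in_a, text_in_b) for every non-equal span between a and b."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# URL fragments copied from the E1 and GT sentences quoted above
e1_fragment = "https://wiki.humanconnectome.org/display/PublicData/ How+To+Connect+to+Connectome+Data+via+AWS"
gt_fragment = "https://wiki.humanconnectome.org/display/PublicData/How+To+Connect+to+Connectome+Data+via+AWS"

for tag, in_e1, in_gt in char_level_differences(e1_fragment, gt_fragment):
    # repr() shows whitespace explicitly, e.g. ' ' versus ''
    print(tag, repr(in_e1), repr(in_gt))
```

For these two fragments, the only reported span is a deleted space in the E1 version.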

<a name='codeevaluation'></a>
## Code evaluation 
I will now compare the results of the urlextract library (this is placed in the variable 'automatic_urls') with the ground truth extraction (this is placed in the variable 'manual_urls'). 

In [36]:
manual_df = pd.concat([manual_together, manual_gt], ignore_index=True)

In [37]:
results_c = compare_extractions(automatic_df, manual_df)
results_c = results_c.sort_values(by=['DOI', 'URL'])
results_c.reset_index(drop=True, inplace=True)

In [38]:
results_c

Unnamed: 0,DOI,URL,URL_both,e_extracted,gt_extracted,e_sentences,gt_sentences
0,10.1016/j.neuroimage.2021.118868,https://github.com/vistalab/vistasoft,True,True,True,Pre-processed functional data were then analyzed in the vistasoft software (https://github.com/vistalab/vistasoft).,Pre-processed functional data were then analyzed in the vistasoft software (https://github.com/vistalab/vistasoft).
1,10.1016/j.neuroimage.2022.118960,,True,True,True,,
2,10.1016/j.neuroimage.2022.118992,,True,True,True,,
3,10.1016/j.neuroimage.2022.119048,,True,True,True,,
4,10.1016/j.neuroimage.2022.119077,http://neuroimage.usc.edu/brainstorm,True,True,True,Brainstorm is documented and freely available for download under GNU general public license (http://neuroimage.usc.edu/brainstorm).,Brainstorm is documented and freely available for download under GNU general public license (http://neuroimage.usc.edu/brainstorm).
5,10.1016/j.neuroimage.2022.119077,https://neuroimage.usc.edu/brainstorm/Introduction,True,True,True,The analysis was performed in MATLAB using Brainstorm toolbox https://neuroimage.usc.edu/brainstorm/Introduction and Brain Con-nectivity toolbox https://sites.google.com/site/bctnet/home.,The analysis was performed in MATLAB using Brainstorm toolbox https://neuroimage.usc.edu/brainstorm/Introduction and Brain Con-nectivity toolbox https://sites.google.com/site/bctnet/home.
6,10.1016/j.neuroimage.2022.119077,https://sites.google.com/site/bctnet/home,True,True,True,The analysis was performed in MATLAB using Brainstorm toolbox https://neuroimage.usc.edu/brainstorm/Introduction and Brain Con-nectivity toolbox https://sites.google.com/site/bctnet/home.,The analysis was performed in MATLAB using Brainstorm toolbox https://neuroimage.usc.edu/brainstorm/Introduction and Brain Con-nectivity toolbox https://sites.google.com/site/bctnet/home.
7,10.1016/j.neuroimage.2022.119077,https://www.neurobs.com/,True,True,True,The stimuli were presented us-ing NBS Presentation software (https://www.neurobs.com/).,The stimuli were presented us-ing NBS Presentation software (https://www.neurobs.com/).
8,10.1016/j.neuroimage.2022.119133,http://www.librow.com/articles/article-13,True,True,True,"R-peaks were then identiﬁed using an online sample software package (http://www.librow.com/articles/article-13 ; Petzschner et al., 2019) and then data were down-sampled to 512 Hz. 3 L.","R-peaks were then identified using an online sample software package (http://www.librow.com/articles/article-13; Petzschner et al., 2019) and then data were down-sampled to 512 Hz."
9,10.1016/j.neuroimage.2022.119199,https://civmvoxport.vm.duke.edu,True,True,True,"The diﬀusion MRI met-rics are available from: https://civmvoxport.vm.duke.edu upon request., The FA, DWI, MD, AD, RD, and NQA maps at dMRI datasets are available through https://civmvoxport.vm.duke.edu upon request.","The diffusion MRI met-rics are available from: https://civmvoxport.vm.duke.edu upon request., The FA, DWI, MD, AD, RD, and NQA maps at dMRI datasets are available through https://civmvoxport.vm.duke.edu upon request."


<a name='curls'></a>
### C-URLs 
First, I will look at the extracted URLs. 

In [39]:
# Calculate True Positives (TP), False Positives (FP), and False Negatives (FN)
TP_c_url = len(results_c[(results_c['e_extracted'] == True) & (results_c['gt_extracted'] == True)])
FP_c_url = len(results_c[(results_c['e_extracted'] == True) & (results_c['gt_extracted'] == False)])
FN_c_url = len(results_c[(results_c['e_extracted'] == False) & (results_c['gt_extracted'] == True)])
    
c_url_counts, c_url_metrics = evaluation_metrics(manual_df['URL'], automatic_df['URL'], results_c[results_c['URL_both']==True], results_c[results_c['URL_both']==False], TP_c_url, FP_c_url, FN_c_url, 'C-URLs')

In [40]:
c_url_counts

Unnamed: 0,C-URLs,Count
0,Ground Truth,51
1,Extractor,48
2,Both,45
3,Either,7


In [41]:
c_url_metrics

Unnamed: 0,Metric,Value
0,Raw Agreement %,90.91
1,Precision,0.94
2,Recall,0.92
3,F1-Score,0.93


The code (C) extracted fewer URLs than the ground truth (GT) set (48 vs. 51). 
- Precision: 94 % of the extractions made by C match GT.
- Recall: C captured 92 % of the information present in the GT data.
- F1-score: the harmonic mean of precision and recall is 0.93.
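
The reported metrics can be checked by hand. The sketch below is not the notebook's `evaluation_metrics` function (which is defined earlier in the notebook); it only applies the standard definitions of precision, recall, and F1. It assumes TP = 45 (the "Both" count), and FP = 3 and FN = 4, read off the `e_extracted` and `gt_extracted` columns of the rows of `results_c` where `URL_both` is False. The raw-agreement formula is a guess that happens to reproduce the reported value, not a confirmed definition.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions; F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Assumed counts for C-URLs (see the lead-in for where they come from)
tp, fp, fn = 45, 3, 4
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.94 0.92 0.93

# Assumed formula for "Raw Agreement %": twice the shared count over the two totals
n_gt, n_extractor, n_both = 51, 48, 45
raw_agreement = 2 * n_both / (n_gt + n_extractor) * 100
print(round(raw_agreement, 2))  # 90.91
```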

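Investigating the URLs that C and GT did not both extract, there are two main tendencies: 
- C will sometimes cut off a URL, which is most likely due to how the PDF is read and parsed: in both cases below, the text read from the PDF has spaces around the '=' in the query string (e.g., 'study_id = phs000607.v3.p2'), and the URL is cut off just before the '='.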
    - In 10.1016/j.neuroimage.2022.119360, C found and extracted the link 'https://nda.nih.gov/general-query.html?q', but the full link is 'https://nda.nih.gov/general-query.html?q=query=featured-datasets:HCP%20Aging%20and%20Development'
    - In 10.1016/j.neuroimage.2022.119742, C found and extracted the link 'https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id', but the full link is 'https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000607.v3.p2'

- C might not recognize all URLs.
    - In 10.1016/j.neuroimage.2022.119294, C did not recognize 'https://osf.io/8weum/'
    - In 10.1016/j.neuroimage.2022.119360, C did not recognize 'https://balsa.wustl.edu/jjnjZ'

C also found a link that was not in GT: 
- In 10.1016/j.neuroimage.2022.119254, C found 'ClinicalTrials.gov'

In [42]:
different_urls_c = results_c[results_c['URL_both'] == False]

In [43]:
different_urls_c

Unnamed: 0,DOI,URL,URL_both,e_extracted,gt_extracted,e_sentences,gt_sentences
12,10.1016/j.neuroimage.2022.119254,ClinicalTrials.gov,False,True,False,"The Whitehall II Imaging Sub-study was supported by the UK Med-ical Research Council (MRC) grants “Predicting MRI abnormalities with longitudinal data of the Whitehall II Sub-study ” (G1001354; PI KPE; ClinicalTrials.gov Identiﬁer: NCT03335696), and “Adult De-terminants of Late Life Depression, Cognitive Decline and Physical Functioning-The Whitehall II Ageing Study ”(MR/K013351/1 ; PI: MK).",
20,10.1016/j.neuroimage.2022.119294,https://osf.io/8weum/,False,False,True,,All result files and analysis scripts pertaining to this study are available via an OSF repos-itory (https://osf.io/8weum/).
30,10.1016/j.neuroimage.2022.119360,https://balsa.wustl.edu/jjnjZ,False,False,True,,https://balsa.wustl.edu/jjnjZ.
39,10.1016/j.neuroimage.2022.119360,https://nda.nih.gov/general-query.html?q,False,True,False,The raw HCD and HCA data are available in the NDA (https://nda.nih.gov/general-query.html?q = query = featured-datasets:HCP%20Aging%20and %20Development).,
40,10.1016/j.neuroimage.2022.119360,https://nda.nih.gov/general-query.html?q=query=featured-datasets:HCP%20Aging%20and%20Development,False,False,True,,The raw HCD and HCA data are available in the NDA (https://nda.nih.gov/general-query.html?q=query=featured-datasets:HCP%20Aging%20and%20Development).
47,10.1016/j.neuroimage.2022.119742,https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id,False,True,False,"Data and code availability statement The data used in this study for inference and benchmarking are open-source: HCP (https://db.humanconnectome.org), HBN (http://fcon_1000.projects.nitrc.org/indi/cmi_healthy_brain_network/sharing.html), and PNC (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id = phs000607.v3.p2)., Data and Code Availability The data used in this study for inference and benchmarking are open-source: HCP (https://db.humanconnectome.org), HBN (http://fcon_1000.projects.nitrc.org/indi/cmi_healthy_brain_network/sharing.html), and PNC (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id = phs000607.v3.p2).",
48,10.1016/j.neuroimage.2022.119742,https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000607.v3.p2,False,False,True,,"The data used in this study for inference and benchmarking are open-source: HCP (https://db.humanconnectome.org), HBN (http://fcon_1000.projects.nitrc.org/indi/cmi_healthy_brain_network/ sharing.html), and PNC (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000607.v3.p2)., The data used in this study for inference and benchmarking are open-source: HCP (https://db.humanconnectome.org), HBN (http://fcon_1000.projects.nitrc.org/indi/cmi_healthy_brain_network/ sharing.html), and PNC (https://www.ncbi.nlm.nih.gov/projects/gap/ cgi-bin/study.cgi?study_id=phs000607.v3.p2)."


<a name='csentences'></a>
### C-Sentences 

In [44]:
results_c_sentences = compare_sentences(results_c)
results_c_sentences = results_c_sentences.sort_values(by=['DOI', 'URL'])
results_c_sentences.reset_index(drop=True, inplace=True)

In [45]:
results_c_sentences 

Unnamed: 0,DOI,URL,Same,Difference,e_sentences,gt_sentences
0,10.1016/j.neuroimage.2021.118868,https://github.com/vistalab/vistasoft,True,,Pre-processed functional data were then analyzed in the vistasoft software (https://github.com/vistalab/vistasoft).,Pre-processed functional data were then analyzed in the vistasoft software (https://github.com/vistalab/vistasoft).
1,10.1016/j.neuroimage.2022.118960,,True,,,
2,10.1016/j.neuroimage.2022.118992,,True,,,
3,10.1016/j.neuroimage.2022.119048,,True,,,
4,10.1016/j.neuroimage.2022.119077,http://neuroimage.usc.edu/brainstorm,True,,Brainstorm is documented and freely available for download under GNU general public license (http://neuroimage.usc.edu/brainstorm).,Brainstorm is documented and freely available for download under GNU general public license (http://neuroimage.usc.edu/brainstorm).
5,10.1016/j.neuroimage.2022.119077,https://neuroimage.usc.edu/brainstorm/Introduction,True,,The analysis was performed in MATLAB using Brainstorm toolbox https://neuroimage.usc.edu/brainstorm/Introduction and Brain Con-nectivity toolbox https://sites.google.com/site/bctnet/home.,The analysis was performed in MATLAB using Brainstorm toolbox https://neuroimage.usc.edu/brainstorm/Introduction and Brain Con-nectivity toolbox https://sites.google.com/site/bctnet/home.
6,10.1016/j.neuroimage.2022.119077,https://sites.google.com/site/bctnet/home,True,,The analysis was performed in MATLAB using Brainstorm toolbox https://neuroimage.usc.edu/brainstorm/Introduction and Brain Con-nectivity toolbox https://sites.google.com/site/bctnet/home.,The analysis was performed in MATLAB using Brainstorm toolbox https://neuroimage.usc.edu/brainstorm/Introduction and Brain Con-nectivity toolbox https://sites.google.com/site/bctnet/home.
7,10.1016/j.neuroimage.2022.119077,https://www.neurobs.com/,True,,The stimuli were presented us-ing NBS Presentation software (https://www.neurobs.com/).,The stimuli were presented us-ing NBS Presentation software (https://www.neurobs.com/).
8,10.1016/j.neuroimage.2022.119133,http://www.librow.com/articles/article-13,False,"R-peaks were then - identiﬁed ? ^\n + identified ? ^^\n using an online sample software package - (http://www.librow.com/articles/article-13 + (http://www.librow.com/articles/article-13; ? +\n - ; Petzschner et al., 2019) and then data were down-sampled to 512 Hz. - 3 - L.","R-peaks were then identiﬁed using an online sample software package (http://www.librow.com/articles/article-13 ; Petzschner et al., 2019) and then data were down-sampled to 512 Hz. 3 L.","R-peaks were then identified using an online sample software package (http://www.librow.com/articles/article-13; Petzschner et al., 2019) and then data were down-sampled to 512 Hz."
9,10.1016/j.neuroimage.2022.119199,https://civmvoxport.vm.duke.edu,False,"The - diﬀusion ? ^\n + diffusion ? ^^\n MRI met-rics are available from: https://civmvoxport.vm.duke.edu upon request., The FA, DWI, MD, AD, RD, and NQA maps at dMRI datasets are available through https://civmvoxport.vm.duke.edu upon request.","The diﬀusion MRI met-rics are available from: https://civmvoxport.vm.duke.edu upon request., The FA, DWI, MD, AD, RD, and NQA maps at dMRI datasets are available through https://civmvoxport.vm.duke.edu upon request.","The diffusion MRI met-rics are available from: https://civmvoxport.vm.duke.edu upon request., The FA, DWI, MD, AD, RD, and NQA maps at dMRI datasets are available through https://civmvoxport.vm.duke.edu upon request."


In [46]:
# Calculate True Positives (TP), False Positives (FP), and False Negatives (FN)
# TP is the number of correct extractions that match the ground truth 
TP_c_sentence = len(results_c_sentences[results_c_sentences['Same'] == True]) 
# FP is the number of extractions by C that do not match the ground truth, i.e., the sentences are not the same and C caused the mismatch (C has text that GT does not)
FP_c_sentence = len(results_c_sentences[(results_c_sentences['Same'] == False) & (results_c_sentences['Difference'].str.contains(" - "))]) 
# FN is the number of extractions that C missed, i.e., the sentences are not the same and GT has text that C does not 
# (raw string so that "\+" is passed to the regex engine as an escaped plus sign)
FN_c_sentence = len(results_c_sentences[(results_c_sentences['Same'] == False) & (results_c_sentences['Difference'].str.contains(r" \+ "))]) 

c_sentence_counts, c_sentence_metrics = evaluation_metrics(manual_df['Sentence(s)'], automatic_df['Sentence(s)'], results_c_sentences[results_c_sentences['Same']==True], results_c_sentences[results_c_sentences['Same']==False], TP_c_sentence, FP_c_sentence, FN_c_sentence, 'C-Sentences')

In [47]:
c_sentence_counts

Unnamed: 0,C-Sentences,Count
0,Ground Truth,51
1,Extractor,48
2,Both,25
3,Either,27


In [48]:
c_sentence_metrics

Unnamed: 0,Metric,Value
0,Raw Agreement %,50.51
1,Precision,0.57
2,Recall,0.64
3,F1-Score,0.6


27 sentences were extracted differently by C compared to GT. 
7 of these are different because C and GT extracted different URLs. 

In [49]:
# List of the URLs that C and GT did not both extract
different_urls_c_list = different_urls_c['URL'].tolist()

In [50]:
different_c_sentences = results_c_sentences[results_c_sentences['Same']==False]

different_c_sentences_diffurls = different_c_sentences[different_c_sentences['URL'].isin(different_urls_c_list)] # df containing sentences where the URLs are not the same 
different_c_sentences_excl_diffurls = different_c_sentences[~different_c_sentences['URL'].isin(different_urls_c_list)]

In [51]:
different_c_sentences_excl_diffurls[['DOI', 'URL', 'Difference', 'e_sentences', 'gt_sentences']]

Unnamed: 0,DOI,URL,Difference,e_sentences,gt_sentences
8,10.1016/j.neuroimage.2022.119133,http://www.librow.com/articles/article-13,"R-peaks were then - identiﬁed ? ^\n + identified ? ^^\n using an online sample software package - (http://www.librow.com/articles/article-13 + (http://www.librow.com/articles/article-13; ? +\n - ; Petzschner et al., 2019) and then data were down-sampled to 512 Hz. - 3 - L.","R-peaks were then identiﬁed using an online sample software package (http://www.librow.com/articles/article-13 ; Petzschner et al., 2019) and then data were down-sampled to 512 Hz. 3 L.","R-peaks were then identified using an online sample software package (http://www.librow.com/articles/article-13; Petzschner et al., 2019) and then data were down-sampled to 512 Hz."
9,10.1016/j.neuroimage.2022.119199,https://civmvoxport.vm.duke.edu,"The - diﬀusion ? ^\n + diffusion ? ^^\n MRI met-rics are available from: https://civmvoxport.vm.duke.edu upon request., The FA, DWI, MD, AD, RD, and NQA maps at dMRI datasets are available through https://civmvoxport.vm.duke.edu upon request.","The diﬀusion MRI met-rics are available from: https://civmvoxport.vm.duke.edu upon request., The FA, DWI, MD, AD, RD, and NQA maps at dMRI datasets are available through https://civmvoxport.vm.duke.edu upon request.","The diffusion MRI met-rics are available from: https://civmvoxport.vm.duke.edu upon request., The FA, DWI, MD, AD, RD, and NQA maps at dMRI datasets are available through https://civmvoxport.vm.duke.edu upon request."
11,10.1016/j.neuroimage.2022.119199,https://www.mrtrix.org/,"To validate the - ﬁber + fiber orientation distribution at each voxel for the brain, the constrained spherical deconvolution (CSD) method provided by MR-Trix3 (https://www.mrtrix.org/) was also performed with a maximum harmonic order of 6.","To validate the ﬁber orientation distribution at each voxel for the brain, the constrained spherical deconvolution (CSD) method provided by MR-Trix3 (https://www.mrtrix.org/) was also performed with a maximum harmonic order of 6.","To validate the fiber orientation distribution at each voxel for the brain, the constrained spherical deconvolution (CSD) method provided by MR-Trix3 (https://www.mrtrix.org/) was also performed with a maximum harmonic order of 6."
14,10.1016/j.neuroimage.2022.119254,https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FDT,"DTIFit (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FDT) was used to generate maps of MD, FA, and RD for each subject. + RD + maps + were + generated + for + each + subject + by + averaging + the + respective + L2 + and + L3 + outputs + generated + by + DTI-Fit.","DTIFit (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FDT) was used to generate maps of MD, FA, and RD for each subject.","DTIFit (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FDT) was used to generate maps of MD, FA, and RD for each subject. RD maps were generated for each subject by averaging the respective L2 and L3 outputs generated by DTI-Fit."
15,10.1016/j.neuroimage.2022.119254,https://github.com/CoBrALab/minc-bpipe-library,"+ In + this + study + we + focus + on + vertex-wise + measures + of + cortical + macro-and + microstructure. T1w images were preprocessed using the minc-bpipe-library (https://github.com/CoBrALab/minc-bpipe-library), including bias - ﬁeld + field correction (Tustison et al., 2010), adaptive non-local means denoising - (Manjón ? ^\n + (Manjón ? ^^\n et al., 2010), head masking and brain extraction (Eskildsen et al., 2012) The resulting bias - ﬁeld + field corrected, head-masked images and brain masks of each subject were input into the CIVET algo-rithm (Ad-Dab’bagh et al., - 2006 + 2006; ? +\n - ; Lerch & Evans, 2005) (version 2.1.0) in order to obtain cortical mid-surfaces and vertex wise measures of cortical thickness (CT) and surface area (SA), describing CT and SA esti-mates at a total of 81924 points across the cortical mid-surface.","T1w images were preprocessed using the minc-bpipe-library (https://github.com/CoBrALab/minc-bpipe-library), including bias ﬁeld correction (Tustison et al., 2010), adaptive non-local means denoising (Manjón et al., 2010), head masking and brain extraction (Eskildsen et al., 2012) The resulting bias ﬁeld corrected, head-masked images and brain masks of each subject were input into the CIVET algo-rithm (Ad-Dab’bagh et al., 2006 ; Lerch & Evans, 2005) (version 2.1.0) in order to obtain cortical mid-surfaces and vertex wise measures of cortical thickness (CT) and surface area (SA), describing CT and SA esti-mates at a total of 81924 points across the cortical mid-surface.","In this study we focus on vertex-wise measures of cortical macro-and microstructure. 
T1w images were preprocessed using the minc-bpipe-library (https://github.com/CoBrALab/minc-bpipe-library), including bias field correction (Tustison et al., 2010), adaptive non-local means denoising (Manjón et al., 2010), head masking and brain extraction (Eskildsen et al., 2012) The resulting bias field corrected, head-masked images and brain masks of each subject were input into the CIVET algo-rithm (Ad-Dab’bagh et al., 2006; Lerch & Evans, 2005) (version 2.1.0) in order to obtain cortical mid-surfaces and vertex wise measures of cortical thickness (CT) and surface area (SA), describing CT and SA esti-mates at a total of 81924 points across the cortical mid-surface."
17,10.1016/j.neuroimage.2022.119254,https://mrc.ukri.org/research/policies-and-guidance-for-researchers/data-sharing/,- Data - and - Code - Availability The study follows Medical Research Council data sharing policies (https://mrc.ukri.org/research/policies-and-guidance-for-researchers/data-sharing/).,Data and Code Availability The study follows Medical Research Council data sharing policies (https://mrc.ukri.org/research/policies-and-guidance-for-researchers/data-sharing/).,The study follows Medical Research Council data sharing policies (https://mrc.ukri.org/research/policies-and-guidance-for-researchers/data-sharing/).
18,10.1016/j.neuroimage.2022.119254,https://portal.dementiasplatform.uk/,"- NeuroImage - 257 - (2022) - 119254 + In + accordance + with + these + guidelines, + data + from + the + Whitehall + II + Study + and + the + Imaging + Sub-study + are + acces-sible + via + a + formal + application + on + the + Dementias + Platform + UK + portal (https://portal.dementiasplatform.uk/).",NeuroImage 257 (2022) 119254 (https://portal.dementiasplatform.uk/).,"In accordance with these guidelines, data from the Whitehall II Study and the Imaging Sub-study are acces-sible via a formal application on the Dementias Platform UK portal (https://portal.dementiasplatform.uk/)."
24,10.1016/j.neuroimage.2022.119360,https://balsa.wustl.edu/6VjVk,https://balsa.wustl.edu/6VjVk. - 11 - M.F.,https://balsa.wustl.edu/6VjVk. 11 M.F.,https://balsa.wustl.edu/6VjVk.
26,10.1016/j.neuroimage.2022.119360,https://balsa.wustl.edu/B494V,https://balsa.wustl.edu/B494V. - 15 - M.F.,https://balsa.wustl.edu/B494V. 15 M.F.,https://balsa.wustl.edu/B494V.
29,10.1016/j.neuroimage.2022.119360,https://balsa.wustl.edu/g767V,https://balsa.wustl.edu/g767V. - 14 - M.F.,https://balsa.wustl.edu/g767V. 14 M.F.,https://balsa.wustl.edu/g767V.


The differences between the sentences extracted by C and GT fall into five categories (there are overlaps in the counts, because some sentences were different for multiple reasons): 
- Extracted sentences contain "invisible" special characters (8)
- C included more text (8), including section titles (4), text without meaning (5), and spaces (2)
- GT contained mistakes (3), including additional sentences (2) and missed sentences (1)
- In one instance, it looks as though the way the PDF was read caused C to include text that is not a part of the main text, but instead a header. 
- In one instance, I cannot see what the difference is, neither by reading the sentences nor looking at the difference. 

---

**"Invisible" characters**
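Some differences come from characters that look almost the same on screen but are encoded differently, e.g., the ligature "ﬁ" in "identiﬁed" (as read from the PDF) versus the two letters "fi" in "identified". The markers "? ^\n" and "? ^^\n" in the quotes below are not part of the sentences; they are guide lines produced by difflib that point to where such character-level differences occur. In the included quotes, the differences between the sentences are highlighted by " - " and " + ", where " - " marks text that is only in e_sentences and " + " marks text that is only in gt_sentences. 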
- In 10.1016/j.neuroimage.2022.119199: "The - diﬀusion ? ^\n + diffusion ? ^^\n MRI met-rics are available from: https://civmvoxport.vm.duke.edu upon request., The FA, DWI, MD, AD, RD, and NQA maps at dMRI datasets are available through https://civmvoxport.vm.duke.edu upon request."
- In 10.1016/j.neuroimage.2022.119254, excluding the first sentence: "(...) T1w images were preprocessed using the minc-bpipe-library (https://github.com/CoBrALab/minc-bpipe-library), including bias - ﬁeld + field correction (Tustison et al., 2010), adaptive non-local means denoising - (Manjón ? ^\n + (Manjón ? ^^\n et al., 2010), head masking and brain extraction (Eskildsen et al., 2012) The resulting bias - ﬁeld + field corrected, head-masked images and brain masks of each subject were input into the CIVET algo-rithm (Ad-Dab’bagh et al., - 2006 + 2006; ? +\n - ; Lerch & Evans, 2005) (version 2.1.0) in order to obtain cortical mid-surfaces and vertex wise measures of cortical thickness (CT) and surface area (SA), describing CT and SA esti-mates at a total of 81924 points across the cortical mid-surface."
- In 10.1016/j.neuroimage.2022.119360: "The data for this study are available at the BALSA neuroimaging study results database - (https://balsa.wustl.edu/study/show/mDBP0 + (https://balsa.wustl.edu/study/show/mDBP0; ? +\n - ; Van Essen et al., 2017), and a link to each - ﬁgure’s ? ^\n + figure’s ? ^^\n - speciﬁc ? ^\n + specific ? ^^\n data is provided in the legend., The study results data from this manuscript are already available in the BALSA neuroimaging study results database: https://balsa.wustl.edu/study/show/mDBP0."
- In 10.1016/j.neuroimage.2022.119360: "The raw HCP-YA data are available in ConnectomeDB (https://db.humanconnectome.org/) or Amazon Public Datasets - (https://registry.opendata.aws/hcp-openaccess + (https://registry.opendata.aws/hcp-openaccess; ? +\n - ; - https://wiki.humanconnectome.org/display/PublicData/How+To+Connect+to+Connectome+Data+via+AWS). + https://wiki.humanconnectome.org/display/PublicData/ + How+To+Connect+to+Connectome+Data+via+AWS)."
- In 10.1016/j.neuroimage.2022.119742: "The Yale data used in this study to construct edge-centric networks are open-source and available here: http://fcon_1000.projects.nitrc.org/indi/retro/yale_hires.html., The Yale data used in this study to construct edge-centric networks are open-source and available here: - http://fcon_1000.projects.nitrc.org/indi/retro/yale_hires.html. + http://fcon_1000.projects.nitrc.org/indi/retro/12yale_hires.html. ? ++\n"
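
Ligatures such as "ﬁ" (U+FB01) and "ﬀ" (U+FB00) have Unicode compatibility equivalents, so one possible way to keep them from counting as differences is to normalize both sentences with NFKC before comparing. The sketch below illustrates this on a shortened sentence. The helper name `normalize_for_comparison` is hypothetical, and the word-level `difflib.ndiff` call is only assumed to resemble how the 'Difference' column is built.

```python
import difflib
import unicodedata


def normalize_for_comparison(text):
    """Map compatibility characters (e.g., ligatures) to their plain equivalents."""
    return unicodedata.normalize("NFKC", text)


# "\ufb01" is the single-character ligature "fi" as it may come out of a PDF
extracted = "R-peaks were then identi\ufb01ed using an online sample software package"
ground_truth = "R-peaks were then identified using an online sample software package"

# Word-level diff: lines starting with "?" are difflib's position hints,
# not characters that occur in either sentence.
for line in difflib.ndiff(extracted.split(), ground_truth.split()):
    print(repr(line))

print(extracted == ground_truth)                    # False
print(normalize_for_comparison(extracted)
      == normalize_for_comparison(ground_truth))    # True
```

NFKC also composes letters with combining accents, which may remove some differences in accented names, but it does not fix hyphenation, stray spaces, or header text.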

**More text**
- **Text without meaning**
    - In 10.1016/j.neuroimage.2022.119133, C included "3 L." in the sentence: "R-peaks were then - identiﬁed ? ^\n + identified ? ^^\n using an online sample software package - (http://www.librow.com/articles/article-13 + (http://www.librow.com/articles/article-13; ? +\n - ; Petzschner et al., 2019) and then data were down-sampled to 512 Hz. - 3 - L."
        - NB! Even without the "3 L.", this sentence would still be different, because the extraction contains additional special characters.
    - In 10.1016/j.neuroimage.2022.119360, C included more text around four links, e.g., "https://balsa.wustl.edu/6VjVk. 11 M.F." instead of "https://balsa.wustl.edu/6VjVk."
-  **Section titles**
    - In 10.1016/j.neuroimage.2022.119254: "- Data - and - Code - Availability The study follows Medical Research Council data sharing policies (https://mrc.ukri.org/research/policies-and-guidance-for-researchers/data-sharing/)."
    - In 10.1016/j.neuroimage.2022.119742: "- Data - and - code - availability - statement The data used in this study for inference and benchmarking are open-source: HCP (https://db.humanconnectome.org), HBN - (http://fcon_1000.projects.nitrc.org/indi/cmi_healthy_brain_network/sharing.html), ? --------------\n + (http://fcon_1000.projects.nitrc.org/indi/cmi_healthy_brain_network/ + sharing.html), and PNC - (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id + (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000607.v3.p2)., ? +++++++++++++++++++\n - = - phs000607.v3.p2)., - Data - and - Code - Availability The data used in this study for inference and benchmarking are open-source: HCP (https://db.humanconnectome.org), HBN - (http://fcon_1000.projects.nitrc.org/indi/cmi_healthy_brain_network/sharing.html), ? --------------\n + (http://fcon_1000.projects.nitrc.org/indi/cmi_healthy_brain_network/ + sharing.html), and PNC - (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id ? --------------------------\n + (https://www.ncbi.nlm.nih.gov/projects/gap/ + cgi-bin/study.cgi?study_id=phs000607.v3.p2). - = - phs000607.v3.p2)."
        - NB! Even without the inclusion of the title, this sentence would still be reported as different because the extraction contains additional special characters.
    - In 10.1016/j.neuroimage.2022.119769: "- Data - availability Python and matlab codes for training the CNN models and analysis are available at https://github.com/brainneuro/Multi-Face-attribution."
- **Spaces**
    - In 10.1016/j.neuroimage.2022.119360, a space between ';' and the text in front of it should've been removed during processing: "The raw HCP-YA data are available in ConnectomeDB (https://db.humanconnectome.org/) or Amazon Public Datasets - (https://registry.opendata.aws/hcp-openaccess + (https://registry.opendata.aws/hcp-openaccess; ? +\n - ; - https://wiki.humanconnectome.org/display/PublicData/How+To+Connect+to+Connectome+Data+via+AWS). + https://wiki.humanconnectome.org/display/PublicData/ + How+To+Connect+to+Connectome+Data+via+AWS)."
    - In 10.1016/j.neuroimage.2022.119360, a space between ';' and the text in front of it should have been removed here as well: "The raw HCP-YA data are available in ConnectomeDB (https://db.humanconnectome.org/) or Amazon Public Datasets - (https://registry.opendata.aws/hcp-openaccess + (https://registry.opendata.aws/hcp-openaccess; ? +\n - ; https://wiki.humanconnectome.org/display/PublicData/How+To+Connect+to+Connectome+Data+via+AWS)."
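
The stray space before ';' described in the list above could be removed during preprocessing with a small regular expression. The following is only a minimal sketch: `strip_space_before_punct` is a hypothetical helper, not a function from 'Code/datasets_v5', and the example string is shortened for illustration.

```python
import re


def strip_space_before_punct(text):
    """Hypothetical helper: drop whitespace that directly precedes ';' or ','."""
    return re.sub(r"\s+([;,])", r"\1", text)


# Shortened, illustrative version of the sentence fragment discussed above
fragment = "(https://registry.opendata.aws/hcp-openaccess ; https://wiki.humanconnectome.org/display/PublicData)"
print(strip_space_before_punct(fragment))
```

URLs themselves contain no whitespace, so this substitution should leave the links untouched and only change the text between them.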

**GT mistakes**
- **Mistakenly included sentences**
    - In 10.1016/j.neuroimage.2022.119254, GT mistakenly included an extra sentence at the end: "DTIFit (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FDT) was used to generate maps of MD, FA, and RD for each subject. + RD + maps + were + generated + for + each + subject + by + averaging + the + respective + L2 + and + L3 + outputs + generated + by + DTI-Fit."
    - In 10.1016/j.neuroimage.2022.119254, GT mistakenly included an extra sentence in the beginning: "+ In + this + study + we + focus + on + vertex-wise + measures + of + cortical + macro-and + microstructure. T1w images were preprocessed using the minc-bpipe-library (https://github.com/CoBrALab/minc-bpipe-library), including bias - ﬁeld + field correction (Tustison et al., 2010), adaptive non-local means denoising - (Manjón ? ^\n + (Manjón ? ^^\n et al., 2010), head masking and brain extraction (Eskildsen et al., 2012) The resulting bias - ﬁeld + field corrected, head-masked images and brain masks of each subject were input into the CIVET algo-rithm (Ad-Dab’bagh et al., - 2006 + 2006; ? +\n - ; Lerch & Evans, 2005) (version 2.1.0) in order to obtain cortical mid-surfaces and vertex wise measures of cortical thickness (CT) and surface area (SA), describing CT and SA esti-mates at a total of 81924 points across the cortical mid-surface."
- **Missed sentences**
    - In 10.1016/j.neuroimage.2022.119360, GT failed to include "The code will be made available as a part of the HCP Pipelines (https://github.com/Washington-University/HCPpipelines) when the various analysis streams presented here (currently existing as 3 separate pipelines) are integrated into a single multi-functional pipeline to make this approach easier for users to use.", but C included this sentence.

**PDF reading**
- In 10.1016/j.neuroimage.2022.119254, C extracted "NeuroImage 257 (2022) 119254 (https://portal.dementiasplatform.uk/).", while the actual sentence is "In accordance with these guidelines, data from the Whitehall II Study and the Imaging Sub-study are acces-sible via a formal application on the Dementias Platform UK portal (https://portal.dementiasplatform.uk/)."

**Unsure**
- In some instances, I cannot see why the sentences are reported as being different, e.g.,
    - In 10.1016/j.neuroimage.2022.119199: "To validate the - ﬁber + fiber orientation distribution at each voxel for the brain, the constrained spherical deconvolution (CSD) method provided by MR-Trix3 (https://www.mrtrix.org/) was also performed with a maximum harmonic order of 6."
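
One possible explanation for the example above is that the extracted text contains the single-character ligature 'ﬁ' (U+FB01), which looks like 'fi' when rendered but is a different code point, so a string comparison reports a difference. The sketch below shows a word-level `difflib.ndiff` on shortened example strings (an assumption about how the diffs above were produced) and how Unicode NFKC normalization folds the ligature into plain 'fi'. Whether the notebook's preprocessing should apply this normalization is a separate decision.

```python
import difflib
import unicodedata

# Shortened example strings; the second one contains the ligature U+FB01
gt_sentence = "To validate the fiber orientation distribution"
extracted_sentence = "To validate the \ufb01ber orientation distribution"

# Word-level diff: keep only the tokens that differ
diff = [
    token
    for token in difflib.ndiff(extracted_sentence.split(), gt_sentence.split())
    if not token.startswith("  ")
]
print(diff)

# NFKC normalization replaces compatibility characters such as ligatures
normalized = unicodedata.normalize("NFKC", extracted_sentence)
print(normalized == gt_sentence)
```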

<a name='overview'></a>
# Overview 
The final results compare the ground truth extraction with the human extractor (H tables) and with the code (C tables).

In [52]:
from IPython.display import display, HTML

In [53]:
html = '<div style="display: flex; justify-content: space-between;">'
html += h_url_counts.to_html() + h_sentence_counts.to_html() + c_url_counts.to_html() + c_sentence_counts.to_html()
html += '</div>'

display(HTML(html))

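| | H-URLs | Count |
|---|---|---|
| 0 | Ground Truth | 44 |
| 1 | Extractor | 39 |
| 2 | Both | 34 |
| 3 | Either | 11 |

| | H-Sentences | Count |
|---|---|---|
| 0 | Ground Truth | 44 |
| 1 | Extractor | 39 |
| 2 | Both | 15 |
| 3 | Either | 30 |

| | C-URLs | Count |
|---|---|---|
| 0 | Ground Truth | 51 |
| 1 | Extractor | 48 |
| 2 | Both | 45 |
| 3 | Either | 7 |

| | C-Sentences | Count |
|---|---|---|
| 0 | Ground Truth | 51 |
| 1 | Extractor | 48 |
| 2 | Both | 25 |
| 3 | Either | 27 |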


In [54]:
html = '<div style="display: flex; justify-content: space-between;">'
html += h_url_metrics.to_html() + h_sentence_metrics.to_html() + c_url_metrics.to_html() + c_sentence_metrics.to_html()
html += '</div>'

display(HTML(html))

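H-URLs (`h_url_metrics`)

| | Metric | Value |
|---|---|---|
| 0 | Raw Agreement % | 81.93 |
| 1 | Precision | 0.92 |
| 2 | Recall | 0.81 |
| 3 | F1-Score | 0.86 |

H-Sentences (`h_sentence_metrics`)

| | Metric | Value |
|---|---|---|
| 0 | Raw Agreement % | 36.14 |
| 1 | Precision | 0.48 |
| 2 | Recall | 0.65 |
| 3 | F1-Score | 0.56 |

C-URLs (`c_url_metrics`)

| | Metric | Value |
|---|---|---|
| 0 | Raw Agreement % | 90.91 |
| 1 | Precision | 0.94 |
| 2 | Recall | 0.92 |
| 3 | F1-Score | 0.93 |

C-Sentences (`c_sentence_metrics`)

| | Metric | Value |
|---|---|---|
| 0 | Raw Agreement % | 50.51 |
| 1 | Precision | 0.57 |
| 2 | Recall | 0.64 |
| 3 | F1-Score | 0.6 |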
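
As a sanity check, the raw agreement percentages can be recomputed from the count tables. The sketch below assumes that raw agreement is 2 × Both / (Ground Truth + Extractor); this formula is inferred from the reported numbers, which it reproduces, and is not copied from the notebook's metrics code. The function names `raw_agreement` and `f1_score` are hypothetical. The F1-score is the standard harmonic mean of precision and recall.

```python
def raw_agreement(n_ground_truth, n_extractor, n_both):
    """Assumed definition: percentage of all extracted items that both sides share."""
    return round(100 * 2 * n_both / (n_ground_truth + n_extractor), 2)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (Ground Truth, Extractor, Both) taken from the count tables above
counts = {
    "H-URLs": (44, 39, 34),
    "H-Sentences": (44, 39, 15),
    "C-URLs": (51, 48, 45),
    "C-Sentences": (51, 48, 25),
}

for name, (n_gt, n_ex, n_both) in counts.items():
    print(name, raw_agreement(n_gt, n_ex, n_both))

# F1 for H-URLs from the rounded precision and recall in the table
print(round(f1_score(0.92, 0.81), 2))
```

Because the tables report precision and recall rounded to two decimals, an F1-score recomputed from them can differ from the reported value in the last digit.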


<a name='references'></a>
# References 

- Artstein, R. (2017). Inter-annotator Agreement. In N. Ide & J. Pustejovsky (Eds.), Handbook of Linguistic Annotation (pp. 297–313). Springer Netherlands. https://doi.org/10.1007/978-94-024-0881-2_11
- Géron, A. (2019). Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd ed.). O’Reilly Media, Inc.
