# LLM Task Extraction
**Purpose:**
- Extracts cognitive tasks using an LLM (GPT-4o-mini via OpenRouter)
- Validates LLM annotations against original articles
- Compares annotations from different sources (LLM, LabelBuddy, NeuroVault)
- Exports task combinations to CSV files

**Data dependencies:**

The notebook works with the following data:
- Input annotations are read from `data/neurovault_labeled_papers/`
- Generated output pickle files are written to `data/` with a timestamp prefix for versioning

In [1]:
import os
import sys
notebook_path = os.getcwd()  # Get the current working directory
project_root = os.path.dirname(notebook_path)  # Get the parent directory
sys.path.append(project_root)
os.chdir(project_root)
sys.path.append(project_root + '/src')
sys.path = list(set(sys.path))


In [2]:
import json
from src.lb_annotation_utils import Article
from src.llm_utils import system_prompts, estimate_token_cost
import re
from openai import OpenAI
client = OpenAI(
            base_url="https://openrouter.ai/api/v1",
            api_key=os.getenv("OPENROUTER_API_KEY"))

4e-05


## 1. Data Loading and Preprocessing

**Summary:**
1. Load article annotations from LabelBuddy
2. Filter annotated articles based on exclusion criteria:
    - Resting state studies
    - Meta-analyses
    - Articles without explicit methods sections


In [3]:
# Define constants
MAP_FOLDER = "data/neurovault_labeled_papers/"
INDEX_FILE = MAP_FOLDER + "nv_labeled_tasks_pmcid_map.json"
TASK_FILE_PATTERN = MAP_FOLDER + "nv_labeled_tasks_{chunk_number}.json"
LB_ANNOTATION = os.getenv("LB_ANNOTATION")

In [4]:
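# Load the LabelBuddy annotation export (path taken from the LB_ANNOTATION environment variable)
with open(LB_ANNOTATION) as f:
    ann = json.load(f)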

In [5]:
# Set up filter logic
# exclusion criteria
exclude_types = ["resting_state", "meta-analysis"]
# excluded articles
excluded_articles = []
# included articles
included_articles = []  
# extract methods
for i in range(len(ann)):
    article_obj  = Article(ann[i])
    if article_obj.type in exclude_types:
        excluded_articles.append({"pmcid": article_obj.pmcid, "exclusion_reason":{"type": article_obj.type}})
    elif article_obj.methods == "No explicit methods section":
        excluded_articles.append({"pmcid": article_obj.pmcid, "exclusion_reason":{"methods": "No explicit methods section"}})
    else:
        included_articles.append(article_obj)


Title: 
# Title

The role of the hippocampus in generalizing configural relationships	No subheaders found in body section

Title: 
# Title

A Coordinate-Based Meta-Analysis of Overlaps in Regional Specialization and Functional Connectivity across Subjective Value and Default Mode Networks	No subheaders found in body section



In [6]:
excluded_articles

[{'pmcid': 5324609,
  'exclusion_reason': {'methods': 'No explicit methods section'}},
 {'pmcid': 5090046, 'exclusion_reason': {'type': 'resting_state'}},
 {'pmcid': 10634720, 'exclusion_reason': {'type': 'meta-analysis'}},
 {'pmcid': 3445793, 'exclusion_reason': {'type': 'meta-analysis'}},
 {'pmcid': 5243799, 'exclusion_reason': {'type': 'meta-analysis'}}]

In [7]:
included_articles

[<src.lb_annotation_utils.Article at 0x7f42ec3a5a50>,
 <src.lb_annotation_utils.Article at 0x7f42ec3a6b30>,
 <src.lb_annotation_utils.Article at 0x7f42ec3a51b0>,
 <src.lb_annotation_utils.Article at 0x7f42ec3a5d80>,
 <src.lb_annotation_utils.Article at 0x7f42ec3a5db0>,
 <src.lb_annotation_utils.Article at 0x7f42ec3a6cb0>,
 <src.lb_annotation_utils.Article at 0x7f4308bd7880>,
 <src.lb_annotation_utils.Article at 0x7f4308bd7e80>,
 <src.lb_annotation_utils.Article at 0x7f4308bd7910>,
 <src.lb_annotation_utils.Article at 0x7f4308bd6cb0>,
 <src.lb_annotation_utils.Article at 0x7f4308bd76a0>,
 <src.lb_annotation_utils.Article at 0x7f4308255540>,
 <src.lb_annotation_utils.Article at 0x7f43082546a0>,
 <src.lb_annotation_utils.Article at 0x7f4308254040>,
 <src.lb_annotation_utils.Article at 0x7f4308264c40>,
 <src.lb_annotation_utils.Article at 0x7f4308264be0>,
 <src.lb_annotation_utils.Article at 0x7f42ec3a59f0>,
 <src.lb_annotation_utils.Article at 0x7f4308264bb0>,
 <src.lb_annotation_utils.Ar

## 2. LLM Task Extraction

**Summary:**
1. Estimate token cost for LLM task extraction
2. Extract both task names and task descriptions using the OpenRouter API with the GPT-4o-mini model
    - Export the included articles with LLM annotations as a pickle file
3. Validate extracted tasks against the original article text

### 2.1 Estimate token cost

In [8]:
total_cost = 0
for article_obj in included_articles:
    # Articles without a methods section were excluded earlier; the check is kept as a guard
    if article_obj.methods != "No explicit methods section":
        total_cost += estimate_token_cost(article_obj.methods, "gpt-4o-mini")
print(total_cost)


0.5548899999999999
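
The helper `estimate_token_cost` is imported from `src.llm_utils`, and its implementation is not shown in this notebook. As a rough illustration of how such an estimate can be formed, the sketch below uses a character-count heuristic. The function name `rough_token_cost`, the four-characters-per-token ratio, and the placeholder price are all assumptions made for this example, not values taken from the project or from any provider.

```python
import math

def rough_token_cost(text, price_per_million_tokens, chars_per_token=4.0):
    """Approximate the input cost of sending `text` to a model.

    Both the characters-per-token ratio and the price are caller-supplied
    assumptions, not values taken from the notebook or from a provider.
    """
    estimated_tokens = math.ceil(len(text) / chars_per_token)
    return estimated_tokens * price_per_million_tokens / 1_000_000

# Stand-ins for Article.methods strings
methods_texts = ["a" * 4000, "b" * 2000]
# Hypothetical price per million input tokens, used only to make the example run
PLACEHOLDER_PRICE = 1.0

total = sum(rough_token_cost(t, PLACEHOLDER_PRICE) for t in methods_texts)
print(total)
```

A tokenizer-based count would be more accurate than a character heuristic; the point here is only that the total is a sum of per-article estimates.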


### 2.2 Extract tasks using LLM

In [9]:
def extract_using_llm(text:str, client:OpenAI, messages:list):
    response = client.chat.completions.create(
        model = 'openai/gpt-4o-mini',
        messages = messages,
        seed = 42, 
        response_format = {"type": "json_object"}
    )
    output = response.choices[0].message.content
    # check output format
    assert isinstance(output, str), "Output is not a string"
    assert output.startswith('{') and output.endswith('}'), "Output is not a valid JSON string"
    return output

def generate_messages(text:str, system_prompt:str):
    messages = [
        {"role": "system", "content": f"{system_prompt}"},
        {"role": "user", "content": f"{text}"},
    ]
    return messages



#### Export included articles with LLM annotations

This section exports the included articles with LLM annotations as a pickle file so that the LLM annotations can be re-used for future steps. This is useful for reproducibility across multiple LLM extraction runs.

In [10]:
# Extract cognitive tasks AND task descriptions
for i in range(len(included_articles)):
#for i in range(1):
    methods_text = included_articles[i].methods
    # Extract cognitive tasks
    for k,v in system_prompts.items():
        messages = generate_messages(text = methods_text, system_prompt = v)
        output = extract_using_llm(text = methods_text, client = client, messages = messages)
        output_dict = json.loads(output)

        try: 
            [(key, value)] = output_dict.items()  
            included_articles[i].llm_annotations[k] = value
        except ValueError:
            print(f"Expected exactly one key-value pair, but got {len(output_dict)} items")

In [11]:
# export included_articles as pickle
import pickle as pkl
import datetime
YYYYMMDD_HHMMSS = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")

with open(f'data/{YYYYMMDD_HHMMSS}_included_articles.pkl', 'wb') as f:
        pkl.dump(included_articles, f)


In [12]:
import os
from pathlib import Path
import pickle as pkl

# Find most recent included_articles file
data_dir = Path('data')
latest_file = max(data_dir.glob('*_included_articles.pkl'))

# Load the file
with open(latest_file, 'rb') as f:
    included_articles = pkl.load(f)

### 2.3 Validate LLM annotations
This section validates the LLM annotations against the original article text. It checks if the extracted tasks match the original article text and identifies any mismatched tasks.

Tasks without an exact match in the original article text are identified and moved to the `Article.llm_annotations` dictionary under the `mismatched_tasks` key.

Tasks that are matched to the original article text stay in the `Article.llm_annotations` dictionary under the original key.
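
The matching rule can be tried on a small example before running it over all articles. The sketch below is a standalone illustration; the helper name `split_by_exact_match` and the sample text are made up for this example. It applies the same case-insensitive, word-boundary search as the validation cell.

```python
import re

def split_by_exact_match(task_names, methods_text):
    """Split task names into those found verbatim in the text and those not found."""
    text = methods_text.lower()
    matched, mismatched = [], []
    for name in task_names:
        if name is None:
            continue
        # \b anchors the match at word boundaries; re.escape protects
        # characters such as "-" or "(" that are special in regexes
        pattern = r"\b" + re.escape(name.lower()) + r"\b"
        if re.search(pattern, text):
            matched.append(name)
        else:
            mismatched.append(name)
    return matched, mismatched

methods = "Participants performed a word repetition task and an N-back task."
matched, mismatched = split_by_exact_match(
    ["word repetition", "n-back", "Stroop task"], methods
)
print(matched)
print(mismatched)
```

One limitation of this rule: `\b` only matches next to a letter, digit, or underscore. A task name that ends in punctuation, for example a closing parenthesis, is reported as mismatched when it is followed by a space in the text, even though the string is present.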

In [13]:
# verify Article.llm_annotations is from original Article.methods
for i in range(len(included_articles)):
    print(included_articles[i].pmcid)
    methods = included_articles[i].methods.lower()
    # match task names in Article.llm_annotations to Article.methods
    mismatched_tasks = {'mismatched_tasks': {}}
    for k,v in included_articles[i].llm_annotations.items():
        if "TaskName" in k:
            for name in v:
                if name is None:
                    continue
                pattern = r'\b' + re.escape(name.lower()) + r'\b'
                if re.search(pattern, methods):
                    continue
                else:
                    # add to list of mismatched tasks
                    if k not in mismatched_tasks['mismatched_tasks'].keys():
                        mismatched_tasks['mismatched_tasks'][k] = []
                    mismatched_tasks['mismatched_tasks'][k].append(name)
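    if len(mismatched_tasks['mismatched_tasks']) > 0:
        # store the {prompt_key: [names]} mapping directly under 'mismatched_tasks' (avoids double nesting)
        included_articles[i].llm_annotations['mismatched_tasks'] = mismatched_tasks['mismatched_tasks']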
        # remove mismatched tasks from Article.llm_annotations
        for k,v in mismatched_tasks['mismatched_tasks'].items():
            included_articles[i].llm_annotations[k] = [task for task in included_articles[i].llm_annotations[k] if task not in v]   
    print(included_articles[i].llm_annotations)
    print("\n")


8318202
{'TaskName_SimplePrompt': ['attentive listening', 'word repetition'], 'TaskName_DetailedPrompt': ['attentive listening', 'word repetition'], 'TaskDescription': ['Attentive listening task where participants listened to a sequence of auditory sounds, including words, silence, and noise, while focusing on a fixation cross.', 'Word repetition task where participants listened to words and repeated them aloud following the stimulus presentation.']}


9202476
{'TaskName_SimplePrompt': ['match-to-sample (MTS) task', 'resting-state fMRI'], 'TaskName_DetailedPrompt': [], 'TaskDescription': ['Subjects performed the match-to-sample (MTS) task of the Cambridge Neuropsychological Test Automated Battery (CANTAB) for visual search and attention, where they had to identify a pattern that matched a previously shown complex figure among multiple peripheral patterns. The task involved varying the number of patterns from two to eight across trials, with performance measured in terms of total correc

## 3. Compare LLM annotations with NeuroVault and LabelBuddy

This section combines annotations from multiple sources:
- LLM-extracted tasks
- LabelBuddy annotations
- NeuroVault labeled tasks

**Summary:**
1. Add NeuroVault task annotations to included articles
2. Compare LLM annotations with LabelBuddy annotations
3. Create a CSV table of task combinations

### 3.1 Add NeuroVault task annotations to included articles
This section adds the NeuroVault task annotations to the included articles. It works with the pickle file created in the previous section.
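
The lookup relies on a two-level layout: an index file that maps an ID to the chunk files containing its tasks, and chunk files that each hold a list of single-key records. The sketch below reproduces that layout with toy data in a temporary directory. The file names, IDs, and record contents are invented for illustration, and `lookup_tasks` is a simplified stand-in that returns the record values rather than the full records.

```python
import json
import tempfile
from pathlib import Path

def lookup_tasks(pmid, index_file):
    """Return the task records stored under `pmid` in the chunk files listed by the index."""
    query = str(pmid)
    with open(index_file) as f:
        index = json.load(f)
    records = []
    # IDs missing from the index simply yield an empty list
    for chunk_path in index.get(query, []):
        with open(chunk_path) as f:
            for record in json.load(f):
                if query in record:
                    records.append(record[query])
    return records

with tempfile.TemporaryDirectory() as tmp:
    # Toy chunk file: a list of {id: annotation} records
    chunk = Path(tmp) / "chunk_0.json"
    chunk.write_text(json.dumps([
        {"111": {"task": {"name": "toy task"}}},
        {"222": {"task": {"name": "other task"}}},
    ]))
    # Toy index file: id -> list of chunk files that mention it
    index_path = Path(tmp) / "index.json"
    index_path.write_text(json.dumps({"111": [str(chunk)], "222": [str(chunk)]}))

    found = lookup_tasks(111, index_path)
    missing = lookup_tasks(999, index_path)

print(found)
print(missing)
```

Returning an empty list for unknown IDs avoids relying on an exception to detect articles without NeuroVault annotations.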



In [14]:
def get_nv_labeled_tasks(pmid, index_file = INDEX_FILE,json_file_pattern = TASK_FILE_PATTERN):
    nv_labeled_tasks = []
    query_pmid = str(pmid)
    with open(index_file, "r") as f:
        indexer = json.load(f)
        task_location = indexer[query_pmid]
        print([query_pmid, task_location])
        for file in task_location:
            with open(file, "r") as ff:
                extracted_tasks = json.load(ff)
                for task in extracted_tasks:
                    if list(task.keys())[0] == query_pmid:
                        nv_labeled_tasks.append(task)
    return nv_labeled_tasks
# Attach NeuroVault annotations to each included article
for i in range(len(included_articles)):
    try:
        nv_labeled_tasks = get_nv_labeled_tasks(pmid = included_articles[i].pmid)
        # avoid shadowing the built-in id()
        pmid_key = list(nv_labeled_tasks[0].keys())[0]
        included_articles[i].nv_annotations = nv_labeled_tasks[0][pmid_key]
    except Exception:
        print(f"No NV labeled tasks found for {included_articles[i].pmid}")
        continue

print(included_articles[1].pmid)
print(included_articles[1].doi)
nv_labeled_tasks = get_nv_labeled_tasks(pmid = included_articles[1].pmid)
print(nv_labeled_tasks)
print("\n")
print(included_articles[0].pmid)
print(included_articles[0].doi)
nv_labeled_tasks = get_nv_labeled_tasks(pmid = included_articles[0].pmid)
print(nv_labeled_tasks)

['34327333', ['data/neurovault_labeled_papers/nv_labeled_tasks_5.json']]
['35720693', ['data/neurovault_labeled_papers/nv_labeled_tasks_0.json']]
No NV labeled tasks found for 25994551
['30297824', ['data/neurovault_labeled_papers/nv_labeled_tasks_0.json']]
No NV labeled tasks found for 34744973
['36092648', ['data/neurovault_labeled_papers/nv_labeled_tasks_1.json']]
['28948083', ['data/neurovault_labeled_papers/nv_labeled_tasks_11.json']]
No NV labeled tasks found for 28441460
No NV labeled tasks found for 37922608
['35504565', ['data/neurovault_labeled_papers/nv_labeled_tasks_2.json']]
No NV labeled tasks found for 33093488
No NV labeled tasks found for 37756616
No NV labeled tasks found for 30405475
['27230218', ['data/neurovault_labeled_papers/nv_labeled_tasks_0.json']]
No NV labeled tasks found for 32320123
['32116608', ['data/neurovault_labeled_papers/nv_labeled_tasks_11.json']]
['17973998', ['data/neurovault_labeled_papers/nv_labeled_tasks_5.json']]
['32701449', ['data/neurovaul

### 3.2 Compare LLM annotations with LabelBuddy annotations

This section compares the LLM annotations with the LabelBuddy annotations then creates a table of task combinations between the LLM and LabelBuddy annotations.

Each article has a list of LLM annotations containing multiple task names and LabelBuddy annotations containing multiple task names and unsure flags.

The process is as follows for each article:
1. Match each LLM task name to a LabelBuddy task name
2. If the LLM task name is an exact match:
    1. Create a tuple of the LLM task name, LabelBuddy task name, and unsure flag. 
        - Example format: `(LLM_task_name, LabelBuddy_task_name, unsure_flag)`
        - Example: `(TaskA, TaskA, 1)`
        - `unsure_flag` is 1 if the LabelBuddy task name is in the LabelBuddy unsure annotation, otherwise 0
    2. Add the tuple to the `task_combinations` list
    3. Remove the exact match from the list of LabelBuddy task names before matching the next LLM task name.
3. Repeat Step 2 for all LLM task names
4. For the remaining LLM task names, create a tuple of the LLM task name and list of unmatched LabelBuddy task names. Add the tuple to the `task_combinations` list.
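
The steps above can be sketched as a small function. This is an illustration of the described procedure under the assumption that "exact match" means plain string equality. The name `match_tasks` is made up for this example, and it is separate from the cells in this section, which build all pairwise combinations with `itertools.product`.

```python
def match_tasks(llm_tasks, lb_tasks, lb_unsure):
    """Pair LLM task names with LabelBuddy task names by exact string equality."""
    remaining_lb = list(lb_tasks)      # copy so the caller's list is not modified
    unsure = set(lb_unsure or [])
    task_combinations = []
    unmatched_llm = []

    for llm_name in llm_tasks:
        if llm_name in remaining_lb:
            # Step 2: exact match -> (LLM name, LabelBuddy name, unsure flag)
            flag = 1 if llm_name in unsure else 0
            task_combinations.append((llm_name, llm_name, flag))
            remaining_lb.remove(llm_name)
        else:
            unmatched_llm.append(llm_name)

    # Step 4: pair each unmatched LLM name with the leftover LabelBuddy names
    for llm_name in unmatched_llm:
        task_combinations.append((llm_name, list(remaining_lb)))
    return task_combinations

combos = match_tasks(
    llm_tasks=["TaskA", "TaskB"],
    lb_tasks=["TaskA", "TaskC", "TaskD"],
    lb_unsure=["TaskA"],
)
print(combos)
# [('TaskA', 'TaskA', 1), ('TaskB', ['TaskC', 'TaskD'])]
```

Because the comparison is case-sensitive, names that differ only in capitalization are treated as unmatched. Lowercasing both sides before comparing is one possible variation.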


In [15]:
# create table of cognitive tasks and descriptions
def get_value(obj, *keys, default=None):
    """ get nested dictionary values."""
    try:
        result = obj
        for key in keys:
            result = result[key]
        return result
    except (KeyError, TypeError):
        return default
    
def get_task_data(article):
    """Extract task data from a single article."""
    return {
        'pmcid': article.pmcid,
        'llm_task': get_value(article.llm_annotations, 'TaskName_SimplePrompt'),
        'lb_task': get_value(article.lb_annotations, 'annotations', 'TaskName'),
        'lb_unsure': get_value(article.lb_annotations, 'annotations', 'Unsure')
    }
# columns: pmcid, llm_task, lb_task, lb_unsure
for i in range(len(included_articles)):
    pmcid = included_articles[i].pmcid
    llm_task = get_value(included_articles[i].llm_annotations, 'TaskName_SimplePrompt')
    lb_task = get_value(included_articles[i].lb_annotations, 'annotations', 'TaskName')
    if lb_task is not None:
        lb_task = [task[0] for task in lb_task]
    lb_unsure = get_value(included_articles[i].lb_annotations, 'annotations', 'Unsure')
    if lb_unsure is not None:
        lb_unsure = [task[0] for task in lb_unsure]


    # create tuple of lb_task and lb_unsure (task_name, is_unsure_flag)
    if lb_task is not None and lb_unsure is not None:
        lb_task = [(task, 1 if task in lb_unsure else 0) for task in lb_task]
    if lb_task is not None and lb_unsure is None:
        lb_task = [(task, 0) for task in lb_task]
    # create combination of llm_task and lb_task as tuple using itertools
    from itertools import product
    if llm_task is not None and lb_task is not None:
        llm_lb_combinations = list(product(llm_task, lb_task))
    else:
        llm_lb_combinations = None
    #print(llm_lb_combinations)
    if llm_lb_combinations is not None: 
        # flatten nested tuple: (llm, (lb, flag)) -> (llm, lb, flag)
        # use j so the outer article index i is not shadowed
        for j in range(len(llm_lb_combinations)):
            try:
                combination = llm_lb_combinations[j] 
                lb_task, lb_unsure_flag = combination[1]
                llm_task = combination[0]
                llm_lb_combinations[j] = (llm_task, lb_task, lb_unsure_flag)
            except ValueError:
                print(combination)
    print(llm_lb_combinations)

# lb_unsure: lb_annotation label is unsure

[('attentive listening', 'attentive listening', 0), ('attentive listening', 'word repetition', 0), ('word repetition', 'attentive listening', 0), ('word repetition', 'word repetition', 0)]
None
[('matching task', 'matching task', 1), ('matching task', 'emotional matching task', 1), ('emotional matching task', 'matching task', 1), ('emotional matching task', 'emotional matching task', 1), ('Faces – Forms', 'matching task', 1), ('Faces – Forms', 'emotional matching task', 1), ('IAPS Pictures – Forms', 'matching task', 1), ('IAPS Pictures – Forms', 'emotional matching task', 1)]
[('reminiscence task', 'personal semantics', 1), ('reminiscence task', 'autobiographical reminiscence', 1), ('week discrimination task', 'personal semantics', 1), ('week discrimination task', 'autobiographical reminiscence', 1)]
None
[('Food Cue Reactivity Task', 'food cue reactivity task', 1)]
None
[('story-reading phase', 'Social Norm Processing Task', 0), ('story-reading phase', 'story-reading phase', 1), ('rat

In [16]:
# create table of cognitive tasks and descriptions
    
def get_task_data(article, llm_task_key = 'TaskName_SimplePrompt'):
    """Extract task data from a single article."""

    def get_value(obj, *keys, default=None):
        """ get nested dictionary values."""
        try:
            result = obj
            for key in keys:
                result = result[key]
            return result
        except (KeyError, TypeError):
            return default

    return {
        'pmcid': article.pmcid,
        'llm_task': get_value(article.llm_annotations, llm_task_key),
        'lb_task': get_value(article.lb_annotations, 'annotations', 'TaskName'),
        'lb_unsure': get_value(article.lb_annotations, 'annotations', 'Unsure')
    }

def process_lb_tasks(lb_task, lb_unsure):
    """Process LabelBuddy tasks and unsure flags into tuples."""
    if lb_task is None:
        return None
        
    # Extract first elements from task lists
    lb_task = [task[0] for task in lb_task] if lb_task else None
    lb_unsure = [task[0] for task in lb_unsure] if lb_unsure else None
    
    # Create tuples with unsure flags
    if lb_task and lb_unsure:
        return [(task, 1 if task in lb_unsure else 0) for task in lb_task]
    elif lb_task:
        return [(task, 0) for task in lb_task]
    return None

from itertools import product

def create_task_combinations(llm_task, lb_task):
    """Create combinations of LLM and LabelBuddy tasks."""
    if not (llm_task and lb_task):
        return None

    # lb_task holds (task_name, unsure_flag) tuples; flatten each pair into a triple
    return [
        (llm_name, lb_name, lb_unsure_flag)
        for llm_name, (lb_name, lb_unsure_flag) in product(llm_task, lb_task)
    ]




This section creates a table of task combinations between the LLM and LabelBuddy annotations from the included articles. One CSV file is written per system prompt (`TaskName_SimplePrompt` and `TaskName_DetailedPrompt`), each containing the combinations for the LLM output produced with that prompt.

In [17]:
import pandas as pd

# Main processing loop

for llm_task_key in ['TaskName_SimplePrompt', 'TaskName_DetailedPrompt']:
    task_combinations = []
    for article in included_articles:
        # Get raw data
        data = get_task_data(article, llm_task_key = llm_task_key)
        # Process LabelBuddy tasks
        processed_lb_tasks = process_lb_tasks(data['lb_task'], data['lb_unsure'])
        
        # Create combinations
        combinations = create_task_combinations(data['llm_task'], processed_lb_tasks)
        if combinations is not None:
            # transpose combinations
            combinations = list(zip(*combinations))
            llm_task, lb_task, lb_unsure_flag = combinations
            #print(llm_task)
            
            task_combinations.append({
                'pmcid': [data['pmcid']]*len(llm_task),
                'llm_task': list(llm_task),
                'lb_task': list(lb_task),
                'lb_unsure_flag': list(lb_unsure_flag),
            })
    # merge task_combinations into one dataframe
    
    df = pd.concat([pd.DataFrame(task) for task in task_combinations], ignore_index=True)
    # get YYYYMMDD_HHMMSS from latest file name
    YYYYMMDD = latest_file.stem.split('_')[0]
    HHMMSS = latest_file.stem.split('_')[1]
    YYYYMMDD_HHMMSS = f"{YYYYMMDD}_{HHMMSS}"
    df.to_csv(f'data/{YYYYMMDD_HHMMSS}_task_combinations_{llm_task_key}.csv', index=False)


This section prints out the LLM, NeuroVault, and LabelBuddy annotations for each article. This is useful for manual comparison.

In [None]:
for i in range(len(included_articles)):
    print(included_articles[i].pmcid)
    print(i)

    # Compare LLM, NV, and LB annotations for cognitive task names
    print("Cognitive task names:")
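    # llm_annotations is keyed by system prompt name (see the validation output above)
    for key in ('TaskName_SimplePrompt', 'TaskName_DetailedPrompt'):
        if key in included_articles[i].llm_annotations.keys():
            print(f"LLM ({key}):\t", included_articles[i].llm_annotations[key])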
    if "task" in included_articles[i].nv_annotations.keys():
        print("NV:\t", included_articles[i].nv_annotations['task']['name'])
    if "TaskName" in included_articles[i].lb_annotations['annotations'].keys():
        print("LB:\t", included_articles[i].lb_annotations['annotations']['TaskName'])
    if "Unsure" in included_articles[i].lb_annotations['annotations'].keys():
        print("LB_unsure:\t", included_articles[i].lb_annotations['annotations']['Unsure'])
    
    # Compare LLM and NV annotations for cognitive task descriptions
    print("\nCognitive task descriptions:")
    if "TaskDescription" in included_articles[i].llm_annotations.keys():
        print("LLM description:\t", included_articles[i].llm_annotations['TaskDescription'])
    #if "definition_text" in included_articles[i].nv_annotations.keys():
    if "task" in included_articles[i].nv_annotations.keys():
        definition_text = "".join(included_articles[i].nv_annotations['task']['definition_text']).replace("\n", "")
        print("NV description:\t", definition_text)
    if "TaskDescription" in included_articles[i].lb_annotations['annotations'].keys():
        lb_description = [description[0].replace("\n", "") for description in included_articles[i].lb_annotations['annotations']['TaskDescription']]
        print("LB description:\t", lb_description)
        
    print("\n--------------------------------\n")


In [None]:
!jupyter nbconvert --to script scripts/3_llm_extract.ipynb


In [None]:
!jupyter nbconvert --to html 3_llm_extract.ipynb

#### Extra: Integration with Cognitive Atlas via Jina AI endpoint

This section retrieves a task page from the Cognitive Atlas through a Jina AI endpoint. The Cognitive Atlas task URL is appended to the endpoint URL, and the response text is used as the task description.

In [56]:
import requests
JINA_AI_ENDPOINT = "https://r.jina.ai/"
cognitive_atlas_url = 'https://www.cognitiveatlas.org'


In [None]:
def get_cognitive_atlas_task(task_url):
    url = JINA_AI_ENDPOINT + cognitive_atlas_url + f"{task_url}"
    print(url)
    response = requests.get(url)
    return response.text

# test
get_cognitive_atlas_task("/task/id/trm_4f244f46ebf58")
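
The request above has no timeout and does not check for failures. A slightly more defensive variant is sketched below. It is an assumption-laden illustration rather than part of the pipeline: it swaps `requests` for the standard-library `urllib`, and the names `build_reader_url` and `fetch_task_page` are invented for this example. Only the URL construction is exercised here, so no network call is made when the cell runs.

```python
from urllib.request import urlopen

JINA_AI_ENDPOINT = "https://r.jina.ai/"
COGNITIVE_ATLAS_URL = "https://www.cognitiveatlas.org"

def build_reader_url(task_path):
    """Prefix a Cognitive Atlas path with the site URL and the Jina AI endpoint."""
    if not task_path.startswith("/"):
        task_path = "/" + task_path
    return JINA_AI_ENDPOINT + COGNITIVE_ATLAS_URL + task_path

def fetch_task_page(task_path, timeout=30):
    """Return the page text, or None if the request fails or times out."""
    url = build_reader_url(task_path)
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError as exc:  # covers URLError, HTTPError, and socket timeouts
        print(f"Request failed for {url}: {exc}")
        return None

print(build_reader_url("/task/id/trm_4f244f46ebf58"))
```

Separating URL construction from the network call makes the former easy to check without depending on the remote service.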