Used GPT-4-o1-preview to generate an ontology. First prompted it to generate tasks (via the ChatGPT interface):

```
you are an expert in psychological research.  Researchers in the field of psychology study specific 
psychological constructs, which are the building blocks of the mind, such as 
memory, attention, theory of mind, and so on.  To study them, researchers use 
experimental tasks or surveys, which are meant to measure behavior related to the 
constructs.  In many cases psychological tasks have several different experimental
conditions, which are meant to manipulate the construct in different ways.  These are 
commonly compared to one another in order to measure the effect of the manipulation; we 
refer to these comparisons as contrasts.

your job is to generate a list of all of the psychological tasks and surveys used by researchers in this field.  Please be as specific and exhaustive as possible. 

you should return these as a JSON list, with no additional text.
```

This identified a list of 144 tasks.  These were then used to generate descriptions using the following prompt:

```
for each of these tasks, please generate:

1) a brief description of the task
2) a list of the psychological constructs that the task is used to assess
3) a small number of references for each task

Please return these as a dictionary of sub-dictionaries, with the task names as keys and with the elements 'description', 'constructs', and 'references' within each sub-dictionary
```

Results from this were stored in [gpt4_task_ontology.json]().
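Before using the stored file, it can help to confirm that every entry follows the layout requested in the prompt. The following is a minimal sketch, not part of the original workflow; `find_malformed_entries` and the toy entries are hypothetical names introduced here for illustration.

```python
REQUIRED_KEYS = {"description", "constructs", "references"}


def find_malformed_entries(ontology):
    """Return {task_name: problem} for entries that do not match the requested layout."""
    problems = {}
    for task_name, entry in ontology.items():
        if not isinstance(entry, dict):
            problems[task_name] = "entry is not a dictionary"
            continue
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems[task_name] = f"missing keys: {sorted(missing)}"
        elif not isinstance(entry["constructs"], list):
            problems[task_name] = "'constructs' is not a list"
    return problems


# Toy data (hypothetical): one well-formed entry, one missing two keys.
example = {
    "Stroop task": {
        "description": "Name the ink color of color words.",
        "constructs": ["cognitive control"],
        "references": [],
    },
    "n-back task": {"description": "Report whether an item matches the one n trials back."},
}
problems = find_malformed_entries(example)
print(problems)
```

An empty result means every task has all three requested keys and a list of constructs.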

In [59]:
import json
import os
from openai import OpenAI
from dotenv import load_dotenv
from ontology_learner.gpt4_batch_utils import get_batch_results, save_batch_results
from llm_query.chat_client import ChatClientFactory
from tqdm import tqdm
from pathlib import Path

# Load environment variables from .env file
load_dotenv()

datadir = Path(os.getenv('DATADIR'))
print(datadir)


/Users/poldrack/Dropbox/data/ontology-learner/data


In [61]:
with open(datadir / 'gpt4/gpt4_task_ontology.json') as f:
    ontology = json.load(f)

print(len(ontology))

144


Here we extract all of the constructs for further annotation.

In [62]:
constructs = {}

for taskname, taskdict in ontology.items():
    for construct in taskdict['constructs']:
        if construct not in constructs:
            constructs[construct] = []
        constructs[construct].append(taskname)

print(len(constructs))

186
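The loop above inverts the task-to-construct mapping into a construct-to-task mapping. As a sketch of the same idea on toy data, the version below uses `collections.defaultdict` and then ranks constructs by how many tasks measure them; `invert_ontology` and the toy ontology are hypothetical and only illustrate the structure.

```python
from collections import defaultdict


def invert_ontology(ontology):
    """Map each construct to the list of tasks that name it."""
    by_construct = defaultdict(list)
    for task_name, task_dict in ontology.items():
        # .get() tolerates entries where the model omitted the 'constructs' key
        for construct in task_dict.get("constructs", []):
            by_construct[construct].append(task_name)
    return dict(by_construct)


# Toy data (hypothetical), same layout as the stored ontology.
toy = {
    "Stroop task": {"constructs": ["cognitive control", "attention"]},
    "Flanker task": {"constructs": ["attention"]},
}
by_construct = invert_ontology(toy)

# Constructs measured by the most tasks come first.
ranked = sorted(by_construct.items(), key=lambda kv: len(kv[1]), reverse=True)
for construct, tasks in ranked:
    print(construct, len(tasks))
```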


In [63]:
# create json list of constructs

with open(datadir / 'gpt4/gpt4_construct_list.json', 'w') as f:
    json.dump(list(constructs.keys()), f, indent=2)


This list was then fed into GPT-4-o1-preview with the following prompt:

```
The following is a list of psychological constructs identified above.  These represent only a fraction of all of the constructs that are studied in the field.  please use your expert knowledge of psychology to expand this list to contain a wider selection of the constructs studied within the field.  Please return your result as a json list.
```

The result was saved to [gpt4_expanded_construct_list.json]().

In [91]:
with open(datadir / 'gpt4/gpt4_expanded_construct_list.json') as f:
    expanded_constructs = json.load(f)

print(len(expanded_constructs))
expanded_constructs = list(set(expanded_constructs))
print(len(expanded_constructs))


866
807
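`set()` removes only exact string duplicates and does not preserve the original order. If near-duplicates that differ in case or whitespace should also be merged (an assumption; the original workflow does not do this), a sketch along these lines keeps the first spelling seen and the original order. `dedupe_normalized` is a hypothetical helper.

```python
def dedupe_normalized(names):
    """Drop duplicates that differ only in case or whitespace, keeping first occurrences in order."""
    seen = set()
    unique = []
    for name in names:
        tidy = " ".join(name.split())   # collapse runs of whitespace, strip the ends
        key = tidy.lower()              # compare case-insensitively
        if key not in seen:
            seen.add(key)
            unique.append(tidy)
    return unique


# Toy data (hypothetical)
raw = ["Working memory", "working memory", "attention ", "attention", "theory  of mind"]
deduped = dedupe_normalized(raw)
print(deduped)
```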


I tried to expand the list further, but the ChatGPT interface could not handle a list of this length, so we move to the API. We also switch to GPT-4o because o1-preview is too expensive.

In [12]:
def get_construct_prompt(construct):
    prompt = f"""
# CONTEXT #
Researchers in the field of cognitive neuroscience and psychology study specific 
psychological constructs, which are the building blocks of the mind, such as 
memory, attention, theory of mind, and so on.  

# OBJECTIVE #
Your job is to analyze a specific construct: {construct}.

- You should first determine whether it is truly a psychological construct, or whether it is some 
other kind of thing.  For example, "working memory" is a psychological construct, 
but "n-back task" is not (it is a task, not a construct).  Include a 'type' key in your response with the value 'construct' if it is 
truly a psychological construct or 'other' if it is not.

If it is a psychological construct, please do the following:
- provide a short description of the construct.
- provide a short list of widely cited publications that describe the construct. Include a 
'references' key in your response with a list of the references.
- provide a list of commonly used tasks or surveys that measure the construct.  Include a 'tasks' key in your response with a list of the tasks.
- Provide a list of other constructs that are related to this construct.  Include a 'related_constructs' key in your response with a list of the related constructs.
Be as specific as possible, using names that are as specific as possible.

# RESPONSE #
Please return the results in JSON format.  Use the following keys:
- type: 'construct', or 'other'
- description: a short description of the construct
- references: a list of references that use the construct
- tasks: a list of tasks used to measure the construct
- related_constructs: a list of other constructs that are related to this construct
Respond only with JSON, without any additional text or description.
"""
    return prompt
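The prompt asks for a JSON object with a fixed set of keys, so each reply can be checked mechanically. The sketch below assumes the reply arrives as a plain string and that the model may occasionally wrap it in a markdown code fence despite the instructions; `parse_construct_response` is a hypothetical helper, not part of `ontology_learner`.

```python
import json

EXPECTED_KEYS = {"type", "description", "references", "tasks", "related_constructs"}
FENCE = "`" * 3  # markdown code fence, built this way to avoid a literal fence here


def parse_construct_response(text):
    """Parse one reply; return (record, missing_keys). Raises ValueError on bad JSON or type."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # drop the opening fence line and everything from the closing fence on
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    record = json.loads(cleaned)  # json.JSONDecodeError is a subclass of ValueError
    if record.get("type") not in ("construct", "other"):
        raise ValueError(f"unexpected 'type' value: {record.get('type')!r}")
    # the remaining keys are only requested for true constructs
    missing = EXPECTED_KEYS - record.keys() if record["type"] == "construct" else set()
    return record, sorted(missing)


# Toy replies (hypothetical)
fenced = FENCE + "json\n" + '{"type": "other"}' + "\n" + FENCE
record, missing = parse_construct_response(fenced)
print(record, missing)

partial = '{"type": "construct", "description": "Short-term maintenance of information."}'
record2, missing2 = parse_construct_response(partial)
print(missing2)
```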


Create batch submission

In [92]:
# the chat client used for building batch requests is created below
api_key = os.environ.get("OPENAI")

system_msg = """
    You are an expert in psychology and neuroscience.
    You should be as specific and as comprehensive as possible in your responses.
    Your response should be a JSON object with no additional text.  
    """

# wanted to use o1-preview but it's too expensive so we fall back to GPT-4o
model = 'gpt-4o'
client = ChatClientFactory.create_client("openai", api_key, 
                                            system_msg=system_msg,
                                            model=model)


In [93]:

batchfile = datadir / 'gpt4/gpt4_construct_expansion_batch.jsonl'

if batchfile.exists():
    batchfile.unlink()

for construct in expanded_constructs:

    # build one JSONL batch entry per construct
    prompt = get_construct_prompt(construct)
    try:
        batch_request = client.create_batch_request(construct, prompt)
    except Exception as e:
        print(f'error processing {construct}: {e}')
        continue

    with open(batchfile, 'a') as f:
        f.write(json.dumps(batch_request) + '\n')
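For reference, each line of an OpenAI Batch input file is a JSON object with `custom_id`, `method`, `url`, and `body` fields. The internals of `create_batch_request` in `llm_query` are not shown here, so the sketch below is only an assumption about the kind of record it emits; `build_batch_line` and the index-based IDs are hypothetical.

```python
import json


def build_batch_line(custom_id, prompt, model="gpt-4o", system_msg=None):
    """Build one Batch API request record for the chat completions endpoint."""
    messages = []
    if system_msg:
        messages.append({"role": "system", "content": system_msg})
    messages.append({"role": "user", "content": prompt})
    return {
        "custom_id": custom_id,          # must be unique within the batch
        "method": "POST",
        "url": "/v1/chat/completions",   # must match the endpoint given at batch creation
        "body": {"model": model, "messages": messages},
    }


# Toy usage (hypothetical): index-based IDs are unique by construction.
names = ["working memory", "attention"]
lines = [build_batch_line(f"construct-{i}", f"Analyze: {name}") for i, name in enumerate(names)]
jsonl = "\n".join(json.dumps(line) for line in lines)
print(jsonl)
```

Index-based IDs avoid collisions, at the cost of needing a separate lookup from ID back to construct name when reading the results.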


Run batch request

In [94]:

batch_client = OpenAI(api_key=api_key)

# upload the JSONL file; the context manager closes the handle afterwards
with open(batchfile, "rb") as f:
    batch_input_file = batch_client.files.create(file=f, purpose="batch")

batch_input_file_id = batch_input_file.id

batch_metadata = batch_client.batches.create(
    input_file_id=batch_input_file_id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={
        "description": "construct annotation"
    }
)


In [90]:
print(batch_client.batches.retrieve(batch_metadata.id))

Batch(id='batch_674e46adb93c8190beb9deb06e953f58', completion_window='24h', created_at=1733183149, endpoint='/v1/chat/completions', input_file_id='file-BxE44oJVDJv7AYusX95EL6', object='batch', status='failed', cancelled_at=None, cancelling_at=None, completed_at=None, error_file_id=None, errors=Errors(data=[BatchError(code='duplicate_custom_id', line=190, message='The custom_id for this request is a duplicate of another request. The custom_id parameter must be unique for each request in a batch.', param='custom_id'), BatchError(code='duplicate_custom_id', line=246, message='The custom_id for this request is a duplicate of another request. The custom_id parameter must be unique for each request in a batch.', param='custom_id'), BatchError(code='duplicate_custom_id', line=300, message='The custom_id for this request is a duplicate of another request. The custom_id parameter must be unique for each request in a batch.', param='custom_id'), BatchError(code='duplicate_custom_id', line=301, m
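The `duplicate_custom_id` errors in this output mean that several lines of the input file shared an ID. A check like the following, run before uploading, would flag that locally. It is a sketch: `find_duplicate_custom_ids` is a hypothetical helper, and the temporary file stands in for the real batch file.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def find_duplicate_custom_ids(jsonl_path):
    """Return {custom_id: count} for IDs that appear on more than one line."""
    counts = Counter()
    with open(jsonl_path) as f:
        for line in f:
            if line.strip():  # skip blank lines
                counts[json.loads(line)["custom_id"]] += 1
    return {cid: n for cid, n in counts.items() if n > 1}


# Toy batch file (hypothetical) with one repeated ID.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "batch.jsonl"
    records = [{"custom_id": "attention"}, {"custom_id": "memory"}, {"custom_id": "attention"}]
    path.write_text("\n".join(json.dumps(r) for r in records) + "\n")
    duplicates = find_duplicate_custom_ids(path)

print(duplicates)
```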

In [95]:
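import time

# Poll once a minute until the batch reaches a terminal state; checking only
# for 'completed' would loop forever if the batch failed, expired, or was cancelled.
terminal_states = {'completed', 'failed', 'expired', 'cancelled'}
status = batch_client.batches.retrieve(batch_metadata.id).status
print(status)
while status not in terminal_states:
    time.sleep(60)
    status = batch_client.batches.retrieve(batch_metadata.id).status
    print(status)
# os.system('say "your program has finished"')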


in_progress
in_progress
in_progress
in_progress
in_progress
in_progress
in_progress
in_progress
in_progress
in_progress
in_progress
in_progress
in_progress
in_progress
in_progress
in_progress
in_progress
in_progress
in_progress
finalizing
finalizing
completed


In [97]:
import sys
sys.path.append('..')
from gpt4_batch_utils import get_batch_results, save_batch_results
batch_results = get_batch_results(batch_client, batch_metadata.id)
outdir = datadir / 'gpt4/construct_refinement_results'
outdir.mkdir(exist_ok=True, parents=True)

outfile = save_batch_results(batch_results, batch_metadata.id, outdir)
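Each line of a Batch API output file carries the `custom_id` of the request plus either a `response` (with `status_code` and a chat completion `body`) or an `error`. What `get_batch_results` returns is not shown here, so the sketch below works on raw output lines instead; `extract_results` is a hypothetical helper that separates parsed replies from failures.

```python
import json


def extract_results(output_lines):
    """Split raw batch output lines into (parsed, failed), both keyed by custom_id."""
    parsed, failed = {}, {}
    for line in output_lines:
        if not line.strip():
            continue
        record = json.loads(line)
        cid = record["custom_id"]
        response = record.get("response") or {}
        if record.get("error") or response.get("status_code") != 200:
            failed[cid] = record.get("error") or response.get("status_code")
            continue
        content = response["body"]["choices"][0]["message"]["content"]
        try:
            parsed[cid] = json.loads(content)  # the prompt asks for JSON only
        except json.JSONDecodeError as e:
            failed[cid] = f"invalid JSON: {e}"
    return parsed, failed


# Toy output lines (hypothetical), shaped like Batch API results.
ok = {"custom_id": "attention", "error": None, "response": {"status_code": 200, "body": {
    "choices": [{"message": {"content": "{\"type\": \"construct\"}"}}]}}}
bad = {"custom_id": "n-back task", "error": None, "response": {"status_code": 200, "body": {
    "choices": [{"message": {"content": "not json"}}]}}}
parsed, failed = extract_results([json.dumps(ok), json.dumps(bad)])
print(parsed)
print(list(failed))
```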