## Query OpenAlex for works authored by a person
This notebook queries the [OpenAlex API](https://docs.openalex.org/api) via its `/works` endpoint for works authored by a person. It takes each ORCID iD from a given list as input and filters for works whose `authorships.author.orcid` field matches that ORCID.
The notebook iterates through the list, collects the DOI, title, and concepts of every work, and prints the averaged concept scores per person.
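
To make the filter concrete, here is a minimal sketch of the request URL that results for a single ORCID. It uses only the standard library, and `build_works_url` is a hypothetical helper name that does not appear elsewhere in this notebook.

```python
from urllib.parse import urlencode

OPENALEX_API_WORKS = "https://api.openalex.org/works"

def build_works_url(orcid, page=1):
    # hypothetical helper: assemble the query string sent to the /works endpoint
    params = {"filter": "authorships.author.orcid:" + orcid, "page": page}
    return OPENALEX_API_WORKS + "?" + urlencode(params)

print(build_works_url("0000-0001-5406-9458"))
```

The colon between the filter name and its value is percent-encoded as `%3A` in the final URL; `requests` applies the same encoding when the parameters are passed via `params=`.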

## Concept ID to ORCID

Use case: expert search and track record

Notes: a double search (ORCID and ROR) is needed to search for a specific university.
Does the endpoint need to be changed?
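
One possible way to approach the double search, as a sketch only: the `/works` endpoint can stay the same if both conditions go into a single `filter` parameter. This assumes that OpenAlex combines comma-separated filters with AND and that `authorships.institutions.ror` is the appropriate filter name; both should be checked against the OpenAlex documentation. The helper name and the ROR value are placeholders for illustration.

```python
def build_person_at_institution_filter(orcid, ror):
    # assumption: comma-separated filters are combined with AND by the API
    return ",".join([
        "authorships.author.orcid:" + orcid,
        "authorships.institutions.ror:" + ror,
    ])

# placeholder ROR for illustration only; substitute the institution's real ROR ID
example_filter = build_person_at_institution_filter(
    "0000-0001-5406-9458", "https://ror.org/EXAMPLE"
)
params = {"filter": example_filter, "page": 1}
print(params["filter"])
```

Note that the two conditions may match different authorship entries of the same work, so the result may include works where the person published under another affiliation while a co-author belonged to the institution.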

In [1]:
# Prerequisites:
import requests         # dependency to make HTTP calls

In [2]:
# List of all 86 ORCID IDs of members of the University of Osnabrück
list_of_ids=["0000-0001-5380-4449",       
"0000-0001-5406-9458",
"0000-0001-5449-4593",
"0000-0001-5913-890X",
"0000-0001-6604-6253",
"0000-0001-7263-2670",
"0000-0001-7364-4315",
"0000-0001-7389-8024",
"0000-0001-7973-3140",
"0000-0001-8234-9166",
"0000-0001-8307-2189",
"0000-0001-8343-8654",
"0000-0001-8481-6047",
"0000-0001-8498-9466",
"0000-0001-8585-781X",
"0000-0001-9469-2367",
"0000-0002-0256-0680",
"0000-0002-0684-6707",
"0000-0002-0735-5088",
"0000-0002-1187-5166",
"0000-0002-1273-5819",
"0000-0002-1417-2722",
"0000-0002-1424-6314",
"0000-0002-1846-647X",
"0000-0002-2050-9221",
"0000-0002-2143-2270",
"0000-0002-2194-8293",
"0000-0002-2224-4503",
"0000-0002-2456-1174",
"0000-0002-2572-3390",
"0000-0002-2586-3748",
"0000-0002-2747-0913",
"0000-0002-2768-8381",
"0000-0002-2769-0692",
"0000-0002-2845-6945",
"0000-0002-2950-534X",
"0000-0002-3043-3718",
"0000-0002-3108-5217",
"0000-0002-3416-2652",
"0000-0002-3650-1056",
"0000-0002-3796-3500",
"0000-0002-3912-9093",
"0000-0002-4156-3761",
"0000-0002-4398-2337",
"0000-0002-4467-1864",
"0000-0002-4681-5550",
"0000-0002-4789-7084",
"0000-0002-5039-6950",
"0000-0002-5229-0500",
"0000-0002-5241-8498",
"0000-0002-5535-8179",
"0000-0002-5581-7371",
"0000-0002-5861-8896",
"0000-0002-5868-755X",
"0000-0002-6328-7745",
"0000-0002-6371-9624",
"0000-0002-6649-5064",
"0000-0002-6666-1499",
"0000-0002-7366-679X",
"0000-0002-7541-4369",
"0000-0002-7839-6397",
"0000-0002-7870-7343",
"0000-0002-7972-6925",
"0000-0002-8449-1593",
"0000-0002-8722-3332",
"0000-0002-8845-6859",
"0000-0002-9686-8810",
"0000-0003-0608-0884",
"0000-0003-0830-9603",
"0000-0003-0851-2767",
"0000-0003-0858-4760",
"0000-0003-1005-5753",
"0000-0003-1626-0598",
"0000-0003-1813-718X",
"0000-0003-1976-8186",
"0000-0003-2001-6440",
"0000-0003-2162-1968",
"0000-0003-2340-3462",
"0000-0003-2967-2858",
"0000-0003-3186-9000",
"0000-0003-3459-5148",
"0000-0003-3547-3257",
"0000-0003-3654-5267",
"0000-0003-4331-8695",
"0000-0003-4939-1666",
"0000-0003-4971-9991",]

In [3]:
# Subject search ("Fachsuche"): OpenAlex concept names used to identify experts
Fachsuche=['Gender studies',
'Gender equality',
'Gender bias',
'Gender history',
'Gender equity',
'Gender schema theory',
'Gender role',
'Gender gap',
'Language and gender',
'Doing gender',
'Gender violence',
'Gender justice',
'Transgender',
'Gender relations',
'Gender analysis',
'Gender identity',
'Gender discrimination',
'Gender disparity',
'Male gender',
'Gender balance',
'Grammatical gender',
'Gender dysphoria',
'Gender mainstreaming',
'Gender diversity',
'Transgender Person',
'Gender inequality',
'Gender pay gap',
'Transgender people',
'Gender psychology',
'Gender Identity Disorder',
'Gender and development',
'Transgender women']

# Note: this reassignment overrides the list above, so no concept will match.
# Comment it out to search for the gender-related concepts.
Fachsuche=['']

We use each ORCID to query the OpenAlex API for works whose metadata contains it in the field `authorships.author.orcid`.
Since the API uses [pagination](https://docs.openalex.org/api/get-lists-of-entities#pagination), we need to loop through all pages to get the complete result set.
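
The number of pages to request follows from the total result count and the page size reported in the response metadata. A small worked example, using made-up values for `meta.count` and `meta.per_page` (the helper name `pages_needed` is hypothetical):

```python
import math

def pages_needed(item_count, items_per_page):
    # round up so that a partially filled last page is still requested
    return math.ceil(item_count / items_per_page)

# hypothetical metadata: 315 results at 25 results per page
print(pages_needed(315, 25))  # 13
```

Twelve full pages hold 300 results, so the remaining 15 require a thirteenth request. An author without any works yields zero pages.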

In [4]:
# OpenAlex endpoint to query for works
OPENALEX_API_WORKS = "https://api.openalex.org/works"

# query all works that are connected to orcid
def query_openalex_for_person2works(orcid):
    page = 1
    max_page = 1
    
    while page <= max_page:
        params = {'filter': 'authorships.author.orcid:'+orcid, 'page': page}
        response = requests.get(url=OPENALEX_API_WORKS,
                                params=params,
                                headers= {'Accept': 'application/json'})
        response.raise_for_status()
        result=response.json()

        # calculate max page number in first loop
        if max_page == 1:
            max_page = determine_max_page(result)
        page = page + 1
        yield result

# calculate max number of result pages
def determine_max_page(response_data):
    item_count = response_data['meta']['count']
    items_per_page = response_data['meta']['per_page']
    max_page_ceil = item_count // items_per_page + bool(item_count % items_per_page)
    return max_page_ceil


# ---- example execution
# list_of_pages=query_openalex_for_person2works(list_of_ids[0])
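
As an alternative to counting pages, OpenAlex also offers cursor paging: the first request sends `cursor=*`, and each response carries a `meta.next_cursor` value for the following request. The sketch below is not what this notebook uses. `iterate_with_cursor` and `fetch_page` are hypothetical names, and `fetch_page` is assumed to be any callable that takes the query parameters and returns the parsed JSON response.

```python
def iterate_with_cursor(fetch_page, orcid):
    """Yield result pages until the API returns no further cursor.

    fetch_page: hypothetical callable(params) -> dict with the parsed JSON.
    """
    cursor = "*"  # "*" requests the first page in cursor paging
    while cursor:
        params = {"filter": "authorships.author.orcid:" + orcid,
                  "cursor": cursor}
        result = fetch_page(params)
        yield result
        # a missing or null next_cursor ends the loop
        cursor = result.get("meta", {}).get("next_cursor")
```

With `requests`, `fetch_page` could be a thin wrapper such as `lambda params: requests.get(OPENALEX_API_WORKS, params=params).json()`. The last page yielded may contain an empty `results` list, which the extraction step needs to tolerate.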

From the resulting list of works we extract the DOI, title, and concepts of each work, then average the concept scores per person.

*Note: works that do not have a DOI assigned are skipped.*

In [5]:
# from the result pages we get from the OpenAlex API, extract the data about works
def extract_works_from_page(page):
    return [work for work in page.get('results') or []]

# extract DOI (without the https://doi.org/ prefix), title and concepts from a work
def extract_doi(work):
    doi=work.get('ids', {}).get('doi') or ""
    doi_id=doi.replace("https://doi.org/", "") if doi else doi
    title=work.get('display_name', "")
    concept=work.get('concepts') or []   # guard against a missing or null concepts field
    return doi_id, title, concept

def main_search(orcid):
    global Error_count
    # query the list of DOIs
    result_doi=[]
    count_doi=0
    list_of_pages=query_openalex_for_person2works(orcid)
    for page in list_of_pages or []:
        works=extract_works_from_page(page)
        for work in works or []:
            doi,title,concept=extract_doi(work)
            if doi:
                add=[]
                add.append(orcid)
                add.append(doi)
                add.append(title)
                add_concept=[]
                for item in concept:
                    # TODO: include the sublevel, ideally all levels!
                    all_concepts=[item['display_name'],'Level:'+str(item['level']),item['score']]
                    add_concept.append(all_concepts)
                add.append(add_concept)
                result_doi.append(add)
    # start of the expert search
    dict_gesamt={}
    dict_gesamt.update({'ID':orcid})
    dict_gesamt.update({'Count DOI:':count_doi})
    add=[]
    dedub_add=[]
    # build a list of concepts from the DOI list
    for item in result_doi:
        if orcid in item:
            count_doi=count_doi+1
            for item2 in item[3]:
                new=item2[0]
                add.append(new) 
        dict_gesamt.update({'Count DOI:':count_doi})
    # deduplicate the list of concepts
    for item in add:
        if item not in dedub_add:
            dedub_add.append(item)
    # compute the mean score for each concept
    for single_concept in dedub_add:
        score_concept=0
        concept_count=0
        for item in result_doi:
            for item2 in item[3]:
                # note: `in` is a substring match, so 'Gene' also counts 'Gene expression'
                if single_concept in item2[0]:
                    score_concept=score_concept+float(item2[2])
                    concept_count=concept_count+1
        # average once after all works have been scanned
        if concept_count>0:
            final_score=score_concept/concept_count
            dict_gesamt.update({single_concept:final_score})
    print('#######',dict_gesamt)
    # error check: look for impossible scores (greater than 1)
    dict_error=dict_gesamt.copy()
    del dict_error['ID']
    del dict_error['Count DOI:']
    error_check=dict_error.values()
    for item in error_check:
        if item >1:
            Error_count=Error_count+1
            print('############Error#############')
    # look up experts: a person counts as an expert if the mean score exceeds 0.4
    check=0
    expert=['### Orcid:', dict_gesamt['ID']]
    for item in Fachsuche:
        if item in dict_gesamt.keys() and dict_gesamt[item]>0.4:
            check=1
            print('Subject:', item,'Expert:',dict_gesamt['ID'],'Score:',dict_gesamt[item])
            add=['Subject:'+item,'Score:', dict_gesamt[item]]
            expert.append(add)
    if check ==1:
        list_experts.append(expert)
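
The averaging step in `main_search` compares concept names with `in`, which is a substring test. A minimal sketch of the same averaging with exact name matching, using made-up scores (the function name `average_concept_scores` and the sample data are illustrative, not part of the notebook):

```python
from collections import defaultdict

def average_concept_scores(works_concepts):
    """works_concepts: one list per work of (concept_name, score) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for concepts in works_concepts:
        for name, score in concepts:
            # exact name match: each concept only accumulates its own scores
            totals[name] += float(score)
            counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}

# made-up sample: two works, 'Gene' appears in both
sample = [
    [("Gene", 0.5), ("Gene expression", 0.25)],
    [("Gene", 0.25)],
]
print(average_concept_scores(sample))
# {'Gene': 0.375, 'Gene expression': 0.25}
```

With substring matching, the score of 'Gene expression' would also be counted towards 'Gene', giving 1.0 / 3 instead of 0.375 for this sample.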

In [6]:
# main program:
Error_count=0
list_experts=[]
for item in list_of_ids:
    main_search(item)
print('###################################################')
print('######### Error Count:',Error_count, '#############')
print('Subjects:', Fachsuche)
print('###################################################')
print('# Number of experts:', len(list_experts))
print('# List of experts incl. scores:',list_experts)

####### {'ID': '0000-0001-5380-4449', 'Count DOI:': 0}
####### {'ID': '0000-0001-5406-9458', 'Count DOI:': 315, 'Medicine': 0.6345258715057915, 'Psoriasis': 0.7017200244444444, 'Epidemiology': 0.54295541, 'Incidence (geometry)': 0.54802386875, 'Systematic review': 0.564053518, 'Disease': 0.3067633352, 'MEDLINE': 0.4617296972727272, 'Family medicine': 0.4089597612903226, 'Dermatology': 0.48653290230769247, 'Pathology': 0.1005983243188406, 'Physics': 0.11925157371428571, 'Political science': 0.1483437942857143, 'Law': 0.030372287666666664, 'Optics': 0.09048074416666667, 'Biology': 0.2256722616086957, 'Enhancer': 0.76068138, 'Alpha chain': 0.6404971, 'Transcription factor': 0.488705815, 'Molecular biology': 0.59607356, 'DNA-binding protein': 0.5112319, 'Gene': 0.2297282303846154, 'Interleukin 10 receptor, alpha subunit': 0.63363745, 'Gene expression': 0.403453795, 'Transcription (linguistics)': 0.4428499, 'Regulation of gene expression': 0.43804314499999997, 'G alpha subunit': 0.37397757,

####### {'ID': '0000-0001-5449-4593', 'Count DOI:': 34, 'Biology': 0.7579469724242425, 'Saccharomyces cerevisiae': 0.6400680464, 'Cell biology': 0.567886374074074, 'Signal transduction': 0.49441865454545453, 'TOR signaling': 0.45588772, 'ASK1': 0.46630967, 'Kinase': 0.45080165, 'Protein kinase A': 0.430043698, 'Protein kinase C': 0.4604346333333334, 'Yeast': 0.5052071942105263, 'Biochemistry': 0.43648181666666663, 'Mitogen-activated protein kinase kinase': 0.3298098566666667, 'Formins': 0.7833252725000001, 'Hypha': 0.6013083475, 'Tip growth': 0.5994700500000001, 'Morphogenesis': 0.5223969833333333, 'Mutant': 0.5225844629411764, 'Cell polarity': 0.5630540399999999, 'Actin': 0.4455779939999999, 'Fungal protein': 0.4894607766666667, 'Actin cytoskeleton': 0.38267656000000005, 'Cytoskeleton': 0.3165084071428571, 'Botany': 0.1855292, 'Genetics': 0.3640302689473684, 'Gene': 0.3241019549130435, 'Pollen': 0.043456965, 'Pollination': 0.0, 'Cell': 0.43049750462745107, 'Pollen tube': 0.08691393, '

KeyboardInterrupt: 