# nl-metadata-stats

The `nl-stats` script retrieves the latest number of research outputs from OpenAIRE related to Dutch institutions and their associated data source systems. It uses the [beta graph API](https://graph.openaire.eu/docs/apis/graph-api/).


## initial setup

### 1. Get your API credentials as registered service
* [Register and login here: https://develop.openaire.eu/](https://develop.openaire.eu/)
* [Read more instructions here](https://graph.openaire.eu/docs/apis/authentication#registered-services)

### 2. Configure `config.yaml`
1. Rename `config-example.yaml` to `config.yaml`.
2. Add the following details to `config.yaml`:
   - `CLIENT_ID`: Your OpenAIRE client ID.
   - `CLIENT_SECRET`: Your OpenAIRE client secret.
   - `Org_data_file`: Path to the CSV file containing the list of Dutch institutions (e.g., `rpo_nl_list_test_20240201.csv`). This CSV fil contains at least a column named `ROR_LINK`, containgin the https formated url of the ROR id of an organisation. Find Research Organization Registry (ROR) ids here: https://ror.org 

## 1. Load `config.yaml`

Fetch the {ACCESS_TOKEN} by using the {CLIENT_ID} and {CLIENT_SECRET} in the `config.yaml` file

```
    load: config.yaml
```

In [None]:
import yaml
import requests
from requests.auth import HTTPBasicAuth

# Load the config.yaml file
with open('config.yaml', 'r') as file:
    config = yaml.safe_load(file)

CLIENT_ID = config['CLIENT_ID']
CLIENT_SECRET = config['CLIENT_SECRET']

# Fetch the ACCESS_TOKEN
auth_url = "https://aai.openaire.eu/oidc/token"
auth_response = requests.post(auth_url, data={
    'grant_type': 'client_credentials'
}, auth=HTTPBasicAuth(CLIENT_ID, CLIENT_SECRET))

if auth_response.status_code == 200:
    access_token = auth_response.json().get('access_token')
    print(f"ACCESS_TOKEN: {access_token}")
else:
    print(f"Failed to get access token: {auth_response.status_code}")
    access_token = None

## 2. Load data files with ROR Links

get the data file with all the Dutch institutions
```
    load: rpo_nl_list_test_20240201.csv
```

In [63]:
import pandas as pd

# Load the data file with all the Dutch institutions
org_data_file = config['Org_data_file']
df_orgs = pd.read_csv(org_data_file)

# Display the first few rows of the dataframe
df_orgs.head()

Unnamed: 0,full_name_in_English,acronym_EN,acronym_AGG,main_grouping,ROR,ROR_LINK,OpenAlex_ID,OpenAlex_LINK,OpenAIRE_ID,OpenAIRE_LINK
0,Erasmus University Rotterdam,EUR,EUR/ErasmusMC,UNL,057w15z03,https://ror.org/057w15z03,I913958620,https://openalex.org/I913958620,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...
1,Open University,OU,OU,UNL,018dfmf50,https://ror.org/018dfmf50,I7876267,https://openalex.org/I7876267,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,https://explore.openaire.eu/search/organizatio...
2,Radboud University Nijmegen,RU,RU/RadboudUMC,UNL,016xsfp80,https://ror.org/016xsfp80,I145872427,https://openalex.org/I145872427,openorgs____::a3af79fec4d09764e56cd6d4df1d976a,https://explore.openaire.eu/search/organizatio...
3,Radboud University Nijmegen,RU,RU/RadboudUMC,UNL,016xsfp80,https://ror.org/016xsfp80,I145872427,https://openalex.org/I145872427,pending_org_::b7587eb8d95a6c85a12ca952079a60b4,https://explore.openaire.eu/search/organizatio...
4,University of Groningen,RUG,RUG/UMCG,UNL,012p63287,https://ror.org/012p63287,I169381384,https://openalex.org/I169381384,openorgs____::81371ea94b1a09d3243e73d6ec3527ec,https://explore.openaire.eu/search/organizatio...


## 3. get the OpenAIRE Organisation ID

use the {ROR_LINK} of the institutions to get the OpenAIRE Organisation ID {OpenORG_ID}

```
    request: https://api.openaire.eu/graph/organizations?pid={ROR_LINK}

    result: OpenORGS_ID=$.results[].id (keep only id's that do have a prefix containing "openorgs")
```

In [64]:
# Define a function to get OpenAIRE Organisation ID using ROR_LINK
def get_openorg_id(ror_link, access_token):
    url = f"https://api.openaire.eu/graph/organizations?pid={ror_link}"
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {access_token}"
    }
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        data = response.json()
        openorg_ids = [result['id'] for result in data['results'] if result['id'].startswith('openorgs')]
        return openorg_ids[0] if openorg_ids else None
    else:
        print(f"Failed to retrieve OpenAIRE Organisation ID for {ror_link}: {response.status_code}")
        return None

# Apply the function to the dataframe and create a new column for OpenAIRE Organisation ID
df_orgs['OpenAIRE_Org_ID'] = df_orgs['ROR_LINK'].apply(lambda x: get_openorg_id(x, access_token))

# Display the updated dataframe
df_orgs.head()

Unnamed: 0,full_name_in_English,acronym_EN,acronym_AGG,main_grouping,ROR,ROR_LINK,OpenAlex_ID,OpenAlex_LINK,OpenAIRE_ID,OpenAIRE_LINK,OpenAIRE_Org_ID
0,Erasmus University Rotterdam,EUR,EUR/ErasmusMC,UNL,057w15z03,https://ror.org/057w15z03,I913958620,https://openalex.org/I913958620,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,openorgs____::2f735203eb40d8389a881e874bee537a
1,Open University,OU,OU,UNL,018dfmf50,https://ror.org/018dfmf50,I7876267,https://openalex.org/I7876267,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,https://explore.openaire.eu/search/organizatio...,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6
2,Radboud University Nijmegen,RU,RU/RadboudUMC,UNL,016xsfp80,https://ror.org/016xsfp80,I145872427,https://openalex.org/I145872427,openorgs____::a3af79fec4d09764e56cd6d4df1d976a,https://explore.openaire.eu/search/organizatio...,openorgs____::a3af79fec4d09764e56cd6d4df1d976a
3,Radboud University Nijmegen,RU,RU/RadboudUMC,UNL,016xsfp80,https://ror.org/016xsfp80,I145872427,https://openalex.org/I145872427,pending_org_::b7587eb8d95a6c85a12ca952079a60b4,https://explore.openaire.eu/search/organizatio...,openorgs____::a3af79fec4d09764e56cd6d4df1d976a
4,University of Groningen,RUG,RUG/UMCG,UNL,012p63287,https://ror.org/012p63287,I169381384,https://openalex.org/I169381384,openorgs____::81371ea94b1a09d3243e73d6ec3527ec,https://explore.openaire.eu/search/organizatio...,openorgs____::81371ea94b1a09d3243e73d6ec3527ec


In [65]:
# Add a new column 'OpenAIRE_Org_ID_Explore_URL'
df_orgs['OpenAIRE_Org_ID_Explore_URL'] = df_orgs['OpenAIRE_Org_ID'].apply(
    lambda x: f"https://explore.openaire.eu/search/organization?organizationId={x}" if pd.notnull(x) else None
)

# Display the updated dataframe
df_orgs.head()

Unnamed: 0,full_name_in_English,acronym_EN,acronym_AGG,main_grouping,ROR,ROR_LINK,OpenAlex_ID,OpenAlex_LINK,OpenAIRE_ID,OpenAIRE_LINK,OpenAIRE_Org_ID,OpenAIRE_Org_ID_Explore_URL
0,Erasmus University Rotterdam,EUR,EUR/ErasmusMC,UNL,057w15z03,https://ror.org/057w15z03,I913958620,https://openalex.org/I913958620,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...
1,Open University,OU,OU,UNL,018dfmf50,https://ror.org/018dfmf50,I7876267,https://openalex.org/I7876267,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,https://explore.openaire.eu/search/organizatio...,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,https://explore.openaire.eu/search/organizatio...
2,Radboud University Nijmegen,RU,RU/RadboudUMC,UNL,016xsfp80,https://ror.org/016xsfp80,I145872427,https://openalex.org/I145872427,openorgs____::a3af79fec4d09764e56cd6d4df1d976a,https://explore.openaire.eu/search/organizatio...,openorgs____::a3af79fec4d09764e56cd6d4df1d976a,https://explore.openaire.eu/search/organizatio...
3,Radboud University Nijmegen,RU,RU/RadboudUMC,UNL,016xsfp80,https://ror.org/016xsfp80,I145872427,https://openalex.org/I145872427,pending_org_::b7587eb8d95a6c85a12ca952079a60b4,https://explore.openaire.eu/search/organizatio...,openorgs____::a3af79fec4d09764e56cd6d4df1d976a,https://explore.openaire.eu/search/organizatio...
4,University of Groningen,RUG,RUG/UMCG,UNL,012p63287,https://ror.org/012p63287,I169381384,https://openalex.org/I169381384,openorgs____::81371ea94b1a09d3243e73d6ec3527ec,https://explore.openaire.eu/search/organizatio...,openorgs____::81371ea94b1a09d3243e73d6ec3527ec,https://explore.openaire.eu/search/organizatio...


## 4. get the number of Research products associated to the organisation

use the {OpenOrgs_ID} to get the number of Research products associated to the organisation
```
    request: https://api.openaire.eu/graph/researchProducts?relOrganizationId={OpenOrgs_ID}

    result: numFound_ResearchProducts_OpenOrgs=$.header.numFound
```

In [66]:
# Define a function to get the number of research products associated with an organization
def get_num_research_products(openorg_id, access_token):
    url = f"https://api.openaire.eu/graph/researchProducts?relOrganizationId={openorg_id}"
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {access_token}"
    }
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        data = response.json()
        return data['header']['numFound']
    else:
        print(f"Failed to retrieve research products for {openorg_id}: {response.status_code}")
        return None

# Apply the function to the dataframe and create a new column for the number of research products
df_orgs['numFound_ResearchProducts_OpenOrgs'] = df_orgs['OpenAIRE_Org_ID'].apply(lambda x: get_num_research_products(x, access_token) if x else None)

# Display the updated dataframe
df_orgs.head()

Unnamed: 0,full_name_in_English,acronym_EN,acronym_AGG,main_grouping,ROR,ROR_LINK,OpenAlex_ID,OpenAlex_LINK,OpenAIRE_ID,OpenAIRE_LINK,OpenAIRE_Org_ID,OpenAIRE_Org_ID_Explore_URL,numFound_ResearchProducts_OpenOrgs
0,Erasmus University Rotterdam,EUR,EUR/ErasmusMC,UNL,057w15z03,https://ror.org/057w15z03,I913958620,https://openalex.org/I913958620,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,233181.0
1,Open University,OU,OU,UNL,018dfmf50,https://ror.org/018dfmf50,I7876267,https://openalex.org/I7876267,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,https://explore.openaire.eu/search/organizatio...,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,https://explore.openaire.eu/search/organizatio...,14584.0
2,Radboud University Nijmegen,RU,RU/RadboudUMC,UNL,016xsfp80,https://ror.org/016xsfp80,I145872427,https://openalex.org/I145872427,openorgs____::a3af79fec4d09764e56cd6d4df1d976a,https://explore.openaire.eu/search/organizatio...,openorgs____::a3af79fec4d09764e56cd6d4df1d976a,https://explore.openaire.eu/search/organizatio...,261058.0
3,Radboud University Nijmegen,RU,RU/RadboudUMC,UNL,016xsfp80,https://ror.org/016xsfp80,I145872427,https://openalex.org/I145872427,pending_org_::b7587eb8d95a6c85a12ca952079a60b4,https://explore.openaire.eu/search/organizatio...,openorgs____::a3af79fec4d09764e56cd6d4df1d976a,https://explore.openaire.eu/search/organizatio...,261058.0
4,University of Groningen,RUG,RUG/UMCG,UNL,012p63287,https://ror.org/012p63287,I169381384,https://openalex.org/I169381384,openorgs____::81371ea94b1a09d3243e73d6ec3527ec,https://explore.openaire.eu/search/organizatio...,openorgs____::81371ea94b1a09d3243e73d6ec3527ec,https://explore.openaire.eu/search/organizatio...,122895.0


## 5. get the number of Projects associated to the organisation

use the {OpenOrgs_ID} to get the number of Projects associated to the organisation
```
    request: https://api.openaire.eu/graph/projects?relOrganizationId={OpenOrgs_ID}

    result: numFound_ResearchProjects_OpenOrgs=$.header.numFound

```

In [67]:
# Define a function to get the number of projects associated with an organization
def get_num_projects(openorg_id, access_token):
    url = f"https://api.openaire.eu/graph/projects?relOrganizationId={openorg_id}"
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {access_token}"
    }
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        data = response.json()
        return data['header']['numFound']
    else:
        print(f"Failed to retrieve projects for {openorg_id}: {response.status_code}")
        return None

# Apply the function to the dataframe and create a new column for the number of projects
df_orgs['numFound_ResearchProjects_OpenOrgs'] = df_orgs['OpenAIRE_Org_ID'].apply(lambda x: get_num_projects(x, access_token) if x else None)

# Display the updated dataframe
df_orgs.head()

Unnamed: 0,full_name_in_English,acronym_EN,acronym_AGG,main_grouping,ROR,ROR_LINK,OpenAlex_ID,OpenAlex_LINK,OpenAIRE_ID,OpenAIRE_LINK,OpenAIRE_Org_ID,OpenAIRE_Org_ID_Explore_URL,numFound_ResearchProducts_OpenOrgs,numFound_ResearchProjects_OpenOrgs
0,Erasmus University Rotterdam,EUR,EUR/ErasmusMC,UNL,057w15z03,https://ror.org/057w15z03,I913958620,https://openalex.org/I913958620,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,233181.0,297.0
1,Open University,OU,OU,UNL,018dfmf50,https://ror.org/018dfmf50,I7876267,https://openalex.org/I7876267,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,https://explore.openaire.eu/search/organizatio...,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,https://explore.openaire.eu/search/organizatio...,14584.0,62.0
2,Radboud University Nijmegen,RU,RU/RadboudUMC,UNL,016xsfp80,https://ror.org/016xsfp80,I145872427,https://openalex.org/I145872427,openorgs____::a3af79fec4d09764e56cd6d4df1d976a,https://explore.openaire.eu/search/organizatio...,openorgs____::a3af79fec4d09764e56cd6d4df1d976a,https://explore.openaire.eu/search/organizatio...,261058.0,54.0
3,Radboud University Nijmegen,RU,RU/RadboudUMC,UNL,016xsfp80,https://ror.org/016xsfp80,I145872427,https://openalex.org/I145872427,pending_org_::b7587eb8d95a6c85a12ca952079a60b4,https://explore.openaire.eu/search/organizatio...,openorgs____::a3af79fec4d09764e56cd6d4df1d976a,https://explore.openaire.eu/search/organizatio...,261058.0,54.0
4,University of Groningen,RUG,RUG/UMCG,UNL,012p63287,https://ror.org/012p63287,I169381384,https://openalex.org/I169381384,openorgs____::81371ea94b1a09d3243e73d6ec3527ec,https://explore.openaire.eu/search/organizatio...,openorgs____::81371ea94b1a09d3243e73d6ec3527ec,https://explore.openaire.eu/search/organizatio...,122895.0,612.0


## 6. get the Data sources related to Organisation

use the {OpenOrgs_ID} to get the Data sources related to Organisation

```
    request: https://api.openaire.eu/graph/dataSources?relOrganizationId={OpenOrgs_ID}

    results: for each $.results DataSource_ID=$.results[].id , DataSource_Name=$.results[].officialName , DataSource_Compatibility=.results[].openaireCompatibility , DataSource_LastValidated=$.results[].dateOfValidation , DataSource_URL=$.results[].websiteUrl
```

In [68]:
# Define a function to get data sources related to an organization
def get_data_sources(openorg_id, access_token):
    url = f"https://api.openaire.eu/graph/dataSources?relOrganizationId={openorg_id}"
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {access_token}"
    }
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        data = response.json()
        data_sources = []
        for result in data['results']:
            data_sources.append({
                'DataSource_ID': result['id'],
                'DataSource_Name': result['officialName'],
                'DataSource_Compatibility': result['openaireCompatibility'],
                'DataSource_LastValidated': result['dateOfValidation'],
                'DataSource_URL': result['websiteUrl']
            })
        return data_sources
    else:
        print(f"Failed to retrieve data sources for {openorg_id}: {response.status_code}")
        return None

# Apply the function to the dataframe and create a new column for data sources
df_orgs['DataSources'] = df_orgs['OpenAIRE_Org_ID'].apply(lambda x: get_data_sources(x, access_token) if x else None)

# Display the updated dataframe
df_orgs.head()

Unnamed: 0,full_name_in_English,acronym_EN,acronym_AGG,main_grouping,ROR,ROR_LINK,OpenAlex_ID,OpenAlex_LINK,OpenAIRE_ID,OpenAIRE_LINK,OpenAIRE_Org_ID,OpenAIRE_Org_ID_Explore_URL,numFound_ResearchProducts_OpenOrgs,numFound_ResearchProjects_OpenOrgs,DataSources
0,Erasmus University Rotterdam,EUR,EUR/ErasmusMC,UNL,057w15z03,https://ror.org/057w15z03,I913958620,https://openalex.org/I913958620,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,233181.0,297.0,[{'DataSource_ID': 're3data_____::5cc3941d58ed...
1,Open University,OU,OU,UNL,018dfmf50,https://ror.org/018dfmf50,I7876267,https://openalex.org/I7876267,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,https://explore.openaire.eu/search/organizatio...,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,https://explore.openaire.eu/search/organizatio...,14584.0,62.0,[{'DataSource_ID': 'doajarticles::582567e92cab...
2,Radboud University Nijmegen,RU,RU/RadboudUMC,UNL,016xsfp80,https://ror.org/016xsfp80,I145872427,https://openalex.org/I145872427,openorgs____::a3af79fec4d09764e56cd6d4df1d976a,https://explore.openaire.eu/search/organizatio...,openorgs____::a3af79fec4d09764e56cd6d4df1d976a,https://explore.openaire.eu/search/organizatio...,261058.0,54.0,[{'DataSource_ID': 'MetisRadboud::f66f1bd36967...
3,Radboud University Nijmegen,RU,RU/RadboudUMC,UNL,016xsfp80,https://ror.org/016xsfp80,I145872427,https://openalex.org/I145872427,pending_org_::b7587eb8d95a6c85a12ca952079a60b4,https://explore.openaire.eu/search/organizatio...,openorgs____::a3af79fec4d09764e56cd6d4df1d976a,https://explore.openaire.eu/search/organizatio...,261058.0,54.0,[{'DataSource_ID': 'MetisRadboud::f66f1bd36967...
4,University of Groningen,RUG,RUG/UMCG,UNL,012p63287,https://ror.org/012p63287,I169381384,https://openalex.org/I169381384,openorgs____::81371ea94b1a09d3243e73d6ec3527ec,https://explore.openaire.eu/search/organizatio...,openorgs____::81371ea94b1a09d3243e73d6ec3527ec,https://explore.openaire.eu/search/organizatio...,122895.0,612.0,[{'DataSource_ID': 'opendoar____::33e8075e9970...


In [69]:
# Create a new dataframe with OpenAIRE_Org_ID and DataSources
df_data_sources = df_orgs[['OpenAIRE_Org_ID', 'DataSources']].explode('DataSources').reset_index(drop=True)

# Normalize the DataSources column to separate columns
df_data_sources = pd.concat([df_data_sources.drop(['DataSources'], axis=1), df_data_sources['DataSources'].apply(pd.Series)], axis=1)

# Display the new dataframe
df_data_sources.head()

Unnamed: 0,OpenAIRE_Org_ID,DataSource_ID,DataSource_Name,DataSource_Compatibility,DataSource_LastValidated,DataSource_URL,0
0,openorgs____::2f735203eb40d8389a881e874bee537a,re3data_____::5cc3941d58ed76dbfafb47ff82e339c0,EUR Data Repository,Not yet registered,,https://datarepository.eur.nl/,
1,openorgs____::2f735203eb40d8389a881e874bee537a,eurocrisdris::f4d76177b6b0596c89958861709d3c77,Erasmus University Rotterdam - Research Inform...,OpenAIRE CRIS v1.1,,https://pure.eur.nl/,
2,openorgs____::2f735203eb40d8389a881e874bee537a,opendoar____::9c3b1830513cc3b8fc4b76635d32e692,Erasmus University Institutional Repository,collected from a compatible aggregator,,http://repub.eur.nl/,
3,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,doajarticles::582567e92cabfad308e83658f3844d51,Locus,collected from a compatible aggregator,,https://locus.ou.nl/,
4,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,opendoar____::147540e129e096fa91700e9db6588354,Open University of the Netherlands Research Po...,"OpenAIRE 3.0 (OA, funding)",,https://research.ou.nl/,


In [70]:
# Drop the last column '0' if it exists
if '0' in df_data_sources.columns:
    df_data_sources = df_data_sources.drop(columns=['0'])

# Add a new column 'DataSource_Explore_URL'
df_data_sources['DataSource_Explore_URL'] = df_data_sources['DataSource_ID'].apply(
    lambda x: f"https://explore.openaire.eu/search/dataprovider?datasourceId={x}" if pd.notnull(x) else None
)

# Display the updated dataframe
df_data_sources.head()

Unnamed: 0,OpenAIRE_Org_ID,DataSource_ID,DataSource_Name,DataSource_Compatibility,DataSource_LastValidated,DataSource_URL,0,DataSource_Explore_URL
0,openorgs____::2f735203eb40d8389a881e874bee537a,re3data_____::5cc3941d58ed76dbfafb47ff82e339c0,EUR Data Repository,Not yet registered,,https://datarepository.eur.nl/,,https://explore.openaire.eu/search/dataprovide...
1,openorgs____::2f735203eb40d8389a881e874bee537a,eurocrisdris::f4d76177b6b0596c89958861709d3c77,Erasmus University Rotterdam - Research Inform...,OpenAIRE CRIS v1.1,,https://pure.eur.nl/,,https://explore.openaire.eu/search/dataprovide...
2,openorgs____::2f735203eb40d8389a881e874bee537a,opendoar____::9c3b1830513cc3b8fc4b76635d32e692,Erasmus University Institutional Repository,collected from a compatible aggregator,,http://repub.eur.nl/,,https://explore.openaire.eu/search/dataprovide...
3,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,doajarticles::582567e92cabfad308e83658f3844d51,Locus,collected from a compatible aggregator,,https://locus.ou.nl/,,https://explore.openaire.eu/search/dataprovide...
4,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,opendoar____::147540e129e096fa91700e9db6588354,Open University of the Netherlands Research Po...,"OpenAIRE 3.0 (OA, funding)",,https://research.ou.nl/,,https://explore.openaire.eu/search/dataprovide...


In [71]:
# Combine the data frames using the OpenAIRE_Org_ID
df_combined = pd.merge(df_orgs, df_data_sources, on='OpenAIRE_Org_ID', how='inner', suffixes=('_orgs', '_data_sources'))

# Display the combined dataframe
df_combined.head()

Unnamed: 0,full_name_in_English,acronym_EN,acronym_AGG,main_grouping,ROR,ROR_LINK,OpenAlex_ID,OpenAlex_LINK,OpenAIRE_ID,OpenAIRE_LINK,...,numFound_ResearchProducts_OpenOrgs,numFound_ResearchProjects_OpenOrgs,DataSources,DataSource_ID,DataSource_Name,DataSource_Compatibility,DataSource_LastValidated,DataSource_URL,0,DataSource_Explore_URL
0,Erasmus University Rotterdam,EUR,EUR/ErasmusMC,UNL,057w15z03,https://ror.org/057w15z03,I913958620,https://openalex.org/I913958620,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,...,233181.0,297.0,[{'DataSource_ID': 're3data_____::5cc3941d58ed...,re3data_____::5cc3941d58ed76dbfafb47ff82e339c0,EUR Data Repository,Not yet registered,,https://datarepository.eur.nl/,,https://explore.openaire.eu/search/dataprovide...
1,Erasmus University Rotterdam,EUR,EUR/ErasmusMC,UNL,057w15z03,https://ror.org/057w15z03,I913958620,https://openalex.org/I913958620,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,...,233181.0,297.0,[{'DataSource_ID': 're3data_____::5cc3941d58ed...,eurocrisdris::f4d76177b6b0596c89958861709d3c77,Erasmus University Rotterdam - Research Inform...,OpenAIRE CRIS v1.1,,https://pure.eur.nl/,,https://explore.openaire.eu/search/dataprovide...
2,Erasmus University Rotterdam,EUR,EUR/ErasmusMC,UNL,057w15z03,https://ror.org/057w15z03,I913958620,https://openalex.org/I913958620,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,...,233181.0,297.0,[{'DataSource_ID': 're3data_____::5cc3941d58ed...,opendoar____::9c3b1830513cc3b8fc4b76635d32e692,Erasmus University Institutional Repository,collected from a compatible aggregator,,http://repub.eur.nl/,,https://explore.openaire.eu/search/dataprovide...
3,Open University,OU,OU,UNL,018dfmf50,https://ror.org/018dfmf50,I7876267,https://openalex.org/I7876267,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,https://explore.openaire.eu/search/organizatio...,...,14584.0,62.0,[{'DataSource_ID': 'doajarticles::582567e92cab...,doajarticles::582567e92cabfad308e83658f3844d51,Locus,collected from a compatible aggregator,,https://locus.ou.nl/,,https://explore.openaire.eu/search/dataprovide...
4,Open University,OU,OU,UNL,018dfmf50,https://ror.org/018dfmf50,I7876267,https://openalex.org/I7876267,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,https://explore.openaire.eu/search/organizatio...,...,14584.0,62.0,[{'DataSource_ID': 'doajarticles::582567e92cab...,opendoar____::147540e129e096fa91700e9db6588354,Open University of the Netherlands Research Po...,"OpenAIRE 3.0 (OA, funding)",,https://research.ou.nl/,,https://explore.openaire.eu/search/dataprovide...


## 7. get the number of Research products associated to the DataSource

use the {DataSource_ID} to get the number of Research products associated to the DataSource

```
    request: https://api.openaire.eu/graph/researchProducts?relCollectedFromDatasourceId={DataSource_ID}
   
    result: numFound_ResearchProducts_DataSource=$.header.numFound
```

In [72]:
# Define a function to get the number of research products associated with a data source
def get_num_research_products_datasource(datasource_id, access_token):
    url = f"https://api.openaire.eu/graph/researchProducts?relCollectedFromDatasourceId={datasource_id}"
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {access_token}"
    }
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        data = response.json()
        return data['header']['numFound']
    else:
        print(f"Failed to retrieve research products for data source {datasource_id}: {response.status_code}")
        return None

# Apply the function to the dataframe and create a new column for the number of research products
df_combined['numFound_ResearchProducts_DataSource'] = df_combined['DataSource_ID'].apply(lambda x: get_num_research_products_datasource(x, access_token) if x else None)

# Display the updated dataframe
df_combined.head()

Unnamed: 0,full_name_in_English,acronym_EN,acronym_AGG,main_grouping,ROR,ROR_LINK,OpenAlex_ID,OpenAlex_LINK,OpenAIRE_ID,OpenAIRE_LINK,...,numFound_ResearchProjects_OpenOrgs,DataSources,DataSource_ID,DataSource_Name,DataSource_Compatibility,DataSource_LastValidated,DataSource_URL,0,DataSource_Explore_URL,numFound_ResearchProducts_DataSource
0,Erasmus University Rotterdam,EUR,EUR/ErasmusMC,UNL,057w15z03,https://ror.org/057w15z03,I913958620,https://openalex.org/I913958620,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,...,297.0,[{'DataSource_ID': 're3data_____::5cc3941d58ed...,re3data_____::5cc3941d58ed76dbfafb47ff82e339c0,EUR Data Repository,Not yet registered,,https://datarepository.eur.nl/,,https://explore.openaire.eu/search/dataprovide...,0
1,Erasmus University Rotterdam,EUR,EUR/ErasmusMC,UNL,057w15z03,https://ror.org/057w15z03,I913958620,https://openalex.org/I913958620,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,...,297.0,[{'DataSource_ID': 're3data_____::5cc3941d58ed...,eurocrisdris::f4d76177b6b0596c89958861709d3c77,Erasmus University Rotterdam - Research Inform...,OpenAIRE CRIS v1.1,,https://pure.eur.nl/,,https://explore.openaire.eu/search/dataprovide...,200318
2,Erasmus University Rotterdam,EUR,EUR/ErasmusMC,UNL,057w15z03,https://ror.org/057w15z03,I913958620,https://openalex.org/I913958620,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,...,297.0,[{'DataSource_ID': 're3data_____::5cc3941d58ed...,opendoar____::9c3b1830513cc3b8fc4b76635d32e692,Erasmus University Institutional Repository,collected from a compatible aggregator,,http://repub.eur.nl/,,https://explore.openaire.eu/search/dataprovide...,0
3,Open University,OU,OU,UNL,018dfmf50,https://ror.org/018dfmf50,I7876267,https://openalex.org/I7876267,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,https://explore.openaire.eu/search/organizatio...,...,62.0,[{'DataSource_ID': 'doajarticles::582567e92cab...,doajarticles::582567e92cabfad308e83658f3844d51,Locus,collected from a compatible aggregator,,https://locus.ou.nl/,,https://explore.openaire.eu/search/dataprovide...,0
4,Open University,OU,OU,UNL,018dfmf50,https://ror.org/018dfmf50,I7876267,https://openalex.org/I7876267,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,https://explore.openaire.eu/search/organizatio...,...,62.0,[{'DataSource_ID': 'doajarticles::582567e92cab...,opendoar____::147540e129e096fa91700e9db6588354,Open University of the Netherlands Research Po...,"OpenAIRE 3.0 (OA, funding)",,https://research.ou.nl/,,https://explore.openaire.eu/search/dataprovide...,13683


## 8. get the number of Research products in the Data Source AND the associated Organisation
use the {OpenOrgs_ID} and the {DataSource_ID} to get the number of Research products in the Data Source that is associated to its Organisation

```
    request: https://api.openaire.eu/graph/researchProducts?relOrganizationId={OpenOrgs_ID}&relCollectedFromDatasourceId={DataSource_ID}
    
    result: numFound_ResearchProducts_DataSource_AND_OpenOrgs=$.header.numFound
```

In [73]:
# Define a function to get the number of research products in the Data Source that is associated with its Organisation
def get_num_research_products_datasource_and_org(openorg_id, datasource_id, access_token):
    url = f"https://api.openaire.eu/graph/researchProducts?relOrganizationId={openorg_id}&relCollectedFromDatasourceId={datasource_id}"
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {access_token}"
    }
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        data = response.json()
        return data['header']['numFound']
    else:
        print(f"Failed to retrieve research products for organization {openorg_id} and data source {datasource_id}: {response.status_code}")
        return None

# Apply the function to the dataframe and create a new column for the number of research products in the Data Source and the associated Organisation
df_combined['numFound_ResearchProducts_DataSource_AND_OpenOrgs'] = df_combined.apply(
    lambda row: get_num_research_products_datasource_and_org(row['OpenAIRE_Org_ID'], row['DataSource_ID'], access_token) if row['OpenAIRE_Org_ID'] and row['DataSource_ID'] and row['numFound_ResearchProducts_DataSource'] != 0 else None, axis=1
)

# Display the updated dataframe
df_combined.head()

Unnamed: 0,full_name_in_English,acronym_EN,acronym_AGG,main_grouping,ROR,ROR_LINK,OpenAlex_ID,OpenAlex_LINK,OpenAIRE_ID,OpenAIRE_LINK,...,DataSources,DataSource_ID,DataSource_Name,DataSource_Compatibility,DataSource_LastValidated,DataSource_URL,0,DataSource_Explore_URL,numFound_ResearchProducts_DataSource,numFound_ResearchProducts_DataSource_AND_OpenOrgs
0,Erasmus University Rotterdam,EUR,EUR/ErasmusMC,UNL,057w15z03,https://ror.org/057w15z03,I913958620,https://openalex.org/I913958620,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,...,[{'DataSource_ID': 're3data_____::5cc3941d58ed...,re3data_____::5cc3941d58ed76dbfafb47ff82e339c0,EUR Data Repository,Not yet registered,,https://datarepository.eur.nl/,,https://explore.openaire.eu/search/dataprovide...,0,
1,Erasmus University Rotterdam,EUR,EUR/ErasmusMC,UNL,057w15z03,https://ror.org/057w15z03,I913958620,https://openalex.org/I913958620,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,...,[{'DataSource_ID': 're3data_____::5cc3941d58ed...,eurocrisdris::f4d76177b6b0596c89958861709d3c77,Erasmus University Rotterdam - Research Inform...,OpenAIRE CRIS v1.1,,https://pure.eur.nl/,,https://explore.openaire.eu/search/dataprovide...,200318,200122.0
2,Erasmus University Rotterdam,EUR,EUR/ErasmusMC,UNL,057w15z03,https://ror.org/057w15z03,I913958620,https://openalex.org/I913958620,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,...,[{'DataSource_ID': 're3data_____::5cc3941d58ed...,opendoar____::9c3b1830513cc3b8fc4b76635d32e692,Erasmus University Institutional Repository,collected from a compatible aggregator,,http://repub.eur.nl/,,https://explore.openaire.eu/search/dataprovide...,0,
3,Open University,OU,OU,UNL,018dfmf50,https://ror.org/018dfmf50,I7876267,https://openalex.org/I7876267,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,https://explore.openaire.eu/search/organizatio...,...,[{'DataSource_ID': 'doajarticles::582567e92cab...,doajarticles::582567e92cabfad308e83658f3844d51,Locus,collected from a compatible aggregator,,https://locus.ou.nl/,,https://explore.openaire.eu/search/dataprovide...,0,
4,Open University,OU,OU,UNL,018dfmf50,https://ror.org/018dfmf50,I7876267,https://openalex.org/I7876267,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,https://explore.openaire.eu/search/organizatio...,...,[{'DataSource_ID': 'doajarticles::582567e92cab...,opendoar____::147540e129e096fa91700e9db6588354,Open University of the Netherlands Research Po...,"OpenAIRE 3.0 (OA, funding)",,https://research.ou.nl/,,https://explore.openaire.eu/search/dataprovide...,13683,13683.0


## 9. missing number of Research products in Data Source

calculate the missing number of Research products in the Data source

```
    result: numMissing_ResearchProducts_in_DataSource={numFound_ResearchProducts_DataSource}-{numFound_ResearchProducts_DataSource_AND_OpenOrgs}
```

In [74]:
# Calculate the missing number of research products in the data source
df_combined['numMissing_ResearchProducts_in_DataSource'] = df_combined['numFound_ResearchProducts_DataSource'] - df_combined['numFound_ResearchProducts_DataSource_AND_OpenOrgs']

# Display the updated dataframe
df_combined.head()

Unnamed: 0,full_name_in_English,acronym_EN,acronym_AGG,main_grouping,ROR,ROR_LINK,OpenAlex_ID,OpenAlex_LINK,OpenAIRE_ID,OpenAIRE_LINK,...,DataSource_ID,DataSource_Name,DataSource_Compatibility,DataSource_LastValidated,DataSource_URL,0,DataSource_Explore_URL,numFound_ResearchProducts_DataSource,numFound_ResearchProducts_DataSource_AND_OpenOrgs,numMissing_ResearchProducts_in_DataSource
0,Erasmus University Rotterdam,EUR,EUR/ErasmusMC,UNL,057w15z03,https://ror.org/057w15z03,I913958620,https://openalex.org/I913958620,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,...,re3data_____::5cc3941d58ed76dbfafb47ff82e339c0,EUR Data Repository,Not yet registered,,https://datarepository.eur.nl/,,https://explore.openaire.eu/search/dataprovide...,0,,
1,Erasmus University Rotterdam,EUR,EUR/ErasmusMC,UNL,057w15z03,https://ror.org/057w15z03,I913958620,https://openalex.org/I913958620,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,...,eurocrisdris::f4d76177b6b0596c89958861709d3c77,Erasmus University Rotterdam - Research Inform...,OpenAIRE CRIS v1.1,,https://pure.eur.nl/,,https://explore.openaire.eu/search/dataprovide...,200318,200122.0,196.0
2,Erasmus University Rotterdam,EUR,EUR/ErasmusMC,UNL,057w15z03,https://ror.org/057w15z03,I913958620,https://openalex.org/I913958620,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,...,opendoar____::9c3b1830513cc3b8fc4b76635d32e692,Erasmus University Institutional Repository,collected from a compatible aggregator,,http://repub.eur.nl/,,https://explore.openaire.eu/search/dataprovide...,0,,
3,Open University,OU,OU,UNL,018dfmf50,https://ror.org/018dfmf50,I7876267,https://openalex.org/I7876267,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,https://explore.openaire.eu/search/organizatio...,...,doajarticles::582567e92cabfad308e83658f3844d51,Locus,collected from a compatible aggregator,,https://locus.ou.nl/,,https://explore.openaire.eu/search/dataprovide...,0,,
4,Open University,OU,OU,UNL,018dfmf50,https://ror.org/018dfmf50,I7876267,https://openalex.org/I7876267,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,https://explore.openaire.eu/search/organizatio...,...,opendoar____::147540e129e096fa91700e9db6588354,Open University of the Netherlands Research Po...,"OpenAIRE 3.0 (OA, funding)",,https://research.ou.nl/,,https://explore.openaire.eu/search/dataprovide...,13683,13683.0,0.0


## 10. missing number of Research products associated to Organisation

calculate the the missing number of Research products that should be associated to the Organisation.

```
    result: numMissing_ResearchProducts_in_OpenOrgs={numFound_ResearchProducts_OpenOrgs}-{numFound_ResearchProducts_DataSource_AND_OpenOrgs}
```

In [75]:
# Calculate the missing number of research products that should be associated with the Organisation
df_combined['numMissing_ResearchProducts_in_OpenOrgs'] = df_combined.apply(
    lambda row: df_orgs.loc[df_orgs['OpenAIRE_Org_ID'] == row['OpenAIRE_Org_ID'], 'numFound_ResearchProducts_OpenOrgs'].values[0] - row['numFound_ResearchProducts_DataSource_AND_OpenOrgs'] if row['OpenAIRE_Org_ID'] and row['numFound_ResearchProducts_DataSource_AND_OpenOrgs'] is not None else None, axis=1
)

# Display the updated dataframe
df_combined.head()

Unnamed: 0,full_name_in_English,acronym_EN,acronym_AGG,main_grouping,ROR,ROR_LINK,OpenAlex_ID,OpenAlex_LINK,OpenAIRE_ID,OpenAIRE_LINK,...,DataSource_Name,DataSource_Compatibility,DataSource_LastValidated,DataSource_URL,0,DataSource_Explore_URL,numFound_ResearchProducts_DataSource,numFound_ResearchProducts_DataSource_AND_OpenOrgs,numMissing_ResearchProducts_in_DataSource,numMissing_ResearchProducts_in_OpenOrgs
0,Erasmus University Rotterdam,EUR,EUR/ErasmusMC,UNL,057w15z03,https://ror.org/057w15z03,I913958620,https://openalex.org/I913958620,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,...,EUR Data Repository,Not yet registered,,https://datarepository.eur.nl/,,https://explore.openaire.eu/search/dataprovide...,0,,,
1,Erasmus University Rotterdam,EUR,EUR/ErasmusMC,UNL,057w15z03,https://ror.org/057w15z03,I913958620,https://openalex.org/I913958620,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,...,Erasmus University Rotterdam - Research Inform...,OpenAIRE CRIS v1.1,,https://pure.eur.nl/,,https://explore.openaire.eu/search/dataprovide...,200318,200122.0,196.0,33059.0
2,Erasmus University Rotterdam,EUR,EUR/ErasmusMC,UNL,057w15z03,https://ror.org/057w15z03,I913958620,https://openalex.org/I913958620,openorgs____::2f735203eb40d8389a881e874bee537a,https://explore.openaire.eu/search/organizatio...,...,Erasmus University Institutional Repository,collected from a compatible aggregator,,http://repub.eur.nl/,,https://explore.openaire.eu/search/dataprovide...,0,,,
3,Open University,OU,OU,UNL,018dfmf50,https://ror.org/018dfmf50,I7876267,https://openalex.org/I7876267,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,https://explore.openaire.eu/search/organizatio...,...,Locus,collected from a compatible aggregator,,https://locus.ou.nl/,,https://explore.openaire.eu/search/dataprovide...,0,,,
4,Open University,OU,OU,UNL,018dfmf50,https://ror.org/018dfmf50,I7876267,https://openalex.org/I7876267,openorgs____::54f2e88f3eb801dc7e49a4ca90fdd1b6,https://explore.openaire.eu/search/organizatio...,...,Open University of the Netherlands Research Po...,"OpenAIRE 3.0 (OA, funding)",,https://research.ou.nl/,,https://explore.openaire.eu/search/dataprovide...,13683,13683.0,0.0,901.0


## 11. Write output CSV

write a timestamped csv file (a column 'retrieved on' with the timestamp, and the timestamp on the filename yyyy-mm-dd_HH-MM_nl-stats.csv)

In [76]:
from datetime import datetime

# Add a 'retrieved on' column with the current timestamp
df_combined['retrieved on'] = datetime.now().strftime('%Y-%m-%d %H:%M:%S')

# Generate the timestamped filename
timestamp = datetime.now().strftime('%Y-%m-%d_%H-%M')
output_filename = f"nl-metadata-stats_{timestamp}_for_{org_data_file}"

# Write the dataframe to a CSV file
df_combined.to_csv(output_filename, index=False)

print(f"Dataframe written to {output_filename}")

Dataframe written to nl-metadata-stats_2025-03-01_06-13_for_rpo_nl_list_long_20240201.csv
