In [1]:
from datetime import datetime
datetime.now()

datetime.datetime(2025, 3, 10, 10, 50, 58, 666709)

# ESGVOC library tutorial

Prerequisites:
```bash
pip install esgvoc  
esgvoc install # in order to get the latest CVs
```



The esgvoc library supports a wide range of use cases, including:
* Listing:
  * all data descriptors from the universe;
  * all terms of one data descriptor from the universe;
  * all available projects;
  * all collections from a project;
  * all terms from a project;
  * all terms of a collection from a project.

* Validating an input string against:
  * all terms of a project;
  * all terms of a collection from a project;
  * all terms from all projects (cross-validation).


## Universe and projects organization

The universe CV (Controlled Vocabularies) follows this organizational pattern:
```bash
<universe><DataDescriptor><Term>
```
Similarly, the CVs of every project are organized as:

```bash
<project><collection><Term>   
```
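
As a rough mental model, the two hierarchies can be pictured as nested mappings. The sketch below is only an illustration built from a few ids that appear later in this tutorial; the dictionaries and the `list_terms` helper are hypothetical and say nothing about how esgvoc actually stores its CVs.

```python
# Hypothetical stand-in data, NOT the storage format used by esgvoc.
# universe -> data descriptor -> term ids (only a few ids are listed)
universe = {
    "activity": ["dynvarmip", "lumip", "pmip"],
}

# project -> collection -> term ids
projects = {
    "cmip6plus": {
        "activity_id": ["cmip", "lesfmip"],
    },
}


def list_terms(tree, *path):
    """Walk a nested mapping along `path` and return the node found there."""
    node = tree
    for key in path:
        node = node[key]
    return node


print(list_terms(universe, "activity"))
print(list_terms(projects, "cmip6plus", "activity_id"))
```

The same helper serves both trees because the universe and the projects share the same shape: a container, a grouping level, then the terms.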

## ESGVOC API organization

The API functions are grouped into three families:

- **get** functions return a list of something (collections from a project, terms from a collection, etc.).
- **find** functions look up the terms whose id corresponds to an input string.
- **valid** functions check whether an input string complies with the DRS (Data Reference Syntax) name of terms.
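
To make the difference between the three families concrete, here is a toy sketch over an in-memory list. `ToyTerm`, `toy_get`, `toy_find` and `toy_valid` are hypothetical names invented for this illustration, and matching by strict equality is a simplifying assumption: the real esgvoc functions may apply richer rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTerm:
    """Hypothetical stand-in for an esgvoc term (only two fields kept)."""
    id: str
    drs_name: str


TOY_TERMS = [
    ToyTerm(id="aerchemmip", drs_name="AerChemMIP"),
    ToyTerm(id="lumip", drs_name="LUMIP"),
]


def toy_get(terms):
    """'get' family: return everything in the container."""
    return list(terms)


def toy_find(terms, term_id):
    """'find' family: look a term up by its id."""
    return [term for term in terms if term.id == term_id]


def toy_valid(terms, value):
    """'valid' family: compare the input string with the DRS name."""
    return [term for term in terms if term.drs_name == value]


print(toy_get(TOY_TERMS))
print(toy_find(TOY_TERMS, "lumip"))   # matched by id
print(toy_valid(TOY_TERMS, "lumip"))  # empty: "lumip" is an id, not a DRS name
print(toy_valid(TOY_TERMS, "LUMIP"))  # matched by DRS name
```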

In [2]:
import esgvoc.api as ev

## Universe 

### Listing 

In [3]:
ev.get_all_data_descriptors_in_universe()

['physic_index',
 'realisation_index',
 'temporal_label',
 'mip_era',
 'horizontal_label',
 'directory_date',
 'initialisation_index',
 'sub_experiment',
 'forcing_index',
 'consortium',
 'license',
 'variable',
 'frequency',
 'source_type',
 'activity',
 'vertical_label',
 'source',
 'date',
 'branded_variable',
 'model_component',
 'product',
 'institution',
 'resolution',
 'time_range',
 'table',
 'variant_label',
 'branded_suffix',
 'organisation',
 'experiment',
 'area_label',
 'realm',
 'grid']

In [4]:
ev.get_all_terms_in_data_descriptor(data_descriptor_id="activity")[:3] 
# any data descriptor listed in the previous cell can be used as the argument
# [:3] only limits the output to the first three terms

[Activity(id='dynvarmip', type='activity', drs_name='DynVarMIP', name='DynVarMIP', long_name='Dynamics and Variability Model Intercomparison Project', url=None, @context='000_context.jsonld', cmip_acronym='DynVarMIP'),
 Activity(id='lumip', type='activity', drs_name='LUMIP', name='LUMIP', long_name='Land-Use Model Intercomparison Project', url=None, @context='000_context.jsonld', cmip_acronym='LUMIP'),
 Activity(id='pmip', type='activity', drs_name='PMIP', name='PMIP', long_name='Palaeoclimate Modelling Intercomparison Project', url=None, @context='000_context.jsonld', cmip_acronym='PMIP')]

In [5]:
ev.find_terms_in_data_descriptor(data_descriptor_id="activity", term_id="aerchemmip") 

[Activity(id='aerchemmip', type='activity', drs_name='AerChemMIP', name='AerChemMIP', long_name='Aerosols and Chemistry Model Intercomparison Project', url=None, @context='000_context.jsonld', cmip_acronym='AerChemMIP')]

### A short detour: results are pydantic model instances


The result of the previous call is a list of instances of the pydantic model for the requested data descriptor. In the example above, the list holds a single **Activity** object that can be queried directly in Python.

In [6]:
my_activity = ev.find_terms_in_data_descriptor(data_descriptor_id="activity", term_id="aerchemmip")[0]
print(my_activity.id)
print(my_activity.drs_name)
print(my_activity.long_name)
print(my_activity)


aerchemmip
AerChemMIP
Aerosols and Chemistry Model Intercomparison Project
id='aerchemmip' type='activity' drs_name='AerChemMIP' name='AerChemMIP' long_name='Aerosols and Chemistry Model Intercomparison Project' url=None @context='000_context.jsonld' cmip_acronym='AerChemMIP'
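
Because the returned terms are ordinary Python objects, they combine naturally with comprehensions. The sketch below builds an id to DRS name lookup table; `drs_names_by_id` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so that the block runs without esgvoc installed. It assumes every term exposes `id` and `drs_name` attributes, as the Activity objects above do; other data descriptors may not.

```python
from types import SimpleNamespace


def drs_names_by_id(terms):
    """Map each term id to its DRS name (assumes both attributes exist)."""
    return {term.id: term.drs_name for term in terms}


# Stand-in objects mimicking two of the Activity terms shown above.
fake_terms = [
    SimpleNamespace(id="aerchemmip", drs_name="AerChemMIP"),
    SimpleNamespace(id="lumip", drs_name="LUMIP"),
]

lookup = drs_names_by_id(fake_terms)
print(lookup["aerchemmip"])
```

With the library installed, the same helper could presumably be fed the list returned by `ev.get_all_terms_in_data_descriptor(data_descriptor_id="activity")`.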


In [7]:
ev.find_terms_in_universe(term_id="aerchemmip") # gives the same result as above

[Activity(id='aerchemmip', type='activity', drs_name='AerChemMIP', name='AerChemMIP', long_name='Aerosols and Chemistry Model Intercomparison Project', url=None, @context='000_context.jsonld', cmip_acronym='AerChemMIP')]

## Project example: CMIP6plus

In [8]:
ev.get_all_projects()

['cmip6', 'cmip6plus']

In [9]:
ev.get_all_collections_in_project(project_id="cmip6plus")

['member_id',
 'activity_id',
 'mip_era',
 'institution_id',
 'source_id',
 'time_range',
 'version',
 'table_id',
 'grid_label',
 'experiment_id',
 'variable_id']

In [10]:
ev.get_all_terms_in_collection(project_id="cmip6plus", collection_id="activity_id")

[Activity(id='cmip', type='activity', drs_name='CMIP', name='CMIP', long_name='CMIP DECK: 1pctCO2, abrupt4xCO2, amip, esm-piControl, esm-historical, historical, and piControl experiments', url='https://gmd.copernicus.org/articles/9/1937/2016/gmd-9-1937-2016.pdf', @context='000_context.jsonld', cmip_acronym='CMIP'),
 Activity(id='lesfmip', type='activity', drs_name='LESFMIP', name='LESFMIP', long_name='The Large Ensemble Single Forcing Model Intercomparison Project', url='https://www.frontiersin.org/articles/10.3389/fclim.2022.955414/full', @context='000_context.jsonld', cmip_acronym='LESFMIP')]

In [11]:
ev.find_terms_in_collection(project_id="cmip6plus", collection_id="activity_id", term_id="cmip")

[Activity(id='cmip', type='activity', drs_name='CMIP', name='CMIP', long_name='CMIP DECK: 1pctCO2, abrupt4xCO2, amip, esm-piControl, esm-historical, historical, and piControl experiments', url='https://gmd.copernicus.org/articles/9/1937/2016/gmd-9-1937-2016.pdf', @context='000_context.jsonld', cmip_acronym='CMIP')]

## Validating a string against the project CV

In [12]:
valid_string = "IPSL" # the DRS name of the institution "Institut Pierre Simon Laplace"
unvalid_string = "ipsl" # NOT the DRS name: here it is the 'id' of the term

### Queries based on the project and the collection ids

In [13]:
ev.valid_term_in_collection(value=valid_string, project_id="cmip6plus", collection_id="institution_id")

[MatchingTerm(project_id='cmip6plus', collection_id='institution_id', term_id='ipsl')]

In [14]:
ev.valid_term_in_collection(value=unvalid_string, project_id="cmip6plus", collection_id="institution_id")

[]

In [15]:
if ev.valid_term_in_collection(value=valid_string, project_id="cmip6plus", collection_id="institution_id"):
    print("Valid")
else:
    print("Invalid")

Valid


In [16]:
if ev.valid_term_in_collection(value=unvalid_string, project_id="cmip6plus", collection_id="institution_id"):
    print("Valid")
else:
    print("Invalid")

Invalid
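
The truthiness test above extends naturally to several values at once. The following sketch checks a mapping of collection ids to values and reports which ones pass. `validate_facets` and `fake_validator` are hypothetical: the validator is passed in as an argument so that the block runs without esgvoc, and the stub only knows a handful of DRS names taken from this tutorial.

```python
def validate_facets(facets, validator, project_id):
    """Return {collection_id: True/False} for each value in `facets`.

    `validator` is any callable accepting the keyword arguments
    value, project_id and collection_id and returning a list of matches.
    """
    return {
        collection_id: bool(
            validator(value=value, project_id=project_id, collection_id=collection_id)
        )
        for collection_id, value in facets.items()
    }


def fake_validator(value, project_id, collection_id):
    """Stub standing in for the real validation call (illustration only)."""
    known = {
        ("cmip6plus", "institution_id"): {"IPSL"},
        ("cmip6plus", "activity_id"): {"CMIP", "LESFMIP"},
    }
    accepted = known.get((project_id, collection_id), set())
    return [value] if value in accepted else []


report = validate_facets(
    {"institution_id": "ipsl", "activity_id": "CMIP"},
    fake_validator,
    project_id="cmip6plus",
)
print(report)
```

Since the helper uses the same keyword arguments as the cells above, passing `ev.valid_term_in_collection` as `validator` should work in place of the stub.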


### Queries based only on the project id

In [17]:
ev.valid_term_in_project(value=valid_string, project_id="cmip6plus")

[MatchingTerm(project_id='cmip6plus', collection_id='institution_id', term_id='ipsl')]

### Across all projects


In [18]:
print(ev.valid_term_in_all_projects(value=valid_string))
print(ev.valid_term_in_all_projects(value=unvalid_string))

[MatchingTerm(project_id='cmip6', collection_id='institution_id', term_id='ipsl'), MatchingTerm(project_id='cmip6plus', collection_id='institution_id', term_id='ipsl')]
[]
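
A cross-project result is easier to read once it is grouped by project. The sketch below does that with a hypothetical `group_by_project` helper; `FakeMatch` is a named tuple standing in for the MatchingTerm objects, with the three field names taken from the output printed above.

```python
from collections import defaultdict, namedtuple

# Stand-in for MatchingTerm, limited to the fields visible in the output above.
FakeMatch = namedtuple("FakeMatch", ["project_id", "collection_id", "term_id"])


def group_by_project(matches):
    """Group (collection_id, term_id) pairs under their project id."""
    grouped = defaultdict(list)
    for match in matches:
        grouped[match.project_id].append((match.collection_id, match.term_id))
    return dict(grouped)


matches = [
    FakeMatch("cmip6", "institution_id", "ipsl"),
    FakeMatch("cmip6plus", "institution_id", "ipsl"),
]

by_project = group_by_project(matches)
print(sorted(by_project))  # projects in which the value is valid
```

An empty result list, as returned for the invalid string, simply yields an empty dictionary.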
