![PANGAEA_Banner.png](https://github.com/pangaea-data-publisher/community-workshop-material/raw/master/banner.png)

# Detailed examples on metadata extraction with PANGAEApy  
By: Kathrin Riemann-Campe  
Last updated: 2025-05-07  

This notebook lists the metadata attributes that are available through `PanDataSet` in PANGAEApy version 1.0.22.  

For further detailed examples of how to write metadata to dataframes, tables, and files, see the [GitHub page of the PANGAEA community workshop](https://github.com/pangaea-data-publisher/community-workshop-material/tree/master/Python/PANGAEApy_practical).

For detailed examples of how to search with PANGAEApy PanQuery, see the [GitHub page of the PANGAEA community workshop](https://github.com/pangaea-data-publisher/community-workshop-material/tree/master/Python/PANGAEApy_practical). That page also has examples of how to download and write out metadata and data.

## How to install and upgrade PANGAEApy

To install PANGAEApy, use pip:  
_!pip install pangaeapy_

To upgrade PANGAEApy, use pip:  
_!pip install pangaeapy --upgrade_

To check your installed version of PANGAEApy, use pip:  
_!pip show pangaeapy_

For details, see https://pypi.org/project/pangaeapy/
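
The installed version can also be read from within Python, which is handy when a notebook should report the version it ran with. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of PANGAEApy.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# This notebook was written against pangaeapy 1.0.22
print(installed_version("pangaeapy"))
```

If the function returns `None`, the package is not installed in the current environment.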

In [None]:
!pip show pangaeapy

In [None]:
### import necessary packages
import pangaeapy as pan
from pangaeapy.pandataset import PanDataSet

import pandas as pd

import os

## How to find internal documentation

In [None]:
### show functions of pangaeapy
help(pan)

In [None]:
### show functions of the pangaeapy.pandataset module (uncomment to run)
# help(pan.pandataset)

## Example PANGAEA datasets and all available metadata information

Example dataset URIs:  
https://doi.pangaea.de/10.1594/PANGAEA.982284 => Mastertrack  
https://doi.pangaea.de/10.1594/PANGAEA.961721 => CTD dataset with many events  
https://doi.pangaea.de/10.1594/PANGAEA.952014 => collection  
https://doi.pangaea.de/10.1594/PANGAEA.960624 => dataset with 1 event and many parameters

#### Applying PanDataSet to get metadata

_PanDataSet(id=None, paramlist=None, deleteFlag='', enable_cache=False, include_data=True, expand_terms=[], auth_token=None, cache_expiry_days=1)_  

`id` can be a URI, a DOI string, or just the numeric dataset ID; all give the same result.
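
The identifier forms used in this notebook (DOI URL, `doi:` string, and plain number) all carry the same numeric dataset ID. The sketch below shows one way to extract that number yourself, for example to build file names. It is not PANGAEApy's own parsing, and the helper name `pangaea_id` is hypothetical.

```python
import re


def pangaea_id(identifier):
    """Extract the numeric dataset ID from an int, a DOI string, or a DOI URL."""
    if isinstance(identifier, int):
        return identifier
    text = str(identifier).strip()
    # DOI strings and URLs end in "PANGAEA.<number>"
    match = re.search(r"PANGAEA\.(\d+)", text)
    if match:
        return int(match.group(1))
    # Plain numeric strings such as "982284"
    if text.isdigit():
        return int(text)
    raise ValueError(f"Cannot find a PANGAEA dataset ID in {identifier!r}")


print(pangaea_id("https://doi.pangaea.de/10.1594/PANGAEA.982284"))
print(pangaea_id("doi:10.1594/PANGAEA.982284"))
print(pangaea_id(982284))
```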

In [None]:
# ds = PanDataSet('https://doi.pangaea.de/10.1594/PANGAEA.982284', include_data=False) # just metadata
# ds = PanDataSet('doi:10.1594/PANGAEA.982284', include_data=False) # just metadata
ds = PanDataSet(id=961721, include_data=False) # just metadata

In [None]:
ds.id

In [None]:
ds.uri

In [None]:
ds.doi

In [None]:
ds.title

In [None]:
ds.abstract

In [None]:
ds.year

In [None]:
ds.authors # list of PanAuthor

In [None]:
### PanAuthor(lastname, firstname=None, orcid=None, id=None, affiliations=None)
for au in ds.authors:
    print(au.fullname)
    print(au.lastname)
    print(au.firstname)
    print(au.ORCID)
    print(au.id)
    print(au.affiliations)

In [None]:
### PanAuthor(lastname, firstname=None, orcid=None, id=None, affiliations=None)
# output as dataframe
df_authors = pd.DataFrame()
for ind, value in enumerate(ds.authors):
    df_authors.loc[ind,'fullname'] = value.fullname
    df_authors.loc[ind,'lastname'] = value.lastname
    df_authors.loc[ind,'firstname'] = value.firstname
    df_authors.loc[ind,'ORCID'] = value.ORCID
    df_authors.loc[ind,'id'] = value.id
    df_authors.loc[ind,'affiliations'] = value.affiliations
df_authors
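
An alternative to filling the dataframe cell by cell is to collect one dictionary per author first and build the table in a single step. The sketch below uses only the standard library; `authors_to_rows` is a hypothetical helper, and the `SimpleNamespace` objects are invented stand-ins for `PanAuthor`, not real metadata.

```python
from types import SimpleNamespace

AUTHOR_FIELDS = ["fullname", "lastname", "firstname", "ORCID", "id", "affiliations"]


def authors_to_rows(authors):
    """Return one dictionary per author, keyed by the attribute names used above."""
    return [
        {field: getattr(author, field, None) for field in AUTHOR_FIELDS}
        for author in authors
    ]


# Hypothetical stand-ins for PanAuthor objects (invented values)
example_authors = [
    SimpleNamespace(fullname="Doe, Jane", lastname="Doe", firstname="Jane",
                    ORCID=None, id=1, affiliations=None),
    SimpleNamespace(fullname="Roe, Richard", lastname="Roe", firstname="Richard",
                    ORCID=None, id=2, affiliations=None),
]

rows = authors_to_rows(example_authors)
print(rows[0]["fullname"])
```

With a real dataset, `pd.DataFrame(authors_to_rows(ds.authors))` should give a table with the same columns as the loop above.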

In [None]:
ds.citation

In [None]:
ds.params # dictionary of PanParam (parameter objects)

In [None]:
ds.params.values()

In [None]:
### PanParam(id, name, shortName, synonym, type, source, unit=None, unit_id=None, format=None, terms=[], comment=None, PI={}, dataseries=None, colno=None, methodid=None, method)
for param in ds.params.values():
    print(param.id)
    print(param.name)
    print(param.shortName)
    print(param.synonym) # dictionary
    print(param.type)
    print(param.source)
    print(param.unit)
    print(param.unit_id)
    print(param.format)
    print(param.terms) # list of dictionaries
    print(param.comment)
    print(param.PI)
    print(param.dataseries)
    print(param.colno)
    print(param.methodid)
    print(param.method) # PanMethod

In [None]:
### combine list of all parameter names with unit
params = "; ".join([f'{param.name} [{param.unit}]' if param.unit else param.name for param in ds.params.values()])
print(params)

In [None]:
### PanParam(id, name, shortName, synonym, type, source, unit=None, unit_id=None, format=None, terms=[], comment=None, PI={}, dataseries=None, colno=None, methodid=None, method)
df_params = pd.DataFrame()
for ind, value in enumerate(ds.params.values()):
    df_params.loc[ind,'id'] = value.id
    df_params.loc[ind,'name'] = value.name
    df_params.loc[ind,'shortName'] = value.shortName
    # df_params.loc[ind,'synonym'] = value.synonym # skipped: dictionary, not a single value
    df_params.loc[ind,'type'] = value.type
    df_params.loc[ind,'source'] = value.source
    df_params.loc[ind,'unit'] = value.unit
    df_params.loc[ind,'unit_id'] = value.unit_id
    df_params.loc[ind,'format'] = value.format
    # df_params.loc[ind,'terms'] = value.terms # skipped: list of dictionaries, not a single value
    df_params.loc[ind,'comment'] = value.comment
    # df_params.loc[ind,'PI'] = value.PI # skipped: dictionary, not a single value
    df_params.loc[ind,'dataseries'] = value.dataseries
    df_params.loc[ind,'colno'] = value.colno
    df_params.loc[ind,'method id'] = value.methodid
    if value.method is not None:
        df_params.loc[ind,'method name'] = value.method.name #PanMethod
    else:
        df_params.loc[ind,'method name'] = 'None'
df_params
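
The `synonym`, `terms`, and `PI` fields are skipped above because they hold dictionaries or lists rather than single values. One possible workaround is to serialize such values to a JSON string before writing them into a cell. This is a minimal sketch; `to_cell` is a hypothetical helper and the example values are invented.

```python
import json


def to_cell(value):
    """Return scalars unchanged; turn dicts, lists, and tuples into a JSON string."""
    if isinstance(value, (dict, list, tuple)):
        # default=str covers objects that json cannot serialize directly
        return json.dumps(value, ensure_ascii=False, default=str)
    return value


# Invented example values in the shapes described above
print(to_cell({"CF": "sea_water_temperature"}))
print(to_cell([{"id": 1, "name": "temperature"}]))
print(to_cell("deg C"))
```

In the loop above, this would allow a line such as `df_params.loc[ind,'terms'] = to_cell(value.terms)`; `json.loads` restores the original structure later.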

In [None]:
### PanMethod(id, name, terms=[])
for param in ds.params.values():
    if param.method is not None:
        print(param.method.id)   
        print(param.method.name)  
        print(param.method.terms) 
        # print(param.method.__dict__)

In [None]:
ds.events # list of PanEvent

In [None]:
### PanEvent(label, latitude=None, longitude=None, latitude2=None, longitude2=None, elevation=None, datetime=None, datetime2=None, basis=None, location=None, campaign=None, id=None, method=None)
for ev in ds.events:
    print(ev.label)
    print(ev.latitude)
    print(ev.longitude)
    print(ev.latitude2)
    print(ev.longitude2)
    print(ev.elevation)    
    print(ev.datetime)
    print(ev.datetime2)
    print(ev.device)
    print(ev.basis) # PanBasis
    print(ev.location)
    print(ev.campaign) # PanCampaign
    print(ev.id)
    print(ev.deviceid)
    print(ev.method) # PanMethod

In [None]:
### PanBasis(name=None, URI=None, callSign=None, IMOnumber=None)
for ev in ds.events:
    if ev.basis is not None: # skip events without a basis
        print(ev.basis.name)
        print(ev.basis.URI)
        print(ev.basis.callSign)
        print(ev.basis.IMOnumber)

In [None]:
### PanCampaign(name=None, URI=None, start=None, end=None, startlocation=None, endlocation=None, BSHID=None, expeditionprogram=None)
for ev in ds.events:
    if ev.campaign is not None: # skip events without a campaign
        print(ev.campaign.name)
        print(ev.campaign.URI)
        print(ev.campaign.start)
        print(ev.campaign.end)
        print(ev.campaign.startlocation)
        print(ev.campaign.endlocation)
        print(ev.campaign.BSHID)
        print(ev.campaign.expeditionprogram)
    

In [None]:
# function writes all event information into a separate dataframe
ds.getEventsAsFrame()

In [None]:
ds.projects # list of PanProject

In [None]:
### PanProject(label, name, URL=None, awardURI=None, id=None)
for pro in ds.projects:
    print(pro.label)
    print(pro.name)
    print(pro.URL)
    print(pro.awardURI)
    print(pro.id)    

In [None]:
ds.mintimeextent

In [None]:
ds.maxtimeextent

In [None]:
ds.loginstatus

In [None]:
ds.isCollection

In [None]:
ds.collection_members # list of DOIs of all child datasets if the dataset is a parent dataset (collection)

In [None]:
ds.keywords

In [None]:
ds.moratorium

In [None]:
ds.datastatus

In [None]:
ds.registrystatus

In [None]:
ds.licence # PanLicence

In [None]:
### PanLicence(label, name, URI=None)
print(ds.licence.label)
print(ds.licence.name)
print(ds.licence.URI)

In [None]:
ds.getParamDict()

In [None]:
ds.info()

In [None]:
print(ds.geometryextent["meanLatitude"])
print(ds.geometryextent["meanLongitude"])

In [None]:
ds.geometryextent

In [None]:
ds.geometryextent['southBoundLatitude']

In [None]:
ds.date # publication date

## Example PANGAEA datasets including metadata and data

Example dataset URIs:  
https://doi.pangaea.de/10.1594/PANGAEA.963942 => CTD dataset with 86 events  
https://doi.pangaea.de/10.1594/PANGAEA.960624 => dataset with 1 event and many parameters

In [None]:
#ds = PanDataSet('https://doi.pangaea.de/10.1594/PANGAEA.963942')
#ds = PanDataSet('doi:10.1594/PANGAEA.963942')
ds = PanDataSet(id=963942)

In [None]:
# ds.data includes header with parameter short names without unit 
ds.data.head() #pandas.DataFrame

#### Translate parameter short names to long parameter names
By default, the column headers of `ds.data` are the abbreviated (short) parameter names without units.

In [None]:
# Translate short parameter names to long names including units
def get_long_parameters(ds):
    """Rename the columns of ds.data in place from short to long parameter names.

    Long names include the unit in square brackets where a unit exists.

    Args:
        ds (PanDataSet): PANGAEA dataset loaded with data
    """
    ds.data.columns = [f'{param.name} [{param.unit}]' if param.unit else param.name for param in ds.params.values()]
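
A variant that avoids overwriting the column headers builds a mapping from short to long names first. This is a sketch under the assumption that the keys of `ds.params` match the column headers of `ds.data`; `long_name_mapping` is a hypothetical helper and the example entries are invented stand-ins for `PanParam` objects.

```python
from types import SimpleNamespace


def long_name_mapping(params):
    """Map each key of a parameter dictionary to 'name [unit]', or to 'name' without a unit."""
    return {
        key: f"{param.name} [{param.unit}]" if param.unit else param.name
        for key, param in params.items()
    }


# Hypothetical stand-ins for ds.params (invented entries)
example_params = {
    "Press": SimpleNamespace(name="Pressure, water", unit="dbar"),
    "Event": SimpleNamespace(name="Event label", unit=None),
}
mapping = long_name_mapping(example_params)
print(mapping)
```

Under that assumption, `ds.data.rename(columns=long_name_mapping(ds.params))` returns a renamed copy and leaves `ds.data` itself unchanged.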


In [None]:
ds.data.head(2)

In [None]:
# Translate short parameter names to long names including units
get_long_parameters(ds)

In [None]:
ds.data.head(2)

For detailed examples of how to search with PANGAEApy PanQuery, see the [GitHub page of the PANGAEA community workshop](https://github.com/pangaea-data-publisher/community-workshop-material/tree/master/Python/PANGAEApy_practical).