# Display entries of compact dataframe column by column

Author: Lucie Luecke, 2024

This notebook steps through the columns of a **compact dataframe** (either one of the original databases or the merged database of databases) and displays the (meta)data held in each column.

Use this to familiarise yourself with the contents of a compact dataframe.

A compact dataframe has standardised columns and data formats for:

  - ```archiveType```
  - ```dataSetName```
  - ```datasetId```
  - ```geo_meanElev```
  - ```geo_meanLat```
  - ```geo_meanLon```
  - ```geo_siteName```
  - ```interpretation_direction``` (new in v2.0)
  - ```interpretation_variable```
  - ```interpretation_variableDetail```
  - ```interpretation_seasonality``` (new in v2.0)
  - ```originalDataURL```
  - ```originalDatabase```
  - ```paleoData_notes```
  - ```paleoData_proxy```
  - ```paleoData_sensorSpecies```
  - ```paleoData_units```
  - ```paleoData_values```
  - ```paleoData_variableName```
  - ```year```
  - ```yearUnits```
  - (optional: `duplicateDetails`)
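
To confirm that a dataframe follows this layout before exploring it, the column list above can be turned into a quick check. This is a minimal sketch: the helper name `missing_columns` and the toy dataframe are illustrative assumptions, not part of `dod2k_utilities`.

```python
import pandas as pd

# Column names copied from the list above (duplicateDetails is optional).
REQUIRED_COLUMNS = [
    "archiveType", "dataSetName", "datasetId",
    "geo_meanElev", "geo_meanLat", "geo_meanLon", "geo_siteName",
    "interpretation_direction", "interpretation_variable",
    "interpretation_variableDetail", "interpretation_seasonality",
    "originalDataURL", "originalDatabase",
    "paleoData_notes", "paleoData_proxy", "paleoData_sensorSpecies",
    "paleoData_units", "paleoData_values", "paleoData_variableName",
    "year", "yearUnits",
]


def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `frame`."""
    return [col for col in required if col not in frame.columns]


# Toy frame (not real data) that only has two of the required columns.
toy = pd.DataFrame({"archiveType": ["Wood"], "datasetId": ["demo_1"]})
missing = missing_columns(toy)
print(f"{len(missing)} required columns are missing")
```

An empty list means every standard column is present.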


## Set up working environment

Make sure `repo_root` is set correctly; it should be `your_root_dir/dod2k`.
This should be the working directory throughout this notebook (and all other notebooks).

In [1]:
%load_ext autoreload
%autoreload 2

import sys
import os
from pathlib import Path

# Add parent directory to path (works from any notebook in notebooks/)
# the repo_root should be the parent directory of the notebooks folder
current_dir = Path().resolve()
# Determine repo root
if current_dir.name == 'dod2k': repo_root = current_dir
elif current_dir.parent.name == 'dod2k': repo_root = current_dir.parent
else: raise Exception('Please review the repo root structure (see first cell).')

# Update cwd and path only if needed
if os.getcwd() != str(repo_root):
    os.chdir(repo_root)
if str(repo_root) not in sys.path:
    sys.path.insert(0, str(repo_root))

print(f"Repo root: {repo_root}")
if str(os.getcwd())==str(repo_root):
    print(f"Working directory matches repo root. ")

Repo root: /home/jupyter-lluecke/dod2k
Working directory matches repo root. 
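
The cell above only accepts the current directory or its direct parent as the repo root. If a notebook lives deeper in the tree, one possible alternative is to walk up the parent directories until a folder named `dod2k` is found. This is a sketch only: `find_repo_root` is a hypothetical helper, and the example path is made up and does not need to exist.

```python
from pathlib import Path


def find_repo_root(start, repo_name="dod2k"):
    """Return the nearest directory named `repo_name` at or above `start`."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if candidate.name == repo_name:
            return candidate
    raise FileNotFoundError(f"No directory named {repo_name!r} at or above {start}")


# Hypothetical location, used purely for illustration.
example = Path("/tmp/some_user/dod2k/notebooks/subfolder")
print(find_repo_root(example))
```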


In [2]:
import pandas as pd
import numpy as np

from dod2k_utilities import ut_functions as utf # contains utility functions


## Read dataframe

Read compact dataframe.

`{db_name}` is the name of the database to load, for example:
  - database of databases:
    - dod2k_v2.0 (dod2k: duplicate free, merged database)
    - dod2k_v2.0_filtered_M (filtered for M sensitive proxies only)
    - dod2k_v2.0_filtered_M_TM (filtered for M and TM sensitive proxies only)
    - dod2k_v2.0_filtered_speleo (filtered for speleothem proxies only)
    - all_merged (NOT filtered for duplicates, only fusion of the input databases)
  - original databases:
    - fe23
    - ch2k
    - sisal
    - pages2k
    - iso2k

All compact dataframes are saved in {repo_root}/data/{db_name} as {db_name}_compact.csv.
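
The naming convention above can be written out as a small path helper. How `utf.load_compact_dataframe_from_csv` resolves the file internally is not shown in this notebook, so `compact_csv_path` below is a hypothetical helper that only mirrors the stated convention.

```python
from pathlib import Path


def compact_csv_path(repo_root, db_name):
    """Build {repo_root}/data/{db_name}/{db_name}_compact.csv."""
    return Path(repo_root) / "data" / db_name / f"{db_name}_compact.csv"


path = compact_csv_path("your_root_dir/dod2k", "dod2k_v2.0")
print(path.as_posix())
# your_root_dir/dod2k/data/dod2k_v2.0/dod2k_v2.0_compact.csv
```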

In [3]:
# read dataframe, choose from the list above, or specify your own

db_name = 'dod2k_v2.0'

# load dataframe
df = utf.load_compact_dataframe_from_csv(db_name)
df.info()  # info() prints its summary directly and returns None
df.name = db_name


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4957 entries, 0 to 4956
Data columns (total 22 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   archiveType                    4957 non-null   object 
 1   dataSetName                    4957 non-null   object 
 2   datasetId                      4957 non-null   object 
 3   duplicateDetails               4957 non-null   object 
 4   geo_meanElev                   4875 non-null   float32
 5   geo_meanLat                    4957 non-null   float32
 6   geo_meanLon                    4957 non-null   float32
 7   geo_siteName                   4957 non-null   object 
 8   interpretation_direction       4957 non-null   object 
 9   interpretation_seasonality     4957 non-null   object 
 10  interpretation_variable        4957 non-null   object 
 11  interpretation_variableDetail  4957 non-null   object 
 12  originalDataURL                4957 non-null   o

## Display dataframe

### Display identification metadata: dataSetName, datasetId, originalDataURL, originalDatabase

#### index

In [4]:
# # check index
print(df.index)

RangeIndex(start=0, stop=4957, step=1)


#### dataSetName (associated with each record, may not be unique)

In [5]:
# # check dataSetName
key = 'dataSetName'
print('%s: '%key)
print(df[key].values)
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

dataSetName: 
['NAm-MtLemon.Briffa.2002' 'NAm-MtLemon.Briffa.2002'
 'NAm-MtLemon.Briffa.2002' ... 'Sahiya cave'
 'Ocn-ArabianSea.Doose-Rolinski.2001, Ocn-ArabianSea.Doose-Rolinski.2001'
 'europe_swed019w, europe_swed021w']
["<class 'str'>"]
No. of unique values: 3845/4957
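
Most cells in this notebook repeat the same checks for a column: which Python types occur and how many distinct values there are. A minimal sketch of wrapping that pattern in one function is given here; `summarise_column` is a hypothetical name, not a function from `dod2k_utilities`, and the dataframe is toy data.

```python
import pandas as pd


def summarise_column(frame, key):
    """Return (type names present, number of unique values, number of rows)."""
    types = sorted({type(value).__name__ for value in frame[key]})
    n_unique = frame[key].nunique(dropna=False)
    return types, n_unique, len(frame)


# Toy data: two records share a dataSetName, as in the real dataframe.
toy = pd.DataFrame({"dataSetName": ["site_A", "site_A", "site_B"]})
types, n_unique, n_rows = summarise_column(toy, "dataSetName")
print(f"types={types}, unique={n_unique}/{n_rows}")
```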


#### datasetId (unique identifier, as given by original authors, includes original database token)

In [6]:
# # check datasetId

print(len(df.datasetId.unique()))
print(len(df))
key = 'datasetId'
print('%s (starts with): '%key)
print(df[key].values)
print(np.unique([str(type(dd)) for dd in df[key]]))
print('datasetId starts with: ', np.unique([str(dd.split('_')[0]) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

4957
4957
datasetId (starts with): 
['pages2k_5' 'pages2k_7' 'pages2k_8' ... 'sisal_901.0_545'
 'dod2k_composite_z_pages2k_1686_pages2k_1688'
 'dod2k_composite_z_FE23_europe_swed019w_FE23_europe_swed021w']
["<class 'str'>"]
datasetId starts with:  ['FE23' 'ch2k' 'dod2k' 'iso2k' 'pages2k' 'sisal']
No. of unique values: 4957/4957
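
The database token at the start of each `datasetId` can also be used to count records per source. The sketch below uses a handful of identifiers copied from the output above as toy data; it is one possible way to do this with pandas string methods.

```python
import pandas as pd

# A few identifiers copied from the output above.
toy = pd.DataFrame({"datasetId": [
    "pages2k_5",
    "pages2k_7",
    "sisal_901.0_545",
    "dod2k_composite_z_pages2k_1686_pages2k_1688",
]})

# Keep only the text before the first underscore.
tokens = toy["datasetId"].str.split("_", n=1).str[0]
counts = tokens.value_counts()
print(counts)
```

Composite records start with `dod2k`, so they are counted separately from the records they were built from.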


#### originalDataURL (URL/DOI of original published record where available)

In [7]:
# originalDataURL
key = 'originalDataURL'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([kk for kk in df[key] if 'this' in kk]))
print(np.unique([str(type(dd)) for dd in df[key]]))
# entries such as 'this compilation' should point to the correct URL (PAGES2k)
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

originalDataURL: 
['FE23_europe_swed019w: https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/europe/swed019w-noaa.rwl, FE23_europe_swed021w: https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/europe/swed021w-noaa.rwl'
 'This compilation' "['10.1002/2015GL063826']" ... 'this compilation'
 'www.ncdc.noaa.gov/paleo-search/study/27330'
 'www.ncdc.noaa.gov/paleo/study/2474']
['this compilation']
["<class 'str'>"]
No. of unique values: 3776/4957
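
The output lists both `'This compilation'` and `'this compilation'`, but the case-sensitive test `'this' in kk` only catches the lower-case spelling. A minimal sketch of a case-insensitive match on toy data, using values copied from the output above:

```python
import pandas as pd

toy = pd.DataFrame({"originalDataURL": [
    "This compilation",
    "this compilation",
    "www.ncdc.noaa.gov/paleo/study/2474",
]})

# case=False matches 'this' regardless of capitalisation.
mask = toy["originalDataURL"].str.contains("this", case=False)
print(toy.loc[mask, "originalDataURL"].tolist())
```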


#### originalDatabase (original database used as input for dataframe)

In [8]:
# originalDatabase
key = 'originalDatabase'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
# Note: the last two records have missing URLs
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

originalDatabase: 
['CoralHydro2k v1.0.1' 'FE23 (Breitenmoser et al. (2014))' 'Iso2k v1.1.2'
 'PAGES 2k v2.2.0' 'SISAL v3' 'dod2k_composite_z']
["<class 'str'>"]
No. of unique values: 6/4957


### geographical metadata: elevation, latitude, longitude, site name

#### geo_meanElev (mean elevation in m)

In [9]:
# check Elevation
key = 'geo_meanElev'
print('%s: '%key)
print(df[key])
print(np.unique(['%d'%kk for kk in df[key] if np.isfinite(kk)]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

geo_meanElev: 
0       2700.0
1       2700.0
2       2700.0
3       2700.0
4       2700.0
         ...  
4952    1190.0
4953    1190.0
4954    1190.0
4955    -695.0
4956     400.0
Name: geo_meanElev, Length: 4957, dtype: float32
['-1' '-10' '-1011' ... '991' '994' '995']
["<class 'float'>"]
No. of unique values: 1091/4957
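
The `df.info()` summary earlier reports 4875 non-null `geo_meanElev` entries out of 4957, so some records have no elevation. A minimal sketch, on a toy column rather than the real data, of counting missing values separately from distinct values:

```python
import numpy as np
import pandas as pd

# Toy elevations with one missing entry.
toy = pd.DataFrame({
    "geo_meanElev": np.array([2700.0, 2700.0, np.nan, -695.0], dtype="float32")
})

n_missing = int(toy["geo_meanElev"].isna().sum())
n_unique = toy["geo_meanElev"].nunique()  # NaN is excluded by default
print(f"missing: {n_missing}, unique (excluding NaN): {n_unique}")
```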


#### geo_meanLat (mean latitude in degrees N)

In [10]:
# # Latitude
key = 'geo_meanLat'
print('%s: '%key)
print(np.unique(['%d'%kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

geo_meanLat: 
['-1' '-10' '-11' '-12' '-13' '-14' '-15' '-16' '-17' '-18' '-19' '-20'
 '-21' '-22' '-23' '-24' '-25' '-26' '-27' '-28' '-29' '-3' '-31' '-32'
 '-33' '-34' '-35' '-36' '-37' '-38' '-39' '-4' '-40' '-41' '-42' '-43'
 '-44' '-45' '-46' '-47' '-5' '-50' '-51' '-53' '-54' '-6' '-64' '-66'
 '-69' '-7' '-70' '-71' '-72' '-73' '-74' '-75' '-76' '-77' '-78' '-79'
 '-8' '-82' '-84' '-89' '-9' '0' '1' '10' '11' '12' '13' '15' '16' '17'
 '18' '19' '2' '20' '21' '22' '23' '24' '25' '26' '27' '28' '29' '3' '30'
 '31' '32' '33' '34' '35' '36' '37' '38' '39' '4' '40' '41' '42' '43' '44'
 '45' '46' '47' '48' '49' '5' '50' '51' '52' '53' '54' '55' '56' '57' '58'
 '59' '6' '60' '61' '62' '63' '64' '65' '66' '67' '68' '69' '7' '70' '71'
 '72' '73' '75' '76' '77' '78' '79' '8' '80' '81' '82' '9']
["<class 'float'>"]
No. of unique values: 2169/4957
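
Printing truncated integers gives a quick overview, but an explicit range check is a more direct way to catch invalid coordinates. The sketch below assumes latitudes in [-90, 90] and longitudes in [-180, 180]; the longitude convention is an assumption, and the values are toy data.

```python
import pandas as pd

# Toy coordinates: the second row has an invalid longitude,
# the third row an invalid latitude.
toy = pd.DataFrame({
    "geo_meanLat": [32.4, -89.9, 95.0],
    "geo_meanLon": [-110.8, 181.0, 20.0],
})

bad_lat = ~toy["geo_meanLat"].between(-90, 90)
bad_lon = ~toy["geo_meanLon"].between(-180, 180)
flagged = toy[bad_lat | bad_lon]
print(flagged)
```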


#### geo_meanLon (mean longitude in degrees E)

In [11]:
# # Longitude 
key = 'geo_meanLon'
print('%s: '%key)
print(np.unique(['%d'%kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

geo_meanLon: 
['-1' '-10' '-100' '-101' '-102' '-103' '-104' '-105' '-106' '-107' '-108'
 '-109' '-110' '-111' '-112' '-113' '-114' '-115' '-116' '-117' '-118'
 '-119' '-12' '-120' '-121' '-122' '-123' '-124' '-125' '-126' '-127'
 '-128' '-129' '-13' '-130' '-131' '-132' '-133' '-134' '-135' '-136'
 '-137' '-138' '-139' '-140' '-141' '-142' '-143' '-144' '-145' '-146'
 '-147' '-148' '-149' '-150' '-151' '-152' '-153' '-154' '-157' '-159'
 '-16' '-160' '-161' '-162' '-163' '-169' '-17' '-174' '-18' '-19' '-2'
 '-22' '-24' '-26' '-27' '-3' '-33' '-35' '-36' '-37' '-38' '-39' '-4'
 '-41' '-42' '-43' '-44' '-45' '-46' '-47' '-49' '-5' '-50' '-51' '-54'
 '-55' '-56' '-57' '-58' '-6' '-60' '-61' '-62' '-63' '-64' '-65' '-66'
 '-67' '-68' '-69' '-7' '-70' '-71' '-72' '-73' '-74' '-75' '-76' '-77'
 '-78' '-79' '-8' '-80' '-81' '-82' '-83' '-84' '-85' '-86' '-87' '-88'
 '-89' '-9' '-90' '-91' '-92' '-93' '-94' '-95' '-96' '-97' '-98' '-99'
 '0' '1' '10' '100' '101' '102' '103' '104' '105' '106'

#### geo_siteName (name of collection site)

In [12]:
# Site Name 
key = 'geo_siteName'
print('%s: '%key)
print(df[key].values)
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

geo_siteName: 
['Mt. Lemon' 'Mt. Lemon' 'Mt. Lemon' ... 'Sahiya cave' 'Arabian Sea'
 'COMPOSITE: Torneträskr+f.,Bartoli + Torneträskfos.,Bartoli']
["<class 'str'>"]
No. of unique values: 3527/4957


### proxy metadata: archive type, proxy type, interpretation

#### archiveType (archive type)

In [13]:
# archiveType
key = 'archiveType'
print('%s: '%key)
print(np.unique(df[key]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

archiveType: 
['Borehole' 'Coral' 'Documents' 'GlacierIce' 'GroundIce' 'LakeSediment'
 'MarineSediment' 'MolluskShell' 'Other' 'Sclerosponge' 'Speleothem'
 'Wood' 'speleothem']
["<class 'str'>"]
No. of unique values: 13/4957
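
The list contains both `'Speleothem'` and `'speleothem'`, which count as two archive types only because of capitalisation. A minimal sketch of detecting labels that differ only by case, on a toy series:

```python
import pandas as pd

toy = pd.Series(["Speleothem", "speleothem", "Wood", "Coral"], name="archiveType")

# Group the labels by their lower-case form and keep groups
# that contain more than one spelling.
variants = toy.groupby(toy.str.lower()).unique()
case_clashes = {key: sorted(vals) for key, vals in variants.items() if len(vals) > 1}
print(case_clashes)
```

Whether such labels should be merged, and under which spelling, is a decision for the database maintainers; the sketch only reports them.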


#### paleoData_proxy (proxy type)

In [14]:
# paleoData_proxy
key = 'paleoData_proxy'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

paleoData_proxy: 
['ARSTAN' 'Mg/Ca' 'Sr/Ca' 'TEX86' 'Uk37' 'accumulation rate' 'alkenone'
 'borehole' 'calcification rate' 'chironomid' 'chloride'
 'chrysophyte assemblage' 'concentration' 'count' 'd13C' 'd18O' 'dD'
 'diatom' 'dinocyst' 'dust' 'effective precipitation' 'foraminifera'
 'growth rate' 'historical' 'humidification index' 'ice melt'
 'maximum latewood density' 'multiproxy' 'nitrate' 'pollen' 'reflectance'
 'residual chronology' 'ring width' 'sodium' 'sulfate' 'temperature'
 'thickness' 'varve thickness']
["<class 'str'>"]
No. of unique values: 38/4957


#### paleoData_sensorSpecies (further information on proxy type: species)

In [15]:
# paleoData_sensorSpecies
key = 'paleoData_sensorSpecies'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')


paleoData_sensorSpecies: 
['ABAL' 'ABAM' 'ABBA' 'ABBO' 'ABCE' 'ABCI' 'ABCO' 'ABLA' 'ABMA' 'ABPI'
 'ABPN' 'ABPR' 'ABSB' 'ABSP' 'ACRU' 'ACSH' 'ADHO' 'ADUS' 'AGAU' 'ARAR'
 'ATCU' 'ATSE' 'AUCH' 'BEPU' 'CABU' 'CADE' 'CADN' 'CARO' 'CDAT' 'CDBR'
 'CDDE' 'CDLI' 'CEAN' 'CESP' 'CHLA' 'CHNO' 'Ceratoporella nicholsoni'
 'DABI' 'DACO' 'Diploria labyrinthiformis' 'Diploria strigosa' 'FAGR'
 'FASY' 'FICU' 'FRNI' 'HABI' 'Hydnophora microconos, Porites lobata'
 'JGAU' 'JUEX' 'JUFO' 'JUOC' 'JUPH' 'JUPR' 'JURE' 'JUSC' 'JUSP' 'JUVI'
 'LADE' 'LAGM' 'LALA' 'LALY' 'LAOC' 'LASI' 'LGFR' 'LIBI' 'LITU'
 'Montastraea faveolata' 'N/A' 'NA' 'NOBE' 'NOGU' 'NOME' 'NOPU' 'NOSO'
 'NaN' 'Orbicella faveolata' 'P. australiensis, possibly P. lobata' 'PCAB'
 'PCEN' 'PCGL' 'PCGN' 'PCMA' 'PCOB' 'PCOM' 'PCPU' 'PCRU' 'PCSH' 'PCSI'
 'PCSM' 'PCSP' 'PHAL' 'PHAS' 'PHGL' 'PHTR' 'PIAL' 'PIAM' 'PIAR' 'PIBA'
 'PIBN' 'PIBR' 'PICE' 'PICL' 'PICO' 'PIEC' 'PIED' 'PIFL' 'PIHA' 'PIHR'
 'PIJE' 'PIKO' 'PILA' 'PILE' 'PILO' 'PIMO' 'PIMU' 'PIMZ' '

#### paleoData_notes (notes)

In [16]:
# # paleoData_notes
key = 'paleoData_notes'
print('%s: '%key)
print(df[key].values)
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

paleoData_notes: 
['nan' 'nan' 'nan' ... 'calcite'
 'pages2k_1686: SON1997: T(sediments) = (Uk37-0.316)/(0.023); paleoData_units changed - was originally deg C; climateInterpretation_seasonality changed - was originally Annual, pages2k_1688: All O2K-LR records have been quality-controlled according to protocols published in Nature Geoscience supplement.; climateInterpretation_seasonality changed - was originally Annual'
 'FE23_europe_swed019w: Investigator: Schweingruber, FE23_europe_swed021w: Investigator: Schweingruber']
["<class 'str'>"]
No. of unique values: 426/4957


#### paleoData_variableName

In [17]:
# paleoData_variableName
key = 'paleoData_variableName'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))

paleoData_variableName: 
['ARSTAN' 'MAR' 'Mg/Ca' 'R650/R700' 'RABD660670' 'Sr/Ca' 'TEX86' 'Uk37'
 'calcification rate' 'chloride' 'composite' 'concentration' 'count'
 'd13C' 'd18O' 'd2H' 'dD' 'dust' 'effective precipitation' 'growth rate'
 'humidification index' 'ice melt' 'maximum latewood density' 'nan'
 'nitrate' 'precipitation' 'reflectance' 'residual chronology'
 'ring width' 'sodium' 'sulfate' 'temperature' 'thickness' 'year']
["<class 'str'>"]


### climate metadata: interpretation variable, direction, seasonality

#### interpretation_direction

In [18]:
# climate_interpretation
key = 'interpretation_direction'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

interpretation_direction: 
['Increase' 'N/A' 'NaN' 'None' 'T_air (positive), P_amount (negative)'
 'T_air (positive), P_amount (negative), SPEI (negative)' 'decrease'
 'decrease/increase'
 'depends (orbital timescale: More Indian Monsoon moisture-->more enriched. Since 3ka: Indian source has been stable, so amount effect dominates: more rainfall, more intense hydrological cycle -->More depleted)'
 'increase' 'negaitive' 'negative' 'positive'
 'positive for d18O-temperature relation, negative for d13C-precipiation amount']
No. of unique values: 14/4957
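
The direction labels mix vocabularies (`'Increase'`, `'positive'`), placeholders (`'N/A'`, `'NaN'`, `'None'`) and at least one misspelling (`'negaitive'`). If a uniform vocabulary is needed for analysis, one option is a lookup table. The mapping below is a hypothetical illustration and not part of dod2k; in particular, treating "increase" as equivalent to "positive" is an assumption.

```python
import pandas as pd

# Hypothetical mapping, for illustration only.
CANONICAL = {
    "increase": "positive", "positive": "positive",
    "decrease": "negative", "negative": "negative", "negaitive": "negative",
    "n/a": "unknown", "nan": "unknown", "none": "unknown",
}

raw = pd.Series(["Increase", "negaitive", "N/A", "decrease/increase"])

# Lower-case the labels, look them up, and mark everything else as 'other'.
cleaned = raw.str.strip().str.lower().map(CANONICAL).fillna("other")
print(cleaned.tolist())
# ['positive', 'negative', 'unknown', 'other']
```

Free-text entries fall into `'other'` and still need to be read individually.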


#### interpretation_seasonality

In [19]:
# climate_interpretation
key = 'interpretation_seasonality'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

interpretation_seasonality: 
['Annual' 'Apr' 'Apr-Jul' 'Apr-Jun' 'Apr-Sep' 'Aug' 'Aug-Jul' 'Dec-Feb'
 'Dec-Mar' 'Dec-May' 'Feb' 'Feb-Aug' 'Growing Season' 'Jan' 'Jan-Apr'
 'Jan-Jun' 'Jan-Mar' 'Jul' 'Jul-Dec' 'Jul-Sep' 'Jun' 'Jun-Aug' 'Jun-Jul'
 'Jun-Sep' 'Mar' 'Mar-Aug' 'Mar-May' 'Mar-Nov' 'Mar-Oct' 'May' 'May-Apr'
 'May-Dec' 'May-Jul' 'May-Oct' 'May-Sep' 'N/A' 'None' 'Nov-Apr' 'Nov-Feb'
 'Nov-Jan' 'Oct-Apr' 'Oct-Dec' 'Oct-Sep' 'Sep-Apr' 'Sep-Aug' 'Sep-Nov'
 'Sep-Oct' 'Spr-Sum' 'Summer' 'Wet Season' 'Winter' 'deleteMe' 'subannual']
No. of unique values: 53/4957


#### interpretation_variable

In [20]:
# climate_interpretation
key = 'interpretation_variable'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

interpretation_variable: 
['N/A' 'NOT temperature NOT moisture' 'None' 'circulationIndex'
 'circulationVariable' 'deleteMe' 'effectivePrecipitation' 'evaporation'
 'hydrologicBalance' 'moisture' 'precipitation' 'precipitationIsotope'
 'salinity' 'seasonality' 'streamflow' 'temperature'
 'temperature+moisture']
No. of unique values: 17/4957
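
Several columns shown so far contain placeholder strings such as `'N/A'`, `'None'` or `'deleteMe'` rather than real missing values, so `df.info()` reports them as non-null. A minimal sketch of counting them per column on toy data; the placeholder list is collected from the outputs above and may not be complete.

```python
import pandas as pd

# Placeholder strings seen in the outputs above (possibly incomplete).
PLACEHOLDERS = ["N/A", "NA", "NaN", "None", "deleteMe"]

toy = pd.DataFrame({
    "interpretation_variable": ["temperature", "N/A", "deleteMe"],
    "interpretation_seasonality": ["Annual", "None", "Jun-Aug"],
})

# isin() marks each cell that equals one of the placeholders.
placeholder_counts = toy.isin(PLACEHOLDERS).sum()
print(placeholder_counts)
```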


#### interpretation_variableDetail

In [21]:
# climate_interpretation
key = 'interpretation_variableDetail'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

interpretation_variableDetail: 
['0.58 +- 0.11ppt/degrees C' 'Aleutian Low/westerly storm trajectories'
 'Amount of rainfall change'
 'Australian-Indonesian Summer monsoon; More negative d18O values correspond to stronger amount'
 'Australian-Indonesian monsoon rainfall'
 'Competing influence of polar and maritime airmasses'
 'Continental Sweden' 'E:P lake water' 'ENSO/PDO'
 'East Asian Monsoon Strength; more negative values of d18O are interpreted as indicative of increased monsoon strength'
 'East Asian Monsoon rainfall'
 'FE23_europe_swed019w: N/A, FE23_europe_swed021w: N/A'
 'Indian Monsoon Strength'
 'Indian Summer Monsoon; more negative values of d18O are interpreted as indicative of increased monsoon strength'
 'Indian monsoon'
 'Lower precipitation produces higher d13C and Sr/Ca values'
 'Maximum air temperature, seasonal' 'Maximum temperature'
 'Monsoon strength'
 'More negative d18O values correspond to stronger amount' 'N/A' 'NaN'
 'None' 'Precipitation' 'SAM' 'Seasonal' 'Se

### data 

#### paleoData_values

In [22]:
# # paleoData_values
key = 'paleoData_values'

print('%s: '%key)
for ii, vv in enumerate(df[key][:20]):
    try: 
        print('%-30s: %s -- %s'%(df['dataSetName'].iloc[ii][:30], str(np.nanmin(vv)), str(np.nanmax(vv))))
        print(type(vv))
    # catch regular errors only, so that KeyboardInterrupt still stops the loop
    except Exception: print(df['dataSetName'].iloc[ii], 'NaNs detected.')
print(np.unique([str(type(dd)) for dd in df[key]]))

paleoData_values: 
NAm-MtLemon.Briffa.2002       : 0.154 -- 2.91
<class 'numpy.ndarray'>
NAm-MtLemon.Briffa.2002       : 0.245 -- 1.655
<class 'numpy.ndarray'>
NAm-MtLemon.Briffa.2002       : 0.283 -- 1.666
<class 'numpy.ndarray'>
NAm-MtLemon.Briffa.2002       : 0.574 -- 0.951
<class 'numpy.ndarray'>
NAm-MtLemon.Briffa.2002       : 0.707 -- 1.118
<class 'numpy.ndarray'>
NAm-MtLemon.Briffa.2002       : 0.789 -- 1.102
<class 'numpy.ndarray'>
NAm-MtLemon.Briffa.2002       : 0.757 -- 1.114
<class 'numpy.ndarray'>
Arc-Arjeplog.Bjorklund.2014   : -3.532171 -- 2.5670047
<class 'numpy.ndarray'>
Arc-Arjeplog.Bjorklund.2014   : -4.1141653 -- 2.6139
<class 'numpy.ndarray'>
Asi-CHIN019.Li.2010           : 0.298 -- 1.664
<class 'numpy.ndarray'>
NAm-Landslide.Luckman.2006    : 0.057 -- 0.76
<class 'numpy.ndarray'>
NAm-Landslide.Luckman.2006    : 0.164 -- 1.781
<class 'numpy.ndarray'>
NAm-Landslide.Luckman.2006    : 0.125 -- 1.813
<class 'numpy.ndarray'>
NAm-Landslide.Luckman.2006    : 0.116 -- 1.889

#### paleoData_units

In [23]:
# paleoData_units
key = 'paleoData_units'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

paleoData_units: 
['cm' 'cm/yr' 'count' 'count/mL' 'degC' 'g/cm/yr' 'g/cm2/yr' 'g/cm3' 'mm'
 'mm/year' 'mm/yr' 'mmol/mol' 'nan' 'needsToBeChanged' 'ng/g' 'percent'
 'permil' 'ppb' 'standardized_anomalies' 'unitless' 'yr AD' 'z score'
 'z-scores']
["<class 'str'>"]
No. of unique values: 23/4957


#### year

In [24]:
# # year
key = 'year'
print('%s: '%key)
for ii, vv in enumerate(df[key][:20]):
    try: print('%-30s: %s -- %s'%(df['dataSetName'].iloc[ii][:30], str(np.nanmin(vv)), str(np.nanmax(vv))))
    # catch regular errors only, so that KeyboardInterrupt still stops the loop
    except Exception: print('NaNs detected.', vv)
print(np.unique([str(type(dd)) for dd in df[key]]))

year: 
NAm-MtLemon.Briffa.2002       : 1568.0 -- 1983.0
NAm-MtLemon.Briffa.2002       : 1568.0 -- 1983.0
NAm-MtLemon.Briffa.2002       : 1568.0 -- 1983.0
NAm-MtLemon.Briffa.2002       : 1568.0 -- 1983.0
NAm-MtLemon.Briffa.2002       : 1568.0 -- 1983.0
NAm-MtLemon.Briffa.2002       : 1568.0 -- 1983.0
NAm-MtLemon.Briffa.2002       : 1568.0 -- 1983.0
Arc-Arjeplog.Bjorklund.2014   : 1200.0 -- 2010.0
Arc-Arjeplog.Bjorklund.2014   : 1200.0 -- 2010.0
Asi-CHIN019.Li.2010           : 1509.0 -- 2006.0
NAm-Landslide.Luckman.2006    : 913.0 -- 2001.0
NAm-Landslide.Luckman.2006    : 913.0 -- 2001.0
NAm-Landslide.Luckman.2006    : 913.0 -- 2001.0
NAm-Landslide.Luckman.2006    : 913.0 -- 2001.0
NAm-SmithersSkiArea.Schweingru: 1680.0 -- 1983.0
NAm-SmithersSkiArea.Schweingru: 1680.0 -- 1983.0
NAm-SmithersSkiArea.Schweingru: 1680.0 -- 1983.0
NAm-SmithersSkiArea.Schweingru: 1680.0 -- 1983.0
NAm-SmithersSkiArea.Schweingru: 1680.0 -- 1983.0
NAm-SmithersSkiArea.Schweingru: 1680.0 -- 1983.0
["<class 'numpy.n

#### yearUnits

In [25]:
# yearUnits
key = 'yearUnits'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

yearUnits: 
['CE']
["<class 'str'>"]
No. of unique values: 1/4957
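
Because `year` and `paleoData_values` each hold one NumPy array per record, a per-record summary table is a useful closing check, for example to confirm that both arrays have the same length. This is a sketch on toy data; the summary column names (`n_years`, `lengths_match`, and so on) are illustrative choices.

```python
import numpy as np
import pandas as pd

# Toy records with array-valued cells, mimicking the compact dataframe layout.
toy = pd.DataFrame({
    "datasetId": ["demo_1", "demo_2"],
    "year": [np.array([1568.0, 1569.0, 1570.0]), np.array([1200.0, 1201.0])],
    "paleoData_values": [np.array([0.154, np.nan, 2.91]), np.array([-3.5, 2.6])],
})

summary = pd.DataFrame({
    "datasetId": toy["datasetId"],
    "n_years": toy["year"].map(len),
    "n_values": toy["paleoData_values"].map(len),
    "first_year": toy["year"].map(np.nanmin),
    "last_year": toy["year"].map(np.nanmax),
})
# Each record should have one value per time step.
summary["lengths_match"] = summary["n_years"] == summary["n_values"]
print(summary)
```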
