Batch creating stations in OSCAR/Surface
---
> **Example #4:** We will create a list of stations from Brazil. The information about the new stations is contained in an Excel file, which I converted to CSV. Before creating the new stations we need to process the information in the Excel sheet.
We need to do two things: first, rename the columns so that they correspond to the expected names; second, translate the values into WIGOS Metadata Standard code-list values.

***Note***: ..



In [1]:
import logging
import re

import pandas as pd

logging.basicConfig(level=logging.INFO)
logging.getLogger("dicttoxml").setLevel(logging.WARNING)  # silence the verbose XML-conversion messages
logging.getLogger("oscar_lib").setLevel(logging.INFO)     # logger names follow the module name, e.g. oscar_lib.oscar_client

# imported after configuring logging, so that the schema/XSLT loading messages below are shown
from oscar_lib import OscarClient, Station

INFO:root:loading schema files
INFO:root:schema parsed sucessfully
INFO:root:schema parsed sucessfully
INFO:root:loading XSLT files
INFO:root:XSLT parsed sucessfully
INFO:root:XSLT simple2wmdr parsed sucessfully


# checking the Excel sheet
---
> We need to make sure that the values in the Excel sheet correspond to the WIGOS metadata standard codelists published at https://codes.wmo.int/_wmdr
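A quick way to do this is to list, for each column that will be translated, the values that occur in the sheet but have no entry in the corresponding mapping dictionary yet. A minimal sketch, assuming the mapping dictionaries (like the ones defined further down) are written by hand from the code lists and that the columns already carry their renamed names; the helper `find_unmapped` and the toy data are hypothetical:

```python
import pandas as pd

def find_unmapped(df, mappings):
    """Return {column: sorted list of sheet values that have no entry in that column's mapping}."""
    problems = {}
    for column, mapping in mappings.items():
        # distinct, non-missing values that actually occur in the sheet
        values = set(df[column].dropna().unique())
        missing = sorted(values - set(mapping))
        if missing:
            problems[column] = missing
    return problems

# hypothetical toy data standing in for the CSV export
sheet = pd.DataFrame({
    "stationtype": ["Land (Fixed)", "Land (Fixed)", "Lake/River (Fixed)"],
    "country": ["Brazil", "Brazil", "Brazil"],
})
mappings = {
    "stationtype": {"Land (Fixed)": "landFixed"},
    "country": {"Brazil": "BRA"},
}
print(find_unmapped(sheet, mappings))  # {'stationtype': ['Lake/River (Fixed)']}
```

Run on the renamed DataFrame with the real dictionaries, an empty result means every value has a code-list translation.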

In [2]:
df_stations_orig = pd.read_csv(r'files/Stations_Template-INMET.csv', encoding="latin1")  # read the CSV file into a dataframe
df_stations_orig.dropna(axis=0, how='any', inplace=True)  # drop rows with any missing value; this also removes the empty rows of the export
df_stations_orig = df_stations_orig.rename(columns=lambda x: x.strip().lower())  # fix column names: remove surrounding whitespace and make lowercase
# rename the columns to the names expected later on (most of them are parameters of the Station constructor)
column_map = {
        'identifier': 'wigosid',
        'type': 'stationtype',
        'altitude': 'elevation',
        'creation': 'established',
        'operational status': 'status',
        'real time': 'real-time',
        'wmo region': 'region',
        'country / territory': 'country',
        'station url': 'url',
        'site description': 'description',
        'time zone': 'timezone'}
df_stations_orig = df_stations_orig.rename(columns=column_map)
df_stations_orig.head(3)

|  | name | wigosid | stationtype | automatic | latitude | longitude | elevation | established | international | manufacturer | ... | real-time | affiliations | frequency | variables | region | country | timezone | url | other link (url) | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Novo Aripuanã | 0-76-0-1303304000000594 | Land (Fixed) | automatic | -5.141139 | -60.380531 | 45.0 | 2019-07-18 | yes | Vaisala | ... | yes | GOS | 1 hour | Air Temperature (inst, max, min), Relative Hum... | III - South America | Brazil | UTC-4 | http://tempo.inmet.gov.br/WSI/0-76-0-130330400... | https://cidades.ibge.gov.br/brasil/am/novo-ari... | Automatic weather station |
| 1 | Redenção | 0-76-0-1506139000000593 | Land (Fixed) | automatic | -8.04325 | -50.006917 | 199.0 | 2019-06-17 | yes | Vaisala | ... | yes | GOS | 1 hour | Air Temperature (inst, max, min), Relative Hum... | III - South America | Brazil | UTC-3 | http://tempo.inmet.gov.br/WSI/0-76-0-150613900... | https://cidades.ibge.gov.br/brasil/pa/redencao... | Automatic weather station |
| 2 | Zé Doca | 0-76-0-2114007000000596 | Land (Fixed) | automatic | -3.269194 | -45.651083 | 45.5 | 2019-07-11 | yes | Vaisala | ... | yes | GOS | 1 hour | Air Temperature (inst, max, min), Relative Hum... | III - South America | Brazil | UTC-3 | http://tempo.inmet.gov.br/WSI/0-76-0-211400700... | https://cidades.ibge.gov.br/brasil/ma/ze-doca/... | Automatic weather station |
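If a column in a new sheet is spelled differently, the renaming silently leaves it untouched and the later cells fail with a `KeyError`. A minimal check, listing the column names that the later cells of this notebook read explicitly; the set `REQUIRED_COLUMNS`, the helper `missing_columns` and the example DataFrame are hypothetical:

```python
import pandas as pd

# columns (after renaming) that the mapping cell and the upload loop read explicitly
REQUIRED_COLUMNS = {
    "wigosid", "stationtype", "automatic", "status", "international", "real-time",
    "region", "country", "frequency", "variables", "in oscar", "url",
    "other link (url)", "description", "manufacturer", "affiliations",
}

def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are not present in df, sorted."""
    return sorted(required - set(df.columns))

# hypothetical example: a sheet in which two columns were not renamed
example = pd.DataFrame(columns=sorted(REQUIRED_COLUMNS - {"wigosid", "real-time"}) + ["identifier", "real time"])
print(missing_columns(example))  # ['real-time', 'wigosid']
```

In the notebook this could be used as `assert not missing_columns(df_stations_orig), missing_columns(df_stations_orig)` right after the renaming.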


## map content to WIGOS metadata record values
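We need to translate the following fields into WMDR codelist values: stationtype, automatic, status, region, country and variables. In addition, international and real-time are converted to booleans, and frequency to an interval in seconds.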

In [3]:
# helper functions to parse the variables and frequency
def parseFrequency(x):
    """Extract the reporting interval and return it in seconds.
    Currently only hours are supported; other units return None.
    """
    m = re.search(r'(\d+)\s+hour', x)
    if m:
        return int(m.group(1)) * 60 * 60  # only hours are recognized at the moment
    else:
        return None

# variable names used in the sheet -> numeric WMDR codelist values
var_map = {
    'Air Temperature': 224,
    'Relative Humidity': 251,
    'Dewpoint': 225,
    'Atmospheric Pressure': 216,
    'Winds': [12005, 12006],
    'Solar Radiation': 572,
    'Precipitation': 210
}

def parseVariables(variables):
    """Extract the variables from the Excel sheet and return them as numeric WMDR codelist values.
    Qualifiers in parentheses, e.g. "(inst, max, min)", are removed first.
    """
    res = []
    for v in re.sub(r'\(.+?\)', '', variables).split(","):
        v = v.strip()
        tmp = var_map[v]  # raises KeyError for a variable name that is not in var_map
        if not isinstance(tmp, list):
            tmp = (tmp,)
        res.extend(tmp)

    return res
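`parseFrequency` only recognizes hours. If a sheet also states frequencies in minutes or days, the same regular-expression approach could be extended along these lines; the function name `parse_interval_seconds` and the accepted unit spellings are assumptions, not something the INMET sheet is known to contain:

```python
import re

# seconds per unit; the unit names are an assumption about how a sheet might be filled in
UNIT_SECONDS = {"minute": 60, "hour": 60 * 60, "day": 24 * 60 * 60}

def parse_interval_seconds(text):
    """Return the interval in seconds for strings like '1 hour', '10 minutes' or '1 day'.

    Returns None if no supported unit is recognized.
    """
    # a number, optional whitespace, a unit, an optional plural "s", then a word boundary
    m = re.search(r"(\d+)\s*(minute|hour|day)s?\b", text, flags=re.IGNORECASE)
    if not m:
        return None
    return int(m.group(1)) * UNIT_SECONDS[m.group(2).lower()]

print(parse_interval_seconds("1 hour"))      # 3600
print(parse_interval_seconds("10 minutes"))  # 600
print(parse_interval_seconds("irregular"))   # None
```

Returning None keeps the behaviour of `parseFrequency`, so the later `.astype(int)` still fails loudly on unparseable values.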

## mapping values from the Excel sheet to WMDR codelist values

In [4]:
df_stations = df_stations_orig.copy()

# here we map the values in the sheet to the WMDR values
type_map = {'Land (Fixed)': 'landFixed'}
automatic_map = {'automatic': 'automaticReading'}
status_map = {'operational': 'operational'}
country_map = {'Brazil': 'BRA'}
region_map = {'III - South America': 'southAmerica'}
yes_no_map = {'yes': True, 'no': False}

# perform the mapping; values without an entry in the map become NaN
df_stations["stationtype"] = df_stations["stationtype"].map(type_map)
df_stations["automatic"] = df_stations["automatic"].map(automatic_map)
df_stations["status"] = df_stations["status"].map(status_map)

# astype(bool) would turn every non-empty string, including "no", into True, so map explicitly
df_stations["international"] = df_stations["international"].str.strip().str.lower().map(yes_no_map)
df_stations["real-time"] = df_stations["real-time"].str.strip().str.lower().map(yes_no_map)

df_stations["region"] = df_stations["region"].map(region_map)
df_stations["country"] = df_stations["country"].map(country_map)

# astype(int) fails if parseFrequency could not parse a value (returned None)
df_stations["frequency"] = df_stations["frequency"].apply(parseFrequency).astype(int)
df_stations["variables"] = df_stations["variables"].map(parseVariables)
df_stations.head(3)

|  | name | wigosid | stationtype | automatic | latitude | longitude | elevation | established | international | manufacturer | ... | real-time | affiliations | frequency | variables | region | country | timezone | url | other link (url) | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Novo Aripuanã | 0-76-0-1303304000000594 | landFixed | automaticReading | -5.141139 | -60.380531 | 45.0 | 2019-07-18 | True | Vaisala | ... | True | GOS | 3600 | [224, 251, 225, 216, 12005, 12006, 572, 210] | southAmerica | BRA | UTC-4 | http://tempo.inmet.gov.br/WSI/0-76-0-130330400... | https://cidades.ibge.gov.br/brasil/am/novo-ari... | Automatic weather station |
| 1 | Redenção | 0-76-0-1506139000000593 | landFixed | automaticReading | -8.04325 | -50.006917 | 199.0 | 2019-06-17 | True | Vaisala | ... | True | GOS | 3600 | [224, 251, 225, 216, 12005, 12006, 572, 210] | southAmerica | BRA | UTC-3 | http://tempo.inmet.gov.br/WSI/0-76-0-150613900... | https://cidades.ibge.gov.br/brasil/pa/redencao... | Automatic weather station |
| 2 | Zé Doca | 0-76-0-2114007000000596 | landFixed | automaticReading | -3.269194 | -45.651083 | 45.5 | 2019-07-11 | True | Vaisala | ... | True | GOS | 3600 | [224, 251, 225, 216, 12005, 12006, 572, 210] | southAmerica | BRA | UTC-3 | http://tempo.inmet.gov.br/WSI/0-76-0-211400700... | https://cidades.ibge.gov.br/brasil/ma/ze-doca/... | Automatic weather station |
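Why the yes/no columns should not be converted with `astype(bool)`: on a column of strings it only tests whether each string is non-empty, so "no" also becomes True. A minimal illustration with toy values, next to an explicit mapping:

```python
import pandas as pd

answers = pd.Series(["yes", "no", "yes"])

# astype(bool) tests for non-empty strings, so "no" also becomes True
print(answers.astype(bool).tolist())  # [True, True, True]

# an explicit mapping keeps the meaning; unexpected answers would become NaN
yes_no_map = {"yes": True, "no": False}
mapped = answers.str.strip().str.lower().map(yes_no_map)
print(mapped.tolist())                # [True, False, True]
print(mapped.isna().any())            # False
```

Because `Series.map` with a dictionary returns NaN for missing keys, the same `isna()` check on the other mapped columns (for example `df_stations[df_stations["region"].isna()]`) shows the rows whose value had no entry in the mapping dictionaries.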


## creating Station objects and exporting the XML or uploading it to OSCAR/Surface

In [9]:
# a client object for the interaction with OSCAR/Surface
client = OscarClient(oscarurl = OscarClient.OSCAR_DEPL, token="my_token")
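The next cell copies `default_schedule` for every observation and overwrites the interval, international and real-time keys. The same step can be written as a small helper that builds a new dictionary and rejects keys that are not in the defaults, so that a misspelt key such as "realtime" fails loudly instead of being added silently. A sketch; `make_schedule` is a hypothetical name and the default values are copied from the next cell:

```python
# defaults as in the next cell (interval in seconds)
default_schedule = {
    "startMonth": 1, "endMonth": 12,
    "startWeekday": 1, "endWeekday": 7,
    "startHour": 0, "endHour": 23,
    "startMinute": 0, "endMinute": 59,
    "interval": 60 * 60, "international": True, "real-time": True,
}

def make_schedule(defaults, overrides):
    """Return a new schedule dict in which the keys from `overrides` replace the defaults.

    Raises KeyError for keys that are not in `defaults`.
    """
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown schedule keys: {sorted(unknown)}")
    # dict unpacking creates a new dict; `defaults` itself is not modified
    return {**defaults, **overrides}

schedule = make_schedule(default_schedule, {"interval": 3 * 60 * 60, "real-time": False})
print(schedule["interval"], schedule["real-time"], default_schedule["real-time"])  # 10800 False True
```

The overrides are passed as a dictionary rather than as keyword arguments because "real-time" is not a valid Python identifier.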

In [10]:
# default schedule. We overwrite values with the values from the Excel sheet where available
default_schedule = {   
   "startMonth": 1,   "endMonth": 12,
   "startWeekday": 1, "endWeekday": 7,
   "startHour": 0,    "endHour": 23,
   "startMinute": 0,  "endMinute": 59,
   "interval": 60*60, "international": True , "real-time" : True # we overwrite these with the values from Excel
}

import os

os.makedirs("tmp", exist_ok=True)  # directory for the XML files written below

# loop over the stations marked "yes" in the column "in oscar" of the Excel sheet
for idx, row in df_stations[df_stations["in oscar"] == "yes"].iterrows():
    # most parameters needed in the constructor of Station are already in the dictionary, as we renamed the columns above
    params = dict(row)

    # adding custom information
    params["organization"] = "INMET"  # I looked up this abbreviation in the field "supervising organization" of the station search dialogue in OSCAR
    params["urls"] = [params["url"], params["other link (url)"]]  # pass multiple URLs as a list
    params["description"] = params["description"] + ". Station manufacturer is " + params["manufacturer"]  # add the manufacturer to the site description

    # build one observation, with its own schedule, per variable
    observations = []
    for v in row["variables"]:
        new_schedule = default_schedule.copy()
        new_schedule["international"] = row["international"]
        new_schedule["interval"] = row["frequency"]
        new_schedule["real-time"] = row["real-time"]

        observation = {
            "variable": v, "observationsource": params["automatic"],
            "affiliation": params["affiliations"], "schedule": new_schedule,
        }
        observations.append(observation)

    params["observations"] = observations

    s = Station(**params)

    try:
        s.validate()
        xml = str(s)  # serialize once and reuse for the file and the upload

        # write XML to disk
        with open(os.path.join("tmp", "{}.xml".format(row["wigosid"])), "w", encoding="utf8") as f:
            f.write(xml)

        # upload to OSCAR
        status = client.uploadXML(xml)
        print("uploaded {}, {}".format(row["wigosid"], status))

    except Exception as e:
        print("error for {}: {}".format(row["wigosid"], e))


INFO:oscar_lib.oscar_client:upload ok, new id 44250 The list below is organized by section header and shows exceptions/issues – if any – that may have resulted from the processing of the XML (NB: Section headers are always displayed).
# Facility with identifier "0-76-0-1303304000000594"
REF_9: The elements "observation/OM_Observation/metadata/MD_Metadata/individualName:NAME" and "observation/OM_Observation/metadata/MD_Metadata/individualName:SURNAME" are mandatory.
REF_9: The field 'observation/OM_Observation/metadata/MD_Metadata/address/CI_Address/city' is mandatory.
REF_3: The country of contact is not provided in "observation/metadata/MD_metadata/contact/CI_ResponsibleParty". Contact is discarded.





INFO:oscar_lib.oscar_client:upload ok, new id 44251 The list below is organized by section header and shows exceptions/issues – if any – that may have resulted from the processing of the XML (NB: Section headers are always displayed).
# Facility with identifier "0-76-0-1506139000000593"
REF_9: The elements "observation/OM_Observation/metadata/MD_Metadata/individualName:NAME" and "observation/OM_Observation/metadata/MD_Metadata/individualName:SURNAME" are mandatory.
REF_9: The field 'observation/OM_Observation/metadata/MD_Metadata/address/CI_Address/city' is mandatory.
REF_3: The country of contact is not provided in "observation/metadata/MD_metadata/contact/CI_ResponsibleParty". Contact is discarded.





INFO:oscar_lib.oscar_client:upload ok, new id 44252 The list below is organized by section header and shows exceptions/issues – if any – that may have resulted from the processing of the XML (NB: Section headers are always displayed).
# Facility with identifier "0-76-0-2114007000000596"
REF_9: The elements "observation/OM_Observation/metadata/MD_Metadata/individualName:NAME" and "observation/OM_Observation/metadata/MD_Metadata/individualName:SURNAME" are mandatory.
REF_9: The field 'observation/OM_Observation/metadata/MD_Metadata/address/CI_Address/city' is mandatory.
REF_3: The country of contact is not provided in "observation/metadata/MD_metadata/contact/CI_ResponsibleParty". Contact is discarded.





INFO:oscar_lib.oscar_client:upload ok, new id 44253 The list below is organized by section header and shows exceptions/issues – if any – that may have resulted from the processing of the XML (NB: Section headers are always displayed).
# Facility with identifier "0-76-0-2200608000000597"
REF_9: The elements "observation/OM_Observation/metadata/MD_Metadata/individualName:NAME" and "observation/OM_Observation/metadata/MD_Metadata/individualName:SURNAME" are mandatory.
REF_9: The field 'observation/OM_Observation/metadata/MD_Metadata/address/CI_Address/city' is mandatory.
REF_3: The country of contact is not provided in "observation/metadata/MD_metadata/contact/CI_ResponsibleParty". Contact is discarded.





INFO:oscar_lib.oscar_client:upload ok, new id 44254 The list below is organized by section header and shows exceptions/issues – if any – that may have resulted from the processing of the XML (NB: Section headers are always displayed).
# Facility with identifier "0-76-0-3550902000000592"
REF_9: The elements "observation/OM_Observation/metadata/MD_Metadata/individualName:NAME" and "observation/OM_Observation/metadata/MD_Metadata/individualName:SURNAME" are mandatory.
REF_9: The field 'observation/OM_Observation/metadata/MD_Metadata/address/CI_Address/city' is mandatory.
REF_3: The country of contact is not provided in "observation/metadata/MD_metadata/contact/CI_ResponsibleParty". Contact is discarded.





INFO:oscar_lib.oscar_client:upload ok, new id 44255 The list below is organized by section header and shows exceptions/issues – if any – that may have resulted from the processing of the XML (NB: Section headers are always displayed).
# Facility with identifier "0-76-0-4304663000000595"
REF_9: The elements "observation/OM_Observation/metadata/MD_Metadata/individualName:NAME" and "observation/OM_Observation/metadata/MD_Metadata/individualName:SURNAME" are mandatory.
REF_9: The field 'observation/OM_Observation/metadata/MD_Metadata/address/CI_Address/city' is mandatory.
REF_3: The country of contact is not provided in "observation/metadata/MD_metadata/contact/CI_ResponsibleParty". Contact is discarded.





INFO:oscar_lib.oscar_client:upload ok, new id 44256 The list below is organized by section header and shows exceptions/issues – if any – that may have resulted from the processing of the XML (NB: Section headers are always displayed).
# Facility with identifier "0-76-0-4204202000000588"
REF_9: The elements "observation/OM_Observation/metadata/MD_Metadata/individualName:NAME" and "observation/OM_Observation/metadata/MD_Metadata/individualName:SURNAME" are mandatory.
REF_9: The field 'observation/OM_Observation/metadata/MD_Metadata/address/CI_Address/city' is mandatory.
REF_3: The country of contact is not provided in "observation/metadata/MD_metadata/contact/CI_ResponsibleParty". Contact is discarded.





INFO:oscar_lib.oscar_client:upload ok, new id 44257 The list below is organized by section header and shows exceptions/issues – if any – that may have resulted from the processing of the XML (NB: Section headers are always displayed).
# Facility with identifier "0-76-0-4203600000000587"
REF_9: The elements "observation/OM_Observation/metadata/MD_Metadata/individualName:NAME" and "observation/OM_Observation/metadata/MD_Metadata/individualName:SURNAME" are mandatory.
REF_9: The field 'observation/OM_Observation/metadata/MD_Metadata/address/CI_Address/city' is mandatory.
REF_3: The country of contact is not provided in "observation/metadata/MD_metadata/contact/CI_ResponsibleParty". Contact is discarded.





INFO:oscar_lib.oscar_client:upload ok, new id 44258 The list below is organized by section header and shows exceptions/issues – if any – that may have resulted from the processing of the XML (NB: Section headers are always displayed).
# Facility with identifier "0-76-0-5107354000000589"
REF_9: The elements "observation/OM_Observation/metadata/MD_Metadata/individualName:NAME" and "observation/OM_Observation/metadata/MD_Metadata/individualName:SURNAME" are mandatory.
REF_9: The field 'observation/OM_Observation/metadata/MD_Metadata/address/CI_Address/city' is mandatory.
REF_3: The country of contact is not provided in "observation/metadata/MD_metadata/contact/CI_ResponsibleParty". Contact is discarded.





INFO:oscar_lib.oscar_client:upload ok, new id 44259 The list below is organized by section header and shows exceptions/issues – if any – that may have resulted from the processing of the XML (NB: Section headers are always displayed).
# Facility with identifier "0-76-0-5100359000000590"
REF_9: The elements "observation/OM_Observation/metadata/MD_Metadata/individualName:NAME" and "observation/OM_Observation/metadata/MD_Metadata/individualName:SURNAME" are mandatory.
REF_9: The field 'observation/OM_Observation/metadata/MD_Metadata/address/CI_Address/city' is mandatory.
REF_3: The country of contact is not provided in "observation/metadata/MD_metadata/contact/CI_ResponsibleParty". Contact is discarded.





INFO:oscar_lib.oscar_client:upload ok, new id 44260 The list below is organized by section header and shows exceptions/issues – if any – that may have resulted from the processing of the XML (NB: Section headers are always displayed).
# Facility with identifier "0-76-0-5107701000000591"
REF_9: The elements "observation/OM_Observation/metadata/MD_Metadata/individualName:NAME" and "observation/OM_Observation/metadata/MD_Metadata/individualName:SURNAME" are mandatory.
REF_9: The field 'observation/OM_Observation/metadata/MD_Metadata/address/CI_Address/city' is mandatory.
REF_3: The country of contact is not provided in "observation/metadata/MD_metadata/contact/CI_ResponsibleParty". Contact is discarded.
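The log above contains the same three REF messages for every station (missing contact name, missing city, and a contact without country that is discarded). With many stations it is easier to read a tally than the full log. A sketch, assuming the texts passed in have the layout of the log output above (for example captured log messages); whether `client.uploadXML` returns exactly this text is not shown here, and `summarize_upload_messages` and the sample strings are hypothetical:

```python
import re
from collections import Counter

def summarize_upload_messages(responses):
    """Collect the new station ids and count identical REF_n messages in a list of texts."""
    new_ids = []
    messages = Counter()
    for text in responses:
        m = re.search(r"new id (\d+)", text)
        if m:
            new_ids.append(int(m.group(1)))
        for line in text.splitlines():
            line = line.strip()
            if re.match(r"REF_\d+:", line):
                messages[line] += 1
    return new_ids, messages

# hypothetical texts in the layout of the output above (shortened)
city = "REF_9: The field 'observation/OM_Observation/metadata/MD_Metadata/address/CI_Address/city' is mandatory."
responses = [
    'upload ok, new id 44250\n# Facility with identifier "0-76-0-1303304000000594"\n' + city,
    'upload ok, new id 44251\n# Facility with identifier "0-76-0-1506139000000593"\n' + city,
]
new_ids, messages = summarize_upload_messages(responses)
print(new_ids)  # [44250, 44251]
for message, count in messages.most_common():
    print(count, message)
```

To feed it from the loop, the `status` values would have to be appended to a list instead of only being printed.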



