Example SmartSMEAR API calls. For the API documentation, see https://avaa.tdata.fi/web/smart/smear/api

In [None]:
urlstring = "https://avaa.tdata.fi/smear-services/smeardata.jsp?variables=Pamb0,UV_B&table=HYY_META&from=2016-02-11%2000:00:00.989&to=2016-02-12%2009:06:07.989&quality=ANY&averaging=30MIN&type=ARITHMETIC"

In [None]:
print(urlstring)

The pandas library, which replicates much of R's usefulness, has a read_csv function that accepts URLs directly.

In [None]:
import pandas as pd

In [None]:
data = pd.read_csv(urlstring)
data.head()

We will use the numpy and datetime modules to convert the date and time columns to a more convenient data type:


In [None]:
import numpy as np
import datetime

In [None]:
date_numpy = data.values[:,0:6]

# convert numpy.float64 to int
date_numpy = date_numpy.astype(int)

# convert numpy array to datetime:
date_time= np.array([datetime.datetime(*x) for x in date_numpy])

# print out first three datetimes
print(date_time[:3])
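The same conversion can also be sketched with pandas alone. This is a minimal example on a synthetic frame, assuming (as the slicing above does) that the first six columns hold year, month, day, hour, minute and second, in that order. The `sample` frame and the `to_timestamps` helper are made up for illustration and are not part of the API.

```python
import pandas as pd

# Synthetic stand-in for a SmartSMEAR response (illustrative values only)
sample = pd.DataFrame({
    "Year": [2016, 2016],
    "Month": [2, 2],
    "Day": [11, 11],
    "Hour": [0, 0],
    "Minute": [0, 30],
    "Second": [0, 0],
    "HYY_META.Pamb0": [1001.2, 1001.0],
})


def to_timestamps(df):
    """Build a datetime Series from the first six columns of df."""
    parts = df.iloc[:, 0:6].astype(int)
    # pd.to_datetime assembles dates from columns with these exact names
    parts.columns = ["year", "month", "day", "hour", "minute", "second"]
    return pd.to_datetime(parts)


timestamps = to_timestamps(sample)
print(timestamps)
```

The result is a pandas Series of timestamps, which can be assigned to `sample.index` directly.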

The API makes your life easier when doing dynamic data retrieval within data processing and analysis scripts.

For example, we can use string formatting to separate out the parameters, which is useful when we want to choose different times, dates or a different table:


In [None]:
date_start="2016-02-14%2000:00:00"
date_end="2016-02-15%2009:06:00"
table="HYY_META"
quality="ANY"
averaging="NONE"
stype="NONE"
variables="Pamb0,UV_B"
urlstring2=("https://avaa.tdata.fi/smear-services/smeardata.jsp?"
             "table={0}" 
             "&variables={1}"
             "&from={2}"
             "&to={3}"
             "&quality={4}"
             "&averaging={5}"
             "&type={6}"
             "&format=csv").format(table,variables,date_start,
                                   date_end,quality,averaging,stype)



In [None]:
data2 = pd.read_csv(urlstring2)
data2.head()
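As an alternative to writing `%20` by hand, the query string can be assembled with the standard library. This is only a sketch: the `build_url` helper is hypothetical, and the parameter names are the ones used in the URL above. Because `from` is a reserved word in Python, the parameters are passed as a dictionary rather than as keyword arguments.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://avaa.tdata.fi/smear-services/smeardata.jsp"


def build_url(params):
    """Return the request URL for a dict of query parameters."""
    # quote_via=quote encodes spaces as %20 (the default would use '+');
    # safe=",:" leaves commas and colons readable, as in the URL above
    return BASE_URL + "?" + urlencode(params, quote_via=quote, safe=",:")


url = build_url({
    "table": "HYY_META",
    "variables": "Pamb0,UV_B",
    "from": "2016-02-14 00:00:00",
    "to": "2016-02-15 09:06:00",
    "quality": "ANY",
    "averaging": "NONE",
    "type": "NONE",
    "format": "csv",
})
print(url)
```

With this approach the dates can be written with an ordinary space, and the encoding is applied in one place.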

Below are two simple functions for constructing an API call from the given parameters and downloading the data. Named parameters are used so that the user can give the table and variables separately or use table.variable notation, give parameters in any order, and skip irrelevant parameters. Different types of error affect the returned data in different ways. Be careful and take note of the column names of the returned data frame!

In [None]:
def datetime_converter(df):
    date_numpy=df.values[:,0:6]

    # convert numpy.float64 to int
    date_numpy=date_numpy.astype(int)

    # convert numpy array to datetime:
    date_time= np.array([datetime.datetime(*x) for x in date_numpy])
    
    return date_time 


def getVariables(variables,date_start,date_end,
                 table="HYY_META",quality="ANY",averaging="NONE",
                 stype="NONE",index_date=False):
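    """
    Download SmartSMEAR data as a pandas DataFrame.

    Dates use the format 2015-01-01 00:00:00, with the space
    URL-encoded as %20. Example parameter values:

    table="VAR_DMPS"
    date_start="2017-05-01%2000:00:00"
    date_end="2017-05-01%2001:00:00"
    quality="CHECKED"  # or "ANY"
    averaging="30MIN"
    stype="ARITHMETIC"

    If index_date is True, the returned frame is indexed by datetime.
    """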
    url=("https://avaa.tdata.fi/smear-services/smeardata.jsp?"
         "table={0}" 
         "&variables={1}"
         "&from={2}"
         "&to={3}"
         "&quality={4}"
         "&averaging={5}"
         "&type={6}"
         "&format=csv").format(table,variables,date_start,
                               date_end,quality,averaging,stype)
    
    df = pd.read_csv(url)
    
    if index_date:
        date_index = datetime_converter(df)
        df.index = date_index
    return df

Here's an example of the function in action:

In [None]:
date_start="2018-07-01%2000:00:00"
date_end="2018-07-02%2001:00:00"
variables="PAR"
df = getVariables(variables,date_start,date_end,
                  table="HYY_META",quality="CHECKED",
                  stype="NONE",index_date=True)

In [None]:
df[u"HYY_META.PAR"].plot()

Although the SmartSMEAR API gives HTTP return codes and, in most cases, meaningful error messages, this information may not be obvious in our output because of the way read_csv works. Let's request a nonexistent variable and see what happens:

In [None]:
date_start="2018-07-01%2000:00:00"
date_end="2018-07-02%2001:00:00"
variables="XXXX"
df = getVariables(variables,date_start,date_end,
                  table="HYY_META",quality="CHECKED",
                  stype="NONE",index_date=True)
print(df)
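One way to make the status code and message visible is to fetch the response ourselves and only then hand the text to read_csv. The sketch below assumes that the service signals a failure with a non-2xx HTTP status. The `read_csv_checked` helper and its `opener` parameter are hypothetical names introduced here, with `opener` present only so the function can be tried without network access.

```python
import io
import urllib.error
import urllib.request

import pandas as pd


def read_csv_checked(url, opener=urllib.request.urlopen):
    """Fetch url and parse it as CSV, surfacing HTTP errors clearly."""
    try:
        with opener(url) as response:
            body = response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # The error body often carries the server's explanation
        detail = err.read().decode("utf-8", errors="replace").strip()
        raise RuntimeError(
            "Request failed with HTTP {0}: {1}".format(err.code, detail)
        ) from err
    return pd.read_csv(io.StringIO(body))

# Usage (requires network access):
# df = read_csv_checked(url)
```

If the service instead answers with a successful status and an empty or partial table, this wrapper will not catch the problem, and the column names still need to be checked.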

Specific notes for the AVAA API:

When using the variables parameter, if any variable does not exist in the given table, no data from that table are returned.

In [None]:
date_start="2018-07-01%2000:00:00"
date_end="2018-07-02%2001:00:00"
variables="PAR,XXXX"
df = getVariables(variables,date_start,date_end,
                  table="HYY_META",quality="CHECKED",
                  stype="NONE",index_date=True)
print(df)
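Because a failed request can still produce a data frame, it helps to check the column names before using the data. A minimal sketch, assuming that data columns are named `TABLE.VARIABLE` as in the `HYY_META.PAR` example earlier. The `missing_variables` helper and the synthetic frames are made up for illustration.

```python
import pandas as pd


def missing_variables(df, table, variables):
    """Return the expected TABLE.VARIABLE columns that df lacks."""
    expected = ["{0}.{1}".format(table, name.strip())
                for name in variables.split(",")]
    return [col for col in expected if col not in df.columns]


# Synthetic frames standing in for a good and a failed response
good = pd.DataFrame({"Year": [2018], "HYY_META.PAR": [512.3]})
bad = pd.DataFrame({"Year": [2018]})

print(missing_variables(good, "HYY_META", "PAR"))       # nothing missing
print(missing_variables(bad, "HYY_META", "PAR,XXXX"))   # both missing
```

Calling this right after `getVariables` and raising an error on a non-empty result would stop a script before it works with incomplete data.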

Specific notes for the AVAA API:

Sometimes rows are missing from the database. Align the rows of different tables by joining or merging on the datetime index.

Example: Hyytiälä and Siikaneva 1 meteo data in 2004/2005


In [None]:
date_start="2004-12-31%2023:00:00"
date_end="2005-01-01%2001:00:00"
variables="T168"
df = getVariables(variables,date_start,date_end,
                  table="HYY_META",quality="CHECKED",
                  stype="NONE",index_date=True)

variables="T_a"
df2 = getVariables(variables,date_start,date_end,
                  table="SII1_META",quality="CHECKED",
                  stype="NONE",index_date=True)

# join them together like this
df.join(df2, lsuffix='_HYY', rsuffix='_SII1')
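Note that `join` performs a left join by default, so timestamps that exist only in the right-hand frame are dropped. The sketch below uses small synthetic frames (the values and gaps are invented) to show the difference between the default and `how="outer"`, which keeps the union of both indexes.

```python
import pandas as pd

# Synthetic data: each station is missing a different half-hour
idx_hyy = pd.to_datetime(
    ["2004-12-31 23:00", "2004-12-31 23:30", "2005-01-01 00:30"])
idx_sii = pd.to_datetime(
    ["2004-12-31 23:00", "2005-01-01 00:00", "2005-01-01 00:30"])
hyy = pd.DataFrame({"HYY_META.T168": [-1.0, -1.2, -1.5]}, index=idx_hyy)
sii = pd.DataFrame({"SII1_META.T_a": [-2.0, -2.4, -2.6]}, index=idx_sii)

# Default (left) join: only the timestamps present in hyy survive
left = hyy.join(sii)

# Outer join: every timestamp from either frame, gaps filled with NaN
outer = hyy.join(sii, how="outer")

print(left)
print(outer)
```

Here the two frames have no column names in common, so no suffixes are needed. With frames returned by `getVariables`, the shared date columns still require `lsuffix` and `rsuffix` as in the cell above.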
