2.2.1 Temperature climatology in Spain (obs)
=====================================
In this notebook we will analyse daily meteorological observations to derive anomalies.<br>
Our final goal is to find out whether temperature anomalies influence power demand.<br>
<br>
Before the climatological analysis, however, we will use the API of the Spanish meteorological service (AEMET) to access the data.<br>
Let's import the modules we need first.

In [1]:
import requests
import json
import pandas as pd
import datetime

#  Data access
The daily summary of all stations can be found at this URL:

In [2]:
URL = 'https://opendata.aemet.es/opendata/api/valores/climatologicos/diarios/datos/fechaini/<start_date>T00%3A00%3A00UTC/fechafin/<end_date>T00%3A00%3A00UTC/todasestaciones'

In [3]:
print(URL)

https://opendata.aemet.es/opendata/api/valores/climatologicos/diarios/datos/fechaini/<start_date>T00%3A00%3A00UTC/fechafin/<end_date>T00%3A00%3A00UTC/todasestaciones


where start_date and end_date determine the range of interest.<br>
The API has a constraint: the requested period cannot be longer than 30 days. Therefore, in order to download 10 years of data, we need to loop over several time periods.
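
The placeholder substitution and the 30-day constraint can be sketched as a small helper. This is only an illustration: `URL_TEMPLATE`, `MAX_DAYS` and `build_url` are hypothetical names that do not appear elsewhere in this notebook, and counting the limit as the difference between the two dates in days is an assumption, not something the API documentation is quoted on here.

```python
import datetime

# Same template as the URL shown above, split over two lines for readability
URL_TEMPLATE = ('https://opendata.aemet.es/opendata/api/valores/climatologicos/diarios/datos/'
                'fechaini/<start_date>T00%3A00%3A00UTC/fechafin/<end_date>T00%3A00%3A00UTC/todasestaciones')

# Assumption: the limit is counted as (end - start) in days
MAX_DAYS = 30

def build_url(start, end, template=URL_TEMPLATE):
    """Fill the <start_date> and <end_date> placeholders with YYYY-MM-DD dates."""
    if end < start:
        raise ValueError('end date is before start date')
    if (end - start).days > MAX_DAYS:
        raise ValueError('period is longer than %d days' % MAX_DAYS)
    url = template.replace('<start_date>', start.strftime('%Y-%m-%d'))
    url = url.replace('<end_date>', end.strftime('%Y-%m-%d'))
    return url

example_url = build_url(datetime.date(2017, 9, 5), datetime.date(2017, 9, 25))
print(example_url)
```

Rejecting an over-long period before sending the request keeps the failure local and easy to read, instead of depending on how the server reports it.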

To build these periods we need some datetime utilities from Python, wrapped in small local functions for convenience.

In [4]:
from dateutil.relativedelta import relativedelta
import time

def numdate(datein,formatin):
    """The 'numdate' function converts a string 'datein' into a 'datetime' value based on a date format 'formatin'
    """
    dateout = datetime.datetime.strptime(datein,formatin)
    return dateout

def write_date(datein,formatout):
    """The 'write_date' function converts a 'datetime' value 'datein' into a string based on a date format 'formatout'
    """
    dateout = datein.strftime(formatout)
    return dateout

In [5]:
# Generate a series of start dates, since the AEMET API limits the length of each query
first_date = numdate('2017-09-05','%Y-%m-%d')
last_date = numdate('2017-11-20','%Y-%m-%d')

# Step through the range with a while loop, 21 days at a time
loop_date = first_date
datenum_list = []
while loop_date <= last_date:
    datenum_list.append(loop_date)
    loop_date += relativedelta(days=21)    

In [6]:
print(datenum_list)

[datetime.datetime(2017, 9, 5, 0, 0), datetime.datetime(2017, 9, 26, 0, 0), datetime.datetime(2017, 10, 17, 0, 0), datetime.datetime(2017, 11, 7, 0, 0)]
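
The list above only holds period start dates, so when consecutive entries are paired up the stretch between the last entry (7 November) and last_date (20 November) has no end date and is not requested. A minimal sketch of one way to cover the whole range is given below; `date_chunks` and `max_days` are hypothetical names, and the 21-day step simply mirrors the value used above.

```python
import datetime

def date_chunks(first_date, last_date, max_days=21):
    """Yield (start, end) pairs covering first_date..last_date inclusive.

    Each pair spans at most max_days calendar days; the last pair is
    shortened so that it ends exactly on last_date.
    """
    step = datetime.timedelta(days=max_days)
    one_day = datetime.timedelta(days=1)
    start = first_date
    while start <= last_date:
        end = min(start + step - one_day, last_date)
        yield start, end
        start = end + one_day

chunks = list(date_chunks(datetime.datetime(2017, 9, 5),
                          datetime.datetime(2017, 11, 20)))
for start, end in chunks:
    print(start.strftime('%Y-%m-%d'), end.strftime('%Y-%m-%d'))
```

With these dates the generator produces four periods, the last one running from 2017-11-07 to 2017-11-20.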


In [7]:
# APIKEY
api_key = "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJtaWd1ZWwuY29yZG9iYUB3ZWF0aGVydHJlbmQuZXMiLCJqdGkiOiI5ODZhMT"+\
"JlYy03OGViLTQwOTktYmIxZS1hNmI4ZjQ3OTg0MzMiLCJpc3MiOiJBRU1FVCIsImlhdCI6MTUxNzg0OTI0MywidXNlcklkIjoiOT"+\
"g2YTEyZWMtNzhlYi00MDk5LWJiMWUtYTZiOGY0Nzk4NDMzIiwicm9sZSI6IiJ9.YeSmzO_si0SavN2KYCKzjnsSd_NzNtDqOZP9nMQgYA0"

querystring = {"api_key":api_key}

headers = {'cache-control': "no-cache"}

# The AEMET website has no digital certificate
# To avoid 'warning' messages when making an unverified connection, let's explicitly ignore those messages
import warnings
warnings.filterwarnings('ignore')
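
Ignoring every warning also hides unrelated ones, for example from pandas. A narrower filter is sketched here under one assumption: that the warning raised for unverified HTTPS requests starts with the text 'Unverified HTTPS request'. The function name `ignore_unverified_https_warnings` is hypothetical, and the two `warnings.warn` calls only simulate warnings for the demonstration.

```python
import warnings

def ignore_unverified_https_warnings():
    # Assumption: the warning message begins with this text
    warnings.filterwarnings('ignore', message='Unverified HTTPS request')

# Demonstration inside a temporary context, so the global filters are untouched
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')        # record every warning by default
    ignore_unverified_https_warnings()     # then hide only the HTTPS one
    warnings.warn('Unverified HTTPS request is being made')  # simulated, hidden
    warnings.warn('some other problem')                       # simulated, kept

remaining = [str(w.message) for w in caught]
print(remaining)
```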

In [8]:
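# Use the 'numdate' function to get the first (.min) and last (.max) dates in 'datetime' format
# Note: 'DATOS' must already exist (it is created in the download loop) and is assumed to contain a 'fint' column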
last_time  = numdate(DATOS['fint'].max(),'%Y-%m-%dT%H:%M:%S')
first_time = numdate(DATOS['fint'].min(),'%Y-%m-%dT%H:%M:%S')

In [9]:
# Loop over consecutive pairs of dates; each pair defines one request period
for startd, endd in zip(datenum_list[:-1],datenum_list[1:]):
    
    # Write the start and end dates in the appropriate format
    wstartd = write_date(startd,'%Y-%m-%d')
    wendd   = write_date(endd + relativedelta(days=-1),'%Y-%m-%d') # the end date is one day before the following start date
    
    # Build the URL by replacing the start and end date placeholders
    url_tmp = URL.replace('<start_date>',wstartd)
    url_tmp = url_tmp.replace('<end_date>',wendd)

    # Establish the first connection (url_tmp, not the undefined name 'url')
    response = requests.get(url_tmp, headers=headers, params=querystring, verify=False)

    # Use try/except to learn a bit more about possible errors
    try:
        # Parse the answer as JSON instead of calling eval() on remote text
        RESPONSE = response.json()
        if RESPONSE['descripcion'] == 'exito':
            # If the first connection was successful, establish the second one towards the data
            acceso = requests.get(RESPONSE['datos'], headers=headers, params=querystring, verify=False)
        else:
            print('Error at the second connection: cannot access the JSON file')
            continue # skip this period so that stale data are not written
    except (ValueError, KeyError):
        print('Error at the first connection')
        continue

    # We are using Pandas to move from JSON to CSV easily
    DATOS = pd.DataFrame.from_dict(acceso.json())

    # File name (the ./datos_aemet directory is assumed to exist already)
    file_csv = './datos_aemet/aemet_valores_climatologicos_todas_%s_%s.csv' % (wstartd,wendd)
    # Last action, export DataFrame to CSV
    DATOS.to_csv(file_csv,index=False)
    print(file_csv)
    time.sleep(10) # Wait a few seconds to be gentle with the server and reduce the risk of being kicked out

./datos_aemet/aemet_valores_climatologicos_todas_2017-09-05_2017-09-25.csv
./datos_aemet/aemet_valores_climatologicos_todas_2017-09-26_2017-10-16.csv
./datos_aemet/aemet_valores_climatologicos_todas_2017-10-17_2017-11-06.csv
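
The two-step access used in the loop (a first request that returns a description and a link, then a second request to that link) can be isolated in a function, which makes it easier to test without touching the network. This is a sketch, not the notebook's own code: `fetch_period` is a hypothetical name, and `get` stands for any callable that behaves like `requests.get`.

```python
def fetch_period(url, get, **request_kwargs):
    """Return the parsed records for one period, or None if any step fails.

    url            -- request URL with the dates already filled in
    get            -- callable with the same interface as requests.get
    request_kwargs -- passed unchanged to both calls (headers, params, verify...)
    """
    # First connection: a JSON object with 'descripcion' and, on success, 'datos'
    first = get(url, **request_kwargs)
    try:
        info = first.json()
    except ValueError:
        return None
    if not isinstance(info, dict):
        return None
    if info.get('descripcion') != 'exito' or 'datos' not in info:
        return None

    # Second connection: the link in 'datos' points to the actual records
    second = get(info['datos'], **request_kwargs)
    try:
        return second.json()
    except ValueError:
        return None
```

Inside the loop it could be called as `fetch_period(url_tmp, requests.get, headers=headers, params=querystring, verify=False)`, skipping the period whenever the result is None.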


In [12]:
# Let's have a quick look at the data
DATOS.head()

Unnamed: 0,altitud,dir,fecha,horaPresMax,horaPresMin,horaracha,horatmax,horatmin,indicativo,nombre,prec,presMax,presMin,provincia,racha,sol,tmax,tmed,tmin,velmedia
0,273,19,2017-10-17,09,24,13:00,00:10,08:20,4358X,DON BENITO,168,9886.0,9837.0,BADAJOZ,119,14.0,229.0,198.0,168.0,28
1,486,25,2017-10-17,,,13:50,11:40,03:10,4220X,PUEBLA DE DON RODRIGO,168,,,CIUDAD REAL,92,,220.0,178.0,136.0,22
2,632,99,2017-10-17,Varias,05,Varias,13:08,23:24,C447A,TENERIFE NORTE AEROPUERTO,1,9538.0,9510.0,STA. CRUZ DE TENERIFE,117,61.0,233.0,202.0,171.0,58
3,408,18,2017-10-17,00,24,00:30,,,6106X,ANTEQUERA,166,9750.0,9693.0,MALAGA,150,,,,,50
4,807,17,2017-10-17,00,Varias,14:40,14:10,06:20,9698U,TALARN,6,9328.0,9271.0,LLEIDA,64,92.0,233.0,162.0,90.0,22
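
In the preview, fecha is plain text and the hour columns mix numbers, empty cells and the label 'Varias'. Before any climatological analysis these columns will probably need explicit types. The sketch below shows one possible approach on a three-row sample copied from the preview; the new column name `horaPresMax_num` is hypothetical, and treating 'Varias' as missing is an assumption about what is useful, not a rule from AEMET.

```python
import pandas as pd

# Three rows copied from the preview above (only a few columns)
sample = pd.DataFrame({
    'fecha': ['2017-10-17', '2017-10-17', '2017-10-17'],
    'indicativo': ['4358X', '4220X', 'C447A'],
    'horaPresMax': ['09', None, 'Varias'],
    'tmed': [198.0, 178.0, 202.0],
})

# Parse the date strings into proper datetime values
sample['fecha'] = pd.to_datetime(sample['fecha'], format='%Y-%m-%d')

# Convert hours to numbers; anything non-numeric ('Varias', empty) becomes NaN
sample['horaPresMax_num'] = pd.to_numeric(sample['horaPresMax'], errors='coerce')

print(sample.dtypes)
print(sample)
```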
