# GET data from AEMET API

This notebook will:
- Get complete climate datasets through the AEMET API.
- Store the datasets I will use in the EDA as local .csv files.

## Table of contents

<nav>
  <ol>
    <li><a href="#1-data-selection">Data selection</a></li>
    <li><a href="#2-BC-data">Basque Country data accessing</a></li>
    <li><a href="#3-data-cleaning">Basque Country data cleaning</a></li>
  </ol>
</nav>


## 1 - Data selection <a id="1-data-selection"></a>

Select the data that will be requested.

Through the AEMET OpenData API, historical data dating back to 1805 can be accessed. The San Fernando station in Cadiz has the oldest precipitation data series, with records spanning almost 200 years.

In total, 29 weather stations in Spain have centennial series, providing information of great historical and climatological value.


```python
# Libraries
import os
import re
import sys
import time

import numpy as np
import pandas as pd
import requests
```

```python
current_directory = os.getcwd()
root_path = os.path.abspath(os.path.join(current_directory, '..'))
print(current_directory)
print(root_path)

# Validate that the path exists before adding it to sys.path
if os.path.exists(root_path) and root_path not in sys.path:
    sys.path.append(root_path)

from utils.functions import fetch_data_aemet
```

```
c:\Users\Lander\Documents\Bootcamp_DS\ONLINE_DS_THEBRIDGE_2024\04_Project_Break_I\EDA_Project\src\notebooks
c:\Users\Lander\Documents\Bootcamp_DS\ONLINE_DS_THEBRIDGE_2024\04_Project_Break_I\EDA_Project\src
```


### 1.1 - Accessing AEMET API

```python
# Load the API key that grants access to AEMET OpenData
with open('C:/Users/Lander/Documents/API_Keys/apiKey_aemet.txt') as f:
    api_key = f.read().strip()  # strip the trailing newline/whitespace from the file
```
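Hard-coding an absolute path ties the notebook to one machine. A minimal alternative sketch, assuming the key can be exported as an environment variable; the variable name `AEMET_API_KEY` and the helper `load_api_key` are hypothetical, not part of AEMET or of this project:

```python
import os
from pathlib import Path


def load_api_key(env_var="AEMET_API_KEY", fallback_path=None):
    """Return the AEMET API key from an environment variable or a text file.

    Hypothetical helper: the env var name and the fallback file are assumptions.
    """
    key = os.environ.get(env_var)
    if key:
        return key.strip()
    if fallback_path is not None:
        # Strip the trailing newline that text editors usually add
        return Path(fallback_path).read_text(encoding="utf-8").strip()
    raise RuntimeError(f"Set {env_var} or pass fallback_path")
```

In the notebook it could replace the `open(...)` call, e.g. `api_key = load_api_key(fallback_path='C:/Users/Lander/Documents/API_Keys/apiKey_aemet.txt')`, so the key file path only matters when the variable is not set.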

```python
# AEMET API base URL. Documented at https://opendata.aemet.es/dist/index.html
base_url = 'https://opendata.aemet.es/opendata'

# Parameters for the request
querystring = {"api_key": api_key}
headers = {'cache-control': "no-cache"}
```

### 1.2 - Province selection

```python
# This .csv file contains information on all Spanish Automatic Weather Stations (AWS)
# EMA = Estación Meteorológica Automática (Spanish for AWS)
df_stations = pd.read_csv('../data/raw/EMA_info_raw.csv')

# A sample of stations and the list of Spanish provinces
display(df_stations.sample(5))
display(df_stations['provincia'].unique())
```

| | latitud | provincia | altitud | idema | nombre | indsinop | longitud |
|---|---|---|---|---|---|---|---|
| 629 | 374712N | SEVILLA | 200 | 5654X | LA PUEBLA DE LOS INFANTES | | 052216W |
| 772 | 394252N | CUENCA | 815 | 8245Y | MIRA | | 012633W |
| 284 | 433904N | LUGO | 80 | 1347T | BURELA | | 072126W |
| 227 | 430544N | BIZKAIA | 270 | 1064L | OROZKO, IBARRA | | 025137W |
| 869 | 413716N | ZARAGOZA | 258 | 9434P | ZARAGOZA, VALDESPARTERA | 8159.0 | 005605W |

```
array(['ILLES BALEARS', 'BALEARES', 'LAS PALMAS', 'STA. CRUZ DE TENERIFE',
       'TARRAGONA', 'BARCELONA', 'GIRONA', 'NAVARRA', 'GIPUZKOA',
       'ARABA/ALAVA', 'BIZKAIA', 'CANTABRIA', 'ASTURIAS', 'LEON', 'LUGO',
       'A CORUÑA', 'PONTEVEDRA', 'OURENSE', 'SORIA', 'BURGOS', 'SEGOVIA',
       'VALLADOLID', 'PALENCIA', 'AVILA', 'MADRID', 'SALAMANCA', 'ZAMORA',
       'GUADALAJARA', 'CUENCA', 'TOLEDO', 'CACERES', 'ALBACETE',
       'CIUDAD REAL', 'BADAJOZ', 'CORDOBA', 'HUELVA', 'CEUTA', 'JAEN',
       'GRANADA', 'ALMERIA', 'SEVILLA', 'CADIZ', 'MELILLA', 'MALAGA',
       'MURCIA', 'ALICANTE', 'VALENCIA', 'TERUEL', 'CASTELLON',
       'LA RIOJA', 'HUESCA', 'ZARAGOZA', 'LLEIDA'], dtype=object)
```
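The `latitud` and `longitud` columns encode coordinates as degrees, minutes and seconds followed by a hemisphere letter (e.g. `431808N` is 43° 18' 08" N). For maps or distance calculations they will likely need decimal degrees. A minimal sketch, assuming every value has the `DDMMSS` + letter layout seen in the sample above (two-digit degrees for both columns); `dms_to_decimal` is a hypothetical helper:

```python
def dms_to_decimal(value: str) -> float:
    """Convert an AEMET-style 'DDMMSS' + hemisphere string to decimal degrees.

    Assumes two-digit degrees, as in the station file sample (e.g. '431808N').
    """
    value = value.strip().upper()
    digits, hemisphere = value[:-1], value[-1]
    if len(digits) != 6 or not digits.isdigit() or hemisphere not in "NSEW":
        raise ValueError(f"Unexpected coordinate format: {value!r}")
    degrees = int(digits[0:2])
    minutes = int(digits[2:4])
    seconds = int(digits[4:6])
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and West are negative in decimal notation
    return -decimal if hemisphere in "SW" else decimal
```

Applied to the whole file it would be something like `df_stations['latitud'].map(dms_to_decimal)`; raising on an unexpected format makes any row that breaks the assumption visible instead of silently producing a wrong coordinate.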

```python
# Provinces for the EDA
provinces = ['BIZKAIA', 'ARABA/ALAVA', 'GIPUZKOA']

df_EMA_euskadi = df_stations[df_stations['provincia'].isin(provinces)]
df_EMA_euskadi.sample(5)
```

| | latitud | provincia | altitud | idema | nombre | indsinop | longitud |
|---|---|---|---|---|---|---|---|
| 216 | 431808N | GIPUZKOA | 28 | 1041A | ZUMAIA | 8026.0 | 021504W |
| 225 | 432230N | BIZKAIA | 90 | 1059X | PUNTA GALEA | 8059.0 | 030118W |
| 230 | 431050N | BIZKAIA | 210 | 1078C | BALMASEDA | | 031235W |
| 217 | 430328N | ARABA/ALAVA | 617 | 1044X | ARAMAIO, ETXAGUEN | | 023522W |
| 819 | 423614N | ARABA/ALAVA | 612 | 9122I | LABASTIDA | | 024636W |


```python
len(df_EMA_euskadi)
```

```
37
```
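`isin` silently returns fewer rows when a name is misspelled, and the province list above is not spelled consistently (both `'ILLES BALEARS'` and `'BALEARES'` appear). A small check makes such typos fail loudly; `missing_provinces` is a hypothetical helper sketched for illustration:

```python
def missing_provinces(selected, available):
    """Return the selected province names that do not occur in `available` (exact match)."""
    available_set = set(available)
    return [name for name in selected if name not in available_set]
```

In the notebook it could be used as `missing = missing_provinces(provinces, df_stations['provincia'].unique())` followed by `assert not missing, missing`. The comparison is exact on purpose, because `isin` is exact too.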

### 1.3 - Accessing AWS information

```python
# Accessing ONE single AWS
end_point = '/api/valores/climatologicos/mensualesanuales/datos/anioini/{anioIniStr}/aniofin/{anioFinStr}/estacion/{idema}'

anioIniStr = '2010'
anioFinStr = '2013'
idema = '1012P'  # "idema" of the Irun station

end_point = end_point.format(anioIniStr=anioIniStr, anioFinStr=anioFinStr, idema=idema)

url = base_url + end_point

response = requests.request("GET", url, headers=headers, params=querystring)
print(response.status_code)
print(response.json())
```

```
200
{'descripcion': 'exito', 'estado': 200, 'datos': 'https://opendata.aemet.es/opendata/sh/273c5535', 'metadatos': 'https://opendata.aemet.es/opendata/sh/997c0034'}
```
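As the output shows, the first call does not return the climate data itself: its JSON carries `estado` plus a `datos` URL (and `metadatos`), and the data has to be downloaded with a second GET to that URL. A minimal sketch of both steps that runs without network access; `build_aemet_url` and `data_url_from_payload` are hypothetical helpers, and treating any `estado` other than 200 as "no data" is an assumption based on the responses seen in this notebook:

```python
from urllib.parse import urlencode

BASE_URL = 'https://opendata.aemet.es/opendata'
END_POINT = ('/api/valores/climatologicos/mensualesanuales/datos/'
             'anioini/{anioIniStr}/aniofin/{anioFinStr}/estacion/{idema}')


def build_aemet_url(anio_ini, anio_fin, idema, api_key=None):
    """Build the monthly/annual climatology URL for one station and year range."""
    url = BASE_URL + END_POINT.format(anioIniStr=anio_ini, anioFinStr=anio_fin, idema=idema)
    if api_key is not None:
        # Roughly what requests does with params={'api_key': ...}
        url += '?' + urlencode({'api_key': api_key})
    return url


def data_url_from_payload(payload):
    """Return the 'datos' URL from the first response, or None if AEMET reports no data."""
    if payload.get('estado') == 200 and 'datos' in payload:
        return payload['datos']
    return None
```

With the response printed above, `data_url_from_payload(response.json())` yields the `.../sh/273c5535` URL that the next cell downloads.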


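```python
# Accessing Irún data: the first response only contains the URL of the dataset ('datos')
df_Irun = pd.DataFrame(requests.get(response.json()['datos']).json())
display(df_Irun.head())
df_Irun.shape
```

| | fecha | indicativo | p_max | hr | tm_min | ta_max | ts_min | nt_30 | np_100 | np_001 | ... | nt_00 | ti_max | tm_mes | tm_max | np_010 | nw_55 | w_racha | nw_91 | w_rec | w_med |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2010-10 | 1012P | 50.8(10) | 70.0 | 11.8 | 29.4(02) | 20.2 | 0.0 | 4.0 | 11.0 | ... | 0.0 | 12.2 | 15.6 | 19.3 | 8.0 | | | | | |
| 1 | 2010-11 | 1012P | 33.4(22) | 77.0 | 8.4 | 23.2(12) | 14.7 | 0.0 | 12.0 | 26.0 | ... | 0.0 | 7.1 | 11.1 | 13.7 | 20.0 | 9.0 | 24/20.3(15) | 0.0 | 272.0 | 11.0 |
| 2 | 2010-12 | 1012P | 41.2(22) | 68.0 | 4.6 | 21.0(06) | 14.3 | 0.0 | 6.0 | 13.0 | ... | 6.0 | 2.7 | 8.2 | 11.8 | 12.0 | 5.0 | 16/21.1(06) | 0.0 | 158.0 | 7.0 |
| 3 | 2010-13 | 1012P | | | | | | | | | ... | | | | | | | | | | |
| 4 | 2010-1 | 1012P | | | | | | | | | ... | | | | | | | 25/28.6(14) | | | |

```
(52, 24)
```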
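Several columns combine a value with the day it was recorded: `p_max` is `50.8(10)` and `ta_max` is `29.4(02)`, and in the annual rows the day also carries a month abbreviation (e.g. `39.1(21/ago)`). These need splitting before any numeric analysis. A sketch using the `re` module already imported in this notebook; `split_value_day` is a hypothetical helper, and reading the parenthesised part as the day of occurrence is an assumption. `w_racha` appears to use a different `direction/speed(day)` layout, which this pattern deliberately does not match:

```python
import re

# A number (possibly negative or decimal) followed by a parenthesised day, e.g. '50.8(10)'
_VALUE_DAY = re.compile(r'^\s*(-?\d+(?:\.\d+)?)\((.+)\)\s*$')


def split_value_day(raw):
    """Split '50.8(10)' into (50.8, '10'); return (None, None) if it doesn't match."""
    if not isinstance(raw, str):
        return None, None  # e.g. NaN for missing values
    match = _VALUE_DAY.match(raw)
    if match is None:
        return None, None
    return float(match.group(1)), match.group(2)
```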

### Conclusion

For the EDA, the following data will be used:
- Climate parameters for the last 100 years (where available).
- Basque Country data (Bizkaia, Araba, Gipuzkoa).
- Mean values per month per year.
- Climate parameters joined with AWS information.
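The last point, joining climate values with station information, could look like the following sketch. It assumes the climate frame's station column has been renamed from `indicativo` to `idema` (as done in section 3), and it uses tiny placeholder frames, not real AEMET records, only to show the mechanics:

```python
import pandas as pd

# Placeholder frames for illustration (not real AEMET measurements)
climate = pd.DataFrame({
    'idema': ['1041A', '1041A', '1059X'],
    'fecha': ['2020-1', '2020-2', '2020-1'],
    'tm_mes': [10.1, 11.3, 9.8],
})
stations = pd.DataFrame({
    'idema': ['1041A', '1059X'],
    'nombre': ['ZUMAIA', 'PUNTA GALEA'],
    'provincia': ['GIPUZKOA', 'BIZKAIA'],
    'altitud': [28, 90],
})

# Left join keeps every climate row; validate raises if a station id repeats in `stations`
merged = climate.merge(stations, on='idema', how='left', validate='many_to_one')
print(merged)
```

A left join keeps climate rows even when a station is missing from the station file (its metadata columns become NaN), which is easier to audit than rows disappearing in an inner join.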


## 2 - Basque Country data accessing <a id="2-BC-data"></a>

```python
# Inspection of all the AWS
df_stations
```

| | latitud | provincia | altitud | idema | nombre | indsinop | longitud |
|---|---|---|---|---|---|---|---|
| 0 | 394924N | ILLES BALEARS | 490 | B013X | ESCORCA, LLUC | 8304.0 | 025309E |
| 1 | 394744N | ILLES BALEARS | 5 | B051A | SÓLLER, PUERTO | 8316.0 | 024129E |
| 2 | 394121N | ILLES BALEARS | 60 | B087X | BANYALBUFAR | | 023046E |
| 3 | 393445N | ILLES BALEARS | 52 | B103B | ANDRATX - SANT ELM | 99103.0 | 022208E |
| 4 | 393305N | ILLES BALEARS | 50 | B158X | CALVIÀ, ES CAPDELLÀ | | 022759E |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 942 | 424131N | LLEIDA | 2467 | 9988B | CAP DE VAQUÈIRA | 8936.0 | 005826E |
| 943 | 424201N | LLEIDA | 1161 | 9990X | NAUT ARAN, ARTIES | 8107.0 | 005237E |
| 944 | 424634N | LLEIDA | 722 | 9994X | BOSSÒST | | 004123E |
| 945 | 430528N | NAVARRA | 334 | 9995Y | VALCARLOS/LUZAIDE | | 011803W |


```python
df_EMA_euskadi.idema.unique()
```

```
array(['1012P', '1014', '1014A', '1021X', '1024E', '1025A', '1025X',
       '1026X', '1037X', '1037Y', '1038X', '1041A', '1044X', '1048X',
       '1049N', '1050J', '1052A', '1055B', '1056K', '1057B', '1059X',
       '1060X', '1064L', '1069Y', '1074C', '1078C', '1078I', '1082',
       '1083B', '9060X', '9073X', '9087', '9091O', '9091R', '9122I',
       '9145X', '9178X'], dtype=object)
```
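The download loop below splits each station's history into 4-year requests with `range(1924, 2025, 4)`, i.e. 1924-1927, 1928-1931, ..., 2024-2027. The reason for the 4-year size is not stated; a per-request limit on the year span is plausible but is an assumption. A small generator makes the windows explicit and clips the last one to a chosen final year instead of running past it; `year_windows` is a hypothetical helper:

```python
def year_windows(first_year, last_year, span=4):
    """Yield (start, end) year pairs covering first_year..last_year in blocks of `span` years.

    The final window is clipped to last_year instead of running past it.
    """
    for start in range(first_year, last_year + 1, span):
        yield start, min(start + span - 1, last_year)
```

Used as `for anio_ini, anio_fin in year_windows(1924, 2024): ...`, it produces the same 26 windows as the loop below, except that the last one becomes 2024-2024.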

```python
# Get climate data per AWS, per month, per year
chunks = []  # collect the DataFrames and concatenate once at the end

end_point_template = '/api/valores/climatologicos/mensualesanuales/datos/anioini/{anioIniStr}/aniofin/{anioFinStr}/estacion/{idema}'

for idema in df_EMA_euskadi['idema']:
    # Request 4-year windows: 1924-1927, 1928-1931, ..., 2024-2027
    for year in range(1924, 2025, 4):
        print(f"Accessing {idema} {year}'s info...")
        anioIniStr = str(year)
        anioFinStr = str(year + 3)

        end_point = end_point_template.format(anioIniStr=anioIniStr, anioFinStr=anioFinStr, idema=idema)
        url = base_url + end_point

        time.sleep(3)  # Pause between requests to stay under the API rate limit

        # Complete request dataset; only DataFrame results are kept
        df_idema_data = fetch_data_aemet(url, headers, querystring)

        if isinstance(df_idema_data, pd.DataFrame):
            chunks.append(df_idema_data)

        print('')

df_climate_month_year = pd.concat(chunks) if chunks else pd.DataFrame()

# Display data frame
display(df_climate_month_year)

# Store dataset
df_climate_month_year.to_csv('../data/raw/climate_month_year_euskadi_RAW.csv',
                             index=False)
```

```
Accessing 1012P 1924's info...
Request Status Code:  200
Data Info:  {'descripcion': 'No hay datos que satisfagan esos criterios', 'estado': 404}

Accessing 1012P 1928's info...
Request Status Code:  200
Data Info:  {'descripcion': 'No hay datos que satisfagan esos criterios', 'estado': 404}

Accessing 1012P 1932's info...
Request Status Code:  200
Data Info:  {'descripcion': 'No hay datos que satisfagan esos criterios', 'estado': 404}

Accessing 1012P 1936's info...
Request Status Code:  200
Data Info:  {'descripcion': 'No hay datos que satisfagan esos criterios', 'estado': 404}

Accessing 1012P 1940's info...
Request Status Code:  200
Data Info:  {'descripcion': 'No hay datos que satisfagan esos criterios', 'estado': 404}

Accessing 1012P 1944's info...
Request Status Code:  200
Data Info:  {'descripcion': 'No hay datos que satisfagan esos criterios', 'estado': 404}

Accessing 1012P 1948's info...
Request Status Code:  200
Data Info:  {'descripcion': 'No hay datos que satisfagan esos

KeyboardInterrupt: 
```
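`fetch_data_aemet` lives in `utils/functions.py`, which is not shown here. Judging only from its printed log (the HTTP status, then the JSON body, where `'estado': 404` with `'No hay datos que satisfagan esos criterios'`, i.e. "no data match those criteria", arrives inside an HTTP 200 response), a helper with the same call signature might look like the sketch below. It is a hypothetical reconstruction, not the project's code. The retry on HTTP 429 (the standard "Too Many Requests" status; whether AEMET throttles this way is an assumption) and the `get`/`retries`/`wait` parameters are added for illustration; `get` can be any function with the `requests.get` call shape:

```python
import time

import pandas as pd


def fetch_data_aemet_sketch(url, headers, querystring, get=None, retries=3, wait=10):
    """Two-step AEMET download: metadata request, then the 'datos' URL.

    Returns a DataFrame, or None when AEMET reports no data or the request fails.
    Hypothetical reconstruction; not the project's utils.functions.fetch_data_aemet.
    """
    if get is None:
        import requests  # imported lazily so the sketch can be tested with a fake `get`
        get = requests.get

    for _ in range(retries):
        response = get(url, headers=headers, params=querystring)
        if response.status_code == 429:  # Too Many Requests: wait, then try again
            time.sleep(wait)
            continue
        if response.status_code != 200:
            return None
        payload = response.json()
        if payload.get('estado') != 200 or 'datos' not in payload:
            return None  # e.g. {'estado': 404, 'descripcion': 'No hay datos ...'}
        data = get(payload['datos']).json()
        return pd.DataFrame(data)
    return None
```

Returning `None` for "no data" keeps the caller's `isinstance(..., pd.DataFrame)` check in the loop above working unchanged.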

## 3 - Basque Country data cleaning <a id="3-data-cleaning"></a>

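```python
df_euskadi = pd.read_csv('../data/raw/climate_month_year_euskadi_RAW.csv')
df_euskadi
```

| | fecha | indicativo | p_max | hr | nw_55 | tm_min | ta_max | ts_min | nt_30 | w_racha | ... | q_max | q_mar | q_med | q_min | inso | p_sol | ts_20 | ts_10 | ts_50 | glo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2009-10 | 1012P | 35.2(22) | 74.0 | 1.0 | 12.7 | 30.8(06) | 22.9 | 1.0 | 16/15.3(20) | ... | | | | | | | | | | |
| 1 | 2009-11 | 1012P | | 67.0 | 14.0 | 10.6 | 26.9(01) | 17.2 | 0.0 | 25/27.2(07) | ... | | | | | | | | | | |
| 2 | 2009-12 | 1012P | | 73.0 | 10.0 | 6.6 | 19.6(29) | 16.0 | 0.0 | 14/26.1(21) | ... | | | | | | | | | | |
| 3 | 2009-13 | 1012P | | | | | | | | | ... | | | | | | | | | | |
| 4 | 2009-1 | 1012P | | | | | | | | | ... | | | | | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12397 | 2024-9 | 9178X | | | | | | | | | ... | | | | | | | | | | |
| 12398 | 2024-3 | 9178X | 20.8(07) | 73.0 | | 3.4 | 24.1(22) | 11.0 | 0.0 | | ... | | | | | | | | | | |
| 12399 | 2024-4 | 9178X | 10.4(27) | 65.0 | | 4.5 | 27.2(13) | 11.8 | 0.0 | | ... | | | | | | | | | | |
| 12400 | 2024-5 | 9178X | 22.8(18) | 67.0 | | 7.5 | 26.4(28) | 12.7 | 0.0 | | ... | | | | | | | | | | |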


```python
df_euskadi.fecha.min()  # Oldest data
```

```
'1928-1'
```
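`fecha` is stored as text with unpadded months (`'1928-1'`, `'2009-10'`), so `min()`, `max()` and sorting compare strings character by character. Because years always have four digits, the oldest year above is still found correctly, but within a year `'2009-10'` sorts before `'2009-2'`. A sketch of a numeric sort key, assuming every value has the form `YYYY-M` or `YYYY-MM` (with `13` for the annual rows); `fecha_key` is a hypothetical helper:

```python
def fecha_key(fecha):
    """Turn 'YYYY-M' / 'YYYY-MM' into an (int, int) tuple that sorts chronologically."""
    year, month = fecha.split('-')
    return int(year), int(month)


fechas = ['2009-10', '2009-2', '2009-1', '2009-13']
print(sorted(fechas))                 # ['2009-1', '2009-10', '2009-13', '2009-2']
print(sorted(fechas, key=fecha_key))  # ['2009-1', '2009-2', '2009-10', '2009-13']
```

On a DataFrame the same key can be applied with `df_euskadi['fecha'].map(fecha_key)`, e.g. to build helper columns for sorting.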

### 3.1 - Climate mean values per year

AEMET stores the annual mean values of each year as month "13" (e.g. `2010-13`). However, these rows are missing many values, so I may not use this information.

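```python
# Filter: month "13" stores the mean values of each year, provided directly by AEMET.
# .copy() gives an independent DataFrame, so the assignment below avoids pandas' SettingWithCopyWarning
df_euskadi_anual = df_euskadi[df_euskadi['fecha'].str.contains(r'-13$', regex=True)].copy()

# Delete the "-13" suffix
df_euskadi_anual.loc[:, 'fecha'] = df_euskadi_anual['fecha'].str.replace(r'-13$', '', regex=True)
df_euskadi_anual
```

| | fecha | indicativo | p_max | hr | nw_55 | tm_min | ta_max | ts_min | nt_30 | w_racha | ... | q_max | q_mar | q_med | q_min | inso | p_sol | ts_20 | ts_10 | ts_50 | glo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 2009 | 1012P | | | | | | | | | ... | | | | | | | | | | |
| 16 | 2010 | 1012P | | | | | | | | | ... | | | | | | | | | | |
| 29 | 2011 | 1012P | | 74.0 | | 12.0 | 39.1(21/ago) | 21.6 | 10.0 | | ... | | | | | | | | | | |
| 42 | 2012 | 1012P | 67.0(20/oct) | 75.0 | 28.0 | 10.7 | 38.2(17/ago) | 22.1 | 16.0 | 26/24.4(05/ene) | ... | | | | | | | | | | |
| 55 | 2013 | 1012P | 89.2(08/jun) | 76.0 | 44.0 | 10.9 | 37.6(31/jul) | 21.0 | 5.0 | 23/27.5(26/ene) | ... | | | | | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12340 | 2020 | 9178X | 34.0(06/nov) | | | | | | | | ... | | | | | | | | | | |
| 12353 | 2021 | 9178X | | 73.0 | | 6.9 | 40.0(14/ago) | 20.8 | 29.0 | | ... | | | | | | | | | | |
| 12366 | 2022 | 9178X | 27.0(09/ene) | 69.0 | | 7.6 | 40.9(16/jul) | 20.8 | 57.0 | | ... | | | | | | | | | | |
| 12379 | 2023 | 9178X | 52.8(01/sep) | 71.0 | | 7.6 | 40.0(24/ago) | 24.2 | 37.0 | | ... | | | | | | | | | | |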


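```python
df_euskadi_anual = df_euskadi_anual.rename(columns={'indicativo': 'idema'})
# sort_values returns a new DataFrame, so the result must be assigned
df_euskadi_anual = df_euskadi_anual.sort_values(by=['idema', 'fecha'])

df_euskadi_anual
```

| | fecha | idema | p_max | hr | nw_55 | tm_min | ta_max | ts_min | nt_30 | w_racha | ... | q_max | q_mar | q_med | q_min | inso | p_sol | ts_20 | ts_10 | ts_50 | glo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 2009 | 1012P | | | | | | | | | ... | | | | | | | | | | |
| 16 | 2010 | 1012P | | | | | | | | | ... | | | | | | | | | | |
| 29 | 2011 | 1012P | | 74.0 | | 12.0 | 39.1(21/ago) | 21.6 | 10.0 | | ... | | | | | | | | | | |
| 42 | 2012 | 1012P | 67.0(20/oct) | 75.0 | 28.0 | 10.7 | 38.2(17/ago) | 22.1 | 16.0 | 26/24.4(05/ene) | ... | | | | | | | | | | |
| 55 | 2013 | 1012P | 89.2(08/jun) | 76.0 | 44.0 | 10.9 | 37.6(31/jul) | 21.0 | 5.0 | 23/27.5(26/ene) | ... | | | | | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12340 | 2020 | 9178X | 34.0(06/nov) | | | | | | | | ... | | | | | | | | | | |
| 12353 | 2021 | 9178X | | 73.0 | | 6.9 | 40.0(14/ago) | 20.8 | 29.0 | | ... | | | | | | | | | | |
| 12366 | 2022 | 9178X | 27.0(09/ene) | 69.0 | | 7.6 | 40.9(16/jul) | 20.8 | 57.0 | | ... | | | | | | | | | | |
| 12379 | 2023 | 9178X | 52.8(01/sep) | 71.0 | | 7.6 | 40.0(24/ago) | 24.2 | 37.0 | | ... | | | | | | | | | | |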


```python
df_euskadi_anual.to_csv('../data/raw/BC_anual_climate_raw.csv',
                        index=False)
```

```python
df_euskadi_anual.idema.unique()
```

```
array(['1012P', '1014', '1014A', '1021X', '1024E', '1025A', '1025X',
       '1026X', '1037X', '1037Y', '1038X', '1041A', '1044X', '1048X',
       '1049N', '1050J', '1052A', '1055B', '1056K', '1057B', '1059X',
       '1060X', '1064L', '1069Y', '1074C', '1078C', '1078I', '1082',
       '1083B', '9060X', '9073X', '9087', '9091O', '9091R', '9122I',
       '9145X', '9178X'], dtype=object)
```

```python
df_euskadi_anual.fecha.unique()
```

```
array(['2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016',
       '2017', '2018', '2019', '2020', '2021', '2022', '2023', '2024',
       '1955', '1956', '1957', '1958', '1959', '1960', '1961', '1962',
       '1963', '1964', '1965', '1966', '1967', '1968', '1969', '1970',
       '1971', '1972', '1973', '1974', '1975', '1976', '1977', '1978',
       '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986',
       '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994',
       '1995', '1996', '1997', '1998', '1999', '2000', '2001', '2002',
       '2003', '2004', '2005', '2006', '2007', '2008', '1928', '1929',
       '1930', '1931', '1932', '1933', '1934', '1935', '1936', '1937',
       '1938', '1939', '1940', '1941', '1942', '1943', '1944', '1945',
       '1946', '1947', '1948', '1949', '1950', '1951', '1952', '1953',
       '1954'], dtype=object)
```
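To put a number on how sparse the annual rows are, the share of missing values per column can be computed with `isna().mean()`. The sketch below uses only two columns of the first five annual rows of station 1012P, copied from the output above; on the real data the same call would be `df_euskadi_anual.isna().mean().sort_values(ascending=False)`:

```python
import numpy as np
import pandas as pd

# First five annual rows of station 1012P, copied from the output above (two columns only)
sample = pd.DataFrame({
    'fecha': ['2009', '2010', '2011', '2012', '2013'],
    'p_max': [np.nan, np.nan, np.nan, '67.0(20/oct)', '89.2(08/jun)'],
    'hr': [np.nan, np.nan, 74.0, 75.0, 76.0],
})

# Fraction of missing values per column, highest first
missing_share = sample.isna().mean().sort_values(ascending=False)
print(missing_share)
```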

### 3.2 - Climate per month, per year

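```python
# Filter: only months 1 to 12 (drop the annual "-13" rows)
df_euskadi_month_year = df_euskadi[~df_euskadi['fecha'].str.contains(r'-13$', regex=True)]

df_euskadi_month_year = df_euskadi_month_year.rename(columns={'indicativo': 'idema'})
# sort_values returns a new DataFrame, so the result must be assigned.
# 'fecha' is a string, so months sort lexicographically (1, 10, 11, 12, 2, ...)
df_euskadi_month_year = df_euskadi_month_year.sort_values(by=['idema', 'fecha'])

df_euskadi_month_year
```

| | fecha | idema | p_max | hr | nw_55 | tm_min | ta_max | ts_min | nt_30 | w_racha | ... | q_max | q_mar | q_med | q_min | inso | p_sol | ts_20 | ts_10 | ts_50 | glo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2009-10 | 1012P | 35.2(22) | 74.0 | 1.0 | 12.7 | 30.8(06) | 22.9 | 1.0 | 16/15.3(20) | ... | | | | | | | | | | |
| 1 | 2009-11 | 1012P | | 67.0 | 14.0 | 10.6 | 26.9(01) | 17.2 | 0.0 | 25/27.2(07) | ... | | | | | | | | | | |
| 2 | 2009-12 | 1012P | | 73.0 | 10.0 | 6.6 | 19.6(29) | 16.0 | 0.0 | 14/26.1(21) | ... | | | | | | | | | | |
| 4 | 2009-1 | 1012P | | | | | | | | | ... | | | | | | | | | | |
| 5 | 2009-2 | 1012P | | | | | | | | | ... | | | | | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12397 | 2024-9 | 9178X | | | | | | | | | ... | | | | | | | | | | |
| 12398 | 2024-3 | 9178X | 20.8(07) | 73.0 | | 3.4 | 24.1(22) | 11.0 | 0.0 | | ... | | | | | | | | | | |
| 12399 | 2024-4 | 9178X | 10.4(27) | 65.0 | | 4.5 | 27.2(13) | 11.8 | 0.0 | | ... | | | | | | | | | | |
| 12400 | 2024-5 | 9178X | 22.8(18) | 67.0 | | 7.5 | 26.4(28) | 12.7 | 0.0 | | ... | | | | | | | | | | |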


```python
# Save dataset
df_euskadi_month_year.to_csv('../data/raw/BC_month_year_climate_raw.csv',
                             index=False)
```
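Because each station was downloaded in separate year windows, a quick sanity check before relying on the saved file is that every (`idema`, `fecha`) pair appears only once. A sketch with `DataFrame.duplicated`, shown on a tiny illustrative frame rather than the real dataset; on the notebook data the equivalent check is `df_euskadi_month_year.duplicated(subset=['idema', 'fecha']).sum()`, which should be 0:

```python
import pandas as pd

# Illustrative frame with one repeated (idema, fecha) pair
df = pd.DataFrame({
    'idema': ['1012P', '1012P', '1012P', '9178X'],
    'fecha': ['2009-10', '2009-11', '2009-10', '2024-3'],
})

# keep=False marks every row of a duplicated pair, not just the later copies
dupes = df[df.duplicated(subset=['idema', 'fecha'], keep=False)]
print(len(dupes))  # 2
```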
