# ARPA Lombardia Ground Sensors - Air quality and meteorological preprocessing

- Ground sensor types and positions are retrieved from the API: [Air quality stations](https://www.dati.lombardia.it/Ambiente/Stazioni-qualit-dell-aria/ib47-atvt) and [Meteorological stations](https://www.dati.lombardia.it/Ambiente/Stazioni-Meteorologiche/nf78-nj6b)
- The API provides data for the current year only (from January 2022): [API Air quality data](https://www.dati.lombardia.it/Ambiente/Dati-sensori-aria/nicp-bhqi) and [API Meteorological data](https://www.dati.lombardia.it/Ambiente/Dati-sensori-meteo/647i-nhxk)
- To use data from previous years, search for the corresponding dataset, such as [Air quality data for 2020](https://www.dati.lombardia.it/Ambiente/Dati-sensori-aria-2020/88sp-5tmj) or [Meteorological data for 2020](https://www.dati.lombardia.it/Ambiente/Dati-sensori-meteo-2020/erjn-istm), and download it as a ".csv" file.

In this notebook, sensor positions and types are always retrieved from the API, while time series are read from a .csv file or from the API depending on the year: data before 2022 are only available as .csv files, while 2022 data can be requested from the API.
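As a rough illustration of this rule, the choice of source can be written as a small helper. `pick_source` and `API_FIRST_YEAR` are hypothetical names that are not part of the notebook, and the 2022 cutoff simply mirrors the notes above; it would need updating once the API moves on to a new year.

```python
# Hypothetical constant: first year served by the API, per the notes above.
API_FIRST_YEAR = 2022


def pick_source(year, api_first_year=API_FIRST_YEAR):
    """Return "api" for years served by the API, "csv" for earlier years."""
    return "api" if year >= api_first_year else "csv"


source_2020 = pick_source(2020)  # the .csv sections apply
source_2022 = pick_source(2022)  # the API sections apply
```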

Notes:<br>
**Air pollution .csv data are still not available for 2021.** <br>
**Meteorological data for January 2022 are retrievable from the API, and the .csv for 2021 is available.**


An "app_token" is required to access the data through the API. <br>
Example video tutorial: https://www.youtube.com/watch?v=3p4gncGaSeg&t=899s&ab_channel=CharmingData <br>
Register on "Open Data Lombardia" to get a token: https://www.dati.lombardia.it/login
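One possible way to keep the token out of the notebook is to read it from an environment variable. This is only a sketch: the helper `read_app_token` and the variable name `ARPA_APP_TOKEN` are assumptions made here, not something defined by Open Data Lombardia or sodapy.

```python
import os


def read_app_token(env_var="ARPA_APP_TOKEN"):
    """Return the token stored in the environment, or None if it is not set.

    The variable name is a hypothetical choice for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    return token or None


app_token = read_app_token()
```

The result could then be passed as `Socrata("www.dati.lombardia.it", app_token=app_token)` in the cells that create the client.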

## Import libraries

In [1]:
from sodapy import Socrata
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import os

In [2]:
cwd = os.getcwd()

In [3]:
start_date = "2020-03-01"
end_date = "2020-03-20" 

In [4]:
start_date_api = "'2022-02-01'"
end_date_api = "'2022-02-06'"

---

# Import stations and sensor type from ARPA API

Import sensor descriptions and positions from the API.

In [5]:
arpa_domain = "www.dati.lombardia.it"
st_descr = "ib47-atvt"

In [6]:
client = Socrata(arpa_domain, app_token = "riTLzYVRVdDaQtUkxDDaHRgJi")

In [7]:
results = client.get_all(st_descr)

In [8]:
air_st_descr = pd.DataFrame(results)

In [9]:
air_st_descr["idsensore"] = air_st_descr["idsensore"].astype(str).astype(int)

- - - 

<a id='aq_data_api'></a>
# Import air quality data from ARPA API

Skip to [air quality data import from .csv](#aq_data_csv) if required.

In [None]:
arpa_domain = "www.dati.lombardia.it"
dati = "nicp-bhqi" #change this depending on the dataset (check Open Data Lombardia datasets)

In [None]:
client = Socrata(arpa_domain, app_token = "riTLzYVRVdDaQtUkxDDaHRgJi")

In [None]:
date_query = "data > {} and data < {}".format(start_date_api,end_date_api)
date_query
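The `where` filter is a plain string in which the dates appear as single-quoted literals, which is why `start_date_api` and `end_date_api` carry their own quotes. A minimal sketch of the same clause built from unquoted dates follows; `build_date_query` is a hypothetical helper, not part of the notebook or of sodapy.

```python
def build_date_query(start, end, column="data"):
    """Build a filter keeping rows strictly between two dates.

    The dates are wrapped in single quotes so that they are sent
    as string literals, as in the cell above.
    """
    return "{col} > '{start}' and {col} < '{end}'".format(
        col=column, start=start, end=end
    )


example_query = build_date_query("2022-02-01", "2022-02-06")
print(example_query)
```

The printed string is the same one produced by the cell above with the default dates.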

In [None]:
results = client.get(dati, where=date_query, limit=5000000000)

In [None]:
aq_data = pd.DataFrame(results)

In [None]:
aq_data['data'] =  pd.to_datetime(aq_data['data'], format='%Y/%m/%d %H:%M:%S')

In [None]:
aq_data = aq_data.astype({"idsensore": int,"valore": float})

- - -

<a id='aq_data_csv'></a>
# Import air quality data from .csv

Go back to [air quality data import from API](#aq_data_api) if required.

Example using air quality data 2020 from ARPA stations: https://www.dati.lombardia.it/Ambiente/Dati-sensori-aria-2020/88sp-5tmj

In [10]:
aq_data = pd.read_csv(cwd+"/ground_sensor/aq_2020.csv")

Rename columns:

In [11]:
aq_data.rename(columns={'IdSensore': 'idsensore','Data': 'data','idOperatore': 'idoperatore','Stato': 'stato','Valore': 'valore'}, inplace=True)

Set date format:

In [12]:
aq_data['data'] =  pd.to_datetime(aq_data['data'], format='%d/%m/%Y %H:%M:%S')

Select date range:

In [13]:
mask = (aq_data.data >= start_date) & (aq_data.data <= end_date)
aq_data = aq_data.loc[mask]
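One detail of this filter may be worth checking: a bare date such as "2020-03-20" is interpreted as midnight, so `<= end_date` keeps only the first instant of the last day. The sketch below, on made-up readings, contrasts that with a filter that keeps the whole end day; `toy`, `end_exclusive` and the values are illustrative assumptions.

```python
import pandas as pd

# Toy readings around the end of the range (values are made up).
toy = pd.DataFrame({
    "data": pd.to_datetime([
        "2020-03-19 23:00:00",
        "2020-03-20 00:00:00",
        "2020-03-20 12:00:00",
        "2020-03-21 00:00:00",
    ]),
    "valore": [1.0, 2.0, 3.0, 4.0],
})

end_date = "2020-03-20"

# The date is parsed as 2020-03-20 00:00:00, so the noon reading is dropped.
kept_until_midnight = toy[toy["data"] <= pd.Timestamp(end_date)]

# Keep the whole end day by comparing against the start of the next day.
end_exclusive = pd.Timestamp(end_date) + pd.Timedelta(days=1)
kept_full_day = toy[toy["data"] < end_exclusive]
```

Which of the two behaviours is wanted depends on how the time range is meant to be read.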

- - -

# Air quality data processing

Drop the "stato" and "idoperatore" columns and keep only valid values (invalid readings are coded as -9999):

In [14]:
aq_data = aq_data.drop(columns=['stato', 'idoperatore'])

In [15]:
aq_data = aq_data[aq_data.valore.astype(float) != -9999]

This part calculates the mean value for each sensor in the time range provided:

In [16]:
aq_means = aq_data.groupby(['idsensore'],as_index=False).mean()
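The effect of removing -9999 before averaging can be seen on a toy table. The readings below are made up, and selecting `valore` explicitly before `.mean()` is a choice made for this sketch rather than what the cell above does.

```python
import pandas as pd

# Made-up readings: sensor 1 has one invalid value coded as -9999.
readings = pd.DataFrame({
    "idsensore": [1, 1, 1, 2, 2],
    "valore": [10.0, -9999.0, 20.0, 5.0, 7.0],
})

# Keep valid readings only, then average per sensor.
valid = readings[readings["valore"] != -9999]
toy_means = valid.groupby("idsensore", as_index=False)["valore"].mean()

# Without the filter, the invalid value would dominate the mean of sensor 1.
unfiltered = readings.groupby("idsensore", as_index=False)["valore"].mean()
```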

Join sensors description and information with the mean value:

In [17]:
aq_table = pd.merge(aq_means, air_st_descr, on='idsensore')
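The merge relies on `idsensore` having the same type in both tables, which is presumably why the station ids were cast to `int` earlier. The sketch below uses made-up tables and assumes the API returns ids as strings; it also uses `how="left"` with `indicator=True`, an optional check that is not part of the notebook, to list sensors that have no description.

```python
import pandas as pd

# Made-up mean values per sensor (integer ids).
toy_means = pd.DataFrame({"idsensore": [1, 2, 3], "valore": [15.0, 6.0, 9.0]})

# Made-up station descriptions with ids stored as strings (an assumption).
toy_descr = pd.DataFrame({
    "idsensore": ["1", "2"],
    "nometiposensore": ["Ozono", "Ammoniaca"],
})
toy_descr["idsensore"] = toy_descr["idsensore"].astype(int)

# The default inner join silently drops sensor 3.
inner = pd.merge(toy_means, toy_descr, on="idsensore")

# A left join with an indicator column shows which sensors were unmatched.
checked = pd.merge(toy_means, toy_descr, on="idsensore", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "idsensore"].tolist()
```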

Get the unique sensor type names:

In [18]:
air_st_descr.nometiposensore.unique()

array(['Ossidi di Azoto', 'Monossido di Carbonio', 'Biossido di Azoto',
       'Biossido di Zolfo', 'Particelle sospese PM2.5', 'Benzene',
       'Ozono', 'PM10 (SM2005)', 'Particolato Totale Sospeso',
       'Ammoniaca', 'Nikel', 'Arsenico', 'Cadmio', 'Piombo',
       'Benzo(a)pirene', 'BlackCarbon', 'Monossido di Azoto', 'PM10'],
      dtype=object)

Select the sensor types of interest by adding their names to the list:

In [19]:
sensor_sel = ['Ossidi di Azoto', 'Monossido di Carbonio', 'Biossido di Azoto','Ozono',
       'Biossido di Zolfo', 'Particelle sospese PM2.5','Ammoniaca','PM10 (SM2005)']

In [20]:
aq_table['nometiposensore'] = aq_table['nometiposensore'].astype(str)  # assign the result, otherwise the cast has no effect
aq_table = aq_table[aq_table['nometiposensore'].isin(sensor_sel)]

Save sensors separately and create a .gpkg file for each one:

In [21]:
# nox = aq_table.loc[aq_table['nometiposensore'] == 'Ossidi di Azoto']
pm25 = aq_table.loc[aq_table['nometiposensore'] == 'Particelle sospese PM2.5']
co = aq_table.loc[aq_table['nometiposensore'] == 'Monossido di Carbonio']
no2 = aq_table.loc[aq_table['nometiposensore'] == 'Biossido di Azoto']
so2 = aq_table.loc[aq_table['nometiposensore'] == 'Biossido di Zolfo']
nh3 = aq_table.loc[aq_table['nometiposensore'] == 'Ammoniaca']
nox = aq_table.loc[aq_table['nometiposensore'] == 'Ossidi di Azoto']
pm10 = aq_table.loc[aq_table['nometiposensore'] == 'PM10 (SM2005)']
ozono = aq_table.loc[aq_table['nometiposensore'] == 'Ozono']
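The same split can also be expressed with a single `groupby`, which yields one sub-table per sensor type. This is an alternative sketch on a made-up table, not the approach used in the notebook; `toy_table` and `by_type` are hypothetical names.

```python
import pandas as pd

# Made-up table with the same key column as aq_table.
toy_table = pd.DataFrame({
    "idsensore": [1, 2, 3],
    "nometiposensore": ["Ozono", "Ammoniaca", "Ozono"],
    "valore": [40.0, 12.0, 55.0],
})

# One DataFrame per sensor type, keyed by the type name.
by_type = {name: group for name, group in toy_table.groupby("nometiposensore")}

ozono_rows = by_type["Ozono"]
```

A dictionary like this can then be looped over instead of keeping one variable per pollutant.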

In [22]:
pm25_gdf = gpd.GeoDataFrame(pm25, geometry=gpd.points_from_xy(pm25.lng, pm25.lat))
pm25_gdf = pm25_gdf.set_crs('epsg:4326')

In [23]:
co_gdf = gpd.GeoDataFrame(co, geometry=gpd.points_from_xy(co.lng, co.lat))
co_gdf = co_gdf.set_crs('epsg:4326')

In [24]:
no2_gdf = gpd.GeoDataFrame(no2, geometry=gpd.points_from_xy(no2.lng, no2.lat))
no2_gdf = no2_gdf.set_crs('epsg:4326')

In [25]:
so2_gdf = gpd.GeoDataFrame(so2, geometry=gpd.points_from_xy(so2.lng, so2.lat))
so2_gdf = so2_gdf.set_crs('epsg:4326')

In [26]:
nh3_gdf = gpd.GeoDataFrame(nh3, geometry=gpd.points_from_xy(nh3.lng, nh3.lat))
nh3_gdf = nh3_gdf.set_crs('epsg:4326')

In [27]:
nox_gdf = gpd.GeoDataFrame(nox, geometry=gpd.points_from_xy(nox.lng, nox.lat))
nox_gdf = nox_gdf.set_crs('epsg:4326')

In [28]:
pm10_gdf = gpd.GeoDataFrame(pm10, geometry=gpd.points_from_xy(pm10.lng, pm10.lat))
pm10_gdf = pm10_gdf.set_crs('epsg:4326')

In [29]:
ozono_gdf = gpd.GeoDataFrame(ozono, geometry=gpd.points_from_xy(ozono.lng, ozono.lat))
ozono_gdf = ozono_gdf.set_crs('epsg:4326')

In [30]:
pm25_gdf.to_file(cwd+"/temp/pm25_st.gpkg", driver="GPKG")
co_gdf.to_file(cwd+"/temp/co_st.gpkg", driver="GPKG")
no2_gdf.to_file(cwd+"/temp/no2_st.gpkg", driver="GPKG")
so2_gdf.to_file(cwd+"/temp/so2_st.gpkg", driver="GPKG")
nh3_gdf.to_file(cwd+"/temp/amm_st.gpkg", driver="GPKG")
nox_gdf.to_file(cwd+"/temp/nox_st.gpkg", driver="GPKG")
pm10_gdf.to_file(cwd+"/temp/pm10_st.gpkg", driver="GPKG")
ozono_gdf.to_file(cwd+"/temp/ozono_st.gpkg", driver="GPKG")
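All output files follow the same `<cwd>/temp/<name>_st.gpkg` pattern, so the path could be built by a small helper. `station_file` is a hypothetical function written for this sketch; the optional folder creation assumes that the "temp" folder may not exist yet.

```python
from pathlib import Path


def station_file(base_dir, layer_name, create_dir=False):
    """Build '<base_dir>/temp/<layer_name>_st.gpkg', the pattern used above."""
    path = Path(base_dir) / "temp" / "{}_st.gpkg".format(layer_name)
    if create_dir:
        # Create the 'temp' folder if it is missing.
        path.parent.mkdir(parents=True, exist_ok=True)
    return path


example_path = station_file("project", "pm25")
```

For instance, `pm25_gdf.to_file(str(station_file(cwd, "pm25", create_dir=True)), driver="GPKG")` would write to the same location as the first call above.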


- - -

# Import meteorological stations from ARPA API

Import sensor descriptions and positions from the [Meteorological stations](https://www.dati.lombardia.it/Ambiente/Stazioni-Meteorologiche/nf78-nj6b) dataset.

In [None]:
arpa_domain = "www.dati.lombardia.it"
m_st_descr = "nf78-nj6b"

In [None]:
client = Socrata(arpa_domain, app_token = "riTLzYVRVdDaQtUkxDDaHRgJi")

In [None]:
results = client.get_all(m_st_descr)

In [None]:
meteo_st_descr = pd.DataFrame(results)

In [None]:
meteo_st_descr["idsensore"] = meteo_st_descr["idsensore"].astype(str).astype(int)

In [None]:
meteo_st_descr

- - -

<a id='meteo_data_api'></a>
# Import meteorological data from API

Skip to [meteorological data import from .csv](#meteo_data_csv) if required.

In [None]:
arpa_domain = "www.dati.lombardia.it"
dati = "647i-nhxk" #change this depending on the dataset (check Open Data Lombardia datasets)

In [None]:
client = Socrata(arpa_domain, app_token = "riTLzYVRVdDaQtUkxDDaHRgJi")

The date range of the request is set by `start_date_api` and `end_date_api`, defined at the top of the notebook; change them there if needed:

In [None]:
date_query = "data > {} and data < {}".format(start_date_api,end_date_api)
date_query

In [None]:
results = client.get(dati, where=date_query, limit=5000000000)

In [None]:
meteo_data = pd.DataFrame(results)
meteo_data

In [None]:
meteo_data['data'] =  pd.to_datetime(meteo_data['data'], format='%Y/%m/%d %H:%M:%S')

In [None]:
meteo_data = meteo_data.astype({"idsensore": int,"valore": float})

<a id='meteo_data_csv'></a>
# Import meteorological data from ARPA .csv file

Go back to [meteorological data import from API](#meteo_data_api) if required.

In [None]:
meteo_data = pd.read_csv(cwd+"/ground_sensor/meteo_2020.csv", dtype={"IdSensore": int,"Valore": float, "Stato": str, "idOperatore":str})

Rename columns:

In [None]:
meteo_data.rename(columns={'IdSensore': 'idsensore','Data': 'data','idOperatore': 'idoperatore','Stato': 'stato','Valore': 'valore'}, inplace=True)
meteo_data

Set date format:

In [None]:
meteo_data['data'] =  pd.to_datetime(meteo_data['data'], format='%d/%m/%Y %H:%M:%S')
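The import, rename and date-parsing steps can be tried on a tiny in-memory file before loading the full yearly .csv. The rows, the column order and the `Stato` codes below are made up for illustration; only the column names and the day-first date format come from the cells above.

```python
import io

import pandas as pd

# Made-up sample in the day/month/year layout parsed above.
csv_text = (
    "IdSensore,Data,Valore,Stato,idOperatore\n"
    "100,01/03/2020 00:00:00,5.2,VA,1\n"
    "100,02/03/2020 00:00:00,-9999,NA,1\n"
)

sample = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"IdSensore": int, "Valore": float, "Stato": str, "idOperatore": str},
)
sample = sample.rename(columns={
    "IdSensore": "idsensore", "Data": "data", "idOperatore": "idoperatore",
    "Stato": "stato", "Valore": "valore",
})

# With an explicit format, "02/03/2020" is read as 2 March, not 3 February.
sample["data"] = pd.to_datetime(sample["data"], format="%d/%m/%Y %H:%M:%S")
```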

Filter date range:

In [None]:
mask = (meteo_data.data >= start_date) & (meteo_data.data <= end_date)
meteo_data = meteo_data.loc[mask]

- - -

# Meteorological data processing 

Drop the "stato" and "idoperatore" columns and keep only valid values (invalid readings are coded as -9999):

In [None]:
meteo_data = meteo_data.drop(columns=['stato', 'idoperatore'])

In [None]:
meteo_data = meteo_data[meteo_data.valore != -9999]

Calculate mean value for each sensor in the time range:

In [None]:
meteo_means = meteo_data.groupby(['idsensore'],as_index=False).mean()
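A plain mean is not always meaningful for wind direction ('Direzione Vento'), because angles wrap around at 360. As a hedged aside, assuming directions are reported in degrees, a circular mean could be computed as in the sketch below; `circular_mean_deg` is a hypothetical helper and is not used by the notebook.

```python
import math


def circular_mean_deg(angles_deg):
    """Mean direction in degrees, in the range 0-360, using unit vectors."""
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0


# Two directions on either side of north: the arithmetic mean points south.
arithmetic = (350 + 10) / 2
circular = circular_mean_deg([350, 10])
```

For these two values the arithmetic mean is 180, while the circular mean falls at north, up to floating-point rounding.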

Join sensors description and information:

In [None]:
meteo_table = pd.merge(meteo_means, meteo_st_descr, on = 'idsensore')

Get the unique sensor types:

In [None]:
meteo_st_descr.tipologia.unique()

Select the sensor types of interest by adding their names to the following list:

In [None]:
m_sensor_sel = ['Precipitazione','Temperatura','Umidità Relativa','Direzione Vento','Velocità Vento', 'Radiazione Globale']

In [None]:
meteo_table['tipologia'] = meteo_table['tipologia'].astype(str)  # assign the result, otherwise the cast has no effect
meteo_table = meteo_table[meteo_table['tipologia'].isin(m_sensor_sel)]

Save sensors separately and create a .gpkg file for each one:

In [None]:
temp_st = meteo_table.loc[meteo_table['tipologia'] == 'Temperatura']
prec_st = meteo_table.loc[meteo_table['tipologia'] == 'Precipitazione']
air_hum_st = meteo_table.loc[meteo_table['tipologia'] == 'Umidità Relativa']
wind_dir_st = meteo_table.loc[meteo_table['tipologia'] == 'Direzione Vento']
wind_speed_st = meteo_table.loc[meteo_table['tipologia'] == 'Velocità Vento']
rad_glob_st = meteo_table.loc[meteo_table['tipologia'] == 'Radiazione Globale']

In [None]:
temp_gdf = gpd.GeoDataFrame(temp_st, geometry=gpd.points_from_xy(temp_st.lng, temp_st.lat))
temp_gdf = temp_gdf.set_crs('epsg:4326')

In [None]:
prec_gdf = gpd.GeoDataFrame(prec_st, geometry=gpd.points_from_xy(prec_st.lng, prec_st.lat))
prec_gdf = prec_gdf.set_crs('epsg:4326')

In [None]:
air_hum_gdf = gpd.GeoDataFrame(air_hum_st, geometry=gpd.points_from_xy(air_hum_st.lng, air_hum_st.lat))
air_hum_gdf = air_hum_gdf.set_crs('epsg:4326')

In [None]:
wind_dir_gdf = gpd.GeoDataFrame(wind_dir_st, geometry=gpd.points_from_xy(wind_dir_st.lng, wind_dir_st.lat))
wind_dir_gdf = wind_dir_gdf.set_crs('epsg:4326')

In [None]:
wind_speed_gdf = gpd.GeoDataFrame(wind_speed_st, geometry=gpd.points_from_xy(wind_speed_st.lng, wind_speed_st.lat))
wind_speed_gdf = wind_speed_gdf.set_crs('epsg:4326')

In [None]:
rad_glob_gdf = gpd.GeoDataFrame(rad_glob_st, geometry=gpd.points_from_xy(rad_glob_st.lng, rad_glob_st.lat))
rad_glob_gdf = rad_glob_gdf.set_crs('epsg:4326')

In [None]:
temp_gdf.to_file(cwd+"/temp/temp_st.gpkg", driver="GPKG")
prec_gdf.to_file(cwd+"/temp/prec_st.gpkg", driver="GPKG")
air_hum_gdf.to_file(cwd+"/temp/air_hum_st.gpkg", driver="GPKG")
wind_dir_gdf.to_file(cwd+"/temp/wind_dir_st.gpkg", driver="GPKG")
wind_speed_gdf.to_file(cwd+"/temp/wind_speed_st.gpkg", driver="GPKG")
rad_glob_gdf.to_file(cwd+"/temp/rad_glob_st.gpkg", driver="GPKG")

---

In [None]:
#mask = (meteo_data['data'] >= start_date) & (meteo_data['data'] < end_date)
#meteo_data = meteo_data.loc[mask]
#meteo_data

In [None]:
# print(list(arpa_df.columns))
# print(arpa_df['idsensore'].values)

In [None]:
# results = client.get_all(dati, idsensore = "100", data='2022-01-20')
# results

In [None]:
# arpa_df.loc[arpa_df['idsensore'] == "10377"]