<a href="https://colab.research.google.com/github/schenzio/picchi_benzene/blob/main/Something_in_the_air_Mappatura_picchi_benzene.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Benzene peaks in Italy
## 📌 Goal
The project aims to map the Italian territories that recorded critical hourly benzene peaks between 2013 and 2023. This document shows how to extract those data from the portal of the European Environment Agency (EEA).

***

## 📖 Context
When do benzene concentrations in the air become a health hazard?

[EU Directive 2008/50/EC](https://eur-lex.europa.eu/eli/dir/2008/50/oj/ita/pdf) on air quality sets the limit for the annual mean concentration of benzene (C6H6) in air at 5 µg/m³ (micrograms per cubic metre), but it sets no limit for hourly concentrations.

In the literature, a threshold of 27 µg/m³ is considered significant for adverse health effects. This value is used as a reference in [California regulation](https://oehha.ca.gov/chemicals/benzene).

As a result, the population of an area whose annual benzene mean is considered acceptable (< 5 µg/m³) could still suffer adverse health effects if the hourly data behind that mean contain critical benzene peaks (> 27 µg/m³).

The project therefore sets out to identify where in Italy hourly benzene peaks above the critical threshold have been recorded.
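
To make this argument concrete, the sketch below builds a purely hypothetical year of hourly readings. The numbers are invented for illustration and do not come from any station. It shows that an annual mean well under 5 µg/m³ can coexist with hours above 27 µg/m³.

```python
# Hypothetical year of hourly benzene readings (µg/m³).
# All concentrations below are invented for illustration, not EEA data.
HOURS_PER_YEAR = 365 * 24  # 8760 hourly readings

ANNUAL_LIMIT = 5.0        # EU limit on the annual mean (µg/m³)
HOURLY_THRESHOLD = 27.0   # critical hourly threshold used in this project (µg/m³)

peak_hours = 60           # assumed number of hours with a peak
peak_value = 40.0         # assumed concentration during a peak
background_value = 1.0    # assumed concentration in every other hour

readings = [peak_value] * peak_hours + [background_value] * (HOURS_PER_YEAR - peak_hours)

annual_mean = sum(readings) / len(readings)
hours_above = sum(1 for value in readings if value > HOURLY_THRESHOLD)

print(f"annual mean: {annual_mean:.2f} µg/m³ (limit: {ANNUAL_LIMIT} µg/m³)")
print(f"hours above {HOURLY_THRESHOLD} µg/m³: {hours_above}")
```

With these assumed values the annual mean stays far below the limit even though 60 hours exceed the critical threshold, which is exactly the situation an annual average hides.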

### 📊 Taranto, a case study of interest
This research is inspired by the [analysis of hourly benzene peaks in Taranto](https://www.isdenews.it/i-picchi-di-benzene-a-taranto/), presented to the Italian Senate by the association Peacelink in February 2024.

The data presented by Peacelink were collected with the [Omniscope software](https://www.peacelink.it/ariataranto), which analysed the hourly benzene means recorded between 2013 and 2023 at the ARPA monitoring station in via Machiavelli, in Taranto's Tamburi district.

***

## 🔍 Sources
All hourly benzene measurements in Europe can be downloaded from the portal of the **EEA (European Environment Agency)**.

The analysis below works with two files:

*   **Hourly benzene measurements per monitoring station**: *ParquetFilesUrls_IT_C6H6_dati_orari.csv*

  Contains a list of URLs of *.parquet* files. Each *.parquet* file is named after an air quality monitoring station and contains the history of hourly benzene measurements at that station.

  Downloadable from the [EEA portal](https://eeadmz1-downloads-webapp.azurewebsites.net/) by applying the following filters (or by using the EEA API):
   - Countries = IT
   - Pollutants = C6H6
   - Dataset = Primary validated data (E1a) *(verified data from 2013 to 2023)*
   - Type = Hourly data
   - List of URLs = True

*   **Benzene monitoring stations in Italy**: *IT_C6H6_stazioni_di_monitoraggio.csv*

  Contains information on all benzene monitoring stations in Italy.

  Downloadable from the [EEA portal](https://discomap.eea.europa.eu/App/AQViewer/index.html?fqn=Airquality_Dissem.b2g.measurements) by applying the following filters:
   - Country = Italy
   - Air pollutant = C6H6

For details on the file contents, see the [**EEA portal guide and metadata**](https://eeadmz1-downloads-webapp.azurewebsites.net/content/documentation/How_To_Downloads.pdf).

***

## ⚙️ Process

0. Environment setup
1. Extracting benzene peaks
2. Comparison with the benzene monitoring stations

### 0. Environment setup
The two source .csv files were downloaded from the EEA links above with the parameters listed, and are now published on **[GitHub](https://github.com/schenzio/picchi_benzene)**.

I download the EEA files from the GitHub repository (raw links)

In [None]:
!wget -c "https://raw.githubusercontent.com/schenzio/picchi_benzene/refs/heads/main/ParquetFilesUrls_IT_C6H6_dati_orari.csv" -O "ParquetFilesUrls_IT_C6H6_dati_orari.csv"
!wget -c "https://raw.githubusercontent.com/schenzio/picchi_benzene/refs/heads/main/IT_C6H6_stazioni_di_monitoraggio.csv" -O "IT_C6H6_stazioni_di_monitoraggio.csv"

--2025-02-08 11:03:29--  https://raw.githubusercontent.com/schenzio/picchi_benzene/refs/heads/main/ParquetFilesUrls_IT_C6H6_dati_orari.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 37037 (36K) [text/plain]
Saving to: ‘ParquetFilesUrls_IT_C6H6_dati_orari.csv’


2025-02-08 11:03:29 (4.56 MB/s) - ‘ParquetFilesUrls_IT_C6H6_dati_orari.csv’ saved [37037/37037]

--2025-02-08 11:03:29--  https://raw.githubusercontent.com/schenzio/picchi_benzene/refs/heads/main/IT_C6H6_stazioni_di_monitoraggio.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200

I install and import the libraries for data analysis

In [None]:
!pip install pandas
!pip install duckdb
!pip install geopy



In [None]:
import pandas as pd
import duckdb
from geopy.geocoders import Nominatim

### 1. Extracting benzene peaks

The benzene peak dataset is built by extracting from each .parquet file the hourly benzene measurements that reach or exceed 27 µg/m³.

I create a dataframe from the CSV with the URLs of the .parquet files

In [None]:
file_path_parquet_urls = "/content/ParquetFilesUrls_IT_C6H6_dati_orari.csv"
parquet_urls = pd.read_csv(file_path_parquet_urls)
parquet_urls.head(3)

Unnamed: 0,ParquetFileUrl
0,https://eeadmz1batchservice02.blob.core.window...
1,https://eeadmz1batchservice02.blob.core.window...
2,https://eeadmz1batchservice02.blob.core.window...


In [None]:
print(f"The file contains {parquet_urls.shape[0]} parquet file URLs, each holding the hourly benzene measurements of one monitoring station")

The file contains 301 parquet file URLs, each holding the hourly benzene measurements of one monitoring station


  Each URL contains the name of its monitoring station: it is the final part of the URL, after the country code (IT).
  
  Below is an example using the first URL in the list

In [None]:
first_url = parquet_urls.iloc[0, 0]
print("Example URL:", first_url)

substring_to_find = "/IT/"
index = first_url.find(substring_to_find)

if index != -1:
  id_stazione = first_url[index + len(substring_to_find):]
  print("\nAssociated monitoring station ID:", id_stazione)
else:
  print("substring not found")

Example URL: https://eeadmz1batchservice02.blob.core.windows.net/airquality-p-e1a/IT/SPO.IT0063A_20_GC-FID_2015-04-01_00_00_00.parquet

Associated monitoring station ID: SPO.IT0063A_20_GC-FID_2015-04-01_00_00_00.parquet
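
The file name can also be turned back into the `Samplingpoint` value stored inside the parquet files, which makes it possible to link a URL to its records. The sketch below is one way to do that. The function name `samplingpoint_from_url` is hypothetical, and the pattern it relies on (a trailing `_HH_MM_SS` time stamp in which the colons of the id were replaced by underscores) is an assumption drawn from the single example URL in this notebook, so it is worth checking against the full list.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse


def samplingpoint_from_url(url: str) -> str:
    """Rebuild the Samplingpoint id from a parquet file URL.

    Assumption: the file name ends with `_HH_MM_SS`, i.e. the colons of the
    Samplingpoint time stamp were replaced by underscores.
    """
    path = PurePosixPath(urlparse(url).path)
    country = path.parent.name   # folder that holds the file, e.g. "IT"
    stem = path.stem             # file name without the ".parquet" suffix
    # Turn the trailing "_00_00_00" back into "_00:00:00".
    stem = re.sub(r"_(\d{2})_(\d{2})_(\d{2})$", r"_\1:\2:\3", stem)
    return f"{country}/{stem}"


example_url = (
    "https://eeadmz1batchservice02.blob.core.windows.net/airquality-p-e1a/IT/"
    "SPO.IT0063A_20_GC-FID_2015-04-01_00_00_00.parquet"
)
print(samplingpoint_from_url(example_url))
```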


What do the monitoring stations' parquet files look like?

*All metadata are documented [here (p. 16)](https://eeadmz1-downloads-webapp.azurewebsites.net/content/documentation/How_To_Downloads.pdf)*

Example from the first URL:

In [None]:
df_fist_url=duckdb.query(f"SELECT * FROM '{first_url}'").df()
df_fist_url.sample(3)

Unnamed: 0,Samplingpoint,Pollutant,Start,End,Value,Unit,AggType,Validity,Verification,ResultTime,DataCapture,FkObservationLog
21298,IT/SPO.IT0063A_20_GC-FID_2015-04-01_00:00:00,20,2017-09-04 10:00:00,2017-09-04 11:00:00,0.0,ug.m-3,hour,1,1,2018-07-11 16:48:09,,3f0156a8-5afd-415b-9a01-196bd3ed12ad
19304,IT/SPO.IT0063A_20_GC-FID_2015-04-01_00:00:00,20,2017-06-13 08:00:00,2017-06-13 09:00:00,0.0,ug.m-3,hour,1,1,2018-07-11 16:48:09,,3f0156a8-5afd-415b-9a01-196bd3ed12ad
22663,IT/SPO.IT0063A_20_GC-FID_2015-04-01_00:00:00,20,2017-10-31 07:00:00,2017-10-31 08:00:00,2.0,ug.m-3,hour,1,1,2018-07-11 16:48:09,,3f0156a8-5afd-415b-9a01-196bd3ed12ad


I create a dataframe *picchi_benzene* and fill it, through a loop, with all the records from the .parquet files in which the benzene concentration reaches or exceeds 27 µg/m³

In [None]:
picchi_benzene = pd.DataFrame()

# For each row of the parquet_urls dataframe:
for index, row in parquet_urls.iterrows():
    url = row['ParquetFileUrl']  # extract the URL value from the row
    # Query that reads the parquet file as a dataframe > method explained by Borruso :) > https://aborruso.github.io/posts/leggere-interrogare-file-parquet/
    query = f"SELECT * FROM '{url}'"
    df_url = duckdb.query(query).df()  # content of the .parquet file as a dataframe
    if index == 0:
        # First iteration: the critical-peak rows of df_url replace picchi_benzene, which takes on their column structure
        picchi_benzene = df_url[df_url['Value'] >= 27]
    else:
        # Following URLs: the critical-peak rows of df_url are merged into picchi_benzene (outer merge; all .parquet files share the same columns)
        picchi_benzene = picchi_benzene.merge(df_url[df_url['Value'] >= 27], how="outer")

picchi_benzene.head(10)

Unnamed: 0,Samplingpoint,Pollutant,Start,End,Value,Unit,AggType,Validity,Verification,ResultTime,DataCapture,FkObservationLog
0,IT/SPO.IT0460A_20_GC-FID_2004-09-02_00:00:00,20,2014-03-27 16:00:00,2014-03-27 17:00:00,31.9,ug.m-3,hour,1,1,2015-10-23 15:33:02,,c6a11703-ce82-4cc6-9d40-a60e079e5362
1,IT/SPO.IT0461A_20_GC-FID_2003-08-03_00:00:00,20,2015-07-17 04:00:00,2015-07-17 05:00:00,28.6,ug.m-3,hour,1,1,2016-07-18 14:49:33,,9770a205-069e-4dd3-86f5-6fb0df84e57f
2,IT/SPO.IT0461A_20_GC-FID_2003-08-03_00:00:00,20,2015-07-17 08:00:00,2015-07-17 09:00:00,27.6,ug.m-3,hour,1,1,2016-07-18 14:49:33,,9770a205-069e-4dd3-86f5-6fb0df84e57f
3,IT/SPO.IT0461A_20_GC-FID_2003-08-03_00:00:00,20,2018-04-12 18:00:00,2018-04-12 19:00:00,69.5,ug.m-3,hour,1,1,2019-10-21 10:11:53,,e0206885-9125-41b3-a5d6-743ea84c7159
4,IT/SPO.IT0461A_20_GC-FID_2003-08-03_00:00:00,20,2018-04-14 09:00:00,2018-04-14 10:00:00,79.2,ug.m-3,hour,1,1,2019-10-21 10:11:53,,e0206885-9125-41b3-a5d6-743ea84c7159
5,IT/SPO.IT0461A_20_GC-FID_2003-08-03_00:00:00,20,2018-04-14 10:00:00,2018-04-14 11:00:00,58.3,ug.m-3,hour,1,1,2019-10-21 10:11:53,,e0206885-9125-41b3-a5d6-743ea84c7159
6,IT/SPO.IT0461A_20_GC-FID_2003-08-03_00:00:00,20,2018-04-14 11:00:00,2018-04-14 12:00:00,67.6,ug.m-3,hour,1,1,2019-10-21 10:11:53,,e0206885-9125-41b3-a5d6-743ea84c7159
7,IT/SPO.IT0461A_20_GC-FID_2003-08-03_00:00:00,20,2018-04-14 12:00:00,2018-04-14 13:00:00,36.0,ug.m-3,hour,1,1,2019-10-21 10:11:53,,e0206885-9125-41b3-a5d6-743ea84c7159
8,IT/SPO.IT0461A_20_GC-FID_2003-08-03_00:00:00,20,2018-04-14 14:00:00,2018-04-14 15:00:00,76.8,ug.m-3,hour,1,1,2019-10-21 10:11:53,,e0206885-9125-41b3-a5d6-743ea84c7159
9,IT/SPO.IT0461A_20_GC-FID_2003-08-03_00:00:00,20,2018-04-14 15:00:00,2018-04-14 16:00:00,75.7,ug.m-3,hour,1,1,2019-10-21 10:11:53,,e0206885-9125-41b3-a5d6-743ea84c7159
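
As a possible alternative to the loop above, the threshold can be applied inside the DuckDB query and the partial results concatenated once at the end. This is only a sketch under stated assumptions, not the method used to produce the outputs in this notebook. The names `read_peaks_with_duckdb` and `collect_peaks` are hypothetical, and the reader is passed as a parameter so the concatenation logic can be tried without network access.

```python
import pandas as pd

THRESHOLD = 27  # µg/m³, same cut-off as the loop above


def read_peaks_with_duckdb(url: str, threshold: float = THRESHOLD) -> pd.DataFrame:
    """Let DuckDB apply the threshold while reading, so only peak rows come back."""
    import duckdb  # imported here so collect_peaks can be tried without DuckDB

    query = f"SELECT * FROM '{url}' WHERE \"Value\" >= {threshold}"
    return duckdb.query(query).df()


def collect_peaks(urls, read_peaks=read_peaks_with_duckdb) -> pd.DataFrame:
    """Read every file, keep the non-empty results and concatenate them once."""
    frames = []
    for url in urls:
        peaks = read_peaks(url)
        if not peaks.empty:
            frames.append(peaks)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

It could be called as `picchi_benzene = collect_peaks(parquet_urls["ParquetFileUrl"])`. Unlike the outer merge, `pd.concat` keeps rows that appear identically in two files, so `drop_duplicates()` may be worth adding if that case matters.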


In [None]:
picchi_benzene.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1889 entries, 0 to 1888
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   Samplingpoint     1889 non-null   object        
 1   Pollutant         1889 non-null   int32         
 2   Start             1889 non-null   datetime64[us]
 3   End               1889 non-null   datetime64[us]
 4   Value             1889 non-null   float64       
 5   Unit              1889 non-null   object        
 6   AggType           1889 non-null   object        
 7   Validity          1889 non-null   int32         
 8   Verification      1889 non-null   int32         
 9   ResultTime        1889 non-null   datetime64[us]
 10  DataCapture       0 non-null      float64       
 11  FkObservationLog  1889 non-null   object        
dtypes: datetime64[us](3), float64(2), int32(3), object(4)
memory usage: 155.1+ KB


In [None]:
picchi_benzene.shape

(1889, 12)

In [None]:
print(f"Critical benzene peaks (≥27 µg/m³) were recorded at {picchi_benzene['Samplingpoint'].nunique()} monitoring stations")

Critical benzene peaks (≥27 µg/m³) were recorded at 130 monitoring stations
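
The overall count can be broken down per sampling point, to see where peaks are most frequent and how high they get. The sketch below uses a tiny made-up frame with the same column names as *picchi_benzene*. The function name `peaks_per_samplingpoint` and the demo values are illustrative assumptions.

```python
import pandas as pd


def peaks_per_samplingpoint(peaks: pd.DataFrame) -> pd.DataFrame:
    """Count the peak hours and report the highest value for each sampling point."""
    return (
        peaks.groupby("Samplingpoint")["Value"]
        .agg(n_peaks="count", max_value="max")
        .sort_values("n_peaks", ascending=False)
        .reset_index()
    )


# Made-up rows, only to show the shape of the result.
demo = pd.DataFrame(
    {
        "Samplingpoint": ["IT/SPO.A", "IT/SPO.B", "IT/SPO.A", "IT/SPO.A"],
        "Value": [31.9, 28.6, 69.5, 27.6],
    }
)
print(peaks_per_samplingpoint(demo))
```

On the real data the call would be `peaks_per_samplingpoint(picchi_benzene)`.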


Sorting the concentration values in descending order (column *Value*) reveals off-scale values

In [None]:
picchi_benzene_sorted = picchi_benzene.sort_values(by='Value', ascending=False).reset_index(drop=True)
picchi_benzene_sorted.head(10)

Unnamed: 0,Samplingpoint,Pollutant,Start,End,Value,Unit,AggType,Validity,Verification,ResultTime,DataCapture,FkObservationLog
0,IT/SPO.IT1744A_20_GC-FID_2006-05-30_00:00:00,20,2022-09-21 17:00:00,2022-09-21 18:00:00,8237.76,ug.m-3,hour,-1,1,2023-09-05 13:24:27,,3fb31916-9192-4b82-8549-632c886fdb04
1,IT/SPO.IT2203A_20_GC-PID_2013-03-01_00:00:00,20,2022-04-12 08:00:00,2022-04-12 09:00:00,2387.06,ug.m-3,hour,-1,1,2023-09-05 13:24:27,,64df35d3-e9ff-4694-8edc-2a8f304d3ceb
2,IT/SPO.IT2203A_20_GC-PID_2013-03-01_00:00:00,20,2022-04-12 09:00:00,2022-04-12 10:00:00,1286.86,ug.m-3,hour,-1,1,2023-09-05 13:24:27,,64df35d3-e9ff-4694-8edc-2a8f304d3ceb
3,IT/SPO.IT1744A_20_GC-FID_2006-05-30_00:00:00,20,2022-09-21 18:00:00,2022-09-21 19:00:00,1275.89,ug.m-3,hour,-1,1,2023-09-05 13:24:27,,3fb31916-9192-4b82-8549-632c886fdb04
4,IT/SPO.IT2160A_20_GC-MS_2017-01-01_00:00:00,20,2018-02-01 19:00:00,2018-02-01 20:00:00,709.9,ug.m-3,hour,1,1,2019-11-03 13:56:45,,a9dce11f-f44f-443d-ab3a-97c4890c6a76
5,IT/SPO.IT2202A_20_GC-PID_2013-03-01_00:00:00,20,2022-11-08 10:00:00,2022-11-08 11:00:00,576.97,ug.m-3,hour,-1,1,2023-09-05 13:24:27,,1306a0af-69a5-4b1e-b296-16348743bf48
6,IT/SPO.IT1269A_20_GC-FID_2001-12-07_00:00:00,20,2021-07-31 07:00:00,2021-07-31 08:00:00,259.72101,ug.m-3,hour,1,1,2022-05-11 15:43:55,,137b648a-6cad-45b7-b163-77bbec0d24f3
7,IT/SPO.IT1269A_20_GC-FID_2001-12-07_00:00:00,20,2021-07-31 05:00:00,2021-07-31 06:00:00,248.95599,ug.m-3,hour,1,1,2022-05-11 15:43:55,,137b648a-6cad-45b7-b163-77bbec0d24f3
8,IT/SPO.IT1159A_20_GC-FID_1998-10-13_00:00:00,20,2014-08-25 15:00:00,2014-08-25 16:00:00,223.4,ug.m-3,hour,-99,1,2015-09-10 12:33:57,,0621cf7e-e981-4696-86ff-8d4723d7610c
9,IT/SPO.IT2203A_20_GC-PID_2013-03-01_00:00:00,20,2022-04-12 10:00:00,2022-04-12 11:00:00,219.82,ug.m-3,hour,-1,1,2023-09-05 13:24:27,,64df35d3-e9ff-4694-8edc-2a8f304d3ceb


Some of these are errors: records whose *Validity* column is -99 or -1 are not valid and must be removed (see the [EEA metadata](https://dd.eionet.europa.eu/vocabulary/aq/observationvalidity)).


In [None]:
print("unique Validity values before cleaning: ", picchi_benzene["Validity"].unique())
# Drop the records flagged as not valid (-99 and -1)
picchi_benzene_cleaned = picchi_benzene[(picchi_benzene["Validity"] != -99) & (picchi_benzene["Validity"] != -1)]
print("unique Validity values after cleaning: ", picchi_benzene_cleaned["Validity"].unique())

unique Validity values before cleaning:  [  1   2 -99  -1]
unique Validity values after cleaning:  [1 2]
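
To gauge the effect of the cleaning, one can count the records per validity flag and compare the maximum value before and after the filter. The sketch below runs on a toy DataFrame: the first three rows reuse value and flag pairs from the sorted table above, the last two are invented, and the name `INVALID_FLAGS` is an assumption introduced for illustration. It uses `isin`, which is equivalent to the two chained inequality tests.

```python
import pandas as pd

# Toy stand-in for picchi_benzene: the first three rows reuse values shown
# in the sorted table, the last two are invented.
peaks = pd.DataFrame({
    "Value": [8237.76, 709.9, 223.4, 35.0, 28.1],
    "Validity": [-1, 1, -99, 1, 2],
})

# Flags treated as not valid in this notebook (hypothetical constant name).
INVALID_FLAGS = [-99, -1]

# How many records carry each validity flag.
print(peaks["Validity"].value_counts())

# Keep only the records whose flag is not in the invalid list.
cleaned = peaks[~peaks["Validity"].isin(INVALID_FLAGS)]

# Compare the maximum concentration before and after cleaning.
print("max before:", peaks["Value"].max())
print("max after: ", cleaned["Value"].max())
```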


The peaks file, cleaned of invalid values, can then be downloaded:

In [None]:

picchi_benzene_cleaned.to_csv('picchi_benzene_cleaned.csv', index=False)

# Download the CSV file
from google.colab import files
files.download('picchi_benzene_cleaned.csv')
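
The `google.colab` module is only importable inside Colab, so the download step does not work when the notebook runs elsewhere. A minimal sketch of a fallback, assuming a toy DataFrame and a hypothetical file name `picchi_benzene_demo.csv` in place of the real ones:

```python
from pathlib import Path

import pandas as pd

# Toy stand-in for picchi_benzene_cleaned (invented values).
df = pd.DataFrame({"Value": [30.5, 41.2], "Validity": [1, 2]})

# Hypothetical output name, chosen so the real export is not overwritten.
out_path = Path("picchi_benzene_demo.csv")
df.to_csv(out_path, index=False)

try:
    # Available only when running inside Google Colab.
    from google.colab import files
except ImportError:
    files = None

if files is not None:
    # In Colab: trigger the browser download.
    files.download(str(out_path))
else:
    # Elsewhere: the file is already on disk, so just report where.
    print(f"Saved to {out_path.resolve()}")
```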



### 2. Mapping the benzene peaks


Once the peaks file has been built, the territorial mapping requires working with the dataset of monitoring stations:

In [None]:
file_path_stazioni = "/content/IT_C6H6_stazioni_di_monitoraggio.csv"
stazioni = pd.read_csv(file_path_stazioni)
stazioni.head(3)

Unnamed: 0,Country,B-G Namespace,Year,Air Quality Network,Air Quality Network Name,Timezone,Air Quality Station EoI Code,Air Quality Station Nat Code,Air Quality Station Name,Sampling Point Id,...,Detection Limit,Detection Limit Unit,Documentation,QA Report,Duration,Duration Unit,Cadence,Cadence Unit,Source Data URL,Imported
0,Italy,IT.ISPRA.AQD,2023,NET.IT001A,Rete provinciale Provincia di Ancona,UTC+01,IT0456A,1104213,SENIGALLIA,SPO.IT0456A_20_GC-FID_2004-09-02_00:00:00,...,0.3,ug.m-3,Le misurazioni effettuate seguono le procedure...,in preparation,30,minute,30,minute,http://cdr.eionet.europa.eu/it/eu/aqd/d/envzpz...,05/09/2024 03:23:48
1,Italy,IT.ISPRA.AQD,2023,NET.IT001A,Rete provinciale Provincia di Ancona,UTC+01,IT1564A,1104215,ANCONA/VIA BOCCONI,SPO.IT1564A_20_GC-FID_2004-09-17_00:00:00,...,0.3,ug.m-3,Le misurazioni effettuate seguono le procedure...,in preparation,30,minute,30,minute,http://cdr.eionet.europa.eu/it/eu/aqd/d/envzpz...,05/09/2024 03:23:48
2,Italy,IT.ISPRA.AQD,2023,NET.IT001A,Rete provinciale Provincia di Ancona,UTC+01,IT1693A,1104221,Ancona - Porto,SPO.IT1693A_20_GC-FID_2007-01-01_00:00:00,...,0.3,ug.m-3,Le misurazioni effettuate seguono le procedure...,in preparation,30,minute,30,minute,http://cdr.eionet.europa.eu/it/eu/aqd/d/envzpz...,05/09/2024 03:23:48


In [None]:
stazioni.shape

(370, 70)

In [None]:
stazioni.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 370 entries, 0 to 369
Data columns (total 70 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Country                       370 non-null    object 
 1   B-G Namespace                 370 non-null    object 
 2   Year                          370 non-null    int64  
 3   Air Quality Network           370 non-null    object 
 4   Air Quality Network Name      370 non-null    object 
 5   Timezone                      370 non-null    object 
 6   Air Quality Station EoI Code  370 non-null    object 
 7   Air Quality Station Nat Code  370 non-null    int64  
 8   Air Quality Station Name      370 non-null    object 
 9   Sampling Point Id             370 non-null    object 
 10  Air Pollutant                 370 non-null    object 
 11  Longitude                     370 non-null    float64
 12  Latitude                      370 non-null    float64
 13  Altit

The *Sampling Point Id* column of the stations file corresponds to the *Samplingpoint* column of the peaks dataset (once the "IT/" prefix is stripped) and identifies the sampling point of an air-quality monitoring station.

Merging on these columns brings each station's latitude and longitude into the peaks dataset, along with other station attributes of interest (for example, the station type). Finally, reverse geocoding maps the station coordinates to the corresponding location (address, municipality, province, region, etc.).

The result is the hourly benzene peaks dataset enriched with all the territorial information of interest.
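
The merge step described above could also be done in pandas. The following is only a sketch under stated assumptions, not the procedure used for this project: the two DataFrames are toy stand-ins, the coordinates are placeholders rather than real station positions, and the de-duplication of the stations file is a precaution that may not be needed on the real data. Reverse geocoding is left out because it depends on an external service.

```python
import pandas as pd

# Toy stand-ins for the two datasets; coordinates are placeholders.
picchi = pd.DataFrame({
    "Samplingpoint": [
        "IT/SPO.IT1269A_20_GC-FID_2001-12-07_00:00:00",
        "IT/SPO.IT2160A_20_GC-MS_2017-01-01_00:00:00",
        "IT/SPO.IT1269A_20_GC-FID_2001-12-07_00:00:00",
    ],
    "Value": [259.72101, 709.9, 248.95599],
})
stazioni = pd.DataFrame({
    "Sampling Point Id": [
        "SPO.IT1269A_20_GC-FID_2001-12-07_00:00:00",
        "SPO.IT2160A_20_GC-MS_2017-01-01_00:00:00",
    ],
    "Longitude": [10.0, 11.0],  # placeholder values
    "Latitude": [45.0, 44.0],   # placeholder values
})

# Strip the "IT/" country prefix so the keys of the two datasets match.
picchi["Sampling Point Id"] = picchi["Samplingpoint"].str.replace(
    r"^IT/", "", regex=True
)

# Precaution (assumption): keep one row per sampling point in the stations file.
stazioni_unique = stazioni.drop_duplicates(subset="Sampling Point Id")

# Left merge: every peak is kept, coordinates are added where a station matches.
picchi_geo = picchi.merge(
    stazioni_unique[["Sampling Point Id", "Longitude", "Latitude"]],
    on="Sampling Point Id",
    how="left",
    validate="many_to_one",
)
print(picchi_geo[["Sampling Point Id", "Value", "Longitude", "Latitude"]])
```

A left merge keeps peaks whose sampling point is missing from the stations file, so any unmatched rows show up as missing coordinates instead of silently disappearing.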

For simplicity, these steps were carried out in a spreadsheet. The final dataset is in any case available on GitHub: