# Date Selection Notebook
<center><img src=img/DDUST__Nero.png width="300"></center>

This notebook is used to plot the average `precipitation` and `temperature` over the Lombardy region for a given month of the year.

Ideally, the analysis would rely on data about the periods of manure spreading and the cadastral parcels affected by these practices.
Since these data are not available over the study area, criteria are needed to identify the periods when, on average, such practices are most likely carried out. These periods were chosen based on temperature, precipitation, and the main months in which fertilization takes place in the Lombardy region (e.g. March/April).

The data used in this notebook come from [meteorological stations operated by ARPA Lombardy](https://www.dati.lombardia.it/Ambiente/stazioni-meteo/pevf-9zqp).
To access these data through the Socrata API, you must register on [Open Data Regione Lombardia](https://www.dati.lombardia.it/). The [sodapy](https://github.com/xmunoz/sodapy) library makes it possible to request the data through this API.

Read the `Ground Sensor Variables Request .ipynb` notebook for more information about accessing ARPA meteorological data.

## Import libraries

In [None]:
#Warnings
import warnings
warnings.filterwarnings('ignore')

#Main libraries
from sodapy import Socrata
import pandas as pd
import geopandas as gpd
import os
import zipfile
import requests
import json
import io
from scipy import stats
import numpy as np
from datetime import datetime
from datetime import timedelta
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Set current working directory
cwd = os.getcwd()

# Import functions defined for DDUST project:
from functions import DDUST_methods

# Key and app token for Socrata API
with open('keys.json') as f:
    keys = json.load(f)
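
As a minimal sketch, the key file can also be loaded through a small helper that fails early when the token is missing. The `load_keys` helper is hypothetical and not part of the DDUST project. The only entry it assumes is `arpa_token`, which is the one used later in this notebook, and the token value is a placeholder.

```python
import json
import os
import tempfile


def load_keys(path, required=("arpa_token",)):
    """Load a JSON key file and raise if an expected entry is missing (hypothetical helper)."""
    with open(path) as f:
        keys = json.load(f)
    missing = [name for name in required if name not in keys]
    if missing:
        raise KeyError(f"Missing entries in {path}: {missing}")
    return keys


# Usage with a throwaway file and a placeholder token
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "keys.json")
    with open(demo_path, "w") as f:
        json.dump({"arpa_token": "YOUR_APP_TOKEN"}, f)
    demo_keys = load_keys(demo_path)
```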

## Define time range
First, select a `month` and a `year`. For example, for March 2021 set the `month` variable to `3` and the `year` variable to `2021`:

In [None]:
month = 7  # e.g. 3 corresponds to March
year = 2021
start_date_dt = datetime(year, month, 1).date()
print(start_date_dt)

Next, define a function that returns the last day of the selected month:

In [None]:
# Function to get the last day of the month
def last_day_month(test_date):
    # Day 28 exists in every month, so adding 4 days always lands in the next month
    nxt_mnth = test_date.replace(day=28) + timedelta(days=4)

    # Subtracting that date's day number steps back to the last day of the original month
    res = nxt_mnth - timedelta(days=nxt_mnth.day)

    return res
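
The same result can be obtained with the standard library's `calendar.monthrange`. The sketch below is only an alternative for comparison, and the function name `last_day_month_calendar` is made up for this example.

```python
import calendar
from datetime import date


def last_day_month_calendar(any_date):
    # monthrange returns (weekday of the first day, number of days in the month)
    days_in_month = calendar.monthrange(any_date.year, any_date.month)[1]
    return any_date.replace(day=days_in_month)


july_end = last_day_month_calendar(date(2021, 7, 1))
leap_february_end = last_day_month_calendar(date(2020, 2, 10))
```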

Now, we can visualize the start and the end days for the selected month:

In [None]:
# Start and end days of the month
start_date = str(start_date_dt)
end_date = str(last_day_month(start_date_dt))[0:10]
end_date_dt = datetime.strptime(end_date, '%Y-%m-%d') + timedelta(days=1)
print('The time range is '+start_date + ' / ' + end_date)
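
Note that `end_date_dt` is the day after the last day of the month, so it works as an exclusive upper bound: filtering with `< end_date_dt` keeps every reading taken on the last day. A small sketch with made-up timestamps illustrates the idea:

```python
from datetime import datetime, timedelta

last_day = datetime(2021, 7, 31)
exclusive_end = last_day + timedelta(days=1)  # 2021-08-01 00:00:00

late_reading = datetime(2021, 7, 31, 23, 50)     # still inside July
next_month_reading = datetime(2021, 8, 1, 0, 0)  # first instant of August

late_included = late_reading < exclusive_end
next_excluded = not (next_month_reading < exclusive_end)
```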

If you want to define a custom time range within the current month and use it in the API request, uncomment the following code and set the desired dates:

In [None]:
# start_date='2022-08-01'
# end_date_dt='2022-08-10'

## Get ARPA meteorological sensors information

Let's now connect to the ARPA [Socrata API](https://dev.socrata.com/), which makes it possible to request all the information associated with the meteorological stations (e.g. name, sensor type, station id, location etc.).

<div class="alert alert-warning" role="alert">
<span>&#9888;</span>
<a id='warning'></a> Remember that registration on Open Data Regione Lombardia is required to access these data.
</div>
<div class="alert alert-warning" role="alert">
<span>&#9888;</span>
<a id='warning'></a> If you want to download data from the API, remember that only data from the current month are available.
</div>

In [None]:
arpa_domain = "www.dati.lombardia.it"
m_st_descr = "nf78-nj6b"
client = Socrata(arpa_domain, app_token = keys['arpa_token'])
results = client.get_all(m_st_descr)
meteo_st_descr = pd.DataFrame(results)
meteo_st_descr["idsensore"] = meteo_st_descr["idsensore"].astype(int)

## Get ARPA meteorological sensors time series

It is now possible to request the time series for each meteorological sensor.

Remember that meteorological data can be requested directly from the API for the current month only. If data from previous years are needed, it is necessary to download the `.zip` folder containing the time series in `.csv` format.

The following code checks the year selected at the beginning of the notebook: for the current year it requests the data from the Socrata API, while for previous years it downloads the corresponding `.zip` folder. The download links for the available years are defined in the `meteo_sensor` function inside `DDUST_methods.py`.

Moreover, if the `.zip` folder of the selected year is already available inside the current working directory, it won't be downloaded again.

In [None]:
# If the current year is selected, use data from the API (only the current month's data are available from meteo sensors)
if int(year) == datetime.today().year:
    
    # Set domain and token 
    arpa_domain = "www.dati.lombardia.it"
    dati = "647i-nhxk" #change this depending on the dataset (check Open Data Lombardia datasets)
    client = Socrata(arpa_domain, app_token = keys['arpa_token']) #insert your arpa_token
    
    # Query the data
    date_query = "data > {} and data < {}".format('"'+ start_date + '"','"'+ str(end_date_dt) + '"') #query the data in the time range from the API
    results = client.get(dati, where=date_query, limit=5000000000000) #GET request to the API
    
    # Create the dataframe
    meteo_data = pd.DataFrame(results) #get the dataframe
    meteo_data.rename(columns={'IdSensore': 'idsensore','Data': 'data','idOperatore': 'idoperatore','Stato': 'stato','Valore': 'valore'}, inplace=True) #rename some columns
    meteo_data['data'] =  pd.to_datetime(meteo_data['data'], format='%Y/%m/%d %H:%M:%S')  #transform dates to datetime
    meteo_data = meteo_data.astype({"idsensore": int,"valore": float})  #define types
    
# If previous years download the corresponding year .zip file, extract the .csv file and filter the dates
elif int(year) < datetime.today().year: 
    filename = 'meteo_'+str(year)+'.zip'
    
    # If the archive is not in the working directory yet, download it
    if not os.path.exists(filename):
        csv_url = DDUST_methods.meteo_sensor(str(year))
        r2 = requests.get(csv_url, allow_redirects=True)
        with open(filename, 'wb') as zip_file:
            zip_file.write(r2.content)
        print('Downloaded zip file')
    else:
        print('Zip file already exists')
    
    # Open the zip file
    archive = zipfile.ZipFile(filename, 'r')
    data = archive.open(str(year)+'.csv') 
    
    # Create dataframe
    meteo_data_df = pd.read_csv(data, dtype={"IdSensore": int,"Valore": float, "Stato": str, "idOperatore":str})
    meteo_data_df.rename(columns={'IdSensore': 'idsensore','Data': 'data','idOperatore': 'idoperatore','Stato': 'stato','Valore': 'valore'}, inplace=True)
    meteo_data_df['data'] =  pd.to_datetime(meteo_data_df['data'], format='%d/%m/%Y %H:%M:%S')
    
    # Mask the meteo_data_df in the right time range
    mask = (meteo_data_df.data >= start_date) & (meteo_data_df.data < str(end_date_dt))
    meteo_data = meteo_data_df.loc[mask]
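
The archive branch can be tried out without downloading anything. The sketch below builds a tiny in-memory `.zip` that mimics the layout assumed by the code above (a `<year>.csv` file with the `IdSensore`, `Data`, `Valore`, `Stato` and `idOperatore` columns), then reads it and masks the dates. All the rows are made up for illustration.

```python
import io
import zipfile

import pandas as pd

# Toy CSV with the column names used by the notebook; the values are invented
csv_text = (
    "IdSensore,Data,Valore,Stato,idOperatore\n"
    "1,30/06/2021 23:50:00,21.5,VA,1\n"
    "1,01/07/2021 00:00:00,20.9,VA,1\n"
    "1,01/08/2021 00:00:00,19.0,VA,1\n"
)

# Write the CSV into an in-memory zip archive
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("2021.csv", csv_text)
buffer.seek(0)

# Read the CSV straight from the archive, as done for the real yearly file
with zipfile.ZipFile(buffer, "r") as archive:
    with archive.open("2021.csv") as handle:
        toy = pd.read_csv(
            handle,
            dtype={"IdSensore": int, "Valore": float, "Stato": str, "idOperatore": str},
        )

toy["Data"] = pd.to_datetime(toy["Data"], format="%d/%m/%Y %H:%M:%S")

# Keep July 2021 only: inclusive start, exclusive end
start = pd.Timestamp("2021-07-01")
end = pd.Timestamp("2021-08-01")
in_range = toy[(toy["Data"] >= start) & (toy["Data"] < end)]
```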

## Meteorological sensors data processing

Now that we have all the meteorological data inside the given time range, it is possible to remove unused columns and missing values (encoded as -9999):

In [None]:
meteo_data = meteo_data.drop(columns=['stato', 'idoperatore'])
meteo_data = meteo_data[meteo_data.valore.astype(float) != -9999]
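
To see what the sentinel filter does, here is a minimal sketch on a toy table. The column names follow the notebook, while the values are invented.

```python
import pandas as pd

toy = pd.DataFrame({
    "idsensore": [1, 1, 2],
    "valore": [12.3, -9999.0, 0.4],  # -9999 marks a missing reading
})

# Keep only the rows whose value is not the missing-data sentinel
clean = toy[toy["valore"] != -9999]
```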

Merge the sensor information with the time series, then keep only the `Precipitazione` (precipitation) and `Temperatura` (temperature) sensor types listed in the `tipologia` column:

In [None]:
m_sensor_sel = ['Precipitazione','Temperatura']  #select Temperature and Precipitation
meteo_table = pd.merge(meteo_data, meteo_st_descr, on = 'idsensore')
meteo_table['tipologia'] = meteo_table['tipologia'].astype(str)  # assign the result, otherwise the cast is discarded
meteo_table = meteo_table[meteo_table['tipologia'].isin(m_sensor_sel)]
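
The merge and the selection can be illustrated on two toy tables. The sensor ids and values are made up, and `Altro` is a placeholder sensor type used only to show a row being filtered out.

```python
import pandas as pd

readings = pd.DataFrame({
    "idsensore": [10, 11, 12],
    "valore": [0.2, 18.5, 3.1],
})
stations = pd.DataFrame({
    "idsensore": [10, 11, 12],
    "tipologia": ["Precipitazione", "Temperatura", "Altro"],  # "Altro" is a placeholder
})

# Attach the sensor type to each reading through the shared sensor id
merged = pd.merge(readings, stations, on="idsensore")

# Keep only the two sensor types of interest
selected = merged[merged["tipologia"].isin(["Precipitazione", "Temperatura"])]
```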

Extract temperature and precipitation in two separate dataframes:

In [None]:
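# .copy() makes independent dataframes, so columns can be added later without a SettingWithCopyWarning
temp_st = meteo_table.loc[meteo_table['tipologia'] == 'Temperatura'].copy()
prec_st = meteo_table.loc[meteo_table['tipologia'] == 'Precipitazione'].copy()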

Now we have the time series with all the associated sensor information.

Outliers that could distort the selected data can also be removed.

For precipitation, values of 100 mm/h or higher are removed. For temperature, the Z-score is calculated and values whose absolute Z-score reaches a given threshold (e.g. 4) are removed.

In [None]:
# Remove outliers
# For precipitation, keep only values below 100 mm/h
prec_st = prec_st[prec_st.valore < 100]

# For temperature using a Z-Score with high threshold
threshold = 4
temp_st['zscore'] = np.abs(stats.zscore(temp_st['valore'], nan_policy='propagate'))
temp_st = temp_st[temp_st.zscore < threshold]
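
The Z-score filter can be sketched without SciPy. The version below computes the score with the population standard deviation (`ddof=0`), which is the default used by `scipy.stats.zscore`. The readings are invented: 30 plausible temperatures plus one obviously wrong value.

```python
import pandas as pd

# 30 readings between 18 and 22 °C plus one faulty reading of 150 °C
values = [18.0 + (i % 5) for i in range(30)] + [150.0]
toy = pd.DataFrame({"valore": values})

# Absolute Z-score: distance from the mean in units of standard deviation
mean = toy["valore"].mean()
std = toy["valore"].std(ddof=0)
toy["zscore"] = ((toy["valore"] - mean) / std).abs()

threshold = 4
kept = toy[toy["zscore"] < threshold]
```

With very few readings the Z-score of a single outlier cannot grow large, so a threshold of 4 is only meaningful on reasonably long series.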

Make sure the `data` variable (i.e. the date) has the `datetime` type, so that the rows form a time series:

In [None]:
prec_st['data'] = pd.to_datetime(prec_st['data'], format='%Y-%m-%d %H:%M:%S')
temp_st['data'] = pd.to_datetime(temp_st['data'], format='%Y-%m-%d %H:%M:%S')

Calculate the mean `precipitation` and `temperature` for each day using the `groupby` function and setting a daily frequency on the date variable.

In [None]:
# Precipitation
# numeric_only=True skips text columns (e.g. station names), which recent pandas versions refuse to average
prec_mean = prec_st.groupby(pd.Grouper(freq='D', key='data')).mean(numeric_only=True)
prec_mean = prec_mean.drop(columns=['idsensore'])
prec_mean['data'] = prec_mean.index

# Temperature
temp_mean = temp_st.groupby(pd.Grouper(freq='D', key='data')).mean(numeric_only=True)
temp_mean = temp_mean.drop(columns=['idsensore'])
temp_mean['data'] = temp_mean.index
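
A minimal sketch of the daily aggregation on invented readings shows how `pd.Grouper` collapses sub-daily rows from several sensors into one mean value per day:

```python
import pandas as pd

toy = pd.DataFrame({
    "data": pd.to_datetime([
        "2021-07-01 00:00:00",
        "2021-07-01 12:00:00",
        "2021-07-02 06:00:00",
    ]),
    "idsensore": [1, 2, 1],
    "valore": [1.0, 3.0, 5.0],
})

# Group by calendar day on the "data" column and average the readings
daily = toy.groupby(pd.Grouper(freq="D", key="data"))["valore"].mean()
```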

## Plot mean temperature and precipitation for all ARPA sensors over the Lombardy region

In [None]:
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Add traces
fig.add_trace(
    go.Scatter(x=prec_mean.index, y=prec_mean.valore, name="Daily mean precipitation"),
    secondary_y=False)

fig.add_trace(
    go.Scatter(x=temp_mean.index, y=temp_mean.valore, name="Daily mean temperature"),
    secondary_y=True)

# Add figure title
fig.update_layout(
    title_text="Daily mean temperature and precipitation from ARPA ground sensors - 1 Month time range - " +str(start_date_dt.strftime("%B %Y")))

# Set x-axis title
fig.update_xaxes(title_text="Date")

# Set y-axes titles
fig.update_yaxes(title_text="<b>Mean precipitation (mm/h)</b>", secondary_y=False)
fig.update_yaxes(title_text="<b>Mean temperature (°C)</b>", secondary_y=True)

fig.show()