# Practical session: explore 2D weather model data and ground station data

This notebook shows how to explore and superimpose weather model data and ground station data. The slides associated with this tutorial are available [here](https://github.com/meteofrance/meteonet/tree/master/course_slides).

# Note

<font size="4.5">To use <span style="color:blue">**Cartopy**</span>, a library for plotting data on basemaps (see the cells below), you need to <span style="color:red">activate the internet connection</span> of this notebook: in edit mode, in the *Settings* section of the right-hand column, find the row entitled *Internet* and set the slider to **on**.  </font>

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr
import datetime as dt
from datetime import timedelta  

import cartopy.crs as ccrs
import cartopy.feature as cfeature

import matplotlib.gridspec as gridspec
from scipy.interpolate import griddata


# Input data files are available in the "../input/" directory.
# Any results you write to the current directory are saved as output.

# I - Open & explore the data
## I.a - Ground station data

Hundreds of observation stations are installed throughout France and fitted with several weather sensors (temperature, pressure, wind...). 
Each parameter is measured every 6 minutes, and each file contains 1 year of data for the geographical area 'NW' (North-West of France).  

For more information about the data, see the [documentation](https://meteofrance.github.io/meteonet/english/data/ground-observations/).

## Data overview

### Metadata parameters

* **number_sta** : ground station number

* **lat** : latitude in decimal degrees

* **lon** : longitude in decimal degrees

* **height_sta** : station height in meters

The date parameter is a datetime object with the format 'YYYY-MM-DD HH:mm:ss'.

### Meteorological parameters

* **dd** : wind direction in degrees

* **ff** : wind speed in m.s<sup>-1</sup>

* **precip** : precipitation during the reporting period in kg.m<sup>-2</sup>

* **hu** : humidity in %

* **td** : dew point temperature** in Kelvin

* **t** : temperature in Kelvin

* **psl** : pressure reduced to sea level in Pa


** : temperature to which a volume of air must be cooled, at constant pressure and water vapor content, to become saturated (i.e. condensation temperature)
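
To make the dew point definition concrete, here is a minimal sketch that estimates it from temperature and relative humidity with the Magnus approximation. The coefficients (17.62 and 243.12 °C) are one commonly quoted set and the function name `dew_point_kelvin` is hypothetical; nothing here comes from the MeteoNet files, which already provide `td` directly.

```python
import numpy as np

# Magnus coefficients (assumption: one commonly quoted set, valid over water)
A = 17.62
B = 243.12  # in degrees Celsius

def dew_point_kelvin(t_kelvin, hu_percent):
    """Estimate the dew point (K) from temperature (K) and relative humidity (%)."""
    t_c = np.asarray(t_kelvin, dtype=float) - 273.15
    rh = np.asarray(hu_percent, dtype=float) / 100.0
    gamma = np.log(rh) + A * t_c / (B + t_c)
    td_c = B * gamma / (A - gamma)
    return td_c + 273.15

# Saturated air: the dew point equals the temperature
print(dew_point_kelvin(283.15, 100.0))
# Drier air: the dew point is lower than the temperature
print(dew_point_kelvin(283.15, 50.0))
```

Such an estimate could be used as a rough consistency check between the `t`, `hu` and `td` columns, keeping in mind that it is only an approximation.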

Select the year you want to study:

In [None]:
year = '2016'
fname = '/kaggle/input/meteonet/NW_Ground_Stations/NW_Ground_Stations/NW_Ground_Stations_'+year+".csv"
df = pd.read_csv(fname, parse_dates=[4])  # column 4 is 'date'; infer_datetime_format is deprecated since pandas 2.0

1. Let's have a look at the dataframe!

In [None]:
display(df.head())
display(df.tail())

## Plot a given parameter for a given date 

Let's start by selecting a date and displaying a subpart of the dataframe.

In [None]:
date = '2016-01-01T06:00:00'
d_sub = df[df['date'] == date]

display(d_sub.head())
display(d_sub.tail())
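
The same boolean-mask selection extends to several conditions at once, for example a date and a station number. Below is a minimal sketch on a tiny synthetic table; the station numbers and values are made up, and only the column names follow the observation files.

```python
import pandas as pd

# Tiny synthetic table with the same column names as the observation files
# (station numbers and values are made up for illustration)
toy = pd.DataFrame({
    'number_sta': [1001, 1002, 1001, 1002],
    'date': pd.to_datetime(['2016-01-01 06:00:00', '2016-01-01 06:00:00',
                            '2016-01-01 06:06:00', '2016-01-01 06:06:00']),
    't': [278.3, 279.1, 278.4, 279.0],
})

# Combine two boolean masks with & (each condition in parentheses)
mask = (toy['date'] == pd.Timestamp('2016-01-01T06:06:00')) & (toy['number_sta'] == 1002)

# .loc keeps the matching rows and the requested column; .iloc[0] takes the first match
value = toy.loc[mask, 't'].iloc[0]
print(value)
```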

Now, let's plot the data!

### Simple scatter plot

In [None]:
param = 't'
plt.scatter(d_sub['lon'], d_sub['lat'], c=d_sub[param], cmap='jet')
plt.colorbar()
plt.title(date+' - param '+param)

plt.show()

### Plot with Cartopy

In [None]:
# Coordinates of studied area boundaries (in °N and °E)
lllat = 46.25  #lower left latitude
urlat = 51.896  #upper right latitude
lllon = -5.842  #lower left longitude
urlon = 2  #upper right longitude
extent = [lllon, urlon, lllat, urlat]

fig = plt.figure(figsize=(9,5))

# Select projection
ax = plt.axes(projection=ccrs.PlateCarree())

# Plot the data
plt.scatter(d_sub['lon'], d_sub['lat'], c=d_sub[param], cmap='jet')  # Plot
plt.colorbar()
plt.title(date+' - param '+param)


# Add coastlines and borders
ax.coastlines(resolution='50m', linewidth=1)
ax.add_feature(cfeature.BORDERS.with_scale('50m'))

# Adjust the plot to the area we defined
# /!\ the set_extent line below triggers a bug in the Kaggle notebook that clears all the memory.
# It is therefore commented out, so the plot is not exactly fitted to the data area.
#ax.set_extent(extent)

plt.show()

### Now, let's play with the parameters!

In [None]:
def choose_parameters_and_display(date,param,df):
    #select the data corresponding to the selected date
    d_sub = df[df['date'] == date]
    
    # Coordinates of studied area boundaries (in °N and °E)
    lllat = 46.25  #lower left latitude
    urlat = 51.896  #upper right latitude
    lllon = -5.842  #lower left longitude
    urlon = 2  #upper right longitude
    extent = [lllon, urlon, lllat, urlat]

    fig = plt.figure(figsize=(9,5))

    # Select projection
    ax = plt.axes(projection=ccrs.PlateCarree())

    # Plot the data
    plt.scatter(d_sub['lon'], d_sub['lat'], c=d_sub[param], cmap='jet')  # Plot
    plt.colorbar()
    plt.title(date+' - param '+param)

    # Add coastlines and borders
    ax.coastlines(resolution='50m', linewidth=1)
    ax.add_feature(cfeature.BORDERS.with_scale('50m'))

    # Adjust the plot to the area we defined
    # /!\ the set_extent line below triggers a bug in the Kaggle notebook that clears all the memory.
    # It is therefore commented out, so the plot is not exactly fitted to the data area.
    #ax.set_extent(extent)

    plt.show()
    return d_sub

In [None]:
date = '2016-12-31T12:24:00'
param = 'td'
data = choose_parameters_and_display(date,param,df)

### Exercise

1. Get the temperature value for the station_id *14066001* on 02/02/2016 at 06h00.
2. Get the wind speed value for the station_id *86137003* on 10/10/2016 at 12h06.
3. Get the humidity value for the lat 48.527 and lon 1.995 on 25/12/2016 at 15h12.

In [None]:
#question 1

date = '2016-02-02T06:00:00'
station_id = 14066001
param = 't'

Run the following cell to get the solution (you need to run it twice: once to load the solution code and once to execute it)

In [None]:
#%load /kaggle/usr/lib/tp_solutions_answer_obs_1/tp_solutions_answer_obs_1

In [None]:
#to run the solution
#d_sub = obs_answer_1(year,date,station_id,param)

In [None]:
#question 2 

date = '2016-10-10T12:06:00'
station_id = 86137003
param = 'ff'

In [None]:
#to run the solution
#d_sub = obs_answer_1(year,date,station_id,param)

In [None]:
#question 3 

date = '2016-12-25T15:12:00'
lat = 48.527
lon = 1.995
param = 'hu'

Run the following cell to get the solution (you need to run it twice: once to load the solution code and once to execute it)

In [None]:
# %load /kaggle/usr/lib/tp_solutions_answer_obs_2/tp_solutions_answer_obs_2

In [None]:
#to run the solution
#d_sub = obs_answer_2(year,date,station_id,param)

## I.b - 2D weather model data

**/!\ Only February 2016 is available for now. In the near future, all the 2D and 3D weather model data for 2016, 2017 and 2018 will be available in netCDF format on Kaggle. Meanwhile, all the data are already available in GRIB format on this server: https://meteonet.umr-cnrm.fr/dataset/. A conversion step (from GRIB to netCDF) is still needed before the rest can be put on Kaggle.**

For more information about the data, see the [documentation](https://meteofrance.github.io/meteonet/english/data/weather-models/).

There are 2 weather models:
* a fine-mesh French weather model called 'AROME' (spatial resolution of 0.025°)
* a large-mesh French weather model called 'ARPEGE' (spatial resolution of 0.1°)

For each model and each day, for the geographic zone 'NW' (North-West of France), you have the 00h model run with forecast ranges from 00h to 24h. The time step is 1h.

The 2D data are stored in 4 different netCDF files, according to the vertical level:

* at 2m (*2m* in the file name) : temperature (in K), dew point temperature** (in K) and relative humidity (in %)
* at 10m (*10m* in the file name): wind speed (in m.s<sup>-1</sup>), wind direction (in degrees), U and V wind components*** (in m.s<sup>-1</sup>)
* at the sea level (*P_sea_level* in the file name) : mean sea level pressure (in Pa)
* at the ground level (*PRECIP* in the file name)  : total precipitation (in kg m<sup>-2</sup> which is equivalent to mm) since the beginning of the model run 

** : temperature to which a volume of air must be cooled, at constant pressure and water vapor content, to become saturated (i.e. condensation temperature)

*** : horizontal wind speed components, U : from west to east and V : from south to north. 
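
The U and V components can be turned back into a wind speed and a wind direction. The sketch below assumes the usual meteorological convention (direction the wind blows from, in degrees clockwise from north); this convention and the function name `wind_speed_direction` are assumptions, not something stated in the files.

```python
import numpy as np

def wind_speed_direction(u, v):
    """Return (speed, direction) from the U and V wind components.

    Assumption: the direction is the one the wind blows FROM, in degrees
    clockwise from north (meteorological convention).
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    speed = np.hypot(u, v)
    direction = np.degrees(np.arctan2(-u, -v)) % 360.0
    return speed, direction

# Westerly wind (blowing from west to east): U > 0 and V = 0
speed, direction = wind_speed_direction(10.0, 0.0)
print(speed, direction)
```

Comparing the result with the `ws` and `p3031` fields of the same file is one way to check whether the convention assumed here matches the data.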

**/!\ : for the total precipitation parameter, the forecast range begins at 1h and not at 0h as for the other parameters. A future version is planned to provide the total precipitation between two time steps instead.**

/!\ : Some of the model files are incomplete or corrupted because of archiving anomalies in our organization's database. We are working on a new version of MeteoNet that will take these anomalies into account. Meanwhile, you can detect them by checking the file sizes: for a given file category (zone, level, model type), all files must have exactly the same size, so any file smaller than the others has an anomaly. This has been reported on Kaggle in this discussion https://www.kaggle.com/katerpillar/meteonet/discussion/157206 and on GitHub (https://github.com/meteofrance/meteonet/issues/14).
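
The file-size check described above can be sketched as follows. The helper name `find_suspect_files` is hypothetical, and taking the most frequent size as the reference is an assumption made for this illustration.

```python
from collections import Counter
from pathlib import Path

def find_suspect_files(directory, pattern='*.nc'):
    """List the files whose size differs from the most common size in `directory`."""
    sizes = {path: path.stat().st_size
             for path in sorted(Path(directory).glob(pattern))}
    if not sizes:
        return []
    # Assumption: the most frequent size is the size of a complete file
    reference_size, _ = Counter(sizes.values()).most_common(1)[0]
    return [path for path, size in sizes.items() if size != reference_size]

# Example call on one file category (one model, one level), to adapt to your paths:
# suspects = find_suspect_files('/kaggle/input/meteonet/NW_weather_models_2D_parameters_201602/201602/AROME/2m/')
# print(suspects)
```

The function has to be called once per file category, since files from different models or levels are not expected to share the same size.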

Select the model, date and level you want to study :

In [None]:
model = 'arome' #weather model (arome or arpege)
level = '2m'      #vertical level (2m, 10m, P_sea_level or PRECIP)
#date /!\ only available for February 2016!
date = dt.datetime(2016, 2, 14,0,0) #/!\ you can not modify the hour (always 00h) -> 1 possible run date only

directory = '/kaggle/input/meteonet/NW_weather_models_2D_parameters_' + str(date.year) + str(date.month).zfill(2) + '/' + str(date.year) + str(date.month).zfill(2) + '/'
fname = directory + f'{model.upper()}/{level}/{model}_{level}_NW_{date.year}{str(date.month).zfill(2)}{str(date.day).zfill(2)}000000.nc'
data = xr.open_dataset(fname)  
print(data)

## Quick field visualization
The following command shows the weather forecast at different steps, for the chosen run date and parameter.
For each file, the chosen run is the midnight run. The temperature at step 12 (figure 3) is the midday temperature. 
You can see that this temperature is higher than around midnight (figures 1 and 4) or at 6 AM (figure 2).

In [None]:
param = 't2m'
data.isel(step=[0, 6, 12, 23])[param].plot(x='longitude',
                                           y='latitude',
                                           col='step',
                                           col_wrap=2)

Get the information about coordinates (latitude and longitude):

In [None]:
coord = 'longitude'
print(data[coord])

In [None]:
data[coord].units

In [None]:
data[coord].values[0:10]

Get the information about the run date and the different range forecasts:

In [None]:
run_date = data['time']
#run_date.values     #get the values
print(run_date)

In [None]:
range_forecasts_dates = data['valid_time']
print(range_forecasts_dates)
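
The relation between the run date, the steps and the validity dates can be reproduced by hand. This is a minimal sketch that assumes `valid_time` is simply the run date plus the step, with hourly steps from 0 to 24 as described earlier; it uses plain NumPy values rather than the file itself.

```python
import numpy as np

# Run date (always 00h in this dataset) and hourly steps 0..24
run_date_toy = np.datetime64('2016-02-14T00:00')
steps_toy = np.arange(0, 25).astype('timedelta64[h]')

# Assumption: each forecast validity date is the run date plus the step
valid_time_toy = run_date_toy + steps_toy
print(valid_time_toy[0], valid_time_toy[12], valid_time_toy[-1])
```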

In [None]:
#if you want information about vertical level
# if (level =='2m' or level == '10m'):
#     level_name = 'heightAboveGround'
# elif (level =='P_sea_level'):
#     level_name = 'meanSea'
# else:
#     level_name = 'surface'
# info_level = data[level_name]
# print(info_level)

Get the information about one parameter: 
the parameter names in the file are listed in the *Data variables* field (see the output of print(data) above)

In [None]:
d = data[param]     #param : parameter name defined at the beginning of the Notebook 
d_vals=d.values     #get the values
###examples to get the information from attributes
#d.units                      #unit
#d.long_name                      #long name
print(d)

The structure of the parameter (3 dimensions):
* number of steps or range forecasts
* number of points in latitude
* number of points in longitude  

In [None]:
print(d.dims)
print(d_vals.shape)

About the parameter names, here is the correspondence between the human-readable names and the names in the files:

* temperature at 2m : t2m
* dew point temperature at 2m : d2m
* relative humidity at 2m : r
* wind speed at 10m : ws
* wind direction at 10m : p3031
* U wind component at 10m : u10
* V wind component at 10m : v10
* mean sea level pressure : msl
* total precipitation since the beginning of the model run : tp  -> **/!\ : for the total precipitation parameter, the forecast range begins at 1h and not at 0h as for the other parameters.**
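
Since `tp` is accumulated from the beginning of the model run, the amount for each hour can be recovered by differencing consecutive steps. Here is a minimal sketch on made-up numbers; it assumes that nothing has fallen at the run start (0h), which is why a zero is prepended.

```python
import numpy as np

# Made-up accumulated totals (kg m-2) at steps 1h, 2h, 3h and 4h after the run start
tp_accumulated = np.array([1.0, 3.0, 3.0, 6.0])

# Amount for each hour: difference between consecutive totals.
# prepend=0.0 assumes a zero total at the run start (0h).
tp_hourly = np.diff(tp_accumulated, prepend=0.0)
print(tp_hourly)
```

On a real file the same idea would be applied along the `step` axis of the `tp` array.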

### Exercise

1. Get the temperature value for the run date 10/02/2016 00h, arome model, lat 51.696, lon 0.008 (nearest point method) and step 3 (10/02/2016 03h).
2. Get the wind speed value for the run date 01/02/2016 00h, arpege model, lat 48.896, lon 0.558 (nearest point method) and step 6 (01/02/2016 06h).
3. Get the precipitation value for the run date 20/02/2016 00h, arpege model, lat 47.496, lon 1.858 (nearest point method) and step 19 (20/02/2016 20h, because there is no precipitation data for 00h).

In [None]:
#question 1

run_date = '2016-02-10T00:00:00'
param = 't2m'  #cf the list above to know the parameter names in the model file 
model = 'arome' #weather model (arome or arpege)
lat = 51.696
lon = 0.008
step = 3

## Help

* to select a data subset by coordinate value : **data.sel(coordinate_name=coordinate_value,method='nearest')** -> 'nearest' selects the nearest point (ex lat/lon)
* to select a data subset by integer index along a dimension : **data.isel(dimension_name=index)** -> isel works on positions and takes no *method* argument
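
The two selection styles above can be tried on a small synthetic dataset before applying them to the model files. In this sketch the grid coordinates and values are made up, and `step` is a plain integer coordinate, which may differ from the real files.

```python
import numpy as np
import xarray as xr

# Small synthetic dataset shaped like the model files (values are made up)
lat_toy = np.array([47.0, 46.975, 46.95])
lon_toy = np.array([1.0, 1.025, 1.05, 1.075])
toy = xr.Dataset(
    {'t2m': (('step', 'latitude', 'longitude'),
             np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4))},
    coords={'step': [0, 1], 'latitude': lat_toy, 'longitude': lon_toy},
)

# Select by coordinate value: nearest grid point to (46.98, 1.06)
by_value = toy['t2m'].sel(latitude=46.98, longitude=1.06, method='nearest')

# Select by integer position: second step (isel takes no `method` argument)
by_index = by_value.isel(step=1)
print(float(by_index))
```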

Run the following cell to get the solution (you need to run it twice: once to load the solution code and once to execute it)

In [None]:
# %load /kaggle/usr/lib/tp_solutions_answer_mod/tp_solutions_answer_mod

In [None]:
#to run the solution
#result1 = open_and_select(run_date,param,model,lat,lon,step)

In [None]:
#question 2

run_date = '2016-02-01T00:00:00'
param = 'ws'  #cf the list above to know the parameter names in the model file 
model = 'arpege' #weather model (arome or arpege)
lat = 48.896
lon = 0.558
step = 6

#to run the solution
#result2 = open_and_select(run_date,param,model,lat,lon,step)

In [None]:
#question 3 

run_date = '2016-02-20T00:00:00'
param = 'tp'  #cf the list above to know the parameter names in the model file 
model = 'arpege' #weather model (arome or arpege)
lat = 47.496
lon = 1.858
step = 19

#to run the solution
#result3 = open_and_select(run_date,param,model,lat,lon,step)

## Plotting data with Cartopy

Plot the parameter values for 1 given time step with Cartopy:

In [None]:
#index for the studied time step
step = 0  

In [None]:
# Coordinates of studied area boundaries (in °N and °E)
lllat = 46.25  #lower left latitude
urlat = 51.896  #upper right latitude
lllon = -5.842  #lower left longitude
urlon = 2  #upper right longitude
extent = [lllon, urlon, lllat, urlat]

fig=plt.figure(figsize=(9,10))

# Select projection
ax = plt.axes(projection=ccrs.PlateCarree())

#plot the data and the background map (coastlines and borders)
img = ax.imshow(d_vals[step,:,:], interpolation='none', origin='upper', extent=extent)
ax.coastlines(resolution='50m', linewidth=1)
ax.add_feature(cfeature.BORDERS.with_scale('50m'))


plt.colorbar(img, orientation= 'horizontal').set_label(d.long_name+ ' (in '+d.units+ ')')
plt.title(model +" model - "+str(d['valid_time'].values[step])+" - " +"NW zone")
plt.show()

### Now, let's play with the parameters!

In [None]:
def choose_parameters_and_display(model,run_date,step,param):
    
    #open the corresponding file according to the chosen parameter    
    if param == 't2m' or param == 'd2m' or param == 'r':
        level = '2m'
    elif param == 'ws' or param =='p3031' or param == 'u10' or param == 'v10':
        level = '10m'
    elif param == 'msl':
        level = 'P_sea_level'
    else:
        level = 'PRECIP'
    
    directory = '/kaggle/input/meteonet/NW_weather_models_2D_parameters_' + str(run_date.year) + str(run_date.month).zfill(2) + '/' + str(run_date.year) + str(run_date.month).zfill(2) + '/'
    fname = directory + f'{model.upper()}/{level}/{model}_{level}_NW_{run_date.year}{str(run_date.month).zfill(2)}{str(run_date.day).zfill(2)}000000.nc'
    sub_data = xr.open_dataset(fname)      
    
    # Coordinates of studied area boundaries (in °N and °E)
    lllat = 46.25  #lower left latitude
    urlat = 51.896  #upper right latitude
    lllon = -5.842  #lower left longitude
    urlon = 2  #upper right longitude
    extent = [lllon, urlon, lllat, urlat]

    fig=plt.figure(figsize=(9,10))

    # Select projection
    ax = plt.axes(projection=ccrs.PlateCarree())

    #plot the data and the background map (coastlines and borders)
    img = ax.imshow(sub_data[param].values[step,:,:], interpolation='none', origin='upper', extent=extent)
    ax.coastlines(resolution='50m', linewidth=1)
    ax.add_feature(cfeature.BORDERS.with_scale('50m'))


    plt.colorbar(img, orientation= 'horizontal').set_label(sub_data[param].long_name+ ' (in '+sub_data[param].units+ ')')
    plt.title(model +" model - "+str(sub_data['valid_time'].values[step])+" - " +"NW zone")
    plt.show()
    
    return sub_data

About the parameter names, here is the correspondence between the human-readable names and the names in the files:

* temperature at 2m : t2m
* dew point temperature at 2m : d2m
* relative humidity at 2m : r
* wind speed at 10m : ws
* wind direction at 10m : p3031
* U wind component at 10m : u10
* V wind component at 10m : v10
* mean sea level pressure : msl
* total precipitation since the beginning of the model run : tp  -> **/!\ : for the total precipitation parameter, the forecast range begins at 1h and not at 0h as for the other parameters.**

In [None]:
model = 'arpege' #weather model (arome or arpege)
#run date /!\ only available for February 2016!
run_date = dt.datetime(2016, 2,10,0,0) #/!\ you can not modify the hour (always 00h) -> 1 possible run date only
param = 'ws'    #parameter name in the file (cf correspondences in the cell above)
step = 3   #index of the chosen time step (from 0 to 24; for precipitation, from 0 to 23)

sub_data = choose_parameters_and_display(model,run_date,step,param)

# II - Superimpose the data for a given position and a given time interval

We have here 2 data categories :
* data projected on a grid : weather models
* data on points with a given latitude and longitude : observations from ground stations

So, for a given ground station and a given time interval, we will interpolate **spatially** model data to the position of the chosen ground station to be able to compare data on the same point.

First, let's choose our parameter values!

### Note

Here are the parameter names in the observations files:

* **number_sta** : ground station number
* **lat** : latitude in decimal degrees
* **lon** : longitude in decimal degrees
* **height_sta** : station height in meters
* **date** : datetime object with the format 'YYYY-MM-DD HH:mm:ss'.
* **dd** : wind direction in degrees
* **ff** : wind speed in m.s<sup>-1</sup>
* **precip** : precipitation during the reporting period in kg.m<sup>-2</sup>
* **hu** : humidity in %
* **td** : dew point temperature** in Kelvin
* **t** : temperature in Kelvin
* **psl** : pressure reduced to sea level in Pa

In [None]:
#date
date = '2016-02-13' 

#observation
param_obs = 'dd'  #cf the note above to know the parameter names in the observation file 

#model
model = 'arpege' #weather model (arome or arpege)
MODEL = 'ARPEGE' #weather model (AROME or ARPEGE)
param_mod = 'p3031'   #cf the model parameter name correspondences listed earlier

#interpolation algorithm
algo = 'linear' #or 'nearest' for nearest neighbors

### Get the observation values
* get the values for the chosen time period (from 00h to 24h included on day d)
* choose one station : **/!\ the stations do not necessarily have the same time coverage!**
* get the latitude and longitude of the chosen station (needed to interpolate the forecast data)
* get hourly data by resampling (because the forecast data have an hourly step)

In [None]:
#first part : open and filtering on the day
year = date[0:4]
fname = '/kaggle/input/meteonet/NW_Ground_Stations/NW_Ground_Stations/NW_Ground_Stations_'+year+".csv"

def open_and_date_filtering(year,fname,date):    
    #open the observation data
    #df = pd.read_csv(fname,parse_dates=[4],infer_datetime_format=True) #execution time ~1 min 

    #filtering on the date 
    study_date = pd.Timestamp(date)  #study date
    d_sub = df[(df['date'] >= study_date) & (df['date'] <= study_date + timedelta(days=1))]
    d_sub = d_sub.set_index('date')
    display(d_sub.head())
    display(d_sub.tail())
    return(d_sub)

d_sub = open_and_date_filtering(year,fname,date)

In [None]:
#second part : choose the station_id, get the lat/lon of the station and resample the data to the hourly step!
station_id = 86137003

def station_lat_lon_resample(station_id,d_sub):
    #filtering on the station_id
    d_sub = d_sub[d_sub['number_sta'] == station_id]

    #get the lat/lon values of the station 
    lat_sta = d_sub['lat'].iloc[0]   #positional access: the index is now the date
    lon_sta = d_sub['lon'].iloc[0]

    #resample the 6 min data to get hourly data (by using the mean on one hour)
    d_sub = d_sub[param_obs].resample('H').mean()
    print('station_id:',station_id)
    print('lat/lon:',lat_sta,'/',lon_sta)
    print('weather parameter',param_obs)
    display(d_sub)
    return(d_sub,station_id,lat_sta,lon_sta)

d_sub, station_id, lat_sta, lon_sta = station_lat_lon_resample(station_id,d_sub)
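
A word of caution when the chosen parameter is the wind direction (`dd`): an arithmetic hourly mean can be misleading around north, since 350° and 10° average to 180°. The sketch below shows one possible alternative, a circular mean computed on the unit circle. It is not the method used in this notebook, and the function name and the sample values are made up.

```python
import numpy as np
import pandas as pd

def circular_mean_deg(angles_deg):
    """Mean of angles given in degrees, computed on the unit circle."""
    radians = np.deg2rad(np.asarray(angles_deg, dtype=float))
    mean_sin = np.nanmean(np.sin(radians))
    mean_cos = np.nanmean(np.cos(radians))
    return np.rad2deg(np.arctan2(mean_sin, mean_cos)) % 360.0

# Made-up 6-minute wind directions around north
index_toy = pd.date_range('2016-02-13 00:00', periods=4, freq='6min')
dd_toy = pd.Series([350.0, 10.0, 355.0, 5.0], index=index_toy)

print(dd_toy.mean())               # arithmetic mean, far from north
print(circular_mean_deg(dd_toy))   # circular mean, close to north (0 or 360)

# Hourly circular mean, as a possible replacement for .resample(...).mean()
hourly_toy = dd_toy.resample(pd.Timedelta(hours=1)).apply(circular_mean_deg)
```

For scalar parameters such as temperature or pressure, the plain hourly mean used above remains appropriate.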

### Get the forecast values
* **/!\ the data are only available for February 2016**
* get the values for the chosen time period (from d day 00h to 24h included) -> all the necessary data are in one single file corresponding to the chosen run date
* get the latitude and longitude of the forecast data (necessary to perform the interpolation of forecast data)

Suggested steps:
* open the corresponding file according to the chosen parameter
* get the forecast values and the lat/lon of the grid data

In [None]:
def open_get_values(param_mod,date):
    #open the corresponding file according to the chosen parameter
    if param_mod == 't2m' or param_mod == 'd2m' or param_mod == 'r':
        level = '2m'
    elif param_mod == 'ws' or param_mod =='p3031' or param_mod == 'u10' or param_mod == 'v10':
        level = '10m'
    elif param_mod == 'msl':
        level = 'P_sea_level'
    else:
        level = 'PRECIP'

    year = date[0:4]
    month = date[5:7]
    day = date[8:10]

    directory = '/kaggle/input/meteonet/NW_weather_models_2D_parameters_' + year + month + '/' + year + month + '/'
    fname = directory + f'{MODEL}/{level}/{model}_{level}_NW_{year}{month}{day}000000.nc'
    dm = xr.open_dataset(fname)
    print('dataset overview:',dm)

    #get the forecast values and the lat/lon of the grid data
    grid_values = dm[param_mod].values
    grid_lat = dm['latitude'].values
    grid_lon = dm['longitude'].values
    print('shape of the forecast values array:',grid_values.shape)
    print('first ten latitudes:',grid_lat[0:10])
    print('first ten longitudes:',grid_lon[0:10])
    return(grid_values,grid_lat,grid_lon)

In [None]:
grid_values,grid_lat,grid_lon = open_get_values(param_mod,date)

### Perform the interpolation
For each time step, we interpolate the forecast data to the position of the chosen station.

In [None]:
from scipy.interpolate import griddata

def interpolation(grid_values,grid_lat,grid_lon,lat_sta,lon_sta):
    #initialization
    model_values = []
    grid_on_points = np.empty(grid_values.shape[0], dtype = object) 
    
    #loop per time step 
    for step in range(0,grid_values.shape[0]):
        latlon_grid = []
        val_grid = []
        latlon_obs = []

        #grid data preprocessing
        for i in range(0,grid_lat.shape[0]):        
            for j in range(0,grid_lon.shape[0]):
                #put coordinates (lat,lon) in list of tuples
                latlon_grid.append([grid_lat[i],grid_lon[j]])
                #put grid values into a list
                val_grid.append(grid_values[step,i,j])

        grid_latlon = np.array(latlon_grid)
        grid_val2 = np.array(val_grid)

        #ground station position (lat/lon) preprocessing
        latlon_obs.append([lat_sta,lon_sta])
        latlon_obs = np.array(latlon_obs)

        #compute the interpolation
        grid_on_points[step] = griddata(grid_latlon ,grid_val2, latlon_obs,  method=algo)[0]
        print('step ',step, ' OK!')
    return(grid_on_points)

In [None]:
grid_on_points = interpolation(grid_values,grid_lat,grid_lon,lat_sta,lon_sta)
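
The nested loops that build the list of (lat, lon) pairs can be replaced by `np.meshgrid`, and the pairs only need to be built once for all time steps. This is a minimal sketch on a synthetic grid, not a drop-in replacement for the function above; the coordinates, the field and the station position are made up, and the field is linear in lat/lon so that the result can be checked by hand.

```python
import numpy as np
from scipy.interpolate import griddata

# Synthetic regular grid (made-up coordinates); latitudes are descending here,
# which is an assumption about the layout of the model files
grid_lat_toy = np.linspace(47.0, 46.0, 5)
grid_lon_toy = np.linspace(0.0, 1.0, 6)
lat2d, lon2d = np.meshgrid(grid_lat_toy, grid_lon_toy, indexing='ij')
field = 2.0 * lat2d + 3.0 * lon2d

# (n_points, 2) array of (lat, lon) pairs, in the same order as field.ravel()
grid_latlon_toy = np.column_stack([lat2d.ravel(), lon2d.ravel()])
station_toy = np.array([[46.4, 0.33]])   # hypothetical station position

value = griddata(grid_latlon_toy, field.ravel(), station_toy, method='linear')[0]
print(value)
```

With `indexing='ij'`, the flattened order matches the original double loop (latitude in the outer loop, longitude in the inner loop), so the pairing between coordinates and values is preserved.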

Rearrange the data to plot them!
* put the interpolated forecast data into a pandas Series indexed by date 

In [None]:
obs = d_sub

In [None]:
def preproc_output(obs,grid_on_points,param_mod):
    mod = pd.Series(grid_on_points,index=obs.index)
    print('interpolated forecasted data, param ',param_mod)
    return (mod)

In [None]:
mod = preproc_output(obs,grid_on_points,param_mod)
display(mod)

Plot the results!
* Let's plot the 2 curves to compare them!

In [None]:
def plots(obs,mod,MODEL,param_obs,lat_sta,lon_sta):
    plt.plot(obs, label ='Observation')
    plt.plot(mod, label = MODEL +' forecast')
    plt.title('Parameter '+param_obs+' / lat='+str(lat_sta)+' and lon='+str(lon_sta))
    plt.xlabel('Time')
    plt.ylabel(param_obs)
    plt.legend()

In [None]:
plots(obs,mod,MODEL,param_obs,lat_sta,lon_sta)
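
Beyond the visual comparison, the gap between the two curves can be summarized with simple scores such as the bias and the root mean square error (RMSE). This is a minimal sketch on made-up series; on the notebook data, `mod` would probably first need a conversion with `.astype(float)`, and these scores are meaningful for scalar quantities such as temperature rather than for wind direction.

```python
import numpy as np
import pandas as pd

# Made-up hourly observation and forecast series sharing the same index
index_toy = pd.date_range('2016-02-13 00:00', periods=4, freq=pd.Timedelta(hours=1))
obs_toy = pd.Series([5.0, 6.0, 7.0, 8.0], index=index_toy)
mod_toy = pd.Series([6.0, 6.0, 9.0, 7.0], index=index_toy)

error = mod_toy - obs_toy            # forecast minus observation
bias = error.mean()                  # mean error
rmse = np.sqrt((error ** 2).mean())  # root mean square error
print(bias, rmse)
```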

# Annex - Superimpose the data for a given date 

For a given date, we will interpolate **spatially** model data to positions of ground stations to be able to compare data on the same points for the same weather parameter!

First, let's choose our parameter values!

In [None]:
#observation
date_obs = '2016-02-10T10:00:00' 
param_obs = 'ff'

#model
model = 'arpege' #weather model (arome or arpege)
MODEL = 'ARPEGE' #weather model (AROME or ARPEGE)
date_mod = dt.datetime(2016, 2,10,10,0) # Day example 
param_mod = 'ws'

#interpolation algorithm
algo = 'linear' #or 'nearest' for nearest neighbors

Get the observation values

In [None]:
fname = '/kaggle/input/meteonet/NW_Ground_Stations/NW_Ground_Stations/NW_Ground_Stations_'+date_obs[0:4]+".csv"
#df = pd.read_csv(fname,parse_dates=[4],infer_datetime_format=True)
study_date = pd.Timestamp(date_obs)  #study date
d_sub = df[df['date'] == study_date]
print('observation data',d_sub)

Get the forecast values

In [None]:
directory = '/kaggle/input/meteonet/NW_weather_models_2D_parameters_' + str(date_mod.year) + str(date_mod.month).zfill(2) + '/' + str(date_mod.year) + str(date_mod.month).zfill(2) + '/'

if param_mod == 't2m' or param_mod == 'd2m' or param_mod == 'r':
    level = '2m'
elif param_mod == 'ws' or param_mod =='p3031' or param_mod == 'u10' or param_mod == 'v10':
    level = '10m'
elif param_mod == 'msl':
    level = 'P_sea_level'
else:
    level = 'PRECIP'

fname = directory + f'{model.upper()}/{level}/{model}_{level}_NW_{date_mod.year}{str(date_mod.month).zfill(2)}{str(date_mod.day).zfill(2)}000000.nc'

mod = xr.open_dataset(fname)
grid_lat = mod['latitude'].values
grid_lon = mod['longitude'].values
grid_val = mod[param_mod].values[date_mod.hour,:,:]
print('latitudes on the model grid:',grid_lat)
print('longitudes on the model grid:',grid_lon)
print('forecast values:',grid_val)

Interpolation function:

In [None]:
def interpolate_grid_on_points(grid_lat,grid_lon,grid_val,data_obs,algo):
    
    #initialisation
    latlon_grid = []
    latlon_obs = []
    val_grid = []
    
    #grid data preprocessing
    for i in range(0,grid_lat.shape[0]):        
        for j in range(0,grid_lon.shape[0]):
            #put coordinates (lat,lon) in list of tuples
            latlon_grid.append([round(grid_lat[i],3),round(grid_lon[j],3)])
            #put grid values into a list
            val_grid.append(grid_val[i,j])
    grid_latlon = np.array(latlon_grid)
    grid_val2 = np.array(val_grid)

    #obs data preprocessing : put coordinates (lat,lon) in list of tuples
    for i in range(0,data_obs.shape[0]):
        latlon_obs.append([data_obs['lat'].values[i],data_obs['lon'].values[i]])
    latlon_obs = np.array(latlon_obs)
    
    #interpolation
    grid_val_on_points=griddata(grid_latlon ,grid_val2, latlon_obs,  method=algo)
    return latlon_obs,grid_val_on_points

Perform the interpolation:

In [None]:
latlon_obs,grid_val_on_points = interpolate_grid_on_points(grid_lat,grid_lon,grid_val,d_sub,algo)
print('first 10 lat/lon pairs (one per station):',latlon_obs[0:10,:])
print('associated forecast values interpolated on ground station points:',grid_val_on_points[0:10])

## Note : if you want to use another interpolation method, try your own!
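
As one possible alternative, SciPy's `RegularGridInterpolator` takes advantage of the regular structure of the model grid. The sketch below runs on a synthetic grid with made-up values and hypothetical station positions; the latitude axis is sorted in ascending order first, as a precaution in case the SciPy version at hand does not accept descending axes.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Synthetic regular grid with descending latitudes (made-up values)
lat_axis = np.linspace(47.0, 46.0, 5)
lon_axis = np.linspace(0.0, 1.0, 6)
lat2d, lon2d = np.meshgrid(lat_axis, lon_axis, indexing='ij')
field = 2.0 * lat2d + 3.0 * lon2d

# Sort the latitude axis in ascending order before building the interpolator
order = np.argsort(lat_axis)
interpolator = RegularGridInterpolator(
    (lat_axis[order], lon_axis), field[order, :],
    method='linear', bounds_error=False, fill_value=np.nan)

# Hypothetical station positions as (lat, lon) pairs; the last one is off-grid
stations = np.array([[46.4, 0.33], [46.9, 0.8], [50.0, 0.5]])
values = interpolator(stations)
print(values)
```

With `bounds_error=False` and `fill_value=np.nan`, stations outside the grid get NaN instead of raising an error, which is worth keeping in mind when computing minima and maxima afterwards.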

### Now, plot the different data!

In [None]:
#Plot the different data
fig=plt.figure()
gs = gridspec.GridSpec(4, 4)

#Min and max boundaries for the colorbar (NaN values, e.g. from stations outside the grid, are ignored)
vmin_obs = d_sub[param_obs].min()
vmax_obs = d_sub[param_obs].max()
vmin_model_ori= np.nanmin(grid_val)
vmax_model_ori= np.nanmax(grid_val)
vmin_model_inter=np.nanmin(grid_val_on_points)
vmax_model_inter=np.nanmax(grid_val_on_points)
vmin=np.nanmin([vmin_obs,vmin_model_ori,vmin_model_inter])
vmax=np.nanmax([vmax_obs,vmax_model_ori,vmax_model_inter])

#observation data
ax1 = plt.subplot(gs[:2, :2])
plt.tight_layout(pad=3.0)
im=ax1.scatter(d_sub['lon'], d_sub['lat'], c=d_sub[param_obs], cmap='jet',vmin=vmin,vmax=vmax)
ax1.set_title('Observation data')

#weather model data (original grid)
ax2 = plt.subplot(gs[:2, 2:])
ax2.pcolor(grid_lon,grid_lat,grid_val,cmap="jet",vmin=vmin,vmax=vmax)
ax2.set_title('Weather model data (original grid)')

#weather model data (interpolated on observation points)
ax3 = plt.subplot(gs[2:4, 1:3])
im3=ax3.scatter(latlon_obs[:,1], latlon_obs[:,0], c=grid_val_on_points, cmap='jet',vmin=vmin,vmax=vmax)
ax3.set_title('Weather model data (interpolated on observation points)')

fig.colorbar(im,ax=[ax2,ax3]).set_label(mod[param_mod].long_name+ ' (in '+mod[param_mod].units+ ')')
plt.show()

### Now, let's play with the parameters!

In [None]:
def choose_and_display(model,date_mod,date_obs,param_obs,param_mod,algo,d_sub):
    
    #get the observation values 
    study_date = pd.Timestamp(date_obs)  #study date
    d_sub = df[df['date'] == study_date]
    
    #get the model data 
    directory = '/kaggle/input/meteonet/NW_weather_models_2D_parameters_' + str(date_mod.year) + str(date_mod.month).zfill(2) + '/' + str(date_mod.year) + str(date_mod.month).zfill(2) + '/'

    if param_mod == 't2m' or param_mod == 'd2m' or param_mod == 'r':
        level = '2m'
    elif param_mod == 'ws' or param_mod =='p3031' or param_mod == 'u10' or param_mod == 'v10':
        level = '10m'
    elif param_mod == 'msl':
        level = 'P_sea_level'
    else:
        level = 'PRECIP'

    fname = directory + f'{model.upper()}/{level}/{model}_{level}_NW_{date_mod.year}{str(date_mod.month).zfill(2)}{str(date_mod.day).zfill(2)}000000.nc'

    mod = xr.open_dataset(fname)
    grid_lat = mod['latitude'].values
    grid_lon = mod['longitude'].values
    grid_val = mod[param_mod].values[date_mod.hour,:,:]
    
    #perform the interpolation
    latlon_obs,grid_val_on_points = interpolate_grid_on_points(grid_lat,grid_lon,grid_val,d_sub,algo)
    
    #Plot the different data
    fig=plt.figure()
    gs = gridspec.GridSpec(4, 4)

    #Min and max boundaries for the colorbar (NaN values, e.g. from stations outside the grid, are ignored)
    vmin_obs = d_sub[param_obs].min()
    vmax_obs = d_sub[param_obs].max()
    vmin_model_ori= np.nanmin(grid_val)
    vmax_model_ori= np.nanmax(grid_val)
    vmin_model_inter=np.nanmin(grid_val_on_points)
    vmax_model_inter=np.nanmax(grid_val_on_points)
    vmin=np.nanmin([vmin_obs,vmin_model_ori,vmin_model_inter])
    vmax=np.nanmax([vmax_obs,vmax_model_ori,vmax_model_inter])

    #observation data
    ax1 = plt.subplot(gs[:2, :2])
    plt.tight_layout(pad=3.0)
    im=ax1.scatter(d_sub['lon'], d_sub['lat'], c=d_sub[param_obs], cmap='jet',vmin=vmin,vmax=vmax)
    ax1.set_title('Observation data')

    #weather model data (original grid)
    ax2 = plt.subplot(gs[:2, 2:])
    ax2.pcolor(grid_lon,grid_lat,grid_val,cmap="jet",vmin=vmin,vmax=vmax)
    ax2.set_title('Weather model data (original grid)')

    #weather model data (interpolated on observation points)
    ax3 = plt.subplot(gs[2:4, 1:3])
    im3=ax3.scatter(latlon_obs[:,1], latlon_obs[:,0], c=grid_val_on_points, cmap='jet',vmin=vmin,vmax=vmax)
    ax3.set_title('Weather model data (interpolated on observation points)')

    fig.colorbar(im,ax=[ax2,ax3]).set_label(mod[param_mod].long_name+ ' (in '+mod[param_mod].units+ ')')
    plt.show()
    
    return d_sub, mod[param_mod][date_mod.hour,:,:], latlon_obs,grid_val_on_points

In [None]:
#observation
date_obs = '2016-02-10T10:00:00' 
param_obs = 'hu'

#model
model = 'arome' #weather model (arome or arpege)
date_mod = dt.datetime(2016, 2,10,10,0) # Day example 
param_mod = 'r'

#interpolation algorithm
algo = 'nearest' #'linear' or 'nearest' for nearest neighbors

In [None]:
obs_output, mod_output, latlon_obs,grid_val_on_points =  choose_and_display(model,date_mod,date_obs,param_obs,param_mod,algo,d_sub)