## Prepare meteorological data for a WOFOST run on gridded data

This notebook reads MeteoSwiss climate data ([Tmin, Tmax](https://www.meteoswiss.admin.ch/dam/jcr:818a4d17-cb0c-4e8b-92c6-1a1bdf5348b7/ProdDoc_TabsD.pdf)) and agricultural exposure data (or the spatial_units.gpkg file), as described in [Portmann et al. 2023](https://egusphere.copernicus.org/preprints/2023/egusphere-2023-2598/). It produces one .csv file per meteorological variable, with time as rows and grid cell IDs (of all 1x1 km grid cells with exposure) as columns.

In [2]:
import pandas as pd
import xarray as xr
import numpy as np
import geopandas as gpd
import os

First, define the path to the gridded exposure data (polygons), the directory of the MeteoSwiss data, and the directory in which to store the output data.

In [4]:
#produce MCH Data (comment if already done so)

#INPUT DATA
datadir_MCH='O:/Data-Raw/27_Natural_Resources-RE/99_Meteo_Public/MeteoSwiss_netCDF/__griddedData/lv95/'
datadir_exposure='C:/Users/F80840370/projects/scClim/climada/data/scClim/exposure/GIS/Weizen_Mais_Raps_Gerste_polygons.gpkg'

#OUTPUT DATA
OUTDIR='C:/Users/F80840370/projects/scClim/wofost/winter_wheat_phenology/meteo_data/MeteoSwiss/'

startyear=1971
endyear=2021
variables=['TminD','TmaxD']
filename_base='TminTmax_daily_{}_{}'.format(startyear,endyear)

Create the spatial units file from the exposure data, or just read it if it already exists.

In [8]:
#create from exposure
units=gpd.read_file(datadir_exposure)
units=units[units.n_fields>0]
units_lv95=units.to_crs(crs=2056)
units_lv95['X']=np.round(units_lv95.centroid.x)
units_lv95['Y']=np.round(units_lv95.centroid.y)
units_lv95['ID']=units_lv95.index


#save
#units_lv95.to_file("spatial_units.gpkg", driver="GPKG")
#read
#units_lv95=gpd.read_file("spatial_units.gpkg", driver="GPKG")

ValueError: Must pass 2-d input. shape=(1, 12742, 6)

In [37]:
units_new = gpd.GeoDataFrame.copy(units_lv95)
units_new['geometry']=units_new.geometry.centroid
units_new=units_new.to_crs(epsg=4326)
units_lv95['latitude']=units_new.geometry.y.values
units_lv95['longitude']=units_new.geometry.x.values

In [38]:
units_lv95.latitude

147475    46.181038
147476    46.190032
147477    46.199026
148112    46.154227
148116    46.190203
            ...    
351650    46.295079
351651    46.304069
354204    46.239913
363869    46.819433
365790    46.827412
Name: latitude, Length: 12742, dtype: float64

In [39]:
'n_fields' in units.columns

True

In [13]:
units_lv95.copy()

Unnamed: 0,n_fields,area_ha,geometry,X,Y,ID
147475,2.0,9.397416,"POLYGON ((2484000.000 1115000.001, 2484000.000...",2484500.0,1115500.0,147475
147476,3.0,20.227029,"POLYGON ((2484000.000 1116000.001, 2484000.000...",2484500.0,1116500.0,147476
147477,1.0,5.369543,"POLYGON ((2484000.000 1117000.001, 2484000.000...",2484500.0,1117500.0,147477
148112,9.0,41.700048,"POLYGON ((2485000.000 1112000.001, 2485000.000...",2485500.0,1112500.0,148112
148116,1.0,11.171633,"POLYGON ((2485000.000 1116000.001, 2485000.000...",2485500.0,1116500.0,148116
...,...,...,...,...,...,...
351650,1.0,0.699982,"POLYGON ((2803000.001 1130000.001, 2803000.001...",2803500.0,1130500.0,351650
351651,1.0,0.109455,"POLYGON ((2803000.001 1131000.001, 2803000.001...",2803500.0,1131500.0,351651
354204,1.0,0.086635,"POLYGON ((2807000.001 1124000.001, 2807000.001...",2807500.0,1124500.0,354204
363869,1.0,0.244925,"POLYGON ((2822000.001 1189000.001, 2822000.001...",2822500.0,1189500.0,363869


Read the gridded MeteoSwiss climate data from the Agroscope server. Make sure to call .load() to load the data into memory; otherwise subsequent computations may take a long time. This step is slow (expect up to 1 h).
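Since the read step is slow, it may be worth checking beforehand that every expected yearly file is present. The following is a minimal sketch, not part of the original workflow: the helper names `yearly_files` and `missing_files` and the placeholder directory are assumptions for illustration, while the file name pattern follows the one used in this notebook.

```python
import os

def yearly_files(datadir, var, startyear, endyear):
    """Build one file path per year, following the naming pattern used in this notebook."""
    return [
        os.path.join(datadir, f"{var}_ch01r.swiss.lv95_{year}01010000_{year}12310000.nc")
        for year in range(startyear, endyear + 1)
    ]

def missing_files(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not os.path.isfile(p)]

# Placeholder directory (hypothetical); replace it with datadir_MCH
files = yearly_files("nonexistent_dir_for_demo", "TminD", 1971, 1973)
print(len(files), "files expected,", len(missing_files(files)), "missing")
```

If the list returned by `missing_files` is not empty, the gaps can be fixed before starting the long read.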

In [41]:
#get list of datafiles #1961-2022
mch_data={}
for var in variables:
    datafiles=[datadir_MCH+str(var)+'_ch01r.swiss.lv95_'+str(year)+'01010000_'+str(year)+'12310000.nc' for year in range(startyear,endyear+1)]
    #read datafiles
    print('reading data for {} from {}...'.format(var,datadir_MCH))
    mch_data[var]=xr.open_mfdataset(datafiles,concat_dim = 'time',combine='nested',coords = 'minimal')
    #to be able to later run this quickly we need to load the dataset
    mch_data[var].load()

147475    2484500.0
147476    2484500.0
147477    2484500.0
148112    2485500.0
148116    2485500.0
            ...    
351650    2803500.0
351651    2803500.0
354204    2807500.0
363869    2822500.0
365790    2825500.0
Name: X, Length: 12742, dtype: float64

For each variable, read the MeteoSwiss data and build a DataFrame with time as rows and spatial units (ID) as columns. Write one file per variable covering all years.
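To illustrate what a nearest-neighbour lookup on a regular grid does, here is a small sketch in plain NumPy and pandas. All coordinates, IDs and values are made up, and the array layout (time, N, E) is an assumption for the example rather than a statement about the MeteoSwiss files.

```python
import numpy as np
import pandas as pd

# Synthetic 1 km grid (made-up coordinates and values, not MeteoSwiss data)
E = np.array([2484500.0, 2485500.0, 2486500.0])   # easting of grid points
N = np.array([1115500.0, 1116500.0])              # northing of grid points
time = pd.date_range("1971-01-01", periods=4, freq="D")
values = np.arange(time.size * N.size * E.size, dtype=float).reshape(time.size, N.size, E.size)

# Hypothetical spatial units: ID as index, centroid coordinates as columns
units = pd.DataFrame(
    {"X": [2484500.0, 2486400.0], "Y": [1116500.0, 1115600.0]},
    index=pd.Index([147476, 148112], name="ID"),
)

# Index of the nearest grid point for every unit, computed for all units at once
ix = np.abs(E[None, :] - units["X"].to_numpy()[:, None]).argmin(axis=1)
iy = np.abs(N[None, :] - units["Y"].to_numpy()[:, None]).argmin(axis=1)

# Pointwise indexing gives an array of shape (time, unit)
df_out = pd.DataFrame(values[:, iy, ix], index=time, columns=units.index)
df_out.index.name = "time"
print(df_out)
```

The result has time as rows and unit IDs as columns, the same layout as the output files. xarray offers a comparable pointwise selection when `sel` is given `DataArray` indexers that share a dimension, which might replace a per-unit loop on large unit sets.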

In [3]:
for var in variables:
    #first column holds the time axis
    datalist=[pd.DataFrame({'time': mch_data[var].time.values})]
    for ind in units_lv95.index:
        #the index holds labels (not positions), so use .loc
        X=units_lv95.loc[ind].X
        Y=units_lv95.loc[ind].Y
        ID=units_lv95.loc[ind].ID
        #select the grid point nearest to the cell centroid
        data=mch_data[var].sel(E=X,N=Y,method='nearest')
        datalist.append(pd.DataFrame({ID: data[var].values}))

    #one column per spatial unit, one row per day
    df_out=pd.concat(datalist,axis=1)
    filename='{}_{}_{}'.format(var,startyear,endyear)
    df_out.to_csv(OUTDIR+filename+'.csv')


NameError: name 'variables' is not defined

Voilà, you are ready to run WOFOST.

In [5]:
df=pd.read_csv(OUTDIR+'TmaxD_1971_2021.csv')


In [6]:
df

Unnamed: 0,time,147475,147476,147477,148112,148116,148750,148751,148752,148756,...,350374,351011,351012,351013,351018,351650,351651,354204,363869,365790
0,1971-01-01,-4.305581,-4.445194,-4.585274,-4.005631,-4.283897,-3.819474,-3.738159,-3.751381,-4.391461,...,-5.876159,-5.674415,-5.699399,-6.659936,-7.507540,-5.649245,-5.672795,-3.024392,-7.164733,-7.094122
1,1971-01-02,-5.051905,-5.068727,-5.061028,-5.073231,-5.053247,-4.870844,-4.803940,-4.804438,-5.050619,...,-2.794635,-2.533915,-2.600953,-3.438850,-4.595566,-2.460833,-2.527752,-0.481924,-10.618644,-10.558322
2,1971-01-03,-6.093925,-6.115509,-6.132390,-6.043950,-6.111102,-5.922085,-5.889316,-5.897485,-6.140614,...,-4.219985,-3.964033,-4.019457,-4.992185,-6.054646,-3.903561,-3.958723,-1.435574,-10.548851,-10.478326
3,1971-01-04,-3.602469,-3.649998,-3.686842,-3.516478,-3.608020,-3.363592,-3.311257,-3.320285,-3.647545,...,-4.791201,-4.692708,-4.710209,-5.226327,-5.879336,-4.676414,-4.692913,-2.306504,-10.044308,-10.001883
4,1971-01-05,-4.599098,-4.708415,-4.825640,-4.292585,-4.576224,-4.159068,-4.095499,-4.104642,-4.672283,...,-2.720943,-2.593507,-2.624409,-3.002264,-3.745184,-2.565035,-2.593020,-0.453866,-8.595233,-8.615941
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18623,2021-12-27,7.116959,7.015142,6.916936,7.327223,7.157213,7.449694,7.513391,7.516283,7.097947,...,5.195571,5.141492,5.182870,4.526286,3.964543,5.091433,5.133923,5.985033,0.648888,0.804338
18624,2021-12-28,10.747185,10.602016,10.462452,11.048518,10.805146,11.224643,11.317163,11.321515,10.720868,...,5.971821,5.874358,5.929748,5.305171,4.858248,5.809328,5.865180,6.675868,1.581946,1.782910
18625,2021-12-29,10.290333,10.192044,10.093796,10.441219,10.308229,10.604899,10.670662,10.669029,10.243773,...,12.513414,12.562511,12.561054,12.126258,11.566723,12.513274,12.519116,12.786704,2.275849,2.498450
18626,2021-12-30,14.382624,14.261340,14.149421,14.654027,14.433211,14.818632,14.908133,14.912622,14.356853,...,16.113150,15.565740,15.747110,16.290533,16.836813,15.348239,15.533063,15.199997,4.918460,4.909317
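The `Unnamed: 0` column in the table above is the row index that `to_csv` writes by default. A small sketch, using a made-up miniature of the file, of how such a file might be read back with `time` as a datetime index (the name `df_in` is only used for this example):

```python
import io
import pandas as pd

# Made-up miniature of the output file, written with the default index column
csv_text = (
    ",time,147475,147476\n"
    "0,1971-01-01,-4.3,-4.4\n"
    "1,1971-01-02,-5.1,-5.0\n"
)

# index_col=0 consumes the unnamed index column; parse_dates converts 'time' to datetimes
df_in = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=["time"])
df_in = df_in.set_index("time")
print(df_in)
```

Note that the grid cell IDs come back as string column labels. Alternatively, passing `index=False` to `to_csv` when writing avoids the extra column in the first place.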


In [54]:
df.to_csv(OUTDIR+'test.csv')


In [71]:
df2=df[df['time']=='1996-10-01']

In [72]:
df3=df2.transpose()
df3

Unnamed: 0,9405
time,1996-10-01
147475,20.891436
147476,20.803375
147477,20.72431
148112,21.046808
...,...
351650,16.224445
351651,16.211933
354204,16.797941
363869,20.193792


In [73]:
null_mask = df3.isnull().any(axis=1)
null_rows = df3[null_mask]

In [70]:
null_rows

Unnamed: 0,9770
150666,
189847,
190487,
190490,
190491,
191130,
191770,
192408,
192409,
192410,


In [74]:
null_rows

Unnamed: 0,9405
150666,
189847,
190487,
190490,
190491,
191130,
191770,
192408,
192409,
192410,
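The cells above inspect a single date. One possible way to list every grid cell with missing values over the whole period, without transposing, is sketched below on a made-up DataFrame with the same layout as the output files; the names `df_demo`, `n_missing` and `ids_with_gaps` are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up example: a time column plus one column per grid cell ID
df_demo = pd.DataFrame(
    {
        "time": ["1996-10-01", "1996-10-02"],
        "147475": [20.9, 19.8],
        "150666": [np.nan, np.nan],
        "189847": [np.nan, 18.2],
    }
)

values = df_demo.drop(columns="time")
n_missing = values.isnull().sum()          # number of missing days per grid cell
ids_with_gaps = n_missing[n_missing > 0]   # keep only cells with at least one gap
print(ids_with_gaps)
```

The resulting Series maps each affected grid cell ID to its count of missing days, which helps separate cells that are empty throughout from cells with occasional gaps.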
