# Example calculation of PV potential for ERA5 data

I compute PV potential for one year of ERA5 data and derive monthly means. I compare two ways of parallelizing the work:

1. Multiprocessing with one process per monthly file, hence 12 processes. Each process uses dask-backed xarray under the hood and can therefore use more than 100% CPU.
2. Dask-backed xarray processing of a single dataset that holds all 12 months.

In [1]:
import xarray as xr
import numpy as np
from pathlib import Path
import matplotlib.pyplot as plt
from multiprocessing import Process, Queue

import core  # helper module: windspeed, pv_pot, measure_performance

import warnings
warnings.filterwarnings("ignore")

In [2]:
# location of ERA5 data on teachinghub
path = "/home/voigta80/LEHRE/msc-intro-comp-met-ex-w2024/data/era5/"

## 1. Multiprocessing

In [3]:
# generate a sorted list of ERA5 files for a given year
def get_filelists(year: str):
    # rglob returns files in arbitrary order; sorting gives month order
    # (assumes the month in the file name is zero-padded)
    return sorted(Path(path).rglob("era5-" + year + "-*.nc"))

In [4]:
# function to compute time-mean PV potential, will be called by multiprocessing
def batchcompute_pvpot(index, file, queue):
    # chunk along valid_time only (up to 100000 time steps per chunk)
    ds = xr.open_dataset(file, engine="netcdf4", chunks={"valid_time": 100_000})
    ds["wspd"] = core.windspeed(ds)
    pv_pot = core.pv_pot(ds).mean("valid_time").compute()
    # tag the result with its file index: the queue delivers results
    # in completion order, not in the order the processes were started
    queue.put((index, pv_pot))

In [5]:
nlat = 721
nlon = 1440

def multi_processing():
    year = "2000"
    flist = get_filelists(year)
    # use 1 process per monthly file
    nprocs = len(flist)
    # output from each process, stored in the order of flist
    pvpot = np.zeros((nprocs, nlat, nlon))
    queue = Queue()
    processes = [Process(target=batchcompute_pvpot,
                         args=(i, flist[i], queue)) for i in range(nprocs)]
    for process in processes:
        process.start()  # start all processes
    for _ in range(nprocs):  # collect results from processes
        i, result = queue.get()
        pvpot[i] = result
    for process in processes:
        process.join()  # wait for all processes to complete
    return pvpot
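
A `Queue` hands results back in the order in which the processes finish, which need not be the order in which they were started. The toy sketch below, independent of the ERA5 data, shows one way to keep results aligned with their inputs by tagging each result with its index. The names `square_worker` and `run_in_order` are hypothetical, and the explicit `"fork"` start method assumes a Unix system.

```python
import multiprocessing as mp


def square_worker(index, value, queue):
    """Toy stand-in for the per-file computation: tag the result with its index."""
    queue.put((index, value * value))


def run_in_order(values):
    """Run one process per value and return the results in input order."""
    # Assumption: Unix system, where the "fork" start method is available.
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    processes = [
        ctx.Process(target=square_worker, args=(i, v, queue))
        for i, v in enumerate(values)
    ]
    for process in processes:
        process.start()
    results = [None] * len(values)
    for _ in processes:
        index, result = queue.get()  # arrives in completion order
        results[index] = result      # stored in input order
    for process in processes:
        process.join()
    return results


if __name__ == "__main__":
    print(run_in_order([3, 1, 2]))
```

Draining the queue before calling `join()` matters: a process that has put a large object on the queue may not exit until that object has been read.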

In [6]:
core.measure_performance(multi_processing)

Execution time: 78.37880 seconds
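
`core.measure_performance` is not shown in this notebook. The following is a minimal sketch of what such a timing helper might look like, assuming it simply wraps a function call with a wall-clock timer. The name `time_call` is hypothetical, and the print format is copied from the output above.

```python
import time


def time_call(func, *args, **kwargs):
    """Call func, print the elapsed wall-clock time, and return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"Execution time: {elapsed:.5f} seconds")
    return result, elapsed


if __name__ == "__main__":
    # time a cheap built-in call as a smoke test
    time_call(sum, range(1_000_000))
```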


## 2. Dask-xarray on one multi-file dataset

In [7]:
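def dask_xarray():
    # open all 12 monthly files lazily as one dataset
    ds2 = xr.open_mfdataset(path + "era5-2000-*.nc", chunks={"valid_time": 100_000})
    ds2["wspd"] = core.windspeed(ds2)
    # monthly means via groupby on the calendar month
    pvpot2 = core.pv_pot(ds2).groupby(ds2.valid_time.dt.month).mean("valid_time").compute()
    return pvpot2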
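
The monthly grouping used in `dask_xarray` can be tried on a small synthetic series. This is a sketch that assumes a daily time axis named `valid_time`, as in the ERA5 files. The data are made up so that each day stores its own month number, which makes the expected monthly means easy to check.

```python
import numpy as np
import pandas as pd
import xarray as xr

# synthetic daily series for January to March 2000
time = pd.date_range("2000-01-01", "2000-03-31", freq="D")
values = time.month.to_numpy().astype(float)  # each day holds its month number
da = xr.DataArray(values, coords={"valid_time": time}, dims="valid_time")

# group by calendar month and average over time within each group
monthly = da.groupby("valid_time.month").mean("valid_time")

print(monthly.month.values)
print(monthly.values)
```

The result has a new `month` dimension in place of `valid_time`, with one entry per calendar month present in the data.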

In [9]:
core.measure_performance(dask_xarray)

Execution time: 679.63075 seconds

For this year of data, multiprocessing with one process per file (about 78 s) is roughly 8.7 times faster than processing all 12 months as a single dask-xarray dataset (about 680 s).
