# Sub-seasonal range re-forecasts example

## Objective  
This notebook shows how to:
- find the dates and weekly steps needed to download the sub-seasonal re-forecasts
- download the re-forecasts
- calculate the mean of the fields for each step
- calculate the percentiles for each step

Please note that the climate built using this notebook is only valid for the date it was built. If you want to build the climate on some other day, you need to download the data again with the correct steps.

If you want to build the climate every day, you may download all the available steps at once and reuse the files.

In [31]:
import datetime
from earthkit.time import Sequence, model_climate_dates, date_range
import metview as mv
import requests
from ecmwfapi import ECMWFService

server = ECMWFService("mars")

A `Sequence` is an abstract representation of a sequence of dates.  
For the re-forecasts we use either the `ecmwf-4days` or the `ecmwf-2days` sequence. Both are built in and represent the configuration of the ECMWF forecast systems.

The first example will be a medium range forecast.

In [37]:
sequence = Sequence.from_resource("ecmwf-2days")
sequence

MonthlySequence(days=[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31], excludes={(2, 29)})

Next, we need to know which dates are required to build the model climate for today's forecast.  
For the medium range model climate (M climate) we can get these using the `model_climate_dates` function.

model_climate_dates(reference: date, start: date | int, end: date | int, before: timedelta | int, after: timedelta | int, sequence: Sequence)→ Iterator[date]  
Parameters:
- `reference (datetime.date)` – Reference date for the climate
- `start (datetime.date or int)` – Start of the climatological period. Either a full date or a year
- `end (datetime.date or int)` – End of the climatological period. Either a full date or a year
- `before (datetime.timedelta or int)` – Cut-off before the reference date. Either a timedelta or a number of days
- `after (datetime.timedelta or int)` – Cut-off after the reference date. Either a timedelta or a number of days
- `sequence (earthkit.time.sequence.Sequence)` – Sequence of available dates in the reference set

Returns: Sequence of dates
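
As a rough illustration of the kind of result to expect, the sketch below builds such a date list by hand for the simplest case, where only the reference day itself is wanted in each year of the period. The download cell later in this notebook calls `model_climate_dates` with `before` and `after` both set to 1 and prints one date per year, which is the case mimicked here. The sketch uses only the standard library and is not how earthkit-time implements the function; `same_day_each_year` is a hypothetical helper name.

```python
import datetime


def same_day_each_year(reference, start_year, end_year):
    """Hypothetical helper: the reference month/day in every year of the period.

    Assumes the reference is not 29 February, which does not exist in every year.
    """
    return [reference.replace(year=y) for y in range(start_year, end_year + 1)]


reference = datetime.date(2024, 9, 5)
hdates = same_day_each_year(reference, 2004, 2023)

# Join the dates with "/" to get the list format used for 'hdate' in the request
hdate_string = "/".join(d.strftime("%Y-%m-%d") for d in hdates)
print(hdate_string)
```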

Now that we have a sequence, we can find the closest reforecast date to today and, from there, all the reforecast dates we need to calculate the model climate.

In [38]:
today = datetime.date.today()
today

datetime.date(2024, 9, 16)

Now we need to find the closest date of the ECMWF re-forecasts.  
We will use the **nearest** function.

In [39]:
clim_date = sequence.nearest(today)
clim_date

datetime.date(2024, 9, 15)

Next we will calculate the sequence of 5 dates around our climatology date (including the climatology date itself).  
For this we can use the `bracket` function.  
We need to give it `clim_date` and the number of sequence dates we want on each side of `clim_date`, and we need to set `strict` to False so that `clim_date` itself is included in the set of dates.

In [40]:
clim_dates = sequence.bracket(clim_date,2,strict=False)
clim_dates

<generator object Sequence.bracket at 0x17c4eb340>

We can loop through the `clim_dates` to see what we got:

In [41]:
for c in clim_dates:
    print(c)

2024-09-11
2024-09-13
2024-09-15
2024-09-17
2024-09-19
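
To make the behaviour of `nearest` and `bracket` more concrete, here is a plain-Python stand-in for the odd-days sequence printed earlier. It is only a sketch: the names `in_sequence`, `nearest_date` and `bracket_dates` are hypothetical, the tie-breaking rule (prefer the earlier date) is an assumption inferred from the outputs above, and earthkit-time's own implementation may differ.

```python
import datetime

ONE_DAY = datetime.timedelta(days=1)


def in_sequence(d):
    """True if d falls on an odd day of the month, excluding 29 February."""
    return d.day % 2 == 1 and (d.month, d.day) != (2, 29)


def nearest_date(reference):
    """Closest sequence date; ties go to the earlier date (an assumption)."""
    for offset in range(4):  # consecutive sequence dates are at most 3 days apart
        for candidate in (reference - offset * ONE_DAY, reference + offset * ONE_DAY):
            if in_sequence(candidate):
                return candidate
    raise ValueError("no sequence date found near the reference")


def bracket_dates(reference, num):
    """num sequence dates on each side of reference, plus reference if it is in the sequence."""
    before, after = [], []
    d = reference - ONE_DAY
    while len(before) < num:
        if in_sequence(d):
            before.append(d)
        d -= ONE_DAY
    d = reference + ONE_DAY
    while len(after) < num:
        if in_sequence(d):
            after.append(d)
        d += ONE_DAY
    middle = [reference] if in_sequence(reference) else []
    return before[::-1] + middle + after


clim = nearest_date(datetime.date(2024, 9, 16))
print(clim)
for d in bracket_dates(clim, 2):
    print(d)
```

For 16 September 2024 this stand-in picks 15 September and then the same five dates as listed above.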


The steps we need from the reforecast will depend on the **day of the week of the forecast**.  
This is because ECMWF builds weekly climatologies, which are used, for example, to calculate the forecast anomalies.

Every **forecast week starts on Monday**. Therefore, depending on the day of the week of the forecast, we will need different steps from the re-forecasts.  

The steps are defined in this table:

| DOW | steps |
| --- | --- |
| Monday | 0-168/168-336/336-504/504-672/672-840/840-1008 |
| Tuesday | 144-312/312-480/480-648/648-816/816-984 |
| Wednesday | 120-288/288-456/456-624/624-792/792-960 |
| Thursday | 96-264/264-432/432-600/600-768/768-936/936-1104 |
| Friday | 72-240/240-408/408-576/576-744/744-912/912-1080 |
| Saturday | 48-216/216-384/384-552/552-720/720-888/888-1056 |
| Sunday | 24-192/192-360/360-528/528-696/696-864/864-1032 |

We could, of course, write something like:
```python
    if dow == 0:
        steps = "0-168/168-336/336-504/504-672/672-840/840-1008"
    ...
    elif dow == 6:
        steps = "24-192/192-360/360-528/528-696/696-864/864-1032"
```

but we can make it simpler.  

First we find out the day of the week of the date of the climatology.  

Note that Python's `weekday()` numbers the days of the week from 0 (Monday), so 1 is Tuesday.

In [8]:
clim_date.weekday()

0

We can use the following formula to find out the first step:
```python
first_step = ((7 - dow) % 7)*24
```
And then add weekly steps (168 hours) to build all the steps we need.

In [9]:
dow = clim_date.weekday()
first_step = ((7 - dow) % 7)*24
print(first_step)

0


We can check that this gives the correct result for every day of the week by running the next cell:

In [10]:
for dow in range(7):
    first_step = ((7 - dow) % 7)*24
    print(first_step)

0
144
120
96
72
48
24


The next step is to create the list of all the steps for a climatology date.

In [11]:
dow = clim_date.weekday()
first_step = ((7 - dow) % 7)*24

maxstep = 937  # exclusive upper bound for range(): the largest start of a weekly range is 936
weekly_step = 168
the_steps = ""

for s in range(first_step, maxstep, weekly_step):
    step_string = str(s) + '-' + str(s+168)
    if s + weekly_step < maxstep:
        step_string += '/'
    the_steps += step_string

print(the_steps)

0-168/168-336/336-504/504-672/672-840/840-1008
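
The same logic can be wrapped in a small function and checked against the table of steps above. This is just one possible way to write it: `weekly_steps` is a hypothetical helper name, and the default `max_first_step=936` is taken from the largest start of a weekly range in the table.

```python
def weekly_steps(dow, max_first_step=936, weekly_step=168):
    """Return the step string for a forecast on weekday dow (0 = Monday)."""
    first_step = ((7 - dow) % 7) * 24
    ranges = [
        f"{s}-{s + weekly_step}"
        for s in range(first_step, max_first_step + 1, weekly_step)
    ]
    return "/".join(ranges)


day_names = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
for dow, name in enumerate(day_names):
    print(f"{name:<10} {weekly_steps(dow)}")
```

Building the list first and joining it with `"/"` avoids the special case for the trailing separator.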


Now we can put it all together

In [15]:
data_area = "50/10/40/30"
today = datetime.date.today()

clim_date = sequence.nearest(today)
dow = clim_date.weekday()

clim_dates = sequence.bracket(clim_date,2,strict=False)

for date in clim_dates:
    month, day = date.month, date.day
    print(month, day)
    fdate = date.strftime("%Y-%m-%d")
    dates = model_climate_dates(date, 2004, 2023, 1, 1, sequence)

    date_strings = [d.strftime('%Y-%m-%d') for d in dates]
    date_string = "/".join(date_strings)
    print(fdate)
    print(date_string)

    # every re-forecast date uses the steps for the weekday of the climatology date
    dow = clim_date.weekday()
    
    first_step = ((7 - dow) % 7)*24
    the_steps = ""
    
    for s in range(first_step, maxstep, weekly_step):
    
        step_string = str(s) + '-' + str(s+168)
        if s + weekly_step < maxstep:
            step_string += '/'
        the_steps += step_string
    
    fname = f'reforecast_raw_{fdate}.grib'
    
    server.execute(
        {
        'stream'    : "eefh",
        'levtype'   : "sfc",
        'expver'    : "79",
        'number'    : "0/1/2/3/4/5/6/7/8/9/10",
        'step'      : the_steps,
        'param'     : "167",
        'time'      : "00",
        'date'      : fdate,
        'hdate'     : date_string,
        'type'      : "fcmean",
        'class'     : "od",
        'area'      : data_area
        },    
        fname)    
    

9 5
2024-09-05
2004-09-05/2005-09-05/2006-09-05/2007-09-05/2008-09-05/2009-09-05/2010-09-05/2011-09-05/2012-09-05/2013-09-05/2014-09-05/2015-09-05/2016-09-05/2017-09-05/2018-09-05/2019-09-05/2020-09-05/2021-09-05/2022-09-05/2023-09-05
2024-09-10 11:03:30 ECMWF API python library 1.6.3
2024-09-10 11:03:30 ECMWF API at https://api.ecmwf.int/v1
2024-09-10 11:03:31 Welcome Milana Vuckovic
2024-09-10 11:03:32 In case of problems, please check https://confluence.ecmwf.int/display/WEBAPI/Web+API+FAQ or contact servicedesk@ecmwf.int
2024-09-10 11:03:32 Request submitted
2024-09-10 11:03:32 Request id: 66e01974d182ffa653c98190
2024-09-10 11:03:32 Request is submitted
2024-09-10 11:03:33 Request is active
2024-09-10 11:05:09 Calling 'nice mars /tmp/20240910-1000/60/tmp-_mars-ZPLtqv.req'
2024-09-10 11:05:09 Forcing MIR_CACHE_PATH=/data/ec_coeff
2024-09-10 11:05:09 mars - WARN -
2024-09-10 11:05:09 mars - WARN -
2024-09-10 11:05:09 MIR environment variables:
2024-09-10 11:05:09 MIR_CACHE_PATH=/dat

In [16]:
data = mv.Fieldset(path="reforecast_raw_*.grib")
data.describe()

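| parameter | typeOfLevel | level | date | time | step | number | paramId | class | stream | type | experimentVersionNumber |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2t | surface | 0 | 20240905,20240907,... | 0 | 168,336,... | 0,1,... | 167 | od | eefh | fcmean | 79 |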


In [17]:
data.ls()

| Message | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | 0 | reduced_gg |
| 1 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | 1 | reduced_gg |
| 2 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | 2 | reduced_gg |
| 3 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | 3 | reduced_gg |
| 4 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | 4 | reduced_gg |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6595 | ecmf | 2t | surface | 0 | 20230913 | 0 | 840-1008 | fcmean | 6 | reduced_gg |
| 6596 | ecmf | 2t | surface | 0 | 20230913 | 0 | 840-1008 | fcmean | 7 | reduced_gg |
| 6597 | ecmf | 2t | surface | 0 | 20230913 | 0 | 840-1008 | fcmean | 8 | reduced_gg |
| 6598 | ecmf | 2t | surface | 0 | 20230913 | 0 | 840-1008 | fcmean | 9 | reduced_gg |


We can see that for Tuesday and Wednesday, we have 5500 fields: 5 steps x 11 ensemble members (number 0-10) x 5 dates x 20 years.  
All the other days have 6 weekly steps so we have 6600 fields: 6 steps x 11 ensemble members (number 0-10) x 5 dates x 20 years.
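
The field counts follow directly from the number of weekly steps that fit for each weekday. A quick arithmetic check, reusing the first-step formula and the `maxstep` and weekly step values from earlier in the notebook (the variable names here are illustrative only):

```python
members = 11  # ensemble members 0-10
dates = 5     # re-forecast dates around the climatology date
years = 20    # 2004-2023

field_counts = {}
for dow in range(7):
    first_step = ((7 - dow) % 7) * 24
    # number of weekly ranges whose start is below maxstep (937)
    n_steps = len(range(first_step, 937, 168))
    field_counts[dow] = n_steps * members * dates * years

print(field_counts)
```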

Note: When step is in the form of range, we need to use the parameter **stepRange**.

Now we can calculate the mean value of all the ensemble members over all 20 years of the reforecasts for one step.  
To get one step, we first need to convert our string of steps into a list of steps and take the first element of the list.

In [18]:
steps = the_steps.split("/")
steps

['0-168', '168-336', '336-504', '504-672', '672-840', '840-1008']

In [19]:
first_step = steps[0]

In [20]:
one_step = data.select(stepRange = first_step)
one_step.ls()

| Message | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | 0 | reduced_gg |
| 1 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | 1 | reduced_gg |
| 2 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | 2 | reduced_gg |
| 3 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | 3 | reduced_gg |
| 4 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | 4 | reduced_gg |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1095 | ecmf | 2t | surface | 0 | 20230913 | 0 | 0-168 | fcmean | 6 | reduced_gg |
| 1096 | ecmf | 2t | surface | 0 | 20230913 | 0 | 0-168 | fcmean | 7 | reduced_gg |
| 1097 | ecmf | 2t | surface | 0 | 20230913 | 0 | 0-168 | fcmean | 8 | reduced_gg |
| 1098 | ecmf | 2t | surface | 0 | 20230913 | 0 | 0-168 | fcmean | 9 | reduced_gg |


In [21]:
one_step_mean = mv.mean(one_step)
one_step_mean.ls()

| Message | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | gridType |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | reduced_gg |


We can plot the data to quickly check the result.  
We begin by adding some automatic styling and zooming into the area.

In [22]:
data_area = [50,10,40,30]
margins = [2, -2, -2, 2]
view_area = [a + b for a, b in zip(data_area, margins)]

In [23]:
coastlines = mv.mcoast(map_coastline_land_shade=True,
                       map_coastline_land_shade_colour="RGB(0.85,0.85,0.85)",
                       map_coastline_sea_shade=True,
                       map_coastline_sea_shade_colour="RGB(0.95,0.95,0.95)",)
view = mv.geoview(map_area_definition="corners", area=view_area, coastlines=coastlines)
cont_auto = mv.mcont(legend=True, contour_automatic_setting="ecmwf", grib_scaling_of_derived_fields=True)

In [24]:
mv.plot(view, one_step_mean, cont_auto)


Now let's finally calculate the mean for each step.  
We can do this by calculating the mean over the **number** and **date** dimensions.

Please note that when doing the calculations, Metview will **keep the metadata of the first field**.
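
Conceptually, averaging over **number** and **date** while preserving **stepRange** means grouping the fields by step range and averaging within each group. The toy example below shows that grouping on single made-up values instead of real fields. It uses plain Python rather than Metview, and the numbers are invented purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Each dictionary stands in for one field, reduced to a single value
fields = [
    {"stepRange": "0-168", "number": 0, "date": 20040905, "value": 280.0},
    {"stepRange": "0-168", "number": 1, "date": 20040905, "value": 282.0},
    {"stepRange": "0-168", "number": 0, "date": 20050905, "value": 284.0},
    {"stepRange": "168-336", "number": 0, "date": 20040905, "value": 290.0},
    {"stepRange": "168-336", "number": 1, "date": 20050905, "value": 292.0},
]

# Group by the preserved dimension, ignoring number and date
groups = defaultdict(list)
for field in fields:
    groups[field["stepRange"]].append(field["value"])

# One mean per step range
step_means = {step: mean(values) for step, values in groups.items()}
print(step_means)
```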

In [25]:
mean_test = data.mean(dim=["number", "date"],
    preserve_dims=["shortName", "level", "stepRange", "time"])

In [26]:
mean_test.ls()

| Message | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | gridType |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | reduced_gg |
| 1 | ecmf | 2t | surface | 0 | 20040905 | 0 | 168-336 | fcmean | reduced_gg |
| 2 | ecmf | 2t | surface | 0 | 20040905 | 0 | 336-504 | fcmean | reduced_gg |
| 3 | ecmf | 2t | surface | 0 | 20040905 | 0 | 504-672 | fcmean | reduced_gg |
| 4 | ecmf | 2t | surface | 0 | 20040905 | 0 | 672-840 | fcmean | reduced_gg |
| 5 | ecmf | 2t | surface | 0 | 20040905 | 0 | 840-1008 | fcmean | reduced_gg |


In [27]:
mean_test.describe()

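| parameter | typeOfLevel | level | date | time | step | paramId | class | stream | type | experimentVersionNumber |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2t | surface | 0 | 20240905 | 0 | 168,336,... | 167 | od | eefh | fcmean | 79 |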


In [28]:
mv.plot(view, mean_test, cont_auto)


Please note that if you don't see the plot above, you need to install **ipywidgets** in the Python environment you are currently using. We do not import ipywidgets directly, but Metview uses it internally.

## Compute percentiles
The last thing left to do is to compute the percentiles. 
Here we compute the percentiles for the first step range.
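
For intuition, a percentile at one grid point is a position in the sorted list of values from all members, dates and years. The sketch below uses linear interpolation between neighbouring sorted values, which is one common convention. That choice is an assumption and may not match the method Metview uses; `percentile_of` is a hypothetical helper and the sample values are made up.

```python
def percentile_of(values, p):
    """Percentile p (0-100) of values, by linear interpolation between sorted neighbours."""
    ordered = sorted(values)
    rank = (p / 100) * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Made-up 2 m temperatures (K) at a single grid point
sample = [281.0, 279.5, 283.0, 280.0, 282.0]
for p in (0, 25, 50, 75, 100):
    print(p, percentile_of(sample, p))
```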

In [29]:
percentiles = list(range(101))
pc = mv.percentile(data=one_step, percentiles=percentiles)
pc.ls()

| Message | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | 0 | reduced_gg |
| 1 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | 1 | reduced_gg |
| 2 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | 2 | reduced_gg |
| 3 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | 3 | reduced_gg |
| 4 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | 4 | reduced_gg |
| 5 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | 5 | reduced_gg |
| 6 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | 6 | reduced_gg |
| 7 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | 7 | reduced_gg |
| 8 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | 8 | reduced_gg |
| 9 | ecmf | 2t | surface | 0 | 20040905 | 0 | 0-168 | fcmean | 9 | reduced_gg |


In [30]:
mv.plot(view, pc, cont_auto)
