Changes to make 'target_path' and 'target_filename' compliant with Python's standard string formatting (#144)

* made changes to make 'target_path' and 'target_filename' compliant with Python's standard string formatting

* use of Python's standard string formatting in fetcher_test.py

* refactored the type-casting and its related functions & some small changes

* remove support of target_filename

* refactored type-casting method

* use of readthedocs links instead of github and other small changes

* resolve pytest error.
mahrsee1997 committed Apr 12, 2022
1 parent 61f0295 commit c5d585d
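The core change in this commit is that `target_path` is filled in with Python's `str.format`. As a result, format specs such as `{year:04d}` only work after the partition values, which are read from the config as strings, have been converted to numbers (or, for `date`, to a date object). The sketch below shows that flow under our own assumptions. `typecast`, `INT_KEYS` and `prepare_target_name` are hypothetical names, and the casting rules are illustrative. Passing the values both positionally and by keyword is one way to support `{}` and `{year}` templates alike; it is not necessarily what weather-dl does.

```python
import datetime
import typing as t

# Hypothetical: keys whose selection values are cast to int before formatting.
INT_KEYS = {'year', 'month', 'day', 'pressure_level'}


def typecast(key: str, value: str) -> t.Any:
    """Cast a raw config string so specs like '04d' or '%Y' can apply (assumed rules)."""
    if key in INT_KEYS:
        return int(value)
    if key == 'date':
        return datetime.date.fromisoformat(value)  # expects 'YYYY-MM-DD'
    return value


def prepare_target_name(target_path: str,
                        partition_keys: t.List[str],
                        selection: t.Dict[str, t.List[str]]) -> str:
    """Fill target_path from the first selection value of each partition key (hypothetical helper)."""
    values = {key: typecast(key, selection[key][0]) for key in partition_keys}
    # Supply values positionally and by name, so '{}' and '{year}' style templates both work.
    return target_path.format(*values.values(), **values)


selection = {'year': ['2017'], 'month': ['1'], 'day': ['1'], 'pressure_level': ['500']}
template = 'gs://ecmwf-output-test/era5/{year:04d}/{month:02d}/{day:02d}-pressure-{pressure_level}.nc'
print(prepare_target_name(template, ['year', 'month', 'day', 'pressure_level'], selection))
# gs://ecmwf-output-test/era5/2017/01/01-pressure-500.nc
```

Without the cast, `'{year:04d}'.format(year='2017')` raises `ValueError`, because the `d` presentation type accepts only integers.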
Showing 13 changed files with 149 additions and 171 deletions.
47 changes: 18 additions & 29 deletions Configuration.md
@@ -76,12 +76,6 @@ These describe which data source to download, where the data should live, and ho
* `dataset`: (optional) Name of the target dataset. Allowed options are dictated by the client.
* `target_path`: (required) Download artifact filename template. Can use Python string format symbols. Must have the
same number of format symbols as the number of partition keys.
* `target_filename`: (optional) This file name will be appended to `target_path`.
* Like `target_path`, `target_filename` can contain format symbols to be replaced by partition keys; if this is
used, the total number of format symbols in both fields must match the number of partition keys.
* This field is required when generating a date-based directory hierarchy (see below).
* `append_date_dirs`: (optional) A boolean indicating whether a date-based directory hierarchy should be created (see
below); defaults to false if not used.
* `partition_keys`: (optional) This determines how download jobs will be divided.
* Value can be a single item or a list.
* Each value must appear as a key in the `selection` section.
@@ -91,29 +85,23 @@ These describe which data source to download, where the data should live, and ho
* E.g. `['year', 'month']` will lead to a config set like `[(2015, 01), (2015, 02), (2015, 03), ...]`.
* The list of keys will be used to format the `target_path`.

### Creating a date-based directory hierarchy

The configuration can be set up to automatically generate a date-based directory hierarchy for the output files.
> **NOTE**: `target_path` template is totally compatible with Python's standard string formatting.
> This includes being able to use named arguments (e.g. 'gs://bucket/{year}/{month}/{day}.nc') as well as specifying formats for strings
> (e.g. 'gs://bucket/{year:04d}/{month:02d}/{day:02d}.nc').
To enable this feature, the `append_date_dirs` field has to be set to `true`. In addition, the `target_filename` needs
to be specified, and `date` has to be a `partition_key`;
`date` will not be used as a replacement in `target_template` but will instead be used to create a directory structure.
### Creating a date-based directory hierarchy

The resulting target path will be `<target_path>/{year}/{month}/{day}<target_filename>`. The number of format symbols in
this path has to match the number of partition keys excluding `date`.
The date-based directory hierarchy can be created using Python's standard string formatting.
Below are some examples of how to use `target_path` with Python's standard string formatting.

<details>
<summary><strong>Examples</strong></summary>

Below are more examples of how to use `target_path`, `target_filename`, and `append_date_dirs`.

Note that any parameters that are not relevant to the target path have been omitted.

```
[parameters]
target_filename=.nc
target_path=gs://ecmwf-output-test/era5/
append_date_dirs=true
target_path=gs://ecmwf-output-test/era5/{date:%%Y/%%m/%%d}.nc
partition_keys=
date
[selection]
@@ -126,9 +114,7 @@ will create

```
[parameters]
target_filename=-pressure-{}.nc
target_path=gs://ecmwf-output-test/era5/
append_date_dirs=true
target_path=gs://ecmwf-output-test/era5/{date:%%Y/%%m/%%d}-pressure-{pressure_level}.nc
partition_keys=
date
pressure_level
@@ -144,9 +130,7 @@ will create

```
[parameters]
target_filename=.nc
target_path=gs://ecmwf-output-test/pressure-{}/era5/
append_date_dirs=true
target_path=gs://ecmwf-output-test/pressure-{pressure_level}/era5/{date:%%Y/%%m/%%d}.nc
partition_keys=
date
pressure_level
@@ -160,12 +144,9 @@ will create
`gs://ecmwf-output-test/pressure-500/era5/2017/01/01.nc` and
`gs://ecmwf-output-test/pressure-500/era5/2017/01/02.nc`.

The above example also illustrates how to create a directory structure based on partition keys, even without using the
date-based creation:

```
[parameters]
target_path=gs://ecmwf-output-test/era5/{}/{}/{}-pressure-{}.nc
target_path=gs://ecmwf-output-test/era5/{year:04d}/{month:02d}/{day:02d}-pressure-{pressure_level}.nc
partition_keys=
year
month
@@ -187,6 +168,14 @@ will create
`gs://ecmwf-output-test/era5/2017/01/01-pressure-500.nc` and
`gs://ecmwf-output-test/era5/2017/01/02-pressure-500.nc`.

> **Note**: Replacing the `target_path` of the above example with this `target_path=gs://ecmwf-output-test/era5/{year}/{month}/{day}-pressure-
>{pressure_level}.nc`
>
> will create
>
> `gs://ecmwf-output-test/era5/2017/1/1-pressure-500.nc` and
> `gs://ecmwf-output-test/era5/2017/1/2-pressure-500.nc`.
</details>

### Subsections
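The `Configuration.md` note above about zero-padding can be checked with plain `str.format`. This sketch renders the documented example template with and without padding specs. It also counts replacement fields with `string.Formatter().parse`. The `check_field_count` helper is hypothetical: it is one possible way to enforce the documented rule that `target_path` has as many format symbols as there are partition keys, not the check weather-dl actually runs.

```python
import string
import typing as t

PADDED = 'gs://ecmwf-output-test/era5/{year:04d}/{month:02d}/{day:02d}-pressure-{pressure_level}.nc'
UNPADDED = 'gs://ecmwf-output-test/era5/{year}/{month}/{day}-pressure-{pressure_level}.nc'


def count_format_fields(template: str) -> int:
    """Count replacement fields such as '{}' or '{year:04d}'; escaped '{{' braces are not fields."""
    return sum(1 for _, field, _, _ in string.Formatter().parse(template) if field is not None)


def check_field_count(template: str, partition_keys: t.List[str]) -> None:
    """Hypothetical check: exactly one replacement field per partition key."""
    found = count_format_fields(template)
    if found != len(partition_keys):
        raise ValueError(f'{template!r} has {found} format fields; expected {len(partition_keys)}')


values = {'year': 2017, 'month': 1, 'day': 2, 'pressure_level': 500}
check_field_count(PADDED, list(values))  # passes: 4 fields, 4 keys
print(PADDED.format(**values))    # gs://ecmwf-output-test/era5/2017/01/02-pressure-500.nc
print(UNPADDED.format(**values))  # gs://ecmwf-output-test/era5/2017/1/2-pressure-500.nc
```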
2 changes: 1 addition & 1 deletion configs/era5_example_config.cfg
@@ -14,7 +14,7 @@
[parameters]
client=cds
dataset=reanalysis-era5-pressure-levels
target_path=gs://ecmwf-output-test/era5/{}/{}/{}-pressure-{}.nc
target_path=gs://ecmwf-output-test/era5/{year:04d}/{month:02d}/{day:02d}-pressure-{pressure_level}.nc
partition_keys=
year
month
2 changes: 1 addition & 1 deletion configs/era5_example_config_local_run.cfg
@@ -22,7 +22,7 @@
[parameters]
client=cds
dataset=reanalysis-era5-pressure-levels
target_path=era5-{}{}{}-pressure-{}.nc
target_path=era5-{year:04d}{month:02d}{day:02d}-pressure-{pressure_level}.nc
partition_keys=
year
month
2 changes: 1 addition & 1 deletion configs/era5_example_config_preproc.cfg
@@ -14,7 +14,7 @@
[parameters]
client=cds
dataset=reanalysis-era5-pressure-levels
target_path=gs://ecmwf-downloads/test/o1280-{}-{}-{}.grib
target_path=gs://ecmwf-downloads/test/o1280-{year:04d}-{month:02d}-{day:02d}.grib
partition_keys=
year
month
4 changes: 1 addition & 3 deletions configs/era5_example_config_using_date.cfg
@@ -21,9 +21,7 @@ dataset=reanalysis-era5-pressure-levels
# gs://ecmwf-output-test/era5/2017/01/02-pressure-500.nc
# gs://ecmwf-output-test/era5/2017/01/01-pressure-1000.nc
# gs://ecmwf-output-test/era5/2017/01/02-pressure-1000.nc
target_filename=-pressure-{}.nc
target_path=gs://ecmwf-output-test/era5/
append_date_dirs=true
target_path=gs://ecmwf-output-test/era5/{date:%%Y/%%m/%%d}-pressure-{pressure_level}.nc
partition_keys=
date
pressure_level
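In the `.cfg` examples the date field is written `{date:%%Y/%%m/%%d}`. This sketch assumes the file is read with Python's `configparser` and its default `BasicInterpolation`. Under that assumption, the doubled `%%` is the interpolation escape and reads back as a single `%`. That leaves an ordinary `str.format` spec, which `datetime.date` hands to `strftime`. A single `%Y` in the `.cfg` file would instead make `get` raise `configparser.InterpolationSyntaxError`.

```python
import configparser
import datetime

CFG = """
[parameters]
target_path=gs://ecmwf-output-test/era5/{date:%%Y/%%m/%%d}-pressure-{pressure_level}.nc
"""

parser = configparser.ConfigParser()  # default interpolation is BasicInterpolation
parser.read_string(CFG)
template = parser.get('parameters', 'target_path')  # '%%' comes back as '%'
print(template)
# gs://ecmwf-output-test/era5/{date:%Y/%m/%d}-pressure-{pressure_level}.nc

# datetime.date.__format__ passes a non-empty spec to strftime, producing a directory path.
path = template.format(date=datetime.date(2017, 1, 2), pressure_level=500)
print(path)  # gs://ecmwf-output-test/era5/2017/01/02-pressure-500.nc
```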
4 changes: 1 addition & 3 deletions configs/mars_example_config.cfg
@@ -15,9 +15,7 @@
[parameters]
client=mars
dataset=ecmwf-mars-output
target_filename=.nc
target_path=gs://ecmwf-downloads/hres-single-level
append_date_dirs=true
target_path=gs://ecmwf-downloads/hres-single-level/{date:%%Y/%%m/%%d}.nc
partition_keys=
date

4 changes: 1 addition & 3 deletions configs/mars_example_config.json
@@ -2,9 +2,7 @@
"parameters": {
"client": "mars",
"dataset": "ecmwf-mars-output",
"target_filename": ".nc",
"target_path": "gs://ecmwf-downloads/hres-single-level",
"append_date_dirs": "true",
"target_path": "gs://ecmwf-downloads/hres-single-level/{:%Y/%m/%d}.nc",
"partition_keys": "date"
},
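The JSON variant writes the date spec with a single `%` and a positional field, `{:%Y/%m/%d}`. JSON has no interpolation step, so nothing needs escaping. The quick standard-library check below assumes the date is passed as the first positional argument.

```python
import datetime
import json

RAW = '{"target_path": "gs://ecmwf-downloads/hres-single-level/{:%Y/%m/%d}.nc", "partition_keys": "date"}'
template = json.loads(RAW)['target_path']  # JSON has no interpolation: '%' stays '%'

day = datetime.date(2017, 1, 1)
# '{:...}' is a positional field, so the date is the first positional argument.
print(template.format(day))  # gs://ecmwf-downloads/hres-single-level/2017/01/01.nc

# Carrying the .cfg spelling over to JSON would go wrong: strftime renders '%%' as a literal '%'.
print('{:%%Y/%%m/%%d}'.format(day))  # %Y/%m/%d
```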

2 changes: 1 addition & 1 deletion configs/seasonal_forecast_example_config.cfg
@@ -15,7 +15,7 @@
[parameters]
client=cds
dataset=seasonal-original-single-levels
target_path=gs://ecmwf-output-test/seasonal-forecast/seasonal-forecast-{}-{}.nc
target_path=gs://ecmwf-output-test/seasonal-forecast/seasonal-forecast-{year:04d}-{month:02d}.nc
partition_keys=
year
month
10 changes: 5 additions & 5 deletions weather_dl/download_pipeline/fetcher_test.py
@@ -73,7 +73,7 @@ def test_fetch_data(self, mock_retrieve, mock_gcs_file):
'parameters': {
'dataset': 'reanalysis-era5-pressure-levels',
'partition_keys': ['year', 'month'],
'target_path': 'gs://weather-dl-unittest/download-{}-{}.nc',
'target_path': 'gs://weather-dl-unittest/download-{:02d}-{:02d}.nc',
'api_url': 'https//api-url.com/v1/',
'api_key': '12345',
},
@@ -104,7 +104,7 @@ def test_fetch_data__manifest__returns_success(self, mock_retrieve, mock_gcs_fil
'parameters': {
'dataset': 'reanalysis-era5-pressure-levels',
'partition_keys': ['year', 'month'],
'target_path': 'gs://weather-dl-unittest/download-{}-{}.nc',
'target_path': 'gs://weather-dl-unittest/download-{:02d}-{:02d}.nc',
'api_url': 'https//api-url.com/v1/',
'api_key': '12345',
},
@@ -132,7 +132,7 @@ def test_fetch_data__manifest__records_retrieve_failure(self, mock_retrieve):
'parameters': {
'dataset': 'reanalysis-era5-pressure-levels',
'partition_keys': ['year', 'month'],
'target_path': 'gs://weather-dl-unittest/download-{}-{}.nc',
'target_path': 'gs://weather-dl-unittest/download-{:02d}-{:02d}.nc',
'api_url': 'https//api-url.com/v1/',
'api_key': '12345',
},
@@ -169,7 +169,7 @@ def test_fetch_data__manifest__records_gcs_failure(self, mock_retrieve, mock_gcs
'parameters': {
'dataset': 'reanalysis-era5-pressure-levels',
'partition_keys': ['year', 'month'],
'target_path': 'gs://weather-dl-unittest/download-{}-{}.nc',
'target_path': 'gs://weather-dl-unittest/download-{:02d}-{:02d}.nc',
'api_url': 'https//api-url.com/v1/',
'api_key': '12345',
},
@@ -205,7 +205,7 @@ def test_fetch_data__skips_existing_download(self, mock_retrieve, mock_gcs_file)
'parameters': {
'dataset': 'reanalysis-era5-pressure-levels',
'partition_keys': ['year', 'month'],
'target_path': 'gs://weather-dl-unittest/download-{}-{}.nc',
'target_path': 'gs://weather-dl-unittest/download-{year:02d}-{month:02d}.nc',
'api_url': 'https//api-url.com/v1/',
'api_key': '12345',
},
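The updated tests mix positional (`download-{:02d}-{:02d}.nc`) and named (`download-{year:02d}-{month:02d}.nc`) templates. Both render identically if the caller supplies the values both positionally and by keyword, because `str.format` ignores arguments a template does not use. How the fetcher actually passes the values is an assumption here. The `:02d` spec also requires integers, so the hypothetical `partition` values below are already cast.

```python
POSITIONAL = 'gs://weather-dl-unittest/download-{:02d}-{:02d}.nc'
NAMED = 'gs://weather-dl-unittest/download-{year:02d}-{month:02d}.nc'

# Hypothetical partition values, already cast from config strings to int.
partition = {'year': 2020, 'month': 1}

for template in (POSITIONAL, NAMED):
    # str.format ignores arguments a template does not use, so one call suits both styles.
    print(template.format(*partition.values(), **partition))
    # gs://weather-dl-unittest/download-2020-01.nc

# The 'd' presentation type rejects strings, which is why values are cast first.
try:
    POSITIONAL.format('2020', '01')
except ValueError as err:
    print(err)  # Unknown format code 'd' for object of type 'str'
```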