Machine learning to better predict and understand drought
tommylees112 Vhi preprocess #preprocess (#14)
Latest commit 02916f4 May 24, 2019

ml_drought

Check out our companion repo with exploratory notebooks here!

ESoWC 2019 - Machine learning to better predict and understand drought.

Using ECMWF/Copernicus open datasets to evaluate machine learning techniques for the prediction of droughts. This was a project proposal submitted to the ECMWF Summer of Weather Code Challenge #12.

Team: @tommylees112, @gabrieltseng

About the Project

The Summer of Weather Code (ESoWC) programme by the European Centre for Medium-Range Weather Forecasts (ECMWF) is a collaborative online programme to promote the development of weather-related open-source software.

This is our contribution to the ECMWF Summer of Weather Code programme, in which we are developing a pipeline for rapid experimentation with:

  • different machine learning algorithms
  • different input datasets (feature selection)
  • different measures of drought (meteorological, hydrological, agricultural)

Our goals are as follows:

  • Build a robust, useful pipeline.
  • Usefully predict drought.
  • Understand the relationships between model inputs and outputs - what is the model learning?
  • Make the pipeline and results accessible.

Work in progress

We will be documenting the progress of our pipeline as we go.

We have a set of notebooks and scripts that are very rough but represent the current state of our work at this repo here.

For updates follow @tommylees112 on twitter or look out for our blog posts!


The main entrypoint into the pipeline is [...]. The pipeline is configured through a configuration file; the desired configuration file can be passed as a command-line argument:

python --config <PATH_TO_CONFIG>

If no configuration file is passed, the pipeline's default minimal configuration is used.
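
As a rough illustration of this behaviour, the sketch below parses an optional `--config` argument and falls back to a built-in default when it is omitted. It is not the pipeline's actual entrypoint: the names `DEFAULT_CONFIG` and `load_config`, the JSON file format, and the contents of the default are all assumptions made for the example.

```python
import argparse
import copy
import json
from pathlib import Path
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for the pipeline's default minimal configuration.
DEFAULT_CONFIG: Dict[str, Any] = {"export": {}, "preprocess": {}}


def load_config(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Return the configuration named by --config, or the default if none is given."""
    parser = argparse.ArgumentParser(description="Run the pipeline (sketch).")
    parser.add_argument(
        "--config",
        type=Path,
        default=None,
        help="Path to a configuration file (assumed here to be JSON).",
    )
    args = parser.parse_args(argv)

    if args.config is None:
        # No file passed: fall back to a copy of the built-in default.
        return copy.deepcopy(DEFAULT_CONFIG)

    with args.config.open() as f:
        return json.load(f)
```

Calling `load_config([])` returns the default, while `load_config(["--config", "<PATH_TO_CONFIG>"])` reads the given file. Passing `argv` explicitly, rather than always reading `sys.argv`, keeps the function easy to test.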


Anaconda running Python 3.7 is used as the package manager. To set up an environment, install Anaconda and (from this directory) run

conda env create -f environment.{mac, ubuntu.cpu}.yml

This will create an environment named esowc-drought with all the necessary packages to run the code. To activate this environment, run

conda activate esowc-drought

Docker can also be used to run this code. To do this, first either start the Docker app (Docker Desktop) or configure docker-machine:

# on macOS
brew install docker-machine docker

docker-machine create --driver virtualbox default
docker-machine env default

See here for help on all machines or here for macOS.

Then build the docker image:

docker build -t ml_drought .

Then, use it to run a container, mounting the data folder to the container:

docker run -it \
--mount type=bind,source=<PATH_TO_DATA>,target=/ml_drought/data \
ml_drought /bin/bash

The pipeline can be tested by running pytest. We use mypy for type checking; run mypy src to check the src directory.
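
To show what the two tools look for, here is a minimal sketch of a type-annotated function with a matching test. The helper `year_range` is hypothetical and does not come from this repository. mypy checks the annotations statically, and pytest collects and runs any function whose name starts with `test_`.

```python
from typing import List


def year_range(start: int, end: int) -> List[int]:
    """Return the years from start to end, inclusive (hypothetical helper)."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    return list(range(start, end + 1))


def test_year_range() -> None:
    # pytest reports a failure if this assertion does not hold.
    assert year_range(1990, 1992) == [1990, 1991, 1992]
```

With these annotations, a call such as `year_range("1990", 1992)` would be flagged by mypy before the code is ever run.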


Huge thanks to @ECMWF for making this project possible!
