# Using `ehyd_reader` to analyze and export a number of wells and stations

As shown in the [basic demonstration](https://github.com/joha1/ehyd_reader/blob/master/examples/demonstration1.ipynb), `ehyd_reader` can easily read and plot a single ehyd csv file.

The real benefit of a scripting language like Python, however, lies in automating the handling of multiple files.
In this example, we will take a closer look at the [Aichfeld region](https://de.wikipedia.org/wiki/Aichfeld) (also called Judenburg–Knittelfelder–Becken), a large basin in the upper Mur valley.
It covers an area of around $70\,\mathrm{km}^2$ at an average elevation of about 650 m a.s.l.
Besides the Tertiary fill down to 1000 m b.g.l., which was mined for coal, it is filled with up to 70 meters of fluvio-glacial sediment housing significant amounts of groundwater (see [Haas & Birk (2017)](www.hydrol-earth-syst-sci.net/21/2421/2017/) for a more detailed description and further reading).
If you look at this region in ehyd (search for *Zeltweg*, the town right in its middle, in the search box), you will see that there are dozens of groundwater monitoring wells as well as a few precipitation and river stations available.

In this notebook, we want to download them, do some first analysis and visualization, and then export them for further work.


Unfortunately, [ehyd.gv.at](https://ehyd.gv.at/) does not provide an API for bulk downloads, so we either have to download the needed files by hand or coerce it into giving them to us in an automated way.

## Downloading the needed data

As shown in the [basic demonstration](https://github.com/joha1/ehyd_reader/blob/master/examples/demonstration1.ipynb), we can download data from the website by specifying an HZB number or by clicking the wells in the region or groundwater body we are interested in (with some help from the "Grundwassergebiete" layer under "Karten" in the upper right corner), but that is still a lot of clicking around.

To speed up this process, we only use the 20 wells with long-term data available, as listed in the supplement of [Haas & Birk (2017)](www.hydrol-earth-syst-sci.net/21/2421/2017/), and we will use the [urllib.request](https://docs.python.org/3/library/urllib.request.html) library to download them.

In [10]:
# First, import the needed modules.
# This assumes you have ehyd_reader.py saved in your current working directory
import pandas as pd
from ehyd_reader import ehyd_reader
import urllib.request
import os

In [11]:
# Set up a list with the HZB numbers from Haas & Birk (2017)
well_list = [309997, 318394, 309989, 314807, 314716, 314864, 315317, 
             309906, 314815, 309948, 315077, 314898, 318386, 314872,
             314922, 310128, 314914, 314732, 310029, 310060]
# We're also interested in the precipitation stations and surface water gauges in the area
precip_list = [101030, 111716, 112771]
surface_list = [211128, 211730, 211136, 211920]

Since the HZB numbers are a country-wide, universal identifier, we can use them to download the wells (or stations) with those numbers from ehyd, like so:

    filename ='groundwater_309997.csv'
    url = 'http://ehyd.gv.at/eHYD/MessstellenExtraData/gw?id=309997&file=4'
    urllib.request.urlretrieve(url, filename)
    
This can of course be automated with the lists we've set up above.
Note that `urllib.request.urlretrieve(url, filename)` is considered legacy and ["might become deprecated at some point in the future."](https://docs.python.org/dev/library/urllib.request.html#legacy-interface)
However, since this threat is now about 10 years old and this is the simplest way to download a file, we'll just hope that this won't happen.
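
If relying on the legacy interface feels uncomfortable, a minimal sketch of an alternative that only uses `urllib.request.urlopen` and `shutil.copyfileobj` could look like this. The helper name `download_file` and the `timeout` default are assumptions made here for illustration; they are not part of `ehyd_reader`.

```python
import shutil
import urllib.request


def download_file(url, filename, timeout=30):
    """Stream the response body of `url` into `filename` (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(filename, 'wb') as out_file:
        # Copy in chunks so the whole file never has to fit into memory
        shutil.copyfileobj(response, out_file)
    return filename
```

It takes the same two arguments as `urlretrieve`, so `download_file(url, filename)` should be usable as a drop-in replacement in this notebook.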

Unfortunately, the HZB number is not the only variable that changes in the url, so we either need quite a bit of string processing, or we simply run the downloader three times, once for each type of data.

In [29]:
# Set up a directory for the data to be downloaded into
# (exist_ok=True avoids an error if the cell is run a second time)
os.makedirs('downloads', exist_ok=True)

In [30]:
# Let's loop over the three lists
for i in well_list:
    url = 'https://ehyd.gv.at/eHYD/MessstellenExtraData/gw?id={0}&file=4'.format(i)
    filename = ('downloads/GW_{0}.csv'.format(i))
    urllib.request.urlretrieve(url, filename)
for i in precip_list:
    url = 'https://ehyd.gv.at/eHYD/MessstellenExtraData/nlv?id={0}&file=2'.format(i)
    filename = ('downloads/P_{0}.csv'.format(i))
    urllib.request.urlretrieve(url, filename)
for i in surface_list:
    url = 'https://ehyd.gv.at/eHYD/MessstellenExtraData/owf?id={0}&file=7'.format(i)
    filename = ('downloads/S_{0}.csv'.format(i))
    urllib.request.urlretrieve(url, filename)

If you look into your `downloads` folder, you should see 27 files.
However, as of this writing, one of them (`GW_314864.csv`) has a size of 0 bytes, meaning that ehyd has no data for this well.
Besides the big issue this poses for reproducible research, an empty file is of course rather hard to read in, so we have to delete it.
Also, note the `/gw?id=`, `/nlv?id=` and `/owf?id=` parts of the URLs.
They specify what type of data we're asking ehyd for (`g`rund`w`asser \[groundwater\], `n`iederschlag \[precipitation\], `o`berflächen`w`asser \[surface water\]).
However, there is also the `&file=` part.
The first assumption would be that this is another specifier for the type of data, with `7` = surface water and so on, but if you try the exact loops from above with other stations, you will likely run into many 0 byte files, since this parameter changes depending on what kind of data is available from a station.
For the case of surface water, it only appears to be `file=7` for stations that have flow rates available.
For gauging stations that don't, `file=4` applies.
The same issue also applies to precipitation, where we can have `file=2`, `3` or `5`.
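
The URL pattern described above can be sketched as a small lookup table plus a helper function. `BASE_URL`, `DATA_TYPES` and `build_url` are hypothetical names introduced here, and the default file codes are simply the ones that worked for the stations in this notebook, not a documented ehyd convention.

```python
# Endpoint and default file code per data type, taken from the URLs used above.
# Other stations may need a different file code (see the text above).
BASE_URL = 'https://ehyd.gv.at/eHYD/MessstellenExtraData'
DATA_TYPES = {
    'GW': ('gw', 4),   # groundwater
    'P': ('nlv', 2),   # precipitation
    'S': ('owf', 7),   # surface water
}


def build_url(prefix, hzb_number, file_code=None):
    """Return the download URL for one station (hypothetical helper)."""
    endpoint, default_code = DATA_TYPES[prefix]
    if file_code is None:
        file_code = default_code
    return '{0}/{1}?id={2}&file={3}'.format(
        BASE_URL, endpoint, hzb_number, file_code)


print(build_url('GW', 309997))
print(build_url('S', 211128, file_code=4))
```

Keeping the endpoint and file code in one place means a different file code only has to be passed as an argument instead of being edited into the URL string.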

Since we don't have a proper API, we need to download the file first, test for its size and if it's empty, move on to the next file type, e.g.:

    import time
    import numpy as np

    for i in precip_list:
        filename = 'downloads/P_{0}.csv'.format(i)
        # Try the known file types in turn until one yields a non-empty file
        for file_type in (2, 3, 5):
            url = 'https://ehyd.gv.at/eHYD/MessstellenExtraData/nlv?id={0}&file={1}'.format(i, file_type)
            urllib.request.urlretrieve(url, filename)
            if os.path.getsize(filename) > 0:
                break
            os.remove(filename)
        else:
            # No file type returned any data, so the station has vanished
            print('Station', i, 'does not exist anymore!')
        # Add some wait time so that we don't run into a DoS protection
        # when we're trying to download a lot of files.
        time.sleep(np.random.rand() / 2)

Now, assuming we deleted the missing station, either by hand or by modifying the downloader above, we should have a `downloads` directory with 26 csv files, from about 23 to 555 KB in size.
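
As a hedged sketch of how this cleanup and check could be done without clicking through the folder, the two helpers below delete 0 byte files and summarize what is left. The function names `remove_empty_files` and `summarize_csv_files` are made up for this example.

```python
import os


def remove_empty_files(directory):
    """Delete all 0 byte files in `directory` and return their names."""
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getsize(path) == 0:
            os.remove(path)
            removed.append(name)
    return removed


def summarize_csv_files(directory):
    """Return (count, smallest size, largest size) in bytes for the csv files."""
    sizes = [os.path.getsize(os.path.join(directory, name))
             for name in os.listdir(directory) if name.endswith('.csv')]
    if not sizes:
        return 0, 0, 0
    return len(sizes), min(sizes), max(sizes)


# Only run against the download folder if it actually exists
if os.path.isdir('downloads'):
    print('Removed:', remove_empty_files('downloads'))
    print('Count, min, max (bytes):', summarize_csv_files('downloads'))
```

If everything went as described, the count reported for `downloads` should match the number of csv files mentioned above.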