# 1. Load and Filter the New Global Heat Flow (NGHF) Dataset
This notebook loads the heat flow data base in its raw format and filters it according to the
quality criteria of the REHEATFUNQ model description paper. Accordingly, the NGHF data set
(Lucazeau, 2019) is used. If you would like to use a different data set, jump to the
[Save the Filtered Data Set](#Save-the-Filtered-Data-Set) section to learn about the required
format in which the data set needs to be saved.

To run this notebook, you first need to download the NGHF data set of Lucazeau (2019). The necessary file
is `2019GC008389-sup-0004-Data_Set_SI-S02.zip` and can be downloaded [here](https://agupubs.onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1029%2F2019GC008389&file=2019GC008389-sup-0004-Data_Set_SI-S02.zip). If this
link no longer works, the data set might be retrievable via the DOI listed below.


### Reference:
> Lucazeau, F. (2019). Analysis and mapping of an updated terrestrial heat
>    flow data set. Geochemistry, Geophysics, Geosystems, 20, 4001–4024.
>    https://doi.org/10.1029/2019GC008389

From the ZIP file, you need to extract the `NGHF.csv` table and provide a working path to the file below.

In [None]:
nghf_file = 'data/NGHF.csv'

Configure plots to look good on a HiDPI monitor (you may not need the following configuration if you are not using a HiDPI monitor):

In [None]:
%config InlineBackend.figure_format = 'retina'

General imports used in this notebook:

In [None]:
import numpy as np
from pathlib import Path
from reheatfunq.data import read_nghf
import matplotlib.pyplot as plt

## Data Loading
Now we load this data base:

In [None]:
nghf_lon, nghf_lat, nghf_hf, nghf_quality, nghf_yr, nghf_type, \
nghf_max_depth, nghf_uncertainty, indexmap \
    = read_nghf(nghf_file)

Create NumPy arrays from some numeric data:

In [None]:
nghf_lon = np.array(nghf_lon)
nghf_lat = np.array(nghf_lat)
nghf_hf = np.array(nghf_hf)
nghf_yr = np.array(nghf_yr)
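
Before filtering, it can be worth confirming that the converted arrays line up entry by entry, since all later masks index them jointly. The following is a minimal sketch on made-up stand-in values; `check_aligned` is a hypothetical helper and not part of REHEATFUNQ.

```python
import numpy as np

def check_aligned(*arrays):
    """Return the common length of 1-D arrays or raise ValueError (hypothetical helper)."""
    arrays = [np.asarray(a) for a in arrays]
    shapes = {a.shape for a in arrays}
    if any(a.ndim != 1 for a in arrays) or len(shapes) != 1:
        raise ValueError(f"Arrays are not aligned 1-D arrays: {sorted(shapes)}")
    return arrays[0].shape[0]

# Made-up stand-ins for nghf_lon, nghf_lat, nghf_hf, and nghf_yr:
demo_lon = np.array([13.1, -120.5, 7.9])
demo_lat = np.array([52.4, 38.2, 46.0])
demo_hf = np.array([68.0, 95.5, 71.2])
demo_yr = np.array([1995, 2003, 1987])

n_points = check_aligned(demo_lon, demo_lat, demo_hf, demo_yr)
print("aligned data points:", n_points)
```

In the notebook itself, the same call could be applied to `nghf_lon`, `nghf_lat`, `nghf_hf`, and `nghf_yr`.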

## Data Set Statistics

In [None]:
print("land:",np.count_nonzero([n == 'land' for n in nghf_type]))
print("ocean:",np.count_nonzero([n == 'ocean' for n in nghf_type]))
print("land A-C:",np.count_nonzero([n == 'land' and q in ('A','B','C')
                                    for n,q in zip(nghf_type,nghf_quality)]))
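
If counts for every combination of type and quality are of interest, `collections.Counter` can tally them in a single pass instead of one list comprehension per combination. This is a sketch on made-up labels, not on the NGHF columns themselves.

```python
from collections import Counter

# Made-up stand-ins for nghf_type and nghf_quality:
demo_type = ['land', 'land', 'ocean', 'land', 'ocean']
demo_quality = ['A', 'C', 'B', 'A', 'C']

# One pass over the paired labels yields the count of every combination:
combo_counts = Counter(zip(demo_type, demo_quality))

for (kind, quality), count in sorted(combo_counts.items()):
    print(f"{kind:5s} {quality}: {count}")
```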

In [None]:
fig = plt.figure()
ax = fig.add_subplot(111)
ax.hist(nghf_yr[nghf_yr < 2030], bins=100);

## Filtering
Here, we select all heat flow data points that lie on land and have positive heat flow. We keep only data of quality B or better and only measurements from 1990 to 2020 (increase in measurement quality, see Lucazeau (2019)). We also exclude geothermal data points using Lucazeau's empirical limit of $250 \,\mathrm{mW}/\mathrm{m}^2$.

In [None]:
# Only positive heat flow:
continental_mask_base = (nghf_hf > 0)

# Select only points on land:
continental_mask_base &= [n == 'land' for n in nghf_type]

# Quality selection: At least 'B' quality:
continental_mask_base &= [x in ('A','B') for x in nghf_quality]

# Select only data points from years 1990 till now:
continental_mask_base &= (nghf_yr <= 2020) & (nghf_yr >= 1990)

# Restricted heat flow, using Lucazeau (2019) empirical criterion (restricted to below 250 mW/m^2):
continental_mask_capped = continental_mask_base & (nghf_hf < 250.)

In [None]:
main_mask = continental_mask_capped
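
To see how much each criterion contributes, the masks can be kept separate and combined one at a time. The sketch below uses made-up values (not real measurements) with the same thresholds as the code cell above; the dictionary `criteria` and all `demo_*` names are illustrative assumptions.

```python
import numpy as np

# Made-up stand-ins for the NGHF columns (not real measurements):
demo_hf = np.array([65.0, -3.0, 80.0, 310.0, 55.0, 72.0])
demo_type = ['land', 'land', 'ocean', 'land', 'land', 'land']
demo_quality = ['A', 'B', 'A', 'B', 'C', 'B']
demo_yr = np.array([1995, 2001, 2010, 1999, 2005, 1985])

# Each criterion as its own boolean array, so it can be inspected separately:
criteria = {
    'positive heat flow': demo_hf > 0,
    'on land': np.array([t == 'land' for t in demo_type]),
    'quality A or B': np.array([q in ('A', 'B') for q in demo_quality]),
    'years 1990-2020': (demo_yr >= 1990) & (demo_yr <= 2020),
    'below 250 mW/m^2': demo_hf < 250.0,
}

# Combine the criteria cumulatively and report the remaining count after each:
demo_mask = np.ones(demo_hf.shape, dtype=bool)
for name, criterion in criteria.items():
    demo_mask &= criterion
    print(f"{name:18s} -> {np.count_nonzero(demo_mask)} points remain")

print("selected heat flow:", demo_hf[demo_mask])
```

Each made-up point except the first fails exactly one criterion, so the count of remaining points drops by one per step.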

## Statistics of the Filtered Data Set

In [None]:
print("Final data set size:      ", np.count_nonzero(main_mask))
print("Minimum heat flow (mW/m²):", nghf_hf[main_mask].min())
print("Maximum heat flow (mW/m²):", nghf_hf[main_mask].max())
print("Average heat flow (mW/m²):", nghf_hf[main_mask].mean())

## Save the Filtered Data Set
Here, we export the data for further analysis in the other notebooks.

Ensure that all directories exist:

In [None]:
Path("data").mkdir(exist_ok=True)
Path("export").mkdir(exist_ok=True)

We use the `numpy.save` function to save a tuple `(hf, lon, lat)` to
the file `data/heat-flow-selection-mW_m2.npy`. If you wish to run
the analysis of the following notebooks with heat flow data from
another source or with custom data filtering, save your data to that
file instead. Make sure to adhere to the following characteristics:
 - `hf` should be a NumPy array of shape `(N,)` that lists the heat
   flow at the data points in $\mathrm{mW}/\mathrm{m}^2$
 - `lon` should be a NumPy array of shape `(N,)` listing the data
   point longitude coordinates in degrees
 - `lat` should be a NumPy array of shape `(N,)` listing the data
   point latitude coordinates in degrees
 - the same index in all three arrays has to refer to the same
   data point
 - all NumPy arrays should be of double precision data type.
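
For custom data, the characteristics listed above could be checked before saving. The following is a minimal sketch; `check_export_arrays` is a hypothetical helper and not part of REHEATFUNQ, and the values are made up.

```python
import numpy as np

def check_export_arrays(hf, lon, lat):
    """Raise ValueError if the arrays violate the format listed above (hypothetical helper)."""
    for name, arr in (('hf', hf), ('lon', lon), ('lat', lat)):
        if not isinstance(arr, np.ndarray):
            raise ValueError(f"{name} is not a NumPy array")
        if arr.ndim != 1:
            raise ValueError(f"{name} has shape {arr.shape}, expected (N,)")
        if arr.dtype != np.float64:
            raise ValueError(f"{name} has dtype {arr.dtype}, expected float64")
    if not (hf.shape == lon.shape == lat.shape):
        raise ValueError("hf, lon, and lat differ in length")
    return hf.shape[0]

# Made-up example values; integer input is converted to double precision explicitly:
demo_hf = np.array([68, 95, 71], dtype=np.float64)
demo_lon = np.array([13.1, -120.5, 7.9])
demo_lat = np.array([52.4, 38.2, 46.0])

n_valid = check_export_arrays(demo_hf, demo_lon, demo_lat)
print("valid data points:", n_valid)
```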

In [None]:
np.save('data/heat-flow-selection-mW_m2.npy',
        (nghf_hf[main_mask], nghf_lon[main_mask], nghf_lat[main_mask]))
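
Passing a tuple of three equal-length arrays to `numpy.save` stores them as one array of shape `(3, N)`, which can be unpacked again after `numpy.load`. The round trip below is a sketch that uses made-up values and a temporary file rather than the notebook's real output path.

```python
import tempfile
from pathlib import Path

import numpy as np

# Made-up example values standing in for the filtered NGHF arrays:
demo_hf = np.array([68.0, 95.5, 71.2, 54.3])
demo_lon = np.array([13.1, -120.5, 7.9, 140.2])
demo_lat = np.array([52.4, 38.2, 46.0, -35.1])

with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = Path(tmpdir) / 'heat-flow-demo.npy'
    # The tuple is converted to a single array of shape (3, N) on saving:
    np.save(demo_path, (demo_hf, demo_lon, demo_lat))
    loaded = np.load(demo_path)

# Unpacking along the first axis recovers the three arrays in the saved order:
hf_back, lon_back, lat_back = loaded
print(loaded.shape, loaded.dtype)
```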

Save a map from the filtered data set indices to indices in the original NGHF data base.

The map we save to `export/nghf-selection-indices.csv` contains one column for each data
point we saved in `data/heat-flow-selection-mW_m2.npy`. The entry in each column refers
to the row in `NGHF.csv` that the data point was read from.

In [None]:
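# Positions of the selected data points within the loaded arrays
# (flatnonzero returns a 1-D array, so each entry converts cleanly to int):
used_indices = np.flatnonzero(main_mask)
final_index_map = [indexmap[int(i)] for i in used_indices]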

with open('export/nghf-selection-indices.csv','w') as f:
    f.write(','.join(str(fi) for fi in final_index_map))
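
Reading the index map back is a matter of splitting the single line at the commas. The sketch below writes made-up indices to a temporary file in the same layout and parses them again; the file name and all `demo_*` names are illustrative assumptions.

```python
import tempfile
from pathlib import Path

# Made-up row indices standing in for final_index_map:
demo_index_map = [12, 57, 301]

with tempfile.TemporaryDirectory() as tmpdir:
    demo_csv = Path(tmpdir) / 'selection-indices-demo.csv'
    # Same layout as above: a single line of comma-separated integers.
    with open(demo_csv, 'w') as f:
        f.write(','.join(str(i) for i in demo_index_map))
    # Parse the line back into a list of integers:
    with open(demo_csv) as f:
        indices_back = [int(entry) for entry in f.read().split(',')]

print(indices_back)
```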

### License
```
A notebook to read and filter the NGHF data base.

This file is part of the REHEATFUNQ model.

Author: Malte J. Ziebarth (ziebarth@gfz-potsdam.de)

Copyright © 2019-2022 Deutsches GeoForschungsZentrum Potsdam,
            2022 Malte J. Ziebarth
            

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.
```