<img src="../../img/elevation.png" alt="" style="width: 100%; border-radius: 20px;"/>

In [2]:
%%HTML
<style>
    body {
        --vscode-font-family: "Itim";
        font color: #d8c0b5;
    }
</style>

## Objective
This notebook aims to compute the elevation of each bird sighting in the bird sightings provided from *ornitho.ch* and *ornitho.de*, based on the latitude and longitude coordinates.

We expect the incorporation of altitude information to provide additional decision support for our deployed models that detect implausible bird sightings. For instance, certain bird species are typically associated with specific altitude ranges; therefore, altitude data can serve as a valuable feature for validating the authenticity of a bird sighting.

## Scope
In this notebook, we aim to generate altitude values for bird sightings in Germany and Switzerland. The altitude values are inherently only given for the Swiss datasets. Based on the latitude and longitude values of the bird sighting, we can extract them for Germany as well. 

For this, three distinct approaches for computing altitude values were investigated:
- **API-based**
    - *api.open-elevation.com* and *api.opentopodata.org*: Utilizing an external API to obtain altitude information.
    - Earth Engine: Leveraging Earth Engine data for altitude calculations.
- **Local**
    - DEM file: Employing a local Digital Elevation Model (EU_DEM tif file) for altitude computation.

After evaluating the efficiency and accuracy of the three approaches, the local DEM file method was selected as the preferred choice due to its computational speed and satisfactory performance in altitude calculation.

Additionally, an initial assessment is provided to determine the possible effectiveness of altitude values in supporting the detection of implausible bird data. In this context, we investigate the distinctive altitude ranges of two exemplatory bird species, i.e. the water pipit and White-tailed eagle.

## Output data
The resulting master dataset with elevation data is stored as selected_bird_species_with_grids.csv [here](https://drive.google.com/drive/folders/18XoTsDtWnN4QdIBNGGbq-jaa6U3nVb2e).

In [1]:
import sys
sys.path.append('../')

import pandas as pd
import ee
from utils.copernicus import CopernicusDEM
import requests

In [2]:
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('max_colwidth', False)

# 
<p style="background-color:#4A3228;color:white;font-size:240%;border-radius:10px 10px 10px 10px;"> &nbsp; 0️⃣ Specify your paths </p>

In order to run the notebook, the following file is required:
- Master dataset reduced to 27 species and enriched with EEA grids: *[selected_species_with_50km_grids.csv](https://drive.google.com/drive/folders/18XoTsDtWnN4QdIBNGGbq-jaa6U3nVb2e)*
- European Digital Elevation Model (EU-DEM); downloadable [here](https://www.eea.europa.eu/en/datahub/datahubitem-view/d08852bc-7b5f-4835-a776-08362e2fbf4b). It should contain the files *eu_dem_v11_E40N20.TIF* and *eu_dem_v11_E40N30.TIF*

If you wish to store the resulting dataframe enriched with elevation data, please specify a target path where it should be stored.

In [None]:
df_path = 'D:\Simon\Documents\GP\data\master_train.csv'  # please provide path to selected_bird_species_with_grids_50km.csv
eu_dem_path = 'D:\\Simon\\Documents\\GP\\data\\util_files\\EU_DEM\\'  # please provide path to the EU_DEM folder

target_path = 'D:\\Simon\\Documents\\GP\\data\\datasets\\selected_bird_species_50km_elevation.csv'  # please provide path where to save the dataset enriched with elevation data

# 
<p style="background-color:#4A3228;color:white;font-size:240%;border-radius:10px 10px 10px 10px;"> &nbsp; 1️⃣ Load data </p>

In this notebook, we compute the elevation data for all given sightings from 2018-2022.

There is already a column `altitude` which provide elevation information for all sightings in Switzerland. The German dataset only contains NaN values.

Although altitude values are present in the Swiss dataset, they are recalculated to ensure consistency. Therefore, the `altitude` column is dropped.

In [9]:
df = pd.read_csv(df_path, index_col=0, low_memory=False)
df.drop(columns=['altitude'], inplace=True)
df.head(5)

Unnamed: 0,id_sighting,id_species,name_species,date,coord_lat,coord_lon,altitude,total_count,atlas_code,country,eea_grid_id,urban_area_percent,industrial_area_percent,agriculture_area_percent,forest_area_percent,grassland_area_percent,shrubland_area_percent,coastal_area_percent,rocky_area_percent,sparsley_vegetated_area_percent,burnt_area_percent,glacier_area_percent,wetlands_area_percent,water_area_percent,land_use_coord,land_use_coord_numeric,unclassified_area_percent
0,29666972,8,Haubentaucher,2018-01-01,53.15776,8.676993,-1,0,,de,50kmE4200N3300,0.191298,0.0,0.39956,0.0,0.0,0.021766,0.0,0.0,0.0,0.0,0.0,0.0,0.387377,water,14,0.0
1,29654244,397,Schwarzkehlchen,2018-01-01,53.127639,8.957263,1,2,,de,50kmE4250N3300,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,agriculture,3,0.0
2,29654521,463,Wiesenpieper,2018-01-01,50.850941,12.146953,269,2,,de,50kmE4450N3050,0.405476,0.0,0.594524,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,urban,1,0.0
3,29666414,8,Haubentaucher,2018-01-01,51.076006,11.038316,157,8,,de,50kmE4350N3100,0.0,0.004445,0.376111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.619445,water,14,0.0
4,29656211,8,Haubentaucher,2018-01-01,51.38938,7.067282,52,10,,de,50kmE4100N3100,0.362214,0.0,0.02402,0.211704,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.402062,urban,1,0.0


# 
<p style="background-color:#4A3228;color:white;font-size:240%;border-radius:10px 10px 10px 10px;"> &nbsp; 2️⃣ Elevation generation </p>

To calculate the altitude for each bird sighting based on the bird's location, three distinct approaches are employed:



The first two approaches involve API-based methods, specifically utilizing public API's specializing on altitude data, as well as Google Earth Engine, leveraging external online resources for altitude information. 

The third approach involves local computation, utilizing a local Digital Elevation Model (DEM) file, specifically the *EU_DEM* from the European Environment Agency*, for altitude calculations.

## 1st approach: Public API's with no registration requirement
*api.open-elevation.com* and *api.opentopodata.org* are prominent external APIs employed for obtaining elevation data in scientific and geospatial analyses. They do not require any authentication or registration, making it accessible for other developers to run this script.

Api.open-elevation.com serves as an elevation data service, offering a straightforward API for querying elevation information based on latitude and longitude coordinates. It utilizes global datasets to provide elevation values, making it accessible for a wide range of locations. 

On the other hand, api.opentopodata.org is a versatile API designed to retrieve various topographic data, including elevation, slope, and aspect. It supports multiple data sources and formats, allowing users to customize queries based on specific topographic parameters.

### Generation
In order to generate elevation data, both API's require the latitude and longitude value of the bird sighting. As there is so option to pass a batch or list of multiple sightings at once, the API's must be queried once for each sighting.

In [None]:
def get_elevation(lat, long):
    query = f'https://api.open-elevation.com/api/v1/lookup?locations={lat},{long}'
    r = requests.get(query).json()
    print(r)
    elevation = r['results'][0]['elevation']
    return elevation

In [None]:
def get_elevation2(lat, long):
    query = f'https://api.opentopodata.org/v1/eudem25m?locations={lat},{long}'
    r = requests.get(query).json()
    elevation = r['results'][0]['elevation']
    return elevation

In [None]:
df['altitude'] = df.head(10).apply(lambda row: get_elevation(row['coord_lat'], row['coord_lon']), axis=1)

### Issues
The public APIs, *api.open-elevation.com* and *api.opentopodata.org*, deliver accurate results for the dataset. Yet, with a growing number of API calls, the time taken for each call also increases gradually. As a result, for the extensive dataset at hand, using either of these APIs becomes impractical due to the extended processing time for each data point.

Therefore, the public API's are considered not feasible for generation elevation data for the bird sightings. 

## 2nd approach: *Earth Engine*
<img src="https://www.appgeo.com/wp-content/uploads/GoogleEarthEngine.png" alt="Google Earth Engine Icon" style="border-radius: 7px; width: 400px; height: auto;">

Earth Engine is a cloud-based platform developed by Google that provides access to a vast amount of geospatial data and computational resources for analyzing and visualizing Earth's surface.

In the context of extracting elevation data based on latitude and longitude values for Europe, Earth Engine can be utilized to access and process high-resolution digital elevation models (DEMs) and terrain data. DEMs represent the topography of the Earth's surface, providing information about elevation at different geographic locations.

Contrary to *api.open-elevation.com* and *api.opentopodata.org*, Earth Engine utilizes Google's cloud infrastructure, providing significant computational resources for large-scale geospatial analyses. It is designed to handle massive datasets and complex computations efficiently, which may leverage the scalability issues that emerged when using *api.open-elevation.com* and *api.opentopodata.org*.


### Generation
The process of querying an altitude value for a sighting is similar to the the public API's *api.open-elevation.com* and *api.opentopodata.org* - Earth Engine also requires separate API requests for each individual bird sighting.

<span style="color:red;">In order to query the Earth Engine, you must... [describe registration and authentication]</span>

In [None]:
# Initialize the Earth Engine
ee.Authenticate()
ee.Initialize(project='ee-simonbirk')

def get_elevation(lat, lon):
    # Create a point with the coordinates
    point = ee.Geometry.Point([lon, lat])

    # Get the elevation data
    dataset = ee.Image('USGS/SRTMGL1_003')

    # Sample the point and get the elevation
    elevation = dataset.sample(point, 30).first().get('elevation').getInfo()
    return elevation

df['altitude'] = df.apply(lambda row: get_elevation(row['coord_lat'], row['coord_lon']), axis=1)

### Issues
The shift from the two public API's to Earth Engine was expected to address previously encountered computational inefficiencies. Despite Earth Engine providing a robust geospatial analysis platform with extensive computational resources, the processing duration remains too high for our use case.

## 3rd approach: *European Digital Elevation Model* by EEA

<img src="https://www.eea.europa.eu/logo.jpg" alt="EEA Icon" style="border-radius: 7px; width: 400px; height: auto;">


Due to prolonged computation times experienced with both registration-free API's and Earth Engine in extracting elevation data, it is opted to transition to a local computation approach. Despite the necessity to download files for this method, it holds the potential to address the issue of excessive processing times, as computations are now conducted locally rather than through an API.

For this purpose, we are utilizing *EU_DEM* files. The EU_DEM (European Digital Elevation Model) is provided by the European Environment Agency (EEA). The EEA is an agency of the European Union that focuses on providing environmental information to support policy development and implementation in Europe. It is part of the Copernicus program, specifically managed by the Copernicus Land Monitoring Service (CLMS) [1]. 

The elevation data in the EU_DEM is derived from various sources, including satellite data and airborne measurements [1], and is processed to create a high-quality digital representation of the Earth's surface for the European continent. It is freely available for download and use.



### Generation
To retrieve the elevation data, EEA provides a module that allows to compute all elevation data for a dataframe at once. For more detailed code information, please visit *../utils/copernicus.py*.

In [6]:
copernicus = CopernicusDEM(raster_paths=[eu_dem_path + 'eu_dem_v11_E40N20.TIF', eu_dem_path + 'eu_dem_v11_E40N30.TIF'])
df = copernicus.get_elevation(df, lat_col='coord_lat', lon_col='coord_lon')

In [8]:
df['altitude'] = df['elevation']
df.drop(columns=['elevation'], inplace=True)

### Issues

#### Issue: Incorrect altitude values due to EU_DEM borders

In [None]:
print(df.altitude.value_counts())

In [None]:
min_altitude = df.altitude.min()
df_min_altitude = df[df.altitude == min_altitude]
df_min_altitude.head(5)

#### Solution: Incorporate Earth Engine

In [None]:
df.loc[df['altitude'] == df.altitude.min(), 'altitude'] = 0

In [None]:
print(df.altitude.value_counts())

## Conclusion


## Outlook

# 
<p style="background-color:#4A3228;color:white;font-size:240%;border-radius:10px 10px 10px 10px;"> &nbsp; 3️⃣ Add elevation feature to the dataset</p>

In [16]:
df_path = 'D:\\Simon\\Documents\\GP\\data\\datasets\\selected_bird_species_with_grids_50km.csv'
df = pd.read_csv(df_path, index_col=0)
df.reset_index(inplace=True, drop=True)

  df = pd.read_csv(df_path, index_col=0)


In [17]:
df_path_alt = 'D:\\Simon\\Documents\\GP\\data\\datasets\\altitude\\dbird_export_altitude_202312140925.csv'
df_alt = pd.read_csv(df_path_alt, sep=';')

In [None]:
df_alt.head(3)

In [18]:
df_de = df[df['country'] == 'de']

merged_df_de = df_de.merge(df_alt, left_on='id_sighting', right_on='ornitho_id', how='left')
merged_df_de.drop(columns=['ornitho_id', 'altitude_x'], inplace=True)
merged_df_de.rename(columns={'altitude_y': 'altitude'}, inplace=True)

df.update(merged_df_de)

In [19]:
df.head(3)

Unnamed: 0,id_sighting,id_species,name_species,date,timing,coord_lat,coord_lon,precision,altitude,total_count,atlas_code,id_observer,country,eea_grid_id
0,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1,0.0,,37718.0,de,50kmE4200N3300
1,29654244,397.0,Schwarzkehlchen,2018-01-01,,53.127639,8.957263,square,1,2.0,,37803.0,de,50kmE4250N3300
2,29654521,463.0,Wiesenpieper,2018-01-01,,50.850941,12.146953,place,269,2.0,,39627.0,de,50kmE4450N3050


In [20]:
print(df.altitude.value_counts())

altitude
 0       115450
 430     88178 
-1       51270 
-2       43926 
 400     38170 
         ...   
 2893    1     
 3262    1     
 3124    1     
 3001    1     
 2971    1     
Name: count, Length: 2899, dtype: int64


In [21]:
min_altitude = df.altitude.min()
df_min_altitude = df[df.altitude == min_altitude]
df_min_altitude.head(5)

Unnamed: 0,id_sighting,id_species,name_species,date,timing,coord_lat,coord_lon,precision,altitude,total_count,atlas_code,id_observer,country,eea_grid_id
172472,57776355,463.0,Wiesenpieper,2021-05-03,12:08,50.958478,6.674523,precise,-34,1.0,,84909.0,de,50kmE4050N3050
172473,57790281,463.0,Wiesenpieper,2021-05-03,12:08,50.958478,6.674523,precise,-34,1.0,,88575.0,de,50kmE4050N3050


In [23]:
df_path = 'D:\\Simon\\Documents\\GP\\data\\datasets\\selected_bird_species_with_grids_50km.csv'
# df_path = 'D:\\Simon\\Documents\\GP\\data\\datasets\\selected_bird_species_50km_elevation.csv'
df.to_csv(df_path)

# 
<p style="background-color:#4A3228;color:white;font-size:240%;border-radius:10px 10px 10px 10px;"> &nbsp; 4️⃣ Initial assessment of the elevation feature: Possible degree of decision support </p>

Water pipit und Seeadler -> x-axis datum, y-axis elevatino; daran kann man sehen, dass species-specific elevation patterns auftreten; z.B. water pipit: fliegt im Sommer in höhere Lagen; Seeadler ist immer niedrig; daher kann man ausreißer schon an der elevation data allein feststellen, d.h. das könnte ein wichtiges feature in der outlier detection sein.

# References
[1]  European Environment Agency. "European Digital Elevation Model (EU-DEM), Version 1.1." (2016).