## **Forest Covertype**

Task: **Classification** of forest land areas into 7 cover types based on attributes such as elevation, aspect, slope, hillshade, soil type, and more.
Dataset available [here](https://archive.ics.uci.edu/dataset/31/covertype).
### **General Information**
The goal is to predict forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a given observation (30 x 30 meter cell) was determined from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. Independent variables were derived from data originally obtained from the US Geological Survey (USGS) and the USFS.

The study area includes four wilderness areas located in the Roosevelt National Forest of northern Colorado. These areas represent forests with minimal human-caused disturbance, so existing forest cover types are more a result of ecological processes than of forest management practices.

### **Input Data Information**
All input data is in raw form (not centered or scaled). The table below lists the name, data type, measurement unit, and a brief description of each available attribute. The final row, `Cover_Type`, is the classification target rather than an input.

|Name | Data Type | Measurement | Description|
|-----|-----------|-------------|------------|
|Elevation | quantitative |meters | Elevation in meters|
|Aspect | quantitative | azimuth | Aspect in degrees azimuth|
|Slope | quantitative | degrees | Slope in degrees|
|Horizontal_Distance_To_Hydrology | quantitative | meters | Horizontal distance to nearest surface water features|
|Vertical_Distance_To_Hydrology | quantitative | meters | Vertical distance to nearest surface water features|
|Horizontal_Distance_To_Roadways | quantitative | meters | Horizontal distance to nearest roadway|
|Hillshade_9am | quantitative | 0 to 255 index | Hillshade index at 9am, summer solstice|
|Hillshade_Noon | quantitative | 0 to 255 index | Hillshade index at noon, summer solstice|
|Hillshade_3pm | quantitative | 0 to 255 index | Hillshade index at 3pm, summer solstice|
|Horizontal_Distance_To_Fire_Points | quantitative | meters | Horizontal distance to nearest wildfire ignition points|
|Wilderness_Area (4 binary columns) | qualitative | 0 (absence) or 1 (presence) | Wilderness area designation|
|Soil_Type (40 binary columns) | qualitative | 0 (absence) or 1 (presence) | Soil Type designation|
|Cover_Type (7 types) | integer | 1 to 7 | Forest Cover Type designation|
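
The column layout implied by the table can be sketched in plain Python as follows. The names of the individual binary columns (`Wilderness_Area1` ... `Soil_Type40`) are an assumed naming scheme for illustration, and `decode_one_hot` is a hypothetical helper that assumes exactly one flag per group is set for each observation.

```python
# Column layout implied by the table above.
# The per-column names of the binary groups are an ASSUMED naming scheme.
QUANTITATIVE = [
    "Elevation",
    "Aspect",
    "Slope",
    "Horizontal_Distance_To_Hydrology",
    "Vertical_Distance_To_Hydrology",
    "Horizontal_Distance_To_Roadways",
    "Hillshade_9am",
    "Hillshade_Noon",
    "Hillshade_3pm",
    "Horizontal_Distance_To_Fire_Points",
]
WILDERNESS = [f"Wilderness_Area{i}" for i in range(1, 5)]  # 4 binary columns
SOIL = [f"Soil_Type{i}" for i in range(1, 41)]             # 40 binary columns
TARGET = "Cover_Type"

FEATURES = QUANTITATIVE + WILDERNESS + SOIL
COLUMNS = FEATURES + [TARGET]


def decode_one_hot(row, names):
    """Return the single name in `names` whose binary flag is 1 in `row`."""
    active = [name for name in names if row[name] == 1]
    if len(active) != 1:
        raise ValueError(f"expected exactly one active flag, got {len(active)}")
    return active[0]


# Hypothetical observation: only the binary flags are filled in here.
example = {name: 0 for name in WILDERNESS + SOIL}
example["Wilderness_Area3"] = 1
example["Soil_Type12"] = 1

print(len(FEATURES), len(COLUMNS))          # 54 55
print(decode_one_hot(example, WILDERNESS))  # Wilderness_Area3
print(decode_one_hot(example, SOIL))        # Soil_Type12
```

Counting the groups this way gives 10 quantitative plus 4 + 40 binary input columns, and one target column.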

### **Class labels (Ground Truth)**
Ground truth data is provided as integers with the following mapping:

|Integer | Cover Type (English) | Cover Type (German) |
|-----|-----------|-----|
|1| Spruce/Fir | Fichte/Tanne |
|2| Lodgepole Pine | Küstenkiefer |
|3| Ponderosa Pine | Gelbkiefer |
|4| Cottonwood/Willow | Pappel/Weide |
|5| Aspen | Espe |
|6| Douglas-fir | Douglasie |
|7| Krummholz | Krummholz |
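
When inspecting predictions, it can help to translate the integer labels back into names. The snippet below is a minimal sketch of the mapping in the table; `decode_labels` and the `predictions` list are hypothetical and only serve as an illustration.

```python
from collections import Counter

# Integer label -> English cover type name, as listed in the table above.
COVER_TYPES = {
    1: "Spruce/Fir",
    2: "Lodgepole Pine",
    3: "Ponderosa Pine",
    4: "Cottonwood/Willow",
    5: "Aspen",
    6: "Douglas-fir",
    7: "Krummholz",
}


def decode_labels(labels):
    """Map integer class labels (1 to 7) to cover type names."""
    return [COVER_TYPES[label] for label in labels]


# Hypothetical model output, not real predictions.
predictions = [2, 1, 2, 7, 5]
names = decode_labels(predictions)
counts = Counter(names)

print(names)
print(counts.most_common(1))  # [('Lodgepole Pine', 2)]
```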

### **Task Description**

**Before you continue:** Execute this cell by clicking the play button.

Your task is to build machine learning (ML) pipelines for classification on different subsets of the forest cover type dataset using the user interface (UI) provided below.

The **UI** consists of **two sections**:
1. **Subset Selection:** Select a data subset (`Subset A`, `Subset B` or `Subset C`) using the drop-down menu.
2. **ML Pipeline:** Build your pipeline using the settings provided in this section. 

#### Building your ML Pipeline
<img src="data/ml_pipeline.png" alt="Machine Learning Pipeline" width="700"/>

Your ML pipelines consist of the following steps:
- Data Preparation
- Input Feature Selection
- Model Selection
- Model Training & Hyperparameter Tuning
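
To make the four steps concrete, here is a minimal standard-library sketch. It is not the code behind the UI: the data is synthetic (two classes, three features), k-nearest neighbours is only a stand-in model, and all function names are hypothetical. It shows scaling fitted on the training split, dropping an uninformative feature, and choosing the hyperparameter `k` by validation accuracy.

```python
import random
import statistics

random.seed(0)  # reproducible synthetic data (NOT the covertype data)


def make_row(label):
    """One synthetic observation: (features, label)."""
    features = [
        random.gauss(3.0 * label, 1.0),               # informative, small scale
        random.gauss(1000.0 + 200.0 * label, 100.0),  # informative, large scale
        random.gauss(0.0, 1.0),                       # pure noise
    ]
    return features, label


data = [make_row(label) for label in (0, 1) for _ in range(60)]
random.shuffle(data)
train, valid = data[:80], data[80:]
train_x, train_y = [f for f, _ in train], [y for _, y in train]
valid_x, valid_y = [f for f, _ in valid], [y for _, y in valid]


# 1. Data preparation: z-score scaling, fitted on the training split only.
def fit_scaler(rows):
    return [(statistics.mean(col), statistics.stdev(col)) for col in zip(*rows)]


def scale(rows, scaler):
    return [[(v - m) / s for v, (m, s) in zip(row, scaler)] for row in rows]


# 2. Input feature selection: keep a chosen subset of column indices.
def select(rows, columns):
    return [[row[i] for i in columns] for row in rows]


# 3. Model selection: k-nearest neighbours with a majority vote.
def knn_predict(ref_x, ref_y, x, k):
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(r, x)), y) for r, y in zip(ref_x, ref_y)
    )[:k]
    votes = [y for _, y in nearest]
    return max(set(votes), key=votes.count)


def accuracy(ref_x, ref_y, test_x, test_y, k):
    hits = sum(knn_predict(ref_x, ref_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


scaler = fit_scaler(train_x)
train_x, valid_x = scale(train_x, scaler), scale(valid_x, scaler)
train_x, valid_x = select(train_x, [0, 1]), select(valid_x, [0, 1])  # drop noise

# 4. Training and hyperparameter tuning: pick k by validation accuracy.
scores = {k: accuracy(train_x, train_y, valid_x, valid_y, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

Fitting the scaler on the training split alone keeps information from the validation data out of the preprocessing step, which is why the validation score remains a fair basis for choosing `k`.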

In each step, play around with the available settings and answer the respective questions in the [Bonus Points Quiz](https://learn.boku.ac.at/mod/quiz/view.php?id=1904109) on BOKU Learn.

**IMPORTANT:** Make sure to select the correct subset depending on the information given in each question of the quiz!


## Data Visualization

### Subset A

<img src="data/subset_a.png" alt="Subset A" width="1200"/>

### Subset B
<img src="data/subset_b.png" alt="Subset B" width="1200"/>

### Subset C
<img src="data/subset_c.png" alt="Subset C" width="1200"/>

%run example_definitions/lecture_10/hands-on_definitions.ipynb
