## Will there be frogs?

### Introduction
In many countries, an environmental impact study (EIS) is required to assess the potential impact of actions that significantly affect the quality of the human environment. An EIS is important but can be expensive, often requiring ecological experts to be deployed in the field to collect data. What if a project site could be pre-assessed? A pre-assessment could give an initial indication of the areas to focus on and potentially shorten the field time required.

### Problem Definition
The impact of infrastructure projects on amphibian populations forms part of an EIS.

Can the presence of amphibian species near water reservoirs be predicted using features obtained from GIS and satellite images?

### Data 
The data consists of 16 input variables and a multi-label target made up of 7 labels, each indicating the presence of a particular type of amphibian.


#### Attribute Information

##### Inputs
- ID integer
- MV categorical (appears as the `Motorway` column in the file)
- SR numerical
- NR numerical
- TR categorical
- VR categorical
- SUR1 categorical
- SUR2 categorical
- SUR3 categorical
- UR categorical
- FR categorical
- OR categorical
- RR ordinal
- BR ordinal
- MR categorical
- CR categorical

##### Target
- Label 1: The presence of Green frogs 
- Label 2: The presence of Brown frogs
- Label 3: The presence of Common toad
- Label 4: The presence of Fire-bellied toad
- Label 5: The presence of Tree frog
- Label 6: The presence of Common newt
- Label 7: The presence of Great Crested newt


#### Source
The dataset was obtained from the UCI Machine Learning Repository: [Amphibians](https://archive.ics.uci.edu/ml/datasets/Amphibians)

Marcin Blachnik, Marek Sołtysiak, Dominika Dąbrowska. Predicting presence of amphibian species using features obtained from GIS and satellite images. ISPRS International Journal of Geo-Information 8 (3), pp. 123. MDPI. 2019

### Problem Approach
This is a multi-label classification problem. The goal is to train a classification model that outputs a multi-label prediction indicating which of the 7 types of amphibians are present at a site. Several types, or none, can be present at the same site.
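To make the multi-label setup concrete, here is a minimal sketch of fitting one classifier per label. The notebook does not specify a model at this point, so the use of scikit-learn, the random forest, and the synthetic stand-in data are all assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the amphibians dataset): 100 sites, 5 numeric features.
X = rng.normal(size=(100, 5))
# 7 binary label columns; a row may contain several 1s, or none at all.
Y = (rng.random(size=(100, 7)) < 0.3).astype(int)

# MultiOutputClassifier fits one independent binary classifier per label column.
model = MultiOutputClassifier(RandomForestClassifier(n_estimators=50, random_state=0))
model.fit(X, Y)

# One 0/1 prediction per site and per label.
pred = model.predict(X[:3])
print(pred.shape)
```

Treating each label independently is the simplest baseline; it ignores any correlation between species, which may or may not matter for this dataset.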

### Data Management

In [5]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

The data for both road projects is contained in a single CSV file. The repository states that there are no missing values; this is verified after loading.

In [11]:
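# The file is semicolon-delimited; header=1 takes the column names from the
# second row of the file, and the ID column becomes the index.
amphibians = pd.read_csv('amphibians.csv', sep=';', header=1, index_col='ID')
amphibians.head()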

| ID | Motorway | SR | NR | TR | VR | SUR1 | SUR2 | SUR3 | UR | FR | ... | BR | MR | CR | Green frogs | Brown frogs | Common toad | Fire-bellied toad | Tree frog | Common newt | Great crested newt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A1 | 600 | 1 | 1 | 4 | 6 | 2 | 10 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | A1 | 700 | 1 | 5 | 1 | 10 | 6 | 10 | 3 | 1 | ... | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
| 3 | A1 | 200 | 1 | 5 | 1 | 10 | 6 | 10 | 3 | 4 | ... | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
| 4 | A1 | 300 | 1 | 5 | 0 | 6 | 10 | 2 | 3 | 4 | ... | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 5 | A1 | 600 | 2 | 1 | 4 | 10 | 2 | 6 | 0 | 0 | ... | 5 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 |


In [13]:
amphibians.dtypes

Motorway              object
SR                     int64
NR                     int64
TR                     int64
VR                     int64
SUR1                   int64
SUR2                   int64
SUR3                   int64
UR                     int64
FR                     int64
OR                     int64
RR                     int64
BR                     int64
MR                     int64
CR                     int64
Green frogs            int64
Brown frogs            int64
Common toad            int64
Fire-bellied toad      int64
Tree frog              int64
Common newt            int64
Great crested newt     int64
dtype: object
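Every column except `Motorway` is read as `int64`, even though the attribute list describes most inputs as categorical. One possible way to make that explicit is to cast those columns to the pandas `category` dtype. The sketch below uses a small stand-in frame (`sample`, a hypothetical name) built from the first five rows of a few columns shown by `head()`; whether the original analysis performs this cast is not stated.

```python
import pandas as pd

# Stand-in frame: first five rows of a few columns from the head() output.
sample = pd.DataFrame(
    {
        "Motorway": ["A1", "A1", "A1", "A1", "A1"],
        "SR": [600, 700, 200, 300, 600],
        "NR": [1, 1, 1, 1, 2],
        "TR": [1, 5, 5, 5, 1],
        "VR": [4, 1, 1, 0, 4],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="ID"),
)

# Columns the attribute list marks as categorical (subset used for illustration).
categorical_cols = ["Motorway", "TR", "VR"]

# astype with a dict casts only the named columns and leaves the rest untouched.
sample = sample.astype({col: "category" for col in categorical_cols})

print(sample.dtypes)
```

The numerical columns `SR` and `NR` stay as integers, while the integer codes in `TR` and `VR` are no longer treated as quantities.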

In [18]:
# Check for missing values
amphibians.isna().sum()

Motorway              0
SR                    0
NR                    0
TR                    0
VR                    0
SUR1                  0
SUR2                  0
SUR3                  0
UR                    0
FR                    0
OR                    0
RR                    0
BR                    0
MR                    0
CR                    0
Green frogs           0
Brown frogs           0
Common toad           0
Fire-bellied toad     0
Tree frog             0
Common newt           0
Great crested newt    0
dtype: int64
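The per-column counts confirm the repository's claim. With many columns, the same check can be collapsed into a single number plus a list of affected columns. The sketch below runs on a hypothetical stand-in frame (`demo`) with one deliberately missing value so that the check has something to find; it is not the amphibians data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in frame with one deliberately missing value.
demo = pd.DataFrame({"SR": [600, 700, np.nan], "NR": [1, 1, 1]})

# Total number of missing cells across the whole frame.
total_missing = int(demo.isna().sum().sum())

# Names of the columns that contain at least one missing value.
cols_with_missing = demo.columns[demo.isna().any()].tolist()

print(total_missing, cols_with_missing)
```

Applied to `amphibians`, the same two expressions should give a total of zero and an empty list, matching the output above.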