***

# An Analysis of Biodiversity in the National Parks

![park](https://api.time.com/wp-content/uploads/2016/08/gettyimages-535829001.jpg?quality=85&w=800)

***

## Introduction

This study analyzes wildlife data from the national parks, investigating trends and drawing conclusions from the data provided.
It sets out to answer the following initial questions:
* How well preserved are the species in the national parks? Are any species endangered? Which categories are the most vulnerable?
* How are the species categories distributed among the parks? Do parks concentrate their species in a few categories? Which parks contain the most vulnerable species?
* Which category is observed most often? Which species is observed most often? Which species are the hardest to observe?

Other questions, not foreseen at the outset, may arise as the study develops.

***

## Package Import, Data Import, Cleaning and Table Joining

In [1]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

In [2]:
obs_df = pd.read_csv('observations.csv')
species_df = pd.read_csv('species_info.csv')

In [3]:
obs_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23296 entries, 0 to 23295
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   scientific_name  23296 non-null  object
 1   park_name        23296 non-null  object
 2   observations     23296 non-null  int64 
dtypes: int64(1), object(2)
memory usage: 546.1+ KB


In [4]:
obs_df.head()

| | scientific_name | park_name | observations |
|---|---|---|---|
| 0 | Vicia benghalensis | Great Smoky Mountains National Park | 68 |
| 1 | Neovison vison | Great Smoky Mountains National Park | 77 |
| 2 | Prunus subcordata | Yosemite National Park | 138 |
| 3 | Abutilon theophrasti | Bryce National Park | 84 |
| 4 | Githopsis specularioides | Great Smoky Mountains National Park | 85 |


In [5]:
species_df.head()

| | category | scientific_name | common_names | conservation_status |
|---|---|---|---|---|
| 0 | Mammal | Clethrionomys gapperi gapperi | Gapper's Red-Backed Vole | NaN |
| 1 | Mammal | Bos bison | American Bison, Bison | NaN |
| 2 | Mammal | Bos taurus | Aurochs, Aurochs, Domestic Cattle (Feral), Dom... | NaN |
| 3 | Mammal | Ovis aries | Domestic Sheep, Mouflon, Red Sheep, Sheep (Feral) | NaN |
| 4 | Mammal | Cervus elaphus | Wapiti Or Elk | NaN |


In [6]:
species_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5824 entries, 0 to 5823
Data columns (total 4 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   category             5824 non-null   object
 1   scientific_name      5824 non-null   object
 2   common_names         5824 non-null   object
 3   conservation_status  191 non-null    object
dtypes: object(4)
memory usage: 182.1+ KB


In [7]:
# Species with no recorded conservation status are labeled 'No Concern'
species_df = species_df.fillna(value={'conservation_status': 'No Concern'})
species_df.head()

| | category | scientific_name | common_names | conservation_status |
|---|---|---|---|---|
| 0 | Mammal | Clethrionomys gapperi gapperi | Gapper's Red-Backed Vole | No Concern |
| 1 | Mammal | Bos bison | American Bison, Bison | No Concern |
| 2 | Mammal | Bos taurus | Aurochs, Aurochs, Domestic Cattle (Feral), Dom... | No Concern |
| 3 | Mammal | Ovis aries | Domestic Sheep, Mouflon, Red Sheep, Sheep (Feral) | No Concern |
| 4 | Mammal | Cervus elaphus | Wapiti Or Elk | No Concern |
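Passing a dictionary to `fillna` restricts the fill to the named column, so gaps in any other column are left untouched. A minimal sketch on a toy frame (the rows are hypothetical and not taken from `species_info.csv`):

```python
import numpy as np
import pandas as pd

# Toy frame with one gap in each column (hypothetical values)
toy = pd.DataFrame({
    'common_names': ['Wapiti Or Elk', np.nan],
    'conservation_status': [np.nan, 'Endangered'],
})

# Only the column named in the dict is filled
filled = toy.fillna(value={'conservation_status': 'No Concern'})

print(filled)
```

The missing `common_names` entry stays `NaN`, which is the behavior one would usually want when only the status column has a meaningful default.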


In [8]:
print(species_df.groupby('conservation_status').scientific_name.count())

conservation_status
Endangered              16
In Recovery              4
No Concern            5633
Species of Concern     161
Threatened              10
Name: scientific_name, dtype: int64
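The counts above are easier to compare as shares of the total. A short sketch that rebuilds the counts by hand (copied from the output above) and converts them to percentages; on the real table, `species_df['conservation_status'].value_counts(normalize=True)` should give the same proportions directly:

```python
import pandas as pd

# Counts copied from the groupby output above
status_counts = pd.Series({
    'Endangered': 16,
    'In Recovery': 4,
    'No Concern': 5633,
    'Species of Concern': 161,
    'Threatened': 10,
})

total = status_counts.sum()

# Percentage of all species rows in each status
shares = (status_counts / total * 100).round(2)

# Rows carrying any status other than 'No Concern'
flagged = total - status_counts['No Concern']
concern_share = flagged / total * 100

print(shares)
print(f"{flagged} of {total} rows ({concern_share:.2f}%) carry a conservation status")
```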


In [19]:
# Ordinal Categorical Data (Conservation Status) - Ordering Statuses
order = ['No Concern', 'Species of Concern', 'In Recovery', 'Threatened', 'Endangered']

In [9]:
# Boolean flag: True for any status other than 'No Concern'
species_df['concern'] = species_df['conservation_status'] != 'No Concern'

In [10]:
print(species_df.groupby('concern').scientific_name.count())
species_df.head()

concern
False    5633
True      191
Name: scientific_name, dtype: int64


| | category | scientific_name | common_names | conservation_status | concern |
|---|---|---|---|---|---|
| 0 | Mammal | Clethrionomys gapperi gapperi | Gapper's Red-Backed Vole | No Concern | False |
| 1 | Mammal | Bos bison | American Bison, Bison | No Concern | False |
| 2 | Mammal | Bos taurus | Aurochs, Aurochs, Domestic Cattle (Feral), Dom... | No Concern | False |
| 3 | Mammal | Ovis aries | Domestic Sheep, Mouflon, Red Sheep, Sheep (Feral) | No Concern | False |
| 4 | Mammal | Cervus elaphus | Wapiti Or Elk | No Concern | False |


In [15]:
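# Merge both tables into one on their only shared column, scientific_name (inner join).
# Note: the result has more rows than obs_df (25632 vs. 23296), which indicates
# that some scientific_name values appear more than once in species_df.
compound_df = pd.merge(obs_df, species_df, on='scientific_name')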

In [16]:
compound_df.head()

| | scientific_name | park_name | observations | category | common_names | conservation_status | concern |
|---|---|---|---|---|---|---|---|
| 0 | Vicia benghalensis | Great Smoky Mountains National Park | 68 | Vascular Plant | Purple Vetch, Reddish Tufted Vetch | No Concern | False |
| 1 | Vicia benghalensis | Yosemite National Park | 148 | Vascular Plant | Purple Vetch, Reddish Tufted Vetch | No Concern | False |
| 2 | Vicia benghalensis | Yellowstone National Park | 247 | Vascular Plant | Purple Vetch, Reddish Tufted Vetch | No Concern | False |
| 3 | Vicia benghalensis | Bryce National Park | 104 | Vascular Plant | Purple Vetch, Reddish Tufted Vetch | No Concern | False |
| 4 | Neovison vison | Great Smoky Mountains National Park | 77 | Mammal | American Mink | No Concern | False |


In [17]:
compound_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 25632 entries, 0 to 25631
Data columns (total 7 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   scientific_name      25632 non-null  object
 1   park_name            25632 non-null  object
 2   observations         25632 non-null  int64 
 3   category             25632 non-null  object
 4   common_names         25632 non-null  object
 5   conservation_status  25632 non-null  object
 6   concern              25632 non-null  bool  
dtypes: bool(1), int64(1), object(5)
memory usage: 1.4+ MB
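The merged table has 25632 rows, more than the 23296 rows of `obs_df`. An inner merge can only add rows when the join key repeats on the other side, so `species_df` presumably lists some `scientific_name` values more than once. A minimal sketch on toy data (the species names, parks, and numbers are hypothetical) of how the inflation arises and how the `validate` argument of `pd.merge` can flag it:

```python
import pandas as pd

# Toy tables (hypothetical values): 'Genus alpha' is listed twice on the species side
obs = pd.DataFrame({
    'scientific_name': ['Genus alpha', 'Genus alpha', 'Genus beta'],
    'park_name': ['Park A', 'Park B', 'Park A'],
    'observations': [30, 60, 80],
})
species = pd.DataFrame({
    'scientific_name': ['Genus alpha', 'Genus alpha', 'Genus beta'],
    'common_names': ['Alpha', 'Common Alpha', 'Beta'],
})

# Each 'Genus alpha' observation matches both species rows:
# 2 x 2 rows for 'Genus alpha' plus 1 row for 'Genus beta' = 5 rows
merged = pd.merge(obs, species, on='scientific_name')
print(len(obs), len(merged))

# List the keys that repeat on the species side
is_dup = species['scientific_name'].duplicated(keep=False)
dup_names = species.loc[is_dup, 'scientific_name'].unique()
print(dup_names)

# validate='many_to_one' raises MergeError when the right-hand key is not unique
try:
    pd.merge(obs, species, on='scientific_name', validate='many_to_one')
    merge_is_many_to_one = True
except pd.errors.MergeError:
    merge_is_many_to_one = False
print(merge_is_many_to_one)
```

Whether the repeated rows should be dropped, combined, or kept depends on what the duplicates in `species_info.csv` represent. Any sum of `observations` taken over the merged table counts the affected observations more than once.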


In [21]:
# Ordering Categorical Data (Conservation Status)
compound_df['conservation_status'] = pd.Categorical(compound_df['conservation_status'], categories=order, ordered=True)
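Declaring the column as an ordered categorical makes sorting and comparisons follow the severity order defined in `order` rather than alphabetical order. A small sketch on a standalone toy series (the four values are chosen for illustration only):

```python
import pandas as pd

order = ['No Concern', 'Species of Concern', 'In Recovery', 'Threatened', 'Endangered']

status = pd.Series(pd.Categorical(
    ['Endangered', 'No Concern', 'Threatened', 'Species of Concern'],
    categories=order,
    ordered=True,
))

# Sorting follows the declared order, not the alphabet
sorted_status = status.sort_values().tolist()

# Ordered categoricals support comparisons against a category label
at_least_threatened = (status >= 'Threatened').tolist()

# max() returns the highest category present in the data
most_severe = status.max()

print(sorted_status)
print(at_least_threatened)
print(most_severe)
```

The same ordering is typically picked up by groupby output and by seaborn when the column is used as a plot axis, which keeps the statuses in a consistent sequence across tables and charts.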

***

## Summary Statistics