# Introduction

This Biodiversity in National Parks Project aims to analyze multiple data sources to determine patterns and relationships between species conservation status and their sightings at major national parks in the United States. 

This project will scope, analyze, plot, and evaluate the data to provide information necessary to answer the questions below:

1. What is the distribution of conservation status across species?
2. Are certain species more likely to face endangerment than others?
3. Is there a statistically significant difference in conservation status between species?
4. What animal, if any, appears to be closest to facing extinction at this time? What is its distribution across parks?
5. Is there an animal that is more prevalent than others? Does this prevalence differ across parks?
 
After answering these questions, this project will address potential issues with the data and findings, and open a discussion of improvements and future questions that could be answered with further data collection and analysis.

## Data

Provided by [Codecademy.com](https://www.codecademy.com/)
1. observations.csv
2. species_info.csv

# Scope
The project scope serves as a roadmap to guide the project's progress and the reader's understanding of its goals and findings. The scope of this project has been broken down into four sections:
1. **Goals** - Define project objectives and potential problems, and the actions needed to achieve and overcome them, respectively.
2. **Data** - Identify the data and whether it is relevant to the project goals. Address any reliability concerns or the need for further data to augment future research.
3. **Analysis** - Determine how data will be analyzed, including methods used and plans for validating analysis to answer questions. 
4. **Evaluation** - Review analysis to build conclusions and discuss findings. 

## Goals
The main objective of this project is to provide insights about endangered species to the National Park Service to aid its biodiversity conservation efforts. Analyst objectives include understanding the characteristics of different species/animals, their conservation status, and how that status compares to their distribution across the different parks.

Questions this project aims to answer include:
1. What is the distribution of conservation status across species?
2. Are certain species more likely to face endangerment than others?
3. Is there a statistically significant difference in conservation status between species?
4. What animal, if any, appears to be closest to facing extinction at this time? What is its distribution across parks?
5. Is there an animal that is more prevalent than others? Does this prevalence differ across parks?

## Data
This project includes two provided datasets. The first, observations.csv, lists the number of observations of each animal (by its scientific name) in each national park over the past 7 days. The second, species_info.csv, includes each animal's conservation status, its common and scientific names, and the category it belongs to (for example, Mammal).
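Since both files identify an animal by `scientific_name`, they can be joined on that column. The following is a minimal sketch of such a join, not necessarily the approach this project takes. The frames `obs_sample` and `species_sample` are hypothetical stand-ins for the two CSV files, and their rows are invented for illustration; only the column names follow the real data.

```python
import pandas as pd

# Hypothetical stand-ins for observations.csv and species_info.csv.
# The rows are invented for illustration; only the column names match the files.
obs_sample = pd.DataFrame({
    "scientific_name": ["Bos bison", "Bos bison", "Ovis aries"],
    "park_name": ["Yosemite National Park", "Bryce National Park",
                  "Yosemite National Park"],
    "observations": [120, 90, 75],
})
species_sample = pd.DataFrame({
    "category": ["Mammal", "Mammal"],
    "scientific_name": ["Bos bison", "Ovis aries"],
    "common_names": ["American Bison, Bison", "Domestic Sheep"],
    "conservation_status": [None, None],
})

# A left join keeps every observation row, even if a species has no metadata.
# validate="many_to_one" raises an error if scientific_name is duplicated
# on the species side, which would otherwise silently multiply rows.
merged = obs_sample.merge(
    species_sample, on="scientific_name", how="left", validate="many_to_one"
)
print(merged[["scientific_name", "park_name", "observations", "category"]])
```

If the real species file lists a scientific name more than once, the `validate` check would fail, and the duplicates would need to be resolved before joining.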

## Analysis
Using descriptive summary statistics and visualization techniques, this section will explore the provided data to detect patterns, uncover relationships, and provide a better understanding of the data. Statistical inference will be used to determine if the values are statistically significant to support any final conclusions. 

Key metrics to compute include:
1. Counts
2. Distributions
3. Conservation status of each species
4. Relationships between species (if any)
5. Observations of species in each park
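The metrics listed above could be computed along the following lines. This is a sketch under stated assumptions rather than the project's final analysis: the sample frames and every value in them are hypothetical, and treating a missing `conservation_status` as "No Concern" is an assumption made here for illustration.

```python
import pandas as pd

# Hypothetical sample shaped like species_info.csv (values invented).
species_sample = pd.DataFrame({
    "category": ["Mammal", "Mammal", "Bird", "Bird", "Bird", "Reptile"],
    "scientific_name": ["sp_a", "sp_b", "sp_c", "sp_d", "sp_e", "sp_f"],
    "conservation_status": ["Endangered", None, None,
                            "Species of Concern", None, None],
})
# Hypothetical sample shaped like observations.csv (values invented).
obs_sample = pd.DataFrame({
    "scientific_name": ["sp_a", "sp_a", "sp_c", "sp_c"],
    "park_name": ["Park X", "Park Y", "Park X", "Park Y"],
    "observations": [10, 20, 100, 150],
})

# Assumption: a missing status means the species has no listed concern.
status = species_sample["conservation_status"].fillna("No Concern")

# Metrics 1-2: counts and distribution of conservation status.
status_counts = status.value_counts()

# Metrics 3-4: category versus protected / not protected.
is_protected = status.ne("No Concern").rename("is_protected")
contingency = pd.crosstab(species_sample["category"], is_protected)

# Metric 5: total observations of each species in each park.
obs_by_park = obs_sample.pivot_table(
    index="scientific_name", columns="park_name",
    values="observations", aggfunc="sum",
)

print(status_counts, contingency, obs_by_park, sep="\n\n")
```

A contingency table like `contingency` is the usual input for a chi-squared test of independence, for example `scipy.stats.chi2_contingency` if SciPy is available, which is one way to approach the statistical inference described above.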

## Evaluation
This section will revisit the original goals and questions and compare them with the final analysis to determine whether the findings address the original objectives. Final conclusions and reflections will be drafted here. In addition, any concerns about accuracy, and any potential for augmentation and future research, will be addressed.

In [1]:
#import Python libraries and modules
import pandas as pd
import numpy as np

from matplotlib import pyplot as plt
import seaborn as sns

#"inline" embeds static images of each plot in the notebook;
#"notebook" would produce interactive plots instead
%matplotlib inline

In [5]:
#load two provided datasets (csv) into dataframes
obs_df = pd.read_csv('observations.csv')
species_df = pd.read_csv('species_info.csv')
print(obs_df.head(5))
print('\n')
print(species_df.head(5))

            scientific_name                            park_name  observations
0        Vicia benghalensis  Great Smoky Mountains National Park            68
1            Neovison vison  Great Smoky Mountains National Park            77
2         Prunus subcordata               Yosemite National Park           138
3      Abutilon theophrasti                  Bryce National Park            84
4  Githopsis specularioides  Great Smoky Mountains National Park            85


  category                scientific_name  \
0   Mammal  Clethrionomys gapperi gapperi   
1   Mammal                      Bos bison   
2   Mammal                     Bos taurus   
3   Mammal                     Ovis aries   
4   Mammal                 Cervus elaphus   

                                        common_names conservation_status  
0                           Gapper's Red-Backed Vole                 NaN  
1                              American Bison, Bison                 NaN  
2  Aurochs, Aurochs, Domesti