# National Parks Service Biodiversity
Codecademy Portfolio Project by Leah Fulmer ([Github](https://github.com/leahmfulmer))<br>
With acknowledgements and gratitude to...

#### Project Objectives from Codecademy:

* Complete a project to add to your portfolio
* Use Jupyter Notebook to communicate findings
* Run an analysis on a set of data
* Become familiar with data analysis workflow

#### Table of Contents : COMPLETE IN POST
[Section 1: Loading and Examining the Data](#data)<br>
[Section 2: Wrangling and Tidying the Data](#tidy)<br>
[Section 3: Initial Data Exploration](#initial)<br>
[Section 4: Analysis by Age](#age)<br>
[Section 5: Analysis by BMI](#bmi)<br>
[Section 6: Next Steps](#next)<br>
[Section 7: Resting Place: Where Code Goes to Rest!](#rest)<br>

### Section 1: Loading and Examining the Data <a id="data"></a>

In [1]:
# Import modules

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
# Load data

observations = pd.read_csv("observations.csv")
species_info = pd.read_csv("species_info.csv")

In [3]:
# Examine observations

print("The dataset 'observations' contains {} rows and \
{} columns.".format(observations.shape[0], observations.shape[1]))

observations.head()
# observations.count()

The dataset 'observations' contains 23296 rows and 3 columns.


Unnamed: 0,scientific_name,park_name,observations
0,Vicia benghalensis,Great Smoky Mountains National Park,68
1,Neovison vison,Great Smoky Mountains National Park,77
2,Prunus subcordata,Yosemite National Park,138
3,Abutilon theophrasti,Bryce National Park,84
4,Githopsis specularioides,Great Smoky Mountains National Park,85


In [4]:
# Examine species_info

print("The dataset 'species_info' contains {} rows and \
{} columns.".format(species_info.shape[0], species_info.shape[1]))

species_info.head()
# species_info.count()

The dataset 'species_info' contains 5824 rows and 4 columns.


Unnamed: 0,category,scientific_name,common_names,conservation_status
0,Mammal,Clethrionomys gapperi gapperi,Gapper's Red-Backed Vole,
1,Mammal,Bos bison,"American Bison, Bison",
2,Mammal,Bos taurus,"Aurochs, Aurochs, Domestic Cattle (Feral), Dom...",
3,Mammal,Ovis aries,"Domestic Sheep, Mouflon, Red Sheep, Sheep (Feral)",
4,Mammal,Cervus elaphus,Wapiti Or Elk,


### Section 2: Wrangling and Tidying the Data <a id="tidy"></a>

In [5]:
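# all datasets: convert every string value to lower case

def make_lower_case(df):
    for column in df.columns:
        # Only object-dtype columns hold strings; numeric columns are skipped
        if df[column].dtype == object:
            # Missing values (NaN) are floats, so lower-case strings only.
            # isinstance replaces the np.float alias, which was removed in NumPy 1.24.
            df[column] = df[column].apply(lambda x: x.lower() if isinstance(x, str) else x)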

make_lower_case(observations)
make_lower_case(species_info)

In [6]:
# observations: drop completely-duplicated rows

observations.drop_duplicates(subset=['scientific_name', 'park_name', 'observations'], inplace=True)
print("The dataset 'observations' now contains {} rows and \
{} columns.".format(observations.shape[0], observations.shape[1]))

The dataset 'observations' now contains 23281 rows and 3 columns.


In [7]:
# observations: sum observational instances for rows with duplicated scientific_name and park_name 

observations = observations.groupby(['scientific_name', 'park_name'])['observations'].sum().reset_index()
print("The dataset 'observations' now contains {} rows and \
{} columns.".format(observations.shape[0], observations.shape[1]))

The dataset 'observations' now contains 22164 rows and 3 columns.
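
The two steps above do different jobs: `drop_duplicates` only removes rows that match in every listed column, while `groupby(...).sum()` collapses rows that share a species and park but report different counts. A minimal sketch on made-up data (the `toy` frame and its values are hypothetical, not taken from `observations.csv`):

```python
import pandas as pd

# Hypothetical toy data: the same species/park pair appears twice with different counts
toy = pd.DataFrame({
    "scientific_name": ["puma concolor", "puma concolor", "canis lupus"],
    "park_name": ["bryce national park", "bryce national park", "bryce national park"],
    "observations": [10, 5, 7],
})

# drop_duplicates removes nothing here: the two puma rows differ in 'observations'
deduped = toy.drop_duplicates(subset=["scientific_name", "park_name", "observations"])

# groupby + sum collapses each (scientific_name, park_name) pair into a single row
summed = (
    deduped.groupby(["scientific_name", "park_name"])["observations"]
    .sum()
    .reset_index()
)
print(summed)
```

In this sketch the three input rows become two, with the two `puma concolor` counts added together.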


In [18]:
# species_info: join common_names for duplicated scientific_name
# (the result is stored in `test`; species_info itself is left unchanged)

test = species_info.groupby(['scientific_name'])['common_names'].apply(lambda x: ', '.join(x)).reset_index()
print("The dataset 'species_info' now contains {} rows and \
{} columns.".format(test.shape[0], test.shape[1]))
test.head()

The dataset 'species_info' now contains 5541 rows and 2 columns.


Unnamed: 0,scientific_name,common_names
0,abies bifolia,rocky mountain alpine fir
1,abies concolor,"balsam fir, colorado fir, concolor fir, silver..."
2,abies fraseri,fraser fir
3,abietinella abietina,abietinella moss
4,abronia ammophila,"wyoming sand verbena, yellowstone sand verbena"
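
The grouped result above keeps only `scientific_name` and `common_names`. One possible way to retain `category` and `conservation_status` as well is named aggregation, sketched here on made-up rows. The `toy_species` frame, the `join_unique` helper, and the choice of keeping the first value per group are assumptions for illustration, not the notebook's method:

```python
import pandas as pd

# Hypothetical rows shaped like species_info
toy_species = pd.DataFrame({
    "category": ["mammal", "mammal", "mammal"],
    "scientific_name": ["puma concolor", "puma concolor", "bos bison"],
    "common_names": ["panther (mountain lion)", "mountain lion", "american bison, bison"],
    "conservation_status": [None, None, None],
})

def join_unique(names):
    # Join the distinct common-name strings for one species, keeping first-seen order.
    # Only exact duplicate strings are removed; overlapping names are kept as-is.
    return ", ".join(dict.fromkeys(names.dropna()))

tidy = (
    toy_species
    .groupby("scientific_name", as_index=False)
    .agg(
        category=("category", "first"),
        common_names=("common_names", join_unique),
        conservation_status=("conservation_status", "first"),
    )
)
print(tidy)
```

Taking the first `category` and `conservation_status` per group assumes those values agree across duplicated rows, which would need to be checked on the real data before relying on it.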


In [9]:
# Are there any rows in species_info with a duplicated scientific_name?
duplicates = species_info[species_info.duplicated(subset="scientific_name")]
print(duplicates.shape)
duplicates.head()

test = species_info[species_info.scientific_name == 'puma concolor']
test.head(10)

(283, 4)


Unnamed: 0,category,scientific_name,common_names,conservation_status
16,mammal,puma concolor,panther (mountain lion),
3022,mammal,puma concolor,"cougar, mountain lion, puma",
4451,mammal,puma concolor,mountain lion,


In [10]:
# Check observations for duplicated (scientific_name, park_name) pairs
duplicates = observations[observations.duplicated(subset=["scientific_name", 'park_name'])]
duplicates.head()

test = observations[observations.scientific_name == 'agrostis mertensii']
test.head(10)

Unnamed: 0,scientific_name,park_name,observations
444,agrostis mertensii,bryce national park,162
445,agrostis mertensii,great smoky mountains national park,141
446,agrostis mertensii,yellowstone national park,522
447,agrostis mertensii,yosemite national park,263


In [11]:
# Sum observations for rows with same scientific_name and park_name
obs = observations.groupby(['scientific_name', 'park_name'])['observations'].sum().reset_index()
print(obs.shape)
obs.head()

duplicates = obs[obs.duplicated(subset=["scientific_name", 'park_name'])]
duplicates.head()

test = obs[obs.scientific_name == 'agrostis mertensii']
test.head(10)

In [None]:
# Examine duplicates among scientific_name in species_info
duplicates = species_info[species_info.duplicated(subset=["scientific_name"])]
print(duplicates.shape)
duplicates.head()
test = species_info[species_info.scientific_name == 'canis lupus']
test.head()

In [None]:
# Combine data

combined = pd.merge(observations, species_info, on = ['scientific_name'], how = 'left')

print("The dataset 'combined' contains {} rows and \
{} columns.".format(combined.shape[0], combined.shape[1]))
combined.head()
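
A left join repeats each left-hand row once per matching right-hand row, so if `species_info` still contains repeated `scientific_name` values at this point, `combined` will have more rows than `observations`. A minimal sketch of that effect on made-up data, using the `validate` argument of `pd.merge` as one possible guard (the `toy_obs` and `toy_info` frames and the `merge_is_safe` flag are hypothetical):

```python
import pandas as pd

# Hypothetical stand-ins for observations and species_info
toy_obs = pd.DataFrame({
    "scientific_name": ["puma concolor", "bos bison"],
    "park_name": ["bryce national park", "bryce national park"],
    "observations": [15, 7],
})
toy_info = pd.DataFrame({
    "scientific_name": ["puma concolor", "puma concolor", "bos bison"],
    "common_names": ["panther (mountain lion)", "mountain lion", "american bison, bison"],
})

# The puma row is matched twice, so the merged frame gains a row
merged = pd.merge(toy_obs, toy_info, on="scientific_name", how="left")
print(len(toy_obs), len(merged))

# validate="many_to_one" raises MergeError when the right-hand keys are not unique
try:
    pd.merge(toy_obs, toy_info, on="scientific_name", how="left", validate="many_to_one")
    merge_is_safe = True
except pd.errors.MergeError:
    merge_is_safe = False
print(merge_is_safe)
```

Comparing row counts before and after the merge, or passing `validate`, makes this kind of row inflation visible before any observation totals are computed from `combined`.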

In [None]:
# Examine combined for fully duplicated rows
duplicates = combined[combined.duplicated()]
duplicates.head()
test = combined[combined.scientific_name == 'echinochloa crus-galli']
print(test.shape)
test.head(10)