# Brown Scholars Internship 2019-2020 - Urban Wildlife in NYC

New York City is home to many diverse species of wildlife that arrived or existed long before humans settled here.
In October 2016, Mayor Bill de Blasio launched WildlifeNYC, a citywide education and awareness campaign teaching New Yorkers how to live safely and responsibly alongside wild animals including deer, raccoons, and coyotes.

Urban wildlife is any wild animal that lives in an urban environment, such as New York City. Urban wildlife includes birds, mammals, reptiles, fish, and amphibians. Some urban wildlife is native, like eastern grey squirrels, while some is non-native, like mute swans. Domesticated and companion animals, like dogs, exotic pets, and farm animals, are not considered urban wildlife. Domesticated but feral animals, like pigeons and stray cats, are also not considered urban wildlife.

Using information about requests for animal assistance, relocation, and/or rescue completed by the Urban Park Rangers, we will attempt to get a better understanding of the different species of wildlife that call the Big Apple home.

Data source: https://data.cityofnewyork.us/Environment/Urban-Park-Ranger-Animal-Condition-Response/fuhs-xmg2

In [1]:
# First we'll start by importing packages we'll use

import numpy as np
import pandas as pd
pd.set_option('display.max_rows', 1000)

In [3]:
# Then import the data. For now, the csv file should be in the same directory as the notebook.
# Notice that we are importing the date and time info as type 'datetime'

data = pd.read_csv('Urban_Park_Ranger_Animal_Condition_Response.csv',
                   parse_dates = ['Date and Time of initial call', 'Date and time of Ranger response'])
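
To see what `parse_dates` does without needing the real file, here is a minimal sketch that reads a tiny made-up CSV from memory. The column names `Call` and `Borough` and the two rows are invented for illustration; they are not taken from the Urban Park Ranger dataset.

```python
import io

import pandas as pd

# A tiny made-up CSV held in memory (hypothetical columns and values)
csv_text = io.StringIO(
    "Call,Borough\n"
    "2019-05-01 10:30:00,Bronx\n"
    "2019-05-02 14:00:00,Queens\n"
)

# parse_dates asks pandas to convert the listed columns to datetime values
toy = pd.read_csv(csv_text, parse_dates=['Call'])

# Datetime columns expose date parts through the .dt accessor
call_dtype = toy['Call'].dtype
first_call_hour = toy['Call'].dt.hour.iloc[0]
print(call_dtype, first_call_hour)
```

Without `parse_dates`, the `Call` column would be read as plain text and `.dt.hour` would not be available.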

Note: if you want to export the data, use df.to_csv(filename), where df is the name of your dataframe and filename is the name of the file where you want to save the data. The csv file will get created in the same directory as the notebook.
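
As a rough illustration of the export step, the sketch below writes a small made-up dataframe to a csv file and reads it back. The dataframe contents and the file name `toy_export.csv` are assumptions for the example, and a temporary folder is used so that nothing is left behind next to the notebook.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the notebook's dataframe (made-up values, not real records)
df = pd.DataFrame({
    'Borough': ['Bronx', 'Queens'],
    'Species Status': ['Native', 'Invasive'],
})

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical file name inside a temporary folder
    filename = os.path.join(folder, 'toy_export.csv')

    # index=False leaves the row numbers out of the file
    df.to_csv(filename, index=False)

    # Reading the file back shows that the data survived the round trip
    restored = pd.read_csv(filename)

print(restored)
```

In the notebook itself, passing just a file name such as `df.to_csv('my_data.csv', index=False)` would save the file in the notebook's directory.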

## Inspecting and cleaning the data

Now that the data is loaded, we want to get familiar with it. To learn more about what the data looks like we can try any of the following commands:
- data.head( ) - to look at the first 5 rows
- data.tail( ) - to look at the last 5 rows
- data.shape - to get the number of rows and columns
- data.info( ) - to get the names of the columns, how many non-null values are in each column, and the type of data in each column
- data.nunique( ) - to get how many unique values are in each column
- data.max() - to get the highest value in each column
- data.min() - to get the lowest value in each column
- data['col'].value_counts() - to get how many times each unique value appears in a particular column
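
The commands above can be tried on a small made-up dataframe before running them on the real data. This is only a sketch: the `Borough` and `Animals` columns and their values are invented for illustration.

```python
import pandas as pd

# Toy dataframe with one missing Borough value (made-up data)
toy = pd.DataFrame({
    'Borough': ['Bronx', 'Queens', 'Queens', None],
    'Animals': [1, 3, 2, 5],
})

n_rows, n_cols = toy.shape                          # number of rows and columns
non_null = toy.count()                              # non-null values in each column
unique_per_column = toy.nunique()                   # unique values in each column (nulls ignored)
rows_per_borough = toy['Borough'].value_counts()    # how often each borough appears

print(n_rows, n_cols)
print(non_null)
print(unique_per_column)
print(rows_per_borough)
```

Comparing `count()` with the number of rows is a quick way to spot columns with null values, and comparing `count()` with `nunique()` shows which columns have repeat values.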

In [4]:
# Let's figure out how much data we have


In [5]:
# Let's figure out what kind of data we have


In [6]:
# It seems like we have some null values, but do we also have repeat values?


In [7]:
# It seems like all columns have repeat values; going by the names of the columns, that makes sense.
# For example, there are 982 entries in the Borough column but only 5 unique values.
# This is consistent with our knowledge that there are only 5 boroughs in NYC:
# Bronx, Brooklyn, Manhattan, Queens, Staten Island
# Make sure our assumption is correct

By now, we also have a sense of which columns may have null values. It may or may not be OK for a column to have null values. One way to replace null values with some other value is to use data.fillna(x), where x is the value we want instead of the null. It doesn't seem like this is necessary in this case.
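
If we did want to fill in null values, a minimal sketch on made-up data could look like this. The placeholder value `'Unknown'` is an assumption for the example, not something the dataset prescribes.

```python
import pandas as pd

# Toy column with one missing value (made-up data)
toy = pd.DataFrame({'Species Status': ['Native', None, 'Invasive']})

# Count the null values in each column
null_counts = toy.isnull().sum()

# fillna returns a new column; the original dataframe is left unchanged
filled = toy['Species Status'].fillna('Unknown')

print(null_counts)
print(filled)
```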

In addition, the data may not be in 'standard' form. For example, the strings 'yes', 'YES', and 'Yes' might all appear as values in the same column. To verify that the data in a column is in 'standard' form, we can use data['column name'].unique(). For example, what happens when we try data['Species Description'].unique()? What happens when we try data['Species Status'].unique()? To replace values, we can use data['column name'] = data['column name'].replace('yes','Yes') to replace all 'yes' values with 'Yes' values (for example). Note that replace returns a new column, so we assign the result back to the dataframe.
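
Here is a small sketch of that check-and-fix cycle on made-up data. The `Answer` column is hypothetical; the same pattern would apply to a column such as 'Property'.

```python
import pandas as pd

# Toy column with inconsistent spellings of the same answer (made-up data)
toy = pd.DataFrame({'Answer': ['yes', 'YES', 'Yes', 'No']})

# Sorting the unique values puts similar spellings near each other
before = sorted(toy['Answer'].unique())

# A dictionary maps several non-standard values to the standard one in a single call;
# the result is assigned back so the dataframe is actually updated
toy['Answer'] = toy['Answer'].replace({'yes': 'Yes', 'YES': 'Yes'})

after = sorted(toy['Answer'].unique())

print(before)
print(after)
```

Checking the unique values again after the replacement confirms that only the standard spellings remain.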

In [8]:
# Let's find out what the unique values are for "Property". We will sort them first so we can identify misspellings
# hint: use data['Property'].sort_values().unique()

In [9]:
# Seems like we need to make several corrections
# Here is the first one, you do the others
# replace returns a new column, so we assign it back to the dataframe
data['Property'] = data['Property'].replace('Bushwick Inlet', 'Bushwick Inlet Park')

In [10]:
# Check that the corrections took place; there should only be 188 unique values now

In [11]:
# Let's find out what the unique values are for "Species Description". We will sort them first so we can identify misspellings
# hint: do something similar to what you did to find out the unique values for "Property"

In [12]:
# Seems like we need to make several corrections
# Here is the first one, you do the others
# replace returns a new column, so we assign it back to the dataframe
data['Species Description'] = data['Species Description'].replace('Bird/Unspecified Species', 'Bird')

In [13]:
# Check that the corrections took place; there should only be 110 unique values now

In [14]:
# Finally, let's find out what the unique values are for "Species Status".

At this point we could find answers to some basic questions we may have such as:
- NYC is home to what species of animals?
- NYC is home to what types of animals (species status)?
- Where in NYC could we find animals?

## Exploring the data

But what if we wanted to answer more complex questions, such as:
- are there more native or invasive animals living in Manhattan?
- do all types ('species status') of animals live in all 5 boroughs of NYC?
- do different species of animals live in different properties?

To answer more complex questions we will try filtering and grouping the data. The decisions that we make when doing this can be based on our knowledge of the topic, our curiosity to learn from the data, as well as informed by what we learn from the data (or all three!).

### Filtering data : are there more native or invasive animals living in Manhattan?

To filter data, the following commands are useful:

- data[col] - to work only with one column
- data[data.col == 'value'] - to extract rows that meet a particular criterion
- data[(data[col] > value) & (data[col] < value)] - to extract rows that meet more than one criterion
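
The filtering commands above can be sketched on a small made-up dataframe. The column names echo the dataset, but `Animals` and all of the values are invented for illustration. The last line uses `isin`, which is one alternative way to write an "either of these values" filter.

```python
import pandas as pd

# Toy dataframe (made-up data)
toy = pd.DataFrame({
    'Borough': ['Manhattan', 'Manhattan', 'Queens', 'Bronx'],
    'Species Status': ['Native', 'Invasive', 'Native', 'Domestic'],
    'Animals': [1, 4, 2, 6],
})

# Work with only one column
one_column = toy['Borough']

# Keep the rows that meet one criterion
manhattan_rows = toy[toy['Borough'] == 'Manhattan']

# Keep the rows that meet two criteria at once (each criterion needs its own parentheses)
mid_sized = toy[(toy['Animals'] > 1) & (toy['Animals'] < 6)]

# Keep the rows whose value is any one of several options
native_or_invasive = toy[toy['Species Status'].isin(['Native', 'Invasive'])]

print(manhattan_rows)
print(mid_sized)
print(native_or_invasive)
```

Use `&` when every criterion must hold and `|` when any one of them is enough.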

In [15]:
# We'll create a new dataframe with only the data for Manhattan
manhattan = data[data.Borough == 'Manhattan']
manhattan.shape

(475, 22)

In [16]:
# What kinds of species are in Manhattan?

In [17]:
# We only want to work with the species in Manhattan that are Native or Invasive
man_nat_inv = manhattan[(manhattan['Species Status'] == 'Native') | (manhattan['Species Status'] == 'Invasive')]
man_nat_inv.shape

(444, 22)

In [18]:
# To answer our question we want to know how many species are native and how many are invasive. What is the answer?
man_nat_inv['Species Status'].value_counts()

Native      441
Invasive      3
Name: Species Status, dtype: int64

### Grouping data : do all types (species status) live in all 5 boroughs?

To group data, the following commands are useful:
- data[[col1, col2]] - to work with only some columns
- data.groupby(col) - To group the data based on the values in one column
- data.groupby([col1,col2]) - To group the data based on the values in more than one column
- If we want to find out how big each group is, we can use .size() to count the number of rows in each group.
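
As a minimal sketch of these grouping commands, the example below counts rows per group in a small made-up dataframe. The values are invented for illustration and do not reflect the real counts.

```python
import pandas as pd

# Toy dataframe (made-up data)
toy = pd.DataFrame({
    'Borough': ['Manhattan', 'Manhattan', 'Manhattan', 'Queens'],
    'Species Status': ['Native', 'Native', 'Invasive', 'Native'],
})

# Group by one column and count the rows in each group
per_borough = toy.groupby('Borough').size()

# Group by two columns; each (Borough, Species Status) pair becomes its own group
per_pair = toy.groupby(['Borough', 'Species Status']).size()

print(per_borough)
print(per_pair)
```

Pairs that never occur in the data, such as an invasive species in Queens in this toy example, simply do not appear in the result.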

In [19]:
# We'll create a new dataframe with only the data for 'Borough' and 'Species Status'
species_status = data[['Borough','Species Status']]
species_status.shape

(982, 2)

In [20]:
q = species_status.groupby('Borough')['Species Status'].value_counts()
q

In [21]:
# Try switching 'Borough' and 'Species Status' using the same command as above and see what happens.
# What answers can you infer from the analysis?

### Grouping data : do different species live in different properties (boroughs)?

In [22]:
# We'll create a new dataframe with only the data for 'Borough', 'Property' and 'Species Description'
species_desc = data[['Borough','Property','Species Description']]
species_desc.nunique()

Borough                  5
Property               193
Species Description    145
dtype: int64

In [23]:
g = species_desc.groupby(['Borough','Species Description'])['Property'].value_counts()
g

In [24]:
h = species_desc.groupby(['Borough','Property'])['Species Description'].value_counts()
h

In [None]:
# Try switching 'Borough', 'Property' and 'Species Description' using the same command as above and see what happens.
# What answers can you infer from the analysis?