# Brown Scholars Internship 2019-2020 - Urban Wildlife in NYC

New York City is home to many diverse species of wildlife that arrived or existed long before humans settled here.
In October 2016, Mayor Bill de Blasio launched WildlifeNYC, a citywide education and awareness campaign teaching New Yorkers how to live safely and responsibly alongside wild animals including deer, raccoons, and coyotes.

Urban wildlife is any wild animal that lives in an urban environment, such as New York City. Urban wildlife includes birds, mammals, reptiles, fish, and amphibians. Some urban wildlife is native, like eastern grey squirrels, while some is non-native, like mute swans. Domesticated and companion animals, like dogs, exotic pets, and farm animals, are not considered urban wildlife. Domesticated but feral animals, like pigeons and stray cats, are also not considered urban wildlife.

Using information about requests for animal assistance, relocation, and/or rescue completed by the Urban Park Rangers, we will attempt to get a better understanding of the different species of wildlife that call the Big Apple home.

Data source: https://data.cityofnewyork.us/Environment/Urban-Park-Ranger-Animal-Condition-Response/fuhs-xmg2

In [2]:
# First we'll start by importing packages we'll use

import numpy as np
import pandas as pd
pd.set_option('display.max_rows', 1000)

In [3]:
# Then import the data. For now, the csv file should be in the same directory as the notebook.
# Notice that we are importing the date and time info as type 'datetime'

data = pd.read_csv('Urban_Park_Ranger_Animal_Condition_Response.csv',
                   parse_dates = ['Date and Time of initial call', 'Date and time of Ranger response'])

Note: if you want to export the data, use df.to_csv(filename), where df is the name of your dataframe and filename is the name of the file where you want to save the data. The csv file will get created in the same directory as the notebook.
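As a minimal sketch of the export and re-import round trip, the example below writes a small, made-up dataframe to a temporary folder and reads it back. The `sample` dataframe, its values, and the file name `sample_export.csv` are assumptions for illustration only; they are not part of the real dataset.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row sample; the real dataset has many more rows and columns.
sample = pd.DataFrame({
    'Date and Time of initial call': pd.to_datetime(['2019-06-01 09:30', '2019-06-02 14:15']),
    'Borough': ['Manhattan', 'Queens'],
})

# Write to a temporary folder so the example leaves no files behind.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sample_export.csv')
    # index=False keeps the row index out of the csv file
    sample.to_csv(path, index=False)
    # Dates are stored as plain text in a csv, so we parse them again when reading
    restored = pd.read_csv(path, parse_dates=['Date and Time of initial call'])

print(restored.dtypes)
```

Without `parse_dates`, the date column would come back as text rather than as datetimes.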

## Inspecting and cleaning the data

Now that the data is loaded, we want to get familiar with it. To learn more about what the data looks like we can try any of the following commands:
- data.head( ) - to look at the first 5 rows
- data.tail( ) - to look at the last 5 rows
- data.shape - to get the number of rows and columns
- data.info( ) - to get the names of the columns, how many non-null values are in each column, and the type of data in each column
- data.nunique( ) - to get how many unique values are in each column
- data.max() - to get the highest value in each column
- data.min() - to get the lowest value in each column
- data['col'].value_counts() - to get how many times each unique value appears in a particular column
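To see what a few of these commands return, here is a small sketch on a made-up dataframe. The `toy` dataframe and its values are hypothetical and only borrow column names from the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample, not taken from the real dataset
toy = pd.DataFrame({
    'Borough': ['Manhattan', 'Manhattan', 'Queens', 'Bronx'],
    'Species Status': ['Native', 'Native', 'Domestic', np.nan],
    '# of Animals': [1.0, 3.0, 2.0, 1.0],
})

print(toy.shape)                      # (number of rows, number of columns)
print(toy.count())                    # non-null values in each column
print(toy.nunique())                  # distinct values in each column (nulls are skipped)
print(toy['Borough'].value_counts())  # how many rows hold each distinct value
```

Note the difference between the last two: nunique() tells us how many different values there are, while value_counts() tells us how often each one appears.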

In [4]:
# Let's figure out how much data we have
data.shape
data.count()

Date and Time of initial call       982
Date and time of Ranger response    982
Borough                             982
Property                            982
Location                            918
Species Description                 969
Call Source                         982
Species Status                      968
Animal Condition                    758
Duration of Response                982
Age                                 982
Animal Class                        982
311SR Number                        573
Final Ranger Action                 982
# of Animals                        974
PEP Response                        980
Animal Monitored                    980
Rehabilitator                        53
Hours spent monitoring              120
Police Response                     980
ESU Response                        982
ACC Intake Number                   402
dtype: int64

In [5]:
# Let's figure out what kind of data we have
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 982 entries, 0 to 981
Data columns (total 22 columns):
Date and Time of initial call       982 non-null datetime64[ns]
Date and time of Ranger response    982 non-null datetime64[ns]
Borough                             982 non-null object
Property                            982 non-null object
Location                            918 non-null object
Species Description                 969 non-null object
Call Source                         982 non-null object
Species Status                      968 non-null object
Animal Condition                    758 non-null object
Duration of Response                982 non-null float64
Age                                 982 non-null object
Animal Class                        982 non-null object
311SR Number                        573 non-null object
Final Ranger Action                 982 non-null object
# of Animals                        974 non-null float64
PEP Response                        9

In [6]:
# It seems like we have some null values, but do we also have repeat values?

data.nunique()

Date and Time of initial call       903
Date and time of Ranger response    943
Borough                               5
Property                            194
Location                            867
Species Description                 145
Call Source                           6
Species Status                        4
Animal Condition                      4
Duration of Response                 29
Age                                   7
Animal Class                         18
311SR Number                        533
Final Ranger Action                   7
# of Animals                         10
PEP Response                          2
Animal Monitored                      2
Rehabilitator                        10
Hours spent monitoring               14
Police Response                       2
ESU Response                          2
ACC Intake Number                   399
dtype: int64

In [7]:
# It seems like all columns have repeat values; going by the names of the columns, that makes sense.
# For example, there are 982 entries in the Borough column but only 5 unique values.
# This is consistent with our knowledge that there are only 5 boroughs in NYC:
# Bronx, Brooklyn, Manhattan, Queens, Staten Island
# Let's make sure our assumption is correct
data["Borough"].value_counts()

Manhattan        475
Brooklyn         218
Queens           137
Staten Island     84
Bronx             68
Name: Borough, dtype: int64

By now, we also have a sense of which columns may have null values. Depending on the column, null values may or may not be acceptable. One way to replace null values with some other value is data.fillna(x), where x is the value we want instead of the null. That doesn't seem necessary in this case.
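If we did want to fill nulls, a minimal sketch could look like the following. The `conditions` dataframe, the species and condition labels in it, and the placeholder 'Unknown' are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value
conditions = pd.DataFrame({
    'Species Description': ['Raccoon', 'Red-tailed Hawk', 'Squirrel'],
    'Animal Condition': ['Healthy', np.nan, 'Injured'],
})

# Count the nulls in each column before deciding what to do with them
print(conditions.isnull().sum())

# fillna returns a new column, so we work on a copy and assign the result back
filled = conditions.copy()
filled['Animal Condition'] = filled['Animal Condition'].fillna('Unknown')
print(filled)
```

Working on a copy keeps the original values available in case the placeholder turns out to be a poor choice.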

In addition, the data may not be in 'standard' form: for example, the strings 'yes', 'YES', and 'Yes' may all appear as values in the same column. To verify that the data in a column is in 'standard' form, we can use data['column_name'].unique(). For example, what happens when we try data['Species Description'].unique()? What happens when we try data['Species Status'].unique()? To replace values, we can use data['column_name'].replace('yes','Yes') to replace all 'yes' values with 'Yes' values (for example). Note that replace returns a new column, so the result has to be assigned back to data['column_name'] to be kept.
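The yes/YES/Yes situation described above can be sketched on a made-up column. The `answers` dataframe and its `flag` column are hypothetical and do not come from the real dataset.

```python
import pandas as pd

# Hypothetical column with inconsistent spellings of the same two answers
answers = pd.DataFrame({'flag': ['yes', 'YES', 'Yes', 'No', 'no']})

# Step 1: look at the distinct values to spot the inconsistencies
print(answers['flag'].unique())

# Step 2: map every variant onto one standard spelling and assign the result back
answers['flag'] = answers['flag'].replace({'yes': 'Yes', 'YES': 'Yes', 'no': 'No'})

# Step 3: check again, only the standard spellings should remain
print(answers['flag'].unique())
```

Passing a dictionary to replace lets us fix several variants in a single step.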

In [8]:
# Let's find out what the unique values are for "Property". We will sort them first so we can identify misspellings
# hint: use data['Property'].sort_values().unique()
data["Property"].sort_values().unique()

array(['5 East 102nd St', '851 Fairmont Pl', 'Abingdon Square',
       'Alley Pond Park', 'Alley Pond Park Tennis Courts Near Court 16',
       'Allison Pond Park', 'Altamont House', 'Annadale Green',
       'Astoria Park', 'Baisley Pond Park', 'Bartel-Pritchard Square',
       'Battery Park City', 'Bedford Playground', 'Bellevue South Park',
       'Blood Root Valley', 'Bloomingdale Park', 'Blue Heron Park',
       'Bowling Green', 'Bowne Park', 'Bradys Pond Park', 'Bronx Park',
       'Brooklyn Bridge Park', 'Brookville Park', 'Bryant Park',
       'Bushwick Inlet', 'Bushwick Inlet Park', 'Cadman Plaza Park',
       'Calvert Vaux Park', 'Cambria Playground', 'Canarsie Park',
       'Captain Tilly Park', 'Carl Schurz Park', 'Cenral Park',
       'Centrail Park', 'Central  Park', 'Central Park', 'Chelsea Park',
       'Churchill School', 'Clove Lakes Park', 'Coffey Park',
       'College Point Park', 'Colonel Charles Young Playground',
       'Concrete Plant Park', 'Coney Island Beach 

In [9]:
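# Seems like we need to make several corrections
# Here is the first one, you do the others
# Assign the result back to the column; inplace=True on a selected column may not update the dataframe in newer pandas versions
data['Property'] = data['Property'].replace('Bushwick Inlet', 'Bushwick Inlet Park')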

In [10]:
# Check that the corrections took place; once all of them are made there should be only 188 unique values
data['Property'].sort_values().unique()

array(['5 East 102nd St', '851 Fairmont Pl', 'Abingdon Square',
       'Alley Pond Park', 'Alley Pond Park Tennis Courts Near Court 16',
       'Allison Pond Park', 'Altamont House', 'Annadale Green',
       'Astoria Park', 'Baisley Pond Park', 'Bartel-Pritchard Square',
       'Battery Park City', 'Bedford Playground', 'Bellevue South Park',
       'Blood Root Valley', 'Bloomingdale Park', 'Blue Heron Park',
       'Bowling Green', 'Bowne Park', 'Bradys Pond Park', 'Bronx Park',
       'Brooklyn Bridge Park', 'Brookville Park', 'Bryant Park',
       'Bushwick Inlet Park', 'Cadman Plaza Park', 'Calvert Vaux Park',
       'Cambria Playground', 'Canarsie Park', 'Captain Tilly Park',
       'Carl Schurz Park', 'Cenral Park', 'Centrail Park',
       'Central  Park', 'Central Park', 'Chelsea Park',
       'Churchill School', 'Clove Lakes Park', 'Coffey Park',
       'College Point Park', 'Colonel Charles Young Playground',
       'Concrete Plant Park', 'Coney Island Beach & Boardwalk',


In [11]:
# Let's find out what the unique values are for "Species Description". We will sort them first so we can identify misspellings
# hint: do something similar to what you did to find out the unique values for "Property"
data['Species Description'].sort_values().unique()

array(['2 black back gulls and 1 pigeon', 'Alligator snapping turtle',
       'American Goldfinch', 'American Oystercatcher', 'American Robin',
       'Banded Rock Pigeon', 'Bat', 'Bearded dragon', 'Big Brown Bat',
       'Bird/Unspecified Species', 'Black Racer Snake', 'Black Skimmer',
       'Black backed gull', 'Blue Jay', 'Boa Constrictor Snake', 'Brant',
       'Brant Goose', 'Canada Goose', 'Canada goose', 'Cat', 'Chicken',
       'Chinese Silky Chicken', 'Cockatiel', 'Common Snapping Turtle',
       "Cooper's Hawk", 'Coopers Hawk', 'Cormorant', 'Corn Snake',
       'Coyote', 'Deer', 'Dog', 'Dolphin', 'Domestic Dove',
       'Domestic Duck', 'Domestic Rabbit', 'Double crested cormorant',
       'Dove', 'Downy woodpecker', 'Duck', 'Duck (species unknown)',
       'Eastern Gray Squirrel', 'Egret', 'Falcon',
       'Fledgling (possibly Starling)', 'Freshwater Fish and Turtles',
       'Frog', 'Gerbil', 'Goose - White Pygmy', 'Gray squirrel',
       'Green Heron', 'Groundhog', 'Guine

In [14]:
# Seems like we need to make several corrections
# Here are the first few, you do the others
# The dictionary maps each value to fix onto its replacement
species_fixes = {
    'Bird/Unspecified Species': 'Bird',
    'RACCOON': 'Raccoon',
    'Racoon': 'Raccoon',
    'squirrel': 'Squirrel',
    'Squirrells': 'Squirrel',
    'Turtle/ Unspecified species': 'Turtle',
    'Common Snapping Turtle': 'Turtle',
}
# Assign the result back to the column so the corrections are kept
data['Species Description'] = data['Species Description'].replace(species_fixes)


In [15]:
# Check that the corrections took place; once all of them are made there should be only 110 unique values
data['Species Description'].sort_values().unique()

array(['2 black back gulls and 1 pigeon', 'Alligator snapping turtle',
       'American Goldfinch', 'American Oystercatcher', 'American Robin',
       'Banded Rock Pigeon', 'Bat', 'Bearded dragon', 'Big Brown Bat',
       'Bird', 'Black Racer Snake', 'Black Skimmer', 'Black backed gull',
       'Blue Jay', 'Boa Constrictor Snake', 'Brant', 'Brant Goose',
       'Canada Goose', 'Canada goose', 'Cat', 'Chicken',
       'Chinese Silky Chicken', 'Cockatiel', "Cooper's Hawk",
       'Coopers Hawk', 'Cormorant', 'Corn Snake', 'Coyote', 'Deer', 'Dog',
       'Dolphin', 'Domestic Dove', 'Domestic Duck', 'Domestic Rabbit',
       'Double crested cormorant', 'Dove', 'Downy woodpecker', 'Duck',
       'Duck (species unknown)', 'Eastern Gray Squirrel', 'Egret',
       'Falcon', 'Fledgling (possibly Starling)',
       'Freshwater Fish and Turtles', 'Frog', 'Gerbil',
       'Goose - White Pygmy', 'Gray squirrel', 'Green Heron', 'Groundhog',
       'Guinea Pigs', 'Guineafowl', 'Gull', 'Harbor Porpois

In [16]:
# Finally, let's find out what the unique values are for "Species Status".
data['Species Status'].sort_values().unique()

array(['Domestic', 'Exotic', 'Invasive', 'Native', nan], dtype=object)

At this point we could find answers to some basic questions we may have such as:
- NYC is home to what species of animals?
- NYC is home to what types of animals (species status)?
- Where in NYC could we find animals?

## Exploring the data

But what if we wanted to answer more complex questions, such as:
- Are there more native or invasive animals living in Manhattan?
- Do all types ('species status') of animals live in all 5 boroughs of NYC?
- Do different species of animals live in different properties?

To answer more complex questions we will try filtering and grouping the data. The decisions that we make when doing this can be based on our knowledge of the topic, our curiosity to learn from the data, as well as informed by what we learn from the data (or all three!).

### Filtering data : are there more native or invasive animals living in Manhattan?

To filter data, the following commands are useful:

- data[col] - to work only with one column
- data[data.col == 'value'] - to extract rows that meet a particular criterion
- data[(data[col] > value) & (data[col] < value)] - to extract rows that meet more than one criterion
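The three filtering patterns above can be tried on a small, made-up dataframe. The `calls` dataframe and its values are hypothetical and only borrow column names from the real data.

```python
import pandas as pd

# Hypothetical sample, not taken from the real dataset
calls = pd.DataFrame({
    'Borough': ['Manhattan', 'Manhattan', 'Queens', 'Brooklyn'],
    'Species Status': ['Native', 'Invasive', 'Native', 'Domestic'],
    'Duration of Response': [0.5, 2.0, 1.0, 3.5],
})

# One column only
one_column = calls['Borough']

# Rows that meet a single criterion
queens = calls[calls['Borough'] == 'Queens']

# Rows that meet two criteria at once; each condition needs its own parentheses
mid_length = calls[(calls['Duration of Response'] > 0.75) & (calls['Duration of Response'] < 3.0)]

# Rows whose value is any one of several options (same idea as chaining conditions with |)
native_or_invasive = calls[calls['Species Status'].isin(['Native', 'Invasive'])]

print(queens)
print(mid_length)
print(native_or_invasive)
```

The bracket form calls['Duration of Response'] is used here because the dot form (data.col) only works for column names without spaces.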

In [17]:
# We'll create a new dataframe with only the data for Manhattan
manhattan = data[data.Borough == 'Manhattan']
manhattan.shape

(475, 22)

In [21]:
# What kinds of species are in Manhattan?
manhattan['Species Status'].unique()

array(['Native', 'Invasive', nan, 'Domestic', 'Exotic'], dtype=object)

In [22]:
# We only want to work with the species in Manhattan that are Native or Invasive
man_nat_inv = manhattan[(manhattan['Species Status'] == 'Native') | (manhattan['Species Status'] == 'Invasive')]
man_nat_inv.shape

(444, 22)

In [23]:
# To answer our question we want to know how many records are native and how many are invasive. What is the answer?
man_nat_inv['Species Status'].value_counts()

Native      441
Invasive      3
Name: Species Status, dtype: int64

### Grouping data : do all types (species status) live in all 5 boroughs?

To group data, the following commands are useful:
- data[[col1, col2]] - to work with only some columns
- data.groupby(col) - To group the data based on the values in one column
- data.groupby([col1,col2]) - To group the data based on the values in more than one column
- data.groupby(col).size() - to count the number of rows in each group, if we want to find out how big each group is
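Here is a minimal sketch of grouping by one column and by two columns, using a made-up dataframe. The `calls` dataframe and its values are hypothetical.

```python
import pandas as pd

# Hypothetical sample, not taken from the real dataset
calls = pd.DataFrame({
    'Borough': ['Manhattan', 'Manhattan', 'Manhattan', 'Queens', 'Queens'],
    'Species Status': ['Native', 'Native', 'Invasive', 'Native', 'Domestic'],
})

# Number of rows for each borough
per_borough = calls.groupby('Borough').size()
print(per_borough)

# Number of rows for each (borough, species status) pair
per_pair = calls.groupby(['Borough', 'Species Status']).size()
print(per_pair)
```

Pairs that never occur in the data, such as Queens and Invasive in this sample, simply do not show up in the grouped result.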

In [25]:
# We'll create a new dataframe with only the data for 'Borough' and 'Species Status'
species_status = data[['Borough','Species Status']]
species_status.shape
species_status.head()

         Borough Species Status
0      Manhattan         Native
1          Bronx         Native
2       Brooklyn         Exotic
3       Brooklyn       Domestic
4  Staten Island       Invasive


In [26]:
q = species_status.groupby('Borough')['Species Status'].value_counts()
q

Borough        Species Status
Bronx          Native             51
               Domestic           10
               Invasive            3
               Exotic              2
Brooklyn       Native            191
               Domestic           15
               Invasive            7
               Exotic              2
Manhattan      Native            441
               Domestic           18
               Exotic              7
               Invasive            3
Queens         Native            111
               Domestic           23
               Exotic              2
               Invasive            1
Staten Island  Native             71
               Domestic            9
               Invasive            1
Name: Species Status, dtype: int64

In [28]:
# Try switching 'Borough' and 'Species Status' using the same command as above and see what happens.
# What answers can you infer from the analysis?
S = species_status.groupby('Species Status')['Borough'].value_counts()
S

Species Status  Borough      
Domestic        Queens            23
                Manhattan         18
                Brooklyn          15
                Bronx             10
                Staten Island      9
Exotic          Manhattan          7
                Bronx              2
                Brooklyn           2
                Queens             2
Invasive        Brooklyn           7
                Bronx              3
                Manhattan          3
                Queens             1
                Staten Island      1
Native          Manhattan        441
                Brooklyn         191
                Queens           111
                Staten Island     71
                Bronx             51
Name: Borough, dtype: int64
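One way to check whether every species status appears in every borough is to lay the counts out as a table with one row per borough and one column per status. The sketch below retypes the counts from the grouped output above by hand; the names `rows`, `counts`, and `table` are illustrative only.

```python
import pandas as pd

# Counts copied from the grouped output above: (borough, species status, count)
rows = [
    ('Bronx', 'Native', 51), ('Bronx', 'Domestic', 10), ('Bronx', 'Invasive', 3), ('Bronx', 'Exotic', 2),
    ('Brooklyn', 'Native', 191), ('Brooklyn', 'Domestic', 15), ('Brooklyn', 'Invasive', 7), ('Brooklyn', 'Exotic', 2),
    ('Manhattan', 'Native', 441), ('Manhattan', 'Domestic', 18), ('Manhattan', 'Exotic', 7), ('Manhattan', 'Invasive', 3),
    ('Queens', 'Native', 111), ('Queens', 'Domestic', 23), ('Queens', 'Exotic', 2), ('Queens', 'Invasive', 1),
    ('Staten Island', 'Native', 71), ('Staten Island', 'Domestic', 9), ('Staten Island', 'Invasive', 1),
]
counts = pd.DataFrame(rows, columns=['Borough', 'Species Status', 'Calls'])

# One row per borough, one column per status; missing combinations become 0
table = counts.pivot(index='Borough', columns='Species Status', values='Calls').fillna(0).astype(int)
print(table)

# Boroughs where at least one status never appears
boroughs_missing_a_status = table.index[(table == 0).any(axis=1)].tolist()
print(boroughs_missing_a_status)
```

With the full dataframe loaded, pd.crosstab(data['Borough'], data['Species Status']) should produce the same layout directly from the rows.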

### Grouping data : do different species live in different properties (boroughs)?

In [29]:
# We'll create a new dataframe with only the data for 'Borough', 'Property' and 'Species Description'
species_desc = data[['Borough','Property','Species Description']]
species_desc.nunique()

Borough                  5
Property               193
Species Description    139
dtype: int64

In [31]:
g = species_desc.groupby(['Borough','Species Description'])['Property'].value_counts()
#g

In [36]:
h = species_desc.groupby(['Borough','Property'])['Species Description'].value_counts()
#h

In [40]:
# Try switching 'Borough', 'Property' and 'Species Description' using the same command as above and see what happens.
o = species_desc.groupby(['Property','Borough'])['Species Description'].value_counts()
u = species_desc.groupby(['Borough','Property'])['Species Description'].value_counts()
#o
#u
# What answers can you infer from the analysis?
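
The grouped results above can get long. As one possible way to summarize them, the sketch below counts how many different species were reported in each property, and in how many properties each species was reported. The `sightings` dataframe and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample, not taken from the real dataset
sightings = pd.DataFrame({
    'Borough': ['Manhattan', 'Manhattan', 'Manhattan', 'Brooklyn', 'Brooklyn'],
    'Property': ['Central Park', 'Central Park', 'Bryant Park', 'Canarsie Park', 'Canarsie Park'],
    'Species Description': ['Raccoon', 'Squirrel', 'Raccoon', 'Canada Goose', 'Canada Goose'],
})

# How many different species were reported in each property?
species_per_property = sightings.groupby(['Borough', 'Property'])['Species Description'].nunique()
print(species_per_property)

# In how many different properties was each species reported?
properties_per_species = sightings.groupby('Species Description')['Property'].nunique()
print(properties_per_species)
```

A species reported in only one property, or a property with only one species, can be a starting point for looking more closely at the data.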