# Brown Scholars Internship 2019-2020 - Urban Wildlife in NYC

New York City is home to many diverse species of wildlife that arrived or existed long before humans settled here.
In October 2016, Mayor Bill de Blasio launched WildlifeNYC, a citywide education and awareness campaign teaching New Yorkers how to live safely and responsibly alongside wild animals including deer, raccoons, and coyotes.

Urban wildlife is any wild animal that lives in an urban environment, such as New York City. Urban wildlife includes birds, mammals, reptiles, fish, and amphibians. Some urban wildlife is native, like eastern grey squirrels, while some is non-native, like mute swans. Domesticated and companion animals, like dogs, exotic pets, and farm animals, are not considered urban wildlife. Domesticated but feral animals, like pigeons and stray cats, are also not considered urban wildlife.

Data source: https://data.cityofnewyork.us/Environment/Urban-Park-Ranger-Animal-Condition-Response/fuhs-xmg2

First, we'll import the packages we'll use.

In [1]:
import pandas as pd

Then we import the data. For now, the csv file should be in the same directory as the notebook. Notice that we are importing the date and time columns as type 'datetime'.

In [2]:
data = pd.read_csv('Urban_Park_Ranger_Animal_Condition_Response.csv',
                   parse_dates = ['Date and Time of initial call', 'Date and time of Ranger response'])
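To see what `parse_dates` does, here is a minimal sketch that reads a tiny made-up CSV from a string instead of the real file. The two rows and the shortened column list are illustrative assumptions, not the actual dataset.

```python
import io
import pandas as pd

# Hypothetical two-row sample; the real file has many more rows and columns.
sample_csv = io.StringIO(
    "Date and Time of initial call,Borough\n"
    "2019-06-12 09:20:00,Manhattan\n"
    "2019-06-11 16:15:00,Bronx\n"
)

sample = pd.read_csv(sample_csv, parse_dates=['Date and Time of initial call'])

# The parsed column holds datetimes, so date parts are available through .dt
call_times = sample['Date and Time of initial call']
is_datetime = pd.api.types.is_datetime64_any_dtype(call_times)
call_hours = call_times.dt.hour.tolist()
print(is_datetime, call_hours)
```

Without `parse_dates`, the same column would be read as plain text, and `.dt` accessors such as `.dt.hour` would not be available.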

Note: if you want to export the data, use df.to_csv(filename), where df is the name of your dataframe and filename is the name of the file where you want to save the data. The csv file will get created in the same directory as the notebook.
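As a small sketch of the export step, the example below writes a made-up dataframe to a csv file and reads it back. The dataframe contents and the file name `example_export.csv` are assumptions for illustration, and the file is written to a temporary folder rather than the notebook directory so nothing is left behind.

```python
import os
import tempfile
import pandas as pd

# Hypothetical dataframe standing in for the ranger data
df = pd.DataFrame({'Borough': ['Manhattan', 'Bronx'], '# of Animals': [1.0, 2.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = os.path.join(tmp_dir, 'example_export.csv')
    df.to_csv(filename, index=False)   # index=False leaves out the row-number column
    reloaded = pd.read_csv(filename)

print(reloaded.columns.tolist())
```

Passing `index=False` is a design choice: without it, the row numbers are saved as an extra unnamed column that reappears the next time the file is loaded.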

#### Step 1: Viewing and inspecting the data

Now that the data is loaded, let's check it out. To learn more about what the data looks like we can try the following commands:
- data.head( ) - to look at the first 5 rows
- data.tail( ) - to look at the last 5 rows
- data.shape - to get the number of rows and columns
- data.info( ) - to get the names of the columns, how many non-null values are in each column, and the type of data in each column
- data.nunique( ) - to get how many unique values are in each column
- data.max() - to get the highest value in each column
- data.min() - to get the lowest value in each column
- data['col'].value_counts() - to get how many times each unique value appears in a particular column
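The difference between `nunique()` and `value_counts()` is easy to mix up, so here is a minimal sketch on a made-up five-row column (the values are assumptions, not taken from the dataset).

```python
import pandas as pd

# Hypothetical toy column
toy = pd.DataFrame({'Borough': ['Manhattan', 'Bronx', 'Manhattan', 'Queens', 'Manhattan']})

n_unique = toy['Borough'].nunique()      # how many distinct values there are
counts = toy['Borough'].value_counts()   # how many rows each distinct value has, most common first

print(n_unique)
print(counts)
```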

In [3]:
data.head()

Unnamed: 0,Date and Time of initial call,Date and time of Ranger response,Borough,Property,Location,Species Description,Call Source,Species Status,Animal Condition,Duration of Response,...,311SR Number,Final Ranger Action,# of Animals,PEP Response,Animal Monitored,Rehabilitator,Hours spent monitoring,Police Response,ESU Response,ACC Intake Number
0,2019-06-12 09:20:00,2019-06-12 09:20:00,Manhattan,Washingtom Square Park,on Sidewalk accross from the park near 10 Wash...,Red-tailed Hawk,Other,Native,,0.5,...,,Advised/Educated others,1.0,False,False,,,False,False,
1,2019-06-11 16:15:00,2019-06-11 16:20:00,Bronx,Van Cortlandt Park,Adjacent to VC Golf House,Canada Goose,Public,Native,Injured,0.5,...,1-1-1733837211,Unfounded,1.0,False,False,,,False,False,
2,2019-06-10 13:00:00,2019-06-10 13:30:00,Brooklyn,Irving Square Park,Northwest corner of the park,Parrot,Public,Exotic,,1.5,...,,Unfounded,1.0,False,False,,,False,False,
3,2019-06-09 09:30:00,2019-06-09 10:00:00,Brooklyn,Parade Ground,Prospect Park Parade Grounds near Tennis Center,Chicken,Central,Domestic,Healthy,3.0,...,1-1-1730643971,ACC,1.0,False,False,,,False,False,65352
4,2019-06-09 12:50:00,2019-06-09 12:55:00,Staten Island,Silver Lake Park,Bridge,Red-Eared Slider,Employee,Invasive,Injured,2.0,...,1-1-1724490913,ACC,2.0,True,False,,,False,False,65379 65380


In [4]:
data.tail()

Unnamed: 0,Date and Time of initial call,Date and time of Ranger response,Borough,Property,Location,Species Description,Call Source,Species Status,Animal Condition,Duration of Response,...,311SR Number,Final Ranger Action,# of Animals,PEP Response,Animal Monitored,Rehabilitator,Hours spent monitoring,Police Response,ESU Response,ACC Intake Number
977,2018-06-05 00:00:00,2018-06-05 00:01:00,Manhattan,Abingdon Square,,raccoon,Central,Native,Healthy,0.75,...,,Relocated/Condition Corrected,1.0,False,True,,1.0,False,False,
978,2018-06-01 12:00:00,2018-06-01 12:30:00,Manhattan,Central Park,,RACCOON,Employee,Native,Injured,1.25,...,,ACC,1.0,False,False,,,False,False,36061.0
979,2018-05-16 09:00:00,2018-05-17 10:10:00,Manhattan,Morningside Park,,Raccoon,Employee,Native,DOA,1.5,...,,ACC,2.0,False,True,,0.5,False,False,28316.0
980,2018-05-02 09:30:00,2018-05-02 12:00:00,Manhattan,Central Park,,raccoon,Public,Native,Healthy,0.75,...,,Unfounded,1.0,,,,,,False,
981,2018-05-07 13:30:00,2018-05-07 13:40:00,Manhattan,Central Park,,Red Tailed Hawk,Employee,Native,DOA,1.0,...,,Submitted for DEC Testing,1.0,,,,,,False,


In [6]:
data.shape

(982, 22)

In [11]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 982 entries, 0 to 981
Data columns (total 22 columns):
Date and Time of initial call       982 non-null datetime64[ns]
Date and time of Ranger response    982 non-null datetime64[ns]
Borough                             982 non-null object
Property                            982 non-null object
Location                            918 non-null object
Species Description                 969 non-null object
Call Source                         982 non-null object
Species Status                      968 non-null object
Animal Condition                    758 non-null object
Duration of Response                982 non-null float64
Age                                 982 non-null object
Animal Class                        982 non-null object
311SR Number                        573 non-null object
Final Ranger Action                 982 non-null object
# of Animals                        974 non-null float64
PEP Response                        9

In [12]:
data.min()

Date and Time of initial call       2018-05-02 09:30:00
Date and time of Ranger response    2018-05-02 12:00:00
Borough                                           Bronx
Property                                5 East 102nd St
Call Source                                     Central
Duration of Response                                  0
Age                                               Adult
Animal Class                                      Birds
Final Ranger Action                                 ACC
# of Animals                                          0
PEP Response                                      False
Animal Monitored                                  False
Hours spent monitoring                             0.15
Police Response                                   False
ESU Response                                      False
dtype: object

In [19]:
data['Borough'].value_counts()

Manhattan        475
Brooklyn         218
Queens           137
Staten Island     84
Bronx             68
Name: Borough, dtype: int64

In [17]:
data.max()

Date and Time of initial call                    2019-06-12 09:20:00
Date and time of Ranger response                 2019-06-12 09:20:00
Borough                                                Staten Island
Property                                            private property
Call Source                                                   Public
Duration of Response                                              21
Age                                                  Juvenile;#Adult
Animal Class                        Terrestrial Reptile or Amphibian
Final Ranger Action                                        Unfounded
# of Animals                                                      11
PEP Response                                                    True
Animal Monitored                                                True
Hours spent monitoring                                             4
Police Response                                                 True
ESU Response                      

#### Step 2: Cleaning the data

By now, we should have a sense of which columns have null values. Whether null values are acceptable depends on the column. One way to replace null values with some other value is data.fillna(x), where x is the value we want instead of the null. Note that fillna returns a new object, so the result has to be assigned back to keep the change.
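A minimal sketch of replacing nulls, using a made-up column. The placeholder value `'Unknown'` is an assumption chosen for illustration; the right replacement depends on the column and the question being asked.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with two missing values
toy = pd.DataFrame({'Animal Condition': ['Injured', np.nan, 'Healthy', np.nan]})

nulls_before = int(toy['Animal Condition'].isna().sum())

# Assign the filled column back so the dataframe actually changes
toy['Animal Condition'] = toy['Animal Condition'].fillna('Unknown')

nulls_after = int(toy['Animal Condition'].isna().sum())
print(nulls_before, nulls_after)
```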

In addition, the data may not be in 'standard' form; for example, the strings 'yes', 'YES', and 'Yes' may all appear as values in the same column. To verify that the data in a column is in 'standard' form, we can use data['column_name'].unique(). For example, what happens when we try data['Species Description'].unique()? What happens when we try data['Species Status'].unique()? To replace values, we can use data['column name'] = data['column name'].replace('yes', 'Yes') to replace all 'yes' values with 'Yes' values (for example).
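One possible way to standardize a text column is sketched below on a made-up list of species names. Capitalizing only the first letter is just one convention among several, and the single misspelling handled here is an example rather than a complete cleanup.

```python
import pandas as pd

# Hypothetical toy column with inconsistent capitalization and one misspelling
toy = pd.DataFrame({'Species Description': [
    'raccoon', 'RACCOON', 'Raccoon', 'Rd-tailed Hawk', 'Red-tailed Hawk']})

species = toy['Species Description']
species = species.str.strip().str.capitalize()                  # trim spaces, unify capitalization
species = species.replace('Rd-tailed hawk', 'Red-tailed hawk')  # fix a known misspelling
toy['Species Description'] = species

unique_species = sorted(toy['Species Description'].unique())
print(unique_species)
```

After the cleanup, five different spellings collapse into two species names, which makes later counts and groupings more reliable.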

In [22]:
data['Animal Class'].unique()

array(['Raptors', 'Birds', 'Domestic', 'Terrestrial Reptile or Amphibian',
       'Small Mammals-RVS', 'Rare, Endangered, Dangerous',
       'Small Mammals-non RVS', 'Marine Reptiles',
       'Fish-numerous quantity;#Terrestrial Reptile or Amphibian',
       'Domestic;#Birds', 'Marine Mammals-seals only',
       'Marine Mammals-whales, Dolphin', 'Deer', 'Coyotes',
       'Domestic;#Small Mammals-RVS', 'Fish-numerous quantity',
       'Domestic;#Small Mammals-non RVS', 'Small Mammals-RVS;#Raptors'],
      dtype=object)

In [23]:
data['Species Description'].unique()

array(['Red-tailed Hawk', 'Canada Goose', 'Parrot', 'Chicken',
       'Red-Eared Slider', 'Rd-tailed Hawk', 'Cormorant', 'Raccoon',
       'Rooster', 'Dove', 'Snapping turtle', 'Monk Parakeet',
       'Mallard Duck', 'Gull', 'Snake', 'Snapping Turtle', 'Opossum',
       'Red-eyed Vireo', 'turtle', 'Squirrel', 'American Robin',
       'Domestic Duck', 'Pigeon', 'Bat', 'sparrow', nan,
       'Domestic Rabbit', 'Fledgling (possibly Starling)', 'Guineafowl',
       'American Goldfinch', 'Bird/Unspecified Species',
       'Freshwater Fish and Turtles', 'Turtle/ Unspecified species',
       'Cat', 'silver-haired bat', 'Seal', 'Egret', 'Skunk',
       'Painted Turtle', 'Northern Gannet', 'Parakeet', 'Cockatiel',
       'Saw whet owl', 'Dog', 'Robin', 'Big Brown Bat', 'Corn Snake',
       'Mute Swan', 'Coopers Hawk', 'Harbor Porpoise',
       'Boa Constrictor Snake', 'Turkey', 'Deer', 'Dolphin', 'Coyote',
       'Swan', 'Harbor Seal', 'Frog', 'Woodcock', 'Brant Goose', 'Falcon',
       "Cooper

In [24]:
data['Species Status'].unique()

array(['Native', 'Exotic', 'Domestic', 'Invasive', nan], dtype=object)

In [20]:
data['Location'].fillna(0)

0      on Sidewalk accross from the park near 10 Wash...
1                              Adjacent to VC Golf House
2                           Northwest corner of the park
3        Prospect Park Parade Grounds near Tennis Center
4                                                 Bridge
5                          Southeast Section of the park
6       Under the nest near Avenue B and East 9th Street
7      Hawk Nest (Near Avenue B and East 9th Entrance...
8                                       Tide Gate Bridge
9                                       Tide Gate Bridge
10                          144th Street and 20th Avenue
11     outside of playground in the brush close to pe...
12            Inside Miricale Gardens (community Garden)
13                   Rockaway Beach 97th St. PEP Command
14          Dyker 14th Ballfields; 40.613081, -74.014714
15     Behind the Met Museum at East Drive and East 84th
16                                     Perimeter of park
17        40.792282, -73.959797

#### Step 3: Exploring the data

Once our data is in the shape that we need it to be, we can start exploring it. To learn more about what the data can tell us, we'll try filtering and grouping it, computing some basic statistics, and making graphs. The decisions we make along the way can be based on our knowledge of the topic, our curiosity to learn from the data, or what we have already learned from the data (or all three!).

##### Filtering data

To filter data, the following commands are useful:

- data[col] - to work only with one column
- data[data[col] > 7] - to extract rows that meet a particular criterion
- data[(data[col] > 0.5) & (data[col] < 0.7)] - to extract rows that meet more than one criterion
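The filtering commands above all rely on boolean masks. Here is a minimal sketch on a made-up four-row dataframe (the values are assumptions for illustration) showing how a mask is built and combined.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'Borough': ['Manhattan', 'Bronx', 'Brooklyn', 'Manhattan'],
    'Duration of Response': [0.5, 1.5, 3.0, 2.0],
})

duration = toy['Duration of Response']

# Each comparison produces a Series of True/False values (a boolean mask);
# & means "and", and each comparison needs its own parentheses
mask = (duration > 1.0) & (duration < 3.0)
between_1_and_3 = toy[mask]

# Masks on different columns can be combined too; | means "or"
manhattan_or_long = toy[(toy['Borough'] == 'Manhattan') | (duration >= 3.0)]

print(between_1_and_3)
print(manhattan_or_long)
```

The parentheses matter because `&` and `|` bind more tightly than `>` and `<` in Python, so leaving them out usually produces an error.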

In [44]:
data[(data['Duration of Response'] > 1.0) & (data['Duration of Response'] < 3.0)]

Unnamed: 0,Date and Time of initial call,Date and time of Ranger response,Borough,Property,Location,Species Description,Call Source,Species Status,Animal Condition,Duration of Response,...,311SR Number,Final Ranger Action,# of Animals,PEP Response,Animal Monitored,Rehabilitator,Hours spent monitoring,Police Response,ESU Response,ACC Intake Number
2,2019-06-10 13:00:00,2019-06-10 13:30:00,Brooklyn,Irving Square Park,Northwest corner of the park,Parrot,Public,Exotic,,1.50,...,,Unfounded,1.0,False,False,,,False,False,
4,2019-06-09 12:50:00,2019-06-09 12:55:00,Staten Island,Silver Lake Park,Bridge,Red-Eared Slider,Employee,Invasive,Injured,2.00,...,1-1-1724490913,ACC,2.0,True,False,,,False,False,65379 65380
5,2019-06-08 11:00:00,2019-06-08 11:15:00,Manhattan,Washington Square Park,Southeast Section of the park,Rd-tailed Hawk,Public,Native,Healthy,1.50,...,,Monitored Animal,1.0,False,True,,1.50,False,False,
6,2019-06-08 08:44:00,2019-06-08 09:20:00,Manhattan,Tompkins Square Park,Under the nest near Avenue B and East 9th Street,Red-tailed Hawk,Public,Native,,1.50,...,1-1-1727159802,Rehabilitator,1.0,False,False,Animal Medical Center,,False,False,
7,2019-06-06 10:00:00,2019-06-07 12:00:00,Manhattan,Tompkins Square Park,Hawk Nest (Near Avenue B and East 9th Entrance...,Red-tailed Hawk,Public,Native,Unhealthy,2.00,...,,Monitored Animal,1.0,False,True,,1.00,False,False,
8,2019-06-07 15:15:00,2019-06-08 09:30:00,Queens,Flushing Meadows Corona Park,Tide Gate Bridge,Cormorant,Employee,Native,Injured,1.50,...,,Unfounded,1.0,False,False,,,False,False,
9,2019-06-07 15:15:00,2019-06-07 15:20:00,Queens,Flushing Meadows Corona Park,Tide Gate Bridge,Cormorant,Employee,Native,Injured,2.50,...,,Monitored Animal,1.0,False,True,,0.25,False,False,
12,2019-06-06 10:00:00,2019-06-06 10:45:00,Bronx,851 Fairmont Pl,Inside Miricale Gardens (community Garden),Rooster,Employee,Domestic,Healthy,2.00,...,1-1-1729971530,ACC,3.0,False,False,,,False,False,83788
14,2019-06-05 11:00:00,2019-06-05 11:20:00,Brooklyn,Dyker Beach Park,"Dyker 14th Ballfields; 40.613081, -74.014714",Raccoon,Employee,Native,Unhealthy,2.00,...,1-1-1728538570,ACC,1.0,False,False,,,True,False,64980
20,2019-06-01 15:00:00,2019-06-01 14:30:00,Bronx,Orchard Beach,Orchard Beach Parking Lot,Gull,Public,Native,Injured,2.00,...,1-1-173378903,Rehabilitator,1.0,False,False,Wildbird Fund,,False,False,


In [31]:
data['# of Animals']

0      1.0
1      1.0
2      1.0
3      1.0
4      2.0
5      1.0
6      1.0
7      1.0
8      1.0
9      1.0
10     0.0
11     1.0
12     3.0
13     1.0
14     1.0
15     1.0
16     1.0
17     1.0
18     1.0
19     2.0
20     1.0
21     1.0
22     1.0
23     1.0
24     1.0
25     1.0
26     1.0
27     1.0
28     2.0
29     2.0
      ... 
952    1.0
953    1.0
954    3.0
955    1.0
956    1.0
957    1.0
958    1.0
959    3.0
960    1.0
961    NaN
962    1.0
963    1.0
964    1.0
965    1.0
966    1.0
967    1.0
968    1.0
969    2.0
970    1.0
971    1.0
972    1.0
973    1.0
974    1.0
975    1.0
976    1.0
977    1.0
978    1.0
979    2.0
980    1.0
981    1.0
Name: # of Animals, Length: 982, dtype: float64

In [30]:
data[data['# of Animals'] > 7]

Unnamed: 0,Date and Time of initial call,Date and time of Ranger response,Borough,Property,Location,Species Description,Call Source,Species Status,Animal Condition,Duration of Response,...,311SR Number,Final Ranger Action,# of Animals,PEP Response,Animal Monitored,Rehabilitator,Hours spent monitoring,Police Response,ESU Response,ACC Intake Number
421,2018-11-03 17:00:00,2018-11-05 13:55:00,Staten Island,Willowbrook Park,Pond,Domestic Duck,Employee,Domestic,,1.25,...,,Monitored Animal,11.0,False,True,,0.5,False,False,
477,2018-10-19 15:20:00,2018-10-19 16:00:00,Brooklyn,Garden Playground,Beaver St. Between Fayette and Egbert,Canada Goose,Employee,Native,DOA,1.0,...,,Unfounded,9.0,False,False,,,False,False,


In [69]:
data["Species Description"] = data["Species Description"].replace('racoon', 'Raccoon')

##### Grouping data

To group data, the following commands are useful:
- data[[col1, col2]] - to work with only some columns
- data.groupby(col) - to group the data based on the values in one column
- data.groupby([col1,col2]) - to group the data based on the values in more than one column
- .size() - to count the number of rows in each group, for example data.groupby(col).size()
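As a small illustration of grouping, the sketch below uses a made-up five-row dataframe (the values are assumptions) to count rows per group and to sum a numeric column per group.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'Animal Class': ['Birds', 'Birds', 'Birds', 'Deer', 'Deer'],
    'Age': ['Adult', 'Adult', 'Juvenile', 'Adult', 'Infant'],
    '# of Animals': [1.0, 2.0, 1.0, 1.0, 1.0],
})

# Number of rows for every (Animal Class, Age) combination that occurs
group_sizes = toy.groupby(['Animal Class', 'Age']).size()

# Total number of animals reported per class
animals_per_class = toy.groupby('Animal Class')['# of Animals'].sum()

print(group_sizes)
print(animals_per_class)
```

Note that `.size()` counts rows (responses), while summing `# of Animals` counts animals; the two can differ when one response involves several animals.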

In [74]:
g = data.groupby(['Animal Class','Age'])

In [75]:
g.size()

Animal Class                                              Age                    
Birds                                                     Adult                      129
                                                          Infant                      11
                                                          Juvenile                    26
Coyotes                                                   Adult                        6
Deer                                                      Adult                       15
                                                          Infant                       1
                                                          Juvenile                     4
Domestic                                                  Adult                       44
                                                          Infant                       1
                                                          Infant;#Juvenile             1
                            

##### Basic statistics

To compute some basic statistics we can use:
- data.describe() - summary statistics for numerical columns
- data.mean() - mean of each numerical column
- data.median() - median of each numerical column
- data.std() - standard deviation of each numerical column
- data.corr() - to get the correlation between numerical columns
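Depending on the pandas version, calling these functions on a dataframe that also holds text columns may raise an error instead of skipping those columns. A hedged sketch that sidesteps this by selecting the numerical columns first, on made-up data:

```python
import pandas as pd

# Hypothetical toy data with one text column and two numerical columns
toy = pd.DataFrame({
    'Borough': ['Manhattan', 'Bronx', 'Queens', 'Brooklyn'],
    'Duration of Response': [0.5, 1.0, 1.5, 3.0],
    '# of Animals': [1.0, 1.0, 2.0, 4.0],
})

# Keep only numerical columns before computing statistics
numeric = toy.select_dtypes(include='number')

means = numeric.mean()
medians = numeric.median()
corr = numeric.corr()

print(means)
print(medians)
print(corr)
```

Comparing the mean and the median is a quick check for skew: here the mean duration is pulled above the median by the single long response.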

In [52]:
data.describe()

Unnamed: 0,Duration of Response,# of Animals,Hours spent monitoring
count,982.0,974.0,120.0
mean,1.389868,1.090349,0.966667
std,1.135502,0.732486,0.847818
min,0.0,0.0,0.15
25%,0.75,1.0,0.5
50%,1.0,1.0,0.5
75%,2.0,1.0,1.0
max,21.0,11.0,4.0


In [53]:
data.median()

Duration of Response      1.0
# of Animals              1.0
Hours spent monitoring    0.5
ESU Response              0.0
dtype: float64

In [54]:
data.mean()

Duration of Response      1.389868
# of Animals              1.090349
Hours spent monitoring    0.966667
ESU Response              0.006110
dtype: float64

In [55]:
data.std()

Duration of Response      1.135502
# of Animals              0.732486
Hours spent monitoring    0.847818
ESU Response              0.077967
dtype: float64

In [56]:
data.corr()

Unnamed: 0,Duration of Response,# of Animals,Hours spent monitoring,ESU Response
Duration of Response,1.0,0.027589,0.632575,0.046757
# of Animals,0.027589,1.0,-0.025882,-0.009716
Hours spent monitoring,0.632575,-0.025882,1.0,0.220775
ESU Response,0.046757,-0.009716,0.220775,1.0


##### Making graphs

To visualize categorical data we can use:
- g = data['col'].value_counts()
- g.plot(kind = 'bar') or g.plot.pie() - g is a Series, so its index supplies the category labels and its values supply the bar heights or slice sizes
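A minimal sketch of a bar chart of counts, assuming matplotlib is installed alongside pandas. The toy data and the axis labels are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy column
toy = pd.DataFrame({'Borough': ['Manhattan', 'Bronx', 'Manhattan', 'Queens', 'Manhattan', 'Bronx']})

g = toy['Borough'].value_counts()   # one count per borough, most common first

ax = g.plot(kind='bar')             # one bar per borough
ax.set_xlabel('Borough')
ax.set_ylabel('Number of responses')
plt.tight_layout()
```

In a notebook the chart is displayed under the cell; in a plain script, `plt.show()` or `plt.savefig(...)` would be needed to see it.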