# Brown Scholars Internship 2019-2020 - Urban Wildlife in NYC

New York City is home to many diverse species of wildlife that arrived or existed long before humans settled here.
In October 2016, Mayor Bill de Blasio launched WildlifeNYC, a citywide education and awareness campaign teaching New Yorkers how to live safely and responsibly alongside wild animals including deer, raccoons, and coyotes.

Urban wildlife is any wild animal that lives in an urban environment, such as New York City. Urban wildlife includes birds, mammals, reptiles, fish, and amphibians. Some urban wildlife is native, like eastern grey squirrels, while some is non-native, like mute swans. Domesticated and companion animals, like dogs, exotic pets, and farm animals, are not considered urban wildlife. Domesticated but feral animals like pigeons and stray cats are also not considered urban wildlife.

Data source: https://data.cityofnewyork.us/Environment/Urban-Park-Ranger-Animal-Condition-Response/fuhs-xmg2

First, we'll import the packages we need.

In [1]:
import pandas as pd

Then we import the data. For now, the csv file should be in the same directory as the notebook. Notice that we use parse_dates so that the two date and time columns are read as type 'datetime' instead of plain text.

In [2]:
data = pd.read_csv('Urban_Park_Ranger_Animal_Condition_Response.csv',
                   parse_dates = ['Date and Time of initial call', 'Date and time of Ranger response'])

Note: if you want to export the data, use df.to_csv(filename), where df is the name of your dataframe and filename is the name of the file where you want to save the data. The csv file will get created in the same directory as the notebook.
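As a minimal sketch of the export step, the cell below writes a tiny dataframe to csv text and reads it back with parse_dates. The two rows are made-up toy values (not taken from the real dataset), and an in-memory buffer stands in for a file so that nothing is written to disk.

```python
import io
import pandas as pd

# Hypothetical toy rows; the real dataset has many more rows and columns.
toy = pd.DataFrame({
    "Date and Time of initial call": pd.to_datetime(
        ["2019-06-01 09:30", "2019-06-02 14:15"]
    ),
    "Borough": ["Brooklyn", "Queens"],
})

# Write csv text to an in-memory buffer (a filename would work the same way).
buffer = io.StringIO()
toy.to_csv(buffer, index=False)  # index=False leaves out the row-number column

# Read the csv text back, asking pandas to parse the date column again.
buffer.seek(0)
restored = pd.read_csv(buffer, parse_dates=["Date and Time of initial call"])
print(restored.dtypes)
```

Without parse_dates, the date column would come back as plain text, so comparisons such as "before" or "after" a date would not work as expected.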

#### Step 1: Viewing and inspecting the data

Now that the data is loaded, let's check it out. To learn more about what the data looks like we can try the following commands:
- data.head( ) - to look at the first 5 rows
- data.tail( ) - to look at the last 5 rows
- data.shape - to get the number of rows and columns
- data.info( ) - to get the names of the columns, how many non-null values are in each column, and the type of data in each column
- data.nunique( ) - to get how many unique values are in each column
- data.max() - to get the highest value in each column
- data.min() - to get the lowest value in each column
- data['col'].value_counts() - to get how many times each unique value appears in a particular column
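To see what a few of these commands return, here is a small sketch on a toy dataframe. The values and the 'Animal Count' column name are hypothetical and only for illustration; with the real data, replace toy with data.

```python
import pandas as pd

# Hypothetical toy data; 'Animal Count' is an illustrative column name.
toy = pd.DataFrame({
    "Borough": ["Brooklyn", "Queens", "Brooklyn", "Bronx"],
    "Species Description": ["Raccoon", "raccoon", "Red-tailed Hawk", None],
    "Animal Count": [1, 2, 1, 3],
})

print(toy.shape)      # (number of rows, number of columns)
toy.info()            # column names, non-null counts, and data types

# Number of distinct non-null values in each column.
distinct = toy.nunique()
print(distinct)

# How many rows there are for each borough, most common first.
borough_counts = toy["Borough"].value_counts()
print(borough_counts)
```

Notice that nunique() treats 'Raccoon' and 'raccoon' as two different values, which is a hint that the column may need cleaning.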

#### Step 2: Cleaning the data

By now, we should have a sense of which columns may have null values. It may or may not be ok for a column to have null values. One way to replace null values with some other value is data.fillna(x), where x is the value we want instead of the null. Note that fillna returns a new dataframe, so we need to assign the result back (for example, data = data.fillna(x)) to keep the change.
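A minimal sketch of counting and filling null values, using a toy dataframe with made-up values and a hypothetical 'Animal Count' column. The fill values ('Unknown' and 0) are just examples; the right choice depends on what a missing value means in each column.

```python
import pandas as pd

# Hypothetical toy data with one missing value in each column.
toy = pd.DataFrame({
    "Borough": ["Brooklyn", None, "Queens"],
    "Animal Count": [1.0, None, 3.0],
})

# Count the nulls in each column before cleaning.
nulls_before = toy.isna().sum()
print(nulls_before)

# fillna returns a new object, so assign the result back to keep it.
toy["Borough"] = toy["Borough"].fillna("Unknown")
toy["Animal Count"] = toy["Animal Count"].fillna(0)

nulls_after = toy.isna().sum()
print(nulls_after)
```

Filling column by column lets each column get a replacement value that makes sense for its type.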

In addition, the data may not be in 'standard' form. For example, the strings 'yes', 'YES', and 'Yes' may all appear as values in the same column. To check whether the data in a column is in 'standard' form, we can use data['column_name'].unique(). For example, what happens when we try data['Species Description'].unique()? What happens when we try data['Species Status'].unique()? To replace values, we can use data['column_name'].replace('yes','Yes') to replace all 'yes' values with 'Yes' values (for example). Like fillna, replace returns a new column, so we assign it back with data['column_name'] = data['column_name'].replace('yes','Yes').
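The sketch below shows one way to standardize a text column. The status values are made up for illustration and may not match the real dataset. It first replaces a single value, then uses the string methods str.strip() and str.title() to clean every value at once.

```python
import pandas as pd

# Hypothetical values showing inconsistent capitalization and stray spaces.
toy = pd.DataFrame(
    {"Species Status": ["Native", "native", "NATIVE", "Exotic", " exotic"]}
)
print(toy["Species Status"].unique())  # five different spellings

# Option 1: replace one specific value and assign the result back.
toy["Species Status"] = toy["Species Status"].replace("native", "Native")

# Option 2: remove surrounding spaces and capitalize each word in one pass.
toy["Species Status"] = toy["Species Status"].str.strip().str.title()

cleaned = set(toy["Species Status"].unique())
print(cleaned)
```

Option 2 is handy when there are many variants, but it changes every value in the column, so it is worth checking unique() again afterwards.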

#### Step 3: Exploring the data

Once our data is in the shape that we need it to be, we can start exploring it. To learn more about what the data can tell us, we'll try filtering and grouping it, computing some basic statistics, and making graphs. The decisions we make along the way can be based on our knowledge of the topic, our curiosity, and what we learn from the data itself (or all three!).

##### Filtering data

To filter data, the following commands are useful:

- data[col] - to work only with one column
- data[data[col] > 7] - to extract rows that meet a particular criterion
- data[(data[col] > 0.5) & (data[col] < 0.7)] - to extract rows that meet more than one criterion (each condition needs its own parentheses)
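Here is a small sketch of these filters on toy data. The values and the 'Animal Count' column name are hypothetical.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
toy = pd.DataFrame({
    "Borough": ["Brooklyn", "Queens", "Brooklyn", "Bronx"],
    "Animal Count": [1, 8, 12, 3],
})

# One condition: rows where more than 7 animals were involved.
many = toy[toy["Animal Count"] > 7]

# Two conditions joined with & (and). Each condition is wrapped in
# parentheses because & is evaluated before == and >.
brooklyn_many = toy[(toy["Borough"] == "Brooklyn") & (toy["Animal Count"] > 7)]

print(many)
print(brooklyn_many)
```

Use | instead of & to keep rows that meet either condition.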

##### Grouping data

To group data, the following commands are useful:
- data[[col1, col2]] - to work with only some columns
- data.groupby(col) - To group the data based on the values in one column
- data.groupby([col1,col2]) - To group the data based on the values in more than one column
- data.groupby(col).size() - to find out how big each group is, by counting the number of rows in each group
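As a sketch of grouping, the cell below counts rows per borough and per borough and species pair. The rows are made-up toy values.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
toy = pd.DataFrame({
    "Borough": ["Brooklyn", "Queens", "Brooklyn", "Bronx"],
    "Species Description": ["Raccoon", "Raccoon", "Raccoon", "Coyote"],
})

# Number of rows in each borough.
per_borough = toy.groupby("Borough").size()
print(per_borough)

# Number of rows for each (borough, species) combination.
per_pair = toy.groupby(["Borough", "Species Description"]).size()
print(per_pair)
```

Grouping by two columns gives a result indexed by pairs, so a single count can be looked up with per_pair[('Brooklyn', 'Raccoon')].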

##### Basic statistics

To compute some basic statistics we can use:
- data.describe() - summary statistics for numerical columns
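- data.mean(numeric_only=True) - mean of each numerical column
- data.median(numeric_only=True) - median of each numerical column
- data.std(numeric_only=True) - standard deviation of each numerical column
- data.corr(numeric_only=True) - to get the correlation between numerical columns
- Note: in recent versions of pandas, leaving out numeric_only=True raises an error when the dataframe also has text columns.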
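A short sketch of these statistics on toy data. Both numerical column names ('Animal Count' and 'Hours Spent') and all values are hypothetical. The text column is included to show why numeric_only=True is passed.

```python
import pandas as pd

# Hypothetical toy data: one text column and two numerical columns.
toy = pd.DataFrame({
    "Borough": ["Brooklyn", "Queens", "Brooklyn", "Bronx"],
    "Animal Count": [1, 2, 3, 6],
    "Hours Spent": [0.5, 1.0, 1.5, 3.0],
})

# describe() summarizes the numerical columns by default.
print(toy.describe())

# numeric_only=True tells pandas to skip the text column.
means = toy.mean(numeric_only=True)
corr = toy.corr(numeric_only=True)

print(means)
print(corr)
```

In this toy example 'Hours Spent' is always half of 'Animal Count', so the two columns are perfectly correlated. Real data will rarely be this tidy.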

##### Making graphs

To visualize categorical data we can use:
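- g = data['col'].value_counts() - to count how many times each value appears
- g.plot(kind='bar') or g.plot.pie() - to draw a bar chart or a pie chart of those counts (g is a single column of counts, so no x or y arguments are needed)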
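A minimal sketch of a bar chart of counts, assuming matplotlib is installed (pandas uses it for plotting). The borough values are made up, and the axis labels are only suggestions. The matplotlib.use('Agg') line makes the example run without a display; inside a notebook it can be left out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import pandas as pd

# Hypothetical toy data for illustration only.
toy = pd.DataFrame(
    {"Borough": ["Brooklyn", "Queens", "Brooklyn", "Bronx", "Brooklyn"]}
)

# Count the rows for each borough, then draw one bar per borough.
counts = toy["Borough"].value_counts()
ax = counts.plot(kind="bar")

# Label the axes so the chart can be read on its own.
ax.set_xlabel("Borough")
ax.set_ylabel("Number of responses")
```

Swapping counts.plot(kind='bar') for counts.plot.pie() draws the same counts as a pie chart.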