# Austin Animal Shelter Data Exploration

![Front of the Austin Animal Center building](images/austin-animal-center-front.jpeg)

## Scenario

The Austin Animal Center keeps great records of its animal intakes, but it's a lot of data. Your task: process the data and find some initial high-level insights to start figuring out what trends it contains. But insights left in a notebook like this are wasted, so you'll also want to visualize what you've found to showcase it to others!

## The Data

[Austin Animal Center Intakes Data](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Intakes/wter-evkm/) - updated pretty much every day!

Note - I did some initial pre-processing, and used an additional library to add more details to the location data provided (finding associated latitudes, longitudes and zipcodes for about a third of the data we'll use today).

## The Questions

1. What kind of animals are brought into the Center?

2. How has the number of animals brought in changed over time?

3. Where are most of the animals found?

## Getting Started

In [1]:
# Imports
# Pandas for data manipulation
import pandas as pd

# Plotly for data visualization
# (if this raises ModuleNotFoundError, install it first with `pip install plotly`)
import plotly.express as px

In [None]:
# Read in the data
df = pd.read_csv('data/Austin_Animal_Center_Intakes_061521_with_location_details.csv')

In [None]:
# Check it out - let's look at the first five rows
df.head()

In [None]:
# Let's also look at some information on the data
df.info()

In [None]:
# And let's see if we can describe it to find trends
df.describe()

## Question 1: What kind of animals are brought into the Center?

Just need one column for this - the Animal Type.

In [None]:
# Explore the breakdown of the Animal Type column
df['Animal Type'].value_counts()

In [None]:
# Wow - birds and livestock make up such a small percentage
# Let's lump them in with 'Other' for a more effective visualization with replace

df['Animal Type'] = df['Animal Type'].replace({'Bird': 'Other', 'Livestock': 'Other'})

In [None]:
# Now let's see how that changed
df['Animal Type'].value_counts()
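
To put a number on "such a small percentage", `value_counts` can return shares instead of raw counts. The sketch below uses a small made-up Series (an assumption standing in for the real `Animal Type` column), so the figures are illustrative only.

```python
import pandas as pd

# Toy stand-in for the 'Animal Type' column (made-up values, not the real data)
animal_types = pd.Series(
    ['Dog', 'Dog', 'Dog', 'Cat', 'Cat', 'Other', 'Bird', 'Livestock'],
    name='Animal Type',
)

# Same lumping step as in the notebook
animal_types = animal_types.replace({'Bird': 'Other', 'Livestock': 'Other'})

# normalize=True returns fractions that sum to 1 instead of raw counts
shares = animal_types.value_counts(normalize=True)
print(shares.round(3))
```

On the real data, the same call would be `df['Animal Type'].value_counts(normalize=True)`.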

In [None]:
# Capture that output in a variable - then reset the index to make it a dataframe
types = df['Animal Type'].value_counts().reset_index()

In [None]:
# Explore the variable we just created - we should rename these columns!
types.head()

In [None]:
# So let's do that - rename the columns to actually describe the data
# (assigning both names directly works whatever reset_index called them,
# which differs between pandas versions)
types.columns = ['Type', 'Count']

In [None]:
# Check our work
types.head()
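
The column names that come out of `value_counts().reset_index()` depend on the pandas version, so it can be safer to name both columns explicitly in one chain. This is a minimal sketch on made-up data; `animal_types` and `types_demo` are hypothetical names used only for illustration.

```python
import pandas as pd

# Toy stand-in for the 'Animal Type' column (made-up values)
animal_types = pd.Series(['Dog', 'Dog', 'Cat', 'Other'], name='Animal Type')

types_demo = (
    animal_types.value_counts()
    .rename_axis('Type')          # name the index, which holds the category labels
    .reset_index(name='Count')    # move the index to a column and name the counts
)
print(types_demo)
```

The result has exactly two columns, `Type` and `Count`, which is the shape the pie chart expects.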

In [None]:
# Visualize it! With the world's most controversial chart... a pie chart
# https://plotly.com/python/pie-charts/
fig = px.pie(types, values='Count', names='Type',
             color_discrete_sequence=px.colors.qualitative.Pastel)
fig.show()

## Question 2: How has the number of animals brought in changed over time?

Here we'll need to look at our DateTime column - but also get an idea of the number of animals arriving per day. Time for a group by!

In [None]:
# Let's explore our DateTime column using describe
df['DateTime'].describe()

In [None]:
# Pandas isn't recognizing this as a datetime object - let's fix that
df['DateTime'] = pd.to_datetime(df['DateTime'])

In [None]:
# Check our work using describe again
# (the datetime_is_numeric argument was removed in pandas 2.0, where this is the default behavior)
df['DateTime'].describe()

In [None]:
# We won't need the hour/minute/second data - just the date
# We can use normalize via the .dt accessor to set every time to midnight
df['DateTime'].dt.normalize().head()

In [None]:
# Let's save that output as a new column, Date
df['Date'] = df['DateTime'].dt.normalize()

In [None]:
# Check our work - let's use info
df.info()
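
A quick illustration of what `normalize` does, on two made-up timestamps: the time of day is reset to midnight but the column stays a datetime type, which matters for the resampling later on. The alternative `.dt.date` is shown only for contrast, since it returns plain Python date objects instead.

```python
import pandas as pd

# Two made-up intake timestamps on the same day
stamps = pd.to_datetime(pd.Series(['2021-06-15 08:30:00', '2021-06-15 17:45:00']))

# normalize() keeps the datetime64 dtype and sets the time to 00:00:00
dates = stamps.dt.normalize()
print(dates)
print(dates.dtype)

# .dt.date drops the time too, but yields Python date objects (object dtype)
print(stamps.dt.date.dtype)
```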

In [None]:
# Now - time for that group by!
# Let's explore what's happening in the groupby, then save it to a variable
count_over_time = df.groupby(by='Date').count()['Animal ID']

In [None]:
# Time for a line chart!
# https://plotly.com/python/line-charts/
fig = px.line(count_over_time, y='Animal ID')
fig.show()

In [None]:
# Whoa - that's a bit messy. Let's just look at a monthly breakdown
# We can resample, then grab the sum per month
# (on pandas 2.2+, the 'M' alias is deprecated in favor of 'ME')
count_over_time = count_over_time.resample('M').sum()

In [None]:
# Time for another line chart!
fig = px.line(count_over_time, y='Animal ID')
fig.show()
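
To see what the resample step does to the numbers, here is a small sketch on made-up daily counts. It assumes the `'MS'` (month start) alias rather than the `'M'` used above, purely because `'MS'` behaves the same across pandas versions; the only difference is that each bin is labeled with the first day of the month instead of the last.

```python
import pandas as pd

# Made-up daily intake counts indexed by date
daily = pd.Series(
    [3, 5, 2, 4],
    index=pd.to_datetime(['2021-01-05', '2021-01-20', '2021-02-01', '2021-02-14']),
    name='Animal ID',
)

# Bucket the days by calendar month and add up the counts in each bucket
monthly = daily.resample('MS').sum()
print(monthly)
```

January's two days (3 and 5) collapse into a single value of 8, and February's (2 and 4) into 6.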

In [None]:
# Looks like we have an annual trend - let's take a better look...
# Let's go back to our original dataframe and create a new groupby for this...
# First - grab out the Year and Month as new columns
df['Year'] = df['DateTime'].dt.year
df['Month'] = df['DateTime'].dt.month

In [None]:
# Check our work...
df.head()

In [None]:
# A new groupby - now with two columns to group by!
# Let's explore, reset the index for clarity, then save to a variable
annual_trend = df.groupby(['Year', 'Month']).count()['Animal ID'].reset_index()

In [None]:
# One last line chart!
fig = px.line(annual_trend, y='Animal ID', x='Month', color='Year',
              labels={'Animal ID': 'Number of Animal Intakes'}) # better y-axis label for clarity
fig.show()
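
The two-column group by can be easier to follow on a handful of rows. This sketch uses a made-up frame (`demo` and `trend` are hypothetical names) and selects `Animal ID` before counting, which gives the same numbers as counting every column and then picking one.

```python
import pandas as pd

# Made-up intakes spread over two years
demo = pd.DataFrame({
    'Animal ID': ['A1', 'A2', 'A3', 'A4', 'A5'],
    'DateTime': pd.to_datetime([
        '2020-01-03', '2020-01-19', '2020-02-07', '2021-01-11', '2021-01-30',
    ]),
})
demo['Year'] = demo['DateTime'].dt.year
demo['Month'] = demo['DateTime'].dt.month

# One row per (Year, Month) pair, with the number of intakes in that month
trend = demo.groupby(['Year', 'Month'])['Animal ID'].count().reset_index()
print(trend)
```

Each year then becomes its own line in the chart, with months along the x-axis, so the same month can be compared across years.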

## Question 3: Where are most of the animals found?

Last question! Here, we'll start playing with map objects, using some of our location data.

Let's count the number of animals found at each location.

In [None]:
# First - let's explore our Found Location column
df['Found Location'].value_counts()

In [None]:
# Now - let's just see the animals that we have precise location details for
# How can we get just the animals with Found Zipcode details?
# Let's save that new subset dataframe to a new variable
location_df = df.loc[df['Found Zipcode'].notna()]

In [None]:
# Now let's check out the breakdown of our Found Location column
location_df['Found Location'].value_counts()

In [None]:
# Time to group by! We want the count of animals at each location
# Note - we want the latitude and longitude too, to visualize in a minute
location_count = location_df.groupby(by=['Found Location', 
                                         'Found Latitude', 
                                         'Found Longitude']).count()['Animal ID'].reset_index()

In [None]:
# Let's rename that Animal ID column to be descriptive
location_count = location_count.rename(columns={'Animal ID':'Count'})

In [None]:
# Check our work
location_count.head()
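
Since the question is where *most* animals are found, it can help to sort the counts before mapping them. The sketch below runs the filter, group by, and rename steps on a made-up frame (the addresses and coordinates are invented for illustration) and then sorts so the busiest location comes first.

```python
import pandas as pd

# Made-up rows: the first has no precise location details
demo = pd.DataFrame({
    'Animal ID': ['A1', 'A2', 'A3', 'A4'],
    'Found Location': [
        'Austin (TX)',
        '123 Example St in Austin (TX)',
        '123 Example St in Austin (TX)',
        '456 Sample Ave in Austin (TX)',
    ],
    'Found Zipcode': [None, '78701', '78701', '78702'],
    'Found Latitude': [None, 30.27, 30.27, 30.26],
    'Found Longitude': [None, -97.74, -97.74, -97.72],
})

# Keep only rows that have a zipcode
located = demo[demo['Found Zipcode'].notna()]

# Count animals per location, then put the largest counts at the top
counts = (
    located.groupby(['Found Location', 'Found Latitude', 'Found Longitude'])['Animal ID']
    .count()
    .reset_index()
    .rename(columns={'Animal ID': 'Count'})
    .sort_values('Count', ascending=False)
)
print(counts.head())
```

On the real data, the equivalent would be `location_count.sort_values('Count', ascending=False).head()`.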

In [None]:
# Now... map time!
# https://plotly.com/python/mapbox-layers/
fig = px.scatter_mapbox(location_count, lat="Found Latitude", lon="Found Longitude",
                        color="Count", size="Count",zoom=10,
                        hover_name='Found Location')
fig.update_layout(mapbox_style="open-street-map")
fig.show()

## Move to Streamlit!

Now that we've answered our three questions - let's open up our `app.py` file in this folder to see how this can translate to Streamlit.

If you're running this at home, instead of in Binder, you can also run the app using the terminal command `streamlit run app.py`.

Or - check out the deployed version! https://austin-animal-center-data.herokuapp.com/

### Thank you for joining us!