# Visualizing and Deploying Data Analysis Using Python Workshop

![Data Science Process from AJ Goldstein](images/ajgoldstein-datascienceprocess.png)

[Image Source](https://ajgoldsteindotcom.wordpress.com/2017/11/12/deconstructing-data-science/)

There are many ways to break down the Data Science Process - I like this representation because it keeps things high level and emphasizes that the process is circular. 

Today we're focusing on Steps 3, 4 & 6 in the above breakdown of the data science process. We'll walk through a scenario and some collected data to focus on how you can process and explore data, then communicate your findings. 

Let's get started!

![Front of the Austin Animal Center building](images/austin-animal-center-front.jpeg)

## Scenario

The Austin Animal Center keeps great records of their animal intakes, but it's a lot of data. Your task: process the data and find some initial high-level insights into the trends it contains. But insights left in a notebook like this are wasted, so you'll also want to visualize what you've found to showcase to others!

## The Data

[Austin Animal Center Intakes Data](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Intakes/wter-evkm/) - updated pretty much every day!

Note - I did some initial pre-processing, and used an additional library to add more details to the location data provided (finding associated latitudes, longitudes and zipcodes for about a third of the data we'll use today).

## The Questions

1. What kind of animals are brought into the Center?

2. How has the number of animals brought in changed over time?

3. Where are most of the animals found?

## Getting Started

In [None]:
# Imports
# Pandas for data manipulation
import pandas as pd

# Plotly for data visualization
import plotly.express as px


In [None]:
# Read in the data

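The name of the pre-processed data file isn't given here, so this minimal sketch reads a few made-up rows from an in-memory string with `io.StringIO`. In the notebook you would pass the path to your CSV instead. The column names match the ones used in this notebook; the values are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the pre-processed intakes file (made-up rows)
csv_text = """Animal ID,DateTime,Found Location,Animal Type
A000001,01/03/2019 04:19:00 PM,Austin (TX),Dog
A000002,01/03/2019 05:02:00 PM,Austin (TX),Cat
A000003,01/04/2019 09:15:00 AM,Manor (TX),Other
"""

# read_csv accepts a file path, a URL, or any file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)
print(df.head())
```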

In [None]:
# Check it out - let's look at the first five rows


In [None]:
# Let's also look at some information on the data

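`df.info()` is a quick way to spot missing values and columns still stored as text. A small sketch on made-up rows, where `None` stands in for a missing timestamp; passing `buf=` captures the printed summary as a string.

```python
import io

import pandas as pd

# Made-up rows; DateTime is still plain text at this point
df = pd.DataFrame({
    "Animal ID": ["A000001", "A000002", "A000003"],
    "DateTime": ["01/03/2019 04:19:00 PM", "01/03/2019 05:02:00 PM", None],
    "Animal Type": ["Dog", "Cat", "Other"],
})

# info() prints column names, non-null counts and dtypes; buf= captures the text
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)

# The same non-null counts that info() reports, as a Series
non_null = df.notna().sum()
print(non_null)
```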

In [None]:
# And let's see if we can describe it to find trends
df.describe()

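Most of the intake columns are text, so `describe()` summarizes them differently than numeric data: instead of mean and quartiles it reports count, unique, top (the most common value) and freq (how often that value appears). A sketch with made-up values:

```python
import pandas as pd

# Made-up text columns, not real intake records
df = pd.DataFrame({
    "Animal Type": ["Dog", "Dog", "Cat", "Other"],
    "Found Location": ["Austin (TX)", "Austin (TX)", "Manor (TX)", "Austin (TX)"],
})

# With only text columns, describe() returns count / unique / top / freq rows
summary = df.describe()
print(summary)
```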
## Question 1: What kind of animals are brought into the Center?

Just need one column for this - the Animal Type.

In [None]:
# Explore the breakdown of the Animal Type column

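One way to get the breakdown is `value_counts()`, which counts each category and sorts from most to least common; `normalize=True` returns proportions instead. A sketch on made-up values (not the real intake numbers):

```python
import pandas as pd

# Made-up Animal Type values for illustration only
df = pd.DataFrame({"Animal Type": ["Dog", "Dog", "Cat", "Dog", "Cat", "Bird", "Other", "Livestock"]})

# Raw counts per category, most common first
counts = df["Animal Type"].value_counts()
print(counts)

# Share of the total for each category
shares = df["Animal Type"].value_counts(normalize=True)
print(shares.round(3))
```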

In [None]:
# Wow - birds and livestock make up such a small percentage
# Let's use replace to lump them in with 'Other' for a more effective visualization

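`Series.replace` accepts a list of values and a single replacement, so both small categories can be folded into `'Other'` in one call. A sketch on made-up values, re-checking the breakdown afterwards:

```python
import pandas as pd

# Made-up values for illustration
df = pd.DataFrame({"Animal Type": ["Dog", "Cat", "Bird", "Other", "Livestock", "Dog"]})

# Every value in the list becomes 'Other'; assigning back overwrites the column
df["Animal Type"] = df["Animal Type"].replace(["Bird", "Livestock"], "Other")

# Re-check the breakdown to see how the categories changed
print(df["Animal Type"].value_counts())
```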

In [None]:
# Now let's see how that changed


In [None]:
# Capture that output in a variable - then reset the index to make it a dataframe

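`value_counts()` returns a Series indexed by category, and `reset_index()` moves that index into a regular column. The default column names depend on your pandas version: pandas 2.x gives `Animal Type` and `count`, while older versions gave `index` and `Animal Type`, which is a good reason to rename them. A sketch on made-up values:

```python
import pandas as pd

# Made-up values for illustration
df = pd.DataFrame({"Animal Type": ["Dog", "Dog", "Cat", "Other", "Dog"]})

# Series (category -> count) becomes a two-column DataFrame
type_counts = df["Animal Type"].value_counts().reset_index()
print(type_counts)

# The default column names vary by pandas version
print(list(type_counts.columns))
```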

In [None]:
# Explore the variable we just created - we should rename these columns!


In [None]:
# So let's do that - rename the columns to actually describe the data

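`rename(columns={...})` needs the old names, and those differ between pandas versions. A minimal sketch that sidesteps this by assigning a full list to `.columns`, which works because the column order is (category, count) in both old and new versions. The name `Count` is just a suggestion.

```python
import pandas as pd

# Made-up values for illustration
df = pd.DataFrame({"Animal Type": ["Dog", "Dog", "Cat", "Other", "Dog"]})
type_counts = df["Animal Type"].value_counts().reset_index()

# Overwrite both names at once, regardless of the version-specific defaults
type_counts.columns = ["Animal Type", "Count"]
print(type_counts)
```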

In [None]:
# Check our work


In [None]:
# Visualize it! With the world's most controversial chart... a pie chart
# https://plotly.com/python/pie-charts/


## Question 2: How has the number of animals brought in changed over time?

Here we'll need to look at our DateTime column - but also get an idea of the number of animals arriving per day. Time for a group by!

In [None]:
# Let's explore our DateTime column using describe


In [None]:
# Pandas isn't recognizing this as a datetime object - let's fix that

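`pd.to_datetime` converts the text column into real timestamps. This sketch assumes a month/day/year, 12-hour-clock layout such as `01/03/2019 04:19:00 PM`; check your raw column and adjust the format string if it differs. Passing `format=` explicitly avoids ambiguous day/month guesses.

```python
import pandas as pd

# Made-up timestamps in the assumed text layout
df = pd.DataFrame({"DateTime": ["01/03/2019 04:19:00 PM", "01/04/2019 09:15:00 AM", "02/11/2019 11:30:00 AM"]})

# %m/%d/%Y date, %I 12-hour clock, %p AM/PM marker (assumed format)
df["DateTime"] = pd.to_datetime(df["DateTime"], format="%m/%d/%Y %I:%M:%S %p")

print(df["DateTime"].dtype)
print(df["DateTime"].min(), df["DateTime"].max())
```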

In [None]:
# Check our work using describe again


In [None]:
# We won't need the hour/minute/second data - just the date
# We can call normalize through this column's .dt (datetime) accessor to drop the time

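`Series.dt.normalize()` sets every timestamp to midnight while keeping the datetime dtype. That matters later: `.dt.date` would also drop the time, but it returns plain Python `date` objects, which `resample` can't work with. A sketch on made-up timestamps:

```python
import pandas as pd

# Made-up timestamps for illustration
timestamps = pd.Series(pd.to_datetime(["2019-01-03 16:19:00", "2019-01-03 17:02:00", "2019-01-04 09:15:00"]))

# Same dates, time of day reset to 00:00:00, dtype still datetime64
dates = timestamps.dt.normalize()
print(dates)
```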

In [None]:
# Let's save that output as a new column, Date


In [None]:
# Check our work - let's use info


In [None]:
# Now - time for that group by!
# Let's explore what's happening in the groupby, then save it to a variable

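A sketch of the daily groupby on made-up rows: group on `Date`, pick one column, and count. `count()` counts non-missing `Animal ID` values per day; `size()` would count rows instead, which is the same here because every row has an ID.

```python
import pandas as pd

# Made-up intake rows for illustration
df = pd.DataFrame({
    "Animal ID": ["A1", "A2", "A3", "A4", "A5"],
    "DateTime": pd.to_datetime(["2019-01-03 16:19", "2019-01-03 17:02", "2019-01-04 09:15",
                                "2019-01-04 10:40", "2019-01-04 13:05"]),
})
df["Date"] = df["DateTime"].dt.normalize()

# One entry per Date holding the number of intakes that day
daily_intakes = df.groupby("Date")["Animal ID"].count()
print(daily_intakes)
```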

In [None]:
# Time for a line chart!
# https://plotly.com/python/line-charts/


In [None]:
# Whoa - that's a bit messy. Let's just look at a monthly breakdown
# We can resample, then grab the sum per month

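`resample` needs a DatetimeIndex, which the daily groupby result already has. This sketch uses the `"MS"` (month start) alias because the month-end alias changed from `"M"` to `"ME"` in pandas 2.2, while `"MS"` works in both older and newer versions. The daily counts are made up.

```python
import pandas as pd

# Made-up data: one intake per day across January and February 2019
dates = pd.date_range("2019-01-01", "2019-02-28", freq="D")
daily_intakes = pd.Series(1, index=dates)

# Bucket by calendar month (labelled with the month's first day), then add up each bucket
monthly_intakes = daily_intakes.resample("MS").sum()
print(monthly_intakes)
```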

In [None]:
# Time for another line chart!


In [None]:
# Looks like we have an annual trend - let's take a better look...
# Let's go back to our original dataframe and create a new groupby for this...
# First - grab out the Year and Month as new columns

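The `.dt` accessor exposes date parts as integer Series, so the Year and Month columns take one line each. A sketch on made-up timestamps:

```python
import pandas as pd

# Made-up timestamps for illustration
df = pd.DataFrame({"DateTime": pd.to_datetime(["2018-12-30 10:00", "2019-01-03 16:19", "2019-01-20 09:15"])})

df["Year"] = df["DateTime"].dt.year
df["Month"] = df["DateTime"].dt.month
print(df)
```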

In [None]:
# Check our work...


In [None]:
# A new groupby - now with two columns to group by!
# Let's explore, reset the index for clarity, then save to a variable

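Grouping by a list of columns produces a two-level index (Year, Month); `reset_index(name="Count")` flattens it back into ordinary columns and names the counts. A sketch on made-up rows, where `Count` is an illustrative name:

```python
import pandas as pd

# Made-up rows for illustration
df = pd.DataFrame({
    "Animal ID": ["A1", "A2", "A3", "A4", "A5"],
    "Year": [2018, 2018, 2019, 2019, 2019],
    "Month": [1, 1, 1, 2, 2],
})

# Count intakes for each (Year, Month) pair, then flatten the MultiIndex
monthly_by_year = (
    df.groupby(["Year", "Month"])["Animal ID"]
    .count()
    .reset_index(name="Count")
)
print(monthly_by_year)
```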

In [None]:
# One last line chart!


## Question 3: Where are most of the animals found?

Last question! Here, we'll start playing with map objects, using some of our location data.

Let's group by location to count the number of animals found at each one.

In [None]:
# First - let's explore our Found Location column


In [None]:
# Now - let's just see the animals that we have precise location details for
# How can we get just the animals with Found Zipcode details?
# Let's save that new subset dataframe to a new variable

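A boolean mask from `notna()` keeps only the rows that have a `Found Zipcode`. The `.copy()` makes the subset an independent DataFrame, so adding or editing columns later won't raise `SettingWithCopyWarning`. The locations and ZIP codes below are made up.

```python
import pandas as pd

# Made-up rows; None marks animals without ZIP code details
df = pd.DataFrame({
    "Animal ID": ["A1", "A2", "A3", "A4"],
    "Found Location": ["Austin (TX)", "123 Example St in Austin (TX)", "Travis (TX)", "456 Sample Dr in Austin (TX)"],
    "Found Zipcode": [None, "78701", None, "78704"],
})

# True where a ZIP code exists; indexing with the mask keeps those rows
found = df[df["Found Zipcode"].notna()].copy()
print(found)
```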

In [None]:
# Now let's check out the breakdown of our Found Location column


In [None]:
# Time to group by! We want the count of animals at each location
# Note - we want the latitude and longitude too, to visualize in a minute

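A sketch that puts the coordinates into the group keys so they carry through to the result. The `Latitude` and `Longitude` column names are hypothetical, since the pre-processed column names aren't stated, and the site names and coordinates are placeholders. If one location string ever mapped to several coordinate pairs, it would split into several rows.

```python
import pandas as pd

# Placeholder rows; "Latitude"/"Longitude" are assumed column names
found = pd.DataFrame({
    "Animal ID": ["A1", "A2", "A3"],
    "Found Location": ["Site A", "Site A", "Site B"],
    "Latitude": [30.27, 30.27, 30.25],
    "Longitude": [-97.74, -97.74, -97.76],
})

# Count animals per (location, lat, lon); reset_index keeps the keys as columns
location_counts = (
    found.groupby(["Found Location", "Latitude", "Longitude"])["Animal ID"]
    .count()
    .reset_index()
)
print(location_counts)
```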

In [None]:
# Let's rename that Animal ID column to be descriptive

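`rename(columns={old: new})` changes only the columns listed in the dict and leaves the rest alone. `Number of Animals` is just one descriptive choice; the rows are placeholders.

```python
import pandas as pd

# Placeholder result shaped like the location groupby
location_counts = pd.DataFrame({
    "Found Location": ["Site A", "Site B"],
    "Latitude": [30.27, 30.25],
    "Longitude": [-97.74, -97.76],
    "Animal ID": [2, 1],
})

location_counts = location_counts.rename(columns={"Animal ID": "Number of Animals"})
print(location_counts.columns.tolist())
```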

In [None]:
# Check our work


In [None]:
# Now... map time!
# https://plotly.com/python/mapbox-layers/


## Move to Streamlit!

Now that we've answered our three questions - let's open up our `app.py` file in this folder to see how this can translate to Streamlit.

If you're running this at home instead of in Binder, you can also run the app with the terminal command `streamlit run app.py`.

Or - check out the deployed version! https://austin-animal-center-data.herokuapp.com/

### Thank you for joining us!