# Austin Shelter Pet Outcomes <a name='top'></a>
*By Stephen FitzSimon*

## Key Takeaways From This Report

## Contents <a name='contents'></a>

*Note: the following hyperlinks will only work on local copies of this notebook; they will not function on GitHub!*

1. <a href='#introduction'>Introduction</a>
    1. <a href='#data_source'>Data Source</a>
2. <a href='#wrangle'>Wrangle The Data</a>
3. <a href='#exploring'>Exploring The Data</a>

## Introduction <a name='introduction'></a>

<a href='https://www.austintexas.gov/austin-animal-center'>Austin Animal Center</a> provides animal services for the city of Austin and unincorporated Travis County.  Since 2010 it has implemented a <a href='https://www.austintexas.gov/page/no-kill-plan'>'no-kill' strategy</a> to increase live outcomes such as adoptions; this included community partnerships, community education, increased animal services and better data collection.  The goal of this project is to help understand how the Austin Animal Center can provide better services and outcomes for the animals and owners that it serves.

This report is structured around the data science pipeline: 1. wrangle the data, 2. explore the data and 3. model the data.  Each section includes key takeaways followed by a discussion that provides more in-depth analysis; these are designed to provide all the information necessary to understand the project and its arguments.  For more technical and analytical detail, visualizations, calculations and code follow these two subsections, along with annotations and discussion.  If more detail is needed, refer to the series of other notebooks mentioned in the <a href='https://github.com/stephenfitzsimon/pet_adoption_project/blob/main/README.md'>readme file</a>. Hyperlinks are provided throughout to allow the reader to navigate to relevant sections.

### Data Source <a name='data_source'></a>
The City of Austin provides an <a href='https://data.austintexas.gov/'>open data website</a> where the data for this project can be found.
- <a href= 'https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238'>Outcome data can be found here</a>
- <a href= 'https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Intakes/wter-evkm'>Intake data can be found here</a>

<a href='#contents'>Back to Contents</a>

```python
import pandas as pd
import wrangle
```

## Wrangle The Data <a name='wrangle'></a>

### Key Wrangle Takeaways
- Two tables of data are retrieved from the Austin <a href='https://data.austintexas.gov/'>open data website</a>
- The `Outcome` table contained 141170 records and 12 columns; the `Intake` table contained 141303 records and 12 columns
- Both tables were merged on the key `animal_id`
    - Duplicate records were dropped.  Many of the duplicate records were the same animal.
- Date columns were aggregated into `intake_date` and `outcome_date` and scaled to days
- Repeated columns were removed (example: `color` and `breed` were the same for both intake and outcome tables)
- `name` and `outcome_subtype` had null values inferred
- Remaining nulls were dropped
- Final table is 113944 rows and 16 columns

### Discussion

Data was retrieved from the Austin <a href='https://data.austintexas.gov/'>open data website</a> via the Socrata Open Data API (SODA) and the `sodapy` Python module.  The <a href= 'https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238'>outcome data</a> contained 141170 records with 12 columns and the <a href= 'https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Intakes/wter-evkm'>intake data</a> contained 141303 records with 12 columns.
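The retrieval step can be sketched roughly as follows. This is a minimal illustration and not the code in `wrangle.py`: the function name `download_table` and the `limit` default are assumptions, and the dataset identifiers are taken from the two dataset URLs linked above. The request is anonymous (no app token), which Socrata throttles.

```python
import pandas as pd

# Dataset identifiers taken from the dataset URLs linked above.
DOMAIN = "data.austintexas.gov"
OUTCOME_ID = "9t4d-g238"
INTAKE_ID = "wter-evkm"


def download_table(dataset_id, limit=200_000):
    """Download one SODA dataset and return it as a DataFrame.

    Hypothetical helper; `limit` is an assumed value chosen to be larger
    than the record counts reported above.
    """
    # Imported here so the sketch can be loaded without sodapy installed.
    from sodapy import Socrata

    client = Socrata(DOMAIN, None)  # None = no app token (throttled access)
    try:
        records = client.get(dataset_id, limit=limit)  # list of dicts
    finally:
        client.close()
    return pd.DataFrame.from_records(records)
```

Calling `download_table(OUTCOME_ID)` and `download_table(INTAKE_ID)` would then give the two raw tables, network access permitting.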

Each row represents a single animal intake or outcome (a data dictionary can be found <a href='#appedix'>here</a>). The tables were merged on the `animal_id` column, and duplicate records were dropped for this project.  Many of the duplicate rows represented a single animal that had been returned to the shelter after a series of different outcomes. For example, record `A700407` indicates that the animal most likely escaped from its owner and was returned multiple times over a period of 6 years (the animal is listed as `Stray` on the `Intake` table, and the outcome is listed as `Returned to owner` / `Adoption` on the `Outcome` table).  While these records would be ideal to concatenate in a larger project, this project focuses on the adoptability of animals first entering the shelter.  It is most likely the case that the duplicated records represent animals that are difficult to adopt.
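As a rough illustration of the merge, the sketch below deduplicates each table on `animal_id` before joining, so that repeat visits do not multiply into a cross product. The toy records and the function name `join_first_visits` are hypothetical. Keeping the first record per animal is an assumption; passing `keep=False` would instead discard every animal that appears more than once.

```python
import pandas as pd

# Hypothetical toy records: animal "A2" visits the shelter twice.
intakes = pd.DataFrame({
    "animal_id": ["A1", "A2", "A2"],
    "intake_type": ["Owner Surrender", "Stray", "Stray"],
})
outcomes = pd.DataFrame({
    "animal_id": ["A1", "A2", "A2"],
    "outcome_type": ["Adoption", "Return to Owner", "Adoption"],
})


def join_first_visits(intakes, outcomes, keep="first"):
    """Drop repeated animal_id rows in each table, then inner-join them."""
    intakes = intakes.drop_duplicates(subset="animal_id", keep=keep)
    outcomes = outcomes.drop_duplicates(subset="animal_id", keep=keep)
    return outcomes.merge(intakes, on="animal_id", how="inner")


merged = join_first_visits(intakes, outcomes)
only_single_visits = join_first_visits(intakes, outcomes, keep=False)
```

With `keep="first"` each animal contributes exactly one row; with `keep=False` only animals with a single visit remain.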

The date columns did not differ within each table, and so were aggregated into `intake_date` and `outcome_date` and truncated to year-month-day form.  In addition, columns repeated on both tables were found to contain the same information, and these replicated columns were dropped.  The `outcome_subtype` and `name` columns had a significant number of null values; these were inferred to be `no name` and `no subtype`.  Several columns (`outcome_type`, `sex_at_outcome`, `age_at_outcome` and `sex_at_intake`) had null values; these affected at most 15 rows, which were consequently dropped. The `age_upon_intake` and `age_upon_outcome` columns both contained ages stored as `String` data of the forms `"x years"`, `"x months"` or `"x days"`. These were converted to a number of days; months were assumed to have $30.5$ days, and years were assumed to have $365.25$ days.  The final calculation was then rounded to the `int` datatype.  As these ages are most likely estimates made by shelter staff, this is most likely sufficient accuracy.  Finally, as `date_of_birth` was most likely also an estimate, the column was dropped.
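The null handling and age conversion described above can be sketched as follows. This is an illustration under stated assumptions rather than the code in `wrangle.py`: the function names and demo rows are hypothetical, only the three units named above are handled, and singular forms such as `"1 year"` are assumed to occur and are normalised by stripping a trailing `s`.

```python
import pandas as pd

# Days per unit, as stated in the discussion above.
DAYS_PER_UNIT = {"day": 1.0, "month": 30.5, "year": 365.25}


def age_to_days(age):
    """Convert a string such as '6 months' to a rounded number of days."""
    count, unit = age.split()
    unit = unit.rstrip("s")  # assumption: singular forms like '1 year' occur
    return int(round(float(count) * DAYS_PER_UNIT[unit]))


def fill_and_drop_nulls(df):
    """Fill inferred nulls, then drop any rows that still contain nulls."""
    filled = df.fillna({"name": "no name", "outcome_subtype": "no subtype"})
    return filled.dropna()


# Hypothetical demo rows; the last one has no outcome_type and is dropped.
demo = pd.DataFrame({
    "name": ["Jake", None, "Ozzy"],
    "outcome_subtype": [None, "Partner", None],
    "outcome_type": ["Adoption", "Transfer", None],
    "age_upon_intake": ["6 months", "14 days", "3 years"],
})

cleaned = fill_and_drop_nulls(demo)
cleaned = cleaned.assign(age_at_intake=cleaned["age_upon_intake"].map(age_to_days))
```

A unit outside the three listed would raise a `KeyError` here, which makes unexpected formats visible instead of silently converting them.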

### Code

Code used in this section is contained in `wrangle.py` which programmatically retrieves and prepares the data.  It contains the following functions:
- `get_pet_data` : Checks if a csv file is present, and retrieves data from csv or url. A url query can be forced via `query_url=True`
- `download_data` : Returns the pet outcome and pet intake dataframes from the SODA API
- `make_date_columns` : Aggregates datetime columns into outcome and intake dates
- `null_fill_and_drop` : Fills inferable nulls and drops the rest. Nulls in the `name` column are filled with `no name` and nulls in `outcome_subtype` are filled with `no subtype`.  Rows with remaining nulls are dropped
- `convert_age_column` : Converts age columns to days
- `rename_intake_cols` : Renames the columns from the intake table to make calling them easier
- `rename_intake` : Suffixes the column names of the intake table to distinguish from outcome columns after merge
- `join_tables` : Joins the intake and outcome tables on the `animal_id` column
- Flow control functions:
    - `make_pet_dataframe` : Flow control function to retrieve data and prepare it for exploration
    - `get_pet_dataframe` : Flow control function to get dataframe from url or .csv file and join both tables
    - `prepare_pet_dataframe` : Flow control function to prepare data
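
The csv-or-url behaviour described for `get_pet_data` can be sketched as below. This is a minimal sketch, not the actual implementation: `get_cached_frame`, `fake_download` and the file name `pets.csv` are hypothetical, and only the `query_url` flag is taken from the description above.

```python
import tempfile
from pathlib import Path

import pandas as pd


def get_cached_frame(csv_path, fetch, query_url=False):
    """Return the cached csv if present, otherwise call `fetch` and cache it."""
    path = Path(csv_path)
    if path.exists() and not query_url:
        return pd.read_csv(path)
    df = fetch()
    df.to_csv(path, index=False)
    return df


calls = []  # records how often the (fake) download actually runs


def fake_download():
    """Stand-in for a real download; returns a tiny hypothetical table."""
    calls.append("download")
    return pd.DataFrame({"animal_id": ["A1", "A2"], "age_at_intake": [183, 14]})


with tempfile.TemporaryDirectory() as tmp:
    csv_file = Path(tmp) / "pets.csv"
    first = get_cached_frame(csv_file, fake_download)   # downloads and saves
    second = get_cached_frame(csv_file, fake_download)  # reads the saved csv
    downloads_before_refresh = len(calls)
    refreshed = get_cached_frame(csv_file, fake_download, query_url=True)
```

Passing the download step in as a function keeps the caching logic testable without network access.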

```python
df = wrangle.make_pet_dataframe()
df.sample(5)
```

Returning saved csv files.


| Unnamed: 0 | animal_id | name | outcome_type | animal_type | sex_upon_outcome | breed | color | outcome_subtype | found_location | intake_type | intake_condition | sex_upon_intake | outcome_date | intake_date | age_at_outcome | age_at_intake |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 59216 | A751857 | Maybe | Adoption | Cat | Spayed Female | Domestic Longhair Mix | Blue | no subtype | 16809 Grave Send Rd in Pflugerville (TX) | Owner Surrender | Normal | Spayed Female | 2017-07-02 | 2017-06-14 | 2192 | 2192 |
| 84521 | A711586 | no name | Transfer | Cat | Unknown | Domestic Medium Hair Mix | Orange Tabby | Partner | 5129 Cameron Rd in Austin (TX) | Stray | Normal | Unknown | 2015-09-09 | 2015-09-09 | 14 | 14 |
| 25083 | A808289 | no name | Died | Cat | Intact Male | Domestic Shorthair | Brown Tabby/White | In Kennel | 2305 Santa Rosa Street in Austin (TX) | Stray | Normal | Intact Male | 2019-11-10 | 2019-11-06 | 30 | 30 |
| 42427 | A780168 | Jake | Adoption | Dog | Neutered Male | Labrador Retriever/German Shepherd | Tan | no subtype | Oakmont in Austin (TX) | Stray | Normal | Intact Male | 2018-09-24 | 2018-09-10 | 183 | 183 |
| 32152 | A798171 | Ozzy | Adoption | Dog | Neutered Male | Boxer/Labrador Retriever | Black/Brown | no subtype | Austin (TX) | Owner Surrender | Normal | Neutered Male | 2019-06-23 | 2019-06-22 | 92 | 92 |


<a href='#contents'>Back to Contents</a>

## Exploring The Data <a name='exploring'></a>