# Austin Animal Center Intakes and Outcomes Analysis


![title](aac_logo.jpg)

The Austin Animal Center publishes 2 datasets for download and analysis. The data can also be retrieved through the SODA API.
<br>The 2 datasets we analyzed are linked below:

<a href="https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Intakes/wter-evkm">Austin Animal Center Intakes</a>
<br>
<a href="https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238?no_mobile=true">Austin Animal Center Outcomes</a>
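For readers who prefer the SODA API over a manual download, here is a minimal sketch of building a query URL. It assumes the usual Socrata `/resource/<dataset-id>.json` endpoint pattern and reuses the dataset IDs from the links above; `build_soda_url` is a hypothetical helper, not part of this project.

```python
from urllib.parse import urlencode

# Assumption: Socrata portals usually expose datasets at /resource/<id>.json.
# The dataset IDs are copied from the two links above.
BASE_URL = "https://data.austintexas.gov/resource"
DATASET_IDS = {"intakes": "wter-evkm", "outcomes": "9t4d-g238"}


def build_soda_url(dataset, limit=1000, offset=0, where=None):
    """Build a SODA query URL for one page of a dataset (hypothetical helper)."""
    params = {"$limit": limit, "$offset": offset}
    if where:
        # $where takes a SoQL filter expression
        params["$where"] = where
    return f"{BASE_URL}/{DATASET_IDS[dataset]}.json?{urlencode(params)}"


url = build_soda_url("intakes", limit=5, where="animal_type = 'Dog'")
print(url)
```

The resulting URL could then be passed to something like `pandas.read_json(url)`, paging through the data by increasing `offset`.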

Each dataset consisted of 114k rows and 12 columns and contained information on a variety of animals. For this project, we decided to look only at dogs.

## Questions we set out to answer were the following:

- What is the average length of stay for a dog at the shelter?
    - Does it vary by breed?
- How many dogs are "boomerangs"? (Entered the Shelter -> Left the Shelter -> Came Back)
- What are the age demographics of the dogs in the shelter?
- People say that black dogs have more trouble getting out of the shelter. Does the data support that?
    - Which colors get out quickest?
- What are the most common dog names (male/female)?

# The Process, The Munging, The Analysis

## Retrieving Data

#### 1. Get Intakes and Outcomes 
#### 2. Select "Dogs" and "Adopted"
#### 3. Normalize Columns - create signatures for Intake/Outcome
#### 4. Join tables through Concatenation
#### 5. 68,444 entries

The first step in the process was to trim the datasets by selecting only rows where "animal_type" was equal to 'Dog'.
We also trimmed the outcomes dataset to keep only rows where "outcome_type" was equal to 'Adoption'.
This brought our datasets down to 64,554 rows and 30,144 rows, respectively.
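The filtering step can be sketched in pandas as follows. The toy rows stand in for the real downloads and the variable names are illustrative. The column names come from the description above.

```python
import pandas as pd

# Toy rows standing in for the real downloads (illustrative only).
intakes = pd.DataFrame({
    "animal_id": ["A1", "A2", "A3"],
    "animal_type": ["Dog", "Cat", "Dog"],
})
outcomes = pd.DataFrame({
    "animal_id": ["A1", "A2", "A3"],
    "animal_type": ["Dog", "Cat", "Dog"],
    "outcome_type": ["Adoption", "Adoption", "Transfer"],
})

# Keep only dogs in the intakes table.
dog_intakes = intakes[intakes["animal_type"] == "Dog"]

# Keep only dogs whose outcome was an adoption.
dog_adoptions = outcomes[
    (outcomes["animal_type"] == "Dog") & (outcomes["outcome_type"] == "Adoption")
]
```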

We then decided to join these 2 datasets and create our working dataframe. We initially tried doing this by merging them together on "animal_id", but in the end opted for the concatenation of the two. 

To concatenate the 2 dataframes cleanly, both needed the same columns with the same names (pandas fills any unmatched columns with missing values). We "normalized" the column names to lower snake_case and added an "is_intake" column to distinguish intake rows from outcome rows.

Our final concatenated_df consisted of 68,444 rows × 14 columns.
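A minimal sketch of the normalize-and-concatenate step is shown below. The raw column names and the `to_snake_case` helper are assumptions for illustration, since the project's actual renaming code is not shown here.

```python
import pandas as pd


def to_snake_case(name):
    """Lower-case a column name and replace spaces with underscores (hypothetical helper)."""
    return name.strip().lower().replace(" ", "_")


# Assumed raw column names, for illustration only.
intakes = pd.DataFrame(
    {"Animal ID": ["A1"], "Date Time": ["2017-01-01"], "Animal Type": ["Dog"]}
)
outcomes = pd.DataFrame(
    {"Animal ID": ["A1"], "Date Time": ["2017-01-10"], "Animal Type": ["Dog"]}
)

# Normalize the names and flag each row's origin before stacking the tables.
for df, flag in ((intakes, True), (outcomes, False)):
    df.columns = [to_snake_case(col) for col in df.columns]
    df["is_intake"] = flag

concatenated_df = pd.concat([intakes, outcomes], ignore_index=True)
```

Stacking the rows keeps every intake and outcome as its own record. A merge on "animal_id" would instead pair every intake with every outcome for dogs that visited more than once.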

## Refining and Taking out the Trash

### 1. Determining "Qualified" Animal IDs
    if num_intakes == num_outcomes or num_intakes-1 == num_outcomes:
        is_qualified=True

Next we needed to find the animal_ids that had at least one intake and one outcome, in that order.
For that, we first created 2 dataframes with aggregated counts (number of records) grouped by animal_id. One df showed num_intakes, and the other showed num_outcomes.

We then merged these 2 dataframes and checked each animal ID with a function that applies the condition shown above.

### 60,891 Entries
From here, we obtained a series of qualified animal_ids. We used .loc to select these IDs from the concatenated_df from earlier, which produced our final working_df with 60,891 rows and 14 columns.
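The qualification step could be sketched roughly as follows, using toy data. The `is_qualified` function mirrors the condition quoted at the top of this section. The rest (variable names, outer merge, filling missing counts with 0) is an assumption about how the pieces fit together.

```python
import pandas as pd

# Toy stand-in for concatenated_df (illustrative only).
concatenated_df = pd.DataFrame({
    "animal_id": ["A1", "A1", "A2", "A3", "A3", "A3", "A4"],
    "is_intake": [True, False, True, True, False, True, False],
})

# Count intake and outcome records per animal.
intake_rows = concatenated_df[concatenated_df["is_intake"]]
outcome_rows = concatenated_df[~concatenated_df["is_intake"]]
num_intakes = intake_rows.groupby("animal_id").size().rename("num_intakes")
num_outcomes = outcome_rows.groupby("animal_id").size().rename("num_outcomes")

# Outer merge so animals missing from one side get a count of 0.
counts = pd.merge(
    num_intakes.reset_index(),
    num_outcomes.reset_index(),
    on="animal_id",
    how="outer",
).fillna(0)


def is_qualified(row):
    """Apply the condition quoted above to one row of counts."""
    return (
        row["num_intakes"] == row["num_outcomes"]
        or row["num_intakes"] - 1 == row["num_outcomes"]
    )


counts["is_qualified"] = counts.apply(is_qualified, axis=1)
qualified_ids = counts.loc[counts["is_qualified"], "animal_id"]

# Keep only the rows that belong to qualified animals.
working_df = concatenated_df.loc[concatenated_df["animal_id"].isin(qualified_ids)]
```

In this toy data, A4 has an outcome but no intake, so it is dropped.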



### 2. Munging Breeds and Colors
- 1629 Unique Breeds
- 280 Unique Color Combinations

#### 1. Separate Primary from Secondary
#### 2. Colors: Create Primary/Secondary Values
#### 3. Breeds: Assign "Mix" to Primary Breed

The breed values were either "Pure", "Mix" or "Primary/Secondary", e.g. "Chihuahua", "Chihuahua Mix" or "Chihuahua/Pomeranian".
To clean things up, we assigned dogs with "Primary/Secondary" values to the Mix category of their primary breed, e.g. "Pit Bull/Rottweiler" became "Pit Bull Mix".
This brought our final number of unique breeds down to 324.
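As an illustration of the breed rule described above, here is a small sketch. `simplify_breed` is a hypothetical helper, and the project's actual cleaning code may handle more edge cases.

```python
import pandas as pd


def simplify_breed(breed):
    """Turn 'Primary/Secondary' into 'Primary Mix'; leave other values as they are."""
    breed = breed.strip()
    if "/" not in breed:
        return breed
    primary = breed.split("/")[0].strip()
    # Avoid producing "X Mix Mix" if the primary part already ends in "Mix".
    return primary if primary.endswith("Mix") else f"{primary} Mix"


breeds = pd.Series(["Chihuahua", "Chihuahua Mix", "Pit Bull/Rottweiler"])
clean_breeds = breeds.map(simplify_breed)
```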

The color values included variants of primary colors, such as "Black Tiger" or "Black Smoke". We looked up these variants and converted each to its primary color, e.g. "Black Tiger" became "Black".
The column also had values in a primary/secondary format, so we split these into two columns. This left us with 9 primary colors and 10 secondary colors.
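The color cleanup could look something like the sketch below. The `COLOR_VARIANTS` lookup contains only the two examples mentioned above and is a placeholder for the full table. `split_color` and the output column names are hypothetical.

```python
import pandas as pd

# Placeholder lookup with only the variants named above; the real table is larger.
COLOR_VARIANTS = {"Black Tiger": "Black", "Black Smoke": "Black"}


def split_color(color):
    """Return (primary, secondary) with variants mapped to their primary color."""
    parts = [part.strip() for part in color.split("/")]
    parts = [COLOR_VARIANTS.get(part, part) for part in parts]
    primary = parts[0]
    secondary = parts[1] if len(parts) > 1 else None
    return primary, secondary


colors = pd.Series(["Black Tiger", "Black/White", "Black Smoke/Tan", "Brown"])
color_df = pd.DataFrame(
    colors.map(split_color).tolist(),
    columns=["primary_color", "secondary_color"],
)
```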

### Matching Intakes to Outcomes per Dog

#### 1. Sort by Animal ID and Date
#### 2. Cycle through entries checking Intake THEN Outcome
#### 3. Calculating Number of Stays
#### 4. Averaging Stay Lengths
#### 5. Group by Animal ID
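The steps above can be sketched as follows, on toy data. The `event_time` column name and the pairing loop are assumptions for illustration. The project's actual matching code is not reproduced here.

```python
import pandas as pd

# Toy events: A1 has two stays (10 and 20 days), A2 has one stay (5 days).
events = pd.DataFrame({
    "animal_id": ["A1", "A1", "A1", "A1", "A2", "A2"],
    "event_time": pd.to_datetime([
        "2017-01-01", "2017-01-11", "2017-03-01", "2017-03-21",
        "2017-05-01", "2017-05-06",
    ]),
    "is_intake": [True, False, True, False, True, False],
})

# 1. Sort by animal ID and date.
events = events.sort_values(["animal_id", "event_time"])

# 2. Walk each animal's events, pairing an intake with the outcome that follows it.
stays = []
for animal_id, group in events.groupby("animal_id"):
    open_intake = None
    for row in group.itertuples(index=False):
        if row.is_intake:
            open_intake = row.event_time
        elif open_intake is not None:
            days = (row.event_time - open_intake).days
            stays.append({"animal_id": animal_id, "stay_days": days})
            open_intake = None

# 3-5. Count stays and average their length per animal.
stays_df = pd.DataFrame(stays)
per_dog = stays_df.groupby("animal_id")["stay_days"].agg(
    num_stays="count", avg_stay_days="mean"
)

avg_num_stays = per_dog["num_stays"].mean()
avg_stay_length = per_dog["avg_stay_days"].mean()
```

An outcome with no preceding intake is skipped, which matches the "Intake THEN Outcome" ordering in step 2.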



# Findings

## Average Length of Stay 

#### Average Number of Stays: 1.137 Stays
#### Average Stay Length: 25.09 Days


## Does it vary by breed?

![title](NumberOfDaysInShelterByBreed.png)


## What are the age demographics? Do age or sex influence stays?
 
![title](StaysByAgeSex.png)

## Influence of Color on Stay Length

![title](AvgLenStayByColor.png)

## Most common female dog names

- Bella - 143
- Luna - 142
- Daisy - 124
- Lucy - 98
- Princess - 79
- Lola - 79
- Coco - 64
- Sadie - 60
- Penny - 53
- Bailey - 47

## Most common male dog names

- Max - 142
- Buddy - 93
- Rocky - 83
- Charlie - 63
- Zeus - 60
- Jack - 53
- Duke - 48
- Toby - 45
- Blue - 44
- Milo - 43
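
Name counts like the ones above can be produced with `value_counts`. This is a minimal sketch on toy data. The `name` and `sex` column names and their values are assumptions, and the real dataset may encode sex differently.

```python
import pandas as pd

# Toy data; the column names and values are assumed for illustration.
dogs = pd.DataFrame({
    "name": ["Bella", "Max", "Luna", "Bella", "Max", "Max", None],
    "sex": ["Female", "Male", "Female", "Female", "Male", "Male", "Male"],
})

# value_counts drops missing names and sorts from most to least common.
female_top = dogs.loc[dogs["sex"] == "Female", "name"].value_counts().head(10)
male_top = dogs.loc[dogs["sex"] == "Male", "name"].value_counts().head(10)
```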