# Animal Shelter Outcomes
---

Problem Statement...

### Background
...

---
## Data Cleaning

In [1]:
# imports
import pandas as pd
import numpy as np

### Shelter Intakes

In [2]:
# reading in shelter intakes
intakes =  pd.read_csv('../data/austin_animal_center_intakes_20241017.csv')
intakes.head()

Unnamed: 0,Animal ID,Name,DateTime,MonthYear,Found Location,Intake Type,Intake Condition,Animal Type,Sex upon Intake,Age upon Intake,Breed,Color
0,A786884,*Brock,01/03/2019 04:19:00 PM,January 2019,2501 Magin Meadow Dr in Austin (TX),Stray,Normal,Dog,Neutered Male,2 years,Beagle Mix,Tricolor
1,A706918,Belle,07/05/2015 12:59:00 PM,July 2015,9409 Bluegrass Dr in Austin (TX),Stray,Normal,Dog,Spayed Female,8 years,English Springer Spaniel,White/Liver
2,A724273,Runster,04/14/2016 06:43:00 PM,April 2016,2818 Palomino Trail in Austin (TX),Stray,Normal,Dog,Intact Male,11 months,Basenji Mix,Sable/White
3,A665644,,10/21/2013 07:59:00 AM,October 2013,Austin (TX),Stray,Sick,Cat,Intact Female,4 weeks,Domestic Shorthair Mix,Calico
4,A857105,Johnny Ringo,05/12/2022 12:23:00 AM,May 2022,4404 Sarasota Drive in Austin (TX),Public Assist,Normal,Cat,Neutered Male,2 years,Domestic Shorthair,Orange Tabby


In [3]:
intakes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 168040 entries, 0 to 168039
Data columns (total 12 columns):
 #   Column            Non-Null Count   Dtype 
---  ------            --------------   ----- 
 0   Animal ID         168040 non-null  object
 1   Name              119647 non-null  object
 2   DateTime          168040 non-null  object
 3   MonthYear         168040 non-null  object
 4   Found Location    168040 non-null  object
 5   Intake Type       168040 non-null  object
 6   Intake Condition  168040 non-null  object
 7   Animal Type       168040 non-null  object
 8   Sex upon Intake   168038 non-null  object
 9   Age upon Intake   168039 non-null  object
 10  Breed             168040 non-null  object
 11  Color             168040 non-null  object
dtypes: object(12)
memory usage: 15.4+ MB


*The Name column has a lot of null values and we're assuming a pet's name won't affect their chances of adoption, so going to drop this column. Also dropping MonthYear because that information is also in the DateTime column.*

In [4]:
# dropping inconsequential columns
intakes.drop(columns=['Name', 'MonthYear'], inplace=True)

In [5]:
# renaming columns to be intake specific and snake case
columns = {
    'Animal ID': 'animal_id',
    'DateTime': 'intake_time',
    'Found Location': 'found_location',
    'Intake Type': 'intake_type',
    'Intake Condition': 'intake_condition',
    'Animal Type': 'animal_type',
    'Sex upon Intake': 'intake_gender',
    'Age upon Intake': 'intake_age',
    'Breed': 'intake_breed',
    'Color': 'intake_color'    
}

intakes = intakes.rename(columns=columns)

In [6]:
# converting intake_time column to datetime format
intakes['intake_time'] = pd.to_datetime(intakes['intake_time'], format='%m/%d/%Y %H:%M:%S %p')

In [7]:
intakes.nunique()

animal_id           151007
intake_time         115671
found_location       68164
intake_type              6
intake_condition        20
animal_type              5
intake_gender            5
intake_age              55
intake_breed          2969
intake_color           651
dtype: int64

*There are many animals that have more than one stay at a shelter. In order to have accurate merging between intake and outcomes we are dropping any duplicate animals. Sorting by intake time first so we keep the most recent.*

In [9]:
# sort intakes by most recent intakes first
intakes.sort_values(by=['intake_time', 'animal_id'], inplace=True, ascending = False)

# drop duplicate observations
intakes.drop_duplicates(subset='animal_id', inplace = True)

### Shelter Outcomes

In [11]:
# reading in shelter outcomes
outcomes = pd.read_csv('../data/austin_animal_center_outcomes_20241017.csv')
outcomes.head()

Unnamed: 0,Animal ID,Name,DateTime,MonthYear,Date of Birth,Outcome Type,Outcome Subtype,Animal Type,Sex upon Outcome,Age upon Outcome,Breed,Color
0,A882831,*Hamilton,07/01/2023 06:12:00 PM,Jul 2023,03/25/2023,Adoption,,Cat,Neutered Male,3 months,Domestic Shorthair Mix,Black/White
1,A794011,Chunk,05/08/2019 06:20:00 PM,May 2019,05/02/2017,Rto-Adopt,,Cat,Neutered Male,2 years,Domestic Shorthair Mix,Brown Tabby/White
2,A776359,Gizmo,07/18/2018 04:02:00 PM,Jul 2018,07/12/2017,Adoption,,Dog,Neutered Male,1 year,Chihuahua Shorthair Mix,White/Brown
3,A821648,,08/16/2020 11:38:00 AM,Aug 2020,08/16/2019,Euthanasia,,Other,Unknown,1 year,Raccoon,Gray
4,A720371,Moose,02/13/2016 05:59:00 PM,Feb 2016,10/08/2015,Adoption,,Dog,Neutered Male,4 months,Anatol Shepherd/Labrador Retriever,Buff


In [12]:
outcomes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 167942 entries, 0 to 167941
Data columns (total 12 columns):
 #   Column            Non-Null Count   Dtype 
---  ------            --------------   ----- 
 0   Animal ID         167942 non-null  object
 1   Name              119733 non-null  object
 2   DateTime          167942 non-null  object
 3   MonthYear         167942 non-null  object
 4   Date of Birth     167942 non-null  object
 5   Outcome Type      167896 non-null  object
 6   Outcome Subtype   77144 non-null   object
 7   Animal Type       167942 non-null  object
 8   Sex upon Outcome  167940 non-null  object
 9   Age upon Outcome  167926 non-null  object
 10  Breed             167942 non-null  object
 11  Color             167942 non-null  object
dtypes: object(12)
memory usage: 15.4+ MB


*Dropping Name and MonthYear columns for outcomes data as well as Outcome Subtype. This column has even more null values than Name and we'll be focusing on the primary Outcome Type only.*

In [13]:
# dropping inconsequential columns
outcomes.drop(columns=['Name', 'MonthYear', 'Outcome Subtype'], inplace=True)

In [14]:
# renaming columns to be outcome specific and snake case
outcome_columns = {
    'Animal ID': 'animal_id',
    'DateTime': 'outcome_time',
    'Date of Birth': 'date_of_birth',
    'Outcome Type': 'outcome_type',
    'Animal Type': 'outcome_animal_type',
    'Sex upon Outcome': 'outcome_gender',
    'Age upon Outcome': 'outcome_age',
    'Breed': 'outcome_breed',
    'Color': 'outcome_color'    
}

outcomes = outcomes.rename(columns=outcome_columns)

In [15]:
# converting outcome_time and date of birth columns to datetime format
outcomes['outcome_time'] = pd.to_datetime(outcomes['outcome_time'], format='%m/%d/%Y %H:%M:%S %p')
outcomes['date_of_birth'] = pd.to_datetime(outcomes['date_of_birth'], format='%m/%d/%Y')

In [16]:
outcomes.nunique()

animal_id              150912
outcome_time           139971
date_of_birth            8501
outcome_type               11
outcome_animal_type         5
outcome_gender              5
outcome_age                55
outcome_breed            2969
outcome_color             653
dtype: int64

*Dropping duplicate animal_id's for outcomes as well.*

In [17]:
# sort intakes by most recent intakes first
outcomes.sort_values(by=['outcome_time', 'animal_id'], inplace=True, ascending = False)

# drop duplicate observations
outcomes.drop_duplicates(subset='animal_id', inplace = True)

### Merge DataFrames

In [19]:
# meging intakes and outcomes
intakes_outcomes = pd.merge(left=outcomes, right=intakes, how='inner', on='animal_id')
intakes_outcomes.head()

Unnamed: 0,animal_id,outcome_time,date_of_birth,outcome_type,outcome_animal_type,outcome_gender,outcome_age,outcome_breed,outcome_color,intake_time,found_location,intake_type,intake_condition,animal_type,intake_gender,intake_age,intake_breed,intake_color
0,A912055,2024-10-17 12:25:00,2023-10-25,Adoption,Cat,Neutered Male,11 months,Domestic Shorthair,Brown Tabby/White,2024-08-25 08:20:00,1800 Fairlawn Lane in Austin (TX),Stray,Injured,Cat,Intact Male,10 months,Domestic Shorthair,Brown Tabby/White
1,A915002,2024-10-17 12:21:00,2023-10-10,Return to Owner,Dog,Intact Male,1 year,German Shepherd Mix,Tan,2024-10-10 12:10:00,Austin (TX),Public Assist,Normal,Dog,Intact Male,1 year,German Shepherd Mix,Tan
2,A832172,2024-10-17 12:20:00,2021-01-24,Return to Owner,Dog,Neutered Male,3 years,Pit Bull Mix,Brown/White,2024-10-10 12:10:00,Austin (TX),Public Assist,Normal,Dog,Neutered Male,3 years,Pit Bull Mix,Brown/White
3,A915279,2024-10-17 12:00:00,2022-10-14,Transfer,Cat,Intact Female,2 years,Domestic Shorthair,Black,2024-10-14 11:47:00,14514 Highsmith Street in Austin (TX),Stray,Normal,Cat,Intact Female,2 years,Domestic Shorthair,Black
4,A915162,2024-10-17 12:00:00,2022-10-12,Transfer,Cat,Unknown,2 years,Domestic Shorthair,Brown Tabby,2024-10-12 10:31:00,102 East Rundberg Lane in Austin (TX),Stray,Normal,Cat,Unknown,2 years,Domestic Shorthair,Brown Tabby


*There were couple columns that were the same category, going to investigate these. Also converting age columns to age in months. First making a new column for how long an animal stays in the shelter.*

In [21]:
intakes_outcomes['stay_duration'] = intakes_outcomes['outcome_time'] - intakes_outcomes['intake_time']

# convert stay_duration to number of days
intakes_outcomes['stay_duration'] = intakes_outcomes['stay_duration'].dt.days.astype(int)

# check for negative stay_duration
intakes_outcomes[intakes_outcomes['stay_duration'] < 0]['stay_duration'].value_counts()

stay_duration
-1      7242
-4         5
-19        3
-20        3
-7         3
        ... 
-251       1
-228       1
-245       1
-80        1
-3         1
Name: count, Length: 94, dtype: int64

In [27]:
intakes_outcomes[intakes_outcomes['stay_duration'] == -1]['outcome_type'].value_counts()

outcome_type
Transfer           3773
Return to Owner    2289
Euthanasia          813
Adoption            257
Died                 65
Disposal             33
Rto-Adopt             8
Relocate              2
Name: count, dtype: int64

*There are a lot of observations where the stay is -1 days. We are making the assumption this is because of reporting errors and are going to assume they are all zero. There are only 93 other negative stay durations, instead of making assumptions with those we will drop these observations.*

In [28]:
intakes_outcomes['stay_duration'] = intakes_outcomes['stay_duration'].map(lambda x: 0 if x == -1 else x)

In [31]:
# compare intake/outcome animal_type, gender, breed, and color
print(f'Number of animal type changes: {intakes_outcomes[intakes_outcomes['animal_type'] != intakes_outcomes['outcome_animal_type']].shape[0]}')
print(f'Number of neuters/spays: {intakes_outcomes[intakes_outcomes['intake_gender'] != intakes_outcomes['outcome_gender']].shape[0]}')
print(f'Number of breed changes: {intakes_outcomes[intakes_outcomes['intake_breed'] != intakes_outcomes['outcome_breed']].shape[0]}')
print(f'Number of color changes: {intakes_outcomes[intakes_outcomes['intake_color'] != intakes_outcomes['outcome_color']].shape[0]}')

Number of animal type changes: 0
Number of neuters/spays: 60506
Number of breed changes: 0
Number of color changes: 0


*No changes in animal type, breed, or color from intake to outcome, dropping the duplicate column. Also making column showing if an animal is neutered or spayed while in the shelter.*

In [32]:
# dropping duplicate columns
intakes_outcomes.drop(columns=['outcome_animal_type', 'outcome_breed', 'outcome_color'], inplace=True)

# renaming columns without intake specifier
intakes_outcomes.rename(columns={'intake_breed': 'breed', 'intake_color': 'color'}, inplace=True)

In [43]:
intakes_outcomes['spay_neuter'] = (intakes_outcomes['intake_gender'] != intakes_outcomes['outcome_gender']).astype(int)

In [44]:
# checking remaining null values
intakes_outcomes.isnull().sum()

animal_id           0
outcome_time        0
date_of_birth       0
outcome_type        0
outcome_gender      0
outcome_age         0
intake_time         0
found_location      0
intake_type         0
intake_condition    0
animal_type         0
intake_gender       0
intake_age          0
breed               0
color               0
stay_duration       0
spay_neuter         0
dtype: int64

*At most there are 60 observations with null values, dropping these rows.*

In [34]:
print(intakes_outcomes.shape)
intakes_outcomes.dropna(inplace=True)
print(intakes_outcomes.shape)

(150098, 16)
(150043, 16)


In [35]:
# function for converting age columns to age in months
def convert_age(age): 
    ### add doc string for fucntion!!!!!!!! ###
    value, unit = age.split()
    value = abs(int(value)) # assume the nagetive age is typo 
    
    if 'year' in unit:
        return value * 12
    elif 'month' in unit:
        return value
    elif 'week' in unit:
        return round(float(value * 0.23), 2)
    elif 'day' in unit:
        return round(float(value * 0.033), 2)
    else:
        return 0

In [36]:
intakes_outcomes['intake_age'] = intakes_outcomes['intake_age'].map(convert_age)
intakes_outcomes['outcome_age'] = intakes_outcomes['outcome_age'].map(convert_age)

In [37]:
# checking animal types
intakes_outcomes['animal_type'].value_counts(normalize=True)

animal_type
Dog          0.515359
Cat          0.421359
Other        0.057490
Bird         0.005598
Livestock    0.000193
Name: proportion, dtype: float64

In [38]:
# see breeds under Other animal type
intakes_outcomes[intakes_outcomes['animal_type'] == 'Other']['breed'].unique()

array(['Guinea Pig', 'Bat', 'Raccoon', 'Squirrel', 'Bat/Mex Free-Tail',
       'Opossum', 'Fox', 'Lizard/Gecko', 'Rabbit Sh', 'Tortoise', 'Deer',
       'Skunk', 'Rat', 'Jersey Wooly', 'Ringtail', 'Snake', 'Lop-French',
       'Ferret', 'Cottontail', 'Angora-English Mix',
       'Turtle/Redeared Slider', 'Lizard', 'Florida White',
       'Rabbit Sh Mix', 'Chinchilla', 'Coyote', 'Hamster',
       'Lizard/Bearded Dragon', 'Flemish Giant', 'Hedgehog',
       'Californian', 'Lop-Holland', 'Rex Mix', 'Lop-Holland Mix',
       'Californian Mix', 'Lionhead', 'Himalayan', 'Lop-Mini', 'Turtle',
       'Lionhead Mix', 'Dutch Mix', 'Rabbit Sh/Dwarf Hotot', 'Gerbil',
       'Rabbit Lh', 'Lop-English Mix', 'Angora-English', 'Snake/Python',
       'New Zealand Wht/Lop-Holland', 'Lop-Amer Fuzzy', 'Rex',
       'English Spot Mix', 'Hotot', 'New Zealand Wht', 'Harlequin Mix',
       'Armadillo', 'Rex-Mini', 'Dutch', 'English Spot', 'Cold Water',
       'Chinchilla-Stnd', 'Lop-Mini/Hotot', 'Mouse', 'Dwa

*There are very fiew observations labeled as Bird or Livestock. Animals labeled as Other contains some household pets but also a lot of wildlife that doesn't pertain to our problem statement. Dropping everything that isn't a cat or dog.*

In [40]:
# dropping obsevations that aren't cats or dogs
print(intakes_outcomes.shape)
intakes_outcomes = intakes_outcomes[intakes_outcomes['animal_type'] != 'Other']
intakes_outcomes = intakes_outcomes[intakes_outcomes['animal_type'] != 'Bird']
intakes_outcomes = intakes_outcomes[intakes_outcomes['animal_type'] != 'Livestock']
intakes_outcomes.shape

(150043, 16)


(140548, 16)

In [31]:
# saving combined data to use in other notebooks
intakes_outcomes.to_csv('../data/combined-shelter-data.csv', index=False)

In [45]:
intakes_outcomes.info()

<class 'pandas.core.frame.DataFrame'>
Index: 140548 entries, 0 to 150097
Data columns (total 17 columns):
 #   Column            Non-Null Count   Dtype         
---  ------            --------------   -----         
 0   animal_id         140548 non-null  object        
 1   outcome_time      140548 non-null  datetime64[ns]
 2   date_of_birth     140548 non-null  datetime64[ns]
 3   outcome_type      140548 non-null  object        
 4   outcome_gender    140548 non-null  object        
 5   outcome_age       140548 non-null  float64       
 6   intake_time       140548 non-null  datetime64[ns]
 7   found_location    140548 non-null  object        
 8   intake_type       140548 non-null  object        
 9   intake_condition  140548 non-null  object        
 10  animal_type       140548 non-null  object        
 11  intake_gender     140548 non-null  object        
 12  intake_age        140548 non-null  float64       
 13  breed             140548 non-null  object        
 14  color    

---
## Data Dictionary

All intake data is from [Austin Animal Center Intakes](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Intakes/wter-evkm/about_data) and outcome data is from [Austin Animal Center Outcomes](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238/about_data).

|feature|type|description|
|---|---|---|
|**animal_id**|*str*|Unique animal ID|
|**outcome_time**|*datetime*|Day and time of animal outcome|
|**date_of_birth**|*datetime*|Animal's date of birth|
|**outcome_type**|*str*|Outcome of animal|
|**outcome_gender**|*str*|Neuter/spay status at outcome|
|**outcome_age**|*float*|Animal age at outcome|
|**intake_time**|*datetime*|Day and time animal is taken in by shelter|
|**found_location**|*str*|Where animal is found|
|**intake_type**|*str*|How animal is taken in|
|**intake_condition**|*str*|Animal's health condition upon intake|
|**animal_type**|*str*|Type of animal|
|**intake_gender**|*str*|Neuter/spay status upon intake|
|**intake_age**|*float*|Animal age in months upon intake|
|**breed**|*str*|Animal breed|
|**color**|*str*|Animal color|
|**stay_duration**|*str*|How many days animal is in shelter before outcome|
|**spay_neuter**|*int*|1 if an animal is spayed or neutered while in the shelter|