# Animal Shelter Outcomes

Every year, approximately 7.6 million companion animals end up in US shelters. Some are given up as unwanted by their owners, while others are picked up after getting lost or are removed from cruelty situations. Many of these animals find forever families to take them home, but just as many are not so lucky: 2.7 million dogs and cats are euthanized in the US every year.

### Data

Kaggle is hosting this competition for the machine learning community to use for data science practice and social good. The dataset is provided by the Austin Animal Center. Shelter animal statistics were taken from the ASPCA.

### Predictors and Outcomes

The predictors are age, animal type, and sex. The outcome identifies whether the animal was adopted, euthanized, returned to its owner, or transferred.

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [73]:
# First import data file and look at data fields

animals = pd.read_csv("train.csv")
animals.head(5)

Unnamed: 0,AnimalID,Name,DateTime,OutcomeType,OutcomeSubtype,AnimalType,SexuponOutcome,AgeuponOutcome,Breed,Color
0,A671945,Hambone,2014-02-12 18:22:00,Return_to_owner,,Dog,Neutered Male,1 year,Shetland Sheepdog Mix,Brown/White
1,A656520,Emily,2013-10-13 12:44:00,Euthanasia,Suffering,Cat,Spayed Female,1 year,Domestic Shorthair Mix,Cream Tabby
2,A686464,Pearce,2015-01-31 12:28:00,Adoption,Foster,Dog,Neutered Male,2 years,Pit Bull Mix,Blue/White
3,A683430,,2014-07-11 19:09:00,Transfer,Partner,Cat,Intact Male,3 weeks,Domestic Shorthair Mix,Blue Cream
4,A667013,,2013-11-15 12:52:00,Transfer,Partner,Dog,Neutered Male,2 years,Lhasa Apso/Miniature Poodle,Tan


In [65]:
# How many records are there?

print(len(animals.index))

26729


In [72]:
# Check for missing data

# isnull() flags missing cells; sum() counts them for each column
missing_name = animals['Name'].isnull().sum()
print("Missing Names: %d" % missing_name)

missing_datetime = animals['DateTime'].isnull().sum()
print("Missing Dates: %d" % missing_datetime)

missing_type = animals['AnimalType'].isnull().sum()
print("Missing Animal Types: %d" % missing_type)

missing_sex = animals['SexuponOutcome'].isnull().sum()
print("Missing Sexes:  %d" % missing_sex)

missing_age = animals['AgeuponOutcome'].isnull().sum()
print("Missing Ages: %d" % missing_age)

missing_breed = animals['Breed'].isnull().sum()
print("Missing Breeds: %d" % missing_breed)

missing_color = animals['Color'].isnull().sum()
print("Missing Colors: %d" % missing_color)

Missing Names: 7691
Missing Dates: 0
Missing Animal Types: 0
Missing Sexes:  1
Missing Ages: 18
Missing Breeds: 0
Missing Colors: 0
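
The per-column checks above can also be collapsed into a single call. The sketch below is only an illustration: it uses a small hand-made frame (the rows are hypothetical stand-ins, not the real `train.csv`) so that it runs on its own.

```python
import pandas as pd

# Hypothetical stand-in rows; the real notebook reads train.csv instead
sample = pd.DataFrame({
    "Name": ["Hambone", None, "Pearce", None],
    "SexuponOutcome": ["Neutered Male", "Spayed Female", None, "Intact Male"],
    "AgeuponOutcome": ["1 year", None, "2 years", "3 weeks"],
})

# isnull() marks each missing cell; sum() then counts them per column
missing_counts = sample.isnull().sum()
print(missing_counts)
```

Applied to the full data, `animals.isnull().sum()` should report the same counts as the individual checks, one row per column.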


While names are unlikely to affect the outcome for shelter animals, the missing ages and sexes will need to be filled in, since age and sex are likely to be important predictors of whether an animal gets adopted.
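
One possible way to fill those gaps is to substitute the most frequent value in each column. This is a minimal sketch under that assumption, not necessarily the approach this notebook goes on to use; the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in rows with one missing sex and one missing age
sample = pd.DataFrame({
    "SexuponOutcome": ["Neutered Male", None, "Neutered Male", "Intact Male"],
    "AgeuponOutcome": ["1 year", "2 years", None, "2 years"],
})

filled = sample.copy()
for column in ["SexuponOutcome", "AgeuponOutcome"]:
    # mode() ignores missing values and returns the most frequent entries
    most_common = filled[column].mode()[0]
    filled[column] = filled[column].fillna(most_common)

print(filled)
```

With only 1 missing sex and 18 missing ages out of 26729 records, the choice of fill value should have little effect; dropping those rows would be another reasonable option.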

In [74]:
# Split ages into values that can be converted to integers

new = animals['AgeuponOutcome'].str.partition(" ")
new.head(5)


Unnamed: 0,0,1,2
0,1,,year
1,1,,year
2,2,,years
3,3,,weeks
4,2,,years


In [60]:
new.columns = ['how_old','space','age_unit']

In [61]:
del new['space']

In [62]:
new_animals = pd.concat([animals, new], axis=1)
new_animals.head(5)

Unnamed: 0,ID,Name,DateTime,AnimalType,SexuponOutcome,AgeuponOutcome,Breed,Color,how_old,age_unit
0,1,Summer,2015-10-12 12:15:00,Dog,Intact Female,10 months,Labrador Retriever Mix,Red/White,10,months
1,2,Cheyenne,2014-07-26 17:59:00,Dog,Spayed Female,2 years,German Shepherd/Siberian Husky,Black/Tan,2,years
2,3,Gus,2016-01-13 12:20:00,Cat,Neutered Male,1 year,Domestic Shorthair Mix,Brown Tabby,1,year
3,4,Pongo,2013-12-28 18:12:00,Dog,Intact Male,4 months,Collie Smooth Mix,Tricolor,4,months
4,5,Skooter,2015-09-24 17:59:00,Dog,Neutered Male,2 years,Miniature Poodle Mix,White,2,years
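
Once the age is split into a count and a unit, the two parts can be combined into a single numeric value. The sketch below converts everything to approximate days. The conversion factors in `DAYS_PER_UNIT` (30 days per month, 365 per year) and the name `age_in_days` are assumptions made for illustration, and the sample ages are hypothetical.

```python
import pandas as pd

# Hypothetical sample of AgeuponOutcome values, including a missing one
ages = pd.Series(["1 year", "2 years", "3 weeks", "10 months", None])

# Same split as in the notebook: number, separator, unit
parts = ages.str.partition(" ")
parts.columns = ["how_old", "space", "age_unit"]

# Assumed, approximate conversion factors
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}

# Turn the count into a number; anything unparseable becomes NaN
count = pd.to_numeric(parts["how_old"], errors="coerce")

# Strip the plural "s" so "years" and "year" map to the same factor
unit_days = parts["age_unit"].str.rstrip("s").map(DAYS_PER_UNIT)

age_in_days = count * unit_days
print(age_in_days)
```

Missing ages stay as NaN here, so they can still be filled in or dropped in a later step.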


In [6]:
pd.unique(animals["SexuponOutcome"])

array(['Intact Female', 'Spayed Female', 'Neutered Male', 'Intact Male',
       'Unknown'], dtype=object)

In [43]:
pd.unique(animals['AnimalType'])

array(['Dog', 'Cat'], dtype=object)

In [63]:
breeder = pd.unique(animals['Breed'])
len(breeder)


913
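
With this many distinct breeds, it can help to see how the records are spread across them. The following sketch shows `nunique()` as a shorthand for `len(pd.unique(...))` and `value_counts()` for the frequency of each breed; the sample values are hypothetical and stand in for `animals['Breed']`.

```python
import pandas as pd

# Hypothetical stand-in for animals['Breed']
breeds = pd.Series([
    "Domestic Shorthair Mix",
    "Pit Bull Mix",
    "Domestic Shorthair Mix",
    "Shetland Sheepdog Mix",
    "Lhasa Apso/Miniature Poodle",
])

# Number of distinct breeds, equivalent to len(pd.unique(breeds))
n_breeds = breeds.nunique()

# Records per breed, most frequent first
breed_counts = breeds.value_counts()

print(n_breeds)
print(breed_counts.head(3))
```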