# Animal Shelter Outcome Analysis
This project examines how an animal's type, gender, age, breed, and reproductive state affect its adoption or transfer. The analysis answers the following questions:

- What are the general outcomes?
- Can we spot seasonality in the adoption/transfer trends?
- Do cats have more luck in getting adopted?
- Do male dogs & cats have a better shot at getting adopted than their female counterparts?
- What are the most prevalent breeds of animals in the shelter?
- Do people prefer sterilised animals as pets?
- Is age a factor?
- A few interesting analyses on insights found along the way

The dataset can be downloaded from [here](https://www.kaggle.com/c/shelter-animal-outcomes/data). 

In [28]:
import pandas as pd
df = pd.read_csv('shelter_outcome.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26729 entries, 0 to 26728
Data columns (total 10 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   AnimalID        26729 non-null  object
 1   Name            19038 non-null  object
 2   DateTime        26729 non-null  object
 3   OutcomeType     26729 non-null  object
 4   OutcomeSubtype  13117 non-null  object
 5   AnimalType      26729 non-null  object
 6   SexuponOutcome  26728 non-null  object
 7   AgeuponOutcome  26711 non-null  object
 8   Breed           26729 non-null  object
 9   Color           26729 non-null  object
dtypes: object(10)
memory usage: 2.0+ MB


In [29]:
df.isnull().sum()

AnimalID              0
Name               7691
DateTime              0
OutcomeType           0
OutcomeSubtype    13612
AnimalType            0
SexuponOutcome        1
AgeuponOutcome       18
Breed                 0
Color                 0
dtype: int64
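
The null counts above can also be read as a share of all rows, which makes it easier to judge how much each gap matters. The following is a minimal sketch on a hand-typed sample (three columns of the five rows that `df.head()` prints in this notebook), not on the full dataset; the variable names `sample`, `missing` and `missing_share` are illustrative.

```python
import pandas as pd

# Hand-typed sample: three columns of the five rows shown by df.head()
sample = pd.DataFrame({
    'Name': ['Hambone', 'Emily', 'Pearce', None, None],
    'OutcomeType': ['Return_to_owner', 'Euthanasia', 'Adoption', 'Transfer', 'Transfer'],
    'OutcomeSubtype': [None, 'Suffering', 'Foster', 'Partner', 'Partner'],
})

# Count missing values per column and keep only the columns that have any
missing = sample.isnull().sum()
missing = missing[missing > 0]

# Express the counts as a fraction of all rows
missing_share = missing / len(sample)
print(missing_share)
```

On the real data the same three lines would be applied to `df` instead of `sample`.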

Most of the missing values are in the `'Name'` and `'OutcomeSubtype'` columns, which does not affect the analysis. `'SexuponOutcome'` and `'AgeuponOutcome'` are also missing 1 and 18 values respectively. <br>
Changing CamelCase columns to snake_case and getting rid of redundant details:

In [30]:
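col_list = []

for col in df.columns:
    col = col.lower()                          # e.g. 'AnimalID' -> 'animalid'
    col = col.replace('animal', '')            # 'animalid' -> 'id', 'animaltype' -> 'type'
    col = col.replace('uponoutcome', '')       # 'sexuponoutcome' -> 'sex'
    col = col.replace('outcome', 'outcome_')   # 'outcometype' -> 'outcome_type'
    col_list.append(col)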

df.columns = col_list
df.head()

| | id | name | datetime | outcome_type | outcome_subtype | type | sex | age | breed | color |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A671945 | Hambone | 2014-02-12 18:22:00 | Return_to_owner | NaN | Dog | Neutered Male | 1 year | Shetland Sheepdog Mix | Brown/White |
| 1 | A656520 | Emily | 2013-10-13 12:44:00 | Euthanasia | Suffering | Cat | Spayed Female | 1 year | Domestic Shorthair Mix | Cream Tabby |
| 2 | A686464 | Pearce | 2015-01-31 12:28:00 | Adoption | Foster | Dog | Neutered Male | 2 years | Pit Bull Mix | Blue/White |
| 3 | A683430 | NaN | 2014-07-11 19:09:00 | Transfer | Partner | Cat | Intact Male | 3 weeks | Domestic Shorthair Mix | Blue Cream |
| 4 | A667013 | NaN | 2013-11-15 12:52:00 | Transfer | Partner | Dog | Neutered Male | 2 years | Lhasa Apso/Miniature Poodle | Tan |
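
The renaming rules can also be wrapped in a small function, which makes them easy to check without loading the CSV. This is a sketch only; `clean_column_name` is a hypothetical helper name, and the list of original column names is copied from the `df.info()` output.

```python
def clean_column_name(name):
    """Apply the notebook's renaming rules to a single column name."""
    name = name.lower()
    name = name.replace('animal', '')
    # 'uponoutcome' has to go before 'outcome' is expanded,
    # otherwise 'sexuponoutcome' would become 'sexuponoutcome_'
    name = name.replace('uponoutcome', '')
    name = name.replace('outcome', 'outcome_')
    return name

# Original column names as listed by df.info()
original = ['AnimalID', 'Name', 'DateTime', 'OutcomeType', 'OutcomeSubtype',
            'AnimalType', 'SexuponOutcome', 'AgeuponOutcome', 'Breed', 'Color']

renamed = [clean_column_name(c) for c in original]
print(renamed)
```

Because `DataFrame.rename` accepts a function for its `columns` argument, the same helper could be applied with `df.rename(columns=clean_column_name)`.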


The date and time information is stored as `object` (string) values. Converting it to `datetime` for use in the time series analysis later:

In [31]:
import datetime as dt
df['datetime'] = pd.to_datetime(df['datetime'])
df.dtypes

id                         object
name                       object
datetime           datetime64[ns]
outcome_type               object
outcome_subtype            object
type                       object
sex                        object
age                        object
breed                      object
color                      object
dtype: object
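
If the timestamp layout is known, the conversion can be made stricter by passing an explicit `format` and turning unparseable values into `NaT` with `errors='coerce'`. The sketch below assumes the `YYYY-MM-DD HH:MM:SS` layout seen in the rows above; the string `'not a date'` is a made-up value added only to show the behaviour.

```python
import pandas as pd

# Two timestamps from the rows above plus one made-up invalid value
raw = pd.Series(['2014-02-12 18:22:00', '2013-10-13 12:44:00', 'not a date'])

# Parse with an explicit format; values that do not match become NaT
parsed = pd.to_datetime(raw, format='%Y-%m-%d %H:%M:%S', errors='coerce')

print(parsed)
print('Unparseable values:', parsed.isna().sum())
```

Counting the `NaT` values afterwards is a quick way to confirm that every row was parsed.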

Counting the number of outcomes the shelter recorded in each year:

In [32]:
df['year'] = pd.DatetimeIndex(df['datetime']).year
df['year'].value_counts().sort_index()

2013     2702
2014    11179
2015    11481
2016     1367
Name: year, dtype: int64

In [33]:
df['month'] = pd.DatetimeIndex(df['datetime']).month
for i in range(2013, 2017):
    print('Months in the year {}: '.format(i), end = '')
    print(df.loc[(df['year'] == i), 'month'].unique())

Months in the year 2013: [10 11 12]
Months in the year 2014: [ 2  7  4  5  6  1  8 11  9 12  3 10]
Months in the year 2015: [ 1  3  4  6 11  9  8 10  7  5 12  2]
Months in the year 2016: [2 1]
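
The same check can be reduced to one number per year by counting the distinct months with `groupby` and `nunique`. This is a sketch on the five timestamps shown earlier in `df.head()`, so the counts below describe only that sample; on the full data a year with fewer than 12 distinct months would be a partial year.

```python
import pandas as pd

# The five timestamps from the df.head() output
stamps = pd.to_datetime(pd.Series([
    '2014-02-12 18:22:00', '2013-10-13 12:44:00', '2015-01-31 12:28:00',
    '2014-07-11 19:09:00', '2013-11-15 12:52:00',
]))

sample = pd.DataFrame({'datetime': stamps})
sample['year'] = sample['datetime'].dt.year
sample['month'] = sample['datetime'].dt.month

# Number of distinct months that appear in each year
months_per_year = sample.groupby('year')['month'].nunique()
print(months_per_year)
```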


As mentioned [here](https://www.kaggle.com/c/shelter-animal-outcomes/data), the years 2013 and 2016 have values for only a few months (October to December 2013 and January to February 2016). This can hinder the analysis of yearly adoption/transfer trends, so the data from these two years is removed from further analysis.

In [34]:
df.drop(df[(df['year'] == 2013) | (df['year'] == 2016)].index, inplace = True)
years = df['year'].unique()
total = df.shape[0]

print('Now the dataset has values only for the years {0} and {1} with a total strength of {2} animals'.format(years[0], years[1], total))

Now the dataset has values only for the years 2014 and 2015 with a total strength of 22660 animals
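
An alternative to dropping rows in place is to select the years to keep with `isin`, which leaves the original frame untouched. The sketch below uses the IDs and years of the five rows from `df.head()`; `full_years` and `kept` are illustrative names, and resetting the index is a design choice that the notebook itself does not make.

```python
import pandas as pd

# IDs and years of the five rows from the df.head() output
sample = pd.DataFrame({
    'id': ['A671945', 'A656520', 'A686464', 'A683430', 'A667013'],
    'year': [2014, 2013, 2015, 2014, 2013],
})

# Keep only the years that have data for all twelve months
full_years = [2014, 2015]
kept = sample[sample['year'].isin(full_years)].reset_index(drop=True)

print(kept)
print('Rows kept:', len(kept))
```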


## General analysis of the outcome types:

### Outcome proportion analysis:

In [35]:
df['outcome_type'].value_counts().sort_index()/total

Adoption           0.396205
Died               0.007635
Euthanasia         0.058870
Return_to_owner    0.178597
Transfer           0.358694
Name: outcome_type, dtype: float64
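
Dividing the counts by `total` can also be written with `value_counts(normalize=True)`, which returns the proportions directly. Since `outcome_type` has no missing values, both approaches should give the same result on this dataset. The sketch below uses only the five outcome types from the `df.head()` rows, so its proportions are not those of the full data.

```python
import pandas as pd

# Outcome types of the five rows from the df.head() output
outcomes = pd.Series(
    ['Return_to_owner', 'Euthanasia', 'Adoption', 'Transfer', 'Transfer'],
    name='outcome_type',
)

# normalize=True returns shares of the non-missing values instead of counts
proportions = outcomes.value_counts(normalize=True).sort_index()
print(proportions)
```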

It can be seen that about 40% of the animals get adopted, about 36% are transferred elsewhere, about 18% are returned to their owners, about 6% are put to sleep, and under 1% die in the shelter. The `'outcome_subtype'` column gives further information about these outcomes. The subtypes are analysed as follows:

### Adoption subtype:

In [36]:
adoption = df[df['outcome_type'] == 'Adoption']
adoption['outcome_subtype'].value_counts(dropna=False)/adoption.shape[0]

NaN        0.812876
Foster     0.170305
Offsite    0.016708
Barn       0.000111
Name: outcome_subtype, dtype: float64

The subtype is missing for over 80% of the adoptions, which could mean that the shelter leaves the subtype column empty for an ordinary adoption. About 17% of the adopted animals go to foster care, possibly because animals that need special attention are difficult to care for in a common shelter.

### Transfer subtype:

In [37]:
transfer = df[df['outcome_type'] == 'Transfer']
transfer['outcome_subtype'].value_counts(dropna=False)/transfer.shape[0]

Partner    0.828863
SCRP       0.170768
NaN        0.000246
Barn       0.000123
Name: outcome_subtype, dtype: float64

Over 80% of the transferred animals go to a partner shelter, and about 17% have the subtype `SCRP`. The reasons for these transfers will be analysed along the way.

### Euthanasia subtype:

In [38]:
euthanasia = df[df['outcome_type'] == 'Euthanasia']
euthanasia['outcome_subtype'].value_counts()/euthanasia.shape[0]

Suffering              0.658171
Aggressive             0.197151
Behavior               0.048726
Rabies Risk            0.048726
Medical                0.042729
Court/Investigation    0.003748
Name: outcome_subtype, dtype: float64

About two thirds of the animals that are put to sleep were suffering. Most of the rest could not be put up for adoption or cared for because of aggression, behaviour problems, rabies risk, or medical conditions, which are reasonable grounds for euthanasia.

### Death subtype:

In [39]:
death = df[df['outcome_type'] == 'Died']
death['outcome_subtype'].value_counts()/death.shape[0]

In Kennel     0.572254
In Foster     0.254335
Enroute       0.046243
At Vet        0.023121
In Surgery    0.017341
Name: outcome_subtype, dtype: float64

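Most of the recorded deaths occurred in the kennel (about 57%) or in foster care (about 25%). The proportions above sum to about 91% because `value_counts()` drops missing values by default, so roughly 9% of the deaths have no subtype recorded.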
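
The four subtype breakdowns above repeat the same filter-and-count step, so they could also be produced in one table with `pd.crosstab` and `normalize='index'`, where each row sums to 1. This is a sketch on the five rows from `df.head()`, not the notebook's own method; the label `'Not recorded'` for missing subtypes is a hypothetical placeholder.

```python
import pandas as pd

# Outcome type and subtype of the five rows from the df.head() output
sample = pd.DataFrame({
    'outcome_type': ['Return_to_owner', 'Euthanasia', 'Adoption', 'Transfer', 'Transfer'],
    'outcome_subtype': [None, 'Suffering', 'Foster', 'Partner', 'Partner'],
})

# Give missing subtypes an explicit label so they are counted too
subtype = sample['outcome_subtype'].fillna('Not recorded')

# Share of each subtype within every outcome type (rows sum to 1)
shares = pd.crosstab(sample['outcome_type'], subtype, normalize='index')
print(shares)
```

Labelling the missing subtypes before counting keeps them visible in the table, which avoids proportions that silently sum to less than 1.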