## Austin Animal Center Adoption Analysis
Using data from the Austin Animal Center on the outcomes of animals that passed through its facility, we will train machine learning models to predict each animal's likelihood of adoption.

Data was gathered from [data.austintexas.gov](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238). The version of the data used in this notebook includes data from October 1st, 2013 to November 29th, 2022.

A SQL query was used to filter data for analysis:

```SQL
SELECT DISTINCT outcome_type,name,age_upon_outcome,sex_upon_outcome,animal_type FROM austin_animal_center_outcomes
WHERE outcome_type IS NOT NULL
    AND outcome_type NOT IN ('Return to Owner','Rto-Adopt')
    AND age_upon_outcome IS NOT NULL
    AND age_upon_outcome != 'NULL'
    AND age_upon_outcome NOT LIKE '-%'
    AND sex_upon_outcome != 'NULL'
```
The following exclusions were made:
- Animals that were returned to their owner (`Return to Owner` or `Rto-Adopt` outcomes)
- Animals with no listed outcome
- Animals with NULL or 'NULL' values in `age_upon_outcome` or `sex_upon_outcome`
- Animals with negative ages
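
For readers working from the raw CSV export rather than a SQL database, the same exclusions can be sketched in pandas. This is only an approximation of the query above: the toy rows are made up, and the names `raw`, `keep`, and `filtered` are illustrative rather than part of the original notebook.

```python
import pandas as pd

# Toy rows standing in for the raw outcomes table (values are made up).
raw = pd.DataFrame({
    "outcome_type": ["Adoption", "Return to Owner", None, "Transfer", "Adoption"],
    "name": ["Gizmo", "Rex", "Luna", None, "Bo"],
    "age_upon_outcome": ["1 year", "2 years", "3 years", "-1 years", "NULL"],
    "sex_upon_outcome": ["Neutered Male", "Intact Male", "Spayed Female",
                         "Unknown", "Intact Female"],
    "animal_type": ["Dog", "Dog", "Cat", "Cat", "Dog"],
})

age = raw["age_upon_outcome"]
sex = raw["sex_upon_outcome"]

# One boolean condition per WHERE clause in the SQL query.
keep = (
    raw["outcome_type"].notna()
    & ~raw["outcome_type"].isin(["Return to Owner", "Rto-Adopt"])
    & age.notna()
    & (age != "NULL")
    & ~age.str.startswith("-", na=False)
    & sex.notna()
    & (sex != "NULL")
)

columns = ["outcome_type", "name", "age_upon_outcome",
           "sex_upon_outcome", "animal_type"]

# drop_duplicates() plays the role of SELECT DISTINCT.
filtered = raw.loc[keep, columns].drop_duplicates().reset_index(drop=True)
print(filtered)
```

Only the first toy row survives here: the others are dropped for a returned-to-owner outcome, a missing outcome, a negative age, and a `'NULL'` age respectively.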

Guidance on preparing the data and on training and using the models in this project came from the [GWC x BAXA Scikit-Learn Workshop](https://colab.research.google.com/drive/11zlsao10uB9acaXTOQBwxea9kcMTW-qU?usp=sharing).

In [1]:
# Essentials 
import pandas as pd
import sklearn as sk
import numpy as np

# Feature Engineering
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

# Model Processing and Evaluating
import imblearn
import sklearn.pipeline
from imblearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import classification_report
from sklearn.neural_network import MLPClassifier

# Models
from sklearn import tree
from sklearn import neighbors
from sklearn import neural_network

data = pd.read_csv('aac_data_updated.csv')
data.head()

| | outcome_type | name | age_upon_outcome | sex_upon_outcome | animal_type |
|---|---|---|---|---|---|
| 0 | Adoption | Gizmo | 1 year | Neutered Male | Dog |
| 1 | Euthanasia | | 1 year | Unknown | Other |
| 2 | Adoption | Moose | 4 months | Neutered Male | Dog |
| 3 | Transfer | | 6 days | Intact Male | Cat |
| 4 | Adoption | Princess | 7 years | Spayed Female | Dog |


The remaining columns are cleaned using the following methods:

**Name:**
- Fill empty values with `False`
- Set previously non-empty values to `1` and now-`False` values to `0`
- Rename column appropriately

**Age:**
- Group any animals aged under 1 year (ages given in days, weeks, or months, or as `0 years`) into a single `'< 1 year'` category
- Group any animals aged 15 years or older into a single `'15+ years'` category

**Animal Type:**
- Change label of Livestock and Bird types to `'Other'`

**Outcome:**
- Change `'Adoption'` to `1` and anything else to `0` to represent `True` and `False`
- Rename column appropriately

In [2]:
# Simplify Name
data.fillna(False,inplace=True)
simplify_name = lambda name: 1 if name != False else 0

data['name'] = data['name'].apply(simplify_name)
data.rename(columns={"name": "is_named"},inplace=True)

# Simplify Age
simplify_age_young = lambda age: '< 1 year' if (age == '0 years' or 'week' in str(age) or 'month' in str(age) or 'day' in str(age)) else age
simplify_age_old = lambda age: '15+ years' if age in ('30 years','25 years','23 years', '22 years', '21 years', '20 years','19 years','18 years','17 years','16 years','15 years') else age

data['age_upon_outcome'] = data['age_upon_outcome'].apply(simplify_age_young)
data['age_upon_outcome'] = data['age_upon_outcome'].apply(simplify_age_old)

# Simplify Type
# The parameter is named animal_type to avoid shadowing the built-in type()
simplify_type = lambda animal_type: 'Other' if animal_type in ('Livestock','Bird','Other') else animal_type
data['animal_type'] = data['animal_type'].apply(simplify_type)

# Simplify Outcome
simplify_outcome = lambda outcome: 1 if outcome == 'Adoption' else 0
data['outcome_type'] = data['outcome_type'].apply(simplify_outcome)
data.rename(columns={"outcome_type": "was_adopted"},inplace=True)

data

| | was_adopted | is_named | age_upon_outcome | sex_upon_outcome | animal_type |
|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 year | Neutered Male | Dog |
| 1 | 0 | 0 | 1 year | Unknown | Other |
| 2 | 1 | 1 | < 1 year | Neutered Male | Dog |
| 3 | 0 | 0 | < 1 year | Intact Male | Cat |
| 4 | 1 | 1 | 7 years | Spayed Female | Dog |
| ... | ... | ... | ... | ... | ... |
| 121564 | 1 | 1 | < 1 year | Spayed Female | Dog |
| 121565 | 1 | 1 | < 1 year | Neutered Male | Dog |
| 121566 | 0 | 0 | 10 years | Intact Male | Cat |
| 121567 | 1 | 0 | 3 years | Intact Male | Dog |
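
The age grouping can also be expressed by parsing the number out of the age string instead of listing every value by hand. The helper below is a minimal sketch of that idea. `bucket_age` is a hypothetical name that does not appear in the notebook, and it assumes every age has the form `'<number> <unit>'`.

```python
def bucket_age(age: str) -> str:
    """Map a raw age string such as '7 weeks' or '16 years' to a bucket label."""
    count_text, unit = age.split()
    count = int(count_text)

    # Days, weeks, months, and '0 years' all fall in the youngest bucket.
    if not unit.startswith("year") or count < 1:
        return "< 1 year"
    # Everything from 15 years upward shares one bucket.
    if count >= 15:
        return "15+ years"
    # Keep the singular/plural spelling used in the source data.
    return f"{count} year" if count == 1 else f"{count} years"


for example in ["6 days", "0 years", "1 year", "7 years", "25 years"]:
    print(example, "->", bucket_age(example))
```

Applied with `data['age_upon_outcome'].apply(bucket_age)`, this would place any age of 15 years or more in the `'15+ years'` bucket without depending on an explicit list of values.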


Now, to prepare for model training, categorical variables must be separated into dummy variables. For example, the `animal_type` column will become 3 separate columns (`animal_type_Dog`, `animal_type_Cat`, and `animal_type_Other`) with `1`s or `0`s as values to indicate the presence or absence of that quality.
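
As a small illustration of what dummy encoding produces, here is `pd.get_dummies` applied to a four-row toy column. The frame `toy` is made up for this example, and `dtype=int` is passed only so the output prints as `0`/`1`.

```python
import pandas as pd

# Toy column with the three animal types used in this notebook.
toy = pd.DataFrame({"animal_type": ["Dog", "Cat", "Other", "Dog"]})

# One column per category; each row has exactly one 1.
dummies = pd.get_dummies(toy["animal_type"], prefix="animal_type", dtype=int)
print(dummies)
```

Each row carries a single `1`, in the column matching its original category.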

In [3]:
cat_vars = ['age_upon_outcome','sex_upon_outcome','animal_type']
for var in cat_vars:
    # One-hot encode the column and append the dummy columns to the frame
    dummies = pd.get_dummies(data[var],prefix=var)
    data = data.join(dummies)

data_vars = data.columns.values.tolist()
to_keep = [i for i in data_vars if i not in cat_vars]

data_final = data[to_keep]
data_final.columns.values

array(['was_adopted', 'is_named', 'age_upon_outcome_1 year',
       'age_upon_outcome_10 years', 'age_upon_outcome_11 years',
       'age_upon_outcome_12 years', 'age_upon_outcome_13 years',
       'age_upon_outcome_14 years', 'age_upon_outcome_15+ years',
       'age_upon_outcome_2 years', 'age_upon_outcome_25 years',
       'age_upon_outcome_3 years', 'age_upon_outcome_30 years',
       'age_upon_outcome_4 years', 'age_upon_outcome_5 years',
       'age_upon_outcome_6 years', 'age_upon_outcome_7 years',
       'age_upon_outcome_8 years', 'age_upon_outcome_9 years',
       'age_upon_outcome_< 1 year', 'sex_upon_outcome_Intact Female',
       'sex_upon_outcome_Intact Male', 'sex_upon_outcome_Neutered Male',
       'sex_upon_outcome_Spayed Female', 'sex_upon_outcome_Unknown',
       'animal_type_Cat', 'animal_type_Dog', 'animal_type_Other'],
      dtype=object)

The data must now be split into training and test sets. For our models, 30% of the rows (`test_size=0.3`) are reserved for testing, and the split is stratified on the label so that both sets keep the same proportion of adoptions.

In [4]:
label = pd.get_dummies(data['was_adopted'])
label = label.drop(0,axis=1)

features = data_final.drop(columns='was_adopted')

feat_train, feat_test, label_train, label_test = train_test_split(features,label,test_size=0.3, random_state=1, stratify=label)
label_train=label_train.squeeze()
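
The effect of the `stratify` argument used in the split can be checked on a toy label vector. This is a sketch on made-up data (100 rows, 60% positive), not on the shelter data itself.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 rows, 60 positive and 40 negative labels.
X = np.arange(100).reshape(-1, 1)
y = np.array([1] * 60 + [0] * 40)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# With stratification, both splits keep the 60/40 class balance.
print(len(y_test), y_train.mean(), y_test.mean())
```

Without `stratify`, the class balance of the test set would vary with the random seed.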

Now, the training data is used to train 3 different models: a neural network, a k-nearest neighbors classifier, and a decision tree.

Each model is tested and outputs its classification report.
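
To make the report columns easier to read, the sketch below derives precision, recall, and F1 for the positive class from confusion-matrix counts. The labels are made up, and the results are compared with scikit-learn's own metric functions.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Made-up true labels and predictions (1 = adopted, 0 = not adopted).
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1, 0, 1]

# For binary labels, ravel() yields the counts in this order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)   # of the predicted adoptions, how many were real
recall = tp / (tp + fn)      # of the real adoptions, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, f1)
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred), f1_score(y_true, y_pred))
```

The `support` column in the report is the number of true instances of each class in the test set.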

In [5]:
#create model
mlp_model = sk.neural_network.MLPClassifier()

#fit and get predictions
mlp_model = mlp_model.fit(feat_train, label_train)
predictions = mlp_model.predict(feat_test)

#print results
print(pd.Series(predictions).value_counts())
print(classification_report(label_test, predictions))

1    23530
0    12941
dtype: int64
              precision    recall  f1-score   support

           0       0.92      0.74      0.82     16143
           1       0.82      0.95      0.88     20328

    accuracy                           0.86     36471
   macro avg       0.87      0.84      0.85     36471
weighted avg       0.87      0.86      0.85     36471
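
As a point of reference for the accuracy figure, the `support` column shows what a classifier would score by always predicting the majority class. The short calculation below uses only the two support counts from the report above. The variable names are illustrative.

```python
# Support counts copied from the classification report above.
support = {0: 16143, 1: 20328}

total = sum(support.values())
baseline_accuracy = max(support.values()) / total

print(total, round(baseline_accuracy, 3))
```

Always predicting "adopted" would therefore be right about 56% of the time, which is the floor against which the reported accuracies can be compared.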



In [6]:
#create model
knn_model = sk.neighbors.KNeighborsClassifier(n_neighbors=2)

#fit and get predictions
knn_model = knn_model.fit(feat_train, label_train)
predictions = knn_model.predict(feat_test)

#print results
print(pd.Series(predictions).value_counts())
print(classification_report(label_test, predictions))

0    20364
1    16107
dtype: int64
              precision    recall  f1-score   support

           0       0.66      0.83      0.73     16143
           1       0.83      0.66      0.73     20328

    accuracy                           0.73     36471
   macro avg       0.74      0.74      0.73     36471
weighted avg       0.75      0.73      0.73     36471



In [7]:
#Create Decision Tree
decision_tree = tree.DecisionTreeClassifier(max_depth=5)

#fit and get the predictions
decision_tree = decision_tree.fit(feat_train, label_train)
predictions = decision_tree.predict(feat_test)

#print results
print(pd.Series(predictions).value_counts())
print(classification_report(label_test, predictions))

1    23630
0    12841
dtype: int64
              precision    recall  f1-score   support

           0       0.92      0.73      0.82     16143
           1       0.82      0.95      0.88     20328

    accuracy                           0.86     36471
   macro avg       0.87      0.84      0.85     36471
weighted avg       0.86      0.86      0.85     36471
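
The notebook imports `cross_val_predict` but evaluates each model on a single train/test split. As a hedged sketch of how a cross-validated report could be produced instead, the example below runs 5-fold cross-validation on synthetic data from `make_classification`. The data, the `random_state` values, and `cv=5` are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the shelter features.
X, y = make_classification(n_samples=300, n_features=8, random_state=1)

model = DecisionTreeClassifier(max_depth=5, random_state=1)

# Each row is predicted by a model trained on the folds that exclude it.
cv_predictions = cross_val_predict(model, X, y, cv=5)

print(classification_report(y, cv_predictions))
```

In the notebook itself, `X` and `y` would correspond to `features` and the squeezed `label`.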



In [49]:
print('Test an animal.\n')

test_data = {'is_named':[0],
             'age_upon_outcome_1 year':[0],'age_upon_outcome_10 years':[0],
             'age_upon_outcome_11 years':[0],'age_upon_outcome_12 years':[0],
             'age_upon_outcome_13 years':[0],'age_upon_outcome_14 years':[0],
             'age_upon_outcome_15+ years':[0],'age_upon_outcome_2 years':[0],
             'age_upon_outcome_25 years':[0],'age_upon_outcome_3 years':[0],
             'age_upon_outcome_30 years':[0],'age_upon_outcome_4 years':[0],
             'age_upon_outcome_5 years':[0],'age_upon_outcome_6 years':[0],
             'age_upon_outcome_7 years':[0],'age_upon_outcome_8 years':[0],
             'age_upon_outcome_9 years':[0],'age_upon_outcome_< 1 year':[0],
             'sex_upon_outcome_Intact Female':[0],
             'sex_upon_outcome_Intact Male':[0],
             'sex_upon_outcome_Neutered Male':[0],
             'sex_upon_outcome_Spayed Female':[0],
             'sex_upon_outcome_Unknown':[0],
             'animal_type_Cat':[0],'animal_type_Dog':[0],'animal_type_Other':[0]}

is_named = input('Is the animal named? (Y/N)  ').lower()
if is_named in ('y','yes'):
    test_data['is_named'] = [1]

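age = input('How old is the animal? (Enter as X month(s)/year(s)/etc.)  ').lower()
age = age.split()
age[0] = int(age[0])
print(age)
# Anything not measured in years, or entered as 0 years, is in the youngest group
if 'year' not in age[1] or age[0] < 1:
    test_data['age_upon_outcome_< 1 year'] = [1]
elif age[0] >= 15:
    test_data['age_upon_outcome_15+ years'] = [1]
else:
    # Build the column name with the correct singular/plural unit so that
    # inputs such as '2 year' or '1 years' still match a training column
    unit = 'year' if age[0] == 1 else 'years'
    test_data[f'age_upon_outcome_{age[0]} {unit}'] = [1]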


gender = input('Is the animal male or female? (Enter \'Unknown\' if Unknown)  ').lower()
if gender == 'unknown':
    test_data['sex_upon_outcome_Unknown'] = [1]
else:
    fixed = input('Is the animal fixed? (Y/N)  ').lower()
    if fixed in ('y','yes') and gender in ('m','male'):
        test_data['sex_upon_outcome_Neutered Male'] = [1]
    elif fixed in ('y','yes') and gender in ('f','female'):
        test_data['sex_upon_outcome_Spayed Female'] = [1]
    elif gender in ('m','male'):
        test_data['sex_upon_outcome_Intact Male'] = [1]
    else:
        test_data['sex_upon_outcome_Intact Female'] = [1]
    
# Named animal_type to avoid shadowing the built-in type()
animal_type = input('Animal Type? (Dog, Cat, Other)  ').lower()
if animal_type == 'dog':
    test_data['animal_type_Dog'] = [1]
elif animal_type == 'cat':
    test_data['animal_type_Cat'] = [1]
else:
    test_data['animal_type_Other'] = [1]
    
test_df = pd.DataFrame(test_data)
test_df

Test an animal.

Is the animal named? (Y/N)  y
How old is the animal? (Enter as X month(s)/year(s)/etc.)  7 weeks
[7, 'weeks']
Is the animal male or female? (Enter 'Unknown' if Unknown)  unknown
Animal Type? (Dog, Cat, Other)  cat


Unnamed: 0,is_named,age_upon_outcome_1 year,age_upon_outcome_10 years,age_upon_outcome_11 years,age_upon_outcome_12 years,age_upon_outcome_13 years,age_upon_outcome_14 years,age_upon_outcome_15+ years,age_upon_outcome_2 years,age_upon_outcome_25 years,...,age_upon_outcome_9 years,age_upon_outcome_< 1 year,sex_upon_outcome_Intact Female,sex_upon_outcome_Intact Male,sex_upon_outcome_Neutered Male,sex_upon_outcome_Spayed Female,sex_upon_outcome_Unknown,animal_type_Cat,animal_type_Dog,animal_type_Other
0,1,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,1,1,0,0
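
A single-row frame like the one above can also be built without typing out every column, by listing only the columns that apply and letting `reindex` fill in the rest with zeros. In this sketch `feature_columns` is a hypothetical, shortened stand-in for the training columns. In the notebook it would be `feat_train.columns`.

```python
import pandas as pd

# Hypothetical subset of the training columns, for illustration only.
feature_columns = [
    "is_named",
    "age_upon_outcome_1 year",
    "age_upon_outcome_< 1 year",
    "sex_upon_outcome_Unknown",
    "animal_type_Cat",
    "animal_type_Dog",
]

# Only the columns that apply to this animal are listed.
animal = {
    "is_named": 1,
    "age_upon_outcome_< 1 year": 1,
    "sex_upon_outcome_Unknown": 1,
    "animal_type_Cat": 1,
}

# reindex puts the columns in training order and fills missing ones with 0.
row = pd.DataFrame([animal]).reindex(columns=feature_columns, fill_value=0)
print(row)
```

Because `reindex` also drops any column that is not in `feature_columns`, a misspelled key is silently ignored, so the keys need to match the training column names exactly.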


In [50]:
decision_tree.predict(test_df)

array([0], dtype=uint8)
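
`predict` returns only the predicted class. Since the stated goal is a likelihood of adoption, `predict_proba` may be the more useful call, because it returns an estimated probability for each class. The sketch below shows the call on a small tree fitted to synthetic data. The data and the name `toy_tree` are assumptions for illustration and do not reproduce the model trained above.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for the shelter features.
X, y = make_classification(n_samples=200, n_features=6, random_state=1)
toy_tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X, y)

# One row in, one probability per class out (columns follow toy_tree.classes_).
proba = toy_tree.predict_proba(X[:1])

# Look up the column for class 1 rather than assuming its position.
positive_index = list(toy_tree.classes_).index(1)
probability_of_class_1 = proba[0, positive_index]

print(toy_tree.classes_, proba, probability_of_class_1)
```

With the notebook's model, the equivalent call would be `decision_tree.predict_proba(test_df)`. For a decision tree, the probability is the fraction of training samples of that class in the leaf the row falls into.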