## Data Cleanup

The Austin Animal Center publishes daily data on the animals processed through the shelter, including various characteristics of each animal and its outcome. This notebook cleans the data into a format suitable for analysis. Note that feature engineering is also applied to the data. <br><br>

Data source: https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238 <br><br>

A Kaggle competition was based on a similar dataset. The dataset available from the website above contains more information, including each animal's date of birth, as well as more observations. <br><br>

Kaggle competition: https://www.kaggle.com/c/shelter-animal-outcomes/data <br><br>

Note that much of the data provided couldn't possibly be known when an animal is processed into the shelter, e.g. the day and time of the outcome. Although these can be useful pieces of information (e.g. during what month are adoptions most likely to happen?), we will focus our analysis on characteristics of the dog that would be known near the time of intake. We do wish that the day and time the animal was processed into the shelter were provided.

In [None]:
#Import relevant packages.
import pandas as pd

In [None]:
#Read the data.
dataset = pd.read_csv(r"C:\Users\Patrick\Desktop\ShelterOutcomes2\Austin_Animal_Center_Outcomes.csv")

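#Only dogs are considered in this study.
#Take an explicit copy so that the column assignments below act on an independent DataFrame, not a slice.
dataset = dataset[dataset['Animal Type'] == 'Dog'].copy()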

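#Keep only the year of birth of the dog (the last four characters of the date string).
#The .str accessor leaves missing values as NaN instead of raising an error.
dataset['Year of Birth'] = dataset['Date of Birth'].str[-4:]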

#Separate the spay/neuter status and the sex of the dog.
dataset['Sex upon Outcome'] = dataset['Sex upon Outcome'].map(lambda x: 'Unknown Unknown' if x == 'Unknown' else x)
split = dataset['Sex upon Outcome'].str.split(pat = " ", n = 1, expand = True)
dataset['Altered'] = split[0].map(lambda x: 'Altered' if x in ('Neutered', 'Spayed') else x)
dataset['Sex'] = split[1]

#Drop data that is missing outcome types; there are only three of them.
dataset = dataset[pd.notna(dataset['Outcome Type'])]

#Drop columns that won't be used in our analysis.
to_drop = ['DateTime', 'MonthYear', 'Age upon Outcome', 'Animal Type', 'Outcome Subtype', 'Date of Birth', 'Sex upon Outcome']
dataset = dataset.drop(columns = to_drop)
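
To make the spay/neuter split above easier to follow, here is a minimal sketch that applies the same steps to a small hand-made DataFrame. The four sample values are assumptions chosen to resemble the `Sex upon Outcome` column; they are not taken from the real file, and `toy` is a hypothetical name used only for this illustration.

```python
import pandas as pd

# Hypothetical sample values resembling the 'Sex upon Outcome' column.
toy = pd.DataFrame({'Sex upon Outcome': ['Neutered Male', 'Spayed Female', 'Intact Male', 'Unknown']})

# Pad the lone 'Unknown' so that the split always yields two parts.
padded = toy['Sex upon Outcome'].map(lambda x: 'Unknown Unknown' if x == 'Unknown' else x)

# Split on the first space only: the left part is the status, the right part is the sex.
parts = padded.str.split(pat = " ", n = 1, expand = True)

# Collapse 'Neutered' and 'Spayed' into a single 'Altered' label.
toy['Altered'] = parts[0].map(lambda x: 'Altered' if x in ('Neutered', 'Spayed') else x)
toy['Sex'] = parts[1]

print(toy)
```

Padding `'Unknown'` to `'Unknown Unknown'` before splitting is what keeps the `Sex` column from coming back empty for those rows.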

In [None]:
#Export the data.
dataset.to_csv(r"C:\Users\Patrick\Desktop\ShelterOutcomes2\Austin_Animal_Center_Outcomes_clean.csv", index = False)
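
One thing worth checking after an export like this is how the file reads back in. The sketch below round-trips a small hand-made DataFrame through an in-memory buffer rather than a file on disk; the rows and the names `toy`, `buffer`, and `reloaded` are hypothetical and only stand in for the cleaned dataset.

```python
import io
import pandas as pd

# Hypothetical rows standing in for the cleaned dataset.
toy = pd.DataFrame({'Year of Birth': ['2013', '2015'], 'Sex': ['Male', 'Female']})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
toy.to_csv(buffer, index = False)
buffer.seek(0)

reloaded = pd.read_csv(buffer)

# index = False keeps the row index out of the file, so the columns match.
print(list(reloaded.columns))

# The year was stored as text, but it is parsed as an integer when read back.
print(reloaded['Year of Birth'].dtype)
```

If the year should stay as text in a later notebook, passing `dtype = {'Year of Birth': str}` to `pd.read_csv` is one way to preserve it.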