### Wrangle Data

The following data was acquired from the Austin Animal Center and was last updated on October 11th, 2021. There are two files: one for animal center intakes and one for animal center outcomes. I plan to merge them so that I have information on which animals were adopted and which weren't. I can't use the outcomes file alone because it only includes animals that have a recorded outcome.
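
As a rough sketch of the planned merge, the example below uses two tiny stand-in tables rather than the real files. The `Outcome Type` column name, the toy IDs, and the `has_outcome` flag are assumptions for illustration only. A left merge keeps every intake row, and `indicator=True` records whether a matching outcome was found.

```python
import pandas as pd

# Toy stand-ins for the two files (invented rows, not real shelter data).
intakes_demo = pd.DataFrame({
    'Animal ID': ['A1', 'A2', 'A3'],
    'Intake Type': ['Stray', 'Stray', 'Owner Surrender'],
})
# 'Outcome Type' is an assumed column name for the outcomes file.
outcomes_demo = pd.DataFrame({
    'Animal ID': ['A1', 'A3'],
    'Outcome Type': ['Adoption', 'Transfer'],
})

# how='left' keeps every intake; indicator=True adds a '_merge' column
# whose value is 'both' when an outcome row matched, else 'left_only'.
merged = intakes_demo.merge(
    outcomes_demo, on='Animal ID', how='left', indicator=True
)
merged['has_outcome'] = merged['_merge'] == 'both'
print(merged)
```

If an animal ID appears more than once in either file, merging on `Animal ID` alone pairs every intake with every outcome for that ID, so the real merge may need an additional key or a de-duplication step first.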

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Get the Intake Data

In [2]:
intakes = pd.read_csv('Austin_Animal_Center_Intakes.csv')

In [3]:
intakes.head()

| | Animal ID | Name | DateTime | MonthYear | Found Location | Intake Type | Intake Condition | Animal Type | Sex upon Intake | Age upon Intake | Breed | Color |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A786884 | *Brock | 01/03/2019 04:19:00 PM | 01/03/2019 04:19:00 PM | 2501 Magin Meadow Dr in Austin (TX) | Stray | Normal | Dog | Neutered Male | 2 years | Beagle Mix | Tricolor |
| 1 | A706918 | Belle | 07/05/2015 12:59:00 PM | 07/05/2015 12:59:00 PM | 9409 Bluegrass Dr in Austin (TX) | Stray | Normal | Dog | Spayed Female | 8 years | English Springer Spaniel | White/Liver |
| 2 | A724273 | Runster | 04/14/2016 06:43:00 PM | 04/14/2016 06:43:00 PM | 2818 Palomino Trail in Austin (TX) | Stray | Normal | Dog | Intact Male | 11 months | Basenji Mix | Sable/White |
| 3 | A665644 | | 10/21/2013 07:59:00 AM | 10/21/2013 07:59:00 AM | Austin (TX) | Stray | Sick | Cat | Intact Female | 4 weeks | Domestic Shorthair Mix | Calico |
| 4 | A682524 | Rio | 06/29/2014 10:38:00 AM | 06/29/2014 10:38:00 AM | 800 Grove Blvd in Austin (TX) | Stray | Normal | Dog | Neutered Male | 4 years | Doberman Pinsch/Australian Cattle Dog | Tan/Gray |


In [4]:
intakes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 132220 entries, 0 to 132219
Data columns (total 12 columns):
 #   Column            Non-Null Count   Dtype 
---  ------            --------------   ----- 
 0   Animal ID         132220 non-null  object
 1   Name              91800 non-null   object
 2   DateTime          132220 non-null  object
 3   MonthYear         132220 non-null  object
 4   Found Location    132220 non-null  object
 5   Intake Type       132220 non-null  object
 6   Intake Condition  132220 non-null  object
 7   Animal Type       132220 non-null  object
 8   Sex upon Intake   132219 non-null  object
 9   Age upon Intake   132220 non-null  object
 10  Breed             132220 non-null  object
 11  Color             132220 non-null  object
dtypes: object(12)
memory usage: 12.1+ MB


Key Takeaways:
* There are 132,220 entries.
* 'MonthYear' appears to hold the same information as 'DateTime', so I can drop it. I think I will create a new column called 'time_at_shelter' and check whether it is related to adoptability.
* Only two columns contain nulls: 'Name' (91,800 non-null out of 132,220) and 'Sex upon Intake' (one missing value). Dropping the 'Name' column removes nearly all of them.
* I will drop 'Name' because it won't have any bearing on adoptability.
* I will also drop 'Found Location' because it shouldn't matter.
* I will eventually drop 'DateTime', but I need it to create the 'time_at_shelter' column.
* I will rename the 'Intake Type', 'Intake Condition', 'Animal Type', 'Sex upon Intake', and 'Age upon Intake' columns.
* There is a single null for 'Sex upon Intake'. I will probably fill it with an assumed value, but may end up dropping the row.
* I would like to convert age to an integer number of years.
* I may create a new column called 'spayed/neutered' and change the sex to only male or female. However, I believe all animals are spayed or neutered before being put up for adoption, so it may not matter at all for this project.
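
Two of the takeaways above (age as an integer number of years, and separating spay/neuter status from sex) could be sketched roughly as follows. The helper names `age_to_years` and `split_sex`, the conversion factors, and the handling of single-word values such as 'Unknown' are assumptions for illustration, not steps this notebook has run.

```python
import pandas as pd

# Approximate number of each unit in one year (assumed conversion factors).
UNITS_PER_YEAR = {'year': 1, 'month': 12, 'week': 52, 'day': 365}

def age_to_years(age):
    """Convert a string like '11 months' to whole years, rounding down."""
    count, unit = age.split()
    unit = unit.rstrip('s')  # 'years' -> 'year', 'weeks' -> 'week'
    return int(count) // UNITS_PER_YEAR[unit]

def split_sex(value):
    """Split 'Neutered Male' into (sex, is_fixed)."""
    parts = value.split()
    if len(parts) != 2:
        # Single-word values (for example 'Unknown') are passed through;
        # whether such values occur in the data is an assumption here.
        return value, None
    status, sex = parts
    return sex, status in ('Neutered', 'Spayed')

# Sample values taken from the first five rows shown above.
ages = pd.Series(['2 years', '8 years', '11 months', '4 weeks', '4 years'])
print(ages.map(age_to_years).tolist())

sexes = pd.Series(['Neutered Male', 'Spayed Female', 'Intact Male'])
print(sexes.map(split_sex).tolist())
```

Rounding down to whole years maps every animal under one year old to 0, so a float age (or an age in months) may be worth considering if young animals need to be distinguished.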

In [8]:
intakes = intakes.drop(columns = ['Name', 'MonthYear', 'Found Location'])
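
Dropping 'MonthYear' rests on the observation that it duplicates 'DateTime'. A check along these lines, run before the drop, would confirm that for every row. The sketch uses a two-row toy frame (`demo` is a hypothetical name) built from values in the preview above.

```python
import pandas as pd

# Toy frame using two rows from the intake preview.
demo = pd.DataFrame({
    'DateTime': ['01/03/2019 04:19:00 PM', '07/05/2015 12:59:00 PM'],
    'MonthYear': ['01/03/2019 04:19:00 PM', '07/05/2015 12:59:00 PM'],
})

# Element-wise comparison; .all() is True only if every row matches.
columns_match = bool((demo['DateTime'] == demo['MonthYear']).all())
print(columns_match)
```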

In [9]:
intakes.rename(columns = {
    'Intake Type': 'intake_type',
    'Intake Condition': 'intake_condition',
    'Animal Type': 'animal_type',
    'Sex upon Intake': 'intake_sex',
    'Age upon Intake': 'intake_age'
}, inplace = True)

In [10]:
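# Assume the one animal with a missing sex value is a neutered male.
# Fill only 'intake_sex' so nulls in any other column are never overwritten.
intakes['intake_sex'] = intakes['intake_sex'].fillna('Neutered Male')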

In [11]:
intakes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 132220 entries, 0 to 132219
Data columns (total 9 columns):
 #   Column            Non-Null Count   Dtype 
---  ------            --------------   ----- 
 0   Animal ID         132220 non-null  object
 1   DateTime          132220 non-null  object
 2   intake_type       132220 non-null  object
 3   intake_condition  132220 non-null  object
 4   animal_type       132220 non-null  object
 5   intake_sex        132220 non-null  object
 6   intake_age        132220 non-null  object
 7   Breed             132220 non-null  object
 8   Color             132220 non-null  object
dtypes: object(9)
memory usage: 9.1+ MB


In [12]:
intakes.isnull().sum()

Animal ID           0
DateTime            0
intake_type         0
intake_condition    0
animal_type         0
intake_sex          0
intake_age          0
Breed               0
Color               0
dtype: int64
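
With no nulls left, the 'DateTime' strings can be parsed for the planned 'time_at_shelter' column. The sketch below is only an illustration: the `intake_datetime` and `outcome_datetime` column names are hypothetical, and the outcome timestamp is invented, since the outcomes file has not been loaded yet. The format string matches the `MM/DD/YYYY HH:MM:SS AM/PM` pattern seen in the intake preview.

```python
import pandas as pd

# One intake timestamp from the preview plus an invented outcome timestamp.
demo = pd.DataFrame({
    'intake_datetime': ['01/03/2019 04:19:00 PM'],
    'outcome_datetime': ['01/08/2019 03:11:00 PM'],  # invented value
})

# 12-hour clock with AM/PM, as in the intake file's 'DateTime' column.
fmt = '%m/%d/%Y %I:%M:%S %p'
intake = pd.to_datetime(demo['intake_datetime'], format=fmt)
outcome = pd.to_datetime(demo['outcome_datetime'], format=fmt)

# Whole days between intake and outcome (partial days are truncated).
demo['time_at_shelter'] = (outcome - intake).dt.days
print(demo)
```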