**Overview**

For this project, I am interested in whether dog color, size, or behavior attributes result in more adoptions. I am also interested in whether more animals of a specific demographic enter the system and therefore account for more adoptions.

**Data Profile**

I will be using datasets from Petfinder, which is a website that connects potential adopters to available animals, and the Austin Animal Shelter, which has information relating to what happens to the animals that are dropped off there. 

Some of the interesting attributes from the Austin Animal Shelter are:

- Animal_Type
- Breed
- Color
- Outcome_subtype

From Petfinder:

- Type
- Color
- Breed
- Size

I will use the Austin Animal Shelter dataset to analyze what happens to the dogs that are left there and whether there are any significant attribute groupings. I will then use the Petfinder dataset to see the overall distribution of dogs that are available and that have been adopted through the site.

**Analysis**

Importing both datasets

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
import datetime as dt 
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt


In [None]:
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("PetFinder")
secret_value_1 = user_secrets.get_secret("PetFinder_Secret")


In [None]:
%pip install petpy

In [None]:
from petpy import Petfinder
pf = Petfinder(key=secret_value_0, secret=secret_value_1)

**Austin Animal Shelter Dataset**

To start, I will narrow down the animal_type to only dogs. I will also look at the variety of possible outcomes for those dogs and remove the ones that don't apply to the research question.

In [None]:
df = pd.read_csv('/kaggle/input/austin-animal-center-shelter-outcomes-and/aac_shelter_outcomes.csv')
print(df.shape)
df.head(50)

#This is to see an overview of how the data is presented

In [None]:
sns.countplot(x='animal_type', data=df, palette='pastel')

In [None]:
df = df[~df['animal_type'].isin(['Cat','Other','Bird','Livestock',])]

df.head(50)

#I have decided to focus solely on dogs for this assignment.
#So I redefined the dataset to only include those animals whose animal_type is listed as Dog.
#I printed out a graph to check.

In [None]:
sns.countplot(x='animal_type', data=df, palette='pastel')


In [None]:
plt.figure(figsize=(20, 8))
sns.countplot(x='sex_upon_outcome', data=df, palette='pastel')

#This graph represents the sex of the animal at the end of its journey at the shelter
#I thought it would be interesting to know more about the dog demographics
#I am using a bar graph to show the count for each sex
#I made the graph bigger so it was easier to read

In [None]:
plt.figure(figsize=(30, 8))
sns.countplot(x='outcome_type', data=df, palette='pastel')

#Here are all of the different outcomes for the dogs. I am removing the last four since their numbers are small and they don't apply to my research question.

In [None]:
df = df[~df['outcome_type'].isin(['Died','Missing','Disposal','Rto-Adopt',])]
plt.figure(figsize=(30, 8))
sns.countplot(x='outcome_type', data=df, palette='pastel')

#Several of the outcomes have very few results and aren't in line with our research question, so I removed them.

In [None]:
df = df[~df['outcome_type'].isin(['Return to Owner',])]
plt.figure(figsize=(30, 8))
sns.countplot(x='outcome_type', data=df, palette='pastel')

#I am also removing Return to Owner since it also doesn't really apply.

I am also interested in looking at the outcome_subtypes. This might give us more insight into WHY the dog had the outcome it did.
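Before plotting, a cross-tabulation is a quick way to see which subtypes belong to which outcome. This is a sketch added for illustration, not part of the original analysis: the rows below are hypothetical and only reuse the shelter file's column names, so on the real data you would pass `df` instead of `sample`.

```python
import pandas as pd

# Hypothetical sample rows using the shelter dataset's column names
sample = pd.DataFrame({
    'outcome_type': ['Transfer', 'Transfer', 'Adoption', 'Adoption', 'Euthanasia'],
    'outcome_subtype': ['Partner', 'Partner', 'Foster', None, 'Suffering'],
})

# Count how often each subtype occurs within each outcome type.
# Missing subtypes are relabelled '(none)' so they show up as their own column.
subtype_table = pd.crosstab(sample['outcome_type'],
                            sample['outcome_subtype'].fillna('(none)'))
print(subtype_table)
```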

In [None]:
plt.figure(figsize=(30, 8))
sns.countplot(x='outcome_subtype', data=df, palette='pastel')

#These are all of the subtypes associated with these outcomes. 

In [None]:
df = df[~df['outcome_subtype'].isin(['In Surgery','Medical','Possible Theft','Court/Investigation','At Vet','Enroute','Barn','Snr','In Kennel'])]
plt.figure(figsize=(30, 8))
sns.countplot(x='outcome_subtype', data=df, palette='pastel')

#I am also removing the subtypes with low numbers and that don't apply to the research question.

In [None]:
#In order to find out more about how age impacts the outcome of the animal, I need to write a function to transform the age_upon_outcome column (e.g. '2 years', '3 months') into a number of years.


def years_old(x):                #function years_old to hold the transformation
    x = str(x)                   #converts the value to a string, so a missing value becomes 'nan'
    if x == 'nan':               #if the value is missing, return NaN so it is left out of the plots
        return np.nan
    how_old = int(x.split()[0])  #the first word of the string is the number, converted to an integer
    if 'year' in x:              #if 'year' is in the string, return that number
        return how_old
    if 'month' in x:             #if 'month' is in the string, return that number divided by 12
        return how_old / 12
    if 'week' in x:              #if 'week' is in the string, return that number divided by 52
        return how_old / 52
    if 'day' in x:               #if 'day' is in the string, return that number divided by 365
        return how_old / 365
    return np.nan                #if no recognized unit is in the string, treat it as missing
df['AnimalAge'] = df.age_upon_outcome.apply(years_old)    #apply the function to every row and store the result as AnimalAge
print(df['AnimalAge'].head(5))   #print the head/tail to see if it worked
df['AnimalAge'].tail(5)
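
The same conversion can also be done without a Python-level loop by extracting the number and the unit with a regular expression. This is a sketch rather than the notebook's method: `ages_in_years` and `UNITS_PER_YEAR` are hypothetical names, and it assumes every non-missing value looks like `<number> <unit>` with an optional plural `s`. Missing or unrecognized values become NaN so they drop out of the box plots.

```python
import pandas as pd

# Approximate number of units per year for each age unit in age_upon_outcome
UNITS_PER_YEAR = {'year': 1, 'month': 12, 'week': 52, 'day': 365}

def ages_in_years(ages: pd.Series) -> pd.Series:
    """Vectorized alternative to years_old: '3 months' -> 0.25, missing -> NaN."""
    # Capture the leading number and the unit word (singular or plural)
    parts = ages.astype(str).str.extract(r'^(\d+)\s+(year|month|week|day)s?$')
    number = pd.to_numeric(parts[0])          # unmatched rows are NaN
    divisor = parts[1].map(UNITS_PER_YEAR)    # unit word -> units per year
    return number / divisor

result = ages_in_years(pd.Series(['2 years', '3 months', '1 week', None]))
print(result)
```

On the shelter data this would be used as `df['AnimalAge'] = ages_in_years(df['age_upon_outcome'])`.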

In [None]:
plt.figure(figsize=(20, 8))
sns.boxplot(x='outcome_type', y='AnimalAge', data=df, palette='pastel' )

#this plot uses the AnimalAge variable we defined above
#it shows the outcome of each animal in relation to the animal's age


In the graph above, it is important to note that Transfer refers to a transfer to another rescue organization; we see in the outcome_subtypes that Partner is the largest category. Adoption refers to an adoption directly from the facility. It is interesting that dogs that were euthanized tend to be older than dogs that were transferred or adopted. We might be able to see why when we break this down further.
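To put numbers on this comparison, the quartiles that each box is drawn from can be printed directly. The rows below are made up for illustration; on the real data, replace `sample` with `df`.

```python
import pandas as pd

# Hypothetical ages (in years) standing in for df[['outcome_type', 'AnimalAge']]
sample = pd.DataFrame({
    'outcome_type': ['Adoption', 'Adoption', 'Transfer', 'Transfer', 'Euthanasia', 'Euthanasia'],
    'AnimalAge':    [1.0,        2.0,        1.5,        2.5,        6.0,          8.0],
})

# Quartiles and count of age for each outcome: the numbers behind the box plot
age_summary = (sample.groupby('outcome_type')['AnimalAge']
               .describe()[['25%', '50%', '75%', 'count']])
print(age_summary.sort_values('50%'))
```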

In [None]:
sns.set(style="ticks", palette="pastel")
plt.figure(figsize=(20, 8))
# Draw a nested boxplot to show the outcome_type with outcome_subtype by Age
sns.boxplot(x="outcome_type", y="AnimalAge",
            hue="outcome_subtype",
            data=df)


In the graph above, we can clearly see that if an animal is transferred, it goes to a partner. If an animal is adopted, the adoption might be through a foster or offsite. Finally, if an animal is euthanized, the main recorded reasons are Aggressive, Suffering, Behavior, or Rabies Risk. It is interesting to note that the Suffering subtype has a noticeably older age distribution than the other subtypes, which makes sense.

In [None]:
#Here I am keeping the colors that have at least 800 records and grouping the rest into 'Others', to narrow down the total number of categories so they can be displayed on a graph.


color_counts = df['color'].value_counts()                   #count records per color
color_others = set(color_counts[color_counts < 800].index)  #colors with fewer than 800 records
df['top_colors'] = df['color'].replace(list(color_others), 'Others')  #relabel those colors as 'Others'
print(df['top_colors'].nunique())                           #number of distinct color groups left

print(df['top_colors'].value_counts())                      #records per color group

sns.set(style="ticks", palette="pastel")
plt.figure(figsize=(80, 8))
# Draw a nested boxplot to show Animal Age by Color and Outcome
sns.boxplot(x="top_colors", y="AnimalAge",
            hue="outcome_type",
            data=df)


In this graph, it is interesting to note that Black/White, which has the largest total dog count (as seen in the graph below), seems to have a small age range for adoption and transfer.
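Because some colors have many more dogs than others, raw counts and box plots are hard to compare across colors. One option, added here and not part of the original analysis, is to look at the share of each color's dogs that ended in each outcome. The rows below are hypothetical.

```python
import pandas as pd

sample = pd.DataFrame({
    'top_colors':   ['Black/White', 'Black/White', 'Black/White', 'Brown', 'Brown', 'Others'],
    'outcome_type': ['Adoption', 'Transfer', 'Adoption', 'Adoption', 'Euthanasia', 'Transfer'],
})

# normalize='index' turns counts into the share of each color's dogs per outcome,
# so colors with many dogs can be compared fairly with colors with few dogs
outcome_share = pd.crosstab(sample['top_colors'], sample['outcome_type'], normalize='index')
print(outcome_share.round(2))
```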

In [None]:
df.head()

plt.figure(figsize=(80, 8))
sns.countplot(x='top_colors', data=df, palette='pastel')

In [None]:
#Here I am writing a function to see the split between pure breed dogs and mixed breed dogs. I am not sure how accurate the data is from the shelter though. 

def pure_breed(m):
    m=str(m)
    if m.find('Mix') != -1:
        return 'mixed_breed'
    else:
        return 'pure_breed'
    
    
df['Mix']=df.breed.apply(pure_breed)

sns.countplot(x='Mix', data=df, palette='pastel')

Not surprisingly, the majority of dogs are listed as mixed breed animals. 
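The split can also be printed as proportions with `value_counts(normalize=True)`. The sketch below additionally assumes that a breed written as two names joined by `/` (for example a Labrador Retriever/Border Collie cross) should count as mixed, which a check for `'Mix'` alone would miss; the breed strings are illustrative, not taken from the dataset.

```python
import numpy as np
import pandas as pd

breeds = pd.Series(['Pit Bull Mix', 'Chihuahua Shorthair',
                    'Labrador Retriever/Border Collie', 'Beagle'])

# Assumption: a breed is mixed if it contains 'Mix' or lists two breeds separated by '/'
is_mixed = breeds.str.contains('Mix', regex=False) | breeds.str.contains('/', regex=False)
breed_type = pd.Series(np.where(is_mixed, 'mixed_breed', 'pure_breed'), index=breeds.index)

# Share of each group rather than raw counts
print(breed_type.value_counts(normalize=True))
```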

In [None]:
#Here I am generating a list of the top breeds in this data set. 
df['breed'].value_counts()
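
The "keep the common values and call the rest Others" step is used for both colors and breeds, so it could be written once as a helper. `lump_rare` is a hypothetical name used only for this sketch; it uses `Series.where` instead of `replace`, which avoids passing a long list of values to replace.

```python
import pandas as pd

def lump_rare(values: pd.Series, min_count: int, other_label: str = 'Others') -> pd.Series:
    """Replace categories that occur fewer than min_count times with other_label."""
    counts = values.value_counts()
    rare = counts[counts < min_count].index
    # where() keeps values that are not rare and substitutes other_label elsewhere
    return values.where(~values.isin(rare), other_label)

colors = pd.Series(['Black/White'] * 3 + ['Brown'] * 2 + ['Tricolor'])
print(lump_rare(colors, min_count=2).value_counts())
```

On the shelter data this would be called as, for example, `df['top_breeds'] = lump_rare(df['breed'], 300)`.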

In [None]:
#Here I am keeping the breeds that have at least 300 records and grouping the rest into 'Others', so I am able to graph them.

breed_counts = df['breed'].value_counts()                   #count records per breed
breed_others = set(breed_counts[breed_counts < 300].index)  #breeds with fewer than 300 records
df['top_breeds'] = df['breed'].replace(list(breed_others), 'Others')  #relabel those breeds as 'Others'
print(df['top_breeds'].nunique())                           #number of distinct breed groups left

print(df['top_breeds'].value_counts())                      #records per breed group

sns.set(style="ticks", palette="pastel")
plt.figure(figsize=(80, 8))
# Draw a nested boxplot to show AnimalAge by Breed and Outcome
sns.boxplot(x="top_breeds", y="AnimalAge",
            hue="outcome_type",
            data=df)
#sns.despine(offset=10, trim=True)

In [None]:
plt.figure(figsize=(80, 8))
sns.countplot(x='top_breeds', data=df, palette='pastel')

With the two graphs above, it is interesting to note that the pit bull terrier is the most common breed and also has the lowest age ranges for transfer, adoption, and euthanasia. Since these plots show age at outcome rather than lifespan, this suggests that pit bulls leave the shelter, including through euthanasia, at younger ages than other breeds. Graphing by outcome_subtype may help us see why.

In [None]:
sns.set(style="ticks", palette="pastel")
plt.figure(figsize=(80, 8))
# Draw a nested boxplot to show AnimalAge by Breed and Outcome_subtype
sns.boxplot(x="top_breeds",y="AnimalAge",
            hue="outcome_subtype",
            data=df)
#sns.despine(offset=10, trim=True)

Pit bulls have notably younger ages at outcome, and they are the youngest group euthanized for suffering. Pit bulls, labs, and Others are also the only breed groups that show every euthanasia outcome_subtype we saw in a previous graph.
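Which breed groups show every euthanasia subtype can also be checked with a count instead of by eye. The rows and breed labels below are hypothetical; on the real data, replace `sample` with `df`.

```python
import pandas as pd

sample = pd.DataFrame({
    'top_breeds':      ['Pit Bull Mix', 'Pit Bull Mix', 'Pit Bull Mix', 'Chihuahua Shorthair Mix'],
    'outcome_type':    ['Euthanasia', 'Euthanasia', 'Euthanasia', 'Euthanasia'],
    'outcome_subtype': ['Aggressive', 'Suffering', 'Behavior', 'Suffering'],
})

# Keep only euthanasia rows, then count the distinct subtypes seen for each breed group
euth = sample[sample['outcome_type'] == 'Euthanasia']
subtypes_per_breed = euth.groupby('top_breeds')['outcome_subtype'].nunique()
all_subtypes = euth['outcome_subtype'].nunique()

# Breed groups that show every euthanasia subtype present in the data
print(subtypes_per_breed[subtypes_per_breed == all_subtypes])
```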

In [None]:
#Here is a graph of outcome_subtypes broken down by color. 
sns.set(style="ticks", palette="pastel")
plt.figure(figsize=(80, 8))
# Draw a nested boxplot to show AnimalAge by Color and Outcome_subtype
sns.boxplot(x="top_colors",y="AnimalAge",
            hue="outcome_subtype",
            data=df)
#sns.despine(offset=10, trim=True)

**Petfinder**



Since the Petfinder API requires an OAuth access token that has to be refreshed every 60 minutes (and its documentation shows requests made with curl), I decided to use the petpy API wrapper. This made calling the different attributes somewhat different from what I had done with the Austin Animal Shelter dataset. Petfinder also limits the number of calls per day and the number of items returned per call.
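For reference, the token handling that a wrapper saves you from can be sketched with only the standard library. This assumes Petfinder's v2 client-credentials endpoint `https://api.petfinder.com/v2/oauth2/token`, which returns an `access_token` and an `expires_in` lifetime in seconds; `fetch_token` and `TokenCache` are hypothetical names, not part of petpy.

```python
import json
import time
import urllib.parse
import urllib.request

TOKEN_URL = 'https://api.petfinder.com/v2/oauth2/token'

def fetch_token(key, secret):
    """Request a new OAuth access token using the client-credentials grant."""
    body = urllib.parse.urlencode({
        'grant_type': 'client_credentials',
        'client_id': key,
        'client_secret': secret,
    }).encode()
    # Passing data= makes urlopen send a form-encoded POST
    with urllib.request.urlopen(TOKEN_URL, data=body) as response:
        payload = json.load(response)
    return payload['access_token'], payload['expires_in']

class TokenCache:
    """Reuse a token until shortly before it expires, then fetch a new one."""

    def __init__(self, fetch, margin=60, clock=time.monotonic):
        self._fetch = fetch      # callable returning (token, lifetime_in_seconds)
        self._margin = margin    # refresh this many seconds before expiry
        self._clock = clock      # injectable clock, handy for testing
        self._token = None
        self._expires_at = 0.0

    def get(self):
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            self._token, lifetime = self._fetch()
            self._expires_at = self._clock() + lifetime
        return self._token
```

Usage would look like `cache = TokenCache(lambda: fetch_token(secret_value_0, secret_value_1))`, then sending `{'Authorization': f'Bearer {cache.get()}'}` as a header on each request.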

In [None]:
#adoptable dog variable
dogs = pf.animals(animal_type='dog', status='adoptable', breed='pit bull terrier,chihuahua,labrador retriever',
                  results_per_page=100, pages=10, return_df=True)


dogs.head(10)

Since in the Austin dataset above we saw that pit bulls, chihuahuas, and labs were the most common types of dogs, I will limit my dataset here to those breeds. I will compare the dogs with the status adoptable vs. adopted in terms of age, coat, and gender. I will also filter the results by specific behavior attributes to see if that increases or decreases the overall number of animals.
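Because the two API pulls are separate DataFrames and may not contain the same number of dogs, comparing proportions is fairer than comparing raw counts. A sketch, assuming both frames have `gender` and `age` columns as used in the cells below; the rows here are made up.

```python
import pandas as pd

# Hypothetical stand-ins for the `dogs` (adoptable) and `dogs_adopted` DataFrames
adoptable = pd.DataFrame({'gender': ['Male', 'Male', 'Female', 'Male'],
                          'age':    ['Adult', 'Adult', 'Young', 'Senior']})
adopted = pd.DataFrame({'gender': ['Female', 'Female', 'Male'],
                        'age':    ['Baby', 'Young', 'Baby']})

# Stack both samples with a status label so attributes can be compared side by side
combined = pd.concat([adoptable.assign(status='adoptable'),
                      adopted.assign(status='adopted')], ignore_index=True)

# Share of each gender within each status; each row sums to 1,
# so a difference in sample size between the two pulls does not distort the comparison
shares = (combined.groupby('status')['gender']
          .value_counts(normalize=True)
          .unstack(fill_value=0))
print(shares.round(2))
```

With the combined frame, `sns.countplot(x='gender', hue='status', data=combined)` would also draw both groups on one chart.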

In [None]:
#Here is the list of available breeds
dogs_breed = pf.breeds('dog', return_df=True)
dogs_breed.groupby('breed').head()

In [None]:
#adopted dog variable
dogs_adopted = pf.animals(animal_type='dog', status='adopted', breed='pit bull terrier,chihuahua,labrador retriever',
                  results_per_page=100, pages=10, return_df=True)

dogs_adopted.head(10)

In [None]:
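#here I am sorting both DataFrames by whether the dogs are good with children
#sort_values returns a new DataFrame, so the result is assigned back
dogs = dogs.sort_values('environment.children', ascending=True)
dogs_adopted = dogs_adopted.sort_values('environment.children', ascending=True)
dogs_adopted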

In [None]:
#here is the breakdown of gender for the adoptable dogs
plt.figure(figsize=(20, 8))
sns.countplot(x='gender', data=dogs, palette='pastel')

In [None]:
#here is the breakdown of gender for the adopted dogs
plt.figure(figsize=(20, 8))
sns.countplot(x='gender', data=dogs_adopted, palette='pastel')

Female dogs outnumber male dogs among those that have been adopted, while male dogs outnumber female dogs among those up for adoption.

In [None]:
#Here is breakdown of age of the adoptable dogs
plt.figure(figsize=(20, 8))
sns.countplot(x='age', data=dogs, palette='pastel')

In [None]:
#Here is breakdown of age of the adopted dogs
plt.figure(figsize=(20, 8))
sns.countplot(x='age', data=dogs_adopted, palette='pastel')

There are many more babies (puppies!) that have been adopted than are up for adoption. Similarly, more young dogs have been adopted than are up for adoption. The younger the dog, the more popular it seems to be, while adult dogs make up the largest group up for adoption.
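Seaborn orders string categories by their first appearance in the data, so the two age plots can list the groups in different orders. One way to line them up is an ordered categorical. The sketch assumes Petfinder's four age groups, Baby, Young, Adult, and Senior; the sample values are made up.

```python
import pandas as pd

# Petfinder's age groups, from youngest to oldest (assumed complete list)
AGE_ORDER = ['Baby', 'Young', 'Adult', 'Senior']

ages = pd.Series(['Adult', 'Baby', 'Senior', 'Adult', 'Young'])

# An ordered categorical sorts and counts in life-stage order
# instead of alphabetically or by frequency
ages = pd.Series(pd.Categorical(ages, categories=AGE_ORDER, ordered=True))
counts = ages.value_counts(sort=False)
print(counts)
```

Alternatively, passing `order=AGE_ORDER` to `sns.countplot` fixes the order of the bars on both plots.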

In [None]:
#Here I am looking at the coats of the adoptable dogs. 
plt.figure(figsize=(20, 8))
sns.countplot(x='coat', data=dogs, palette='pastel')

In [None]:
#Here I am looking at the coats of the adopted dogs. 
plt.figure(figsize=(20, 8))
sns.countplot(x='coat', data=dogs_adopted, palette='pastel')

It is interesting that there were not enough curly- or wire-haired dogs to appear on the adopted plot. This might be due to the breeds we limited the search to.
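A coat type with no dogs simply drops out of a count plot, which makes it easy to miss. Reindexing the counts against a fixed list shows those categories explicitly as zero. The list of coat values below is an assumption based on Petfinder's coat options, and the sample values are made up.

```python
import pandas as pd

# Coat types Petfinder uses (assumed complete list)
COAT_ORDER = ['Hairless', 'Short', 'Medium', 'Long', 'Wire', 'Curly']

coats = pd.Series(['Short', 'Short', 'Medium', None, 'Long'])

# value_counts only lists coats that appear (and skips missing values);
# reindex adds the absent coat types back with a count of 0
coat_counts = coats.value_counts().reindex(COAT_ORDER, fill_value=0)
print(coat_counts)
```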

**Results**

Overall, I learned that behavior seemed to be more of a factor than breed, but pit bulls seemed to have the worst outcomes of all the breeds: they had noticeably lower age ranges at outcome, which suggests that they were euthanized at a younger age more frequently, especially since they are the most common type of dog. The results about color were inconclusive; there was a fairly even split of outcome_subtypes across all colors, unlike breed. The results from Petfinder were pretty staggering when it came to age and adoption. Even though there are many more adult dogs up for adoption, people continue to adopt puppies!

**Conclusion**

Overall, I thought this project was pretty challenging. Understanding how to actually call the data through the Petfinder API was time-consuming, since it was different from other sources we have looked at. It was also challenging because some of the documentation did not seem entirely accurate. I would like to explore the Petfinder dataset further, since I had some trouble creating more complex visualizations.