FetchMaker’s mission is to match up prospective dog owners with their perfect pet. FetchMaker has been collecting data on their adoptable dogs, and it’s our job to analyze some of that data.

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Import data
dogs = pd.read_csv('data/dog_data.csv')
dogs.head()

Unnamed: 0,is_rescue,weight,tail_length,age,color,likes_children,is_hypoallergenic,name,breed
0,0,6,2.25,2,black,1,0,Huey,chihuahua
1,0,4,5.36,4,black,0,0,Cherish,chihuahua
2,0,7,3.63,3,black,0,1,Becka,chihuahua
3,0,5,0.19,2,black,0,0,Addie,chihuahua
4,0,5,0.37,1,black,1,1,Beverlee,chihuahua


FetchMaker estimates (based on historical data for all dogs) that 8% of dogs in their system are rescues. They would like to know if whippets are significantly more or less likely than other dogs to be a rescue.

Hypothesis:
- Null: 8% of whippets are rescues
- Alternative: more or less than 8% of whippets are rescues

In [3]:
#Number of whippets in the dataset
num_whippets = len(dogs[dogs['breed'] == 'whippet'])

#whippets that are rescue
rescue_whippets = dogs[dogs['breed'] == 'whippet']['is_rescue']

#Number of whippets that are rescue
num_whippets_rescue = np.sum(rescue_whippets == 1)

print('Total Number of whippets: ', num_whippets)
print('Number of rescue whippets: ', num_whippets_rescue)

Total Number of whippets:  100
Number of rescue whippets:  6


In [4]:
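#Import binomial test (binom_test was removed in SciPy 1.12; binomtest replaces it)
from scipy.stats import binomtest

#Perform a two-sided binomial test against the expected rescue rate of 8%
pval = binomtest(num_whippets_rescue, num_whippets, 0.08).pvalue
print('P-Value:', pval)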

P-Value: 0.5811780106238105


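With a p-value of about 0.58, well above the 0.05 significance threshold, we fail to reject the null hypothesis: there is no significant evidence that the rescue rate among whippets differs from 8%.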
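
To make the binomial test less of a black box, here is a minimal sketch that recomputes the two-sided p-value by hand using only the standard library. The helper names `binom_pmf` and `two_sided_binom_pvalue` are illustrative and not part of the original notebook. The approach (summing the probabilities of every outcome that is no more likely than the observed one) is one common definition of an exact two-sided binomial p-value, and it is assumed here to match what SciPy does.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_binom_pvalue(k_obs, n, p):
    """Sum the probabilities of all outcomes no more likely than the observed one."""
    # Small relative tolerance so that floating-point ties are counted
    threshold = binom_pmf(k_obs, n, p) * (1 + 1e-7)
    return sum(
        binom_pmf(k, n, p)
        for k in range(n + 1)
        if binom_pmf(k, n, p) <= threshold
    )

# Values taken from the notebook output: 6 rescues out of 100 whippets,
# tested against the expected rescue rate of 8%
p_value = two_sided_binom_pvalue(6, 100, 0.08)
print('Manual P-Value:', p_value)
```

The result should agree with the p-value printed by the SciPy call to within floating-point error.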

Three of FetchMaker’s most popular mid-sized dog breeds are whippets, terriers, and pitbulls. Is there a significant difference in the average weights of these three dog breeds?

- Null: whippets, terriers, and pitbulls all weigh the same amount on average
- Alternative: whippets, terriers, and pitbulls do not all weigh the same amount on average (at least one pair of breeds has differing average weights)

In [5]:
# Subset to just whippets, terriers, and pitbulls
dogs_wtp = dogs[dogs.breed.isin(['whippet', 'terrier', 'pitbull'])]

# Subset to just poodles and shihtzus
dogs_ps = dogs[dogs.breed.isin(['poodle', 'shihtzu'])]

In [6]:
#Separating the weights for the different breeds

#Whippet weights
wt_whippet = dogs[dogs['breed'] == 'whippet']['weight']
wt_terrier = dogs[dogs['breed'] == 'terrier']['weight']
wt_pitbull = dogs[dogs['breed'] == 'pitbull']['weight']

#importing f_oneway for ANOVA test
from scipy.stats import f_oneway
fstat, pval = f_oneway(wt_whippet, wt_terrier, wt_pitbull)
print('P-Value: ', pval)

P-Value:  3.276415588274815e-17


With a p-value less than 0.05, we reject the null hypothesis: at least one pair of dog breeds has significantly different average weights.
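
For intuition about what the ANOVA test measures, the F statistic can be sketched by hand as the ratio of between-group variance to within-group variance. The function name `one_way_anova_f` and the toy weights below are hypothetical and chosen only for illustration; they are not FetchMaker data.

```python
def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups of numbers."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical toy weights for three groups (not taken from the dataset)
f_stat = one_way_anova_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
print('F statistic:', f_stat)
```

A large F statistic means the group means are spread out relative to the variation inside each group, which is what drives a small p-value.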

In [7]:
#Finding the pairs with the difference

from statsmodels.stats.multicomp import pairwise_tukeyhsd

tukey_results = pairwise_tukeyhsd(dogs_wtp.weight, dogs_wtp.breed, 0.05)
print(tukey_results)

  Multiple Comparison of Means - Tukey HSD, FWER=0.05  
 group1  group2 meandiff p-adj   lower    upper  reject
-------------------------------------------------------
pitbull terrier   -13.24   -0.0 -16.7278 -9.7522   True
pitbull whippet    -3.34 0.0638  -6.8278  0.1478  False
terrier whippet      9.9    0.0   6.4122 13.3878   True
-------------------------------------------------------

At the 0.05 level, terriers differ significantly in average weight from both pitbulls and whippets, while the difference between pitbulls and whippets is not significant (adjusted p-value of 0.0638).


FetchMaker wants to know if poodles and shihtzus come in different colors.

Hypothesis
- Null: There is no association between breed (poodle vs. shihtzu) and color.
- Alternative: There is an association between breed (poodle vs. shihtzu) and color.

In [12]:
# Create a contingency table of color vs. breed
Xtab = pd.crosstab(dogs_ps.color, dogs_ps.breed)
print(Xtab)

# Run a Chi-Square Test
from scipy.stats import chi2_contingency
chi2, pval, dof, exp = chi2_contingency(Xtab)
print('No significant association' if pval > 0.05 else 'Significant association exists between breed (poodle and shihtzu) and color')

breed  poodle  shihtzu
color                 
black      17       10
brown      13       36
gold        8        6
grey       52       41
white      10        7
Significant association exists between breed (poodle and shihtzu) and color


In [11]:
print(pval)

0.005302408293244593
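
As a cross-check, the Chi-Square statistic can be rebuilt by hand from the contingency table printed above. This is a minimal sketch using only the standard library. The variable names are illustrative. The closed-form p-value used at the end is valid only when the degrees of freedom are even, which happens to hold for this 5x2 table (4 degrees of freedom). It also assumes no continuity correction is applied.

```python
from math import exp

# Observed counts copied from the crosstab output: color -> (poodle, shihtzu)
observed = {
    'black': (17, 10),
    'brown': (13, 36),
    'gold': (8, 6),
    'grey': (52, 41),
    'white': (10, 7),
}

rows = list(observed.values())
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]
grand_total = sum(row_totals)

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells,
# where expected counts assume breed and color are independent
chi2_stat = 0.0
for r, row in enumerate(rows):
    for c, obs in enumerate(row):
        expected = row_totals[r] * col_totals[c] / grand_total
        chi2_stat += (obs - expected) ** 2 / expected

dof = (len(rows) - 1) * (len(col_totals) - 1)

# Upper-tail probability of the chi-square distribution.
# This finite sum is only valid for even degrees of freedom.
half = chi2_stat / 2
term, total = 1.0, 0.0
for i in range(dof // 2):
    total += term
    term *= half / (i + 1)
p_value = exp(-half) * total

print('Chi-Square statistic:', chi2_stat)
print('Degrees of freedom:', dof)
print('P-Value:', p_value)
```

The p-value computed this way should agree with the one returned by `chi2_contingency` to within floating-point error.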
