# FetchMaker
Congratulations! You’ve just started working at the hottest new tech startup, FetchMaker. FetchMaker’s mission is to match up prospective dog owners with their perfect pet. FetchMaker has been collecting data on their adoptable dogs, and it’s your job to analyze some of that data.

Note that a solution.py file is also loaded for you in the workspace, which contains solution code for this project. We highly recommend that you complete the project on your own without checking the solution, but feel free to take a look if you get stuck or want to check your answers!

In [2]:
# import modules
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Data to the Rescue
## 1
FetchMaker has provided us with data for a sample of dogs from their app, including the following attributes:

- weight, an integer representing how heavy a dog is in pounds
- tail_length, a float representing tail length in inches
- age, in years
- color, a string such as "brown" or "grey"
- is_rescue, a boolean stored as 0 or 1

The data has been saved for you as a pandas DataFrame named dogs. Use the .head() method to inspect the first five rows of the dataset.

In [3]:
# Import data
dogs = pd.read_csv('dog_data.csv')

# Subset to just whippets, terriers, and pitbulls
dogs_wtp = dogs[dogs.breed.isin(['whippet', 'terrier', 'pitbull'])]

# Subset to just poodles and shihtzus
dogs_ps = dogs[dogs.breed.isin(['poodle', 'shihtzu'])]

In [12]:
dogs.head()
print(dogs['breed'].unique())

['chihuahua' 'greyhound' 'pitbull' 'poodle' 'rottweiler' 'shihtzu'
 'terrier' 'whippet']


In [13]:
dogs_wtp.head()

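|  | is_rescue | weight | tail_length | age | color | likes_children | is_hypoallergenic | name | breed |
|---|---|---|---|---|---|---|---|---|---|
| 200 | 0 | 71 | 5.74 | 4 | black | 0 | 0 | Charlot | pitbull |
| 201 | 0 | 26 | 11.56 | 3 | black | 0 | 0 | Jud | pitbull |
| 202 | 0 | 56 | 10.76 | 4 | black | 0 | 0 | Rosamund | pitbull |
| 203 | 0 | 33 | 6.32 | 4 | black | 1 | 0 | Ruthann | pitbull |
| 204 | 0 | 54 | 17.18 | 4 | black | 1 | 1 | Bryon | pitbull |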


In [14]:
dogs_ps.head()

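|  | is_rescue | weight | tail_length | age | color | likes_children | is_hypoallergenic | name | breed |
|---|---|---|---|---|---|---|---|---|---|
| 300 | 0 | 58 | 8.05 | 1 | black | 1 | 0 | Moise | poodle |
| 301 | 0 | 56 | 9.44 | 4 | black | 1 | 0 | Boote | poodle |
| 302 | 1 | 59 | 4.04 | 4 | black | 1 | 0 | Beatrix | poodle |
| 303 | 0 | 70 | 12.37 | 1 | black | 1 | 0 | Rabbi | poodle |
| 304 | 0 | 52 | 11.42 | 2 | black | 0 | 0 | Tallou | poodle |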


## 2
FetchMaker estimates (based on historical data for all dogs) that 8% of dogs in their system are rescues.

They would like to know if whippets are significantly more or less likely than other dogs to be a rescue.

Store the is_rescue values for 'whippet's in a variable called whippet_rescue.

In [25]:
whippit_rescue = dogs_wtp[dogs_wtp['breed'] == 'whippet']
whippit_rescue.head()

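|  | is_rescue | weight | tail_length | age | color | likes_children | is_hypoallergenic | name | breed |
|---|---|---|---|---|---|---|---|---|---|
| 700 | 0 | 12 | 12.87 | 3 | black | 0 | 1 | Ingelbert | whippet |
| 701 | 0 | 46 | 11.09 | 4 | black | 1 | 1 | Carson | whippet |
| 702 | 0 | 13 | 13.23 | 8 | black | 1 | 1 | Glory | whippet |
| 703 | 0 | 52 | 15.59 | 4 | black | 0 | 1 | Patrizia | whippet |
| 704 | 0 | 53 | 8.04 | 1 | black | 0 | 0 | Cass | whippet |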


## 3
How many whippets are rescues (remember that the value of is_rescue is 1 for rescues and 0 otherwise)? Save this number as num_whippet_rescues and print it out.

In [26]:
num_whippet_rescues = np.sum(whippit_rescue['is_rescue'])
print(f'The number of whippet rescues is {num_whippet_rescues}.')

The number of whippet rescues is 6.
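A minimal sketch of the counting used in steps 2 to 4, on a hypothetical four-row table that only borrows the dogs column names (the values are invented, not FetchMaker data). Summing a 0/1 column counts the 1s, `len()` counts rows, and `DataFrame.size` counts every cell, which is rows × columns rather than a number of dogs.

```python
import pandas as pd

# Hypothetical mini table with the same column names as the dogs data (invented values)
demo = pd.DataFrame({
    "breed": ["whippet", "whippet", "whippet", "poodle"],
    "is_rescue": [1, 0, 1, 0],
    "weight": [40, 44, 38, 55],
})

whippets = demo[demo["breed"] == "whippet"]   # rows for one breed
whippet_rescue = whippets["is_rescue"]        # just the 0/1 rescue column

num_rescues = int(whippet_rescue.sum())  # rescues are 1s, so the sum counts them
num_rows = len(whippets)                 # one row per dog
num_cells = whippets.size                # rows x columns, NOT a dog count

print(num_rescues, num_rows, num_cells)  # → 2 3 9
```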


## 4
How many whippets are in this sample of data in total? Save this number as num_whippets and print it out.

In [35]:
# len() counts rows (one per dog); .size would count every cell (rows x columns)
num_whippets = len(whippit_rescue)
print(f'The total number of whippets is {num_whippets}.')

The total number of whippets is 100.


## 5
Use a hypothesis test to test the following null and alternative hypotheses:

- Null: 8% of whippets are rescues
- Alternative: more or less than 8% of whippets are rescues

Save the p-value from this test as pval and print it out. Using a significance threshold of 0.05, is the proportion of whippets who are rescues significantly different from 8%?

In [38]:
from scipy.stats import binomtest  # replaces binom_test, which was deprecated and later removed from SciPy
# A binomial test fits because each whippet either is (1) or is not (0) a rescue.
# p is the null proportion (8%), not the 0.05 significance threshold.
pval = binomtest(k=num_whippet_rescues, n=num_whippets, p=0.08).pvalue
print(f'p-value: {pval:.2f}')
print('Significantly different from 8%' if pval < 0.05 else 'Not significantly different from 8%')

p-value: 0.58
Not significantly different from 8%
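The two-sided exact binomial p-value can be sketched by hand to show what the test computes: add up the probability of every outcome that is no more likely than the observed count under the null proportion. The helper `two_sided_binom_pvalue` is a hypothetical name, the inputs (6 successes in 100 trials, null proportion 0.08) are illustrative, and the result is compared with SciPy's `binomtest` as a check that the two rules agree.

```python
import numpy as np
from scipy.stats import binom, binomtest


def two_sided_binom_pvalue(k: int, n: int, p: float) -> float:
    """Hypothetical helper: exact two-sided binomial p-value.

    Sums P(X = x) over every outcome x whose probability is no larger
    than the probability of the observed count k.
    """
    outcomes = np.arange(n + 1)
    pmf = binom.pmf(outcomes, n, p)
    observed = binom.pmf(k, n, p)
    # Small relative tolerance so outcomes tied with the observed one are included
    return float(min(1.0, pmf[pmf <= observed * (1 + 1e-7)].sum()))


# Illustrative inputs: 6 successes in 100 trials, null proportion 0.08
manual = two_sided_binom_pvalue(6, 100, 0.08)
library = binomtest(6, 100, 0.08).pvalue
print(f"by hand: {manual:.4f}, binomtest: {library:.4f}")
```

Because the rule is "at least as unlikely as what was observed", the upper tail that counts toward the p-value does not have to mirror the lower tail symmetrically; with a mean of 8 and an observed 6, it starts wherever the probabilities drop below P(X = 6).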


## 6
Three of FetchMaker’s most popular mid-sized dog breeds are 'whippet's, 'terrier's, and 'pitbull's. Is there a significant difference in the average weights of these three dog breeds?

To start answering this question, save the weights of each of these breeds in three separate series named wt_whippets, wt_terriers, and wt_pitbulls, respectively.

In [49]:
wt_whippets = dogs_wtp[dogs_wtp['breed'] == 'whippet']['weight']
wt_terriers = dogs_wtp[dogs_wtp['breed'] == 'terrier']['weight']
wt_pitbulls = dogs_wtp[dogs_wtp['breed'] == 'pitbull']['weight']
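A sketch of an equivalent way to pull one weight Series per breed in a single pass with `groupby`, on a hypothetical six-row table (invented values, not FetchMaker data). It yields the same kind of Series as the three boolean-mask lines above, keyed by breed name.

```python
import pandas as pd

# Hypothetical miniature version of the dogs table (invented values)
dogs_demo = pd.DataFrame({
    "breed": ["whippet", "terrier", "pitbull", "whippet", "terrier", "pitbull"],
    "weight": [40, 28, 55, 44, 30, 60],
})

# One Series of weights per breed, keyed by breed name
weights_by_breed = {breed: grp["weight"] for breed, grp in dogs_demo.groupby("breed")}

print(weights_by_breed["whippet"].tolist())  # → [40, 44]
```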

## 7
Run a single hypothesis test to address the following null and alternative hypotheses:

- Null: whippets, terriers, and pitbulls all weigh the same amount on average
- Alternative: whippets, terriers, and pitbulls do not all weigh the same amount on average (at least one pair of breeds has differing average weights)  

Save the resulting p-value as pval and print it out. Using a significance threshold of 0.05, is there at least one pair of dog breeds that have significantly different average weights?

In [59]:
from scipy.stats import f_oneway
# One-way ANOVA: tests whether three or more groups share the same mean in a single test
_, pval = f_oneway(wt_whippets, wt_terriers, wt_pitbulls)
print('At least one pair of breeds has a significantly different average weight' if pval < 0.05 else 'No significant difference in average weight between whippets, terriers, and pitbulls')

At least one pair of breeds has a significantly different average weight
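To show what `f_oneway` computes, here is a sketch of the one-way ANOVA F statistic written out by hand: the ratio of the between-group mean square to the within-group mean square, with the p-value taken from the upper tail of the F distribution. The function name `one_way_anova` and the sample weights are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import f, f_oneway


def one_way_anova(*groups):
    """Hypothetical helper: return (F statistic, p-value) for a one-way ANOVA."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k = len(groups)
    n_total = all_values.size

    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    p_value = f.sf(f_stat, df_between, df_within)  # upper-tail probability
    return f_stat, p_value


# Hypothetical weights (pounds) for three breeds
a = [30, 35, 32, 38, 31]
b = [45, 50, 47, 52, 49]
c = [33, 36, 30, 39, 34]

manual = one_way_anova(a, b, c)
library = f_oneway(a, b, c)
print(manual, (library.statistic, library.pvalue))
```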


## 8 
If you completed the previous step correctly, you should have concluded that at least one pair of dog breeds has significantly different average weights.

Run another hypothesis test to determine which of those breeds (whippets, terriers, and pitbulls) weigh different amounts on average. Use an overall type I error rate of 0.05 for all three comparisons.

In [63]:
from statsmodels.stats.multicomp import pairwise_tukeyhsd
tukey_results = pairwise_tukeyhsd(dogs_wtp['weight'], dogs_wtp['breed'], 0.05)
print(tukey_results)

 Multiple Comparison of Means - Tukey HSD, FWER=0.05 
 group1  group2 meandiff p-adj   lower  upper  reject
-----------------------------------------------------
pitbull terrier   -13.24  0.001 -16.728 -9.752   True
pitbull whippet    -3.34 0.0639  -6.828  0.148  False
terrier whippet      9.9  0.001   6.412 13.388   True
-----------------------------------------------------


**Mike:** From this data, the pitbull is significantly different in average weight from the terrier but not from the whippet (p-adj = 0.0639 > 0.05), and the terrier is significantly different from the whippet.
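For comparison, a simpler (and usually more conservative) way to hold the family-wise error rate at 0.05 across the three pairs is a Bonferroni correction: run ordinary two-sample t-tests and compare each p-value with 0.05 / 3. This is a swapped-in technique, not what `pairwise_tukeyhsd` does, and the weights below are hypothetical.

```python
from itertools import combinations

from scipy.stats import ttest_ind

# Hypothetical weights (pounds); not FetchMaker data
samples = {
    "pitbull": [60, 62, 58, 61, 59],
    "terrier": [30, 32, 29, 31, 28],
    "whippet": [57, 60, 56, 59, 58],
}

family_alpha = 0.05
pairs = list(combinations(samples, 2))       # the 3 breed pairs
per_test_alpha = family_alpha / len(pairs)   # Bonferroni: split alpha across the tests

results = {}
for g1, g2 in pairs:
    pval = ttest_ind(samples[g1], samples[g2]).pvalue
    results[(g1, g2)] = pval < per_test_alpha
    print(f"{g1} vs {g2}: p={pval:.4g}, reject={results[(g1, g2)]}")
```

Tukey's HSD is generally preferred when all pairwise comparisons of means are of interest, because Bonferroni tends to lose power as the number of pairs grows.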

# Poodle and Shihtzu Colors
## 9
FetchMaker wants to know if 'poodle's and 'shihtzu's come in different colors. 

To start, use the subsetted data to create a contingency table of dog colors by breed (poodle vs. shihtzu). Save the table as Xtab and print it out.

In [67]:
Xtab = pd.crosstab(dogs_ps['color'], dogs_ps['breed'])
Xtab.head()

| color \ breed | poodle | shihtzu |
|---|---|---|
| black | 17 | 10 |
| brown | 13 | 36 |
| gold | 8 | 6 |
| grey | 52 | 41 |
| white | 10 | 7 |
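A small sketch of how `pd.crosstab` builds such a table from raw rows, on hypothetical data. Passing `margins=True` adds an `All` row and column holding the totals, which is handy for checking that each breed's counts add up (both breeds total 100 in the table above).

```python
import pandas as pd

# Hypothetical raw rows (not FetchMaker data)
demo = pd.DataFrame({
    "breed": ["poodle", "poodle", "shihtzu", "shihtzu", "shihtzu"],
    "color": ["grey", "brown", "brown", "brown", "grey"],
})

# Rows = colors, columns = breeds; margins=True appends an "All" total row and column
table = pd.crosstab(demo["color"], demo["breed"], margins=True)
print(table)
```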


## 10
Run a hypothesis test for the following null and alternative hypotheses:

- Null: There is not an association between breed (poodle vs. shihtzu) and color.
- Alternative: There is an association between breed (poodle vs. shihtzu) and color.

Save the p-value as pval and print it out. Do poodles and shihtzus come in significantly different color combinations? Use a significance threshold of 0.05.

In [None]:
from scipy.stats import chi2_contingency
stat, pval, dof, expected = chi2_contingency(Xtab)
print(pval)
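The chi-square test of independence can also be computed by hand from the observed counts in the `Xtab` output, which shows where its p-value comes from: each expected count is row total × column total / grand total, the statistic sums (observed - expected)² / expected over all cells, and the p-value is the upper tail of a chi-square distribution with (rows - 1) × (columns - 1) degrees of freedom. This is a sketch for checking understanding; `chi2_contingency` applies Yates' continuity correction only when there is exactly one degree of freedom, so with 4 degrees of freedom the two should agree.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Observed counts copied from the Xtab output (columns: poodle, shihtzu)
observed = np.array([
    [17, 10],  # black
    [13, 36],  # brown
    [8, 6],    # gold
    [52, 41],  # grey
    [10, 7],   # white
])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
# Expected count for each cell if color and breed were independent
expected = row_totals * col_totals / observed.sum()

stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
pval = chi2.sf(stat, dof)  # upper-tail probability

lib_stat, lib_pval, lib_dof, lib_expected = chi2_contingency(observed)
print(stat, pval, dof)
```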