In [20]:
import pandas as pd

data = pd.read_csv("thanksgiving.csv", encoding = "Latin-1")

print(data.head(2))

   RespondentID Do you celebrate Thanksgiving?  \
0    4337954960                            Yes   
1    4337951949                            Yes   

  What is typically the main dish at your Thanksgiving dinner?  \
0                                             Turkey             
1                                             Turkey             

  What is typically the main dish at your Thanksgiving dinner? - Other (please specify)  \
0                                                NaN                                      
1                                                NaN                                      

  How is the main dish typically cooked?  \
0                                  Baked   
1                                  Baked   

  How is the main dish typically cooked? - Other (please specify)  \
0                                                NaN                
1                                                NaN                

  What kind of stuffing/dressing do

In [21]:
print(data.columns.values)

['RespondentID' 'Do you celebrate Thanksgiving?'
 'What is typically the main dish at your Thanksgiving dinner?'
 'What is typically the main dish at your Thanksgiving dinner? - Other (please specify)'
 'How is the main dish typically cooked?'
 'How is the main dish typically cooked? - Other (please specify)'
 'What kind of stuffing/dressing do you typically have?'
 'What kind of stuffing/dressing do you typically have? - Other (please specify)'
 'What type of cranberry saucedo you typically have?'
 'What type of cranberry saucedo you typically have? - Other (please specify)'
 'Do you typically have gravy?'
 'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts'
 'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots'
 'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Cauliflower'
 'Which of these

In [22]:
data["Do you celebrate Thanksgiving?"].value_counts()

Yes    980
No      78
Name: Do you celebrate Thanksgiving?, dtype: int64

In [23]:
thanksgiving_folk = data[data["Do you celebrate Thanksgiving?"] == "Yes"]

In [24]:
thanksgiving_folk["What is typically the main dish at your Thanksgiving dinner?"].value_counts()

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64

In [25]:
data[data["What is typically the main dish at your Thanksgiving dinner?"]== "Tofurkey"]["Do you typically have gravy?"]


4      Yes
33     Yes
69      No
72      No
77     Yes
145    Yes
175    Yes
218     No
243    Yes
275     No
393    Yes
399    Yes
571    Yes
594    Yes
628     No
774     No
820     No
837    Yes
860     No
953    Yes
Name: Do you typically have gravy?, dtype: object

In [26]:
apple_isnull = pd.isnull(data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple"])

pumpkin_isnull = pd.isnull(data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin"])

pecan_isnull = pd.isnull(data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan"])

# True means the respondent selected none of apple, pumpkin, or pecan pie
no_pies = apple_isnull & pumpkin_isnull & pecan_isnull

no_pies.value_counts()

False    876
True     182
dtype: int64

In [27]:
def convert_age_to_int(string):
    if(pd.isnull(string)):
        return None
    string = string.split(" ")[0]
    string = string.replace("+","")
    return int(string)
        
data["int_age"] = data["Age"].apply(convert_age_to_int)

data["int_age"].describe()

count    1025.000000
mean       39.383415
std        15.398493
min        18.000000
25%        30.000000
50%        45.000000
75%        60.000000
max        60.000000
Name: int_age, dtype: float64

## Is there anything that we should be aware of about the results or our methodology?
#### To compute an average age, each respondent was assigned the lower bound of their bracket (18 for 18 - 29, 30 for 30 - 44, 45 for 45 - 59, and 60 for 60+). Every coded value is therefore at or below the respondent's true age: by up to 11 to 14 years in the closed brackets (about 5.5 to 7 years on average if ages are spread evenly within a bracket), and by an unknown amount for the open-ended 60+ bracket. The mean of about 39.4 is consequently biased low.

## Is this a true depiction of the ages of the survey participants?
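#### Only roughly. The describe() output shows a median of 45 and a 75th percentile equal to the maximum of 60, so at least half of the respondents are in the 45 - 59 bracket or older, and at least a quarter chose 60+. Because the mean (about 39.4) is built from lower bounds, it understates the typical age; the median bracket (45 - 59) is probably the better summary. The mean plus one standard deviation (about 55) lands in that same bracket, but that is a rough heuristic rather than an estimate.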
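One way to reduce the lower-bound bias is to code each bracket by its midpoint instead. The sketch below is an illustration, not part of the original analysis: `age_bracket_midpoint` and its `open_ended_width` parameter are hypothetical names, and treating the 60+ bracket as 15 years wide is an arbitrary assumption.

```python
import math


def age_bracket_midpoint(label, open_ended_width=15):
    """Return the midpoint of an age bracket such as "18 - 29" or "60+".

    open_ended_width is an assumed width for open-ended brackets like "60+";
    the survey itself does not say how old those respondents are.
    """
    # Treat None and NaN (how pandas represents missing strings) as missing.
    if label is None or (isinstance(label, float) and math.isnan(label)):
        return None
    label = label.strip()
    if label.endswith("+"):
        low = int(label[:-1])
        return low + open_ended_width / 2
    # int() ignores the spaces around each number in "18 - 29".
    low, high = (int(part) for part in label.split("-"))
    return (low + high) / 2
```

It could be applied the same way as the original converter, for example `data["Age"].apply(age_bracket_midpoint)`. The resulting mean still depends on the width assumed for 60+, so it is an adjusted estimate rather than a true average.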

In [28]:
def convert_salary_to_int(string):
    if(pd.isnull(string)):
        return None
    string = string.split(" ")[0]
    if(string == "Prefer"):
        return None
    string = string.replace("$","")
    string = string.replace(",","")
    return int(string)
        
data["int_income"] = data["How much total combined money did all members of your HOUSEHOLD earn last year?"].apply(convert_salary_to_int)

data["int_income"].describe()

count       889.000000
mean      74077.615298
std       59360.742902
min           0.000000
25%       25000.000000
50%       50000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64

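## Is there anything that we should be aware of about the results or our methodology?
#### As with age, each respondent was assigned the lower bound of their income bracket, so the figures understate true incomes, and the maximum of 200,000 is presumably the floor of an open-ended top bracket. Missing and "Prefer not to answer" responses are dropped, leaving 889 of 1,058 rows. The quartiles (25,000, 50,000, and 100,000) are more evenly spread than they were for age. The one discrepancy is the standard deviation (about 59,000), which is large relative to the mean (about 74,000). That indicates a wide spread; whether the distribution actually has two peaks would require looking at the per-bracket counts.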

## Is this a true depiction of the incomes of survey participants?
#### Only approximately. The mean (about 74,000) is well above the median (50,000), so the distribution is skewed right: a smaller number of high-income households pulls the mean up. Income generally rises with age, and this sample leans older (the median age bracket is 45 - 59), which may partly explain the high end, but the notebook does not test that relationship directly.

In [29]:
less_than_most = data[data["int_income"] < 150000]
less_than_most["How far will you travel for Thanksgiving?"].value_counts()

Thanksgiving is happening at my home--I won't travel at all                         281
Thanksgiving is local--it will take place in the town I live in                     203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
Thanksgiving is out of town and far away--I have to drive several hours or fly       55
Name: How far will you travel for Thanksgiving?, dtype: int64

In [30]:
more_than_most = data[data["int_income"] > 150000]
more_than_most["How far will you travel for Thanksgiving?"].value_counts()

Thanksgiving is happening at my home--I won't travel at all                         49
Thanksgiving is local--it will take place in the town I live in                     25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
Thanksgiving is out of town and far away--I have to drive several hours or fly      12
Name: How far will you travel for Thanksgiving?, dtype: int64

## Findings
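#### Travel patterns are broadly similar for households above and below 150,000. Within each group, higher-income respondents are somewhat more likely to travel far (12 of 102, about 12%, versus 55 of 689, about 8%) and to host at home (about 48% versus about 41%). With only 102 respondents in the higher-income group, these differences are suggestive rather than conclusive. Both filters use strict inequalities, so any row with int_income equal to exactly 150,000 falls in neither group.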
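Because the two income groups differ greatly in size, raw counts are hard to compare; within-group shares are easier to read. The sketch below recomputes them from the counts printed in the two cells above. The short dictionary keys and the `shares` helper are names chosen here for illustration.

```python
# Counts copied from the value_counts() outputs above.
under_150k = {"home": 281, "local": 203, "few hours": 150, "far": 55}
over_150k = {"home": 49, "local": 25, "few hours": 16, "far": 12}


def shares(counts):
    """Convert raw counts into fractions of the group total."""
    total = sum(counts.values())
    return {answer: n / total for answer, n in counts.items()}


under_shares = shares(under_150k)
over_shares = shares(over_150k)

for answer in under_150k:
    print(f"{answer:>9}: under {under_shares[answer]:.1%}, over {over_shares[answer]:.1%}")
```

In pandas, the same shares can be obtained directly with `value_counts(normalize=True)` on each filtered column.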

In [31]:
data.pivot_table(
    index="Have you ever tried to meet up with hometown friends on Thanksgiving night?", 
    columns='Have you ever attended a "Friendsgiving?"',
    values="int_age"
)

Have you ever attended a "Friendsgiving?"                                            No        Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?
No                                                                            42.283702  37.010526
Yes                                                                            41.47541  33.976744


In [32]:
import numpy as np
data.pivot_table(
    index="Have you ever tried to meet up with hometown friends on Thanksgiving night?", 
    columns='Have you ever attended a "Friendsgiving?"',
    values="int_income",
    aggfunc = np.mean
)

Have you ever attended a "Friendsgiving?"                                               No           Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?
No                                                                            78914.549654  72894.736842
Yes                                                                                78750.0  66019.736842


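## Findings
#### Respondents who have attended a "Friendsgiving" are younger and have lower household income on average: their mean coded age is about 34 to 37, versus 41 to 42 for those who have not, and their mean coded income is about 66,000 to 73,000, versus about 79,000. These are group averages of lower-bound values, not cutoffs, so they do not show that people below a particular income are more likely to attend; that would require comparing attendance rates across income groups. The pattern is consistent with "Friendsgiving" being more common among younger people, who also tend to earn less, although the link between age and income is not measured directly in this notebook.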
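A claim about who is "more likely" to attend is a claim about rates, which a table of mean incomes does not give. The sketch below shows one way to compute attendance rates on either side of an income threshold. The function name, the threshold of 75,000, and the rows in `toy_rows` are all made up for illustration; they are not survey values.

```python
def friendsgiving_rate_by_income(rows, threshold):
    """Return the share answering "Yes" below and at-or-above an income threshold.

    rows is an iterable of (income, attended) pairs; pairs with a missing
    income or answer are skipped.
    """
    tallies = {"below": [0, 0], "at_or_above": [0, 0]}  # [attended, total]
    for income, attended in rows:
        if income is None or attended is None:
            continue
        key = "below" if income < threshold else "at_or_above"
        tallies[key][1] += 1
        if attended == "Yes":
            tallies[key][0] += 1
    return {key: (yes / n if n else None) for key, (yes, n) in tallies.items()}


# Hypothetical rows, NOT taken from the survey.
toy_rows = [
    (25000, "Yes"), (25000, "No"), (50000, "Yes"),
    (100000, "No"), (200000, "No"), (100000, "Yes"),
    (None, "Yes"),  # missing income: ignored
]
rates = friendsgiving_rate_by_income(toy_rows, threshold=75000)
```

On the real data, the pairs could come from `zip(data["int_income"], data['Have you ever attended a "Friendsgiving?"'])`, although missing values in a pandas column are NaN rather than None and would need to be filtered out first (for example with `dropna`).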