# Thanksgiving Dinner Statistics 

This notebook analyzes 1,058 responses to an online survey (run on SurveyMonkey) fielded in 2015 about what Americans eat for Thanksgiving dinner. The data comes from https://fivethirtyeight.com/features/heres-what-your-part-of-america-eats-on-thanksgiving/. The results should be interpreted with caution. Because the survey was conducted online, the responses are likely biased: respondents are probably more computer-savvy, younger, and more educated than the average person in the population. Nevertheless, the data does allow us to explore some regional differences in behavior (which are also correlated with the make-up and demographics of each region's population).

In [2]:
import pandas as pd

# Read in the Thanksgiving survey data with pandas
data = pd.read_csv("thanksgiving.csv", encoding="Latin-1")
data.head(2)

Unnamed: 0,RespondentID,Do you celebrate Thanksgiving?,What is typically the main dish at your Thanksgiving dinner?,What is typically the main dish at your Thanksgiving dinner? - Other (please specify),How is the main dish typically cooked?,How is the main dish typically cooked? - Other (please specify),What kind of stuffing/dressing do you typically have?,What kind of stuffing/dressing do you typically have? - Other (please specify),What type of cranberry saucedo you typically have?,What type of cranberry saucedo you typically have? - Other (please specify),...,Have you ever tried to meet up with hometown friends on Thanksgiving night?,"Have you ever attended a ""Friendsgiving?""",Will you shop any Black Friday sales on Thanksgiving Day?,Do you work in retail?,Will you employer make you work on Black Friday?,How would you describe where you live?,Age,What is your gender?,How much total combined money did all members of your HOUSEHOLD earn last year?,US Region
0,4337954960,Yes,Turkey,,Baked,,Bread-based,,,,...,Yes,No,No,No,,Suburban,18 - 29,Male,"$75,000 to $99,999",Middle Atlantic
1,4337951949,Yes,Turkey,,Baked,,Bread-based,,Other (please specify),Homemade cranberry gelatin ring,...,No,No,Yes,No,,Rural,18 - 29,Female,"$50,000 to $74,999",East South Central


In [3]:
print(data.columns)

Index(['RespondentID', 'Do you celebrate Thanksgiving?',
       'What is typically the main dish at your Thanksgiving dinner?',
       'What is typically the main dish at your Thanksgiving dinner? - Other (please specify)',
       'How is the main dish typically cooked?',
       'How is the main dish typically cooked? - Other (please specify)',
       'What kind of stuffing/dressing do you typically have?',
       'What kind of stuffing/dressing do you typically have? - Other (please specify)',
       'What type of cranberry saucedo you typically have?',
       'What type of cranberry saucedo you typically have? - Other (please specify)',
       'Do you typically have gravy?',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',
       'Which of these side dishes aretypically served

In [4]:
# Filter out respondents who do not celebrate Thanksgiving
col = data["Do you celebrate Thanksgiving?"]
yes_thanksgiving = data[col == "Yes"]
print(yes_thanksgiving["Do you celebrate Thanksgiving?"].head(5))
print(len(yes_thanksgiving))
# Note: later cells keep using the full `data` frame (1,058 rows), not `yes_thanksgiving`

0    Yes
1    Yes
2    Yes
3    Yes
4    Yes
Name: Do you celebrate Thanksgiving?, dtype: object
980


In [5]:
#examining the type of dishes people eat and 
#whether people who have tofurkey use gravy
cnt_values = data["What is typically the main dish at your Thanksgiving dinner?"].value_counts()
print(cnt_values)
yes_tofurkey = data["What is typically the main dish at your Thanksgiving dinner?"] == "Tofurkey"
rows_tofurkey = data[yes_tofurkey]
print(rows_tofurkey["Do you typically have gravy?"])

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64
4      Yes
33     Yes
69      No
72      No
77     Yes
145    Yes
175    Yes
218     No
243    Yes
275     No
393    Yes
399    Yes
571    Yes
594    Yes
628     No
774     No
820     No
837    Yes
860     No
953    Yes
Name: Do you typically have gravy?, dtype: object


In [8]:
# Flag respondents who selected none of apple, pumpkin, or pecan pie
apple_isnull = data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple"].isnull()
pumpkin_isnull = data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin"].isnull()
pecan_isnull = data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan"].isnull()
no_pies = apple_isnull & pumpkin_isnull & pecan_isnull
print(no_pies.head(10))
print(no_pies.value_counts())
# 876 respondents have at least one of these three pies; 182 have none of them.
# Other pie types are not checked here, and the 182 may include people who do not celebrate Thanksgiving.

0    False
1    False
2    False
3    False
4    False
5     True
6    False
7     True
8    False
9    False
dtype: bool
False    876
True     182
dtype: int64


In [7]:
# Convert the Age brackets (e.g. "18 - 29", "60+") to numbers, keeping each bracket's lower bound
import numpy as np  # also used by later cells

cnt_values = data["Age"].value_counts()
print(cnt_values)

def str_to_int(strval):
    """Return the lower bound of an age bracket, or None for a missing answer."""
    if strval == "":
        return None
    first = strval.split(" ")[0]          # "18 - 29" -> "18", "60+" -> "60+"
    return int(first.replace("+", ""))

age = data["Age"].fillna("")              # missing answers become "" and map to None
int_age = age.apply(str_to_int)
print(int_age.describe())
data["int_age"] = int_age

45 - 59    286
60+        264
30 - 44    259
18 - 29    216
Name: Age, dtype: int64
count    1025.000000
mean       39.383415
std        15.398493
min        18.000000
25%        30.000000
50%        45.000000
75%        60.000000
max        60.000000
Name: Age, dtype: float64


### Thanksgiving Dinners in the US

The majority of respondents are traditionalists: 859 of the 974 people who named a main dish (about 88%) typically serve turkey. Tofurkey is rare, accounting for only 20 of the more than 1,000 people surveyed. Moreover, about 83% of all respondents (876 of 1,058) typically have at least one of apple, pumpkin, or pecan pie.
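The percentages above can be checked against the printed counts. The sketch below copies the numbers from the main-dish and pie outputs and makes the denominators explicit, since `value_counts()` drops missing answers (the main-dish shares are out of the 974 who answered, the pie share out of all 1,058 rows). The `share` helper is a hypothetical name introduced only for this example.

```python
# Counts copied from the notebook outputs above.
main_dish_counts = {
    "Turkey": 859, "Other (please specify)": 35, "Ham/Pork": 29, "Tofurkey": 20,
    "Chicken": 12, "Roast beef": 11, "I don't know": 5, "Turducken": 3,
}
total_respondents = 1058        # all rows in thanksgiving.csv
no_apple_pumpkin_pecan = 182    # rows with none of the three pies selected

def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# value_counts() excludes missing answers, so this is the number who answered
answered_main_dish = sum(main_dish_counts.values())
turkey_pct = share(main_dish_counts["Turkey"], answered_main_dish)
tofurkey_pct = share(main_dish_counts["Tofurkey"], answered_main_dish)
pie_pct = share(total_respondents - no_apple_pumpkin_pecan, total_respondents)

print(answered_main_dish, turkey_pct, tofurkey_pct, pie_pct)
```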

It is difficult to assess the age distribution because age is reported in brackets rather than as an exact number. Since we took the lower bound of each bracket, the true average age is likely higher than the computed average of 39.4. The reported median (45) and maximum (60) are likewise bracket lower bounds rather than actual ages.
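One way to gauge how much the lower-bound convention matters is to recompute the mean using bracket midpoints. This sketch uses the bracket counts printed above. The open-ended `60+` bracket has no upper limit, so the value assigned to it (`OPEN_BRACKET_AGE = 70`) is purely an assumption, and the midpoint result moves with it.

```python
# Bracket counts copied from the Age value_counts() output above.
age_counts = {"18 - 29": 216, "30 - 44": 259, "45 - 59": 286, "60+": 264}

# ASSUMPTION: representative age for the open-ended "60+" bracket.
OPEN_BRACKET_AGE = 70

def bracket_bounds(label):
    """Return (lower, upper) for a label like '30 - 44'; upper is None for '60+'."""
    if label.endswith("+"):
        return int(label[:-1]), None
    low, high = (int(part) for part in label.split(" - "))
    return low, high

def weighted_mean(pairs):
    """Mean of values weighted by counts, given (value, count) pairs."""
    total = sum(count for _, count in pairs)
    return sum(value * count for value, count in pairs) / total

lower_pairs = []
midpoint_pairs = []
for label, n in age_counts.items():
    low, high = bracket_bounds(label)
    lower_pairs.append((low, n))
    mid = (low + high) / 2 if high is not None else OPEN_BRACKET_AGE
    midpoint_pairs.append((mid, n))

lower_mean = round(weighted_mean(lower_pairs), 1)
midpoint_mean = round(weighted_mean(midpoint_pairs), 1)
print(lower_mean, midpoint_mean)
```

The lower-bound version reproduces the 39.4 from `describe()`; with the assumed value of 70 for `60+`, the midpoint version gives 46.8.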

In [9]:
# Convert household income brackets to numbers, keeping each bracket's lower bound
def income_to_int(strval):
    """Return the lower bound of an income bracket, or None if missing or declined."""
    if strval == "":
        return None
    first = strval.split(" ")[0]          # e.g. "$75,000 to $99,999" -> "$75,000"
    if first == "Prefer":                 # "Prefer not to answer"
        return None
    return int(first.replace("+", "").replace("$", "").replace(",", ""))

inc = data["How much total combined money did all members of your HOUSEHOLD earn last year?"]
inc = inc.fillna("")                      # missing answers become "" and map to None
int_income = inc.apply(income_to_int)
data["int_income"] = int_income
print(int_income.describe())

count       889.000000
mean      74077.615298
std       59360.742902
min           0.000000
25%       25000.000000
50%       50000.000000
75%      100000.000000
max      200000.000000
Name: How much total combined money did all members of your HOUSEHOLD earn last year?, dtype: float64


### Household incomes of those surveyed

Taking the lower bound of each household income bracket, the mean income of respondents is about \$74K, so the true average is at least \$74K and likely higher. This suggests that the people who filled out the online survey are a selected group, since the median US household income was around \$54K in 2014. Part of this gap comes from comparing a mean with a median (the sample's lower-bound median is \$50K), but respondents still appear wealthier, and possibly more educated, than the typical US household.
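The conversion above keeps only the lower bound of each bracket. A sketch of a parser that keeps both ends, so that midpoints or interval bounds could be reported instead. It assumes labels look like `"$75,000 to $99,999"` (as in the sample rows shown earlier), an open-ended `"$200,000 and up"`, or a declined answer starting with `"Prefer"`; the last two formats are inferred from the conversion code, not confirmed. `parse_income_bracket`, `bracket_midpoint`, and `open_top` are hypothetical names.

```python
import re

# ASSUMPTION: labels look like "$75,000 to $99,999", "$200,000 and up",
# or start with "Prefer" (declined to answer).
MONEY = re.compile(r"\$([\d,]+)")

def parse_income_bracket(label):
    """Return (low, high) in dollars; high is None for an open-ended bracket.
    Returns None for missing (non-string) or declined answers."""
    if not isinstance(label, str) or label.startswith("Prefer"):
        return None
    amounts = [int(m.replace(",", "")) for m in MONEY.findall(label)]
    if not amounts:
        return None
    high = amounts[1] if len(amounts) > 1 else None
    return amounts[0], high

def bracket_midpoint(bounds, open_top=None):
    """Midpoint of a closed bracket; an open-ended bracket returns the
    caller-supplied open_top value (an assumption the caller must justify)."""
    low, high = bounds
    if high is None:
        return open_top
    return (low + high) / 2

print(parse_income_bracket("$75,000 to $99,999"))
print(parse_income_bracket("$200,000 and up"))
print(parse_income_bracket("Prefer not to answer"))
```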


In [7]:
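# Compare travel plans below and above $150K (lower-bound household income)
income = data["int_income"]
under_150K = data[income < 150000]
travel_under_150K = under_150K["How far will you travel for Thanksgiving?"]
print(travel_under_150K.value_counts())

# Caution: with strict < and >, respondents whose bracket starts at exactly 150000
# fall in neither group; use income >= 150000 to include them in the upper group
over_150K = data[income > 150000]
travel_over_150K = over_150K["How far will you travel for Thanksgiving?"]
print(travel_over_150K.value_counts())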


Thanksgiving is happening at my home--I won't travel at all                         281
Thanksgiving is local--it will take place in the town I live in                     203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
Thanksgiving is out of town and far away--I have to drive several hours or fly       55
Name: How far will you travel for Thanksgiving?, dtype: int64
Thanksgiving is happening at my home--I won't travel at all                         49
Thanksgiving is local--it will take place in the town I live in                     25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
Thanksgiving is out of town and far away--I have to drive several hours or fly      12
Name: How far will you travel for Thanksgiving?, dtype: int64


### Travel distance based on income

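Among respondents who answered the travel question, those with a lower-bound income above \$150K are more likely to hold dinner at their own home: 48% compared with 41% of those under \$150K, a gap of about 7 percentage points. They are also more likely to travel far (12% vs. 8%), although the higher-income group is small (102 respondents, only 12 of whom travel far). Note that the strict comparisons in the code place respondents whose bracket starts at exactly \$150K in neither group.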
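The percentages above come from dividing each count by its group total. A sketch that recomputes them from the printed `value_counts()` outputs (answer labels shortened for readability), plus a hypothetical `income_group` helper showing the `>=` comparison that would keep the bracket starting at exactly \$150K in the upper group.

```python
import math

# Counts copied from the two value_counts() outputs above (labels shortened).
travel_counts = {
    "under_150K": {"home": 281, "local": 203, "few_hours": 150, "far": 55},
    "over_150K":  {"home": 49,  "local": 25,  "few_hours": 16,  "far": 12},
}

def shares(counts):
    """Percentage of each answer within a group, rounded to one decimal place."""
    n = sum(counts.values())
    return {answer: round(100 * c / n, 1) for answer, c in counts.items()}

shares_by_group = {group: shares(c) for group, c in travel_counts.items()}
for group, pct in shares_by_group.items():
    print(group, sum(travel_counts[group].values()), pct)

def income_group(lower_bound):
    """HYPOTHETICAL helper: >= puts a bracket starting at exactly 150000 in the
    upper group; missing incomes (None or NaN) get no group."""
    if lower_bound is None or math.isnan(lower_bound):
        return None
    return "over_150K" if lower_bound >= 150000 else "under_150K"
```

In the notebook itself, the same idea would be `data[data["int_income"] >= 150000]` for the upper group.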

In [10]:
# Examine how meeting hometown friends and attending a "Friendsgiving" relate to age and income
# (pivot_table averages the lower-bound values within each cell)
friends_age = data.pivot_table(
    values="int_age",
    index="Have you ever tried to meet up with hometown friends on Thanksgiving night?",
    columns='Have you ever attended a "Friendsgiving?"'
)
print(friends_age)

friends_inc = data.pivot_table(
    values="int_income",
    index="Have you ever tried to meet up with hometown friends on Thanksgiving night?",
    columns='Have you ever attended a "Friendsgiving?"'
)
print(friends_inc)

Have you ever attended a "Friendsgiving?"                  No        Yes
Have you ever tried to meet up with hometown fr...                      
No                                                  42.283702  37.010526
Yes                                                 41.475410  33.976744
Have you ever attended a "Friendsgiving?"                     No           Yes
Have you ever tried to meet up with hometown fr...                            
No                                                  78914.549654  72894.736842
Yes                                                 78750.000000  66019.736842


### Friendship and its linkage with age and income

Younger respondents appear to be more social around the holiday. The average (lower-bound) age of those who have both attended a Friendsgiving and tried to meet up with hometown friends is 34.0, compared with 42.3 for those who have done neither. Income shows a similar pattern: respondents who have done both average \$66K (lower bound), versus \$79K for those who have done neither. Incomes are nearly identical (about \$79K) among everyone who has never attended a Friendsgiving, whether or not they tried to meet up with hometown friends.
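The pivot tables report only means, so they do not show how many respondents sit behind each cell. A sketch on hypothetical toy rows (column names shortened) that pairs each mean with a count, which helps judge whether a gap such as 34.0 vs. 42.3 rests on many respondents or few.

```python
import pandas as pd

# HYPOTHETICAL toy rows standing in for the two survey questions and the
# lower-bound age column (column names shortened for readability).
toy = pd.DataFrame({
    "met_hometown_friends": ["Yes", "Yes", "No", "No", "No", "Yes"],
    "friendsgiving":        ["Yes", "No",  "No", "No", "Yes", "Yes"],
    "int_age":              [18,    30,    45,   60,   30,    18],
})

pivot_args = dict(values="int_age", index="met_hometown_friends", columns="friendsgiving")
means = toy.pivot_table(aggfunc="mean", **pivot_args)    # average age per cell
counts = toy.pivot_table(aggfunc="count", **pivot_args)  # respondents per cell
print(means)
print(counts)
```

On the real data, the same two calls with `values="int_age"` or `values="int_income"` and the full question texts would add the cell sizes to the tables above.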



In [60]:
# Types of desserts and pies people eat: turn each multi-select option into a True/False column

dessert = ["Apple cobbler", "Blondies", "Brownies", "Cheesecake", "Cookies", "Fudge", "Ice cream", "Peach cobbler", "None", "Other (please specify)"]
strdessert = 'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - '
for i in dessert:
    strnew = strdessert + i
    data[i] = data[strnew].notnull()      # True if the respondent selected this dessert
    print(data[i].value_counts())

pies = ["Apple","Buttermilk","Cherry","Chocolate","Coconut cream", "Key lime", "Peach", "Pecan", "Pumpkin", "Sweet Potato", "None", "Other (please specify)"]
strpies = "Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - "
for i in pies:
    strnew = strpies + i
    # Note: "None" and "Other (please specify)" reuse the dessert column names and overwrite them
    data[i] = data[strnew].notnull()
    print(data[i].value_counts())

# "some_dessert"/"some_pies": at least one real option selected. Built from the raw survey
# columns so the overwritten columns above do not matter, and excluding "None",
# since answering "None" means no dessert or pie
data["some_dessert"] = data[[strdessert + d for d in dessert if d != "None"]].notnull().any(axis=1)
print(data["some_dessert"].value_counts())

data["some_pies"] = data[[strpies + p for p in pies if p != "None"]].notnull().any(axis=1)
print(data["some_pies"].value_counts())
#Most common complete meal


False    948
True     110
Name: Apple cobbler, dtype: int64
False    1042
True       16
Name: Blondies, dtype: int64
False    930
True     128
Name: Brownies, dtype: int64
False    867
True     191
Name: Cheesecake, dtype: int64
False    854
True     204
Name: Cookies, dtype: int64
False    1015
True       43
Name: Fudge, dtype: int64
False    792
True     266
Name: Ice cream, dtype: int64
False    955
True     103
Name: Peach cobbler, dtype: int64
False    763
True     295
Name: None, dtype: int64
False    924
True     134
Name: Other (please specify), dtype: int64
False    544
True     514
Name: Apple, dtype: int64
False    1023
True       35
Name: Buttermilk, dtype: int64
False    945
True     113
Name: Cherry, dtype: int64
False    925
True     133
Name: Chocolate, dtype: int64
False    1022
True       36
Name: Coconut cream, dtype: int64
False    1019
True       39
Name: Key lime, dtype: int64
False    1024
True       34
Name: Peach, dtype: int64
False    716
True     342
Name: Pe

In [54]:
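# Number of respondents whose employer will make them work on Black Friday
# (only about 70 people answered, apparently those who work in retail)
data['Will you employer make you work on Black Friday?'].value_counts()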

Yes              43
No               20
Doesn't apply     7
Name: Will you employer make you work on Black Friday?, dtype: int64

In [55]:
pies = ["Apple","Buttermilk","Cherry","Chocolate","Coconut cream", "Key lime", "Peach", "Pecan", "Pumpkin", "Sweet Potato", "None", "Other (please specify)"]
strpies = "Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - "

desserts = ["Apple cobbler", "Blondies", "Brownies", "Cheesecake", "Cookies", "Fudge", "Ice cream", "Peach cobbler", "None", "Other (please specify)"]
strdesserts = 'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - '

unique_region = data["US Region"].unique()
print(unique_region)

def count_byvar(strtypes, types, byname, byvar):
    """For each value u in byvar, print how many rows with data[byname] == u selected each option."""
    for u in byvar:
        if pd.isnull(u):
            continue  # NaN never equals itself, so this group would always be empty
        row_data_u = data[data[byname] == u]
        for i in types:
            # Count the raw survey column directly instead of assigning into the
            # filtered slice, which triggered pandas' SettingWithCopyWarning
            counts = row_data_u[strtypes + i].value_counts().rename(i)
            print(byname, ": ", u, ", ", counts)

# Look at regional patterns in the pie and dessert menus
count_byvar(strpies, pies, "US Region", unique_region)
count_byvar(strdesserts, desserts, "US Region", unique_region)



['Middle Atlantic' 'East South Central' 'Mountain' 'Pacific'
 'East North Central' 'West North Central' 'West South Central'
 'South Atlantic' 'New England' nan]
US Region :  Middle Atlantic ,  Apple    106
Name: Apple, dtype: int64

US Region :  Middle Atlantic ,  Buttermilk    2
Name: Buttermilk, dtype: int64
US Region :  Middle Atlantic ,  Cherry    17
Name: Cherry, dtype: int64
US Region :  Middle Atlantic ,  Chocolate    16
Name: Chocolate, dtype: int64
US Region :  Middle Atlantic ,  Coconut cream    9
Name: Coconut cream, dtype: int64
US Region :  Middle Atlantic ,  Key lime    1
Name: Key lime, dtype: int64
US Region :  Middle Atlantic ,  Peach    4
Name: Peach, dtype: int64
US Region :  Middle Atlantic ,  Pecan    31
Name: Pecan, dtype: int64
US Region :  Middle Atlantic ,  Pumpkin    115
Name: Pumpkin, dtype: int64
US Region :  Middle Atlantic ,  Sweet Potato    21
Name: Sweet Potato, dtype: int64
US Region :  Middle Atlantic ,  None    6
Name: None, dtype: int64
US Region :  Middle Atlantic ,  Other (please specify)    8
Name: Other (please specify), dtype: int64
US Region :  East South Central ,  Apple    19
Name: Apple, dtype: int64
US Region :  East South Central ,  Buttermilk    6
Name: Buttermilk, 

In [56]:
# Age, gender, and income patterns in dinner menus
for i in pies:
    # Select respondents who chose this pie without overwriting the True/False column data[i]
    tempdata = data[data[strpies + i] == i]
    print("Mean Age Pie ", i, ":", round(tempdata["int_age"].mean(), 1))
    print("Mean Income Pie ", i, ":", round(tempdata["int_income"].mean(), 1))
    print("Counts Gender Pie ", i, ":", tempdata["What is your gender?"].value_counts())

for i in desserts:
    tempdata = data[data[strdesserts + i] == i]
    print("Mean Age Dessert ", i, ":", round(tempdata["int_age"].mean(), 1))
    print("Mean Income Dessert ", i, ":", round(tempdata["int_income"].mean(), 1))
    print("Counts Gender Dessert ", i, ":", tempdata["What is your gender?"].value_counts())

#Note that the above is still not ideal in that 

Mean Age Pie  Apple : 39.8
Mean Income Pie  Apple : 74828.8
Counts Gender Pie  Apple : Female    287
Male      222
Name: What is your gender?, dtype: int64
Mean Age Pie  Buttermilk : 34.0
Mean Income Pie  Buttermilk : 63750.0
Counts Gender Pie  Buttermilk : Female    21
Male      12
Name: What is your gender?, dtype: int64
Mean Age Pie  Cherry : 36.7
Mean Income Pie  Cherry : 56907.2
Counts Gender Pie  Cherry : Female    58
Male      52
Name: What is your gender?, dtype: int64
Mean Age Pie  Chocolate : 36.3
Mean Income Pie  Chocolate : 69678.9
Counts Gender Pie  Chocolate : Female    72
Male      58
Name: What is your gender?, dtype: int64
Mean Age Pie  Coconut cream : 41.1
Mean Income Pie  Coconut cream : 76250.0
Counts Gender Pie  Coconut cream : Male      21
Female    13
Name: What is your gender?, dtype: int64
Mean Age Pie  Key lime : 35.5
Mean Income Pie  Key lime : 77833.3
Counts Gender Pie  Key lime : Male      24
Female    14
Name: What is your gender?, dtype: int64
Mean Age Pi

In [61]:
# Examine how much gender, age, and income play a role in whether a person eats
# certain types of food for Thanksgiving
import statsmodels.api as sm
import numpy as np

# Gender answers are capitalized ("Female"/"Male"), so compare against "Female"
data["female"] = (data["What is your gender?"] == "Female").astype(int)
print(data["Apple"].value_counts())
data["Apple"] = data["Apple"].astype(int)
# Candidate outcome; see Conclusions on why it is hard to model
data["some_dessert"] = data["some_dessert"].astype(int)
xcols = ["intercept", "int_age", "int_income", "female"]
# Keep rows with both age and income; .copy() avoids SettingWithCopyWarning
temp_data = data[data["int_age"].notnull() & data["int_income"].notnull()].copy()
temp_data["intercept"] = 1.0
print(len(data), len(temp_data))

# statsmodels reports the standard statistics for a regression; scikit-learn is geared more to machine learning
#logit = sm.Logit(temp_data["Apple"], temp_data[xcols])
#result = logit.fit()
#print(result.summary())


False    544
True     514
Name: Apple, dtype: int64
1058 889



### Conclusions

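Unfortunately, this data is difficult to use for regression analysis unless we start grouping different food types together. Most individual pies and desserts are chosen by too few respondents, while a catch-all indicator such as "some dessert" is chosen by too many, to support a useful regression. Apple pie is an exception: it splits almost evenly (514 of 1,058 respondents), which makes it a reasonable candidate outcome. In general, caution should be used when interpreting and analyzing this data.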
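Grouping food types could be done by collapsing several multi-select columns into one indicator. A minimal sketch: the category names and which pies go into each (`PIE_GROUPS`) are hypothetical choices that would need justification, and `add_group_indicators` is an illustrative helper, not part of the notebook. It assumes, as in the cells above, that a selected option holds its own label and an unselected one is missing.

```python
import pandas as pd

PIE_PREFIX = ("Which type of pie is typically served at your Thanksgiving dinner? "
              "Please select all that apply. - ")

# HYPOTHETICAL grouping of pie options into broader categories.
PIE_GROUPS = {
    "fruit_pie": ["Apple", "Cherry", "Peach"],
    "cream_or_custard_pie": ["Buttermilk", "Chocolate", "Coconut cream",
                             "Key lime", "Pumpkin", "Sweet Potato"],
}

def add_group_indicators(df, prefix, groups):
    """Return a copy of df with one 0/1 column per group: 1 if the respondent
    selected any option in that group (its multi-select column is non-null)."""
    out = df.copy()
    for name, options in groups.items():
        cols = [prefix + opt for opt in options if prefix + opt in out.columns]
        out[name] = out[cols].notna().any(axis=1).astype(int)
    return out

# HYPOTHETICAL toy rows: selected options hold their label, unselected ones are missing.
toy = pd.DataFrame({
    PIE_PREFIX + "Apple":    ["Apple", None, None],
    PIE_PREFIX + "Cherry":   [None, None, "Cherry"],
    PIE_PREFIX + "Key lime": [None, "Key lime", None],
})
grouped = add_group_indicators(toy, PIE_PREFIX, PIE_GROUPS)
print(grouped[["fruit_pie", "cream_or_custard_pie"]])
```

Applied to `data`, a grouped column such as `fruit_pie` could replace `Apple` as the outcome in the commented-out `sm.Logit` call.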