# Thanksgiving dinner in the US


In [1]:
import pandas as pd

data = pd.read_csv("thanksgiving.csv",encoding="Latin-1")
data.head(3)

Unnamed: 0,RespondentID,Do you celebrate Thanksgiving?,What is typically the main dish at your Thanksgiving dinner?,What is typically the main dish at your Thanksgiving dinner? - Other (please specify),How is the main dish typically cooked?,How is the main dish typically cooked? - Other (please specify),What kind of stuffing/dressing do you typically have?,What kind of stuffing/dressing do you typically have? - Other (please specify),What type of cranberry saucedo you typically have?,What type of cranberry saucedo you typically have? - Other (please specify),...,Have you ever tried to meet up with hometown friends on Thanksgiving night?,"Have you ever attended a ""Friendsgiving?""",Will you shop any Black Friday sales on Thanksgiving Day?,Do you work in retail?,Will you employer make you work on Black Friday?,How would you describe where you live?,Age,What is your gender?,How much total combined money did all members of your HOUSEHOLD earn last year?,US Region
0,4337954960,Yes,Turkey,,Baked,,Bread-based,,,,...,Yes,No,No,No,,Suburban,18 - 29,Male,"$75,000 to $99,999",Middle Atlantic
1,4337951949,Yes,Turkey,,Baked,,Bread-based,,Other (please specify),Homemade cranberry gelatin ring,...,No,No,Yes,No,,Rural,18 - 29,Female,"$50,000 to $74,999",East South Central
2,4337935621,Yes,Turkey,,Roasted,,Rice-based,,Homemade,,...,Yes,Yes,Yes,No,,Suburban,18 - 29,Male,"$0 to $9,999",Mountain


In [2]:
data.columns

Index(['RespondentID', 'Do you celebrate Thanksgiving?',
       'What is typically the main dish at your Thanksgiving dinner?',
       'What is typically the main dish at your Thanksgiving dinner? - Other (please specify)',
       'How is the main dish typically cooked?',
       'How is the main dish typically cooked? - Other (please specify)',
       'What kind of stuffing/dressing do you typically have?',
       'What kind of stuffing/dressing do you typically have? - Other (please specify)',
       'What type of cranberry saucedo you typically have?',
       'What type of cranberry saucedo you typically have? - Other (please specify)',
       'Do you typically have gravy?',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',
       'Which of these side dishes aretypically served

## Filtering out people who do not celebrate Thanksgiving

In [3]:
data["Do you celebrate Thanksgiving?"].value_counts()

Yes    980
No      78
Name: Do you celebrate Thanksgiving?, dtype: int64

In [4]:
data = data[data["Do you celebrate Thanksgiving?"]=="Yes"]
data["Do you celebrate Thanksgiving?"].value_counts()

Yes    980
Name: Do you celebrate Thanksgiving?, dtype: int64

## What main dishes people tend to eat during Thanksgiving dinner

In [5]:
data["What is typically the main dish at your Thanksgiving dinner?"].value_counts()

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64

In [6]:
data[data["What is typically the main dish at your Thanksgiving dinner?"]=="Tofurkey"]["Do you typically have gravy?"]

4      Yes
33     Yes
69      No
72      No
77     Yes
145    Yes
175    Yes
218     No
243    Yes
275     No
393    Yes
399    Yes
571    Yes
594    Yes
628     No
774     No
820     No
837    Yes
860     No
953    Yes
Name: Do you typically have gravy?, dtype: object

## How many people eat apple, pecan, or pumpkin pie during Thanksgiving dinner

In [7]:
apple_isnull = data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple"].isnull()
pumpkin_isnull = data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin"].isnull()
pecan_isnull = data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan"].isnull()

# A respondent ate pie unless all three pie columns are empty.
ate_pies = ~(apple_isnull & pumpkin_isnull & pecan_isnull)
ate_pies.value_counts()

True     831
False    149
dtype: int64
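
The same check can be written over all three pie columns at once, which avoids repeating a column name by mistake. This is a minimal sketch on a small made-up frame: the short column names and the values are illustrative assumptions, not the survey data.

```python
import pandas as pd

# Hypothetical miniature of the three pie columns: a cell holds the pie
# name when the respondent ticked it and is missing otherwise.
pies = pd.DataFrame({
    "Apple":   ["Apple", None, None, "Apple"],
    "Pumpkin": [None, "Pumpkin", None, "Pumpkin"],
    "Pecan":   [None, None, None, "Pecan"],
})

# A respondent ate pie if at least one of the three cells is filled in.
ate_any_pie = pies.notnull().any(axis=1)
print(ate_any_pie.value_counts())
```

On the real data the frame would be the three long "Which type of pie ..." columns selected from `data`.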

## How is age relevant in Thanksgiving celebrations

In [8]:
def str2int(s):
    # Age brackets look like "18 - 29"; keep the lower bound as an integer.
    if pd.isnull(s):
        return None
    return int(s[:2])

data["int_age"] = data["Age"].apply(str2int)

print(data["int_age"].describe())
print(data["int_age"].value_counts())


count    947.000000
mean      40.089757
std       15.352014
min       18.000000
25%       30.000000
50%       45.000000
75%       60.000000
max       60.000000
Name: int_age, dtype: float64
45.0    269
60.0    258
30.0    235
18.0    185
Name: int_age, dtype: int64


Note that the summary statistics of the "int_age" column describe the lower bounds of the age brackets, not actual ages, so the mean of about 40 understates the true mean age.
To check whether the survey is unbiased with respect to age, we would have to compare the bracket counts with the age distribution of the population.
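
One way to make that comparison is to turn the bracket counts into shares and set them beside reference population shares. The sketch below reuses the bracket counts printed above. The `population_share` values are placeholders (equal shares), not census figures, and would have to be replaced with real data before drawing any conclusion.

```python
import pandas as pd

# Bracket counts from the value_counts() output above, keyed by the
# lower bound of each age bracket.
survey_counts = pd.Series({18: 185, 30: 235, 45: 269, 60: 258})
survey_share = survey_counts / survey_counts.sum()

# PLACEHOLDER reference shares (assumption: equal shares); replace with
# real adult-population shares per bracket.
population_share = pd.Series({18: 0.25, 30: 0.25, 45: 0.25, 60: 0.25})

comparison = pd.DataFrame({
    "survey": survey_share,
    "population": population_share,
})
# Positive values mean the bracket is over-represented in the survey.
comparison["difference"] = comparison["survey"] - comparison["population"]
print(comparison.round(3))
```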
    


## How are household earnings relevant in Thanksgiving celebrations


In [9]:
def earnings2int(s):
    if pd.isnull(s) or s[:6] == "Prefer":
        return None
    else:
        lower_earnings_bound = int(s.split(" ")[0].replace("$","").replace(",",""))
        return lower_earnings_bound
    
data["int_income"] = data["How much total combined money did all members of your HOUSEHOLD earn last year?"].apply(earnings2int)  

As with "int_age", this column should be read with care: "int_income" is the lower bound of an income bracket, not an actual income.
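
The same lower bounds can also be extracted for the whole column at once with pandas string methods. This is a sketch on a few sample answers, under the assumption that every bracket label starts with a dollar amount. The "Prefer not to answer" label is a guess at the full wording of the opt-out answer.

```python
import pandas as pd

# Sample answers; the opt-out label is an assumed wording.
income = pd.Series([
    "$75,000 to $99,999",
    "$0 to $9,999",
    "Prefer not to answer",
    None,
])

# Capture the leading dollar amount, drop the thousands separators,
# and convert to numbers; answers without an amount become NaN.
lower = income.str.extract(r"^\$([\d,]+)", expand=False)
lower_bound = pd.to_numeric(lower.str.replace(",", "", regex=False))
print(lower_bound)
```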

An interesting question is whether the people who did not answer the earnings question tend to belong to a particular group or are spread randomly.

In [10]:
print("all data: \n", data.describe())
no_income_data = data[data["int_income"].isnull() & data["int_age"].notnull()]
no_income_age_data = data[data["int_income"].isnull() & data["int_age"].isnull()]
print("no_income_data: \n", no_income_data.describe())
print("no_income_age_data: \n", no_income_age_data.describe())

all data: 
        RespondentID     int_age     int_income
count  9.800000e+02  947.000000     829.000000
mean   4.336737e+09   40.089757   75965.018094
std    4.861132e+05   15.352014   59068.636748
min    4.335895e+09   18.000000       0.000000
25%    4.336368e+09   30.000000   25000.000000
50%    4.336802e+09   45.000000   75000.000000
75%    4.337009e+09   60.000000  100000.000000
max    4.337955e+09   60.000000  200000.000000
no_income_data: 
        RespondentID     int_age  int_income
count  1.180000e+02  118.000000         0.0
mean   4.336764e+09   36.483051         NaN
std    4.665789e+05   15.893735         NaN
min    4.335954e+09   18.000000         NaN
25%    4.336402e+09   18.000000         NaN
50%    4.336805e+09   30.000000         NaN
75%    4.336985e+09   45.000000         NaN
max    4.337916e+09   60.000000         NaN
no_income_age_data: 
        RespondentID  int_age  int_income
count  3.300000e+01      0.0         0.0
mean   4.336468e+09      NaN         NaN
std   

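It looks like younger people are less willing to share their earnings: respondents who gave their age but not their income have a mean lower age bound of about 36.5, against 40.1 for all respondents. Also, 33 respondents shared neither their income nor their age. This matches the 33 missing values in "int_age" (980 - 947), so everyone who withheld their age also withheld their earnings.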
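
A more direct way to test the age effect is to compute, for each age bracket, the fraction of respondents with no income value. This is a minimal sketch on a made-up sample with the same two derived columns; the values are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the two derived columns used above.
sample = pd.DataFrame({
    "int_age":    [18, 18, 18, 30, 30, 45, 45, 60],
    "int_income": [np.nan, np.nan, 25000, np.nan, 50000, 75000, 100000, 50000],
})

# Fraction of respondents in each age bracket who gave no income.
missing_share = sample["int_income"].isnull().groupby(sample["int_age"]).mean()
print(missing_share)
```

If the share falls steadily from the youngest bracket to the oldest on the real data, that supports the reading above. Roughly equal shares would suggest the missing answers are not related to age.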

## How the distance someone travels for Thanksgiving dinner relates to their income level

In [29]:
travelling_under_150000 = data[data["int_income"]<150000]["How far will you travel for Thanksgiving?"].value_counts()
travelling_over_150000 = data[data["int_income"]>=150000]["How far will you travel for Thanksgiving?"].value_counts()

sum_under_150000 = travelling_under_150000.sum()
sum_over_150000 = travelling_over_150000.sum()

travelling_under_150000_normalized = travelling_under_150000/sum_under_150000
travelling_over_150000_normalized = travelling_over_150000/sum_over_150000

print("under_150000: \n ",travelling_under_150000_normalized)
print("\n over_150000: \n ",travelling_over_150000_normalized)

under_150000: 
  Thanksgiving is happening at my home--I won't travel at all                         0.407837
Thanksgiving is local--it will take place in the town I live in                     0.294630
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    0.217707
Thanksgiving is out of town and far away--I have to drive several hours or fly      0.079826
Name: How far will you travel for Thanksgiving?, dtype: float64

 over_150000: 
  Thanksgiving is happening at my home--I won't travel at all                         0.471429
Thanksgiving is local--it will take place in the town I live in                     0.242857
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    0.178571
Thanksgiving is out of town and far away--I have to drive several hours or fly      0.107143
Name: How far will you travel for Thanksgiving?, dtype: float64


People who earn $150,000 a year or more are somewhat more likely to host Thanksgiving dinner at home than people who earn less (about 47% versus 41%).
If we combine people who host the dinner with those who spend it locally, there is no major difference: around 70% of both groups do not leave town.
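
The per-group shares can also be produced in one step with `pd.crosstab` and `normalize="index"`. This is a sketch on a made-up sample: the short `travel` column and its values stand in for the long survey column and are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample; "travel" stands in for the long survey column name.
sample = pd.DataFrame({
    "int_income": [25000, 50000, 75000, 150000, 200000, 175000, None],
    "travel": ["home", "local", "home", "home", "far", "home", "local"],
})

# Drop respondents without an income so they are not counted as "under".
known = sample.dropna(subset=["int_income"])
income_group = (
    known["int_income"]
    .ge(150000)
    .map({True: "150000 or more", False: "under 150000"})
    .rename("income_group")
)

# normalize="index" makes each row (income group) sum to 1.
shares = pd.crosstab(income_group, known["travel"], normalize="index")
print(shares)
```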

## Friends instead of family?

In [38]:
data_friends = pd.pivot_table(data,values="int_age", index="Have you ever tried to meet up with hometown friends on Thanksgiving night?",
                             columns='Have you ever attended a "Friendsgiving?"' )
print(data_friends)
data_friends_income = pd.pivot_table(data,values="int_income", index="Have you ever tried to meet up with hometown friends on Thanksgiving night?",
                             columns='Have you ever attended a "Friendsgiving?"' )

print("\n",data_friends_income)

Have you ever attended a "Friendsgiving?"                  No        Yes
Have you ever tried to meet up with hometown fr...                      
No                                                  42.283702  37.010526
Yes                                                 41.475410  33.976744

 Have you ever attended a "Friendsgiving?"                     No           Yes
Have you ever tried to meet up with hometown fr...                            
No                                                  78914.549654  72894.736842
Yes                                                 78750.000000  66019.736842


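When it comes to spending Thanksgiving with friends, it seems that younger (and poorer) people do that more often: respondents who answered "Yes" to both questions have the lowest mean age bound (about 34) and the lowest mean income bound (about $66,000).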
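
Means alone do not show how many respondents sit in each cell, so a companion table of counts helps judge how much weight each mean can bear. This is a sketch on a made-up sample: the short column names `meetup` and `friendsgiving` stand in for the two survey questions, and the values are illustrative.

```python
import pandas as pd

# Hypothetical sample; short column names stand in for the survey questions.
sample = pd.DataFrame({
    "meetup":        ["Yes", "Yes", "No", "No", "No", "Yes"],
    "friendsgiving": ["Yes", "No", "No", "Yes", "No", "Yes"],
    "int_age":       [18, 45, 60, 30, 45, 30],
})

# Mean lower age bound per combination of answers.
mean_age = pd.pivot_table(sample, values="int_age", index="meetup",
                          columns="friendsgiving", aggfunc="mean")
# Number of respondents behind each of those means.
group_size = pd.pivot_table(sample, values="int_age", index="meetup",
                            columns="friendsgiving", aggfunc="count")
print(mean_age)
print(group_size)
```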