# Thanksgiving Dinner Data

`thanksgiving.csv`: survey data on what Americans eat for Thanksgiving

- 65 columns
- 1058 rows

## Important columns

- `RespondentID` -- a unique ID of the respondent to the survey.
- `Do you celebrate Thanksgiving?` -- a Yes/No response to the question.
- `How would you describe where you live?` -- responses are Suburban, Urban, and Rural.
- `Age` -- responses are one of several categories, such as 18 - 29 and 30 - 44.
- `How much total combined money did all members of your HOUSEHOLD earn last year?` -- one of several categories, such as `$75,000 to $99,999`.

In [4]:
import pandas as pd

data = pd.read_csv('thanksgiving.csv',encoding='Latin-1')
data.head(5)

Unnamed: 0,RespondentID,Do you celebrate Thanksgiving?,What is typically the main dish at your Thanksgiving dinner?,What is typically the main dish at your Thanksgiving dinner? - Other (please specify),How is the main dish typically cooked?,How is the main dish typically cooked? - Other (please specify),What kind of stuffing/dressing do you typically have?,What kind of stuffing/dressing do you typically have? - Other (please specify),What type of cranberry saucedo you typically have?,What type of cranberry saucedo you typically have? - Other (please specify),...,Have you ever tried to meet up with hometown friends on Thanksgiving night?,"Have you ever attended a ""Friendsgiving?""",Will you shop any Black Friday sales on Thanksgiving Day?,Do you work in retail?,Will you employer make you work on Black Friday?,How would you describe where you live?,Age,What is your gender?,How much total combined money did all members of your HOUSEHOLD earn last year?,US Region
0,4337954960,Yes,Turkey,,Baked,,Bread-based,,,,...,Yes,No,No,No,,Suburban,18 - 29,Male,"$75,000 to $99,999",Middle Atlantic
1,4337951949,Yes,Turkey,,Baked,,Bread-based,,Other (please specify),Homemade cranberry gelatin ring,...,No,No,Yes,No,,Rural,18 - 29,Female,"$50,000 to $74,999",East South Central
2,4337935621,Yes,Turkey,,Roasted,,Rice-based,,Homemade,,...,Yes,Yes,Yes,No,,Suburban,18 - 29,Male,"$0 to $9,999",Mountain
3,4337933040,Yes,Turkey,,Baked,,Bread-based,,Homemade,,...,Yes,No,No,No,,Urban,30 - 44,Male,"$200,000 and up",Pacific
4,4337931983,Yes,Tofurkey,,Baked,,Bread-based,,Canned,,...,Yes,No,No,No,,Urban,30 - 44,Male,"$100,000 to $124,999",Pacific


In [5]:
# all of the column names
data.columns

Index(['RespondentID', 'Do you celebrate Thanksgiving?',
       'What is typically the main dish at your Thanksgiving dinner?',
       'What is typically the main dish at your Thanksgiving dinner? - Other (please specify)',
       'How is the main dish typically cooked?',
       'How is the main dish typically cooked? - Other (please specify)',
       'What kind of stuffing/dressing do you typically have?',
       'What kind of stuffing/dressing do you typically have? - Other (please specify)',
       'What type of cranberry saucedo you typically have?',
       'What type of cranberry saucedo you typically have? - Other (please specify)',
       'Do you typically have gravy?',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',
       'Which of these side dishes aretypically served

In [6]:
data['Do you celebrate Thanksgiving?'].value_counts()

Yes    980
No      78
Name: Do you celebrate Thanksgiving?, dtype: int64

In [7]:
# we delete all rows where the answer is 'No'
data = data[data['Do you celebrate Thanksgiving?'] == 'Yes']

## Explore main dishes

In [8]:
data['What is typically the main dish at your Thanksgiving dinner?'].value_counts()

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64

In [9]:
isTofurkey = data[data['What is typically the main dish at your Thanksgiving dinner?'] == 'Tofurkey']
isTofurkey['Do you typically have gravy?']

4      Yes
33     Yes
69      No
72      No
77     Yes
145    Yes
175    Yes
218     No
243    Yes
275     No
393    Yes
399    Yes
571    Yes
594    Yes
628     No
774     No
820     No
837    Yes
860     No
953    Yes
Name: Do you typically have gravy?, dtype: object

## Types of pie

How many people eat pie for Thanksgiving?


In [10]:
apple_isnull = data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple'].isnull()
pumpkin_isnull = data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin'].isnull()
pecan_isnull = data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan'].isnull()
# True when none of the three pies was selected
no_pies = apple_isnull & pumpkin_isnull & pecan_isnull

no_pies.value_counts()

False    876
True     104
dtype: int64

10.6% of the remaining respondents (104 of 980) do not eat any of these three kinds of pie for Thanksgiving: `Apple`, `Pecan`, `Pumpkin`.
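The share of respondents with none of the three pies can be read off the boolean mask directly, since the mean of a boolean Series is the fraction of `True` values. A minimal sketch on a made-up five-row frame; the short column names `apple`, `pumpkin` and `pecan` are placeholders for the long survey column names:

```python
import pandas as pd

# Toy stand-in for the survey: a cell holds the pie name when selected, NaN otherwise.
toy = pd.DataFrame({
    'apple':   ['Apple', None, None, 'Apple', None],
    'pumpkin': ['Pumpkin', None, 'Pumpkin', None, None],
    'pecan':   [None, None, 'Pecan', None, None],
})

# True when none of the three pie columns was selected.
no_pie = toy[['apple', 'pumpkin', 'pecan']].isnull().all(axis=1)

# The mean of a boolean Series is the fraction of True values.
share_no_pie = no_pie.mean()
print(f'{share_no_pie:.1%}')
```

On the real data, the same two lines would apply to the three full column names used in the cell above.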

## Age of respondents

In [11]:
data['Age'].value_counts()


45 - 59    269
60+        258
30 - 44    235
18 - 29    185
Name: Age, dtype: int64

In [12]:
def stringToAge(stringInput):
    if pd.isnull(stringInput):
        return None
    else:
        stringAge = stringInput.split(' ')[0]
        stringAge = stringAge.replace('+','')
        return int(stringAge)
    
data['int_age'] = data['Age'].apply(stringToAge)
data['int_age'].describe()

count    947.000000
mean      40.089757
std       15.352014
min       18.000000
25%       30.000000
50%       45.000000
75%       60.000000
max       60.000000
Name: int_age, dtype: float64

- The result is imprecise because we only kept the lower bound of each age category.
- For the same reason, the max value is wrong: `60+` is recorded as 60.
- That being said, respondents are fairly evenly spread across the four age categories.
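One way to reduce the lower-bound bias noted above is to map each bracket to its midpoint instead. The sketch below assumes an upper bound of 75 for the open-ended `60+` bracket; that value is an arbitrary choice, not something the survey provides, and `bracket_to_midpoint` is a hypothetical helper.

```python
import pandas as pd

def bracket_to_midpoint(bracket, open_ended_upper=75):
    """Map an age bracket such as '18 - 29' or '60+' to its midpoint.

    open_ended_upper is an assumed upper bound for '60+'; the survey gives none.
    """
    if pd.isnull(bracket):
        return None
    if bracket.endswith('+'):
        low = int(bracket.rstrip('+'))
        return (low + open_ended_upper) / 2
    low, high = (int(part) for part in bracket.split(' - '))
    return (low + high) / 2

ages = pd.Series(['18 - 29', '30 - 44', '45 - 59', '60+', None])
midpoints = ages.apply(bracket_to_midpoint)
print(midpoints.tolist())
```

Midpoints are still an approximation, but they avoid systematically understating every respondent's age.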

## Income of respondents


In [13]:
data['How much total combined money did all members of your HOUSEHOLD earn last year?'].value_counts()


$25,000 to $49,999      166
$75,000 to $99,999      127
$50,000 to $74,999      127
Prefer not to answer    118
$100,000 to $124,999    109
$200,000 and up          76
$10,000 to $24,999       60
$0 to $9,999             52
$125,000 to $149,999     48
$150,000 to $174,999     38
$175,000 to $199,999     26
Name: How much total combined money did all members of your HOUSEHOLD earn last year?, dtype: int64

In [14]:
def stringToIncome(stringInput):
    if pd.isnull(stringInput):
        return None
    else:
        stringInput = stringInput.split(' ')[0]
        if stringInput == 'Prefer': return None
        stringInput = stringInput.replace('$','')
        stringInput = stringInput.replace(',','')
        return int(stringInput)
    
data['int_income']=data['How much total combined money did all members of your HOUSEHOLD earn last year?'].apply(stringToIncome)
data['int_income'].describe()

count       829.000000
mean      75965.018094
std       59068.636748
min           0.000000
25%       25000.000000
50%       75000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64

- The same caveat applies as for age: only the lower bound of each income bracket is kept, so these statistics understate actual incomes.

## Correlation between Travel Distance and Income

In [15]:
# how far people earning under 50000 will travel
under50000 = data[data['int_income'] < 50000]
under50000['How far will you travel for Thanksgiving?'].value_counts()

Thanksgiving is happening at my home--I won't travel at all                         106
Thanksgiving is local--it will take place in the town I live in                      92
Thanksgiving is out of town but not too far--it's a drive of a few hours or less     64
Thanksgiving is out of town and far away--I have to drive several hours or fly       16
Name: How far will you travel for Thanksgiving?, dtype: int64

In [16]:
# how far people earning over 150000 will travel
over150000 = data[data['int_income'] >= 150000]
over150000['How far will you travel for Thanksgiving?'].value_counts()

Thanksgiving is happening at my home--I won't travel at all                         66
Thanksgiving is local--it will take place in the town I live in                     34
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    25
Thanksgiving is out of town and far away--I have to drive several hours or fly      15
Name: How far will you travel for Thanksgiving?, dtype: int64

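- People who earn more are more likely to have Thanksgiving at home: 66 of 140 (47%) in the 150,000-and-up group, against 106 of 278 (38%) in the under-50,000 group. One possible reason is that the lower-income group includes students, who tend to travel home to their parents.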
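Because the two income groups differ in size (278 and 140 answers), raw counts are hard to compare, and within-group shares make the comparison direct. A small sketch with the counts from the two outputs above re-entered by hand; the shortened answer labels are placeholders:

```python
import pandas as pd

# Counts copied from the two outputs above; the labels are shortened for readability.
labels = ['at home', 'local', 'few hours', 'far away']
under_50k = pd.Series([106, 92, 64, 16], index=labels)
over_150k = pd.Series([66, 34, 25, 15], index=labels)

# Convert counts to within-group shares so groups of different size are comparable.
shares = pd.DataFrame({
    'under 50k': under_50k / under_50k.sum(),
    '150k and up': over_150k / over_150k.sum(),
}).round(3)
print(shares)
```

On the full data, calling `value_counts(normalize=True)` on each subset should give the same shares without re-entering the counts.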

## Linking Friendship and Age

In [17]:
linkFriendshipAge = data.pivot_table(index='Have you ever tried to meet up with hometown friends on Thanksgiving night?',values='int_age',columns='Have you ever attended a "Friendsgiving?"')
linkFriendshipAge

| Have you ever tried to meet up with hometown friends on Thanksgiving night? | Have you ever attended a "Friendsgiving?": No | Have you ever attended a "Friendsgiving?": Yes |
|---|---|---|
| No | 42.283702 | 37.010526 |
| Yes | 41.47541 | 33.976744 |


- Younger people tend to be more 'social' at Thanksgiving: respondents who did both activities are about 34 on average, against about 42 for those who did neither. This might be because older people tend to have Thanksgiving dinner with their families.
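`pivot_table` averages `values` by default, so each cell above is a mean age. Showing the group size next to each mean helps judge how much weight a cell deserves. A sketch on made-up rows; the short column names `meetup`, `friendsgiving` and `age` are placeholders:

```python
import pandas as pd

toy = pd.DataFrame({
    'meetup':        ['Yes', 'Yes', 'No', 'No', 'No', 'Yes'],
    'friendsgiving': ['Yes', 'No',  'No', 'Yes', 'No', 'Yes'],
    'age':           [18,    45,    60,   30,    45,   30],
})

# aggfunc accepts a list, producing one block of columns per aggregate.
summary = toy.pivot_table(index='meetup', columns='friendsgiving',
                          values='age', aggfunc=['mean', 'count'])
print(summary)
```

A mean computed from only a handful of respondents is much less reliable than one computed from hundreds, which the `count` block makes visible.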

## Linking Friendship and Income

In [18]:
linkFriendshipIncome = data.pivot_table(index='Have you ever tried to meet up with hometown friends on Thanksgiving night?',values='int_income',columns='Have you ever attended a "Friendsgiving?"')
linkFriendshipIncome

| Have you ever tried to meet up with hometown friends on Thanksgiving night? | Have you ever attended a "Friendsgiving?": No | Have you ever attended a "Friendsgiving?": Yes |
|---|---|---|
| No | 78914.549654 | 72894.736842 |
| Yes | 78750.0 | 66019.736842 |


- People with lower income also tend to be more 'social'. This is consistent with the link between friendship and age, as young people tend to have lower incomes.
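The assumption that younger respondents report lower incomes can be checked by averaging `int_income` per `Age` bracket. A sketch on made-up rows; the values are illustrative only and are not taken from the survey:

```python
import pandas as pd

toy = pd.DataFrame({
    'Age': ['18 - 29', '18 - 29', '30 - 44', '45 - 59', '60+', '60+'],
    'int_income': [10000, 25000, 50000, 100000, 75000, None],
})

# Mean of the lower-bound income per age bracket; missing incomes are skipped.
income_by_age = toy.groupby('Age')['int_income'].mean()
print(income_by_age)
```

Running the same `groupby` on `data` would show whether the pattern actually holds in the survey.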

# Next steps to explore

- Most common dessert people eat
- How many people work on Black Friday
- Regional patterns in the dinner menus
- Age, gender, income based patterns in dinner menus


## Most common dessert people eat

In [54]:
strLeft = 'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - '
listStrRight = ['Apple cobbler','Blondies','Brownies','Carrot cake','Cheesecake','Cookies','Fudge','Ice cream','Peach cobbler','None','Other (please specify)','Other (please specify).1']
dictDessert = {}
for strRight in listStrRight:
    dessertColumn = strLeft + strRight
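    # True where the respondent ticked this dessert (the cell repeats the dessert name)
    isSelected = data[dessertColumn] == strRight
    # the last entry of value_counts() is the rarer outcome, which is the number of
    # True values only as long as fewer than half of the rows match
    dictDessert[strRight] = list(isSelected.value_counts().values)[-1]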
    
dictDessert

{'Apple cobbler': 110,
 'Blondies': 16,
 'Brownies': 128,
 'Carrot cake': 72,
 'Cheesecake': 191,
 'Cookies': 204,
 'Fudge': 43,
 'Ice cream': 266,
 'None': 295,
 'Other (please specify)': 134,
 'Other (please specify).1': 980,
 'Peach cobbler': 103}

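The most commonly selected dessert on Thanksgiving is `Ice cream` (266), followed by `Cookies` (204) and `Cheesecake` (191). `None` was picked 295 times. The 980 reported for `Other (please specify).1` is an artifact: that column holds free text, so no cell equals the column label and the count is the number of non-matching rows.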
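Indexing `value_counts()` by position depends on which outcome happens to be rarer, which is why the free-text column reports 980 above. A more direct count, sketched here on a toy frame with hypothetical short column names, sums the boolean mask instead:

```python
import pandas as pd

# Toy stand-in: a cell repeats the dessert name when ticked and is NaN otherwise.
toy = pd.DataFrame({
    'Cookies':   ['Cookies', None, 'Cookies', 'Cookies'],
    'Fudge':     [None, None, 'Fudge', None],
    'Ice cream': ['Ice cream', 'Ice cream', 'Ice cream', 'Ice cream'],
})

# Summing a boolean mask counts the True values, whether they are the
# majority or the minority of the rows.
dessert_counts = {name: int((toy[name] == name).sum()) for name in toy.columns}
print(dessert_counts)  # {'Cookies': 3, 'Fudge': 1, 'Ice cream': 4}
```

With the positional approach, `Cookies` would be reported as 1 in this toy frame, because `False` is the rarer outcome there.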

In [75]:
others = data['Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Other (please specify).1']
othersNotNull = others.dropna()
othersNotNull

1       Jelly roll, sweet cheeseball, chocolate dipped...
11                                                    Pie
27                                  Sparkling Apple Cider
47                                            Pumpkin Pie
52                                                   pies
54                                                  Lefse
61                                                 eclair
66                                               as above
68                                         Pie is dessert
77                                       chocolate mousse
111                                                   Pie
121                                        pine nut cake 
123                                          Pumpkin pie.
134                                          Pie, pumpkin
135                                       we stick to pie
140                                            choc. cake
152                                           pumpkin pie
153           

We can see that many people seem to eat pie as a dessert, but how many?

In [81]:
matchesPie = othersNotNull.str.match('.*[Pp][Ii][Ee].*')
list(matchesPie.value_counts())[0]


75

If we consider only the people who answered with another dessert, `pie` seems to be the most popular, with 75 mentions out of 134.
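A case-insensitive substring test expresses the same match more directly than the character-class pattern, and summing the mask counts the hits without relying on the order of `value_counts()`. A sketch on a few of the answers shown above:

```python
import pandas as pd

# A handful of free-text answers taken from the output above.
answers = pd.Series(['Pie', 'Sparkling Apple Cider', 'Pumpkin Pie',
                     'Lefse', 'we stick to pie', 'choc. cake'])

# str.contains does a substring search; case=False ignores capitalisation.
mentions_pie = answers.str.contains('pie', case=False)
print(int(mentions_pie.sum()))  # 3
```

Like the original pattern, a plain substring search would also match words that merely contain "pie", so the count is best read as an upper bound.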

## How many people work on Black Friday


In [85]:
blackFridayWork = data['Will you employer make you work on Black Friday?'].value_counts()
blackFridayWork


Yes              43
No               20
Doesn't apply     7
Name: Will you employer make you work on Black Friday?, dtype: int64

Out of 70 answers, 43 say they will work on Black Friday.

## Regional patterns in the dinner menus

In [96]:
regions = data['US Region'].unique()
dinnersForEachRegion = dict()
for region in regions:
    subset = data[data['US Region'] == region]
    dinnersForEachRegion[region] = subset['What is typically the main dish at your Thanksgiving dinner?'].value_counts().head(3)
    
dinnersForEachRegion

{'Middle Atlantic': Turkey                    130
 Tofurkey                    5
 Other (please specify)      4
 Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64,
 'East South Central': Turkey                    50
 Other (please specify)     4
 Ham/Pork                   1
 Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64,
 'Mountain': Turkey      37
 Tofurkey     2
 Ham/Pork     1
 Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64,
 'Pacific': Turkey                    107
 Other (please specify)      9
 Ham/Pork                    6
 Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64,
 'East North Central': Turkey                    135
 Other (please specify)      5
 Ham/Pork                    4
 Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64,
 'West North Central': Turkey                    60
 Ham/Pork                   4


Unsurprisingly, turkey is the most common main dish at Thanksgiving in every region shown.
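Raw counts mostly reflect how many respondents each region has. To compare menus across regions, shares within each region are more telling. A sketch using `pd.crosstab` with `normalize='index'` on made-up rows; the short column names `region` and `main_dish` are placeholders:

```python
import pandas as pd

toy = pd.DataFrame({
    'region':    ['Pacific', 'Pacific', 'Pacific', 'Pacific', 'Mountain', 'Mountain'],
    'main_dish': ['Turkey', 'Turkey', 'Turkey', 'Ham/Pork', 'Turkey', 'Tofurkey'],
})

# normalize='index' divides each row by its total, giving per-region shares.
dish_share = pd.crosstab(toy['region'], toy['main_dish'], normalize='index')
print(dish_share.round(2))
```

Applied to `data['US Region']` and the main-dish column, this would show whether any region departs noticeably from the overall turkey share.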