# Analyzing Thanksgiving Dinner

In [1]:
import pandas
#the csv file is not in the default encoding, so Latin-1 is specified explicitly
data = pandas.read_csv('thanksgiving.csv', encoding='Latin-1')
data[:3]

Unnamed: 0,RespondentID,Do you celebrate Thanksgiving?,What is typically the main dish at your Thanksgiving dinner?,What is typically the main dish at your Thanksgiving dinner? - Other (please specify),How is the main dish typically cooked?,How is the main dish typically cooked? - Other (please specify),What kind of stuffing/dressing do you typically have?,What kind of stuffing/dressing do you typically have? - Other (please specify),What type of cranberry saucedo you typically have?,What type of cranberry saucedo you typically have? - Other (please specify),...,Have you ever tried to meet up with hometown friends on Thanksgiving night?,"Have you ever attended a ""Friendsgiving?""",Will you shop any Black Friday sales on Thanksgiving Day?,Do you work in retail?,Will you employer make you work on Black Friday?,How would you describe where you live?,Age,What is your gender?,How much total combined money did all members of your HOUSEHOLD earn last year?,US Region
0,4337954960,Yes,Turkey,,Baked,,Bread-based,,,,...,Yes,No,No,No,,Suburban,18 - 29,Male,"$75,000 to $99,999",Middle Atlantic
1,4337951949,Yes,Turkey,,Baked,,Bread-based,,Other (please specify),Homemade cranberry gelatin ring,...,No,No,Yes,No,,Rural,18 - 29,Female,"$50,000 to $74,999",East South Central
2,4337935621,Yes,Turkey,,Roasted,,Rice-based,,Homemade,,...,Yes,Yes,Yes,No,,Suburban,18 - 29,Male,"$0 to $9,999",Mountain


In [2]:
print(data.columns)

Index(['RespondentID', 'Do you celebrate Thanksgiving?',
       'What is typically the main dish at your Thanksgiving dinner?',
       'What is typically the main dish at your Thanksgiving dinner? - Other (please specify)',
       'How is the main dish typically cooked?',
       'How is the main dish typically cooked? - Other (please specify)',
       'What kind of stuffing/dressing do you typically have?',
       'What kind of stuffing/dressing do you typically have? - Other (please specify)',
       'What type of cranberry saucedo you typically have?',
       'What type of cranberry saucedo you typically have? - Other (please specify)',
       'Do you typically have gravy?',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',
       'Which of these side dishes aretypically served

In [3]:
print(data['Do you celebrate Thanksgiving?'].value_counts())

Yes    980
No      78
Name: Do you celebrate Thanksgiving?, dtype: int64


In [4]:
#limiting data to respondents who do celebrate Thanksgiving
#.copy() avoids a SettingWithCopyWarning when new columns are added later
data = data.loc[data['Do you celebrate Thanksgiving?'] == 'Yes'].copy()

In [5]:
print(data['Do you celebrate Thanksgiving?'].value_counts())

Yes    980
Name: Do you celebrate Thanksgiving?, dtype: int64


## What Do People Eat?

In [6]:
print(data['What is typically the main dish at your Thanksgiving dinner?'].value_counts())

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64


In [7]:
#Do those who eat Tofurkey typically have gravy?
subdata = data.loc[data['What is typically the main dish at your Thanksgiving dinner?']=='Tofurkey']
print(subdata['Do you typically have gravy?'])

4      Yes
33     Yes
69      No
72      No
77     Yes
145    Yes
175    Yes
218     No
243    Yes
275     No
393    Yes
399    Yes
571    Yes
594    Yes
628     No
774     No
820     No
837    Yes
860     No
953    Yes
Name: Do you typically have gravy?, dtype: object
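
The row-by-row listing above is easier to read as a tally. The sketch below assumes a small made-up frame (the `toy` rows are hypothetical, not survey data) that reuses the two column names from the cell above, and counts the gravy answers among Tofurkey eaters with `value_counts()`.

```python
import pandas as pd

# Hypothetical rows that only mimic the survey's column names.
main_dish = 'What is typically the main dish at your Thanksgiving dinner?'
gravy = 'Do you typically have gravy?'
toy = pd.DataFrame({
    main_dish: ['Tofurkey', 'Turkey', 'Tofurkey', 'Tofurkey', 'Turkey'],
    gravy: ['Yes', 'Yes', 'No', 'Yes', 'No'],
})

# Keep only the Tofurkey rows, then count each gravy answer.
tofurkey_gravy = toy.loc[toy[main_dish] == 'Tofurkey', gravy].value_counts()
print(tofurkey_gravy)
```

Applied to `subdata`, the same call would collapse the 20 rows into two counts. Tallying the listing above by hand gives 12 Yes and 8 No.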


In [8]:
#Figuring out what pies people eat
apple_isnull = pandas.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple'])
pumpkin_isnull = pandas.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin'])
pecan_isnull = pandas.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan'])

#if all three values from a respondent are null, they did not report eating any of these pies (False)
ate_pies = ~(apple_isnull & pumpkin_isnull & pecan_isnull)
print(ate_pies.value_counts())

True     876
False    104
dtype: int64


In [9]:
print('How many of the respondents...')
print('Ate at least one pie: ' + str(ate_pies.sum()))
print('Ate apple pie: ' + str((~apple_isnull).sum()))
print('Ate pumpkin pie: ' + str((~pumpkin_isnull).sum()))
print('Ate pecan pie: ' + str((~pecan_isnull).sum()))

How many of the respondents...
Ate at least one pie: 876
Ate apple pie: 514
Ate pumpkin pie: 729
Ate pecan pie: 342
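
The null checks above can also be applied to all the pie columns at once. This is a minimal sketch on a hypothetical frame (the short column names and the rows are made up for illustration), assuming, as in the cells above, that a blank answer is stored as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical answers: the pie name when ticked, NaN when left blank.
toy = pd.DataFrame({
    'Apple': ['Apple', np.nan, np.nan, 'Apple'],
    'Pumpkin': ['Pumpkin', 'Pumpkin', np.nan, np.nan],
    'Pecan': [np.nan, np.nan, np.nan, 'Pecan'],
})

# A respondent ate pie if at least one of the columns is non-null.
ate_any_pie = toy.notna().any(axis=1)

# Non-null count per column = number of respondents who ticked that pie.
per_pie = toy.notna().sum()

print(ate_any_pie.sum())
print(per_pie)
```

With the real data, `toy` would be replaced by `data` restricted to the three long pie column names used above.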


## Converting Age and Income to Useful Integers

In [10]:
#Parsing out the lowest age of each range
import re
def FirstAge(string):
    #requires pandas and re libraries
    if pandas.isnull(string):
        return None
    else:
        #take the first token of the range and strip the '+' of an open-ended range
        age = string.split(' ')[0]
        age = int(re.sub(r'\+', '', age))
        return age

data['int_age'] = data['Age'].apply(FirstAge)
data['int_age'].describe()

count    947.000000
mean      40.089757
std       15.352014
min       18.000000
25%       30.000000
50%       45.000000
75%       60.000000
max       60.000000
Name: int_age, dtype: float64

The summary of the 'int_age' column above does not reflect the true ages of survey participants. Each value is the lower bound of the participant's age range, so the mean shown is the lowest value the true mean could take.
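
To see how much the choice of lower bound matters, the sketch below parses both ends of a range and compares the lower bound with the midpoint. The helper names `age_bounds` and `age_midpoint` are hypothetical, and the labels `'30 - 44'`, `'45 - 59'` and `'60+'` are assumed from the quartiles above; only `'18 - 29'` appears verbatim in the data preview.

```python
import re

def age_bounds(label):
    """Return (low, high) for a label such as '18 - 29'; high is None for '60+'."""
    numbers = [int(n) for n in re.findall(r'\d+', label)]
    low = numbers[0]
    high = numbers[1] if len(numbers) > 1 else None
    return low, high

def age_midpoint(label):
    """Midpoint of a closed range; falls back to the lower bound when open-ended."""
    low, high = age_bounds(label)
    return low if high is None else (low + high) / 2

# Assumed range labels (see the note above).
for label in ['18 - 29', '30 - 44', '45 - 59', '60+']:
    print(label, age_bounds(label), age_midpoint(label))
```

A midpoint is still an estimate, and the open-ended top range has no upper bound to average with, so neither column recovers the true ages.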

In [11]:
#Parsing out the lowest income of each income range
def FirstIncome(string):
    #requires pandas and re libraries
    if pandas.isnull(string):
        return None
    else:
        inc = string.split(' ')[0]
        if inc == 'Prefer':
            return None
        else:
            inc = re.sub(r'\$', '', inc)
            inc = int(re.sub(',', '', inc))
            return inc
incomerange = 'How much total combined money did all members of your HOUSEHOLD earn last year?'
data['int_income'] = data[incomerange].apply(FirstIncome)
data['int_income'].describe()

count       829.000000
mean      75965.018094
std       59068.636748
min           0.000000
25%       25000.000000
50%       75000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64

Once again, the new 'int_income' column does not reflect the participants' true household incomes. Each value is the lower bound of the respondent's household income range.
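
The same lower-bound parsing can be done without a row-by-row function. This sketch uses pandas string methods on a hypothetical series; the exact wording `'Prefer not to answer'` is an assumption, since the cell above only checks that the label starts with `Prefer`.

```python
import pandas as pd

# Hypothetical labels in the survey's format; None stands for a skipped question.
labels = pd.Series(['$75,000 to $99,999', '$0 to $9,999', 'Prefer not to answer', None])

# Capture the first dollar amount, drop its commas, and convert it to a number.
first_amount = labels.str.extract(r'\$([\d,]+)', expand=False)
lower_bound = pd.to_numeric(first_amount.str.replace(',', '', regex=False))

print(lower_bound)
```

Labels with no dollar amount, and missing answers, come out as NaN, which matches the `None` returned by `FirstIncome` in those cases.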

## Correlations

### Household Income and Distance

In [12]:
traveldist = 'How far will you travel for Thanksgiving?'
#int_income is the lower bound of each range, so a range starting at exactly 150000 falls in neither group below
print('How far people with household earnings under $150,000 travel: ')
print(data.loc[data['int_income'] < 150000, traveldist].value_counts())

print('How far people with household earnings over $150,000 travel: ')
print(data.loc[data['int_income'] > 150000, traveldist].value_counts())

How far people with household earnings under $150,000 travel: 
Thanksgiving is happening at my home--I won't travel at all                         281
Thanksgiving is local--it will take place in the town I live in                     203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
Thanksgiving is out of town and far away--I have to drive several hours or fly       55
Name: How far will you travel for Thanksgiving?, dtype: int64
How far people with household earnings over $150,000 travel: 
Thanksgiving is happening at my home--I won't travel at all                         49
Thanksgiving is local--it will take place in the town I live in                     25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
Thanksgiving is out of town and far away--I have to drive several hours or fly      12
Name: How far will you travel for Thanksgiving?, dtype: int64


There are more respondents with household earnings under \$150,000 than over \$150,000, so the raw counts of the two groups cannot be compared directly. Within each group, however, the number of respondents falls as the travel distance increases.
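
Because the two groups differ in size, shares are easier to compare than counts. The sketch below uses two hypothetical series of unequal length (the answers `'home'`, `'local'` and `'far'` are stand-ins for the survey's longer labels) and passes `normalize=True` to `value_counts()`.

```python
import pandas as pd

# Hypothetical travel answers for two groups of unequal size.
large_group = pd.Series(['home'] * 6 + ['local'] * 3 + ['far'] * 1)
small_group = pd.Series(['home'] * 2 + ['local'] * 1 + ['far'] * 1)

# normalize=True returns each answer's share of the group instead of a raw count.
large_share = large_group.value_counts(normalize=True)
small_share = small_group.value_counts(normalize=True)

print(large_share)
print(small_share)
```

Working from the counts printed above, the far-away answer accounts for roughly 8% of the lower-income group (55 of 689) and roughly 12% of the higher-income group (12 of 102), a difference the raw counts hide.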

### Age or Income and Celebrating with Friends

#### Average Age of Those Who Have Celebrated with Friends

In [13]:
agefriends = data.pivot_table(index = 'Have you ever tried to meet up with hometown friends on Thanksgiving night?', columns = 'Have you ever attended a "Friendsgiving?"', values = 'int_age')
print(agefriends)

Have you ever attended a "Friendsgiving?"                  No        Yes
Have you ever tried to meet up with hometown fr...                      
No                                                  42.283702  37.010526
Yes                                                 41.475410  33.976744
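
The table above relies on `pivot_table` averaging the `values` column by default. As a minimal sketch on hypothetical data (the short column names and ages are made up), the same table can be built explicitly with `groupby`, `mean` and `unstack`, which makes the aggregation visible.

```python
import pandas as pd

# Hypothetical answers and ages, not survey data.
toy = pd.DataFrame({
    'meetup': ['No', 'No', 'Yes', 'Yes', 'Yes'],
    'friendsgiving': ['No', 'Yes', 'No', 'Yes', 'Yes'],
    'int_age': [45, 30, 60, 18, 30],
})

# pivot_table takes the mean of `values` within each (index, columns) pair by default.
by_pivot = toy.pivot_table(index='meetup', columns='friendsgiving', values='int_age')

# The same table built step by step: group, average, then spread one key into columns.
by_groupby = toy.groupby(['meetup', 'friendsgiving'])['int_age'].mean().unstack()

print(by_pivot)
print(by_groupby)
```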


#### Average Income of Those Who Have Celebrated with Friends

In [17]:
incfriends = data.pivot_table(index = 'Have you ever tried to meet up with hometown friends on Thanksgiving night?', columns = 'Have you ever attended a "Friendsgiving?"', values = 'int_income')
print(incfriends)

Have you ever attended a "Friendsgiving?"                     No           Yes
Have you ever tried to meet up with hometown fr...                            
No                                                  78914.549654  72894.736842
Yes                                                 78750.000000  66019.736842


The first pivot table shows that respondents who have attended a Friendsgiving and/or tried to meet up with hometown friends on Thanksgiving night were on average younger than those who did neither. The second pivot table shows that respondents who have attended a Friendsgiving have a lower average income than those who haven't. However, those who have t