# Analyzing Thanksgiving Dinner

The dataset contains 1058 responses to an online survey about what Americans eat for Thanksgiving dinner. Each survey respondent was asked questions about what they typically eat for Thanksgiving, along with some demographic questions, like their gender, income, and location. This dataset will allow us to discover regional and income-based patterns in what Americans eat for Thanksgiving dinner.

The dataset has 65 columns and 1058 rows. Most of the column names are questions, and most of the column values are string responses to the questions. Most of the columns are categorical, as a survey respondent had to select one of a few options. For example, one of the first column names is `What is typically the main dish at your Thanksgiving dinner?`. The potential responses are:

- Turkey
- Other (please specify)
- Ham/Pork
- Tofurkey
- Chicken
- Roast beef
- I don't know
- Turducken

Most of the columns follow the same question/response format as the above. There are also quite a few NaN values in the columns, which occurred when a survey respondent didn't fill out a question because they didn't want to, or it didn't apply to them.
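
To see how many responses are missing per question, one option is to count nulls column by column. The following is a minimal sketch on a made-up toy frame (the `toy` DataFrame and its column names are illustrative, not part of the survey file); the same `isnull().sum()` call should apply to the real `data` once it is loaded.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the survey data; the values are made up for illustration.
toy = pd.DataFrame({
    "main_dish": ["Turkey", "Ham/Pork", np.nan, "Turkey"],
    "gravy": ["Yes", np.nan, np.nan, "No"],
})

# Number of missing responses per question, largest first.
missing_per_column = toy.isnull().sum().sort_values(ascending=False)
print(missing_per_column)
```

Sorting the counts makes it easy to spot questions that most respondents skipped, such as the free-text "Other (please specify)" follow-ups.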



### Import libraries and read in data

In [2]:
import pandas as pd
import numpy as np
data = pd.read_csv('thanksgiving.csv', encoding='Latin-1')

data.head()

Unnamed: 0,RespondentID,Do you celebrate Thanksgiving?,What is typically the main dish at your Thanksgiving dinner?,What is typically the main dish at your Thanksgiving dinner? - Other (please specify),How is the main dish typically cooked?,How is the main dish typically cooked? - Other (please specify),What kind of stuffing/dressing do you typically have?,What kind of stuffing/dressing do you typically have? - Other (please specify),What type of cranberry saucedo you typically have?,What type of cranberry saucedo you typically have? - Other (please specify),...,Have you ever tried to meet up with hometown friends on Thanksgiving night?,"Have you ever attended a ""Friendsgiving?""",Will you shop any Black Friday sales on Thanksgiving Day?,Do you work in retail?,Will you employer make you work on Black Friday?,How would you describe where you live?,Age,What is your gender?,How much total combined money did all members of your HOUSEHOLD earn last year?,US Region
0,4337954960,Yes,Turkey,,Baked,,Bread-based,,,,...,Yes,No,No,No,,Suburban,18 - 29,Male,"$75,000 to $99,999",Middle Atlantic
1,4337951949,Yes,Turkey,,Baked,,Bread-based,,Other (please specify),Homemade cranberry gelatin ring,...,No,No,Yes,No,,Rural,18 - 29,Female,"$50,000 to $74,999",East South Central
2,4337935621,Yes,Turkey,,Roasted,,Rice-based,,Homemade,,...,Yes,Yes,Yes,No,,Suburban,18 - 29,Male,"$0 to $9,999",Mountain
3,4337933040,Yes,Turkey,,Baked,,Bread-based,,Homemade,,...,Yes,No,No,No,,Urban,30 - 44,Male,"$200,000 and up",Pacific
4,4337931983,Yes,Tofurkey,,Baked,,Bread-based,,Canned,,...,Yes,No,No,No,,Urban,30 - 44,Male,"$100,000 to $124,999",Pacific


In [21]:
# Display the column names

data.columns

Index(['RespondentID', 'Do you celebrate Thanksgiving?',
       'What is typically the main dish at your Thanksgiving dinner?',
       'What is typically the main dish at your Thanksgiving dinner? - Other (please specify)',
       'How is the main dish typically cooked?',
       'How is the main dish typically cooked? - Other (please specify)',
       'What kind of stuffing/dressing do you typically have?',
       'What kind of stuffing/dressing do you typically have? - Other (please specify)',
       'What type of cranberry saucedo you typically have?',
       'What type of cranberry saucedo you typically have? - Other (please specify)',
       'Do you typically have gravy?',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',
       'Which of these side dishes aretypically served

### Remove any responses from people who don't celebrate Thanksgiving

In [12]:
# display counts of how many times each category occurs 
# in the Do you celebrate Thanksgiving? column

data['Do you celebrate Thanksgiving?'].value_counts()

Yes    980
No      78
Name: Do you celebrate Thanksgiving?, dtype: int64

In [6]:
# Filter out any rows in data where the response to 
# Do you celebrate Thanksgiving? is not Yes

data = data[data['Do you celebrate Thanksgiving?'] == 'Yes']


### Explore what main dishes people tend to eat during Thanksgiving dinner

In [2]:
# display counts of how many times each category occurs in the What is typically the main dish at your Thanksgiving dinner? column

data['What is typically the main dish at your Thanksgiving dinner?'].value_counts()

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64

In [4]:
# Display the Do you typically have gravy? column for any rows from data 
# where the What is typically the main dish at your Thanksgiving dinner? column equals Tofurkey

data[data['What is typically the main dish at your Thanksgiving dinner?']=='Tofurkey']['Do you typically have gravy?']

4      Yes
33     Yes
69      No
72      No
77     Yes
145    Yes
175    Yes
218     No
243    Yes
275     No
393    Yes
399    Yes
571    Yes
594    Yes
628     No
774     No
820     No
837    Yes
860     No
953    Yes
Name: Do you typically have gravy?, dtype: object

### Explore the dessert dishes

In [11]:
# Explore the dessert dishes. How many respondents eat pie and
# how many do not?
# A null value means the respondent did not select that pie.
apple_isnull = data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple'].isnull()
pumpkin_isnull = data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin'].isnull()
pecan_isnull = data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan'].isnull()

# True when none of the three pies was selected, i.e. the respondent
# did NOT report apple, pumpkin, or pecan pie.
data['no_pies'] = apple_isnull & pumpkin_isnull & pecan_isnull

data['no_pies'].value_counts()

False    876
True     182
Name: no_pies, dtype: int64
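
The cell above checks only the apple, pumpkin, and pecan columns. A possible generalization, sketched here on a made-up three-row frame, is to select every column that shares the pie question prefix and flag respondents who picked at least one. This assumes each pie option is stored in its own column with that common prefix; `toy`, `pie_columns`, and `ate_any_pie` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the pie columns; a non-null cell means the pie was selected.
prefix = ("Which type of pie is typically served at your Thanksgiving dinner? "
          "Please select all that apply. - ")
toy = pd.DataFrame({
    prefix + "Apple": ["Apple", np.nan, np.nan],
    prefix + "Pumpkin": [np.nan, np.nan, "Pumpkin"],
    prefix + "Pecan": ["Pecan", np.nan, np.nan],
})

# Pick up every pie column by its shared question prefix.
pie_columns = [col for col in toy.columns if col.startswith(prefix)]

# True when the respondent selected at least one pie.
ate_any_pie = toy[pie_columns].notnull().any(axis=1)
print(ate_any_pie.value_counts())
```

Using `notnull().any(axis=1)` keeps the flag's meaning aligned with its name: `True` means the respondent reported eating some pie.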

### Analyze the age column

In [20]:
# Convert age to numeric values

def convertAge(strAge):
    if pd.isnull(strAge):
        return None
    else:
        return int(strAge.split(' ')[0].replace('+', ''))

#print(convertAge('20 - 29'))

data['int_age'] = data['Age'].apply(convertAge)

print(data['int_age'].describe())

count    1025.000000
mean       39.383415
std        15.398493
min        18.000000
25%        30.000000
50%        45.000000
75%        60.000000
max        60.000000
Name: int_age, dtype: float64


##### Findings regarding the age conversion method
The conversion above takes the lower bound of each age range, which biases the converted ages downward. The resulting summary statistics therefore understate the actual ages of the survey participants.
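
One way to reduce this downward bias is to use the midpoint of each range instead of its lower bound. The sketch below is an assumption-laden alternative, not the method used in this notebook: `age_midpoint` is a hypothetical helper, it assumes ranges look like `'18 - 29'` or `'60+'`, and it falls back to the lower bound for the open-ended bracket because no upper bound is available.

```python
import pandas as pd

def age_midpoint(age_range):
    """Hypothetical alternative to convertAge: midpoint of a range like '18 - 29'."""
    if pd.isnull(age_range):
        return None
    if age_range.endswith("+"):
        # Open-ended bracket has no upper bound, so fall back to the lower bound.
        return float(age_range.rstrip("+"))
    low, high = (int(part) for part in age_range.split(" - "))
    return (low + high) / 2

# Example values in the same format as the Age column.
ages = pd.Series(["18 - 29", "30 - 44", "45 - 59", "60+", None])
print(ages.apply(age_midpoint))
```

The midpoint is still an approximation, since it assumes ages are spread evenly within each bracket.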


### Analyze household income

In [21]:
# Convert the income column to numeric values
def convertMoney(strMoney):
    if pd.isnull(strMoney):
        return None
    else:
        result = strMoney.split()[0]
        if result == 'Prefer':
            return None
        else:
            return int(result.replace('$', '').replace(',', ''))
        
#print(convertMoney('Prefer not to answer'))

data['int_income'] = data['How much total combined money did all members of your HOUSEHOLD earn last year?'].apply(convertMoney)

data['int_income'].describe()

count       889.000000
mean      74077.615298
std       59360.742902
min           0.000000
25%       25000.000000
50%       50000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64

##### Finding regarding money conversion method:
The conversion above takes the lower bound of each income range, which biases the converted incomes downward. The resulting summary statistics therefore understate the actual incomes of the survey participants.
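
An alternative that avoids picking any single number per bracket is to keep income as an ordered categorical, so brackets can be compared without pretending to know exact amounts. This is a sketch under stated assumptions: `brackets` lists only the income ranges visible in the sample rows earlier (the full survey has more), and `income`, `income_cat`, and `at_least_75k` are hypothetical names.

```python
import pandas as pd

# Income brackets seen in the sample rows, in ascending order.
# Assumption: the real column contains additional brackets not listed here.
brackets = [
    "$0 to $9,999",
    "$50,000 to $74,999",
    "$75,000 to $99,999",
    "$100,000 to $124,999",
    "$200,000 and up",
]

income = pd.Series(["$75,000 to $99,999", "$0 to $9,999", "$200,000 and up", None])

# Ordered categories allow comparisons such as >= between brackets.
income_cat = income.astype(pd.CategoricalDtype(categories=brackets, ordered=True))

# Respondents in the $75,000 bracket or higher; missing answers compare as False.
at_least_75k = income_cat >= "$75,000 to $99,999"
print(at_least_75k)
```

The trade-off is that means and standard deviations are no longer available, but group comparisons by bracket stay faithful to what respondents actually reported.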

### Explore how the distance someone travels for Thanksgiving dinner relates to their income level

A reasonable hypothesis is that people earning less money are more likely to be younger and to travel to their parents' houses for Thanksgiving, while people earning more are more likely to host Thanksgiving at their own house.

In [24]:
# Explore how the distance someone travels for Thanksgiving dinner relates to their income level.

# people earning under 150000
data[data['int_income'] < 150000]['How far will you travel for Thanksgiving?'].value_counts()


Thanksgiving is happening at my home--I won't travel at all                         281
Thanksgiving is local--it will take place in the town I live in                     203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
Thanksgiving is out of town and far away--I have to drive several hours or fly       55
Name: How far will you travel for Thanksgiving?, dtype: int64

In [25]:
# people earning over 150000
data[data['int_income'] > 150000]['How far will you travel for Thanksgiving?'].value_counts()


Thanksgiving is happening at my home--I won't travel at all                         49
Thanksgiving is local--it will take place in the town I live in                     25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
Thanksgiving is out of town and far away--I have to drive several hours or fly      12
Name: How far will you travel for Thanksgiving?, dtype: int64

##### Findings regarding travel distance at Thanksgiving in relation to income

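From the above results, 689 survey participants have a converted income under 150K and 102 have a converted income above 150K. Because both filters use strict inequalities, any respondent whose converted income is exactly 150,000 falls into neither group. The percentage of each income group in each travel category is as follows: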

| Income | Home | Local | Not too far | Far away |
| --- | --- | --- | --- | --- |
| < 150K | 40.8% | 29.5% | 21.8% | 8.0% |
| > 150K | 48.0% | 24.5% | 15.7% | 11.8% |

Compared with the under-150K group, a higher percentage of the over-150K group either stays home or travels far away. This suggests that higher-income respondents are more likely either to host Thanksgiving or to have the means to travel far.
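
The percentages can be recomputed directly from the counts printed by the two cells above. The sketch below hard-codes those counts; the short labels and the names `under_150k`, `over_150k`, and `shares` are shorthand introduced here for illustration.

```python
import pandas as pd

# Counts copied from the two value_counts() outputs above.
labels = ["home", "local", "not too far", "far away"]
under_150k = pd.Series([281, 203, 150, 55], index=labels)
over_150k = pd.Series([49, 25, 16, 12], index=labels)

# Share of each income group in each travel category, in percent.
shares = pd.DataFrame({
    "< 150k": under_150k / under_150k.sum() * 100,
    "> 150k": over_150k / over_150k.sum() * 100,
}).T.round(1)
print(shares)
```

On the real DataFrame, calling `value_counts(normalize=True)` on each filtered column should give the same proportions without hard-coding the counts.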

##### Conclusion:
The results are not as clear-cut as the hypothesis: the higher-income group stays home more often, but it also travels far away more often.

### Explore average age and income of respondents who attend 'Friendsgiving'

In the US, a "Friendsgiving" is when, instead of traveling home for the holiday, you celebrate it with friends who live in your area. Both the Friendsgiving question and the question about meeting up with hometown friends on Thanksgiving night seem skewed towards younger people. Let's see if this hypothesis holds up.

##### Average age of respondents with respect to attending 'Friendsgiving'

In [28]:
# Explore average age of respondents with respect to attending 'Friendsgiving'

tableAge = data.pivot_table(index="Have you ever tried to meet up with hometown friends on Thanksgiving night?", columns='Have you ever attended a "Friendsgiving?"', values='int_age')

tableAge

"Have you ever attended a ""Friendsgiving?""",No,Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?,Unnamed: 1_level_1,Unnamed: 2_level_1
No,42.283702,37.010526
Yes,41.47541,33.976744


Among respondents who answered 'No' to 'Have you ever tried to meet up with hometown friends on Thanksgiving night?', those who have attended a 'Friendsgiving' average about 37 years old, roughly 5 years younger than those who have not (about 42).

Among respondents who answered 'Yes' to the meet-up question, those who have attended a 'Friendsgiving' average about 34 years old, roughly 7.5 years younger than those who have not (about 41.5).

For respondents who have not attended a 'Friendsgiving', the average age is similar whether they answered 'No' or 'Yes' to the meet-up question.
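
`pivot_table` averages the values in each cell by default, so a cell with few respondents can produce a misleading mean. One way to check is to request the count alongside the mean. The sketch below uses a made-up five-row frame with shortened, hypothetical column names (`met_hometown_friends`, `attended_friendsgiving`); only `int_age` matches a column from this notebook.

```python
import pandas as pd

# Made-up rows for illustration; the column names are shortened stand-ins
# for the two survey questions.
toy = pd.DataFrame({
    "met_hometown_friends": ["No", "No", "Yes", "Yes", "Yes"],
    "attended_friendsgiving": ["No", "Yes", "No", "Yes", "Yes"],
    "int_age": [45, 30, 45, 18, 30],
})

# Ask for both the mean age and the number of respondents in each cell.
summary = toy.pivot_table(
    index="met_hometown_friends",
    columns="attended_friendsgiving",
    values="int_age",
    aggfunc=["mean", "count"],
)
print(summary)
```

Reading the two blocks of columns side by side shows how many respondents stand behind each average before drawing conclusions from it.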


##### Average income of respondents with respect to attending 'Friendsgiving'

In [33]:
# Explore average income with respect to attending 'Friendsgiving'

tableIncome = data.pivot_table(index="Have you ever tried to meet up with hometown friends on Thanksgiving night?", columns='Have you ever attended a "Friendsgiving?"', values='int_income')

tableIncome

"Have you ever attended a ""Friendsgiving?""",No,Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?,Unnamed: 1_level_1,Unnamed: 2_level_1
No,78914.549654,72894.736842
Yes,78750.0,66019.736842


Among respondents who answered 'No' to the question about meeting up with hometown friends, the average income of those who have attended a 'Friendsgiving' (about 72,900) is somewhat lower than that of those who have not (about 78,900).

Among respondents who answered 'Yes' to that question, the average income of those who have attended a 'Friendsgiving' (about 66,000) is noticeably lower than that of those who have not (about 78,800).

For respondents who have not attended a 'Friendsgiving', the average incomes are close whether they answered 'Yes' or 'No' to the meet-up question.

##### Conclusion:
Yes, the hypothesis holds up. Those who seek out hometown friends and attend 'Friendsgiving' tend to be younger and to have lower incomes.

In [None]:
# work on Black Friday

data['Will you employer make you work on Black Friday?'].value_counts()


In [None]:
# The most common dessert people eat


In [None]:
# The most common complete meal people eat

In [None]:
# Find regional patterns in the dinner menus

In [None]:
# Find age, gender, and income based patterns in dinner menus
