# Analysis of Thanksgiving dinner in the US
This project analyzes what people eat on Thanksgiving Day and how their celebrations vary with age and income.

Dataset: https://github.com/fivethirtyeight/data/tree/master/thanksgiving-2015

On Nov. 17, 2015, 1,058 respondents answered the survey questions below about their Thanksgiving.

# Read Data

In [17]:
import pandas as pd

data = pd.read_csv("thanksgiving.csv",encoding='latin-1')

In [18]:
data.head()

| | RespondentID | Do you celebrate Thanksgiving? | What is typically the main dish at your Thanksgiving dinner? | What is typically the main dish at your Thanksgiving dinner? - Other (please specify) | How is the main dish typically cooked? | How is the main dish typically cooked? - Other (please specify) | What kind of stuffing/dressing do you typically have? | What kind of stuffing/dressing do you typically have? - Other (please specify) | What type of cranberry saucedo you typically have? | What type of cranberry saucedo you typically have? - Other (please specify) | ... | Have you ever tried to meet up with hometown friends on Thanksgiving night? | Have you ever attended a "Friendsgiving?" | Will you shop any Black Friday sales on Thanksgiving Day? | Do you work in retail? | Will you employer make you work on Black Friday? | How would you describe where you live? | Age | What is your gender? | How much total combined money did all members of your HOUSEHOLD earn last year? | US Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4337954960 | Yes | Turkey | | Baked | | Bread-based | | | | ... | Yes | No | No | No | | Suburban | 18 - 29 | Male | $75,000 to $99,999 | Middle Atlantic |
| 1 | 4337951949 | Yes | Turkey | | Baked | | Bread-based | | Other (please specify) | Homemade cranberry gelatin ring | ... | No | No | Yes | No | | Rural | 18 - 29 | Female | $50,000 to $74,999 | East South Central |
| 2 | 4337935621 | Yes | Turkey | | Roasted | | Rice-based | | Homemade | | ... | Yes | Yes | Yes | No | | Suburban | 18 - 29 | Male | $0 to $9,999 | Mountain |
| 3 | 4337933040 | Yes | Turkey | | Baked | | Bread-based | | Homemade | | ... | Yes | No | No | No | | Urban | 30 - 44 | Male | $200,000 and up | Pacific |
| 4 | 4337931983 | Yes | Tofurkey | | Baked | | Bread-based | | Canned | | ... | Yes | No | No | No | | Urban | 30 - 44 | Male | $100,000 to $124,999 | Pacific |


In [19]:
data.columns

Index(['RespondentID', 'Do you celebrate Thanksgiving?',
       'What is typically the main dish at your Thanksgiving dinner?',
       'What is typically the main dish at your Thanksgiving dinner? - Other (please specify)',
       'How is the main dish typically cooked?',
       'How is the main dish typically cooked? - Other (please specify)',
       'What kind of stuffing/dressing do you typically have?',
       'What kind of stuffing/dressing do you typically have? - Other (please specify)',
       'What type of cranberry saucedo you typically have?',
       'What type of cranberry saucedo you typically have? - Other (please specify)',
       'Do you typically have gravy?',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',
       'Which of these side dishes aretypically served

In [20]:
data.shape

(1058, 65)

# Keep data for people who celebrate Thanksgiving

In [21]:
data["Do you celebrate Thanksgiving?"].unique()
# List the distinct answers, in case of inconsistent spellings (yes/Yes, no/No) or missing values

array(['Yes', 'No'], dtype=object)

In [22]:
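data_celebrate = data[data["Do you celebrate Thanksgiving?"] == "Yes"].copy()
# Keep only rows whose answer is "Yes"; .copy() makes an independent DataFrame,
# so adding columns to it later does not raise SettingWithCopyWarning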

In [23]:
data_celebrate["Do you celebrate Thanksgiving?"].unique()
# Confirm that only "Yes" answers remain after filtering

array(['Yes'], dtype=object)

# The main dish at Thanksgiving

In [24]:
data_celebrate["What is typically the main dish at your Thanksgiving dinner?"].unique() 

array(['Turkey', 'Tofurkey', 'Other (please specify)', 'Ham/Pork',
       'Turducken', 'Roast beef', nan, 'Chicken', "I don't know"], dtype=object)

In [25]:
main_dish = data_celebrate["What is typically the main dish at your Thanksgiving dinner?"]
main_dish.value_counts()

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64
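The counts above add up to 974, while 980 respondents celebrate Thanksgiving (876 + 104 in the pie counts further down). The gap exists because `value_counts` drops missing answers by default, and the `nan` in the `unique()` output shows that a few people skipped this question. A minimal sketch on a made-up Series showing how `dropna=False` keeps those answers and `normalize=True` turns counts into shares:

```python
import pandas as pd

# Made-up main-dish answers (not survey data), including one skipped question
main_dish = pd.Series(["Turkey", "Turkey", "Ham/Pork", None])

# normalize=True reports shares instead of counts;
# dropna=False keeps the skipped answer as its own row
shares = main_dish.value_counts(normalize=True, dropna=False)
print(shares)
```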

In [26]:
gravy_tofurkey = data_celebrate["Do you typically have gravy?"][main_dish == "Tofurkey"]
gravy_tofurkey.head()

4     Yes
33    Yes
69     No
72     No
77    Yes
Name: Do you typically have gravy?, dtype: object

# How many people eat apple, pecan or pumpkin pie

In [27]:
pie_question = "Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - "
# A pie column is null when the respondent did not select that pie
no_apple = pd.isnull(data_celebrate[pie_question + "Apple"])
no_pumpkin = pd.isnull(data_celebrate[pie_question + "Pumpkin"])
no_pecan = pd.isnull(data_celebrate[pie_question + "Pecan"])
# True only when none of the three pies was selected
no_pie = no_apple & no_pumpkin & no_pecan


In [28]:
no_pie.value_counts()

False    876
True     104
dtype: int64

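True means 104 respondents selected none of apple, pecan, or pumpkin pie (a respondent who skipped the pie question also lands here).
False means 876 respondents selected at least one of the three.

In short, 876 of the 980 respondents who celebrate Thanksgiving (about 89%) typically have apple, pecan, or pumpkin pie; the other 104 did not select any of them.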
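The double negative (null means "not selected") can be avoided by asking directly whether any of the three pie columns is filled in. A minimal sketch on a toy DataFrame; the short column names `Apple`, `Pumpkin` and `Pecan` stand in for the long survey headers, and the values are made up. On the real data the idea would be `data_celebrate[pie_columns].notna().any(axis=1)`, where `pie_columns` is a hypothetical list holding the three full column names.

```python
import pandas as pd

# Toy stand-in for the three pie columns (made-up values, not survey data).
# A non-null cell means the pie was selected; None means it was not.
pies = pd.DataFrame({
    "Apple":   ["Apple", None, None, None],
    "Pumpkin": ["Pumpkin", "Pumpkin", None, None],
    "Pecan":   [None, None, None, "Pecan"],
})

# True when at least one of the three pies was selected in that row
serves_any = pies.notna().any(axis=1)
print(serves_any.value_counts())
```

Here `~serves_any` is the same "all three missing" mask as the one counted above.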

# Age distribution

In [29]:
data_celebrate["Age"].unique()

array(['18 - 29', '30 - 44', '60+', '45 - 59', nan], dtype=object)

In [30]:
type(data_celebrate)

pandas.core.frame.DataFrame

In [31]:
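def extract_age(value):
    # Missing answers stay missing (apply turns the None into NaN)
    if pd.isnull(value):
        return None
    # Keep each bracket's lower bound: "18 - 29" -> "18", "60+" -> "60+" -> "60"
    value = value.split(" ")[0]
    value = value.replace("+", "")
    return int(value)

int_age = data_celebrate["Age"].apply(extract_age)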
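`extract_age` calls a Python function on each value. A vectorized alternative, sketched here on a toy Series, uses pandas' `Series.str.extract` to capture the leading number of each label. The regex `^(\d+)` assumes every bracket label starts with its lower bound, which holds for the `Age` values listed above.

```python
import pandas as pd

# Toy Series with the bracket labels seen in the survey's Age column
age = pd.Series(["18 - 29", "30 - 44", "60+", "45 - 59", None])

# Capture the leading digits of each label; missing answers stay NaN
age_lower = age.str.extract(r"^(\d+)", expand=False).astype(float)
print(age_lower.tolist())  # [18.0, 30.0, 60.0, 45.0, nan]
```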

In [32]:
int_age.describe()

count    947.000000
mean      40.089757
std       15.352014
min       18.000000
25%       30.000000
50%       45.000000
75%       60.000000
max       60.000000
Name: Age, dtype: float64

In [33]:
type(int_age)

pandas.core.series.Series

In [34]:
data_celebrate["int_age"] = int_age



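The quartiles of int_age (30, 45 and 60) fall on the lower bounds of successive brackets, which suggests respondents are spread fairly evenly across the four age groups. Because each bracket is coded by its lower bound, the mean of about 40 understates the true average age.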
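Looking at the share of respondents in each bracket is a more direct check of how evenly they are spread. A minimal sketch on made-up `Age` answers; on the notebook data the same call would run on `data_celebrate["Age"]`:

```python
import pandas as pd

# Made-up Age answers (not survey results), including one skipped question
age = pd.Series(["18 - 29", "30 - 44", "30 - 44", "45 - 59", "60+", None])

# Share of respondents per bracket; dropna=False keeps skipped answers visible
shares = age.value_counts(normalize=True, dropna=False)
print(shares)
```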

# Income distribution

In [35]:
data_celebrate["How much total combined money did all members of your HOUSEHOLD earn last year?"].unique()

array(['$75,000 to $99,999', '$50,000 to $74,999', '$0 to $9,999',
       '$200,000 and up', '$100,000 to $124,999', '$25,000 to $49,999',
       'Prefer not to answer', '$10,000 to $24,999',
       '$175,000 to $199,999', '$150,000 to $174,999',
       '$125,000 to $149,999', nan], dtype=object)

In [36]:
def extract_income(value):
    # Missing answers stay missing
    if pd.isnull(value):
        return None
    # Keep the first token, i.e. the bracket's lower bound ("$75,000")
    value = value.split(" ")[0]
    # "Prefer not to answer" carries no number
    if value == "Prefer":
        return None
    # "$75,000" -> "75000"
    value = value.replace("$", "")
    value = value.replace(",", "")
    return int(value)

int_income = data_celebrate["How much total combined money did all members of your HOUSEHOLD earn last year?"].apply(extract_income)
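Like the age parser, `extract_income` works one string at a time. A vectorized sketch with `Series.str.extract`, on a toy Series that mimics the income labels above. It assumes each numeric label starts with `$` followed by digits and commas, so "Prefer not to answer" simply fails to match and becomes NaN.

```python
import pandas as pd

# Toy income labels in the survey's format (made-up sample)
income = pd.Series(["$75,000 to $99,999", "$0 to $9,999", "$200,000 and up",
                    "Prefer not to answer", None])

# Capture the first dollar amount, drop thousands separators, convert to float;
# labels without a leading "$" (and missing answers) become NaN
lower = (income.str.extract(r"^\$([\d,]+)", expand=False)
               .str.replace(",", "", regex=False)
               .astype(float))
print(lower.tolist())  # [75000.0, 0.0, 200000.0, nan, nan]
```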

In [37]:
int_income.describe()

count       829.000000
mean      75965.018094
std       59068.636748
min           0.000000
25%       25000.000000
50%       75000.000000
75%      100000.000000
max      200000.000000
Name: How much total combined money did all members of your HOUSEHOLD earn last year?, dtype: float64

In [38]:
data_celebrate["int_income"] = int_income



In [39]:
data_celebrate.head()

| | RespondentID | Do you celebrate Thanksgiving? | What is typically the main dish at your Thanksgiving dinner? | What is typically the main dish at your Thanksgiving dinner? - Other (please specify) | How is the main dish typically cooked? | How is the main dish typically cooked? - Other (please specify) | What kind of stuffing/dressing do you typically have? | What kind of stuffing/dressing do you typically have? - Other (please specify) | What type of cranberry saucedo you typically have? | What type of cranberry saucedo you typically have? - Other (please specify) | ... | Will you shop any Black Friday sales on Thanksgiving Day? | Do you work in retail? | Will you employer make you work on Black Friday? | How would you describe where you live? | Age | What is your gender? | How much total combined money did all members of your HOUSEHOLD earn last year? | US Region | int_age | int_income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4337954960 | Yes | Turkey | | Baked | | Bread-based | | | | ... | No | No | | Suburban | 18 - 29 | Male | $75,000 to $99,999 | Middle Atlantic | 18.0 | 75000.0 |
| 1 | 4337951949 | Yes | Turkey | | Baked | | Bread-based | | Other (please specify) | Homemade cranberry gelatin ring | ... | Yes | No | | Rural | 18 - 29 | Female | $50,000 to $74,999 | East South Central | 18.0 | 50000.0 |
| 2 | 4337935621 | Yes | Turkey | | Roasted | | Rice-based | | Homemade | | ... | Yes | No | | Suburban | 18 - 29 | Male | $0 to $9,999 | Mountain | 18.0 | 0.0 |
| 3 | 4337933040 | Yes | Turkey | | Baked | | Bread-based | | Homemade | | ... | No | No | | Urban | 30 - 44 | Male | $200,000 and up | Pacific | 30.0 | 200000.0 |
| 4 | 4337931983 | Yes | Tofurkey | | Baked | | Bread-based | | Canned | | ... | No | No | | Urban | 30 - 44 | Male | $100,000 to $124,999 | Pacific | 30.0 | 100000.0 |


# The relationship between income and how far people travel for Thanksgiving

In [44]:
# Households whose income bracket starts below $150,000
int_income_under_150000 = data_celebrate["How far will you travel for Thanksgiving?"][int_income < 150000]

In [56]:
int_income_under_150000.value_counts()

Thanksgiving is happening at my home--I won't travel at all                         281
Thanksgiving is local--it will take place in the town I live in                     203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
Thanksgiving is out of town and far away--I have to drive several hours or fly       55
Name: How far will you travel for Thanksgiving?, dtype: int64

In [57]:
int_income_under_150000.value_counts(normalize = True)

Thanksgiving is happening at my home--I won't travel at all                         0.407837
Thanksgiving is local--it will take place in the town I live in                     0.294630
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    0.217707
Thanksgiving is out of town and far away--I have to drive several hours or fly      0.079826
Name: How far will you travel for Thanksgiving?, dtype: float64

In [58]:
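# "> 150000" keeps brackets starting at $175,000 or more;
# the $150,000 to $174,999 bracket falls in neither group
int_income_150000 = data_celebrate["How far will you travel for Thanksgiving?"][int_income > 150000]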

In [59]:
int_income_150000.value_counts()

Thanksgiving is happening at my home--I won't travel at all                         49
Thanksgiving is local--it will take place in the town I live in                     25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
Thanksgiving is out of town and far away--I have to drive several hours or fly      12
Name: How far will you travel for Thanksgiving?, dtype: int64

In [60]:
int_income_150000.value_counts(normalize = True)

Thanksgiving is happening at my home--I won't travel at all                         0.480392
Thanksgiving is local--it will take place in the town I live in                     0.245098
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    0.156863
Thanksgiving is out of town and far away--I have to drive several hours or fly      0.117647
Name: How far will you travel for Thanksgiving?, dtype: float64

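About 41% of respondents with household income under $150,000 host Thanksgiving at home; the other 59% celebrate elsewhere, possibly at a parent's house. Among households earning $175,000 or more (the `> 150000` filter leaves out the $150,000 to $174,999 bracket), about 48% stay home, and a larger share travel far (12% vs. 8%). The higher-income group is small (102 respondents), and reading income as a proxy for age (younger people tending to earn less) is an assumption, so these differences are suggestive only.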
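Both income groups can also be compared in one table. The sketch below uses `pd.cut` with `right=False` so the cut-off is inclusive (`>= 150000`), which, unlike the two filters above, leaves no bracket between the groups, and `pd.crosstab(..., normalize="index")` to get the share of each travel answer per group. It runs on a small made-up DataFrame; the column name `travel` and its short answer labels are hypothetical stand-ins for the survey's travel question.

```python
import pandas as pd

# Toy stand-in for data_celebrate (made-up rows, not survey results)
df = pd.DataFrame({
    "int_income": [0, 25000, 75000, 150000, 175000, 200000, None],
    "travel": ["home", "local", "home", "far", "home", "local", "home"],
})

# right=False gives the bins [0, 150000) and [150000, inf),
# so an income coded 150000 lands in the upper group; NaN stays unassigned
group = pd.cut(df["int_income"], bins=[0, 150000, float("inf")],
               right=False, labels=["under 150k", "150k and up"])

# Row-normalised table: each row sums to 1; rows with missing income are dropped
shares = pd.crosstab(group, df["travel"], normalize="index")
print(shares)
```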

# The average ages of people who spend Thanksgiving with friends

In [48]:
is_friends = data_celebrate["Have you ever tried to meet up with hometown friends on Thanksgiving night?"] == "Yes"

In [49]:
is_friendsgiving = data_celebrate["Have you ever attended a \"Friendsgiving?\""] == "Yes"

In [50]:
is_either = is_friends|is_friendsgiving 

In [51]:
pd.pivot_table(data_celebrate, index = is_either.values, values = "int_age")

| | int_age |
|---|---|
| False | 42.283702 |
| True | 37.666667 |


In [52]:
pd.pivot_table(data_celebrate, index = "Have you ever tried to meet up with hometown friends on Thanksgiving night?", columns = "Have you ever attended a \"Friendsgiving?\"", values = "int_age")

| Met up with hometown friends (rows) / attended "Friendsgiving" (columns) | No | Yes |
|---|---|---|
| No | 42.283702 | 37.010526 |
| Yes | 41.47541 | 33.976744 |


# The average income of people who spend Thanksgiving with friends

In [53]:
pd.pivot_table(data_celebrate, index = "Have you ever tried to meet up with hometown friends on Thanksgiving night?", columns = "Have you ever attended a \"Friendsgiving?\"", values = "int_income")

| Met up with hometown friends (rows) / attended "Friendsgiving" (columns) | No | Yes |
|---|---|---|
| No | 78914.549654 | 72894.736842 |
| Yes | 78750.0 | 66019.736842 |


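Respondents who have met up with hometown friends or attended a Friendsgiving are younger on average (37.7 vs. 42.3, using bracket lower bounds), and those who have done both are the youngest group (34.0). The same group also reports the lowest average household income (about $66,000, versus about $78,900 for those who have done neither). This suggests that younger people are more likely to spend Thanksgiving with friends, although a difference in group averages does not measure that likelihood directly.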
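Comparing average ages shows that people who celebrate with friends are younger, but it does not directly show how likely each age group is to do so. A sketch that flips the question and computes, per age bracket, the share who answered "Yes" to either question; the column names `met_friends` and `friendsgiving` and all values are made up for illustration. On the notebook data the equivalent would be `is_either.astype(float).groupby(data_celebrate["Age"]).mean()`.

```python
import pandas as pd

# Made-up answers; column names are hypothetical stand-ins for the two survey questions
df = pd.DataFrame({
    "Age": ["18 - 29", "18 - 29", "30 - 44", "60+", "60+", "60+"],
    "met_friends": ["Yes", "No", "No", "No", "Yes", "No"],
    "friendsgiving": ["Yes", "Yes", "Yes", "No", "No", "No"],
})

# Same "either" logic as is_either in the notebook
either = (df["met_friends"] == "Yes") | (df["friendsgiving"] == "Yes")

# Mean of a 0/1 flag per group = share of that bracket answering "Yes" to either question
rate = either.astype(float).groupby(df["Age"]).mean()
print(rate)
```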