# Analyzing Thanksgiving dinners

The dataset is stored in the thanksgiving.csv file. It contains 1058 responses to an online survey about what Americans eat for Thanksgiving dinner. Each survey respondent was asked questions about what they typically eat for Thanksgiving, along with some demographic questions, like their gender, income, and location. This dataset will allow us to discover regional and income-based patterns in what Americans eat for Thanksgiving dinner.

The dataset has 65 columns and 1058 rows. Most of the column names are survey questions, and most of the column values are string responses to those questions. Most of the columns are categorical, because a respondent had to select one of a few options. For example, one of the first column names is `What is typically the main dish at your Thanksgiving dinner?`. The potential responses are:

- Turkey
- Other (please specify)
- Ham/Pork
- Tofurkey
- Chicken
- Roast beef
- I don't know
- Turducken

Most of the columns follow the same question/response format as the one above. There are also quite a few NaN values in the columns, which occur when a respondent skipped a question, either by choice or because it didn't apply to them.
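To get a feel for how many NaN values each column holds, a per-column count is a quick first check. The sketch below uses a small made-up frame (`toy`) rather than the real survey file, so the values are illustrative only; the same call would apply to the full dataset once it is loaded.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the survey data (made-up rows, real column names from the survey).
toy = pd.DataFrame({
    "Do you celebrate Thanksgiving?": ["Yes", "Yes", "No"],
    "Do you typically have gravy?": ["Yes", np.nan, np.nan],
})

# isnull() marks missing cells as True; sum() then counts them per column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)
```

Sorting the result with `sort_values(ascending=False)` would surface the questions that were skipped most often.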

We won't enumerate every single column now, but here are descriptions of some of the most important:

- RespondentID -- a unique ID of the respondent to the survey.
- Do you celebrate Thanksgiving? -- a Yes/No response to the question.
- How would you describe where you live? -- responses are Suburban, Urban, and Rural.
- Age -- responses are one of several categories, such as 18-29 and 30-44.
- How much total combined money did all members of your HOUSEHOLD earn last year? -- one of several categories, such as $75,000 to $99,999.

In this project, we'll explore the data and look for interesting patterns.

In [3]:
import pandas as pd

data = pd.read_csv("thanksgiving.csv", encoding="Latin-1")

print(data.head())

   RespondentID Do you celebrate Thanksgiving?  \
0    4337954960                            Yes   
1    4337951949                            Yes   
2    4337935621                            Yes   
3    4337933040                            Yes   
4    4337931983                            Yes   

  What is typically the main dish at your Thanksgiving dinner?  \
0                                             Turkey             
1                                             Turkey             
2                                             Turkey             
3                                             Turkey             
4                                           Tofurkey             

  What is typically the main dish at your Thanksgiving dinner? - Other (please specify)  \
0                                                NaN                                      
1                                                NaN                                      
2                            

In [4]:
print(data.columns)

Index(['RespondentID', 'Do you celebrate Thanksgiving?',
       'What is typically the main dish at your Thanksgiving dinner?',
       'What is typically the main dish at your Thanksgiving dinner? - Other (please specify)',
       'How is the main dish typically cooked?',
       'How is the main dish typically cooked? - Other (please specify)',
       'What kind of stuffing/dressing do you typically have?',
       'What kind of stuffing/dressing do you typically have? - Other (please specify)',
       'What type of cranberry saucedo you typically have?',
       'What type of cranberry saucedo you typically have? - Other (please specify)',
       'Do you typically have gravy?',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',
       'Which of these side dishes aretypically served

In [5]:
ans = data["Do you celebrate Thanksgiving?"].value_counts()
print(ans)

Yes    980
No      78
Name: Do you celebrate Thanksgiving?, dtype: int64
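Raw counts are easier to compare as shares. As a small sketch, the series below is rebuilt from the two counts printed above (980 Yes, 78 No), and `value_counts(normalize=True)` returns proportions instead of counts.

```python
import pandas as pd

# Rebuild the answers from the counts printed above: 980 "Yes" and 78 "No".
answers = pd.Series(["Yes"] * 980 + ["No"] * 78)

# normalize=True returns fractions of the total; multiply by 100 for percentages.
shares = answers.value_counts(normalize=True) * 100
print(shares.round(1))
```

On the real data, the same call would be `data["Do you celebrate Thanksgiving?"].value_counts(normalize=True)`.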


In [6]:
print(data["What is typically the main dish at your Thanksgiving dinner?"].value_counts())

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64


In [10]:
tofurkey = data[data["What is typically the main dish at your Thanksgiving dinner?"] == "Tofurkey"]
print(tofurkey["Do you typically have gravy?"])

4      Yes
33     Yes
69      No
72      No
77     Yes
145    Yes
175    Yes
218     No
243    Yes
275     No
393    Yes
399    Yes
571    Yes
594    Yes
628     No
774     No
820     No
837    Yes
860     No
953    Yes
Name: Do you typically have gravy?, dtype: object
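The row-by-row listing is hard to read at a glance. A short sketch of summarizing it: the series below is typed in from the 20 answers printed above, and `value_counts()` collapses it into totals.

```python
import pandas as pd

# The 20 gravy answers of Tofurkey eaters, copied in order from the output above.
gravy = pd.Series([
    "Yes", "Yes", "No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No",
    "Yes", "Yes", "Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes",
])

counts = gravy.value_counts()
print(counts)
```

On the real data this would be `tofurkey["Do you typically have gravy?"].value_counts()`.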


In [19]:
apple_isnull = data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple"].isnull()
pumpkin_isnull = data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin"].isnull()
pecan_isnull = data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan"].isnull()
# True where the respondent selected none of apple, pumpkin, or pecan pie
no_pies = apple_isnull & pumpkin_isnull & pecan_isnull
print(no_pies.value_counts())

False    876
True     182
dtype: int64


In [22]:
def decode_age(age_as_string):
    if pd.isnull(age_as_string):
        return None
    else:
        no_plus_sign = age_as_string.replace('+', '')
        just_first_word = no_plus_sign.split(' ')[0]
        return int(just_first_word)

data['int_age'] = data["Age"].apply(decode_age)
print(data['int_age'].describe())

count    1025.000000
mean       39.383415
std        15.398493
min        18.000000
25%        30.000000
50%        45.000000
75%        60.000000
max        60.000000
Name: int_age, dtype: float64


## About `int_age`:

Note that `decode_age` maps each age bracket to its lower bound, so `int_age` understates respondents' actual ages. For example, everyone in the 30-44 bracket is recorded as 30, and the top bracket is recorded as 60.
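One way to reduce that downward bias is to use the midpoint of each bracket instead of its lower bound. The sketch below is an alternative, not the method used in this notebook. It assumes the age labels look like `18 - 29` and `60+`; the helper name `decode_age_midpoint` is hypothetical, and open-ended brackets still fall back to their lower bound.

```python
import pandas as pd

def decode_age_midpoint(age_as_string):
    """Return the midpoint of a bracket like '18 - 29'; '60+' stays at 60."""
    if pd.isnull(age_as_string):
        return None
    # Assumed label format: "<low> - <high>" or "<low>+".
    parts = age_as_string.replace("+", "").split(" - ")
    bounds = [int(part) for part in parts]
    return sum(bounds) / len(bounds)

ages = pd.Series(["18 - 29", "30 - 44", "60+", None])
midpoints = ages.apply(decode_age_midpoint)
print(midpoints)
```

Midpoints are still an approximation, since the distribution of ages within each bracket is unknown.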

In [23]:
def decode_money(money_as_string):
    if pd.isnull(money_as_string) or money_as_string == "Prefer not to answer":
        return None
    else:
        no_symbols = money_as_string.replace('$', '').replace(',','')
        just_first_word = no_symbols.split(' ')[0]
        return int(just_first_word)

data['int_income'] = data["How much total combined money did all members of your HOUSEHOLD earn last year?"].apply(decode_money)
print(data['int_income'].describe())

count       889.000000
mean      74077.615298
std       59360.742902
min           0.000000
25%       25000.000000
50%       50000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64


## About `int_income`:

As with `int_age`, `decode_money` maps each income bracket to its lower bound, so `int_income` understates actual household incomes. For example, a response of $75,000 to $99,999 is recorded as 75000.
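The same midpoint idea can be sketched for income. This is an alternative under stated assumptions, not the notebook's method: the helper name `decode_money_midpoint` is hypothetical, and the label `$200,000 and up` is an assumed spelling of the open-ended top bracket, which keeps its lower bound.

```python
import re
import pandas as pd

def decode_money_midpoint(money_as_string):
    """Return the midpoint of a bracket like '$75,000 to $99,999'."""
    if pd.isnull(money_as_string) or money_as_string == "Prefer not to answer":
        return None
    # Extract every dollar amount in the label, e.g. ["75,000", "99,999"].
    amounts = [int(m.replace(",", "")) for m in re.findall(r"\d[\d,]*", money_as_string)]
    if not amounts:
        return None
    # One amount (open-ended bracket) returns that amount; two return their mean.
    return sum(amounts) / len(amounts)

print(decode_money_midpoint("$75,000 to $99,999"))
print(decode_money_midpoint("$200,000 and up"))  # assumed label for the top bracket
```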

In [38]:
low_income_data = data[data["int_income"] < 150000]
travel_distances = low_income_data["How far will you travel for Thanksgiving?"]
total_respondents = travel_distances.count()
low_income_percents = travel_distances.value_counts().apply(lambda x: x/total_respondents*100)
print(travel_distances.value_counts())
print(low_income_percents)

Thanksgiving is happening at my home--I won't travel at all                         281
Thanksgiving is local--it will take place in the town I live in                     203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
Thanksgiving is out of town and far away--I have to drive several hours or fly       55
Name: How far will you travel for Thanksgiving?, dtype: int64
Thanksgiving is happening at my home--I won't travel at all                         40.783745
Thanksgiving is local--it will take place in the town I live in                     29.462990
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    21.770682
Thanksgiving is out of town and far away--I have to drive several hours or fly       7.982583
Name: How far will you travel for Thanksgiving?, dtype: float64


In [39]:
high_income_data = data[data["int_income"] >= 150000]
travel_distances = high_income_data["How far will you travel for Thanksgiving?"]
total_respondents = travel_distances.count()
high_income_percents = travel_distances.value_counts().apply(lambda x: x/total_respondents*100)
print(travel_distances.value_counts())
print(high_income_percents)

Thanksgiving is happening at my home--I won't travel at all                         66
Thanksgiving is local--it will take place in the town I live in                     34
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    25
Thanksgiving is out of town and far away--I have to drive several hours or fly      15
Name: How far will you travel for Thanksgiving?, dtype: int64
Thanksgiving is happening at my home--I won't travel at all                         47.142857
Thanksgiving is local--it will take place in the town I live in                     24.285714
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    17.857143
Thanksgiving is out of town and far away--I have to drive several hours or fly      10.714286
Name: How far will you travel for Thanksgiving?, dtype: float64


## Thanksgiving travel distance X income

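Respondents with household income below $150,000 are somewhat more likely to leave home for Thanksgiving: about 59% celebrate somewhere other than their own home, compared with about 53% of respondents at or above $150,000. Higher-income respondents, however, are more likely to travel far: about 10.7% drive several hours or fly, versus about 8.0% of lower-income respondents. The high-income group has only 140 responses, so these differences are suggestive rather than conclusive.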
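Placing the two groups side by side makes the comparison easier to check. The sketch below rebuilds a small table from the counts printed above; the short row and column labels (`at home`, `under_150k`, and so on) are shorthand introduced here, not names from the dataset.

```python
import pandas as pd

# Counts copied from the two outputs above; labels are shorthand for illustration.
counts = pd.DataFrame(
    {"under_150k": [281, 203, 150, 55], "150k_and_up": [66, 34, 25, 15]},
    index=["at home", "local", "few hours", "far away"],
)

# Divide each column by its own total to get within-group percentages.
percents = counts / counts.sum() * 100
print(percents.round(1))

# Share of each group that celebrates anywhere other than at home.
print((100 - percents.loc["at home"]).round(1))
```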

In [46]:
data.pivot_table(index = 'Have you ever tried to meet up with hometown friends on Thanksgiving night?', columns = 'Have you ever attended a "Friendsgiving?"', values = "int_age")

| Have you ever tried to meet up with hometown friends on Thanksgiving night? | Have you ever attended a "Friendsgiving?": No | Have you ever attended a "Friendsgiving?": Yes |
|---|---|---|
| No | 42.283702 | 37.010526 |
| Yes | 41.47541 | 33.976744 |


In [47]:
data.pivot_table(index = 'Have you ever tried to meet up with hometown friends on Thanksgiving night?', columns = 'Have you ever attended a "Friendsgiving?"', values = "int_income")

| Have you ever tried to meet up with hometown friends on Thanksgiving night? | Have you ever attended a "Friendsgiving?": No | Have you ever attended a "Friendsgiving?": Yes |
|---|---|---|
| No | 78914.549654 | 72894.736842 |
| Yes | 78750.0 | 66019.736842 |


## Friendsgivings X age and income

Respondents who reported attending a Friendsgiving were, on average, younger and had lower household income than those who had not. This is consistent with Friendsgiving, or spending Thanksgiving with friends, being a relatively new phenomenon, although the survey alone cannot confirm that.
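The pivot tables above rely on the default aggregation of `pivot_table`, which is the mean of the `values` column within each index/column combination. A minimal sketch on made-up data (the short column names `met_friends` and `friendsgiving` are hypothetical) shows that it matches an explicit group-by:

```python
import pandas as pd

# Made-up rows; column names are shortened stand-ins for the survey questions.
toy = pd.DataFrame({
    "met_friends": ["No", "No", "Yes", "Yes", "Yes"],
    "friendsgiving": ["No", "Yes", "No", "Yes", "Yes"],
    "int_age": [45, 30, 60, 18, 30],
})

# Default aggfunc is the mean of int_age in each cell.
table = toy.pivot_table(index="met_friends", columns="friendsgiving", values="int_age")

# The same result written as an explicit groupby + unstack.
manual = toy.groupby(["met_friends", "friendsgiving"])["int_age"].mean().unstack()

print(table)
```

Because each cell is a mean, it says nothing about how many respondents fall into that cell; passing `aggfunc="count"` would show the group sizes.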

In [82]:
# Collect every column whose question mentions desserts
dessert_columns = [col for col in data.columns if "dessert" in col]

# For each dessert column, keep its most frequent answer and that answer's count
# (value_counts() ignores NaN by default)
desserts = []
for col in dessert_columns:
    counts = data[col].value_counts()
    desserts.append([counts.index[0], counts.values[0]])
sorted(desserts, key=lambda x: x[1], reverse=True)

[['None', 295],
 ['Ice cream', 266],
 ['Cookies', 204],
 ['Cheesecake', 191],
 ['Other (please specify)', 134],
 ['Brownies', 128],
 ['Apple cobbler', 110],
 ['Peach cobbler', 103],
 ['Carrot cake', 72],
 ['Fudge', 43],
 ['Blondies', 16],
 ['pie', 13]]
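The loop above keeps only the most frequent answer in each dessert column. For checkbox-style columns that hold a single label this equals the number of respondents who ticked the box, but for a free-text column it reports just the most common entry, which is likely where `['pie', 13]` comes from. A possible alternative is to count all non-null answers per column. The sketch below uses made-up rows and hypothetical, shortened column names.

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened column names; the real survey columns are much longer.
toy = pd.DataFrame({
    "desserts - Ice cream": ["Ice cream", np.nan, "Ice cream"],
    "desserts - Cookies": [np.nan, np.nan, "Cookies"],
    "desserts - None": [np.nan, "None", np.nan],
    "Age": ["18 - 29", "30 - 44", "60+"],
})

dessert_columns = [col for col in toy.columns if "dessert" in col]

# count() returns the number of non-null cells in each selected column.
dessert_counts = toy[dessert_columns].count().sort_values(ascending=False)
print(dessert_counts)
```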