# Analyzing Thanksgiving Dinner Data

### In this project, I use a Jupyter notebook to analyze data on Thanksgiving dinner in the US. The dataset is stored in the thanksgiving.csv file and contains 1058 responses to an online survey about what Americans eat for Thanksgiving dinner.

#### You can download the dataset here: <a href='https://github.com/gergirod/data_science/blob/master/thanksgiving.csv'>Dataset</a>




In [1]:
import pandas as pd
data = pd.read_csv('thanksgiving.csv', encoding="Latin-1")
data.head(3)

| | RespondentID | Do you celebrate Thanksgiving? | What is typically the main dish at your Thanksgiving dinner? | What is typically the main dish at your Thanksgiving dinner? - Other (please specify) | How is the main dish typically cooked? | How is the main dish typically cooked? - Other (please specify) | What kind of stuffing/dressing do you typically have? | What kind of stuffing/dressing do you typically have? - Other (please specify) | What type of cranberry saucedo you typically have? | What type of cranberry saucedo you typically have? - Other (please specify) | ... | Have you ever tried to meet up with hometown friends on Thanksgiving night? | Have you ever attended a "Friendsgiving?" | Will you shop any Black Friday sales on Thanksgiving Day? | Do you work in retail? | Will you employer make you work on Black Friday? | How would you describe where you live? | Age | What is your gender? | How much total combined money did all members of your HOUSEHOLD earn last year? | US Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4337954960 | Yes | Turkey | | Baked | | Bread-based | | | | ... | Yes | No | No | No | | Suburban | 18 - 29 | Male | $75,000 to $99,999 | Middle Atlantic |
| 1 | 4337951949 | Yes | Turkey | | Baked | | Bread-based | | Other (please specify) | Homemade cranberry gelatin ring | ... | No | No | Yes | No | | Rural | 18 - 29 | Female | $50,000 to $74,999 | East South Central |
| 2 | 4337935621 | Yes | Turkey | | Roasted | | Rice-based | | Homemade | | ... | Yes | Yes | Yes | No | | Suburban | 18 - 29 | Male | $0 to $9,999 | Mountain |


## Displaying all the columns from the dataset

In [2]:
data.columns

Index(['RespondentID', 'Do you celebrate Thanksgiving?',
       'What is typically the main dish at your Thanksgiving dinner?',
       'What is typically the main dish at your Thanksgiving dinner? - Other (please specify)',
       'How is the main dish typically cooked?',
       'How is the main dish typically cooked? - Other (please specify)',
       'What kind of stuffing/dressing do you typically have?',
       'What kind of stuffing/dressing do you typically have? - Other (please specify)',
       'What type of cranberry saucedo you typically have?',
       'What type of cranberry saucedo you typically have? - Other (please specify)',
       'Do you typically have gravy?',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',
       'Which of these side dishes aretypically served

## Filter rows where the column "Do you celebrate Thanksgiving?" is Yes

In [45]:
data = data[data['Do you celebrate Thanksgiving?'] == 'Yes']
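Before filtering, it can help to check how many respondents gave each answer, and to take a copy of the filtered rows so that adding columns later does not raise a `SettingWithCopyWarning`. The snippet below is only a sketch on a tiny made-up DataFrame (`sample` and `celebrating` are hypothetical names, not part of the survey data).

```python
import pandas as pd

# Hypothetical miniature stand-in for the survey data
sample = pd.DataFrame({
    "RespondentID": [1, 2, 3, 4],
    "Do you celebrate Thanksgiving?": ["Yes", "No", "Yes", "Yes"],
})

# How many respondents fall into each answer before filtering
print(sample["Do you celebrate Thanksgiving?"].value_counts())

# Keep only the "Yes" rows; .copy() returns an independent DataFrame,
# so columns can be added to it later without a SettingWithCopyWarning
celebrating = sample[sample["Do you celebrate Thanksgiving?"] == "Yes"].copy()
print(celebrating.shape)
```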

## Explore Main Dishes

In [4]:
dishes = data['What is typically the main dish at your Thanksgiving dinner?']
dishes.value_counts()

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64

In [5]:
data[data['What is typically the main dish at your Thanksgiving dinner?'] == 'Tofurkey']['Do you typically have gravy?']

4      Yes
33     Yes
69      No
72      No
77     Yes
145    Yes
175    Yes
218     No
243    Yes
275     No
393    Yes
399    Yes
571    Yes
594    Yes
628     No
774     No
820     No
837    Yes
860     No
953    Yes
Name: Do you typically have gravy?, dtype: object
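The list above is easier to read as a tally. As a small sketch, the Series below is rebuilt by hand from the 20 answers printed above (the variable names `gravy` and `gravy_counts` are mine), and `value_counts()` summarizes it.

```python
import pandas as pd

# Gravy answers of the 20 Tofurkey respondents, copied from the output above
gravy = pd.Series(
    ["Yes", "Yes", "No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No",
     "Yes", "Yes", "Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes"],
    index=[4, 33, 69, 72, 77, 145, 175, 218, 243, 275,
           393, 399, 571, 594, 628, 774, 820, 837, 860, 953],
)

# Tally the answers instead of listing them row by row
gravy_counts = gravy.value_counts()
print(gravy_counts)
```

On the real data, the same tally should come from appending `.value_counts()` to the expression in the cell above.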

## Figuring Out What Pies People Eat

In [6]:
apple_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple'])
pumpkin_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin'])
pecan_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan'])

# True for respondents who selected none of the three pies
no_pies = apple_isnull & pumpkin_isnull & pecan_isnull
no_pies.describe()

count       980
unique        2
top       False
freq        876
dtype: object
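The boolean Series above is True only when all three pie columns are empty, so `top False` with `freq 876` means that 876 of the 980 respondents selected at least one of apple, pumpkin or pecan pie. The sketch below shows the same logic on a made-up three-column frame (the short column names are placeholders for the long survey questions).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: one column per pie, NaN when the pie was not selected
sample = pd.DataFrame({
    "Apple": ["Apple", np.nan, np.nan, "Apple"],
    "Pumpkin": ["Pumpkin", np.nan, "Pumpkin", np.nan],
    "Pecan": [np.nan, np.nan, np.nan, "Pecan"],
})

# True where every pie column is empty, i.e. none of the three pies
no_pies = sample.isnull().all(axis=1)

# Inverting the mask gives the respondents with at least one pie
ate_some_pie = ~no_pies
print(ate_some_pie.sum(), "of", len(sample), "selected at least one pie")
```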

## Converting Age to Numeric

In [7]:
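def age_to_numeric(age_str):
    # Missing answers stay missing
    if pd.isnull(age_str):
        return None
    # Keep the lower bound of the bracket, e.g. "18 - 29" -> 18
    age_string = age_str.split(" ")[0]
    # Strip the plus sign from open-ended brackets such as "60+"
    age_string = age_string.replace("+","")
    return int(age_string)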

data['int_age'] = data['Age'].apply(age_to_numeric)
data['int_age'].describe()
    
    

count    947.000000
mean      40.089757
std       15.352014
min       18.000000
25%       30.000000
50%       45.000000
75%       60.000000
max       60.000000
Name: int_age, dtype: float64
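Note that each age bracket is represented by its lower bound, so the mean of about 40 above underestimates the true average age. The sketch below runs the same function on a few bracket labels. Only "18 - 29" appears in the output shown earlier; the other labels are my assumption, chosen to match the 30, 45 and 60 values in the summary.

```python
import pandas as pd

def age_to_numeric(age_str):
    # Missing answers stay missing
    if pd.isnull(age_str):
        return None
    # Keep the lower bound of the bracket and drop a trailing "+"
    return int(age_str.split(" ")[0].replace("+", ""))

# Assumed bracket labels plus one missing answer
ages = pd.Series(["18 - 29", "30 - 44", "45 - 59", "60+", None])
converted = ages.apply(age_to_numeric)
print(converted)
```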

## Converting Income to Numeric

In [8]:
def income_to_numeric(income_str):
    # Missing answers stay missing
    if pd.isnull(income_str):
        return None
    string_income = income_str.split(" ")
    # Answers that start with "Prefer" carry no number
    if string_income[0] == "Prefer":
        return None
    # Keep the lower bound of the bracket, e.g. "$75,000" -> 75000
    string_income_number = string_income[0].replace('$', "").replace(',', "")
    return int(string_income_number)

data['int_income'] = data['How much total combined money did all members of your HOUSEHOLD earn last year?'].apply(income_to_numeric)

data['int_income'].describe()
    

count       829.000000
mean      75965.018094
std       59068.636748
min           0.000000
25%       25000.000000
50%       75000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64
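As an alternative to a row-by-row function, the lower bound can also be pulled out with pandas string methods. This is only a sketch on made-up answers: the labels "$200,000 and up" and "Prefer not to answer" are my guesses at the survey wording, and `incomes` and `int_income` are hypothetical names.

```python
import pandas as pd

# Hypothetical income answers, including a non-numeric and a missing one
incomes = pd.Series([
    "$75,000 to $99,999",
    "$0 to $9,999",
    "$200,000 and up",
    "Prefer not to answer",
    None,
])

# Capture the first dollar amount in each answer; rows without one become NaN
lower_bounds = incomes.str.extract(r"\$([\d,]+)")[0]

# Drop the thousands separators and convert to numbers
int_income = pd.to_numeric(lower_bounds.str.replace(",", "", regex=False))
print(int_income)
```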

## Correlating Travel Distance and Income

In [9]:
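# Travel answers for households earning under $150,000
people_under_150000 = data[data['int_income']<150000]['How far will you travel for Thanksgiving?']
people_under_150000.value_counts()

# Travel answers for households earning over $150,000; only this last expression is displayed below
people_over_150000 = data[data['int_income']>150000]['How far will you travel for Thanksgiving?']
people_over_150000.value_counts()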



Thanksgiving is happening at my home--I won't travel at all                         49
Thanksgiving is local--it will take place in the town I live in                     25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
Thanksgiving is out of town and far away--I have to drive several hours or fly      12
Name: How far will you travel for Thanksgiving?, dtype: int64
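Raw counts are hard to compare because the two income groups have different sizes. A sketch of one way around this, on made-up rows, is to pass `normalize=True` to `value_counts()` and put both groups side by side. The short `travel` labels are placeholders for the long survey answers, and I use `>=` for the upper group so that a bracket coded as exactly 150000, if there is one, is not dropped by both filters.

```python
import pandas as pd

# Hypothetical stand-in for the income and travel columns
sample = pd.DataFrame({
    "int_income": [25000, 50000, 75000, 175000, 200000, 200000],
    "travel": ["home", "local", "home", "home", "far", "home"],
})

# Share of each travel answer within each income group
under = sample.loc[sample["int_income"] < 150000, "travel"].value_counts(normalize=True)
over = sample.loc[sample["int_income"] >= 150000, "travel"].value_counts(normalize=True)

# Align both groups on the same answers; answers missing in a group get 0
comparison = pd.DataFrame({"under_150k": under, "over_150k": over}).fillna(0)
print(comparison)
```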

## Linking Friendship and Age

In [10]:

data.pivot_table(index='Have you ever tried to meet up with hometown friends on Thanksgiving night?',
                columns = 'Have you ever attended a "Friendsgiving?"', values='int_age')



| Have you ever tried to meet up with hometown friends on Thanksgiving night? | Have you ever attended a "Friendsgiving?": No | Have you ever attended a "Friendsgiving?": Yes |
|---|---|---|
| No | 42.283702 | 37.010526 |
| Yes | 41.47541 | 33.976744 |


In [11]:
data.pivot_table(index='Have you ever tried to meet up with hometown friends on Thanksgiving night?',
                columns = 'Have you ever attended a "Friendsgiving?"', values='int_income')

| Have you ever tried to meet up with hometown friends on Thanksgiving night? | Have you ever attended a "Friendsgiving?": No | Have you ever attended a "Friendsgiving?": Yes |
|---|---|---|
| No | 78914.549654 | 72894.736842 |
| Yes | 78750.0 | 66019.736842 |
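`pivot_table` averages the `values` column by default, so the cells above are mean ages and mean incomes per group. The sketch below, on made-up rows with shortened column names, makes the aggregation explicit and adds a second table with group sizes, which helps judge how much weight each mean deserves.

```python
import pandas as pd

# Hypothetical stand-in for the two friendship questions and the numeric age
sample = pd.DataFrame({
    "meet_friends": ["Yes", "Yes", "No", "No", "No"],
    "friendsgiving": ["Yes", "No", "Yes", "No", "No"],
    "int_age": [18, 45, 30, 60, 45],
})

# Mean age per combination of answers (the default aggregation, spelled out)
mean_age = sample.pivot_table(index="meet_friends", columns="friendsgiving",
                              values="int_age", aggfunc="mean")

# Number of respondents behind each mean
group_size = sample.pivot_table(index="meet_friends", columns="friendsgiving",
                                values="int_age", aggfunc="count")
print(mean_age)
print(group_size)
```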



## Figure out the most common dessert people eat

In [13]:
dessert_frame = data.iloc[:,39:50]

def count_not_null(column):
    # Number of respondents who selected this option (non-null cells)
    return column.notnull().sum()

most_common_dessert = dessert_frame.apply(count_not_null)
sort_most_common_dessert = most_common_dessert.sort_values(ascending = False)
sort_most_common_dessert

Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - None                      295
Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Ice cream                 266
Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Cookies                   204
Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Cheesecake                191
Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Other (please specify)    134
Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Brownies                  128
Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Apple cobbler             110
Which of these desserts do you typically have at Thanksgiving 
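The long question prefix makes this output hard to scan. As a sketch on a made-up frame with three of the dessert columns, the counts can be computed for all columns at once with `notnull().sum()`, and the labels shortened to the part after the final " - ".

```python
import numpy as np
import pandas as pd

prefix = ("Which of these desserts do you typically have at Thanksgiving dinner? "
          "Please select all that apply.   - ")

# Hypothetical stand-in: NaN means the option was not selected
desserts = pd.DataFrame({
    prefix + "Ice cream": ["Ice cream", np.nan, "Ice cream"],
    prefix + "Cookies": [np.nan, np.nan, "Cookies"],
    prefix + "None": [np.nan, "None", np.nan],
})

# Count the non-null cells per column, most common first
counts = desserts.notnull().sum().sort_values(ascending=False)

# Keep only the option name that follows the last " - "
counts.index = [label.split(" - ")[-1].strip() for label in counts.index]
print(counts)
```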

## Figure out the most common complete meal people eat

In [38]:
main_dish_frame = data['What is typically the main dish at your Thanksgiving dinner?']
side_frame = data.iloc[:,11:26]
pie_frame = data.iloc[:,26:38]
dessert_frame = data.iloc[:,39:50]

most_common_dish = main_dish_frame.value_counts().sort_index()
most_common_side = side_frame.apply(count_not_null)
most_common_pie = pie_frame.apply(count_not_null)
most_common_dessert = dessert_frame.apply(count_not_null)

print(most_common_dish.sort_values(ascending = False))
print(most_common_side.sort_values(ascending = False))
print(most_common_pie.sort_values(ascending = False))
print(most_common_dessert.sort_values(ascending = False))

main_dish_frame

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64
Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Mashed potatoes                     817
Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Rolls/biscuits                      766
Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Green beans/green bean casserole    686
Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Yams/sweet potato casserole         631
Which of these side dishes aretypically served at your Thanksgiving dinner? Please sele

0                       Turkey
1                       Turkey
2                       Turkey
3                       Turkey
4                     Tofurkey
5                       Turkey
6                       Turkey
7                       Turkey
8                       Turkey
9       Other (please specify)
11                      Turkey
12                    Ham/Pork
13                      Turkey
14                      Turkey
15                      Turkey
16                   Turducken
17                      Turkey
18                      Turkey
19                      Turkey
20                      Turkey
21                      Turkey
23                      Turkey
24                      Turkey
25                      Turkey
26                      Turkey
27                      Turkey
28      Other (please specify)
29                      Turkey
30      Other (please specify)
32                      Turkey
                 ...          
1024                    Turkey
1025    

#### The above output shows that the most common complete meal is Turkey (main dish), Mashed potatoes (side dish), Pumpkin (pie) and Ice cream (dessert, not counting the "None" option, which has the highest count).

## Identify how many people work on Black Friday

In [39]:
work_thanksgiving = data['Will you employer make you work on Black Friday?']

In [43]:
work_thanksgiving.value_counts()

Yes              43
No               20
Doesn't apply     7
Name: Will you employer make you work on Black Friday?, dtype: int64
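Only 70 respondents answered this question, so shares are more telling than raw counts. The sketch below rebuilds the counts from the output above by hand (the names `counts` and `shares` are mine) and converts them to fractions.

```python
import pandas as pd

# Counts copied from the output above
counts = pd.Series({"Yes": 43, "No": 20, "Doesn't apply": 7})

# Fraction of the respondents who gave each answer
shares = (counts / counts.sum()).round(3)
print(shares)
```

On the real data, `work_thanksgiving.value_counts(normalize=True)` should give the same fractions directly.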