In [1]:
import pandas as pd

data = pd.read_csv("thanksgiving.csv", encoding="Latin-1")

In [2]:
data = data[data["Do you celebrate Thanksgiving?"] == "Yes"]
# print(data["Do you celebrate Thanksgiving?"].value_counts())

In [3]:
# print(data["What is typically the main dish at your Thanksgiving dinner?"].value_counts())

In [4]:
tofurkey = data[data["What is typically the main dish at your Thanksgiving dinner?"] == "Tofurkey"]
# print(tofurkey['Do you typically have gravy?'])

In [8]:
apple_isnull = data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple"].isnull()
pumpkin_isnull = data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin"].isnull()
pecan_isnull = data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan"].isnull()
# Despite its name, ate_pies is True only when all three pie columns are null,
# i.e. the respondent selected none of apple, pumpkin, or pecan pie.
ate_pies = apple_isnull & pumpkin_isnull & pecan_isnull

In [9]:
ate_pies.value_counts()

False    876
True     104
dtype: int64
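
The mask above is True only when all three pie columns are missing, so the 104 True rows are respondents who selected none of apple, pumpkin, or pecan pie. A minimal sketch on made-up data shows the logic and how to invert it. The short column names and the values here are hypothetical, not the survey's.

```python
import pandas as pd

# Hypothetical toy frame; the real survey uses much longer column names.
toy = pd.DataFrame({
    "apple":   ["Apple", None, None],
    "pumpkin": [None, "Pumpkin", None],
    "pecan":   [None, None, None],
})

# True only where every pie column is missing, i.e. no pie was selected.
no_pies = toy["apple"].isnull() & toy["pumpkin"].isnull() & toy["pecan"].isnull()

# Invert the mask to flag respondents who selected at least one pie.
any_pie = ~no_pies

print(no_pies.tolist())
print(any_pie.tolist())
```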

In [10]:
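def convert_age(age):
    # Map an age bracket label to its lower bound as an int; missing values become None.
    if pd.isnull(age):
        return None
    else:
        split = age.split(" ")
        first = split[0]
        # Strip the "+" from an open-ended bracket before converting.
        drop_plus = first.replace("+", "")
        drop_plus = int(drop_plus)
        return drop_plus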

data['int_age'] = data["Age"].apply(convert_age)

In [11]:
data['int_age'].describe()

count    947.000000
mean      40.089757
std       15.352014
min       18.000000
25%       30.000000
50%       45.000000
75%       60.000000
max       60.000000
Name: int_age, dtype: float64

# Age findings
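The mean of `int_age` is about 40. However, `convert_age` keeps only the lower bound of each age bracket (the values 18, 30, 45, and 60 in the summary above), so this figure understates the respondents' actual average age.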
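
To make the lower-bound behavior concrete, here is a compact sketch of the same conversion applied to a few labels. The label formats are assumptions inferred from the summary values; the raw survey labels are not shown in this notebook.

```python
import pandas as pd

def convert_age(age):
    """Return the lower bound of an age bracket label, or None if missing."""
    if pd.isnull(age):
        return None
    first = age.split(" ")[0]           # e.g. "18" or "60+"
    return int(first.replace("+", ""))  # drop the "+" of an open-ended bracket

# Assumed label formats, for illustration only.
labels = ["18 - 29", "30 - 44", "45 - 59", "60+", None]
lower_bounds = [convert_age(label) for label in labels]
print(lower_bounds)
```

Every respondent in a bracket is assigned that bracket's youngest possible age, which is why the mean can only be a lower estimate.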

In [13]:
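def convert_income(income):
    # Map an income bracket label to its lower bound as an int; missing values
    # and responses starting with "Prefer" become None.
    if pd.isnull(income):
        return None
    else:
        split = income.split(" ")
        first = split[0]
        if first == "Prefer":
            return None
        # Strip "$" and "," so the amount parses as an integer.
        drop_plus = first.replace("$", "").replace(",", "")
        drop_plus = int(drop_plus)
        return drop_plus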

data['int_income'] = data["How much total combined money did all members of your HOUSEHOLD earn last year?"].apply(convert_income)

In [14]:
data['int_income'].describe()

count       829.000000
mean      75965.018094
std       59068.636748
min           0.000000
25%       25000.000000
50%       75000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64

# Income Description Evaluation

* The mean income is about 76,000 and the median is 75,000, which one might consider middle class (or upper middle class, depending on the area of the US).
* Keep in mind, however, that `int_income` holds only the lower bound of each income bracket used in the survey, so these figures are a rough guide to income groups rather than exact incomes.
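
One possible refinement, not used in this notebook, is to take the midpoint of each closed bracket instead of its lower bound. The sketch below assumes labels of the form `"$25,000 to $49,999"` and `"$200,000 and up"`; both the label formats and the helper names `income_bounds` and `income_midpoint` are hypothetical.

```python
import re

def income_bounds(label):
    """Return (low, high) parsed from a bracket label; high is None if open-ended."""
    # Pull every dollar amount out of the label, dropping the thousands separators.
    amounts = [int(m.replace(",", "")) for m in re.findall(r"\$([\d,]+)", label)]
    if not amounts:
        return None  # no dollar amount, e.g. a "Prefer not to answer" response
    low = amounts[0]
    high = amounts[1] if len(amounts) > 1 else None
    return low, high

def income_midpoint(label):
    """Midpoint of a closed bracket; falls back to the lower bound otherwise."""
    bounds = income_bounds(label)
    if bounds is None:
        return None
    low, high = bounds
    return low if high is None else (low + high) / 2

# Assumed label formats, for illustration only.
print(income_midpoint("$25,000 to $49,999"))
print(income_midpoint("$200,000 and up"))
print(income_midpoint("Prefer not to answer"))
```

Midpoints are still an approximation, and the open-ended top bracket has no natural midpoint, so the lower bound is kept there.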


In [19]:
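# Both comparisons are strict, so rows where int_income is exactly 150000
# (or missing) land in neither group.
under_150k = data[data['int_income'] < 150000]
over_150k = data[data['int_income'] > 150000]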

In [20]:
under_150k['How far will you travel for Thanksgiving?'].value_counts()

Thanksgiving is happening at my home--I won't travel at all                         281
Thanksgiving is local--it will take place in the town I live in                     203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
Thanksgiving is out of town and far away--I have to drive several hours or fly       55
Name: How far will you travel for Thanksgiving?, dtype: int64

In [21]:
over_150k['How far will you travel for Thanksgiving?'].value_counts()

Thanksgiving is happening at my home--I won't travel at all                         49
Thanksgiving is local--it will take place in the town I live in                     25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
Thanksgiving is out of town and far away--I have to drive several hours or fly      12
Name: How far will you travel for Thanksgiving?, dtype: int64

# Distance Traveled by Income
* The distance traveled follows a similar pattern in both groups. Of the 689 answers in the under-150K group, about 41% stay at home and 8% travel far; of the 102 answers in the over-150K group, about 48% stay at home and 12% travel far.
* It therefore does not appear that those with less income travel farther for Thanksgiving than those with more income. The under-150K group has more travelers in absolute terms only because it contains far more respondents.
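
Because the two groups differ greatly in size, the raw counts are easier to compare once each is expressed as a share of its group. A small sketch, using the counts printed above and shortened labels of my own choosing:

```python
import pandas as pd

# Counts copied from the two value_counts() outputs above; labels are shortened.
labels = ["at home", "local", "a few hours' drive", "far away"]
counts = pd.DataFrame(
    {"under_150k": [281, 203, 150, 55], "over_150k": [49, 25, 16, 12]},
    index=labels,
)

# Divide each column by its total so the two groups are comparable.
shares = (counts / counts.sum() * 100).round(1)
print(shares)
```

On the actual data, passing `normalize=True` to `value_counts()` should give the same proportions directly.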


In [25]:
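# pivot_table uses the mean by default, so each cell is the average int_age
# for one combination of answers.
friends_pivot = data.pivot_table(
    index="Have you ever tried to meet up with hometown friends on Thanksgiving night?",
    columns='Have you ever attended a "Friendsgiving?"',
    values="int_age")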

friends_pivot


Have you ever attended a "Friendsgiving?"                                           No        Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?
No                                                                           42.283702  37.010526
Yes                                                                          41.475410  33.976744


In [26]:
# Same pivot as above, but averaging int_income instead of int_age.
avg_friends_income = data.pivot_table(
    index="Have you ever tried to meet up with hometown friends on Thanksgiving night?",
    columns='Have you ever attended a "Friendsgiving?"',
    values="int_income"
)

avg_friends_income

Have you ever attended a "Friendsgiving?"                                              No           Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?
No                                                                           78914.549654  72894.736842
Yes                                                                          78750.000000  66019.736842


# Friends on Thanksgiving
## Younger respondents with lower incomes

* The average age of those who have attended a "Friendsgiving" (37), met up with hometown friends on Thanksgiving night (41), or both (34) is lower than that of those who have done neither (42).
* Average income follows a similar but weaker pattern:
    - attended a Friendsgiving but not met up with hometown friends: 73K
    - met up with hometown friends but not attended a Friendsgiving: 79K
    - done both: 66K
    - done neither: 79K

Respondents who have spent Thanksgiving with friends therefore tend to be younger and to have lower household incomes, although those who have only met up with hometown friends earn about the same as those who have done neither. Both averages are built from the lower bounds of the age and income brackets, so they are rough estimates.
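
The averages above say nothing about how many respondents sit behind each cell. A minimal sketch on made-up data (hypothetical short column names and values) shows how `pivot_table` can report the count alongside the mean:

```python
import pandas as pd

# Hypothetical toy data with shortened column names.
toy = pd.DataFrame({
    "met_friends":   ["No", "No", "Yes", "Yes", "Yes"],
    "friendsgiving": ["No", "Yes", "No", "Yes", "Yes"],
    "int_age":       [60, 45, 45, 18, 30],
})

# pivot_table averages `values` by default; asking for "count" as well
# shows how many rows contribute to each mean.
summary = toy.pivot_table(
    index="met_friends",
    columns="friendsgiving",
    values="int_age",
    aggfunc=["mean", "count"],
)
print(summary)
```

A cell built from only a handful of respondents deserves less weight than one built from hundreds, so checking the counts is worthwhile before reading much into the differences.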

In [27]:
data.columns

Index(['RespondentID', 'Do you celebrate Thanksgiving?',
       'What is typically the main dish at your Thanksgiving dinner?',
       'What is typically the main dish at your Thanksgiving dinner? - Other (please specify)',
       'How is the main dish typically cooked?',
       'How is the main dish typically cooked? - Other (please specify)',
       'What kind of stuffing/dressing do you typically have?',
       'What kind of stuffing/dressing do you typically have? - Other (please specify)',
       'What type of cranberry saucedo you typically have?',
       'What type of cranberry saucedo you typically have? - Other (please specify)',
       'Do you typically have gravy?',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',
       'Which of these side dishes aretypically served

In [64]:
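# Note: the survey question asks about working on Black Friday, not Thanksgiving Day.
# Count respondents by answer (rows) and gender (columns).
work_thksgv_gender = data.pivot_table(
    index="Will you employer make you work on Black Friday?",
    columns='What is your gender?',
    values="RespondentID",
    aggfunc='count'
)

# Count respondents by age bracket (rows) and by answer and gender (columns).
work_thksgv_age = data.pivot_table(
    columns=["Will you employer make you work on Black Friday?", "What is your gender?"],
    index='int_age',
    values="RespondentID",
    aggfunc='count'
)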

In [66]:
work_thksgv_gender

What is your gender?                              Female  Male
Will you employer make you work on Black Friday?
Doesn't apply                                          3     4
No                                                     4    16
Yes                                                   24    18
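
Only a few dozen respondents answered this question, and the two gender columns have different totals, so percentages are easier to compare than raw counts. A small sketch using the counts from the table above:

```python
import pandas as pd

# Counts copied from the pivot table above.
counts = pd.DataFrame(
    {"Female": [3, 4, 24], "Male": [4, 16, 18]},
    index=["Doesn't apply", "No", "Yes"],
)

# Number of respondents per gender, and each answer's share within a gender.
totals = counts.sum()
shares = (counts / totals * 100).round(1)
print(totals)
print(shares.loc["Yes"])
```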


# Working on Black Friday: More women, younger men

* From the data it appears that more women than men work on Black Friday: 24 of the 31 women who answered said their employer would make them work that day, compared with 18 of the 38 men. This may reflect retail's labor force comprising more women than men, and the sample is small, so more research should be done before any conclusions are drawn.
* Additionally, if we break those who work on Black Friday down by gender and age, it looks as though t