In [6]:
import pandas as pd

In [7]:
data = pd.read_csv("thanksgiving.csv", encoding="Latin-1")

Now, drop all rows for respondents who do not celebrate Thanksgiving, as this analysis focuses on those who do.

In [8]:
data = data[data["Do you celebrate Thanksgiving?"] == "Yes"]

The code below verifies that the filter worked: only "Yes" responses should remain.

In [9]:
data["Do you celebrate Thanksgiving?"].value_counts()

Yes    980
Name: Do you celebrate Thanksgiving?, dtype: int64

In [10]:
data["What is typically the main dish at your Thanksgiving dinner?"].value_counts()

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64

In [11]:
tofu_df = data[data["What is typically the main dish at your Thanksgiving dinner?"] == "Tofurkey"]

In [12]:
tofu_df["Do you typically have gravy?"].value_counts()

Yes    12
No      8
Name: Do you typically have gravy?, dtype: int64

# Pie

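This portion of the analysis determines the popularity of pies. Respondents had the opportunity to select "all that apply" regarding which pies they eat, so each pie has its own column, and a blank cell means that pie was not selected. First, flag the blank cells in the apple, pumpkin, and pecan columns.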

In [13]:
apple_isnull = data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple"].isnull()

In [14]:
pumpkin_isnull = data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin"].isnull()

In [15]:
pecan_isnull = data["Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan"].isnull()

In [16]:
# True only where all three pie columns are blank, i.e. the respondent selected none of these pies
ate_pies = apple_isnull & pumpkin_isnull & pecan_isnull

In [17]:
ate_pies.value_counts()

False    876
True     104
dtype: int64

In [18]:
pie_eaters = [apple_isnull, pumpkin_isnull, pecan_isnull]
for i, elem in enumerate(pie_eaters):
    print("{:02d}".format(i), "-",elem.value_counts())

00 - False    514
True     466
Name: Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple, dtype: int64
01 - False    729
True     251
Name: Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin, dtype: int64
02 - True     638
False    342
Name: Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan, dtype: int64
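Because these masks come from `isnull()`, the `False` counts above are the selections: 514 for apple, 729 for pumpkin, and 342 for pecan. A minimal sketch of counting selections directly with `notnull()` follows. The `sample` frame is a made-up miniature of the survey, not the real data, and `prefix`, `pie_cols`, `pie_counts`, and `no_pie` are illustrative names.

```python
import pandas as pd

# Hypothetical miniature of the survey: a blank (None/NaN) cell means the pie was not selected.
prefix = ("Which type of pie is typically served at your Thanksgiving dinner? "
          "Please select all that apply. - ")
sample = pd.DataFrame({
    prefix + "Apple": ["Apple", None, "Apple", None],
    prefix + "Pumpkin": ["Pumpkin", "Pumpkin", "Pumpkin", None],
    prefix + "Pecan": [None, None, "Pecan", None],
})

pie_cols = [c for c in sample.columns if c.startswith(prefix)]

# notnull() marks a selection; summing the booleans counts selections per pie.
pie_counts = sample[pie_cols].notnull().sum()
pie_counts.index = [c[len(prefix):] for c in pie_cols]  # shorten the labels
pie_counts = pie_counts.sort_values(ascending=False)

# Rows where every pie column is blank: respondents who selected none of these pies.
no_pie = int(sample[pie_cols].isnull().all(axis=1).sum())

print(pie_counts)
print("No pie selected:", no_pie)
```

Counting non-null cells keeps the result in terms of "who eats this pie", which avoids having to read the `False` row of each `isnull()` count.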


In [19]:
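def age_parser(elem):
    # Missing ages stay missing
    if pd.isnull(elem):
        return None
    # Keep only the first token of the bracket (its lower bound) and drop any trailing "+"
    age = elem.split(" ")[0]
    age = age.replace("+","")
    return int(age)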

data["int_age"] = data["Age"].apply(age_parser)

In [20]:
data["int_age"].value_counts()

45.0    269
60.0    258
30.0    235
18.0    185
Name: int_age, dtype: int64

In [21]:
data["int_age"].describe()

count    947.000000
mean      40.089757
std       15.352014
min       18.000000
25%       30.000000
50%       45.000000
75%       60.000000
max       60.000000
Name: int_age, dtype: float64

# Findings



Based on the lower and upper quartiles (30 and 60), the responses appear to be spread fairly evenly across the four age groups. Note that `age_parser` keeps only the lower bound of each age bracket, so these statistics understate the true ages and should not be fully relied upon.
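One way to soften the lower-bound bias is to use the midpoint of each bracket instead. The sketch below assumes bracket strings shaped like `"18 - 29"` and `"60+"`. That format is inferred from how `age_parser` splits on spaces and strips `+`, not confirmed against the raw file, and `age_midpoint` is a hypothetical helper.

```python
import pandas as pd

def age_midpoint(elem):
    """Return the midpoint of an age bracket such as '18 - 29' (assumed format)."""
    if pd.isnull(elem):
        return None
    if elem.endswith("+"):
        # Open-ended bracket: no upper bound is known, so fall back to the lower bound.
        return float(elem.rstrip("+"))
    low, high = elem.split(" - ")
    return (int(low) + int(high)) / 2

# Assumed example strings, for illustration only
for bracket in ["18 - 29", "30 - 44", "45 - 59", "60+"]:
    print(bracket, "->", age_midpoint(bracket))
```

The open-ended bracket still has to fall back on its lower bound, so even the midpoint version remains an approximation.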

In [22]:
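def extract_money(elem):
    # Missing incomes stay missing
    if pd.isnull(elem):
        return None
    # Keep only the first token of the bracket (its lower bound), without "$" and ","
    income = elem.split(" ")[0]
    income = income.replace("$","")
    income = income.replace(",","")
    # Responses starting with "Prefer" carry no number, so treat them as missing
    if income == "Prefer":
        return None
    return int(income)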

In [23]:
data["int_income"] = data["How much total combined money did all members of your HOUSEHOLD earn last year?"].apply(extract_money)

In [25]:
data["int_income"].value_counts()

25000.0     166
75000.0     127
50000.0     127
100000.0    109
200000.0     76
10000.0      60
0.0          52
125000.0     48
150000.0     38
175000.0     26
Name: int_income, dtype: int64

In [26]:
data["int_income"].describe()

count       829.000000
mean      75965.018094
std       59068.636748
min           0.000000
25%       25000.000000
50%       75000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64

In [30]:
less_inc_data = data[data["int_income"]<50000]
more_inc_data = data[data["int_income"]>150000]

In [31]:
less_inc_data["How far will you travel for Thanksgiving?"].value_counts()

Thanksgiving is happening at my home--I won't travel at all                         106
Thanksgiving is local--it will take place in the town I live in                      92
Thanksgiving is out of town but not too far--it's a drive of a few hours or less     64
Thanksgiving is out of town and far away--I have to drive several hours or fly       16
Name: How far will you travel for Thanksgiving?, dtype: int64

In [29]:
more_inc_data["How far will you travel for Thanksgiving?"].value_counts()

Thanksgiving is happening at my home--I won't travel at all                         49
Thanksgiving is local--it will take place in the town I live in                     25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
Thanksgiving is out of town and far away--I have to drive several hours or fly      12
Name: How far will you travel for Thanksgiving?, dtype: int64

# To Do

This is a helpful example of how the pandas package can help glean insights with relative ease. The next step is to do some math on these counts, such as computing what percentage of the total each response represents.
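As a starting point for that, `value_counts(normalize=True)` returns shares instead of counts, and `pd.crosstab(..., normalize="index")` gives row-wise shares for comparing groups. From the counts above, for instance, 106 of 278 lower-income responses (about 38%) stay at home, versus 49 of 102 (about 48%) in the higher-income group. The sketch below uses a small made-up frame, and the column names `income_group` and `travel` are placeholders, not columns of the survey.

```python
import pandas as pd

# Hypothetical stand-in for the survey; labels are shortened for readability.
df = pd.DataFrame({
    "income_group": ["low", "low", "low", "low", "high", "high"],
    "travel": ["home", "home", "local", "far", "home", "local"],
})

# Share of each response across all rows, as a percentage.
# Missing values are excluded by default (dropna=True).
shares = df["travel"].value_counts(normalize=True) * 100
print(shares)

# Share of each response within each income group (each row sums to 100).
by_group = pd.crosstab(df["income_group"], df["travel"], normalize="index") * 100
print(by_group)
```

Normalizing within each group matters here because the two income groups differ in size, so raw counts are not directly comparable.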

In [32]:
?pd.pivot_table

In [35]:
friends_pivot = pd.pivot_table(data,index="Have you ever tried to meet up with hometown friends on Thanksgiving night?",columns='Have you ever attended a "Friendsgiving?"', values="int_age")

In [36]:
print(friends_pivot)

Have you ever attended a "Friendsgiving?"                  No        Yes
Have you ever tried to meet up with hometown fr...                      
No                                                  42.283702  37.010526
Yes                                                 41.475410  33.976744


In [37]:
data.pivot_table(index="Have you ever tried to meet up with hometown friends on Thanksgiving night?",columns='Have you ever attended a "Friendsgiving?"', values="int_age")

Have you ever attended a "Friendsgiving?"                                           No        Yes
Have you ever tried to meet up with hometown friends on Thanksgiving night?
No                                                                           42.283702  37.010526
Yes                                                                          41.475410  33.976744


### Findings

Respondents who have never attended a "Friendsgiving" are older on average (mean ages of about 42.3 and 41.5) than those who have (about 37.0 and 34.0). Among the non-attendees, there is no discernible age difference between those who meet up with hometown friends and those who don't. Among those who have attended one, the age gap is noticeable: those who meet up with friends average about 34, versus about 37 for those who don't.
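Since `pivot_table` averages by default, it can help to look at how many respondents sit behind each mean before reading much into the gaps. A minimal sketch, using a made-up frame with shortened placeholder column names (`meetup`, `friendsgiving`) in place of the full survey questions:

```python
import pandas as pd

# Hypothetical stand-in for the survey columns used in the pivot above.
df = pd.DataFrame({
    "meetup": ["No", "No", "Yes", "Yes", "No", "Yes"],
    "friendsgiving": ["No", "Yes", "No", "Yes", "No", "Yes"],
    "int_age": [45, 30, 60, 18, 60, 30],
})

# Mean age per cell (what the default aggfunc computes).
means = pd.pivot_table(df, index="meetup", columns="friendsgiving",
                       values="int_age", aggfunc="mean")

# Number of non-missing ages per cell, to judge how reliable each mean is.
counts = pd.pivot_table(df, index="meetup", columns="friendsgiving",
                        values="int_age", aggfunc="count")

print(means)
print(counts)
```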