# Setup

In [24]:
import re
import pandas
raw_data = pandas.read_csv('thanksgiving.csv', encoding='Latin-1')

# Getting familiar with the data

In [25]:
celebration_query_column = raw_data['Do you celebrate Thanksgiving?']
participant_counts = celebration_query_column.value_counts()
print(participant_counts)

# Filter out participants who do not celebrate
data = raw_data[raw_data['Do you celebrate Thanksgiving?'] == 'Yes']

Yes    980
No      78
Name: Do you celebrate Thanksgiving?, dtype: int64


# Analysing by Dish

## Main Dish

In [26]:
main_dish_counts = data['What is typically the main dish at your Thanksgiving dinner?'].value_counts()
print(main_dish_counts)

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64


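Most participants (859 of those who celebrate) report turkey as the main dish. The next cell looks at the 20 participants who serve Tofurkey and checks whether they typically have gravy: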

In [27]:
only_tofurkey = data[data['What is typically the main dish at your Thanksgiving dinner?'] == 'Tofurkey']
gravy_with_tofurkey = only_tofurkey['Do you typically have gravy?']
gravy_with_tofurkey_counts = gravy_with_tofurkey.value_counts()
print(gravy_with_tofurkey_counts)

Yes    12
No      8
Name: Do you typically have gravy?, dtype: int64


# American Pie

In [28]:
apple_isnull = pandas.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple'])
pumpkin_isnull = pandas.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin'])
pecan_isnull = pandas.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan'])
# True where none of the three pies was selected (all three columns are null)
no_pies = apple_isnull & pumpkin_isnull & pecan_isnull
counts_no_pies = no_pies.value_counts()
print(counts_no_pies)

False    876
True     104
dtype: int64


104 participants selected none of apple, pumpkin or pecan pie (True), and 876 participants selected at least one of the three (False).
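The null-mask logic above can be sketched on a small, made-up table. The column names and rows here are hypothetical stand-ins for the long survey questions; the point is only that a row counts as "no pie" when every pie column is null, and that negating the mask gives the participants who had at least one.

```python
import pandas as pd

# Hypothetical toy data: a null value means the pie was not selected.
pies = pd.DataFrame({
    'Apple':   ['Apple', None, None, 'Apple'],
    'Pumpkin': ['Pumpkin', None, 'Pumpkin', None],
    'Pecan':   [None, None, None, 'Pecan'],
})

# True only where all three columns are null, i.e. none of the pies was selected.
no_pies = pies.isnull().all(axis=1)

# Negating the mask gives participants who selected at least one pie.
ate_any_pie = ~no_pies

print(no_pies.value_counts())
print(ate_any_pie.value_counts())
```

Naming the mask after what True means (no pies) keeps the value counts easy to read.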

# Analysing Age

In [37]:
# convert_range_to_integer
# param: str_range (String)
# returns integer or None if null is passed
# Example: Providing '18 - 29' returns 18
def convert_range_to_integer(str_range):
    if pandas.isnull(str_range):
        return None
    else:
        # split the range on first whitespace and return whatever is before it
        # Example: '18 - 29' returns '18'
        result = str_range.split(None, 1)[0]
        # strip a trailing '+' if present (for example '60+' becomes '60')
        result = result.replace('+', '')
        return int(result)

    
data['int_age'] = data['Age'].apply(convert_range_to_integer)
print(data['int_age'].head())

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


0    18.0
1    18.0
2    18.0
3    30.0
4    30.0
Name: int_age, dtype: float64
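The warning above appears because the filtered frame was created by boolean indexing on another DataFrame, and a new column was then assigned to it. One common way to avoid it, sketched here on hypothetical toy data, is to take an explicit `.copy()` of the filtered rows before adding columns. The names `raw`, `celebrating` and `celebrates` are illustrative only and are not part of the notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for the survey.
raw = pd.DataFrame({
    'celebrates': ['Yes', 'No', 'Yes'],
    'Age': ['18 - 29', '30 - 44', '60+'],
})

# An explicit copy makes the filtered frame independent of `raw`,
# so adding a column does not trigger SettingWithCopyWarning.
celebrating = raw[raw['celebrates'] == 'Yes'].copy()

# Take the leading digits of each age bracket as its lower bound.
celebrating['int_age'] = (
    celebrating['Age'].str.extract(r'(\d+)', expand=False).astype(int)
)

print(celebrating)
```

The original frame is left untouched, which also makes it clear that the new column belongs only to the filtered data.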

In [38]:
print(data['int_age'].describe())

count    947.000000
mean      40.089757
std       15.352014
min       18.000000
25%       30.000000
50%       45.000000
75%       60.000000
max       60.000000
Name: int_age, dtype: float64


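An overview shows that fewer than a quarter of participants are in the 18 - 29 age group, since the 25th percentile is already 30. The median (45) is higher than the mean (about 40), which suggests the ages are slightly skewed to the left, with about half of the participants in the 45 - 59 group or older. The variation in age is large: the standard deviation is about 15 years. Note that each value is the lower bound of an age range, so these figures understate the participants' actual ages.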

More analysis will be required to get a clearer depiction of participant ages.
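As a rough illustration of the skew reasoning, the sketch below builds a small, made-up series of age lower bounds and compares the median with the mean. It also calls `Series.skew()`, which returns a negative value when the longer tail is on the left. The numbers are hypothetical and are not taken from the survey.

```python
import pandas as pd

# Hypothetical lower bounds of age ranges, weighted towards older groups.
ages = pd.Series([18, 30, 45, 60, 60, 60, 60])

mean_age = ages.mean()
median_age = ages.median()
skewness = ages.skew()

# A median above the mean, together with negative skewness,
# indicates a tail towards the younger (left) side.
print(mean_age, median_age, skewness)
```

Running the same three calls on `data['int_age']` would be one way to check the direction of the skew directly.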

# Household Income

In [46]:
def convert_money_string_to_int(str_money):
    money = re.sub(r'[\$,]', '', str_money)
    return int(money)
    
    
def convert_money_range_to_int(str_range):
    if pandas.isnull(str_range):
        return None
    else:
        value = str_range.split(None, 1)[0]
        if value == 'Prefer':
            return None
        else:
            return convert_money_string_to_int(value)
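For comparison, the same lower-bound extraction can be sketched with pandas string methods instead of `apply`. This is an alternative under stated assumptions, not the approach used in this notebook: the sample labels below are assumed examples of the answer format, and any answer that does not start with a dollar amount (such as a "Prefer not to answer" response or a missing value) becomes NaN.

```python
import pandas as pd

# Assumed examples of the income answer format.
income = pd.Series([
    '$25,000 to $49,999',
    '$200,000 and up',
    'Prefer not to answer',
    None,
])

# Capture the leading dollar amount, then drop the thousands separators.
lower_bound = (
    income.str.extract(r'^\$([\d,]+)', expand=False)
          .str.replace(',', '', regex=False)
)

# Non-matching answers are already NaN, so they stay missing here.
int_income = pd.to_numeric(lower_bound)

print(int_income)
```

The result is a float column because missing values are represented as NaN, which matches the float64 dtype seen in the `describe()` output.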

In [45]:
income = data['How much total combined money did all members of your HOUSEHOLD earn last year?']
data['int_income'] = income.apply(convert_money_range_to_int)
print(data['int_income'].describe())

count       829.000000
mean      75965.018094
std       59068.636748
min           0.000000
25%       25000.000000
50%       75000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  from ipykernel import kernelapp as app


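The mean ($75,965) and the median ($75,000) are close, and the spread is large (standard deviation of about $59,069), so the data covers participants from a wide range of economic backgrounds. Each value is the lower bound of an income range, so these figures understate actual household income.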

In [54]:
income_less_than_15k = data[data['int_income'] < 15000]
travelling_distance = income_less_than_15k['How far will you travel for Thanksgiving?']
travelling_distance_counts = travelling_distance.value_counts()
print(travelling_distance_counts)

Thanksgiving is happening at my home--I won't travel at all                         46
Thanksgiving is local--it will take place in the town I live in                     38
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    22
Thanksgiving is out of town and far away--I have to drive several hours or fly       6
Name: How far will you travel for Thanksgiving?, dtype: int64


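Of the 112 participants in this low-income group, 66 will celebrate somewhere other than their own home, compared with 46 who will celebrate at home. Most of those not celebrating at home will not be travelling far: 38 stay in their own town, 22 drive a few hours or less, and only 6 drive several hours or fly. Because int_income holds the lower bound of each income range, the filter selects every range that starts below $15,000, so some of these households may earn somewhat more than $15,000.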
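The comparison above is easier to read as shares than as raw counts. The sketch below re-enters the four counts from the output by hand, with shortened labels that are purely illustrative, and converts them to proportions. On the real data, `value_counts(normalize=True)` would give the same shares directly.

```python
import pandas as pd

# Counts copied from the output above; the short labels are illustrative.
counts = pd.Series({
    'at home': 46,
    'local': 38,
    'few hours or less': 22,
    'several hours or fly': 6,
})

# Share of each answer among the low-income group.
shares = counts / counts.sum()

# Everyone who is not celebrating in their own home.
away_from_home = 1 - shares['at home']

print(shares.round(3))
print(round(away_from_home, 3))
```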

# Friends and Friendsgiving

In [56]:
data.pivot_table(
    index='Have you ever tried to meet up with hometown friends on Thanksgiving night?',
    columns='Have you ever attended a "Friendsgiving?"',
    values='int_age'
)

Mean int_age by group:

| Have you ever tried to meet up with hometown friends on Thanksgiving night? | Have you ever attended a "Friendsgiving?": No | Have you ever attended a "Friendsgiving?": Yes |
| --- | --- | --- |
| No | 42.283702 | 37.010526 |
| Yes | 41.47541 | 33.976744 |


In [57]:
data.pivot_table(
    index='Have you ever tried to meet up with hometown friends on Thanksgiving night?',
    columns='Have you ever attended a "Friendsgiving?"',
    values='int_income'
)

Mean int_income by group:

| Have you ever tried to meet up with hometown friends on Thanksgiving night? | Have you ever attended a "Friendsgiving?": No | Have you ever attended a "Friendsgiving?": Yes |
| --- | --- | --- |
| No | 78914.549654 | 72894.736842 |
| Yes | 78750.0 | 66019.736842 |


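1. Participants who have attended a Friendsgiving are younger on average (average ages: 34 - 37) than those who have not (average ages: 41 - 42). Having tried to meet up with hometown friends is associated with a much smaller age difference, and mainly among those who have also attended a Friendsgiving.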

2. Participants who have attended a Friendsgiving also earn on average about $6,000 to $13,000 less than those who have not, while meeting up with hometown friends makes little difference on its own. This is consistent with Friendsgiving attendees being younger, as shown in point 1, which may partly explain their lower earnings.
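The two tables above rely on the default behaviour of `pivot_table`, which averages the `values` column within each combination of `index` and `columns`. A minimal sketch on hypothetical data shows this, and also adds a count per cell, which helps judge how much weight each average can carry. The column names `meetup` and `friendsgiving` are shortened stand-ins for the survey questions.

```python
import pandas as pd

# Hypothetical toy data; column names are shortened stand-ins.
toy = pd.DataFrame({
    'meetup':        ['No', 'No', 'No', 'Yes', 'Yes', 'Yes'],
    'friendsgiving': ['No', 'No', 'Yes', 'No', 'Yes', 'Yes'],
    'int_age':       [45, 60, 30, 45, 18, 30],
})

# Mean age per group (the default aggregation, written out explicitly).
mean_age = toy.pivot_table(
    index='meetup', columns='friendsgiving', values='int_age', aggfunc='mean'
)

# Number of participants behind each mean.
group_size = toy.pivot_table(
    index='meetup', columns='friendsgiving', values='int_age', aggfunc='count'
)

print(mean_age)
print(group_size)
```

A small group size in any cell would be a reason to treat that cell's average with caution.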