# Description

In this project, I work in a Jupyter notebook to analyze data on Thanksgiving dinner in the US. The dataset comes from FiveThirtyEight.

The dataset is stored in the thanksgiving.csv file. It contains 1058 responses to an online survey about what Americans eat for Thanksgiving dinner. Each survey respondent was asked questions about what they typically eat for Thanksgiving, along with some demographic questions, like their gender, income, and location. This dataset will allow us to discover regional and income-based patterns in what Americans eat for Thanksgiving dinner.

The dataset has 65 columns and 1058 rows. Most of the column names are survey questions, and most of the column values are string responses to those questions.

In this project, I explored:

1. What main dishes people tend to eat during Thanksgiving dinner

2. What pies people eat

3. The average age and household income of the participants of this online survey

4. The connection between travel distance and household income

5. The relationship between age and the tendency to spend Thanksgiving with friends


## Reading the file into data and setting the encoding to 'Latin-1'

In [47]:
import pandas as pd
data = pd.read_csv('thanksgiving.csv',  encoding="Latin-1")
data.head()

Unnamed: 0,RespondentID,Do you celebrate Thanksgiving?,What is typically the main dish at your Thanksgiving dinner?,What is typically the main dish at your Thanksgiving dinner? - Other (please specify),How is the main dish typically cooked?,How is the main dish typically cooked? - Other (please specify),What kind of stuffing/dressing do you typically have?,What kind of stuffing/dressing do you typically have? - Other (please specify),What type of cranberry saucedo you typically have?,What type of cranberry saucedo you typically have? - Other (please specify),...,Have you ever tried to meet up with hometown friends on Thanksgiving night?,"Have you ever attended a ""Friendsgiving?""",Will you shop any Black Friday sales on Thanksgiving Day?,Do you work in retail?,Will you employer make you work on Black Friday?,How would you describe where you live?,Age,What is your gender?,How much total combined money did all members of your HOUSEHOLD earn last year?,US Region
0,4337954960,Yes,Turkey,,Baked,,Bread-based,,,,...,Yes,No,No,No,,Suburban,18 - 29,Male,"$75,000 to $99,999",Middle Atlantic
1,4337951949,Yes,Turkey,,Baked,,Bread-based,,Other (please specify),Homemade cranberry gelatin ring,...,No,No,Yes,No,,Rural,18 - 29,Female,"$50,000 to $74,999",East South Central
2,4337935621,Yes,Turkey,,Roasted,,Rice-based,,Homemade,,...,Yes,Yes,Yes,No,,Suburban,18 - 29,Male,"$0 to $9,999",Mountain
3,4337933040,Yes,Turkey,,Baked,,Bread-based,,Homemade,,...,Yes,No,No,No,,Urban,30 - 44,Male,"$200,000 and up",Pacific
4,4337931983,Yes,Tofurkey,,Baked,,Bread-based,,Canned,,...,Yes,No,No,No,,Urban,30 - 44,Male,"$100,000 to $124,999",Pacific
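
To illustrate why the encoding argument matters, here is a minimal sketch. It does not use thanksgiving.csv. Instead it assumes a made-up two-line CSV containing a non-ASCII character stored as Latin-1 bytes. The names `raw_bytes`, `utf8_ok` and `sample` are hypothetical and only exist for this example.

```python
import io

import pandas as pd

# Hypothetical CSV content encoded as Latin-1 (not taken from thanksgiving.csv).
raw_bytes = "dish,count\nPurée,3\n".encode("latin-1")

# The same bytes are not valid UTF-8, so decoding them as UTF-8 fails.
try:
    raw_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Telling pandas the matching encoding reads the content cleanly.
sample = pd.read_csv(io.BytesIO(raw_bytes), encoding="latin-1")
print(utf8_ok)
print(sample)
```

If a file opens without errors under the default encoding, the encoding argument is not needed; it only matters when the bytes on disk were written with a different encoding.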


## Displaying all the columns in data

In [48]:
print(data.columns)

Index(['RespondentID', 'Do you celebrate Thanksgiving?',
       'What is typically the main dish at your Thanksgiving dinner?',
       'What is typically the main dish at your Thanksgiving dinner? - Other (please specify)',
       'How is the main dish typically cooked?',
       'How is the main dish typically cooked? - Other (please specify)',
       'What kind of stuffing/dressing do you typically have?',
       'What kind of stuffing/dressing do you typically have? - Other (please specify)',
       'What type of cranberry saucedo you typically have?',
       'What type of cranberry saucedo you typically have? - Other (please specify)',
       'Do you typically have gravy?',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',
       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',
       'Which of these side dishes aretypically served

This uses the value_counts() method to display the different answers to the question 'Do you celebrate Thanksgiving?'

It also creates another dataframe, data_edit, which ONLY contains the respondents who answered 'Yes' to 'Do you celebrate Thanksgiving?'. The cells that follow still operate on data, not data_edit.

In [49]:
celebrate_counter = data['Do you celebrate Thanksgiving?'].value_counts()
print(celebrate_counter)

#This filters the dataframe to only contain the 'Yes' answer for the column 'Do you celebrate Thanksgiving?' 
data_edit = data[data['Do you celebrate Thanksgiving?'] == 'Yes']
print(data_edit['Do you celebrate Thanksgiving?'].value_counts())

Yes    980
No      78
Name: Do you celebrate Thanksgiving?, dtype: int64
Yes    980
Name: Do you celebrate Thanksgiving?, dtype: int64
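
In the output above, 980 of the 1058 respondents (about 92.6%) answered 'Yes'. A proportion like this can be read directly from value_counts() by passing normalize=True. The sketch below assumes a small made-up Series rather than the real survey column; `answers`, `shares` and `celebrants` are hypothetical names.

```python
import pandas as pd

# Hypothetical answers standing in for the survey column.
answers = pd.Series(["Yes", "Yes", "No", "Yes"], name="Do you celebrate Thanksgiving?")

# normalize=True returns proportions instead of raw counts.
shares = answers.value_counts(normalize=True)
print(shares)

# Boolean-mask filtering, the same pattern used to build data_edit.
celebrants = answers[answers == "Yes"]
print(len(celebrants))
```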


## Exploring what main dishes people tend to eat during Thanksgiving dinner

In [50]:
#Exploring what main dishes people tend to eat during Thanksgiving dinner
common_dish = data['What is typically the main dish at your Thanksgiving dinner?']
print(common_dish.value_counts())

#Displaying the gravy column for those that ate 'Tofurkey' as the main dish
gravy_column = data[common_dish == 'Tofurkey']['Do you typically have gravy?']
print('\n',gravy_column)

Turkey                    859
Other (please specify)     35
Ham/Pork                   29
Tofurkey                   20
Chicken                    12
Roast beef                 11
I don't know                5
Turducken                   3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64

 4      Yes
33     Yes
69      No
72      No
77     Yes
145    Yes
175    Yes
218     No
243    Yes
275     No
393    Yes
399    Yes
571    Yes
594    Yes
628     No
774     No
820     No
837    Yes
860     No
953    Yes
Name: Do you typically have gravy?, dtype: object
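
Tallying the rows printed above gives 12 'Yes' and 8 'No' answers among the 20 Tofurkey respondents. Rather than counting by eye, the filtered column can be summarised with value_counts(). This is a minimal sketch on a made-up dataframe; the shortened column names `main_dish` and `gravy` are my own and are not the survey's column names.

```python
import pandas as pd

# Hypothetical miniature of the two survey columns.
toy = pd.DataFrame({
    "main_dish": ["Turkey", "Tofurkey", "Tofurkey", "Ham/Pork", "Tofurkey"],
    "gravy": ["Yes", "Yes", "No", "Yes", "Yes"],
})

# Filter the rows first, then count the answers in the gravy column.
tofurkey_gravy = toy.loc[toy["main_dish"] == "Tofurkey", "gravy"].value_counts()
print(tofurkey_gravy)
```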


## Checking Out What Pies People Eat

In [51]:
#Checking how many null answers were given for these questions
apple_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple'])
pumpkin_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin'])
pecan_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan'])

no_pies = apple_isnull & pumpkin_isnull & pecan_isnull
print(no_pies.value_counts())

False    876
True     182
dtype: int64
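
The cell above only reports how many respondents selected none of the three pies (182). A possible follow-up is to count how many selected each pie. The sketch below assumes a made-up dataframe with shortened column names (`Apple`, `Pumpkin`, `Pecan`) in which an unchecked box is stored as a null, as in the survey columns.

```python
import pandas as pd

# Hypothetical miniature of three pie columns; None marks an unchecked box.
pies = pd.DataFrame({
    "Apple": ["Apple", None, None, "Apple"],
    "Pumpkin": ["Pumpkin", "Pumpkin", None, None],
    "Pecan": [None, "Pecan", None, None],
})

# Number of respondents who selected each pie.
pie_counts = pies.notnull().sum()
print(pie_counts)

# Rows where all three columns are null, equivalent to chaining & on each mask.
no_pies = pies.isnull().all(axis=1)
print(no_pies.sum())
```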


## Converting Age To Integer

In [52]:
# convert a single string to an appropriate integer value
def age_extract(column):
    if pd.isnull(column):
        return None
    #Keeps the text before the first space, e.g. '18' from '18 - 29'
    column = column.split(' ')[0]
    #Removes the '+' from '60+'
    column = column.replace('+','')
    return int(column)

data['int_age'] = data['Age'].apply(age_extract)
print(data['int_age'].describe())


count    1025.000000
mean       39.383415
std        15.398493
min        18.000000
25%        30.000000
50%        45.000000
75%        60.000000
max        60.000000
Name: int_age, dtype: float64


## Findings So Far

Looking at the displayed result, I have derived the following:

1. The average age of the participants is 39.38 years, based on the lower bound of each age range
2. The minimum age is 18 years, while the maximum is 60 years
3. 1025 people gave a usable answer to the question

N.B.: This result is not a true description of the data because:

1. I took a lot of liberty by only taking the first part of each age range
2. For the 60+ case, I didn't account for ages above 60 years old
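
One possible way to reduce the first limitation is to use the midpoint of each range instead of its lower bound. This is only a sketch under my own assumptions: the helper name `age_midpoint` is hypothetical, the sample strings merely follow the format of the Age column, and an open-ended range such as '60+' still falls back to its lower bound because it has no upper limit.

```python
import re

import pandas as pd


def age_midpoint(value):
    """Return the midpoint of an age range such as '18 - 29', or None."""
    if pd.isnull(value):
        return None
    numbers = [int(n) for n in re.findall(r"\d+", value)]
    if not numbers:
        return None
    if len(numbers) == 2:
        return sum(numbers) / 2
    # Open-ended range such as '60+': no upper bound, so keep the lower bound.
    return float(numbers[0])


# Sample strings in the style of the Age column (not the real data).
ages = pd.Series(["18 - 29", "30 - 44", "60+", None])
midpoints = ages.apply(age_midpoint)
print(midpoints)
```

Midpoints are still an approximation, since the ages inside a range are unknown.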

## Converting Income To Integer

In [53]:
#convert a single string to an appropriate integer income value
def income_extract(column):
    if pd.isnull(column):
        return None
    column = column.split(' ')[0]
    if column == 'Prefer':
        return None
    column = column.replace('$','')
    column = column.replace(',','')
    return int(column)

data['int_income'] = data['How much total combined money did all members of your HOUSEHOLD earn last year?'].apply(income_extract)
print(data['int_income'].describe())

count       889.000000
mean      74077.615298
std       59360.742902
min           0.000000
25%       25000.000000
50%       50000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64


## Findings So Far 

Looking at the displayed result, I have derived the following:

1. The average household income is $74,077.62, based on the lower bound of each income range

2. The highest value is $200,000, which stands for the "$200,000 and up" range

3. 889 people gave a usable answer to the question

4. At least 25% of these respondents reported $100,000 or more, since the 75th percentile is $100,000

N.B.: This result is not a true description of the data because:

1. I took a lot of liberty by only taking the first part of each income range

2. For the $200,000+ case, I didn't account for income above $200,000
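
An alternative that avoids collapsing each range to a single number is to keep the ranges as labels and list the counts from the lowest range to the highest. The sketch below is an assumption-laden illustration: the sample labels are written in the style of the survey answers rather than read from the file, and `lower_bound`, `answered`, `ordered_labels` and `bracket_counts` are hypothetical names.

```python
import pandas as pd

# Sample labels in the style of the income column (not the real data).
income = pd.Series([
    "$75,000 to $99,999",
    "$0 to $9,999",
    "$200,000 and up",
    "Prefer not to answer",
    "$0 to $9,999",
    None,
])


def lower_bound(label):
    """Parse the first dollar amount of a range label into an int."""
    return int(label.split(" ")[0].replace("$", "").replace(",", ""))


# Keep only real ranges: drop nulls and answers starting with 'Prefer'.
answered = income[income.notnull() & ~income.str.startswith("Prefer", na=False)]

# Order the distinct labels by their lower bound.
ordered_labels = sorted(answered.unique(), key=lower_bound)

# Counts per range, listed from the lowest range to the highest.
bracket_counts = answered.value_counts().reindex(ordered_labels)
print(bracket_counts)
```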

## Correlating Travel Distance And Income

In [54]:
#Checks how far people earning UNDER $150,000 will travel.
less_150k = data[data['int_income'] < 150000]['How far will you travel for Thanksgiving?']
print(less_150k.value_counts())
print(less_150k.describe())

Thanksgiving is happening at my home--I won't travel at all                         281
Thanksgiving is local--it will take place in the town I live in                     203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
Thanksgiving is out of town and far away--I have to drive several hours or fly       55
Name: How far will you travel for Thanksgiving?, dtype: int64
count                                                   689
unique                                                    4
top       Thanksgiving is happening at my home--I won't ...
freq                                                    281
Name: How far will you travel for Thanksgiving?, dtype: object


In [55]:
#Checks how far people earning AT LEAST $150,000 will travel.
greater_150k = data[data['int_income'] >= 150000]['How far will you travel for Thanksgiving?']
print(greater_150k.value_counts())
print('\n', greater_150k.describe())

Thanksgiving is happening at my home--I won't travel at all                         66
Thanksgiving is local--it will take place in the town I live in                     34
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    25
Thanksgiving is out of town and far away--I have to drive several hours or fly      15
Name: How far will you travel for Thanksgiving?, dtype: int64

 count                                                   140
unique                                                    4
top       Thanksgiving is happening at my home--I won't ...
freq                                                     66
Name: How far will you travel for Thanksgiving?, dtype: object


## Findings So Far 

UNDER $150,000:

1. A total of 689 people fall within this bracket

2. Most of them hold Thanksgiving either at home (281) or within the town they live in (203), so they travel little or not at all

3. Only a few people travel a great distance for Thanksgiving (55)

AT LEAST $150,000:

1. Only 140 people earn at least this much

2. Most of those in this bracket will not be travelling, as they are holding Thanksgiving at home (66) or at least within their town (34)

3. Only a few travel a great distance away from home (15 people)

### Overall

1. The percentage staying at home is higher in the AT LEAST group (47.14%) than in the UNDER group (40.78%). This is in line with the hypothesis that people earning more are more likely to have Thanksgiving at their own house.

2. The percentage travelling a far distance is lower in the UNDER group (7.98%) than in the AT LEAST group (10.71%)
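
The percentages quoted in this section can be reproduced from the two value_counts() outputs. The sketch below copies the counts from those outputs; the shortened row labels and the names `under_150k`, `at_least_150k` and `shares` are my own.

```python
import pandas as pd

# Counts copied from the two value_counts() outputs; row labels are shortened.
labels = ["home", "local", "few hours", "far away"]
under_150k = pd.Series([281, 203, 150, 55], index=labels)
at_least_150k = pd.Series([66, 34, 25, 15], index=labels)

# Convert the counts to percentages of each income group.
shares = pd.DataFrame({
    "under_150k": under_150k / under_150k.sum() * 100,
    "at_least_150k": at_least_150k / at_least_150k.sum() * 100,
}).round(2)
print(shares)
```

The `home` and `far away` rows hold the four percentages used in the comparison.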




## Linking Friendship And Age

In [56]:
#pivot table showing the average age of respondents
data.pivot_table(
    index='Have you ever tried to meet up with hometown friends on Thanksgiving night?',
    columns='Have you ever attended a "Friendsgiving?"',
    values='int_age')

| Tried to meet up with hometown friends on Thanksgiving night? | Attended a "Friendsgiving": No | Attended a "Friendsgiving": Yes |
| --- | --- | --- |
| No | 42.283702 | 37.010526 |
| Yes | 41.47541 | 33.976744 |


In [57]:
#pivot table showing the average income of respondents
data.pivot_table(
    index='Have you ever tried to meet up with hometown friends on Thanksgiving night?',
    columns='Have you ever attended a "Friendsgiving?"',
    values='int_income')

| Tried to meet up with hometown friends on Thanksgiving night? | Attended a "Friendsgiving": No | Attended a "Friendsgiving": Yes |
| --- | --- | --- |
| No | 78914.549654 | 72894.736842 |
| Yes | 78750.0 | 66019.736842 |
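
pivot_table() averages the values by default, so each cell above is a mean. It can help to see how many respondents stand behind each mean before reading too much into it. The sketch below assumes a made-up dataframe with shortened column names (`met_friends`, `friendsgiving`) and builds one table of means and one of counts.

```python
import pandas as pd

# Hypothetical miniature with shortened column names.
toy = pd.DataFrame({
    "met_friends": ["Yes", "Yes", "No", "No", "No"],
    "friendsgiving": ["Yes", "No", "Yes", "No", "No"],
    "int_age": [18, 45, 30, 60, 45],
})

# Mean age per cell (aggfunc="mean" is also the default).
mean_table = toy.pivot_table(
    index="met_friends", columns="friendsgiving", values="int_age", aggfunc="mean"
)

# Number of respondents per cell.
count_table = toy.pivot_table(
    index="met_friends", columns="friendsgiving", values="int_age", aggfunc="count"
)

print(mean_table)
print(count_table)
```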


## Findings So Far

1. People with lower income tend to hang out with friends during Thanksgiving (attending a Friendsgiving and meeting up with friends on Thanksgiving night)

2. The data shows a relationship between age and the tendency to meet up with friends or attend a Friendsgiving. Younger people tend to do this.

### Overall

This supports the hypothesis that the questions

1. Have you ever tried to meet up with hometown friends on Thanksgiving night?

2. Have you ever attended a "Friendsgiving?"

were geared towards younger people (who also tend to have less money), who tend to practice this more often.