### Exploring the dataset

In [2]:
```python
import pandas as pd

data = pd.read_csv('thanksgiving.csv', encoding='latin-1')
data.head()
```

```
,RespondentID,Do you celebrate Thanksgiving?,What is typically the main dish at your Thanksgiving dinner?,What is typically the main dish at your Thanksgiving dinner? - Other (please specify),How is the main dish typically cooked?,How is the main dish typically cooked? - Other (please specify),What kind of stuffing/dressing do you typically have?,What kind of stuffing/dressing do you typically have? - Other (please specify),What type of cranberry saucedo you typically have?,What type of cranberry saucedo you typically have? - Other (please specify),...,Have you ever tried to meet up with hometown friends on Thanksgiving night?,"Have you ever attended a ""Friendsgiving?""",Will you shop any Black Friday sales on Thanksgiving Day?,Do you work in retail?,Will you employer make you work on Black Friday?,How would you describe where you live?,Age,What is your gender?,How much total combined money did all members of your HOUSEHOLD earn last year?,US Region
0,4337954960,Yes,Turkey,,Baked,,Bread-based,,,,...,Yes,No,No,No,,Suburban,18 - 29,Male,"$75,000 to $99,999",Middle Atlantic
1,4337951949,Yes,Turkey,,Baked,,Bread-based,,Other (please specify),Homemade cranberry gelatin ring,...,No,No,Yes,No,,Rural,18 - 29,Female,"$50,000 to $74,999",East South Central
2,4337935621,Yes,Turkey,,Roasted,,Rice-based,,Homemade,,...,Yes,Yes,Yes,No,,Suburban,18 - 29,Male,"$0 to $9,999",Mountain
3,4337933040,Yes,Turkey,,Baked,,Bread-based,,Homemade,,...,Yes,No,No,No,,Urban,30 - 44,Male,"$200,000 and up",Pacific
4,4337931983,Yes,Tofurkey,,Baked,,Bread-based,,Canned,,...,Yes,No,No,No,,Urban,30 - 44,Male,"$100,000 to $124,999",Pacific
```


In [3]:
```python
data_columns = data.columns
```
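Because the column names are the full survey questions, it can be handy to look them up by keyword instead of retyping them. A minimal sketch, using a hypothetical helper `find_columns` (not part of the original notebook) on a toy DataFrame that only reuses a few of the survey's column names:

```python
import pandas as pd


def find_columns(df: pd.DataFrame, keyword: str) -> list:
    """Return the column names containing `keyword` (case-insensitive).

    Hypothetical helper, written for illustration only.
    """
    return [col for col in df.columns if keyword.lower() in col.lower()]


# Empty toy frame: only the column names matter here
toy = pd.DataFrame(columns=[
    'RespondentID',
    'Do you celebrate Thanksgiving?',
    'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple',
    'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan',
])

pie_columns = find_columns(toy, 'pie')
print(pie_columns)
```

On the real data, `find_columns(data, 'pie')` would list every pie column at once.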

### Removing respondents who do not celebrate Thanksgiving

In [4]:
```python
# Count the answers to this question
data['Do you celebrate Thanksgiving?'].value_counts()
```

```
Yes    980
No      78
Name: Do you celebrate Thanksgiving?, dtype: int64
```

In [5]:
```python
# Keep only the respondents who answered 'Yes'
data = data[data['Do you celebrate Thanksgiving?'] == 'Yes']
```

In [6]:
```python
# Check that only 'Yes' answers remain
data['Do you celebrate Thanksgiving?'].value_counts()
```

```
Yes    980
Name: Do you celebrate Thanksgiving?, dtype: int64
```
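The filter above works by building a boolean mask and indexing the DataFrame with it. A self-contained sketch on a small made-up DataFrame (the answers are illustrative, not taken from the survey):

```python
import pandas as pd

# Made-up answers, only to illustrate boolean-mask filtering
toy = pd.DataFrame({
    'Do you celebrate Thanksgiving?': ['Yes', 'No', 'Yes', 'Yes', 'No'],
    'Age': ['18 - 29', '30 - 44', '45 - 59', '60+', '18 - 29'],
})

# The comparison returns a boolean Series aligned on the index
mask = toy['Do you celebrate Thanksgiving?'] == 'Yes'

# Indexing with the mask keeps only the rows where it is True;
# .copy() makes the result an independent frame, which avoids a possible
# SettingWithCopyWarning when new columns are added to it later
celebrants = toy[mask].copy()
print(celebrants['Do you celebrate Thanksgiving?'].value_counts())
```

The original row labels (0, 2 and 3 here) are kept, which is why later cells can still align new columns on the filtered frame.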

### Exploring Thanksgiving meals

In [7]:
```python
# Keep only respondents whose main dish is Tofurkey
tofurkey = data[data['What is typically the main dish at your Thanksgiving dinner?'] == 'Tofurkey']
```

In [8]:
```python
tofurkey['What is typically the main dish at your Thanksgiving dinner?'].value_counts()
```

```
Tofurkey    20
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64
```

Only 20 respondents typically have Tofurkey as the main dish of their Thanksgiving dinner.

Here is how they split between those who typically have gravy and those who do not:

In [9]:
```python
tofurkey['Do you typically have gravy?'].value_counts()
```

```
Yes    12
No      8
Name: Do you typically have gravy?, dtype: int64
```
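Instead of filtering on Tofurkey first, `pd.crosstab` can give the gravy split for every main dish at once. A sketch on made-up rows (the dish labels and answers below are illustrative only):

```python
import pandas as pd

# Made-up respondents
toy = pd.DataFrame({
    'What is typically the main dish at your Thanksgiving dinner?':
        ['Turkey', 'Tofurkey', 'Turkey', 'Tofurkey', 'Ham/Pork'],
    'Do you typically have gravy?': ['Yes', 'No', 'Yes', 'Yes', 'No'],
})

# Rows: main dish; columns: gravy answer; cells: number of respondents
# (combinations that never occur are filled with 0)
table = pd.crosstab(
    toy['What is typically the main dish at your Thanksgiving dinner?'],
    toy['Do you typically have gravy?'],
)
print(table)
```

On the real data, the `Tofurkey` row of this table should reproduce the 12 / 8 split shown above.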

### Dessert trends
How many respondents typically have at least one apple, pecan, or pumpkin pie?

In [10]:
```python
# A pie column is NaN when the respondent did not select that pie
apple_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple'])
pumpkin_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin'])
pecan_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan'])
# True when none of the three pies is served
pies = pumpkin_isnull & pecan_isnull & apple_isnull
pies.value_counts()
```

```
False    876
True     104
dtype: int64
```

104 respondents typically have none of these three pies, while 876 have at least one of them (apple, pumpkin, or pecan).
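The same count can be obtained more compactly by selecting the three pie columns together and asking whether any of them is filled in. A sketch on a made-up DataFrame that reuses the survey's pie column names:

```python
import numpy as np
import pandas as pd

prefix = ('Which type of pie is typically served at your Thanksgiving dinner? '
          'Please select all that apply. - ')
pie_cols = [prefix + name for name in ('Apple', 'Pumpkin', 'Pecan')]

# Made-up answers: the pie's name when it is served, NaN otherwise
toy = pd.DataFrame({
    pie_cols[0]: ['Apple', np.nan, np.nan, 'Apple'],
    pie_cols[1]: [np.nan, np.nan, 'Pumpkin', 'Pumpkin'],
    pie_cols[2]: [np.nan, np.nan, np.nan, 'Pecan'],
})

# True when at least one of the three columns is filled in,
# i.e. the negation of "all three are null"
has_any_pie = toy[pie_cols].notna().any(axis=1)
print(has_any_pie.value_counts())
```

Here `True` counts respondents with at least one of the pies, so the labels are the reverse of the `pies` mask used above.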

### Age distribution of respondents

We define a function that keeps the lower bound of each age bracket and converts it to an integer.

In [11]:
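```python
def extract_age(age_str):
    """Return the lower bound of an age bracket as an int, or None if missing."""
    if pd.isnull(age_str):
        return None
    # Keep the first token: '18' from '18 - 29', '60+' from '60+'
    age_str = age_str.split(' ')[0]
    age_str = age_str.replace('+', '')
    return int(age_str)
```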
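`extract_age` handles one value at a time through `.apply`. A vectorized alternative uses `Series.str.extract` with a regular expression. This sketch assumes every bracket label starts with its lower bound written as digits; the sample labels follow the format seen in the `Age` column, but the Series itself is made up:

```python
import pandas as pd

# Bracket labels in the format of the 'Age' column, plus a missing value
ages = pd.Series(['18 - 29', '30 - 44', '45 - 59', '60+', None])

# Capture the leading digits (the bracket's lower bound);
# missing values stay NaN, so the result is a float Series
int_age = ages.str.extract(r'^(\d+)', expand=False).astype(float)
print(int_age.tolist())
```

Missing ages come out as `NaN`, so the column is `float64`, consistent with the dtype reported by `describe()` for `int_age`.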

We apply `extract_age` to the 'Age' column and store the result in a new column, 'int_age'.

In [12]:
```python
data['int_age'] = data["Age"].apply(extract_age)
```

Summary statistics of the ages:

In [13]:
```python
data['int_age'].describe()
```

```
count    947.000000
mean      40.089757
std       15.352014
min       18.000000
25%       30.000000
50%       45.000000
75%       60.000000
max       60.000000
Name: int_age, dtype: float64
```

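Even though ages are only approximate (each respondent is assigned the lower bound of their bracket, so the mean of about 40 understates the true average age), respondents appear fairly evenly spread across the brackets: the quartiles fall at 30, 45 and 60, i.e. on successive bracket boundaries.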
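`describe()` only reports quantiles; counting the share of respondents in each bracket is a more direct check of how evenly they are spread. A sketch on made-up bracket labels (on the real data, `data['Age']` would take the place of `ages`):

```python
import pandas as pd

# Made-up sample of bracket labels
ages = pd.Series(['18 - 29', '30 - 44', '30 - 44', '45 - 59', '60+', '60+', None])

# Share of respondents per bracket; missing ages are ignored (dropna=True by default)
shares = ages.value_counts(normalize=True).sort_index()
print(shares.round(2))
```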

### Income distribution of respondents

In [14]:
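```python
def extract_income(income_str):
    """Return the lower bound of an income bracket as an int, or None."""
    if pd.isnull(income_str):
        return None
    # Keep the first token, e.g. '$75,000' from '$75,000 to $99,999'
    income_str = income_str.split(' ')[0]
    # 'Prefer not to answer' carries no income information
    if income_str == 'Prefer':
        return None
    income_str = income_str.replace('$', '')
    income_str = income_str.replace(',', '')
    return int(income_str)

data['int_income'] = data['How much total combined money did all members of your HOUSEHOLD earn last year?'].apply(extract_income)
```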
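As with ages, the income parsing can be vectorized. This sketch assumes every answered bracket starts with `$` followed by digits and commas; a refusal label such as 'Prefer not to answer' does not match the pattern and becomes `NaN`. The sample Series is made up:

```python
import pandas as pd

# Labels in the survey's income format, plus a refusal and a missing value
incomes = pd.Series(['$0 to $9,999', '$75,000 to $99,999', '$200,000 and up',
                     'Prefer not to answer', None])

# Lower bound of each bracket: the digits and commas right after the leading '$'
int_income = (incomes.str.extract(r'^\$([\d,]+)', expand=False)
                     .str.replace(',', '', regex=False)
                     .astype(float))
print(int_income.tolist())
```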

In [15]:
```python
data['int_income'].describe()
```

```
count       829.000000
mean      75965.018094
std       59068.636748
min           0.000000
25%       25000.000000
50%       75000.000000
75%      100000.000000
max      200000.000000
Name: int_income, dtype: float64
```

### Correlation between travel distance and income

Question: how far do respondents earning less than $150,000 travel?

In [16]:
```python
data_inf_150 = data[data['int_income'] < 150000]
data_inf_150['How far will you travel for Thanksgiving?'].value_counts()
```

```
Thanksgiving is happening at my home--I won't travel at all                         281
Thanksgiving is local--it will take place in the town I live in                     203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150
Thanksgiving is out of town and far away--I have to drive several hours or fly       55
Name: How far will you travel for Thanksgiving?, dtype: int64
```

Comment: 281 respondents host Thanksgiving at home, 203 celebrate it in the town they live in, and 205 (150 + 55) travel out of town.

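Question: how far do respondents earning more than $150,000 travel? Since `int_income` holds the lower bound of each income bracket, the strict comparisons `< 150000` and `> 150000` leave any bracket starting exactly at $150,000 out of both groups.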

In [18]:
```python
data_sup_150 = data[data['int_income'] > 150000]
data_sup_150['How far will you travel for Thanksgiving?'].value_counts()
```

```
Thanksgiving is happening at my home--I won't travel at all                         49
Thanksgiving is local--it will take place in the town I live in                     25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16
Thanksgiving is out of town and far away--I have to drive several hours or fly      12
Name: How far will you travel for Thanksgiving?, dtype: int64
```

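Comment: the two groups differ a lot in size (689 versus 102 respondents), so proportions are more telling than raw counts. About 30% (205/689) of respondents earning less than $150,000 travel out of town, versus about 27% (28/102) of those earning more, while 41% (281/689) versus 48% (49/102) host Thanksgiving at home. This suggests a weak link between lower income and traveling for Thanksgiving. One possible explanation is that lower-income respondents are younger and go back to their family home; confirming it would require taking age into account as well.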
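Both steps of this comment (comparing proportions, then holding age roughly fixed) can be sketched with `pd.crosstab` and `normalize='index'`. The DataFrame below is made up; `travel` is a shortened stand-in for the travel question, and unlike the filters above, this sketch puts the bracket starting at 150000 in the upper group:

```python
import pandas as pd

# Made-up respondents; 'int_income' and 'int_age' mimic the bracket lower bounds
toy = pd.DataFrame({
    'int_income': [25000, 50000, 200000, 175000, 10000, 100000],
    'int_age': [18, 18, 45, 60, 30, 45],
    'travel': ['home', 'far', 'home', 'local', 'far', 'home'],
})

# Two income groups (the bracket starting at 150000 goes to the upper group)
toy['income_group'] = (toy['int_income'] >= 150000).map(
    {False: 'under 150k', True: '150k and up'})

# Share of each travel answer within each income group (each row sums to 1)
shares = pd.crosstab(toy['income_group'], toy['travel'], normalize='index')
print(shares)

# Same shares, split by age bracket as well, to hold age roughly fixed
by_age = pd.crosstab([toy['int_age'], toy['income_group']], toy['travel'],
                     normalize='index')
print(by_age)
```

With only 102 respondents above $150,000 in the real data, the age-by-income cells would be small, so their shares should be read with caution.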

### Correlation between spending Thanksgiving with friends and age

In [19]:
```python
# Mean age for each combination of answers (pivot_table averages by default)
friendsgiving_age = data.pivot_table(index='Have you ever tried to meet up with hometown friends on Thanksgiving night?', columns='Have you ever attended a "Friendsgiving?"', values='int_age')
```

In [20]:
```python
friendsgiving_age
```

| Met up with hometown friends on Thanksgiving night? (rows) / Attended a "Friendsgiving"? (columns) | No | Yes |
|---|---|---|
| No | 42.283702 | 37.010526 |
| Yes | 41.47541 | 33.976744 |


#### Comments
- Respondents who answered "No" to both questions are 42 years old on average.
- Respondents who answered "Yes" to both questions (they have both met up with hometown friends on Thanksgiving night and attended a Friendsgiving) are 34 years old on average.

Younger respondents therefore appear more likely to spend Thanksgiving with friends. The gap follows mainly the Friendsgiving answer (about 37 and 34 for "Yes" versus 42 and 41 for "No"). Since ages are bracket lower bounds, these means are only indicative.
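A mean is easier to interpret next to the number of respondents behind it. Passing a list to `aggfunc` returns both at once. A sketch on made-up data, where `met_friends` and `friendsgiving` are hypothetical short names standing in for the two long survey questions:

```python
import pandas as pd

# Made-up respondents
toy = pd.DataFrame({
    'met_friends': ['No', 'No', 'Yes', 'Yes', 'No', 'Yes'],
    'friendsgiving': ['No', 'Yes', 'Yes', 'No', 'No', 'Yes'],
    'int_age': [45.0, 18.0, 18.0, 30.0, 60.0, 30.0],
})

# Mean age and number of non-missing ages in each cell of the 2x2 table;
# the columns become a MultiIndex: (aggfunc, friendsgiving answer)
summary = toy.pivot_table(index='met_friends', columns='friendsgiving',
                          values='int_age', aggfunc=['mean', 'count'])
print(summary)
```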

### Correlation between spending Thanksgiving with friends and income

In [21]:
```python
# Mean income for each combination of answers (pivot_table averages by default)
friendsgiving_income = data.pivot_table(index='Have you ever tried to meet up with hometown friends on Thanksgiving night?', columns='Have you ever attended a "Friendsgiving?"', values='int_income')
```

In [22]:
```python
friendsgiving_income
```

| Met up with hometown friends on Thanksgiving night? (rows) / Attended a "Friendsgiving"? (columns) | No | Yes |
|---|---|---|
| No | 78914.549654 | 72894.736842 |
| Yes | 78750.0 | 66019.736842 |


#### Comments
- Respondents who answered "No" to both questions earn about $79,000 on average.
- Respondents who answered "Yes" to both questions earn about $66,000 on average.

Lower-income respondents therefore appear more likely to spend Thanksgiving with friends. As incomes are bracket lower bounds, these means understate actual household income.
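For readers more used to `groupby`, the same table of means can be built by grouping on both answers and moving one of them into the columns with `unstack`. A sketch on made-up data with hypothetical short column names:

```python
import pandas as pd

# Made-up respondents
toy = pd.DataFrame({
    'met_friends': ['No', 'No', 'Yes', 'Yes', 'No'],
    'friendsgiving': ['No', 'Yes', 'Yes', 'No', 'No'],
    'int_income': [75000.0, 50000.0, 25000.0, 100000.0, 125000.0],
})

# Mean income per (met_friends, friendsgiving) pair, then one column per
# friendsgiving answer, which is what pivot_table does with its default mean
means = (toy.groupby(['met_friends', 'friendsgiving'])['int_income']
            .mean()
            .unstack('friendsgiving'))
print(means)
```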

In [23]:
```python
%matplotlib inline
```