**In the code below, I use the results of a satisfaction survey of Starbucks customers in Malaysia to illustrate the use of pivot tables in pandas.
Pivot tables are a quick and visually helpful way to see how the data is distributed and which groups and clusters can be formed.**
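The idea can be shown on a few hypothetical rows (not survey data) that reuse the encoded column names from later in this notebook. This is a minimal sketch: `pd.pivot_table` groups the rows by the `index` column and aggregates the `values` column, so it returns the same numbers as an explicit `groupby`.

```python
import pandas as pd

# Hypothetical toy data (NOT from the survey), using the notebook's encoded column names
toy = pd.DataFrame({
    "gender": [0, 0, 1, 1, 1],
    "spendPurchase": [1, 2, 1, 1, 3],
})

# Pivot table: one row per gender, cell = mean of spendPurchase within that gender
pivot = pd.pivot_table(toy, index="gender", values="spendPurchase", aggfunc="mean")

# The same aggregation written as an explicit groupby
grouped = toy.groupby("gender")["spendPurchase"].mean()

print(pivot)
print(grouped)
```

The pivot table form becomes more convenient once several value columns, several aggregation functions, or a second grouping dimension (`columns=`) are involved.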

In [None]:

import numpy as np
import pandas as pd

# List the input files available in the Kaggle environment
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))



Here I use the version of the dataset in which the answers are encoded as numbers, which makes the analysis easier:

In [None]:

df = pd.read_csv('/kaggle/input/starbucks-customer-retention-malaysia-survey/Starbucks satisfactory survey encode cleaned.csv')
df.head()

Let's list the variables in the dataset to see what would be interesting to explore:

In [None]:
for col in df.columns:
    print(col)

First, I would like to see whether males and females differ in how much time they spend in the coffee shop and how much money they spend. The scales are:
- gender: 0 = Male, 1 = Female
- timeSpend (time spent): 0 = below 30 mins, 1 = 30 mins to 1h, 2 = 1h to 2h, 3 = 2h to 3h, 4 = more than 3h
- spendPurchase (amount spent): 0 = zero, 1 = less than RM20 (EUR4), 2 = RM20 to RM40 (EUR4 to EUR8), 3 = more than RM40 (EUR8)
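The numeric codes are what make means in a pivot table possible, but labels are easier to read when displaying counts or distributions. A minimal sketch, assuming the encodings listed above; the dictionaries `GENDER`, `TIME_SPENT`, `SPEND` and the helper `decode` are hypothetical names introduced here for illustration.

```python
import pandas as pd

# Hypothetical label dictionaries, transcribed from the encodings listed above
GENDER = {0: "Male", 1: "Female"}
TIME_SPENT = {0: "Below 30 mins", 1: "30 mins to 1h", 2: "1h to 2h",
              3: "2h to 3h", 4: "More than 3h"}
SPEND = {0: "Zero", 1: "Less than RM20", 2: "RM20 to RM40", 3: "More than RM40"}


def decode(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the encoded answers replaced by readable labels."""
    out = frame.copy()
    out["gender"] = out["gender"].map(GENDER)
    out["timeSpend"] = out["timeSpend"].map(TIME_SPENT)
    out["spendPurchase"] = out["spendPurchase"].map(SPEND)
    return out


# Hypothetical rows standing in for df[['gender', 'timeSpend', 'spendPurchase']]
sample = pd.DataFrame({"gender": [0, 1], "timeSpend": [0, 2], "spendPurchase": [1, 3]})
decoded = decode(sample)
print(decoded)
```

Decoding a copy keeps the original numeric columns intact for computing averages.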

In [None]:
gender_df = df[['gender', 'timeSpend', 'spendPurchase']]
# Mean of each encoded answer and number of respondents per gender
gender_pivot = pd.pivot_table(gender_df, index=['gender'], aggfunc=['mean', len])
gender_pivot  # note: we have an almost equal distribution of genders in the sample

In [None]:
print(df['timeSpend'].value_counts())
print(df['spendPurchase'].value_counts())

Most respondents of both genders spend less than one hour in the coffee shop. On average, males stay somewhat longer and are also more inclined to spend more money.
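The overall `value_counts` are not split by gender, so the per-gender distribution can be checked with `pd.crosstab`, a close relative of `pivot_table` for counting. This is a sketch on hypothetical rows; in the notebook, `df['gender']` and `df['timeSpend']` would be passed instead. It assumes, from the scale above, that "less than one hour" corresponds to codes 0 and 1.

```python
import pandas as pd

# Hypothetical encoded answers (NOT survey data)
sample = pd.DataFrame({
    "gender":    [0, 0, 0, 1, 1, 1, 1],
    "timeSpend": [0, 1, 2, 0, 0, 1, 1],
})

# Share of each time-spent code within each gender; normalize="index" makes rows sum to 1
shares = pd.crosstab(sample["gender"], sample["timeSpend"], normalize="index")
print(shares.round(2))

# Fraction of each gender spending less than one hour (codes 0 and 1)
under_one_hour = shares[[0, 1]].sum(axis=1)
print(under_one_hour)
```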

Next, let's see how the amount people spend relates to the way they rate Starbucks prices:
- priceRate is scaled 1 to 5: 1 = Very Bad, 5 = Excellent


In [None]:
price_df = df[['spendPurchase', 'priceRate']]
# Mean price rating per spending bracket
price_pivot = pd.pivot_table(price_df, index=['spendPurchase'], aggfunc=['mean'])
price_pivot

Not surprisingly, people who spend more also tend to rate the prices more favourably.
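A mean computed over only a handful of respondents can be misleading, so it may help to show the group sizes next to the averages. A minimal sketch on hypothetical rows: passing a list to `aggfunc` produces one column block per function, and `margins=True` appends an `All` row with the overall figures.

```python
import pandas as pd

# Hypothetical encoded answers (NOT survey data); in the notebook use df[['spendPurchase', 'priceRate']]
sample = pd.DataFrame({
    "spendPurchase": [0, 1, 1, 2, 2, 2, 3],
    "priceRate":     [2, 3, 2, 3, 4, 3, 5],
})

# Mean rating and respondent count per spending bracket, plus an overall "All" row
summary = pd.pivot_table(
    sample,
    index="spendPurchase",
    values="priceRate",
    aggfunc=["mean", "count"],
    margins=True,
)
print(summary)
```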
Let's add a split by income:
- income is scaled: 0 = less than RM25,000 (EUR5,000), 1 = RM25,000 to RM50,000 (EUR5,000 to EUR10,000), 2 = RM50,000 to RM100,000 (EUR10,000 to EUR20,000), 3 = RM100,000 to RM150,000 (EUR20,000 to EUR30,000), 4 = more than RM150,000 (EUR30,000)

In [None]:
income_df = df[['income', 'spendPurchase', 'priceRate']]
# Mean spending bracket and mean price rating per income group
income_pivot = pd.pivot_table(income_df, index=['income'], aggfunc=['mean'])
income_pivot

There is a clear difference in spending between people in income groups 1 and 2 (EUR5,000 to EUR20,000) and those in groups 3 and 4 (more than EUR20,000). Opinions about the prices, on the other hand, differ little across income groups, with the exception of the highest income group, which rates them most favourably.
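Since both income and spending seem related to the price rating, a pivot table with income on the rows and the spending bracket on the columns shows the mean rating within each combination. This is a sketch on hypothetical rows; in the notebook, `income_df` would be passed instead. Combinations without any respondents come out as `NaN`.

```python
import pandas as pd

# Hypothetical encoded answers (NOT survey data)
sample = pd.DataFrame({
    "income":        [0, 0, 1, 1, 3, 3, 4],
    "spendPurchase": [1, 2, 1, 1, 2, 3, 3],
    "priceRate":     [2, 3, 3, 4, 3, 4, 5],
})

# Rows: income group; columns: spending bracket; cells: mean price rating
grid = pd.pivot_table(
    sample,
    index="income",
    columns="spendPurchase",
    values="priceRate",
    aggfunc="mean",
)
print(grid)
```

Reading across a row holds income fixed, which helps separate the effect of spending from the effect of income on how prices are rated.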