### Market Segmentation Analysis: Case Study (Fast Food)

The purpose of this case study is to illustrate market segmentation analysis using an empirical data set. The data set comes from the book “Market Segmentation Analysis: Understanding It, Doing It, and Making It Useful” by Sara Dolnicar, Bettina Grün and Friedrich Leisch.

#### Collecting Data

The data set contains responses from 1453 adult Australian consumers relating to their perceptions of McDonald’s with respect to the following attributes: YUMMY, CONVENIENT, SPICY, FATTENING, GREASY, FAST, CHEAP, TASTY, EXPENSIVE, HEALTHY, and DISGUSTING. For each of those attributes, respondents provided either a YES response (indicating that they feel McDonald’s possesses this attribute) or a NO response (indicating that McDonald’s does not possess this attribute). In addition, respondents indicated their AGE and GENDER. Had this data been collected for a real market segmentation study, additional information, such as details about their dining out behaviour and their use of information channels, would have been collected to enable a richer and more detailed description of each market segment.

#### Exploring Data

First, we explore the key characteristics of the data set by loading it and inspecting basic features such as the variable names, the sample size, and the first five rows of the data:

In [1]:
# Importing Libraries
import pandas as pd
import numpy as np

mcdonalds_df = pd.read_csv('dataset/mcdonalds.csv')

In [2]:
# Listing the variable names in the dataset
mcdonalds_df.columns

Index(['yummy', 'convenient', 'spicy', 'fattening', 'greasy', 'fast', 'cheap',
       'tasty', 'expensive', 'healthy', 'disgusting', 'Like', 'Age',
       'VisitFrequency', 'Gender'],
      dtype='object')

In [3]:
# Printing the dimensions of the dataset 
mcdonalds_df.shape

(1453, 15)

In [23]:
# Getting the first five rows of the dataset
mcdonalds_df.head()

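| | yummy | convenient | spicy | fattening | greasy | fast | cheap | tasty | expensive | healthy | disgusting | Like | Age | VisitFrequency | Gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | No | Yes | No | Yes | No | Yes | Yes | No | Yes | No | No | -3 | 61 | Every three months | Female |
| 1 | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | No | No | 2 | 51 | Every three months | Female |
| 2 | No | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No | 1 | 62 | Every three months | Female |
| 3 | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | No | No | Yes | 4 | 69 | Once a week | Female |
| 4 | No | Yes | No | Yes | Yes | Yes | Yes | No | No | Yes | No | 2 | 49 | Once a month | Male |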


As we can see from the output, the first respondent believes that McDonald’s is not yummy, convenient, not spicy, fattening, not greasy, fast, cheap, not tasty, expensive, not healthy and not disgusting. This same respondent does not like McDonald’s (rating of −3), is 61 years old, eats at McDonald’s every three months and is female.

This quick glance at the data shows that the segmentation variables (perceptions of McDonald’s) are verbal, not numeric: they are coded using the words YES and NO. This is not a suitable format for segment extraction, which needs numbers rather than words. To get numbers, we slice out the segmentation variables and convert them from verbal YES/NO to numeric binary.

In [13]:
from sklearn.preprocessing import OrdinalEncoder
from sklearn.compose import make_column_transformer

# The first 11 columns hold the Yes/No perception variables
cat_col_names = mcdonalds_df.columns[:11]

# Encode No/Yes as 0/1 and pass the remaining columns through unchanged
mcdonalds_dfTransformed = make_column_transformer(
    (OrdinalEncoder(dtype='int'), cat_col_names),
    remainder='passthrough')

In [31]:
df_transformed = pd.DataFrame(mcdonalds_dfTransformed.fit_transform(mcdonalds_df), columns=mcdonalds_df.columns)

In [43]:
# Checking the mean of the segmentation variables
df_transformed[cat_col_names].mean()

yummy         0.552650
convenient    0.907777
spicy         0.093599
fattening     0.867171
greasy        0.526497
fast          0.900206
cheap         0.598761
tasty         0.644184
expensive     0.357880
healthy       0.198899
disgusting    0.242946
dtype: float64

The average values of the transformed binary numeric segmentation variables indicate that about half of the respondents (55%) perceive McDonald’s as YUMMY, 91% believe that eating at McDonald’s is CONVENIENT, but only 9% think that McDonald’s food is SPICY.
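The same conversion and the reading of the means as percentages can be sketched with pandas alone. The three-row frame below is made up for illustration and only mimics the survey's Yes/No format. `OrdinalEncoder` sorts categories alphabetically, so it also maps "No" to 0 and "Yes" to 1.

```python
import pandas as pd

# Hypothetical three-row sample in the same Yes/No format as the survey data.
sample = pd.DataFrame({
    "yummy": ["No", "Yes", "No"],
    "convenient": ["Yes", "Yes", "Yes"],
    "Age": [61, 51, 62],
})

perception_cols = ["yummy", "convenient"]

# Comparing against "Yes" gives booleans; astype(int) turns them into 0/1.
binary = sample[perception_cols].eq("Yes").astype(int)

# The mean of a 0/1 column is the share of respondents who answered "Yes".
yes_share = binary.mean()

print(binary)
print(yes_share)
```

Because each column only contains 0 and 1, a mean of 0.55 for a column reads directly as "55% answered YES".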

In [50]:
from sklearn.decomposition import PCA
mc_donalds_pca = PCA(n_components=11)
pca_components = mc_donalds_pca.fit_transform(df_transformed[cat_col_names])
df_components = pd.DataFrame(data=pca_components, columns=["PC"+str(i) for i in range(1,12)])

In [91]:
# Gathering data for checking the importance of components
std_components = df_components.std()
var_components = mc_donalds_pca.explained_variance_ratio_
cumvar_components = var_components.cumsum()
imp_components = pd.DataFrame()
imp_components["Standard Deviation"] = std_components
imp_components["Proportion of Variance"] = var_components
imp_components["Cumulative Proportion"] = cumvar_components
imp_components

| | Standard Deviation | Proportion of Variance | Cumulative Proportion |
|---|---|---|---|
| PC1 | 0.75705 | 0.299447 | 0.299447 |
| PC2 | 0.607456 | 0.192797 | 0.492244 |
| PC3 | 0.504619 | 0.133045 | 0.62529 |
| PC4 | 0.398799 | 0.083096 | 0.708386 |
| PC5 | 0.337405 | 0.059481 | 0.767866 |
| PC6 | 0.310275 | 0.0503 | 0.818166 |
| PC7 | 0.289697 | 0.043849 | 0.862015 |
| PC8 | 0.275122 | 0.039548 | 0.901563 |
| PC9 | 0.265251 | 0.036761 | 0.938323 |
| PC10 | 0.248842 | 0.032353 | 0.970677 |


Results from principal components analysis indicate that the first two components capture about 50% of the information contained in the segmentation variables.
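To make the three columns of the importance table concrete, the sketch below recomputes them by hand. It assumes the usual definition of PCA as an eigen-decomposition of the sample covariance matrix and runs on synthetic 0/1 data, not on the survey responses. The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 0/1 answers (200 respondents, 4 attributes); not the survey data.
X = (rng.random((200, 4)) < [0.55, 0.9, 0.1, 0.6]).astype(float)

# PCA centres each column and diagonalises the sample covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)        # sample covariance (ddof=1)
eigenvalues = np.linalg.eigvalsh(cov)[::-1]   # eigvalsh is ascending; reverse it

std_dev = np.sqrt(eigenvalues)                # "Standard Deviation"
prop_var = eigenvalues / eigenvalues.sum()    # "Proportion of Variance"
cum_prop = prop_var.cumsum()                  # "Cumulative Proportion"

print(np.column_stack([std_dev, prop_var, cum_prop]).round(4))
```

The squared standard deviation of a component divided by the sum of all squared standard deviations gives its proportion of variance, which is why the cumulative column ends at 1 once every component is included.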

In [98]:
# Rows of components_ are principal components; columns are the original variables
df_components_factor = pd.DataFrame(data=mc_donalds_pca.components_, index=["PC"+str(i) for i in range(1,12)], columns=cat_col_names)

In [102]:
# Transpose so that each row is a variable and each column a component
df_components_factor.transpose()

| | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 | PC10 | PC11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| yummy | -0.476933 | 0.36379 | -0.304444 | 0.055162 | -0.307535 | 0.170738 | -0.280519 | 0.013041 | 0.572403 | -0.110284 | 0.045439 |
| convenient | -0.155332 | 0.016414 | -0.062515 | -0.142425 | 0.277608 | -0.34783 | -0.059738 | -0.113079 | -0.018465 | -0.665818 | -0.541616 |
| spicy | -0.006356 | 0.018809 | -0.037019 | 0.197619 | 0.07062 | -0.355087 | 0.707637 | 0.375934 | 0.40028 | -0.075634 | 0.14173 |
| fattening | 0.116232 | -0.034094 | -0.322359 | -0.354139 | -0.073405 | -0.406515 | -0.385943 | 0.589622 | -0.160512 | -0.005338 | 0.25091 |
| greasy | 0.304443 | -0.063839 | -0.802373 | 0.25396 | 0.361399 | 0.209347 | 0.03617 | -0.138241 | -0.002847 | 0.008707 | 0.001642 |
| fast | -0.108493 | -0.086972 | -0.064642 | -0.097363 | 0.10793 | -0.594632 | -0.086846 | -0.627799 | 0.166197 | 0.239532 | 0.339265 |
| cheap | -0.337186 | -0.610633 | -0.14931 | 0.118958 | -0.128973 | -0.103241 | -0.040449 | 0.14006 | 0.076069 | 0.428087 | -0.489283 |
| tasty | -0.471514 | 0.307318 | -0.287265 | -0.002547 | -0.210899 | -0.076914 | 0.360453 | -0.072792 | -0.639086 | 0.079184 | 0.019552 |
| expensive | 0.329042 | 0.601286 | 0.024397 | 0.067816 | -0.003125 | -0.261342 | -0.068385 | 0.029539 | 0.066996 | 0.454399 | -0.490069 |
| healthy | -0.213711 | 0.076593 | 0.192051 | 0.763488 | 0.287846 | -0.178226 | -0.349616 | 0.176303 | -0.185572 | -0.038117 | 0.157608 |
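When reading a loadings table it helps to be explicit about which axis holds the original variables and which holds the components. The sketch below builds a small labelled loadings table with NumPy on synthetic data (three made-up 0/1 attributes, not the survey responses). Note that scikit-learn's `components_` stores the transpose of this layout, with one row per component and one column per input variable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
features = ["yummy", "convenient", "spicy"]

# Synthetic 0/1 answers for three attributes; not the survey data.
X = (rng.random((300, 3)) < [0.55, 0.9, 0.1]).astype(float)

# Eigenvectors of the covariance matrix are the principal axes.
cov = np.cov(X, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)   # ascending eigenvalues
order = np.argsort(eigenvalues)[::-1]             # largest variance first
eigenvectors = eigenvectors[:, order]

# Each column of `eigenvectors` is one component; each row is one attribute.
loadings = pd.DataFrame(
    eigenvectors,
    index=features,
    columns=[f"PC{i}" for i in range(1, len(features) + 1)],
)

print(loadings.round(3))
```

The sign of each component is arbitrary, so two correct implementations can report loadings that differ by a factor of -1 per column.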


In [110]:
import matplotlib.pyplot as plt


