Conjoint Analysis - Step by Step

1. Rank the stimuli together, as one set
2. Transform each row of the stimulus set into dummy variables
3. Estimate the main effects with linear regression (if the response is a ranking), or with logit or probit if the decision is binary (1 = choose, 0 = not choose)
4. The dependent variable (y) is the rank
5. The independent variables (X) are the stimuli in dummy form


In [1]:
import numpy as np
import pandas as pd

In [2]:
caInputeDF = pd.read_csv("ConjointInput.csv", sep = ";")

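Input: a ranking of stimuli built from 3 features (TV size, TV type, TV color) with 3 levels (32", 37", 42"), 2 levels (Plasma, LCD) and 3 levels (Silver, Black, Anthrazit) respectively. Each stimulus is coded by its level codes A1-A3, B1-B2 and C1-C3, e.g. A1B1C1 = 32", Plasma, Silver.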
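The 18 stimuli are the full factorial of the three features (3 × 2 × 3 = 18). A minimal sketch that generates the stimulus codes in the same order as the `Stimulus` column, assuming the level codes A1-A3, B1-B2 and C1-C3 described above:

```python
import itertools

# Level codes per feature: A = size (3 levels), B = type (2), C = color (3)
levels = {
    "A": ["A1", "A2", "A3"],
    "B": ["B1", "B2"],
    "C": ["C1", "C2", "C3"],
}

# Full factorial design: every combination of one level per feature.
# itertools.product varies the last feature fastest: A1B1C1, A1B1C2, ...
stimuli = ["".join(combo) for combo in itertools.product(*levels.values())]

print(len(stimuli))   # 18
print(stimuli[:4])    # ['A1B1C1', 'A1B1C2', 'A1B1C3', 'A1B2C1']
```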

In [3]:

caInputeDF

,Stimulus,Rank
0,A1B1C1,2
1,A1B1C2,3
2,A1B1C3,1
3,A1B2C1,5
4,A1B2C2,6
5,A1B2C3,4
6,A2B1C1,8
7,A2B1C2,9
8,A2B1C3,7
9,A2B2C1,11


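## First step: introduce a dummy variable for every stimulus level
There are 8 stimulus levels in total (3 sizes + 2 types + 3 colors) and 3 × 2 × 3 = 18 combinations, so the dummy table has 18 rows and 9 columns (Rank plus one column per level).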


In [4]:
# 18 stimuli x (Rank + 8 level dummies), initialised to 0
ConjointDummyDF = pd.DataFrame(np.zeros((18, 9)),
                               columns=["Rank", "A1", "A2", "A3",
                                        "B1", "B2",
                                        "C1", "C2", "C3"])

In [5]:
# Copy the ranks over; both frames share the same 0..17 row index
ConjointDummyDF["Rank"] = caInputeDF["Rank"]

for index, row in caInputeDF.iterrows():
    # "A1B1C1" -> "A1", "B1", "C1": one level code per feature
    code = row["Stimulus"]
    stimuli1, stimuli2, stimuli3 = code[:2], code[2:4], code[4:6]
    # Set the dummy of each level present in this stimulus to 1
    ConjointDummyDF.loc[index, [stimuli1, stimuli2, stimuli3]] = 1



In [6]:
ConjointDummyDF.head()

,Rank,A1,A2,A3,B1,B2,C1,C2,C3
0,2,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0
1,3,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0
2,1,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0
3,5,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0
4,6,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
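The row-by-row loop can also be replaced by `pandas.get_dummies`, applied to each 2-character slice of the stimulus code. A sketch, assuming every code is exactly three 2-character level codes; the codes are regenerated here as a hypothetical stand-in for `caInputeDF["Stimulus"]`, so the example runs without ConjointInput.csv:

```python
import itertools
import pandas as pd

# Hypothetical stand-in for caInputeDF["Stimulus"]: the 18 codes in file order
codes = ["".join(c) for c in itertools.product(["A1", "A2", "A3"],
                                                ["B1", "B2"],
                                                ["C1", "C2", "C3"])]
stimulus = pd.Series(codes, name="Stimulus")

# Characters 0-1, 2-3 and 4-5 hold the size, type and color level codes;
# get_dummies turns each slice into one 0/1 column per level
dummies = pd.concat(
    [pd.get_dummies(stimulus.str[i:i + 2]) for i in (0, 2, 4)], axis=1
).astype(float)

print(dummies.head())
```

Each row has exactly three ones, one per feature. In the notebook, `caInputeDF["Stimulus"]` would take the place of `stimulus`, with the `Rank` column added in front.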


## Insert the proper Stimulus names

In [7]:
fullNames = {
    "Rank": "Rank",
    "A1": "32\" (81cm)", "A2": "37\" (94cm)", "A3": "42\" (107cm)",
    "B1": "Plasma", "B2": "LCD",
    "C1": "Silver", "C2": "Black", "C3": "Anthrazit",
}

ConjointDummyDF.rename(columns=fullNames, inplace=True)

In [8]:
ConjointDummyDF.head()

,Rank,"32"" (81cm)","37"" (94cm)","42"" (107cm)",Plasma,LCD,Silver,Black,Anthrazit
0,2,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0
1,3,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0
2,1,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0
3,5,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0
4,6,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0


## Estimate Main Effects with a linear regression

Besides linear regression, there are other ways to estimate the parameters, depending on what kind of rating you have.
For example, use Probit or Logit if the output is not a rank but a decision (1 = stimulus chosen, 0 = not chosen).

In [9]:
import statsmodels.api as sm


In [10]:
ConjointDummyDF.columns

Index(['Rank', '32" (81cm)', '37" (94cm)', '42" (107cm)', 'Plasma', 'LCD',
       'Silver', 'Black', 'Anthrazit'],
      dtype='object')

In [11]:
# Independent variables: the eight level dummies, plus a constant ("const")
X = ConjointDummyDF[['32" (81cm)', '37" (94cm)', '42" (107cm)', 'Plasma',
                     'LCD', 'Silver', 'Black', 'Anthrazit']]
X = sm.add_constant(X)
# Dependent variable: the rank
Y = ConjointDummyDF.Rank

In [12]:
Y

0      2
1      3
2      1
3      5
4      6
5      4
6      8
7      9
8      7
9     11
10    12
11    10
12    14
13    15
14    13
15    17
16    18
17    16
Name: Rank, dtype: int64

In [13]:
X

,const,"32"" (81cm)","37"" (94cm)","42"" (107cm)",Plasma,LCD,Silver,Black,Anthrazit
0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0
1,1.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0
2,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0
3,1.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0
4,1.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
5,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
6,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0
7,1.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0
8,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0
9,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0


In [14]:

linearRegression = sm.OLS(Y, X).fit()
linearRegression.summary()


Dep. Variable:,Rank,R-squared:,1.0
Model:,OLS,Adj. R-squared:,1.0
Method:,Least Squares,F-statistic:,2.376e+30
Date:,"Wed, 06 Jun 2018",Prob (F-statistic):,1.56e-179
Time:,16:03:43,Log-Likelihood:,566.43
No. Observations:,18,AIC:,-1121.0
Df Residuals:,12,BIC:,-1116.0
Df Model:,5,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,4.3846,6.95e-16,6.31e+15,0.000,4.385,4.385
"32"" (81cm)",-4.5385,2.14e-15,-2.12e+15,0.000,-4.538,-4.538
"37"" (94cm)",1.4615,2.14e-15,6.83e+14,0.000,1.462,1.462
"42"" (107cm)",7.4615,2.14e-15,3.48e+15,0.000,7.462,7.462
Plasma,0.6923,1.54e-15,4.48e+14,0.000,0.692,0.692
LCD,3.6923,1.54e-15,2.39e+15,0.000,3.692,3.692
Silver,1.4615,2.14e-15,6.83e+14,0.000,1.462,1.462
Black,2.4615,2.14e-15,1.15e+15,0.000,2.462,2.462
Anthrazit,0.4615,2.14e-15,2.16e+14,0.000,0.462,0.462

Omnibus:,2.105,Durbin-Watson:,0.545
Prob(Omnibus):,0.349,Jarque-Bera (JB):,1.403
Skew:,0.451,Prob(JB):,0.496
Kurtosis:,1.971,Cond. No.,9.92e+16


## Part-worth values & relative importance of the features

Importance of a feature = max(beta) - min(beta), over the coefficients of that feature's levels

Relative importance of a feature = importance of the feature / sum of the importances of all features
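As a worked check, plugging the coefficients from the regression summary (rounded to four decimals) into these two formulas gives:

$$
\begin{aligned}
I_{\text{size}}  &= 7.4615 - (-4.5385) = 12, \\
I_{\text{type}}  &= 3.6923 - 0.6923 = 3, \\
I_{\text{color}} &= 2.4615 - 0.4615 = 2, \\
RI_{\text{size}} &= \frac{12}{12 + 3 + 2} \approx 70.6\,\%, \quad
RI_{\text{type}} = \frac{3}{17} \approx 17.6\,\%, \quad
RI_{\text{color}} = \frac{2}{17} \approx 11.8\,\%.
\end{aligned}
$$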

In [15]:
# Group the regression coefficients by feature. The first letter of each
# level code identifies the feature: A = size, B = type, C = color.
rangePerFeature = []
for feature in ["A", "B", "C"]:
    levels = [fullNames[code] for code in fullNames if code.startswith(feature)]
    rangePerFeature.append(linearRegression.params[levels].tolist())

In [28]:
rangePerFeature

[[-4.538461538461538, 1.4615384615384652, 7.4615384615384555],
 [0.6923076923076924, 3.6923076923076934],
 [1.4615384615384597, 2.4615384615384617, 0.46153846153846273]]

In [41]:
linearRegression.params

const          4.384615
32" (81cm)    -4.538462
37" (94cm)     1.461538
42" (107cm)    7.461538
Plasma         0.692308
LCD            3.692308
Silver         1.461538
Black          2.461538
Anthrazit      0.461538
dtype: float64

In [16]:
# Importance of a feature = range (max - min) of its coefficients
importance = [max(item) - min(item) for item in rangePerFeature]

# Relative importance in percent, rounded to one decimal
relative_importance = [round(100 * item / sum(importance), 1) for item in importance]

In [33]:
[round(item, 2) for item in importance], relative_importance

([12.0, 3.0, 2.0], [70.6, 17.6, 11.8])

In [17]:

# Print the coefficients grouped by feature. Position 0 is the constant;
# positions 1-3 are the sizes, 4-5 the types and 6-8 the colors.
item_levels = [1, 4, 6, 9]

for i in range(1, 4):
    part_worth_range = linearRegression.params.iloc[item_levels[i-1]:item_levels[i]]
    print(part_worth_range)

32" (81cm)    -4.538462
37" (94cm)     1.461538
42" (107cm)    7.461538
dtype: float64
Plasma    0.692308
LCD       3.692308
dtype: float64
Silver       1.461538
Black        2.461538
Anthrazit    0.461538
dtype: float64


In [18]:
# Mean rank of all stimuli that contain a given level
meanRank = []
for i in ConjointDummyDF.columns[1:]:
    newmeanRank = ConjointDummyDF["Rank"].loc[ConjointDummyDF[i] == 1].mean()
    meanRank.append(newmeanRank)

# The total mean, or "basic utility", is used as the "zero alternative"
totalMeanRank = sum(meanRank) / len(meanRank)

# Part-worth of a level = its mean rank minus the total mean
partWorths = {}
for name, rank in zip(ConjointDummyDF.columns[1:], meanRank):
    partWorths[name] = rank - totalMeanRank

In [27]:
meanRank

[3.5, 9.5, 15.5, 8.0, 11.0, 9.5, 10.5, 8.5]

In [19]:
partWorths

{'32" (81cm)': -6.0,
 '37" (94cm)': 0.0,
 '42" (107cm)': 6.0,
 'Anthrazit': -1.0,
 'Black': 1.0,
 'LCD': 1.5,
 'Plasma': -1.5,
 'Silver': 0.0}
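The same mean-rank part-worths can be computed with one `groupby` per feature. A sketch that rebuilds the 18 stimuli and the ranks printed for `Y` above; the column names `size`, `type`, `color` are illustrative. In this balanced design the grand mean of all ranks (9.5) equals the average of the eight level means used as `totalMeanRank`.

```python
import itertools
import pandas as pd

# Ranks as printed for Y; stimuli ordered A (size), B (type), C (color)
ranks = [2, 3, 1, 5, 6, 4, 8, 9, 7, 11, 12, 10, 14, 15, 13, 17, 18, 16]
rows = itertools.product(['32" (81cm)', '37" (94cm)', '42" (107cm)'],
                         ["Plasma", "LCD"],
                         ["Silver", "Black", "Anthrazit"])
data = pd.DataFrame(list(rows), columns=["size", "type", "color"])
data["Rank"] = ranks

# Grand mean of the ranks: the "zero alternative"
grand_mean = data["Rank"].mean()

# Part-worth of a level = mean rank of the stimuli with that level - grand mean
part_worths = {}
for feature in ["size", "type", "color"]:
    level_means = data.groupby(feature)["Rank"].mean()
    for level, mean_rank in level_means.items():
        part_worths[level] = mean_rank - grand_mean

print(part_worths)
```

The result matches `partWorths` above. These values are also the regression coefficients centred within each feature, e.g. 7.4615 - 1.4615 = 6 for 42".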

### Summary & Results

In [20]:
# Importance of a feature = range of its coefficients
features = {"Monitor Size": ['32" (81cm)', '37" (94cm)', '42" (107cm)'],
            "Type of Monitor": ["Plasma", "LCD"],
            "Color of TV": ["Silver", "Black", "Anthrazit"]}
featureImportance = {name: linearRegression.params[lv].max() - linearRegression.params[lv].min()
                     for name, lv in features.items()}

print("Relative Importance of Feature:\n")
for name, value in featureImportance.items():
    print(f"{name}: {100 * value / sum(featureImportance.values()):.1f} %")
print("\n" + "--" * 30)
print("Importance of Feature:\n")
for name, value in featureImportance.items():
    print(f"{name}: {value:.2f}")

Relative Importance of Feature:

Monitor Size: 70.6 %
Type of Monitor: 17.6 %
Color of TV: 11.8 %

------------------------------------------------------------
Importance of Feature:

Monitor Size: 12.00
Type of Monitor: 3.00
Color of TV: 2.00


What would be the optimal product bundle? Taking the level with the highest coefficient in each feature:
42", LCD, Black

In [20]:
# Encode the bundle like a row of X, in the column order
# const, 32", 37", 42", Plasma, LCD, Silver, Black, Anthrazit.
# The constant must be included!

optBundle = [1, 0, 0, 1, 0, 1, 0, 1, 0]
print("The best possible combination of stimuli would have the highest rank:",
      linearRegression.predict(optBundle)[0])

The best possible combination of stimuli would have the highest rank: 17.999999999999993
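Instead of reading the best levels off the coefficient table by hand, the bundle can be picked with `idxmax` within each feature. A sketch that hard-codes the coefficients printed by `linearRegression.params` (rounded to six decimals); in the notebook one would pass `linearRegression.params` directly. The feature names in `features` are illustrative.

```python
import pandas as pd

# Coefficients as printed by linearRegression.params (6 decimals)
params = pd.Series({
    "const": 4.384615,
    '32" (81cm)': -4.538462, '37" (94cm)': 1.461538, '42" (107cm)': 7.461538,
    "Plasma": 0.692308, "LCD": 3.692308,
    "Silver": 1.461538, "Black": 2.461538, "Anthrazit": 0.461538,
})

features = {
    "size": ['32" (81cm)', '37" (94cm)', '42" (107cm)'],
    "type": ["Plasma", "LCD"],
    "color": ["Silver", "Black", "Anthrazit"],
}

# Best level per feature = the level with the highest coefficient
best = {name: params[levels].idxmax() for name, levels in features.items()}

# Predicted rank of the bundle: constant + the chosen levels' coefficients
predicted = params["const"] + sum(params[level] for level in best.values())

print(best)
print(round(predicted, 4))
```

Because only within-feature differences are identified, `idxmax` per feature gives the same bundle whichever of the equivalent coefficient solutions is used.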


Or, using the part-worths:

In [21]:
#Optimal Bundle:
#42", LCD, Black

optimalWorth = partWorths["42\" (107cm)"] + partWorths["LCD"] + partWorths["Black"]

print ("Choosing the optimal Combination brings the user an additional ", optimalWorth, "'units' of utility")

Choosing the optimal Combination brings the user an additional  8.5 'units' of utility
