A PoC for a fractional attribution model

Markov Chains for Attribution Modeling

This is a proof of concept that uses a first-order Markov chain to reallocate conversions, following the approach described by Eva Anderl, Ingo Becker, Florian von Wangenheim, and Jan Hendrik Schumann in "Mapping the Customer Journey: A Graph-Based Framework for Online Attribution Modeling".

If this concept is new to you, check out the post on Markov chain attribution modeling.

In this package we simulate user journeys from the transition matrix to estimate the conversion rate with each channel removed (the removal effect). Run time grows with the number of simulations, because the simulation is repeated for every channel or tactic that you pass to the model. Larger datasets, with more possible transitions between channels, need more simulations to give stable results. I have found that for a month or so of real website data, niter=5000 tends to give reasonable results, though it takes a minute or two to run.
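To make the simulation idea concrete, here is a minimal sketch in plain Python. It is not the package's internal code: the function names (`transition_probs`, `simulate_conversion_rate`, `removal_effect`), the `max_steps` cap, and the fixed seed are all assumptions made for illustration. A journey that reaches the removed channel is treated as lost, and the removal effect is the share of conversions that disappears as a result.

```python
import random
from collections import Counter, defaultdict


def transition_probs(paths):
    """First-order transition probabilities from ' > '-delimited path strings."""
    counts = defaultdict(Counter)
    for path in paths:
        states = path.split(" > ")
        for current, following in zip(states, states[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


def simulate_conversion_rate(probs, niter, removed=None, max_steps=100, seed=0):
    """Share of simulated journeys that end in 'conv'.

    Reaching the `removed` channel sends the journey straight to 'null'.
    Journeys still running after `max_steps` count as non-converting.
    """
    rng = random.Random(seed)
    conversions = 0
    for _ in range(niter):
        state = "start"
        for _step in range(max_steps):
            options = probs[state]
            state = rng.choices(list(options), weights=list(options.values()))[0]
            if state == removed:
                state = "null"
            if state in ("conv", "null"):
                break
        conversions += state == "conv"
    return conversions / niter


def removal_effect(probs, channel, niter):
    """Fraction of conversions lost when `channel` is removed."""
    base = simulate_conversion_rate(probs, niter)
    if base == 0:
        return 0.0
    without = simulate_conversion_rate(probs, niter, removed=channel)
    return 1 - without / base
```

Because one full simulation runs per channel, the cost scales roughly with niter times the number of channels, which is why larger values of niter take noticeably longer.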


There is an amazing R package called ChannelAttribution that handles this, as well as higher-order models, very well. My day-to-day workflow is centered around Python, so I wanted to build a version to A) have something I can go to for connecting directly to SQL tables to build journey paths and B) better understand the process by which these attribution models are generated. ChannelAttribution makes it very easy to faceroll a fractional attribution model without really understanding what's going on. Which is great! But if you want to understand it better, building your own tends to help.

To get started quickly you can install via pip.


pip install markov-model-attribution


  • This package currently accepts a single-column Pandas dataframe.
  • Each path should begin with "start" and end with either "conv" or "null".
  • Each path should be delimited by " > "
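If your data starts out as one row per touchpoint, something like the following could produce the expected single-column format. This is only a sketch: the input column names (`user_id`, `timestamp`, `channel`), the `converters` set, and the `build_paths` helper are hypothetical and not part of the package.

```python
import pandas as pd


def build_paths(events, converters):
    """Collapse touchpoint rows into one ' > '-delimited path per user.

    events: DataFrame with hypothetical columns user_id, timestamp, channel.
    converters: set of user_ids whose journey ended in a conversion.
    """
    ordered = events.sort_values(["user_id", "timestamp"])
    # Join each user's channels in time order.
    journeys = ordered.groupby("user_id")["channel"].agg(" > ".join)
    ends = ["conv" if user in converters else "null" for user in journeys.index]
    paths = [f"start > {journey} > {end}" for journey, end in zip(journeys, ends)]
    return pd.DataFrame({"Paths": paths})


events = pd.DataFrame({
    "user_id": [1, 1, 2],
    "timestamp": [2, 1, 1],
    "channel": ["sem", "pro", "rem"],
})
df = build_paths(events, converters={1})
# df["Paths"] holds 'start > pro > sem > conv' and 'start > rem > null'
```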

The arguments to pass are niter and paths: niter is an integer giving the number of simulations to run, and paths is the Pandas dataframe containing your paths.

import markov_model_attribution as mma
import pandas as pd

# generate a sample dataset
df = pd.DataFrame({'Paths':['start > rem > rem > conv',
                           'start > pro > sem > conv',
                           'start > pro > null',
                           'start > sem > conv',
                           'start > pro > pro > sem > rem > conv',
                           'start > pro > pro > null',
                           'start > aff > rem > conv',
                           'start > pro > pro > null',
                           'start > sem > sem > null']})

model = mma.run_model(niter=500, paths=df)

Once you have the model constructed you can access a couple of things to compare how a fractional model does against a standard last touch model.

You can access these via


# This outputs a dictionary containing the markov conversion count
# {'pro': 1, 'rem': 1, 'sem': 2}

# This outputs the last touch conversions for comparison
# {'pro': 0, 'rem': 2, 'sem': 2}
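For intuition about how the two counts differ, here is a rough sketch of both calculations. Last touch credits the channel immediately before "conv". The fractional allocation shown is the usual one for Markov attribution, where total conversions are split in proportion to each channel's removal effect; I am assuming, not asserting, that this matches what the package does internally. Both function names are made up for illustration.

```python
from collections import Counter


def last_touch_conversions(paths):
    """Credit each conversion to the channel right before 'conv'."""
    counts = Counter()
    for path in paths:
        states = path.split(" > ")
        if states[-1] == "conv" and len(states) > 2:
            counts[states[-2]] += 1
    return dict(counts)


def allocate_conversions(removal_effects, total_conversions):
    """Split total conversions in proportion to each removal effect."""
    total_effect = sum(removal_effects.values())
    return {
        channel: total_conversions * effect / total_effect
        for channel, effect in removal_effects.items()
    }
```

A channel that often appears early in converting paths but rarely last gets no last touch credit, yet can still earn a share under the proportional split.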

You can also access the removal effect matrix of the underlying result.


#        paths      prob  minus_pro  minus_rem  minus_sem
# 0   pro>null  0.333333   0.000000   0.333333   0.333333
# 1    pro>pro  0.333333   0.000000   0.333333   0.333333
# 2    pro>sem  0.333333   0.000000   0.333333   0.333333
# 3   rem>conv  0.666667   0.666667   0.000000   0.666667
# 4    rem>rem  0.333333   0.333333   0.000000   0.333333
# 5   sem>conv  0.666667   0.666667   0.666667   0.000000
# 6    sem>rem  0.333333   0.333333   0.333333   0.000000
# 7  start>pro  0.666667   0.666667   0.666667   0.666667
# 8  start>rem  0.166667   0.166667   0.166667   0.166667
# 9  start>sem  0.166667   0.166667   0.166667   0.166667
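Judging from the sample above, each minus_ column appears to be the transition probability with every transition leaving the removed channel set to zero. A table with that layout could be built roughly like this; `removal_matrix` is a hypothetical helper and the zeroing rule is my reading of the output, not confirmed package behavior.

```python
from collections import Counter

import pandas as pd


def removal_matrix(paths):
    """Transition probabilities plus one zeroed-out column per channel."""
    pair_counts = Counter()
    for path in paths:
        states = path.split(" > ")
        pair_counts.update(zip(states, states[1:]))

    rows = pd.DataFrame(
        [(src, dst, n) for (src, dst), n in pair_counts.items()],
        columns=["source", "target", "count"],
    )
    # Normalize counts within each source state.
    rows["prob"] = rows["count"] / rows.groupby("source")["count"].transform("sum")

    out = pd.DataFrame({
        "paths": rows["source"] + ">" + rows["target"],
        "prob": rows["prob"],
    })
    for channel in sorted(set(rows["source"]) - {"start"}):
        # Keep the probability unless the transition leaves the removed channel.
        out["minus_" + channel] = rows["prob"].where(rows["source"] != channel, 0.0)
    return out.sort_values("paths").reset_index(drop=True)
```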