# Cegpy 
### Example 5: Customisation of Agglomerative Hierarchical Clustering (AHC) algorithm parameters

 
Firstly, let's look at the default behaviour. To do that, we will need to create a staged tree object and run the AHC algorithm on it.

In [1]:
from cegpy import StagedTree
import pandas as pd
%pip install openpyxl

dataframe = pd.read_excel("falls.xlsx")
staged_tree = StagedTree(dataframe)

staged_tree.calculate_AHC_transitions()


{'Merged Situations': [('s9', 's5'),
  ('s7', 's11', 's13', 's22'),
  ('s25', 's8', 's16', 's12'),
  ('s24', 's23', 's15', 's14'),
  ('s6', 's10'),
  ('s26', 's17'),
  ('s0',),
  ('s1',),
  ('s2',),
  ('s3',),
  ('s4',)],
 'Log Likelihood': -68671.59685606208}

We can then access the default settings that the AHC algorithm used above. Note that the default settings are fixed by the structure of the tree; the edge data does not affect them.

The hyperstage is a list of lists whose elements are nodes from the tree, such that only nodes that are in the same sublist can be considered for merging in the AHC. The sublists do not have to be mutually exclusive.

In [2]:
# As no hyperstage was given, this is the default hyperstage
staged_tree.hyperstage

[['s0'],
 ['s1', 's2', 's3', 's4'],
 ['s5', 's9'],
 ['s6', 's10'],
 ['s7',
  's8',
  's11',
  's12',
  's13',
  's14',
  's15',
  's16',
  's17',
  's22',
  's23',
  's24',
  's25',
  's26']]
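
To make the merging rule concrete, here is a minimal pure-Python sketch (it does not call cegpy). The helper `can_merge` is a hypothetical name introduced for illustration only; it encodes the rule stated above, that two nodes are candidates for merging only if some sublist contains both.

```python
# Default hyperstage copied from the output above.
hyperstage = [
    ["s0"],
    ["s1", "s2", "s3", "s4"],
    ["s5", "s9"],
    ["s6", "s10"],
    ["s7", "s8", "s11", "s12", "s13", "s14", "s15",
     "s16", "s17", "s22", "s23", "s24", "s25", "s26"],
]


def can_merge(hyperstage, node_a, node_b):
    """Hypothetical helper: True if some sublist contains both nodes."""
    return any(node_a in sublist and node_b in sublist for sublist in hyperstage)


print(can_merge(hyperstage, "s5", "s9"))  # True: both are in ['s5', 's9']
print(can_merge(hyperstage, "s5", "s6"))  # False: no sublist holds both
```

Because sublists need not be mutually exclusive, the check asks whether any sublist holds both nodes rather than looking up a single sublist per node.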

Similarly, we can look at the default alpha and the default prior.

Alpha is the imaginary sample size that is set at the root and spread uniformly through the tree. At each node, the alpha is split evenly over its children, and so on until the leaves are reached. This determines the prior for each node. The default alpha is set by the class, and is calculated as the maximum number of children of any situation node in the tree.

In [3]:
staged_tree.alpha

4
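
The default value can be reproduced by hand. The sketch below is an illustration in plain Python, not cegpy's internal code; the edge list is a small, assumed subset of the falls tree, and `out_degree` is a name introduced here.

```python
from collections import Counter

# A subset of (parent, child, label) edges from the falls tree (assumed subset).
edges = [
    ("s0", "s1", "Communal Assessed"),
    ("s0", "s2", "Communal Not Assessed"),
    ("s0", "s3", "Community Assessed"),
    ("s0", "s4", "Community Not Assessed"),
    ("s1", "s5", "High Risk"),
    ("s1", "s6", "Low Risk"),
    ("s5", "s13", "Not Referred & Not Treated"),
    ("s5", "s14", "Not Referred & Treated"),
    ("s5", "s15", "Referred & Treated"),
]

# Count the children (outgoing edges) of each parent node.
out_degree = Counter(parent for parent, _child, _label in edges)

# The default alpha is the largest number of children of any node.
default_alpha = max(out_degree.values())
print(default_alpha)  # 4, because the root s0 has four children
```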

So, with an alpha of 4, the following edge priors are produced (where the elements in the dictionary returned are `Edge: Prior`):

In [4]:
staged_tree.prior

{('s0', 's1', 'Communal Assessed'): Fraction(1, 1),
 ('s0', 's2', 'Communal Not Assessed'): Fraction(1, 1),
 ('s0', 's3', 'Community Assessed'): Fraction(1, 1),
 ('s0', 's4', 'Community Not Assessed'): Fraction(1, 1),
 ('s1', 's5', 'High Risk'): Fraction(1, 2),
 ('s1', 's6', 'Low Risk'): Fraction(1, 2),
 ('s2', 's7', 'High Risk'): Fraction(1, 2),
 ('s2', 's8', 'Low Risk'): Fraction(1, 2),
 ('s3', 's9', 'High Risk'): Fraction(1, 2),
 ('s3', 's10', 'Low Risk'): Fraction(1, 2),
 ('s4', 's11', 'High Risk'): Fraction(1, 2),
 ('s4', 's12', 'Low Risk'): Fraction(1, 2),
 ('s5', 's13', 'Not Referred & Not Treated'): Fraction(1, 6),
 ('s5', 's14', 'Not Referred & Treated'): Fraction(1, 6),
 ('s5', 's15', 'Referred & Treated'): Fraction(1, 6),
 ('s6', 's16', 'Not Referred & Not Treated'): Fraction(1, 4),
 ('s6', 's17', 'Not Referred & Treated'): Fraction(1, 4),
 ('s7', 's18', "Don't Fall"): Fraction(1, 4),
 ('s7', 's19', 'Fall'): Fraction(1, 4),
 ('s8', 's20', "Don't Fall"): Fraction(1, 4),
 ('s8

However, this dictionary is not the format expected by the `prior` input parameter. To set the priors manually, pass a list of lists containing each of the edge priors in alphabetical order of the edge labels. The default priors can also be returned in this format:

In [5]:
staged_tree.prior_list

[[Fraction(1, 1), Fraction(1, 1), Fraction(1, 1), Fraction(1, 1)],
 [Fraction(1, 2), Fraction(1, 2)],
 [Fraction(1, 2), Fraction(1, 2)],
 [Fraction(1, 2), Fraction(1, 2)],
 [Fraction(1, 2), Fraction(1, 2)],
 [Fraction(1, 6), Fraction(1, 6), Fraction(1, 6)],
 [Fraction(1, 4), Fraction(1, 4)],
 [Fraction(1, 4), Fraction(1, 4)],
 [Fraction(1, 4), Fraction(1, 4)],
 [Fraction(1, 6), Fraction(1, 6), Fraction(1, 6)],
 [Fraction(1, 4), Fraction(1, 4)],
 [Fraction(1, 4), Fraction(1, 4)],
 [Fraction(1, 4), Fraction(1, 4)],
 [Fraction(1, 12), Fraction(1, 12)],
 [Fraction(1, 12), Fraction(1, 12)],
 [Fraction(1, 12), Fraction(1, 12)],
 [Fraction(1, 8), Fraction(1, 8)],
 [Fraction(1, 8), Fraction(1, 8)],
 [Fraction(1, 12), Fraction(1, 12)],
 [Fraction(1, 12), Fraction(1, 12)],
 [Fraction(1, 12), Fraction(1, 12)],
 [Fraction(1, 8), Fraction(1, 8)],
 [Fraction(1, 8), Fraction(1, 8)]]
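
The way alpha is spread through the tree can be sketched in a few lines. This is an illustrative reimplementation under stated assumptions, not cegpy's own code: the `children` mapping covers only a small part of the falls tree, `uniform_edge_priors` is a hypothetical helper, and the order of the sublists in `prior_list` simply follows the order of the mapping.

```python
from fractions import Fraction

# Children of a few situations in the falls tree (assumed subset).
children = {
    "s0": ["s1", "s2", "s3", "s4"],
    "s1": ["s5", "s6"],
    "s5": ["s13", "s14", "s15"],
    "s6": ["s16", "s17"],
}


def uniform_edge_priors(children, root, alpha):
    """Hypothetical helper: split alpha evenly over children, top down."""
    priors = {}
    stack = [(root, Fraction(alpha))]
    while stack:
        node, mass = stack.pop()
        kids = children.get(node, [])
        for child in kids:
            share = mass / len(kids)  # even split of this node's mass
            priors[(node, child)] = share
            stack.append((child, share))
    return priors


priors = uniform_edge_priors(children, "s0", 4)
print(priors[("s0", "s1")])   # 1
print(priors[("s1", "s5")])   # 1/2
print(priors[("s5", "s13")])  # 1/6
print(priors[("s6", "s16")])  # 1/4

# Group the same values into a list of lists, one sublist per parent node.
prior_list = [[priors[(node, kid)] for kid in kids] for node, kids in children.items()]
print(prior_list[0])  # [Fraction(1, 1), Fraction(1, 1), Fraction(1, 1), Fraction(1, 1)]
```

The values agree with the corresponding entries of the default prior shown above, for example `Fraction(1, 6)` on each edge leaving `s5`.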

### Setting Alpha

When alpha is set, the object calculates the priors automatically, like so:

In [6]:
# Assume 50 people are starting at the root
staged_tree.calculate_AHC_transitions(alpha=50)
staged_tree.prior

{('s0', 's1', 'Communal Assessed'): Fraction(25, 2),
 ('s0', 's2', 'Communal Not Assessed'): Fraction(25, 2),
 ('s0', 's3', 'Community Assessed'): Fraction(25, 2),
 ('s0', 's4', 'Community Not Assessed'): Fraction(25, 2),
 ('s1', 's5', 'High Risk'): Fraction(25, 4),
 ('s1', 's6', 'Low Risk'): Fraction(25, 4),
 ('s2', 's7', 'High Risk'): Fraction(25, 4),
 ('s2', 's8', 'Low Risk'): Fraction(25, 4),
 ('s3', 's9', 'High Risk'): Fraction(25, 4),
 ('s3', 's10', 'Low Risk'): Fraction(25, 4),
 ('s4', 's11', 'High Risk'): Fraction(25, 4),
 ('s4', 's12', 'Low Risk'): Fraction(25, 4),
 ('s5', 's13', 'Not Referred & Not Treated'): Fraction(25, 12),
 ('s5', 's14', 'Not Referred & Treated'): Fraction(25, 12),
 ('s5', 's15', 'Referred & Treated'): Fraction(25, 12),
 ('s6', 's16', 'Not Referred & Not Treated'): Fraction(25, 8),
 ('s6', 's17', 'Not Referred & Treated'): Fraction(25, 8),
 ('s7', 's18', "Don't Fall"): Fraction(25, 8),
 ('s7', 's19', 'Fall'): Fraction(25, 8),
 ('s8', 's20', "Don't Fall"):
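
Because alpha is split evenly at every node, changing alpha from 4 to 50 rescales each edge prior by the same factor, 50/4. The short check below illustrates that arithmetic in plain Python; the variable names are introduced here for illustration.

```python
from fractions import Fraction

default_alpha = 4
new_alpha = 50
scale = Fraction(new_alpha, default_alpha)  # 25/2

# Default edge priors at successive depths of the tree (alpha = 4).
default_priors = [Fraction(1, 1), Fraction(1, 2), Fraction(1, 6), Fraction(1, 4)]

# Rescale each prior to the new alpha.
scaled = [p * scale for p in default_priors]
print(scaled)
# [Fraction(25, 2), Fraction(25, 4), Fraction(25, 12), Fraction(25, 8)]
```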

#### Setting prior along with alpha
If you set alpha and also pass the `prior` parameter, the prior is ignored and a warning is logged.

In [7]:
# Prior param is ignored, and alpha sets the priors.
staged_tree.calculate_AHC_transitions(prior=staged_tree.prior_list, alpha=50)
staged_tree.prior

{('s0', 's1', 'Communal Assessed'): Fraction(25, 2),
 ('s0', 's2', 'Communal Not Assessed'): Fraction(25, 2),
 ('s0', 's3', 'Community Assessed'): Fraction(25, 2),
 ('s0', 's4', 'Community Not Assessed'): Fraction(25, 2),
 ('s1', 's5', 'High Risk'): Fraction(25, 4),
 ('s1', 's6', 'Low Risk'): Fraction(25, 4),
 ('s2', 's7', 'High Risk'): Fraction(25, 4),
 ('s2', 's8', 'Low Risk'): Fraction(25, 4),
 ('s3', 's9', 'High Risk'): Fraction(25, 4),
 ('s3', 's10', 'Low Risk'): Fraction(25, 4),
 ('s4', 's11', 'High Risk'): Fraction(25, 4),
 ('s4', 's12', 'Low Risk'): Fraction(25, 4),
 ('s5', 's13', 'Not Referred & Not Treated'): Fraction(25, 12),
 ('s5', 's14', 'Not Referred & Treated'): Fraction(25, 12),
 ('s5', 's15', 'Referred & Treated'): Fraction(25, 12),
 ('s6', 's16', 'Not Referred & Not Treated'): Fraction(25, 8),
 ('s6', 's17', 'Not Referred & Treated'): Fraction(25, 8),
 ('s7', 's18', "Don't Fall"): Fraction(25, 8),
 ('s7', 's19', 'Fall'): Fraction(25, 8),
 ('s8', 's20', "Don't Fall"):

### Setting Prior

Set `prior` instead of `alpha` if you do not want the prior to be split uniformly across the tree.

The easiest way to do this is to take the default `staged_tree.prior_list` produced when initialising the StagedTree object and set the values as you see fit. By default, all Dirichlet hyperparameters for the priors are stored in the package as `Fraction` objects for accuracy; however, this is not required when inputting the prior values.

In [8]:
from fractions import Fraction
custom_prior = [
    [Fraction(20, 5), Fraction(1, 1), Fraction(1, 1), Fraction(1, 20)],
    [Fraction(1, 2), Fraction(1, 2)],
    [Fraction(1, 2), Fraction(1, 2)],
    [Fraction(1, 2), Fraction(1, 2)],
    [Fraction(1, 2), Fraction(1, 2)],
    [Fraction(1, 6), Fraction(1, 6), Fraction(1, 6)],
    [Fraction(1, 4), Fraction(1, 4)],
    [Fraction(1, 4), Fraction(1, 4)],
    [Fraction(1, 4), Fraction(1, 4)],
    [Fraction(1, 6), Fraction(1, 6), Fraction(1, 6)],
    [Fraction(1, 4), Fraction(1, 4)],
    [Fraction(1, 4), 4],
    [Fraction(1, 4), Fraction(1, 4)],
    [Fraction(1, 12), Fraction(1, 12)],
    [Fraction(1, 12), Fraction(1, 12)],
    [Fraction(1, 12), Fraction(1, 12)],
    [Fraction(1, 8), Fraction(1, 8)],
    [Fraction(1, 8), Fraction(1, 8)],
    [Fraction(1, 12), Fraction(1, 12)],
    [Fraction(1, 12), Fraction(1, 12)],
    [Fraction(1, 12), Fraction(1, 12)],
    [Fraction(1, 8), Fraction(1, 8)],
    [Fraction(1, 8), Fraction(1, 8)],
]

staged_tree.calculate_AHC_transitions(prior=custom_prior)


{'Merged Situations': [('s9', 's5'),
  ('s7', 's11', 's13', 's22'),
  ('s25', 's8', 's16', 's12'),
  ('s24', 's23', 's15', 's14'),
  ('s6', 's10'),
  ('s26', 's17'),
  ('s0',),
  ('s1',),
  ('s2',),
  ('s3',),
  ('s4',)],
 'Log Likelihood': -68671.89442818423}
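
Before passing in an edited prior, it can help to confirm that it still has the same shape as the default. The sketch below is an optional sanity check in plain Python; `same_shape` is a hypothetical helper, the two-row lists are a shortened stand-in for the full prior, and whether cegpy performs a similar check itself is not covered here.

```python
from fractions import Fraction


def same_shape(reference, candidate):
    """Hypothetical helper: True if sublist lengths match row by row."""
    return [len(row) for row in reference] == [len(row) for row in candidate]


# Shortened stand-ins for the default and the edited prior lists.
default_rows = [[Fraction(1, 1)] * 4, [Fraction(1, 2)] * 2]
custom_rows = [[Fraction(20, 5), 1, 1.0, Fraction(1, 20)], [0.5, Fraction(1, 2)]]

print(same_shape(default_rows, custom_rows))  # True

# Fractions are reduced automatically, and ints or floats convert cleanly.
print(Fraction(20, 5))                            # 4
print(sum(Fraction(x) for x in custom_rows[0]))   # 121/20
```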

### Setting a custom hyperstage

When setting a custom hyperstage, everything in the same sublist must have the same number of outgoing edges.

For this example, we will take the default hyperstage and split a sublist into two smaller sublists.

In [9]:
custom_hyperstage = [
    ['s0'],
    ['s1', 's2', 's3', 's4'],
    ['s5', 's9'],
    ['s6', 's10'],
    ['s7', 's8', 's11', 's12', 's13', 's14', 's15'], 
    ['s16', 's17', 's22', 's23', 's24', 's25', 's26'],
]
staged_tree.calculate_AHC_transitions(hyperstage=custom_hyperstage)

{'Merged Situations': [('s9', 's5'),
  ('s24', 's23'),
  ('s8', 's12'),
  ('s13', 's7', 's11'),
  ('s14', 's15'),
  ('s6', 's10'),
  ('s25', 's26', 's16', 's17'),
  ('s0',),
  ('s1',),
  ('s2',),
  ('s3',),
  ('s4',),
  ('s22',)],
 'Log Likelihood': -68684.28843458164}
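
The constraint on outgoing edges can be checked before running the algorithm. The following is a minimal sketch in plain Python, not part of cegpy: `invalid_sublists` is a hypothetical helper, and the `out_degree` mapping lists outgoing-edge counts for only a few nodes of the falls tree.

```python
def invalid_sublists(hyperstage, out_degree):
    """Hypothetical helper: sublists whose nodes differ in outgoing-edge count."""
    return [
        sublist
        for sublist in hyperstage
        if len({out_degree[node] for node in sublist}) > 1
    ]


# Outgoing-edge counts for a few nodes of the falls tree (assumed subset).
out_degree = {"s0": 4, "s1": 2, "s2": 2, "s3": 2, "s4": 2, "s5": 3, "s6": 2}

good = [["s0"], ["s1", "s2", "s3", "s4"], ["s5"], ["s6"]]
bad = [["s0"], ["s1", "s2", "s5"]]  # s5 has three edges, s1 and s2 have two

print(invalid_sublists(good, out_degree))  # []
print(invalid_sublists(bad, out_degree))   # [['s1', 's2', 's5']]
```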