# Hierarchies in BigARTM

## 1. Method explanation

### Usual ARTM model
__Data:__ a set of documents $D$, a set of words $W$, and a document-word matrix $\{n_{dw}\}_{D \times W}$.

__Model:__ Let $p(w|d) = \frac{n_{dw}}{\sum_w n_{dw}}$ and let $T$ be a set of topics. The topic model is
$$ p(w|d) = \sum_{t \in T} p(w|t) p(t|d) = \sum_{t \in T} \phi_{wt} \theta_{td}, \hspace{3cm} (1) $$
with parameters

* $\Phi = \{\phi_{wt}\}_{W \times T}$
* $\Theta = \{ \theta_{td}\}_{T \times D}$
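
As a small numerical illustration of equation (1), the sketch below multiplies a toy $\Phi$ by a toy $\Theta$ in plain Python. The matrices and the helper name `matmul` are invented for this example and are not part of BigARTM.

```python
# Toy illustration of p(w|d) = sum_t phi[w][t] * theta[t][d].
# All numbers are invented for the example.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Phi: |W| x |T| = 3 words x 2 topics, each column sums to 1.
phi = [[0.7, 0.1],
       [0.2, 0.3],
       [0.1, 0.6]]

# Theta: |T| x |D| = 2 topics x 2 documents, each column sums to 1.
theta = [[0.9, 0.2],
         [0.1, 0.8]]

p_wd = matmul(phi, theta)  # |W| x |D| matrix of p(w|d)

# Each column of the product is again a distribution over words.
column_sums = [sum(p_wd[w][d] for w in range(3)) for d in range(2)]
```

Because the columns of both factors are probability distributions, the columns of their product also sum to one.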

__Parameter learning:__ regularized log-likelihood maximization
$$ \sum_d \sum_w n_{dw} \ln \sum_t \phi_{wt} \theta_{td} + \sum_i \tau_i R_i(\Phi, \Theta) \rightarrow \max_{\Phi, \Theta} $$
where the regularizers $R(\Phi, \Theta) = \sum_i \tau_i R_i(\Phi, \Theta)$ introduce additional subject-specific criteria and $\tau_i$ are the regularization coefficients.
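
The objective can be evaluated directly for small matrices. The following is a minimal sketch in plain Python, not BigARTM code: the function names are hypothetical, and the `smoothing` regularizer (a sum of $\ln \phi_{wt}$) is only one illustrative choice of $R_i$.

```python
import math

def log_likelihood(n, phi, theta):
    """sum_d sum_w n[d][w] * ln(sum_t phi[w][t] * theta[t][d])."""
    total = 0.0
    for d, row in enumerate(n):
        for w, n_dw in enumerate(row):
            if n_dw > 0:
                p_wd = sum(phi[w][t] * theta[t][d] for t in range(len(theta)))
                total += n_dw * math.log(p_wd)
    return total

def smoothing(phi, theta):
    """Example regularizer (an assumption): sum of ln(phi_wt)."""
    return sum(math.log(x) for row in phi for x in row)

def objective(n, phi, theta, regularizers):
    """Log-likelihood plus the weighted sum of regularizers.

    `regularizers` is a list of (tau_i, R_i) pairs.
    """
    return log_likelihood(n, phi, theta) + sum(
        tau * r(phi, theta) for tau, r in regularizers)

n = [[3, 1, 0],   # counts n_dw: 2 documents x 3 words
     [0, 2, 4]]
phi = [[0.7, 0.1], [0.2, 0.3], [0.1, 0.6]]
theta = [[0.9, 0.2], [0.1, 0.8]]

plain = objective(n, phi, theta, [])
regularized = objective(n, phi, theta, [(0.5, smoothing)])
```

Each regularizer simply adds a weighted term to the log-likelihood, so several criteria can be combined by extending the list of pairs.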

### How a hierarchy is constructed from several usual models
#### Hierarchy definition:
* A __topic hierarchy__ is a directed multipartite (multilevel) graph of topics in which edges connect only topics from neighboring levels.
* The zero level consists of a single node called the __root__.
* Each non-zero level has more topics than the previous one. The previous level is called the __parent level__.
* If the hierarchy contains a topic-subtopic edge, the topic is called the __parent topic__ or __ancestor__ of the subtopic.
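
The properties in this definition can be checked mechanically. Below is a minimal sketch that stores a hierarchy as a list of levels plus a list of directed edges. The topic names and the function `is_valid_hierarchy` are hypothetical and serve only as an illustration.

```python
# Levels are lists of topic names; level 0 holds only the root.
levels = [
    ["root"],
    ["science", "sport"],
    ["physics", "biology", "football"],
]

# Directed edges (parent, child) between topics.
edges = [
    ("root", "science"), ("root", "sport"),
    ("science", "physics"), ("science", "biology"),
    ("sport", "football"),
]

def is_valid_hierarchy(levels, edges):
    """Check the properties listed in the hierarchy definition."""
    if len(levels[0]) != 1:
        return False  # the zero level must be a single root
    # Each non-zero level must have more topics than the previous one.
    if any(len(levels[i]) <= len(levels[i - 1])
           for i in range(1, len(levels))):
        return False
    level_of = {t: i for i, level in enumerate(levels) for t in level}
    # Every edge must go from a level to the next one.
    return all(level_of[c] - level_of[p] == 1 for p, c in edges)
```

For example, adding an edge from the root straight to a second-level topic would make the check fail, because that edge skips a level.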

#### Hierarchy construction:
* The root is associated with the whole collection and doesn't need modeling.
* _Every non-zero level is a usual topic model._
* The first level has a few topics, which are the main topics of the collection. All first-level topics have a single parent topic (the root).
* For each level with index > 1, we need to establish parent-children relations with the topics of the previous level.

### Establishing parent-children relations
Once the parent level is built, denote its topics by $a \in A$ (ancestors) and its matrices by $\Phi^p$ and $\Theta^p$.

Let's introduce a new matrix factorization problem:
$$ \phi^p_{wa} = p(w|a) \approx \sum_{t} p(w|t) p(t|a) = \sum_t \phi_{wt} \psi_{ta} $$

with new parameters $\Psi = \{ \psi_{ta} \}_{T \times A}$.

Using KL-divergence as the similarity measure between distributions, we obtain the regularizer:
$$ R(\Phi, \Psi) = \sum_w \sum_a \phi^p_{wa} \ln \sum_t \phi_{wt} \psi_{ta} \rightarrow \max_{\Phi, \Psi} $$
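
A short derivation sketch, using only the symbols defined above, shows where this regularizer comes from. Summing over all parent topics the KL-divergence (taken over words $w$) between each parent column and its approximation gives

$$ \sum_a KL_w\left( \phi^p_{wa} \,\Big\|\, \sum_t \phi_{wt} \psi_{ta} \right) = \sum_a \sum_w \phi^p_{wa} \ln \phi^p_{wa} - \sum_a \sum_w \phi^p_{wa} \ln \sum_t \phi_{wt} \psi_{ta} $$

The first term does not depend on $\Phi$ or $\Psi$, and the second term is exactly $-R(\Phi, \Psi)$, so minimizing the total KL-divergence is equivalent to maximizing $R(\Phi, \Psi)$.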

The full optimization problem for the level is then
$$ \sum_d \sum_w n_{dw} \ln \sum_t \phi_{wt} \theta_{td} + \tau  R(\Phi, \Psi) \rightarrow \max_{\Phi, \Psi, \Theta} $$
The likelihood and the regularizer have the same structure, so there is a simple way to train $\Psi$ simultaneously with $\Phi$ and $\Theta$:

_we just add $|A|$ pseudodocuments to the collection, each representing a column of the parent $\Phi$: $n_{aw} = \tau p(w|a)$._
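
The equivalence behind this trick can be verified numerically. The sketch below is plain Python with invented toy matrices and a hypothetical helper `weighted_log_lik`. It builds the pseudodocuments $n_{aw} = \tau p(w|a)$ and compares their log-likelihood, with $\Psi$ in the role of $\Theta$, against $\tau R(\Phi, \Psi)$.

```python
import math

def weighted_log_lik(n, phi, theta):
    """sum over rows d and words w of n[d][w] * ln(sum_t phi[w][t] * theta[t][d])."""
    return sum(
        n_dw * math.log(sum(phi[w][t] * theta[t][d] for t in range(len(theta))))
        for d, row in enumerate(n)
        for w, n_dw in enumerate(row)
        if n_dw > 0
    )

tau = 2.0
# Parent Phi: 3 words x 2 parent topics (columns sum to 1).
phi_parent = [[0.6, 0.1],
              [0.3, 0.2],
              [0.1, 0.7]]
# Child-level Phi (3 words x 3 topics) and Psi (3 topics x 2 parents).
phi = [[0.8, 0.3, 0.1],
       [0.1, 0.6, 0.1],
       [0.1, 0.1, 0.8]]
psi = [[0.5, 0.0],
       [0.4, 0.2],
       [0.1, 0.8]]

# One pseudodocument per parent topic a: n_aw = tau * p(w|a).
pseudo_docs = [[tau * phi_parent[w][a] for w in range(3)] for a in range(2)]

# Regularizer R(Phi, Psi) computed directly from its definition.
r_value = sum(
    phi_parent[w][a] * math.log(sum(phi[w][t] * psi[t][a] for t in range(3)))
    for a in range(2) for w in range(3)
)

# Log-likelihood of the pseudodocuments, with Psi playing the role of Theta.
pseudo_lik = weighted_log_lik(pseudo_docs, phi, psi)
```

The two quantities agree up to floating-point error, so the columns of $\Theta$ learned for the pseudodocuments can be read as the columns of $\Psi$.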

## 2. BigARTM implementation

Hierarchies in BigARTM are implemented in the hierarchy_utils module. To build a hierarchy, create an hARTM instance:

In [None]:
from hierarchy_utils import hARTM

In [None]:
hier = hARTM(num_processors=None, class_ids=None,
             scores=None, regularizers=None, num_document_passes=10, reuse_theta=False,
             dictionary=None, cache_theta=False, theta_columns_naming='id', seed=-1)

Pass to hARTM the parameters that are common to all levels. These are the same parameters that you pass to a usual ARTM model.

Levels are built one by one. To add the first level, use the add_level method, specifying the remaining model parameters (unique to the level):

In [None]:
# Set num_topics and/or topic_names to the values for this level
level0 = hier.add_level(num_topics=None, topic_names=None)

This method returns an ARTM object, so you can work with it as usual: initialize it, fit it offline, add regularizers and scores, etc. For example:

In [None]:
import artm

# Load pre-built batches and gather a dictionary from them
batch_vectorizer = artm.BatchVectorizer(data_path="./my_batches", data_format='batches')
dictionary = artm.Dictionary('dictionary')
dictionary.gather(batch_vectorizer.data_path)
# Initialize and fit the first level like any ARTM model
level0.initialize(dictionary=dictionary)
level0.fit_offline(batch_vectorizer, num_collection_passes=20)

Once the first level is fit, you can add the next level:

In [None]:
level1 = hier.add_level(num_topics=None, topic_names=None,
                        parent_level_weight=1, tmp_files_path="")

When you add this level, the parent level's phi matrix is saved into a special parent level batch.
This is how the pseudodocuments are created.
The created batch is added to the other batches when you fit the model.
Explanation of the add_level parameters:
* parent_level_weight is the regularizer's coefficient $\tau$. Token_values in the parent level batch are multiplied by parent_level_weight during learning.
* tmp_files_path is the path where the model can save this parent level batch.

These two parameters are ignored when the first level is created.

Now you can train the level1 model in any way you like. For example:

In [None]:
level1.initialize(dictionary=dictionary)
level1.fit_offline(batch_vectorizer, num_collection_passes=20)

The part of the $\Theta$ matrix corresponding to the parent level batch is the $\Psi$ matrix. To get it, use the get_psi method:

In [None]:
psi = level1.get_psi()

Note that level0 has no get_psi method.
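
One possible way to turn $\Psi$ into explicit parent-child edges is to keep every pair whose $p(t|a)$ exceeds a threshold. The sketch below is an illustration under assumptions, not a BigARTM feature. It presumes that $\Psi$ has already been converted to a plain nested list, and the topic names, the values, the threshold, and the function `edges_from_psi` are all hypothetical.

```python
# Hypothetical topic names and Psi values, invented for illustration.
parent_topics = ["parent_0", "parent_1"]
child_topics = ["child_0", "child_1", "child_2"]

# psi[t][a] = p(t|a): rows are child topics, columns are parent topics.
psi = [[0.50, 0.00],
       [0.45, 0.15],
       [0.05, 0.85]]

def edges_from_psi(psi, child_topics, parent_topics, threshold):
    """Return (parent, child) pairs whose p(child | parent) exceeds threshold."""
    return [
        (parent_topics[a], child_topics[t])
        for a in range(len(parent_topics))
        for t in range(len(child_topics))
        if psi[t][a] > threshold
    ]

edges = edges_from_psi(psi, child_topics, parent_topics, threshold=0.1)
```

With this rule a child topic may end up with several parents, which the multipartite graph definition allows. A stricter alternative would keep only the largest entry in each row.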

You can get a level by its level_index (indices start from 0, as usual in Python, so the first level has index 0):

In [None]:
some_level = hier.get_level(level_index)

To delete a level, use

In [None]:
hier.del_level(level_index)

__Be careful:__ if you delete a level that is not the last one, all subsequent levels are deleted too.
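
The cascading behaviour can be pictured with a toy model of the level list. The class `LevelStack` below is a hypothetical stand-in that only tracks level names. It is not how hARTM is implemented, only a sketch of the semantics described above.

```python
class LevelStack:
    """Toy stand-in for a hierarchy: only tracks level names."""

    def __init__(self):
        self.levels = []

    def add_level(self, name):
        self.levels.append(name)

    def del_level(self, level_index):
        # Dropping a level also drops every level after it, since each
        # later level was trained against its parent level.
        del self.levels[level_index:]

stack = LevelStack()
for name in ["level0", "level1", "level2"]:
    stack.add_level(name)

stack.del_level(1)  # removes level1 and level2, keeps level0
```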

To save the hierarchy once it is built, use the save method:

In [None]:
hier.save(path, model_name="p_wt")

Here path is the path where you want to save the hierarchy's files, and model_name specifies which matrix to save (as in the ARTM.save method).

To load a hierarchy, use

In [None]:
hier = hARTM(num_processors=None, class_ids=None,
             scores=None, regularizers=None, num_document_passes=10, reuse_theta=False,
             dictionary=None, cache_theta=False, theta_columns_naming='id', seed=-1)
hier.load(path)