# Topics over Time (ToT)

#### Author information

- **Name:** Jaeseong Choe

- **Email address:** cjssoote@gmail.com

- **GitHub:** https://github.com/sorrychoe

- **LinkedIn:** https://www.linkedin.com/in/jaeseong-choe-048639250/

- **Personal Webpage:** https://jaeseongchoe.vercel.app/

## Part 1. Brief background of methodology

### Overview

- **Topics over Time (ToT) is an extension of LDA that incorporates time as an additional observed variable, modeling how the relevance of topics changes over time.**

### Situation Before ToT

- While LDA can model topics, it does not explicitly model the relationship between the occurrence of topics and time.

### Why ToT Was Introduced

- ToT was introduced by Wang, X., & McCallum, A. (2006) in the paper "Topics over Time: A Non-Markov Continuous-Time Model of Topical Trends."

- ToT adds time as an observed variable and models the dependency between topic relevance and time.

### Use Cases

- ToT can be used in studying historical trends, such as tracking the popularity of certain subjects over decades.

## Part 2. Key concept of methodology

### Key Concept

- ToT treats the timestamp as an observed, continuous variable and generates it jointly with the words, so each topic is described by both a distribution over words and a distribution over time.

### Generative Process

ToT generates both the words and timestamps for each document, modeling time as a continuous variable. The generative process is:

1. **Topic Distribution for Document**:
   - For each document $d$, draw a topic distribution $\theta_d$ from a Dirichlet:
   $$
   \theta_d | \alpha \sim \text{Dirichlet}(\alpha)
   $$

2. **Topic-Specific Word Generation**:
   - For each word $w_{di}$ in document $d$:
     - Draw a topic $z_{di}$ from $\theta_d$:
     $$
     z_{di} | \theta_d \sim \text{Multinomial}(\theta_d)
     $$
     - Draw a word $w_{di}$ from the topic-specific distribution $\phi_{z_{di}}$:
     $$
     w_{di} | z_{di}, \phi \sim \text{Multinomial}(\phi_{z_{di}})
     $$

3. **Topic-Specific Timestamp Generation**:
   - For each word $w_{di}$, draw a timestamp $t_{di}$ from the Beta distribution of its topic $z_{di}$:
   $$
   t_{di} | z_{di}, \psi \sim \text{Beta}(\psi_{z_{di}})
   $$

![ToT_Graphic](./img/ToT_Graphic.png)
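
The three steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the authors' implementation; the helper names (`sample_dirichlet`, `generate_document`) and the toy values of `alpha`, `phi`, and `psi` are assumptions chosen for the example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]


def generate_document(n_words, alpha, phi, psi, rng):
    """Generate (topic, word_id, timestamp) triples for one document.

    alpha: Dirichlet prior over K topics
    phi:   K rows, each a distribution over V word ids
    psi:   K pairs (psi_z1, psi_z2) of Beta parameters
    """
    theta = sample_dirichlet(alpha, rng)              # step 1: theta_d
    topics = range(len(alpha))
    vocab = range(len(phi[0]))
    tokens = []
    for _ in range(n_words):
        z = rng.choices(topics, weights=theta)[0]     # step 2: topic z_di
        w = rng.choices(vocab, weights=phi[z])[0]     # step 2: word w_di
        t = rng.betavariate(*psi[z])                  # step 3: timestamp t_di
        tokens.append((z, w, t))
    return theta, tokens


# Hypothetical toy setup: 2 topics, 4 vocabulary words.
rng = random.Random(0)
alpha = [0.5, 0.5]
phi = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]]
psi = [(2.0, 8.0), (8.0, 2.0)]  # topic 0 peaks early, topic 1 peaks late
theta, tokens = generate_document(20, alpha, phi, psi, rng)
```

Each token carries its own timestamp here because the generative process draws one per word; the sketch makes no claim about how a real corpus with one timestamp per document should be preprocessed.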

### Mathematical Representation

- **Word Distribution**: 
  Each word is generated from a multinomial distribution parameterized by $\phi_z$ for topic $z$:
  $$
  p(w | z, \phi_z) = \prod_{i=1}^V \phi_{zi}^{w_i}
  $$
  where $\phi_z$ is the multinomial distribution over the $V$ vocabulary words for topic $z$, and $w_i$ is 1 if the word is the $i$-th vocabulary entry and 0 otherwise.

- **Timestamp Distribution**:
  The timestamp, normalized to lie in $(0, 1)$ because that is the support of the Beta distribution, is modeled using a Beta distribution for each topic:
  $$
  p(t | z, \psi_z) = \frac{t^{\psi_{z1} - 1} (1 - t)^{\psi_{z2} - 1}}{B(\psi_{z1}, \psi_{z2})}
  $$
  where $B(\psi_{z1}, \psi_{z2})$ is the Beta function, and $\psi_z = (\psi_{z1}, \psi_{z2})$ parameterizes the Beta distribution.
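
Both densities are easy to evaluate directly. The sketch below is a hedged illustration rather than library code: `beta_pdf` and `word_prob` are hypothetical helper names, and the Beta function is computed through `math.lgamma` for numerical stability.

```python
import math


def beta_pdf(t, psi1, psi2):
    """Density of Beta(psi1, psi2) at t in (0, 1), computed in log space."""
    log_b = math.lgamma(psi1) + math.lgamma(psi2) - math.lgamma(psi1 + psi2)
    log_p = (psi1 - 1) * math.log(t) + (psi2 - 1) * math.log(1 - t) - log_b
    return math.exp(log_p)


def word_prob(word_id, phi_z):
    """p(w | z, phi_z): with a one-hot w, the product keeps a single entry."""
    return phi_z[word_id]


# A topic with psi = (2, 8) places far more mass on early timestamps.
early = beta_pdf(0.1, 2.0, 8.0)
late = beta_pdf(0.9, 2.0, 8.0)
```

For example, a topic with $\psi_z = (2, 8)$ assigns a much higher density to $t = 0.1$ than to $t = 0.9$, which is how the model expresses that the topic was prominent early in the corpus.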

### Inference

Inference in ToT uses **Gibbs sampling** for approximate posterior inference.

1. **Conditional Probability**:
   The conditional distribution for $z_{di}$ given the words, the timestamps, and all other topic assignments is:
   $$
   P(z_{di} | w, t, z_{-di}, \alpha, \beta, \Psi) \propto (m_{d z_{di}} + \alpha_{z_{di}} - 1) \cdot \frac{n_{z_{di} w_{di}} + \beta_{w_{di}} - 1}{\sum_{v=1}^{V} (n_{z_{di} v} + \beta_v) - 1} \cdot p(t_{di} | \psi_{z_{di}})
   $$
   where $m_{d z_{di}}$ is the number of words in document $d$ assigned to topic $z_{di}$, $n_{z_{di} w_{di}}$ is the number of times word $w_{di}$ is assigned to topic $z_{di}$, and $p(t_{di} | \psi_{z_{di}})$ is the Beta density defined above. The counts include the current token, which is why 1 is subtracted from each of them.

2. **Beta Distribution Parameters**:
   The Beta distribution parameters are estimated via the method of moments:
   $$
   \psi_{z1} = t_z \left( \frac{t_z (1 - t_z)}{s_z^2} - 1 \right)
   $$
   $$
   \psi_{z2} = (1 - t_z) \left( \frac{t_z (1 - t_z)}{s_z^2} - 1 \right)
   $$
   where $t_z$ and $s_z^2$ are the sample mean and variance of timestamps for topic $z$.
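
The two inference steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the function names are hypothetical, the counts passed to `topic_weights` are assumed to already exclude the current token (the usual way to realize the $-1$ terms in a collapsed Gibbs sampler), and `fit_beta_moments` assumes the biased sample variance and at least two distinct timestamps.

```python
import math
import random


def beta_pdf(t, psi1, psi2):
    """Density of Beta(psi1, psi2) at t in (0, 1), computed in log space."""
    log_b = math.lgamma(psi1) + math.lgamma(psi2) - math.lgamma(psi1 + psi2)
    return math.exp((psi1 - 1) * math.log(t) + (psi2 - 1) * math.log(1 - t) - log_b)


def fit_beta_moments(timestamps):
    """Method-of-moments estimate of (psi_z1, psi_z2) for one topic."""
    count = len(timestamps)
    mean = sum(timestamps) / count
    # Assumption: biased (divide-by-N) sample variance.
    var = sum((t - mean) ** 2 for t in timestamps) / count
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


def topic_weights(w, t, m_d, n, alpha, beta, psi):
    """Unnormalized conditional weights over topics for one token.

    m_d[k]:  tokens in the document assigned to topic k (current token excluded)
    n[k][v]: times word v is assigned to topic k (current token excluded)
    """
    beta_sum = sum(beta)
    weights = []
    for k in range(len(alpha)):
        doc_term = m_d[k] + alpha[k]
        word_term = (n[k][w] + beta[w]) / (sum(n[k]) + beta_sum)
        time_term = beta_pdf(t, *psi[k])
        weights.append(doc_term * word_term * time_term)
    return weights


# Hypothetical toy state: 2 topics, 3 vocabulary words.
alpha = [0.1, 0.1]
beta = [0.01, 0.01, 0.01]
m_d = [3, 1]
n = [[5, 1, 0], [0, 2, 6]]
psi = [fit_beta_moments([0.1, 0.2, 0.3]), fit_beta_moments([0.7, 0.8, 0.9])]

weights = topic_weights(w=0, t=0.15, m_d=m_d, n=n, alpha=alpha, beta=beta, psi=psi)
new_topic = random.Random(0).choices(range(len(weights)), weights=weights)[0]
```

A full sampler would loop over every token, remove it from the counts, resample its topic from these weights, add it back, and refit $\psi_z$ for each topic after every sweep; that loop is omitted here to keep the sketch short.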

### Strength

- ToT captures temporal trends in topic popularity.

### Weakness

- Unfortunately, neither Python nor R currently offers a library that implements the ToT model exactly as it is formulated in the paper.