# Tutorial 3: History of tokens (all_content)

WikiWho makes it possible to track the whole history of all tokens ever written in Wikipedia. In general, one can find all the revisions that inserted or deleted a token. This means that one can follow the evolution of a token, e.g. when it was first inserted (the revision and its editor, i.e. the author), or how many times it has been reinserted or removed.

The basic concepts needed to interpret the data are given in this section.

# 1. Querying all content

To query the full history of all tokens of an article, one can use the `all_content()` method as follows:


In [None]:
from wikiwho_wrapper import WikiWho
ww = WikiWho(lng='en')
df = ww.dv.all_content(article="bioglass")
df.head(5)

Here is the explanation of the columns of the above dataframe:

 - **article_title**: The title of the article.
 - **page_id**: The ID of the page (article).
 - **o_rev_id**: The ID of the revision where the token was added for the first time in the article.
 - **o_editor**: The user ID of the editor that added the token for the first time (directly related to o_rev_id). User IDs are integers, are unique across the whole of Wikipedia, and can be used to fetch the current name of a user.
 - **token**: The actual token value as a string. A token string can appear multiple times, i.e. it is not unique, because a word can be repeated in the document.
 - **token_id**: The token ID assigned internally by the WikiWho algorithm. The token_id identifies a token uniquely because it takes into account the context in which the token appears (e.g. paragraph, sentence, position in the sentence).
 - **in**: The revision in which the token was reinserted (`-1` if the row corresponds to the original insertion).
 - **out**: The revision in which the token was removed (`-1` if it was not removed).



# 2. Use cases for in and out columns

The `in` and `out` columns are useful to extract information regarding the history of a token. Here are some use cases:

## 2.1 Multiple edits

The token `'sida'` (`token_id=378`) was originally inserted in revision `189370281` (the `-1` in the `in` column means that the row corresponds to the original insertion, whose revision is given in the `o_rev_id` column), deleted in `189370332`, reinserted in `189371159`, deleted again in `189371182`, reinserted again in `189537330`, and finally deleted in `191585577`.

In [None]:
df[df['token_id'] == 378]
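
To make the reading of these rows concrete, here is a minimal sketch that flattens them into a chronological list of events. The `toy` frame is built by hand from the revision IDs quoted above, and `token_events` is a hypothetical helper, not part of `wikiwho_wrapper`. It assumes one row per (`in`, `out`) pair, listed in chronological order.

```python
import pandas as pd

# Hand-made rows for token_id 378, using the revision IDs quoted in the text.
# Assumption: one row per (in, out) pair, with -1 meaning "no such event".
toy = pd.DataFrame({
    "token_id": [378, 378, 378],
    "token": ["sida", "sida", "sida"],
    "o_rev_id": [189370281, 189370281, 189370281],
    "in": [-1, 189371159, 189537330],
    "out": [189370332, 189371182, 191585577],
})


def token_events(rows):
    """Return (event, rev_id) tuples, following the order of the rows."""
    events = []
    for o_rev, rev_in, rev_out in zip(rows["o_rev_id"], rows["in"], rows["out"]):
        if rev_in == -1:
            # No reinsertion: the row describes the original insertion.
            events.append(("inserted", int(o_rev)))
        else:
            events.append(("reinserted", int(rev_in)))
        if rev_out != -1:
            events.append(("deleted", int(rev_out)))
    return events


for event, rev_id in token_events(toy):
    print(f"{event:>10} in revision {rev_id}")
```

On the real dataframe the same helper could be applied to `df[df['token_id'] == 378]`, provided the rows follow the layout assumed here.
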

Notice that we filtered by `token_id` rather than by the token string (i.e. `'sida'`). Filtering by string also works, but keep in mind that different tokens can share the same string, e.g. `'bioglass'` returns the history of several token IDs:

In [None]:
df[df['token'] == 'bioglass'].head(10)
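
One way to check how ambiguous a token string is would be to count the distinct `token_id` values per string. The sketch below does this on a hand-made frame. The `token_id` 57 is invented for illustration; only the IDs 0 and 2 appear in this tutorial.

```python
import pandas as pd

# Hand-made sample: two different 'bioglass' tokens (IDs 0 and 57) and one 'is'.
# The ID 57 is hypothetical and only serves the example.
toy = pd.DataFrame({
    "token": ["bioglass", "bioglass", "bioglass", "is"],
    "token_id": [0, 0, 57, 2],
})

# Number of distinct token IDs behind each token string.
ids_per_string = toy.groupby("token")["token_id"].nunique()

# Strings that map to more than one token ID are ambiguous as filters.
ambiguous = ids_per_string[ids_per_string > 1]
print(ambiguous)
```
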

## 2.2 Reinserted token
The token `bioglass` was originally inserted in revision `18064039`, deleted in `758323388`, and reinserted in `758323485`.

In [None]:
df[df['token_id'] == 0]

## 2.3 Never deleted token
The token `is` was inserted in revision `18064039` and has never been taken out.

In [None]:
df[df['token_id'] == 2]

# 3. Basic filters on the dataframe

A `-1` in the `in` column means that there was no reinsertion, i.e. the row corresponds to a first-time insertion (in revision `o_rev_id`).

A `-1` in the `out` column means that the token has not been taken out since the insertion described by that row (the first-time insertion in `o_rev_id`, or the reinsertion in `in`).

This can be used to create useful filters:

## 3.1 (First-time) token insertions

In [None]:
df[df['in'] == -1].head()

## 3.2 Token reinsertions

In [None]:
df[df['in'] != -1].head()

## 3.3 Token deletions

In [None]:
df[df['out'] != -1].head()

## 3.4 Tokens that still exist in the current revision

In [None]:
df[df['out'] == -1].head()
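
The filters can also be combined. As a minimal sketch, the toy frame below reuses the three tokens discussed in section 2 (with the revision IDs quoted there) and separates tokens that were never deleted from tokens that are present again after a reinsertion. The variable names are chosen here for illustration.

```python
import pandas as pd

# Hand-made rows for the tokens 'sida' (378), 'bioglass' (0) and 'is' (2),
# assuming one row per (in, out) pair as described in this tutorial.
toy = pd.DataFrame({
    "token_id": [378, 378, 378, 0, 0, 2],
    "token": ["sida", "sida", "sida", "bioglass", "bioglass", "is"],
    "in": [-1, 189371159, 189537330, -1, 758323485, -1],
    "out": [189370332, 189371182, 191585577, 758323388, -1, -1],
})

first_insertion = toy["in"] == -1   # row describes the original insertion
still_present = toy["out"] == -1    # no deletion followed that insertion

# Inserted once and never taken out.
never_deleted = toy[first_insertion & still_present]

# Deleted at some point, but present again after a reinsertion.
survived_after_reinsertion = toy[~first_insertion & still_present]

print(never_deleted["token"].tolist())
print(survived_after_reinsertion["token"].tolist())
```
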

# 4. Number of times that each token has been inserted or reinserted

Another example is finding the most conflicted tokens by counting how many times they have been inserted or reinserted.

In [None]:
sizes = df.groupby(['token_id']).size().sort_values(ascending=False)
sizes.head(10)
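
The row count above mixes first-time insertions with reinsertions. A possible refinement, sketched on a toy frame built from the revision IDs used earlier in this tutorial, counts reinsertions and deletions per token separately. The column names `reinserted` and `deleted` are chosen here for illustration.

```python
import pandas as pd

# Hand-made rows for the tokens 'sida' (378), 'bioglass' (0) and 'is' (2),
# assuming one row per (in, out) pair.
toy = pd.DataFrame({
    "token_id": [378, 378, 378, 0, 0, 2],
    "in": [-1, 189371159, 189537330, -1, 758323485, -1],
    "out": [189370332, 189371182, 191585577, 758323388, -1, -1],
})

# Flag each row, then sum the boolean flags per token.
summary = (
    toy.assign(
        reinserted=toy["in"] != -1,
        deleted=toy["out"] != -1,
    )
    .groupby("token_id")[["reinserted", "deleted"]]
    .sum()
    .sort_values("deleted", ascending=False)
)
print(summary)
```
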

In [None]:
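from utils.notebooks import get_next_notebook
from IPython.display import HTML, display
try:
    # Link to the next notebook as resolved by the helper.
    display(HTML(f'<a href="{get_next_notebook()}" target="_blank">Go to next workbook</a>'))
except Exception:
    # Fall back to a hard-coded link. display() is required here because a
    # bare expression inside a block is not rendered by the notebook.
    display(HTML('<a href="4. Token history of specific revision.ipynb" target="_blank">Go to next workbook</a>'))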