# Tutorial 1: History of tokens

### Query parameter explanations:

 - **o_rev_id**: The ID of the revision where the token was added originally in the article.
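 - **editor**: The user ID of the editor (column `o_editor` in the DataFrame below). User IDs are integers, are unique across all of Wikipedia, and can be used to fetch the current name of a user. Unregistered editors appear as `0|` followed by their IP address.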
 - **token**: Actual token value as string.
 - **token_id**: The token ID assigned internally by the WikiWho algorithm, unique per article.
 - **in**: If empty, the token has never been reintroduced after deletion. Otherwise, the revisions in which the token was *reinserted* after a previous deletion, ordered by time.
 - **out**: If empty, the token has never been deleted. Otherwise, the revisions in which the token was *deleted*, ordered by time.

In the DataFrame used below, each insertion of a token has its own row with a single `in`/`out` pair, and an empty value is shown as `-1`.
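To make the `in`/`out` semantics concrete, here is a minimal sketch that decodes the `-1` value into boolean flags. It does not call the API: the `history` frame is a hand-built stand-in holding three rows copied from the Bioglass output in this tutorial, and the flag column names (`is_original_insertion`, `still_present`) are illustrative, not part of WikiWho.

```python
import pandas as pd

# Hand-built stand-in for the DataFrame returned by querier.all_content();
# the values are copied from the Bioglass rows shown in this tutorial.
history = pd.DataFrame({
    "token": ["bioglass", "bioglass", "is"],
    "token_id": [0, 0, 2],
    "o_rev_id": [18064039, 18064039, 18064039],
    "in": [-1, 758323485, -1],
    "out": [758323388, -1, -1],
})

# in == -1: the row describes the original insertion (not a reinsertion).
history["is_original_insertion"] = history["in"] == -1
# out == -1: the token was not deleted after this insertion.
history["still_present"] = history["out"] == -1

print(history[["token", "in", "out", "is_original_insertion", "still_present"]])
```

The same two comparisons are the building blocks of the filters used in the examples that follow.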


In [1]:
from wikiwho_wrapper import WikiWhoAPI, APIQuerier

# Create the API client and wrap it in a querier
api = WikiWhoAPI()
querier = APIQuerier(api)

# Fetch the full token history of an article, by title or by page ID
df = querier.all_content(article="bioglass")
# df = querier.all_content(2161298)
# df = querier.all_content(article="evolution")  # takes a minute or so


# Examples


### Case of multiple insertions and deletions
The token 'sida' was originally inserted in revision '189370281', deleted in '189370332', reinserted in '189371159', deleted again in '189371182', reinserted again in '189537330', and finally deleted in '191585577'.

In [5]:
df[df['token_id'] == 378]

Unnamed: 0,article_title,page_id,o_rev_id,o_editor,token,token_id,in,out
381,Bioglass,2161298,189370281,0|129.31.242.26,sida,378,-1,189370332
382,Bioglass,2161298,189370281,0|129.31.242.26,sida,378,189371159,189371182
383,Bioglass,2161298,189370281,0|129.31.242.26,sida,378,189537330,191585577
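The three rows above can be flattened into a single timeline of events. The sketch below is one way to do this; `token_events` is a hypothetical helper written for this tutorial (not part of `wikiwho_wrapper`), and the `sida` frame is a hand-built copy of the rows above so that the example runs without the API. It assumes the rows of a token are already ordered by time, as they are in the output shown.

```python
import pandas as pd

# Hand-built copy of the three 'sida' rows shown above.
sida = pd.DataFrame({
    "token": ["sida", "sida", "sida"],
    "token_id": [378, 378, 378],
    "o_rev_id": [189370281, 189370281, 189370281],
    "in": [-1, 189371159, 189537330],
    "out": [189370332, 189371182, 191585577],
})

def token_events(df, token_id):
    """Return a list of (event, revision_id) pairs for one token (hypothetical helper)."""
    rows = df[df["token_id"] == token_id]
    events = []
    for o_rev, rev_in, rev_out in zip(rows["o_rev_id"], rows["in"], rows["out"]):
        if rev_in == -1:
            # No reinsertion recorded: this row is the original insertion.
            events.append(("inserted", int(o_rev)))
        else:
            events.append(("reinserted", int(rev_in)))
        if rev_out != -1:
            events.append(("deleted", int(rev_out)))
    return events

events = token_events(sida, 378)
for event, rev_id in events:
    print(event, rev_id)
```

With the real data, the same call would be `token_events(df, 378)`.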


### Case of a token that was reinserted
The token 'bioglass' was originally inserted in revision '18064039', deleted in '758323388', and reinserted in '758323485'.

In [16]:
df[df['token_id'] == 0]

Unnamed: 0,article_title,page_id,o_rev_id,o_editor,token,token_id,in,out
0,Bioglass,2161298,18064039,0|81.172.143.232,bioglass,0,-1,758323388
1,Bioglass,2161298,18064039,0|81.172.143.232,bioglass,0,758323485,-1


### Case of a token that was inserted and never deleted
The token 'is' was inserted in revision '18064039' and has remained in the article since.

In [18]:
df[df['token_id'] == 2]

Unnamed: 0,article_title,page_id,o_rev_id,o_editor,token,token_id,in,out
3,Bioglass,2161298,18064039,0|81.172.143.232,is,2,-1,-1


### Tokens that currently exist in the page

In [19]:
df[df['out'] == -1]

Unnamed: 0,article_title,page_id,o_rev_id,o_editor,token,token_id,in,out
1,Bioglass,2161298,18064039,0|81.172.143.232,bioglass,0,758323485,-1
3,Bioglass,2161298,18064039,0|81.172.143.232,is,2,-1,-1
4,Bioglass,2161298,18064039,0|81.172.143.232,a,3,-1,-1
7,Bioglass,2161298,18064039,0|81.172.143.232,of,6,-1,-1
90,Bioglass,2161298,18834333,82590,[[,87,-1,-1
94,Bioglass,2161298,18834333,82590,]],91,-1,-1
173,Bioglass,2161298,78391371,527862,{{,170,-1,-1
178,Bioglass,2161298,78391371,527862,}},175,-1,-1
182,Bioglass,2161298,79583319,1623918,[[,179,-1,-1
183,Bioglass,2161298,79583319,1623918,category,180,-1,-1
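A natural follow-up to this filter is to ask who originally wrote the surviving tokens. The sketch below groups the surviving rows by `o_editor`. It runs on a small hand-built sample (rows copied from the Bioglass output in this tutorial, including a few deleted tokens so that the filter has something to remove), and it assumes `o_editor` values are strings, as the `0|<IP>` entries suggest.

```python
import pandas as pd

# Hand-built sample; every row is copied from the Bioglass output in this tutorial.
sample = pd.DataFrame({
    "o_editor": ["0|81.172.143.232"] * 6 + ["82590", "82590"],
    "token": ["bioglass", "bioglass", "is", "a", "commerical", "of", "[[", "]]"],
    "token_id": [0, 0, 2, 3, 4, 6, 87, 91],
    "in": [-1, 758323485, -1, -1, -1, -1, -1, -1],
    "out": [758323388, -1, -1, -1, 18704296, -1, -1, -1],
})

# Keep only rows whose last recorded state is "not deleted".
survivors = sample[sample["out"] == -1]

# Count surviving tokens per original editor, largest contributor first.
per_editor = survivors.groupby("o_editor").size().sort_values(ascending=False)
print(per_editor)
```

On the full DataFrame, replacing `sample` with `df` gives the surviving-token count for every original editor of the article.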


### Original insertions of tokens (rows where `in` is -1)

In [20]:
df[df['in'] == -1]

Unnamed: 0,article_title,page_id,o_rev_id,o_editor,token,token_id,in,out
0,Bioglass,2161298,18064039,0|81.172.143.232,bioglass,0,-1,758323388
2,Bioglass,2161298,18064039,0|81.172.143.232,®,1,-1,207995408
3,Bioglass,2161298,18064039,0|81.172.143.232,is,2,-1,-1
4,Bioglass,2161298,18064039,0|81.172.143.232,a,3,-1,-1
5,Bioglass,2161298,18064039,0|81.172.143.232,commerical,4,-1,18704296
6,Bioglass,2161298,18064039,0|81.172.143.232,product,5,-1,18907606
7,Bioglass,2161298,18064039,0|81.172.143.232,of,6,-1,-1
8,Bioglass,2161298,18064039,0|81.172.143.232,bioactive,7,-1,779393082
9,Bioglass,2161298,18064039,0|81.172.143.232,glasses,8,-1,18834333
10,Bioglass,2161298,18064039,0|81.172.143.232,.,9,-1,363575227


### Tokens that were reinserted and are still in the page

In [14]:
df[(df['out'] == -1) & (df['in'] != -1)]

Unnamed: 0,article_title,page_id,o_rev_id,o_editor,token,token_id,in,out
1,Bioglass,2161298,18064039,0|81.172.143.232,bioglass,0,758323485,-1
444,Bioglass,2161298,191687756,2609833,{{,430,503771225,-1
446,Bioglass,2161298,191687756,2609833,glass,431,503771225,-1
448,Bioglass,2161298,191687756,2609833,science,432,503771225,-1
450,Bioglass,2161298,191687756,2609833,}},433,503771225,-1
1314,Bioglass,2161298,363578879,294325,it,1296,779396265,-1
1316,Bioglass,2161298,363578879,294325,can,1297,779396265,-1
1318,Bioglass,2161298,363578879,294325,be,1298,779396265,-1
1320,Bioglass,2161298,363578879,294325,[[,1299,779396265,-1
1322,Bioglass,2161298,363578879,294325,machined,1300,779396265,-1
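Because the `in` column holds the revision that brought a token back, grouping the filtered rows by `in` shows how many surviving tokens each revision restored, which can hint at reverts. The sketch below uses a hand-built sample: the ten reinserted rows are copied from the output above, plus the never-deleted token 'is' to show that the filter excludes it.

```python
import pandas as pd

# Hand-built sample: the ten rows shown above plus one token that was never deleted.
sample = pd.DataFrame({
    "token": ["bioglass", "{{", "glass", "science", "}}",
              "it", "can", "be", "[[", "machined", "is"],
    "token_id": [0, 430, 431, 432, 433, 1296, 1297, 1298, 1299, 1300, 2],
    "in": [758323485, 503771225, 503771225, 503771225, 503771225,
           779396265, 779396265, 779396265, 779396265, 779396265, -1],
    "out": [-1] * 11,
})

# Same filter as above: still present, and brought back by a reinsertion.
reinserted = sample[(sample["out"] == -1) & (sample["in"] != -1)]

# Number of surviving tokens restored by each revision.
per_revision = reinserted.groupby("in").size().sort_values(ascending=False)
print(per_revision)
```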


### Number of times each token has been inserted

In [28]:
sizes = df.groupby('token_id').size()
dfs = sizes.to_frame().sort_values(0)
dfs

token_id,0
5504,1
7335,1
7336,1
7337,1
7338,1
7339,1
7340,1
7341,1
7342,1
7343,1
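The row count per `token_id` equals the number of insertions (original plus reinsertions), and the ascending sort above lists the least-edited tokens first. As a variation, the sketch below also counts deletions and sorts in descending order so that the most contested tokens come first. It runs on a hand-built sample of rows copied from the examples in this tutorial; the column names `insertions` and `deletions` are illustrative.

```python
import pandas as pd

# Hand-built sample: the 'bioglass', 'is' and 'sida' rows from the examples above.
history = pd.DataFrame({
    "token": ["bioglass", "bioglass", "is", "sida", "sida", "sida"],
    "token_id": [0, 0, 2, 378, 378, 378],
    "in": [-1, 758323485, -1, -1, 189371159, 189537330],
    "out": [758323388, -1, -1, 189370332, 189371182, 191585577],
})

counts = (
    history.assign(deleted=history["out"] != -1)  # True where the row ends in a deletion
    .groupby("token_id")
    .agg(insertions=("in", "size"), deletions=("deleted", "sum"))
    .sort_values("insertions", ascending=False)  # most frequently inserted first
)
print(counts)
```

A token with equal `insertions` and `deletions` is currently absent from the page, while one with `insertions` one larger than `deletions` is still present.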
