Wikipedia trending topic detection

Python topic detection module for SparkWiki. The module computes graph statistics, clusters trending Wikipedia pages and assigns topics to the resulting clusters. The trending pages are extracted using the Anomaly Detection Algorithm, and topics are assigned with the Language-Agnostic Topic Classification model. The module works with all language editions of Wikipedia.


  • Compute degree, betweenness centrality and modularity for clustering the graph by events
  • Match wikipages with their Qids (unique Wikidata identifiers)
  • Match wikipages with their corresponding topics
  • Match wikipages with their pageviews
  • Save a new graph with these attributes
  • Plot the topic distribution of each cluster
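
The first feature in the list can be sketched with networkx alone. This is a minimal illustration, not the module's own code: it swaps in networkx's built-in Louvain implementation (`louvain_communities`, available in networkx 2.8 and later) for the python-louvain package, and the helper name `graph_statistics`, the attribute names and the toy graph are assumptions made for the example.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities, modularity


def graph_statistics(graph, seed=0):
    """Return per-node statistics and the modularity of the partition.

    Hypothetical helper, not part of the SparkWiki module.
    """
    degree = dict(graph.degree())
    betweenness = nx.betweenness_centrality(graph)
    # Louvain community detection; each community plays the role of a cluster.
    communities = louvain_communities(graph, seed=seed)
    cluster_of = {
        node: index
        for index, nodes in enumerate(communities)
        for node in nodes
    }
    stats = {
        node: {
            "degree": degree[node],
            "betweenness": betweenness[node],
            "modularity_class": cluster_of[node],
        }
        for node in graph.nodes
    }
    return stats, modularity(graph, communities)


# Toy graph (made up): two triangles joined by a single edge.
toy = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"),
                ("d", "e"), ("e", "f"), ("d", "f"),
                ("c", "d")])
stats, q = graph_statistics(toy)
print(stats["c"], q)
```

The bridge nodes `c` and `d` get the highest degree and betweenness, which is the kind of signal these statistics add to each page in the graph.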


Python libraries
  • numpy, matplotlib, pandas, networkx, requests
  • community
    $ pip install python-louvain
  • googletrans (Optional)
    $ pip install googletrans
Wikipedia graph

Get the graph from the SparkWiki project using the PeakFinder module.

Place the graph file in the local folder Python/Results/<Language>/<Language>_<date_start>_<date_end>.

Language: EN, FR, RU, etc.

Date format: YYYYMMDD

Graph file name format: peaks_graph_<date_start>_<date_end>.gexf

Example: Python/Results/EN/EN_20200316_20200331/peaks_graph_20200316_20200331.gexf
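
The naming rules above can be captured in a small helper. The sketch below is only an illustration of the folder and file name format; the function name `graph_path` and its `root` parameter are hypothetical and do not come from the module.

```python
from datetime import datetime
from pathlib import Path


def graph_path(language, date_start, date_end, root="Python/Results"):
    """Build the expected location of the peaks graph file.

    Hypothetical helper; the layout follows the naming rules above.
    """
    for value in (date_start, date_end):
        # Require exactly eight digits, then check it is a real calendar date.
        if len(value) != 8 or not value.isdigit():
            raise ValueError(f"expected YYYYMMDD, got {value!r}")
        datetime.strptime(value, "%Y%m%d")
    lang = language.upper()
    folder = Path(root) / lang / f"{lang}_{date_start}_{date_end}"
    return folder / f"peaks_graph_{date_start}_{date_end}.gexf"


path = graph_path("EN", "20200316", "20200331")
print(path.as_posix())
# → Python/Results/EN/EN_20200316_20200331/peaks_graph_20200316_20200331.gexf
```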


To run the whole pipeline on a graph whose file name and folder path follow the format described above (cf. Pre-requisites), run the following command in the terminal:

$ python EN 20200316 20200331

The pipeline can also be run partially. To do so, pass an optional parameter from 1 to 7 to run only the part of the pipeline that corresponds to a feature in the table below:

$ python EN 20200316 20200331 1

Parameter value   Description
0                 Default
1                 Compute degree, betweenness centrality and modularity
2                 Match Qids
3                 Match topics
4                 Match pageviews
5                 Save graph attributes
6                 Give topic distribution per cluster
7                 Translate labels into English
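
For readers who want to see how such a command line could be parsed, here is a minimal sketch using argparse from the standard library. It is not the module's actual entry point: the function name `parse_args` and the argument names are assumptions, and the step descriptions are copied from the table above.

```python
import argparse

# Step numbers and descriptions as listed in the table above.
STEPS = {
    0: "Default",
    1: "Compute degree, betweenness centrality and modularity",
    2: "Match Qids",
    3: "Match topics",
    4: "Match pageviews",
    5: "Save graph attributes",
    6: "Give topic distribution per cluster",
    7: "Translate labels into English",
}


def parse_args(argv):
    """Parse <language> <date_start> <date_end> [step] (hypothetical names)."""
    parser = argparse.ArgumentParser(description="Topic detection pipeline (sketch)")
    parser.add_argument("language", help="Wikipedia language edition, e.g. EN")
    parser.add_argument("date_start", help="start date, YYYYMMDD")
    parser.add_argument("date_end", help="end date, YYYYMMDD")
    # The step is optional; it falls back to 0 when omitted.
    parser.add_argument("step", nargs="?", type=int, default=0,
                        choices=sorted(STEPS), help="pipeline step, 0 to 7")
    return parser.parse_args(argv)


args = parse_args(["EN", "20200316", "20200331", "1"])
print(args.language, args.step, STEPS[args.step])
```

Restricting `step` with `choices` makes argparse reject values outside 0 to 7 before any stage of the pipeline starts.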

Alternatively, one can run the Topics_exctraction.ipynb notebook. The notebook also includes the code generating visualisations.


Every stage of the pipeline generates and saves a .csv file with corresponding results.

The final step creates a /Figures folder with figures of the topic distribution per cluster.
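
As a rough idea of what a per-cluster topic distribution involves, the sketch below counts topics within each cluster and converts the counts to shares. How the module itself computes and plots these figures is not shown here; the function name `topic_distribution`, the input format and the sample topic labels are all made up for illustration.

```python
from collections import Counter, defaultdict


def topic_distribution(pages):
    """Return {cluster: {topic: share}} from (cluster, topic) pairs.

    Hypothetical helper; shares within each cluster sum to 1.
    """
    counts = defaultdict(Counter)
    for cluster, topic in pages:
        counts[cluster][topic] += 1
    return {
        cluster: {
            topic: n / sum(topic_counts.values())
            for topic, n in topic_counts.items()
        }
        for cluster, topic_counts in counts.items()
    }


# Made-up sample: (modularity class, topic label) for a few pages.
sample = [(0, "Sports"), (0, "Sports"), (0, "Biography"), (1, "Medicine")]
shares = topic_distribution(sample)
print(shares)
```

Each inner dictionary could then be passed to a plotting library such as matplotlib to draw one chart per cluster.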

Also, the final stage creates a graph file with all the computed attributes: filled_graph.gexf
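
Writing node attributes into a GEXF file can be done with networkx's `set_node_attributes` and `write_gexf`. The snippet below is a small sketch under assumptions: the attribute names `modularity_class` and `topic`, the page names and the values are illustrative, and the file is written to a temporary folder rather than to the module's results folder.

```python
import os
import tempfile

import networkx as nx

graph = nx.Graph([("Page A", "Page B")])
# Illustrative attribute values; the attribute names are assumptions.
nx.set_node_attributes(graph, {"Page A": 0, "Page B": 0}, name="modularity_class")
nx.set_node_attributes(graph, {"Page A": "Sports", "Page B": "Biography"}, name="topic")

with tempfile.TemporaryDirectory() as folder:
    out_path = os.path.join(folder, "filled_graph.gexf")
    nx.write_gexf(graph, out_path)
    # Read the file back to check that the attributes survived the round trip.
    reloaded = nx.read_gexf(out_path)

print(reloaded.nodes["Page A"])
```

A file saved this way keeps the attributes on each node, so they can be used in Gephi for partitioning or colouring.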

In order to explore the detected topics, the graph can be visualized in Gephi. We used Circle Pack Layout with modularity class as a partitioning attribute.


Wikipedia graphs of trending pages are available in Python/Results for the DE, EN, ES, FR, IT, RU and ZH language editions, covering two periods: 16/08/2018 to 31/12/2018 and 17/12/2019 to 15/04/2020.

The notebook Topic_comparison.ipynb compares topics across the EN, FR and RU language editions. The figures are saved in Python/Comparison_figures.

Gephi files representing the graphs are located in the /Gephi folder.


Here you can see a visual example. The animation shows trending topics for the last four months of 2018. The graph visualization illustrates the graph computed for the period 1-15 March 2020.

Figure: Topics comparison (animation)
Figure: Gephi graph example (EN_20200301_20200315)


Wikipedia trending topics detection: SparkWiki

Clustering of trending pages: Community detection

Topic classification model: Language-Agnostic Topic Classification

Labels translation: Googletrans

