A framework to convert Wikipedia article edit histories into ROOT [1] trees for analysis.
wikiTree processes large Wikipedia XML dump files and reduces them to a compact ROOT tree format that is easier to analyze and manipulate.
| Article | Number of Revisions | XML Dump Size (bytes) | ROOT Tree Size (bytes) | Reduction |
|---|---|---|---|---|
| Trains | 1000 | 15301346 | 1153432 | 92.5% |
| Particle Physics | 1000 | 16942429 | 1287648 | 92.4% |
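The Reduction column is the fraction of bytes saved, 1 - (ROOT tree size / XML dump size), computed from the two size columns. The short Python check below recomputes it and also reports the equivalent compression factor. The function names are illustrative and are not part of wikiTree.

```python
# Sizes copied from the table: article -> (XML dump bytes, ROOT tree bytes).
sizes = {
    "Trains": (15301346, 1153432),
    "Particle Physics": (16942429, 1287648),
}


def reduction(xml_bytes: int, root_bytes: int) -> float:
    """Percentage of bytes saved by the ROOT tree relative to the XML dump."""
    return 100.0 * (1.0 - root_bytes / xml_bytes)


def compression_factor(xml_bytes: int, root_bytes: int) -> float:
    """How many times smaller the ROOT tree is than the XML dump."""
    return xml_bytes / root_bytes


for article, (xml_bytes, root_bytes) in sizes.items():
    print(f"{article}: {reduction(xml_bytes, root_bytes):.1f}% smaller "
          f"({compression_factor(xml_bytes, root_bytes):.1f}x)")
```

Both articles shrink by roughly a factor of 13 for 1000 revisions each.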
- Converts Wikipedia XML dumps [2] into ROOT trees
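wikiTree's own parser is not reproduced here. As a rough sketch of the kind of reduction involved, assuming the standard MediaWiki XML export layout (`<page>`, `<title>`, `<revision>`, `<contributor>`, `<text>`), Python's standard-library `xml.etree.ElementTree.iterparse` can stream a dump and flatten each revision into one row of metadata, keeping only the byte length of the text. The chosen fields (`title`, `id`, `timestamp`, `contributor`, `text_bytes`), the helper names, and the tiny sample dump are hypothetical. Writing the resulting columns to a ROOT tree (for example with PyROOT or uproot) is left out.

```python
import io
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(source):
    """Yield one flat dict per <revision> (hypothetical field selection)."""
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text  # <title> closes before the page's revisions
        elif name == "revision":
            row = {"title": title, "id": None, "timestamp": None,
                   "contributor": None, "text_bytes": 0}
            for child in elem:
                cname = local(child.tag)
                if cname == "id":  # revision id (not <parentid>)
                    row["id"] = int(child.text)
                elif cname == "timestamp":
                    row["timestamp"] = child.text
                elif cname == "contributor":
                    # Registered users have <username>, anonymous edits have <ip>.
                    for c in child:
                        if local(c.tag) in ("username", "ip"):
                            row["contributor"] = c.text
                elif cname == "text":
                    # Keep only the size of the wikitext, not the text itself.
                    row["text_bytes"] = len((child.text or "").encode("utf-8"))
            yield row
            elem.clear()  # free the revision's children to bound memory use


# Made-up two-revision dump, for illustration only.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Trains</title>
    <revision>
      <id>101</id>
      <timestamp>2001-01-01T00:00:00Z</timestamp>
      <contributor><username>Alice</username><id>1</id></contributor>
      <text>Hello</text>
    </revision>
    <revision>
      <id>102</id>
      <parentid>101</parentid>
      <timestamp>2001-01-02T00:00:00Z</timestamp>
      <contributor><ip>192.0.2.1</ip></contributor>
      <text>Hello, world</text>
    </revision>
  </page>
</mediawiki>"""

rows = list(iter_revisions(io.BytesIO(SAMPLE.encode("utf-8"))))
# Column-oriented layout, one list per field, similar to tree branches.
columns = {key: [row[key] for row in rows] for key in rows[0]}
print(columns)
```

Storing per-revision metadata in columns rather than repeating the full wikitext of every revision is one plausible reason a tree can be an order of magnitude smaller than the dump, though the source does not state which fields wikiTree keeps.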
To use wikiTree, follow these steps: