WikiTalkParser

WikiTalkParser is a library for extracting and parsing Wikipedia talk pages. It identifies each comment together with its signature, date, and indentation level in the thread structure. In the current version, talk pages are retrieved from the Wikipedia API, given a list of articles as input. Only the English-language version is supported.
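To illustrate what identifying a comment's signature, date, and indentation involves, here is a minimal sketch that processes a single line of talk-page wikitext. It is not the library's implementation: the function name `parse_comment_line` and both regular expressions are assumptions made for this example, and they only cover the default English Wikipedia signature format.

```python
import re

# Default English Wikipedia signature timestamp, e.g. "12:34, 5 January 2010 (UTC)".
TIMESTAMP_RE = re.compile(r"(\d{2}:\d{2}, \d{1,2} [A-Z][a-z]+ \d{4}) \(UTC\)")
# Link to a user page or user talk page, e.g. "[[User:Alice|Alice]]".
USER_RE = re.compile(r"\[\[User(?: talk)?:([^|\]#/]+)", re.IGNORECASE)


def parse_comment_line(line):
    """Hypothetical helper: return (depth, user, timestamp) for one line.

    depth is the number of leading ':', '*' or '#' characters;
    user and timestamp are None when no signature is found.
    """
    stripped = line.lstrip(":*#")
    depth = len(line) - len(stripped)

    ts = TIMESTAMP_RE.search(stripped)
    # Assumption: the signer is the last user link before the timestamp.
    text_before = stripped[:ts.start()] if ts else stripped
    users = USER_RE.findall(text_before)

    user = users[-1].strip() if users else None
    timestamp = ts.group(1) if ts else None
    return depth, user, timestamp


line = (":: I agree. [[User:Alice|Alice]] "
        "([[User talk:Alice|talk]]) 12:34, 5 January 2010 (UTC)")
print(parse_comment_line(line))
```

Real talk pages also contain custom signatures, unsigned comments, and comments spanning several lines, which this sketch does not attempt to handle.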

Language

Tested with Python 2.7

Authors

David Laniado and Riccardo Tasso

Limitations/TODO

  • The parser works only for the English Wikipedia. We are currently working to make it multilingual.
  • This version has been tested only with article talk pages. Support for user talk pages will be added.
  • Users are identified by user name and by a user id generated by the software (official Wikipedia user ids are not supported).
  • The "Outdent" command is currently not handled.
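As an illustration of what a software-generated user id could look like, the sketch below assigns sequential ids to user names in order of first appearance. The class `UserRegistry` and its numbering scheme are hypothetical and may differ from what WikiTalkParser actually does. The normalization step follows the general MediaWiki convention that underscores and spaces are equivalent and that the first letter of a user name is case-insensitive.

```python
class UserRegistry(object):
    """Hypothetical registry mapping user names to locally generated ids."""

    def __init__(self):
        self._ids = {}

    def get_id(self, username):
        # Normalize: trim, treat "_" as a space, capitalize the first letter.
        key = username.strip().replace("_", " ")
        key = key[:1].upper() + key[1:]
        # Assumption: ids are sequential integers starting at 1.
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]


registry = UserRegistry()
for name in ["Alice", "Bob_Smith", "alice", "Bob Smith"]:
    print(name, registry.get_id(name))
```

Ids produced this way are only meaningful within a single run and cannot be matched against official Wikipedia user ids.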

References

For further information, see the research paper "When the Wikipedians talk: network and tree structure of Wikipedia discussion pages".
