WikiTalkParser

WikiTalkParser is a library for extracting and parsing Wikipedia talk pages. It identifies each comment together with its signature, its date, and its indentation level in the thread structure. In the current version, talk pages are retrieved through the Wikipedia API, given a list of articles as input. Only the English-language Wikipedia is supported.
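The kind of processing described above can be illustrated with a minimal sketch. This is not WikiTalkParser's actual code or API: the names `talk_page_query_url`, `parse_comments` and `reply_parents` are hypothetical, and the sketch is written for Python 3 (the library itself was tested with Python 2.7). It assumes that each comment sits on a single line ending with a standard signature (a `[[User:...]]` link followed by a timestamp such as `10:15, 3 May 2010 (UTC)`), that indentation is the number of leading `:`, `*` or `#` characters, and that a comment replies to the nearest earlier comment with smaller indentation. The URL helper uses the standard MediaWiki `action=query&prop=revisions` parameters.

```python
import re
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

# Assumed signature shape: a [[User:Name|...]] (or [[User talk:Name|...]]) link
# followed later on the same line by a timestamp like "10:15, 3 May 2010 (UTC)".
SIGNATURE_RE = re.compile(
    r"\[\[User(?: talk)?:(?P<user>[^|\]]+)[^\]]*\]\].*?"
    r"(?P<date>\d{2}:\d{2}, \d{1,2} \w+ \d{4} \(UTC\))"
)


def talk_page_query_url(article_title):
    """Hypothetical helper: MediaWiki API URL for the current wikitext of an
    article's talk page on the English Wikipedia."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": "Talk:" + article_title,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def parse_comments(wikitext):
    """Return one dict per signed line: indentation depth, user, date, text.
    Lines without a recognizable signature (e.g. headings) are skipped."""
    comments = []
    for line in wikitext.splitlines():
        body = line.lstrip(":*#")
        match = SIGNATURE_RE.search(body)
        if match is None:
            continue
        comments.append({
            "indent": len(line) - len(body),  # number of leading : * # marks
            "user": match.group("user").strip(),
            "date": match.group("date"),
            "text": body.strip(),
        })
    return comments


def reply_parents(comments):
    """For each comment, return the index of the nearest earlier comment with
    smaller indentation, or None for a thread-opening comment."""
    parents = []
    stack = []  # indices along the current branch, strictly increasing indent
    for i, comment in enumerate(comments):
        while stack and comments[stack[-1]]["indent"] >= comment["indent"]:
            stack.pop()
        parents.append(stack[-1] if stack else None)
        stack.append(i)
    return parents
```

For a thread where Bob replies to Alice with one colon, Alice answers Bob with two colons, and Carol replies to Alice's opening comment with one colon, `reply_parents` yields `[None, 0, 1, 0]`. A real parser would also have to deal with multi-line and unsigned comments, which this sketch ignores.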

Language

Tested with Python 2.7

Authors

David Laniado and Riccardo Tasso

Limitations/TODO

  • The parser works only for the English Wikipedia. We are currently working to make it multilingual.
  • This version has only been tested with article talk pages. Support for user talk pages will be added.
  • Users are identified by user name and by a user ID generated by the software (official Wikipedia user IDs are not supported).
  • The "Outdent" command is not currently handled.
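The user identification mentioned in the list above can be illustrated with a minimal sketch, assuming the generated IDs are simply sequential integers assigned the first time a user name appears. `UserRegistry` and its methods are hypothetical and not part of WikiTalkParser; the underscore-to-space normalization is likewise an assumption.

```python
class UserRegistry:
    """Hypothetical registry assigning locally generated integer IDs to
    user names, in order of first appearance."""

    def __init__(self):
        self._ids = {}

    @staticmethod
    def normalize(user_name):
        # Assumption: treat underscores as spaces, as in MediaWiki titles.
        return user_name.strip().replace("_", " ")

    def id_for(self, user_name):
        key = self.normalize(user_name)
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1  # IDs start at 1
        return self._ids[key]
```

IDs produced this way depend on processing order and are unrelated to official Wikipedia user IDs.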

References

For further information, see the research paper "When the Wikipedians talk: network and tree structure of Wikipedia discussion pages".
