
Command-line client for Wikidot which lets you download a site as a Git repository


sandsmark/wdotcrawl

 
 


This is a fork to make a permanent backup of the SCP wiki.

This is a Python command-line client for the relatively popular wiki hosting service http://www.wikidot.com. It lets you:

  • List all pages on a site
  • See all revisions of a page
  • Query page source

Most interestingly, it can download a whole site as a Git repository, with proper commit dates, authors, and comments!
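
As an illustration of what "proper commit dates and authors" means on the Git side, here is a minimal sketch using only the standard library. It is not taken from crawl.py, which depends on GitPython. The function name `backdated_commit` and its parameters are hypothetical; the `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` environment variables and the `--author` option are standard Git features.

```python
import os
from datetime import datetime, timezone


def backdated_commit(message, author_name, author_email, when, base_env=None):
    """Build (argv, env) for a `git commit` that records `when` as its date.

    Hypothetical helper for illustration; `when` should be a timezone-aware
    datetime taken from the wiki revision being replayed.
    """
    # Git accepts ISO 8601 timestamps with a numeric UTC offset.
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S+0000")

    env = dict(os.environ if base_env is None else base_env)
    env["GIT_AUTHOR_DATE"] = stamp      # date shown as the authoring time
    env["GIT_COMMITTER_DATE"] = stamp   # date shown as the commit time

    argv = [
        "git", "commit",
        "--allow-empty-message",        # revision comments may be empty
        "--author", f"{author_name} <{author_email}>",
        "-m", message,
    ]
    return argv, env


# Possible usage inside an existing repository with staged changes:
#   import subprocess
#   argv, env = backdated_commit("fix typo", "Alice", "alice@example.invalid",
#                                datetime(2012, 3, 4, 5, 6, 7, tzinfo=timezone.utc))
#   subprocess.run(argv, env=env, check=True)
```

Setting both date variables keeps the author date and the committer date equal to the original revision time, so the history reads the same in tools that show either one.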

Dependencies

At least:

  • Python 3
  • python-beautifulsoup4
  • python-gitpython
  • python-requests
  • python-tqdm
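
A quick way to check that the dependencies are available before starting a long crawl is sketched below. It assumes the usual import names for these packages (`bs4`, `git`, `requests`, `tqdm`); the helper `missing_dependencies` is hypothetical and not part of this repository.

```python
import importlib.util

# Import name -> package name as listed above. Package names vary by
# distribution; the import names are what Python actually looks up.
REQUIRED_MODULES = {
    "bs4": "python-beautifulsoup4",
    "git": "python-gitpython",
    "requests": "python-requests",
    "tqdm": "python-tqdm",
}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the package names whose modules cannot be found."""
    return [
        package
        for module, package in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```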

Examples:

crawl.py http://example.wikidot.com --dump ExampleRepo
crawl.py http://example.wikidot.com --log --page example-page

It uses internal Wikidot AJAX requests to do its job. If you're from Wikidot, please don't break it. Thank you! We'll try to be nice and not put a load on your servers.

Downloading large sites might take a while. If anything breaks, just rerun the same command; it will continue from where it crashed.
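
Because rerunning the same command resumes the download, the restart can be automated. The wrapper below is a minimal sketch, assuming the crawler exits with a non-zero status when it crashes; `run_until_success` and its parameters are hypothetical names, not part of crawl.py.

```python
import subprocess
import sys
import time


def run_until_success(argv, max_attempts=5, delay_seconds=30.0):
    """Run `argv` repeatedly until it exits with status 0.

    Returns the number of attempts used. Raises RuntimeError if every
    attempt fails. The pause between attempts avoids hammering the server.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(argv)
        if result.returncode == 0:
            return attempt
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(f"command failed {max_attempts} times: {argv!r}")


# Possible usage, with the example command from above:
#   run_until_success([sys.executable, "crawl.py",
#                      "http://example.wikidot.com", "--dump", "ExampleRepo"])
```

Keeping a delay between attempts and a cap on the number of retries fits the stated intent of not putting load on Wikidot's servers.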

Useful links:

Wikidot code (very old) which simplifies things a bit:

The descriptions for on-site modules are heavily correlated with AJAX ones:

Someone else did Wikidot AJAX:

TODO

  • Handle deleted images. This probably requires checking the diff and, when an image is removed from one page, checking all other pages for remaining references.
  • Handle tags (both added and removed).
