importing wikipedia articles into plone (using collective.transmogrifier)
Python
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.
collective
dist
.gitignore
MANIFEST.in
README.rst
bootstrap.py
buildout.cfg
import.py
setup.cfg
setup.py
simplewiki-test-pages-articles.xml
simplewiki.cfg

README.rst

Introduction

Import blueprint is not yet done so please help import wikipedia to Plone.

  1. clone project:

    % git clone git://github.com/garbas/collective.blueprint.wikipedia.git
    
  2. run buildout:

    % cd collective.blueprint.wikipedia
    % python boostrap.py
    % bin/buildout
    
  3. run plone and create plone site with id Plone

  4. download wikipedia articles and untar it:

    % wget http://dumps.wikimedia.org/simplewiki/latest/simplewiki-latest-pages-articles.xml.bz2
    % tar jxvf simplewiki-pages-articles-xml.bz2
    
  1. make sure that you point to right xml in config:

    % vim simplewiki.cfg
    
  1. run import:

    % bin/instance run import.py simplewiki.cfg Plone
    

TODO

  • Currently it fails around 20.000 items when trying to import ".htaccess"
  • recognize language wiki links (for now we are stripping them out)