Importing Wikipedia articles into Plone (using collective.transmogrifier)

Introduction

The import blueprint is not finished yet, so please help with importing Wikipedia into Plone.

  1. clone project:

    % git clone git://github.com/garbas/collective.blueprint.wikipedia.git
    
  2. run buildout:

    % cd collective.blueprint.wikipedia
    % python bootstrap.py
    % bin/buildout
    
  3. run plone and create plone site with id Plone

  4. download the wikipedia articles dump and decompress it:

    % wget http://dumps.wikimedia.org/simplewiki/latest/simplewiki-latest-pages-articles.xml.bz2
    % bunzip2 simplewiki-latest-pages-articles.xml.bz2
    
  5. make sure the config points to the right xml file:

    % vim simplewiki.cfg
    
  6. run import:

    % bin/instance run import.py simplewiki.cfg Plone
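
Before pointing the config at a dump, it can help to check that the file parses and contains the pages you expect. The following is a minimal sketch, not part of this package: the helper names (open_dump, local_name, iter_titles) are hypothetical, and it assumes only that the dump is a MediaWiki XML export made of <page> elements that each contain a <title>. It reads the file as a stream, so it also works on the full dump and on the still-compressed .bz2 file.

```python
import bz2
import itertools
import sys
import xml.etree.ElementTree as ET


def open_dump(path):
    """Open a dump for reading, transparently handling .bz2 files."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rb")
    return open(path, "rb")


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_titles(path):
    """Yield the title of every <page> element, one page at a time."""
    with open_dump(path) as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if local_name(elem.tag) != "page":
                continue
            for child in elem:
                if local_name(child.tag) == "title":
                    yield child.text
                    break
            # Drop the page's children once handled to keep memory use low.
            elem.clear()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the first ten titles as a quick sanity check.
    for title in itertools.islice(iter_titles(sys.argv[1]), 10):
        print(title)
```

Saved under a name of your choosing (for example check_dump.py, also hypothetical), it can be run as "python check_dump.py simplewiki-latest-pages-articles.xml" to print the first ten page titles.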
    

TODO

  • Currently the import fails at around 20,000 items, when trying to import ".htaccess"
  • recognize language wiki links (for now they are stripped out)
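
As a starting point for the language links item, here is a rough sketch of how such links could be recognized instead of being discarded. It is not the blueprint's current code: the function name split_language_links is hypothetical, and the pattern assumes that a language prefix is two or three lowercase letters with optional hyphenated suffixes (such as "de" or "zh-min-nan"). Other short lowercase prefixes would be matched as well, so a real implementation would probably check the prefix against a list of known language codes.

```python
import re

# Assumption: a language prefix is 2-3 lowercase letters, optionally followed
# by hyphenated suffixes. Links such as [[Category:...]] or [[:de:...]] do not
# match and are left untouched.
LANG_LINK = re.compile(r"\[\[([a-z]{2,3}(?:-[a-z]+)*):([^\]|]+)\]\]")


def split_language_links(wikitext):
    """Return (text_without_language_links, [(language, title), ...])."""
    links = [
        (match.group(1), match.group(2).strip())
        for match in LANG_LINK.finditer(wikitext)
    ]
    stripped = LANG_LINK.sub("", wikitext)
    return stripped, links
```

The returned list keeps the information that is currently thrown away, so a later pipeline step could decide how to store it, while the stripped text can be processed as before.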