The script is configured via command line parameters (for frequently changing preferences) and a config file. Familiarize yourself with the settings and try running the script in simulation mode first.
How to migrate from the Elasticsearch datastore to the new datastore
Note: This step-by-step guide shows one possible migration. Some steps may differ in your setup.
- Create a PostgreSQL database for the datastore and a read-only user as described in the docs.
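The docs have the authoritative commands; as a rough sketch, the setup could look like this (the database name `datastore`, owner `ckanuser`, and role `readonlyuser` are placeholders, adjust them to your setup):

```shell
# Placeholder names throughout - the owner role "ckanuser" must already exist.
$ sudo -u postgres createdb -O ckanuser datastore -E utf-8
$ sudo -u postgres psql -c "CREATE USER readonlyuser WITH PASSWORD 'secret';"
# The read-only user only ever needs SELECT.
$ sudo -u postgres psql datastore -c \
    "GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonlyuser;"
```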
- Either have CKAN installed and the migration script in `ckanext/migration/`, or copy `ckanext/datastore` to the same directory that `migration.py` is in. The second option allows you to run the migration without having CKAN on the same server or even installed.
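For the standalone option, the copy step might look like this (the source path is purely illustrative, use wherever your CKAN checkout lives):

```shell
# Copy the datastore extension next to the migration script.
$ mkdir -p ckanext
$ cp -r /usr/lib/ckan/default/src/ckan/ckanext/datastore ckanext/
```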
- Make sure you have all requirements installed.
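The full requirements are in the docs; at least `python-dateutil` is needed (see "Good to know" below). Assuming `pip`:

```shell
$ pip install python-dateutil
# The script also needs a PostgreSQL driver; psycopg2 is an assumption here -
# check the script's imports for the actual one.
$ pip install psycopg2
```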
- Adapt `config.py` to your needs.
- Make yourself familiar with the command line options. The option `-h` will show the possible options with short explanations.
- Make sure that the settings are correct:

```shell
# simulate the migration of one resource
$ python migrate.py -s --max 1 config.py

# If no errors occur, try writing to the db (drop the -s simulation flag)
$ python migrate.py --max 1 config.py

# Okay, cool. Let's clear the database and start the real migration.
# Go into the datastore db and clear the table that has been created.
$ psql …

# You can run the migration in parallel to speed things up.
# Let's try the simulation first.
$ ./simulateall.bash
```
- Start the migration:

```shell
# If you want to run the migration in parallel (change the file to your needs!)
$ ./runall.bash

# Serial execution
$ python migrate.py config.py
```
- Monitor the progress of the migration. If you pipe the output to a log file, use `tail -f <logfile>` to see what's happening.
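For example, run the migration in the background with its output piped to a file, then follow the file (the log file name is arbitrary):

```shell
$ python migrate.py config.py > migration.log 2>&1 &
$ tail -f migration.log
```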
- Enjoy the new datastore
Good to know
- Simulation means that nothing is written to the db.
- Use the `segments` option to run parts of the migration in parallel (the `-h` output explains it).
- You need the new CKAN Datastore, or at least its requirements (e.g. `pip install python-dateutil`).
In order to clean the database (in case something goes wrong), use this query to generate `drop table` statements for all tables, then run the generated statements:

```sql
select 'drop table "' || tablename || '" cascade;' from pg_tables where schemaname = 'public';
```
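The query only prints the statements, it does not execute them. One way to do both in a single step is to pipe the generated statements straight back into `psql` (the database name `datastore` is a placeholder; `-t` suppresses headers and footers so only the statements are piped through):

```shell
$ psql -d datastore -t -c "select 'drop table \"' || tablename || '\" cascade;' from pg_tables where schemaname = 'public'" | psql -d datastore
```

Double-check which database you are connected to before running this — it drops every table in the `public` schema.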