Provide Migration-Story for ZODB with Plone from Python 2 to 3 #2525

Open
pbauer opened this Issue Sep 27, 2018 · 9 comments

@pbauer
Member

pbauer commented Sep 27, 2018

ZODB itself is compatible with Python 3, but a DB created in Python 2.7 cannot be used in Python 3 without being modified first.

After evaluating different approaches (see https://blog.gocept.com/2018/06/07/migrate-a-zope-zodb-data-fs-to-python-3), zodbupdate (https://github.com/zopefoundation/zodbupdate#migration-to-python-3) seems to be a good option.

If you want to contribute to the documentation or implementation of ZODB Python 3 migration for Plone this README provides some introduction and background information that helps you to get started.

We need to:

  • document this migration approach for vanilla Plone and for databases that contain custom code and add-ons (plone/documentation#1022)
  • test the migration-script zodbupdate with Plone
    current state and findings in https://github.com/frisi/coredev52multipy/tree/zodbupdate
  • write the required zodbupdate_decode_dict for all the packages in Plone that need it
  • document alternative approaches (e.g. export/import using transmogrifier or plone.restapi)
  • not break blobs on upgrade zopefoundation/zodbupdate#7
  • not break user logins (hashed passwords) on upgrade #2576
  • find a solution to not break the catalog (workaround: clear and rebuild after first startup under python3)
  • provide a python3 migration view for Plone5.2: #2575

Improvements for zodbupdate that will make migrators' lives easier:

@pbauer pbauer added this to the Plone 5.2 milestone Sep 27, 2018

@pbauer pbauer added this to To do in Saltlab Sprint Sep 27, 2018

@pbauer pbauer added this to Open in Python 3 Sep 27, 2018

@pbauer

Member

pbauer commented Sep 28, 2018

See zopefoundation/Zope#285 for the zodbupdate_decode_dict in Zope.

@davisagli

Member

davisagli commented Sep 29, 2018

I played with zodbupdate a bit a week or so ago and it seems promising, but will probably take some work to get great results.

Here's the basic path:

  1. Set up a buildout that includes Plone 5.2 in Python 2, plus zodbupdate and the following mr.developer checkouts:
  2. Run bin/zodbupdate --pack --convert-py3 -f path/to/Data.fs. This will do an in-place migration of the filestorage, so make sure you do it on a copy if you want to keep using it in Python 2.
  3. Copy the filestorage over to a buildout with the py3.cfg build of Plone 5.2, on Python 3.
  4. Start the site with bin/wsgi and look for what works and what doesn't. If you find objects with decode errors, figure out what attributes are the problem (i.e. which ones should not be converted from bytes to str -- a pdb in ZODB.serialize is helpful) and add a zodbupdate mapping for those attributes, rinse, and repeat.
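
Since the conversion rewrites the filestorage in place, the "work on a copy" step is easy to script. The following is only a rough sketch: the helper names (build_zodbupdate_command, convert_copy) and the default bin/zodbupdate location are assumptions for illustration, and only the flags already quoted above are used. It copies Data.fs alone; blobs and index files are not handled.

```python
import shutil
import subprocess
from pathlib import Path


def build_zodbupdate_command(data_fs, zodbupdate="bin/zodbupdate"):
    """Return the argument list for the in-place Python 3 conversion."""
    return [str(zodbupdate), "--pack", "--convert-py3", "-f", str(data_fs)]


def convert_copy(source, workdir, zodbupdate="bin/zodbupdate", run=subprocess.run):
    """Copy the filestorage into workdir and convert only the copy.

    `run` is injectable so the function can be exercised without a buildout.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    target = workdir / Path(source).name
    shutil.copy2(source, target)  # the original stays usable under Python 2
    run(build_zodbupdate_command(target, zodbupdate), check=True)
    return target
```

The wrapper only shells out to the script from the Python 2 buildout; the original Data.fs is never passed to zodbupdate.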

So far I only tried this with a fresh Plone site, so no real data. It would be interesting to try with a real site to get a sense of how long the migration takes in practice.

Some remaining issues I found:

  • After migration, logging in does not work. That's because AuthEncoding.is_encrypted expects bytes but is getting str (the hashed passwords in the ZODBUserManager were migrated). This one is tricky because they are in a BTree and we can't just add a zodbupdate mapping to avoid migrating all BTree values to str. So we don't have a good way to target this particular BTree for avoiding str-ification. Maybe we need to convert the str back to bytes at runtime before passing it to AuthEncoding.
  • I got a weird error on the redirect after saving an edit to the homepage -- so far I couldn't figure out which object is causing the problem. (Something tells me catalog?) But after reloading, I could see my edited text.
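
The runtime workaround mentioned for the password hashes could look roughly like the sketch below. It is not a patch for any real package: ensure_bytes and is_encrypted_compat are hypothetical helper names, the check function is passed in rather than imported, and UTF-8 is an assumed encoding (the hashes should be plain ASCII anyway).

```python
def ensure_bytes(value, encoding="utf-8"):
    """Return value as bytes; str is encoded, bytes pass through unchanged."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


def is_encrypted_compat(stored_hash, is_encrypted):
    """Call a bytes-only check (e.g. AuthEncoding.is_encrypted) with a
    value that may have been turned into str by the migration."""
    return is_encrypted(ensure_bytes(stored_hash))
```

This keeps the stored data as it is after migration and only normalises the type at the point where bytes are required.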

I'm sure there are more issues. Some things that come to mind to pay attention to:

  • Any attributes that are declared as Bytes or BytesLine in a Zope schema, or string/text/lines in a Zope property, may need "binary" entries in a zodbupdate decode mapping.
  • Likewise for any attributes that store binary data without being declared like that. From the above checkouts we already have mappings for OFS.Image.Pdata and the message of a ZopeVersionControl LogEntry (which is actually a pickle stored inside a ZODB pickle!) We don't need to do this for blobs though.
  • Any attributes that store data which should become unicode text may need a zodbupdate decode mapping to say which encoding to use during conversion (probably utf-8 in most cases). Maybe we want to add an option to zodbupdate to specify a default encoding to avoid adding mappings for the common case.
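
For readers who have not seen such a mapping: as far as I understand the zodbupdate README, a package exposes a dict through a zodbupdate.decode entry point, with keys of the form "module ClassName attribute" and values of either "binary" or an encoding name. The sketch below uses a made-up package (mypackage.content, with Attachment and Note classes) and a small hypothetical sanity check; treat the exact key format as an assumption to verify against the README.

```python
import codecs

# Hypothetical mapping for an imaginary package; the names are illustrative.
zodbupdate_decode_dict = {
    # keep raw file data as bytes
    "mypackage.content Attachment payload": "binary",
    # decode Python 2 str to text using this encoding
    "mypackage.content Note title": "utf-8",
}


def check_decode_dict(mapping):
    """Return a list of problems found in a decode mapping (empty if none)."""
    problems = []
    for key, value in mapping.items():
        if len(key.split()) != 3:
            problems.append("%r: expected 'module Class attribute'" % key)
        if value == "binary":
            continue
        try:
            codecs.lookup(value)  # raises LookupError for unknown encodings
        except LookupError:
            problems.append("%r: unknown encoding %r" % (key, value))
    return problems
```

Running check_decode_dict over a mapping before the migration catches typos in encoding names early, which is cheaper than discovering them halfway through a long conversion.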
@thefunny42

thefunny42 commented Oct 1, 2018

I added the options to zodbupdate in order to migrate my application to Python 3. That worked fine, and we have been running Python 3 in production since March. There are some things to know:

  • zodbupdate out of the box will break your blobs if you use the filesystem option: the script goes over the whole database, transforms the records and recommits them in a new transaction. The blob files need to be renamed in order to match the new transaction id.
  • zodbupdate takes care of two big problems: bytes in date/datetime objects and ZODB references. For other strings that should be converted to bytes, or otherwise decoded, you need to identify them yourself. Hopefully, if you used unicode everywhere, there will not be any problems.
  • In Python 3, strings store utf-8 by default, so you do not need to change anything there.
  • We did have some issues with zope.index.text, which has an optimisation that stores non unicode code in Python 2 strings, but uses strings in Python 3 (instead of bytes, which would have been the proper thing to do). We basically made a helper script that would go over the indexes and decode them as "raw-unicode-escape" in the database before doing zodbupdate. This would be the strategy to convert strings in BTrees, for instance, or anything you cannot target with zodbupdate.
  • The only things we kept as bytes in our database are the password hashes. We used unicode, and therefore strings in Python 3, everywhere, so we did not have much trouble there.
  • We used a different script to run zodbupdate in production: https://github.com/minddistrict/mdtools.relstorage (version 2.0). It does the same thing, except that it rewrites the records directly in PostgreSQL, mostly for speed reasons (zodbupdate is not very fast on large databases).
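
To make the str/bytes problem concrete, here is a small standard-library illustration (not ZODB code). PY2_PICKLE is a hand-written protocol 2 pickle of a Python 2 byte string holding UTF-8 data, and load_py2_str is just a hypothetical wrapper around pickle.loads. It shows why each Python 2 str needs a decision: keep it as bytes, or decode it with a known encoding.

```python
import pickle

# Protocol 2 pickle of the Python 2 str 'caf\xc3\xa9' (UTF-8 encoded text).
PY2_PICKLE = b"\x80\x02U\x05caf\xc3\xa9q\x00."


def load_py2_str(data, encoding):
    """Unpickle under Python 3, choosing how Python 2 str values are read."""
    return pickle.loads(data, encoding=encoding)


as_bytes = load_py2_str(PY2_PICKLE, "bytes")   # kept as bytes
as_text = load_py2_str(PY2_PICKLE, "utf-8")    # decoded to str

try:
    pickle.loads(PY2_PICKLE)  # default encoding is ASCII
    default_failed = False
except UnicodeDecodeError:
    default_failed = True
```

With the default settings the load fails on the non-ASCII bytes, which is the kind of decode error that shows up when an unconverted database is opened under Python 3.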

I doubt migrating a Plone site will be easy. It will depend a lot on the extensions that have been installed and the custom code that was written, since these should all be checked for strings/bytes problems.

@icemac

Contributor

icemac commented Oct 2, 2018

Although zodb.py3migrate cannot be used to do the actual migration (see my blog post), it has an analysis step which shows the objects that might need a conversion. Maybe this is easier than trying things out and seeing what breaks. See https://zodbpy3migrate.readthedocs.io/doc.html#upgrade-workflow for the documentation of the analysis step.

@frisi frisi self-assigned this Oct 2, 2018

frisi added a commit to plone/buildout.coredev that referenced this issue Oct 2, 2018

@frisi

Contributor

frisi commented Oct 3, 2018

We did have some issues with zope.index.text, which has an optimisation that stores non unicode code in Python 2 strings, but uses strings in Python 3 (instead of bytes, which would have been the proper thing to do).
We basically made an helper script that would go over the indexes and decode them as "raw-unicode-escape" in the database before doing zodbupdate. This would be the strategy to convert strings in btrees for instance, or anything you cannot target with zodbupdate.

Thanks for sharing your experience @thefunny42! Could you please email me this script, or post the relevant parts here or as a gist, so I can use it for documenting the migration of Plone sites? Thanks a lot!

frisi added a commit to plone/buildout.coredev that referenced this issue Oct 3, 2018

@frisi

Contributor

frisi commented Oct 3, 2018

I prepared a buildout and documented the process of creating a sample Plone site running Python 2 and migrating it to Python 3.

You can find everything under https://github.com/frisi/coredev52multipy/tree/zodbupdate
This should help users new to the topic (e.g. pickles, string handling in Python 2 vs. Python 3) understand the problem and how to debug and fix problems during migration.

I also started to document the Plone-specific problems and possible solutions there. It is pretty much a summary of the writeups by @davisagli, @thefunny42 and @icemac, including some information on where to hook in to fix them.
I'd like to discuss these with you in the hangout today.

@thefunny42

thefunny42 commented Oct 3, 2018

Some additional information:

  • The way we debugged our database to see if the migration worked was to unpickle all the records in it. We used the zodbsearch/relsearch script you can find in our mdtools repository.
  • I remember something with zope.schema where ASCII fields are based on zope.schema.Bytes in Python 2 but on zope.schema.Text in 3. We did replace in our application BytesLine with ASCIILine (because the BytesLine would actually store things in Python 2 str, and ASCIILine was ok for what we did).
  • Fixing our text index did not require any magic. In Python 2:
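import zope.catalog.text


def fix_text_index(index):
    """Decode Python 2 str values in a text index's word map to unicode.

    Run under Python 2, before zodbupdate. Returns True if anything changed.
    """
    if not zope.catalog.text.ITextIndex.providedBy(index):
        return False
    words = index.index._docwords
    count = 0
    # Copy the items first so the mapping can be modified while looping.
    for k, v in list(words.items()):
        if isinstance(v, str):  # a Python 2 byte string
            count += 1
            words[k] = v.decode('raw-unicode-escape')
    if count:
        print('Updated {} words.'.format(count))
    return count != 0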
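
The "unpickle every record" check from the first bullet can be approximated with the standard library alone. This is not the zodbsearch/relsearch script: find_broken_records and the other names are hypothetical, the (oid, data) pairs are assumed to come from whatever storage iteration you use, and real records additionally need the application classes to be importable. It relies on a ZODB record being two consecutive pickles (class information, then state).

```python
import io
import pickle


class _RefTolerantUnpickler(pickle.Unpickler):
    """Unpickler that accepts persistent references without resolving them."""

    def persistent_load(self, pid):
        return pid


def record_loads(data):
    """Load both pickles of one record; raises if either cannot be read."""
    unpickler = _RefTolerantUnpickler(io.BytesIO(data))
    unpickler.load()  # class information
    unpickler.load()  # object state


def find_broken_records(records):
    """Return (oid, error) pairs for records that fail to unpickle."""
    broken = []
    for oid, data in records:
        try:
            record_loads(data)
        except Exception as exc:  # decode errors, truncated data, ...
            broken.append((oid, repr(exc)))
    return broken
```

Run under Python 3 against the converted database, an empty result suggests that no record still contains a Python 2 str that cannot be decoded with the default settings.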
@davisagli

Member

davisagli commented Oct 3, 2018

@frisi I've only skimmed your writeup so far, but it looks really great! The same results I was discovering, but much more clearly written.

@frisi

Contributor

frisi commented Oct 7, 2018

I removed myself as an assignee as I won't be able to carry on with the ZODB Python 3 migration in the near future. I hope my current findings and documentation will help other contributors to get started.

@thefunny42 thanks for your comments and fixes on the zodbupdate tickets/PR.

@davisagli could you please have a look at the updated ticket description? I tried to add an overview of the currently known migration tasks and created/linked tickets where I summarized the current state.
There is also a PR with a rough draft of the database migration in plone/documentation#1022. If you feel that important information is missing, please add it to the docs or to the list in this ticket description so we do not forget anything.

Thank you all for your help on this topic and happy migrating ;-)
