Not easy to move content between sites or bulk update data #1373

Open
djay opened this Issue Feb 11, 2016 · 10 comments

djay commented Feb 11, 2016

User problem

There are a number of scenarios where movement of content is useful.

  • User has created a site locally and now wants to put it into production
  • User has a staging server where they want to make wide-ranging changes that get moved into production in a single transaction
    • also moving production content back into staging, i.e. resyncing.
  • When upgrades are too hard you might want to start afresh and move just the content over
  • This might be an entire site or just a part of the content tree

It would potentially need to synch most/all of the following

  • localroles/sharing
  • workflow state
  • versions?
  • custom content types out of the box
  • ?

There are some related use cases which could also be covered by a solution here

  • The need to export content metadata in a useful format such as CSV, for example to do an audit of your site.
  • Import metadata for existing objects, e.g. bulk rename, update creation dates etc.
  • Ability to deliver a theme with example content easily.

See discussion in https://community.plone.org/t/summer-of-code-2016/1537/24?u=djay
and provide-a-solution-to-permit-asynchronous-replicat
and https://trello.com/c/e4BCKA8I/35-not-easy-to-move-content-between-sites

Options

See the README of https://github.com/collective/collective.importexport for a list of existing plugins in this area.

How it's used

  1. (Current) zexp import/export: requires ZMI access and shell access, has security issues and can be brittle with regard to having the right plugins installed.
  2. (Current) Additional plugins such as collective.jsonify and transmogrifier; these often involve shell access to the server, which is often not possible.
  3. An action in the control panel to grab a whole site and install a whole site. Could be bundled into the import/export of a theme.
    • Asko has been working on support for GS inside a Diazo theme zip, which could provide a TTW way to move content between sites. You could freeze some content into GS inside a theme, download it as a zip and then upload it to another site. See https://github.com/collective/collective.themesitesetup
  4. An import/export folder action similar to https://plone.org/products/csvreplicata
    • Respects permissions.
    • in contents and action menu?
  5. Use webdav
  6. Allow for DnD for folders
  7. Import from content located on the server

Formats supported

  1. CSV for metadata + files for content
    • Uses CSV for most metadata, files for main content (html, pdf, image etc) and json inside the CSV for anything more complex.
    • Export/import of files in a zip could be optional so it covers the use case of syncing metadata only.
    • pro: csv format allows editing by non-technical users.
    • pro: a single metadata file allows you to upload files separately using a resumable upload widget and then upload the metadata as a separate step. Helps with large sites.
  2. Uses a big json file like collective.jsonify.
  3. Uses a zip file format where metadata is distributed throughout the folders and items using separate .json files.
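
To make format 1 more concrete, here is a minimal sketch of the export side. It assumes the content has already been marshalled into a list of plain Python dictionaries; the function name `export_rows` and the field names in the usage lines are hypothetical and not part of any existing Plone add-on.

```python
import csv
import io
import json


def export_rows(items, fieldnames):
    """Write a list of dicts to CSV text.

    Simple values are written as plain CSV cells; dicts and lists are
    dumped as JSON strings inside the cell (the hybrid idea of format 1).
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for item in items:
        row = {}
        for name in fieldnames:
            value = item.get(name, "")
            if isinstance(value, (dict, list)):
                # Anything too complex for a flat cell becomes a JSON string.
                value = json.dumps(value, sort_keys=True)
            row[name] = value
        writer.writerow(row)
    return buf.getvalue()


# Hypothetical usage: one page with sharing settings stored as a dict.
items = [
    {"path": "/folder1/page1", "title": "A page", "local_roles": {"djay": ["Editor"]}},
]
print(export_rows(items, ["path", "title", "local_roles"]))
```

The csv module takes care of quoting the embedded JSON, so the file still opens in a spreadsheet; binary content (html, pdf, images) would travel as separate files and is not covered by this sketch.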

Plan

Ideally if it can be done

  • we support both JSON and CSV and plain files for different use cases (technical and non-technical). It should be possible to support complete export then reimport with either CSV or JSON.
  • Allow either a single file (json or csv), a zip or DnD of folder contents to import via folder contents
  • Allow for mapping your data field names to internal data names via a UI so you don't have to do conversions on existing import data before performing your import. At least for simple cases.
  • Reuse plone.restapi for marshalling. Where restapi is missing data (portlets? comments?), extend it to include the needed data.
  • Users, configuration registry and theme are excluded.
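
For the simple cases of the field-name mapping in the plan, the step behind the UI could be as small as the sketch below. The function name `apply_field_mapping` and the example column names are hypothetical; the mapping dictionary stands in for whatever the user would pick in the UI.

```python
def apply_field_mapping(row, mapping):
    """Rename the keys of one imported row.

    `mapping` goes from the user's column name to the internal field name.
    Columns without a mapping are kept under their original name.
    """
    renamed = {}
    for name, value in row.items():
        target = mapping.get(name, name)
        if target in renamed:
            # Two source columns ending up in the same field is almost
            # certainly a mistake, so fail loudly instead of overwriting.
            raise ValueError("two columns map to the same field: %r" % target)
        renamed[target] = value
    return renamed


# Hypothetical usage: the user's spreadsheet calls the title "Headline".
row = {"Headline": "A page", "path": "/folder1/page1"}
print(apply_field_mapping(row, {"Headline": "title"}))
```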

ebrehault commented Feb 11, 2016

+1 on the idea.
Just a reminder: Products.csvreplicata was originally created to be able to edit Plone contents offline, that's why we picked CSV (as it can be edited by any user using their office suite).
We did use it to migrate content, and the CSV format turned out to be very bad for that use case.

So I would choose the JSON format, preferably in separate files, as big JSON files are always painful to manage.

About putting it in the core or not:

  • I definitely think easy import/export should be a core feature,
  • but we must keep this core component simple, so, to me, it should only handle the default Plone content types (we might also allow any "regular" Dexterity content type),
  • the ability to handle Archetypes contents, or specific content types (having custom field types for instance or exotic behaviors) should be implemented in dedicated add-ons (extending the core feature).

djay commented Feb 11, 2016

@ebrehault What about the idea of combining json and csv I put above? Simple fields like date, title, path etc are normal csv values and anything it can't handle like dicts etc is dumped as a json string into a csv value, such as sharing settings etc. It's just an idea and I haven't tried it yet but I think it could work.

ebrehault commented Feb 11, 2016

@djay well, I do not like it much. I prefer real pure JSON. It is much more stable and consistent.

hvelarde commented Feb 11, 2016

+1 on using pure JSON; CSV files work fine most of the time for content coming from a relational DB but that's not our case; having to handle 2 formats is not a good idea either, IMO.

probably we must work on exporting from Dexterity only (as this feature seems to be focused on Plone 5); importing should be compatible with older solutions like collective.jsonify; that way we can migrate from older Plone versions.

the size of the file should not be a primary concern at the beginning.

idgserpro commented Feb 11, 2016

@djay Isn't this a duplicate of #468? Or is this one from an end user perspective, and the other issue from an integrator perspective?

djay commented Feb 12, 2016

@idgserpro crap, yes it is. I did search for it :( One downside of my habit of naming tickets by problem rather than solution: they tend to have hard-to-remember names. I will merge them

djay commented Feb 12, 2016

I agree with comments on CSV vs json generally but I think this is a special case that could work.

Data is not relational. True but transmogrifier has demonstrated well that a list of dictionaries is a very useful way to represent plone content. This would be equivalent. For example

path, title, description, authors_json, ...
"/folder1/page1", "A page", "blah, blah", "['djay','hector']",..

Is this harder to parse programmatically than one big json? Yes but not much. It's maybe 10 lines of python.
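
A rough sketch of what those lines could look like. It assumes two things that are not settled above: columns whose name ends in `_json` hold a JSON string, and those cells contain valid JSON (double-quoted strings, escaped by the usual CSV quote doubling). The function name `parse_rows` is made up for the example.

```python
import csv
import io
import json

JSON_SUFFIX = "_json"  # assumption: this suffix marks cells holding JSON


def parse_rows(csv_text):
    """Turn hybrid CSV text back into a list of dicts."""
    # skipinitialspace tolerates the "a, b, c" style used in the example.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    items = []
    for row in reader:
        item = {}
        for name, value in row.items():
            if name is None:
                continue  # extra cells beyond the header row are ignored
            if name.endswith(JSON_SUFFIX):
                # Decode the JSON cell and drop the suffix from the key.
                item[name[: -len(JSON_SUFFIX)]] = json.loads(value) if value else None
            else:
                item[name] = value
        items.append(item)
    return items


# Usage with a row shaped like the example, using valid JSON in the last cell.
text = (
    "path, title, description, authors_json\n"
    '"/folder1/page1", "A page", "blah, blah", "[""djay"", ""hector""]"\n'
)
print(parse_rows(text))
```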

Is this easier for a non technical person to do simple bulk updates on or do filters etc? Yes. They can just use a spreadsheet and save back to CSV. One big json file requires code. Many json files requires more code.

I don't really see the downside. Compatibility with c.jsonify is perhaps the only downside but I don't think that is important. It would be easy to create a converter and/or create a modified version of c.jsonify that supported this format and still ran in very old versions of plone like c.jsonify. Also I think jsonify embeds all the binary data inside the json which is a bit messy.

+1 on only having a single format to support and on only supporting dexterity for now.

pigeonflight commented Feb 14, 2016

I would like to suggest that moving of portlets (at minimum standard portlet managers plone.footerportlets, plone.leftcolumn, plone.rightcolumn) needs to be supported.
And default portlets, calendar, news, events, collection, static etc...

I just lost many hours last week on moving portlet content.

ebrehault commented Feb 14, 2016

@pigeonflight I agree.
The import/export feature should probably offer different options like:

  • include or exclude portlets,
  • include or exclude comments,
  • include or exclude some content types (chosen by the user)
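
As a rough illustration of how such options could be applied to already-exported items, here is a small sketch. Everything in it is an assumption made for the example: the `ExportOptions` class, the `filter_item` function, and the `portal_type`, `portlets` and `comments` keys of the item dictionaries.

```python
from dataclasses import dataclass


@dataclass
class ExportOptions:
    """Hypothetical switches matching the options listed above."""
    include_portlets: bool = True
    include_comments: bool = True
    excluded_types: frozenset = frozenset()


def filter_item(item, options):
    """Return a trimmed copy of an exported item, or None if its type is excluded."""
    if item.get("portal_type") in options.excluded_types:
        return None
    trimmed = dict(item)  # shallow copy so the caller's dict is untouched
    if not options.include_portlets:
        trimmed.pop("portlets", None)
    if not options.include_comments:
        trimmed.pop("comments", None)
    return trimmed


# Hypothetical usage: skip News Items and leave comments out of the export.
options = ExportOptions(include_comments=False, excluded_types=frozenset({"News Item"}))
page = {"portal_type": "Document", "title": "A page", "comments": ["nice"]}
print(filter_item(page, options))
```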

djay commented Mar 1, 2017

Added a plan, however @ebrehault isn't this supposed to be the main PLIP? - #1386
