PLIP: Content import/export #1386
Proposer : Eric Bréhault
Seconder : Dylan Jay, @hvelarde
Provide a method to allow an editor to import content, export content or move content securely between sites.
Exporting and importing content from a Plone site should be easy.
There are a number of scenarios where import/export of content is useful.
The motivation is to try to cover all of them with a single UI; if that is not possible, the primary use case is moving content between sites.
It might be relevant to re-use here the
Proposal & Implementation
See #1373 for links to previous discussions
The import/export feature will be applicable to all the Plone 5 default content-types and any regular Dexterity type.
The screen is accessible from the Actions menu on any folder.
The Basic tab would allow exporting the folder and its whole content tree (the result is downloaded immediately), or importing content by uploading a file (see below for the data format).
The Advanced tab would allow the same, but would also let the user choose:
After import, a report shows how many objects were created, updated, etc.
The default data format would be a .zip containing:
The Advanced tab will allow choosing a pure JSON format instead of the CSV format. It will be a .zip file containing:
If we choose to use the server ./var folder instead of upload/download, the files are not zipped.
Note: we propose to use CSV as a default format because standard users are more likely to open/edit/manipulate CSV files rather than JSON.
By default the corresponding permission will be assigned to Managers only.
As data can be exposed and manipulated in transit when uploading or downloading contents (see Risks), we just propose to add the following warning:
Import/export processing must/should be done asynchronously.
+1 on preserving portlet assignment data and comments;
FWIW about legacy approaches:
There used to be RFC822 marshalling and demarshalling (mostly familiar from webdav support), which was a bit of a hack for Archetypes, but had a good framework (plone.rfc822) for Dexterity. Of course, the RFC822 format feels outdated in the current JSON world. I did Archetypes adapters for plone.rfc822 for my own migration use at https://github.com/datakurre/collective.atrfc822
There used to be CMF GS export and import steps for content ("structure"), but they didn't get much love from Plone. Quite deprecated, again, but I did experiment with adding support for current content types in https://github.com/datakurre/collective.themesitesetup
About portlets. Portlets have good GS import code, but are lacking export code for context portlets below the root. Some pointers for re-using GS for exporting and importing context portlets for arbitrary objects at https://github.com/datakurre/transmogrifier_ploneblueprints/blob/master/src/transmogrifier_ploneblueprints/portlets.py
@datakurre the marshalling format can't handle any kind of complex data, I think, so it would be limited in what can be transferred. This is aiming to be more similar to jsonify, which transfers everything it can.
Another use case would be to "Backup" the site. Of course that's implicitly the case with this PLIP, it's just not mentioned explicitly.
I have some questions:
@thet I think as files, with a "sidecar" file for metadata (personally, I prefer JSON everything). Maybe my nomenclature is too file-centric ("sidecar" is a term Adobe uses, for example, in asset management to describe XML metadata in a distinct file) -- the JSON is actually the primary thing for the type; all the binary stuff it includes should be cleanly referenced/named, and stored as distinct files.
Using a naming convention alone for association between JSON and binary content makes sense for 1:1 content ("primary field"), but may be unwieldy for content types that store more than one file/binary field and have issues with name collision (depending on implementation). Instead, I like the idea that every content item gets a master JSON file that has names/links to the files included within (maybe stored with some unique scheme like OID-plus-original-filename, but original filename metadata for the blob needs to be preserved as well).
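A minimal sketch of that layout, assuming a zip archive with one master JSON file per item and blobs stored under an OID-plus-field-plus-filename name. The `items/` and `blobs/` folders, the `export_item` helper and the record keys are all hypothetical, not an agreed format:

```python
import io
import json
import zipfile


def export_item(archive, oid, path, fields, blobs):
    """Write one content item: a master JSON file plus one file per blob.

    `blobs` maps field name -> (original_filename, bytes).
    """
    record = {"path": path, "fields": fields, "blobs": {}}
    for field_name, (filename, data) in blobs.items():
        # OID + field name + original filename avoids collisions
        # between items and between fields of the same item.
        stored_name = f"blobs/{oid}-{field_name}-{filename}"
        archive.writestr(stored_name, data)
        # The original filename is kept as metadata in the master JSON.
        record["blobs"][field_name] = {
            "stored_as": stored_name,
            "filename": filename,
        }
    archive.writestr(f"items/{oid}.json", json.dumps(record, indent=2))
    return record


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    record = export_item(
        archive,
        oid="0x01",
        path="/news/launch",
        fields={"title": "Launch"},
        # Two binary fields that happen to share an original filename.
        blobs={
            "image": ("photo.jpg", b"fake jpeg bytes"),
            "attachment": ("photo.jpg", b"different bytes"),
        },
    )

# Read the archive back: the master JSON links to every stored blob.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    names = sorted(archive.namelist())
    master = json.loads(archive.read("items/0x01.json"))
    image = archive.read(master["blobs"]["image"]["stored_as"])
```

Both fields keep `photo.jpg` as their original filename, yet they land in distinct files, which is the collision case a pure naming convention struggles with.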
A few caveats/challenges:
(1) What if a (text) field in a content type stores JSON, how does one serialize JSON-within-JSON? We have a few types like this.
(2) Can we preserve all of the following (non-lossy round-trip):
(3) Are we assuming that HTML primary fields should just be kept within the JSON, or kept in distinct files for editing by tools/editors?
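On caveat (1), a small sketch suggesting that JSON-within-JSON round-trips as long as the field value is treated as an opaque string: the serializer escapes the inner quotes and restores them on load. The item and its `config` field are made up for illustration:

```python
import json

# A hypothetical item whose text field happens to hold a JSON document.
item = {
    "title": "Chart",
    "config": '{"series": [1, 2, 3], "label": "a \\"quoted\\" name"}',
}

serialized = json.dumps(item)
restored = json.loads(serialized)

# The inner document comes back as a plain string, byte for byte,
# and can still be decoded on its own by whoever owns the field.
inner = json.loads(restored["config"])
```

The exporter never needs to parse the inner document; problems only appear if a tool tries to be clever and inlines the string as a nested object.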
@thet @seanupton I realise JSON is preferable for developers and that CSV requires more work. I'm not sure CSV really needs to be integrated into restapi. But I do think it's going to be very useful for end users, as I've outlined above, and that makes it worth the extra complexity.
Imagine a transfer where you want to drop all files of a certain size, of a certain content type, or by a certain author. That doesn't require a programmer; it only requires very basic Excel.
If they wanted to do a bulk change of all the owner ids on the same site: just export metadata only, remove the extra columns, and upload a file with just the path and owner for import.
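The same edit can be sketched in a few lines of Python standing in for the spreadsheet. The column names (`path`, `title`, `owner`, `size`) and the size threshold are assumptions, not the format of any existing exporter:

```python
import csv
import io

# Hypothetical metadata-only export.
exported = io.StringIO(
    "path,title,owner,size\n"
    "/docs/a,Doc A,alice,1200\n"
    "/docs/b,Doc B,bob,900000\n"
    "/docs/c,Doc C,alice,300\n"
)

MAX_SIZE = 100_000  # drop anything larger than this many bytes

# Keep only small items, reassign alice's content to carol,
# and keep just the two columns the import needs.
rows = [
    {
        "path": row["path"],
        "owner": "carol" if row["owner"] == "alice" else row["owner"],
    }
    for row in csv.DictReader(exported)
    if int(row["size"]) <= MAX_SIZE
]

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["path", "owner"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
import_csv = out.getvalue()
```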
As much as possible, it would be nice to keep filenames and folders that are easily readable rather than folders with pages of OIDs. Again, I realise this increases the complexity.
Again, I'm imagining an end-user case where a hand-made import zip is created. They create a CSV with a path field so the primary data can be found on import. That could work for additional binary fields too. For example, CSV column names of lead_image__binary_path__, lead_image__binary__name__.
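A rough sketch of how an importer might read such a hand-made zip, using the two column names from the example above. The archive layout (`content.csv` at the root, binaries under `files/`) is an assumption made for illustration:

```python
import csv
import io
import zipfile

# Build a small hand-made archive in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "content.csv",
        "path,title,lead_image__binary_path__,lead_image__binary__name__\n"
        "/news/launch,Launch,files/launch.jpg,launch-photo.jpg\n",
    )
    archive.writestr("files/launch.jpg", b"fake image bytes")

items = []
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    text = archive.read("content.csv").decode("utf-8")
    for row in csv.DictReader(io.StringIO(text)):
        # The path column says where the binary lives inside the zip;
        # the name column carries the filename to store on the field.
        data = archive.read(row["lead_image__binary_path__"])
        items.append(
            {
                "path": row["path"],
                "title": row["title"],
                "lead_image": {
                    "filename": row["lead_image__binary__name__"],
                    "data": data,
                },
            }
        )
```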
(3) I'd keep HTML as a separate file. Maybe some kind of rule around length or mimetype?
(3): I think having separate HTML files for rich text fields is much easier for users. But on this point (and also others), we can have a specific option in the Advanced tab for that (so basic mode would export separate HTML files, but in Advanced mode we could choose to put them into the JSON, we could also choose whether to put the attached files as base64 in the JSON or not, etc.)
@djay you are addressing a different use case with your example; the purpose of this PLIP is to have an import/export function in Plone. once we have that, we can start imagining other scenarios.
I would love to have a ZIP file with the whole site exported in JSON format maintaining the structure and exporting the attachments as separate files.
I helped write the PLIP hector. It addresses the use cases listed under
On Wed, 2 Mar 2016 7:37 pm Héctor Velarde firstname.lastname@example.org wrote:
@hvelarde there are some reasons these goals are not really in conflict, I think:
(1) It is possible to start with JSON and bolt on other formats;
(2) It is reasonable to use the "advanced" tab to select formats;
(3) Other formats (CSV, legacy RFC822) are easy to output in the common case, though IMHO the JSON format should be the one that absolutely guarantees lossless round-trip;
(4) A compressed archive makes having both CSV and JSON of the same content in the same archive negligible in storage cost.
(5) Designing an "advanced" tab with checkboxes for formats (JSON checked by default) has trivial additional cost, and probably isn't YAGNI.
(6) Supporting multiple formats through some sort of pluggability (adapters?) will encourage contribution of folks who want to scratch an itch beyond some core promise of lossless JSON.
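Points (1), (5) and (6) could look roughly like the sketch below: a registry of serializers with JSON selected by default and CSV bolted on as a lossy extra. A plain dict stands in for whatever adapter mechanism Plone would actually use, and every name here (`SERIALIZERS`, `serializer`, `export`) is hypothetical:

```python
import csv
import io
import json

SERIALIZERS = {}


def serializer(name):
    """Register a function that turns a list of items into text."""
    def register(func):
        SERIALIZERS[name] = func
        return func
    return register


@serializer("json")
def to_json(items):
    # The lossless format: nested values survive untouched.
    return json.dumps(items, indent=2, sort_keys=True)


@serializer("csv")
def to_csv(items):
    # Lossy by design: non-string values are flattened to JSON text.
    fieldnames = sorted({key for item in items for key in item})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for item in items:
        writer.writerow(
            {k: v if isinstance(v, str) else json.dumps(v) for k, v in item.items()}
        )
    return out.getvalue()


def export(items, formats=("json",)):
    """Return a mapping of file name -> content, one file per chosen format."""
    return {f"content.{name}": SERIALIZERS[name](items) for name in formats}


items = [{"path": "/a", "title": "A", "subjects": ["x", "y"]}]
files = export(items, formats=("json", "csv"))
```

Adding a format is one decorated function, and the checkboxes on the "advanced" tab would simply map to the `formats` argument.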
@djay yes, you listed 9 scenarios and one of them is the one I would prefer to move to another issue because almost the whole discussion here has been around it; you prefer CSV because you want to edit the file but that is a different use case.
besides that, this PLIP will be the result of this conversation and not only what you think it should be.
@seanupton yes, so we can leave CSV conversion for another iteration.
So your argument hector is you don't like the usecase??
If we have a chance to solve two problems with one solution then in my book
Unless someone can argue a disadvantage then I think we should shut this
On Thu, 3 Mar 2016 2:41 am Héctor Velarde email@example.com wrote:
my argument, @djay, is the scope of this PLIP; I have spent 10 of the last 15 years dealing with this kind of thing and I can smell when a task needs to be split into smaller ones to be accomplished with the minimum amount of time and resources (yes, call it intuition if you like).
as I said before, you have listed 9 scenarios just to emphasize how important it is to have content import/export working out of the box in Plone; this will save us a lot of time and effort if implemented the right way and I am committing our scarce resources to help on that.
can we just move on and focus on the main issue? content import/export.
starting with a well-defined scope can help us think about implementation details like content, format, structure, versions supported, etc. then a bunch of details will emerge and we can solve all of them one by one.
worth reading: Advantages of User Stories for Requirements
I like the idea and this is definitely something that Plone should provide out of the box. I share @hvelarde's concerns about the scope of the PLIP though. I also think that we should finish plone.restapi first and then build the import/export feature on top of that, not the other way around. Otherwise we risk ending up with two incompatible "APIs" or reinventing the wheel twice. Both tasks are just too big to work on in parallel, in my opinion.
So one way to re-focus this PLIP could be to provide the format and utilities that will be used by the restapi? So when implementing the restapi only one api call needs to be done, i.e.:
I wired that up on plone.api style, which could be one of the first users of this utility/adapter as well.
according to https://docs.google.com/spreadsheets/d/15Cut73TS5l_x8djkxNre5k8fd7haGC5OOSGigtL2drQ/edit#gid=1837812562 this was rejected
@hvelarde I also think it is important, but it hasn't been rejected because of the scope.
Meanwhile, I plan to develop the addon (I proposed it as a GSoC subject but nobody picked it). If you want to help you are welcome :)
@ebrehault it is mostly implemented already in https://github.com/collective/collective.importexport. It just needs work to include JSON in the CSV for additional fields, and a zip file format for blobs.
Yes, trim the scope please. This is an important feature to have in a CMS.
I think the minimal viable product should start with JSON export/import and then proceed from there. As a developer, that's the format I would use to load and process data structures, with a second format being YAML (it's much nicer and more readable, but less popular), and a distant third being XML.
CSV may be useful for some folks (those who use spreadsheets), but that set of folks has little overlap with the set of folks maintaining and operating Plone sites. CSV also can't capture hierarchy well, and it's a mess to query, manipulate, et cetera.
I remember using collective.importexport and it really wasn't a nice experience, precisely because of the CSV.
@Rudd-O developers have plenty of options for import and export. They aren't the main target IMO. I think you might have a skewed view of who is maintaining websites. There are plenty of cases where developers or devops don't want to get involved for simple cases of importing and exporting content. If a webmaster wants to migrate the whole site out of Plone and this function helps them, then great, but I don't think that's the target.
collective.importexport is a start. All good UIs are born out of lots of user testing and feedback. So I'd very much appreciate hearing where you felt you got lost in using that plugin and what your use case was, so I can improve it. But not here.
I will propose this as a GSOC project
Strategically, good content import/export is important because
The end goal is to make Plone more approachable for webmasters, which will in turn help grow the install base.
The aim would be an online UI which allows
It will be implemented as an addon, or extend an existing addon, that can be incorporated into Plone at a later date. c.importexport is an example of an existing addon that can be extended.
Mainly python. Some UX skills to help create an intuitive UI but this can be provided by the mentor.
Dylan Jay, Eric Bréhault
An addon and a PLIP to include this in core Plone.