Alfresco Bulk Import Tool
What Is It?
A high-performance bulk import tool for the open source Alfresco Document Management System.
"'High Performance', you say?"
Why yes. Alfresco's built-in mechanisms for moving large amounts of content into the repository (the various file-server protocols, the venerable ACP mechanism, the mind-bogglingly inefficient CMIS standard etc.) all suffer from a variety of limitations that make them a lot slower than the core Alfresco repository. This tool cuts out virtually all of that nonsense, attempts to maximise "mechanical sympathy" (which, for Alfresco, basically means treating your database nicely), and makes one or two large and opinionated assumptions that allow it to be a lot faster than anything else out there.
In terms of benchmarks, the old v1.x versions of the tool have regularly demonstrated sustained ingestion rates of over 500 documents per second in production environments, and in testing, the v2.x version has been shown to be up to 4X faster than 1.x (in specific circumstances, notably for streaming imports).
Older resources (less relevant for v2.0+):
Please see Contributing.
- Contributors list
- Icon adapted from Appzgear on www.flaticon.com.
- Contributing file heavily inspired by the Atom project.
This extension is not supported by Alfresco Software Inc., although a fork of an early, pre-release version of this tool has been included in Alfresco Enterprise since v4.0, and is supported by Alfresco support.
Please note that the embedded fork has never been rebased against upstream, meaning that it is ancient - equivalent to v1.0-RC1 (circa mid-2010). It also introduced a number of serious bugs (e.g. incorrect "source striping" algorithm, no support for Alfresco clusters) that the original edition never had. The embedded fork has also been independently measured to be around 25% slower than the original edition available here.
tl;dr: use of the embedded fork is STRONGLY discouraged!