These scripts help upload an entire directory to OpenStack Swift. They are intended for directories that contain millions of files and total up to several terabytes.
Requirements:
- Python 2.7+
- python-mysqldb
- Python Swiftclient
- MySQL database
- The following environment variables:
  - OS_AUTH_URL
  - OS_USERNAME
  - OS_TENANT_NAME
  - MYSQL_HOST
  - MYSQL_USER
  - MYSQL_PASSWD
  - MYSQL_DB
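
Before starting a long run, it can help to confirm that the environment variables listed above are set. The following is a minimal sketch, not part of the scripts themselves; `REQUIRED_VARS` and `missing_env_vars` are hypothetical names introduced here for illustration.

```python
import os

# Variable names copied from the requirements list above.
REQUIRED_VARS = [
    "OS_AUTH_URL", "OS_USERNAME", "OS_TENANT_NAME",
    "MYSQL_HOST", "MYSQL_USER", "MYSQL_PASSWD", "MYSQL_DB",
]


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    `environ` defaults to os.environ; passing a dict makes the check testable.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments checks the current shell environment; an empty list means every listed variable has a value.
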
Step 1. Index target directory with prepareupload.py
$ python prepareupload.py PathToDirectory MysqlTableName
This creates a table MysqlTableName and populates it with paths to all files in PathToDirectory. It outputs the following log files:
- MysqlTableName.prepare.error.log # Logs any file path that could not be written to the database.
- MysqlTableName.prepare.out # Real-time log of file paths as they are parsed.
While the above command is running, run the following command in a new terminal tab to watch the parsing progress:
$ tail -f MysqlTableName.prepare.out
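
Conceptually, step 1 walks the directory tree and records each file path. The sketch below shows one way to collect paths in batches so that millions of entries never have to sit in memory at once. It is an illustration under assumptions, not the code in prepareupload.py: `iter_file_batches` and `batch_size` are hypothetical, and the database insert is left out.

```python
import os


def iter_file_batches(root, batch_size=1000):
    """Yield lists of absolute file paths under `root`, `batch_size` at a time."""
    batch = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            batch.append(os.path.join(os.path.abspath(dirpath), name))
            if len(batch) >= batch_size:
                yield batch
                batch = []
    # Emit whatever is left over after the walk finishes.
    if batch:
        yield batch
```

Each yielded batch could then be written to the table in a single multi-row insert, which is usually much faster than inserting one row per file.
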
Step 2. Upload the files indexed in step 1 with bulkupload.py
$ python bulkupload.py containername MysqlTableName 3
This creates 3 processes that read from MysqlTableName and upload files into the container containername. If the upload is stopped, the command can be re-run and it continues without re-uploading files that were already uploaded. For faster uploads, increase 3 to a number of processes that your CPU can handle.
This script outputs the following files:
- MysqlTableName.upload.out # Real-time progress of the upload
- MysqlTableName.error.log # Logs failed uploads
- MysqlTableName.report.log # Created when the upload completes; contains a summary of the results.
To check the progress of the upload, run the following command:
$ tail -f MysqlTableName.upload.out
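
The resume behaviour in step 2 amounts to skipping entries already marked as uploaded and dividing the rest among the worker processes. The sketch below illustrates that idea on plain Python data. It makes several assumptions: the `(path, uploaded)` row shape, the round-robin split, and the function names are all hypothetical, and bulkupload.py may track state and distribute work differently.

```python
def pending_paths(rows):
    """Return the paths whose `uploaded` flag is false, preserving order.

    `rows` is an iterable of (path, uploaded) pairs, an assumed row shape.
    """
    return [path for path, uploaded in rows if not uploaded]


def split_among_workers(paths, num_workers):
    """Deal `paths` round-robin into `num_workers` lists, one per process."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return [paths[i::num_workers] for i in range(num_workers)]
```

Because finished files are filtered out before the split, re-running after an interruption only hands the remaining files to the workers.
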
#### path-cutoff
Example:
$ python bulkupload.py containername MysqlTableName 3 path-cutoff
When uploading a directory from your filesystem, the folder structure is maintained. Sometimes, however, you do not need the entire path. Say your files are in /Users/John/Doe/assets. With Doe as the path-cutoff, only the directory structure under assets is maintained.
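
One plausible reading of the path-cutoff is that everything up to and including the cutoff directory is dropped from the object name. The sketch below implements that reading only. `apply_path_cutoff` is a hypothetical helper, and bulkupload.py may handle the cutoff differently, for example by matching a substring rather than a whole path component.

```python
def apply_path_cutoff(path, cutoff):
    """Drop everything up to and including the first component equal to `cutoff`.

    If `cutoff` is not a component of `path`, the path is returned unchanged.
    """
    parts = path.split("/")
    if cutoff in parts:
        parts = parts[parts.index(cutoff) + 1:]
    return "/".join(parts)


print(apply_path_cutoff("/Users/John/Doe/assets/img/logo.png", "Doe"))
# prints "assets/img/logo.png"
```
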