A Node.js module, ideal for a chron, that will download data from a Google Spreadsheet and put select columns on an Amazon S3 bucket. To get your spreadsheet key, do
File > Publish to the Web in Google Spreadsheets.
You'll want to create an AWS
credentials.json file like the sample and put it somewhere like
~/.aws/credentials.json and type that path into
Tested on Node 0.10.7
npm install turntable
The options are fairly self-explanatory. The only two that aren't immediately obvious are
output_schemais an array of column names to copy over into your public table. This is useful if you collect reader contact info that you want to keep but that you don't want to make public. Set
falseto copy all columns.
moderatesets options that will only copy over approved rows. Set the name of the moderation column in
column_nameand the string that approves a row in
falseto copy all rows.
- Can only uploads moderator-approved rows.
- Can only uploads the columns you specify in
gdoc_info. Handy in case there are fields you use internally that aren't meant for production. For instance, you might have an "Edited by" or "Written by" column that you want to keep in your document but don't need to show publicly.
- Uploads two copies of your data: 1) the production copy that gets overwritten each time with new data; 2) a timestamped copy that goes into the
backupsdirectory. The default directory is
backupsin the same directory as your
output_path. You can set your own backup directory in the
aws-info. With backups, you can easily revert to an old version if necessary.
You can optionally set up a Twitter bot to deliver notifications by setting
true in the
tweetbot_info object. This can be used mostly likely on a private account for easy team notifications. Setting @-replies for errors could be an effective notification systems. Successes needn't be so noisy.
A note on data privacy
As long as you don't share you key with anyone, publishing to the web doesn't alter your sharing and security preferences for that doc. If there are columns that you don't want visible in your csv on S3. There are two options:
- If you're okay with that data being accessible if someone know the spreadsheet key, then, in the script, you can specify which columns it will copy over to S3 by naming them in the
- If you want more security, create a second sheet with a formula like
=Sheet1!A:Ain Column A,
=Sheet1!B:Bin Column B and so on. If you copy that formula down, it will take the values from Sheet1 only for the columns you specify. The downside: if you don't copy the formula in Sheet 2 to enough rows, then it won't carry over the data. So you have to keep an eye on it and make sure your formula is in all rows. You'll want to overwrite the ajax url to make sure it grabs the proper worksheet.