Historical PowerTrack Download Script

@jimmoffitt jimmoffitt released this 21 Aug 21:35
· 77 commits to master since this release

Introduction

A bash script that manages the downloading of Historical PowerTrack (HPT) data files.

  • Ready to run on Linux and MacOS machines.
  • On Windows, this script can be used with a Unix/Linux emulator such as Cygwin. See this article for more information on getting Cygwin set up.

The script reads your Job's list of download links and uses those links to fetch the files. The list is provided by a "results.csv" file, which is available at a Job-specific endpoint of the form:

HPT 1.0:

https://historical.gnip.com/accounts/{ACCOUNT_NAME}/publishers/twitter/historical/track/jobs/{JOB_UUID}/results.csv

HPT 2.0:

http://gnip-api.gnip.com/historical/powertrack/accounts/{ACCOUNT_NAME}/publishers/twitter/jobs/{JOB_UUID}/results.csv

Place this results.csv file into the "input" folder, run the script, and your download process will begin.
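
As a rough illustration of that process, the loop below reads a list of links and fetches only the files that are not already present. It is a sketch, not the code shipped in run.sh: it assumes results.csv holds one download URL per line, and the function names url_to_filename and download_missing are hypothetical.

```shell
# Sketch only: assumes results.csv holds one download URL per line.
# url_to_filename and download_missing are illustrative names, not part of run.sh.

# Derive a local file name from a URL: drop any query string, keep the last path segment.
url_to_filename() {
  local url="${1%%\?*}"
  printf '%s\n' "${url##*/}"
}

# Fetch every URL listed in a links file into a target folder,
# skipping files that already exist there.
download_missing() {
  local links_file="$1" dest_dir="$2" url name cr
  cr="$(printf '\r')"
  mkdir -p "$dest_dir"
  while IFS= read -r url || [ -n "$url" ]; do
    url="${url%"$cr"}"              # tolerate Windows line endings
    [ -z "$url" ] && continue       # ignore blank lines
    name="$(url_to_filename "$url")"
    if [ -e "$dest_dir/$name" ]; then
      echo "skip   $name"
    else
      echo "fetch  $name"
      # Remove any partial file if the transfer fails.
      curl --silent --show-error --fail --location \
           --output "$dest_dir/$name" "$url" || rm -f "$dest_dir/$name"
    fi
  done < "$links_file"
}
```

With the folder layout used by this project, it could be called as:

  download_missing input/results.csv downloads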

Features

  • Interactive command prompt.
  • Ability to download or re-download files.
  • Ability to delete downloaded files.

Requirements

  • A previously finished Historical PowerTrack Job.
  • An environment that supports bash scripts.

Setup

  1. Download and unzip the PTDownload.zip file to a location of your choice. Unzipping this file will deploy the files and folder structure needed to download your data files.

  2. Download your Job's results.csv file and place it in the "input" folder. The results.csv is available at a Job-specific endpoint of the form:

HPT 1.0:

https://historical.gnip.com/accounts/{ACCOUNT_NAME}/publishers/twitter/historical/track/jobs/{JOB_UUID}/results.csv

HPT 2.0:

http://gnip-api.gnip.com/historical/powertrack/accounts/{ACCOUNT_NAME}/publishers/twitter/jobs/{JOB_UUID}/results.csv
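
Step 2 can also be done from the command line. The sketch below builds the endpoint from an account name and Job UUID and saves the response into the "input" folder with curl. It assumes the endpoint accepts HTTP basic authentication with your account credentials. The helper names are hypothetical, and "acme", "1234" and the email address in the usage line are placeholders.

```shell
# Sketch only: helper names are illustrative; authentication via HTTP basic auth is an assumption.

# Build the results.csv endpoint for an HPT 1.0 Job.
results_url_hpt1() {
  printf 'https://historical.gnip.com/accounts/%s/publishers/twitter/historical/track/jobs/%s/results.csv\n' "$1" "$2"
}

# Build the results.csv endpoint for an HPT 2.0 Job.
results_url_hpt2() {
  printf 'http://gnip-api.gnip.com/historical/powertrack/accounts/%s/publishers/twitter/jobs/%s/results.csv\n' "$1" "$2"
}

# Save the Job's results.csv into the "input" folder.
# Passing only a user name makes curl prompt for the password.
fetch_results_csv() {
  local url="$1" user="$2"
  mkdir -p input
  curl --fail --silent --show-error --user "$user" \
       --output input/results.csv "$url"
}
```

For example, for an HPT 2.0 Job:

  fetch_results_csv "$(results_url_hpt2 acme 1234)" you@example.com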

This project uses the following folder structure:

  • downloads - Where all files are downloaded to.
  • input - Where the results.csv is located (as downloaded from Historical PowerTrack Job).
  • scripts - Bash scripts and support files used by this program.
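
Before running the script, it may help to confirm that the layout above is in place. The check below is a hypothetical helper, not part of the distributed scripts. It reports any of the three folders that are missing and whether input/results.csv is absent or empty.

```shell
# Sketch only: check_layout is an illustrative helper, not part of the project.

# Verify the expected folders and the results.csv file under a project root.
# Prints one line per problem and returns non-zero if anything is missing.
check_layout() {
  local root="${1:-.}" status=0 dir
  for dir in downloads input scripts; do
    if [ ! -d "$root/$dir" ]; then
      echo "missing folder: $dir"
      status=1
    fi
  done
  if [ ! -s "$root/input/results.csv" ]; then
    echo "missing or empty: input/results.csv"
    status=1
  fi
  return "$status"
}
```

Run from the unzipped project folder, it could be used as:

  check_layout . && ./run.sh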

Usage

Type the following from the command line to run the script:

  ./run.sh

Follow the command prompt for additional details. The options prompt can be bypassed by passing the desired option as an argument to the run.sh script if necessary.

Troubleshooting

  • If a file failed to download or is corrupt, delete it from the downloads folder and re-run the run.sh script. The script only downloads files that don't already exist locally.
  • If the script fails to run on a system where the default shell is not bash, change the #!/bin/sh line at the top of each script (run.sh, scripts/options.sh and scripts/utilities.sh) to #!/bin/bash.
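
Both fixes can be scripted. The sketch below uses two hypothetical helpers. list_corrupt_gz assumes the downloaded data files are gzip-compressed and prints those that fail gzip's integrity test. use_bash_shebang rewrites a first-line #!/bin/sh to #!/bin/bash and keeps a .bak copy of each file it touches.

```shell
# Sketch only: both helper names are illustrative, not part of the project.

# Print every .gz file in a folder that fails gzip's integrity test.
# Assumes the downloaded data files are gzip-compressed.
list_corrupt_gz() {
  local dir="${1:-downloads}" f
  for f in "$dir"/*.gz; do
    [ -e "$f" ] || continue         # folder has no .gz files
    gzip -t "$f" 2>/dev/null || printf '%s\n' "$f"
  done
}

# Change a first-line "#!/bin/sh" to "#!/bin/bash" in each given file.
# The -i.bak form keeps a backup and works with both GNU and BSD sed.
use_bash_shebang() {
  local f
  for f in "$@"; do
    sed -i.bak '1s|^#!/bin/sh[[:space:]]*$|#!/bin/bash|' "$f"
  done
}
```

For example:

  list_corrupt_gz downloads
  use_bash_shebang run.sh scripts/options.sh scripts/utilities.sh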