ywow - year's worth of weibo

A simple project containing the minimal code needed to fetch sample data from Sina Weibo in support of a research project.

Based on the configuration in a local settings file, the script calls Weibo's 'statuses/public_timeline' API, then stores the returned data locally both as UTF-8-encoded JSON and as a simplified delimited text file. These files are written under the DATA_DIR specified in the local settings file, within a DATA_DIR/YYYY/MM/HH hierarchy that spreads the files out.
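
The storage layout described above can be sketched roughly as follows. This is a minimal illustration in Python 3, not the project's actual code: the helper names `hour_dir` and `store_statuses`, the timestamp-based file names, the tab delimiter, the use of UTC for the hierarchy, and the chosen fields (`id`, `created_at`, `text`) are all assumptions. Only the DATA_DIR/YYYY/MM/HH layout and the two output formats come from the description.

```python
import io
import json
import os
from datetime import datetime, timezone


def hour_dir(data_dir, when):
    """Return DATA_DIR/YYYY/MM/HH for the given datetime."""
    return os.path.join(data_dir, when.strftime("%Y"),
                        when.strftime("%m"), when.strftime("%H"))


def store_statuses(data_dir, statuses, when=None):
    """Write statuses as UTF-8 JSON plus a tab-delimited text file.

    Hypothetical helper: file naming, UTC timestamps, and the selected
    fields are assumptions, not the project's documented behavior.
    """
    when = when or datetime.now(timezone.utc)
    out_dir = hour_dir(data_dir, when)
    os.makedirs(out_dir, exist_ok=True)
    stem = when.strftime("%Y%m%d%H%M%S")  # assumed file-name scheme
    json_path = os.path.join(out_dir, stem + ".json")
    txt_path = os.path.join(out_dir, stem + ".txt")

    # Full response, kept as real UTF-8 rather than \u escapes.
    with io.open(json_path, "w", encoding="utf-8") as fp:
        fp.write(json.dumps(statuses, ensure_ascii=False))

    # Simplified view: one status per line, tab-separated fields.
    with io.open(txt_path, "w", encoding="utf-8") as fp:
        for status in statuses:
            # Collapse all whitespace (newlines, tabs) in the text so each
            # status stays on one line and the delimiter stays unambiguous.
            text = " ".join(status.get("text", "").split())
            fp.write("%s\t%s\t%s\n" % (status.get("id", ""),
                                       status.get("created_at", ""), text))
    return json_path, txt_path
```

Keeping both files lets the JSON preserve everything the API returned, while the delimited file stays easy to load into other analysis tools.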

Developed at the GWU Libraries in Washington, DC, USA.

See also LICENSE.txt.


Developed using python 2.7+ for deployment on ubuntu-12.04; your mileage may vary.

  • install ubuntu package dependencies:

      % sudo apt-get install git python-virtualenv s3cmd
  • get this repository:

      % git clone
  • create and activate a virtualenv sandbox for python and dependencies:

      % virtualenv ENV
      % source ENV/bin/activate
  • install requirements:

      % pip install -r requirements.txt
  • create a data directory:

      % mkdir data
  • copy the settings template to a local python file:

      % cp
  • edit the copied settings file appropriately, including the full absolute path to your data directory as DATA_DIR.

  • NOTE: you will need an active weibo API key and token. For background on obtaining these, see:

      Weibo API docs:
      CMU's API guide:
      Example usage from python library:
      Copy (fork) of the above under GW Libraries' GitHub account:
  • test the script on the command line:

      % python
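
The core API request the script makes can be sketched with only the Python 3 standard library. Treat this as an assumption-laden illustration rather than the script itself: the endpoint `https://api.weibo.com/2/statuses/public_timeline.json`, the `access_token` and `count` query parameters, and the `statuses` key in the response reflect Weibo's v2 API as commonly documented and should be checked against the Weibo API docs. The function names are hypothetical, the token would come from your local settings file, and the default `count` of 50 here is an arbitrary choice.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for Weibo's v2 'statuses/public_timeline' call;
# verify against the current Weibo API documentation.
PUBLIC_TIMELINE_URL = "https://api.weibo.com/2/statuses/public_timeline.json"


def build_public_timeline_url(access_token, count=50):
    """Build the request URL (parameter names are assumptions)."""
    query = urlencode({"access_token": access_token, "count": count})
    return PUBLIC_TIMELINE_URL + "?" + query


def fetch_public_timeline(access_token, count=50, timeout=30):
    """Fetch one batch of public statuses and return them as a list."""
    url = build_public_timeline_url(access_token, count)
    with urlopen(url, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    # Assumed response shape: {"statuses": [...], ...}
    return payload.get("statuses", [])
```

The returned list is what would then be written out as JSON and delimited text under DATA_DIR.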