Biostars data-mining and statistical analysis.

bio-biostars-analytics

Data-mining analyses that make use of this gem (newest to oldest):

Installation

Biostars analytics can be installed as a Ruby gem:

gem install bio-biostars-analytics

Statistical analysis requires R 2.15.0 or later, together with the plyr package 2.15.1 or later.

Usage

Data-mining: crawl the Biostars forum and retrieve data from the Biostar RESTful API (parameters as of March 2014):

biostars-analytics 96000 54

This will create two files: <date>_api.tsv and <date>_crawled.tsv.
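
To find the output files after a run, it can help to compute the expected names up front. The following is a minimal sketch, not part of the gem: `expected_output_files` is a hypothetical helper, and the `YYYYMMDD` date format is an assumption inferred from the sample file name `20140328_api.tsv`.

```ruby
require 'date'

# Hypothetical helper (not part of the gem): builds the two expected
# output file names for a given invocation date.
# Assumption: <date> is formatted as YYYYMMDD, as in 20140328_api.tsv.
def expected_output_files(date = Date.today)
  stamp = date.strftime('%Y%m%d')
  ["#{stamp}_api.tsv", "#{stamp}_crawled.tsv"]
end

# Prints the two file names expected for a run on 28 March 2014.
puts expected_output_files(Date.new(2014, 3, 28))
```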

Various plots in PNG file format can be generated via:

biostar_api_stats <date>_api.tsv
biostar_crawled_stats <date>_crawled.tsv

Command Line Usage Instructions

Data-Mining

Usage: biostars-analytics max_post_number months_look_back [min_post_number]

Required parameters:
  max_post_number    : highest number (ID) of the post that should
                       be mined for data; the crawler will go over
                       posts min_post_number to max_post_number
  months_look_back   : how many months back should queries to the
                       Biostar API go (1 month = 30 days); default
                       value is 1

Optional parameters:
  min_post_number    : lowest number (ID) of the post that should
                       be mined for data

Output (date matches the script's invocation):
  <date>_crawled.tsv : data mined from crawling over posts
  <date>_api.tsv     : data extracted from the Biostar API

Example: mining Biostars in March 2014:
  biostars-analytics 96000 54
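
The usage text above can be read as a post ID range plus a look-back window in days. The sketch below illustrates that reading; it is not the gem's actual code. `interpret_arguments` is a hypothetical name, both leading arguments are treated as required, and the default of 1 for `min_post_number` is an assumption, since the usage text does not state what happens when it is omitted.

```ruby
# Hypothetical sketch (not the gem's actual code): interprets the
# positional arguments described in the usage text.
DAYS_PER_MONTH = 30 # the usage text defines 1 month as 30 days

def interpret_arguments(argv)
  unless (2..3).cover?(argv.length)
    raise ArgumentError, 'expected: max_post_number months_look_back [min_post_number]'
  end

  max_post = Integer(argv[0])
  months   = Integer(argv[1])
  # Assumption: the lowest post ID defaults to 1 when omitted; the
  # usage text does not state the actual default.
  min_post = argv[2] ? Integer(argv[2]) : 1
  raise ArgumentError, 'min_post_number exceeds max_post_number' if min_post > max_post

  { post_ids: (min_post..max_post), look_back_days: months * DAYS_PER_MONTH }
end

settings = interpret_arguments(%w[96000 54])
puts settings[:post_ids].size   # number of post IDs in the crawl range
puts settings[:look_back_days]  # API look-back window in days
```

Under these assumptions, the March 2014 example covers post IDs 1 to 96000 and a look-back window of 54 × 30 = 1620 days.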

Statistics (based on RESTful API data)

Generates plots as PNG files in the current working directory.

Usage: biostar_api_stats apitsvfile

Example (data provided at http://github.com/joejimbo/bioruby-biostars-analytics):
  biostar_api_stats data/20140328_api.tsv

Statistics (based on forum mining/crawling)

Generates plots as PNG files in the current working directory.

Usage: biostar_crawled_stats crawledtsvfile

Example (data provided at http://github.com/joejimbo/bioruby-biostars-analytics):
  biostar_crawled_stats data/20140328_crawled.tsv
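
Before handing a TSV file to either statistics script, a quick sanity check of its size can catch an empty or truncated mining run. This is a minimal sketch: `tsv_dimensions` is a hypothetical helper that is not part of the gem, and it deliberately makes no assumptions about column names or whether a header row is present.

```ruby
# Hypothetical helper (not part of the gem): reads a tab-separated file
# and reports its dimensions without assuming any column names.
def tsv_dimensions(path)
  rows = File.foreach(path, chomp: true).map { |line| line.split("\t", -1) }
  { rows: rows.length, columns: rows.map(&:length).max || 0 }
end

# Only runs when the sample data from the repository is present.
path = 'data/20140328_api.tsv'
puts tsv_dimensions(path) if File.exist?(path)
```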

Project Repository

Contributions can be made to the open repository on GitHub:

http://github.com/joejimbo/bioruby-biostars-analytics

The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.

Cite

If you use this software, please cite one of

Biogems.info

This Biogem is published at http://biogems.info/index.html#bio-biostars-analytics

Copyright

Copyright (c) 2014 Joachim Baran. See LICENSE.txt for further details.