CDSE Data science at the command line workshop

(http://www.buffalo.edu/cdse/CDSEdayslanding1/cdse-days2017/faculty_directory/SharatChikkerur1.html)

Abstract

This workshop shows how to combine command line tools to quickly query, transform, and model data. The goal is to show that these tools handle reasonably sized datasets efficiently and can speed up the data science process. The content is derived from the book Data Science at the Command Line (http://datascienceatthecommandline.com). In addition, we will cover Vowpal Wabbit (https://github.com/JohnLangford/vowpal_wabbit), a versatile command line tool for modeling large datasets.
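As a rough illustration of what combining tools looks like, the sketch below chains a few standard Unix utilities to count records per key. The inline data and the `city,temperature` layout are hypothetical and are not taken from the workshop materials.

```shell
# Hypothetical sample data: one "city,temperature" record per line.
data='buffalo,12
albany,15
buffalo,18
rochester,9
albany,11
buffalo,7'

# Query: keep only the city column (cut).
# Transform: group identical cities and count them (sort | uniq -c).
# Rank: order by count, largest first (sort -rn), then format as "city,count" (awk).
top_cities=$(printf '%s\n' "$data" \
    | cut -d, -f1 \
    | sort \
    | uniq -c \
    | sort -rn \
    | awk '{print $2 "," $1}')

printf '%s\n' "$top_cities"
```

Each stage reads from standard input and writes to standard output, so any stage can be swapped or inspected on its own by cutting the pipeline short.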

Setup instructions

  • Download and install VirtualBox to run virtual machines. Select the appropriate binary from https://www.virtualbox.org/wiki/Downloads
  • Download and install Vagrant to manage pre-built virtual machines. Vagrant is available for most platforms from https://www.vagrantup.com/
  • Install the Data Science Toolbox:
    
      mkdir data-science
      cd data-science
      vagrant init data-science-toolbox/data-science-at-the-command-line
      vagrant up
      vagrant ssh
    
  • Once in the virtual machine, install the Vowpal Wabbit library.
  • Clone this repo:
    
      git clone https://github.com/sharatsc/cdse/
    
  • If you'd like to run the examples in this repo, also install the following packages:
    
      sudo apt-get install ipython-notebook python-statsmodels python-matplotlib
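
After finishing the setup steps, it can help to confirm that the expected commands are on the `PATH` inside the virtual machine. The sketch below is one way to do that. The helper `check_tools` is hypothetical, and the tool names passed to it (`git`, `vw` for the Vowpal Wabbit executable, and `ipython`) are assumptions about what the steps above install.

```shell
# check_tools (hypothetical helper): report which commands are missing from PATH.
# Prints one "missing: NAME" line per absent command and returns
# non-zero if at least one command is absent.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed tool names; adjust the list to match your installation.
check_tools git vw ipython || echo "Install the missing tools before running the examples."
```

If nothing is printed, all of the listed commands were found.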
