# Vowpal Wabbit
Vowpal Wabbit (VW) is a scalable, out-of-core machine learning library first developed at Yahoo! Research and later at Microsoft Research. VW is optimized for online learning and is especially useful for building linear models.

These notes are drawn from the [Vowpal Wabbit tutorial for the Uninitiated](https://www.zinkov.com/posts/2013-08-13-vowpal-tutorial/) and the [wiki](https://github.com/JohnLangford/vowpal_wabbit/wiki).

## Installation on Mac
```shell
# install the boost library
brew install boost

# get vw
git clone https://github.com/JohnLangford/vowpal_wabbit.git

# change directory
cd vowpal_wabbit

# build
make

# test
make test

# add /usr/local/bin to the system path
# (single quotes keep $PATH unexpanded until the profile is sourced)
echo 'export PATH=/usr/local/bin:$PATH' >> ~/.bash_profile && \
source ~/.bash_profile
```


## Input Format
```
label |[namespace] [feature1:value1] [feature2:value2] ... 
```
Example:
```
1 |MetricFeatures height:1.5 length:2.0 |OtherFeatures NumberOfLegs:4.0 HasStripes:1.0
```
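Data often starts out as CSV rather than in this format. The sketch below shows one way to convert it with `awk`. The function name `csv_to_vw` and the column layout (header row first, label in the last column, no namespace) are assumptions for illustration, not part of vw.

```shell
# csv_to_vw (hypothetical helper): read CSV on stdin and print VW-format lines.
# Assumes a header row, the label in the LAST column, and feature names
# that contain no spaces, ':' or '|'.
csv_to_vw() {
  awk -F',' '
    NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }  # remember column names
    {
      line = $NF " |"                                         # label, then the bar
      for (i = 1; i < NF; i++) line = line " " name[i] ":" $i # name:value pairs
      print line
    }'
}

# usage on two inline rows
printf 'height,length,label\n1.5,2.0,1\n0.7,1.1,-1\n' | csv_to_vw
```

The output uses an empty namespace, the same layout as the housing data further down.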

## Training
```
zcat train.vw.gz | vw --cache_file train.cache -f data.model --compressed
```
`--cache_file` names the file where vw stores the data in a format that is easier for it to reuse. `-f` specifies the filename of the output model (predictor); by default none is written. `--compressed` tells vw to try to read the data and store caches and models in gzip-compressed form.
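The command above reads a gzip-compressed file. As a rough sketch, here is one way to produce `train.vw.gz` and a held-out `test.vw.gz` from a plain-text VW file. The helper name `split_vw` and its arguments are made up for this example, and it assumes the rows are already in random order.

```shell
# split_vw FILE N_TRAIN [OUT_DIR] (hypothetical helper)
# First N_TRAIN lines go to train.vw.gz, the rest to test.vw.gz.
# Assumes FILE is already shuffled, since the split is purely positional.
split_vw() {
  file=$1
  n_train=$2
  out_dir=${3:-.}
  head -n "$n_train" "$file" | gzip -c > "$out_dir/train.vw.gz"
  tail -n "+$((n_train + 1))" "$file" | gzip -c > "$out_dir/test.vw.gz"
}

# usage: keep the first 400 rows for training
# split_vw boston.data.vw 400
```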

## Validation and Testing
```
zcat test.vw.gz | vw -t --cache_file test.cache -i data.model -p test.pred
```
The `-t` option tells vw to ignore the labels and not train on the data. The `-i` option specifies the model to load. The `-p` option specifies the file the predictions are written to.
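Once the predictions are written, they can be scored against the true labels. The sketch below computes the root mean squared error with `awk`. The helper name `rmse` is hypothetical. It assumes an uncompressed test file whose label is the first token of each line, and a prediction file whose first token on line i is the prediction for example i.

```shell
# rmse TEST_FILE PRED_FILE (hypothetical helper)
# Prints sqrt(mean((prediction - label)^2)) to four decimal places.
rmse() {
  awk '
    NR == FNR { pred[FNR] = $1; next }   # first file: one prediction per line
    {                                    # second file: label is the first token
      err = pred[FNR] - $1
      sum += err * err
      n++
    }
    END { if (n > 0) printf "%.4f\n", sqrt(sum / n) }
  ' "$2" "$1"
}

# usage: decompress the test set first, then score
# gzip -dc test.vw.gz > test.vw && rmse test.vw test.pred
```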

## Model Inspection
If you want a human-readable model, use `--readable_model` instead of `-f`. To preserve the feature names instead of seeing hashes, use `--invert_hash`.

## Example - Boston Housing Prices
We will predict housing prices with the following features:
```
1. CRIM: per capita crime rate by town 
2. ZN: proportion of residential land zoned for lots over 25,000 sq.ft. 
3. INDUS: proportion of non-retail business acres per town 
4. CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise) 
5. NOX: nitric oxides concentration (parts per 10 million) 
6. RM: average number of rooms per dwelling 
7. AGE: proportion of owner-occupied units built prior to 1940 
8. DIS: weighted distances to five Boston employment centres 
9. RAD: index of accessibility to radial highways 
10. TAX: full-value property-tax rate per $10,000 
11. PTRATIO: pupil-teacher ratio by town 
12. B: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town 
13. LSTAT: % lower status of the population 
14. MEDV: Median value of owner-occupied homes in $1000's
```
`boston.data.vw` (the label is MEDV; the other 13 columns are the features):
```
24.0 | CRIM:0.00632 ZN:18.0 B:396.9 LSTAT:4.98 AGE:65.2 TAX:296.0 RAD:1.0 CHAS:0.0 NOX:0.538 RM:6.575 INDUS:2.31 PTRATIO:15.3 DIS:4.09 
21.6 | CRIM:0.02731 ZN:0.0 B:396.9 LSTAT:9.14 AGE:78.9 TAX:242.0 RAD:2.0 CHAS:0.0 NOX:0.469 RM:6.421 INDUS:7.07 PTRATIO:17.8 DIS:4.9671 
34.7 | CRIM:0.02729 ZN:0.0 B:392.83 LSTAT:4.03 AGE:61.1 TAX:242.0 RAD:2.0 CHAS:0.0 NOX:0.469 RM:7.185 INDUS:7.07 PTRATIO:17.8 DIS:4.9671 
33.4 | CRIM:0.03237 ZN:0.0 B:394.63 LSTAT:2.94 AGE:45.8 TAX:222.0 RAD:3.0 CHAS:0.0 NOX:0.458 RM:6.998 INDUS:2.18 PTRATIO:18.7 DIS:6.0622 
36.2 | CRIM:0.06905 ZN:0.0 B:396.9 LSTAT:5.33 AGE:54.2 TAX:222.0 RAD:3.0 CHAS:0.0 NOX:0.458 RM:7.147 INDUS:2.18 PTRATIO:18.7 DIS:6.0622 
28.7 | CRIM:0.02985 ZN:0.0 B:394.12 LSTAT:5.21 AGE:58.7 TAX:222.0 RAD:3.0 CHAS:0.0 NOX:0.458 RM:6.43 INDUS:2.18 PTRATIO:18.7 DIS:6.0622 
22.9 | CRIM:0.08829 ZN:12.5 B:395.6 LSTAT:12.43 AGE:66.6 TAX:311.0 RAD:5.0 CHAS:0.0 NOX:0.524 RM:6.012 INDUS:7.87 PTRATIO:15.2 DIS:5.5605 
27.1 | CRIM:0.14455 ZN:12.5 B:396.9 LSTAT:19.15 AGE:96.1 TAX:311.0 RAD:5.0 CHAS:0.0 NOX:0.524 RM:6.172 INDUS:7.87 PTRATIO:15.2 DIS:5.9505 
16.5 | CRIM:0.21124 ZN:12.5 B:386.63 LSTAT:29.93 AGE:100.0 TAX:311.0 RAD:5.0 CHAS:0.0 NOX:0.524 RM:5.631 INDUS:7.87 PTRATIO:15.2 DIS:6.0821 
18.9 | CRIM:0.17004 ZN:12.5 B:386.71 LSTAT:17.1 AGE:85.9 TAX:311.0 RAD:5.0 CHAS:0.0 NOX:0.524 RM:6.004 INDUS:7.87 PTRATIO:15.2 DIS:6.5921 
```
```
vw boston.data.vw --invert_hash boston.model
```
`boston.model`:
```
Version 7.3.0
Min label:0.000000
Max label:50.000000
bits:18
0 pairs:
0 triples:
rank:0
lda:0
0 ngram:
0 skip:
options:
:0
^AGE:0.013058
^B:0.013684
^CHAS:3.058681
^CRIM:-0.047248
^DIS:0.385468
^INDUS:-0.052709
^LSTAT:-0.165589
^NOX:3.014200
^PTRATIO:0.124905
^RAD:-0.072961
^RM:0.713633
^TAX:-0.000079
^ZN:0.054472
Constant:4.484257
```
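Because the model is linear, a prediction can be reproduced by hand as the constant plus the sum of weight times value over the features. The sketch below does this with `awk`. The helper name `predict_from_model` is hypothetical, and it assumes the model layout shown above (`^NAME:weight` lines with an empty namespace, plus a `Constant:` line). vw may also clip predictions to the min/max label range, which this sketch does not do.

```shell
# predict_from_model MODEL_FILE (hypothetical helper)
# Reads VW-format examples on stdin and prints Constant + sum(weight * value).
predict_from_model() {
  awk '
    NR == FNR {                                   # first input: the readable model
      n = split($0, kv, ":")
      if (n == 2 && substr(kv[1], 1, 1) == "^") w[substr(kv[1], 2)] = kv[2]
      if (n == 2 && kv[1] == "Constant") bias = kv[2]
      next
    }
    {                                             # second input: example lines
      pred = bias + 0
      for (i = 1; i <= NF; i++)
        if (split($i, fv, ":") == 2 && (fv[1] in w)) pred += w[fv[1]] * fv[2]
      printf "%.6f\n", pred
    }
  ' "$1" -
}

# usage: score the first row of the data with the model above
# head -n 1 boston.data.vw | predict_from_model boston.model
```

Tokens without a `name:value` shape, such as the label and the bar, are skipped, as are features the model has no weight for.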
```
^AGE:104042:28.8:0.0188122@1.18149e+09
```
This is one feature as printed by `--audit`: `^AGE` is the feature name, `104042` is the hashed value for the feature, `28.8` is the value of that feature for this example, `0.0188122` is the weight learned so far for the feature, and `1.18149e+09` is the sum of squared gradients over the feature values, which is used to adjust the weight on the feature.
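To make the field layout concrete, the sketch below splits such a string on `:` and `@` and also prints value times weight, which is this feature's contribution to the linear prediction. The helper name `parse_audit_feature` and the output labels are made up for this example.

```shell
# parse_audit_feature STRING (hypothetical helper)
# Splits "name:hash:value:weight@sum_sq_grad" into labeled fields.
parse_audit_feature() {
  printf '%s\n' "$1" | awk -F'[:@]' '{
    printf "name=%s hash=%s value=%s weight=%s sum_sq_grad=%s contribution=%.6f\n",
           $1, $2, $3, $4, $5, $3 * $4
  }'
}

parse_audit_feature '^AGE:104042:28.8:0.0188122@1.18149e+09'
# prints: name=^AGE hash=104042 value=28.8 weight=0.0188122 sum_sq_grad=1.18149e+09 contribution=0.541791
```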
