# Fwumious Wabbit demo

To learn more about Fwumious Wabbit, go [here](https://github.com/outbrain/fwumious_wabbit).

## Wabbit showdown: Fwumious vs. Vowpal


This benchmark demonstrates the potential gains from using Fwumious Wabbit (FW) as a drop-in replacement for Vowpal Wabbit (VW).

Depending on your dataset and system, you can get a significant performance boost by using FW.

Not only is it faster per task, it is also more efficient in CPU utilization, which means we can run more concurrent training tasks. This improves the throughput of our AutoML infrastructure, measured in model evaluations per hour.

Better throughput lets our data scientists get more work done, resulting in better models.

You can read about the speedups [here](https://github.com/outbrain/fwumious_wabbit/blob/master/SPEED.md).

### system information

In [12]:
!python print_system_info.py

Physical cores: 4
Total cores: 8
Current Frequency: 2900.00Mhz
System: Darwin
Release: 19.6.0
Version: Darwin Kernel Version 19.6.0: Thu Jun 18 20:49:00 PDT 2020; root:xnu-6153.141.1~1/RELEASE_X86_64
Machine: x86_64
Processor: i386


### the task: Eat-rate prediction


We generate a dataset containing 5,000,000 records documenting animal-food encounters.

Each record consists of an animal, a food, and an outcome: either the animal eats the food (1) or it doesn't (-1).

Namespace A will be Animal, and B will be Food.

Each animal has its latent type - Herbivore or Carnivore.

Each food has its latent type - Plant or Meat.

We use feature names like "Herbivore-13" and "Plant-55". For vw and fw, these are just strings, but they give us humans a hint as to the expected outcome (herbivores like plants, and carnivores like meat...).
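To make the record format concrete, here is a minimal sketch of how such lines could be produced. It is not the actual `generate.py`, which is not shown here. The function name `make_record`, the id ranges, and the noise-free labeling rule are all assumptions made for illustration.

```python
import random

# Hypothetical labeling rule: an animal eats only the food matching its type.
MATCHING_PAIRS = {("Herbivore", "Plant"), ("Carnivore", "Meat")}

def make_record(rng, n_animals=6, n_foods=4):
    """Return one vw-format line for a random animal-food encounter.

    n_animals and n_foods are illustrative id ranges, not the real ones.
    """
    animal_type = rng.choice(["Herbivore", "Carnivore"])
    food_type = rng.choice(["Plant", "Meat"])
    animal = f"{animal_type}-{rng.randrange(n_animals)}"
    food = f"{food_type}-{rng.randrange(n_foods)}"
    label = 1 if (animal_type, food_type) in MATCHING_PAIRS else -1
    # Namespace A holds the animal feature, namespace B the food feature.
    return f"{label} |A {animal} |B {food}"

rng = random.Random(0)  # fixed seed so the sketch is reproducible
lines = [make_record(rng) for _ in range(5)]
for line in lines:
    print(line)
```

Each printed line has the same shape as the training data: a label, then one feature in namespace A and one in namespace B.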

### simple logistic regression showdown



#### generate dataset

In [18]:
!python cleanup.py

In [19]:
!python generate.py

Review the first 10 lines of the generated training data:

In [20]:
!head -n 10 train.vw

-1 |A Herbivore-5 |B Meat-0
-1 |A Carnivore-3 |B Plant-0
1 |A Carnivore-0 |B Meat-2
1 |A Carnivore-3 |B Meat-1
1 |A Herbivore-0 |B Plant-2
1 |A Herbivore-0 |B Plant-0
1 |A Herbivore-4 |B Plant-1
1 |A Herbivore-1 |B Plant-3
-1 |A Herbivore-3 |B Meat-2
-1 |A Herbivore-0 |B Meat-3


In [21]:
!gzip train.vw

#### measure vw training time - not reading from cache (cache file is created)

*Note that for the model to learn successfully, we use feature interactions. If we replace "--interactions AB" with "--keep A --keep B", the model won't improve with training.*
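The sketch below gives one way to see why the interaction matters. It is an illustration at the level of the latent types, not a run on the real data, and it assumes that the four type combinations are equally likely and that labels are noise-free. Under those assumptions the average label is zero for every animal type and every food type on its own, so only the animal-food pair carries signal.

```python
from collections import defaultdict
from itertools import product

ANIMAL_TYPES = ["Herbivore", "Carnivore"]
FOOD_TYPES = ["Plant", "Meat"]

def label(animal_type, food_type):
    """Assumed noise-free rule: 1 when the food matches the animal's diet."""
    matches = (animal_type, food_type) in {("Herbivore", "Plant"), ("Carnivore", "Meat")}
    return 1 if matches else -1

def mean_label_by(key):
    """Average label over all four type combinations, grouped by key(animal, food)."""
    groups = defaultdict(list)
    for animal_type, food_type in product(ANIMAL_TYPES, FOOD_TYPES):
        groups[key(animal_type, food_type)].append(label(animal_type, food_type))
    return {k: sum(v) / len(v) for k, v in groups.items()}

by_animal = mean_label_by(lambda a, f: a)     # animal type alone
by_food = mean_label_by(lambda a, f: f)       # food type alone
by_pair = mean_label_by(lambda a, f: (a, f))  # crossed feature, as an A x B interaction provides

print("by animal:", by_animal)
print("by food:  ", by_food)
print("by pair:  ", by_pair)
```

This is the familiar XOR pattern: a model that is linear in the separate A and B features has nothing to fit, while the crossed A×B features separate the classes.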

In [22]:
!python clean_caches.py

In [23]:
!(time vw --data train.vw.gz -l 0.1 -b 25 -c --adaptive --sgd --loss_function logistic --link logistic --power_t 0.0 --l2 0.0 --hash all --final_regressor vw_model --save_resume --interactions AB) 2>&1 >/dev/null  | tail -n 1 | awk -F" " '{ print $13" "$14" "$15" "$16" "$17" "$18" "$19" "$20}'

6.30s user 6.54s system 162% cpu 7.919 total


#### measure fw training time - not reading from cache (cache file is created)

In [24]:
!(time ./fw --data train.vw.gz -l 0.1 -b 25 -c --adaptive --fastmath --sgd --loss_function logistic --link logistic --power_t 0.0 --l2 0.0 --hash all --final_regressor fw_model --save_resume --interactions AB) 2>&1 >/dev/null  | tail -n 1 | awk -F" " '{ print $13" "$14" "$15" "$16" "$17" "$18" "$19" "$20}'

0.77s user 0.20s system 93% cpu 1.030 total


#### now repeat the training for vw and fw - this time using the cache:

In [25]:
!(time vw --data train.vw.gz -l 0.1 -b 25 -c --adaptive --sgd --loss_function logistic --link logistic --power_t 0.0 --l2 0.0 --hash all --interactions AB) 2>&1 >/dev/null  | tail -n 1 | awk -F" " '{ print $13" "$14" "$15" "$16" "$17" "$18" "$19" "$20}'

6.12s user 6.83s system 164% cpu 7.863 total


In [26]:
!(time ./fw --data train.vw.gz -l 0.1 -b 25 -c --adaptive --sgd --loss_function logistic --link logistic --power_t 0.0 --l2 0.0 --hash all --interactions AB) 2>&1 >/dev/null  | tail -n 1 | awk -F" " '{ print $13" "$14" "$15" "$16" "$17" "$18" "$19" "$20}'

user 0.07s system 99% cpu 0.844 total 


#### predict using the trained model, and write predictions to output file (not using cache):

Reading from easy.vw, a file with 5M records.

First using vw, then using fw:

In [37]:
!(time vw --data easy.vw -t -b 25 -p vw_preds.out --link logistic --initial_regressor vw_model --hash all --interactions AB) 2>&1 >/dev/null  | tail -n 1 | awk -F" " '{ print $13" "$14" "$15" "$16" "$17" "$18" "$19" "$20}'

user 27.96s system 151% cpu 29.323 total 


In [39]:
!(time ./fw --data easy.vw --sgd --adaptive -t -b 25 -p fw_preds.out --initial_regressor fw_model --link logistic --hash all --interactions AB) 2>&1 >/dev/null  | tail -n 1 | awk -F" " '{ print $14" "$15" "$16" "$17" "$18" "$19" "$20" "$21}'

user 0.20s system 98% cpu 2.022 total 
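To put the timings above side by side, the following sketch computes the wall-clock speedup of fw over vw for each task. The numbers are the `total` values copied from the outputs in this notebook. They come from single runs on one machine, so treat the ratios as indicative only. The dictionary layout and the `speedup` helper are illustrative names, not part of either tool.

```python
# Wall-clock "total" seconds copied from the timing outputs in this notebook.
timings = {
    "train, cache created": {"vw": 7.919, "fw": 1.030},
    "train, cache reused": {"vw": 7.863, "fw": 0.844},
    "predict, no cache": {"vw": 29.323, "fw": 2.022},
}

def speedup(vw_seconds, fw_seconds):
    """Return how many times faster fw finished than vw (wall-clock ratio)."""
    return vw_seconds / fw_seconds

speedups = {task: speedup(t["vw"], t["fw"]) for task, t in timings.items()}

for task, ratio in speedups.items():
    print(f"{task}: fw is about {ratio:.1f}x faster than vw")
```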


#### evaluate predictions