# ludwig-demo

This notebook is a quick demo of the basic flow for training a machine learning model with Ludwig.

## getting the data

For the purposes of this demo we will use the [Wine Reviews](https://www.kaggle.com/zynicide/wine-reviews) dataset on Kaggle.

In [1]:
# !pip install kaggle

In [9]:
!../scripts/download_data.sh

wine-reviews.zip: Skipping, found more recently modified local copy (use --force to force download)
Archive:  wine-reviews.zip
  inflating: /mnt/wine-reviews/winemag-data-130k-v2.csv  
  inflating: /mnt/wine-reviews/winemag-data-130k-v2.json  
  inflating: /mnt/wine-reviews/winemag-data_first150k.csv  


The wine reviews dataset contains two fields of interest for this demo. The first is `description`, a textual review of the wine. The second is `points`, a score between 0 and 100. In this demo we will feed the contents of the `description` field to the model as the input, and train it to predict the `points` the wine received as the output.

In [4]:
!head -n 2 /mnt/wine-reviews/winemag-data_first150k.csv

,country,description,designation,points,price,province,region_1,region_2,variety,winery
0,US,"This tremendous 100% varietal wine hails from Oakville and was aged over three years in oak. Juicy red-cherry fruit and a compelling hint of caramel greet the palate, framed by elegant, fine tannins and a subtle minty tone in the background. Balanced and rewarding from start to finish, it has years ahead of it to develop further nuance. Enjoy 2022–2030.",Martha's Vineyard,96,235.0,California,Napa Valley,Napa,Cabernet Sauvignon,Heitz
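The same two fields can be checked from Python instead of `head`. The following is a minimal sketch using only the standard library. It parses an inline sample (the header plus the first row, with the description shortened) rather than the real file, so it runs anywhere; `SAMPLE_CSV` and `preview_fields` are names made up for this example.

```python
import csv
import io

# Inline stand-in for the first two lines of winemag-data_first150k.csv.
# The description is abbreviated here; the real file holds the full review.
SAMPLE_CSV = (
    ",country,description,designation,points,price,province,"
    "region_1,region_2,variety,winery\n"
    '0,US,"This tremendous 100% varietal wine hails from Oakville and was '
    'aged over three years in oak.",Martha\'s Vineyard,96,235.0,California,'
    "Napa Valley,Napa,Cabernet Sauvignon,Heitz\n"
)


def preview_fields(handle, limit=5):
    """Return up to `limit` (description, points) pairs from a CSV file handle."""
    rows = []
    for i, record in enumerate(csv.DictReader(handle)):
        if i >= limit:
            break
        # points is stored as text in the CSV; convert it to a number.
        rows.append((record["description"], float(record["points"])))
    return rows


pairs = preview_fields(io.StringIO(SAMPLE_CSV))
for description, points in pairs:
    print(f"{points:5.1f}  {description[:60]}")
```

To run it against the downloaded data, pass a real file handle instead, for example `open("/mnt/wine-reviews/winemag-data_first150k.csv", newline="", encoding="utf-8")`.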


## model configuration

Models in Ludwig are configured using a configuration document in YAML syntax. The optimization options run pretty deep, and we won't cover them here. We'll just use the default parameters and model architecture for our problem.

At runtime, Ludwig figures out what model architecture to use by looking at the combination of input and output features stated in the configuration file. This abstracts away model design from the user.

In [10]:
!mkdir -p ../datasets/wine_reviews/

In [11]:
%%writefile ../datasets/wine_reviews/cfg.yaml
input_features:
    -
        name: description
        type: text

output_features:
    -
        name: points
        type: numerical

Writing ../datasets/wine_reviews/cfg.yaml
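The same configuration can also be written as a plain Python dictionary, which is handy when driving Ludwig from code rather than from the command line. The sketch below is hedged: `check_features` is a hypothetical helper written for this example (it is not Ludwig's own validation), and `train_with_python_api` assumes the `LudwigModel` Python API as it exists around Ludwig v0.3. The Ludwig import is deferred so the rest of the block runs without Ludwig installed.

```python
import json

# Same features as cfg.yaml, written as a plain Python dictionary.
config = {
    "input_features": [{"name": "description", "type": "text"}],
    "output_features": [{"name": "points", "type": "numerical"}],
}


def check_features(cfg):
    """Hypothetical sanity check: every feature needs a name and a type."""
    for section in ("input_features", "output_features"):
        features = cfg.get(section)
        if not features:
            raise ValueError(f"{section} must list at least one feature")
        for feature in features:
            missing = {"name", "type"} - feature.keys()
            if missing:
                raise ValueError(f"{section} entry is missing {sorted(missing)}")
    return True


def train_with_python_api(cfg, dataset_path):
    """Assumed Python counterpart of `ludwig train`; requires Ludwig to be installed."""
    # Imported lazily so the rest of this block runs without Ludwig.
    from ludwig.api import LudwigModel

    model = LudwigModel(cfg)
    return model.train(dataset=dataset_path)


check_features(config)
print(json.dumps(config, indent=2))
```

Nothing here states a model architecture: the configuration only names one `text` input and one `numerical` output, and Ludwig picks its defaults from that combination.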


And now we can train.

In [12]:
!ludwig train \
    --experiment_name "wine_reviews_initial_0_experiment" \
    --model_name "wine_reviews_initial_0_model" \
    --config_file "../datasets/wine_reviews/cfg.yaml" \
    --dataset "/mnt/wine-reviews/winemag-data_first150k.csv"

2020-12-17 02:42:13.889460: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
███████████████████████
█ █ █ █  ▜█ █ █ █ █   █
█ █ █ █ █ █ █ █ █ █ ███
█ █   █ █ █ █ █ █ █ ▌ █
█ █████ █ █ █ █ █ █ █ █
█     █  ▟█     █ █   █
███████████████████████
ludwig v0.3.1 - Train

2020-12-17 02:42:14.730667: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-12-17 02:42:14.789719: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-17 02:42:14.790584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:00:1e.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
2020-12-17 02:42:14.790622: 

Ludwig writes a lot of files to disk: model checkpoints, training summaries, `tfrecords` files, etcetera.

In [9]:
%ls /spell/results/wine_reviews_initial_0_experiment_wine_reviews_initial_0_model/model/

checkpoint                         model_weights.index
logs/                              training_checkpoints/
model_hyperparameters.json         training_progress.json
model_weights.data-00000-of-00001  training_set_metadata.json
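To inspect these artifacts from Python rather than with `%ls`, a small sketch along the following lines could work. It uses only `pathlib`; `list_artifacts` and `print_artifacts` are hypothetical helper names, and the block only prints something if the results directory from this run exists on the machine.

```python
from pathlib import Path


def list_artifacts(root):
    """Return (relative path, size in bytes) for every file under `root`, sorted by path."""
    root = Path(root)
    return sorted(
        (path.relative_to(root).as_posix(), path.stat().st_size)
        for path in root.rglob("*")
        if path.is_file()
    )


def print_artifacts(root):
    """Print one line per file: size in bytes, then the path relative to `root`."""
    for name, size in list_artifacts(root):
        print(f"{size:>12,d}  {name}")


# Directory taken from the listing above; skipped silently if it is absent.
results_dir = Path(
    "/spell/results/wine_reviews_initial_0_experiment_wine_reviews_initial_0_model/model/"
)
if results_dir.is_dir():
    print_artifacts(results_dir)
```

Unlike `%ls`, this walks subdirectories as well, so the contents of `logs/` and `training_checkpoints/` show up in the same listing.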


In [17]:
%ls /spell/results/wine_reviews_initial_0_experiment_wine_reviews_initial_0_model/model/logs/

test/  training/  validation/


See also our blog post, [An introduction to AutoML with Ludwig](https://spell.ml/blog/an-introduction-to-automl-with-ludwig-X_OSWhAAACMA6eYD).